Hi Arrow devs,

There are some bugs in the Parquet implementation that affect reading of
data:

- https://issues.apache.org/jira/browse/ARROW-11269, which was opened today
and which I only just saw.
- an issue with list schema nulls from parquet-format's logical types. In
this case, we misinterpret the nullness of lists read from parquet-mr,
potentially leading to incorrect data being read.

I discovered the second bug while bashing my head trying to fix a bug in
the Parquet writer (sadly, I spent a very long time on it).

Anyway, I would like to work on PRs for the above two bugs tonight and
tomorrow.

@Krisztián @Andrew Lamb <al...@influxdata.com>, would we still be able to
merge them in time?
I've also seen the offset issues in the equality checks, and I am going to
review and help out with them tomorrow.

I haven't been feeling very well this week, so I haven't been spending much
time working on Arrow.

Thanks
Neville


On Sat, 16 Jan 2021 at 16:34, Krisztián Szűcs <szucs.kriszt...@gmail.com> wrote:

> On Sat, Jan 16, 2021 at 12:51 PM Andrew Lamb <al...@influxdata.com> wrote:
> >
> > I just saw the RC0 candidate email -- thanks Krisztián.
> >
> > Does the RC0 mean that any subsequent merges to master can now proceed
> > without affecting the 3.0.0 branch?
> Technically we don't have a 3.0 release branch, but we can always create one.
> So yes, the merges can proceed on master.
>
> Thanks, Krisztian
> >
> > On Fri, Jan 15, 2021 at 10:22 AM Krisztián Szűcs <szucs.kriszt...@gmail.com> wrote:
> >
> > > The Spark integration test fails against Spark 3.0.1 with:
> > >
> > > 12:21:51.996 WARN org.apache.spark.scheduler.TaskSetManager: Lost task
> > > 1.0 in stage 0.0 (TID 1, 5fc0f8cfe8d2, executor driver):
> > > java.lang.NoClassDefFoundError: Could not initialize class
> > > org.apache.spark.sql.util.ArrowUtils$
> > > ...
> > > Caused by: java.lang.RuntimeException: No DefaultAllocationManager
> > > found on classpath. Can't allocate Arrow buffers. Please consider
> > > adding arrow-memory-netty or arrow-memory-unsafe as a dependency.
> > >
> > > Since this change was introduced in
> > > https://github.com/apache/arrow/commit/2092e18752a9c0494799493b12eb1830052217a2
> > > which is already part of Arrow's 2.0 release, I guess this is not a
> > > blocker (or at least the required changes are on Spark's side?).
> > >
> > > Either way, I'm going to proceed with the release.
> > >
> > >
> > > On Fri, Jan 15, 2021 at 2:53 PM Andrew Lamb <al...@influxdata.com> wrote:
> > > >
> > > > That is great news, Krisztián -- thank you
> > > >
> > > > On Fri, Jan 15, 2021 at 6:50 AM Krisztián Szűcs <szucs.kriszt...@gmail.com> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > My plan is to cut RC0 today; I just want to make sure that the Spark
> > > > > integration test works with Spark's latest release.
> > > > >
> > > > > Thanks, Krisztian
> > > > >
> > > > > On Fri, Jan 15, 2021 at 12:35 PM Andrew Lamb <al...@influxdata.com> wrote:
> > > > > >
> > > > > > Hi,
> > > > > >
> > > > > > I apologize if I have missed this detail in previous emails; I wonder
> > > > > > if there is any estimate of when the Arrow 3.0 release might be
> > > > > > finalized.
> > > > > >
> > > > > > The Rust implementation has a few PRs that we have been holding off
> > > > > > on merging until the release goes out, and I wanted to know if there
> > > > > > was an estimated timeline.
> > > > > >
> > > > > > The wiki no longer shows any blocking JIRA items (nice work,
> > > > > > everyone!):
> > > > > > https://cwiki.apache.org/confluence/display/ARROW/Arrow+3.0.0+Release
> > > > > >
> > > > > >
> > > > > >
> > > > > > Thanks,
> > > > > > Andrew
> > > > >
> > >
>
