Hello,

There is an upstream bug in arrow-json that can cause the JSON reader to
return incorrect data for large integers [1]. Raphael fixed it within the
last 24 hours [2], but the fix is not yet included in any release. The bug
was introduced in Arrow 48, which this DataFusion release will expose
users to.

I'm not sure what the precedent here is, but I think we should either
(a) see if we can release and upgrade Arrow to include the fix, or
(b) call out the regression as a known bug so downstream projects can
include the patch in their applications.

Best,

Will Jones

[1] https://github.com/apache/arrow-rs/issues/5038
[2] https://github.com/apache/arrow-rs/pull/5042

On Mon, Nov 6, 2023 at 12:25 PM Andrew Lamb <al...@influxdata.com> wrote:

> +1 (the tests passed for me). I have left a comment on
> https://github.com/apache/arrow-datafusion/issues/8069
>
> On Mon, Nov 6, 2023 at 2:02 PM Andy Grove <andygrov...@gmail.com> wrote:
>
> > I filed https://github.com/apache/arrow-datafusion/issues/8069
> >
> > On Mon, Nov 6, 2023 at 11:59 AM Andy Grove <andygrov...@gmail.com> wrote:
> >
> > > I see the same error when I run on my M1 Macbook Air with 16 GB RAM.
> > >
> > > ---- aggregates::tests::run_first_last_multi_partitions stdout ----
> > > Error: ResourcesExhausted("Failed to allocate additional 632 bytes for
> > > GroupedHashAggregateStream[0] with 1829 bytes already allocated -
> > > maximum available is 605")
> > >
> > > It worked fine on my workstation with 128 GB RAM.
> > >
> > >
> > >
> > > On Mon, Nov 6, 2023 at 11:23 AM L. C. Hsieh <vii...@gmail.com> wrote:
> > >
> > >> Hmm, ran verification script and got one failure:
> > >>
> > >> failures:
> > >>
> > >> ---- aggregates::tests::run_first_last_multi_partitions stdout ----
> > >> Error: ResourcesExhausted("Failed to allocate additional 632 bytes for
> > >> GroupedHashAggregateStream[0] with 1829 bytes already allocated -
> > >> maximum available is 605")
> > >>
> > >> failures:
> > >>     aggregates::tests::run_first_last_multi_partitions
> > >>
> > >> test result: FAILED. 557 passed; 1 failed; 1 ignored; 0 measured; 0
> > >> filtered out; finished in 2.21s
> > >>
> > >>
> > >>
> > >> On Mon, Nov 6, 2023 at 6:57 AM Andy Grove <andygrov...@gmail.com> wrote:
> > >> >
> > >> > Hi,
> > >> >
> > >> > I would like to propose a release of Apache Arrow DataFusion
> > >> > Implementation, version 33.0.0.
> > >> >
> > >> > This release candidate is based on commit:
> > >> > 262f08778b8ec231d96792c01fc3e051640eb5d4 [1]
> > >> > The proposed release tarball and signatures are hosted at [2].
> > >> > The changelog is located at [3].
> > >> >
> > >> > Please download, verify checksums and signatures, run the unit
> > >> > tests, and vote on the release. The vote will be open for at
> > >> > least 72 hours.
> > >> >
> > >> > Only votes from PMC members are binding, but all members of the
> > >> > community are encouraged to test the release and vote with
> > >> > "(non-binding)".
> > >> >
> > >> > The standard verification procedure is documented at
> > >> > https://github.com/apache/arrow-datafusion/blob/main/dev/release/README.md#verifying-release-candidates
> > >> > .
> > >> >
> > >> > [ ] +1 Release this as Apache Arrow DataFusion 33.0.0
> > >> > [ ] +0
> > >> > [ ] -1 Do not release this as Apache Arrow DataFusion 33.0.0 because...
> > >> >
> > >> > Here is my vote:
> > >> >
> > >> > +1
> > >> >
> > >> > [1]:
> > >> > https://github.com/apache/arrow-datafusion/tree/262f08778b8ec231d96792c01fc3e051640eb5d4
> > >> > [2]:
> > >> > https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-datafusion-33.0.0-rc1
> > >> > [3]:
> > >> > https://github.com/apache/arrow-datafusion/blob/262f08778b8ec231d96792c01fc3e051640eb5d4/CHANGELOG.md
> > >>
> > >
> >
>
