Perhaps we can create an Arrow 48.1.0 patch release to include the fix?

On Tue, Nov 7, 2023 at 12:48 AM Will Jones <will.jones...@gmail.com> wrote:

> Thanks for the clarification, Raphael. That likely narrows the scope of who
> is affected. If this bug is present in DataFusion 33, then delta-rs will
> likely skip upgrading until 34. If we're the only downstream project this
> parsing issue affects, then I think it's fine to release.
>
> On Mon, Nov 6, 2023 at 8:22 PM Raphael Taylor-Davies
> <r.taylordav...@googlemail.com.invalid> wrote:
>
> > Hi,
> >
> > To further clarify, the bug concerns the serde compatibility feature that
> > allows converting a serde-compatible data structure to Arrow [1]. It will
> > not impact workloads reading JSON.
> >
> > I am not sure this is a sufficiently fundamental bug to warrant special
> > concern, but I am happy to defer to others.
> >
> > Kind Regards,
> >
> > Raphael
> >
> > [1]: https://docs.rs/arrow/latest/arrow/#serde-compatibility
> >
> > On 7 November 2023 03:20:59 GMT, Will Jones <will.jones...@gmail.com>
> > wrote:
> > >Hello,
> > >
> > >There is an upstream bug in arrow-json that can cause the JSON reader to
> > >return incorrect data for large integers [1]. Raphael fixed it within the
> > >last 24 hours [2], but the fix is not yet included in any release. The
> > >bug was introduced in Arrow 48, and this DataFusion release will expose
> > >users to it.
> > >
> > >Not sure what the precedent here is, but I think we should consider
> > >either (a) seeing if we can release Arrow with the fix and upgrade to it,
> > >or (b) calling out the regression as a known bug so downstream projects
> > >can include the patch in their applications.
> > >
> > >Best,
> > >
> > >Will Jones
> > >
> > >[1] https://github.com/apache/arrow-rs/issues/5038
> > >[2] https://github.com/apache/arrow-rs/pull/5042
> > >
> > >On Mon, Nov 6, 2023 at 12:25 PM Andrew Lamb <al...@influxdata.com> wrote:
> > >
> > >> +1 (the tests passed for me). I have left a comment on
> > >> https://github.com/apache/arrow-datafusion/issues/8069
> > >>
> > >> On Mon, Nov 6, 2023 at 2:02 PM Andy Grove <andygrov...@gmail.com> wrote:
> > >>
> > >> > I filed https://github.com/apache/arrow-datafusion/issues/8069
> > >> >
> > >> > On Mon, Nov 6, 2023 at 11:59 AM Andy Grove <andygrov...@gmail.com> wrote:
> > >> >
> > >> > > I see the same error when I run it on my M1 MacBook Air with 16 GB RAM.
> > >> > >
> > >> > > ---- aggregates::tests::run_first_last_multi_partitions stdout ----
> > >> > > Error: ResourcesExhausted("Failed to allocate additional 632 bytes for GroupedHashAggregateStream[0] with 1829 bytes already allocated - maximum available is 605")
> > >> > >
> > >> > > It worked fine on my workstation with 128 GB RAM.
> > >> > >
> > >> > >
> > >> > >
> > >> > > On Mon, Nov 6, 2023 at 11:23 AM L. C. Hsieh <vii...@gmail.com> wrote:
> > >> > >
> > >> > >> Hmm, I ran the verification script and got one failure:
> > >> > >>
> > >> > >> failures:
> > >> > >>
> > >> > >> ---- aggregates::tests::run_first_last_multi_partitions stdout ----
> > >> > >> Error: ResourcesExhausted("Failed to allocate additional 632 bytes for GroupedHashAggregateStream[0] with 1829 bytes already allocated - maximum available is 605")
> > >> > >>
> > >> > >> failures:
> > >> > >>     aggregates::tests::run_first_last_multi_partitions
> > >> > >>
> > >> > >> test result: FAILED. 557 passed; 1 failed; 1 ignored; 0 measured; 0 filtered out; finished in 2.21s
> > >> > >>
> > >> > >>
> > >> > >>
> > >> > >> On Mon, Nov 6, 2023 at 6:57 AM Andy Grove <andygrov...@gmail.com> wrote:
> > >> > >> >
> > >> > >> > Hi,
> > >> > >> >
> > >> > >> > I would like to propose a release of Apache Arrow DataFusion
> > >> > >> > Implementation, version 33.0.0.
> > >> > >> >
> > >> > >> > This release candidate is based on commit:
> > >> > >> > 262f08778b8ec231d96792c01fc3e051640eb5d4 [1]
> > >> > >> > The proposed release tarball and signatures are hosted at [2].
> > >> > >> > The changelog is located at [3].
> > >> > >> >
> > >> > >> > Please download, verify checksums and signatures, run the unit tests,
> > >> > >> > and vote on the release. The vote will be open for at least 72 hours.
> > >> > >> >
> > >> > >> > Only votes from PMC members are binding, but all members of the
> > >> > >> > community are encouraged to test the release and vote with
> > >> > >> > "(non-binding)".
> > >> > >> >
> > >> > >> > The standard verification procedure is documented at:
> > >> > >> > https://github.com/apache/arrow-datafusion/blob/main/dev/release/README.md#verifying-release-candidates
> > >> > >> >
> > >> > >> > [ ] +1 Release this as Apache Arrow DataFusion 33.0.0
> > >> > >> > [ ] +0
> > >> > >> > [ ] -1 Do not release this as Apache Arrow DataFusion 33.0.0 because...
> > >> > >> >
> > >> > >> > Here is my vote:
> > >> > >> >
> > >> > >> > +1
> > >> > >> >
> > >> > >> > [1]: https://github.com/apache/arrow-datafusion/tree/262f08778b8ec231d96792c01fc3e051640eb5d4
> > >> > >> > [2]: https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-datafusion-33.0.0-rc1
> > >> > >> > [3]: https://github.com/apache/arrow-datafusion/blob/262f08778b8ec231d96792c01fc3e051640eb5d4/CHANGELOG.md