Perhaps we can create an Arrow 48.1.0 patch release to include the fix?

On Tue, Nov 7, 2023 at 12:48 AM Will Jones <will.jones...@gmail.com> wrote:
> Thanks for the clarification, Raphael. That likely narrows the scope of who is affected. If this bug is present in DataFusion 33, then delta-rs will likely skip upgrading until 34. If we're the only downstream project this parsing issue affects, then I think it's fine to release.
>
> On Mon, Nov 6, 2023 at 8:22 PM Raphael Taylor-Davies <r.taylordav...@googlemail.com.invalid> wrote:
>
> > Hi,
> >
> > To further clarify, the bug concerns the serde compatibility feature, which allows converting a serde-compatible data structure to Arrow [1]. It will not impact workloads reading JSON.
> >
> > I am not sure this is a sufficiently fundamental bug to warrant special concern, but I am happy to defer to others.
> >
> > Kind Regards,
> >
> > Raphael
> >
> > [1]: https://docs.rs/arrow/latest/arrow/#serde-compatibility
> >
> > On 7 November 2023 03:20:59 GMT, Will Jones <will.jones...@gmail.com> wrote:
> >
> > >Hello,
> > >
> > >There is an upstream bug in arrow-json that can cause the JSON reader to return incorrect data for large integers [1]. Raphael fixed it within the last 24 hours [2], but the fix is not yet included in any release. The bug was introduced in Arrow 48, which this DataFusion release will expose users to.
> > >
> > >Not sure what the precedent here is, but I think we should consider either (a) seeing if we can release and upgrade Arrow to include the fix, or (b) calling out the regression as a known bug so downstream projects can include the patch in their applications.
> > >
> > >Best,
> > >
> > >Will Jones
> > >
> > >[1] https://github.com/apache/arrow-rs/issues/5038
> > >[2] https://github.com/apache/arrow-rs/pull/5042
> > >
> > >On Mon, Nov 6, 2023 at 12:25 PM Andrew Lamb <al...@influxdata.com> wrote:
> > >
> > >> +1 (the tests passed for me).
> > >> I have left a comment on https://github.com/apache/arrow-datafusion/issues/8069
> > >>
> > >> On Mon, Nov 6, 2023 at 2:02 PM Andy Grove <andygrov...@gmail.com> wrote:
> > >>
> > >> > I filed https://github.com/apache/arrow-datafusion/issues/8069
> > >> >
> > >> > On Mon, Nov 6, 2023 at 11:59 AM Andy Grove <andygrov...@gmail.com> wrote:
> > >> >
> > >> > > I see the same error when I run on my M1 MacBook Air with 16 GB RAM.
> > >> > >
> > >> > > ---- aggregates::tests::run_first_last_multi_partitions stdout ----
> > >> > > Error: ResourcesExhausted("Failed to allocate additional 632 bytes for GroupedHashAggregateStream[0] with 1829 bytes already allocated - maximum available is 605")
> > >> > >
> > >> > > It worked fine on my workstation with 128 GB RAM.
> > >> > >
> > >> > > On Mon, Nov 6, 2023 at 11:23 AM L. C. Hsieh <vii...@gmail.com> wrote:
> > >> > >
> > >> > >> Hmm, I ran the verification script and got one failure:
> > >> > >>
> > >> > >> failures:
> > >> > >>
> > >> > >> ---- aggregates::tests::run_first_last_multi_partitions stdout ----
> > >> > >> Error: ResourcesExhausted("Failed to allocate additional 632 bytes for GroupedHashAggregateStream[0] with 1829 bytes already allocated - maximum available is 605")
> > >> > >>
> > >> > >> failures:
> > >> > >>     aggregates::tests::run_first_last_multi_partitions
> > >> > >>
> > >> > >> test result: FAILED. 557 passed; 1 failed; 1 ignored; 0 measured; 0 filtered out; finished in 2.21s
> > >> > >>
> > >> > >> On Mon, Nov 6, 2023 at 6:57 AM Andy Grove <andygrov...@gmail.com> wrote:
> > >> > >>
> > >> > >> > Hi,
> > >> > >> >
> > >> > >> > I would like to propose a release of Apache Arrow DataFusion Implementation, version 33.0.0.
> > >> > >> >
> > >> > >> > This release candidate is based on commit: 262f08778b8ec231d96792c01fc3e051640eb5d4 [1]
> > >> > >> > The proposed release tarball and signatures are hosted at [2].
> > >> > >> > The changelog is located at [3].
> > >> > >> >
> > >> > >> > Please download, verify checksums and signatures, run the unit tests, and vote on the release. The vote will be open for at least 72 hours.
> > >> > >> >
> > >> > >> > Only votes from PMC members are binding, but all members of the community are encouraged to test the release and vote with "(non-binding)".
> > >> > >> >
> > >> > >> > The standard verification procedure is documented at https://github.com/apache/arrow-datafusion/blob/main/dev/release/README.md#verifying-release-candidates .
> > >> > >> >
> > >> > >> > [ ] +1 Release this as Apache Arrow DataFusion 33.0.0
> > >> > >> > [ ] +0
> > >> > >> > [ ] -1 Do not release this as Apache Arrow DataFusion 33.0.0 because...
> > >> > >> >
> > >> > >> > Here is my vote:
> > >> > >> >
> > >> > >> > +1
> > >> > >> >
> > >> > >> > [1]: https://github.com/apache/arrow-datafusion/tree/262f08778b8ec231d96792c01fc3e051640eb5d4
> > >> > >> > [2]: https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-datafusion-33.0.0-rc1
> > >> > >> > [3]: https://github.com/apache/arrow-datafusion/blob/262f08778b8ec231d96792c01fc3e051640eb5d4/CHANGELOG.md