Thanks for the clarification, Raphael. That likely narrows the scope of who is affected. If this bug is present in DataFusion 33, then delta-rs will likely skip upgrading until 34. If we're the only downstream project this parsing issue affects, then I think it's fine to release.
On Mon, Nov 6, 2023 at 8:22 PM Raphael Taylor-Davies <[email protected]> wrote:

> Hi,
>
> To further clarify the bug concerns the serde compatibility feature that
> allows converting a serde compatible data structure to arrow [1]. It will
> not impact workloads reading JSON.
>
> I am not sure this is a sufficiently fundamental bug to warrant special
> concern, but happy to defer to others.
>
> Kind Regards,
>
> Raphael
>
> [1]: https://docs.rs/arrow/latest/arrow/#serde-compatibility
>
> On 7 November 2023 03:20:59 GMT, Will Jones <[email protected]> wrote:
> >Hello,
> >
> >There is an upstream bug in arrow-json that can cause the JSON reader to
> >return incorrect data for large integers [1]. It was recently fixed by
> >Raphael within the last 24 hours, but is not included in any release. The
> >bug was introduced in Arrow 48, which this DataFusion release will expose
> >users to.
> >
> >Not sure what the precedent here is, but I think either we should consider
> >either (a) seeing if we can release and upgrade Arrow to include the fix,
> >or else (b) calling out the regression as a known bug so downstream
> >projects can include the path in their applications.
> >
> >Best,
> >
> >Will Jones
> >
> >[1] https://github.com/apache/arrow-rs/issues/5038
> >[2] https://github.com/apache/arrow-rs/pull/5042
> >
> >On Mon, Nov 6, 2023 at 12:25 PM Andrew Lamb <[email protected]> wrote:
> >
> >> +1 (the tests passed for me). I have left a comment on
> >> https://github.com/apache/arrow-datafusion/issues/8069
> >>
> >> On Mon, Nov 6, 2023 at 2:02 PM Andy Grove <[email protected]> wrote:
> >>
> >> > I filed https://github.com/apache/arrow-datafusion/issues/8069
> >> >
> >> > On Mon, Nov 6, 2023 at 11:59 AM Andy Grove <[email protected]> wrote:
> >> >
> >> > > I see the same error when I run on my M1 Macbook Air with 16 GB RAM.
> >> > >
> >> > > ---- aggregates::tests::run_first_last_multi_partitions stdout ----
> >> > > Error: ResourcesExhausted("Failed to allocate additional 632 bytes for
> >> > > GroupedHashAggregateStream[0] with 1829 bytes already allocated - maximum
> >> > > available is 605")
> >> > >
> >> > > It worked fine on my workstation with 128 GB RAM.
> >> > >
> >> > > On Mon, Nov 6, 2023 at 11:23 AM L. C. Hsieh <[email protected]> wrote:
> >> > >
> >> > >> Hmm, ran verification script and got one failure:
> >> > >>
> >> > >> failures:
> >> > >>
> >> > >> ---- aggregates::tests::run_first_last_multi_partitions stdout ----
> >> > >> Error: ResourcesExhausted("Failed to allocate additional 632 bytes for
> >> > >> GroupedHashAggregateStream[0] with 1829 bytes already allocated -
> >> > >> maximum available is 605")
> >> > >>
> >> > >> failures:
> >> > >> aggregates::tests::run_first_last_multi_partitions
> >> > >>
> >> > >> test result: FAILED. 557 passed; 1 failed; 1 ignored; 0 measured; 0
> >> > >> filtered out; finished in 2.21s
> >> > >>
> >> > >> On Mon, Nov 6, 2023 at 6:57 AM Andy Grove <[email protected]> wrote:
> >> > >> >
> >> > >> > Hi,
> >> > >> >
> >> > >> > I would like to propose a release of Apache Arrow DataFusion Implementation,
> >> > >> > version 33.0.0.
> >> > >> >
> >> > >> > This release candidate is based on commit:
> >> > >> > 262f08778b8ec231d96792c01fc3e051640eb5d4 [1]
> >> > >> > The proposed release tarball and signatures are hosted at [2].
> >> > >> > The changelog is located at [3].
> >> > >> >
> >> > >> > Please download, verify checksums and signatures, run the unit tests, and vote
> >> > >> > on the release. The vote will be open for at least 72 hours.
> >> > >> >
> >> > >> > Only votes from PMC members are binding, but all members of the community are
> >> > >> > encouraged to test the release and vote with "(non-binding)".
> >> > >> >
> >> > >> > The standard verification procedure is documented at
> >> > >> > https://github.com/apache/arrow-datafusion/blob/main/dev/release/README.md#verifying-release-candidates
> >> > >> >
> >> > >> > [ ] +1 Release this as Apache Arrow DataFusion 33.0.0
> >> > >> > [ ] +0
> >> > >> > [ ] -1 Do not release this as Apache Arrow DataFusion 33.0.0 because...
> >> > >> >
> >> > >> > Here is my vote:
> >> > >> >
> >> > >> > +1
> >> > >> >
> >> > >> > [1]: https://github.com/apache/arrow-datafusion/tree/262f08778b8ec231d96792c01fc3e051640eb5d4
> >> > >> > [2]: https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-datafusion-33.0.0-rc1
> >> > >> > [3]: https://github.com/apache/arrow-datafusion/blob/262f08778b8ec231d96792c01fc3e051640eb5d4/CHANGELOG.md
