Re: Arrow community meeting April 12 at 16:00 UTC
AFAIK, the Parquet PMC no longer governs parquet-cpp in practice. We should probably raise the issue on priv...@parquet.apache.org for a formal discussion.

Best,
Gang
Re: Arrow community meeting April 12 at 16:00 UTC
> Rust Parquet was donated directly to the Arrow project and
> developed under its auspices after donation.

Yes, this is my recollection as well -- the original implementation, I believe, is [1].

Andrew

[1] https://github.com/sunchao/parquet-rs
Re: Arrow community meeting April 12 at 16:00 UTC
>   - Joris believes we can go ahead and do this; the Parquet Rust
>     implementation did something similar

Small note here: IIRC, the origins of the Rust and C++ Parquet code are different. Rust Parquet was donated directly to the Arrow project and developed under its auspices after donation. The parquet-cpp integration at the time was done with the agreement that it would still live under the governance of the Parquet PMC (with the hope of it getting split out again at some point). I think there has been enough code creep here that, without a significant amount of work, separating Parquet C++ back out of Arrow is likely not tenable.

I pinged the thread again to see if we can get the Parquet PMC to weigh in here.
Re: Arrow community meeting April 12 at 16:00 UTC
Below is a summary of the notes from today's meeting:

Attendees:

- Ian Cook
- Raúl Cumplido
- Xuwei Fu
- Will Jones
- Bryce Mecum
- Rok Mihevc
- Sri Nadukudy
- Ashish Paliwal
- Dane Pitkin
- David Dali Susanibar Arce
- Matthew Topol
- Joris Van den Bossche
- Jacob Wujciak


Discussion:

12.0.0 release

- Code freeze is scheduled for later today, April 12
- There are many nightly failures currently on main; Raúl and Jacob have opened several blocker issues and we might need to create more
- Discussion of several current issues that might affect the release
   - C# tests not finding Python
   - PyArrow tests slowness on Windows [1]
   - PyArrow wheels on Windows not uploading to Gemfury
- Important items to mention in release changelog, release blog, etc.
   - Drop support for Ubuntu 18.04 [2]
   - Acero refactor (splitting Acero out from core Arrow library) [3]
   - Fixed shape tensor extension type [4]
   - Run-end encoded layout [5]
   - Plasma removal [6] and suggested alternatives [7]
   - Reminder about Jira to GitHub move (which happened just before the 11.0.0 release)
   - Initial Swift implementation [8]
   - nanoarrow (not technically a part of this release, but worth drawing attention to) [9]
   - Also see ASF board report


Parquet tickets are still tracked in the ASF Jira

- We have to maintain a lot of code in Archery, etc. to automate the tracking of Parquet C++ issues which are still in Jira, even though there are only a few Parquet issues in each release (4 for 12.0.0)
   - PARQUET-2201 Add stress test for RecordReader ReadRecords and SkipRecords. (#14879)
   - PARQUET-2225 Allow reading dense with RecordReader (#17877)
   - PARQUET-2232 Add an api to ColumnChunkMetaData to indicate if the column chunk uses a bloom filter (#33736)
   - PARQUET-2250 Expose column descriptor through RecordReader (#34318)
- Can we move the Parquet C++ issues from the ASF Jira to GitHub?
   - Joris believes we can go ahead and do this; the Parquet Rust implementation did something similar
   - There are already some Parquet issues that were reported and resolved in the Arrow monorepo in this release without ever being opened as Parquet Jira issues [10]
   - Check with Micah Kornfield, Fatemah Panah
   - There was a related Parquet mailing list discussion about this in February [11]


[1] https://github.com/apache/arrow/issues/35078
[2] https://github.com/apache/arrow/issues/33800
[3] https://lists.apache.org/thread/5h5g9k9lvbybzl8fnbg4fppxczm42g6r
[4] https://arrow.apache.org/docs/dev/format/CanonicalExtensions.html#fixed-shape-tensor
[5] https://arrow.apache.org/docs/format/Columnar.html#run-end-encoded-layout
[6] https://github.com/apache/arrow/pull/34718
[7] https://lists.apache.org/thread/lk277x3b9gjol42sjg27bst2ggm5s0j2
[8] https://github.com/apache/arrow/issues/20484
[9] https://arrow.apache.org/blog/2023/03/07/nanoarrow-0.1.0-release/
[10] https://github.com/apache/arrow/issues?q=is%3Aissue+label%3A%22Component%3A+Parquet%22+is%3Aclosed
[11] https://lists.apache.org/thread/jf9wos3t6xxk6xdyx2dof1jlkbpkr56p
Arrow community meeting April 12 at 16:00 UTC
Hi all,

Our biweekly Arrow community meeting is tomorrow at 16:00 UTC / 12:00 EDT.

Zoom meeting URL:
https://zoom.us/j/87649033008?pwd=SitsRHluQStlREM0TjJVYkRibVZsUT09
Meeting ID: 876 4903 3008
Passcode: 958092

The notes for this and future instances of this meeting will be captured in this Google Doc:
https://docs.google.com/document/d/1xrji8fc6_24TVmKiHJB4ECX1Zy2sy2eRbBjpVJMnPmk/

If you plan to attend this meeting, you are welcome to edit the document to add the topics that you would like to discuss.

Thanks,
Ian