I fixed an overwrite error that I think would be good to include in the 0.7.1 release:
https://github.com/apache/iceberg-python/pull/1023

André Anastácio

On Thursday, August 8th, 2024 at 4:29 AM, Fokko Driesprong <fo...@apache.org> wrote:

> Thanks everyone for the input here, and I agree that the aforementioned
> [#995](https://github.com/apache/iceberg-python/pull/995/) and
> [#997](https://github.com/apache/iceberg-python/pull/997/) by Sung, and
> [#526](https://github.com/apache/iceberg-python/pull/526) by André would also
> be good to include (I've added the milestone there). I have two minor ones
> that are also good candidates to add to 0.7.1:
>
> - [Allow setting write.parquet.row-group-limit](https://github.com/apache/iceberg-python/pull/1016)
> - [Allow setting write.parquet.page-row-limit](https://github.com/apache/iceberg-python/pull/1017)
>
> Kind regards,
> Fokko
>
> On Tue, Aug 6, 2024 at 21:17, André Luis Anastácio
> <ndrl...@proton.me.invalid> wrote:
>
>> What do you think about adding the fix that excludes PyIceberg support for
>> Python 3.9.7 in the 0.7.1 release?[1] It already doesn't work, so this is
>> just to avoid any new issues.
>>
>> - [1]: https://github.com/apache/iceberg-python/pull/526
>>
>> André Anastácio
>>
>> On Tuesday, August 6th, 2024 at 4:06 PM, Sung Yun <sun...@apache.org> wrote:
>>
>>> Sounds good folks! Thank you for sharing your thoughts. We'll work on
>>> getting the patch release out, and continue the discussion on upgrading the
>>> PyArrow version to 17.0.0 in time for the 0.8.0 release.
>>>
>>> Just adding two more fixes that I think we should pull into the patch
>>> release.
>>> These were added to the GitHub milestone for 0.7.1, but I'm cross-posting
>>> them here for awareness:
>>>
>>> - Table scan fails when result is empty:
>>>   https://github.com/apache/iceberg-python/pull/997
>>> - Fix RestCatalog ListNamespace to correctly make use of the expected Rest
>>>   Catalog response: https://github.com/apache/iceberg-python/pull/997
>>>
>>> Sung
>>>
>>> On 2024/08/06 18:29:50 Kevin Liu wrote:
>>>
>>> > > Typically we only push patches into the minor versions, we could also go
>>> > > to version 0.8.0 immediately.
>>> >
>>> > The issues above sound like patches to me, fixing issues discovered during
>>> > the 0.7.0 release. Is there a reason to move to 0.8.0?
>>> >
>>> > > I'm still on the fence regarding 17.0.0 upgrade. There are clear
>>> > > functional upsides, but I feel that constraining PyIceberg to just one
>>> > > published version would make the adoption of PyIceberg difficult for our
>>> > > users.
>>> >
>>> > +1 on this concern. Is it possible to make the Arrow 17.0.0 upgrade
>>> > optional first? So that folks who want the upgrade can test it out.
>>> >
>>> > Thanks,
>>> > Kevin Liu
>>> >
>>> > On Fri, Aug 2, 2024 at 11:33 AM Sung Yun sun...@apache.org wrote:
>>> >
>>> > > Hi Fokko,
>>> > >
>>> > > That makes sense, thank you for the suggestion! The issue was severe
>>> > > enough for us that we had to fork the repo and apply a fix ourselves in
>>> > > order to run PyIceberg without our applications going OOM. So I think
>>> > > there will be value in getting the proposed config property out as early
>>> > > as possible for the larger community.
>>> > >
>>> > > I'm still on the fence regarding 17.0.0 upgrade. There are clear
>>> > > functional upsides, but I feel that constraining PyIceberg to just one
>>> > > published version would make the adoption of PyIceberg difficult for our
>>> > > users.
>>> > > Users writing new applications won't have trouble with it, but users
>>> > > intending to use PyIceberg in an existing application may have to
>>> > > upgrade their PyArrow versions, which could be a deterrent (or a
>>> > > welcome nudge). Would it be worth starting that discussion on a
>>> > > separate thread?
>>> > >
>>> > > Sung
>>> > >
>>> > > On 2024/08/02 17:57:17 Fokko Driesprong wrote:
>>> > >
>>> > > > Hey Sung,
>>> > > >
>>> > > > Typically we only push patches into the minor versions, we could also
>>> > > > go to version 0.8.0 immediately.
>>> > > >
>>> > > > Regarding the memory consumption, thanks for putting those numbers
>>> > > > together! I would also love to get
>>> > > > [#929](https://github.com/apache/iceberg-python/pull/929) in, so we can
>>> > > > push down the large/small type to PyArrow (only for to_arrow), and
>>> > > > apply [#986](https://github.com/apache/iceberg-python/pull/986) on top
>>> > > > if you want to force it to either small or large types.
>>> > > >
>>> > > > WDYT?
>>> > > >
>>> > > > Kind regards,
>>> > > > Fokko
>>> > > >
>>> > > > On Fri, Aug 2, 2024 at 19:46, Sung Yun sun...@apache.org wrote:
>>> > > >
>>> > > > > Hi folks,
>>> > > > >
>>> > > > > We identified inefficient memory usage with the current way of
>>> > > > > upcasting pyarrow types to large_<type> on read, when reading tables
>>> > > > > with certain characteristics. A detailed set of example benchmarks
>>> > > > > of this issue is in the Google document linked on PR #986:
>>> > > > > https://github.com/apache/iceberg-python/pull/986
>>> > > > >
>>> > > > > The proposed solution introduces a config to override this behavior
>>> > > > > to use small types instead, and I'd like to add this into the patch
>>> > > > > release to give users better control over their memory usage.
>>> > > > >
>>> > > > > Also, this is just a gentle reminder that this DISCUSS thread is
>>> > > > > still open for any new issues identified in the 0.7.0 release that
>>> > > > > we should fix in the patch release.
>>> > > > >
>>> > > > > Thank you,
>>> > > > > Sung
>>> > > > >
>>> > > > > On 2024/07/30 23:57:04 Sung Yun wrote:
>>> > > > >
>>> > > > > > Hi folks,
>>> > > > > >
>>> > > > > > We are starting to compile the list of issues to fix and port into
>>> > > > > > the 0.7.1 release.
>>> > > > > >
>>> > > > > > The current list of known issues is as follows:
>>> > > > > >
>>> > > > > > - Fix pydantic warning on table commit:
>>> > > > > >   [#972](https://github.com/apache/iceberg-python/pull/972)
>>> > > > > >   (thanks for the quick fix ndrluis!)
>>> > > > > > - Issue when rewriting an unpartitioned table:
>>> > > > > >   [#979](https://github.com/apache/iceberg-python/issues/979)
>>> > > > > > - Issue when evolving and writing in the same transaction:
>>> > > > > >   [#980](https://github.com/apache/iceberg-python/issues/980)
>>> > > > > >
>>> > > > > > Please feel free to respond to this thread with any issues that
>>> > > > > > should be tracked for the patch release.
>>> > > > > >
>>> > > > > > Thank you!
>>> > > > > > Sung
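
As a rough illustration of the large_<type> discussion quoted above, here is a back-of-the-envelope sketch of why the choice between small and large Arrow types can matter for memory. It relies on one fact from the Arrow columnar format: `string` stores 32-bit offsets while `large_string` stores 64-bit offsets, one per value plus one. The function names, row count, and average value size are hypothetical, validity bitmaps are ignored, and this is not claimed to explain the benchmarks linked from PR #986.

```python
# Assumption: only the offsets buffer differs between the two layouts;
# the value bytes themselves are identical for string and large_string.
OFFSET_BYTES = {"string": 4, "large_string": 8}


def offsets_buffer_bytes(num_values: int, arrow_type: str) -> int:
    """Size of the offsets buffer: one offset per value, plus one."""
    return (num_values + 1) * OFFSET_BYTES[arrow_type]


def large_to_small_ratio(num_values: int, avg_value_bytes: int) -> float:
    """Total (offsets + values) size of large_string relative to string."""
    value_bytes = num_values * avg_value_bytes
    small = offsets_buffer_bytes(num_values, "string") + value_bytes
    large = offsets_buffer_bytes(num_values, "large_string") + value_bytes
    return large / small


if __name__ == "__main__":
    # Hypothetical column: one million short strings of about 4 bytes each.
    print(offsets_buffer_bytes(1_000_000, "string"))
    print(offsets_buffer_bytes(1_000_000, "large_string"))
    print(round(large_to_small_ratio(1_000_000, 4), 2))
```

Under these assumptions the relative overhead is largest for columns with many short values and shrinks as the average value size grows, since the offsets buffer becomes a smaller share of the total.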