What do you think about including the fix that excludes Python 3.9.7 from
PyIceberg's supported versions in the 0.7.1 release? [1] PyIceberg already
doesn't work on 3.9.7, so this change just avoids any new issues.

- [1]: https://github.com/apache/iceberg-python/pull/526

André Anastácio


On Tuesday, August 6th, 2024 at 4:06 PM, Sung Yun <sun...@apache.org> wrote:

> Sounds good, folks! Thank you for sharing your thoughts. We'll work on getting
> the patch release out, and continue the discussion on upgrading PyArrow to
> 17.0.0 in time for the 0.8.0 release.
> 
> I'd also like to add two more fixes that I think we should pull into the
> patch release. They have already been added to the 0.7.1 GitHub milestone,
> but I'm cross-posting them here for awareness:
> 
> - Table scan fails when the result is empty:
> https://github.com/apache/iceberg-python/pull/997
> - Fix RestCatalog ListNamespace to correctly use the expected REST
> Catalog response: https://github.com/apache/iceberg-python/pull/997
> 
> Sung
> 
> On 2024/08/06 18:29:50 Kevin Liu wrote:
> 
> > > Typically we only push patches into the minor versions; we could also go
> > > to version 0.8.0 immediately.
> > 
> > The issues above sound like patches to me, fixing issues discovered during
> > the 0.7.0 release. Is there a reason to move to 0.8.0?
> > 
> > > I'm still on the fence regarding 17.0.0 upgrade. There are clear
> > > functional upsides, but I feel that constraining PyIceberg to just one
> > > published version would make the adoption of PyIceberg difficult for our
> > > users.
> > 
> > +1 on this concern. Is it possible to make the Arrow 17.0.0 upgrade
> > optional first, so that folks who want the upgrade can test it out?
> > 
> > Thanks,
> > Kevin Liu
> > 
> > On Fri, Aug 2, 2024 at 11:33 AM, Sung Yun <sun...@apache.org> wrote:
> > 
> > > Hi Fokko,
> > > 
> > > That makes sense, thank you for the suggestion! The issue was severe enough
> > > for us that we had to fork the repo and apply a fix ourselves in order to
> > > run PyIceberg without our applications going OOM. So I think there is
> > > value in getting the proposed config property out to the larger community
> > > as early as possible.
> > > 
> > > I'm still on the fence regarding the 17.0.0 upgrade. There are clear
> > > functional upsides, but I feel that constraining PyIceberg to just one
> > > published PyArrow version would make adoption of PyIceberg difficult for
> > > our users. Users writing new applications won't have trouble with it, but
> > > users intending to use PyIceberg in an existing application may have to
> > > upgrade their PyArrow versions, which could be a deterrent (or a welcome
> > > nudge). Would it be worth starting that discussion on a separate thread?
> > > 
> > > Sung
> > > 
> > > On 2024/08/02 17:57:17 Fokko Driesprong wrote:
> > > 
> > > > Hey Sung,
> > > > 
> > > > Typically we only push patches into the minor versions; we could also go
> > > > to version 0.8.0 immediately.
> > > > 
> > > > Regarding the memory consumption, thanks for putting those numbers
> > > > together! I would also love to get #929
> > > > (https://github.com/apache/iceberg-python/pull/929) in, so we can push
> > > > down the large/small type to PyArrow (only for to_arrow), and apply #986
> > > > (https://github.com/apache/iceberg-python/pull/986) on top if you want
> > > > to force it to either small or large types.
> > > > 
> > > > WDYT?
> > > > 
> > > > Kind regards,
> > > > Fokko
> > > > 
> > > > On Fri, Aug 2, 2024 at 19:46, Sung Yun <sun...@apache.org> wrote:
> > > > 
> > > > > Hi folks,
> > > > > 
> > > > > We identified inefficient memory usage spikes with the current approach
> > > > > of upcasting PyArrow types to large_<type> on read, when reading tables
> > > > > with certain characteristics. A detailed set of example benchmarks for
> > > > > this issue is in the Google document linked on PR #986:
> > > > > https://github.com/apache/iceberg-python/pull/986
> > > > > 
> > > > > The proposed solution introduces a config property to override this
> > > > > behavior and use small types instead, and I'd like to add it to the
> > > > > patch release to give users better control over their memory usage.
> > > > > 
> > > > > Also, just a gentle reminder that this DISCUSS thread is still open for
> > > > > any new issues identified in the 0.7.0 release that we should fix in the
> > > > > patch release.
> > > > > 
> > > > > Thank you,
> > > > > Sung
> > > > > 
> > > > > On 2024/07/30 23:57:04 Sung Yun wrote:
> > > > > 
> > > > > > Hi folks,
> > > > > > 
> > > > > > We are starting to compile the list of issues to fix and port into the
> > > > > > 0.7.1 release.
> > > > > > 
> > > > > > The current list of known issues is as follows:
> > > > > > 
> > > > > > - Fix pydantic warning on table commit: #972
> > > > > >   https://github.com/apache/iceberg-python/pull/972 (thanks for the
> > > > > >   quick fix, ndrluis!)
> > > > > > - Issue when rewriting an unpartitioned table: #979
> > > > > >   https://github.com/apache/iceberg-python/issues/979
> > > > > > - Issue when evolving and writing in the same transaction: #980
> > > > > >   https://github.com/apache/iceberg-python/issues/980
> > > > > > 
> > > > > > Please feel free to respond to this thread with any issues that should
> > > > > > be tracked for the patch release.
> > > > > > 
> > > > > > Thank you!
> > > > > > Sung
