Hello,

Iceberg support has been merged into Pandas (using PyIceberg as the backend)
and will be available in the upcoming Pandas 3 release. Any feedback from
the Iceberg community would be very helpful at this early stage. Please
keep in mind, though, that the goal is to democratize Iceberg for Python
developers rather than to provide the most complex features right away.
Here are the PRs:

https://github.com/pandas-dev/pandas/pull/61383
https://github.com/pandas-dev/pandas/pull/61507

One challenge is that specifying partitioning for Iceberg writes in
DataFrame.to_iceberg would require PyIceberg objects in the API signature,
but making PyIceberg a hard dependency is not acceptable for Pandas. Any
thoughts on other ways to specify partitioning in Python APIs?
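To make the question concrete, here is one possible direction, sketched
only as a hypothetical illustration: accept partitioning as plain Python
strings that use the transform names from the Iceberg table spec (identity,
year, month, day, hour, bucket[N], truncate[W], void), and convert them to
PyIceberg objects inside to_iceberg, where PyIceberg is already imported.
The partition_by argument, the PartitionColumn class and the
parse_partition_by helper are all hypothetical; they are not part of Pandas
or PyIceberg.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Transform names as spelled in the Iceberg table spec. bucket and truncate
# take an integer argument, written "bucket[N]" and "truncate[W]".
_SIMPLE_TRANSFORMS = {"identity", "year", "month", "day", "hour", "void"}
_PARAM_TRANSFORM = re.compile(r"^(bucket|truncate)\[(\d+)\]$")


@dataclass(frozen=True)
class PartitionColumn:
    """Hypothetical dependency-free partition field (plain strings and ints)."""
    column: str
    transform: str
    param: Optional[int] = None


def parse_partition_by(spec):
    """Normalize a hypothetical ``partition_by`` argument.

    Each item is either a bare column name (identity transform) or a
    (column, transform) tuple such as ("ts", "day") or ("id", "bucket[16]").
    """
    fields = []
    for item in spec:
        if isinstance(item, str):
            column, transform = item, "identity"
        else:
            column, transform = item  # ValueError if not a 2-tuple
        transform = transform.strip().lower()
        if transform in _SIMPLE_TRANSFORMS:
            fields.append(PartitionColumn(column, transform))
            continue
        match = _PARAM_TRANSFORM.match(transform)
        if match is None:
            raise ValueError(f"unsupported partition transform: {transform!r}")
        width = int(match.group(2))
        if width <= 0:
            raise ValueError(f"{match.group(1)} argument must be positive")
        fields.append(PartitionColumn(column, match.group(1), width))
    return fields
```

For example, parse_partition_by(["region", ("ts", "day"), ("id",
"bucket[16]")]) yields an identity field on region, a day field on ts and a
16-way bucket field on id. The Pandas signature then only mentions builtin
types, and mapping these tuples to PyIceberg's own partition objects could
happen lazily at write time.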

BTW, Bodo has released an implementation of these Pandas APIs if you want
to try them out now (since Pandas 3 has not been released yet):
https://docs.bodo.ai/latest/quick_start/quickstart_local_iceberg/

Best,
Ehsan
