Thanks a lot for the explanation, Ryan and Fokko! This makes perfect sense.

On Sat, Sep 6, 2025 at 1:21 PM Fokko Driesprong <[email protected]> wrote:

> Hey Alex,
>
> Thanks for raising this. I'm happy to provide some historical context
> around it.
>
> Back in the early days of developing PyIceberg, I tried to rely purely on
> the output of the generator that we use for producing
> rest-catalog-open-api.py. However, I quickly noticed that the structure
> that we have is too complex to express in the open-api definition (at least
> back then). One prominent example is how schemas are encoded (related
> issues #6798 <https://github.com/apache/iceberg/issues/6798>, #6672
> <https://github.com/apache/iceberg/pull/6672>): a type can be either a
> string (e.g. fixed[22]) or an object (e.g. {"type": "map", ...}). In
> PyIceberg, unfortunately, we had to add some deserialization logic
> <https://github.com/apache/iceberg-python/blob/52d810efb62e39ec6d8d6a2f4cd2cad8165e2d2c/pyiceberg/types.py#L126-L128>
> to make this situation work. There are some more examples, like the
> arbitrary fields in the snapshot summary that need some additional TLC
> when validating.
>
> When adding new request/response models, I think it makes sense to copy
> them from the generated code, but they probably need some more work to
> make them usable. (Tip for anyone who's interested in picking up
> iceberg-python#2302 <https://github.com/apache/iceberg-python/issues/2302>.)
> But I don't think it makes a lot of sense to publish them as-is.
>
> Kind regards,
> Fokko
>
> On Sat, Sep 6, 2025 at 01:38, Alex Stephen <[email protected]> wrote:
>
>> Hi all,
>>
>> I noticed that we generate a set of Python models
>> <https://github.com/apache/iceberg/blob/main/open-api/rest-catalog-open-api.py>
>> containing Request/Response objects for the REST Catalog.
>>
>> PyIceberg has recreated many of these models
>> <https://github.com/apache/iceberg-python/blob/main/pyiceberg/catalog/rest/__init__.py>,
>> albeit with less detail in many cases.
>>
>> I'd like to propose that we publish the rest-catalog-open-api.py
>> <https://github.com/apache/iceberg/blob/main/open-api/rest-catalog-open-api.py>
>> file as a Python package that PyIceberg can import. This would allow us to
>> keep the REST Catalog models + the Python models in sync. It would also allow
>> other Python packages to pick up the REST Catalog schema while staying in
>> sync with any changes that we make to the REST Catalog.
>>
>> We could publish this new Python package out of the standard Iceberg
>> repo. Preferably, we'd release it separately from the Java library so that
>> Java + Python can develop concurrently.
>>
>> Any thoughts?
>>
>> Thanks!
>>
>> -- Alex Stephen
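
For anyone following along, here is a rough sketch of the string-or-object
case Fokko describes, where a type arrives either as a string such as
fixed[22] or as an object such as {"type": "map", ...}. It uses only the
standard library. The names FixedType, PrimitiveType, MapType and parse_type
are hypothetical stand-ins for illustration, not PyIceberg's actual classes or
the logic linked above, and only the map object form is handled.

```python
import re
from dataclasses import dataclass
from typing import Any, Union


# Hypothetical stand-in classes for illustration; not PyIceberg's real types.
@dataclass(frozen=True)
class FixedType:
    length: int


@dataclass(frozen=True)
class PrimitiveType:
    name: str


@dataclass(frozen=True)
class MapType:
    key: Any
    value: Any


_FIXED = re.compile(r"fixed\[(\d+)\]")


def parse_type(raw: Union[str, dict]) -> Any:
    """Dispatch on the JSON shape: a str for primitives, a dict for nested types."""
    if isinstance(raw, str):
        match = _FIXED.fullmatch(raw)
        if match:
            # "fixed[22]" carries its length inside the string itself.
            return FixedType(length=int(match.group(1)))
        return PrimitiveType(name=raw)
    if isinstance(raw, dict) and raw.get("type") == "map":
        # Nested types recurse, so keys and values may be strings or objects.
        return MapType(key=parse_type(raw["key"]), value=parse_type(raw["value"]))
    raise ValueError(f"unsupported type encoding: {raw!r}")
```

A generated model usually wants one declared shape per field, so a dispatch
like this tends to end up in a hand-written validator, which seems to be the
kind of extra work mentioned above.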
