Thank you all. I'm going to prepare a proposal PR for this.

On Fri, Jul 12, 2024, at 10:06, Honah J. wrote:
> Hello everyone,
>
> Thank you all for the valuable insights. I am also +1 on having standardized names for File IO properties. Creating a dedicated section to summarize property names in the Java implementation is a good starting point. Since PyIceberg, iceberg-rust, and iceberg-go will support only subsets of these properties for some time (with the rest to be added in future development), the existing Java implementation will serve as a useful reference. Additionally, we could establish general naming conventions in the doc, such as using the “s3.” prefix for S3 properties and hyphens to connect words.
>
> Best regards,
> Honah
>
> On Wed, Jul 10, 2024 at 10:47 AM <ndrl...@proton.me.invalid> wrote:
>>
>> I don't know what the recommended way to start standardizing is. We can start a proposal for each context or have one proposal to handle all of them.
>>
>> Suggested contexts to start with:
>> • REST Catalog
>> • FileIO
>>
>> I believe that most of the other cases are covered by the configuration page in the Table section[1], but that page documents the Java implementation. Maybe we need to create a page in the project section[2] to handle the properties in the Table section as well as the REST and FileIO contexts.
>>
>> [1]: https://iceberg.apache.org/docs/latest/configuration/
>> [2]: https://iceberg.apache.org/community/
>>
>> On Wednesday, July 10th, 2024 at 11:58 AM, Russell Spitzer <russell.spit...@gmail.com> wrote:
>>> Sounds reasonable to me
>>>
>>> On Wed, Jul 10, 2024 at 9:28 AM Renjie Liu <liurenjie2...@gmail.com> wrote:
>>>> Hi:
>>>>
>>>> +1 for standardizing Iceberg properties. This will help align the different language implementations.
>>>>
>>>> On Wed, Jul 10, 2024 at 9:44 PM <ndrl...@proton.me.invalid> wrote:
>>>>>
>>>>> Hello everyone,
>>>>>
>>>>> I was considering raising the standardization of Iceberg properties, and I believe this thread could be a great place to start.
>>>>>
>>>>> I'm writing an Iceberg client in Elixir and using the Java, Python, and Rust implementations as references. However, I've had some difficulty determining which configurations we must support and what each client has implemented. Therefore, I agree with Xuanwo about having a separate section as a single source of truth (SSOT).
>>>>>
>>>>> Additionally, I think it would be beneficial for each client to show what it does not support. This would make it easier for users to know that a particular client might not work with some configuration that their catalog defines as a default or override. It would also help us, as contributors, know which configurations we still need to implement.
>>>>>
>>>>> For example, the "s3.signer"[1] and "s3.proxy-uri"[2] configurations exist only in the Python implementation. It is not obvious that these configurations are exclusive to Python, and the catalog might override them or define them as defaults in the config endpoint. Without an SSOT, this is hard to track.
>>>>>
>>>>> Another example is "rest.authorization-url" in Python and Rust versus "oauth2_server_uri" in Java. Although this is a bit out of scope for this thread, I will open another discussion topic about the broader standardization of available properties.
>>>>>
>>>>> [1]: https://github.com/search?q=repo%3Aapache%2Ficeberg-python+s3.signer&type=code
>>>>> [2]: https://github.com/search?q=repo%3Aapache%2Ficeberg-python%20S3_PROXY_URI&type=code
>>>>>
>>>>> On Wednesday, July 10th, 2024 at 7:51 AM, Fokko Driesprong <fo...@apache.org> wrote:
>>>>>> Hey Xuanwo,
>>>>>>
>>>>>> Thanks for raising this.
>>>>>> • The S3 properties are largely covered on the S3FileIO page: https://iceberg.apache.org/docs/nightly/aws/#s3-fileio. But some important ones are indeed missing. I've raised an issue here <https://github.com/apache/iceberg/issues/10674>.
>>>>>> • PyIceberg supports only a subset of the functionality, so many properties are missing there as well.
>>>>>> • For the REST Catalog, there is an open PR <https://github.com/apache/iceberg/pull/10576> to add the options for GCS and ADLS. It would be great to get some more eyes on it.
>>>>>>
>>>>>> That being said, I do think there is value in formalizing them. When adding configuration options to PyIceberg, I'll make sure to check the Java implementation so that we use the same property names.
>>>>>>
>>>>>> Kind regards,
>>>>>> Fokko
>>>>>>
>>>>>> On Wed, Jul 10, 2024 at 09:22, Xuanwo <xua...@apache.org> wrote:
>>>>>>> Hello everyone,
>>>>>>>
>>>>>>> I've been working on the iceberg-rust FileIO recently and have found it challenging to identify all the IO properties we need to support.
>>>>>>>
>>>>>>> For instance, consider AWS S3. No document specifies which S3 properties are supported.
>>>>>>>
>>>>>>> The only relevant documentation I could find includes:
>>>>>>>
>>>>>>> - Iceberg AWS Integrations[1]: Does not define `s3.access-key-id` or `s3.secret-access-key`.
>>>>>>> - PyIceberg configuration[2]: Missing several S3-related properties.
>>>>>>> - Iceberg REST Catalog[3]: Does not cover all storage services.
>>>>>>>
>>>>>>> To gather this information, we must refer to the S3FileIO Java code[4].
>>>>>>>
>>>>>>> I propose adding a separate section where we agree on these properties. We could create a specification that lists all IO properties, indicates whether each one is required or optional, and describes its expected behavior. This would keep the different implementations consistent and free of conflicts.
>>>>>>>
>>>>>>> [1]: https://iceberg.apache.org/docs/latest/aws/
>>>>>>> [2]: https://py.iceberg.apache.org/configuration/#s3
>>>>>>> [3]: https://github.com/apache/iceberg/blob/eee81c59199a54e749ea58dae070eb066d9a5f9e/open-api/rest-catalog-open-api.yaml#L2737
>>>>>>> [4]: https://github.com/apache/iceberg/blob/2b21020aedb63c26295005d150c05f0a5a5f0eb2/aws/src/main/java/org/apache/iceberg/aws/s3/S3FileIOProperties.java#L46
>>>>>>>
>>>>>>> Xuanwo
>>>>>>>
>>>>>>> https://xuanwo.io/

Xuanwo
https://xuanwo.io/
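The thread converges on one shared, documented property list. It combines three ideas: Xuanwo's spec with required/optional flags, ndrl's per-client list of unsupported properties, and Honah's "s3." prefix and hyphen conventions. Below is a minimal Python sketch of how such a spec might be made machine-checkable. Everything in it is hypothetical: `PropertySpec`, `REGISTRY`, the helper functions, and the naming regex are not part of any Iceberg implementation. Apart from s3.signer and s3.proxy-uri being Python-only, as reported in the thread, the required flags, descriptions, and client lists are placeholders.

```python
import re
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Mapping, Optional

# Hypothetical naming convention after Honah's suggestion: a lowercase
# namespace prefix such as "s3.", then dot-separated segments whose words
# are joined with hyphens (e.g. "s3.access-key-id").
_SEGMENT = r"[a-z0-9]+(?:-[a-z0-9]+)*"
_NAME_RE = re.compile(rf"^{_SEGMENT}(?:\.{_SEGMENT})+$")


@dataclass(frozen=True)
class PropertySpec:
    """One entry of a hypothetical cross-implementation FileIO property spec."""

    name: str
    required: bool
    description: str
    # Clients that implement this property (placeholder values, see lead-in).
    supported_by: FrozenSet[str] = frozenset()


def naming_problems(name: str, prefix: Optional[str] = None) -> List[str]:
    """Reasons why `name` breaks the convention; an empty list means it conforms."""
    problems = []
    if "_" in name:
        problems.append("uses underscores instead of hyphens")
    if not _NAME_RE.match(name):
        problems.append("is not a lowercase, dot-prefixed, hyphenated name")
    if prefix is not None and not name.startswith(prefix):
        problems.append(f"does not start with {prefix!r}")
    return problems


# Placeholder registry: only the Python-only status of s3.signer and
# s3.proxy-uri comes from the thread; descriptions and flags are illustrative.
REGISTRY: Dict[str, PropertySpec] = {
    spec.name: spec
    for spec in [
        PropertySpec("s3.access-key-id", False, "Static access key ID",
                     frozenset({"java", "python", "rust"})),
        PropertySpec("s3.secret-access-key", False, "Static secret access key",
                     frozenset({"java", "python", "rust"})),
        PropertySpec("s3.signer", False, "Request signer (placeholder)",
                     frozenset({"python"})),
        PropertySpec("s3.proxy-uri", False, "Proxy for S3 requests (placeholder)",
                     frozenset({"python"})),
    ]
}


def unsupported(client: str,
                registry: Mapping[str, PropertySpec] = REGISTRY) -> List[str]:
    """Specified properties that `client` does not implement (yet)."""
    return sorted(n for n, s in registry.items() if client not in s.supported_by)


def missing_required(props: Mapping[str, str],
                     registry: Mapping[str, PropertySpec] = REGISTRY) -> List[str]:
    """Required properties that are absent from a configuration map."""
    return sorted(n for n, s in registry.items() if s.required and n not in props)


def ignored_by(client: str, props: Mapping[str, str],
               registry: Mapping[str, PropertySpec] = REGISTRY) -> List[str]:
    """Keys in `props` (e.g. catalog defaults/overrides) that `client` would not act on."""
    return sorted(k for k in props
                  if k not in registry or client not in registry[k].supported_by)
```

With this placeholder data, `unsupported("java")` returns `["s3.proxy-uri", "s3.signer"]`, which matches the Python-only properties reported in the thread. `naming_problems("oauth2_server_uri")` flags both the underscores and the missing dot-separated prefix. Keeping the per-client support set inside each entry means a single table can answer both "what must I implement?" and "what will this client silently ignore?".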