I'm not sure what the recommended way to start the standardization is. We could 
open one proposal per context, or a single proposal that covers all of them.

Suggested contexts to start with:

- REST Catalog

- FileIO

I believe most of the other cases are covered by the configuration page in the 
Table section[1], but that page documents the Java implementation. Maybe we 
need a page in the project section[2] that covers the table properties as well 
as the REST and FileIO contexts.
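
As a rough sketch of what such a page could record, here is a tiny Python 
registry that tracks which clients support each property. Everything in it is 
hypothetical (the `PropertySpec` type, the `CLIENTS` names, the 
`unsupported_properties` helper), and the two entries only repeat the examples 
from my earlier message below; it is not a verified inventory.

```python
from dataclasses import dataclass

# Hypothetical client identifiers, used only for this illustration.
CLIENTS = ("java", "python", "rust")


@dataclass(frozen=True)
class PropertySpec:
    """One row of a hypothetical properties page."""
    name: str                # property key, e.g. "s3.signer"
    context: str             # "rest-catalog" or "fileio"
    supported_by: frozenset  # clients that implement the property


# Assumption: both entries mirror the claim made in this thread that these
# properties exist only in the Python implementation.
REGISTRY = (
    PropertySpec("s3.signer", "fileio", frozenset({"python"})),
    PropertySpec("s3.proxy-uri", "fileio", frozenset({"python"})),
)


def unsupported_properties(client, registry=REGISTRY):
    """Return the sorted names of registry properties that `client` lacks."""
    if client not in CLIENTS:
        raise ValueError(f"unknown client: {client!r}")
    return sorted(p.name for p in registry if client not in p.supported_by)


if __name__ == "__main__":
    for client in CLIENTS:
        print(client, unsupported_properties(client))
```

A table generated from data like this would let users see at a glance which 
properties a given client does not support, and would show contributors what 
is still left to implement.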

[1]: https://iceberg.apache.org/docs/latest/configuration/
[2]: https://iceberg.apache.org/community/

On Wednesday, July 10th, 2024 at 11:58 AM, Russell Spitzer 
<russell.spit...@gmail.com> wrote:

> Sounds reasonable to me
>
> On Wed, Jul 10, 2024 at 9:28 AM Renjie Liu <liurenjie2...@gmail.com> wrote:
>
>> Hi:
>>
>> +1 for standardizing Iceberg properties. This will help to align different 
>> language implementations.
>>
>> On Wed, Jul 10, 2024 at 9:44 PM <ndrl...@proton.me.invalid> wrote:
>>
>>> Hello Everyone,
>>>
>>> I was considering discussing the standardization of Iceberg properties, and 
>>> I believe this thread could be a great place to start.
>>>
>>> I'm writing an Iceberg client in Elixir and using the Java, Python, and 
>>> Rust implementations as references. However, I've had some difficulty 
>>> determining which configurations we must support and what each client has 
>>> implemented. Therefore, I agree with Xuanwo about having a separate section 
>>> as a single source of truth (SSOT).
>>>
>>> Additionally, I think it would be beneficial for each client to show what 
>>> it does not support. This would make it easier for users to know that a 
>>> particular client might not work with some configuration that their catalog 
>>> could define as default or override. It would also help us, as 
>>> contributors, to know which configurations we need to implement support for.
>>>
>>> For example, the "s3.signer"[1] and "s3.proxy-uri"[2] configurations only 
>>> exist in the Python implementation. I believe it is not clear that these 
>>> configurations are exclusive to Python, and they might be configurations 
>>> that the catalog could override or define as defaults in the get info 
>>> endpoint. Without an SSOT, this could be harder to track.
>>>
>>> Another example is the "rest.authorization-url" in Python and Rust versus 
>>> "oauth2_server_uri" in Java. Although this is a bit out of scope for this 
>>> thread, I will open another discussion topic about broader standardization 
>>> of available properties.
>>>
>>> [1]: 
>>> https://github.com/search?q=repo%3Aapache%2Ficeberg-python+s3.signer&type=code
>>> [2]: 
>>> https://github.com/search?q=repo%3Aapache%2Ficeberg-python%20S3_PROXY_URI&type=code
>>>
>>> On Wednesday, July 10th, 2024 at 7:51 AM, Fokko Driesprong 
>>> <fo...@apache.org> wrote:
>>>
>>>> Hey Xuanwo,
>>>>
>>>> Thanks for raising this.
>>>>
>>>> - The S3 properties are largely covered under the S3FileIO page: 
>>>> https://iceberg.apache.org/docs/nightly/aws/#s3-fileio. But it looks like 
>>>> some important ones are missing indeed. I've raised [an issue 
>>>> here](https://github.com/apache/iceberg/issues/10674).
>>>> - PyIceberg supports only a subset of the functionality, so many 
>>>> properties are missing there as well.
>>>> - For the REST Catalog, there is [an open PR to 
>>>> add](https://github.com/apache/iceberg/pull/10576) the options for GCS and 
>>>> ADLS. It would be great to get some more eyes on there.
>>>>
>>>> That being said, I do think there is value in formalizing them. When 
>>>> adding configuration options to PyIceberg, I'll make sure to check out the 
>>>> Java implementation to ensure that we use the same property.
>>>>
>>>> Kind regards,
>>>> Fokko
>>>>
>>>> On Wed, Jul 10, 2024 at 09:22, Xuanwo <xua...@apache.org> wrote:
>>>>
>>>>> Hello everyone
>>>>>
>>>>> I've been working on the iceberg-rust FileIO recently and have found it 
>>>>> challenging to identify all the necessary IO properties we need to 
>>>>> support.
>>>>>
>>>>> For instance, consider AWS S3. There are no documents specifying which 
>>>>> properties are supported by S3.
>>>>>
>>>>> The only relevant documentation I could find includes:
>>>>>
>>>>> - Iceberg AWS Integrations[1]: Does not define `s3.access-key-id` or 
>>>>> `s3.secret-access-key`.
>>>>> - PyIceberg configuration[2]: Missing several S3-related properties.
>>>>> - Iceberg REST Catalog[3]: Does not cover all storage services.
>>>>>
>>>>> To gather this information, we must refer to the S3FileIO Java code[4].
>>>>>
>>>>> I propose adding a separate section for agreeing upon these properties. 
>>>>> We could create a specification that outlines all IO properties with 
>>>>> indications of whether they are required or optional, along with their 
>>>>> expected behaviors. This would help ensure consistency across different 
>>>>> implementations without any conflicts.
>>>>>
>>>>> [1]: https://iceberg.apache.org/docs/latest/aws/
>>>>> [2]: https://py.iceberg.apache.org/configuration/#s3
>>>>> [3]: 
>>>>> https://github.com/apache/iceberg/blob/eee81c59199a54e749ea58dae070eb066d9a5f9e/open-api/rest-catalog-open-api.yaml#L2737
>>>>> [4]: 
>>>>> https://github.com/apache/iceberg/blob/2b21020aedb63c26295005d150c05f0a5a5f0eb2/aws/src/main/java/org/apache/iceberg/aws/s3/S3FileIOProperties.java#L46
>>>>>
>>>>> Xuanwo
>>>>>
>>>>> https://xuanwo.io/
