Thank you all. I'm going to prepare a proposal PR for this.
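
To make the discussion below a bit more concrete, here is a minimal sketch of the naming convention Honah suggests (a dot-separated prefix such as "s3." and hyphens to connect words), plus a way for a client to list which reference properties it does not yet support. The helper names and the regular expression are my own assumptions for illustration; they are not part of any Iceberg implementation or of the eventual proposal.

```python
import re

# Assumed convention (hypothetical, for illustration only):
# lowercase words joined by hyphens, with at least one dot-separated
# prefix such as "s3." or "rest." in front of the final segment.
PROPERTY_NAME = re.compile(
    r"[a-z0-9]+(-[a-z0-9]+)*(\.[a-z0-9]+(-[a-z0-9]+)*)+"
)


def follows_convention(name: str) -> bool:
    """Return True if the property name matches the assumed convention."""
    return PROPERTY_NAME.fullmatch(name) is not None


def unsupported(reference: set, supported: set) -> list:
    """List reference properties that a client does not support, sorted."""
    return sorted(reference - supported)
```

A client could publish the output of something like `unsupported(...)` against the agreed reference list, so users can see at a glance which catalog-provided defaults or overrides it would ignore.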

On Fri, Jul 12, 2024, at 10:06, Honah J. wrote:
> Hello everyone,
> 
> Thank you all for the valuable insights. I am also +1 on having standardized 
> names for File IO properties. Creating a dedicated section to summarize 
> property names in the Java implementation is a good starting point. Since 
> PyIceberg, iceberg-rust, and iceberg-go will support only subsets of these 
> properties for some time (with the rest to be added in future development), 
> the existing Java implementation will serve as a useful reference. 
> Additionally, we could establish general naming conventions in the doc, such 
> as using the “s3.” prefix for S3 properties and hyphens to connect words.
> 
> Best regards,
> Honah
> 
> On Wed, Jul 10, 2024 at 10:47 AM <ndrl...@proton.me.invalid> wrote:
>> 
>> 
>> I don't know what the recommended way to start standardizing is. We can 
>> start a proposal for each context or have one proposal to handle all.
>> 
>> Suggested contexts to start with:
>>  • REST Catalog
>>  • FileIO
>> 
>> I believe that most of the other cases are supported by the configuration 
>> topic in the Table section[1], but this is about the Java implementation. 
>> Maybe we need to create a page in the project section[2] to handle the 
>> properties in the table section and the REST and FileIO contexts.
>> 
>> 
>> [1]: https://iceberg.apache.org/docs/latest/configuration/
>> [2]: https://iceberg.apache.org/community/
>> On Wednesday, July 10th, 2024 at 11:58 AM, Russell Spitzer 
>> <russell.spit...@gmail.com> wrote:
>>> Sounds reasonable to me
>>> 
>>> On Wed, Jul 10, 2024 at 9:28 AM Renjie Liu <liurenjie2...@gmail.com> wrote:
>>>> Hi:
>>>> 
>>>> +1 for standardizing iceberg properties. This will help to align different 
>>>> language implementations.
>>>> 
>>>> On Wed, Jul 10, 2024 at 9:44 PM <ndrl...@proton.me.invalid> wrote:
>>>>> 
>>>>> Hello Everyone,
>>>>> 
>>>>> I was considering discussing the standardization of Iceberg properties, 
>>>>> and I believe this thread could be a great place to start.
>>>>> 
>>>>> I'm writing an Iceberg client in Elixir and using the Java, Python, and 
>>>>> Rust implementations as references. However, I've had some difficulty 
>>>>> determining which configurations we must support and what each client has 
>>>>> implemented. Therefore, I agree with Xuanwo about having a separate 
>>>>> section as a single source of truth (SSOT).
>>>>> 
>>>>> Additionally, I think it would be beneficial for each client to show what 
>>>>> it does not support. This would make it easier for users to know that a 
>>>>> particular client might not work with some configuration that their 
>>>>> catalog could define as default or override. It would also help us, as 
>>>>> contributors, to know which configurations we need to implement support 
>>>>> for.
>>>>> 
>>>>> For example, the "s3.signer"[1] and "s3.proxy-uri"[2] configurations only 
>>>>> exist in the Python implementation. I believe it is not clear that these 
>>>>> configurations are exclusive to Python, and they might be configurations 
>>>>> that the catalog could override or define as defaults in the get info 
>>>>> endpoint. Without an SSOT, this could be harder to track.
>>>>> 
>>>>> Another example is the "rest.authorization-url" in Python and Rust versus 
>>>>> "oauth2-server-uri" in Java. Although this is a bit out of scope for this 
>>>>> thread, I will open another discussion topic about broader 
>>>>> standardization of available properties.
>>>>> 
>>>>> 
>>>>> [1]: 
>>>>> https://github.com/search?q=repo%3Aapache%2Ficeberg-python+s3.signer&type=code
>>>>> [2]: 
>>>>> https://github.com/search?q=repo%3Aapache%2Ficeberg-python%20S3_PROXY_URI&type=code
>>>>> 
>>>>> On Wednesday, July 10th, 2024 at 7:51 AM, Fokko Driesprong 
>>>>> <fo...@apache.org> wrote:
>>>>>> Hey Xuanwo,
>>>>>> 
>>>>>> Thanks for raising this.
>>>>>>  • The S3 properties are largely covered under the S3FileIO page: 
>>>>>> https://iceberg.apache.org/docs/nightly/aws/#s3-fileio. But it looks 
>>>>>> like some important ones are missing indeed. I've raised an issue here 
>>>>>> <https://github.com/apache/iceberg/issues/10674>.
>>>>>>  • PyIceberg supports only a subset of the functionality, so many 
>>>>>> properties are missing there as well.
>>>>>>  • For the REST Catalog, there is an open PR 
>>>>>> <https://github.com/apache/iceberg/pull/10576> to add the options for 
>>>>>> GCS and ADLS. It would be great to get some more eyes on it.
>>>>>> That being said, I do think there is value in formalizing them. When 
>>>>>> adding configuration options to PyIceberg, I'll make sure to check out 
>>>>>> the Java implementation to ensure that we use the same property.
>>>>>> 
>>>>>> Kind regards,
>>>>>> Fokko
>>>>>> 
>>>>>> On Wed, Jul 10, 2024 at 09:22, Xuanwo <xua...@apache.org> wrote:
>>>>>>> Hello everyone
>>>>>>> 
>>>>>>> I've been working on the iceberg-rust FileIO recently and have found it 
>>>>>>> challenging to identify all the necessary IO properties we need to 
>>>>>>> support.
>>>>>>> 
>>>>>>> For instance, consider AWS S3. No single document specifies which S3 
>>>>>>> properties an implementation is expected to support.
>>>>>>> 
>>>>>>> The only relevant documentation I could find includes:
>>>>>>> 
>>>>>>> - Iceberg AWS Integrations[1]: Does not define `s3.access-key-id` or 
>>>>>>> `s3.secret-access-key`.
>>>>>>> - PyIceberg configuration[2]: Missing several S3-related properties.
>>>>>>> - Iceberg REST Catalog[3]: Does not cover all storage services.
>>>>>>> 
>>>>>>> To gather this information, we must refer to the S3FileIO Java code[4].
>>>>>>> 
>>>>>>> I propose adding a separate section for agreeing upon these properties. 
>>>>>>> We could create a specification that outlines all IO properties with 
>>>>>>> indications of whether they are required or optional, along with their 
>>>>>>> expected behaviors. This would help ensure consistency across different 
>>>>>>> implementations without any conflicts.
>>>>>>> 
>>>>>>> 
>>>>>>> [1]: https://iceberg.apache.org/docs/latest/aws/
>>>>>>> [2]: https://py.iceberg.apache.org/configuration/#s3
>>>>>>> [3]: 
>>>>>>> https://github.com/apache/iceberg/blob/eee81c59199a54e749ea58dae070eb066d9a5f9e/open-api/rest-catalog-open-api.yaml#L2737
>>>>>>> [4]: 
>>>>>>> https://github.com/apache/iceberg/blob/2b21020aedb63c26295005d150c05f0a5a5f0eb2/aws/src/main/java/org/apache/iceberg/aws/s3/S3FileIOProperties.java#L46
>>>>>>> 
>>>>>>> Xuanwo
>>>>>>> 
>>>>>>> https://xuanwo.io/
Xuanwo

https://xuanwo.io/