Sounds reasonable to me

On Wed, Jul 10, 2024 at 9:28 AM Renjie Liu <liurenjie2...@gmail.com> wrote:

> Hi:
>
> +1 for standardizing Iceberg properties. This will help to align the
> different language implementations.
>
> On Wed, Jul 10, 2024 at 9:44 PM <ndrl...@proton.me.invalid> wrote:
>
>> Hello Everyone,
>>
>> I was considering discussing the standardization of Iceberg properties,
>> and I believe this thread could be a great place to start.
>>
>> I'm writing an Iceberg client in Elixir and using the Java, Python, and
>> Rust implementations as references. However, I've had some difficulty
>> determining which configurations we must support and what each client has
>> implemented. Therefore, I agree with Xuanwo about having a separate
>> section as a single source of truth (SSOT).
>>
>> Additionally, I think it would be beneficial for each client to show what
>> it does not support. This would make it easier for users to know that a
>> particular client might not work with some configuration that their catalog
>> could define as default or override. It would also help us, as
>> contributors, to know which configurations we need to implement support for.
>>
>> For example, the "s3.signer"[1] and "s3.proxy-uri"[2] configurations only
>> exist in the Python implementation. It is not obvious that these
>> configurations are exclusive to Python, and a catalog could override them
>> or define them as defaults in the get info endpoint. Without an SSOT,
>> this is hard to track.
>>
>> Another example is the "rest.authorization-url" in Python and Rust versus
>> "oauth2_server_uri" in Java. Although this is a bit out of scope for this
>> thread, I will open another discussion topic about broader standardization
>> of available properties.
>>
>> [1]:
>> https://github.com/search?q=repo%3Aapache%2Ficeberg-python+s3.signer&type=code
>> [2]:
>> https://github.com/search?q=repo%3Aapache%2Ficeberg-python%20S3_PROXY_URI&type=code
>>
>> On Wednesday, July 10th, 2024 at 7:51 AM, Fokko Driesprong <
>> fo...@apache.org> wrote:
>>
>> Hey Xuanwo,
>>
>> Thanks for raising this.
>>
>>    - The S3 properties are largely covered under the S3FileIO page:
>>    https://iceberg.apache.org/docs/nightly/aws/#s3-fileio. But it looks
>>    like some important ones are missing indeed. I've raised an issue here
>>    <https://github.com/apache/iceberg/issues/10674>.
>>    - PyIceberg supports only a subset of the functionality, so many
>>    properties are missing there as well.
>>    - For the REST Catalog, there is an open PR
>>    <https://github.com/apache/iceberg/pull/10576> to add the options for
>>    GCS and ADLS. It would be great to get some more eyes on it.
>>
>> That being said, I do think there is value in formalizing them. When
>> adding configuration options to PyIceberg, I'll make sure to check out the
>> Java implementation to ensure that we use the same property.
>>
>> Kind regards,
>> Fokko
>>
>> Op wo 10 jul 2024 om 09:22 schreef Xuanwo <xua...@apache.org>:
>>
>>> Hello everyone
>>>
>>> I've been working on the iceberg-rust FileIO recently and have found it
>>> challenging to identify all the necessary IO properties we need to support.
>>>
>>> For instance, consider AWS S3. There is no document that specifies the
>>> full set of supported S3 properties.
>>>
>>> The only relevant documentation I could find includes:
>>>
>>> - Iceberg AWS Integrations[1]: Does not define `s3.access-key-id` or
>>> `s3.secret-access-key`.
>>> - PyIceberg configuration[2]: Missing several S3-related properties.
>>> - Iceberg REST Catalog[3]: Does not cover all storage services.
>>>
>>> To gather this information, we must refer to the S3FileIO Java code[4].
>>>
>>> I propose adding a separate section for agreeing upon these properties.
>>> We could create a specification that outlines all IO properties with
>>> indications of whether they are required or optional, along with their
>>> expected behaviors. This would help ensure consistency across different
>>> implementations without any conflicts.
>>>
>>>
>>> [1]: https://iceberg.apache.org/docs/latest/aws/
>>> [2]: https://py.iceberg.apache.org/configuration/#s3
>>> [3]:
>>> https://github.com/apache/iceberg/blob/eee81c59199a54e749ea58dae070eb066d9a5f9e/open-api/rest-catalog-open-api.yaml#L2737
>>> [4]:
>>> https://github.com/apache/iceberg/blob/2b21020aedb63c26295005d150c05f0a5a5f0eb2/aws/src/main/java/org/apache/iceberg/aws/s3/S3FileIOProperties.java#L46
>>>
>>> Xuanwo
>>>
>>> https://xuanwo.io/
>>>
>>
>>
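
To make the proposal in the thread more concrete, here is a minimal Python
sketch of what a shared property registry (the "single source of truth")
might look like, together with a check a client could run against the
properties a catalog hands it. The names PropertySpec, REGISTRY, and
unsupported_properties are hypothetical and do not exist in any Iceberg
implementation. The thread states only that "s3.signer" and "s3.proxy-uri"
exist solely in the Python implementation; the other support sets and the
required flags are assumptions chosen for illustration, not a verified
compatibility matrix.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PropertySpec:
    """One entry in a hypothetical shared property registry."""

    key: str
    required: bool
    supported_by: frozenset  # client names; illustrative only


# Illustrative registry. Support sets and required flags are assumptions
# made for this example, not an authoritative list.
REGISTRY = {
    spec.key: spec
    for spec in (
        PropertySpec("s3.access-key-id", False, frozenset({"java", "python", "rust"})),
        PropertySpec("s3.secret-access-key", False, frozenset({"java", "python", "rust"})),
        PropertySpec("s3.signer", False, frozenset({"python"})),
        PropertySpec("s3.proxy-uri", False, frozenset({"python"})),
    )
}


def unsupported_properties(client, catalog_config):
    """Return the keys in catalog_config that the given client cannot honor.

    Keys missing from the registry are reported as well, since an unknown
    key is exactly what a shared specification is meant to surface.
    """
    result = []
    for key in sorted(catalog_config):
        spec = REGISTRY.get(key)
        if spec is None or client not in spec.supported_by:
            result.append(key)
    return result


if __name__ == "__main__":
    # Example catalog defaults; the values are placeholders.
    config = {"s3.access-key-id": "example-id", "s3.proxy-uri": "http://proxy.example:8080"}
    print(unsupported_properties("rust", config))
```

With such a registry, a client could warn at startup when a catalog
supplies a default or override it does not implement, instead of silently
ignoring it.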
