Steve, I replied on the PR, but the gist is that you're right: using a schema with fields that would be considered identical in a case-insensitive context will fail at runtime. That's the right behavior, because Iceberg can't control the case sensitivity of applications or engines.
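To illustrate the failure mode (a hedged sketch, not Iceberg's actual implementation): a case-insensitive lookup index normalizes every field name to lower case, so two names like Make and MAKE collide and the only safe option is to fail rather than answer ambiguously. The class and method names below are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Locale;

// Hypothetical sketch of a lower-cased name -> field-id index that fails
// fast on a case-insensitive collision, rather than returning an
// ambiguous result. Not Iceberg's actual code.
public class CaseInsensitiveIndex {
    public static Map<String, Integer> buildIndex(Map<String, Integer> fieldIdsByName) {
        Map<String, Integer> byLowerName = new HashMap<>();
        for (Map.Entry<String, Integer> field : fieldIdsByName.entrySet()) {
            String lower = field.getKey().toLowerCase(Locale.ROOT);
            Integer previous = byLowerName.put(lower, field.getValue());
            if (previous != null) {
                // Two distinct names normalized to the same key; any
                // case-insensitive lookup of this name would be ambiguous.
                throw new IllegalStateException(
                    "Ambiguous case-insensitive field name: " + field.getKey());
            }
        }
        return byLowerName;
    }
}
```

With columns named only `Make`, the index builds and `buildIndex(...).get("make")` resolves the field; add both `Make` and `MAKE` and construction throws, which mirrors the runtime failure described above.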
Ryan

On Wed, Jul 31, 2024 at 4:17 PM Lessard, Steve <steve.less...@teradata.com.invalid> wrote:

> Is there some kind of configuration or metadata flag that hints whether a Schema is intended to be used case-sensitively or case-insensitively?
>
> In my PR for adding case-insensitivity support to PartitionSpec, Steven Wu asked <https://github.com/apache/iceberg/pull/10678#discussion_r1696122748>:
>
> caseInsensitiveFindField uses a normalized lower-case string for the name -> id index. Who can ensure the schema doesn't have two fields with names like data and DATA? Otherwise, the caseInsensitiveFindField search is ambiguous. I am wondering if the caseSensitive config needs to be pushed into the Schema and spec?
>
> I suppose the answer to this question is to NOT use a case-sensitive schema in a case-insensitive way. In other words, if the schema is case-sensitive and contains columns named Make and MAKE, then the result of calling caseInsensitiveFindField("Make") is undefined. But that raises a new question: how does one know whether the Schema was created with the intention of being used case-sensitively or case-insensitively? I looked at https://iceberg.apache.org/docs/latest/configuration <https://iceberg.apache.org/docs/latest/configuration/#catalog-properties> but found nothing.
>
> Is there some kind of configuration or metadata flag that gives a hint?
>
> -Steve Lessard, Teradata

--
Ryan Blue
Databricks