Case-insensitive schemas

Lessard, Steve Wed, 31 Jul 2024 16:16:46 -0700

Is there some kind of configuration or metadata flag that hints whether a 
Schema is intended to be used case-sensitive or case-insensitive?


In my PR adding case-insensitivity support to PartitionSpec, Steven Wu 
asked (https://github.com/apache/iceberg/pull/10678#discussion_r1696122748):

caseInsensitiveFindField uses normalized lower-case string for name -> id 
indexing. who can ensure the schema don't have two fields with names like data 
and DATA? Otherwise, caseInsensitiveFindField search is ambiguous. I am 
wondering if the caseSensitive config need to be pushed into the Schema and 
spec?

I suppose the answer to this question is to NOT use a case-sensitive schema in 
a case-insensitive way. In other words, if the schema is case-sensitive and 
contains columns named Make and MAKE, then the result of calling 
caseInsensitiveFindField("Make") is undefined. But that raises a new question: 
how does one know whether a Schema was created with the intention of being used 
case-sensitively or case-insensitively? I looked at 
https://iceberg.apache.org/docs/latest/configuration/#catalog-properties 
but found nothing.
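
To make the ambiguity concrete, here is a minimal sketch in plain Python. It does not use the Iceberg API. The helper name find_case_collisions and the sample column list are hypothetical and only illustrate what happens when names are normalized to lower case for a name -> id index.

```python
from collections import defaultdict


def find_case_collisions(names):
    """Return groups of names that become identical once lower-cased.

    Hypothetical helper for illustration only; not part of Iceberg.
    """
    groups = defaultdict(list)
    for name in names:
        # Normalize the same way a case-insensitive index would.
        groups[name.lower()].append(name)
    # Keep only the normalized keys that map to more than one original name.
    return {key: group for key, group in groups.items() if len(group) > 1}


# A case-sensitive schema may legally hold both Make and MAKE.
columns = ["id", "Make", "MAKE", "model"]
collisions = find_case_collisions(columns)
print(collisions)  # {'make': ['Make', 'MAKE']}
```

Any schema for which such a check returns a non-empty result cannot be looked up case-insensitively without ambiguity, which is why a hint about the intended case sensitivity would be useful.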

Is there some kind of configuration or metadata flag that gives a hint?

-Steve Lessard, Teradata
