Steve, I replied on the PR, but the gist is that you're right: using a schema with fields that would be considered identical in a case-insensitive context will fail at runtime. That's the right behavior, because Iceberg can't control the case sensitivity of applications or engines.
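To illustrate the failure mode (a hedged sketch, not Iceberg's actual implementation): a case-insensitive lookup index normalizes every field name to lower case, so two names like Make and MAKE collide and the only safe option is to fail rather than answer ambiguously. The class and method names below are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Locale;

// Hypothetical sketch of a lower-cased name -> field-id index that fails
// fast on a case-insensitive collision, rather than returning an
// ambiguous result. Not Iceberg's actual code.
public class CaseInsensitiveIndex {
    public static Map<String, Integer> buildIndex(Map<String, Integer> fieldIdsByName) {
        Map<String, Integer> byLowerName = new HashMap<>();
        for (Map.Entry<String, Integer> field : fieldIdsByName.entrySet()) {
            String lower = field.getKey().toLowerCase(Locale.ROOT);
            Integer previous = byLowerName.put(lower, field.getValue());
            if (previous != null) {
                // Two distinct names normalized to the same key; any
                // case-insensitive lookup of this name would be ambiguous.
                throw new IllegalStateException(
                    "Ambiguous case-insensitive field name: " + field.getKey());
            }
        }
        return byLowerName;
    }
}
```

With columns named only `Make`, the index builds and `buildIndex(...).get("make")` resolves the field; add both `Make` and `MAKE` and construction throws, which mirrors the runtime failure described above.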
Ryan

On Wed, Jul 31, 2024 at 4:17 PM Lessard, Steve <steve.less...@teradata.com.invalid> wrote:

> Is there some kind of configuration or metadata flag that hints whether a Schema is intended to be used case-sensitively or case-insensitively?
>
> In my PR for adding case-insensitivity support to PartitionSpec, Steven Wu asked <https://github.com/apache/iceberg/pull/10678#discussion_r1696122748>:
>
> caseInsensitiveFindField uses a normalized lower-case string for the name -> id index. Who can ensure the schema doesn't have two fields with names like data and DATA? Otherwise, the caseInsensitiveFindField search is ambiguous. I am wondering if the caseSensitive config needs to be pushed into the Schema and spec?
>
> I suppose the answer to this question is to NOT use a case-sensitive schema in a case-insensitive way. In other words, if the schema is case-sensitive and contains columns named Make and MAKE, then the result of calling caseInsensitiveFindField("Make") is undefined. But that raises a new question: how does one know whether the Schema was created with the intention of being used case-sensitively or case-insensitively? I looked at https://iceberg.apache.org/docs/latest/configuration <https://iceberg.apache.org/docs/latest/configuration/#catalog-properties> but found nothing.
>
> Is there some kind of configuration or metadata flag that gives a hint?
>
> -Steve Lessard, Teradata

--
Ryan Blue
Databricks