swgillespie opened a new pull request, #7825: URL: https://github.com/apache/arrow-datafusion/pull/7825
## Which issue does this PR close?

Closes https://github.com/apache/arrow-datafusion/issues/7824.

## Rationale for this change

It is currently impossible to write a logical plan that queries a column of type `Map` from a Parquet data source, because `GetIndexedField` does not explicitly support the `Map` type. Maps are a useful column type, are already supported by both Arrow and Parquet, and it

## What changes are included in this PR?

This commit extends the `NamedStructField` FieldAccess type to understand the `Map` data type. I chose this because the DataFusion SQL frontend parses the expression `x['y']` into a `NamedStructField`, which is a reasonable thing to do if we require that the argument used to index `x` be a constant scalar (which it is, in this implementation).

The Arrow Map array is essentially a list of structs, where each struct has exactly two fields. The first field of the struct is the key, and the second field is the value. Arrow traditionally names these `key` and `value`, but this implementation does not rely on those names. It assumes only that the first column is the `key` column and the second is the `value` column, which is the same assumption made by the Arrow implementation we use.

To execute a map index access, we first scan the key column to identify the entries that match the key we are indexing by, and then make a second pass to gather the values corresponding to the selected keys.

## Are these changes tested?

This PR adds a new test, `map.slt`, which includes a Parquet file with two `Map` columns (one mapping strings to strings, the other mapping strings to ints) and runs some queries against them.

## Are there any user-facing changes?

This change makes `GetIndexedField` usable with columns of type `Map`, which was not possible before.
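
The two-pass map lookup described under "What changes are included in this PR?" can be illustrated with a minimal, dependency-free sketch. This is not the code in this PR and does not use the Arrow or DataFusion APIs: `MapColumn` and `get_by_key` are hypothetical names, the map is modelled as plain offsets plus key and value vectors, and returning `None` for a missing key and taking the first match for a duplicated key are assumptions made only for illustration.

```rust
/// Simplified stand-in for an Arrow Map array (hypothetical, for illustration).
/// Entries for row `i` live at positions `offsets[i]..offsets[i + 1]`.
struct MapColumn {
    offsets: Vec<usize>,
    keys: Vec<String>,        // first child column: the keys
    values: Vec<Option<i64>>, // second child column: the values
}

/// For each row, return the value stored under `key`.
/// Assumption: rows without the key yield `None`; the first match wins.
fn get_by_key(map: &MapColumn, key: &str) -> Vec<Option<i64>> {
    // Pass 1: scan the key column and record, per row, the position of the
    // matching entry (if any).
    let selected: Vec<Option<usize>> = map
        .offsets
        .windows(2)
        .map(|w| (w[0]..w[1]).find(|&i| map.keys[i] == key))
        .collect();

    // Pass 2: gather the values at the selected positions.
    selected
        .into_iter()
        .map(|pos| pos.and_then(|i| map.values[i]))
        .collect()
}

fn main() {
    // Three rows: {a: 1, b: 2}, {}, {b: 3}
    let map = MapColumn {
        offsets: vec![0, 2, 2, 3],
        keys: vec!["a".to_string(), "b".to_string(), "b".to_string()],
        values: vec![Some(1), Some(2), Some(3)],
    };
    println!("{:?}", get_by_key(&map, "b"));
}
```

Keeping the selection and the gather as separate passes mirrors the description above: the first pass only touches the key column, and the second only touches the value column.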
