swgillespie opened a new pull request, #7825:
URL: https://github.com/apache/arrow-datafusion/pull/7825

   ## Which issue does this PR close?
   
   Closes https://github.com/apache/arrow-datafusion/issues/7824.
   
   ## Rationale for this change
   
   It's impossible to write a logical plan to query a column from a Parquet 
data source whose type is `Map`. The `Map` type is not explicitly supported by 
`GetIndexedField`. Maps are a useful column type, are already supported by both 
Arrow and Parquet, and it 
   
   ## What changes are included in this PR?
   
   This commit extends the `NamedStructField` FieldAccess type to understand 
the `Map` data type. I chose this because the DataFusion SQL frontend parses 
the expression `x['y']` into a `NamedStructField`, which is a reasonable thing 
to do if we require that the key used to index `x` be a constant scalar (which 
it is, in this implementation).
   
   The Arrow Map array is essentially a list of structs, where each struct has 
exactly two fields: the first is the key and the second is the value. Arrow 
conventionally names these `key` and `value`, but this implementation does not 
rely on the names. It identifies the fields by position instead (first is the 
key, second is the value), which is the same assumption made by the Arrow 
implementation we use.
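
   As a rough illustration of that layout, here is a minimal sketch in plain 
Rust. It is not the arrow-rs API: `MapColumn`, `row`, and `sample_column` are 
hypothetical names, and the key and value types are fixed to `String` and `i64` 
only to keep the example short.

```rust
/// Simplified, hypothetical model of a Map column (not the arrow-rs type).
/// Row `i` owns the entries in `offsets[i]..offsets[i + 1]`.
struct MapColumn {
    offsets: Vec<usize>,
    // Each entry is a two-field struct, identified by position:
    // field 0 is the key, field 1 is the value.
    entries: Vec<(String, i64)>,
}

impl MapColumn {
    /// Returns the (key, value) entries that belong to row `i`.
    fn row(&self, i: usize) -> &[(String, i64)] {
        &self.entries[self.offsets[i]..self.offsets[i + 1]]
    }
}

/// Builds a two-row column: {"a": 1, "b": 2} and {"c": 3}.
fn sample_column() -> MapColumn {
    MapColumn {
        offsets: vec![0, 2, 3],
        entries: vec![
            ("a".to_string(), 1),
            ("b".to_string(), 2),
            ("c".to_string(), 3),
        ],
    }
}
```

   The point of the sketch is that nothing depends on what the two fields are 
called, only on their order within each entry.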
   
   To execute a map index access, we first scan the key column to identify the 
entries whose key matches the one being indexed, and then make a second pass to 
gather the values corresponding to the entries that were selected.
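
   The two passes can be sketched roughly as follows, again in plain Rust 
rather than against the real Arrow arrays. `MapColumn`, `matching_entries`, 
`gather_values`, and `map_lookup` are hypothetical names. Returning `None` for 
rows without a matching key and taking the first match within a row are 
assumptions of this sketch, not statements about what the PR does.

```rust
/// Hypothetical stand-in for a Map column with string keys and integer values.
/// Row `i` owns the entries in `offsets[i]..offsets[i + 1]`.
struct MapColumn {
    offsets: Vec<usize>,
    keys: Vec<String>,
    values: Vec<i64>,
}

/// Pass 1: for each row, find the index of the first entry whose key matches.
fn matching_entries(col: &MapColumn, key: &str) -> Vec<Option<usize>> {
    col.offsets
        .windows(2)
        .map(|w| (w[0]..w[1]).find(|&i| col.keys[i].as_str() == key))
        .collect()
}

/// Pass 2: gather the values at the selected entries.
/// Rows with no selected entry yield `None` (assumed to mean null).
fn gather_values(col: &MapColumn, selected: &[Option<usize>]) -> Vec<Option<i64>> {
    selected
        .iter()
        .map(|idx| idx.map(|i| col.values[i]))
        .collect()
}

/// Evaluates `column[key]` for every row by running both passes.
fn map_lookup(col: &MapColumn, key: &str) -> Vec<Option<i64>> {
    gather_values(col, &matching_entries(col, key))
}
```

   For a column with rows `{"a": 1, "b": 2}`, `{"a": 3}`, and `{}`, 
`map_lookup(&col, "a")` yields `[Some(1), Some(3), None]` under these 
assumptions.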
   
   ## Are these changes tested?
   
   This PR adds a new test, `map.slt`, which includes a Parquet file with two 
`Map` columns (one mapping strings to strings, the other mapping strings to 
ints) and writes some queries that use them.
   
   ## Are there any user-facing changes?
   
   This change makes `GetIndexedField` usable with columns of type `Map`, which 
was not possible before.

