alexanderbianchi commented on PR #22041:
URL: https://github.com/apache/datafusion/pull/22041#issuecomment-4397658813

   A bit more context on why this takes the approach of serializing the inner 
value:
   
   There are two unrelated "dictionary" concepts that are easy to conflate here:
   
   ```text
   Map/dictionary value:
     {"service": "beagle", "dc": "us1"}
     logical object/map type
   
   Arrow dictionary encoding:
     Dictionary(Int32, Utf8)
     physically encoded string column: integer keys + string values
   ```
   
   This PR is about the second one. The logical value is still just a string.
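
To make the distinction concrete, here is a minimal sketch of what "integer keys + string values" means for a column. `DictColumn`, `decode`, and `example_column` are hypothetical names for illustration only, not Arrow's `DictionaryArray` API, and the second metric name is made up. The point is that decoding yields ordinary strings, so the logical type of each row is still a string.

```rust
// Hypothetical, simplified model of a dictionary-encoded string column.
// This is NOT Arrow's DictionaryArray, only an illustration of the layout.
struct DictColumn {
    keys: Vec<i32>,      // one integer key per row
    values: Vec<String>, // each distinct string stored once
}

impl DictColumn {
    /// Decode back to the plain logical strings, one per row.
    fn decode(&self) -> Vec<&str> {
        self.keys
            .iter()
            .map(|&k| self.values[k as usize].as_str())
            .collect()
    }
}

/// Four rows that reference two distinct strings (illustrative data).
fn example_column() -> DictColumn {
    DictColumn {
        keys: vec![0, 1, 0, 0],
        values: vec!["req.latency".to_string(), "req.count".to_string()],
    }
}
```

Calling `example_column().decode()` gives one plain string per row; the keys are purely a storage detail.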
   
   The failing path we hit was:
   
   ```sql
   metric_name = 'req.latency'
   ```
   
   where the table schema exposes `metric_name` as:
   
   ```text
   Dictionary(Int32, Utf8)
   ```
   
   DataFusion's type coercion casts the string literal to the column's type so 
that both sides of the equality match, and the predicate becomes conceptually:
   
   ```text
   Column(metric_name: Dictionary(Int32, Utf8))
   =
   Literal(Dictionary(Int32, Utf8("req.latency")))
   ```
   
   That scalar is not a map/object value. It is DataFusion representing a 
string scalar that has been coerced to match a dictionary-encoded string column.
   
   The Substrait producer then failed with:
   
   ```text
   Unsupported literal: Dictionary(Int32, Utf8("req.latency"))
   ```
   
   In Substrait, there is no useful distinction between a string scalar and a 
"dictionary-encoded string scalar" here. Dictionary encoding is meaningful for 
arrays/columns, not for a single scalar literal. So the intended encoding is 
just the logical literal value:
   
   ```text
   Substrait string literal "req.latency"
   ```
   
   The column/scan can still be dictionary encoded when the plan is consumed 
against a table schema where `metric_name` is `Dictionary(Int32, Utf8)`. At 
that point DataFusion can again apply its normal coercion/execution behavior 
for comparing the dictionary column to the string literal.
   
   So the key point is: this PR is not trying to encode dictionary array layout 
into Substrait literals. It is preserving the logical scalar value while 
avoiding a producer failure caused by DataFusion's internal 
`ScalarValue::Dictionary` representation after type coercion.
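
The idea can be sketched roughly as follows. This is a simplified model under stated assumptions, not the PR's actual code: `Scalar` and `Literal` are hypothetical stand-ins for DataFusion's `ScalarValue` and the Substrait literal type, and `to_literal` is an illustrative name. The dictionary case simply recurses into the inner value, so a coerced scalar produces the same literal as the plain string.

```rust
// Simplified stand-ins: NOT DataFusion's real ScalarValue or Substrait's
// literal message. Names and shapes here are assumptions for illustration.
#[derive(Debug, Clone, PartialEq)]
enum Scalar {
    Utf8(String),
    Int64(i64),
    // The key type is kept only as a name; the boxed inner value carries
    // the logical scalar.
    Dictionary(&'static str, Box<Scalar>),
}

#[derive(Debug, PartialEq)]
enum Literal {
    String(String),
    I64(i64),
}

/// Convert a scalar to a literal, ignoring dictionary encoding.
fn to_literal(value: &Scalar) -> Literal {
    match value {
        Scalar::Utf8(s) => Literal::String(s.clone()),
        Scalar::Int64(i) => Literal::I64(*i),
        // Dictionary encoding has no meaning for a single scalar, so
        // serialize the inner value instead of failing.
        Scalar::Dictionary(_key_type, inner) => to_literal(inner),
    }
}
```

With this model, `Scalar::Dictionary("Int32", Box::new(Scalar::Utf8("req.latency".to_string())))` and `Scalar::Utf8("req.latency".to_string())` both map to `Literal::String("req.latency")`, which mirrors the behavior described above.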
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

