alamb opened a new issue, #2874:
URL: https://github.com/apache/arrow-datafusion/issues/2874

   **Describe the bug**
   
   Various parts of the DataFusion codebase assume that converting between `ScalarValue` and `Array` (in either direction) preserves the data type. This seems like a reasonable assumption; however, it does not hold for at least `DictionaryArray`s.
   
   For example, suppose a `ScalarValue` is converted to an array, `cast` to a `DictionaryArray<_>` due to coercion rules, and then converted back to a `ScalarValue`. When that supposedly cast `ScalarValue` is later converted back to an array, the array does not keep its dictionary encoding. Instead it has the dictionary's value type (for a string dictionary, `DataType::Utf8`; in the reproduction below, `UInt8`).
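
To make the information loss concrete, here is a minimal, dependency-free model of the round trip. `ToyType`, `ToyArray`, and `LossyScalar` are hypothetical stand-ins for Arrow's `DataType`, an Arrow array, and DataFusion's `ScalarValue`. They are not the real APIs. The sketch assumes only that the scalar type has no dictionary variant. Under that assumption, extracting a value from a dictionary array can keep only the value, and rebuilding an array from it can only produce the value type.

```rust
/// Hypothetical, simplified stand-in for Arrow's `DataType`.
#[derive(Debug, Clone, PartialEq)]
enum ToyType {
    Int32,
    Utf8,
    Dictionary(Box<ToyType>, Box<ToyType>),
}

/// Hypothetical stand-in for an Arrow array.
#[derive(Debug, Clone)]
enum ToyArray {
    Utf8(Vec<Option<String>>),
    /// Dictionary with Int32 keys: each key indexes into `values`.
    DictInt32 { keys: Vec<i32>, values: Vec<Option<String>> },
}

/// Hypothetical stand-in for `ScalarValue`. It has no dictionary variant,
/// so it cannot remember that a value came from a dictionary array.
#[derive(Debug, Clone, PartialEq)]
enum LossyScalar {
    Utf8(Option<String>),
}

impl ToyArray {
    fn data_type(&self) -> ToyType {
        match self {
            ToyArray::Utf8(_) => ToyType::Utf8,
            ToyArray::DictInt32 { .. } => {
                ToyType::Dictionary(Box::new(ToyType::Int32), Box::new(ToyType::Utf8))
            }
        }
    }

    /// Models `ScalarValue::try_from_array`: only the logical value at
    /// `index` survives, and the key type is dropped.
    fn scalar_at(&self, index: usize) -> LossyScalar {
        match self {
            ToyArray::Utf8(v) => LossyScalar::Utf8(v[index].clone()),
            ToyArray::DictInt32 { keys, values } => {
                LossyScalar::Utf8(values[keys[index] as usize].clone())
            }
        }
    }
}

impl LossyScalar {
    /// Models `ScalarValue::to_array_of_size`. With no dictionary variant,
    /// the only array it can build is a plain Utf8 array.
    fn to_array_of_size(&self, n: usize) -> ToyArray {
        match self {
            LossyScalar::Utf8(v) => ToyArray::Utf8(vec![v.clone(); n]),
        }
    }
}
```

Starting from `ToyArray::DictInt32 { keys: vec![0], values: vec![Some("foo".to_string())] }`, calling `scalar_at(0)` and then `to_array_of_size(10)` yields an array whose `data_type()` is `ToyType::Utf8` rather than the original dictionary type.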
   
   **To Reproduce**
   ```rust
   use datafusion::arrow::compute::kernels;
   use datafusion::arrow::datatypes::DataType;
   use datafusion::scalar::ScalarValue;

   fn bad_cast() {
       // There is a problem with round-trip casting to/from a dictionary
       // array. We want to cast this ScalarValue to a Dictionary
       // (for coercion, for example).
       let scalar = ScalarValue::Utf8(Some("foo".to_string()));

       let desired_type = DataType::Dictionary(
           // key type
           Box::new(DataType::Int32),
           // value type
           Box::new(DataType::UInt8),
       );

       // convert scalar --> array so that `cast` can be called
       let scalar_array = scalar.to_array();
       // cast the actual value
       let cast_array = kernels::cast::cast(&scalar_array, &desired_type).unwrap();
       // turn it back into a scalar
       let cast_scalar = ScalarValue::try_from_array(&cast_array, 0).unwrap();

       // Some time later the "cast" scalar is turned back into an array:
       let array = cast_scalar.to_array_of_size(10);

       // The data type should be `Dictionary(Int32, UInt8)`, but the dictionary
       // encoding is lost and only the value type (`UInt8`) remains!
       assert_eq!(array.data_type(), &desired_type);
   }
   ```
   
   Running this function results in 
   
   ```
   thread 'main' panicked at 'assertion failed: `(left == right)`
     left: `UInt8`,
    right: `Dictionary(Int32, UInt8)`', src/main.rs:76:5
   ```
   
   
   **Expected behavior**
   The test case should pass.
   
   **Additional context**
   I am not sure whether it makes sense to add a `ScalarValue::Dictionary` variant, to add an `is_dictionary` flag or something similar, or simply to stop assuming that a `ScalarValue` can be round-tripped while keeping its data type.
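
The first option, a dictionary variant on the scalar, can be sketched with a similar toy model. `ToyType`, `ToyScalar`, and `ToyArray` are hypothetical simplifications, not DataFusion's actual types or its eventual design. The assumed idea is that the scalar carries the key type alongside the wrapped value. With that information, `to_array_of_size` can rebuild a dictionary array: a single dictionary entry that every key points to.

```rust
/// Hypothetical, simplified stand-in for Arrow's `DataType`.
#[derive(Debug, Clone, PartialEq)]
enum ToyType {
    Int32,
    UInt8,
    Utf8,
    Dictionary(Box<ToyType>, Box<ToyType>),
}

/// Hypothetical scalar with a dictionary variant that remembers the key type.
#[derive(Debug, Clone, PartialEq)]
enum ToyScalar {
    UInt8(Option<u8>),
    Utf8(Option<String>),
    /// (key type, wrapped value)
    Dictionary(Box<ToyType>, Box<ToyScalar>),
}

/// Hypothetical array model, just detailed enough to check types and lengths.
#[derive(Debug, Clone)]
enum ToyArray {
    UInt8(Vec<Option<u8>>),
    Utf8(Vec<Option<String>>),
    /// Keys are stored as `usize` for simplicity; `key_type` records the
    /// logical key type. Each key indexes into `values`.
    Dictionary { key_type: ToyType, keys: Vec<usize>, values: Box<ToyArray> },
}

impl ToyScalar {
    fn data_type(&self) -> ToyType {
        match self {
            ToyScalar::UInt8(_) => ToyType::UInt8,
            ToyScalar::Utf8(_) => ToyType::Utf8,
            ToyScalar::Dictionary(key, value) => {
                ToyType::Dictionary(key.clone(), Box::new(value.data_type()))
            }
        }
    }

    /// Builds an array of `n` copies. A dictionary scalar becomes a
    /// one-entry dictionary with all `n` keys pointing at index 0.
    fn to_array_of_size(&self, n: usize) -> ToyArray {
        match self {
            ToyScalar::UInt8(v) => ToyArray::UInt8(vec![*v; n]),
            ToyScalar::Utf8(v) => ToyArray::Utf8(vec![v.clone(); n]),
            ToyScalar::Dictionary(key_type, value) => ToyArray::Dictionary {
                key_type: (**key_type).clone(),
                keys: vec![0; n],
                values: Box::new(value.to_array_of_size(1)),
            },
        }
    }
}

impl ToyArray {
    fn data_type(&self) -> ToyType {
        match self {
            ToyArray::UInt8(_) => ToyType::UInt8,
            ToyArray::Utf8(_) => ToyType::Utf8,
            ToyArray::Dictionary { key_type, values, .. } => {
                ToyType::Dictionary(Box::new(key_type.clone()), Box::new(values.data_type()))
            }
        }
    }

    fn len(&self) -> usize {
        match self {
            ToyArray::UInt8(v) => v.len(),
            ToyArray::Utf8(v) => v.len(),
            ToyArray::Dictionary { keys, .. } => keys.len(),
        }
    }
}
```

In this model, `ToyScalar::Dictionary(Box::new(ToyType::Int32), Box::new(ToyScalar::Utf8(Some("foo".to_string()))))` reports `Dictionary(Int32, Utf8)` both as a scalar and after `to_array_of_size(10)`. The likely trade-off is that code matching on the scalar must now handle or unwrap the extra variant.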
   
   
   This is the root cause of https://github.com/apache/arrow-datafusion/issues/2873. I added a patch for that particular case, but the same problem can occur elsewhere.
   

