andygrove commented on PR #383:
URL: https://github.com/apache/datafusion-comet/pull/383#issuecomment-2127417428

   @vidyasankarv I figured out what the issue is.
   
   I don't fully understand why, but when the fuzz test creates the DataFrame, the cast that gets performed is from a dictionary array, not a string array:
   
   ```
   cast_array(from=Dictionary(Int32, Utf8), to_type=Date32)
   ```
   
   This means that we are not even calling your native `date_parser`; instead we fall through to this catch-all logic:
   
   ```rust
   _ => {
       // when we have no Spark-specific casting we delegate to DataFusion
       cast_with_options(&array, to_type, &CAST_OPTIONS)?
   }
   ```
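
To make the fall-through concrete, here is a minimal sketch of the dispatch. It uses a hypothetical, simplified `DataType` enum and a hypothetical `dispatch` function, not the real arrow-rs type or the actual Comet match expression, and it only reports which branch a given pair of types would reach.

```rust
// Hypothetical, simplified stand-in for Arrow's DataType (assumption:
// only the variants needed for this illustration are modelled).
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    Utf8,
    Date32,
    Dictionary(Box<DataType>, Box<DataType>),
}

// Returns the name of the branch a (from, to) pair would take.
// The branch names are illustrative labels, not real function names.
fn dispatch(from: &DataType, to: &DataType) -> &'static str {
    match (from, to) {
        // The Spark-specific arm only matches a plain string input.
        (DataType::Utf8, DataType::Date32) => "spark_string_to_date",
        // Everything else, including Dictionary(Int32, Utf8), lands here.
        _ => "datafusion_fallback",
    }
}
```

With this shape, `Utf8 -> Date32` reaches the Spark-specific arm, while `Dictionary(Int32, Utf8) -> Date32` reaches the fallback even though its values are strings.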
   
   The solution is to add a specific match for casting dictionary to date:
   
   ```rust
               (
                   DataType::Dictionary(key_type, value_type),
                   DataType::Date32,
               ) if key_type.as_ref() == &DataType::Int32
                   && (value_type.as_ref() == &DataType::Utf8
                   || value_type.as_ref() == &DataType::LargeUtf8) =>
               {
                   match value_type.as_ref() {
                       DataType::Utf8 => {
                           let unpacked_array =
                               cast_with_options(&array, &DataType::Utf8, &CAST_OPTIONS)?;
                           Self::cast_string_to_date(&unpacked_array, to_type, self.eval_mode)?
                       }
                       DataType::LargeUtf8 => {
                           let unpacked_array =
                               cast_with_options(&array, &DataType::LargeUtf8, &CAST_OPTIONS)?;
                           Self::cast_string_to_date(&unpacked_array, to_type, self.eval_mode)?
                       }
                       dt => unreachable!(
                           "{}",
                           format!("invalid value type {dt} for dictionary-encoded string array")
                       ),
                   }
               },
   ```
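
The idea behind that arm is "unpack the dictionary into plain strings, then reuse the string-to-date path". Below is a dependency-free sketch of that idea. Everything in it is an assumption made for illustration: `DictStrings`, `unpack`, `parse_date`, and `cast_dict_to_date` are hypothetical names, the parser accepts only a simplified `YYYY-MM-DD` form rather than Spark's full set of date formats, and invalid input becomes null rather than following any particular eval mode.

```rust
// Hypothetical stand-in for a dictionary-encoded string array:
// each key indexes into `values`; `None` models a null slot.
struct DictStrings {
    keys: Vec<Option<i32>>,
    values: Vec<String>,
}

// Unpack the dictionary: materialize one plain string per row.
// Out-of-range keys are treated as null in this sketch.
fn unpack(dict: &DictStrings) -> Vec<Option<String>> {
    dict.keys
        .iter()
        .map(|k| k.and_then(|i| dict.values.get(i as usize).cloned()))
        .collect()
}

// Days since 1970-01-01 for a proleptic Gregorian date
// (the well-known "days from civil" algorithm).
fn days_from_civil(y: i64, m: i64, d: i64) -> i64 {
    let y = if m <= 2 { y - 1 } else { y };
    let era = y.div_euclid(400);
    let yoe = y.rem_euclid(400);
    let mp = (m + 9) % 12; // March = 0, ..., February = 11
    let doy = (153 * mp + 2) / 5 + d - 1;
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    era * 146_097 + doe - 719_468
}

// Simplified parser for "YYYY-MM-DD"; returns None for anything invalid.
fn parse_date(s: &str) -> Option<i32> {
    let mut parts = s.trim().split('-');
    let y: i64 = parts.next()?.parse().ok()?;
    let m: i64 = parts.next()?.parse().ok()?;
    let d: i64 = parts.next()?.parse().ok()?;
    if parts.next().is_some() {
        return None;
    }
    let leap = (y % 4 == 0 && y % 100 != 0) || y % 400 == 0;
    let days_in_month = match m {
        1 | 3 | 5 | 7 | 8 | 10 | 12 => 31,
        4 | 6 | 9 | 11 => 30,
        2 => if leap { 29 } else { 28 },
        _ => return None,
    };
    if d < 1 || d > days_in_month {
        return None;
    }
    i32::try_from(days_from_civil(y, m, d)).ok()
}

// Unpack first, then run every row through the string-to-date path.
fn cast_dict_to_date(dict: &DictStrings) -> Vec<Option<i32>> {
    unpack(dict)
        .into_iter()
        .map(|s| s.and_then(|s| parse_date(&s)))
        .collect()
}
```

The trade-off in this shape is that unpacking parses each row separately, so a value repeated many times in the dictionary is parsed many times. Parsing the dictionary values once and remapping the keys would avoid that, at the cost of more code.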


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org