andygrove commented on PR #383: URL: https://github.com/apache/datafusion-comet/pull/383#issuecomment-2127417428
@vidyasankarv I figured out what the issue is. I don't fully understand why, but when the fuzz test creates the DataFrame, the cast operation that gets performed is from a dictionary array, not a string array:

```
cast_array(from=Dictionary(Int32, Utf8), to_type=Date32)
```

This means that we are not even calling your native date_parser but instead falling through to this catchall logic:

```
_ => {
    // when we have no Spark-specific casting we delegate to DataFusion
    cast_with_options(&array, to_type, &CAST_OPTIONS)?
}
```

The solution is to add a specific match for casting dictionary to date:

```rust
(
    DataType::Dictionary(key_type, value_type),
    DataType::Date32,
) if key_type.as_ref() == &DataType::Int32
    && (value_type.as_ref() == &DataType::Utf8
        || value_type.as_ref() == &DataType::LargeUtf8) =>
{
    match value_type.as_ref() {
        DataType::Utf8 => {
            let unpacked_array =
                cast_with_options(&array, &DataType::Utf8, &CAST_OPTIONS)?;
            Self::cast_string_to_date(&unpacked_array, to_type, self.eval_mode)?
        }
        DataType::LargeUtf8 => {
            let unpacked_array =
                cast_with_options(&array, &DataType::LargeUtf8, &CAST_OPTIONS)?;
            Self::cast_string_to_date(&unpacked_array, to_type, self.eval_mode)?
        }
        dt => unreachable!(
            "{}",
            format!("invalid value type {dt} for dictionary-encoded string array")
        ),
    }
},
```
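To make the fall-through easier to see in isolation, here is a minimal, dependency-free sketch of the dispatch. The `DataType` enum below is a simplified stand-in for Arrow's type, and `CastPath` and `select_cast_path` are hypothetical names used only for illustration; none of this is Comet's actual code. It only models which match arm is selected, not the cast itself.

```rust
// Simplified stand-in for Arrow's DataType (assumption: only the variants
// needed to illustrate the dispatch are modelled here).
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    Utf8,
    LargeUtf8,
    Date32,
    Dictionary(Box<DataType>, Box<DataType>),
}

// Hypothetical labels for the code path a cast would take.
#[derive(Debug, PartialEq)]
enum CastPath {
    // Plain string input goes straight to the Spark-specific date parser.
    SparkStringToDate,
    // Dictionary-encoded strings are unpacked first, then parsed the same way.
    UnpackThenSparkStringToDate,
    // Anything else is delegated to the generic DataFusion cast.
    DataFusionFallback,
}

fn is_string(dt: &DataType) -> bool {
    matches!(dt, DataType::Utf8 | DataType::LargeUtf8)
}

// Mirrors the shape of the match in the comment: without the dictionary arm,
// Dictionary(Int32, Utf8) -> Date32 would land in the catch-all.
fn select_cast_path(from: &DataType, to: &DataType) -> CastPath {
    match (from, to) {
        (DataType::Utf8 | DataType::LargeUtf8, DataType::Date32) => {
            CastPath::SparkStringToDate
        }
        (DataType::Dictionary(key, value), DataType::Date32)
            if **key == DataType::Int32 && is_string(value) =>
        {
            CastPath::UnpackThenSparkStringToDate
        }
        _ => CastPath::DataFusionFallback,
    }
}
```

Calling `select_cast_path` with `Dictionary(Int32, Utf8)` and `Date32` selects the dictionary arm, while a dictionary whose values are not strings still reaches the fallback, which is the behaviour the guard in the proposed match is meant to preserve.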