friendlymatthew commented on PR #15361:
URL: https://github.com/apache/datafusion/pull/15361#issuecomment-2783424002
It may be beneficial to leave the existing code paths undisturbed and decide
whether to retry on err.
It avoids the overhead of custom `chrono::Strftime::parse` validation. Plus,
the retry logic for `to_char_scalar` is simple to implement. Something like:
<details>
<summary>to_char_scalar</summary>
<br>
```rust
let formatter = ArrayFormatter::try_new(array.as_ref(), &format_options)?;
let formatted: Result<Vec<Option<String>>, ArrowError> = (0..array.len())
.map(|i| {
if array.is_null(i) {
Ok(None)
} else {
formatter.value(i).try_to_string().map(Some)
}
})
.collect(); /*
will stop iterating upon the first err and
since we're dealing with one format string,
it'll halt after the first format attempt
*/
if let Ok(formatted) = formatted {
if is_scalar_expression {
Ok(ColumnarValue::Scalar(ScalarValue::Utf8(
formatted.first().unwrap().clone(),
)))
} else {
Ok(ColumnarValue::Array(
Arc::new(StringArray::from(formatted)) as ArrayRef
))
}
} else {
// if the format attempt is with a Date32, it's possible the attempt
failed because
// the format string contained time-specifiers, so we'll retry as
Date64s
if data_type == &Date32 {
return to_char_scalar(expression.clone().cast_to(&Date64, None)?,
format);
}
exec_err!("{}", formatted.unwrap_err())
}
```
</details>
`to_char_array` is a bit more complex (see
https://github.com/apache/datafusion/pull/15361#discussion_r2015455219), but
I'll think it through.
FWIW, it's worth implementing the retry logic and benchmarking the
performance. I'm happy to do the work. The decision between eager casting and
selective retry likely comes down to whether we're willing to slightly slow
down existing behavior in exchange for better performance in the new case. But
I'm just spitballing.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]