Omega359 commented on code in PR #15361:
URL: https://github.com/apache/datafusion/pull/15361#discussion_r2020191523
##########
datafusion/functions/src/datetime/to_char.rs:
##########
@@ -220,6 +221,13 @@ fn _to_char_scalar(
}
}
+ // eagerly cast Date32 values to Date64 to support date formatting with
time-related specifiers
+ // without error.
+ if data_type == &Date32 {
Review Comment:
Some data from a quick little benchmark I wrote
```
test_date32_to_date64_cast_array_1000
time: [311.06 ns 314.48 ns 318.16 ns]
Found 4 outliers among 100 measurements (4.00%)
1 (1.00%) high mild
3 (3.00%) high severe
test_parse_performance_with_time_specifiers_array_1000
time: [343.79 ns 351.64 ns 359.98 ns]
Found 14 outliers among 100 measurements (14.00%)
10 (10.00%) low mild
3 (3.00%) high mild
1 (1.00%) high severe
test_parse_performance_without_time_specifiers_array_1000
time: [196.59 µs 201.06 µs 206.45 µs]
Found 10 outliers among 100 measurements (10.00%)
2 (2.00%) high mild
8 (8.00%) high severe
```
test_date32_to_date64_cast_array_1000: just casts from date32 to date64
test_parse_performance_with_time_specifiers_array_1000: parses the formats
looking for time specifiers and when found does the cast from date32 to date64
test_parse_performance_without_time_specifiers_array_1000: parses the
formats looking for time specifiers (doesn't find anything), no cast
Note that results on my machine will vary +-10% between runs. The check for
time specifiers in the format is simple and could be improved but I think is
enough to show general performance. Note that the list of time specifiers is
not complete
```
const TIME_SPECIFIERS: &[&str] = &[
"%H", "%M", "%S", "%c", "%f", "%k", "%I", "%l", "%p", "%R", "%T", "%x",
"%r", "%Z",
];
fn has_time_specifiers(str: &str) -> bool {
for specifier in TIME_SPECIFIERS {
if str.contains(specifier) {
return true;
}
}
false
}
```
A couple of takeaways from this. casting is not as cheap as I thought
however parsing is seems to be more expensive than that but perhaps with some
really good optimizations it could be reduced.
I don't see a good way to make this feature have no cost with Date32 && no
time specifiers in the format.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]