GWphua commented on code in PR #18477:
URL: https://github.com/apache/druid/pull/18477#discussion_r2321411252
##########
processing/src/test/java/org/apache/druid/java/util/common/IntervalsTest.java:
##########
@@ -79,6 +79,44 @@ public void testFindOverlappingInterval()
);
}
+ @Test
+ public void testValidIntervalStrings()
+ {
+ final String[] intervalStringRepresentations = new String[]{
+ // Tests that use does not fallback to Intervals.of()
+ // Zulu with millis
+ "2022-01-01T00:00:00.000Z/2022-01-02T00:00:00.000Z",
+ "2021-03-14T12:34:56.789Z/2021-03-15T12:34:56.789Z",
+
+ // Offset with colon
+ "2022-01-01T00:00:00.000+05:30/2022-01-01T01:00:00.000+05:30",
+ "2022-01-01T07:00:00.000-07:00/2022-01-01T08:00:00.000-07:00",
+
+ // Basic offset without colon
+ "2022-01-01T00:00:00.000+0530/2022-01-01T01:00:00.000+0530",
+
+ // Tests that fallback to Intervals.of()
+ // Zulu without millis
+ "2022-01-01T00:00:00Z/2022-01-02T00:00:00Z",
+ // Date-only
+ "2022-01-01/2022-01-02",
+ // start/period
+ "2022-01-01T00:00:00.000Z/P1D",
+ "2022-01-01T12:00:00Z/PT6H",
+ "2022-01-01T00:00:00Z/P2DT3H4M5S",
+ // period/end
+ "P1D/2022-01-02T00:00:00.000Z",
+ "PT6H/2022-01-01T18:00:00Z",
+ "P2DT3H4M5S/2022-01-03T03:04:05Z"
Review Comment:
Hi @kfaraz
I attempted to make `tryOptimizedIntervalDeserialization` more inclusive
(e.g. Work on Date-only, and Without Millis case), but this requires changing
the `FAST_ISO_UTC_FORMATTER` DateTimeFormatter to a more flexible formatter.
This causes performance to resemble the un-optimized versions.
These 2 formatters allow formats like `"2022-01-01/2022-01-02"` or
`"2022-01-01T00:00:00Z/2022-01-02T00:00:00Z"`, but the performance is lacking...
```
private static final DateTimeFormatter FAST_ISO_UTC_FORMATTER =
ISODateTimeFormat.dateOptionalTimeParser()
.withOffsetParsed()
.withChronology(ISOChronology.getInstanceUTC());
Benchmark
(numValues) Mode Cnt Score Error Units
JodaIntervalDeserializationBenchmark.deserializeLegacy
20000 avgt 2 21.243 ms/op
JodaIntervalDeserializationBenchmark.deserializeLegacyFallback
20000 avgt 2 15.217 ms/op
JodaIntervalDeserializationBenchmark.deserializeOptimized
20000 avgt 2 20.605 ms/op
JodaIntervalDeserializationBenchmark.deserializeOptimizedFallback
20000 avgt 2 31.051 ms/op
```
```
private static final DateTimeFormatter FAST_ISO_UTC_FORMATTER =
ISODateTimeFormat.dateTimeParser().withChronology(ISOChronology.getInstanceUTC());
Benchmark
(numValues) Mode Cnt Score Error Units
JodaIntervalDeserializationBenchmark.deserializeLegacy
20000 avgt 3 21.502 ± 0.995 ms/op
JodaIntervalDeserializationBenchmark.deserializeLegacyFallback
20000 avgt 3 15.365 ± 1.227 ms/op
JodaIntervalDeserializationBenchmark.deserializeOptimized
20000 avgt 3 21.239 ± 1.543 ms/op
JodaIntervalDeserializationBenchmark.deserializeOptimizedFallback
20000 avgt 3 31.526 ± 3.083 ms/op
```
I think we can simply take advantage of knowing how our data will look like,
and optimize the parse process for the most frequently-used data format.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]