leventov opened a new pull request #6370: Introduce SegmentId class URL: https://github.com/apache/incubator-druid/pull/6370 This PR introduces a `SegmentId` class for operating with segment ids in the codebase instead of Strings. Three reasons: - Memory efficiency: `SegmentId`'s footprint is about 50 bytes smaller than a String, assuming non-compressed Oops, if I calculated correctly (should be ~25 bytes smaller since Java 9's "compact strings"). Should be bigger with compressed Oops. - Code readability and safety: it's more readable and safer to operate with `Map<SegmentId, Something>` than with `Map<String, Something>` - `SegmentId` enabled to get rid of the `segments` map in `DruidServer` and `ImmutableDruidServer`, for additional memory savings. Also: - Replaced `ConcurrentSkipListMap` with `ConcurrentHashMap` in `DruidDataSource`, because it doesn't really need to be sorted. - Improved concurrency of `DruidServer`. Caveats of `SegmentId` introduction: - HTTP endpoints that return a lot of segment ids as parts of some JSON answers generate more garbage, because they convert `SegmentId` objects into Strings. Previously, those Strings were already stored in the heap. - HTTP endpoints which accept segment id need to do more work, because they need to iterate all possible parsings of a segment id string into `SegmentId`, that is ambiguous. But this should be minor, compared to the cost of an HTTP call itself. The exception is `MetadataResource.getDatabaseSegmentDataSourceSegment()`: to preserve it's former behaviour precisely, it would require to generate quite a lot of garbage `String` objects on each endpoint call, so the logic of this endpoint was changed (see below). Changed HTTP endpoint behaviour: - `MetadataResource.getDatabaseSegmentDataSourceSegment()`, `/datasources/{dataSourceName}/segments/{segmentId}`: it used to search for segment id match ignoring the case of the string, that's not in line with all other segmentId-accepting endpoints, and to me it doesn't look correctly, because `"ver"` and `"VER"` are two different versions according to the rest of the logic. It's changed to look for case-sensitive segmentId match. - `DatasourcesResource.getSegmentDataSourceSpecificInterval()`, `/{dataSourceName}/intervals/{interval}`, `simple` option: it creates a `HashMap` of intervals, unlike same method, `full` option, and both `simple` and `full` options of `getSegmentDataSourceIntervals()`, which all create `TreeMap`s of intervals. To me it looked like a slip, so I changed this endpoint to create a `TreeMap` as well (i. e. return interval JSON maps in different order the endpoint does currently). This PR is labelled `Incompatible` because it changed multiple Public APIs (to return `SegmentId` instead of `String` from some getters).
---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org For additional commands, e-mail: commits-h...@druid.apache.org