[GitHub] leventov opened a new pull request #6370: Introduce SegmentId class

GitBox Sat, 22 Sep 2018 02:37:26 -0700

leventov opened a new pull request #6370: Introduce SegmentId class
URL: https://github.com/apache/incubator-druid/pull/6370
 
 
   This PR introduces a `SegmentId` class for operating with segment ids in the 
codebase instead of Strings.
   
   Three reasons:
    - Memory efficiency: `SegmentId`'s footprint is about 50 bytes smaller than 
a String, assuming non-compressed Oops, if I calculated correctly (should be 
~25 bytes smaller since Java 9's "compact strings"). Should be bigger with 
compressed Oops.
    - Code readability and safety: it's more readable and safer to operate with 
`Map<SegmentId, Something>` than with `Map<String, Something>`
    - `SegmentId` enabled to get rid of the `segments` map in `DruidServer` and 
`ImmutableDruidServer`, for additional memory savings.
   
   Also:
    - Replaced `ConcurrentSkipListMap` with `ConcurrentHashMap` in 
`DruidDataSource`, because it doesn't really need to be sorted.
    - Improved concurrency of `DruidServer`.
   
   Caveats of `SegmentId` introduction:
    - HTTP endpoints that return a lot of segment ids as parts of some JSON 
answers generate more garbage, because they convert `SegmentId` objects into 
Strings. Previously, those Strings were already stored in the heap.
    - HTTP endpoints which accept segment id need to do more work, because they 
need to iterate all possible parsings of a segment id string into `SegmentId`, 
that is ambiguous. But this should be minor, compared to the cost of an HTTP 
call itself. The exception is 
`MetadataResource.getDatabaseSegmentDataSourceSegment()`: to preserve it's 
former behaviour precisely, it would require to generate quite a lot of garbage 
`String` objects on each endpoint call, so the logic of this endpoint was 
changed (see below).
   
   Changed HTTP endpoint behaviour:
    - `MetadataResource.getDatabaseSegmentDataSourceSegment()`, 
`/datasources/{dataSourceName}/segments/{segmentId}`: it used to search for 
segment id match ignoring the case of the string, that's not in line with all 
other segmentId-accepting endpoints, and to me it doesn't look correctly, 
because `"ver"` and `"VER"` are two different versions according to the rest of 
the logic. It's changed to look for case-sensitive segmentId match.
    - `DatasourcesResource.getSegmentDataSourceSpecificInterval()`, 
`/{dataSourceName}/intervals/{interval}`, `simple` option: it creates a 
`HashMap` of intervals, unlike same method, `full` option, and both `simple` 
and `full` options of `getSegmentDataSourceIntervals()`, which all create 
`TreeMap`s of intervals. To me it looked like a slip, so I changed this 
endpoint to create a `TreeMap` as well (i. e. return interval JSON maps in 
different order the endpoint does currently).
   
   This PR is labelled `Incompatible` because it changed multiple Public APIs 
(to return `SegmentId` instead of `String` from some getters).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org

[GitHub] leventov opened a new pull request #6370: Introduce SegmentId class

Reply via email to