[ 
https://issues.apache.org/jira/browse/OAK-2896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Dürig updated OAK-2896:
-------------------------------
    Fix Version/s:     (was: 1.3.0)
                   1.3.5

> Putting many elements into a map results in many small segments. 
> -----------------------------------------------------------------
>
>                 Key: OAK-2896
>                 URL: https://issues.apache.org/jira/browse/OAK-2896
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: segmentmk
>            Reporter: Michael Dürig
>            Assignee: Michael Dürig
>              Labels: performance
>             Fix For: 1.3.5
>
>         Attachments: OAK-2896.png, OAK-2896.xlsx
>
>
> There is an issue with how the HAMT implementation 
> ({{SegmentWriter.writeMap()}}) interacts with the 256 segment references limit 
> when putting many entries into the map: this limit is regularly reached 
> once the map contains about 200k entries. At that point segments get 
> prematurely flushed, resulting in more segments, thus more references and thus 
> even smaller segments. It is common for segments to be as small as 7k with a 
> tar file containing up to 35k segments. This is problematic because at this 
> point handling of the segment graph becomes expensive, both memory- and 
> CPU-wise. I have seen persisted segment graphs as big as 35M where the usual 
> size is a couple of k. 
> As the HAMT map is used for storing the children of a node, this might have an 
> adverse effect on nodes with many child nodes. 
> The following code can be used to reproduce the issue: 
> {code}
> SegmentWriter writer = new SegmentWriter(segmentStore, getTracker(), V_11);
> Random rnd = new Random(); // not declared in the original snippet; java.util.Random assumed
> MapRecord baseMap = null;
> for (;;) {
>     Map<String, RecordId> map = newHashMap();
>     for (int k = 0; k < 1000; k++) {
>         RecordId stringId = writer.writeString(String.valueOf(rnd.nextLong()));
>         map.put(String.valueOf(rnd.nextLong()), stringId);
>     }
>     Stopwatch w = Stopwatch.createStarted();
>     baseMap = writer.writeMap(baseMap, map);
>     System.out.println(baseMap.size() + " " + w.elapsed());
> }
> {code}
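
The effect described in the report can be illustrated outside of Oak with a toy model. This is only a sketch under stated assumptions, not Oak's SegmentWriter: the class name, the 256 KiB size cap, the fixed 64-byte record size and the rule that every new record references one record in a randomly chosen existing segment are all hypothetical. Only the limit of 256 segment references is taken from the description above. The model flushes the current segment when it is full or when a record would add a 257th distinct segment reference, and reports how many records fit into a flushed segment on average.

```java
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

// Toy model only: none of these names or sizes come from Oak's code base.
class SegmentRefLimitModel {
    static final int MAX_REFS = 256;                 // references limit named in the report
    static final int MAX_SEGMENT_BYTES = 256 * 1024; // assumed size cap
    static final int RECORD_BYTES = 64;              // assumed fixed record size

    /**
     * Writes recordsToWrite records, each referencing one of existingSegments
     * segments chosen at random, and returns the average number of records
     * per flushed segment.
     */
    static double averageRecordsPerSegment(int existingSegments, int recordsToWrite, long seed) {
        Random rnd = new Random(seed);
        Set<Integer> refs = new HashSet<>(); // distinct segments referenced so far
        int bytes = 0;
        int recordsInSegment = 0;
        long flushedRecords = 0;
        int flushedSegments = 0;

        for (int i = 0; i < recordsToWrite; i++) {
            int target = rnd.nextInt(existingSegments);
            boolean refsFull = !refs.contains(target) && refs.size() == MAX_REFS;
            boolean bytesFull = bytes + RECORD_BYTES > MAX_SEGMENT_BYTES;
            if (refsFull || bytesFull) {
                // Flush: start a new, empty segment before writing this record.
                flushedRecords += recordsInSegment;
                flushedSegments++;
                refs.clear();
                bytes = 0;
                recordsInSegment = 0;
            }
            refs.add(target);
            bytes += RECORD_BYTES;
            recordsInSegment++;
        }
        // If nothing was flushed, report the size of the single open segment.
        return flushedSegments == 0
                ? recordsInSegment
                : (double) flushedRecords / flushedSegments;
    }

    public static void main(String[] args) {
        // Referenced records live in few segments: the size cap decides.
        System.out.println(averageRecordsPerSegment(100, 1_000_000, 42L));
        // Referenced records are spread over many segments: the reference limit decides.
        System.out.println(averageRecordsPerSegment(100_000, 1_000_000, 42L));
    }
}
```

In this model, 100 existing segments never exhaust the reference limit, so every flushed segment holds 262144 / 64 = 4096 records. With 100,000 existing segments almost every record adds a new reference, so segments are flushed after little more than 256 records, at roughly one sixteenth of the size cap. The model does not reproduce the feedback loop itself (small segments increasing the spread further); it only shows why a wider spread of referenced segments leads to smaller segments.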



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
