[ https://issues.apache.org/jira/browse/OAK-5469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15953737#comment-15953737 ]

Michael Dürig commented on OAK-5469:
------------------------------------

Maybe related / interesting here: [shared memory and GC 
pauses|http://www.evanjones.ca/jvm-mmap-pause.html].

> TarMK: scaling the content
> --------------------------
>
>                 Key: OAK-5469
>                 URL: https://issues.apache.org/jira/browse/OAK-5469
>             Project: Jackrabbit Oak
>          Issue Type: Epic
>          Components: segment-tar
>            Reporter: Michael Dürig
>              Labels: scalability
>             Fix For: 1.8
>
>         Attachments: segment-per-path.png
>
>
> Production experience has shown that big repositories are prone to thrashing:
> {quote}
> Monitoring showed a massive level of major page faults, load averages 
> several times the number of cores, system CPU levels well above 50% and 
> extreme levels of IO. As more IOPS were provisioned, the instance consumed 
> all available IOPS. The TechOps team reported many TB of read IO per hour 
> and hardly any write IO.
> Investigation revealed that the repository size was just larger than the 
> available RAM on the machine. The instance was running in memory-mapped mode 
> and the IO was due to major page faults mapping pages of memory in and out. 
> This was made worse by transparent huge page settings causing huge pages to 
> be mapped proactively on major page faults. Compaction reduced the repository 
> size to less than RAM. The TechOps team now monitor the total tar file size 
> and don't let it exceed the RAM on the machine, scheduling compactions to 
> keep within limits. Since the default for TarMK was to run memory mapped 
> rather than on heap, the JVM had no visibility of the mayhem being caused at 
> the OS level.
> {quote}
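>
> As a rough illustration of the check described in the report above (total 
> tar file size vs. available RAM), something along these lines could be 
> scripted; the segment store path and the warning threshold are illustrative, 
> not part of any existing tooling:
> {code:java}
> import java.io.File;
> import java.lang.management.ManagementFactory;
>
> public class TarSizeCheck {
>     public static void main(String[] args) {
>         // Segment store directory; adjust to the actual repository location
>         File segmentStore = new File(args.length > 0 ? args[0] : "repository/segmentstore");
>
>         // Sum up the sizes of all tar files in the segment store
>         long totalTarBytes = 0;
>         File[] tarFiles = segmentStore.listFiles((dir, name) -> name.endsWith(".tar"));
>         if (tarFiles != null) {
>             for (File tar : tarFiles) {
>                 totalTarBytes += tar.length();
>             }
>         }
>
>         // Total physical memory as reported by the com.sun.management extension
>         long physicalRam = ((com.sun.management.OperatingSystemMXBean)
>                 ManagementFactory.getOperatingSystemMXBean()).getTotalPhysicalMemorySize();
>
>         System.out.printf("tar files: %d MB, physical RAM: %d MB%n",
>                 totalTarBytes >> 20, physicalRam >> 20);
>         if (totalTarBytes > physicalRam) {
>             System.out.println("WARNING: segment store larger than RAM; schedule compaction");
>         }
>     }
> }
> {code}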
> This epic is about improving the scalability of TarMK with respect to the 
> amount of content. Below are some initial points to consider. Let's create 
> issues and link them to this epic as we go.
> * What kind of internal / external monitoring do we need to understand and, 
> ideally, predict thrashing? Can we monitor the working set (active pages)? 
> The number of segments in the segment cache might be a good starting point 
> (see the configuration / cache statistics sketch after this list).
> * (How) can we reproduce the thrashing (easily enough)? Can we scale it down 
> (i.e. to an instance with less RAM)?
> * What is the impact of transparent huge pages (and of switching them off)? 
> How much do we suffer from read amplification? What would be the impact of 
> not memory mapping but instead increasing the size of the in-memory segment 
> cache accordingly (see the configuration sketch after this list)? Both 
> approaches aim at finer-grained control over the data actually loaded into 
> RAM.
> * What other OS-level tweaks should / can we look at?
> * Can we reduce the working set by keeping it more compact? E.g. by running 
> GC/compaction, reducing read amplification (see above), improving 
> de-duplication of values, or storing values more efficiently (e.g. dates and 
> booleans). Can we compress buffers (e.g. segments) on the fly?
> * How do we test with big repositories?
>   * What is a big repository? (Potential target: 100 GB segment store - 500M 
> nodes, TBC)
>   * What to measure (indicators of size): size on disk (after compaction), 
> number of JCR nodes, number of node records (reachable vs. waste)
>   * How to measure?
>     * {{oak-run debug}} (needs improvements for better scalability)
>     * one-line tool to provide all the info?
>   * How to obtain big repositories (generate or re-use existing)? (See the 
> repository generator sketch after this list.)
>   * What to analyze / monitor / debug?
>     * Possible limits: number of nodes (relative to RAM) for which thrashing 
> starts to occur, max. number of direct children, max. concurrent requests 
> during online garbage collection.
>     * Platform monitoring: 
>       * basic: disk size, IO, CPU, memory
>       * Assess the impact of hardware upgrades on performance, e.g. what 
> impact does doubling RAM/IO/CPU have on our test scenarios?
>       * in-depth: page faults, writes / reads per process, working set of 
> nodes, commit statistics, incoming requests vs Oak operations, other hiccups 
> (see the page fault probe sketch after this list)
>       * Tools: [Ganglia|http://ganglia.info/], 
> [jHiccup|https://github.com/giltene/jHiccup], 
> [AppDynamics|https://www.appdynamics.com/]
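>
> A minimal sketch of the non-memory-mapped setup discussed in the memory 
> mapping bullet above, using {{FileStoreBuilder}} from oak-segment-tar with a 
> larger in-heap segment cache instead of mmap. The path and cache size are 
> illustrative, and the cache statistics accessor is an assumption that may 
> differ between Oak versions:
> {code:java}
> import java.io.File;
>
> import org.apache.jackrabbit.oak.api.jmx.CacheStatsMBean;
> import org.apache.jackrabbit.oak.segment.file.FileStore;
> import org.apache.jackrabbit.oak.segment.file.FileStoreBuilder;
>
> public class OnHeapSegmentStore {
>     public static void main(String[] args) throws Exception {
>         File directory = new File("repository/segmentstore"); // illustrative path
>
>         // Disable memory mapping and compensate with a larger segment cache,
>         // so the amount of segment data held in RAM is under the JVM's control.
>         FileStore fileStore = FileStoreBuilder.fileStoreBuilder(directory)
>                 .withMemoryMapping(false)
>                 .withSegmentCacheSize(1024)   // MB, illustrative value
>                 .build();
>         try {
>             // Assumption: the store exposes its segment cache statistics as a
>             // CacheStatsMBean; the accessor may vary between versions.
>             CacheStatsMBean stats = fileStore.getSegmentCacheStats();
>             System.out.println("segments cached: " + stats.getElementCount());
>             System.out.println("hit rate:        " + stats.getHitRate());
>         } finally {
>             fileStore.close();
>         }
>     }
> }
> {code}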
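>
> For the "how to obtain big repositories" point, a generator along these 
> lines could write a large number of nodes directly through the NodeStore 
> API; the node counts, property shapes and target path are placeholders to 
> be tuned towards whatever we define as a "big repository":
> {code:java}
> import java.io.File;
>
> import org.apache.jackrabbit.oak.segment.SegmentNodeStoreBuilders;
> import org.apache.jackrabbit.oak.segment.file.FileStore;
> import org.apache.jackrabbit.oak.segment.file.FileStoreBuilder;
> import org.apache.jackrabbit.oak.spi.commit.CommitInfo;
> import org.apache.jackrabbit.oak.spi.commit.EmptyHook;
> import org.apache.jackrabbit.oak.spi.state.NodeBuilder;
> import org.apache.jackrabbit.oak.spi.state.NodeStore;
>
> public class BigRepoGenerator {
>     public static void main(String[] args) throws Exception {
>         FileStore fileStore = FileStoreBuilder
>                 .fileStoreBuilder(new File("generated/segmentstore")) // illustrative path
>                 .build();
>         try {
>             NodeStore nodeStore = SegmentNodeStoreBuilders.builder(fileStore).build();
>
>             // Write nodes in batches so individual commits stay small.
>             for (int batch = 0; batch < 1000; batch++) {
>                 NodeBuilder root = nodeStore.getRoot().builder();
>                 NodeBuilder content = root.child("content").child("batch-" + batch);
>                 for (int i = 0; i < 1000; i++) {
>                     content.child("node-" + i).setProperty("p", "value-" + i);
>                 }
>                 nodeStore.merge(root, EmptyHook.INSTANCE, CommitInfo.EMPTY);
>             }
>         } finally {
>             fileStore.close();
>         }
>     }
> }
> {code}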
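>
> For the in-depth platform monitoring item, a Linux-only probe could read the 
> page fault counters of the Oak process from {{/proc/self/stat}} (field 
> positions as per proc(5)); this is a sketch of monitoring we might build, not 
> existing tooling:
> {code:java}
> import java.nio.file.Files;
> import java.nio.file.Paths;
>
> public class PageFaultProbe {
>     public static void main(String[] args) throws Exception {
>         // /proc/self/stat: pid (comm) state ppid ... minflt cminflt majflt cmajflt ...
>         String stat = new String(Files.readAllBytes(Paths.get("/proc/self/stat")), "US-ASCII");
>
>         // The comm field may contain spaces, so split after the closing ')'.
>         String afterComm = stat.substring(stat.lastIndexOf(')') + 2);
>         String[] fields = afterComm.split("\\s+");
>
>         // In the full stat line minflt is field 10 and majflt field 12; after
>         // dropping pid and comm they end up at indices 7 and 9 of this array.
>         long minorFaults = Long.parseLong(fields[7]);
>         long majorFaults = Long.parseLong(fields[9]);
>
>         System.out.println("minor page faults: " + minorFaults);
>         System.out.println("major page faults: " + majorFaults);
>     }
> }
> {code}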


