[jira] [Commented] (OAK-5469) TarMK: scaling the content

2017-04-03 Thread JIRA

[ 
https://issues.apache.org/jira/browse/OAK-5469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15953737#comment-15953737
 ] 

Michael Dürig commented on OAK-5469:


Maybe related / interesting here: [shared memory and gc 
pauses|http://www.evanjones.ca/jvm-mmap-pause.html].

> TarMK: scaling the content
> --
>
> Key: OAK-5469
> URL: https://issues.apache.org/jira/browse/OAK-5469
> Project: Jackrabbit Oak
>  Issue Type: Epic
>  Components: segment-tar
>Reporter: Michael Dürig
>  Labels: scalability
> Fix For: 1.8
>
> Attachments: segment-per-path.png
>
>
> Production experience has shown that big repositories are prone to thrashing:
> {quote}
> Monitoring showed a massive level of major page faults, load averages 
> several times the number of cores, system CPU levels well above 50% and 
> extreme levels of IO. As more IOPS was provisioned, the instance consumed all 
> available IOPS. The TechOps team reported many TB of read IO per hour and 
> hardly any write IO.
> Investigation revealed that the repository size was just larger than the 
> available RAM on the machine. The instance was running in memory-mapped mode and 
> the IO was due to major page faults mapping pages of memory in and out. This was 
> made worse by transparent huge page settings causing huge pages to be mapped 
> proactively on major page faults. Compaction reduced the repository size to 
> less than RAM. The TechOps team now monitors the total tar file size and doesn't 
> let it exceed the RAM on the machine, scheduling compactions to keep within 
> limits. Since the default for TarMK was to run memory mapped rather than on 
> heap, the JVM had no visibility of the mayhem being caused at the OS level.
> {quote}
> This epic is all about improving the scalability of TarMK with respect to the content. 
> Below are some initial points to consider. Let's create issues and link them 
> to this epic as we go.
> * What kind of internal / external monitoring do we need to understand and 
> ideally predict thrashing? Can we monitor the working set (active pages)? 
> The number of segments in the segment cache might be a good starting point. 
> A minimal monitoring sketch follows below this list.
> * (How) can we reproduce the thrashing (easily enough)? Can we scale it down 
> (i.e. to an instance with less RAM)?
> * What is the impact of transparent huge pages (and of switching them off)? How 
> much do we suffer from read amplification? What would be the impact of not 
> memory mapping but instead increasing the size of the segment buffer 
> accordingly? Both approaches aim at having finer-grained control over the 
> data actually being loaded into RAM.
> * What other OS level tweaks should / can we look at? 
> * Can we reduce the working set by keeping it more compact? E.g. running 
> GC/compaction, reducing read amplification (see above), improving 
> de-duplication of values, storing values more efficiently (e.g. dates and 
> booleans). Can we compress buffers (e.g. segments) on the fly?
> * How do we test with big repositories?
>   * What is a big repository? (Potential target: 100 GB segment store - 500M 
> nodes, TBC)
>   * What to measure (indicators of size): size on disk (after compaction), 
> number of JCR nodes, number of node records (reachable vs. waste)
>   * How to measure?
> * {{oak-run debug}} (needs improvements for better scalability)
> * a single tool to provide all the info?
>   * How to obtain big repositories (generate or re-use existing)? A generation 
> sketch follows below this list.
>   * What to analyze / monitor / debug?
> * Possible limits: number of nodes (relative to RAM) for which thrashing 
> starts to occur, max. number of direct children, max. concurrent requests 
> during online garbage collection.
> * Platform monitoring: 
>   * basic: disk size, IO, CPU, memory
>   * Assess the impact of hardware upgrades on performance, e.g. what impact 
> doubling RAM/IO/CPU has on our test scenarios.
>   * in depth: page faults, writes / reads per process, working set of 
> nodes, commit statistics, incoming requests vs Oak operations, other hiccups
>   * Tools: [Ganglia|http://ganglia.info/], 
> [jHiccup|https://github.com/giltene/jHiccup], 
> [AppDynamics|https://www.appdynamics.com/]
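> The following is a minimal, Linux-only sketch (not an existing Oak API; the helper 
> names are illustrative) of how the page fault counters and the transparent huge page 
> setting mentioned above could be sampled from within the JVM via {{/proc}} and {{/sys}}:
> {code}
> import scala.io.Source
> 
> def readFile(path: String): String = {
>   val src = Source.fromFile(path)
>   try src.mkString.trim finally src.close()
> }
> 
> // Minor/major page fault counters of this JVM since process start.
> // Per proc(5), minflt and majflt are fields 10 and 12 of /proc/self/stat.
> def pageFaults(): (Long, Long) = {
>   val stat = readFile("/proc/self/stat")
>   // The process name "(comm)" may contain spaces, so split after the closing ')'.
>   val fields = stat.substring(stat.lastIndexOf(')') + 2).split(" ")
>   (fields(7).toLong, fields(9).toLong)   // minflt, majflt
> }
> 
> // Current transparent huge page policy, e.g. "always [madvise] never".
> def thpSetting(): String =
>   readFile("/sys/kernel/mm/transparent_hugepage/enabled")
> 
> val (minor, major) = pageFaults()
> println(s"minflt=$minor majflt=$major thp=${thpSetting()}")
> {code}
> Sampling these counters periodically and watching the major fault rate relative to the 
> segment cache size should give an early indicator of thrashing.
> For generating a big test repository, a sketch along the following lines could be a 
> starting point. It assumes the oak-segment-tar API (FileStoreBuilder, 
> SegmentNodeStoreBuilders; names as of the 1.6/1.8 lines, adjust per version), and all 
> sizes, batch counts and node names are made up for illustration:
> {code}
> import java.io.File
> import org.apache.jackrabbit.oak.segment.SegmentNodeStoreBuilders
> import org.apache.jackrabbit.oak.segment.file.FileStoreBuilder.fileStoreBuilder
> import org.apache.jackrabbit.oak.spi.commit.{CommitInfo, EmptyHook}
> 
> val dir = new File("/tmp/big-segmentstore")
> dir.mkdirs()
> val fileStore = fileStoreBuilder(dir).build()
> val nodeStore = SegmentNodeStoreBuilders.builder(fileStore).build()
> try {
>   // Merge many small commits so that segments accumulate over time,
>   // roughly mimicking organic content growth.
>   for (batch <- 0 until 1000) {
>     val builder = nodeStore.getRoot.builder()
>     val parent = builder.child("content").child(s"batch-$batch")
>     for (i <- 0 until 1000) {
>       parent.child(s"node-$i").setProperty("title", s"node $i of batch $batch")
>     }
>     nodeStore.merge(builder, EmptyHook.INSTANCE, CommitInfo.EMPTY)
>   }
> } finally {
>   fileStore.close()
> }
> {code}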



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (OAK-5469) TarMK: scaling the content

2017-02-09 Thread JIRA

[ 
https://issues.apache.org/jira/browse/OAK-5469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15860282#comment-15860282
 ] 

Michael Dürig commented on OAK-5469:


I came up with some [tooling|https://github.com/mduerig/script-oak] for 
creating segment incidence plots: 

!segment-per-path.png|width=700!

The x axis represents the segments in chronological order, while on the y axis a 
segment is marked with a dot if it occurs in the subtree rooted at the respective 
path. Such plots should be helpful for analysing how far content is spread across 
segments.

I used the following [script-oak|https://github.com/mduerig/script-oak] code to 
generate the above plot:

{code}
// collect a path of the form "root/<name>" for each child node of the root
val paths = (fs.getNode().analyse/"root").nodes.map("root/" + _.name)
// plot segment incidence per path, restricting the plot to segments 500-1200
writePlot(incidenceSeries(paths), wd/"segment-per-path", range = Some(500, 1200))
{code}
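
For reference, a purely conceptual sketch of the data behind such a plot (this is not 
the actual script-oak implementation; all names are illustrative): for each path, take 
the set of segments containing records reachable from the subtree at that path, and 
turn it into dots indexed by segment (x) and path (y).

{code}
// Illustrative only: one dot per (segment, path) incidence.
case class IncidencePoint(segmentIndex: Int, pathIndex: Int)

def incidencePoints(
    segmentsInOrder: Seq[String],              // segment ids, oldest first
    segmentsPerPath: Map[String, Set[String]]  // path -> segments reachable from it
): Seq[IncidencePoint] = {
  val segmentIndex = segmentsInOrder.zipWithIndex.toMap
  val paths = segmentsPerPath.keys.toSeq.sorted
  for {
    (path, pi) <- paths.zipWithIndex
    segId      <- segmentsPerPath(path).toSeq
    si         <- segmentIndex.get(segId).toSeq
  } yield IncidencePoint(si, pi)
}
{code}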





--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (OAK-5469) TarMK: scaling the content

2017-01-17 Thread JIRA

[ 
https://issues.apache.org/jira/browse/OAK-5469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15826263#comment-15826263
 ] 

Michael Dürig commented on OAK-5469:


[~dulceanu], [~alex.parvulescu], [~volteanu], [~frm] FYI.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (OAK-5469) TarMK: scaling the content

2017-01-17 Thread JIRA

[ 
https://issues.apache.org/jira/browse/OAK-5469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15826262#comment-15826262
 ] 

Michael Dürig commented on OAK-5469:


OAK-4649, OAK-5192, OAK-5042, OAK-1905 should go into this epic. Currently Jira 
prevents me from creating epic links, so I record them here for now. See INFRA-13347.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)