[ 
https://issues.apache.org/jira/browse/OAK-5192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16138934#comment-16138934
 ] 

Tommaso Teofili commented on OAK-5192:
--------------------------------------

This is supposed to be the final test run for this issue. I think the mitigated 
merge policy (MP) currently sits between the TieredMergePolicy (which is 
sometimes too aggressive) and NoMergePolicy (which produces slow queries in the 
long run).
For the sake of completeness I've also tested the other existing merge policies 
in Lucene (LogByteSizeMergePolicy and LogDocMergePolicy), each of them with the 
oakCodec, Lucene46 and customCodec codecs (the last being the one which is 
supposed to bring the best level of compression).
The changes at https://github.com/tteofili/jackrabbit-oak/tree/oak-5192 also 
include the exposed configuration for merge policy (OAK-6514).
With the current recommended _minRecordLength_ of 4k the gain from mitigated MP 
on oakCodec and Lucene46 (as observed on these tests) is around 10%.
Possible further improvements:
* for the commit rate, the MP currently looks back at the previous merge attempt 
only; it could be interesting to see whether making it more aware of the trend 
would lead to better decisions on whether to merge
* the current commit rate parameters are fixed and should instead adapt
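
To illustrate the first point, here is a minimal sketch of what "being aware of the trend" could look like: averaging the commit rate over the last few merge attempts instead of looking only at the previous one. This is not the code in the branch. The class name, the window size and the rule that merging is allowed only while the averaged rate stays at or below a threshold are all assumptions made for illustration.

```python
from collections import deque


class CommitRateTrend:
    """Hypothetical helper: tracks commit rates seen at recent merge attempts."""

    def __init__(self, window=5):
        # Keep only the last `window` samples; older ones are dropped automatically.
        self.samples = deque(maxlen=window)

    def record(self, commits_per_sec):
        """Store the commit rate observed at the current merge attempt."""
        self.samples.append(commits_per_sec)

    def average(self):
        """Mean commit rate over the window (0.0 if nothing was recorded yet)."""
        if not self.samples:
            return 0.0
        return sum(self.samples) / len(self.samples)

    def should_merge(self, max_commits_per_sec):
        # Assumed rule: merge only while the averaged commit rate is low enough.
        return self.average() <= max_commits_per_sec


trend = CommitRateTrend(window=3)
for rate in (10, 20, 30):
    trend.record(rate)
print(trend.average(), trend.should_merge(25))
```

A single burst of commits shifts a windowed average less than it shifts a one-sample look-back, at the cost of reacting more slowly to a real change in load.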

If there's no objection I'll commit from my git branch to the Oak SVN tomorrow.

||codec||min record length||merge policy||segment size||FDS size||
|oakCodec|4000|default|167.4 MB|1364 MB|
|oakCodec|4000|mitigated|167.4 MB|1255 MB|
|oakCodec|4000|no|167.3 MB|1223 MB|
|oakCodec|4000|logbyte|167.4 MB|1503 MB|
|oakCodec|4000|logdoc|167.4 MB|1495 MB|
|Lucene46|4000|default|167.3 MB|1253 MB|
|Lucene46|4000|mitigated|167.2 MB|1151 MB|
|Lucene46|4000|no|167.2 MB|1123 MB|
|Lucene46|4000|logbyte|167.3 MB|1379 MB|
|Lucene46|4000|logdoc|167.3 MB|1372 MB|
|customCodec|4000|default|167.3 MB|1189 MB|
|customCodec|4000|mitigated|167.3 MB|1112 MB|
|customCodec|4000|no|167.2 MB|1066 MB|
|customCodec|4000|logbyte|167.3 MB|1309 MB|
|customCodec|4000|logdoc|167.3 MB|1303 MB|
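
As a cross-check, the relative FDS size reduction of the mitigated policy against the default one can be recomputed from the rows above. The sketch below only redoes the arithmetic on figures copied from the table; the helper name `reduction_pct` is made up for illustration. With these numbers the reduction comes out at about 8% for oakCodec and Lucene46 and about 6.5% for customCodec.

```python
# FDS sizes in MB, copied from the table above (min record length 4000).
fds_mb = {
    "oakCodec": {"default": 1364, "mitigated": 1255, "no": 1223},
    "Lucene46": {"default": 1253, "mitigated": 1151, "no": 1123},
    "customCodec": {"default": 1189, "mitigated": 1112, "no": 1066},
}


def reduction_pct(baseline_mb, candidate_mb):
    """Relative size reduction of candidate versus baseline, in percent."""
    return 100.0 * (baseline_mb - candidate_mb) / baseline_mb


# Savings of the mitigated policy relative to the default policy, per codec.
savings = {
    codec: round(reduction_pct(sizes["default"], sizes["mitigated"]), 1)
    for codec, sizes in fds_mb.items()
}

for codec, pct in savings.items():
    print(f"{codec}: mitigated saves {pct}% of FDS size vs default")
```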


> Reduce Lucene related growth of repository size
> -----------------------------------------------
>
>                 Key: OAK-5192
>                 URL: https://issues.apache.org/jira/browse/OAK-5192
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: lucene, segment-tar
>            Reporter: Michael Dürig
>            Assignee: Tommaso Teofili
>              Labels: perfomance, scalability
>             Fix For: 1.8, 1.7.8
>
>         Attachments: added-bytes-zoom.png, binSize100.txt, binSize16384.txt, 
> binSizeTotal.txt, diff.txt.zip, nonBinSizeTotal.txt, OAK-5192.0.patch, Screen 
> Shot 2017-07-03 at 16.50.00.png
>
>
> I observed Lucene indexing contributing to up to 99% of repository growth. 
> While the size of the index itself is well inside reasonable bounds, the 
> overall turnover of data being written and removed again can be as much as 
> 99%. 
> In the case of the TarMK this negatively impacts overall system performance 
> due to fast growing number of tar files / segments, bad locality of 
> reference, cache misses/thrashing when looking up segments and vastly 
> prolonged garbage collection cycles.



