[ 
https://issues.apache.org/jira/browse/OAK-6915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Francesco Mari updated OAK-6915:
--------------------------------
    Attachment: OAK-6915-diagnostics-02.patch

I'm not convinced about the third version of the patch and the usefulness of 
{{SegmentId#unloaded}}.

To gather some data, I improved the diagnostics patch. The patch now also 
counts how many segments are written, to give a feel for the number of unique 
segments the {{FileStore}} has to work with. Additionally, every statistic is 
now split between data and bulk segments. Since {{SegmentCache}} behaves 
differently for data and bulk segments, it makes sense to gather separate 
numbers.

I ran {{StandbyTestIT#testSyncLoop}} with the diagnostics patch in place, on 
trunk and on top of two versions of the patch.

|trunk|{noformat}TarMK data segment ID allocations: 54
TarMK bulk segment ID allocations: 4
TarMK uncached data segment reads: 99854
TarMK uncached bulk segment reads: 2
TarMK written data segments......: 30
TarMK written bulk segments......: 2{noformat}|
|OAK-6915.patch|{noformat}TarMK data segment ID allocations: 56
TarMK bulk segment ID allocations: 4
TarMK uncached data segment reads: 101649
TarMK uncached bulk segment reads: 2
TarMK written data segments......: 32
TarMK written bulk segments......: 2{noformat}|
|OAK-6915-02.patch|{noformat}TarMK data segment ID allocations: 38
TarMK bulk segment ID allocations: 2
TarMK uncached data segment reads: 33
TarMK uncached bulk segment reads: 2
TarMK written data segments......: 29
TarMK written bulk segments......: 2{noformat}|

Even if we ignore the number of uncached data segment reads, the ratio between 
data segment ID allocations and written data segments shows something 
important: segment IDs are not as perfectly internalized as we thought. On 
trunk, 54 data segment IDs are allocated for 30 written data segments, and 
with OAK-6915-02.patch the figures are still 38 and 29. This has two important 
consequences.

First, regardless of whether we call {{SegmentId#unloaded}}, there will 
probably be another {{SegmentId}} out there with the same MSB/LSB that still 
holds a reference to the segment. Actually, as speculated in OAK-6919, 
{{SegmentCache}} itself might be the culprit of this behaviour.
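
To make the first point concrete, here is a minimal sketch of the situation. {{SketchSegmentId}} and {{DuplicateIdDemo}} are hypothetical stand-ins written for this comment, not Oak classes; the only assumption is that an ID instance memoizes a reference to its segment and that {{unloaded}} clears the reference on that instance alone.

```java
// Hypothetical stand-in for a segment ID that memoizes its segment.
// This is NOT Oak's SegmentId, only the minimum needed for the argument.
final class SketchSegmentId {
    final long msb;
    final long lsb;
    private volatile Object segment; // memoized segment reference

    SketchSegmentId(long msb, long lsb) {
        this.msb = msb;
        this.lsb = lsb;
    }

    void loaded(Object segment) { this.segment = segment; }

    // Clears the reference held by THIS instance only.
    void unloaded() { this.segment = null; }

    boolean holdsSegment() { return segment != null; }
}

final class DuplicateIdDemo {
    // Returns { first.holdsSegment(), second.holdsSegment() } after
    // unloading only the first of two IDs with identical MSB/LSB.
    static boolean[] run() {
        Object segment = new Object();
        SketchSegmentId first = new SketchSegmentId(1L, 2L);
        SketchSegmentId second = new SketchSegmentId(1L, 2L); // same MSB/LSB, different instance
        first.loaded(segment);
        second.loaded(segment);
        first.unloaded();
        return new boolean[] { first.holdsSegment(), second.holdsSegment() };
    }
}
```

In this sketch the second instance keeps the segment reachable after the first one has been unloaded, which is the scenario described above.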

Second, {{SegmentCache}} should not depend on {{SegmentId}} being perfectly 
internalized. It is very easy to break this design assumption, and the 
consequences are usually disastrous. It is, in my opinion, better to 
implement a cache that doesn't rely on this assumption.
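
As a rough illustration of what such a cache could look like, the sketch below keys the cached segments by the MSB/LSB value instead of by a particular ID instance. {{ValueKeyedSegmentCache}} and its methods are hypothetical names, not the actual {{SegmentCache}}, and the sketch deliberately leaves out eviction and weighing. It uses a plain {{ConcurrentHashMap}} rather than Guava to stay self-contained, and it assumes the loader never returns null.

```java
import java.util.UUID;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch, not Oak code: segments are looked up by the value of
// the MSB/LSB pair, so duplicate ID instances still hit the same entry.
final class ValueKeyedSegmentCache<S> {

    private final ConcurrentHashMap<UUID, S> segments = new ConcurrentHashMap<>();
    private final AtomicInteger uncachedReads = new AtomicInteger();

    // The loader is only invoked when no segment is cached for this MSB/LSB.
    S getSegment(long msb, long lsb, Callable<S> loader) {
        UUID key = new UUID(msb, lsb); // UUID compares by value, not identity
        S cached = segments.get(key);
        if (cached != null) {
            return cached;
        }
        S loaded;
        try {
            loaded = loader.call(); // uncached read; assumed to be non-null
        } catch (Exception e) {
            // Wrapped only to keep the sketch short.
            throw new IllegalStateException("Unable to load segment " + key, e);
        }
        uncachedReads.incrementAndGet();
        // If another thread loaded the same segment first, keep its copy.
        S raced = segments.putIfAbsent(key, loaded);
        return raced != null ? raced : loaded;
    }

    int uncachedReads() {
        return uncachedReads.get();
    }
}
```

With a cache shaped like this, two lookups for the same MSB/LSB should trigger a single uncached read, no matter how many ID instances exist for that pair.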

> Minimize the amount of uncached segment reads
> ---------------------------------------------
>
>                 Key: OAK-6915
>                 URL: https://issues.apache.org/jira/browse/OAK-6915
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: segment-tar
>            Reporter: Francesco Mari
>            Assignee: Francesco Mari
>             Fix For: 1.8, 1.7.12
>
>         Attachments: OAK-6915-01.patch, OAK-6915-02.patch, 
> OAK-6915-diagnostics-02.patch, OAK-6915-diagnostics.patch, OAK-6915.patch
>
>
> The current implementation of {{SegmentCache}} should make better use of the 
> underlying Guava cache by relying on the cached segments instead of 
> unconditionally performing an uncached segment read via the 
> {{Callable<Segment>}} passed to {{SegmentCache#getSegment}}.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
