[ https://issues.apache.org/jira/browse/OAK-6915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Francesco Mari updated OAK-6915:
--------------------------------
    Attachment: OAK-6915-diagnostics-02.patch

I'm not convinced about the third version of the patch and the usefulness of {{SegmentId#unloaded}}. In order to gather some data, I improved the diagnostics patch. The patch now counts how many segments are written, to give a feeling for the number of unique segments the {{FileStore}} has to work with. Additionally, every statistic is now split between data and bulk segments. Since {{SegmentCache}} behaves differently for data and bulk segments, it makes sense to gather separate numbers.

I ran {{StandbyTestIT#testSyncLoop}} with the diagnostics patch in place against a few variations.

|trunk|{noformat}TarMK data segment ID allocations: 54
TarMK bulk segment ID allocations: 4
TarMK uncached data segment reads: 99854
TarMK uncached bulk segment reads: 2
TarMK written data segments......: 30
TarMK written bulk segments......: 2{noformat}|
|OAK-6915.patch|{noformat}TarMK data segment ID allocations: 56
TarMK bulk segment ID allocations: 4
TarMK uncached data segment reads: 101649
TarMK uncached bulk segment reads: 2
TarMK written data segments......: 32
TarMK written bulk segments......: 2{noformat}|
|OAK-6915-02.patch|{noformat}TarMK data segment ID allocations: 38
TarMK bulk segment ID allocations: 2
TarMK uncached data segment reads: 33
TarMK uncached bulk segment reads: 2
TarMK written data segments......: 29
TarMK written bulk segments......: 2{noformat}|

Even if we ignore the number of uncached data segment reads, the ratio between data segment ID allocations and written data segments (54/30 on trunk, 56/32 with OAK-6915.patch, 38/29 with OAK-6915-02.patch) shows something important: segment IDs are not as perfectly internalized as we thought. This has two important consequences.

First, no matter whether we call {{SegmentId#unloaded}}, there will probably be another {{SegmentId}} out there with the same MSB/LSB that still holds a reference to the segment.
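
The first consequence can be illustrated with a small, self-contained sketch. It does not use the Oak classes: {{Id}}, {{CountingLoader}} and {{ValueKeyedCache}} are hypothetical stand-ins, and the "segment" is just a string. Under those assumptions, it shows that a segment memoized on one ID instance is invisible to a second instance with the same MSB/LSB, whereas a cache keyed by the MSB/LSB value serves both.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;
import java.util.function.Supplier;

// Hypothetical stand-ins for illustration only; these are NOT the Oak classes.
final class SegmentIdSketch {

    /** An ID that memoizes its segment on the instance itself. */
    static final class Id {
        final long msb;
        final long lsb;
        String memoized; // plays the role of the loaded segment

        Id(long msb, long lsb) {
            this.msb = msb;
            this.lsb = lsb;
        }
    }

    /** Counts how often the expensive (uncached) read is performed. */
    static final class CountingLoader implements Supplier<String> {
        int reads;

        @Override
        public String get() {
            reads++;
            return "segment-data";
        }
    }

    /** Variant A: only works if every MSB/LSB pair maps to a single Id instance. */
    static String readViaInstanceMemo(Id id, Supplier<String> loader) {
        if (id.memoized == null) {
            id.memoized = loader.get();
        }
        return id.memoized;
    }

    /** Variant B: keyed by the MSB/LSB value, so duplicate Id instances share an entry. */
    static final class ValueKeyedCache {
        private final Map<UUID, String> segments = new HashMap<>();

        String read(Id id, Supplier<String> loader) {
            // UUID has value-based equals/hashCode over its two longs.
            return segments.computeIfAbsent(new UUID(id.msb, id.lsb), k -> loader.get());
        }
    }

    public static void main(String[] args) {
        Id first = new Id(1L, 2L);
        Id duplicate = new Id(1L, 2L); // same MSB/LSB, different instance

        CountingLoader a = new CountingLoader();
        readViaInstanceMemo(first, a);
        readViaInstanceMemo(duplicate, a);
        System.out.println("instance memo, uncached reads: " + a.reads);

        CountingLoader b = new CountingLoader();
        ValueKeyedCache cache = new ValueKeyedCache();
        cache.read(first, b);
        cache.read(duplicate, b);
        System.out.println("value-keyed cache, uncached reads: " + b.reads);
    }
}
```

In this sketch the instance-memoizing variant performs one uncached read per ID instance, while the value-keyed variant performs one per distinct MSB/LSB pair.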
Actually, as speculated in OAK-6919, {{SegmentCache}} itself might be the culprit of this behaviour.

Second, {{SegmentCache}} should not depend on {{SegmentId}} being perfectly internalized. It is very easy to break this design assumption, and the consequences are usually disastrous. It is, in my opinion, better to implement a cache that does not rely on this assumption.

> Minimize the amount of uncached segment reads
> ---------------------------------------------
>
>                 Key: OAK-6915
>                 URL: https://issues.apache.org/jira/browse/OAK-6915
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: segment-tar
>            Reporter: Francesco Mari
>            Assignee: Francesco Mari
>             Fix For: 1.8, 1.7.12
>
>         Attachments: OAK-6915-01.patch, OAK-6915-02.patch, OAK-6915-diagnostics-02.patch, OAK-6915-diagnostics.patch, OAK-6915.patch
>
>
> The current implementation of {{SegmentCache}} should make better use of the underlying Guava cache by relying on the cached segments instead of unconditionally performing an uncached segment read via the {{Callable<Segment>}} passed to {{SegmentCache#getSegment}}.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)