[jira] [Commented] (OAK-6405) Cleanup the o.a.j.o.segment.file.tar package
[ https://issues.apache.org/jira/browse/OAK-6405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16068536#comment-16068536 ]

Francesco Mari commented on OAK-6405:
-------------------------------------

In r1800300 I removed some unused methods, parameters and instance variables from the code in {{TarReader}}.

> Cleanup the o.a.j.o.segment.file.tar package
> --------------------------------------------
>
>            Key: OAK-6405
>            URL: https://issues.apache.org/jira/browse/OAK-6405
>        Project: Jackrabbit Oak
>     Issue Type: Improvement
>       Reporter: Francesco Mari
>       Assignee: Francesco Mari
>        Fix For: 1.8
>
> This issue tracks the cleanup and rearrangement of the internals of the
> {{o.a.j.o.segment.file.tar}} package.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
[jira] [Created] (OAK-6410) NPE when removing inexistent property from checked in node
Ioan-Cristian Linte created OAK-6410:
------------------------------------

             Summary: NPE when removing inexistent property from checked in node
                 Key: OAK-6410
                 URL: https://issues.apache.org/jira/browse/OAK-6410
             Project: Jackrabbit Oak
          Issue Type: Bug
          Components: jcr
    Affects Versions: 1.6.1
            Reporter: Ioan-Cristian Linte
            Priority: Minor

While running AEM 6.3, which uses Oak 1.6.1, the following exception was seen in the logs.

Stacktrace:
{noformat}
java.lang.NullPointerException: null
    at org.apache.jackrabbit.oak.jcr.session.NodeImpl$37.checkPreconditions(NodeImpl.java:1449)
    at org.apache.jackrabbit.oak.jcr.delegate.SessionDelegate.prePerform(SessionDelegate.java:615)
    at org.apache.jackrabbit.oak.jcr.delegate.SessionDelegate.perform(SessionDelegate.java:205)
    at org.apache.jackrabbit.oak.jcr.session.ItemImpl.perform(ItemImpl.java:112)
    at org.apache.jackrabbit.oak.jcr.session.NodeImpl.internalRemoveProperty(NodeImpl.java:1444)
    at org.apache.jackrabbit.oak.jcr.session.NodeImpl.setProperty(NodeImpl.java:354)
    ... (AEM code)
{noformat}

I have reproduced the NPE with the following code:

{code:java}
Node parent = session.getRootNode().addNode("parent", "nt:unstructured");
Node child = parent.addNode("child", "nt:unstructured");
child.addMixin("mix:versionable");
session.save();
session.getWorkspace().getVersionManager().checkin(child.getPath());
Node node = (Node) session.getItem("/parent/child");
node.setProperty("inexistent", (Value) null);
{code}
[jira] [Commented] (OAK-6180) Tune cursor batch/limit size
[ https://issues.apache.org/jira/browse/OAK-6180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16068516#comment-16068516 ]

Chetan Mehrotra commented on OAK-6180:
--------------------------------------

+1. I used the same approach to test document traversal for indexing, though I did not find much difference for that use case. But I have seen cases where fetching the first few nodes of a parent with many child nodes leads to reading quite a lot more rows from MongoDB. So this would be useful.

> Tune cursor batch/limit size
> ----------------------------
>
>            Key: OAK-6180
>            URL: https://issues.apache.org/jira/browse/OAK-6180
>        Project: Jackrabbit Oak
>     Issue Type: Improvement
>     Components: mongomk
>       Reporter: Marcel Reutegger
>       Assignee: Marcel Reutegger
>        Fix For: 1.8
>    Attachments: OAK-6180.patch
>
> MongoDocumentStore uses the default batch size, which means MongoDB will
> initially get 100 documents and then as many documents as fit into 4MB.
> Depending on the document size, the number of documents may be quite high and
> the risk of running into the 60 seconds query timeout defined by Oak
> increases.
> Tuning the batch size (or using a limit) may also be helpful in optimizing
> the amount of data transferred from MongoDB to Oak. The DocumentNodeStore
> fetches child nodes in batches as well. The logic there is slightly
> different. The initial batch size is 100 and every subsequent batch doubles
> in size until it reaches 1600. Bandwidth is wasted if the MongoDB Java driver
> fetches way more than requested by Oak.
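The child-node batch sizing described in the quoted issue (an initial batch of 100 that doubles on every subsequent fetch until it reaches 1600) can be sketched as follows. This is an illustrative sketch of the described scheme, not Oak's actual implementation; the class and method names are hypothetical.

```java
// Illustrative sketch of the batch-size doubling described above:
// start at 100 and double on each fetch, capped at 1600.
// Class and method names are hypothetical, not Oak's API.
public class BatchSizer {

    static final int INITIAL_BATCH = 100;
    static final int MAX_BATCH = 1600;

    /** Returns the batch size to use for the n-th fetch (0-based). */
    static int batchSize(int fetchIndex) {
        long size = (long) INITIAL_BATCH << fetchIndex; // 100, 200, 400, ...
        return (int) Math.min(size, MAX_BATCH);
    }

    public static void main(String[] args) {
        for (int i = 0; i < 6; i++) {
            System.out.println("fetch " + i + ": batch size " + batchSize(i));
        }
    }
}
```

Under this scheme the fifth fetch (index 4) already reaches the 1600 cap, and all later fetches stay there.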
[jira] [Resolved] (OAK-6021) Remove segment graph functionality from oak-run
[ https://issues.apache.org/jira/browse/OAK-6021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Francesco Mari resolved OAK-6021.
---------------------------------
       Resolution: Fixed
    Fix Version/s: 1.7.3

Fixed at r1800297.

> Remove segment graph functionality from oak-run
> -----------------------------------------------
>
>            Key: OAK-6021
>            URL: https://issues.apache.org/jira/browse/OAK-6021
>        Project: Jackrabbit Oak
>     Issue Type: Improvement
>     Components: run, segment-tar
>       Reporter: Michael Dürig
>       Assignee: Francesco Mari
>         Labels: technical_debt, tooling
>        Fix For: 1.8, 1.7.3
>
> We could probably remove the segment graph functionality from oak-run. This
> has been implemented mainly (and solely?) for the purpose of analysing the
> problems around OAK-3348 and I assume it would quickly start falling behind
> as we move forward. Also for this kind of analysis I have switched to
> [oak-script|https://github.com/mduerig/script-oak], which is far more
> flexible.
> Let's decide closer to cutting 1.8 how to go forward here.
[jira] [Commented] (OAK-6081) Indexing tooling via oak-run
[ https://issues.apache.org/jira/browse/OAK-6081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16068499#comment-16068499 ]

Chetan Mehrotra commented on OAK-6081:
--------------------------------------

Currently it uses the normal IndexEditors. For Lucene the editor uses a FSDirectoryFactory, so index files get stored on the file system. To support other indexes we need to think of a way to store the index data outside of the NodeStore. We can use a SegmentStore or a flat file to store the index data, and then need to provide a corresponding importer to read that index data and apply it to the actual NodeStore.

> Indexing tooling via oak-run
> ----------------------------
>
>            Key: OAK-6081
>            URL: https://issues.apache.org/jira/browse/OAK-6081
>        Project: Jackrabbit Oak
>     Issue Type: New Feature
>     Components: indexing, run
>       Reporter: Chetan Mehrotra
>       Assignee: Chetan Mehrotra
>        Fix For: 1.8
>
> To enable better management of indexing related operations, especially
> reindexing indexes on a large repository setup, we should implement some
> tooling as part of oak-run.
> The tool would support:
> # *Resumable traversal* - It should be able to reindex a large repo with
> resumable traversal, such that even if indexing breaks due to some issue it
> can resume from the last state (OAK-5833)
> # *Multithreaded traversal* - Current indexing is single threaded and hence
> for a large repo it can take a long time. The plan here is to support multi
> threaded indexing where each thread can be assigned a part of the repository
> tree to index, and in the end the indexes are merged
> # For a DocumentNodeStore setup it would be possible to connect oak-run to a
> live cluster and it would take care of indexing -> storing index on disk ->
> merging index -> importing it back at the end. This would ensure that the
> live setup faces minimum disruption and is not loaded much
> # For a SegmentNodeStore setup it would be possible to index on a cloned
> setup and then provide a way to copy the index back
[jira] [Comment Edited] (OAK-6081) Indexing tooling via oak-run
[ https://issues.apache.org/jira/browse/OAK-6081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16068499#comment-16068499 ]

Chetan Mehrotra edited comment on OAK-6081 at 6/29/17 3:46 PM:
---------------------------------------------------------------

[~tmueller] Currently it uses the normal IndexEditors. For Lucene the editor uses a FSDirectoryFactory, so index files get stored on the file system. To support other indexes we need to think of a way to store the index data outside of the NodeStore. We can use a SegmentStore or a flat file to store the index data, and then need to provide a corresponding importer to read that index data and apply it to the actual NodeStore.

was (Author: chetanm):
Currently it uses the normal IndexEditors. For Lucene the editor uses a FSDirectoryFactory, so index files get stored on the file system. To support other indexes we need to think of a way to store the index data outside of the NodeStore. We can use a SegmentStore or a flat file to store the index data, and then need to provide a corresponding importer to read that index data and apply it to the actual NodeStore.
[jira] [Commented] (OAK-6405) Cleanup the o.a.j.o.segment.file.tar package
[ https://issues.apache.org/jira/browse/OAK-6405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16068473#comment-16068473 ]

Francesco Mari commented on OAK-6405:
-------------------------------------

In r1800291 I replaced {{ReferenceCollector}} with {{Consumer}} in {{TarReader}}. This avoids a dependency from {{org.apache.jackrabbit.oak.segment.file.tar}} to {{org.apache.jackrabbit.oak.plugins.blob}}.
[jira] [Commented] (OAK-3349) Partial compaction
[ https://issues.apache.org/jira/browse/OAK-3349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16068445#comment-16068445 ]

Michael Dürig commented on OAK-3349:
------------------------------------

bq. Would it make sense to have two different {{r}} for the two different generation numbers?

So far I don't think so, as the intended semantics is that revisions no older than {{r}} generations are retained at any point in time. This is currently the case no matter how we interleave tail and full compaction runs.

> Partial compaction
> ------------------
>
>            Key: OAK-3349
>            URL: https://issues.apache.org/jira/browse/OAK-3349
>        Project: Jackrabbit Oak
>     Issue Type: New Feature
>     Components: segment-tar
>       Reporter: Michael Dürig
>       Assignee: Michael Dürig
>         Labels: compaction, gc, scalability
>        Fix For: 1.8, 1.7.4
>    Attachments: compaction-time.png, cycle-count.png, post-gc-size.png
>
> On big repositories compaction can take quite a while to run as it needs to
> create a full deep copy of the current root node state. For such cases it
> could be beneficial if we could partially compact the repository, thus
> splitting full compaction over multiple cycles.
> Partial compaction would run compaction on a sub-tree just like we now run it
> on the full tree. Afterwards it would create a new root node state by
> referencing the previous root node state, replacing said sub-tree with the
> compacted one.
> Todo: Assess feasibility and impact, implement prototype.
[jira] [Comment Edited] (OAK-6405) Cleanup the o.a.j.o.segment.file.tar package
[ https://issues.apache.org/jira/browse/OAK-6405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16068442#comment-16068442 ]

Francesco Mari edited comment on OAK-6405 at 6/29/17 3:01 PM:
--------------------------------------------------------------

In r1800290 I improved the documentation of {{TarReader}}, either adding new JavaDoc comments or expanding the existing ones.

was (Author: frm):
In r1800290 I improved the documentation of {{TarReader}}, either adding new JavaDoc comments or or expanding the existing ones.
[jira] [Commented] (OAK-6405) Cleanup the o.a.j.o.segment.file.tar package
[ https://issues.apache.org/jira/browse/OAK-6405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16068442#comment-16068442 ]

Francesco Mari commented on OAK-6405:
-------------------------------------

In r1800290 I improved the documentation of {{TarReader}}, either adding new JavaDoc comments or expanding the existing ones.
[jira] [Commented] (OAK-3349) Partial compaction
[ https://issues.apache.org/jira/browse/OAK-3349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16068438#comment-16068438 ]

Francesco Mari commented on OAK-3349:
-------------------------------------

Both solutions sound reasonable to me. Another question: the algorithm uses the same {{r}} to compare both full and young generation numbers. Would it make sense to have two different {{r}} for the two different generation numbers?
[jira] [Commented] (OAK-3349) Partial compaction
[ https://issues.apache.org/jira/browse/OAK-3349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16068434#comment-16068434 ]

Michael Dürig commented on OAK-3349:
------------------------------------

bq. how are we going to encode T.c in the segments?

I would not use an extra slot in the segment header just for this flag, but rather encode it within the generation number. The [POC|https://github.com/mduerig/jackrabbit-oak/commits/OAK-3349-POC] uses an even/odd scheme to encode this within the young generation number (i.e. an odd number indicates a segment written by tail compaction). We could either stick with this or, preferably, use a positive/negative scheme where a negative sign indicates segments written by tail compaction.
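The even/odd scheme described in the comment above can be sketched as follows. This is an illustrative sketch of the idea, not the POC's actual code; the class and method names are hypothetical.

```java
// Sketch of the even/odd encoding discussed above: the tail-compaction
// flag is folded into the young generation number instead of occupying a
// separate slot in the segment header. An odd encoded value marks a
// segment written by tail compaction. Names are hypothetical.
public class GenerationEncoding {

    /** Encode a young generation number and the tail-compaction flag in one int. */
    static int encode(int youngGeneration, boolean tailCompacted) {
        return youngGeneration * 2 + (tailCompacted ? 1 : 0);
    }

    /** Recover the young generation number from the encoded value. */
    static int youngGeneration(int encoded) {
        return encoded / 2;
    }

    /** Odd encoded values mark segments written by tail compaction. */
    static boolean isTailCompacted(int encoded) {
        return encoded % 2 == 1;
    }

    public static void main(String[] args) {
        int e = encode(3, true);
        System.out.println("encoded=" + e
                + " young=" + youngGeneration(e)
                + " tail=" + isTailCompacted(e));
    }
}
```

The positive/negative variant mentioned in the comment would instead use the sign bit as the flag and the absolute value as the generation number.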
[jira] [Updated] (OAK-6180) Tune cursor batch/limit size
[ https://issues.apache.org/jira/browse/OAK-6180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcel Reutegger updated OAK-6180:
----------------------------------
    Attachment: OAK-6180.patch

Attached proposed changes [^OAK-6180.patch].
[jira] [Commented] (OAK-3349) Partial compaction
[ https://issues.apache.org/jira/browse/OAK-3349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16068424#comment-16068424 ]

Francesco Mari commented on OAK-3349:
-------------------------------------

[~mduerig], thanks for the example, it helped me a lot in clarifying how cleanup interacts with partial compaction. While we discussed the young and full generations offline, how are we going to encode {{T.c}} in the segments? Are we going to use some of the unused space in the header?
[jira] [Commented] (OAK-6081) Indexing tooling via oak-run
[ https://issues.apache.org/jira/browse/OAK-6081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16068421#comment-16068421 ]

Thomas Mueller commented on OAK-6081:
-------------------------------------

[~chetanm] can we use this to index non-Lucene indexes (property indexes, the counter index, the nodetype index, ...) as well?
[jira] [Updated] (OAK-6409) Oak-run indexing: improved (user friendly) output
[ https://issues.apache.org/jira/browse/OAK-6409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Mueller updated OAK-6409:
--------------------------------
    Component/s: indexing
[jira] [Created] (OAK-6409) Oak-run indexing: improved (user friendly) output
Thomas Mueller created OAK-6409:
--------------------------------

     Summary: Oak-run indexing: improved (user friendly) output
         Key: OAK-6409
         URL: https://issues.apache.org/jira/browse/OAK-6409
     Project: Jackrabbit Oak
  Issue Type: Improvement
    Reporter: Thomas Mueller

The oak-run indexing (OAK-6081) output should be human readable and, if possible, minimal. Detailed output should be written to a log file, but stdout should be easy for a user to understand. For example: some header info when starting, where to find the detailed output, then one line every 3 seconds about the progress (in %, number of nodes read, ETA), and when done some info on what to do next.
[jira] [Updated] (OAK-6409) Oak-run indexing: improved (user friendly) output
[ https://issues.apache.org/jira/browse/OAK-6409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Mueller updated OAK-6409:
--------------------------------
    Fix Version/s: 1.8
[jira] [Commented] (OAK-3349) Partial compaction
[ https://issues.apache.org/jira/browse/OAK-3349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16068414#comment-16068414 ]

Michael Dürig commented on OAK-3349:
------------------------------------

Following is a short description of tail / full compaction and the proposed data structure, in pseudo code.

For each segment {{S}} we encode in its header: a young generation number {{y}}, a full generation number {{f}}, and a flag {{c}} indicating segments written by tail compaction.

Full compaction:
* determine the segment {{S}} of the current head state {{h}}
* clone {{h}}, setting the full generation number of all written segments {{T}} to {{T.f = S.f + 1}}

Tail compaction:
* determine the segment {{S}} of the current head state {{h}}
* determine the head state {{h'}} created by the last compaction (full or tail)
* rebase {{h}} on top of {{h'}}, setting the young generation number of all written segments {{T}} to {{T.y = S.y + 1}}

Cleanup:
* let {{r}} be the number of retained generations
* determine the segment {{S}} of the current head state
* for each segment {{T}}:
** if {{T.f <= S.f - r}}, reclaim {{T}}
** if {{T.y <= S.y - r && T.c == false}}, reclaim {{T}}

Below I tried to visualise the evolution of a repository through a couple of tail / full compaction cycles. Segment retention is 2 generations. A sequence of segments is symbolised by triples in square brackets. The first component indicates the operation that created the segments (w: a regular write operation, tc: a tail compaction, fc: a full compaction). The second component is the young generation pertaining to tail compaction, and the third component is the full generation pertaining to full compaction.

{noformat}
Write content:
  [w,0,0]
Tail compact:
  [w,0,0] [tc,1,0]
Clean up (no effect since segment retention is 2 young generations):
  [w,0,0] [tc,1,0]
Write content:
  [w,0,0] [tc,1,0] [w,1,0]
Tail compact:
  [w,0,0] [tc,1,0] [w,1,0] [tc,2,0]
Clean up (remove w segments that are at least 2 young generations old):
  [tc,1,0] [w,1,0] [tc,2,0]
Write content:
  [tc,1,0] [w,1,0] [tc,2,0] [w,2,0]
Tail compact:
  [tc,1,0] [w,1,0] [tc,2,0] [w,2,0] [tc,3,0]
Clean up (remove w segments that are at least 2 young generations old):
  [tc,1,0] [tc,2,0] [w,2,0] [tc,3,0]
...
After a couple more write / tail compaction cycles:
  [tc,1,0] [tc,2,0] [tc,3,0] [w,3,0] [tc,4,0] [w,4,0]
Full compact:
  [tc,1,0] [tc,2,0] [tc,3,0] [w,3,0] [tc,4,0] [w,4,0] [fc,4,1]
Clean up (no effect since segment retention is 2 full generations):
  [tc,1,0] [tc,2,0] [tc,3,0] [w,3,0] [tc,4,0] [w,4,0] [fc,4,1]
Write content:
  [tc,1,0] [tc,2,0] [tc,3,0] [w,3,0] [tc,4,0] [w,4,0] [fc,4,1] [w,4,1]
Tail compact:
  [tc,1,0] [tc,2,0] [tc,3,0] [w,3,0] [tc,4,0] [w,4,0] [fc,4,1] [w,4,1] [tc,5,1]
Clean up (remove w segments that are at least 2 young generations old):
  [tc,1,0] [tc,2,0] [tc,3,0] [tc,4,0] [w,4,0] [fc,4,1] [w,4,1] [tc,5,1]
...
After a couple more write / tail compaction cycles:
  [tc,1,0] [tc,2,0] [tc,3,0] [tc,4,0] [fc,4,1] [tc,5,1] [tc,6,1] [w,6,1] [tc,7,1] [w,7,1]
Full compact:
  [tc,1,0] [tc,2,0] [tc,3,0] [tc,4,0] [fc,4,1] [tc,5,1] [tc,6,1] [w,6,1] [tc,7,1] [w,7,1] [fc,7,2]
Clean up (remove all segments that are at least 2 full generations old):
  [fc,4,1] [tc,5,1] [tc,6,1] [w,6,1] [tc,7,1] [w,7,1] [fc,7,2]
{noformat}
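The cleanup rules from the pseudo code above can be sketched as a single predicate over a segment's header fields. This is an illustrative sketch under the stated semantics; the {{Segment}} class and its field names are hypothetical, not Oak's API.

```java
// Sketch of the cleanup predicate described above: a segment T is
// reclaimed relative to the head segment S and retention count r when
// its full generation is at least r behind the head, or its young
// generation is at least r behind and it was not written by tail
// compaction. Names are hypothetical.
public class Cleanup {

    static class Segment {
        final int y;      // young generation number
        final int f;      // full generation number
        final boolean c;  // written by tail compaction

        Segment(int y, int f, boolean c) {
            this.y = y;
            this.f = f;
            this.c = c;
        }
    }

    /** True if segment t should be reclaimed given head segment s and retention r. */
    static boolean reclaim(Segment t, Segment s, int r) {
        if (t.f <= s.f - r) {
            return true;
        }
        return t.y <= s.y - r && !t.c;
    }

    public static void main(String[] args) {
        // Mirrors the second cleanup step of the trace: head is [tc,2,0], r = 2,
        // so [w,0,0] is reclaimed while [tc,1,0] survives.
        Segment head = new Segment(2, 0, true);
        System.out.println("[w,0,0] reclaimed: " + reclaim(new Segment(0, 0, false), head, 2));
        System.out.println("[tc,1,0] reclaimed: " + reclaim(new Segment(1, 0, true), head, 2));
    }
}
```

Note how the {{c}} flag shields tail-compacted segments from the young-generation rule; only a full compaction (via the first check) eventually reclaims them, matching the final cleanup step in the trace.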
[jira] [Commented] (OAK-6381) Improved index analysis tools
[ https://issues.apache.org/jira/browse/OAK-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16068411#comment-16068411 ]

Thomas Mueller commented on OAK-6381:
-------------------------------------

We should check if the Luke tool can be used:
* https://jackrabbit.apache.org/oak/docs/query/lucene.html#Analyzing_created_Lucene_Index
* https://code.google.com/archive/p/luke/

> Improved index analysis tools
> -----------------------------
>
>            Key: OAK-6381
>            URL: https://issues.apache.org/jira/browse/OAK-6381
>        Project: Jackrabbit Oak
>     Issue Type: Improvement
>       Reporter: Thomas Mueller
>       Assignee: Thomas Mueller
>        Fix For: 1.8
>
> It would be good to have more tools to analyze indexes:
> * For Lucene indexes, get a histogram of samples (terms). We have
> "getFieldInfo", which shows which fields are how common, but we don't have
> terms. For example the /oak:index/lucene index contains 1 million fulltext
> fields and node names for 1 million nodes, but I wonder why, what typical
> node names are, and whether fulltext for most nodes is actually empty. Maybe
> a new method "getTermHistogram(int sampleCount)" or similar
> * For property indexes, the number of updated nodes per second or so. Right
> now we can just analyze the counts per key, but some indexes / keys are very
> volatile (see many short lived entries)
> * For Lucene indexes, writes per second or so (in MB).
> * How indexes are used (approximate read nodes / MB per hour)
[jira] [Comment Edited] (OAK-6407) Refactor oak.spi.query into a separate module/bundle
[ https://issues.apache.org/jira/browse/OAK-6407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16068354#comment-16068354 ]

angela edited comment on OAK-6407 at 6/29/17 1:45 PM:
------------------------------------------------------

[~tmueller], [~stillalex], [~mduerig], may i ask you to review the proposed m12n effort for the spi.query package space? imho it will help us avoid reintroducing the cyclic dependency chaos (that we likely ended up with inadvertently) and allow us to address proper package exports as we move on.

was (Author: anchela):
[~tmueller], [~stillalex], [~mduerig], may i ask you to review the proposed m12n effort? imho it will help us avoid reintroducing the cyclic dependency chaos (that we likely ended up with inadvertently) and allow us to address proper package exports as we move on.

> Refactor oak.spi.query into a separate module/bundle
> ----------------------------------------------------
>
>            Key: OAK-6407
>            URL: https://issues.apache.org/jira/browse/OAK-6407
>        Project: Jackrabbit Oak
>     Issue Type: Improvement
>     Components: core, query
>       Reporter: angela
>       Assignee: angela
>         Labels: modularization
>    Attachments: OAK-6407.patch
>
> now that OAK-6304 and OAK-6355 have been resolved, i would like to suggest
> that we move the _o.a.j.oak.spi.query_ code base into a separate
> module/bundle in order to prevent the introduction of bogus cycles and odd
> package exports in the future.
> [~tmueller], patch will follow asap.
[jira] [Updated] (OAK-6407) Refactor oak.spi.query into a separate module/bundle
[ https://issues.apache.org/jira/browse/OAK-6407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] angela updated OAK-6407: Attachment: OAK-6407.patch [~tmueller], [~stillalex], [~mduerig], may i ask you to review the proposed m12n effort? imho it will help us avoid reintroducing the cyclic dependency chaos (that we likely ended up with inadvertently) and allow us to address proper package exports as we move on. > Refactor oak.spi.query into a separate module/bundle > -- > > Key: OAK-6407 > URL: https://issues.apache.org/jira/browse/OAK-6407 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: core, query >Reporter: angela >Assignee: angela > Labels: modularization > Attachments: OAK-6407.patch > > > now that OAK-6304 and OAK-6355 have been resolved, i would like to suggest > that we move the _o.a.j.oak.spi.query_ code base into a separate > module/bundle in order to prevent the introduction of bogus cycles and odd > package exports in the future. > [~tmueller], patch will follow asap. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (OAK-6021) Remove segment graph functionality from oak-run
[ https://issues.apache.org/jira/browse/OAK-6021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francesco Mari reassigned OAK-6021: --- Assignee: Francesco Mari > Remove segment graph functionality from oak-run > --- > > Key: OAK-6021 > URL: https://issues.apache.org/jira/browse/OAK-6021 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: run, segment-tar >Reporter: Michael Dürig >Assignee: Francesco Mari > Labels: technical_debt, tooling > Fix For: 1.8 > > > We could probably remove the segment graph functionality from oak-run. This > has been implemented mainly (and solely?) for the purpose of analysing the > problems around OAK-3348 and I assume it would quickly start falling behind > as we move forward. Also for this kind of analysis I have switched to > [oak-script|https://github.com/mduerig/script-oak], which is far more > flexible. > Let's decide closer to cutting 1.8 how to go forward here. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (OAK-6408) Review package exports for o.a.j.oak.plugins.index.*
angela created OAK-6408: --- Summary: Review package exports for o.a.j.oak.plugins.index.* Key: OAK-6408 URL: https://issues.apache.org/jira/browse/OAK-6408 Project: Jackrabbit Oak Issue Type: Improvement Components: core, indexing Reporter: angela while working on OAK-6304 and OAK-6355, i noticed that the _o.a.j.oak.plugins.index.*_ packages contain both internal api/utilities and implementation details, which are equally exported (though without any package export version set). in the light of the modularization effort, i would like to suggest that we try to sort that out and separate the _public_ parts from the implementation details. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (OAK-3349) Partial compaction
[ https://issues.apache.org/jira/browse/OAK-3349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16068298#comment-16068298 ] Michael Dürig commented on OAK-3349: [~dulceanu], [~frm] do the above notes make sense to you? Is there anything fundamental I forgot? Especially re. tooling, upgrade, standby, etc. > Partial compaction > -- > > Key: OAK-3349 > URL: https://issues.apache.org/jira/browse/OAK-3349 > Project: Jackrabbit Oak > Issue Type: New Feature > Components: segment-tar >Reporter: Michael Dürig >Assignee: Michael Dürig > Labels: compaction, gc, scalability > Fix For: 1.8, 1.7.4 > > Attachments: compaction-time.png, cycle-count.png, post-gc-size.png > > > On big repositories compaction can take quite a while to run as it needs to > create a full deep copy of the current root node state. For such cases it > could be beneficial if we could partially compact the repository, thus > splitting full compaction over multiple cycles. > Partial compaction would run compaction on a sub-tree just like we now run it > on the full tree. Afterwards it would create a new root node state by > referencing the previous root node state, replacing said sub-tree with the > compacted one. > Todo: Assess feasibility and impact, implement a prototype. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (OAK-6407) Refactor oak.spi.query into a separate module/bundle
angela created OAK-6407: --- Summary: Refactor oak.spi.query into a separate module/bundle Key: OAK-6407 URL: https://issues.apache.org/jira/browse/OAK-6407 Project: Jackrabbit Oak Issue Type: Improvement Components: core, query Reporter: angela Assignee: angela now that OAK-6304 and OAK-6355 have been resolved, i would like to suggest that we move the _o.a.j.oak.spi.query_ code base into a separate module/bundle in order to prevent the introduction of bogus cycles and odd package exports in the future. [~tmueller], patch will follow asap. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Comment Edited] (OAK-3349) Partial compaction
[ https://issues.apache.org/jira/browse/OAK-3349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16068288#comment-16068288 ] Michael Dürig edited comment on OAK-3349 at 6/29/17 12:50 PM: -- h6. Implementation note on tail compaction In contrast to the existing compaction approach (full compaction), tail compaction rebases all changes since the last compaction on top of the result of that last compaction. Cleanup subsequently cleans up the uncompacted changes. Each tail compaction cycle creates a new generation, incrementing the generation number. Cleanup removes all non-compacted segments whose generation is no bigger than the current generation minus a certain number of retained generations (2 by default). To make this work we need to be able to determine the age of a segment (in number of generations) and whether a segment has been written by the compactor or by a regular writer (and is thus uncompacted). The [POC|https://github.com/mduerig/jackrabbit-oak/commits/OAK-3349-POC] implemented this by assigning even generation numbers to regular segments and odd ones to segments written by tail compaction, while at the same time completely removing support for full compaction. To combine tail compaction with full compaction, I suggest introducing a young generation field in the segment header, which is used by tail compaction as described. The existing generation field will thus keep being used for full compaction without changing its semantics. The proposed approach has the advantage of tail and full compaction being completely orthogonal. You can run either of them or both without one affecting or influencing the other. Both the compaction and cleanup methods rely solely on the information in the segment headers. A predicate for determining which segments to retain can be inferred from the segment containing the head revision. 
There is no need to rely on auxiliary information, with the small exception of tail compaction using the {{gc.log}} file to determine the base revision to compact onto. This is not problematic though wrt. resilience, as we can always fall back to full compaction should the base revision be invalid. (A base revision can be invalid in two ways: either it is not found or it is one not written by the compactor. Both cases can only occur after manual tampering with the {{journal.log}}.) Finally, the approach plays well with upgrading: while the additional young generation field requires us to bump the segment version, we can easily maintain backwards compatibility and do a rolling upgrade segment by segment. Segments of the previous version will just not be eligible for cleanup under tail compaction. was (Author: mduerig): h6. Implementation note on tail compaction In contrast to the existing compaction approach (full compaction), tail compaction rebases all changes since the last compaction on top of the result of that last compaction. Cleanup subsequently cleans up the uncompacted changes. Each tail compaction cycle creates a new generation, incrementing the generation number. Cleanup removes all non-compacted segments whose generation is no bigger than the current generation minus a certain number of retained generations (2 by default). To make this work we need to be able to determine the age of a segment (in number of generations) and whether a segment has been written by the compactor or by a regular writer (and is thus uncompacted). The [POC|https://github.com/mduerig/jackrabbit-oak/commits/OAK-3349-POC] implemented this by assigning even generation numbers to regular segments and odd ones to segments written by tail compaction, while at the same time completely removing support for full compaction. To combine tail compaction with full compaction, I suggest introducing a young generation field in the segment header, which is used by tail compaction as described. 
The existing generation field will thus keep being used for full compaction without changing its semantics. The proposed approach has the advantage of tail and full compaction being completely orthogonal. You can run either of them or both without one affecting or influencing the other. Both the compaction and cleanup methods rely solely on the information in the segment headers. A predicate for determining which segments to retain can be inferred from the segment containing the head revision. There is no need to rely on auxiliary information, with the small exception of tail compaction using the {{gc.log}} file to determine the base revision to compact onto. This is not problematic though wrt. resilience, as we can always fall back to full compaction should the base revision be invalid. (A base revision can be invalid in two ways: either it is not found or it is one not written by the compactor. Both cases can only occur after manual tampering with the {{journal.log}}.)
[jira] [Commented] (OAK-3349) Partial compaction
[ https://issues.apache.org/jira/browse/OAK-3349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16068288#comment-16068288 ] Michael Dürig commented on OAK-3349: h6. Implementation note on tail compaction In contrast to the existing compaction approach (full compaction), tail compaction rebases all changes since the last compaction on top of the result of that last compaction. Cleanup subsequently cleans up the uncompacted changes. Each tail compaction cycle creates a new generation, incrementing the generation number. Cleanup removes all non-compacted segments whose generation is no bigger than the current generation minus a certain number of retained generations (2 by default). To make this work we need to be able to determine the age of a segment (in number of generations) and whether a segment has been written by the compactor or by a regular writer (and is thus uncompacted). The [POC|https://github.com/mduerig/jackrabbit-oak/commits/OAK-3349-POC] implemented this by assigning even generation numbers to regular segments and odd ones to segments written by tail compaction, while at the same time completely removing support for full compaction. To combine tail compaction with full compaction, I suggest introducing a young generation field in the segment header, which is used by tail compaction as described. The existing generation field will thus keep being used for full compaction without changing its semantics. The proposed approach has the advantage of tail and full compaction being completely orthogonal. You can run either of them or both without one affecting or influencing the other. Both the compaction and cleanup methods rely solely on the information in the segment headers. A predicate for determining which segments to retain can be inferred from the segment containing the head revision. 
There is no need to rely on auxiliary information, with the small exception of tail compaction using the {{gc.log}} file to determine the base revision to compact onto. This is not problematic though wrt. resilience, as we can always fall back to full compaction should the base revision be invalid. (A base revision can be invalid in two ways: either it is not found or it is one not written by the compactor. Both cases can only occur after manual tampering with the {{journal.log}}.) Finally, the approach plays well with upgrading: while the additional young generation field requires us to bump the segment version, we can easily maintain backwards compatibility and do a rolling upgrade segment by segment. Segments of the previous version will just not be eligible for cleanup under tail compaction. > Partial compaction > -- > > Key: OAK-3349 > URL: https://issues.apache.org/jira/browse/OAK-3349 > Project: Jackrabbit Oak > Issue Type: New Feature > Components: segment-tar >Reporter: Michael Dürig >Assignee: Michael Dürig > Labels: compaction, gc, scalability > Fix For: 1.8, 1.7.4 > > Attachments: compaction-time.png, cycle-count.png, post-gc-size.png > > > On big repositories compaction can take quite a while to run as it needs to > create a full deep copy of the current root node state. For such cases it > could be beneficial if we could partially compact the repository, thus > splitting full compaction over multiple cycles. > Partial compaction would run compaction on a sub-tree just like we now run it > on the full tree. Afterwards it would create a new root node state by > referencing the previous root node state, replacing said sub-tree with the > compacted one. > Todo: Assess feasibility and impact, implement a prototype. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
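The retention rule described in the comment above (cleanup reclaims non-compacted segments whose generation trails the head by at least the number of retained generations) can be modelled in a few lines. This is an illustrative sketch only; {{SegmentInfo}} and its fields are hypothetical stand-ins, not Oak's actual segment header layout:

```java
// Illustrative model of the tail-compaction cleanup rule; the class and
// field names are hypothetical, not Oak's actual code.
public class TailCleanup {

    static final int RETAINED_GENERATIONS = 2; // default mentioned in the comment

    static final class SegmentInfo {
        final int generation;     // young/tail generation from the segment header
        final boolean compacted;  // written by the compactor vs. a regular writer

        SegmentInfo(int generation, boolean compacted) {
            this.generation = generation;
            this.compacted = compacted;
        }
    }

    // A segment is reclaimed iff it is non-compacted and its generation is
    // no bigger than the head generation minus the retained generations.
    static boolean reclaim(SegmentInfo s, int headGeneration) {
        return !s.compacted
            && s.generation <= headGeneration - RETAINED_GENERATIONS;
    }
}
```

With a head generation of 5 and the default of 2 retained generations, a non-compacted segment of generation 3 is reclaimed, while one of generation 4 survives another cycle.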
[jira] [Commented] (OAK-6388) Enable Azure shared access signature for blob store connector
[ https://issues.apache.org/jira/browse/OAK-6388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16068280#comment-16068280 ] Andrei Kalfas commented on OAK-6388: Looks good; some whitespace issues, but nothing more as far as I can tell. > Enable Azure shared access signature for blob store connector > - > > Key: OAK-6388 > URL: https://issues.apache.org/jira/browse/OAK-6388 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: blob-cloud >Reporter: Andrei Kalfas >Assignee: Thomas Mueller > Fix For: 1.8 > > Attachments: AzureSAS-v4.patch > > > An Azure storage account can be accessed with access keys or with shared access > signatures. Currently the blob connector only allows access keys, limiting > the use cases where the storage account must be regarded as a read-only one. > Access keys enable full access, while shared access signatures can be limited > to certain operations. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
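For illustration, the two access modes discussed in OAK-6388 differ chiefly in how the storage connection string is assembled: an account key grants full access, whereas a SAS token can be scoped to specific operations and a validity window. A hedged sketch in plain Java, based on Azure's documented connection-string syntax; the helper names are hypothetical and unrelated to the attached patch:

```java
// Sketch of the two Azure connection-string shapes discussed above.
// Pure string assembly; the actual connector wiring in the patch may differ.
public class AzureConnection {

    // Full-access form: the account key enables all operations.
    static String withAccountKey(String account, String key) {
        return "DefaultEndpointsProtocol=https"
            + ";AccountName=" + account
            + ";AccountKey=" + key;
    }

    // Restricted form: a shared access signature can be limited to certain
    // operations (e.g. read-only) and a time window.
    static String withSas(String blobEndpoint, String sasToken) {
        return "BlobEndpoint=" + blobEndpoint
            + ";SharedAccessSignature=" + sasToken;
    }
}
```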
[jira] [Resolved] (OAK-5057) BulkCreateOrUpdateClusterTest fails sometimes
[ https://issues.apache.org/jira/browse/OAK-5057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Mueller resolved OAK-5057. - Resolution: Fixed > BulkCreateOrUpdateClusterTest fails sometimes > - > > Key: OAK-5057 > URL: https://issues.apache.org/jira/browse/OAK-5057 > Project: Jackrabbit Oak > Issue Type: Bug > Components: rdbmk >Reporter: Thomas Mueller >Assignee: Thomas Mueller > > The test BulkCreateOrUpdateClusterTest.testConcurrentWithConflict sometimes > fails (on a slow machine). It seems to be caused by a hardcoded limit of 10 > seconds (t.join(1)). It works when using 75000 instead of 1 (the > whole test takes 65 seconds). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
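The flakiness described in OAK-5057 comes from {{Thread.join(timeout)}} returning silently when the timeout elapses while the thread is still running, so a short hardcoded value fails on slow machines. A small sketch of a more robust pattern (illustrative names, not the actual test code):

```java
// join(timeout) does not report whether the thread actually finished;
// checking isAlive() afterwards makes a timeout explicit instead of silent,
// and a generous budget accommodates slow machines.
public class JoinWithTimeout {

    public static boolean finishedWithin(Thread t, long millis) {
        try {
            t.join(millis); // waits at most 'millis' ms
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
        }
        return !t.isAlive(); // false: the thread outlived the timeout
    }
}
```

A test using this helper can then fail with a clear message when the budget is exceeded rather than proceeding with incomplete results.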
[jira] [Updated] (OAK-5949) XPath: string literals parsed as identifiers
[ https://issues.apache.org/jira/browse/OAK-5949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Mueller updated OAK-5949: Fix Version/s: 1.7.1 > XPath: string literals parsed as identifiers > > > Key: OAK-5949 > URL: https://issues.apache.org/jira/browse/OAK-5949 > Project: Jackrabbit Oak > Issue Type: Bug > Components: query >Reporter: Thomas Mueller >Assignee: Thomas Mueller >Priority: Critical > Labels: candidate_oak_1_6 > Fix For: 1.8, 1.7.1 > > > The following query (for example) is not parsed correctly, as {{@}} is parsed > as attribute prefix: > {noformat} > /jcr:root/home//element(*,rep:Authorizable)[jcr:like(@rep:authorizableId,'@')] > {noformat} > Possibly XPathToSQL2Converter should use currentTokenQuoted for this. > Possibly a similar problem can occur in SQL2Parser (needs to be tested). > Right now, it looks like currentTokenQuoted is never set to true; that should > probably happen in read(). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
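The suggested {{currentTokenQuoted}} fix boils down to the tokenizer remembering whether a token came from a string literal, so the literal {{'@'}} is not confused with the {{@}} attribute-prefix operator. A toy illustration, heavily simplified relative to the actual XPathToSQL2Converter:

```java
// Toy tokenizer illustrating the currentTokenQuoted idea: a token read
// from inside quotes is flagged, so later parsing can tell the string
// literal '@' apart from the '@' attribute-prefix operator.
public class QuotedTokens {

    static final class Token {
        final String text;
        final boolean quoted; // true when the token came from a string literal

        Token(String text, boolean quoted) {
            this.text = text;
            this.quoted = quoted;
        }
    }

    static Token read(String input) {
        if (input.length() >= 2 && input.startsWith("'") && input.endsWith("'")) {
            return new Token(input.substring(1, input.length() - 1), true);
        }
        return new Token(input, false);
    }

    // '@' acts as an attribute prefix only when it was NOT quoted.
    static boolean isAttributePrefix(Token t) {
        return "@".equals(t.text) && !t.quoted;
    }
}
```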
[jira] [Commented] (OAK-6388) Enable Azure shared access signature for blob store connector
[ https://issues.apache.org/jira/browse/OAK-6388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16068226#comment-16068226 ] Thomas Mueller commented on OAK-6388: - Committed in http://svn.apache.org/r1800269 (trunk). Could you check if the result matches your expectations? The patch didn't apply cleanly. > Enable Azure shared access signature for blob store connector > - > > Key: OAK-6388 > URL: https://issues.apache.org/jira/browse/OAK-6388 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: blob-cloud >Reporter: Andrei Kalfas > Fix For: 1.8 > > Attachments: AzureSAS-v4.patch > > > An Azure storage account can be accessed with access keys or with shared access > signatures. Currently the blob connector only allows access keys, limiting > the use cases where the storage account must be regarded as a read-only one. > Access keys enable full access, while shared access signatures can be limited > to certain operations. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (OAK-6388) Enable Azure shared access signature for blob store connector
[ https://issues.apache.org/jira/browse/OAK-6388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Mueller reassigned OAK-6388: --- Assignee: Thomas Mueller > Enable Azure shared access signature for blob store connector > - > > Key: OAK-6388 > URL: https://issues.apache.org/jira/browse/OAK-6388 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: blob-cloud >Reporter: Andrei Kalfas >Assignee: Thomas Mueller > Fix For: 1.8 > > Attachments: AzureSAS-v4.patch > > > An Azure storage account can be accessed with access keys or with shared access > signatures. Currently the blob connector only allows access keys, limiting > the use cases where the storage account must be regarded as a read-only one. > Access keys enable full access, while shared access signatures can be limited > to certain operations. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (OAK-6388) Enable Azure shared access signature for blob store connector
[ https://issues.apache.org/jira/browse/OAK-6388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Mueller updated OAK-6388: Fix Version/s: 1.8 > Enable Azure shared access signature for blob store connector > - > > Key: OAK-6388 > URL: https://issues.apache.org/jira/browse/OAK-6388 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: blob-cloud >Reporter: Andrei Kalfas > Fix For: 1.8 > > Attachments: AzureSAS-v4.patch > > > An Azure storage account can be accessed with access keys or with shared access > signatures. Currently the blob connector only allows access keys, limiting > the use cases where the storage account must be regarded as a read-only one. > Access keys enable full access, while shared access signatures can be limited > to certain operations. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (OAK-3598) Export cache related classes for usage in other oak bundle
[ https://issues.apache.org/jira/browse/OAK-3598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16068221#comment-16068221 ] angela commented on OAK-3598: - [~chetanm], [~mreutegg], in the light of our m12n effort, {{CacheStats}} has been moved out of _oak-core_ to _oak-core-spi_. i think it would make sense to revisit the special handling that inlines {{CacheStats}} in _oak-lucene_. => added it to the m12n epic to make sure we don't forget about it. > Export cache related classes for usage in other oak bundle > -- > > Key: OAK-3598 > URL: https://issues.apache.org/jira/browse/OAK-3598 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: cache >Reporter: Chetan Mehrotra > Labels: tech-debt > Fix For: 1.8 > > > For OAK-3092, oak-lucene would need to access classes from the > {{org.apache.jackrabbit.oak.cache}} package. For now it's limited to > {{CacheStats}} to expose the cache related statistics. > This task is meant to determine the steps needed to export the package: > * Update the pom.xml to export the package > * Review the current set of classes to see if they need to be revised -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Resolved] (OAK-6406) Cleanup constants in Segment class
[ https://issues.apache.org/jira/browse/OAK-6406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Dürig resolved OAK-6406. Resolution: Fixed Fixed at http://svn.apache.org/viewvc?rev=1800265&view=rev > Cleanup constants in Segment class > -- > > Key: OAK-6406 > URL: https://issues.apache.org/jira/browse/OAK-6406 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: segment-tar >Reporter: Michael Dürig >Assignee: Michael Dürig > Labels: technical_debt > Fix For: 1.8, 1.7.3 > > > Some of the constants in the {{Segment}} class still refer to the old 255 > segment references limit. We should fix the comments, the constants and their > usage to reflect the current situation where that limit has been lifted. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (OAK-6406) Cleanup constants in Segment class
[ https://issues.apache.org/jira/browse/OAK-6406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Dürig reassigned OAK-6406: -- Assignee: Michael Dürig > Cleanup constants in Segment class > -- > > Key: OAK-6406 > URL: https://issues.apache.org/jira/browse/OAK-6406 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: segment-tar >Reporter: Michael Dürig >Assignee: Michael Dürig > Labels: technical_debt > Fix For: 1.8, 1.7.3 > > > Some of the constants in the {{Segment}} class still refer to the old 255 > segment references limit. We should fix the comments, the constants and their > usage to reflect the current situation where that limit has been lifted. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (OAK-6406) Cleanup constants in Segment class
Michael Dürig created OAK-6406: -- Summary: Cleanup constants in Segment class Key: OAK-6406 URL: https://issues.apache.org/jira/browse/OAK-6406 Project: Jackrabbit Oak Issue Type: Improvement Components: segment-tar Reporter: Michael Dürig Fix For: 1.8, 1.7.3 Some of the constants in the {{Segment}} class still refer to the old 255 segment references limit. We should fix the comments, the constants and their usage to reflect the current situation where that limit has been lifted. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (OAK-6406) Cleanup constants in Segment class
[ https://issues.apache.org/jira/browse/OAK-6406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Dürig updated OAK-6406: --- Labels: technical_debt (was: ) > Cleanup constants in Segment class > -- > > Key: OAK-6406 > URL: https://issues.apache.org/jira/browse/OAK-6406 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: segment-tar >Reporter: Michael Dürig > Labels: technical_debt > Fix For: 1.8, 1.7.3 > > > Some of the constants in the {{Segment}} class still refer to the old 255 > segment references limit. We should fix the comments, the constants and their > usage to reflect the current situation where that limit has been lifted. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (OAK-6405) Cleanup the o.a.j.o.segment.file.tar package
[ https://issues.apache.org/jira/browse/OAK-6405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16068126#comment-16068126 ] Francesco Mari commented on OAK-6405: - In r1800260 I reduced the visibility of {{TarReader#getEntrySize}}. > Cleanup the o.a.j.o.segment.file.tar package > > > Key: OAK-6405 > URL: https://issues.apache.org/jira/browse/OAK-6405 > Project: Jackrabbit Oak > Issue Type: Improvement >Reporter: Francesco Mari >Assignee: Francesco Mari > Fix For: 1.8 > > > This issue tracks the cleanup and rearrangement of the internals of the > {{o.a.j.o.segment.file.tar}} package. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (OAK-6405) Cleanup the o.a.j.o.segment.file.tar package
[ https://issues.apache.org/jira/browse/OAK-6405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16068109#comment-16068109 ] Francesco Mari commented on OAK-6405: - In r1800259 I moved some constants shared between {{TarReader}} and {{TarWriter}} to {{TarConstants}}. > Cleanup the o.a.j.o.segment.file.tar package > > > Key: OAK-6405 > URL: https://issues.apache.org/jira/browse/OAK-6405 > Project: Jackrabbit Oak > Issue Type: Improvement >Reporter: Francesco Mari >Assignee: Francesco Mari > Fix For: 1.8 > > > This issue tracks the cleanup and rearrangement of the internals of the > {{o.a.j.o.segment.file.tar}} package. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (OAK-6388) Enable Azure shared access signature for blob store connector
[ https://issues.apache.org/jira/browse/OAK-6388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16068099#comment-16068099 ] Raul Hudea commented on OAK-6388: - Yes, please. > Enable Azure shared access signature for blob store connector > - > > Key: OAK-6388 > URL: https://issues.apache.org/jira/browse/OAK-6388 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: blob-cloud >Reporter: Andrei Kalfas > Attachments: AzureSAS-v4.patch > > > An Azure storage account can be accessed with access keys or with shared access > signatures. Currently the blob connector only allows access keys, limiting > the use cases where the storage account must be regarded as a read-only one. > Access keys enable full access, while shared access signatures can be limited > to certain operations. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (OAK-6405) Cleanup the o.a.j.o.segment.file.tar package
Francesco Mari created OAK-6405: --- Summary: Cleanup the o.a.j.o.segment.file.tar package Key: OAK-6405 URL: https://issues.apache.org/jira/browse/OAK-6405 Project: Jackrabbit Oak Issue Type: Improvement Reporter: Francesco Mari Assignee: Francesco Mari Fix For: 1.8 This issue tracks the cleanup and rearrangement of the internals of the {{o.a.j.o.segment.file.tar}} package. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (OAK-6388) Enable Azure shared access signature for blob store connector
[ https://issues.apache.org/jira/browse/OAK-6388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16068087#comment-16068087 ] Thomas Mueller commented on OAK-6388: - The new patch looks good to me. Should I commit it? > Enable Azure shared access signature for blob store connector > - > > Key: OAK-6388 > URL: https://issues.apache.org/jira/browse/OAK-6388 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: blob-cloud >Reporter: Andrei Kalfas > Attachments: AzureSAS-v4.patch > > > An Azure storage account can be accessed with access keys or with shared access > signatures. Currently the blob connector only allows access keys, limiting > the use cases where the storage account must be regarded as a read-only one. > Access keys enable full access, while shared access signatures can be limited > to certain operations. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Resolved] (OAK-6404) Move TAR handling logic in its own package
[ https://issues.apache.org/jira/browse/OAK-6404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francesco Mari resolved OAK-6404. - Resolution: Fixed Fixed at r1800258. > Move TAR handling logic in its own package > -- > > Key: OAK-6404 > URL: https://issues.apache.org/jira/browse/OAK-6404 > Project: Jackrabbit Oak > Issue Type: Bug > Components: segment-tar >Reporter: Francesco Mari >Assignee: Francesco Mari > Fix For: 1.8, 1.7.3 > > > Classes related to TAR handling can be moved into a separate package. Some of > those classes ({{FileAccess}}, {{TarReader}}, {{TarWriter}}, {{TarEntry}}) > contain implementation details but are currently not used by the rest of the > code. Access to these classes is currently encapsulated by {{TarFiles}}, > which makes the refactoring almost straightforward. > The refactoring also involves moving some monitoring interfaces > ({{FileStoreMonitor}}, {{IOMonitor}}) that are supposed to be implemented > externally and passed to the TAR subsystem. The TAR subsystem will use the > provided implementations to communicate the progress of internal operations. > Implementations of those interfaces will stay where they are. > Finally, the refactoring involves moving {{TarRecovery}} too. This interface > is implemented by the TAR subsystem and is the only reason why {{TarWriter}} > is exposed to the rest of the code. This problem can be easily solved by > introducing a new interface that hides the usage of a {{TarWriter}} for the > recovery of a TAR entry. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (OAK-6404) Move TAR handling logic in its own package
Francesco Mari created OAK-6404: --- Summary: Move TAR handling logic in its own package Key: OAK-6404 URL: https://issues.apache.org/jira/browse/OAK-6404 Project: Jackrabbit Oak Issue Type: Bug Components: segment-tar Reporter: Francesco Mari Assignee: Francesco Mari Fix For: 1.8, 1.7.3 Classes related to TAR handling can be moved into a separate package. Some of those classes ({{FileAccess}}, {{TarReader}}, {{TarWriter}}, {{TarEntry}}) contain implementation details but are currently not used by the rest of the code. Access to these classes is currently encapsulated by {{TarFiles}}, which makes the refactoring almost straightforward. The refactoring also involves moving some monitoring interfaces ({{FileStoreMonitor}}, {{IOMonitor}}) that are supposed to be implemented externally and passed to the TAR subsystem. The TAR subsystem will use the provided implementations to communicate the progress of internal operations. Implementations of those interfaces will stay where they are. Finally, the refactoring involves moving {{TarRecovery}} too. This interface is implemented by the TAR subsystem and is the only reason why {{TarWriter}} is exposed to the rest of the code. This problem can be easily solved by introducing a new interface that hides the usage of a {{TarWriter}} for the recovery of a TAR entry. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
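The last paragraph's idea of hiding {{TarWriter}} behind a narrow interface could look roughly like this. Only {{TarRecovery}} is a name taken from the issue; everything else is a hypothetical sketch, not the actual refactoring:

```java
import java.util.UUID;

// Sketch of decoupling recovery from TarWriter: instead of handing the
// writer to TarRecovery, recovery code only sees a narrow sink for
// recovered entries. Names other than TarRecovery are hypothetical.
public class RecoverySketch {

    // Narrow write-side abstraction; the TAR subsystem can back it
    // with a TarWriter without exposing that class.
    interface EntryRecovery {
        void recoverEntry(UUID segmentId, byte[] data);
    }

    // Recovery of a TAR entry now depends only on the sink above.
    interface TarRecovery {
        void recoverEntry(UUID segmentId, byte[] data, EntryRecovery sink);
    }
}
```

Because the recovery callback never touches {{TarWriter}} directly, the writer can stay package-private in the new TAR package.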
[jira] [Updated] (OAK-5192) Reduce Lucene related growth of repository size
[ https://issues.apache.org/jira/browse/OAK-5192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Dürig updated OAK-5192: --- Fix Version/s: (was: 1.7.3) 1.7.8 > Reduce Lucene related growth of repository size > --- > > Key: OAK-5192 > URL: https://issues.apache.org/jira/browse/OAK-5192 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: lucene, segment-tar >Reporter: Michael Dürig >Assignee: Tommaso Teofili > Labels: perfomance, scalability > Fix For: 1.8, 1.7.8 > > Attachments: added-bytes-zoom.png, binSize100.txt, binSize16384.txt, > binSizeTotal.txt, diff.txt.zip, nonBinSizeTotal.txt, OAK-5192.0.patch > > > I observed Lucene indexing contributing to up to 99% of repository growth. > While the size of the index itself is well inside reasonable bounds, the > overall turnover of data being written and removed again can be as much as > 99%. > In the case of the TarMK this negatively impacts overall system performance > due to fast growing number of tar files / segments, bad locality of > reference, cache misses/thrashing when looking up segments and vastly > prolonged garbage collection cycles. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (OAK-6388) Enable Azure shared access signature for blob store connector
[ https://issues.apache.org/jira/browse/OAK-6388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16067979#comment-16067979 ] Raul Hudea commented on OAK-6388: - Looks good. Thanks. > Enable Azure shared access signature for blob store connector > - > > Key: OAK-6388 > URL: https://issues.apache.org/jira/browse/OAK-6388 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: blob-cloud >Reporter: Andrei Kalfas > Attachments: AzureSAS-v4.patch > > > An Azure storage account can be accessed with access keys or with shared access > signatures. Currently the blob connector only allows access keys, limiting > the use cases where the storage account must be treated as read-only. > Access keys grant full access, while shared access signatures can be limited > to certain operations.
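For context, the two access styles described above differ in the connection string handed to the Azure storage client. A hedged illustration of the documented connection-string forms; the account name, token fields, and values are placeholders:

```properties
# Full-access connection string using an account access key
DefaultEndpointsProtocol=https;AccountName=myaccount;AccountKey=<base64-account-key>

# Restricted connection string using a shared access signature (SAS).
# The sp= field scopes the token to specific operations, e.g. read/list only.
BlobEndpoint=https://myaccount.blob.core.windows.net;SharedAccessSignature=sv=2015-04-05&ss=b&srt=co&sp=rl&sig=<signature>
```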
[jira] [Updated] (OAK-6388) Enable Azure shared access signature for blob store connector
[ https://issues.apache.org/jira/browse/OAK-6388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrei Kalfas updated OAK-6388: --- Attachment: AzureSAS-v4.patch > Enable Azure shared access signature for blob store connector > - > > Key: OAK-6388 > URL: https://issues.apache.org/jira/browse/OAK-6388 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: blob-cloud >Reporter: Andrei Kalfas > Attachments: AzureSAS-v4.patch > > > An Azure storage account can be accessed with access keys or with shared access > signatures. Currently the blob connector only allows access keys, limiting > the use cases where the storage account must be treated as read-only. > Access keys grant full access, while shared access signatures can be limited > to certain operations.
[jira] [Updated] (OAK-6388) Enable Azure shared access signature for blob store connector
[ https://issues.apache.org/jira/browse/OAK-6388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrei Kalfas updated OAK-6388: --- Attachment: (was: AzureSAS-v3.patch) > Enable Azure shared access signature for blob store connector > - > > Key: OAK-6388 > URL: https://issues.apache.org/jira/browse/OAK-6388 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: blob-cloud >Reporter: Andrei Kalfas > Attachments: AzureSAS-v4.patch > > > An Azure storage account can be accessed with access keys or with shared access > signatures. Currently the blob connector only allows access keys, limiting > the use cases where the storage account must be treated as read-only. > Access keys grant full access, while shared access signatures can be limited > to certain operations.
[jira] [Updated] (OAK-6309) Not always convert XPath "primaryType in a, b" to union
[ https://issues.apache.org/jira/browse/OAK-6309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Mueller updated OAK-6309: Component/s: query > Not always convert XPath "primaryType in a, b" to union > --- > > Key: OAK-6309 > URL: https://issues.apache.org/jira/browse/OAK-6309 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: query >Reporter: Thomas Mueller >Assignee: Thomas Mueller >Priority: Critical > Fix For: 1.8 > > > Currently, queries with multiple primary types are always converted to a > "union", but this is not always the best solution. The main problem is that > results are not sorted by score as expected. Example: > {noformat} > /jcr:root/content//element(*, nt:hierarchyNode)[jcr:contains(., 'abc') > and (@jcr:primaryType = 'acme:Page' or @jcr:primaryType = 'acme:Asset')] > {noformat} > This is currently converted to a union, even if the same index is used for > both subqueries (assuming there is an index on nt:hierarchyNode). > A workaround is to use: > {noformat} > /jcr:root/content//element(*, nt:hierarchyNode)[jcr:contains(., 'abc') > and (./@jcr:primaryType = 'acme:Page' or ./@jcr:primaryType = 'acme:Asset')] > {noformat}
[jira] [Commented] (OAK-3262) oak-jcr: update test exclusions once JCR-3901 is resolved
[ https://issues.apache.org/jira/browse/OAK-3262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16067922#comment-16067922 ] Julian Reschke commented on OAK-3262: - trunk: [r1800244|http://svn.apache.org/r1800244] > oak-jcr: update test exclusions once JCR-3901 is resolved > - > > Key: OAK-3262 > URL: https://issues.apache.org/jira/browse/OAK-3262 > Project: Jackrabbit Oak > Issue Type: Sub-task > Components: jcr >Affects Versions: 1.0.18, 1.2.3, 1.3.3 >Reporter: Julian Reschke >Assignee: Julian Reschke >Priority: Minor > Labels: candidate_oak_1_6 > Fix For: 1.8, 1.7.3 > > Attachments: OAK-3262.diff > >
[jira] [Resolved] (OAK-3262) oak-jcr: update test exclusions once JCR-3901 is resolved
[ https://issues.apache.org/jira/browse/OAK-3262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julian Reschke resolved OAK-3262. - Resolution: Fixed Fix Version/s: 1.7.3 > oak-jcr: update test exclusions once JCR-3901 is resolved > - > > Key: OAK-3262 > URL: https://issues.apache.org/jira/browse/OAK-3262 > Project: Jackrabbit Oak > Issue Type: Sub-task > Components: jcr >Affects Versions: 1.0.18, 1.2.3, 1.3.3 >Reporter: Julian Reschke >Assignee: Julian Reschke >Priority: Minor > Labels: candidate_oak_1_6 > Fix For: 1.8, 1.7.3 > > Attachments: OAK-3262.diff > >
[jira] [Updated] (OAK-6375) RevisionGCMbeans are not filtered correctly in the RepositoryManagement
[ https://issues.apache.org/jira/browse/OAK-6375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tomek Rękawek updated OAK-6375: --- Fix Version/s: (was: 1.7.4) 1.7.3 > RevisionGCMbeans are not filtered correctly in the RepositoryManagement > --- > > Key: OAK-6375 > URL: https://issues.apache.org/jira/browse/OAK-6375 > Project: Jackrabbit Oak > Issue Type: Bug > Components: core >Affects Versions: 1.7.2 >Reporter: Tomek Rękawek >Assignee: Tomek Rękawek > Fix For: 1.8, 1.7.3 > > > The RepositoryManagement uses the new Whiteboard#track(Class,Map) method to > get services with a given role. When the OSGi implementation of the whiteboard > is used, the filter is transformed into an OSGi filter expression. > In particular, the {{role}} property is used to get the RevisionGCMBean > of the right type. > However, the RevisionGCMBean role is set as part of the > {{jmx.objectname}} property, not as a separate property. The {{jmx.objectname}} > property has the following form: > {noformat} > org.apache.jackrabbit.oak:name=Revision garbage collection - > secondary,type=RevisionGarbageCollection,role=secondary > {noformat} > Because of that, an attempt to call the > RepositoryManagement#startRevisionGC() method (or its parametrized version) > fails. > //cc: [~chetanm], [~mduerig]
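The mismatch can be seen with plain {{javax.management.ObjectName}}: the role is recoverable only by parsing the {{jmx.objectname}} value, because it is not published as a standalone service property. A minimal, self-contained sketch (not the actual Oak code):

```java
import javax.management.ObjectName;

public class RoleLookupExample {
    public static void main(String[] args) throws Exception {
        // The jmx.objectname value from the issue description. The role is a
        // key property inside the ObjectName, not a separate OSGi service
        // property, so an OSGi filter such as (role=secondary) never matches.
        ObjectName name = new ObjectName(
                "org.apache.jackrabbit.oak:name=Revision garbage collection - secondary,"
                + "type=RevisionGarbageCollection,role=secondary");
        System.out.println(name.getKeyProperty("role")); // prints "secondary"
    }
}
```

A fix along these lines would either register {{role}} as its own service property or have the whiteboard extract it from the {{jmx.objectname}} key properties before filtering.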
[jira] [Updated] (OAK-6375) RevisionGCMbeans are not filtered correctly in the RepositoryManagement
[ https://issues.apache.org/jira/browse/OAK-6375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tomek Rękawek updated OAK-6375: --- Affects Version/s: (was: 1.7.3) 1.7.2 > RevisionGCMbeans are not filtered correctly in the RepositoryManagement > --- > > Key: OAK-6375 > URL: https://issues.apache.org/jira/browse/OAK-6375 > Project: Jackrabbit Oak > Issue Type: Bug > Components: core >Affects Versions: 1.7.2 >Reporter: Tomek Rękawek >Assignee: Tomek Rękawek > Fix For: 1.8, 1.7.3 > > > The RepositoryManagement uses the new Whiteboard#track(Class,Map) method to > get services with a given role. When the OSGi implementation of the whiteboard > is used, the filter is transformed into an OSGi filter expression. > In particular, the {{role}} property is used to get the RevisionGCMBean > of the right type. > However, the RevisionGCMBean role is set as part of the > {{jmx.objectname}} property, not as a separate property. The {{jmx.objectname}} > property has the following form: > {noformat} > org.apache.jackrabbit.oak:name=Revision garbage collection - > secondary,type=RevisionGarbageCollection,role=secondary > {noformat} > Because of that, an attempt to call the > RepositoryManagement#startRevisionGC() method (or its parametrized version) > fails. > //cc: [~chetanm], [~mduerig]
[jira] [Updated] (OAK-3262) oak-jcr: update test exclusions once JCR-3901 is resolved
[ https://issues.apache.org/jira/browse/OAK-3262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julian Reschke updated OAK-3262: Labels: candidate_oak_1_6 (was: ) > oak-jcr: update test exclusions once JCR-3901 is resolved > - > > Key: OAK-3262 > URL: https://issues.apache.org/jira/browse/OAK-3262 > Project: Jackrabbit Oak > Issue Type: Sub-task > Components: jcr >Affects Versions: 1.0.18, 1.2.3, 1.3.3 >Reporter: Julian Reschke >Assignee: Julian Reschke >Priority: Minor > Labels: candidate_oak_1_6 > Fix For: 1.8 > > Attachments: OAK-3262.diff > >