[jira] [Resolved] (OAK-9966) Internal code calls Node.isCheckedOut and VersionManager.isCheckedOut
[ https://issues.apache.org/jira/browse/OAK-9966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Angela Schreiber resolved OAK-9966.
-----------------------------------
    Fix Version/s: 1.46.0
       Resolution: Fixed

> Internal code calls Node.isCheckedOut and VersionManager.isCheckedOut
> ---------------------------------------------------------------------
>
>                 Key: OAK-9966
>                 URL: https://issues.apache.org/jira/browse/OAK-9966
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: core, jcr
>            Reporter: Angela Schreiber
>            Assignee: Angela Schreiber
>            Priority: Major
>              Labels: performance
>             Fix For: 1.46.0
>
>
> While investigating a report about performance issues in Oak, I came across
> stack traces reporting excessive permission evaluation during
> {{ReadOnlyVersionManager.isCheckedOut(Tree)}}.
> A couple of things struck me that might be valuable improvements:
> h2. Internal Calls to VersionManager.isCheckedOut(String)
> - JCR call {{NodeImpl.isCheckedOut}} calls JCR API {{VersionManager.isCheckedOut(String path)}}, which forces the {{Tree}} object to be accessed again although the {{NodeImpl}} already holds it
> - JCR call {{NodeImpl.canAddMixin}} calls JCR API {{VersionManager.isCheckedOut(String path)}}, which forces the {{Tree}} object to be accessed again although the {{NodeImpl}} already holds it
> - The {{ImporterImpl}} constructor calls JCR API {{VersionManager.isCheckedOut(String path)}}, which forces the {{Tree}} object to be accessed again even though it was obtained just before
> h2. Internal Calls to Node.isCheckedOut() which calls VersionManager.isCheckedOut(String)
> - JCR call {{NodeImpl.setPrimaryType}} calls JCR API {{Node.isCheckedOut}}
> - JCR call {{NodeImpl.addMixin}} calls JCR API {{Node.isCheckedOut}}
> - JCR call {{NodeImpl.removeMixin}} calls JCR API {{Node.isCheckedOut}}
> - Jackrabbit API call {{NodeImpl.setMixins}} calls JCR API {{Node.isCheckedOut}}
> - JCR call {{QueryImpl.storeAsNode}} calls JCR API {{Node.isCheckedOut}}
> - JCR call {{SessionImpl.hasCapability}} calls JCR API {{Node.isCheckedOut}} after having retrieved the node
> - JCR call {{PropertyImpl.remove()}} calls JCR API {{Node.isCheckedOut}} on the parent node
> - internal call {{NodeImpl.internalSetProperty(String,Value,boolean)}} calls JCR API {{Node.isCheckedOut}}
> - internal call {{NodeImpl.internalSetProperty(String,Value[],boolean)}} calls JCR API {{Node.isCheckedOut}}
> - internal call {{NodeImpl.internalRemoveProperty}} calls JCR API {{Node.isCheckedOut}}
> - internal call {{PropertyImpl.internalSetValue(Value)}} calls JCR API {{Node.isCheckedOut}} on the parent node
> - internal call {{PropertyImpl.internalSetValue(Value[])}} calls JCR API {{Node.isCheckedOut}} on the parent node
> h2. ReadOnlyVersionManager.isCheckedOut(Tree)
> - The implementation of {{ReadOnlyVersionManager.isCheckedOut(Tree)}} verifies that the passed tree exists, although most callers of this method have already verified this (e.g. the node was retrieved through the JCR API).
> - The implementation recursively walks up the hierarchy to check whether any of the parents is checked in, again verifying the existence and accessibility of each parent. This is likely unnecessary here: it is an internal call that doesn't leak any information if the parent tree is not readable to the editing session.
> [~jhoh], [~mreutegg] wdyt?

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
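To make the two costs described in OAK-9966 concrete, here is a minimal Java sketch. `SketchTree` and `SketchVersionManager` are hypothetical stand-ins, not Oak's `Tree` or `ReadOnlyVersionManager`, and the sketch assumes the JCR rule that a node's checked-out state is decided by its nearest versionable ancestor-or-self. It contrasts a path-based entry point, which must resolve the tree again, with a tree-based one that reuses the tree the caller already holds and walks up the parents without re-checking existence.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for an Oak Tree (not the real Oak API).
final class SketchTree {
    final String path;
    final SketchTree parent;    // null for the root
    final boolean versionable;  // stands in for mix:versionable
    boolean checkedOut = true;  // stands in for the jcr:isCheckedOut property

    SketchTree(String path, SketchTree parent, boolean versionable) {
        this.path = path;
        this.parent = parent;
        this.versionable = versionable;
    }
}

// Hypothetical version manager showing the two entry points discussed in the issue.
final class SketchVersionManager {
    private final Map<String, SketchTree> trees = new HashMap<>();
    int pathLookups = 0; // how often a tree had to be resolved again from its path

    SketchTree add(String path, SketchTree parent, boolean versionable) {
        SketchTree tree = new SketchTree(path, parent, versionable);
        trees.put(path, tree);
        return tree;
    }

    // Path-based variant (analogous to VersionManager.isCheckedOut(String)):
    // it re-resolves the tree even when the caller already holds it.
    boolean isCheckedOut(String path) {
        pathLookups++;
        SketchTree tree = trees.get(path);
        if (tree == null) {
            throw new IllegalArgumentException("no such node: " + path);
        }
        return isCheckedOut(tree);
    }

    // Tree-based variant: walk up from the given tree; the nearest versionable
    // ancestor-or-self decides. Existence is not re-checked on the way up.
    boolean isCheckedOut(SketchTree tree) {
        for (SketchTree t = tree; t != null; t = t.parent) {
            if (t.versionable) {
                return t.checkedOut;
            }
        }
        return true; // no versionable ancestor: the node counts as checked out
    }
}
```

A caller that already holds the tree (as `NodeImpl` does in the list above) would use the tree-based overload directly; the `pathLookups` counter exists only to make the avoided lookup visible.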
[jira] [Assigned] (OAK-9790) Implement parallel indexing for speeding up oak run indexing command
[ https://issues.apache.org/jira/browse/OAK-9790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amit Jain reassigned OAK-9790:
------------------------------

    Assignee: Amit Jain

> Implement parallel indexing for speeding up oak run indexing command
> --------------------------------------------------------------------
>
>                 Key: OAK-9790
>                 URL: https://issues.apache.org/jira/browse/OAK-9790
>             Project: Jackrabbit Oak
>          Issue Type: Story
>            Reporter: Jun Zhang
>            Assignee: Amit Jain
>            Priority: Major
>
>
> Implement parallel indexing to speed up the oak-run indexing command.
> Since indexing is single-threaded, it is slow for large repositories. To
> improve indexing performance, we need to implement parallel indexing.
> The work covers both Lucene and Elastic indexing. To support parallel
> indexing, the big flat file store file has to be split in advance. This
> adds significant overhead, but makes parallel indexing possible and much faster.
> A related change adds support for LZ4 compression, which is much faster
> than gzip.
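The approach described in OAK-9790 (split the flat file store up front, then index the parts concurrently) can be sketched in Java as follows. This is a hypothetical illustration, not oak-run's implementation: it assumes the flat file store is a sorted list of `path|json` lines, uses a trivial token match in place of real Lucene or Elastic indexing, and ignores the cost of physically splitting the file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical split-then-index sketch; not Oak's actual parallel indexer.
final class ParallelIndexSketch {

    // Split sorted entries into at most `parts` contiguous chunks of near-equal size.
    static List<List<String>> split(List<String> sortedEntries, int parts) {
        if (parts < 1) {
            throw new IllegalArgumentException("parts must be >= 1");
        }
        List<List<String>> chunks = new ArrayList<>();
        int n = sortedEntries.size();
        int size = (n + parts - 1) / parts; // ceiling division
        for (int start = 0; start < n; start += size) {
            chunks.add(sortedEntries.subList(start, Math.min(n, start + size)));
        }
        return chunks;
    }

    // Stand-in for indexing one chunk: collect the paths of entries containing
    // `token`. Entries are assumed (hypothetically) to look like "path|json".
    static List<String> indexChunk(List<String> chunk, String token) {
        List<String> hits = new ArrayList<>();
        for (String entry : chunk) {
            int sep = entry.indexOf('|');
            String path = sep >= 0 ? entry.substring(0, sep) : entry;
            if (entry.contains(token)) {
                hits.add(path);
            }
        }
        return hits;
    }

    // Index each chunk on its own thread, then merge the results in chunk order,
    // so the merged output keeps the sort order of the input.
    static List<String> indexInParallel(List<String> sortedEntries, String token, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (List<String> chunk : split(sortedEntries, threads)) {
                Callable<List<String>> task = () -> indexChunk(chunk, token);
                futures.add(pool.submit(task));
            }
            List<String> merged = new ArrayList<>();
            for (Future<List<String>> f : futures) {
                merged.addAll(f.get());
            }
            return merged;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("indexing interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("chunk indexing failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Keeping the chunks contiguous is what allows the per-chunk results to be concatenated without sorting them again.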
[jira] [Created] (OAK-9968) Enable LZ4 compression for parallel indexing
Amit Jain created OAK-9968:
---------------------------

             Summary: Enable LZ4 compression for parallel indexing
                 Key: OAK-9968
                 URL: https://issues.apache.org/jira/browse/OAK-9968
             Project: Jackrabbit Oak
          Issue Type: Improvement
          Components: indexing
            Reporter: Amit Jain
            Assignee: Amit Jain


Enable LZ4 compression using the lz4-java library ([https://github.com/lz4/lz4-java]) for the parallel indexing introduced in OAK-9790.
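One way to prepare the flat file store for a different codec, and to compare it against gzip on real data, is to put compression behind a small abstraction. The Java sketch below uses a hypothetical `Codec` enum containing only JDK codecs so that it runs without extra dependencies. An LZ4 constant could wrap the frame streams from lz4-java (for example `LZ4FrameOutputStream` / `LZ4FrameInputStream`); that wiring is an assumption, not what OAK-9968 implements.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical codec abstraction (not an Oak API). An LZ4 constant could wrap
// lz4-java's frame streams; it is omitted so the sketch compiles with the JDK alone.
enum Codec {
    NONE {
        @Override OutputStream wrap(OutputStream out) { return out; }
        @Override InputStream wrap(InputStream in) { return in; }
    },
    GZIP {
        @Override OutputStream wrap(OutputStream out) throws IOException { return new GZIPOutputStream(out); }
        @Override InputStream wrap(InputStream in) throws IOException { return new GZIPInputStream(in); }
    };

    abstract OutputStream wrap(OutputStream out) throws IOException;
    abstract InputStream wrap(InputStream in) throws IOException;

    byte[] compress(byte[] data) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (OutputStream out = wrap(bytes)) { // closing finishes the compressed stream
            out.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    byte[] decompress(byte[] data) {
        try (InputStream in = wrap(new ByteArrayInputStream(data))) {
            return in.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Choosing the codec in a single place keeps a like-for-like gzip vs. LZ4 benchmark straightforward once an LZ4 constant is added.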
[jira] [Commented] (OAK-9790) Implement parallel indexing for speeding up oak run indexing command
[ https://issues.apache.org/jira/browse/OAK-9790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17619223#comment-17619223 ]

Amit Jain commented on OAK-9790:
-------------------------------

Created https://issues.apache.org/jira/browse/OAK-9968 for LZ4 support
[jira] [Updated] (OAK-4796) Filter events before adding to ChangeProcessor's queue
[ https://issues.apache.org/jira/browse/OAK-4796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Mueller updated OAK-4796:
--------------------------------
    Summary: Filter events before adding to ChangeProcessor's queue  (was: filter events before adding to ChangeProcessor's queue)

> Filter events before adding to ChangeProcessor's queue
> ------------------------------------------------------
>
>                 Key: OAK-4796
>                 URL: https://issues.apache.org/jira/browse/OAK-4796
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: jcr
>    Affects Versions: 1.5.9
>            Reporter: Stefan Egli
>            Assignee: Stefan Egli
>            Priority: Major
>              Labels: observation
>             Fix For: 1.5.13, 1.6.0
>
>         Attachments: OAK-4796.changeSet.patch, OAK-4796.patch
>
>
> Currently
> [ChangeProcessor.contentChanged|https://github.com/apache/jackrabbit-oak/blob/f4f4e01dd8f708801883260481d37fdcd5868deb/oak-jcr/src/main/java/org/apache/jackrabbit/oak/jcr/observation/ChangeProcessor.java#L335]
> is in charge of the event diffing and filtering, and does so in a pooled
> thread, i.e. asynchronously, at a later stage independent of the commit.
> This has the advantage that the commit is fast, but has the following
> potentially negative effects:
> # Events (in the form of ContentChange objects) occupy a slot in the queue
> even if the listener is not interested in them: every commit lands on every
> listener's queue. This reduces the queue's capacity for events that actually
> need to be delivered, and therefore increases the risk that the queue fills
> up, which has various consequences such as losing the CommitInfo.
> # Each event (ContentChange) must later be evaluated, and for that a diff
> must be calculated. Depending on runtime behavior, that diff might be
> expensive if it is no longer in the cache (specifically with DocumentMK).
> As an improvement, this diffing and filtering could be done at an earlier
> stage, closer to the commit. If the filter would ignore the event, it would
> not have to be put into the queue at all, which avoids occupying a slot and
> the potentially slower diffing later on.
> The suggestion is to implement this with the following algorithm:
> * During the commit, the listeners' filters are evaluated in a {{Validator}},
> as efficiently as possible (doing it in a Validator avoids extra overhead,
> as Oak already goes through all changes for other Validators). As a result,
> a _list of potentially affected observers_ is added to the {{CommitInfo}}
> (false positives are fine).
> ** Note that this adds cost to the commit and must therefore be implemented
> and measured carefully.
> ** One possible measure could be to filter only when a listener's queue is
> larger than a certain threshold (e.g. 10).
> * The ChangeProcessor's {{contentChanged}} (in the observer created in
> [createObserver|https://github.com/apache/jackrabbit-oak/blob/f4f4e01dd8f708801883260481d37fdcd5868deb/oak-jcr/src/main/java/org/apache/jackrabbit/oak/jcr/observation/ChangeProcessor.java#L224])
> then checks the new CommitInfo's _potentially affected observers_ list and,
> if it is not in the list, adds a {{NOOP}} token at the end of the queue. If
> there is already a NOOP there, the two are collapsed (so an unaffected
> listener ends up with a single NOOP at the end of its queue). If a non-NOOP
> item is added later, the NOOP's {{root}} is used as the {{previousRoot}} for
> the newly added {{ContentChange}} object.
> ** To achieve that, the ContentChange object is extended to hold not only the
> "to" {{root}} pointer but also the "from" {{previousRoot}} pointer, which is
> currently maintained implicitly.
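The NOOP-collapsing part of the proposed algorithm can be sketched in Java as below. `QueuedChange` and `ObserverQueueSketch` are hypothetical names (not Oak's ChangeProcessor or ContentChange); roots are plain strings, the `Set` passed to `contentChanged` stands for the _potentially affected observers_ list computed by the commit-time Validator, and the queue is unbounded and has no consumer.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Set;

// Hypothetical queue item: a real change carries "from" (previousRoot) and "to" (root);
// a NOOP only records the latest root the observer was not interested in.
final class QueuedChange {
    final String previousRoot; // null for a NOOP
    final String root;
    final boolean noop;

    private QueuedChange(String previousRoot, String root, boolean noop) {
        this.previousRoot = previousRoot;
        this.root = root;
        this.noop = noop;
    }

    static QueuedChange change(String previousRoot, String root) {
        return new QueuedChange(previousRoot, root, false);
    }

    static QueuedChange noopAt(String root) {
        return new QueuedChange(null, root, true);
    }
}

// Hypothetical per-observer queue following the algorithm proposed in OAK-4796.
final class ObserverQueueSketch {
    private final String observerId;
    private final int filterThreshold; // pre-filter only when the queue is larger than this
    private final Deque<QueuedChange> queue = new ArrayDeque<>();
    private String lastRoot;           // root of the most recent commit seen

    ObserverQueueSketch(String observerId, int filterThreshold, String initialRoot) {
        this.observerId = observerId;
        this.filterThreshold = filterThreshold;
        this.lastRoot = initialRoot;
    }

    // Called once per commit with the observers the commit-time filter flagged.
    void contentChanged(String root, Set<String> potentiallyAffected) {
        boolean filtering = queue.size() > filterThreshold;
        QueuedChange tail = queue.peekLast();
        if (filtering && !potentiallyAffected.contains(observerId)) {
            if (tail != null && tail.noop) {
                queue.pollLast(); // collapse: keep a single trailing NOOP
            }
            queue.addLast(QueuedChange.noopAt(root));
        } else {
            // Use a trailing NOOP's root as "from", as the issue describes; otherwise
            // the last root seen. Either way the diff covers only this commit.
            String from = (tail != null && tail.noop) ? tail.root : lastRoot;
            queue.addLast(QueuedChange.change(from, root));
        }
        lastRoot = root;
    }

    List<QueuedChange> snapshot() {
        return new ArrayList<>(queue);
    }
}
```

With a threshold of 0, the commit sequence affected, unaffected, unaffected, affected leaves three entries: a change, one collapsed NOOP, and a change whose `previousRoot` is that NOOP's root.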
[jira] [Commented] (OAK-9790) Implement parallel indexing for speeding up oak run indexing command
[ https://issues.apache.org/jira/browse/OAK-9790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17618828#comment-17618828 ]

Julian Reschke commented on OAK-9790:
------------------------------------

I would recommend moving the compression method change into a separate issue (and doing real-world benchmarks there as well).