[jira] [Resolved] (OAK-9966) Internal code calls Node.isCheckedOut and VersionManager.isCheckedOut

2022-10-17 Thread Angela Schreiber (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-9966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Angela Schreiber resolved OAK-9966.
---
Fix Version/s: 1.46.0
   Resolution: Fixed

> Internal code calls Node.isCheckedOut and VersionManager.isCheckedOut
> -
>
> Key: OAK-9966
> URL: https://issues.apache.org/jira/browse/OAK-9966
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: core, jcr
> Reporter: Angela Schreiber
> Assignee: Angela Schreiber
> Priority: Major
> Labels: performance
> Fix For: 1.46.0
>
>
> While investigating a report about performance issues in Oak, I came across 
> stack traces reporting excessive permission evaluation during 
> {{ReadOnlyVersionManager.isCheckedOut(Tree)}}.
> A couple of things struck me that might be valuable 
> improvements:
> h2. Internal Calls to VersionManager.isCheckedOut(String)
> - JCR call {{NodeImpl.isCheckedOut}} calls JCR API 
> {{VersionManager.isCheckedOut(String path)}}, i.e. forcing the {{Tree}} 
> object to be accessed again, when it was already present with the NodeImpl
> - JCR call {{NodeImpl.canAddMixin}} calls JCR API 
> {{VersionManager.isCheckedOut(String path)}}, i.e. forcing the {{Tree}} 
> object to be accessed again, when it was already present with the NodeImpl
> - {{ImporterImpl}} constructor calls JCR API 
> {{VersionManager.isCheckedOut(String path)}}, i.e. forcing the {{Tree}} 
> object to be accessed again despite the fact that it has been obtained just 
> before
> h2. Internal Calls to Node.isCheckedOut() which calls 
> VersionManager.isCheckedOut(String)
> - JCR call {{NodeImpl.setPrimaryType}} calls JCR API {{Node.isCheckedOut}} 
> - JCR call {{NodeImpl.addMixin}} calls JCR API {{Node.isCheckedOut}} 
> - JCR call {{NodeImpl.removeMixin}} calls JCR API {{Node.isCheckedOut}} 
> - Jackrabbit API call {{NodeImpl.setMixins}} calls JCR API 
> {{Node.isCheckedOut}} 
> - JCR call {{QueryImpl.storeAsNode}} calls JCR API {{Node.isCheckedOut}} 
> - JCR call {{SessionImpl.hasCapability}} calls JCR API {{Node.isCheckedOut}} 
> after having retrieved the node.
> - JCR call {{PropertyImpl.remove()}} calls 
> JCR API {{Node.isCheckedOut}} on parent node
> - internal call {{NodeImpl.internalSetProperty(String,Value,boolean)}} calls 
> JCR API {{Node.isCheckedOut}} 
> - internal call {{NodeImpl.internalSetProperty(String,Value[],boolean)}} 
> calls JCR API {{Node.isCheckedOut}} 
> - internal call {{NodeImpl.internalRemoveProperty}}  calls JCR API 
> {{Node.isCheckedOut}} 
> - internal call {{PropertyImpl.internalSetValue(Value)}} calls JCR API 
> {{Node.isCheckedOut}}  on parent node
> - internal call {{PropertyImpl.internalSetValue(Value[])}} calls JCR API 
> {{Node.isCheckedOut}}  on parent node
> h2. ReadOnlyVersionManager.isCheckedOut(Tree)
> - The implementation of {{ReadOnlyVersionManager.isCheckedOut(Tree)}} 
> verifies that the passed tree exists despite the fact that most callers of 
> this method have already verified that the tree exists (e.g. the node was 
> retrieved through JCR API).
> - The implementation recursively walks up the hierarchy to check whether any 
> of the parents is checked in (again verifying the existence and accessibility 
> of the parent, which in this case is likely not relevant as it is an internal 
> call that doesn't leak any information if the parent tree is not readable to 
> the editing session).
> [~jhoh], [~mreutegg] wdyt?
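
The upward walk described in the last section can be illustrated with a minimal sketch. It does not use Oak's API: SimpleTree and its fields are hypothetical stand-ins, and the only repository detail assumed is the standard JCR property jcr:isCheckedOut. The point is that an internal caller that already holds the tree can check the flag directly, without resolving a path again and without re-verifying existence or read access on every ancestor.

```java
import java.util.HashMap;
import java.util.Map;

class CheckedOutSketch {

    /** Hypothetical stand-in for a repository tree node; this is not Oak's Tree API. */
    static final class SimpleTree {
        final String name;
        final SimpleTree parent; // null for the root
        final Map<String, Object> props = new HashMap<>();

        SimpleTree(String name, SimpleTree parent) {
            this.name = name;
            this.parent = parent;
        }
    }

    /**
     * Walks from the given tree towards the root and returns the
     * jcr:isCheckedOut flag of the nearest node that carries one.
     * A tree without any flagged ancestor counts as checked out.
     */
    static boolean isCheckedOut(SimpleTree tree) {
        for (SimpleTree t = tree; t != null; t = t.parent) {
            Object flag = t.props.get("jcr:isCheckedOut");
            if (flag instanceof Boolean) {
                return (Boolean) flag;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        SimpleTree root = new SimpleTree("", null);
        SimpleTree page = new SimpleTree("page", root);
        SimpleTree content = new SimpleTree("content", page);
        page.props.put("jcr:isCheckedOut", false); // versionable node that is checked in

        System.out.println(isCheckedOut(content)); // false: nearest flagged ancestor is checked in
        System.out.println(isCheckedOut(root));    // true: no flag anywhere on the path
    }
}
```

The loop visits each ancestor at most once, so the cost of one call is proportional to the depth of the tree; the overhead discussed in the issue comes from repeating that walk, with permission evaluation, on every internal call.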



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (OAK-9790) Implement parallel indexing for speeding up oak run indexing command

2022-10-17 Thread Amit Jain (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-9790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amit Jain reassigned OAK-9790:
--

Assignee: Amit Jain

> Implement parallel indexing for speeding up oak run indexing command
> 
>
> Key: OAK-9790
> URL: https://issues.apache.org/jira/browse/OAK-9790
> Project: Jackrabbit Oak
> Issue Type: Story
> Reporter: Jun Zhang
> Assignee: Amit Jain
> Priority: Major
>
> Implement parallel indexing for speeding up oak run indexing command
> Indexing is currently single-threaded, which is slow for large repositories. In 
> order to improve indexing performance, we need to implement parallel 
> indexing.
> The work covers both Lucene and Elastic indexing. To support 
> parallel indexing, the large flat file store file has to be split up front, which 
> adds significant overhead but makes parallel indexing possible and much faster.
> Another change included here is support for LZ4 compression, which is much 
> faster than gzip.
>  
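
The split-then-index idea in the description can be sketched roughly as follows. This is not the oak-run implementation: the class name, the representation of flat file store entries as plain strings, and the indexChunk placeholder are all assumptions made for illustration. The sketch only shows the shape of the approach, which is to cut the entries into one part per worker and process the parts concurrently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelIndexSketch {

    /** Splits the entries into at most {@code parts} contiguous chunks (parts must be > 0). */
    static <T> List<List<T>> split(List<T> entries, int parts) {
        List<List<T>> chunks = new ArrayList<>();
        int size = (entries.size() + parts - 1) / parts; // ceiling division
        for (int i = 0; i < entries.size(); i += size) {
            chunks.add(entries.subList(i, Math.min(i + size, entries.size())));
        }
        return chunks;
    }

    /** Placeholder for real indexing work: counts the non-empty entries of one chunk. */
    static int indexChunk(List<String> chunk) {
        int indexed = 0;
        for (String entry : chunk) {
            if (!entry.isEmpty()) {
                indexed++;
            }
        }
        return indexed;
    }

    /** Indexes every chunk on its own worker thread and returns the total number of indexed entries. */
    static int indexInParallel(List<String> entries, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (List<String> chunk : split(entries, threads)) {
                Callable<Integer> task = () -> indexChunk(chunk);
                results.add(pool.submit(task));
            }
            int total = 0;
            for (Future<Integer> result : results) {
                total += result.get();
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("indexing failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> entries = List.of("/a", "/a/b", "/a/c", "/d", "/d/e");
        System.out.println(indexInParallel(entries, 2)); // 5
    }
}
```

The up-front split corresponds to the overhead mentioned in the description: it is paid once, before any worker starts, in exchange for workers that do not need to coordinate while indexing.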





[jira] [Created] (OAK-9968) Enable LZ4 compression for parallel indexing

2022-10-17 Thread Amit Jain (Jira)
Amit Jain created OAK-9968:
--

 Summary: Enable LZ4 compression for parallel indexing
 Key: OAK-9968
 URL: https://issues.apache.org/jira/browse/OAK-9968
 Project: Jackrabbit Oak
  Issue Type: Improvement
  Components: indexing
Reporter: Amit Jain
Assignee: Amit Jain


Enable LZ4 compression, using the lz4-java library [https://github.com/lz4/lz4-java], for 
the parallel indexing introduced in OAK-9790.





[jira] [Commented] (OAK-9790) Implement parallel indexing for speeding up oak run indexing command

2022-10-17 Thread Amit Jain (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-9790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17619223#comment-17619223
 ] 

Amit Jain commented on OAK-9790:


Created https://issues.apache.org/jira/browse/OAK-9968 for LZ4 support

> Implement parallel indexing for speeding up oak run indexing command
> 
>
> Key: OAK-9790
> URL: https://issues.apache.org/jira/browse/OAK-9790
> Project: Jackrabbit Oak
> Issue Type: Story
> Reporter: Jun Zhang
> Assignee: Amit Jain
> Priority: Major
>
> Implement parallel indexing for speeding up oak run indexing command
> Indexing is currently single-threaded, which is slow for large repositories. In 
> order to improve indexing performance, we need to implement parallel 
> indexing.
> The work covers both Lucene and Elastic indexing. To support 
> parallel indexing, the large flat file store file has to be split up front, which 
> adds significant overhead but makes parallel indexing possible and much faster.
> Another change included here is support for LZ4 compression, which is much 
> faster than gzip.
>  





[jira] [Updated] (OAK-4796) Filter events before adding to ChangeProcessor's queue

2022-10-17 Thread Thomas Mueller (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-4796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Mueller updated OAK-4796:

Summary: Filter events before adding to ChangeProcessor's queue  (was: 
filter events before adding to ChangeProcessor's queue)

> Filter events before adding to ChangeProcessor's queue
> --
>
> Key: OAK-4796
> URL: https://issues.apache.org/jira/browse/OAK-4796
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: jcr
> Affects Versions: 1.5.9
> Reporter: Stefan Egli
> Assignee: Stefan Egli
> Priority: Major
> Labels: observation
> Fix For: 1.5.13, 1.6.0
>
> Attachments: OAK-4796.changeSet.patch, OAK-4796.patch
>
>
> Currently the 
> [ChangeProcessor.contentChanged|https://github.com/apache/jackrabbit-oak/blob/f4f4e01dd8f708801883260481d37fdcd5868deb/oak-jcr/src/main/java/org/apache/jackrabbit/oak/jcr/observation/ChangeProcessor.java#L335]
>  is in charge of doing the event diffing and filtering and does so in a 
> pooled thread, i.e. asynchronously, at a later stage independent of the 
> commit. This has the advantage that the commit is fast, but has the following 
> potentially negative effects:
> # Events (in the form of ContentChange objects) occupy a slot of the queue 
> even if the listener is not interested in them - any commit lands on every 
> listener's queue. This reduces the capacity of the queue for 'actual' events 
> to be delivered. It therefore increases the risk that the queue fills up, and 
> a full queue has various consequences such as losing the CommitInfo.
> # each event==ContentChange later on must be evaluated, and for that a diff 
> must be calculated. Depending on runtime behavior that diff might be 
> expensive if no longer in the cache (documentMk specifically).
> As an improvement, this diffing+filtering could be done at an earlier stage 
> already, nearer to the commit, and in case the filter would ignore the event, 
> it would not have to be put into the queue at all, thus avoiding occupying a 
> slot and later potentially slower diffing.
> The suggestion is to implement this via the following algorithm:
> * During the commit, in a {{Validator}} the listener's filters are evaluated 
> - in an as-efficient-as-possible manner (Reason for doing it in a Validator 
> is that this doesn't add overhead as oak already goes through all changes for 
> other Validators). As a result a _list of potentially affected observers_ is 
> added to the {{CommitInfo}} (false positives are fine).
> ** Note that the above adds cost to the commit and must therefore be 
> carefully done and measured
> ** One potential measure could be to only do filtering when a listener's queue 
> is larger than a certain threshold (e.g. 10)
> * The ChangeProcessor in {{contentChanged}} (in the one created in 
> [createObserver|https://github.com/apache/jackrabbit-oak/blob/f4f4e01dd8f708801883260481d37fdcd5868deb/oak-jcr/src/main/java/org/apache/jackrabbit/oak/jcr/observation/ChangeProcessor.java#L224])
>  then checks the new commitInfo's _potentially affected observers_ list and 
> if it's not in the list, adds a {{NOOP}} token at the end of the queue. If 
> there's already a NOOP there, the two are collapsed (this way, a listener whose 
> filter is not affected has at most one NOOP at the end of its queue). If later on a 
> non-NOOP item is added, the NOOP's {{root}} is used as the {{previousRoot}} 
> for the newly added {{ContentChange}} object.
> ** To achieve that, the ContentChange obj is extended to not only have the 
> "to" {{root}} pointer, but also the "from" {{previousRoot}} pointer which 
> currently is implicitly maintained.
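
The queue handling proposed above can be sketched as follows. This is an illustration only, not the ChangeProcessor code: the class, the Entry type, and the use of plain strings for roots are hypothetical, and the pre-computed listenerAffected flag stands in for the lookup in the commit's list of potentially affected observers. One further assumption goes beyond the description: when a real change arrives, the sketch replaces a trailing NOOP instead of keeping it in the queue.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class NoopQueueSketch {

    /** Hypothetical queue entry carrying both the "from" and the "to" root. */
    static final class Entry {
        final String previousRoot; // "from" state
        final String root;         // "to" state
        final boolean noop;        // true if the listener's filter excluded the commit(s)

        Entry(String previousRoot, String root, boolean noop) {
            this.previousRoot = previousRoot;
            this.root = root;
            this.noop = noop;
        }
    }

    private final Deque<Entry> queue = new ArrayDeque<>();
    private String lastRoot;

    NoopQueueSketch(String initialRoot) {
        this.lastRoot = initialRoot;
    }

    /** Called once per commit; listenerAffected is false when the filter would ignore the commit. */
    void contentChanged(String newRoot, boolean listenerAffected) {
        Entry tail = queue.peekLast();
        boolean tailIsNoop = tail != null && tail.noop;
        if (!listenerAffected) {
            if (tailIsNoop) {
                // Collapse two NOOPs: keep the older "from" root, advance the "to" root.
                queue.removeLast();
                queue.addLast(new Entry(tail.previousRoot, newRoot, true));
            } else {
                queue.addLast(new Entry(lastRoot, newRoot, true));
            }
        } else {
            // The NOOP's root becomes the previousRoot of the real change.
            String from = tailIsNoop ? tail.root : lastRoot;
            if (tailIsNoop) {
                queue.removeLast(); // assumption of this sketch: the NOOP slot is reused
            }
            queue.addLast(new Entry(from, newRoot, false));
        }
        lastRoot = newRoot;
    }

    int size() {
        return queue.size();
    }

    Entry tail() {
        return queue.peekLast();
    }

    public static void main(String[] args) {
        NoopQueueSketch q = new NoopQueueSketch("r0");
        q.contentChanged("r1", false); // NOOP r0 -> r1
        q.contentChanged("r2", false); // collapsed into NOOP r0 -> r2
        q.contentChanged("r3", true);  // real change r2 -> r3
        System.out.println(q.size());              // 1
        System.out.println(q.tail().previousRoot); // r2
        System.out.println(q.tail().root);         // r3
    }
}
```

However many filtered-out commits arrive in a row, they occupy a single queue slot, and the diff for the next real change starts from the last skipped root rather than from a stale one.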





[jira] [Commented] (OAK-9790) Implement parallel indexing for speeding up oak run indexing command

2022-10-17 Thread Julian Reschke (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-9790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17618828#comment-17618828
 ] 

Julian Reschke commented on OAK-9790:
-

I would recommend moving the compression method change into a separate issue 
(and doing real-world benchmarks there as well).

> Implement parallel indexing for speeding up oak run indexing command
> 
>
> Key: OAK-9790
> URL: https://issues.apache.org/jira/browse/OAK-9790
> Project: Jackrabbit Oak
> Issue Type: Story
> Reporter: Jun Zhang
> Priority: Major
>
> Implement parallel indexing for speeding up oak run indexing command
> Indexing is currently single-threaded, which is slow for large repositories. In 
> order to improve indexing performance, we need to implement parallel 
> indexing.
> The work covers both Lucene and Elastic indexing. To support 
> parallel indexing, the large flat file store file has to be split up front, which 
> adds significant overhead but makes parallel indexing possible and much faster.
> Another change included here is support for LZ4 compression, which is much 
> faster than gzip.
>  


