[jira] [Created] (OAK-10804) Indexing job: optimize check for if a node is hidden

2024-05-15 Thread Nuno Santos (Jira)
Nuno Santos created OAK-10804:
-

 Summary: Indexing job: optimize check for if a node is hidden
 Key: OAK-10804
 URL: https://issues.apache.org/jira/browse/OAK-10804
 Project: Jackrabbit Oak
  Issue Type: Improvement
  Components: indexing
Reporter: Nuno Santos


While downloading the repository from Mongo, the indexing job has to discard 
hidden entries. This is being done by a call to 
`NodeStateUtils.isHiddenPath()`. This call is rather expensive, as it creates 
an iterator over the path segments, which requires creating a new string for 
each path segment. As the indexing job has to check every entry to verify if it 
is hidden, this creates a significant overhead.

The implementation of checking for hidden paths can be replaced by a simple 
search for {{"/:"}} in the string representing the path, which requires no 
object allocation and should therefore be much faster.
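
The difference between the two approaches can be sketched as follows. This is an illustration only, not the Oak implementation: the class and method names are hypothetical, and the sketch assumes that paths are absolute and that a segment is hidden when its name starts with ":".

```java
final class HiddenPathCheck {

    // Segment-based check (hypothetical stand-in for the current approach):
    // splitting the path allocates an array plus one String per segment.
    static boolean isHiddenBySegments(String path) {
        for (String segment : path.split("/")) {
            if (segment.startsWith(":")) {
                return true;
            }
        }
        return false;
    }

    // Substring check: in an absolute path every segment follows a '/',
    // so a hidden segment always shows up as "/:" and no objects are allocated.
    static boolean isHiddenBySubstring(String path) {
        return path.indexOf("/:") >= 0;
    }

    public static void main(String[] args) {
        String[] samples = {"/", "/content/page", "/:async", "/oak:index/lucene/:data", "/a:b/c"};
        for (String path : samples) {
            System.out.println(path + " hidden=" + isHiddenBySubstring(path));
        }
    }
}
```

Note that the substring check relies on the path being absolute: a relative path such as ":async" contains no "/:" and would not be detected.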



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (OAK-10804) Indexing job: optimize check for hidden nodes

2024-05-15 Thread Nuno Santos (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nuno Santos updated OAK-10804:
--
Summary: Indexing job: optimize check for hidden nodes  (was: Indexing job: 
optimize check for if a node is hidden)

> Indexing job: optimize check for hidden nodes
> -
>
> Key: OAK-10804
> URL: https://issues.apache.org/jira/browse/OAK-10804
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: indexing
>Reporter: Nuno Santos
>Priority: Minor
>
> While downloading the repository from Mongo, the indexing job has to discard 
> hidden entries. This is being done by a call to 
> `NodeStateUtils.isHiddenPath()`. This call is rather expensive, as it creates 
> an iterator over the path segments, which requires creating a new string for 
> each path segment. As the indexing job has to check every entry to verify if 
> it is hidden, this creates a significant overhead.
> The implementation of checking for hidden paths can be replaced by a simple 
> search for {{"/:"}} in the string representing the path, which requires no 
> object allocation and should therefore be much faster.





[jira] [Updated] (OAK-10804) Indexing job: optimize check for hidden nodes

2024-05-15 Thread Nuno Santos (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nuno Santos updated OAK-10804:
--
Description: 
While downloading the repository from Mongo, the indexing job has to discard 
hidden entries. This is being done by a call to 
{{NodeStateUtils.isHiddenPath()}}. This call is rather expensive, as it 
creates an iterator over the path segments, which requires creating a new 
string for each path segment. As the indexing job has to check every entry to 
verify if it is hidden, this creates a significant overhead.

The implementation of checking for hidden paths can be replaced by a simple 
search for {{"/:"}} in the string representing the path, which requires no 
object allocation and should therefore be much faster.

  was:
While downloading the repository from Mongo, the indexing job has to discard 
hidden entries. This is being done by a call to 
`NodeStateUtils.isHiddenPath()`. This call is rather expensive, as it creates 
an iterator over the path segments, which requires creating a new string for 
each path segment. As the indexing job has to check every entry to verify if it 
is hidden, this creates a significant overhead.

The implementation of checking for hidden paths can be replaced by a simple 
search for {{"/:"}} in the string representing the path, which requires no 
object allocation and should therefore be much faster.


> Indexing job: optimize check for hidden nodes
> -
>
> Key: OAK-10804
> URL: https://issues.apache.org/jira/browse/OAK-10804
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: indexing
>Reporter: Nuno Santos
>Priority: Minor
>
> While downloading the repository from Mongo, the indexing job has to discard 
> hidden entries. This is being done by a call to 
> {{NodeStateUtils.isHiddenPath()}}. This call is rather expensive, as it 
> creates an iterator over the path segments, which requires creating a new 
> string for each path segment. As the indexing job has to check every entry to 
> verify if it is hidden, this creates a significant overhead.
> The implementation of checking for hidden paths can be replaced by a simple 
> search for {{"/:"}} in the string representing the path, which requires no 
> object allocation and should therefore be much faster.





[jira] [Created] (OAK-10805) Build Jackrabbit/jackrabbit-oak-trunk #1472 failed

2024-05-15 Thread Hudson (Jira)
Hudson created OAK-10805:


 Summary: Build Jackrabbit/jackrabbit-oak-trunk #1472 failed
 Key: OAK-10805
 URL: https://issues.apache.org/jira/browse/OAK-10805
 Project: Jackrabbit Oak
  Issue Type: Bug
  Components: continuous integration
Reporter: Hudson


No description is provided

The build Jackrabbit/jackrabbit-oak-trunk #1472 has failed.
First failed run: [Jackrabbit/jackrabbit-oak-trunk 
#1472|https://ci-builds.apache.org/job/Jackrabbit/job/jackrabbit-oak-trunk/1472/]
 [console 
log|https://ci-builds.apache.org/job/Jackrabbit/job/jackrabbit-oak-trunk/1472/console]





[jira] [Resolved] (OAK-10800) DictionaryCompoundWordTokenFilter not supported in Elastic

2024-05-15 Thread Fabrizio Fortino (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fabrizio Fortino resolved OAK-10800.

Fix Version/s: 1.64.0
   Resolution: Fixed

> DictionaryCompoundWordTokenFilter not supported in Elastic
> --
>
> Key: OAK-10800
> URL: https://issues.apache.org/jira/browse/OAK-10800
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: indexing
>Reporter: Fabrizio Fortino
>Assignee: Fabrizio Fortino
>Priority: Minor
> Fix For: 1.64.0
>
>






[jira] [Resolved] (OAK-10788) Indexing job downloader: shutdown gracefully all threads in case of failure

2024-05-15 Thread Nuno Santos (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nuno Santos resolved OAK-10788.
---
Fix Version/s: 1.64.0
   Resolution: Done

> Indexing job downloader: shutdown gracefully all threads in case of failure
> ---
>
> Key: OAK-10788
> URL: https://issues.apache.org/jira/browse/OAK-10788
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: indexing
>Reporter: Nuno Santos
>Priority: Minor
> Fix For: 1.64.0
>
>
> If the download fails, not all of the threads created by the Pipelined 
> strategy are shut down correctly; some of them may be left behind. As they 
> are all daemon threads, they will not prevent the JVM from shutting down. But 
> when they are forcibly stopped at JVM shutdown, they log several exceptions 
> (connections closed abruptly, attempts to access objects that were already 
> closed) that are confusing and distract from the root cause of the problem.
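
A graceful shutdown on failure could look roughly like the sketch below. It assumes the worker threads are managed by an ExecutorService; the class name, the worker, and the timeout are hypothetical and not taken from the Oak code.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class GracefulShutdown {

    // Interrupts all workers and waits up to the timeout for them to exit.
    // Returns true if every worker terminated in time.
    static boolean stopAll(ExecutorService pool, long timeout, TimeUnit unit) {
        pool.shutdownNow(); // interrupts running tasks and drops queued ones
        try {
            return pool.awaitTermination(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return false;
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2, runnable -> {
            Thread thread = new Thread(runnable);
            thread.setDaemon(true); // daemon threads, as in the issue description
            return thread;
        });
        // Hypothetical worker: runs until it is interrupted.
        Runnable worker = () -> {
            try {
                while (true) {
                    Thread.sleep(50);
                }
            } catch (InterruptedException e) {
                // Interrupted: release resources here and exit quietly.
            }
        };
        pool.submit(worker);
        pool.submit(worker);
        try {
            throw new IllegalStateException("simulated download failure");
        } catch (IllegalStateException e) {
            // Stop the workers before reporting the root cause.
            boolean clean = stopAll(pool, 5, TimeUnit.SECONDS);
            System.out.println("workers stopped cleanly: " + clean + ", cause: " + e.getMessage());
        }
    }
}
```

Stopping the workers before the failure is reported keeps the root cause as the last relevant entry in the logs, instead of burying it under exceptions thrown during JVM shutdown.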





[jira] [Created] (OAK-10806) Expose Elastic indexes active status in OakIndexStats

2024-05-15 Thread Nitin Gupta (Jira)
Nitin Gupta created OAK-10806:
-

 Summary: Expose Elastic indexes active status in OakIndexStats
 Key: OAK-10806
 URL: https://issues.apache.org/jira/browse/OAK-10806
 Project: Jackrabbit Oak
  Issue Type: Improvement
Reporter: Nitin Gupta


ES indexes are currently present in the OakIndexStats, but they are all shown 
as active.
We should make sure that only the latest version of each ES index is shown as 
active.





[jira] [Commented] (OAK-10805) Build Jackrabbit/jackrabbit-oak-trunk #1472 failed

2024-05-15 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-10805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17846601#comment-17846601
 ] 

Hudson commented on OAK-10805:
--

Previously failing build now is OK.
 Passed run: [Jackrabbit/jackrabbit-oak-trunk 
#1473|https://ci-builds.apache.org/job/Jackrabbit/job/jackrabbit-oak-trunk/1473/]
 [console 
log|https://ci-builds.apache.org/job/Jackrabbit/job/jackrabbit-oak-trunk/1473/console]

> Build Jackrabbit/jackrabbit-oak-trunk #1472 failed
> --
>
> Key: OAK-10805
> URL: https://issues.apache.org/jira/browse/OAK-10805
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: continuous integration
>Reporter: Hudson
>Priority: Major
>
> No description is provided
> The build Jackrabbit/jackrabbit-oak-trunk #1472 has failed.
> First failed run: [Jackrabbit/jackrabbit-oak-trunk 
> #1472|https://ci-builds.apache.org/job/Jackrabbit/job/jackrabbit-oak-trunk/1472/]
>  [console 
> log|https://ci-builds.apache.org/job/Jackrabbit/job/jackrabbit-oak-trunk/1472/console]





[jira] [Commented] (OAK-10791) Build Jackrabbit/jackrabbit-oak-trunk-java17 #18 failed

2024-05-15 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-10791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17846613#comment-17846613
 ] 

Hudson commented on OAK-10791:
--

Previously failing build now is OK.
 Passed run: [Jackrabbit/jackrabbit-oak-trunk-java17 
#20|https://ci-builds.apache.org/job/Jackrabbit/job/jackrabbit-oak-trunk-java17/20/]
 [console 
log|https://ci-builds.apache.org/job/Jackrabbit/job/jackrabbit-oak-trunk-java17/20/console]

> Build Jackrabbit/jackrabbit-oak-trunk-java17 #18 failed
> ---
>
> Key: OAK-10791
> URL: https://issues.apache.org/jira/browse/OAK-10791
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: continuous integration
>Reporter: Hudson
>Priority: Major
>
> No description is provided
> The build Jackrabbit/jackrabbit-oak-trunk-java17 #18 has failed.
> First failed run: [Jackrabbit/jackrabbit-oak-trunk-java17 
> #18|https://ci-builds.apache.org/job/Jackrabbit/job/jackrabbit-oak-trunk-java17/18/]
>  [console 
> log|https://ci-builds.apache.org/job/Jackrabbit/job/jackrabbit-oak-trunk-java17/18/console]





[jira] [Created] (OAK-10807) Improve FileStoreBuilder to accept URIs for remote repositories

2024-05-15 Thread Andrei Dulceanu (Jira)
Andrei Dulceanu created OAK-10807:
-

 Summary: Improve FileStoreBuilder to accept URIs for remote 
repositories
 Key: OAK-10807
 URL: https://issues.apache.org/jira/browse/OAK-10807
 Project: Jackrabbit Oak
  Issue Type: Story
Reporter: Andrei Dulceanu
Assignee: Andrei Dulceanu


Currently {{FileStoreBuilder}} accepts only {{File}} path arguments in its 
constructor, making it impossible to use in {{oak-run}} tooling for remote 
repositories. It should accept {{URI}}s as well, creating Azure persistence 
backends by default.





[jira] [Commented] (OAK-10805) Build Jackrabbit/jackrabbit-oak-trunk #1472 failed

2024-05-15 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-10805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17846691#comment-17846691
 ] 

Hudson commented on OAK-10805:
--

Previously failing build now is OK.
 Passed run: [Jackrabbit/jackrabbit-oak-trunk 
#1474|https://ci-builds.apache.org/job/Jackrabbit/job/jackrabbit-oak-trunk/1474/]
 [console 
log|https://ci-builds.apache.org/job/Jackrabbit/job/jackrabbit-oak-trunk/1474/console]

> Build Jackrabbit/jackrabbit-oak-trunk #1472 failed
> --
>
> Key: OAK-10805
> URL: https://issues.apache.org/jira/browse/OAK-10805
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: continuous integration
>Reporter: Hudson
>Priority: Major
>
> No description is provided
> The build Jackrabbit/jackrabbit-oak-trunk #1472 has failed.
> First failed run: [Jackrabbit/jackrabbit-oak-trunk 
> #1472|https://ci-builds.apache.org/job/Jackrabbit/job/jackrabbit-oak-trunk/1472/]
>  [console 
> log|https://ci-builds.apache.org/job/Jackrabbit/job/jackrabbit-oak-trunk/1472/console]





[jira] [Commented] (OAK-10778) Indexing job: support parallel download from MongoDB with two connections in Pipelined strategy

2024-05-15 Thread Julian Reschke (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-10778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17846702#comment-17846702
 ] 

Julian Reschke commented on OAK-10778:
--

The new test seems to *require* Mongo to be present. That breaks a few 
assumptions in CI and release management. We usually skip this type of test 
when a MongoDB instance is not available.
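
One way to decide whether such a test should be skipped is to probe the server before running it. The sketch below is only an illustration with a hypothetical class name; it uses a plain TCP connect and assumes MongoDB listens on its default port 27017 on localhost. In a JUnit 4 test the result would typically be passed to org.junit.Assume.assumeTrue so that the test is reported as skipped rather than failed.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

final class MongoAvailability {

    // Returns true if a TCP connection to host:port can be opened within the timeout.
    // This only shows that something is listening, not that it is a healthy MongoDB.
    static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Host and port are assumptions: a local MongoDB on the default port.
        boolean available = isReachable("localhost", 27017, 500);
        System.out.println(available ? "running MongoDB test" : "skipping MongoDB test");
    }
}
```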

> Indexing job: support parallel download from MongoDB with two connections in 
> Pipelined strategy
> ---
>
> Key: OAK-10778
> URL: https://issues.apache.org/jira/browse/OAK-10778
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: indexing
>Reporter: Nuno Santos
>Assignee: Nuno Santos
>Priority: Major
> Fix For: 1.64.0
>
>
> The current version of the Pipelined download strategy uses a single 
> connection/thread to download from MongoDB. We can further increase the 
> download speed by using an additional MongoDB connection. A Mongo deployment 
> has 1 primary and 2 secondaries, so in principle we could have 1 connection 
> to each secondary, effectively doubling the download speed.
> There are a few points to observe:
>  - Connections should go to different secondaries. If both connections go to 
> the same secondary, there's a high chance that they will be limited by what a 
> single replica can provide and that they will overload that replica. So each 
> secondary should have one and only one connection.
>  - How to partition the range of documents to download between two threads. 
> We are already downloading from Mongo in order of {{(_modified, _id)}}. A 
> simple and effective partition strategy for 2 connections is for one to 
> download in ascending and the other in descending order.
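
The ascending/descending partition described above can be sketched as follows. This is a simplified in-memory model, not the Oak implementation: the class name is hypothetical, a sorted list stands in for the documents ordered by (_modified, _id), and the two workers coordinate through shared indexes. With real MongoDB cursors each connection would instead have to detect when it reaches a document that the other connection has already downloaded.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class TwoEndedDownload {
    private final List<String> sortedKeys; // stands in for documents sorted by (_modified, _id)
    private int low;                       // next index for the ascending worker
    private int high;                      // next index for the descending worker

    TwoEndedDownload(List<String> sortedKeys) {
        this.sortedKeys = sortedKeys;
        this.low = 0;
        this.high = sortedKeys.size() - 1;
    }

    // Claims the next key from the low end, or returns null once the two ends have met.
    synchronized String nextAscending() {
        return low <= high ? sortedKeys.get(low++) : null;
    }

    // Claims the next key from the high end, or returns null once the two ends have met.
    synchronized String nextDescending() {
        return low <= high ? sortedKeys.get(high--) : null;
    }

    // Runs one ascending and one descending worker; each key is claimed exactly once.
    static List<String> downloadAll(List<String> sortedKeys) {
        TwoEndedDownload range = new TwoEndedDownload(sortedKeys);
        List<String> downloaded = Collections.synchronizedList(new ArrayList<>());
        Thread ascending = new Thread(() -> {
            for (String key = range.nextAscending(); key != null; key = range.nextAscending()) {
                downloaded.add(key);
            }
        });
        Thread descending = new Thread(() -> {
            for (String key = range.nextDescending(); key != null; key = range.nextDescending()) {
                downloaded.add(key);
            }
        });
        ascending.start();
        descending.start();
        try {
            ascending.join();
            descending.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while downloading", e);
        }
        return downloaded;
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            keys.add("doc-" + i);
        }
        System.out.println("downloaded " + downloadAll(keys).size() + " of " + keys.size());
    }
}
```

Because both workers stop as soon as the two ends meet, no key range has to be computed up front, and a faster worker simply covers a larger share of the documents.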





[jira] [Reopened] (OAK-10778) Indexing job: support parallel download from MongoDB with two connections in Pipelined strategy

2024-05-15 Thread Julian Reschke (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julian Reschke reopened OAK-10778:
--

> Indexing job: support parallel download from MongoDB with two connections in 
> Pipelined strategy
> ---
>
> Key: OAK-10778
> URL: https://issues.apache.org/jira/browse/OAK-10778
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: indexing
>Reporter: Nuno Santos
>Priority: Major
> Fix For: 1.64.0
>
>
> The current version of the Pipelined download strategy uses a single 
> connection/thread to download from MongoDB. We can further increase the 
> download speed by using an additional MongoDB connection. A Mongo deployment 
> has 1 primary and 2 secondaries, so in principle we could have 1 connection 
> to each secondary, effectively doubling the download speed.
> There are a few points to observe:
>  - Connections should go to different secondaries. If both connections go to 
> the same secondary, there's a high chance that they will be limited by what a 
> single replica can provide and that they will overload that replica. So each 
> secondary should have one and only one connection.
>  - How to partition the range of documents to download between two threads. 
> We are already downloading from Mongo in order of {{(_modified, _id)}}. A 
> simple and effective partition strategy for 2 connections is for one to 
> download in ascending and the other in descending order.





[jira] [Assigned] (OAK-10778) Indexing job: support parallel download from MongoDB with two connections in Pipelined strategy

2024-05-15 Thread Julian Reschke (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julian Reschke reassigned OAK-10778:


Assignee: Nuno Santos

> Indexing job: support parallel download from MongoDB with two connections in 
> Pipelined strategy
> ---
>
> Key: OAK-10778
> URL: https://issues.apache.org/jira/browse/OAK-10778
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: indexing
>Reporter: Nuno Santos
>Assignee: Nuno Santos
>Priority: Major
> Fix For: 1.64.0
>
>
> The current version of the Pipelined download strategy uses a single 
> connection/thread to download from MongoDB. We can further increase the 
> download speed by using an additional MongoDB connection. A Mongo deployment 
> has 1 primary and 2 secondaries, so in principle we could have 1 connection 
> to each secondary, effectively doubling the download speed.
> There are a few points to observe:
>  - Connections should go to different secondaries. If both connections go to 
> the same secondary, there's a high chance that they will be limited by what a 
> single replica can provide and that they will overload that replica. So each 
> secondary should have one and only one connection.
>  - How to partition the range of documents to download between two threads. 
> We are already downloading from Mongo in order of {{(_modified, _id)}}. A 
> simple and effective partition strategy for 2 connections is for one to 
> download in ascending and the other in descending order.





[jira] [Commented] (OAK-10805) Build Jackrabbit/jackrabbit-oak-trunk #1472 failed

2024-05-15 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-10805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17846746#comment-17846746
 ] 

Hudson commented on OAK-10805:
--

Previously failing build now is OK.
 Passed run: [Jackrabbit/jackrabbit-oak-trunk 
#1475|https://ci-builds.apache.org/job/Jackrabbit/job/jackrabbit-oak-trunk/1475/]
 [console 
log|https://ci-builds.apache.org/job/Jackrabbit/job/jackrabbit-oak-trunk/1475/console]

> Build Jackrabbit/jackrabbit-oak-trunk #1472 failed
> --
>
> Key: OAK-10805
> URL: https://issues.apache.org/jira/browse/OAK-10805
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: continuous integration
>Reporter: Hudson
>Priority: Major
>
> No description is provided
> The build Jackrabbit/jackrabbit-oak-trunk #1472 has failed.
> First failed run: [Jackrabbit/jackrabbit-oak-trunk 
> #1472|https://ci-builds.apache.org/job/Jackrabbit/job/jackrabbit-oak-trunk/1472/]
>  [console 
> log|https://ci-builds.apache.org/job/Jackrabbit/job/jackrabbit-oak-trunk/1472/console]





[jira] [Updated] (OAK-10787) oak-lucene: backport fix for lucene-core vulnerability

2024-05-15 Thread Julian Reschke (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julian Reschke updated OAK-10787:
-
Labels: candidate_oak_1_22  (was: )

> oak-lucene: backport fix for lucene-core vulnerability
> --
>
> Key: OAK-10787
> URL: https://issues.apache.org/jira/browse/OAK-10787
> Project: Jackrabbit Oak
>  Issue Type: Technical task
>  Components: lucene
>Reporter: Julian Reschke
>Assignee: Julian Reschke
>Priority: Major
>  Labels: candidate_oak_1_22
> Fix For: 1.64.0
>
>






[jira] [Resolved] (OAK-10787) oak-lucene: backport fix for lucene-core vulnerability

2024-05-15 Thread Julian Reschke (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julian Reschke resolved OAK-10787.
--
Resolution: Fixed

> oak-lucene: backport fix for lucene-core vulnerability
> --
>
> Key: OAK-10787
> URL: https://issues.apache.org/jira/browse/OAK-10787
> Project: Jackrabbit Oak
>  Issue Type: Technical task
>  Components: lucene
>Reporter: Julian Reschke
>Assignee: Julian Reschke
>Priority: Major
> Fix For: 1.64.0
>
>






[jira] [Commented] (OAK-10787) oak-lucene: backport fix for lucene-core vulnerability

2024-05-15 Thread Julian Reschke (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-10787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17846797#comment-17846797
 ] 

Julian Reschke commented on OAK-10787:
--

trunk: 
[283a1d7fea|https://github.com/apache/jackrabbit-oak/commit/283a1d7fea23fdceff8dda6e88b059d6990eff09]

> oak-lucene: backport fix for lucene-core vulnerability
> --
>
> Key: OAK-10787
> URL: https://issues.apache.org/jira/browse/OAK-10787
> Project: Jackrabbit Oak
>  Issue Type: Technical task
>  Components: lucene
>Reporter: Julian Reschke
>Assignee: Julian Reschke
>Priority: Major
>  Labels: candidate_oak_1_22
> Fix For: 1.64.0
>
>






[jira] [Resolved] (OAK-10719) oak-lucene uses Lucene version that can throw a StackOverflowException

2024-05-15 Thread Julian Reschke (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-10719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julian Reschke resolved OAK-10719.
--
Resolution: Fixed

> oak-lucene uses Lucene version that can throw a StackOverflowException
> --
>
> Key: OAK-10719
> URL: https://issues.apache.org/jira/browse/OAK-10719
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: lucene
>Reporter: Julian Reschke
>Assignee: Julian Reschke
>Priority: Major
>  Labels: candidate_oak_1_22
> Fix For: 1.64.0
>
>
> See .
> Analysis so far:
> - oak-lucene uses lucene-core (4.7.2) (see OAK-10716); that version reached 
> EOL a long time ago
> - the lucene version can in some cases throw a StackOverflowException, see 
> OAK-10713
> - oak-lucene *embeds* and *exports* lucene-core
> - update to version >= 4.8 non-trivial due to backwards compat breakage
> Work in :
> - inlined lucene-core as of git tag "releases/lucene-solr/4.7.2" into 
> oak-lucene
> - fixed two JDK11 compile issues (potentially uninitialized vars in finally 
> block) 
> - backported fix from https://github.com/apache/lucene/issues/11537
> - enable test added in OAK-10713
> - ran Oak integration tests
> Open questions:
> - Lucene 4.7.2 builds with ant/ivy - does it make sense to try to replicate 
> that
> - should we ask Lucene team for a public release (might be hard sell)
> - alternatively, as tried here, inline source code into oak-lucene (maybe add 
> explainers to all source files)
> - do we need to adopt the lucene test suite as well?
> - lucene-core dependencies in other Oak modules to be checked (seems mostly 
> for tests, or for run modules)





[jira] [Commented] (OAK-10719) oak-lucene uses Lucene version that can throw a StackOverflowException

2024-05-15 Thread Julian Reschke (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-10719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17846798#comment-17846798
 ] 

Julian Reschke commented on OAK-10719:
--

Two subtasks resolved: OAK-10786 and OAK-10787.

> oak-lucene uses Lucene version that can throw a StackOverflowException
> --
>
> Key: OAK-10719
> URL: https://issues.apache.org/jira/browse/OAK-10719
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: lucene
>Reporter: Julian Reschke
>Assignee: Julian Reschke
>Priority: Major
>  Labels: candidate_oak_1_22
> Fix For: 1.64.0
>
>
> See .
> Analysis so far:
> - oak-lucene uses lucene-core (4.7.2) (see OAK-10716); that version reached 
> EOL a long time ago
> - the lucene version can in some cases throw a StackOverflowException, see 
> OAK-10713
> - oak-lucene *embeds* and *exports* lucene-core
> - update to version >= 4.8 non-trivial due to backwards compat breakage
> Work in :
> - inlined lucene-core as of git tag "releases/lucene-solr/4.7.2" into 
> oak-lucene
> - fixed two JDK11 compile issues (potentially uninitialized vars in finally 
> block) 
> - backported fix from https://github.com/apache/lucene/issues/11537
> - enable test added in OAK-10713
> - ran Oak integration tests
> Open questions:
> - Lucene 4.7.2 builds with ant/ivy - does it make sense to try to replicate 
> that
> - should we ask Lucene team for a public release (might be hard sell)
> - alternatively, as tried here, inline source code into oak-lucene (maybe add 
> explainers to all source files)
> - do we need to adopt the lucene test suite as well?
> - lucene-core dependencies in other Oak modules to be checked (seems mostly 
> for tests, or for run modules)





[jira] [Created] (OAK-10808) PipelinedMongoConnectionFailureIT should not fail if Mongo is not available

2024-05-15 Thread Nuno Santos (Jira)
Nuno Santos created OAK-10808:
-

 Summary: PipelinedMongoConnectionFailureIT should not fail if 
Mongo is not available
 Key: OAK-10808
 URL: https://issues.apache.org/jira/browse/OAK-10808
 Project: Jackrabbit Oak
  Issue Type: Improvement
  Components: indexing
Reporter: Nuno Santos








[jira] [Commented] (OAK-10778) Indexing job: support parallel download from MongoDB with two connections in Pipelined strategy

2024-05-15 Thread Nuno Santos (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-10778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17846831#comment-17846831
 ] 

Nuno Santos commented on OAK-10778:
---

Fix here:
[https://github.com/apache/jackrabbit-oak/pull/1463]

> Indexing job: support parallel download from MongoDB with two connections in 
> Pipelined strategy
> ---
>
> Key: OAK-10778
> URL: https://issues.apache.org/jira/browse/OAK-10778
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: indexing
>Reporter: Nuno Santos
>Assignee: Nuno Santos
>Priority: Major
> Fix For: 1.64.0
>
>
> The current version of the Pipelined download strategy uses a single 
> connection/thread to download from MongoDB. We can further increase the 
> download speed by using an additional MongoDB connection. A Mongo deployment 
> has 1 primary and 2 secondaries, so in principle we could have 1 connection 
> to each secondary, effectively doubling the download speed.
> There are a few points to observe:
>  - Connections should go to different secondaries. If both connections go to 
> the same secondary, there's a high chance that they will be limited by what a 
> single replica can provide and that they will overload that replica. So each 
> secondary should have one and only one connection.
>  - How to partition the range of documents to download between two threads. 
> We are already downloading from Mongo in order of {{(_modified, _id)}}. A 
> simple and effective partition strategy for 2 connections is for one to 
> download in ascending and the other in descending order.


