[jira] [Created] (OAK-9671) Increase dynamicBoost and dynamicBoostLite full-text test coverage

2022-01-18 Thread Jun Zhang (Jira)
Jun Zhang created OAK-9671:
--

 Summary: Increase dynamicBoost and dynamicBoostLite full-text test 
coverage
 Key: OAK-9671
 URL: https://issues.apache.org/jira/browse/OAK-9671
 Project: Jackrabbit Oak
  Issue Type: Task
  Components: elastic-search, lucene
Reporter: Jun Zhang


dynamicBoost and dynamicBoostLite have limited full-text capabilities. The 
Elastic implementation of dynamicBoost offers some full-text capability without 
affecting the index size.

In general, this feature has good test coverage for the indexing part but 
only very basic tests around the queries. The reason is that in Lucene the 
query logic is not part of Oak; it resides in an external component not 
owned by the indexing team.

The goal of this task is to:

1) improve unit tests for dynamicBoostLite (this can be done for all index 
types)

2) improve full-text unit tests for dynamicBoost in Elastic. Compared to 
Lucene, we have more flexibility since there are no dependencies on external 
code.

Once 1 is implemented, we can potentially improve full-text support for 
dynamicBoostLite using more sophisticated queries (currently a simple Term 
query is used).



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (OAK-9668) Update H2DB dependency

2022-01-18 Thread Stefan Egli (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-9668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17478082#comment-17478082
 ] 

Stefan Egli commented on OAK-9668:
--

* trunk PR merged : https://github.com/apache/jackrabbit-oak/pull/466
* 1.22 backport PR created : https://github.com/apache/jackrabbit-oak/pull/468

> Update H2DB dependency
> --
>
> Key: OAK-9668
> URL: https://issues.apache.org/jira/browse/OAK-9668
> Project: Jackrabbit Oak
>  Issue Type: Task
>  Components: documentmk, parent
>Reporter: Julian Reschke
>Assignee: Stefan Egli
>Priority: Major
>  Labels: candidate_oak_1_22
> Fix For: 1.44.0
>
>






[jira] [Commented] (OAK-9665) Unparseable date property causes entire node to fail indexing

2022-01-18 Thread Thomas Mueller (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-9665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17478014#comment-17478014
 ] 

Thomas Mueller commented on OAK-9665:
-

[~angela.fabregues] actually both: all the other fields in this document are 
indexed. And all other documents are indexed.

That is, if I understand correctly (I didn't test it myself).

For Elastic, I don't know whether currently just this document isn't indexed, or 
all documents in this batch are not indexed. I _assume_ right now only the 
current document is affected. But I don't really know, I'm afraid.

> Unparseable date property causes entire node to fail indexing
> -
>
> Key: OAK-9665
> URL: https://issues.apache.org/jira/browse/OAK-9665
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: elastic-search, indexing
>Reporter: Thomas Mueller
>Priority: Major
>
> If the index definition defines a property as a Date, but the value is not in 
> the appropriate Date format, then indexing will (partially) fail.
> The behaviour in this situation is different between Lucene and Elastic:
> * With a Lucene index, WARN [1] is logged but the rest is indexed.
> * With Elastic index, ERROR [2] is logged and no document is created.
> [1]
> {noformat}
> Ignoring ordered property. Could not convert property  ... of type STRING to 
> type DATE for path ... 
> java.lang.NullPointerException: null
>   at 
> org.apache.jackrabbit.oak.plugins.index.lucene.FieldFactory.dateToLong(FieldFactory.java:186)
>  [org.apache.jackrabbit.oak-lucene:1.8.24]
>   at 
> org.apache.jackrabbit.oak.plugins.index.lucene.LuceneDocumentMaker.addTypedOrderedFields(LuceneDocumentMaker.java:385)
>  [org.apache.jackrabbit.oak-lucene:1.8.24]
>   at 
> org.apache.jackrabbit.oak.plugins.index.lucene.LuceneDocumentMaker.access$100(LuceneDocumentMaker.java:67)
>  [org.apache.jackrabbit.oak-lucene:1.8.24]
>   at 
> org.apache.jackrabbit.oak.plugins.index.lucene.LuceneDocumentMaker$1.onResult(LuceneDocumentMaker.java:590)
>  [org.apache.jackrabbit.oak-lucene:1.8.24]
>   at 
> org.apache.jackrabbit.oak.plugins.index.lucene.Aggregate$PropertyInclude.collectResults(Aggregate.java:396)
>  [org.apache.jackrabbit.oak-lucene:1.8.24]
> {noformat}
> [2]
> {noformat}
> 00:49:19.521 [I/O dispatcher 1] ERROR 
> o.a.j.o.p.i.e.i.ElasticBulkProcessorHandler - Failure Details: BulkItem ID: 
> ..., Failure Cause: {}
> org.elasticsearch.ElasticsearchException: Elasticsearch exception 
> [type=mapper_parsing_exception, reason=failed to parse field [...] of type 
> [date] in document with id '...'. Preview of field's value: '2021-09-01 
> 00:01']
>   at 
> org.elasticsearch.ElasticsearchException.innerFromXContent(ElasticsearchException.java:496)
>   at 
> org.elasticsearch.ElasticsearchException.fromXContent(ElasticsearchException.java:407)
>   at 
> org.elasticsearch.action.bulk.BulkItemResponse.fromXContent(BulkItemResponse.java:139)
>   at 
> org.elasticsearch.action.bulk.BulkResponse.fromXContent(BulkResponse.java:188)
>   at 
> org.elasticsearch.client.RestHighLevelClient.parseEntity(RestHighLevelClient.java:1911)
> {noformat}
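
The Lucene behaviour described above (log a WARN, skip the bad field, index the rest) can be sketched in plain Java. This is only an illustration under stated assumptions, not Oak code: the class LenientDateFields and its methods are hypothetical, and the sketch assumes that an "appropriate" date is an ISO-8601 instant, which the value '2021-09-01 00:01' from log [2] is not.

```java
import java.time.Instant;
import java.time.format.DateTimeParseException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper, not part of Oak: drops a malformed date field
// instead of rejecting the whole document.
class LenientDateFields {

    // Returns epoch milliseconds if the value is an ISO-8601 instant,
    // otherwise an empty Optional (assumption: ISO-8601 is the expected format).
    static Optional<Long> parseDate(String value) {
        if (value == null) {
            return Optional.empty();
        }
        try {
            return Optional.of(Instant.parse(value).toEpochMilli());
        } catch (DateTimeParseException e) {
            return Optional.empty();
        }
    }

    // Copies all properties into the document. The date property is converted
    // to a long; if it cannot be parsed, only that field is skipped.
    static Map<String, Object> toDocument(Map<String, String> props, String dateProp) {
        Map<String, Object> doc = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : props.entrySet()) {
            if (e.getKey().equals(dateProp)) {
                Optional<Long> millis = parseDate(e.getValue());
                if (millis.isPresent()) {
                    doc.put(e.getKey(), millis.get());
                } else {
                    System.err.println("WARN: ignoring unparseable date property " + e.getKey());
                }
            } else {
                doc.put(e.getKey(), e.getValue());
            }
        }
        return doc;
    }
}
```

With this approach a node whose date value is malformed still produces a document containing its other fields, which is the per-field behaviour the description attributes to Lucene.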





[jira] [Assigned] (OAK-9668) Update H2DB dependency

2022-01-18 Thread Stefan Egli (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-9668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Egli reassigned OAK-9668:


Assignee: Stefan Egli  (was: Marcel Reutegger)

> Update H2DB dependency
> --
>
> Key: OAK-9668
> URL: https://issues.apache.org/jira/browse/OAK-9668
> Project: Jackrabbit Oak
>  Issue Type: Task
>  Components: documentmk, parent
>Reporter: Julian Reschke
>Assignee: Stefan Egli
>Priority: Major
>  Labels: candidate_oak_1_22
> Fix For: 1.44.0
>
>






[jira] [Commented] (OAK-9670) Log an WARN when a fulltext query cannot find an appropriate index

2022-01-18 Thread Thomas Mueller (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-9670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17477960#comment-17477960
 ] 

Thomas Mueller commented on OAK-9670:
-

Merged on 2022-01-18

> Log an WARN when a fulltext query cannot find an appropriate index
> --
>
> Key: OAK-9670
> URL: https://issues.apache.org/jira/browse/OAK-9670
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: query
>Reporter: Tom Blackford
>Assignee: Thomas Mueller
>Priority: Major
> Fix For: 1.44.0
>
>
> Fulltext queries cannot be handled by traversal, so we need to highlight 
> prominently when a fulltext query has been issued but no appropriate index 
> can be found.
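
A minimal sketch of the check described above, using java.util.logging. The class FulltextIndexCheck, its method, and the message text are hypothetical and are not taken from the Oak change; the sketch assumes the caller already knows whether the query is a fulltext query and which indexes could serve it.

```java
import java.util.List;
import java.util.Optional;
import java.util.logging.Logger;

// Hypothetical sketch; class and method names are not taken from Oak.
class FulltextIndexCheck {
    private static final Logger LOG = Logger.getLogger(FulltextIndexCheck.class.getName());

    // Logs a warning and returns its text when a fulltext query has no index
    // that can serve it; returns an empty Optional in every other case.
    static Optional<String> check(String statement, boolean isFulltext, List<String> candidateIndexes) {
        if (isFulltext && candidateIndexes.isEmpty()) {
            String msg = "Fulltext query without a suitable index: " + statement;
            LOG.warning(msg);
            return Optional.of(msg);
        }
        return Optional.empty();
    }
}
```

Returning the message as well as logging it keeps the check easy to unit test without capturing log output.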





[jira] [Updated] (OAK-9670) Log an WARN when a fulltext query cannot find an appropriate index

2022-01-18 Thread Tom Blackford (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-9670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tom Blackford updated OAK-9670:
---
Summary: Log an WARN when a fulltext query cannot find an appropriate index 
 (was: Log an ERROR when a fulltext query cannot find an appropriate index)

> Log an WARN when a fulltext query cannot find an appropriate index
> --
>
> Key: OAK-9670
> URL: https://issues.apache.org/jira/browse/OAK-9670
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: query
>Reporter: Tom Blackford
>Assignee: Thomas Mueller
>Priority: Major
> Fix For: 1.44.0
>
>
> Fulltext queries cannot be handled by traversal, so we need to highlight 
> prominently when a fulltext query has been issued but no appropriate index 
> can be found.





[jira] [Commented] (OAK-9670) Log an ERROR when a fulltext query cannot find an appropriate index

2022-01-18 Thread Tom Blackford (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-9670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17477944#comment-17477944
 ] 

Tom Blackford commented on OAK-9670:


PR raised https://github.com/apache/jackrabbit-oak/pull/467

cc [~thomasm]

> Log an ERROR when a fulltext query cannot find an appropriate index
> ---
>
> Key: OAK-9670
> URL: https://issues.apache.org/jira/browse/OAK-9670
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: query
>Reporter: Tom Blackford
>Assignee: Thomas Mueller
>Priority: Major
> Fix For: 1.44.0
>
>
> Fulltext queries cannot be handled by traversal, so we need to highlight 
> prominently when a fulltext query has been issued but no appropriate index 
> can be found.





[jira] [Updated] (OAK-9670) Log an ERROR when a fulltext query cannot find an appropriate index

2022-01-18 Thread Tom Blackford (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-9670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tom Blackford updated OAK-9670:
---
Description: Fulltext queries cannot be handled by traversal, so we need to 
highlight prominently when a fulltext query has been issued but no appropriate 
index can be found.  (was: Fulltext queries cannot be handled by traversal, so 
we need to highlight prominently )

> Log an ERROR when a fulltext query cannot find an appropriate index
> ---
>
> Key: OAK-9670
> URL: https://issues.apache.org/jira/browse/OAK-9670
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: query
>Reporter: Tom Blackford
>Assignee: Thomas Mueller
>Priority: Major
> Fix For: 1.44.0
>
>
> Fulltext queries cannot be handled by traversal, so we need to highlight 
> prominently when a fulltext query has been issued but no appropriate index 
> can be found.





[jira] [Updated] (OAK-9670) Log an ERROR when a fulltext query cannot find an appropriate index

2022-01-18 Thread Tom Blackford (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-9670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tom Blackford updated OAK-9670:
---
Summary: Log an ERROR when a fulltext query cannot find an appropriate 
index  (was: Log a Warning when a fulltext query cannot find an appropriate 
index)

> Log an ERROR when a fulltext query cannot find an appropriate index
> ---
>
> Key: OAK-9670
> URL: https://issues.apache.org/jira/browse/OAK-9670
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: query
>Reporter: Tom Blackford
>Assignee: Thomas Mueller
>Priority: Major
> Fix For: 1.44.0
>
>
> Fulltext queries cannot be handled by traversal, so we need to highlight 
> prominently 





[jira] [Updated] (OAK-9670) Log a Warning when a fulltext query cannot find an appropriate index

2022-01-18 Thread Tom Blackford (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-9670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tom Blackford updated OAK-9670:
---
Description: Fulltext queries cannot be handled by traversal, so we need to 
highlight prominently   (was: Oak only considers a query as slow if it scans 
over 100'000 nodes.

We can change the limit to 5000.)

> Log a Warning when a fulltext query cannot find an appropriate index
> 
>
> Key: OAK-9670
> URL: https://issues.apache.org/jira/browse/OAK-9670
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: query
>Reporter: Tom Blackford
>Assignee: Thomas Mueller
>Priority: Major
> Fix For: 1.44.0
>
>
> Fulltext queries cannot be handled by traversal, so we need to highlight 
> prominently 





[jira] [Created] (OAK-9670) Log a Warning when a fulltext query cannot find an appropriate index

2022-01-18 Thread Tom Blackford (Jira)
Tom Blackford created OAK-9670:
--

 Summary: Log a Warning when a fulltext query cannot find an 
appropriate index
 Key: OAK-9670
 URL: https://issues.apache.org/jira/browse/OAK-9670
 Project: Jackrabbit Oak
  Issue Type: Improvement
  Components: query
Reporter: Tom Blackford
Assignee: Thomas Mueller
 Fix For: 1.44.0


Oak only considers a query as slow if it scans over 100'000 nodes.

We can change the limit to 5000.





[jira] [Commented] (OAK-9662) Perform inequality matches in Lucene+Elastic, rather than just in the query engine

2022-01-18 Thread Nitin Gupta (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-9662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17477762#comment-17477762
 ] 

Nitin Gupta commented on OAK-9662:
--

PR : [https://github.com/apache/jackrabbit-oak/pull/465] 

> Perform inequality matches in Lucene+Elastic, rather than just in the query 
> engine
> --
>
> Key: OAK-9662
> URL: https://issues.apache.org/jira/browse/OAK-9662
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>Reporter: Nitin Gupta
>Assignee: Nitin Gupta
>Priority: Major
>
> Currently it appears that inequality matches are never fully performed inside 
> Lucene (whereas obviously equality matches ARE).
>  
> An inequality query such as 
> {code:java}
> /jcr:root/test/a//element(*, nt:base)[testProp != 'sample']  {code}
> yields a plan that includes a Lucene query like the one below, effectively 
> matching documents with _any_ value of this field and relying on the Query 
> Engine to keep only those which are not equal to the passed value.
> {code:java}
> +:ancestors:/test/a +testProp:[* TO *] {code}
> And then all the results are fetched and the inequality condition is 
> satisfied by QueryEngine - this leads to slower queries and even 
> IndexTraversed read limit being exceeded in case of large repositories.
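
One way to push the inequality into the index is to add a prohibited clause for the excluded value next to the existing range clause. The sketch below only builds such a query string in Lucene's classic query syntax. It is a hypothetical illustration: the class InequalityQuerySketch is not part of Oak, and the PR may implement the condition differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is not Oak's query planner.
class InequalityQuerySketch {

    // Builds a Lucene-syntax query string for "property != value" below a path:
    // the property must exist ([* TO *]) and must not match the excluded value.
    static String notEquals(String ancestorPath, String property, String excluded) {
        List<String> clauses = new ArrayList<>();
        clauses.add("+:ancestors:" + ancestorPath);
        clauses.add("+" + property + ":[* TO *]");
        clauses.add("-" + property + ":" + excluded);
        return String.join(" ", clauses);
    }
}
```

For the example query in the description, notEquals("/test/a", "testProp", "sample") returns "+:ancestors:/test/a +testProp:[* TO *] -testProp:sample", so the index itself would exclude documents whose testProp equals 'sample'.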





[jira] [Assigned] (OAK-9662) Perform inequality matches in Lucene+Elastic, rather than just in the query engine

2022-01-18 Thread Nitin Gupta (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-9662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nitin Gupta reassigned OAK-9662:


Assignee: Nitin Gupta

> Perform inequality matches in Lucene+Elastic, rather than just in the query 
> engine
> --
>
> Key: OAK-9662
> URL: https://issues.apache.org/jira/browse/OAK-9662
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>Reporter: Nitin Gupta
>Assignee: Nitin Gupta
>Priority: Major
>
> Currently it appears that inequality matches are never fully performed inside 
> Lucene (whereas obviously equality matches ARE).
>  
> An inequality query such as 
> {code:java}
> /jcr:root/test/a//element(*, nt:base)[testProp != 'sample']  {code}
> yields a plan that includes a Lucene query like the one below, effectively 
> matching documents with _any_ value of this field and relying on the Query 
> Engine to keep only those which are not equal to the passed value.
> {code:java}
> +:ancestors:/test/a +testProp:[* TO *] {code}
> And then all the results are fetched and the inequality condition is 
> satisfied by QueryEngine - this leads to slower queries and even 
> IndexTraversed read limit being exceeded in case of large repositories.





[jira] [Updated] (OAK-9669) Update commons-io dependency to 2.11.0

2022-01-18 Thread Julian Reschke (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-9669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julian Reschke updated OAK-9669:

Fix Version/s: 1.44.0

> Update commons-io dependency to 2.11.0
> --
>
> Key: OAK-9669
> URL: https://issues.apache.org/jira/browse/OAK-9669
> Project: Jackrabbit Oak
>  Issue Type: Task
>  Components: parent
>Affects Versions: 1.42.0
>Reporter: Julian Reschke
>Priority: Major
> Fix For: 1.44.0
>
>






[jira] [Created] (OAK-9669) Update commons-io dependency to 2.11.0

2022-01-18 Thread Julian Reschke (Jira)
Julian Reschke created OAK-9669:
---

 Summary: Update commons-io dependency to 2.11.0
 Key: OAK-9669
 URL: https://issues.apache.org/jira/browse/OAK-9669
 Project: Jackrabbit Oak
  Issue Type: Task
  Components: parent
Affects Versions: 1.42.0
Reporter: Julian Reschke








[jira] [Commented] (OAK-9665) Unparseable date property causes entire node to fail indexing

2022-01-18 Thread Angela Fabregues (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-9665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17477677#comment-17477677
 ] 

Angela Fabregues commented on OAK-9665:
---

Hi [~thomasm], I need some help to understand the expected behaviour.

By "{_}With a Lucene index, WARN [1] is logged but +the rest+ is indexed{_}", 
do you mean (A) that the rest of the fields in the document are indexed? Or are 
you referring to (B) the rest of the documents that come after in the bulk 
indexing?

> Unparseable date property causes entire node to fail indexing
> -
>
> Key: OAK-9665
> URL: https://issues.apache.org/jira/browse/OAK-9665
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: elastic-search, indexing
>Reporter: Thomas Mueller
>Priority: Major
>
> If the index definition defines a property as a Date, but the value is not in 
> the appropriate Date format, then indexing will (partially) fail.
> The behaviour in this situation is different between Lucene and Elastic:
> * With a Lucene index, WARN [1] is logged but the rest is indexed.
> * With Elastic index, ERROR [2] is logged and no document is created.
> [1]
> {noformat}
> Ignoring ordered property. Could not convert property  ... of type STRING to 
> type DATE for path ... 
> java.lang.NullPointerException: null
>   at 
> org.apache.jackrabbit.oak.plugins.index.lucene.FieldFactory.dateToLong(FieldFactory.java:186)
>  [org.apache.jackrabbit.oak-lucene:1.8.24]
>   at 
> org.apache.jackrabbit.oak.plugins.index.lucene.LuceneDocumentMaker.addTypedOrderedFields(LuceneDocumentMaker.java:385)
>  [org.apache.jackrabbit.oak-lucene:1.8.24]
>   at 
> org.apache.jackrabbit.oak.plugins.index.lucene.LuceneDocumentMaker.access$100(LuceneDocumentMaker.java:67)
>  [org.apache.jackrabbit.oak-lucene:1.8.24]
>   at 
> org.apache.jackrabbit.oak.plugins.index.lucene.LuceneDocumentMaker$1.onResult(LuceneDocumentMaker.java:590)
>  [org.apache.jackrabbit.oak-lucene:1.8.24]
>   at 
> org.apache.jackrabbit.oak.plugins.index.lucene.Aggregate$PropertyInclude.collectResults(Aggregate.java:396)
>  [org.apache.jackrabbit.oak-lucene:1.8.24]
> {noformat}
> [2]
> {noformat}
> 00:49:19.521 [I/O dispatcher 1] ERROR 
> o.a.j.o.p.i.e.i.ElasticBulkProcessorHandler - Failure Details: BulkItem ID: 
> ..., Failure Cause: {}
> org.elasticsearch.ElasticsearchException: Elasticsearch exception 
> [type=mapper_parsing_exception, reason=failed to parse field [...] of type 
> [date] in document with id '...'. Preview of field's value: '2021-09-01 
> 00:01']
>   at 
> org.elasticsearch.ElasticsearchException.innerFromXContent(ElasticsearchException.java:496)
>   at 
> org.elasticsearch.ElasticsearchException.fromXContent(ElasticsearchException.java:407)
>   at 
> org.elasticsearch.action.bulk.BulkItemResponse.fromXContent(BulkItemResponse.java:139)
>   at 
> org.elasticsearch.action.bulk.BulkResponse.fromXContent(BulkResponse.java:188)
>   at 
> org.elasticsearch.client.RestHighLevelClient.parseEntity(RestHighLevelClient.java:1911)
> {noformat}


