[jira] [Resolved] (LUCENE-9225) Rectangle should extend LatLonGeometry
[ https://issues.apache.org/jira/browse/LUCENE-9225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ignacio Vera resolved LUCENE-9225. -- Fix Version/s: 8.5 Assignee: Ignacio Vera Resolution: Fixed > Rectangle should extend LatLonGeometry > -- > > Key: LUCENE-9225 > URL: https://issues.apache.org/jira/browse/LUCENE-9225 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Ignacio Vera >Assignee: Ignacio Vera >Priority: Minor > Fix For: 8.5 > > Time Spent: 1h > Remaining Estimate: 0h > > Rectangle is the only geometry class that does not extend LatLonGeometry. > This is because we have a specialised query for rectangles that works on the > encoded space (very similar to what LatLonPoint does). > It would be nice if Rectangle could implement LatLonGeometry, so that in cases > where a bounding box is part of a complex geometry, it can fall back to > Component2D objects. > The idea is to move the specialised logic in Rectangle2D into the > specialised LatLonBoundingBoxQuery and rename the current XYRectangle2D to > Rectangle2D. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-13942) /api/cluster/zk/* to fetch raw ZK data
[ https://issues.apache.org/jira/browse/SOLR-13942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049965#comment-17049965 ] Jan Høydahl commented on SOLR-13942: {quote}I feel like I'm talking to a wall. The comment I posted 20 mins back just addresses these 2 questions{quote} When I answered, your comment said the following: {quote}bin/solr is not something everyone has everywhere. People have curl & wget & a browser everywhere. If I want to script something up and read data in ZK there is no alternative. I would like to merge it soon{quote} Since then you have edited it to address the security question. I am still not convinced - ZK internals should not be part of our public API. ZK contains lots of internals. Please give some good concrete examples of what you'd use it for. Perhaps that will reveal real user needs that call for new managed Solr APIs? > /api/cluster/zk/* to fetch raw ZK data > -- > > Key: SOLR-13942 > URL: https://issues.apache.org/jira/browse/SOLR-13942 > Project: Solr > Issue Type: Bug >Reporter: Noble Paul >Assignee: Noble Paul >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > example: > download the {{state.json}} of a collection > {code} > GET http://localhost:8983/api/cluster/zk/collections/gettingstarted/state.json > {code} > get a list of all children under {{/live_nodes}} > {code} > GET http://localhost:8983/api/cluster/zk/live_nodes > {code} > If the requested path is a node with children, show the list of child nodes > and their metadata
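Since the proposed endpoint is a plain HTTP GET, the curl/wget argument applies equally to any stock HTTP library. A minimal sketch using Java's built-in HttpClient, assuming the default local host and port from the issue's examples; the `ZkFetch` class and `zkPath` helper are our own illustration, not part of Solr:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ZkFetch {
    // Builds a request URI for the proposed /api/cluster/zk/* endpoint.
    static URI zkPath(String base, String znode) {
        return URI.create(base + "/api/cluster/zk" + znode);
    }

    public static void main(String[] args) {
        URI uri = zkPath("http://localhost:8983", "/collections/gettingstarted/state.json");
        HttpRequest req = HttpRequest.newBuilder(uri).GET().build();
        System.out.println(req.uri());
        // The actual fetch needs a running Solr node with this endpoint merged,
        // so it is left commented out here:
        // HttpResponse<String> rsp =
        //     HttpClient.newHttpClient().send(req, HttpResponse.BodyHandlers.ofString());
        // System.out.println(rsp.body());
    }
}
```

The sketch only builds and prints the request URI; uncommenting the `send` call would return the raw znode content (or, per the description, a child listing when the path has children).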
[jira] [Commented] (LUCENE-9225) Rectangle should extend LatLonGeometry
[ https://issues.apache.org/jira/browse/LUCENE-9225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049950#comment-17049950 ] ASF subversion and git services commented on LUCENE-9225: - Commit b8bfccebf6dde5d2ee38f42a2509629564ae9b0f in lucene-solr's branch refs/heads/branch_8x from Ignacio Vera [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=b8bfcce ] LUCENE-9225: Rectangle extends LatLonGeometry so it can be used in a geometry collection (#1258) # Conflicts: # lucene/core/src/java/org/apache/lucene/geo/Rectangle2D.java
[jira] [Commented] (SOLR-14300) Some conditional clauses on unindexed field will be ignored by query parser in some specific cases
[ https://issues.apache.org/jira/browse/SOLR-14300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049940#comment-17049940 ] KOUTA SETSU commented on SOLR-14300: our team did a little research, and we wonder if this is a bug in SolrQueryParser. More specifically, we think the if statement here might be wrong. {code:java} // If this field isn't indexed, or if it is indexed and we want to use TermsQuery, then collect this value. // We are currently relying on things like PointField not being marked as indexed in order to bypass // the "useTermQuery" check. if ((fieldValues == null && useTermsQuery) || !sfield.indexed()) { fieldValues = new ArrayList<>(2); fmap.put(sfield, fieldValues); } {code} [https://github.com/apache/lucene-solr/blob/branch_8_4/solr/core/src/java/org/apache/solr/parser/SolrQueryParserBase.java#L711] This if statement will cause fieldValues to be overwritten if sfield is not indexed. We will do more research and try to fix this problem, but we need a little more time because we need our company's permission. Thanks. > Some conditional clauses on unindexed field will be ignored by query parser > in some specific cases > -- > > Key: SOLR-14300 > URL: https://issues.apache.org/jira/browse/SOLR-14300 > Project: Solr > Issue Type: Bug > Security Level: Public (Default Security Level. Issues are Public) > Components: query parsers >Affects Versions: 7.3.1 > Environment: Solr 7.3.1 > centos7.5 >Reporter: KOUTA SETSU >Priority: Minor > > In some specific cases, conditional clauses on an unindexed field will be > ignored. > * For a query like q=A:1 OR B:1 OR A:2 OR B:2, > if field B is not indexed (but docValues="true"), "B:1" will be lost. > > * But if you write the query as q=A:1 OR A:2 OR B:1 OR B:2, > it will work perfectly. > The only difference between the two queries is that the clauses are written in different orders: > one is *ABAB*, the other is *AABB*. > > *Steps to reproduce* > You can easily reproduce this problem on a Solr collection with the _default > configset and exampledocs/books.csv data. > # Create a _default collection: > {code:java} > bin/solr create -c books -s 2 -rf 2{code} > # Post books.csv: > {code:java} > bin/post -c books example/exampledocs/books.csv{code} > # Run the following queries: > ** query1: > [http://localhost:8983/solr/books/select?q=+(name_str:Foundation+OR+cat:book+OR+name_str:Jhereg+OR+cat:cd)&debug=query] > ** query2: > [http://localhost:8983/solr/books/select?q=+(name_str:Foundation+OR+name_str:Jhereg+OR+cat:book+OR+cat:cd)&debug=query] > ** Then you can see that the parsed queries are different. > *** query1 ("name_str:Foundation" is lost): > {code:json} > "debug":{ > "rawquerystring":"+(name_str:Foundation OR cat:book OR name_str:Jhereg > OR cat:cd)", > "querystring":"+(name_str:Foundation OR cat:book OR name_str:Jhereg OR > cat:cd)", > "parsedquery":"+(cat:book cat:cd (name_str:[[4a 68 65 72 65 67] TO [4a > 68 65 72 65 67]]))", > "parsedquery_toString":"+(cat:book cat:cd name_str:[[4a 68 65 72 65 67] > TO [4a 68 65 72 65 67]])", > "QParser":"LuceneQParser"}}{code} > *** query2 ("name_str:Foundation" isn't lost): > {code:json} > "debug":{ > "rawquerystring":"+(name_str:Foundation OR name_str:Jhereg OR cat:book > OR cat:cd)", > "querystring":"+(name_str:Foundation OR name_str:Jhereg OR cat:book OR > cat:cd)", > "parsedquery":"+(cat:book cat:cd ((name_str:[[46 6f 75 6e 64 61 74 69 6f > 6e] TO [46 6f 75 6e 64 61 74 69 6f 6e]]) (name_str:[[4a 68 65 72 65 67] TO > [4a 68 65 72 65 67]])))", > "parsedquery_toString":"+(cat:book cat:cd (name_str:[[46 6f 75 6e 64 61 > 74 69 6f 6e] TO [46 6f 75 6e 64 61 74 69 6f 6e]] name_str:[[4a 68 65 72 65 > 67] TO [4a 68 65 72 65 67]]))", > "QParser":"LuceneQParser"}{code}
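To make the suspected mechanism concrete, here is a self-contained toy sketch (our own illustration, not Solr's code) in which every clause takes the unindexed-field branch: the buggy variant installs a fresh list on every clause, as the quoted condition does whenever `!sfield.indexed()` is true, while the guarded variant reuses the list already collected for the field:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FieldValuesOverwrite {
    // Mirrors the suspect behaviour for an unindexed field: the condition is
    // always true, so every clause installs a fresh list in the map and any
    // values previously collected for the same field are dropped.
    static Map<String, List<String>> collectBuggy(String[][] clauses) {
        Map<String, List<String>> fmap = new HashMap<>();
        for (String[] clause : clauses) {
            List<String> fieldValues = new ArrayList<>(); // unconditional re-creation
            fmap.put(clause[0], fieldValues);
            fieldValues.add(clause[1]);
        }
        return fmap;
    }

    // One possible guard: reuse any list already collected for the field, so
    // interleaved clause orders keep every value.
    static Map<String, List<String>> collectGuarded(String[][] clauses) {
        Map<String, List<String>> fmap = new HashMap<>();
        for (String[] clause : clauses) {
            fmap.computeIfAbsent(clause[0], k -> new ArrayList<>()).add(clause[1]);
        }
        return fmap;
    }

    public static void main(String[] args) {
        String[][] abab = {{"A", "1"}, {"B", "1"}, {"A", "2"}, {"B", "2"}};
        System.out.println(collectBuggy(abab).get("B"));   // [2] -> "B:1" was lost
        System.out.println(collectGuarded(abab).get("B")); // [1, 2]
    }
}
```

With clauses interleaved as ABAB, the buggy collector keeps only the last value for B, matching the reported loss of "B:1". Why the AABB order survives in the real parser depends on surrounding parser state not modeled in this toy sketch.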
[jira] [Commented] (LUCENE-9225) Rectangle should extend LatLonGeometry
[ https://issues.apache.org/jira/browse/LUCENE-9225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049941#comment-17049941 ] ASF subversion and git services commented on LUCENE-9225: - Commit 286d22717b212f125d2df00ecd1c67b28eda4fe3 in lucene-solr's branch refs/heads/master from Ignacio Vera [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=286d227 ] LUCENE-9225: Rectangle extends LatLonGeometry so it can be used in a geometry collection (#1258)
[GitHub] [lucene-solr] iverase merged pull request #1258: LUCENE-9225: Rectangle should extend LatLonGeometry
iverase merged pull request #1258: LUCENE-9225: Rectangle should extend LatLonGeometry URL: https://github.com/apache/lucene-solr/pull/1258 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] iverase commented on a change in pull request #1258: LUCENE-9225: Rectangle should extend LatLonGeometry
iverase commented on a change in pull request #1258: LUCENE-9225: Rectangle should extend LatLonGeometry URL: https://github.com/apache/lucene-solr/pull/1258#discussion_r386822636 ## File path: lucene/core/src/java/org/apache/lucene/document/LatLonShapeBoundingBoxQuery.java ## @@ -108,4 +115,385 @@ public String toString(String field) { sb.append(rectangle.toString()); return sb.toString(); } + + /** Holds spatial logic for a bounding box that works in the encoded space */ + private static class EncodedRectangle { Review comment: This class is a specialisation for bounding box queries working on the encoded space. As such, it feels like the correct place to package this logic is in the query itself. I am pushing this change; if you disagree we can rethink how to package this logic later on.
[jira] [Commented] (SOLR-10397) Port 'autoAddReplicas' feature to the autoscaling framework and make it work with non-shared filesystems
[ https://issues.apache.org/jira/browse/SOLR-10397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049938#comment-17049938 ] David Smiley commented on SOLR-10397: - I know this is an old issue, but I was looking at {{CoreContainer.isSharedFs(CoreDescriptor)}} which was added here. It looks up the Core to then call {{core.getDirectoryFactory().isSharedStorage()}} or failing that (null core) it loads the config on the fly. Either path is bad IMO; with transient/lazy cores, we don't want to potentially trigger a core load, nor do we want to be loading configs which can potentially be expensive. IMO the "sharedStorage" nature of a core is so important that it ought to go in the core descriptor. WDYT? > Port 'autoAddReplicas' feature to the autoscaling framework and make it work > with non-shared filesystems > > > Key: SOLR-10397 > URL: https://issues.apache.org/jira/browse/SOLR-10397 > Project: Solr > Issue Type: Sub-task > Components: SolrCloud >Reporter: Shalin Shekhar Mangar >Assignee: Cao Manh Dat >Priority: Major > Labels: autoscaling > Fix For: 7.1, 8.0 > > Attachments: SOLR-10397.1.patch, SOLR-10397.2.patch, > SOLR-10397.2.patch, SOLR-10397.2.patch, SOLR-10397.patch, > SOLR-10397_remove_nocommit.patch > > > Currently 'autoAddReplicas=true' can be specified in the Collection Create > API to automatically add replicas when a replica becomes unavailable. I > propose to move this feature to the autoscaling cluster policy rules design. > This will include the following: > * Trigger support for ‘nodeLost’ event type > * Modification of existing implementation of ‘autoAddReplicas’ to > automatically create the appropriate ‘nodeLost’ trigger. > * Any such auto-created trigger must be marked internally such that setting > ‘autoAddReplicas=false’ via the Modify Collection API should delete or > disable corresponding trigger. > * Support for non-HDFS filesystems while retaining the optimization afforded > by HDFS i.e. 
the replaced replica can point to the existing data dir of the > old replica. > * Deprecate/remove the feature of enabling/disabling ‘autoAddReplicas’ across > the entire cluster using cluster properties in favor of using the > suspend-trigger/resume-trigger APIs. > This will retain backward compatibility for the most part and keep a common > use-case easy to enable as well as make it available to more people (i.e. > people who don't use HDFS).
[jira] [Resolved] (LUCENE-9251) Polygon tessellator fails to detect some collinear points
[ https://issues.apache.org/jira/browse/LUCENE-9251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ignacio Vera resolved LUCENE-9251. -- Fix Version/s: 8.5 Assignee: Ignacio Vera Resolution: Fixed > Polygon tessellator fails to detect some collinear points > - > > Key: LUCENE-9251 > URL: https://issues.apache.org/jira/browse/LUCENE-9251 > Project: Lucene - Core > Issue Type: Bug >Reporter: Ignacio Vera >Assignee: Ignacio Vera >Priority: Major > Fix For: 8.5 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > A user of Elasticsearch [has reported| > https://discuss.elastic.co/t/unable-to-tessellate-shape-error-on-indexing-es-7-6/220867] > a tessellation error in a valid polygon. The reported polygon is quite > complex, but after digging a bit, the problem is that the tessellator fails to > detect some collinearities. In particular, in complex tessellation we can end > up with two equal edges with different flags in {{isEdgeFromPolygon}}. Still, > we should be able to remove that collinearity.
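For context on what "detecting collinearity" means here: three points are collinear exactly when the cross product of the two edge vectors is zero. A naive illustration of that predicate follows (our own sketch, not Lucene's actual implementation, which works on encoded coordinates and must be robust to floating-point error):

```java
public class Collinear {
    // Cross product of (b - a) and (c - a); zero means a, b, c lie on one line.
    static double orient(double ax, double ay, double bx, double by, double cx, double cy) {
        return (bx - ax) * (cy - ay) - (by - ay) * (cx - ax);
    }

    static boolean collinear(double ax, double ay, double bx, double by, double cx, double cy) {
        // Naive: a strict comparison against 0.0 is only safe for exact inputs.
        return orient(ax, ay, bx, by, cx, cy) == 0.0;
    }

    public static void main(String[] args) {
        System.out.println(collinear(0, 0, 1, 1, 2, 2)); // true: all on y = x
        System.out.println(collinear(0, 0, 1, 0, 1, 1)); // false
    }
}
```

Production tessellators use integer/encoded coordinates or robust orientation predicates precisely because near-zero cross products are where collinearity bugs like this one hide.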
[jira] [Updated] (SOLR-14300) Some conditional clauses on unindexed field will be ignored by query parser in some specific cases
[ https://issues.apache.org/jira/browse/SOLR-14300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] KOUTA SETSU updated SOLR-14300: --- Description: In some specific cases, conditional clauses on an unindexed field will be ignored. * For a query like q=A:1 OR B:1 OR A:2 OR B:2, if field B is not indexed (but docValues="true"), "B:1" will be lost. * But if you write the query as q=A:1 OR A:2 OR B:1 OR B:2, it will work perfectly. The only difference between the two queries is that the clauses are written in different orders: one is *ABAB*, the other is *AABB*. *Steps to reproduce* You can easily reproduce this problem on a Solr collection with the _default configset and exampledocs/books.csv data. # Create a _default collection: {code:java} bin/solr create -c books -s 2 -rf 2{code} # Post books.csv: {code:java} bin/post -c books example/exampledocs/books.csv{code} # Run the following queries: ** query1: [http://localhost:8983/solr/books/select?q=+(name_str:Foundation+OR+cat:book+OR+name_str:Jhereg+OR+cat:cd)&debug=query] ** query2: [http://localhost:8983/solr/books/select?q=+(name_str:Foundation+OR+name_str:Jhereg+OR+cat:book+OR+cat:cd)&debug=query] ** Then you can see that the parsed queries are different. *** query1 ("name_str:Foundation" is lost): {code:json} "debug":{ "rawquerystring":"+(name_str:Foundation OR cat:book OR name_str:Jhereg OR cat:cd)", "querystring":"+(name_str:Foundation OR cat:book OR name_str:Jhereg OR cat:cd)", "parsedquery":"+(cat:book cat:cd (name_str:[[4a 68 65 72 65 67] TO [4a 68 65 72 65 67]]))", "parsedquery_toString":"+(cat:book cat:cd name_str:[[4a 68 65 72 65 67] TO [4a 68 65 72 65 67]])", "QParser":"LuceneQParser"}}{code} *** query2 ("name_str:Foundation" isn't lost): {code:json} "debug":{ "rawquerystring":"+(name_str:Foundation OR name_str:Jhereg OR cat:book OR cat:cd)", "querystring":"+(name_str:Foundation OR name_str:Jhereg OR cat:book OR cat:cd)", "parsedquery":"+(cat:book cat:cd ((name_str:[[46 6f 75 6e 64 61 74 69 6f 6e] TO [46 6f 75 6e 64 61 74 69 6f 6e]]) (name_str:[[4a 68 65 72 65 67] TO [4a 68 65 72 65 67]])))", "parsedquery_toString":"+(cat:book cat:cd (name_str:[[46 6f 75 6e 64 61 74 69 6f 6e] TO [46 6f 75 6e 64 61 74 69 6f 6e]] name_str:[[4a 68 65 72 65 67] TO [4a 68 65 72 65 67]]))", "QParser":"LuceneQParser"}{code}
[jira] [Resolved] (LUCENE-9239) TestLatLonMultiPolygonShapeQueries error with CIRCLE queries
[ https://issues.apache.org/jira/browse/LUCENE-9239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ignacio Vera resolved LUCENE-9239. -- Fix Version/s: 8.5 Assignee: Ignacio Vera Resolution: Fixed > TestLatLonMultiPolygonShapeQueries error with CIRCLE queries > > > Key: LUCENE-9239 > URL: https://issues.apache.org/jira/browse/LUCENE-9239 > Project: Lucene - Core > Issue Type: Bug >Reporter: Ignacio Vera >Assignee: Ignacio Vera >Priority: Major > Fix For: 8.5 > > Attachments: screenshot-1.png > > Time Spent: 20m > Remaining Estimate: 0h > > The failure can be reproduced with: > {code} > ant test -Dtestcase=TestLatLonMultiPolygonShapeQueries > -Dtests.method=testRandomBig -Dtests.seed=844FBD6099212BE8 > -Dtests.multiplier=2 -Dtests.nightly=true -Dtests.slow=true > -Dtests.linedocsfile=/home/jenkins/jenkins-slave/workspace/Lucene-Solr-NightlyTests-8.x/test-data/enwiki.random.lines.txt > -Dtests.locale=sr-BA -Dtests.timezone=Asia/Ashkhabad -Dtests.asserts=true > -Dtests.file.encoding=UTF-8 > {code} > The error message: > {code} > query=LatLonShapeQuery: > field=shape:[CIRCLE([78.01086555431775,0.9513280497489234] radius = > 1097753.4254892308 meters),] docID=43632 > shape=[[-22.350172194105966, 49.931598911327825] [90.0, 49.931598911327825] > [90.0, 51.408196891378765] [-22.350172194105966, 51.408196891378765] > [-22.350172194105966, 49.931598911327825] , [76.12283244781244, > -28.218674420982268] [81.7520930577503, -28.218674420982268] > [81.7520930577503, -1.0286448278003566E-32] [76.12283244781244, > -1.0286448278003566E-32] [76.12283244781244, -28.218674420982268] ] > deleted?=false distanceQuery=CIRCLE([78.01086555431775,0.9513280497489234] > radius = 1097753.4254892308 meters) > {code}
[jira] [Created] (SOLR-14300) Some conditional clauses on unindexed field will be ignored by query parser in some specific cases
KOUTA SETSU created SOLR-14300: -- Summary: Some conditional clauses on unindexed field will be ignored by query parser in some specific cases Key: SOLR-14300 URL: https://issues.apache.org/jira/browse/SOLR-14300 Project: Solr Issue Type: Bug Security Level: Public (Default Security Level. Issues are Public) Components: query parsers Affects Versions: 7.3.1 Environment: Solr 7.3.1 centos7.5 Reporter: KOUTA SETSU
[jira] [Commented] (SOLR-13942) /api/cluster/zk/* to fetch raw ZK data
[ https://issues.apache.org/jira/browse/SOLR-13942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049930#comment-17049930 ] Ishan Chattopadhyaya commented on SOLR-13942: - bq. I would like to merge it soon +1, this is a convenience to have.
[jira] [Commented] (SOLR-13942) /api/cluster/zk/* to fetch raw ZK data
[ https://issues.apache.org/jira/browse/SOLR-13942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049929#comment-17049929 ] Ishan Chattopadhyaya commented on SOLR-13942: - bq. I would like you to explain the true benefit of this, explain why it is not a security risk and then gain consensus before continuing. Why is this a security risk? Nothing in this issue makes Solr more insecure than it already is.
[jira] [Commented] (LUCENE-9251) Polygon tessellator fails to detect some collinear points
[ https://issues.apache.org/jira/browse/LUCENE-9251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049927#comment-17049927 ] ASF subversion and git services commented on LUCENE-9251: - Commit e5036e963c93960a968c1d02025910590e9b242a in lucene-solr's branch refs/heads/branch_8x from Ignacio Vera [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=e5036e9 ] LUCENE-9251: Filter equal edges with different value on isEdgeFromPolygon (#1290) Fix bug in the polygon tessellator where edges with different value on #isEdgeFromPolygon were not filtered out properly
[jira] [Commented] (LUCENE-9251) Polygon tessellator fails to detect some collinear points
[ https://issues.apache.org/jira/browse/LUCENE-9251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049926#comment-17049926 ] ASF subversion and git services commented on LUCENE-9251: - Commit c313365c5ffe76192e6179f3dfe23d056f7076c5 in lucene-solr's branch refs/heads/master from Ignacio Vera [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=c313365 ] LUCENE-9251: Filter equal edges with different value on isEdgeFromPolygon (#1290) Fix bug in the polygon tessellator where edges with different value on #isEdgeFromPolygon were not filtered out properly
[GitHub] [lucene-solr] iverase merged pull request #1290: LUCENE-9251: Filter equal edges with different value on isEdgeFromPolygon
iverase merged pull request #1290: LUCENE-9251: Filter equal edges with different value on isEdgeFromPolygon URL: https://github.com/apache/lucene-solr/pull/1290
[jira] [Commented] (LUCENE-9239) TestLatLonMultiPolygonShapeQueries error with CIRCLE queries
[ https://issues.apache.org/jira/browse/LUCENE-9239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049924#comment-17049924 ] ASF subversion and git services commented on LUCENE-9239: - Commit 3fb10787910ad57959fc636bd66d83d9fdde7ea5 in lucene-solr's branch refs/heads/branch_8x from Ignacio Vera [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=3fb1078 ] LUCENE-9239: Circle2D#WithinTriangle detects properly if a triangle is Within distance. (#1280) > TestLatLonMultiPolygonShapeQueries error with CIRCLE queries > > > Key: LUCENE-9239 > URL: https://issues.apache.org/jira/browse/LUCENE-9239 > Project: Lucene - Core > Issue Type: Bug >Reporter: Ignacio Vera >Priority: Major > Attachments: screenshot-1.png > > Time Spent: 20m > Remaining Estimate: 0h > > The failure can be reproduced with: > {code} > ant test -Dtestcase=TestLatLonMultiPolygonShapeQueries > -Dtests.method=testRandomBig -Dtests.seed=844FBD6099212BE8 > -Dtests.multiplier=2 -Dtests.nightly=true -Dtests.slow=true > -Dtests.linedocsfile=/home/jenkins/jenkins-slave/workspace/Lucene-Solr-NightlyTests-8.x/test-data/enwiki.random.lines.txt > -Dtests.locale=sr-BA -Dtests.timezone=Asia/Ashkhabad -Dtests.asserts=true > -Dtests.file.encoding=UTF-8 > {code} > The error message: > {code} > query=LatLonShapeQuery: > field=shape:[CIRCLE([78.01086555431775,0.9513280497489234] radius = > 1097753.4254892308 meters),] docID=43632 > shape=[[-22.350172194105966, 49.931598911327825] [90.0, 49.931598911327825] > [90.0, 51.408196891378765] [-22.350172194105966, 51.408196891378765] > [-22.350172194105966, 49.931598911327825] , [76.12283244781244, > -28.218674420982268] [81.7520930577503, -28.218674420982268] > [81.7520930577503, -1.0286448278003566E-32] [76.12283244781244, > -1.0286448278003566E-32] [76.12283244781244, -28.218674420982268] ] > deleted?=false distanceQuery=CIRCLE([78.01086555431775,0.9513280497489234] > radius = 1097753.4254892308 meters) > {code} -- This message 
[jira] [Commented] (LUCENE-9239) TestLatLonMultiPolygonShapeQueries error with CIRCLE queries
[ https://issues.apache.org/jira/browse/LUCENE-9239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049923#comment-17049923 ] ASF subversion and git services commented on LUCENE-9239: - Commit b732ce700258ab05590f3904c8d2cd332aa4e0cb in lucene-solr's branch refs/heads/master from Ignacio Vera [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=b732ce7 ] LUCENE-9239: Circle2D#WithinTriangle detects properly if a triangle is Within distance. (#1280) > TestLatLonMultiPolygonShapeQueries error with CIRCLE queries > > > Key: LUCENE-9239 > URL: https://issues.apache.org/jira/browse/LUCENE-9239 > Project: Lucene - Core > Issue Type: Bug >Reporter: Ignacio Vera >Priority: Major > Attachments: screenshot-1.png > > Time Spent: 20m > Remaining Estimate: 0h > > The failure can be reproduced with: > {code} > ant test -Dtestcase=TestLatLonMultiPolygonShapeQueries > -Dtests.method=testRandomBig -Dtests.seed=844FBD6099212BE8 > -Dtests.multiplier=2 -Dtests.nightly=true -Dtests.slow=true > -Dtests.linedocsfile=/home/jenkins/jenkins-slave/workspace/Lucene-Solr-NightlyTests-8.x/test-data/enwiki.random.lines.txt > -Dtests.locale=sr-BA -Dtests.timezone=Asia/Ashkhabad -Dtests.asserts=true > -Dtests.file.encoding=UTF-8 > {code} > The error message: > {code} > query=LatLonShapeQuery: > field=shape:[CIRCLE([78.01086555431775,0.9513280497489234] radius = > 1097753.4254892308 meters),] docID=43632 > shape=[[-22.350172194105966, 49.931598911327825] [90.0, 49.931598911327825] > [90.0, 51.408196891378765] [-22.350172194105966, 51.408196891378765] > [-22.350172194105966, 49.931598911327825] , [76.12283244781244, > -28.218674420982268] [81.7520930577503, -28.218674420982268] > [81.7520930577503, -1.0286448278003566E-32] [76.12283244781244, > -1.0286448278003566E-32] [76.12283244781244, -28.218674420982268] ] > deleted?=false distanceQuery=CIRCLE([78.01086555431775,0.9513280497489234] > radius = 1097753.4254892308 meters) > {code} -- This message was 
[GitHub] [lucene-solr] iverase merged pull request #1280: LUCENE-9239: Change withinTriangle logic for Circles
iverase merged pull request #1280: LUCENE-9239: Change withinTriangle logic for Circles URL: https://github.com/apache/lucene-solr/pull/1280
[GitHub] [lucene-solr] noblepaul opened a new pull request #1308: SOLR-13942 /api/cluster/zk/* to fetch raw ZK data
noblepaul opened a new pull request #1308: SOLR-13942 /api/cluster/zk/* to fetch raw ZK data URL: https://github.com/apache/lucene-solr/pull/1308
[jira] [Commented] (SOLR-11746) numeric fields need better error handling for prefix/wildcard syntax -- consider uniform support for "foo:* == foo:[* TO *]"
[ https://issues.apache.org/jira/browse/SOLR-11746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049901#comment-17049901 ] Houston Putman commented on SOLR-11746: --- I'm not sure why that'd be an issue, those links do work both on the released 8.4 ref guide, and on the master branch. [https://lucene.apache.org/solr/guide/8_4/the-standard-query-parser.html#differences-between-lucenes-classic-query-parser-and-solrs-standard-query-parser] [https://github.com/apache/lucene-solr/blob/master/solr/solr-ref-guide/src/the-standard-query-parser.adoc#differences-between-lucenes-classic-query-parser-and-solrs-standard-query-parser] I also tested {{ant build-site}}, which works for me off of master. > numeric fields need better error handling for prefix/wildcard syntax -- > consider uniform support for "foo:* == foo:[* TO *]" > > > Key: SOLR-11746 > URL: https://issues.apache.org/jira/browse/SOLR-11746 > Project: Solr > Issue Type: Bug >Affects Versions: 7.0 >Reporter: Chris M. Hostetter >Assignee: Houston Putman >Priority: Major > Fix For: master (9.0), 8.5 > > Attachments: SOLR-11746.patch, SOLR-11746.patch, SOLR-11746.patch, > SOLR-11746.patch, SOLR-11746.patch, SOLR-11746.patch, SOLR-11746.patch, > SOLR-11746.patch, SOLR-11746.patch, SOLR-11746.patch, SOLR-11746.patch > > > On the solr-user mailing list, Torsten Krah pointed out that with Trie > numeric fields, query syntax such as {{foo_d:\*}} has been functionality > equivilent to {{foo_d:\[\* TO \*]}} and asked why this was not also supported > for Point based numeric fields. > The fact that this type of syntax works (for {{indexed="true"}} Trie fields) > appears to have been an (untested, undocumented) fluke of Trie fields given > that they use indexed terms for the (encoded) numeric terms and inherit the > default implementation of {{FieldType.getPrefixQuery}} which produces a > prefix query against the {{""}} (empty string) term. 
> (Note that this syntax has apparently _*never*_ worked for Trie fields with > {{indexed="false" docValues="true"}} ) > In general, we should assess the behavior when users attempt a prefix/wildcard > syntax query against numeric fields, as currently the behavior is largely > nonsensical: prefix/wildcard syntax frequently matches no docs w/o any sort > of error, and the aforementioned {{numeric_field:*}} behaves inconsistently > between points/trie fields and between indexed/docValued trie fields.
[jira] [Commented] (SOLR-14291) OldAnalyticsRequestConverter should support fields names with dots
[ https://issues.apache.org/jira/browse/SOLR-14291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049897#comment-17049897 ] Houston Putman commented on SOLR-14291: --- This looks good to me, especially with the test. Thanks for the fix [~anatolii_siuniaev]! > OldAnalyticsRequestConverter should support fields names with dots > -- > > Key: SOLR-14291 > URL: https://issues.apache.org/jira/browse/SOLR-14291 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: search, SearchComponents - other >Reporter: Anatolii Siuniaev >Priority: Trivial > Attachments: SOLR-14291.patch > > > If you send a query with range facets using old olap-style syntax (see pdf > [here|https://issues.apache.org/jira/browse/SOLR-5302]), > OldAnalyticsRequestConverter just silently (no exception thrown) omits > parameters like > {code:java} > olap..rangefacet..start > {code} > in case if __ has dots inside (for instance field name is > _Project.Value_). And thus no range facets are returned in response. > Probably the same happens in case of field faceting. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (SOLR-13942) /api/cluster/zk/* to fetch raw ZK data
[ https://issues.apache.org/jira/browse/SOLR-13942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noble Paul updated SOLR-13942: -- Description: example download the {{state.json}} of {code} GET http://localhost:8983/api/cluster/zk/collections/gettingstarted/state.json {code} get a list of all children under {{/live_nodes}} {code} GET http://localhost:8983/api/cluster/zk/live_nodes {code} If the requested path is a node with children show the list of child nodes and their meta data was:If the requested path is a node with children show the list of child nodes and their meta data > /api/cluster/zk/* to fetch raw ZK data > -- > > Key: SOLR-13942 > URL: https://issues.apache.org/jira/browse/SOLR-13942 > Project: Solr > Issue Type: Bug >Reporter: Noble Paul >Assignee: Noble Paul >Priority: Major > > example > download the {{state.json}} of > {code} > GET http://localhost:8983/api/cluster/zk/collections/gettingstarted/state.json > {code} > get a list of all children under {{/live_nodes}} > {code} > GET http://localhost:8983/api/cluster/zk/live_nodes > {code} > If the requested path is a node with children show the list of child nodes > and their meta data -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
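The raw-ZK endpoints described above can be exercised from the command line; a minimal sketch, assuming a Solr node listening on localhost:8983 (the curl call is left commented out because it requires a live cluster):

```shell
# Base URL of an assumed local Solr node; adjust for your cluster.
SOLR_BASE="http://localhost:8983"

# ZK path to fetch -- here the state.json of the "gettingstarted" collection.
ZK_PATH="/collections/gettingstarted/state.json"

# The proposed API simply appends the ZK path to /api/cluster/zk.
URL="${SOLR_BASE}/api/cluster/zk${ZK_PATH}"
echo "$URL"

# Against a running cluster this would return the raw znode data:
# curl -s "$URL"
```

Requesting a path that is a parent node (e.g. `/api/cluster/zk/live_nodes`) would instead list the children and their metadata, per the description above.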
[jira] [Commented] (SOLR-14270) export command to have an option to write to a zip file
[ https://issues.apache.org/jira/browse/SOLR-14270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049883#comment-17049883 ] Noble Paul commented on SOLR-14270: --- Thanks [~ctargett] > export command to have an option to write to a zip file > --- > > Key: SOLR-14270 > URL: https://issues.apache.org/jira/browse/SOLR-14270 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Noble Paul >Assignee: Noble Paul >Priority: Major > Labels: cli > Fix For: 8.5 > > Time Spent: 50m > Remaining Estimate: 0h > > Plain json files are too big. Export to a compressed file > {{bin/solr export -url http://localhost:8983/solr/gettingstarted -out > gettingstarted.json.gz}} > This will write the data to a file called {{gettingstarted.json.gz}} in a zip > format -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
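A compressed export like the one above can be consumed with standard tools; a minimal sketch, assuming the output is plain gzip-compressed JSON (the file name and document contents here are made up for illustration, not produced by bin/solr):

```shell
# Simulate a compressed export: one JSON document, gzip-compressed.
printf '{"id":"doc1","title":"hello"}\n' > gettingstarted.json
gzip -f gettingstarted.json            # produces gettingstarted.json.gz

# Stream the export without decompressing to disk (gunzip -c is portable).
gunzip -c gettingstarted.json.gz
```

Note that the `.gz` suffix implies gzip (a stream format), not a zip archive, which matters when picking a tool to read it back.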
[jira] [Commented] (LUCENE-9150) Restore support for dynamic PlanetModel in Geo3D
[ https://issues.apache.org/jira/browse/LUCENE-9150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049866#comment-17049866 ] ASF subversion and git services commented on LUCENE-9150: - Commit ab6fb77c63df5a8d347c0d67335e8919df683f64 in lucene-solr's branch refs/heads/branch_8x from Nicholas Knize [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=ab6fb77 ] LUCENE-9150: Restore support for dynamic PlanetModel in spatial3d > Restore support for dynamic PlanetModel in Geo3D > > > Key: LUCENE-9150 > URL: https://issues.apache.org/jira/browse/LUCENE-9150 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Nick Knize >Priority: Major > Time Spent: 2h > Remaining Estimate: 0h > > LUCENE-7072 removed dynamic planet model support in Geo3D. This was logical > at the time (given the state of Lucene and spatial projections and coordinate > reference systems). Since then, however, there have been a lot of new > developments within the OGC community around [Coordinate Reference > Systems|https://docs.opengeospatial.org/as/18-005r4/18-005r4.html], [Dynamic > Coordinate Reference > Systems|http://docs.opengeospatial.org/DRAFTS/18-058.html], and [Updated ISO > Standards|https://www.iso.org/obp/ui/#iso:std:iso:19111:ed-3:v1:en]. > It would be useful for Geo3D (and eventually LatLon*) to support different > geographic datums to make lucene a viable option for indexing/searching in > different spatial reference systems (e.g., more accurately computing query > shape relations to BKD's internal nodes using datum consistent with the > spatial projection). This would also provide an alternative to other > limitations of the {{LatLon*/XY*}} implementation (e.g., pole/dateline > crossing, quantization of small polygons). > I'd like to propose keeping the current WGS84 static datum as the default for > Geo3D but adding back the constructors to accept custom planet models. > Perhaps this could be listed as an "expert" API feature? 
[jira] [Resolved] (SOLR-13502) Investigate using something other than ZooKeeper's "4 letter words" for the admin UI status
[ https://issues.apache.org/jira/browse/SOLR-13502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson resolved SOLR-13502. --- Resolution: Won't Fix I don't find anything that looks useful. Plus, this is very low-benefit, we already have a workable solution that nobody's complained about since 8.2, so I don't think this is worth the effort. If someone else wants to take it over, it can be reopened. > Investigate using something other than ZooKeeper's "4 letter words" for the > admin UI status > --- > > Key: SOLR-13502 > URL: https://issues.apache.org/jira/browse/SOLR-13502 > Project: Solr > Issue Type: Improvement >Reporter: Erick Erickson >Assignee: Erick Erickson >Priority: Major > > ZooKeeper 3.5.5 requires a whitelist of allowed "4 letter words". The only > place I see on a quick look at the Solr code where 4lws are used is in the > admin UI "ZK Status" link. > In order to use the admin UI "ZK Status" link, users will have to modify > their zoo.cfg file with > {code} > 4lw.commands.whitelist=mntr,conf,ruok > {code} > This JIRA is to see if there are alternatives to using 4lw for the admin UI. > This depends on SOLR-8346. If we find an alternative, we need to remove the > additions to the ref guide that mention changing zoo.cfg (just scan for 4lw > in all the .adoc files) and remove SolrZkServer.ZK_WHITELIST_PROPERTY and all > references to it (SolrZkServer and SolrTestCaseJ4). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-12325) introduce uniqueBlockQuery(parent:true) aggregation for JSON Facet
[ https://issues.apache.org/jira/browse/SOLR-12325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049810#comment-17049810 ] Lucene/Solr QA commented on SOLR-12325: --- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:green}+1{color} | {color:green} compile {color} | {color:green} 3m 3s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 59s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 1m 34s{color} | {color:red} solr_core generated 1 new + 99 unchanged - 0 fixed = 100 total (was 99) {color} | | {color:green}+1{color} | {color:green} Release audit (RAT) {color} | {color:green} 1m 44s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} Check forbidden APIs {color} | {color:green} 1m 34s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} Validate source patterns {color} | {color:green} 1m 34s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 76m 16s{color} | {color:green} core in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 45s{color} | {color:green} test-framework in the patch passed. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 85m 46s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | SOLR-12325 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12995363/SOLR-12325.patch | | Optional Tests | compile javac unit ratsources checkforbiddenapis validatesourcepatterns | | uname | Linux lucene2-us-west.apache.org 4.4.0-170-generic #199-Ubuntu SMP Thu Nov 14 01:45:04 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | ant | | Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-SOLR-Build/sourcedir/dev-tools/test-patch/lucene-solr-yetus-personality.sh | | git revision | master / e308e53 | | ant | version: Apache Ant(TM) version 1.9.6 compiled on July 20 2018 | | Default Java | LTS | | javac | https://builds.apache.org/job/PreCommit-SOLR-Build/693/artifact/out/diff-compile-javac-solr_core.txt | | Test Results | https://builds.apache.org/job/PreCommit-SOLR-Build/693/testReport/ | | modules | C: solr/core solr/test-framework U: solr | | Console output | https://builds.apache.org/job/PreCommit-SOLR-Build/693/console | | Powered by | Apache Yetus 0.7.0 http://yetus.apache.org | This message was automatically generated. > introduce uniqueBlockQuery(parent:true) aggregation for JSON Facet > -- > > Key: SOLR-12325 > URL: https://issues.apache.org/jira/browse/SOLR-12325 > Project: Solr > Issue Type: New Feature > Components: Facet Module >Reporter: Mikhail Khludnev >Assignee: Mikhail Khludnev >Priority: Major > Fix For: 8.5 > > Attachments: SOLR-12325.patch, SOLR-12325.patch, SOLR-12325.patch, > SOLR-12325.patch, SOLR-12325.patch, > SOLR-12325_Random_test_for_uniqueBlockQuery (1).patch > > Time Spent: 1.5h > Remaining Estimate: 0h > > It might be faster twin for {{uniqueBlock(\_root_)}}. Please utilise buildin > query parsing method, don't invent your own. 
[jira] [Resolved] (SOLR-14299) IndexFetcher doesnt' reset errorCount to 0 after the last packet is received
[ https://issues.apache.org/jira/browse/SOLR-14299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Drob resolved SOLR-14299. -- Fix Version/s: 8.5 Assignee: Mike Drob Resolution: Fixed Thanks for the patch [~praste], I added a line to CHANGES and committed this. > IndexFetcher doesnt' reset errorCount to 0 after the last packet is received > > > Key: SOLR-14299 > URL: https://issues.apache.org/jira/browse/SOLR-14299 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: replication (java) >Affects Versions: 7.7.1 >Reporter: Pushkar Raste >Assignee: Mike Drob >Priority: Minor > Fix For: 8.5 > > Time Spent: 0.5h > Remaining Estimate: 0h > > While fetching the files from master `IndexFetcher` retries 5 times before > giving up. It resets the errorCount after successfully receiving the packet > except for the last packet. Seems like an oversight. > > [https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/handler/IndexFetcher.java#L1739-L1742] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14299) IndexFetcher doesnt' reset errorCount to 0 after the last packet is received
[ https://issues.apache.org/jira/browse/SOLR-14299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049803#comment-17049803 ] ASF subversion and git services commented on SOLR-14299: Commit fa2ce13fde08a8cfbf4f18279a13e273039d6eb6 in lucene-solr's branch refs/heads/branch_8x from Pushkar Raste [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=fa2ce13 ] SOLR-14299 IndexFetcher doesn't reset count to 0 after the last packet is received > IndexFetcher doesnt' reset errorCount to 0 after the last packet is received > > > Key: SOLR-14299 > URL: https://issues.apache.org/jira/browse/SOLR-14299 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: replication (java) >Affects Versions: 7.7.1 >Reporter: Pushkar Raste >Priority: Minor > Time Spent: 0.5h > Remaining Estimate: 0h > > While fetching the files from master `IndexFetcher` retries 5 times before > giving up. It resets the errorCount after successfully receiving the packet > except for the last packet. Seems like an oversight. > > [https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/handler/IndexFetcher.java#L1739-L1742] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14299) IndexFetcher doesnt' reset errorCount to 0 after the last packet is received
[ https://issues.apache.org/jira/browse/SOLR-14299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049789#comment-17049789 ] ASF subversion and git services commented on SOLR-14299: Commit 17c576a36f7419166554c1cfd3438d063b751e2b in lucene-solr's branch refs/heads/master from Pushkar Raste [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=17c576a ] SOLR-14299 IndexFetcher doesn't reset count to 0 after the last packet is received > IndexFetcher doesnt' reset errorCount to 0 after the last packet is received > > > Key: SOLR-14299 > URL: https://issues.apache.org/jira/browse/SOLR-14299 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: replication (java) >Affects Versions: 7.7.1 >Reporter: Pushkar Raste >Priority: Minor > Time Spent: 0.5h > Remaining Estimate: 0h > > While fetching the files from master `IndexFetcher` retries 5 times before > giving up. It resets the errorCount after successfully receiving the packet > except for the last packet. Seems like an oversight. > > [https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/handler/IndexFetcher.java#L1739-L1742] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] madrob commented on a change in pull request #1297: SOLR-14253 Replace various sleep calls with ZK waits
madrob commented on a change in pull request #1297: SOLR-14253 Replace various sleep calls with ZK waits
URL: https://github.com/apache/lucene-solr/pull/1297#discussion_r386726709

File path: solr/core/src/java/org/apache/solr/cloud/ZkController.java

@@ -1684,58 +1685,37 @@ private void doGetShardIdAndNodeNameProcess(CoreDescriptor cd) { }

 private void waitForCoreNodeName(CoreDescriptor descriptor) {
-    int retryCount = 320;
-    log.debug("look for our core node name");
-    while (retryCount-- > 0) {
-      final DocCollection docCollection = zkStateReader.getClusterState()
-          .getCollectionOrNull(descriptor.getCloudDescriptor().getCollectionName());
-      if (docCollection != null && docCollection.getSlicesMap() != null) {
-        final Map slicesMap = docCollection.getSlicesMap();
-        for (Slice slice : slicesMap.values()) {
-          for (Replica replica : slice.getReplicas()) {
-            // TODO: for really large clusters, we could 'index' on this
-            String nodeName = replica.getStr(ZkStateReader.NODE_NAME_PROP);
-            String core = replica.getStr(ZkStateReader.CORE_NAME_PROP);
-            String msgNodeName = getNodeName();
-            String msgCore = descriptor.getName();
-            if (msgNodeName.equals(nodeName) && core.equals(msgCore)) {
-              descriptor.getCloudDescriptor().setCoreNodeName(replica.getName());
-              getCoreContainer().getCoresLocator().persist(getCoreContainer(), descriptor);
-              return;
-            }
-          }
-        }
-      }
-      try {
-        Thread.sleep(1000);
-      } catch (InterruptedException e) {
-        Thread.currentThread().interrupt();
-      }
-    }
+    log.debug("waitForCoreNodeName >>> look for our core node name");
+    try {
+      zkStateReader.waitForState(descriptor.getCollectionName(), 320, TimeUnit.SECONDS, c -> {
+        String name = ClusterStateMutator.getAssignedCoreNodeName(c, getNodeName(), descriptor.getName());
+        if (name == null) {
+          return false;
+        }
+        descriptor.getCloudDescriptor().setCoreNodeName(name);

Review comment: Dug into this deeper, and I believe that the latch in `waitForState` will guarantee data visibility.
[GitHub] [lucene-solr] madrob commented on a change in pull request #1297: SOLR-14253 Replace various sleep calls with ZK waits
madrob commented on a change in pull request #1297: SOLR-14253 Replace various sleep calls with ZK waits
URL: https://github.com/apache/lucene-solr/pull/1297#discussion_r386726758

File path: solr/core/src/java/org/apache/solr/cloud/ZkController.java

@@ -1684,58 +1685,37 @@ private void doGetShardIdAndNodeNameProcess(CoreDescriptor cd) { }

 private void waitForCoreNodeName(CoreDescriptor descriptor) {
-    int retryCount = 320;
-    log.debug("look for our core node name");
-    while (retryCount-- > 0) {
-      final DocCollection docCollection = zkStateReader.getClusterState()
-          .getCollectionOrNull(descriptor.getCloudDescriptor().getCollectionName());
-      if (docCollection != null && docCollection.getSlicesMap() != null) {
-        final Map slicesMap = docCollection.getSlicesMap();
-        for (Slice slice : slicesMap.values()) {
-          for (Replica replica : slice.getReplicas()) {
-            // TODO: for really large clusters, we could 'index' on this
-            String nodeName = replica.getStr(ZkStateReader.NODE_NAME_PROP);
-            String core = replica.getStr(ZkStateReader.CORE_NAME_PROP);
-            String msgNodeName = getNodeName();
-            String msgCore = descriptor.getName();
-            if (msgNodeName.equals(nodeName) && core.equals(msgCore)) {
-              descriptor.getCloudDescriptor().setCoreNodeName(replica.getName());
-              getCoreContainer().getCoresLocator().persist(getCoreContainer(), descriptor);
-              return;
-            }
-          }
-        }
-      }
-      try {
-        Thread.sleep(1000);
-      } catch (InterruptedException e) {
-        Thread.currentThread().interrupt();
-      }
-    }
+    log.debug("waitForCoreNodeName >>> look for our core node name");
+    try {
+      zkStateReader.waitForState(descriptor.getCollectionName(), 320, TimeUnit.SECONDS, c -> {
+        String name = ClusterStateMutator.getAssignedCoreNodeName(c, getNodeName(), descriptor.getName());
+        if (name == null) {
+          return false;
+        }
+        descriptor.getCloudDescriptor().setCoreNodeName(name);
+        return true;
+      });
+    } catch (TimeoutException | InterruptedException e) {
+      throw new SolrException(ErrorCode.SERVER_ERROR, "Timeout waiting for collection state", e);

Review comment: Will do!
[jira] [Resolved] (LUCENE-8962) Can we merge small segments during refresh, for faster searching?
[ https://issues.apache.org/jira/browse/LUCENE-8962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Sokolov resolved LUCENE-8962. - Fix Version/s: 8.5 Resolution: Fixed pushed to master, and also backported to branch8x > Can we merge small segments during refresh, for faster searching? > - > > Key: LUCENE-8962 > URL: https://issues.apache.org/jira/browse/LUCENE-8962 > Project: Lucene - Core > Issue Type: Improvement > Components: core/index >Reporter: Michael McCandless >Priority: Major > Fix For: 8.5 > > Attachments: LUCENE-8962_demo.png > > Time Spent: 6h > Remaining Estimate: 0h > > With near-real-time search we ask {{IndexWriter}} to write all in-memory > segments to disk and open an {{IndexReader}} to search them, and this is > typically a quick operation. > However, when you use many threads for concurrent indexing, {{IndexWriter}} > will accumulate write many small segments during {{refresh}} and this then > adds search-time cost as searching must visit all of these tiny segments. > The merge policy would normally quickly coalesce these small segments if > given a little time ... so, could we somehow improve {{IndexWriter'}}s > refresh to optionally kick off merge policy to merge segments below some > threshold before opening the near-real-time reader? It'd be a bit tricky > because while we are waiting for merges, indexing may continue, and new > segments may be flushed, but those new segments shouldn't be included in the > point-in-time segments returned by refresh ... > One could almost do this on top of Lucene today, with a custom merge policy, > and some hackity logic to have the merge policy target small segments just > written by refresh, but it's tricky to then open a near-real-time reader, > excluding newly flushed but including newly merged segments since the refresh > originally finished ... > I'm not yet sure how best to solve this, so I wanted to open an issue for > discussion! 
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
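The refresh-time coalescing idea in LUCENE-8962 can be illustrated with a small, Lucene-free sketch. All names here are hypothetical (this is not the Lucene API): segments are modeled only by their byte sizes, and segments below a threshold are grouped into a single merge candidate, mirroring what a refresh-time merge policy would select.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not the Lucene API): given the byte sizes of segments
 * just flushed by refresh, pick the ones below a threshold as one merge
 * candidate, as a refresh-time merge policy might.
 */
public class RefreshMergeSketch {

    /** Segments smaller than thresholdBytes are grouped into one merge. */
    static List<Long> selectSmallSegments(List<Long> segmentSizes, long thresholdBytes) {
        List<Long> merge = new ArrayList<>();
        for (long size : segmentSizes) {
            if (size < thresholdBytes) {
                merge.add(size);
            }
        }
        // Only worth merging if at least two small segments exist.
        return merge.size() >= 2 ? merge : List.of();
    }

    public static void main(String[] args) {
        // Sizes of segments produced by a refresh under concurrent indexing.
        List<Long> flushed = List.of(64_000L, 900L, 512L, 120_000L, 300L);
        System.out.println(selectSmallSegments(flushed, 1_024L)); // prints [900, 512, 300]
    }
}
```

The tricky part the issue describes is not the selection itself but doing it without letting segments flushed *after* the refresh started leak into the point-in-time view.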
[jira] [Commented] (LUCENE-8962) Can we merge small segments during refresh, for faster searching?
[ https://issues.apache.org/jira/browse/LUCENE-8962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049754#comment-17049754 ]

ASF subversion and git services commented on LUCENE-8962:
---------------------------------------------------------
Commit e308e538731f392eb81ba81cfb3ec5fc526fd383 in lucene-solr's branch refs/heads/master from Michael Sokolov
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=e308e53 ]

Add CHANGES entry for LUCENE-8962
[jira] [Commented] (LUCENE-8962) Can we merge small segments during refresh, for faster searching?
[ https://issues.apache.org/jira/browse/LUCENE-8962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049750#comment-17049750 ]

ASF subversion and git services commented on LUCENE-8962:
---------------------------------------------------------
Commit a1791e77143aa8087c0b5ee0e8eb57422e59a09a in lucene-solr's branch refs/heads/branch_8x from msfroh
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=a1791e7 ]

LUCENE-8962: Add ability to selectively merge on commit (#1155)

* LUCENE-8962: Add ability to selectively merge on commit

  This adds a new "findCommitMerges" method to MergePolicy, which can specify
  merges to be executed before the IndexWriter.prepareCommitInternal method
  returns. If we have many index writer threads, they will flush their DWPT
  buffers on commit, resulting in many small segments, which can be merged
  before the commit returns.

* Add missing Javadoc
* Fix incorrect comment
* Refactoring and fix intermittent test failure
  1. Made some changes to the callback to update toCommit, leveraging
     SegmentInfos.applyMergeChanges.
  2. I realized that we'll never end up with 0 registered merges, because we
     throw an exception if we fail to register a merge.
  3. Moved the IndexWriterEvents.beginMergeOnCommit notification to before we
     call MergeScheduler.merge, since we may not be merging on another thread.
  4. There was an intermittent test failure due to randomness in the time it
     takes for merges to complete. Before doing the final commit, we wait for
     pending merges to finish. We may still end up abandoning the final merge,
     but we can detect that and assert that either the merge was abandoned
     (and we have > 1 segment) or we did merge down to 1 segment.
* Fix typo
* Fix/improve comments based on PR feedback
* More comment improvements from PR feedback
* Rename method and add new MergeTrigger
  1. Renamed findCommitMerges -> findFullFlushMerges.
  2. Added MergeTrigger.COMMIT, passed to findFullFlushMerges and to
     MergeScheduler when merging on commit.
* Update renamed method name in strings and comments
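The flow this commit describes — each indexing thread flushes its buffer as a small segment at commit time, and the small segments are coalesced before the commit point is published — can be sketched without the Lucene API. Names and the docs-per-segment model below are hypothetical simplifications, not Lucene classes:

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of "merge on commit" (not the Lucene API): each
 * indexing thread flushes its buffer as one small segment, and before the
 * commit returns the small segments are coalesced into a single one.
 */
public class MergeOnCommitSketch {

    /** One flushed segment per indexing thread, sized by buffered doc count. */
    static List<Integer> flushAll(List<Integer> bufferedDocsPerThread) {
        return new ArrayList<>(bufferedDocsPerThread);
    }

    /** Coalesce segments under maxDocs into a single merged segment. */
    static List<Integer> mergeSmall(List<Integer> segments, int maxDocs) {
        List<Integer> result = new ArrayList<>();
        int merged = 0;
        for (int docs : segments) {
            if (docs < maxDocs) {
                merged += docs;   // absorbed into the commit-time merge
            } else {
                result.add(docs); // large segments are left alone
            }
        }
        if (merged > 0) {
            result.add(merged);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> flushed = flushAll(List.of(10, 3, 5000, 7));
        // Before the commit returns, the three tiny segments become one of 20 docs.
        System.out.println(mergeSmall(flushed, 100)); // prints [5000, 20]
    }
}
```

In real Lucene the decision of which flushed segments to combine is delegated to the merge policy via the renamed findFullFlushMerges hook, triggered with MergeTrigger.COMMIT, rather than hard-coded as here.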
[jira] [Commented] (LUCENE-8962) Can we merge small segments during refresh, for faster searching?
[ https://issues.apache.org/jira/browse/LUCENE-8962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049753#comment-17049753 ]

ASF subversion and git services commented on LUCENE-8962:
---------------------------------------------------------
Commit fdac6d866344611290c45c164112277581328bc9 in lucene-solr's branch refs/heads/branch_8x from Michael Sokolov
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=fdac6d8 ]

Add CHANGES entry for LUCENE-8962
[jira] [Commented] (LUCENE-8962) Can we merge small segments during refresh, for faster searching?
[ https://issues.apache.org/jira/browse/LUCENE-8962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049752#comment-17049752 ]

ASF subversion and git services commented on LUCENE-8962:
---------------------------------------------------------
Commit a5475de57fed6b339cd5565bd1bd2650f265a537 in lucene-solr's branch refs/heads/branch_8x from Michael Froh
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=a5475de ]

LUCENE-8962: Fix intermittent test failures

1. TestIndexWriterMergePolicy.testMergeOnCommit will fail if the last commit
   (the one that should trigger the full merge) doesn't have any pending
   changes (which could occur if the last indexing thread commits at the end).
   We can fix that by adding one more document before that commit.
2. The previous implementation was throwing IOException if the commit thread
   gets interrupted while waiting for merges to complete. This violates
   IndexWriter's documented behavior of throwing ThreadInterruptedException.
[jira] [Commented] (LUCENE-8962) Can we merge small segments during refresh, for faster searching?
[ https://issues.apache.org/jira/browse/LUCENE-8962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049748#comment-17049748 ]

ASF subversion and git services commented on LUCENE-8962:
---------------------------------------------------------
Commit f017ae465ec416b9cf5ac91f9aa12ff71abd7de0 in lucene-solr's branch refs/heads/master from msfroh
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=f017ae4 ]

LUCENE-8962: Fix intermittent test failures (#1307)

1. TestIndexWriterMergePolicy.testMergeOnCommit will fail if the last commit
   (the one that should trigger the full merge) doesn't have any pending
   changes (which could occur if the last indexing thread commits at the end).
   We can fix that by adding one more document before that commit.
2. The previous implementation was throwing IOException if the commit thread
   gets interrupted while waiting for merges to complete. This violates
   IndexWriter's documented behavior of throwing ThreadInterruptedException.
[GitHub] [lucene-solr] msokolov commented on issue #1307: LUCENE-8962: Fix intermittent test failures
msokolov commented on issue #1307: LUCENE-8962: Fix intermittent test failures
URL: https://github.com/apache/lucene-solr/pull/1307#issuecomment-593676692

OK, phew, after this change all the renewed testing effort failed to turn up any failures, so I'll push.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [lucene-solr] msokolov merged pull request #1307: LUCENE-8962: Fix intermittent test failures
msokolov merged pull request #1307: LUCENE-8962: Fix intermittent test failures
URL: https://github.com/apache/lucene-solr/pull/1307
[GitHub] [lucene-solr] dsmiley commented on issue #1303: LUCENE-9114: Improve ValueSourceScorer's Default Cost Implementation
dsmiley commented on issue #1303: LUCENE-9114: Improve ValueSourceScorer's Default Cost Implementation
URL: https://github.com/apache/lucene-solr/pull/1303#issuecomment-593675019

Ehh; never mind my ill-thought-out idea of a cost on the Map context. There are many ValueSource.getValues impls that would need to parse it, and there's also the concern that we wouldn't want it to propagate to sub-FunctionValues.

Alternative proposal: when FunctionRangeQuery calls functionValues.getRangeScorer, it gets back a ValueSourceScorer. We could just add a mutable cost on VSC that, if set, is returned by VSC, and if not, VSC delegates to the proposed `FV.cost`. While the mutability isn't pretty, it's also quite minor, and it saves FRQ from having to wrap the scorer only to specify a matchCost.
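The proposal above — a settable cost on the scorer that falls back to a default derived from the function values when unset — could look roughly like the following. This is a simplified sketch under assumed names; the stub classes only echo, and are not, the actual Lucene classes:

```java
import java.util.OptionalLong;

/**
 * Simplified sketch of the proposal (hypothetical stand-ins, not Lucene):
 * a scorer whose match cost can be set externally; when unset it falls
 * back to a default cost derived from the underlying function values.
 */
public class MutableCostScorerSketch {

    /** Stand-in for FunctionValues with the proposed cost() method. */
    interface FunctionValuesStub {
        long cost();
    }

    static class ValueSourceScorerStub {
        private final FunctionValuesStub values;
        private OptionalLong overrideCost = OptionalLong.empty();

        ValueSourceScorerStub(FunctionValuesStub values) {
            this.values = values;
        }

        /** A caller such as FunctionRangeQuery may set a more accurate cost. */
        void setMatchCost(long cost) {
            this.overrideCost = OptionalLong.of(cost);
        }

        /** Use the override if present, else delegate to the values' default. */
        long matchCost() {
            return overrideCost.orElseGet(values::cost);
        }
    }

    public static void main(String[] args) {
        ValueSourceScorerStub scorer = new ValueSourceScorerStub(() -> 100L);
        System.out.println(scorer.matchCost()); // default from the values: 100
        scorer.setMatchCost(42L);
        System.out.println(scorer.matchCost()); // override wins: 42
    }
}
```

The design choice being weighed in the comment is exactly this trade: a small piece of mutable state on the scorer versus wrapping the scorer solely to carry a matchCost.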
[GitHub] [lucene-solr] msokolov commented on issue #1307: LUCENE-8962: Fix intermittent test failures
msokolov commented on issue #1307: LUCENE-8962: Fix intermittent test failures
URL: https://github.com/apache/lucene-solr/pull/1307#issuecomment-593674134

I re-ran the whole test suite locally and will beast these: TestIndexWriter.testThreadInterruptDeadlock and TestIndexWriterMergePolicy.testMergeOnCommit, and verify failing seeds from Jenkins: 5760178D4A8250A6:73E60D8AAB67B286 (failed on master) and EF4E611015C3B5B0:CBC87B17F4265790.
[jira] [Commented] (LUCENE-9258) DocTermsIndexDocValues should not assume it's operating on a SortedDocValues field
[ https://issues.apache.org/jira/browse/LUCENE-9258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049738#comment-17049738 ]

Michele Palmia commented on LUCENE-9258:
----------------------------------------
I added a patch with the fix, together with a(n addition to a) test that fails with the current implementation. Any advice on improving the testing would be greatly appreciated (is it ok to test the Scorer independently? Should I mock the Weight?).

> DocTermsIndexDocValues should not assume it's operating on a SortedDocValues
> field
> ----------------------------------------------------------------------------
>
>                 Key: LUCENE-9258
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9258
>             Project: Lucene - Core
>          Issue Type: Bug
>    Affects Versions: 8.4
>            Reporter: Michele Palmia
>            Priority: Minor
>         Attachments: LUCENE-9258.patch
>
> When requesting a new _ValueSourceScorer_ (with _getRangeScorer_) from
> _DocTermsIndexDocValues_, the latter instantiates a new iterator on
> _SortedDocValues_ regardless of the fact that the underlying field can
> actually be of a different type (e.g. a _SortedSetDocValues_ processed
> through a _SortedSetSelector_).
[jira] [Updated] (LUCENE-9258) DocTermsIndexDocValues should not assume it's operating on a SortedDocValues field
[ https://issues.apache.org/jira/browse/LUCENE-9258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michele Palmia updated LUCENE-9258:
-----------------------------------
    Attachment: LUCENE-9258.patch
 Lucene Fields: New,Patch Available  (was: New)
 Review Patch?: Yes
[jira] [Created] (LUCENE-9258) DocTermsIndexDocValues should not assume it's operating on a SortedDocValues field
Michele Palmia created LUCENE-9258:
--------------------------------------
             Summary: DocTermsIndexDocValues should not assume it's operating on a SortedDocValues field
                 Key: LUCENE-9258
                 URL: https://issues.apache.org/jira/browse/LUCENE-9258
             Project: Lucene - Core
          Issue Type: Bug
    Affects Versions: 8.4
            Reporter: Michele Palmia

When requesting a new _ValueSourceScorer_ (with _getRangeScorer_) from _DocTermsIndexDocValues_, the latter instantiates a new iterator on _SortedDocValues_ regardless of the fact that the underlying field can actually be of a different type (e.g. a _SortedSetDocValues_ processed through a _SortedSetSelector_).
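The bug pattern described in LUCENE-9258 — code reaching for one concrete DocValues subtype when the field may be backed by another — can be illustrated generically. The types below are hypothetical stand-ins, not the Lucene classes:

```java
/**
 * Hypothetical stand-ins (not the Lucene classes) illustrating the bug:
 * the broken path casts to one concrete subtype, which fails when the
 * field is actually backed by another; the fix uses the shared interface.
 */
public class SubtypeAssumptionSketch {

    interface DocValuesStub { String lookup(int doc); }

    static class SortedStub implements DocValuesStub {
        public String lookup(int doc) { return "sorted:" + doc; }
    }

    static class SortedSetSelectedStub implements DocValuesStub {
        public String lookup(int doc) { return "sortedset:" + doc; }
    }

    /** Broken: assumes every field is a SortedStub, like the reported bug. */
    static String lookupBroken(DocValuesStub values, int doc) {
        return ((SortedStub) values).lookup(doc); // ClassCastException for other impls
    }

    /** Fixed: rely only on the shared interface. */
    static String lookupFixed(DocValuesStub values, int doc) {
        return values.lookup(doc);
    }

    public static void main(String[] args) {
        DocValuesStub values = new SortedSetSelectedStub();
        System.out.println(lookupFixed(values, 7)); // prints sortedset:7
        try {
            lookupBroken(values, 7);
        } catch (ClassCastException e) {
            System.out.println("broken path throws ClassCastException");
        }
    }
}
```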
[jira] [Commented] (SOLR-14241) Streaming Expression for deleting documents by IDs (from tuples)
[ https://issues.apache.org/jira/browse/SOLR-14241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049729#comment-17049729 ]

ASF subversion and git services commented on SOLR-14241:
--------------------------------------------------------
Commit f2a6ff1494c33d9e70e73864ca892958103e170a in lucene-solr's branch refs/heads/branch_8x from Cassandra Targett
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=f2a6ff1 ]

SOLR-14241: fix typos & incorrect example param

> Streaming Expression for deleting documents by IDs (from tuples)
> ----------------------------------------------------------------
>
>                 Key: SOLR-14241
>                 URL: https://issues.apache.org/jira/browse/SOLR-14241
>             Project: Solr
>          Issue Type: New Feature
>      Security Level: Public (Default Security Level. Issues are Public)
>          Components: streaming expressions
>            Reporter: Chris M. Hostetter
>            Assignee: Chris M. Hostetter
>            Priority: Major
>             Fix For: master (9.0), 8.5
>
>         Attachments: DELQ-adds-and-deletes.png, DELQ-only-adds.png,
> SOLR-14241.patch, STREAM-adds-and-deletes.png, STREAM-only-adds.png,
> microbenchmark_scripts.zip
>
> Streaming expressions currently support an {{update(...)}} decorator
> function for wrapping another stream and treating each Tuple from the inner
> stream as a document to be added to an index.
> I've implemented an analogous subclass of the {{UpdateStream}} called
> {{DeleteStream}} that uses the tuples from the inner stream to identify the
> uniqueKeys of documents that should be deleted.
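Since {{DeleteStream}} mirrors the {{update(...)}} decorator, usage would presumably pair a {{delete(...)}} wrapper with an inner stream producing the uniqueKeys. A hypothetical invocation (the collection, query, and field names are illustrative, and the exact decorator parameters should be checked against the 8.5 streaming-expression docs):

```
delete(myCollection,
       batchSize=500,
       search(myCollection,
              q="category:discontinued",
              fl="id",
              sort="id asc"))
```

Each tuple emitted by the inner search supplies the uniqueKey of a document to delete, just as each tuple supplies a full document in the update(...) case.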
[jira] [Commented] (SOLR-14270) export command to have an option to write to a zip file
[ https://issues.apache.org/jira/browse/SOLR-14270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049728#comment-17049728 ]

ASF subversion and git services commented on SOLR-14270:
--------------------------------------------------------
Commit 1f549dc4742e9adc5d8354c812e05f662e195309 in lucene-solr's branch refs/heads/branch_8x from Cassandra Targett
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=1f549dc ]

SOLR-14270: Move .gz example to CLI page; Remove bin/solr export from command-line-utilities.adoc

> export command to have an option to write to a zip file
> --------------------------------------------------------
>
>                 Key: SOLR-14270
>                 URL: https://issues.apache.org/jira/browse/SOLR-14270
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public (Default Security Level. Issues are Public)
>            Reporter: Noble Paul
>            Assignee: Noble Paul
>            Priority: Major
>              Labels: cli
>             Fix For: 8.5
>
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> Plain JSON files are too big. Export to a compressed file:
> {{bin/solr export -url http://localhost:8983/solr/gettingstarted -out
> gettingstarted.json.gz}}
> This will write the data to a file called {{gettingstarted.json.gz}} in
> gzip format.
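A {{.gz}} export can be consumed without unpacking to disk. The sketch below only simulates the file such an export would produce (the JSON line is illustrative, not real export output); the standard gzip tools then stream it back:

```shell
# Simulate a compressed export like gettingstarted.json.gz (contents illustrative)
printf '{"id":"doc1"}\n' > gettingstarted.json
gzip -f gettingstarted.json          # yields gettingstarted.json.gz
zcat gettingstarted.json.gz          # stream the documents without decompressing to disk
```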
[jira] [Commented] (SOLR-14241) Streaming Expression for deleting documents by IDs (from tuples)
[ https://issues.apache.org/jira/browse/SOLR-14241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049726#comment-17049726 ] ASF subversion and git services commented on SOLR-14241: Commit 422d994612280ab5c4e13ac34260ddf05e3f7ad5 in lucene-solr's branch refs/heads/master from Cassandra Targett [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=422d994 ] SOLR-14241: fix typos & incorrect example param > Streaming Expression for deleting documents by IDs (from tuples) > > > Key: SOLR-14241 > URL: https://issues.apache.org/jira/browse/SOLR-14241 > Project: Solr > Issue Type: New Feature > Security Level: Public(Default Security Level. Issues are Public) > Components: streaming expressions >Reporter: Chris M. Hostetter >Assignee: Chris M. Hostetter >Priority: Major > Fix For: master (9.0), 8.5 > > Attachments: DELQ-adds-and-deletes.png, DELQ-only-adds.png, > SOLR-14241.patch, STREAM-adds-and-deletes.png, STREAM-only-adds.png, > microbenchmark_scripts.zip > > > Streaming expressions currently supports an {{update(...)}} decorator > function for wrapping another stream and treating each Tuple from the inner > stream as a document to be added to an index. > I've implemented an analogous subclass of the {{UpdateStream}} called > {{DeleteStream}} that uses the tuples from the inner stream to identify the > uniqueKeys of documents that should be deleted. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14270) export command to have an option to write to a zip file
[ https://issues.apache.org/jira/browse/SOLR-14270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049725#comment-17049725 ] ASF subversion and git services commented on SOLR-14270: Commit 27523b5e40921499212bd0c5f7f56c35cdebe073 in lucene-solr's branch refs/heads/master from Cassandra Targett [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=27523b5 ] SOLR-14270: Move .gz example to CLI page; Remove bin/solr export from command-line-utilities.adoc > export command to have an option to write to a zip file > --- > > Key: SOLR-14270 > URL: https://issues.apache.org/jira/browse/SOLR-14270 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Noble Paul >Assignee: Noble Paul >Priority: Major > Labels: cli > Fix For: 8.5 > > Time Spent: 50m > Remaining Estimate: 0h > > Plain json files are too big. Export to a compressed file > {{bin/solr export -url http://localhost:8983/solr/gettingstarted -out > gettingstarted.json.gz}} > This will write the data to a file called {{gettingstarted.json.gz}} in a zip > format -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14270) export command to have an option to write to a zip file
[ https://issues.apache.org/jira/browse/SOLR-14270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049718#comment-17049718 ] Cassandra Targett commented on SOLR-14270: -- Just FYI, the docs here got a little messed up. Docs for the bin/solr export command were already added where they belonged (solr-control-script.adoc - about bin/solr) back in SOLR-13862, but this issue adds slightly different and less-detailed docs to the wrong place (command-line-utilities.adoc - about zkcli.sh), while not updating the original docs to include the ability to .zip the output. I'll fix it (by removing the wrong docs from the wrong place but copying the one new example that's needed in the right place), but just wanted to mention it. > export command to have an option to write to a zip file > --- > > Key: SOLR-14270 > URL: https://issues.apache.org/jira/browse/SOLR-14270 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Noble Paul >Assignee: Noble Paul >Priority: Major > Labels: cli > Fix For: 8.5 > > Time Spent: 50m > Remaining Estimate: 0h > > Plain json files are too big. Export to a compressed file > {{bin/solr export -url http://localhost:8983/solr/gettingstarted -out > gettingstarted.json.gz}} > This will write the data to a file called {{gettingstarted.json.gz}} in a zip > format -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
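The `-out gettingstarted.json.gz` output discussed in this thread appears to be gzip-compressed (a single `.gz` stream, despite the issue text saying "zip format"), so the mechanics can be sketched with nothing but `java.util.zip` from the JDK. The class and file names below are illustrative, not Solr code:

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;
import java.util.zip.*;

/** Illustrative sketch only: writes export output through a gzip stream,
 *  the way "bin/solr export ... -out gettingstarted.json.gz" produces a .gz file. */
public class GzipExportSketch {
  public static void writeGzip(Path path, String json) throws IOException {
    try (Writer w = new OutputStreamWriter(
        new GZIPOutputStream(Files.newOutputStream(path)), StandardCharsets.UTF_8)) {
      w.write(json);
    }
  }

  public static String readGzip(Path path) throws IOException {
    try (BufferedReader r = new BufferedReader(new InputStreamReader(
        new GZIPInputStream(Files.newInputStream(path)), StandardCharsets.UTF_8))) {
      StringBuilder sb = new StringBuilder();
      int c;
      while ((c = r.read()) != -1) sb.append((char) c);
      return sb.toString();
    }
  }

  public static void main(String[] args) throws IOException {
    Path p = Files.createTempFile("gettingstarted", ".json.gz");
    String docs = "{\"id\":\"1\"}\n{\"id\":\"2\"}\n";
    writeGzip(p, docs);
    if (!readGzip(p).equals(docs)) throw new AssertionError("gzip roundtrip failed");
    System.out.println("roundtrip ok, compressed bytes on disk: " + Files.size(p));
    Files.delete(p);
  }
}
```

For the repetitive JSON that an export produces, the gzip stream typically shrinks the file dramatically, which is the motivation stated in the issue.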
[jira] [Commented] (LUCENE-9150) Restore support for dynamic PlanetModel in Geo3D
[ https://issues.apache.org/jira/browse/LUCENE-9150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049716#comment-17049716 ] ASF subversion and git services commented on LUCENE-9150: - Commit a6e80d004d84213886a5ce52fd220d2e5112e43e in lucene-solr's branch refs/heads/master from Nicholas Knize [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=a6e80d0 ] LUCENE-9150: Restore support for dynamic PlanetModel in spatial3d > Restore support for dynamic PlanetModel in Geo3D > > > Key: LUCENE-9150 > URL: https://issues.apache.org/jira/browse/LUCENE-9150 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Nick Knize >Priority: Major > Time Spent: 1h 50m > Remaining Estimate: 0h > > LUCENE-7072 removed dynamic planet model support in Geo3D. This was logical > at the time (given the state of Lucene and spatial projections and coordinate > reference systems). Since then, however, there have been a lot of new > developments within the OGC community around [Coordinate Reference > Systems|https://docs.opengeospatial.org/as/18-005r4/18-005r4.html], [Dynamic > Coordinate Reference > Systems|http://docs.opengeospatial.org/DRAFTS/18-058.html], and [Updated ISO > Standards|https://www.iso.org/obp/ui/#iso:std:iso:19111:ed-3:v1:en]. > It would be useful for Geo3D (and eventually LatLon*) to support different > geographic datums to make lucene a viable option for indexing/searching in > different spatial reference systems (e.g., more accurately computing query > shape relations to BKD's internal nodes using datum consistent with the > spatial projection). This would also provide an alternative to other > limitations of the {{LatLon*/XY*}} implementation (e.g., pole/dateline > crossing, quantization of small polygons). > I'd like to propose keeping the current WGS84 static datum as the default for > Geo3D but adding back the constructors to accept custom planet models. > Perhaps this could be listed as an "expert" API feature? 
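For readers unfamiliar with what a non-WGS84 datum changes, the heart of a planet model is mapping geodetic latitude/longitude onto an ellipsoid surface. A minimal sketch using the standard geodetic-to-geocentric formulas (class name hypothetical, not the spatial3d API):

```java
/** Hypothetical sketch: geodetic (lat, lon) -> surface point on an arbitrary ellipsoid.
 *  A dynamic PlanetModel generalizes exactly this kind of computation beyond a
 *  hard-coded WGS84 datum. */
public class EllipsoidSketch {
  final double a;   // semi-major (equatorial) axis
  final double e2;  // first eccentricity squared, derived from flattening

  public EllipsoidSketch(double semiMajor, double flattening) {
    this.a = semiMajor;
    this.e2 = flattening * (2.0 - flattening);
  }

  /** Surface point (x, y, z) for geodetic latitude/longitude given in radians. */
  public double[] surfacePoint(double lat, double lon) {
    double sinLat = Math.sin(lat), cosLat = Math.cos(lat);
    double n = a / Math.sqrt(1.0 - e2 * sinLat * sinLat); // prime vertical radius
    return new double[] {
        n * cosLat * Math.cos(lon),
        n * cosLat * Math.sin(lon),
        n * (1.0 - e2) * sinLat
    };
  }

  public static void main(String[] args) {
    // WGS84 constants; a different datum just plugs in different numbers here
    EllipsoidSketch wgs84 = new EllipsoidSketch(6378137.0, 1.0 / 298.257223563);
    double[] pole = wgs84.surfacePoint(Math.PI / 2, 0.0);
    System.out.println("polar z = " + pole[2]); // close to the semi-minor axis
  }
}
```

Swapping the two constructor arguments is all it takes to model another datum, which is why restoring the constructors (with WGS84 as the default) is a small API change with real reach.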
[GitHub] [lucene-solr] asfgit merged pull request #1253: LUCENE-9150: Restore support for dynamic PlanetModel in spatial3d
asfgit merged pull request #1253: LUCENE-9150: Restore support for dynamic PlanetModel in spatial3d URL: https://github.com/apache/lucene-solr/pull/1253 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] dsmiley commented on issue #1303: LUCENE-9114: Improve ValueSourceScorer's Default Cost Implementation
dsmiley commented on issue #1303: LUCENE-9114: Improve ValueSourceScorer's Default Cost Implementation URL: https://github.com/apache/lucene-solr/pull/1303#issuecomment-593657932 Why a separate PR for my proposed test? Your proposal is better than the status quo but I think is rather lacking if that's it. If your proposal can also accommodate a query-time user supplied cost, especially by FunctionRangeQuery somehow, then I think we're then in good shape as it'll allow a user to set this on the fly. (BTW ignore the identical named class in Solr, which I plan on removing). Perhaps this cost could sneak in by putting the cost on the "context" Map supplied to ValueSource.getValues ? Yeah; that'd be cool :-) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
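The idea floated above, letting a query-time caller supply a cost through the "context" Map handed to `ValueSource.getValues`, can be sketched in isolation. Everything here is illustrative: the key name and default are assumptions, not Lucene API:

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of a query-time cost hint carried in the context Map,
 *  with a conservative default when no hint is present. The key name and the
 *  default value are invented for illustration; this is not Lucene code. */
public class CostHintSketch {
  static final String COST_KEY = "valueSourceCost"; // assumed key, not a real constant
  static final long DEFAULT_COST = 100L;

  public static long resolveCost(Map<String, Object> context) {
    Object hint = context.get(COST_KEY);
    return (hint instanceof Number) ? ((Number) hint).longValue() : DEFAULT_COST;
  }

  public static void main(String[] args) {
    Map<String, Object> context = new HashMap<>();
    System.out.println(resolveCost(context)); // no hint: falls back to the default
    context.put(COST_KEY, 5L);                // e.g. set by FunctionRangeQuery on the fly
    System.out.println(resolveCost(context));
  }
}
```

Because the context Map already flows from the query down to the values, this pattern would let a user override the cost per query without any new plumbing, which seems to be the appeal of the suggestion.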
[jira] [Commented] (LUCENE-8962) Can we merge small segments during refresh, for faster searching?
[ https://issues.apache.org/jira/browse/LUCENE-8962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049708#comment-17049708 ] Michael Sokolov commented on LUCENE-8962: - Thanks, [~msfroh] I'll take a look > Can we merge small segments during refresh, for faster searching? > - > > Key: LUCENE-8962 > URL: https://issues.apache.org/jira/browse/LUCENE-8962 > Project: Lucene - Core > Issue Type: Improvement > Components: core/index >Reporter: Michael McCandless >Priority: Major > Attachments: LUCENE-8962_demo.png > > Time Spent: 5.5h > Remaining Estimate: 0h > > With near-real-time search we ask {{IndexWriter}} to write all in-memory > segments to disk and open an {{IndexReader}} to search them, and this is > typically a quick operation. > However, when you use many threads for concurrent indexing, {{IndexWriter}} > will accumulate write many small segments during {{refresh}} and this then > adds search-time cost as searching must visit all of these tiny segments. > The merge policy would normally quickly coalesce these small segments if > given a little time ... so, could we somehow improve {{IndexWriter'}}s > refresh to optionally kick off merge policy to merge segments below some > threshold before opening the near-real-time reader? It'd be a bit tricky > because while we are waiting for merges, indexing may continue, and new > segments may be flushed, but those new segments shouldn't be included in the > point-in-time segments returned by refresh ... > One could almost do this on top of Lucene today, with a custom merge policy, > and some hackity logic to have the merge policy target small segments just > written by refresh, but it's tricky to then open a near-real-time reader, > excluding newly flushed but including newly merged segments since the refresh > originally finished ... > I'm not yet sure how best to solve this, so I wanted to open an issue for > discussion! 
[jira] [Commented] (LUCENE-8962) Can we merge small segments during refresh, for faster searching?
[ https://issues.apache.org/jira/browse/LUCENE-8962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049700#comment-17049700 ] Michael Froh commented on LUCENE-8962: -- Posted a PR with fixes for the above test failures: [https://github.com/apache/lucene-solr/pull/1307] > Can we merge small segments during refresh, for faster searching? > - > > Key: LUCENE-8962 > URL: https://issues.apache.org/jira/browse/LUCENE-8962 > Project: Lucene - Core > Issue Type: Improvement > Components: core/index >Reporter: Michael McCandless >Priority: Major > Attachments: LUCENE-8962_demo.png > > Time Spent: 5.5h > Remaining Estimate: 0h > > With near-real-time search we ask {{IndexWriter}} to write all in-memory > segments to disk and open an {{IndexReader}} to search them, and this is > typically a quick operation. > However, when you use many threads for concurrent indexing, {{IndexWriter}} > will accumulate write many small segments during {{refresh}} and this then > adds search-time cost as searching must visit all of these tiny segments. > The merge policy would normally quickly coalesce these small segments if > given a little time ... so, could we somehow improve {{IndexWriter'}}s > refresh to optionally kick off merge policy to merge segments below some > threshold before opening the near-real-time reader? It'd be a bit tricky > because while we are waiting for merges, indexing may continue, and new > segments may be flushed, but those new segments shouldn't be included in the > point-in-time segments returned by refresh ... > One could almost do this on top of Lucene today, with a custom merge policy, > and some hackity logic to have the merge policy target small segments just > written by refresh, but it's tricky to then open a near-real-time reader, > excluding newly flushed but including newly merged segments since the refresh > originally finished ... 
> I'm not yet sure how best to solve this, so I wanted to open an issue for > discussion!
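The "merge segments below some threshold" step that this issue keeps coming back to reduces to a simple selection over the newly flushed segments. A sketch of just that step, under the assumption that segments are identified by index and byte size (not the actual MergePolicy API):

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the selection a merge-on-refresh policy would need:
 *  from the segments flushed during refresh, pick only those under a size
 *  threshold to coalesce before the near-real-time reader opens. */
public class SmallSegmentSelector {
  /** Returns indices of segments whose byte size is below maxSmallSegmentBytes. */
  public static List<Integer> selectSmall(long[] segmentSizes, long maxSmallSegmentBytes) {
    List<Integer> picked = new ArrayList<>();
    for (int i = 0; i < segmentSizes.length; i++) {
      if (segmentSizes[i] < maxSmallSegmentBytes) picked.add(i);
    }
    return picked;
  }

  public static void main(String[] args) {
    // one big segment plus the tiny per-thread flush segments the issue describes
    long[] sizes = {512, 100_000_000L, 2048, 64};
    System.out.println(selectSmall(sizes, 1_000_000L));
  }
}
```

The hard part the thread identifies is not this selection but the bookkeeping around it: excluding segments flushed after the refresh started while still including the freshly merged ones in the point-in-time reader.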
[GitHub] [lucene-solr] msfroh opened a new pull request #1307: LUCENE-8962: Fix intermittent test failures
msfroh opened a new pull request #1307: LUCENE-8962: Fix intermittent test failures URL: https://github.com/apache/lucene-solr/pull/1307 1. TestIndexWriterMergePolicy.testMergeOnCommit will fail if the last commit (the one that should trigger the full merge) doesn't have any pending changes (which could occur if the last indexing thread commits at the end). We can fix that by adding one more document before that commit. 2. The previous implementation was throwing IOException if the commit thread gets interrupted while waiting for merges to complete. This violates IndexWriter's documented behavior of throwing ThreadInterruptedException. # Description This fixes intermittent test failures related to the previous commit on LUCENE-8962. # Solution There were two separate bugs in the previous commit: 1. TestIndexWriterMergePolicy.testMergeOnCommit could sometimes fail the last assertion, because the final commit in the test method triggered no merges. This could happen if multiple indexing threads committed after adding their last documents. To guarantee that the final commit in the test method triggers a merge, we can add one more document (so there is a change to commit). 2. TestIndexWriter. testThreadInterruptDeadlock verifies IndexWriter's documented behavior of throwing ThreadInterruptedException when interrupted. The previous commit for LUCENE-8962 violated this behavior. This commit fixes that. # Tests After applying these fixes, I have run both TestIndexWriter and TestIndexWriterMergePolicy multiple times with previously-failing seeds and random seeds, and have not seen the test failures occur again. # Checklist Please review the following and check all that apply: - [X] I have reviewed the guidelines for [How to Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms to the standards described there to the best of my ability. - [X] I have created a Jira issue and added the issue ID to my pull request title. 
- [X] I have given Solr maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended) - [X] I have developed this patch against the `master` branch. - [X] I have run `ant precommit` and the appropriate test suite. - [X] I have added tests for my changes. - [ ] I have added documentation for the [Ref Guide](https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide) (for Solr changes only). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] nknize commented on a change in pull request #1253: LUCENE-9150: Restore support for dynamic PlanetModel in spatial3d
nknize commented on a change in pull request #1253: LUCENE-9150: Restore support for dynamic PlanetModel in spatial3d URL: https://github.com/apache/lucene-solr/pull/1253#discussion_r386676232 ## File path: lucene/spatial3d/src/java/org/apache/lucene/spatial3d/geom/PlanetModel.java ## @@ -383,30 +509,233 @@ public GeoPoint surfacePointOnBearing(final GeoPoint from, final double dist, fi Δσ = B * sinσ * (cos2σM + B / 4.0 * (cosσ * (-1.0 + 2.0 * cos2σM * cos2σM) - B / 6.0 * cos2σM * (-3.0 + 4.0 * sinσ * sinσ) * (-3.0 + 4.0 * cos2σM * cos2σM))); σʹ = σ; - σ = dist / (c * inverseScale * A) + Δσ; + σ = dist / (zScaling * inverseScale * A) + Δσ; } while (Math.abs(σ - σʹ) >= Vector.MINIMUM_RESOLUTION && ++iterations < 100); double x = sinU1 * sinσ - cosU1 * cosσ * cosα1; -double φ2 = Math.atan2(sinU1 * cosσ + cosU1 * sinσ * cosα1, (1.0 - flattening) * Math.sqrt(sinα * sinα + x * x)); +double φ2 = Math.atan2(sinU1 * cosσ + cosU1 * sinσ * cosα1, (1.0 - scaledFlattening) * Math.sqrt(sinα * sinα + x * x)); double λ = Math.atan2(sinσ * sinα1, cosU1 * cosσ - sinU1 * sinσ * cosα1); -double C = flattening / 16.0 * cosSqα * (4.0 + flattening * (4.0 - 3.0 * cosSqα)); -double L = λ - (1.0 - C) * flattening * sinα * +double C = scaledFlattening / 16.0 * cosSqα * (4.0 + scaledFlattening * (4.0 - 3.0 * cosSqα)); +double L = λ - (1.0 - C) * scaledFlattening * sinα * (σ + C * sinσ * (cos2σM + C * cosσ * (-1.0 + 2.0 * cos2σM * cos2σM))); double λ2 = (lon + L + 3.0 * Math.PI) % (2.0 * Math.PI) - Math.PI; // normalise to -180..+180 return new GeoPoint(this, φ2, λ2); } + /** Utility class for encoding / decoding from lat/lon (decimal degrees) into sortable doc value numerics (integers) */ + public static class DocValueEncoder { +private final PlanetModel planetModel; + +// These are the multiplicative constants we need to use to arrive at values that fit in 21 bits. 
+// The formula we use to go from double to encoded value is: Math.floor((value - minimum) * factor + 0.5) +// If we plug in maximum for value, we should get 0x1FFFFF. +// So, 0x1FFFFF = Math.floor((maximum - minimum) * factor + 0.5) +// We factor out the 0.5 and Math.floor by stating instead: +// 0x1FFFFF = (maximum - minimum) * factor +// So, factor = 0x1FFFFF / (maximum - minimum) + +private final static double inverseMaximumValue = 1.0 / (double)(0x1FFFFF); + +private final double inverseXFactor; +private final double inverseYFactor; +private final double inverseZFactor; + +private final double xFactor; +private final double yFactor; +private final double zFactor; + +// Fudge factor for step adjustments. This is here solely to handle inaccuracies in bounding boxes +// that occur because of quantization. For unknown reasons, the fudge factor needs to be +// 10.0 rather than 1.0. See LUCENE-7430. + +private final static double STEP_FUDGE = 10.0; + +// These values are the delta between a value and the next value in each specific dimension + +private final double xStep; +private final double yStep; +private final double zStep; + +/** construct an encoder/decoder instance from the provided PlanetModel definition */ +public DocValueEncoder(final PlanetModel planetModel) { Review comment: :+1: good call! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
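The factor arithmetic in the diff comments above (fit a double range into 21-bit integers via `factor = maxEncoded / (maximum - minimum)` and `encoded = floor((value - minimum) * factor + 0.5)`) can be checked in isolation. A self-contained sketch, with illustrative constants rather than the real Geo3D planet-model bounds:

```java
/** Sketch of the DocValueEncoder factor arithmetic quoted in the review above.
 *  Constants and the class itself are illustrative, not the spatial3d code. */
public class DocValueFactorSketch {
  final double min, factor, step;

  public DocValueFactorSketch(double min, double max, long maxEncoded) {
    this.min = min;
    this.factor = maxEncoded / (max - min); // plugging max into encode() yields maxEncoded
    this.step = 1.0 / factor;               // gap between adjacent decodable values
  }

  public long encode(double value) {
    return (long) Math.floor((value - min) * factor + 0.5);
  }

  public double decode(long encoded) {
    return encoded * step + min;
  }

  public static void main(String[] args) {
    // 21-bit range as in the diff: the maximum encoded value is 0x1FFFFF
    DocValueFactorSketch enc = new DocValueFactorSketch(-1.0, 1.0, 0x1FFFFF);
    long e = enc.encode(0.5);
    // rounding-to-nearest keeps the round-trip error within half a step
    System.out.println(Math.abs(enc.decode(e) - 0.5) <= enc.step); // prints true
  }
}
```

This also shows why the step values matter: decoded coordinates land on a grid of spacing `1 / factor`, and it is those quantized grid points that the bounding-box fudge factor in the diff has to absorb.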
[GitHub] [lucene-solr] nknize commented on a change in pull request #1290: LUCENE-9251: Filter equal edges with different value on isEdgeFromPolygon
nknize commented on a change in pull request #1290: LUCENE-9251: Filter equal edges with different value on isEdgeFromPolygon URL: https://github.com/apache/lucene-solr/pull/1290#discussion_r386674661 ## File path: lucene/core/src/test/org/apache/lucene/geo/TestTessellator.java ## @@ -561,6 +561,18 @@ public void testComplexPolygon39() throws Exception { checkPolygon(wkt); } + @Nightly + public void testComplexPolygon40() throws Exception { +String wkt = GeoTestUtil.readShape("lucene-9251.wkt.gz"); Review comment: nice! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] nknize commented on a change in pull request #1258: LUCENE-9225: Rectangle should extend LatLonGeometry
nknize commented on a change in pull request #1258: LUCENE-9225: Rectangle should extend LatLonGeometry URL: https://github.com/apache/lucene-solr/pull/1258#discussion_r380381973 ## File path: lucene/core/src/java/org/apache/lucene/geo/Rectangle2D.java ## @@ -415,16 +217,63 @@ public boolean equals(Object o) { return minX == that.minX && maxX == that.maxX && minY == that.minY && -maxY == that.maxY && -Arrays.equals(bbox, that.bbox) && -Arrays.equals(west, that.west); +maxY == that.maxY; } @Override public int hashCode() { int result = Objects.hash(minX, maxX, minY, maxY); -result = 31 * result + Arrays.hashCode(bbox); -result = 31 * result + Arrays.hashCode(west); return result; } -} + + @Override + public String toString() { +final StringBuilder sb = new StringBuilder(); +sb.append("XYRectangle(x="); +sb.append(minX); +sb.append(" TO "); +sb.append(maxX); +sb.append(" y="); +sb.append(minY); +sb.append(" TO "); +sb.append(maxY); +sb.append(")"); +return sb.toString(); + } + + /** create a component2D from the provided XY rectangle */ + static Component2D create(XYRectangle rectangle) { +return new Rectangle2D(rectangle.minX, rectangle.maxX, rectangle.minY, rectangle.maxY); + } + + private static double MIN_LON_INCL_QUANTIZE = decodeLongitude(encodeLongitude(MIN_LON_INCL)); Review comment: ```suggestion private static double MIN_LON_INCL_QUANTIZE = decodeLongitude(GeoEncodingUtils.MIN_LON_ENCODED); ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] nknize commented on a change in pull request #1258: LUCENE-9225: Rectangle should extend LatLonGeometry
nknize commented on a change in pull request #1258: LUCENE-9225: Rectangle should extend LatLonGeometry URL: https://github.com/apache/lucene-solr/pull/1258#discussion_r380385091 ## File path: lucene/core/src/java/org/apache/lucene/document/LatLonShapeBoundingBoxQuery.java ## @@ -108,4 +115,385 @@ public String toString(String field) { sb.append(rectangle.toString()); return sb.toString(); } + + /** Holds spatial logic for a bounding box that works in the encoded space */ + private static class EncodedRectangle { Review comment: Is this class needed because `Rectangle2D` is package private? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] nknize commented on a change in pull request #1258: LUCENE-9225: Rectangle should extend LatLonGeometry
nknize commented on a change in pull request #1258: LUCENE-9225: Rectangle should extend LatLonGeometry URL: https://github.com/apache/lucene-solr/pull/1258#discussion_r380382109 ## File path: lucene/core/src/java/org/apache/lucene/geo/Rectangle2D.java ## @@ -415,16 +217,63 @@ public boolean equals(Object o) { return minX == that.minX && maxX == that.maxX && minY == that.minY && -maxY == that.maxY && -Arrays.equals(bbox, that.bbox) && -Arrays.equals(west, that.west); +maxY == that.maxY; } @Override public int hashCode() { int result = Objects.hash(minX, maxX, minY, maxY); -result = 31 * result + Arrays.hashCode(bbox); -result = 31 * result + Arrays.hashCode(west); return result; } -} + + @Override + public String toString() { +final StringBuilder sb = new StringBuilder(); +sb.append("XYRectangle(x="); +sb.append(minX); +sb.append(" TO "); +sb.append(maxX); +sb.append(" y="); +sb.append(minY); +sb.append(" TO "); +sb.append(maxY); +sb.append(")"); +return sb.toString(); + } + + /** create a component2D from the provided XY rectangle */ + static Component2D create(XYRectangle rectangle) { +return new Rectangle2D(rectangle.minX, rectangle.maxX, rectangle.minY, rectangle.maxY); + } + + private static double MIN_LON_INCL_QUANTIZE = decodeLongitude(encodeLongitude(MIN_LON_INCL)); + private static double MAX_LON_INCL_QUANTIZE = decodeLongitude(encodeLongitude(MAX_LON_INCL)); Review comment: ```suggestion private static double MAX_LON_INCL_QUANTIZE = decodeLongitude(GeoEncodingUtils.MAX_LON_ENCODED); ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
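The `decodeLongitude(encodeLongitude(...))` round-trips in this diff exist because longitudes are snapped to a 32-bit integer grid, so even constants like MIN/MAX_LON_INCL move slightly once encoded. A hedged sketch of the round-trip (the scale mirrors the idea of a 2^32-step grid over 360 degrees; it is not claimed to be Lucene's exact GeoEncodingUtils code):

```java
/** Illustrative quantization sketch: longitude snapped to a 2^32-step grid.
 *  Not the real GeoEncodingUtils implementation, just the same shape of idea. */
public class LonQuantizeSketch {
  static final double SCALE = (1L << 32) / 360.0;

  public static int encodeLongitude(double lon) {
    return (int) Math.floor(lon * SCALE);
  }

  public static double decodeLongitude(int encoded) {
    return encoded / SCALE;
  }

  public static void main(String[] args) {
    double lon = 12.3456789;
    double quantized = decodeLongitude(encodeLongitude(lon));
    // grid spacing is 360 / 2^32 ~ 8.4e-8 degrees, so the snap error stays below that
    System.out.println(Math.abs(quantized - lon) < 360.0 / (1L << 32) + 1e-12);
  }
}
```

Precomputing the quantized bounds once, as the `*_QUANTIZE` constants in the diff do, keeps every later comparison consistent with what was actually stored in the index.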
[jira] [Comment Edited] (LUCENE-8674) UnsupportedOperationException due to call to o.a.l.q.f.FunctionValues.floatVal
[ https://issues.apache.org/jira/browse/LUCENE-8674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049681#comment-17049681 ] Michele Palmia edited comment on LUCENE-8674 at 3/2/20 9:49 PM: This is due to a _VectorValueSource_ being fed to a _FunctionRangeQuery_, that is therefore trying to use its _floatVal_. By default, requesting the _floatVal(int doc)_ of a _VectorValueSource_ throws an _UnsupportedOperationException_, since no algorithm for merging the (possibly multiple) values is implemented. For reference, the query Solr tries to do is the following, {code:java} new ConstantScoreQuery( new FunctionRangeQuery( new VectorValueSource( new BytesRefFieldSource("any_field"), new SortedSetFieldSource("another_field") ), 0, 100, true, true)); {code} that always throws an exception if there are documents in the index. >From the way it's implemented (with the _UnsupportedOperationException_) it >doesn't look like this kind of inconsistencies are meant to be fixed in >Lucene. But not sure about that. Any suggestions are appreciated! was (Author: micpalmia): This is due to a _VectorValueSource_ being fed to a _FunctionRangeQuery_, that is therefore trying to use its _floatVal_. By default, requesting the _floatVal(int doc)_ of a _VectorValueSource_ throws an _UnsupportedOperationException_, since no algorithm for merging the (possibly multiple) values is implemented. For reference, the query Solr tries to do is the following, {code:java} final ConstantScoreQuery query = new ConstantScoreQuery( new FunctionRangeQuery( new VectorValueSource( new BytesRefFieldSource("any_field"), new SortedSetFieldSource("another_field") ), 0, 100, true, true)); {code} that always throws an exception if there are documents in the index. >From the way it's implemented (with the _UnsupportedOperationException_) it >doesn't look like this kind of inconsistencies are meant to be fixed in >Lucene. But not sure about that. 
Any suggestions are appreciated! > UnsupportedOperationException due to call to o.a.l.q.f.FunctionValues.floatVal > -- > > Key: LUCENE-8674 > URL: https://issues.apache.org/jira/browse/LUCENE-8674 > Project: Lucene - Core > Issue Type: Bug > Components: core/query/scoring >Affects Versions: master (9.0) > Environment: h1. Steps to reproduce > * Use a Linux machine. > * Build commit {{ea2c8ba}} of Solr as described in the section below. > * Build the films collection as described below. > * Start the server using the command {{./bin/solr start -f -p 8983 -s > /tmp/home}} > * Request the URL given in the bug description. > h1. Compiling the server > {noformat} > git clone https://github.com/apache/lucene-solr > cd lucene-solr > git checkout ea2c8ba > ant compile > cd solr > ant server > {noformat} > h1. Building the collection and reproducing the bug > We followed [Exercise > 2|http://lucene.apache.org/solr/guide/7_5/solr-tutorial.html#exercise-2] from > the [Solr > Tutorial|http://lucene.apache.org/solr/guide/7_5/solr-tutorial.html]. > {noformat} > mkdir -p /tmp/home > echo '' > > /tmp/home/solr.xml > {noformat} > In one terminal start a Solr instance in foreground: > {noformat} > ./bin/solr start -f -p 8983 -s /tmp/home > {noformat} > In another terminal, create a collection of movies, with no shards and no > replication, and initialize it: > {noformat} > bin/solr create -c films > curl -X POST -H 'Content-type:application/json' --data-binary '{"add-field": > {"name":"name", "type":"text_general", "multiValued":false, "stored":true}}' > http://localhost:8983/solr/films/schema > curl -X POST -H 'Content-type:application/json' --data-binary > '{"add-copy-field" : {"source":"*","dest":"_text_"}}' > http://localhost:8983/solr/films/schema > ./bin/post -c films example/films/films.json > curl -v “URL_BUG” > {noformat} > Please check the issue description below to find the “URL_BUG” that will > allow you to reproduce the issue reported. 
>Reporter: Johannes Kloos >Priority: Minor > Labels: diffblue, newdev > > Requesting the following URL causes Solr to return an HTTP 500 error response: > {noformat} > http://localhost:8983/solr/films/select?fq={!frange%20l=10%20u=100}or_version_s,directed_by > {noformat} > The error response seems to be caused by the following uncaught exception: > {noformat} > java.lang.UnsupportedOperationException > at > org.apache.lucene.queries.function.FunctionValues.floatVal(FunctionValues.java:47) > at > org.apache.lucene.queries.function.FunctionValues$3.match
[jira] [Comment Edited] (LUCENE-8674) UnsupportedOperationException due to call to o.a.l.q.f.FunctionValues.floatVal
[ https://issues.apache.org/jira/browse/LUCENE-8674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049681#comment-17049681 ] Michele Palmia edited comment on LUCENE-8674 at 3/2/20 9:47 PM: This is due to a _VectorValueSource_ being fed to a _FunctionRangeQuery_, that is therefore trying to use its _floatVal_. By default, requesting the _floatVal(int doc)_ of a _VectorValueSource_ throws an _UnsupportedOperationException_, since no algorithm for merging the (possibly multiple) values is implemented. For reference, the query Solr tries to do is the following, {code:java} final ConstantScoreQuery query = new ConstantScoreQuery( new FunctionRangeQuery( new VectorValueSource( new BytesRefFieldSource("any_field"), new SortedSetFieldSource("another_field") ), 0, 100, true, true)); {code} that always throws an exception if there are documents in the index. >From the way it's implemented (with the _UnsupportedOperationException_) it >doesn't look like this kind of inconsistencies are meant to be fixed in >Lucene. But not sure about that. Any suggestions are appreciated! was (Author: micpalmia): This is due to a `VectorValueSource` being fed to a `FunctionRangeQuery`, that is therefore trying to use its `floatVal`. By default, requesting the `floatVal(int doc)` of a `VectorValueSource` throws an `UnsupportedOperationException`, since no algorithm for merging the (possibly multiple) values is implemented. For reference, the query Solr tries to do is the following, ```java final ConstantScoreQuery query = new ConstantScoreQuery( new FunctionRangeQuery( new VectorValueSource( new BytesRefFieldSource("any_field"), new SortedSetFieldSource("another_field") ), 0, 100, true, true)); ``` that always throws an exception if there are documents in the index. >From the way it's implemented (with the `UnsupportedOperationException`) it >doesn't look like this kind of inconsistencies are meant to be fixed in >Lucene. But not sure about that. 
Any suggestions are appreciated! > UnsupportedOperationException due to call to o.a.l.q.f.FunctionValues.floatVal > -- > > Key: LUCENE-8674 > URL: https://issues.apache.org/jira/browse/LUCENE-8674 > Project: Lucene - Core > Issue Type: Bug > Components: core/query/scoring >Affects Versions: master (9.0) > Environment: h1. Steps to reproduce > * Use a Linux machine. > * Build commit {{ea2c8ba}} of Solr as described in the section below. > * Build the films collection as described below. > * Start the server using the command {{./bin/solr start -f -p 8983 -s > /tmp/home}} > * Request the URL given in the bug description. > h1. Compiling the server > {noformat} > git clone https://github.com/apache/lucene-solr > cd lucene-solr > git checkout ea2c8ba > ant compile > cd solr > ant server > {noformat} > h1. Building the collection and reproducing the bug > We followed [Exercise > 2|http://lucene.apache.org/solr/guide/7_5/solr-tutorial.html#exercise-2] from > the [Solr > Tutorial|http://lucene.apache.org/solr/guide/7_5/solr-tutorial.html]. > {noformat} > mkdir -p /tmp/home > echo '' > > /tmp/home/solr.xml > {noformat} > In one terminal start a Solr instance in foreground: > {noformat} > ./bin/solr start -f -p 8983 -s /tmp/home > {noformat} > In another terminal, create a collection of movies, with no shards and no > replication, and initialize it: > {noformat} > bin/solr create -c films > curl -X POST -H 'Content-type:application/json' --data-binary '{"add-field": > {"name":"name", "type":"text_general", "multiValued":false, "stored":true}}' > http://localhost:8983/solr/films/schema > curl -X POST -H 'Content-type:application/json' --data-binary > '{"add-copy-field" : {"source":"*","dest":"_text_"}}' > http://localhost:8983/solr/films/schema > ./bin/post -c films example/films/films.json > curl -v “URL_BUG” > {noformat} > Please check the issue description below to find the “URL_BUG” that will > allow you to reproduce the issue reported. 
>Reporter: Johannes Kloos >Priority: Minor > Labels: diffblue, newdev > > Requesting the following URL causes Solr to return an HTTP 500 error response: > {noformat} > http://localhost:8983/solr/films/select?fq={!frange%20l=10%20u=100}or_version_s,directed_by > {noformat} > The error response seems to be caused by the following uncaught exception: > {noformat} > java.lang.UnsupportedOperationException > at > org.ap
[jira] [Commented] (LUCENE-8674) UnsupportedOperationException due to call to o.a.l.q.f.FunctionValues.floatVal
[ https://issues.apache.org/jira/browse/LUCENE-8674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049681#comment-17049681 ] Michele Palmia commented on LUCENE-8674:
This is due to a `VectorValueSource` being fed to a `FunctionRangeQuery`, that is therefore trying to use its `floatVal`. By default, requesting the `floatVal(int doc)` of a `VectorValueSource` throws an `UnsupportedOperationException`, since no algorithm for merging the (possibly multiple) values is implemented. For reference, the query Solr tries to do is the following,
```java
final ConstantScoreQuery query = new ConstantScoreQuery(
    new FunctionRangeQuery(
        new VectorValueSource(
            new BytesRefFieldSource("any_field"),
            new SortedSetFieldSource("another_field")),
        0, 100, true, true));
```
that always throws an exception if there are documents in the index. From the way it's implemented (with the `UnsupportedOperationException`) it doesn't look like this kind of inconsistencies are meant to be fixed in Lucene. But not sure about that. Any suggestions are appreciated!

> UnsupportedOperationException due to call to o.a.l.q.f.FunctionValues.floatVal
> --
>
> Key: LUCENE-8674
> URL: https://issues.apache.org/jira/browse/LUCENE-8674
> Project: Lucene - Core
> Issue Type: Bug
> Components: core/query/scoring
> Affects Versions: master (9.0)
> Environment: h1. Steps to reproduce
> * Use a Linux machine.
> * Build commit {{ea2c8ba}} of Solr as described in the section below.
> * Build the films collection as described below.
> * Start the server using the command {{./bin/solr start -f -p 8983 -s /tmp/home}}
> * Request the URL given in the bug description.
> h1. Compiling the server
> {noformat}
> git clone https://github.com/apache/lucene-solr
> cd lucene-solr
> git checkout ea2c8ba
> ant compile
> cd solr
> ant server
> {noformat}
> h1. 
Building the collection and reproducing the bug > We followed [Exercise > 2|http://lucene.apache.org/solr/guide/7_5/solr-tutorial.html#exercise-2] from > the [Solr > Tutorial|http://lucene.apache.org/solr/guide/7_5/solr-tutorial.html]. > {noformat} > mkdir -p /tmp/home > echo '' > > /tmp/home/solr.xml > {noformat} > In one terminal start a Solr instance in foreground: > {noformat} > ./bin/solr start -f -p 8983 -s /tmp/home > {noformat} > In another terminal, create a collection of movies, with no shards and no > replication, and initialize it: > {noformat} > bin/solr create -c films > curl -X POST -H 'Content-type:application/json' --data-binary '{"add-field": > {"name":"name", "type":"text_general", "multiValued":false, "stored":true}}' > http://localhost:8983/solr/films/schema > curl -X POST -H 'Content-type:application/json' --data-binary > '{"add-copy-field" : {"source":"*","dest":"_text_"}}' > http://localhost:8983/solr/films/schema > ./bin/post -c films example/films/films.json > curl -v “URL_BUG” > {noformat} > Please check the issue description below to find the “URL_BUG” that will > allow you to reproduce the issue reported. 
>Reporter: Johannes Kloos >Priority: Minor > Labels: diffblue, newdev > > Requesting the following URL causes Solr to return an HTTP 500 error response: > {noformat} > http://localhost:8983/solr/films/select?fq={!frange%20l=10%20u=100}or_version_s,directed_by > {noformat} > The error response seems to be caused by the following uncaught exception: > {noformat} > java.lang.UnsupportedOperationException > at > org.apache.lucene.queries.function.FunctionValues.floatVal(FunctionValues.java:47) > at > org.apache.lucene.queries.function.FunctionValues$3.matches(FunctionValues.java:188) > at > org.apache.lucene.queries.function.ValueSourceScorer$1.matches(ValueSourceScorer.java:53) > at > org.apache.lucene.search.TwoPhaseIterator$TwoPhaseIteratorAsDocIdSetIterator.doNext(TwoPhaseIterator.java:89) > at > org.apache.lucene.search.TwoPhaseIterator$TwoPhaseIteratorAsDocIdSetIterator.nextDoc(TwoPhaseIterator.java:77) > at org.apache.lucene.search.Weight$DefaultBulkScorer.scoreAll(Weight.java:261) > at org.apache.lucene.search.Weight$DefaultBulkScorer.score(Weight.java:214) > at org.apache.lucene.search.BulkScorer.score(BulkScorer.java:39) > at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:652) > at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:443) > at org.apache.solr.search.DocSetUtil.createDocSetGeneric(DocSetUtil.java:151) > at org.apache.solr.search.DocSetUtil.createDocSet(DocSetUtil.java:140) > at > org.apache.solr.search.SolrIndexSearcher.getDocSetNC(SolrIndexSearcher.java:1177) > at > org
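The failure mode described in the comment above can be reduced to a small self-contained sketch. These are simplified stand-ins, not the real Lucene classes: a multi-source wrapper has no defined rule for collapsing several per-document values into one float, so its `floatVal` deliberately throws `UnsupportedOperationException`, and any range filter built on top of it fails on the first document it checks.

```java
import java.util.List;
import java.util.function.IntToDoubleFunction;

// Simplified stand-ins for FunctionValues / VectorValueSource, illustrating
// why a FunctionRangeQuery over a multi-valued source must fail: there is
// no defined rule for merging multiple per-doc values into a single float.
public class VectorValueSketch {

    interface Values {
        // Mirrors the FunctionValues.floatVal(int doc) default: unsupported.
        default float floatVal(int doc) {
            throw new UnsupportedOperationException(
                "no merge rule for multiple per-doc values");
        }
    }

    // A "vector" of several per-document sources, like VectorValueSource:
    // it inherits the throwing default instead of choosing a merge rule.
    static class VectorValues implements Values {
        final List<IntToDoubleFunction> sources;
        VectorValues(List<IntToDoubleFunction> sources) {
            this.sources = sources;
        }
        // floatVal intentionally NOT overridden.
    }

    // What a range filter does per matching document: read one float value,
    // compare it against the bounds.
    static boolean inRange(Values v, int doc, float lo, float hi) {
        float f = v.floatVal(doc);   // throws for VectorValues
        return lo <= f && f <= hi;
    }

    public static void main(String[] args) {
        Values vec = new VectorValues(
            List.<IntToDoubleFunction>of(d -> d, d -> d * 2.0));
        try {
            inRange(vec, 0, 0f, 100f);
            System.out.println("matched");
        } catch (UnsupportedOperationException e) {
            System.out.println("UnsupportedOperationException: " + e.getMessage());
        }
    }
}
```

Any fix would have to either define a merge rule (first value, min, max, ...) for multi-valued sources or reject them up front with a clearer error, which is essentially the open question in the comment.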
[jira] [Commented] (LUCENE-8962) Can we merge small segments during refresh, for faster searching?
[ https://issues.apache.org/jira/browse/LUCENE-8962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049666#comment-17049666 ] Michael Froh commented on LUCENE-8962: -- I was able to reproduce the {{testMergeOnCommit}} failure on master sometimes with the following options: {{-Dtestcase=TestIndexWriterMergePolicy -Dtests.method=testMergeOnCommit -Dtests.seed=F8DD5AD20994FDDF -Dtests.slow=true -Dtests.badapples=true -Dtests.locale=fi-FI -Dtests.timezone=America/Danmarkshavn -Dtests.asserts=true -Dtests.file.encoding=UTF-8}} > Can we merge small segments during refresh, for faster searching? > - > > Key: LUCENE-8962 > URL: https://issues.apache.org/jira/browse/LUCENE-8962 > Project: Lucene - Core > Issue Type: Improvement > Components: core/index >Reporter: Michael McCandless >Priority: Major > Attachments: LUCENE-8962_demo.png > > Time Spent: 5h 20m > Remaining Estimate: 0h > > With near-real-time search we ask {{IndexWriter}} to write all in-memory > segments to disk and open an {{IndexReader}} to search them, and this is > typically a quick operation. > However, when you use many threads for concurrent indexing, {{IndexWriter}} > will accumulate write many small segments during {{refresh}} and this then > adds search-time cost as searching must visit all of these tiny segments. > The merge policy would normally quickly coalesce these small segments if > given a little time ... so, could we somehow improve {{IndexWriter'}}s > refresh to optionally kick off merge policy to merge segments below some > threshold before opening the near-real-time reader? It'd be a bit tricky > because while we are waiting for merges, indexing may continue, and new > segments may be flushed, but those new segments shouldn't be included in the > point-in-time segments returned by refresh ... 
> One could almost do this on top of Lucene today, with a custom merge policy, > and some hackity logic to have the merge policy target small segments just > written by refresh, but it's tricky to then open a near-real-time reader, > excluding newly flushed but including newly merged segments since the refresh > originally finished ... > I'm not yet sure how best to solve this, so I wanted to open an issue for > discussion! -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-13749) Implement support for joining across collections with multiple shards ( XCJF )
[ https://issues.apache.org/jira/browse/SOLR-13749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049655#comment-17049655 ] David Smiley commented on SOLR-13749:
-
I was about to suggest adding a new enum option, but as I look at this, the existing "method=index" (default) seems appropriate; it's just that we're now able to handle scenarios that were not handled before -- multiple shards, and also when the collection isn't on the same node. Ideally, the existing code would detect when the current code can be used and, if not, use XCJF. It'll take some work to make this transition; not "hard", but I don't want to ask for more of your time than you have. If we're not ready to do what's best for Solr for 8.5, then I think we should un-document it with a big comment or something like that, so that people don't start using a feature that isn't quite ready. In my experience with Solr, once something is released in a certain way, it tends to be set in stone (sadly). I'm sorry if this is unsatisfying to everyone who put awesome work into this, but I want to do what's best for Solr in the long term.

> Implement support for joining across collections with multiple shards ( XCJF )
> --
>
> Key: SOLR-13749
> URL: https://issues.apache.org/jira/browse/SOLR-13749
> Project: Solr
> Issue Type: New Feature
> Reporter: Kevin Watters
> Assignee: Gus Heck
> Priority: Major
> Fix For: 8.5
>
> Time Spent: 1.5h
> Remaining Estimate: 0h
>
> This ticket includes 2 query parsers.
> The first one is the "Cross-collection join filter" (XCJF) query parser. It can do a call out to a remote collection to get a set of join keys to be used as a filter against the local collection.
> The second one is the Hash Range query parser, where you can specify a field name and a hash range; the result is that only the documents that would have hashed to that range will be returned. 
> This query parser will do an intersection based on join keys between 2 > collections. > The local collection is the collection that you are searching against. > The remote collection is the collection that contains the join keys that you > want to use as a filter. > Each shard participating in the distributed request will execute a query > against the remote collection. If the local collection is setup with the > compositeId router to be routed on the join key field, a hash range query is > applied to the remote collection query to only match the documents that > contain a potential match for the documents that are in the local shard/core. > > > Here's some vocab to help with the descriptions of the various parameters. > ||Term||Description|| > |Local Collection|This is the main collection that is being queried.| > |Remote Collection|This is the collection that the XCJFQuery will query to > resolve the join keys.| > |XCJFQuery|The lucene query that executes a search to get back a set of join > keys from a remote collection| > |HashRangeQuery|The lucene query that matches only the documents whose hash > code on a field falls within a specified range.| > > > ||Param ||Required ||Description|| > |collection|Required|The name of the external Solr collection to be queried > to retrieve the set of join key values ( required )| > |zkHost|Optional|The connection string to be used to connect to Zookeeper. > zkHost and solrUrl are both optional parameters, and at most one of them > should be specified. > If neither of zkHost or solrUrl are specified, the local Zookeeper cluster > will be used. ( optional )| > |solrUrl|Optional|The URL of the external Solr node to be queried ( optional > )| > |from|Required|The join key field name in the external collection ( required > )| > |to|Required|The join key field name in the local collection| > |v|See Note|The query to be executed against the external Solr collection to > retrieve the set of join key values. 
> Note: The original query can be passed at the end of the string or as the > "v" parameter. > It's recommended to use query parameter substitution with the "v" parameter > to ensure no issues arise with the default query parsers.| > |routed| |true / false. If true, the XCJF query will use each shard's hash > range to determine the set of join keys to retrieve for that shard. > This parameter improves the performance of the cross-collection join, but > it depends on the local collection being routed by the toField. If this > parameter is not specified, > the XCJF query will try to determine the correct value automatically.| > |ttl| |The length of time that an XCJF query in the cache will be considered > valid, in seconds. Defaults to 3600 (one hour). > The
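Putting the parameter table above together, a filter query for the local collection can be assembled as a local-params string. This is a sketch: the collection and field names (`products_remote`, `product_id`) are hypothetical examples, and the join query travels via parameter substitution through the "v" parameter, as the description recommends.

```java
// Assembling an XCJF filter query from the documented parameters
// (collection, from, to, v). Names used here are hypothetical examples.
public class XcjfQueryExample {

    /** Builds the local-params string to send as Solr's fq parameter. */
    static String xcjfFilter(String collection, String from, String to,
                             String vParamName) {
        return "{!xcjf collection=" + collection
                + " from=" + from
                + " to=" + to
                + " v=$" + vParamName + "}";
    }

    public static void main(String[] args) {
        // The actual join query would be sent as a separate request
        // parameter, e.g. joinQuery=color:red
        System.out.println("fq=" + xcjfFilter(
            "products_remote", "product_id", "product_id", "joinQuery"));
    }
}
```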
[jira] [Commented] (LUCENE-8962) Can we merge small segments during refresh, for faster searching?
[ https://issues.apache.org/jira/browse/LUCENE-8962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049651#comment-17049651 ] Michael Froh commented on LUCENE-8962: -- Regarding {{TestIndexWriter.testThreadInterruptDeadlock}}, I think that's a bug in the implementation. When waiting for merges to complete, I added a {{catch}} for {{InterruptedException}} that sets the interrupt flag and throws an {{IOException}}. The documented behavior of {{IndexWriter}} is to clear the interrupt flag and throw {{ThreadInterruptedException}}. Again, not sure why the tests on master didn't fail. Maybe we just got lucky with the branch_8x tests. > Can we merge small segments during refresh, for faster searching? > - > > Key: LUCENE-8962 > URL: https://issues.apache.org/jira/browse/LUCENE-8962 > Project: Lucene - Core > Issue Type: Improvement > Components: core/index >Reporter: Michael McCandless >Priority: Major > Attachments: LUCENE-8962_demo.png > > Time Spent: 5h 20m > Remaining Estimate: 0h > > With near-real-time search we ask {{IndexWriter}} to write all in-memory > segments to disk and open an {{IndexReader}} to search them, and this is > typically a quick operation. > However, when you use many threads for concurrent indexing, {{IndexWriter}} > will accumulate write many small segments during {{refresh}} and this then > adds search-time cost as searching must visit all of these tiny segments. > The merge policy would normally quickly coalesce these small segments if > given a little time ... so, could we somehow improve {{IndexWriter'}}s > refresh to optionally kick off merge policy to merge segments below some > threshold before opening the near-real-time reader? It'd be a bit tricky > because while we are waiting for merges, indexing may continue, and new > segments may be flushed, but those new segments shouldn't be included in the > point-in-time segments returned by refresh ... 
> One could almost do this on top of Lucene today, with a custom merge policy, > and some hackity logic to have the merge policy target small segments just > written by refresh, but it's tricky to then open a near-real-time reader, > excluding newly flushed but including newly merged segments since the refresh > originally finished ... > I'm not yet sure how best to solve this, so I wanted to open an issue for > discussion! -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
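The contract difference Froh describes can be shown without Lucene. In this sketch (the method names are illustrative, not real IndexWriter internals), the buggy path re-sets the interrupt flag and wraps the exception in an `IOException`, while IndexWriter's documented behavior is to *clear* the flag and throw `ThreadInterruptedException`:

```java
import java.io.IOException;

// Sketch of the two interrupt-handling conventions discussed above.
// Method names are illustrative; this is not the real IndexWriter code.
public class InterruptContract {

    // Stand-in for o.a.l.util.ThreadInterruptedException.
    static class ThreadInterruptedException extends RuntimeException {
        ThreadInterruptedException(InterruptedException cause) { super(cause); }
    }

    // Buggy variant: restore the interrupt flag, wrap in IOException.
    static void waitRestoringFlag() throws IOException {
        try {
            Thread.sleep(10);  // stand-in for "wait for merges to finish"
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();  // flag stays set
            throw new IOException(e);
        }
    }

    // Documented IndexWriter behavior: leave the flag cleared and throw
    // the dedicated unchecked exception.
    static void waitClearingFlag() {
        try {
            Thread.sleep(10);
        } catch (InterruptedException e) {
            // catching InterruptedException already cleared the flag;
            // do NOT re-set it, just convert the exception type
            throw new ThreadInterruptedException(e);
        }
    }

    public static void main(String[] args) throws Exception {
        Thread.currentThread().interrupt();  // simulate an interrupt
        try { waitRestoringFlag(); } catch (IOException expected) { }
        // Thread.interrupted() reports true here (and clears the flag)
        System.out.println("flag after buggy variant: " + Thread.interrupted());

        Thread.currentThread().interrupt();
        try { waitClearingFlag(); } catch (ThreadInterruptedException expected) { }
        System.out.println("flag after documented variant: "
                + Thread.currentThread().isInterrupted());
    }
}
```

A caller testing for interruption with `Thread.interrupted()` sees opposite answers from the two variants, which is exactly the kind of divergence that makes `testThreadInterruptDeadlock` fail.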
[jira] [Commented] (LUCENE-8962) Can we merge small segments during refresh, for faster searching?
[ https://issues.apache.org/jira/browse/LUCENE-8962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049641#comment-17049641 ] Michael Froh commented on LUCENE-8962: -- I think the failure in {{testMergeOnCommit}} occurs because of a difference in the random behavior of the test. Specifically, sometimes the last writing thread happens to choose to {{commit()}} at the end, so there are no pending changes by the time we do the last {{commit()}} which should merge all segments (or abandon the merge, if it takes too long). If we add one more doc before that last commit (ensuring that the {{anyChanges}} check in {{IndexWriter.prepareCommitInternal()}} is {{true}}), the test passes consistently. I'm not sure why we don't see the same failure sometimes on master, though. > Can we merge small segments during refresh, for faster searching? > - > > Key: LUCENE-8962 > URL: https://issues.apache.org/jira/browse/LUCENE-8962 > Project: Lucene - Core > Issue Type: Improvement > Components: core/index >Reporter: Michael McCandless >Priority: Major > Attachments: LUCENE-8962_demo.png > > Time Spent: 5h 20m > Remaining Estimate: 0h > > With near-real-time search we ask {{IndexWriter}} to write all in-memory > segments to disk and open an {{IndexReader}} to search them, and this is > typically a quick operation. > However, when you use many threads for concurrent indexing, {{IndexWriter}} > will accumulate write many small segments during {{refresh}} and this then > adds search-time cost as searching must visit all of these tiny segments. > The merge policy would normally quickly coalesce these small segments if > given a little time ... so, could we somehow improve {{IndexWriter'}}s > refresh to optionally kick off merge policy to merge segments below some > threshold before opening the near-real-time reader? 
It'd be a bit tricky > because while we are waiting for merges, indexing may continue, and new > segments may be flushed, but those new segments shouldn't be included in the > point-in-time segments returned by refresh ... > One could almost do this on top of Lucene today, with a custom merge policy, > and some hackity logic to have the merge policy target small segments just > written by refresh, but it's tricky to then open a near-real-time reader, > excluding newly flushed but including newly merged segments since the refresh > originally finished ... > I'm not yet sure how best to solve this, so I wanted to open an issue for > discussion! -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
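The race Froh describes in the test can be modeled with a few lines (a sketch, not real IndexWriter code): a commit only triggers the merge-on-commit path when there are pending changes, so if the last writer thread happens to commit on its own, the test's final commit is a no-op; indexing one more document first guarantees the `anyChanges` check passes.

```java
// Minimal model of the anyChanges check in prepareCommitInternal():
// commit() only runs the merge-on-commit hook when there are pending
// changes since the previous commit. Not real IndexWriter code.
public class MergeOnCommitModel {
    private int pendingDocs = 0;
    private int mergeHookRuns = 0;

    void addDocument() { pendingDocs++; }

    /** Returns true if the merge-on-commit hook actually ran. */
    boolean commit() {
        boolean anyChanges = pendingDocs > 0;
        if (anyChanges) {
            mergeHookRuns++;   // stand-in for "merge small segments"
            pendingDocs = 0;
        }
        return anyChanges;     // a no-op commit skips the hook entirely
    }

    int mergeHookRuns() { return mergeHookRuns; }

    public static void main(String[] args) {
        MergeOnCommitModel w = new MergeOnCommitModel();
        w.addDocument();
        w.commit();            // the writer thread's own commit
        System.out.println("final commit ran hook: " + w.commit()); // false
        w.addDocument();       // the proposed test fix: one more doc
        System.out.println("final commit ran hook: " + w.commit()); // true
    }
}
```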
[jira] [Commented] (SOLR-14291) OldAnalyticsRequestConverter should support fields names with dots
[ https://issues.apache.org/jira/browse/SOLR-14291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049635#comment-17049635 ] Mikhail Khludnev commented on SOLR-14291: - [~houstonputman], what's your opinion? > OldAnalyticsRequestConverter should support fields names with dots > -- > > Key: SOLR-14291 > URL: https://issues.apache.org/jira/browse/SOLR-14291 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: search, SearchComponents - other >Reporter: Anatolii Siuniaev >Priority: Trivial > Attachments: SOLR-14291.patch > > > If you send a query with range facets using old olap-style syntax (see pdf > [here|https://issues.apache.org/jira/browse/SOLR-5302]), > OldAnalyticsRequestConverter just silently (no exception thrown) omits > parameters like > {code:java} > olap..rangefacet..start > {code} > in case if __ has dots inside (for instance field name is > _Project.Value_). And thus no range facets are returned in response. > Probably the same happens in case of field faceting. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-12325) introduce uniqueBlockQuery(parent:true) aggregation for JSON Facet
[ https://issues.apache.org/jira/browse/SOLR-12325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049634#comment-17049634 ] Mikhail Khludnev commented on SOLR-12325:
-
Attaching a random test patch by [~anatolii_siuniaev]; I'd like to reduce its footprint on the existing test codebase.

> introduce uniqueBlockQuery(parent:true) aggregation for JSON Facet
> --
>
> Key: SOLR-12325
> URL: https://issues.apache.org/jira/browse/SOLR-12325
> Project: Solr
> Issue Type: New Feature
> Components: Facet Module
> Reporter: Mikhail Khludnev
> Assignee: Mikhail Khludnev
> Priority: Major
> Fix For: 8.5
>
> Attachments: SOLR-12325.patch, SOLR-12325.patch, SOLR-12325.patch, SOLR-12325.patch, SOLR-12325.patch, SOLR-12325_Random_test_for_uniqueBlockQuery (1).patch
>
> Time Spent: 1.5h
> Remaining Estimate: 0h
>
> It might be a faster twin for {{uniqueBlock(\_root_)}}. Please utilise the built-in query parsing method; don't invent your own.
--
This message was sent by Atlassian Jira (v8.3.4#803005)
-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
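For context, a sketch of how the proposed aggregation might appear in a JSON Facet request, by analogy with the existing {{uniqueBlock(\_root_)}} usage; the terms-facet field name (`category_s`) is a hypothetical example, and `parent:true` is taken from the issue title:

```json
{
  "query": "*:*",
  "facet": {
    "categories": {
      "type": "terms",
      "field": "category_s",
      "facet": {
        "parentCount": "uniqueBlockQuery(parent:true)"
      }
    }
  }
}
```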
[jira] [Updated] (SOLR-12325) introduce uniqueBlockQuery(parent:true) aggregation for JSON Facet
[ https://issues.apache.org/jira/browse/SOLR-12325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Khludnev updated SOLR-12325:
Attachment: SOLR-12325.patch
Status: Patch Available (was: Patch Available)

> introduce uniqueBlockQuery(parent:true) aggregation for JSON Facet
> --
>
> Key: SOLR-12325
> URL: https://issues.apache.org/jira/browse/SOLR-12325
> Project: Solr
> Issue Type: New Feature
> Components: Facet Module
> Reporter: Mikhail Khludnev
> Assignee: Mikhail Khludnev
> Priority: Major
> Fix For: 8.5
>
> Attachments: SOLR-12325.patch, SOLR-12325.patch, SOLR-12325.patch, SOLR-12325.patch, SOLR-12325.patch, SOLR-12325_Random_test_for_uniqueBlockQuery (1).patch
>
> Time Spent: 1.5h
> Remaining Estimate: 0h
>
> It might be a faster twin for {{uniqueBlock(\_root_)}}. Please utilise the built-in query parsing method; don't invent your own.
--
This message was sent by Atlassian Jira (v8.3.4#803005)
-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Comment Edited] (SOLR-13942) /api/cluster/zk/* to fetch raw ZK data
[ https://issues.apache.org/jira/browse/SOLR-13942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049600#comment-17049600 ] Noble Paul edited comment on SOLR-13942 at 3/2/20 8:38 PM: --- [~erickerickson] *why this is needed?* bin/solr is not something that everyone has everywhere. People have curl & wget & a browser everywhere. If I want to script something up and read data in ZK there is no alternative *Security aspect* There is already a REST API to fetch data stored in ZK ({{/admin/zookeeper}} is the end point). It has no proper API docs. It is used by our admin UI all the time I would like to merge it soon was (Author: noble.paul): [~erickerickson] *why this is needed?* bin/solr is not everyone has everywhere. People have curl & wget & a browser everywhere. If I want to script something up and read data in ZK there is no alternative *Security aspect* There is already a REST API to fetch data stored in ZK ({{/admin/zookeeper}} is the end point). It has no proper API docs. It is used by our admin UI all the time I would like to merge it soon > /api/cluster/zk/* to fetch raw ZK data > -- > > Key: SOLR-13942 > URL: https://issues.apache.org/jira/browse/SOLR-13942 > Project: Solr > Issue Type: Bug >Reporter: Noble Paul >Assignee: Noble Paul >Priority: Major > > If the requested path is a node with children show the list of child nodes > and their meta data -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Comment Edited] (SOLR-13942) /api/cluster/zk/* to fetch raw ZK data
[ https://issues.apache.org/jira/browse/SOLR-13942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049600#comment-17049600 ] Noble Paul edited comment on SOLR-13942 at 3/2/20 8:22 PM: --- [~erickerickson] *why this is needed?* bin/solr is not everyone has everywhere. People have curl & wget & a browser everywhere. If I want to script something up and read data in ZK there is no alternative *Security aspect* There is already a REST API to fetch data stored in ZK ({{/admin/zookeeper}} is the end point). It has no proper API docs. It is used by our admin UI all the time I would like to merge it soon was (Author: noble.paul): [~erickerickson] *why this is needed?* bin/solr is not everyone has everywhere. People have curl & wget & a browser everywhere. If I want to script something up and read data in ZK there is no alternative *Security aspect* There is already a REST API to fetch data stored in ZK. It ha no proper API docs. It is used by our admin UI all the time I would like to merge it soon > /api/cluster/zk/* to fetch raw ZK data > -- > > Key: SOLR-13942 > URL: https://issues.apache.org/jira/browse/SOLR-13942 > Project: Solr > Issue Type: Bug >Reporter: Noble Paul >Assignee: Noble Paul >Priority: Major > > If the requested path is a node with children show the list of child nodes > and their meta data -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-13942) /api/cluster/zk/* to fetch raw ZK data
[ https://issues.apache.org/jira/browse/SOLR-13942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049621#comment-17049621 ] Noble Paul commented on SOLR-13942:
---
[~janhoy] I feel like I'm talking to a wall. The comment I posted 20 minutes back already addresses these 2 questions.

> /api/cluster/zk/* to fetch raw ZK data
> --
>
> Key: SOLR-13942
> URL: https://issues.apache.org/jira/browse/SOLR-13942
> Project: Solr
> Issue Type: Bug
> Reporter: Noble Paul
> Assignee: Noble Paul
> Priority: Major
>
> If the requested path is a node with children show the list of child nodes
> and their meta data
--
This message was sent by Atlassian Jira (v8.3.4#803005)
-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-13749) Implement support for joining across collections with multiple shards ( XCJF )
[ https://issues.apache.org/jira/browse/SOLR-13749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049617#comment-17049617 ] Gus Heck commented on SOLR-13749: - Can you be more explicit about what you want here? Do you have a suggestion for how that logic to "know when to use" should work? > Implement support for joining across collections with multiple shards ( XCJF ) > -- > > Key: SOLR-13749 > URL: https://issues.apache.org/jira/browse/SOLR-13749 > Project: Solr > Issue Type: New Feature >Reporter: Kevin Watters >Assignee: Gus Heck >Priority: Major > Fix For: 8.5 > > Time Spent: 1.5h > Remaining Estimate: 0h > > This ticket includes 2 query parsers. > The first one is the "Cross collection join filter" (XCJF) parser. This is > the "Cross-collection join filter" query parser. It can do a call out to a > remote collection to get a set of join keys to be used as a filter against > the local collection. > The second one is the Hash Range query parser that you can specify a field > name and a hash range, the result is that only the documents that would have > hashed to that range will be returned. > This query parser will do an intersection based on join keys between 2 > collections. > The local collection is the collection that you are searching against. > The remote collection is the collection that contains the join keys that you > want to use as a filter. > Each shard participating in the distributed request will execute a query > against the remote collection. If the local collection is setup with the > compositeId router to be routed on the join key field, a hash range query is > applied to the remote collection query to only match the documents that > contain a potential match for the documents that are in the local shard/core. > > > Here's some vocab to help with the descriptions of the various parameters. 
> ||Term||Description|| > |Local Collection|This is the main collection that is being queried.| > |Remote Collection|This is the collection that the XCJFQuery will query to > resolve the join keys.| > |XCJFQuery|The lucene query that executes a search to get back a set of join > keys from a remote collection| > |HashRangeQuery|The lucene query that matches only the documents whose hash > code on a field falls within a specified range.| > > > ||Param ||Required ||Description|| > |collection|Required|The name of the external Solr collection to be queried > to retrieve the set of join key values ( required )| > |zkHost|Optional|The connection string to be used to connect to Zookeeper. > zkHost and solrUrl are both optional parameters, and at most one of them > should be specified. > If neither of zkHost or solrUrl are specified, the local Zookeeper cluster > will be used. ( optional )| > |solrUrl|Optional|The URL of the external Solr node to be queried ( optional > )| > |from|Required|The join key field name in the external collection ( required > )| > |to|Required|The join key field name in the local collection| > |v|See Note|The query to be executed against the external Solr collection to > retrieve the set of join key values. > Note: The original query can be passed at the end of the string or as the > "v" parameter. > It's recommended to use query parameter substitution with the "v" parameter > to ensure no issues arise with the default query parsers.| > |routed| |true / false. If true, the XCJF query will use each shard's hash > range to determine the set of join keys to retrieve for that shard. > This parameter improves the performance of the cross-collection join, but > it depends on the local collection being routed by the toField. If this > parameter is not specified, > the XCJF query will try to determine the correct value automatically.| > |ttl| |The length of time that an XCJF query in the cache will be considered > valid, in seconds. 
Defaults to 3600 (one hour).
> The XCJF query will not be aware of changes to the remote collection, so if the remote collection is updated, cached XCJF queries may give inaccurate results.
> After the ttl period has expired, the XCJF query will re-execute the join against the remote collection.|
> |_All others_| |Any normal Solr parameter can also be specified as a local param.|
>
> Example solrconfig.xml changes:
>
> <cache name="hash_vin"
>        class="solr.LRUCache"
>        size="128"
>        initialSize="0"
>        regenerator="solr.NoOpRegenerator"/>
>
> <queryParser name="xcjf"
>              class="org.apache.solr.search.join.XCJFQueryParserPlugin">
>   <str name="rou
[jira] [Commented] (SOLR-13942) /api/cluster/zk/* to fetch raw ZK data
[ https://issues.apache.org/jira/browse/SOLR-13942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049608#comment-17049608 ] Jan Høydahl commented on SOLR-13942: {quote}I would like to merge it soon {quote} I would like you to explain the true benefit of this, explain why it is not a security risk and then gain consensus before continuing. > /api/cluster/zk/* to fetch raw ZK data > -- > > Key: SOLR-13942 > URL: https://issues.apache.org/jira/browse/SOLR-13942 > Project: Solr > Issue Type: Bug >Reporter: Noble Paul >Assignee: Noble Paul >Priority: Major > > If the requested path is a node with children show the list of child nodes > and their meta data -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Comment Edited] (SOLR-13942) /api/cluster/zk/* to fetch raw ZK data
[ https://issues.apache.org/jira/browse/SOLR-13942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049600#comment-17049600 ] Noble Paul edited comment on SOLR-13942 at 3/2/20 8:05 PM: --- [~erickerickson] *Why is this needed?* bin/solr is not something everyone has everywhere. People have curl & wget & a browser everywhere. If I want to script something up and read data in ZK, there is no alternative. *Security aspect* There is already a REST API to fetch data stored in ZK. It has no proper API docs. It is used by our admin UI all the time. I would like to merge it soon. was (Author: noble.paul): [~erickerickson] bin/solr is not everyone has everywhere. People have curl & wget & a browser everywhere. If I want to script something up and read data in ZK there is no alternative I would like to merge it soon > /api/cluster/zk/* to fetch raw ZK data > -- > > Key: SOLR-13942 > URL: https://issues.apache.org/jira/browse/SOLR-13942 > Project: Solr > Issue Type: Bug >Reporter: Noble Paul >Assignee: Noble Paul >Priority: Major > > If the requested path is a node with children, show the list of child nodes > and their meta data -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Assigned] (SOLR-13942) /api/cluster/zk/* to fetch raw ZK data
[ https://issues.apache.org/jira/browse/SOLR-13942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noble Paul reassigned SOLR-13942: - Assignee: Noble Paul > /api/cluster/zk/* to fetch raw ZK data > -- > > Key: SOLR-13942 > URL: https://issues.apache.org/jira/browse/SOLR-13942 > Project: Solr > Issue Type: Bug >Reporter: Noble Paul >Assignee: Noble Paul >Priority: Major > > If the requested path is a node with children show the list of child nodes > and their meta data -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-13942) /api/cluster/zk/* to fetch raw ZK data
[ https://issues.apache.org/jira/browse/SOLR-13942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049600#comment-17049600 ] Noble Paul commented on SOLR-13942: --- [~erickerickson] bin/solr is not something everyone has everywhere. People have curl & wget & a browser everywhere. If I want to script something up and read data in ZK, there is no alternative. I would like to merge it soon. > /api/cluster/zk/* to fetch raw ZK data > -- > > Key: SOLR-13942 > URL: https://issues.apache.org/jira/browse/SOLR-13942 > Project: Solr > Issue Type: Bug >Reporter: Noble Paul >Priority: Major > > If the requested path is a node with children, show the list of child nodes > and their meta data -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] praste commented on a change in pull request #1306: SOLR-14299: IndexFetcher doesnt' reset count to 0 after the last packet is received
praste commented on a change in pull request #1306: SOLR-14299: IndexFetcher doesnt' reset count to 0 after the last packet is received URL: https://github.com/apache/lucene-solr/pull/1306#discussion_r386591132

## File path: solr/core/src/java/org/apache/solr/handler/IndexFetcher.java

```
@@ -1728,20 +1728,21 @@ private int fetchPackets(FastInputStream fis) throws Exception {
           long checkSumClient = checksum.getValue();
           if (checkSumClient != checkSumServer) {
             log.error("Checksum not matched between client and server for file: {}", fileName);
-            //if checksum is wrong it is a problem return for retry
+            //if checksum is wrong it is a problem return (there doesn't seem to be a retry in this case.)
             return 1;
           }
         }
         //if everything is fine, write down the packet to the file
         file.write(buf, packetSize);
         bytesDownloaded += packetSize;
         log.debug("Fetched and wrote {} bytes of file: {}", bytesDownloaded, fileName);
-        if (bytesDownloaded >= size)
-          return 0;
         //errorCount is always set to zero after a successful packet
         errorCount = 0;
+        if (bytesDownloaded >= size)
+          return 0;
       }
     } catch (ReplicationHandlerException e) {
+      log.warn("Aborting index replication", e);
```

Review comment: agreed. Logging here is not needed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8962) Can we merge small segments during refresh, for faster searching?
[ https://issues.apache.org/jira/browse/LUCENE-8962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049564#comment-17049564 ] Michael Froh commented on LUCENE-8962: -- I'm looking into the {{branch_8x}} failures. I'm able to reproduce on my machine and will step through to see what's different. > Can we merge small segments during refresh, for faster searching? > - > > Key: LUCENE-8962 > URL: https://issues.apache.org/jira/browse/LUCENE-8962 > Project: Lucene - Core > Issue Type: Improvement > Components: core/index >Reporter: Michael McCandless >Priority: Major > Attachments: LUCENE-8962_demo.png > > Time Spent: 5h 20m > Remaining Estimate: 0h > > With near-real-time search we ask {{IndexWriter}} to write all in-memory > segments to disk and open an {{IndexReader}} to search them, and this is > typically a quick operation. > However, when you use many threads for concurrent indexing, {{IndexWriter}} > will accumulate write many small segments during {{refresh}} and this then > adds search-time cost as searching must visit all of these tiny segments. > The merge policy would normally quickly coalesce these small segments if > given a little time ... so, could we somehow improve {{IndexWriter'}}s > refresh to optionally kick off merge policy to merge segments below some > threshold before opening the near-real-time reader? It'd be a bit tricky > because while we are waiting for merges, indexing may continue, and new > segments may be flushed, but those new segments shouldn't be included in the > point-in-time segments returned by refresh ... > One could almost do this on top of Lucene today, with a custom merge policy, > and some hackity logic to have the merge policy target small segments just > written by refresh, but it's tricky to then open a near-real-time reader, > excluding newly flushed but including newly merged segments since the refresh > originally finished ... 
> I'm not yet sure how best to solve this, so I wanted to open an issue for > discussion! -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] madrob commented on a change in pull request #1306: SOLR-14299: IndexFetcher doesnt' reset count to 0 after the last packet is received
madrob commented on a change in pull request #1306: SOLR-14299: IndexFetcher doesnt' reset count to 0 after the last packet is received URL: https://github.com/apache/lucene-solr/pull/1306#discussion_r386585563

## File path: solr/core/src/java/org/apache/solr/handler/IndexFetcher.java

```
@@ -1728,20 +1728,21 @@ private int fetchPackets(FastInputStream fis) throws Exception {
           long checkSumClient = checksum.getValue();
           if (checkSumClient != checkSumServer) {
             log.error("Checksum not matched between client and server for file: {}", fileName);
-            //if checksum is wrong it is a problem return for retry
+            //if checksum is wrong it is a problem return (there doesn't seem to be a retry in this case.)
             return 1;
           }
         }
         //if everything is fine, write down the packet to the file
         file.write(buf, packetSize);
         bytesDownloaded += packetSize;
         log.debug("Fetched and wrote {} bytes of file: {}", bytesDownloaded, fileName);
-        if (bytesDownloaded >= size)
-          return 0;
         //errorCount is always set to zero after a successful packet
         errorCount = 0;
+        if (bytesDownloaded >= size)
+          return 0;
       }
     } catch (ReplicationHandlerException e) {
+      log.warn("Aborting index replication", e);
```

Review comment: Is log-and-throw necessary here? It looks like we already log this higher up in the call stack. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] praste opened a new pull request #1306: SOLR-14299: IndexFetcher doesnt' reset count to 0 after the last packet is received
praste opened a new pull request #1306: SOLR-14299: IndexFetcher doesnt' reset count to 0 after the last packet is received URL: https://github.com/apache/lucene-solr/pull/1306

* SOLR-14299: IndexFetcher should reset the `errorCount` to 0 after successfully receiving the last packet while fetching a file.

# Description
While fetching files from the master, `IndexFetcher` retries 5 times before giving up. It resets the errorCount after successfully receiving each packet, except for the last packet. This seems like an oversight.

# Solution
Reset the errorCount to 0 before checking whether it is the last packet for the file.

# Tests
This is a trivial change; no test cases added for now.

# Checklist
Please review the following and check all that apply:
- [ ] I have reviewed the guidelines for [How to Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms to the standards described there to the best of my ability.
- [ ] I have created a Jira issue and added the issue ID to my pull request title.
- [ ] I have given Solr maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended)
- [ ] I have developed this patch against the `master` branch.
- [ ] I have run `ant precommit` and the appropriate test suite.
- [ ] I have added tests for my changes.
- [ ] I have added documentation for the [Ref Guide](https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide) (for Solr changes only).

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-13502) Investigate using something other than ZooKeeper's "4 letter words" for the admin UI status
[ https://issues.apache.org/jira/browse/SOLR-13502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049544#comment-17049544 ] Jan Høydahl commented on SOLR-13502: Check whether the Zk java client supports admin commands? We might have to support both old and new 4lw for as long as zk supports old? May zk admin server have SSL? Then we might need to support custom keyStore for it and mutualSsl. What about auth? Where to configure it’s host/port? Will it always run on all zk servers? This might be more work than it looks like on the surface... > Investigate using something other than ZooKeeper's "4 letter words" for the > admin UI status > --- > > Key: SOLR-13502 > URL: https://issues.apache.org/jira/browse/SOLR-13502 > Project: Solr > Issue Type: Improvement >Reporter: Erick Erickson >Assignee: Erick Erickson >Priority: Major > > ZooKeeper 3.5.5 requires a whitelist of allowed "4 letter words". The only > place I see on a quick look at the Solr code where 4lws are used is in the > admin UI "ZK Status" link. > In order to use the admin UI "ZK Status" link, users will have to modify > their zoo.cfg file with > {code} > 4lw.commands.whitelist=mntr,conf,ruok > {code} > This JIRA is to see if there are alternatives to using 4lw for the admin UI. > This depends on SOLR-8346. If we find an alternative, we need to remove the > additions to the ref guide that mention changing zoo.cfg (just scan for 4lw > in all the .adoc files) and remove SolrZkServer.ZK_WHITELIST_PROPERTY and all > references to it (SolrZkServer and SolrTestCaseJ4). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
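For reference, the whitelist workaround the issue mentions sits next to the HTTP AdminServer that ZooKeeper 3.5 ships, which exposes the same commands over HTTP. A hedged zoo.cfg sketch: the `admin.*` property names below are taken from the ZooKeeper administrator's guide and should be verified against the ZooKeeper version in use.

```properties
# Current workaround: whitelist the 4 letter words the Solr admin UI needs
4lw.commands.whitelist=mntr,conf,ruok

# Possible alternative: ZooKeeper 3.5's built-in AdminServer serves the same
# commands over HTTP (see http://<zk-host>:8080/commands for the list)
admin.enableServer=true
admin.serverPort=8080
```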
[jira] [Created] (SOLR-14299) IndexFetcher doesnt' reset errorCount to 0 after the last packet is received
Pushkar Raste created SOLR-14299: Summary: IndexFetcher doesnt' reset errorCount to 0 after the last packet is received Key: SOLR-14299 URL: https://issues.apache.org/jira/browse/SOLR-14299 Project: Solr Issue Type: Bug Security Level: Public (Default Security Level. Issues are Public) Components: replication (java) Affects Versions: 7.7.1 Reporter: Pushkar Raste While fetching the files from master `IndexFetcher` retries 5 times before giving up. It resets the errorCount after successfully receiving the packet except for the last packet. Seems like an oversight. [https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/handler/IndexFetcher.java#L1739-L1742] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
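The loop in question can be modeled with a small sketch (the names and the event encoding below are illustrative, not the actual IndexFetcher code): the point is that the error counter must be reset after every successfully received packet, including the final one.

```java
// Illustrative model of the packet-fetch retry loop, not the real IndexFetcher API.
class PacketFetchSketch {
    static final int MAX_RETRIES = 5;

    /**
     * events: a positive value is a good packet of that many bytes; a negative
     * value stands for a corrupted packet that the server will re-send.
     * Returns total bytes downloaded, or -1 if MAX_RETRIES errors accumulate
     * without a good packet in between.
     */
    static long fetchAll(int[] events, long expectedSize) {
        long bytesDownloaded = 0;
        int errorCount = 0;
        for (int e : events) {
            if (e < 0) {                    // bad checksum: count the error and retry
                if (++errorCount >= MAX_RETRIES) return -1;
                continue;
            }
            bytesDownloaded += e;
            errorCount = 0;                 // reset after EVERY good packet, including
            if (bytesDownloaded >= expectedSize) {
                return bytesDownloaded;     // the last one -- the case the bug missed
            }
        }
        return bytesDownloaded;
    }
}
```

With the reset placed before the last-packet check, a transient error early in the download no longer counts against a later, otherwise clean transfer.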
[GitHub] [lucene-solr] atris commented on issue #1303: LUCENE-9114: Improve ValueSourceScorer's Default Cost Implementation
atris commented on issue #1303: LUCENE-9114: Improve ValueSourceScorer's Default Cost Implementation URL: https://github.com/apache/lucene-solr/pull/1303#issuecomment-593544138

> Lets add the following test to TestFunctionRangeQuery:
>
> ```
> @Test
> public void testTwoRangeQueries() throws IOException {
>   Query rq1 = new FunctionRangeQuery(INT_VALUESOURCE, 2, 4, true, true);
>   Query rq2 = new FunctionRangeQuery(INT_VALUESOURCE, 8, 10, true, true);
>   Query bq = new BooleanQuery.Builder()
>       .add(rq1, BooleanClause.Occur.SHOULD)
>       .add(rq2, BooleanClause.Occur.SHOULD)
>       .build();
>
>   ScoreDoc[] scoreDocs = indexSearcher.search(bq, N_DOCS).scoreDocs;
>   expectScores(scoreDocs, 10, 9, 8, 4, 3, 2);
> }
> ```

Thanks, will add it in a separate PR.

> > Maybe have FunctionValues expose an abstract cost() method, have all FV derivatives implement it and then simply let VSC's matchCost use that method?
>
> Yes; we certainly need the FV to provide the cost; the TPI.matchCost should simply look it up. By the FV (or VS) having a cost, it then becomes straightforward for anyone's custom FV/VS to specify what their cost is. It's debatable whether this cost should be on the VS vs FV.

Ok, so my next iteration will have a cost method in FV (I am inclined to add the method in FV since VSC has a direct member of that type?) and have the VSC-owned TPI's matchCost() refer to the cost of the delegated FV. I also feel that the FV class itself should have a default implementation of the cost method that just returns a dumb value (100?) as the cost. The idea is that if you are using the default FV, you probably don't care about costs. If you plug in your own FV with a smarter costing algorithm, VSC will automatically pick it up. WDYT? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
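The design being converged on in the thread — a cost() on the FunctionValues base class with a dumb default (100), which the scorer's TwoPhaseIterator.matchCost() simply delegates to — can be sketched like this. Class names are illustrative, not Lucene's actual API:

```java
// Illustrative sketch of the cost-delegation idea, not Lucene's real classes.
abstract class FunctionValuesSketch {
    /** Dumb default for implementations that don't care about cost; override to be smarter. */
    float cost() { return 100f; }
}

class CheapValues extends FunctionValuesSketch {
    @Override float cost() { return 5f; }   // a custom FV with a smarter costing algorithm
}

class ValueSourceScorerSketch {
    private final FunctionValuesSketch values;
    ValueSourceScorerSketch(FunctionValuesSketch values) { this.values = values; }

    /** What the scorer's TwoPhaseIterator.matchCost() would return. */
    float matchCost() { return values.cost(); }
}
```

Putting the default on the base class keeps existing FV implementations working unchanged, while custom ones are picked up automatically by the scorer.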
[jira] [Commented] (SOLR-13749) Implement support for joining across collections with multiple shards ( XCJF )
[ https://issues.apache.org/jira/browse/SOLR-13749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049517#comment-17049517 ] David Smiley commented on SOLR-13749: - I had commented on SOLR-11384 but I think I should have placed it here. Basically, let's enhance JoinQParserPlugin to know when to use this new implementation instead of adding a new query parser that looks like the current one. The existing one already has a "method" and branches. Can we get this in ASAP for 8.5 please?
> Implement support for joining across collections with multiple shards ( XCJF )
> --
>
> Key: SOLR-13749
> URL: https://issues.apache.org/jira/browse/SOLR-13749
> Project: Solr
> Issue Type: New Feature
> Reporter: Kevin Watters
> Assignee: Gus Heck
> Priority: Major
> Fix For: 8.5
>
> Time Spent: 1.5h
> Remaining Estimate: 0h
>
> This ticket includes 2 query parsers.
> The first one is the "Cross-collection join filter" (XCJF) query parser. It can do a call out to a remote collection to get a set of join keys to be used as a filter against the local collection.
> The second one is the Hash Range query parser, with which you specify a field name and a hash range; only the documents that would have hashed to that range will be returned.
> The XCJF query parser will do an intersection based on join keys between 2 collections. The local collection is the collection that you are searching against. The remote collection is the collection that contains the join keys that you want to use as a filter.
> Each shard participating in the distributed request will execute a query against the remote collection. If the local collection is set up with the compositeId router to be routed on the join key field, a hash range query is applied to the remote collection query to only match the documents that contain a potential match for the documents that are in the local shard/core.
>
> Here's some vocab to help with the descriptions of the various parameters.
> ||Term||Description||
> |Local Collection|The main collection that is being queried.|
> |Remote Collection|The collection that the XCJFQuery will query to resolve the join keys.|
> |XCJFQuery|The Lucene query that executes a search to get back a set of join keys from a remote collection.|
> |HashRangeQuery|The Lucene query that matches only the documents whose hash code on a field falls within a specified range.|
>
> ||Param||Required||Description||
> |collection|Required|The name of the external Solr collection to be queried to retrieve the set of join key values.|
> |zkHost|Optional|The connection string to be used to connect to ZooKeeper. zkHost and solrUrl are both optional, and at most one of them should be specified. If neither is specified, the local ZooKeeper cluster will be used.|
> |solrUrl|Optional|The URL of the external Solr node to be queried.|
> |from|Required|The join key field name in the external collection.|
> |to|Required|The join key field name in the local collection.|
> |v|See Note|The query to be executed against the external Solr collection to retrieve the set of join key values. Note: the query can be passed at the end of the string or as the "v" parameter. Using query parameter substitution with the "v" parameter is recommended, to avoid issues with the default query parsers.|
> |routed| |true / false. If true, the XCJF query will use each shard's hash range to determine the set of join keys to retrieve for that shard. This improves the performance of the cross-collection join, but it depends on the local collection being routed by the "to" field. If this parameter is not specified, the XCJF query will try to determine the correct value automatically.|
> |ttl| |The length of time, in seconds, that an XCJF query in the cache is considered valid. Defaults to 3600 (one hour). The XCJF query will not be aware of changes to the remote collection, so if the remote collection is updated, cached XCJF queries may give inaccurate results. After the ttl period has expired, the XCJF query will re-execute the join against the remote collection.|
> |_All others_| |Any normal Solr parameter can also be specified as a local param.|
>
> Example solrconfig.xml changes:
>
> <cache name="hash_vin"
>        class="solr.LRUCache"
>        size="128"
>        initialSize="0"
>        regenerator="solr.NoOpRegenerator"/>
>
> <queryParser name="xcjf"
>              class="org.apache.solr.search.join.XCJFQueryParserPlugin">
>   <str name="rou
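The hash-range optimization behind the routed parameter can be sketched as follows: each local shard asks the remote collection only for join keys whose hash falls into that shard's own range, so it never fetches keys it could not match anyway. Solr's compositeId router actually uses MurmurHash3; String.hashCode() below is only a stand-in, and all names are illustrative:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of hash-range filtering of join keys; not Solr's real router.
class HashRangeSketch {
    /** Stand-in for the document router's hash (Solr really uses MurmurHash3). */
    static int hash(String joinKey) { return joinKey.hashCode(); }

    /** The keys a shard owning the range [min, max] would fetch from the remote collection. */
    static List<String> keysForShard(List<String> joinKeys, int min, int max) {
        List<String> mine = new ArrayList<>();
        for (String k : joinKeys) {
            int h = hash(k);
            if (h >= min && h <= max) mine.add(k);
        }
        return mine;
    }
}
```

Because the shard ranges partition the full hash space, every join key is fetched by exactly one shard, which is what makes the per-shard remote queries cheaper than each shard pulling the whole key set.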
[jira] [Created] (SOLR-14298) LBSolrClient.checkAZombieServer should be less stupid
Chris M. Hostetter created SOLR-14298: - Summary: LBSolrClient.checkAZombieServer should be less stupid Key: SOLR-14298 URL: https://issues.apache.org/jira/browse/SOLR-14298 Project: Solr Issue Type: Bug Security Level: Public (Default Security Level. Issues are Public) Reporter: Chris M. Hostetter LBSolrClient.checkAZombieServer() currently does a /select query for {{\*:\*}} with distrib=false, rows=0, sort=\_docid\_ ... but this can still chew up a lot of time if the shard is big, and it's not self-evident wtf is going on in the server logs. At a minimum, these requests should include some sort of tracing param to identify the point of the query (ie: {{_zombieservercheck=true}}) and should probably be changed to hit something like the /ping handler, or the node status handler, or, if it's important to folks that it do a "search" that actually uses the index searcher, then it should use options like timeAllowed / segmentTerminateEarly, and/or {{q=-\*:\*}} instead ... or maybe a cursorMark ... something to make it not have the overhead of counting all the hits. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Comment Edited] (LUCENE-8962) Can we merge small segments during refresh, for faster searching?
[ https://issues.apache.org/jira/browse/LUCENE-8962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049504#comment-17049504 ] Michael Sokolov edited comment on LUCENE-8962 at 3/2/20 6:01 PM: - Tests seemed to pass consistently on master, so I pushed there, but now I cherry-picked to 8x branch and see consistent failures there: [junit4] Tests with failures [seed: 5036466712CCD5FE]: [junit4] - org.apache.lucene.index.TestIndexWriter.testThreadInterruptDeadlock [junit4] - org.apache.lucene.index.TestIndexWriterMergePolicy.testMergeOnCommit was (Author: sokolov): Tests seemed to pass consistently on master, so I pushed there, but now I cherry-picked to 8x branch and see consistent failures there: {{ [junit4] Tests with failures [seed: 5036466712CCD5FE]:}} {{ [junit4] - org.apache.lucene.index.TestIndexWriter.testThreadInterruptDeadlock}} {{ [junit4] - org.apache.lucene.index.TestIndexWriterMergePolicy.testMergeOnCommit}} > Can we merge small segments during refresh, for faster searching? > - > > Key: LUCENE-8962 > URL: https://issues.apache.org/jira/browse/LUCENE-8962 > Project: Lucene - Core > Issue Type: Improvement > Components: core/index >Reporter: Michael McCandless >Priority: Major > Attachments: LUCENE-8962_demo.png > > Time Spent: 5h 20m > Remaining Estimate: 0h > > With near-real-time search we ask {{IndexWriter}} to write all in-memory > segments to disk and open an {{IndexReader}} to search them, and this is > typically a quick operation. > However, when you use many threads for concurrent indexing, {{IndexWriter}} > will accumulate write many small segments during {{refresh}} and this then > adds search-time cost as searching must visit all of these tiny segments. > The merge policy would normally quickly coalesce these small segments if > given a little time ... 
so, could we somehow improve {{IndexWriter'}}s > refresh to optionally kick off merge policy to merge segments below some > threshold before opening the near-real-time reader? It'd be a bit tricky > because while we are waiting for merges, indexing may continue, and new > segments may be flushed, but those new segments shouldn't be included in the > point-in-time segments returned by refresh ... > One could almost do this on top of Lucene today, with a custom merge policy, > and some hackity logic to have the merge policy target small segments just > written by refresh, but it's tricky to then open a near-real-time reader, > excluding newly flushed but including newly merged segments since the refresh > originally finished ... > I'm not yet sure how best to solve this, so I wanted to open an issue for > discussion! -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Comment Edited] (LUCENE-8962) Can we merge small segments during refresh, for faster searching?
[ https://issues.apache.org/jira/browse/LUCENE-8962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049504#comment-17049504 ] Michael Sokolov edited comment on LUCENE-8962 at 3/2/20 6:00 PM: - Tests seemed to pass consistently on master, so I pushed there, but now I cherry-picked to 8x branch and see consistent failures there: {{ [junit4] Tests with failures [seed: 5036466712CCD5FE]:}} {{ [junit4] - org.apache.lucene.index.TestIndexWriter.testThreadInterruptDeadlock}} {{ [junit4] - org.apache.lucene.index.TestIndexWriterMergePolicy.testMergeOnCommit}} was (Author: sokolov): Tests seemed to pass consistently on master, so I pushed there, but now I cherry-picked to 8x branch and see consistent failures there: ``` [junit4] Tests with failures [seed: 5036466712CCD5FE]: [junit4] - org.apache.lucene.index.TestIndexWriter.testThreadInterruptDeadlock [junit4] - org.apache.lucene.index.TestIndexWriterMergePolicy.testMergeOnCommit ``` > Can we merge small segments during refresh, for faster searching? > - > > Key: LUCENE-8962 > URL: https://issues.apache.org/jira/browse/LUCENE-8962 > Project: Lucene - Core > Issue Type: Improvement > Components: core/index >Reporter: Michael McCandless >Priority: Major > Attachments: LUCENE-8962_demo.png > > Time Spent: 5h 20m > Remaining Estimate: 0h > > With near-real-time search we ask {{IndexWriter}} to write all in-memory > segments to disk and open an {{IndexReader}} to search them, and this is > typically a quick operation. > However, when you use many threads for concurrent indexing, {{IndexWriter}} > will accumulate write many small segments during {{refresh}} and this then > adds search-time cost as searching must visit all of these tiny segments. > The merge policy would normally quickly coalesce these small segments if > given a little time ... 
so, could we somehow improve {{IndexWriter'}}s > refresh to optionally kick off merge policy to merge segments below some > threshold before opening the near-real-time reader? It'd be a bit tricky > because while we are waiting for merges, indexing may continue, and new > segments may be flushed, but those new segments shouldn't be included in the > point-in-time segments returned by refresh ... > One could almost do this on top of Lucene today, with a custom merge policy, > and some hackity logic to have the merge policy target small segments just > written by refresh, but it's tricky to then open a near-real-time reader, > excluding newly flushed but including newly merged segments since the refresh > originally finished ... > I'm not yet sure how best to solve this, so I wanted to open an issue for > discussion! -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8962) Can we merge small segments during refresh, for faster searching?
[ https://issues.apache.org/jira/browse/LUCENE-8962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049504#comment-17049504 ] Michael Sokolov commented on LUCENE-8962: - Tests seemed to pass consistently on master, so I pushed there, but now I cherry-picked to 8x branch and see consistent failures there: ``` [junit4] Tests with failures [seed: 5036466712CCD5FE]: [junit4] - org.apache.lucene.index.TestIndexWriter.testThreadInterruptDeadlock [junit4] - org.apache.lucene.index.TestIndexWriterMergePolicy.testMergeOnCommit ``` > Can we merge small segments during refresh, for faster searching? > - > > Key: LUCENE-8962 > URL: https://issues.apache.org/jira/browse/LUCENE-8962 > Project: Lucene - Core > Issue Type: Improvement > Components: core/index >Reporter: Michael McCandless >Priority: Major > Attachments: LUCENE-8962_demo.png > > Time Spent: 5h 20m > Remaining Estimate: 0h > > With near-real-time search we ask {{IndexWriter}} to write all in-memory > segments to disk and open an {{IndexReader}} to search them, and this is > typically a quick operation. > However, when you use many threads for concurrent indexing, {{IndexWriter}} > will accumulate write many small segments during {{refresh}} and this then > adds search-time cost as searching must visit all of these tiny segments. > The merge policy would normally quickly coalesce these small segments if > given a little time ... so, could we somehow improve {{IndexWriter'}}s > refresh to optionally kick off merge policy to merge segments below some > threshold before opening the near-real-time reader? It'd be a bit tricky > because while we are waiting for merges, indexing may continue, and new > segments may be flushed, but those new segments shouldn't be included in the > point-in-time segments returned by refresh ... 
> One could almost do this on top of Lucene today, with a custom merge policy, > and some hackity logic to have the merge policy target small segments just > written by refresh, but it's tricky to then open a near-real-time reader, > excluding newly flushed but including newly merged segments since the refresh > originally finished ... > I'm not yet sure how best to solve this, so I wanted to open an issue for > discussion! -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
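The threshold idea discussed in this issue — merge the tiny segments just flushed by refresh before opening the near-real-time reader — reduces to a selection step like the following sketch. Segment sizes are modeled as a plain long[], and the names are illustrative rather than Lucene's MergePolicy API:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative selection step for a "merge small segments on refresh/commit" policy.
class SmallSegmentMergeSketch {
    /** Indexes of segments worth merging before the refresh/commit returns. */
    static List<Integer> selectSmallSegments(long[] segmentSizes, long threshold) {
        List<Integer> candidates = new ArrayList<>();
        for (int i = 0; i < segmentSizes.length; i++) {
            if (segmentSizes[i] < threshold) candidates.add(i);
        }
        // A merge of a single segment is pointless; require at least two.
        return candidates.size() >= 2 ? candidates : new ArrayList<>();
    }
}
```

The hard part the issue describes is not this selection but the bookkeeping around it: waiting for the proposed merge while excluding segments flushed after the refresh began from the point-in-time view.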
[jira] [Commented] (LUCENE-9253) Support custom dictionaries in KoreanTokenizer
[ https://issues.apache.org/jira/browse/LUCENE-9253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049483#comment-17049483 ] ASF subversion and git services commented on LUCENE-9253: - Commit 0de87de90039032b595a44a0e31c0a7da5a4064d in lucene-solr's branch refs/heads/branch_8x from Namgyu Kim [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=0de87de ] LUCENE-9253: Support custom dictionaries in KoreanTokenizer Signed-off-by: Namgyu Kim > Support custom dictionaries in KoreanTokenizer > -- > > Key: LUCENE-9253 > URL: https://issues.apache.org/jira/browse/LUCENE-9253 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Namgyu Kim >Assignee: Namgyu Kim >Priority: Major > Time Spent: 1h 10m > Remaining Estimate: 0h > > KoreanTokenizer does not currently support custom dictionaries (system, > unknown), even though Nori provides a DictionaryBuilder that creates custom > dictionaries. > In the current state, it is very difficult for Nori users to use a custom > dictionary. > Therefore, we need to open a new constructor that accepts one. > Kuromoji already supports this (LUCENE-8971), and I referenced it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8962) Can we merge small segments during refresh, for faster searching?
[ https://issues.apache.org/jira/browse/LUCENE-8962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049458#comment-17049458 ] ASF subversion and git services commented on LUCENE-8962: - Commit 043c5dff6f44c9bb2415005ac97db3c2c561ab45 in lucene-solr's branch refs/heads/master from msfroh [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=043c5df ] LUCENE-8962: Add ability to selectively merge on commit (#1155) * LUCENE-8962: Add ability to selectively merge on commit This adds a new "findCommitMerges" method to MergePolicy, which can specify merges to be executed before the IndexWriter.prepareCommitInternal method returns. If we have many index writer threads, they will flush their DWPT buffers on commit, resulting in many small segments, which can be merged before the commit returns. * Add missing Javadoc * Fix incorrect comment * Refactoring and fix intermittent test failure 1. Made some changes to the callback to update toCommit, leveraging SegmentInfos.applyMergeChanges. 2. I realized that we'll never end up with 0 registered merges, because we throw an exception if we fail to register a merge. 3. Moved the IndexWriterEvents.beginMergeOnCommit notification to before we call MergeScheduler.merge, since we may not be merging on another thread. 4. There was an intermittent test failure due to randomness in the time it takes for merges to complete. Before doing the final commit, we wait for pending merges to finish. We may still end up abandoning the final merge, but we can detect that and assert that either the merge was abandoned (and we have > 1 segment) or we did merge down to 1 segment. * Fix typo * Fix/improve comments based on PR feedback * More comment improvements from PR feedback * Rename method and add new MergeTrigger 1. Renamed findCommitMerges -> findFullFlushMerges. 2. Added MergeTrigger.COMMIT, passed to findFullFlushMerges and to MergeScheduler when merging on commit. 
* Update renamed method name in strings and comments > Can we merge small segments during refresh, for faster searching? > - > > Key: LUCENE-8962 > URL: https://issues.apache.org/jira/browse/LUCENE-8962 > Project: Lucene - Core > Issue Type: Improvement > Components: core/index >Reporter: Michael McCandless >Priority: Major > Attachments: LUCENE-8962_demo.png > > Time Spent: 5h 20m > Remaining Estimate: 0h > > With near-real-time search we ask {{IndexWriter}} to write all in-memory > segments to disk and open an {{IndexReader}} to search them, and this is > typically a quick operation. > However, when you use many threads for concurrent indexing, {{IndexWriter}} > will accumulate many small segments during {{refresh}}, and this then > adds search-time cost because searching must visit all of these tiny segments. > The merge policy would normally quickly coalesce these small segments if > given a little time ... so, could we somehow improve {{IndexWriter}}'s > refresh to optionally kick off the merge policy to merge segments below some > threshold before opening the near-real-time reader? It'd be a bit tricky > because while we are waiting for merges, indexing may continue, and new > segments may be flushed, but those new segments shouldn't be included in the > point-in-time segments returned by refresh ... > One could almost do this on top of Lucene today, with a custom merge policy, > and some hackity logic to have the merge policy target small segments just > written by refresh, but it's tricky to then open a near-real-time reader, > excluding newly flushed but including newly merged segments since the refresh > originally finished ... > I'm not yet sure how best to solve this, so I wanted to open an issue for > discussion!
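The mechanism described in the commit message above can be sketched roughly as follows: a wrapper {{MergePolicy}} that, when triggered by a commit, collects the small segments just flushed and proposes merging them before the commit returns. This is a minimal illustration only; the hook name ({{findFullFlushMerges}}) and {{MergeTrigger.COMMIT}} follow the commit message, but the exact signatures and the size threshold are assumptions, not the committed Lucene source.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.index.FilterMergePolicy;
import org.apache.lucene.index.MergePolicy;
import org.apache.lucene.index.MergeTrigger;
import org.apache.lucene.index.SegmentCommitInfo;
import org.apache.lucene.index.SegmentInfos;

/**
 * Sketch of a merge policy that merges the many tiny segments produced by a
 * full flush before the commit returns. Signatures are assumptions modeled
 * on the LUCENE-8962 commit message.
 */
public class SmallSegmentCommitMergePolicy extends FilterMergePolicy {
  private final long smallSegmentBytes; // segments below this size are merged on commit

  public SmallSegmentCommitMergePolicy(MergePolicy in, long smallSegmentBytes) {
    super(in);
    this.smallSegmentBytes = smallSegmentBytes;
  }

  @Override
  public MergeSpecification findFullFlushMerges(MergeTrigger mergeTrigger,
                                                SegmentInfos segmentInfos,
                                                MergeContext mergeContext) throws IOException {
    // Only act on commit-triggered full flushes; other triggers fall through
    // to the wrapped policy's normal behavior.
    if (mergeTrigger != MergeTrigger.COMMIT) {
      return null;
    }
    List<SegmentCommitInfo> small = new ArrayList<>();
    for (SegmentCommitInfo sci : segmentInfos) {
      if (sci.sizeInBytes() < smallSegmentBytes) {
        small.add(sci);
      }
    }
    if (small.size() < 2) {
      return null; // nothing worth merging before the commit returns
    }
    MergeSpecification spec = new MergeSpecification();
    spec.add(new OneMerge(small)); // coalesce all the tiny flushed segments
    return spec;
  }
}
```

As the thread notes, the tricky part is not proposing these merges but the bookkeeping around them: the commit must wait for (or abandon) the proposed merges, and segments flushed by concurrently indexing threads after the flush must not leak into the committed point-in-time view.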
[GitHub] [lucene-solr] msokolov merged pull request #1155: LUCENE-8962: Add ability to selectively merge on commit
msokolov merged pull request #1155: LUCENE-8962: Add ability to selectively merge on commit URL: https://github.com/apache/lucene-solr/pull/1155
[jira] [Commented] (LUCENE-9253) Support custom dictionaries in KoreanTokenizer
[ https://issues.apache.org/jira/browse/LUCENE-9253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049452#comment-17049452 ] ASF subversion and git services commented on LUCENE-9253: - Commit b2dbd18f96e05478146a838108d096df7348d1b4 in lucene-solr's branch refs/heads/master from Namgyu Kim [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=b2dbd18 ] LUCENE-9253: Support custom dictionaries in KoreanTokenizer Signed-off-by: Namgyu Kim > Support custom dictionaries in KoreanTokenizer > -- > > Key: LUCENE-9253 > URL: https://issues.apache.org/jira/browse/LUCENE-9253 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Namgyu Kim >Assignee: Namgyu Kim >Priority: Major > Time Spent: 1h 10m > Remaining Estimate: 0h > > KoreanTokenizer does not currently support custom dictionaries (system, unknown), even though Nori provides a DictionaryBuilder that creates them. > In the current state, it is very difficult for Nori users to use a custom dictionary. > Therefore, we need to add a new constructor that accepts one. > Kuromoji already supports this (LUCENE-8971), and I used it as a reference.
[GitHub] [lucene-solr] danmuzi merged pull request #1296: LUCENE-9253: Support custom dictionaries in KoreanTokenizer
danmuzi merged pull request #1296: LUCENE-9253: Support custom dictionaries in KoreanTokenizer URL: https://github.com/apache/lucene-solr/pull/1296