[jira] [Resolved] (LUCENE-9225) Rectangle should extend LatLonGeometry
[ https://issues.apache.org/jira/browse/LUCENE-9225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ignacio Vera resolved LUCENE-9225. -- Fix Version/s: 8.5 Assignee: Ignacio Vera Resolution: Fixed > Rectangle should extend LatLonGeometry > -- > > Key: LUCENE-9225 > URL: https://issues.apache.org/jira/browse/LUCENE-9225 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Ignacio Vera >Assignee: Ignacio Vera >Priority: Minor > Fix For: 8.5 > > Time Spent: 1h > Remaining Estimate: 0h > > Rectangle is the only geometry class that does not extend LatLonGeometry. > This is because we have a specialised query for rectangles that works on the > encoded space (very similar to what LatLonPoint does). > It would be nice if Rectangle could implement LatLonGeometry, so that in cases > where a bounding box is part of a complex geometry, it can fall back to > Component2D objects. > The idea is to move the specialised logic in Rectangle2D into the > specialised LatLonBoundingBoxQuery and rename the current XYRectangle2D to > Rectangle2D. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-13942) /api/cluster/zk/* to fetch raw ZK data
[ https://issues.apache.org/jira/browse/SOLR-13942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049965#comment-17049965 ] Jan Høydahl commented on SOLR-13942: {quote}I feel like I'm talking to a wall. The comment I posted 20 mins back just addresses these 2 questions{quote} When I answered, your comment said the following: {quote}bin/solr is not something everyone has everywhere. People have curl & wget & a browser everywhere. If I want to script something up and read data in ZK there is no alternative. I would like to merge it soon{quote} Since then you have edited it to address the security question. I am still not convinced - ZK internals should not be part of our public API. ZK contains lots of internals. Please give some good concrete examples of what you'd use it for. Perhaps that will reveal real user needs that call for new managed Solr APIs? > /api/cluster/zk/* to fetch raw ZK data > -- > > Key: SOLR-13942 > URL: https://issues.apache.org/jira/browse/SOLR-13942 > Project: Solr > Issue Type: Bug >Reporter: Noble Paul >Assignee: Noble Paul >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > example: > download the {{state.json}} of a collection > {code} > GET http://localhost:8983/api/cluster/zk/collections/gettingstarted/state.json > {code} > get a list of all children under {{/live_nodes}} > {code} > GET http://localhost:8983/api/cluster/zk/live_nodes > {code} > If the requested path is a node with children, show the list of child nodes > and their metadata
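Since the proposed endpoint is a plain HTTP GET, the curl/wget argument applies equally to any stock HTTP library. A minimal sketch using Java's built-in HttpClient, assuming the default local host and port from the issue's examples; the `ZkFetch` class and `zkPath` helper are our own illustration, not part of Solr:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ZkFetch {
    // Builds a request URI for the proposed /api/cluster/zk/* endpoint.
    static URI zkPath(String base, String znode) {
        return URI.create(base + "/api/cluster/zk" + znode);
    }

    public static void main(String[] args) {
        URI uri = zkPath("http://localhost:8983", "/collections/gettingstarted/state.json");
        HttpRequest req = HttpRequest.newBuilder(uri).GET().build();
        System.out.println(req.uri());
        // The actual fetch needs a running Solr node with this endpoint merged,
        // so it is left commented out here:
        // HttpResponse<String> rsp =
        //     HttpClient.newHttpClient().send(req, HttpResponse.BodyHandlers.ofString());
        // System.out.println(rsp.body());
    }
}
```

The sketch only builds and prints the request URI; uncommenting the `send` call would return the raw znode content (or, per the description, a child listing when the path has children).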
[jira] [Commented] (LUCENE-9225) Rectangle should extend LatLonGeometry
[ https://issues.apache.org/jira/browse/LUCENE-9225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049950#comment-17049950 ] ASF subversion and git services commented on LUCENE-9225: - Commit b8bfccebf6dde5d2ee38f42a2509629564ae9b0f in lucene-solr's branch refs/heads/branch_8x from Ignacio Vera [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=b8bfcce ] LUCENE-9225: Rectangle extends LatLonGeometry so it can be used in a geometry collection (#1258) # Conflicts: # lucene/core/src/java/org/apache/lucene/geo/Rectangle2D.java
[jira] [Commented] (SOLR-14300) Some conditional clauses on unindexed field will be ignored by query parser in some specific cases
[ https://issues.apache.org/jira/browse/SOLR-14300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049940#comment-17049940 ] KOUTA SETSU commented on SOLR-14300: our team did a little research, and we wonder if this is a bug in SolrQueryParser. More specifically, we think the if statement here might be wrong. {code:java} // If this field isn't indexed, or if it is indexed and we want to use TermsQuery, then collect this value. // We are currently relying on things like PointField not being marked as indexed in order to bypass // the "useTermQuery" check. if ((fieldValues == null && useTermsQuery) || !sfield.indexed()) { fieldValues = new ArrayList<>(2); fmap.put(sfield, fieldValues); } {code} [https://github.com/apache/lucene-solr/blob/branch_8_4/solr/core/src/java/org/apache/solr/parser/SolrQueryParserBase.java#L711] This if statement will cause fieldValues to be overwritten if sfield is not indexed. We will do more research and try to fix this problem, but we need a little more time because we need our company's permission. Thanks. > Some conditional clauses on unindexed field will be ignored by query parser > in some specific cases > -- > > Key: SOLR-14300 > URL: https://issues.apache.org/jira/browse/SOLR-14300 > Project: Solr > Issue Type: Bug > Security Level: Public (Default Security Level. Issues are Public) > Components: query parsers >Affects Versions: 7.3.1 > Environment: Solr 7.3.1 > centos7.5 >Reporter: KOUTA SETSU >Priority: Minor > > In some specific cases, conditional clauses on an unindexed field will be > ignored. > * For a query like q=A:1 OR B:1 OR A:2 OR B:2, > if field B is not indexed (but docValues="true"), "B:1" will be lost. > > * But if you write the query as q=A:1 OR A:2 OR B:1 OR B:2, > it will work perfectly. > The only difference between the two queries is that the clauses are written in different orders: > one is *ABAB*, the other is *AABB*. > > *Steps to reproduce* > You can easily reproduce this problem on a Solr collection with the _default > configset and exampledocs/books.csv data. > # Create a _default collection: > {code:java} > bin/solr create -c books -s 2 -rf 2{code} > # Post books.csv: > {code:java} > bin/post -c books example/exampledocs/books.csv{code} > # Run the following queries: > ** query1: > [http://localhost:8983/solr/books/select?q=+(name_str:Foundation+OR+cat:book+OR+name_str:Jhereg+OR+cat:cd)&debug=query] > ** query2: > [http://localhost:8983/solr/books/select?q=+(name_str:Foundation+OR+name_str:Jhereg+OR+cat:book+OR+cat:cd)&debug=query] > ** Then you can see that the parsed queries are different. > *** query1 ("name_str:Foundation" is lost): > {code:json} > "debug":{ > "rawquerystring":"+(name_str:Foundation OR cat:book OR name_str:Jhereg > OR cat:cd)", > "querystring":"+(name_str:Foundation OR cat:book OR name_str:Jhereg OR > cat:cd)", > "parsedquery":"+(cat:book cat:cd (name_str:[[4a 68 65 72 65 67] TO [4a > 68 65 72 65 67]]))", > "parsedquery_toString":"+(cat:book cat:cd name_str:[[4a 68 65 72 65 67] > TO [4a 68 65 72 65 67]])", > "QParser":"LuceneQParser"}}{code} > *** query2 ("name_str:Foundation" isn't lost): > {code:json} > "debug":{ > "rawquerystring":"+(name_str:Foundation OR name_str:Jhereg OR cat:book > OR cat:cd)", > "querystring":"+(name_str:Foundation OR name_str:Jhereg OR cat:book OR > cat:cd)", > "parsedquery":"+(cat:book cat:cd ((name_str:[[46 6f 75 6e 64 61 74 69 6f > 6e] TO [46 6f 75 6e 64 61 74 69 6f 6e]]) (name_str:[[4a 68 65 72 65 67] TO > [4a 68 65 72 65 67]])))", > "parsedquery_toString":"+(cat:book cat:cd (name_str:[[46 6f 75 6e 64 61 > 74 69 6f 6e] TO [46 6f 75 6e 64 61 74 69 6f 6e]] name_str:[[4a 68 65 72 65 > 67] TO [4a 68 65 72 65 67]]))", > "QParser":"LuceneQParser"}{code}
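To make the suspected mechanism concrete, here is a self-contained toy sketch (our own illustration, not Solr's code) in which every clause takes the unindexed-field branch: the buggy variant installs a fresh list on every clause, as the quoted condition does whenever `!sfield.indexed()` is true, while the guarded variant reuses the list already collected for the field:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FieldValuesOverwrite {
    // Mirrors the suspect behaviour for an unindexed field: the condition is
    // always true, so every clause installs a fresh list in the map and any
    // values previously collected for the same field are dropped.
    static Map<String, List<String>> collectBuggy(String[][] clauses) {
        Map<String, List<String>> fmap = new HashMap<>();
        for (String[] clause : clauses) {
            List<String> fieldValues = new ArrayList<>(); // unconditional re-creation
            fmap.put(clause[0], fieldValues);
            fieldValues.add(clause[1]);
        }
        return fmap;
    }

    // One possible guard: reuse any list already collected for the field, so
    // interleaved clause orders keep every value.
    static Map<String, List<String>> collectGuarded(String[][] clauses) {
        Map<String, List<String>> fmap = new HashMap<>();
        for (String[] clause : clauses) {
            fmap.computeIfAbsent(clause[0], k -> new ArrayList<>()).add(clause[1]);
        }
        return fmap;
    }

    public static void main(String[] args) {
        String[][] abab = {{"A", "1"}, {"B", "1"}, {"A", "2"}, {"B", "2"}};
        System.out.println(collectBuggy(abab).get("B"));   // [2] -> "B:1" was lost
        System.out.println(collectGuarded(abab).get("B")); // [1, 2]
    }
}
```

With clauses interleaved as ABAB, the buggy collector keeps only the last value for B, matching the reported loss of "B:1". Why the AABB order survives in the real parser depends on surrounding parser state not modeled in this toy sketch.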
[jira] [Commented] (LUCENE-9225) Rectangle should extend LatLonGeometry
[ https://issues.apache.org/jira/browse/LUCENE-9225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049941#comment-17049941 ] ASF subversion and git services commented on LUCENE-9225: - Commit 286d22717b212f125d2df00ecd1c67b28eda4fe3 in lucene-solr's branch refs/heads/master from Ignacio Vera [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=286d227 ] LUCENE-9225: Rectangle extends LatLonGeometry so it can be used in a geometry collection (#1258)
[GitHub] [lucene-solr] iverase merged pull request #1258: LUCENE-9225: Rectangle should extend LatLonGeometry
iverase merged pull request #1258: LUCENE-9225: Rectangle should extend LatLonGeometry URL: https://github.com/apache/lucene-solr/pull/1258 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] iverase commented on a change in pull request #1258: LUCENE-9225: Rectangle should extend LatLonGeometry
iverase commented on a change in pull request #1258: LUCENE-9225: Rectangle should extend LatLonGeometry URL: https://github.com/apache/lucene-solr/pull/1258#discussion_r386822636 ## File path: lucene/core/src/java/org/apache/lucene/document/LatLonShapeBoundingBoxQuery.java ## @@ -108,4 +115,385 @@ public String toString(String field) { sb.append(rectangle.toString()); return sb.toString(); } + + /** Holds spatial logic for a bounding box that works in the encoded space */ + private static class EncodedRectangle { Review comment: This class is a specialisation for bounding box queries working on the encoded space. As such, it feels like the correct place to package this logic is in the query itself. I am pushing this change; if you disagree we can rethink how to package this logic later on.
[jira] [Commented] (SOLR-10397) Port 'autoAddReplicas' feature to the autoscaling framework and make it work with non-shared filesystems
[ https://issues.apache.org/jira/browse/SOLR-10397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049938#comment-17049938 ] David Smiley commented on SOLR-10397: - I know this is an old issue, but I was looking at {{CoreContainer.isSharedFs(CoreDescriptor)}} which was added here. It looks up the Core to then call {{core.getDirectoryFactory().isSharedStorage()}} or failing that (null core) it loads the config on the fly. Either path is bad IMO; with transient/lazy cores, we don't want to potentially trigger a core load, nor do we want to be loading configs which can potentially be expensive. IMO the "sharedStorage" nature of a core is so important that it ought to go in the core descriptor. WDYT? > Port 'autoAddReplicas' feature to the autoscaling framework and make it work > with non-shared filesystems > > > Key: SOLR-10397 > URL: https://issues.apache.org/jira/browse/SOLR-10397 > Project: Solr > Issue Type: Sub-task > Components: SolrCloud >Reporter: Shalin Shekhar Mangar >Assignee: Cao Manh Dat >Priority: Major > Labels: autoscaling > Fix For: 7.1, 8.0 > > Attachments: SOLR-10397.1.patch, SOLR-10397.2.patch, > SOLR-10397.2.patch, SOLR-10397.2.patch, SOLR-10397.patch, > SOLR-10397_remove_nocommit.patch > > > Currently 'autoAddReplicas=true' can be specified in the Collection Create > API to automatically add replicas when a replica becomes unavailable. I > propose to move this feature to the autoscaling cluster policy rules design. > This will include the following: > * Trigger support for ‘nodeLost’ event type > * Modification of existing implementation of ‘autoAddReplicas’ to > automatically create the appropriate ‘nodeLost’ trigger. > * Any such auto-created trigger must be marked internally such that setting > ‘autoAddReplicas=false’ via the Modify Collection API should delete or > disable corresponding trigger. > * Support for non-HDFS filesystems while retaining the optimization afforded > by HDFS i.e. 
the replaced replica can point to the existing data dir of the > old replica. > * Deprecate/remove the feature of enabling/disabling ‘autoAddReplicas’ across > the entire cluster using cluster properties in favor of using the > suspend-trigger/resume-trigger APIs. > This will retain backward compatibility for the most part and keep a common > use-case easy to enable as well as make it available to more people (i.e. > people who don't use HDFS).
[jira] [Resolved] (LUCENE-9251) Polygon tessellator fails to detect some collinear points
[ https://issues.apache.org/jira/browse/LUCENE-9251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ignacio Vera resolved LUCENE-9251. -- Fix Version/s: 8.5 Assignee: Ignacio Vera Resolution: Fixed > Polygon tessellator fails to detect some collinear points > - > > Key: LUCENE-9251 > URL: https://issues.apache.org/jira/browse/LUCENE-9251 > Project: Lucene - Core > Issue Type: Bug >Reporter: Ignacio Vera >Assignee: Ignacio Vera >Priority: Major > Fix For: 8.5 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > A user of Elasticsearch [has reported| > https://discuss.elastic.co/t/unable-to-tessellate-shape-error-on-indexing-es-7-6/220867] > a tessellation error in a valid polygon. The reported polygon is quite > complex, but after digging a bit, the problem is that the tessellator fails to > detect some collinearities. In particular, in complex tessellation we can end > up with two equal edges with different flags in {{isEdgeFromPolygon}}. Still, > we should be able to remove that collinearity.
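For context on what "detecting collinearity" means here: three points are collinear exactly when the cross product of the two edge vectors is zero. A naive illustration of that predicate follows (our own sketch, not Lucene's actual implementation, which works on encoded coordinates and must be robust to floating-point error):

```java
public class Collinear {
    // Cross product of (b - a) and (c - a); zero means a, b, c lie on one line.
    static double orient(double ax, double ay, double bx, double by, double cx, double cy) {
        return (bx - ax) * (cy - ay) - (by - ay) * (cx - ax);
    }

    static boolean collinear(double ax, double ay, double bx, double by, double cx, double cy) {
        // Naive: a strict comparison against 0.0 is only safe for exact inputs.
        return orient(ax, ay, bx, by, cx, cy) == 0.0;
    }

    public static void main(String[] args) {
        System.out.println(collinear(0, 0, 1, 1, 2, 2)); // true: all on y = x
        System.out.println(collinear(0, 0, 1, 0, 1, 1)); // false
    }
}
```

Production tessellators use integer/encoded coordinates or robust orientation predicates precisely because near-zero cross products are where collinearity bugs like this one hide.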
[jira] [Updated] (SOLR-14300) Some conditional clauses on unindexed field will be ignored by query parser in some specific cases
[ https://issues.apache.org/jira/browse/SOLR-14300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] KOUTA SETSU updated SOLR-14300: --- Description: In some specific cases, conditional clauses on an unindexed field will be ignored. * For a query like q=A:1 OR B:1 OR A:2 OR B:2, if field B is not indexed (but docValues="true"), "B:1" will be lost. * But if you write the query as q=A:1 OR A:2 OR B:1 OR B:2, it will work perfectly. The only difference between the two queries is that the clauses are written in different orders: one is *ABAB*, the other is *AABB*. *Steps to reproduce* You can easily reproduce this problem on a Solr collection with the _default configset and exampledocs/books.csv data. # Create a _default collection: {code:java} bin/solr create -c books -s 2 -rf 2{code} # Post books.csv: {code:java} bin/post -c books example/exampledocs/books.csv{code} # Run the following queries: ** query1: [http://localhost:8983/solr/books/select?q=+(name_str:Foundation+OR+cat:book+OR+name_str:Jhereg+OR+cat:cd)&debug=query] ** query2: [http://localhost:8983/solr/books/select?q=+(name_str:Foundation+OR+name_str:Jhereg+OR+cat:book+OR+cat:cd)&debug=query] ** Then you can see that the parsed queries are different. *** query1 ("name_str:Foundation" is lost): {code:json} "debug":{ "rawquerystring":"+(name_str:Foundation OR cat:book OR name_str:Jhereg OR cat:cd)", "querystring":"+(name_str:Foundation OR cat:book OR name_str:Jhereg OR cat:cd)", "parsedquery":"+(cat:book cat:cd (name_str:[[4a 68 65 72 65 67] TO [4a 68 65 72 65 67]]))", "parsedquery_toString":"+(cat:book cat:cd name_str:[[4a 68 65 72 65 67] TO [4a 68 65 72 65 67]])", "QParser":"LuceneQParser"}}{code} *** query2 ("name_str:Foundation" isn't lost): {code:json} "debug":{ "rawquerystring":"+(name_str:Foundation OR name_str:Jhereg OR cat:book OR cat:cd)", "querystring":"+(name_str:Foundation OR name_str:Jhereg OR cat:book OR cat:cd)", "parsedquery":"+(cat:book cat:cd ((name_str:[[46 6f 75 6e 64 61 74 69 6f 6e] TO [46 6f 75 6e 64 61 74 69 6f 6e]]) (name_str:[[4a 68 65 72 65 67] TO [4a 68 65 72 65 67]])))", "parsedquery_toString":"+(cat:book cat:cd (name_str:[[46 6f 75 6e 64 61 74 69 6f 6e] TO [46 6f 75 6e 64 61 74 69 6f 6e]] name_str:[[4a 68 65 72 65 67] TO [4a 68 65 72 65 67]]))", "QParser":"LuceneQParser"}{code}
[jira] [Resolved] (LUCENE-9239) TestLatLonMultiPolygonShapeQueries error with CIRCLE queries
[ https://issues.apache.org/jira/browse/LUCENE-9239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ignacio Vera resolved LUCENE-9239. -- Fix Version/s: 8.5 Assignee: Ignacio Vera Resolution: Fixed > TestLatLonMultiPolygonShapeQueries error with CIRCLE queries > > > Key: LUCENE-9239 > URL: https://issues.apache.org/jira/browse/LUCENE-9239 > Project: Lucene - Core > Issue Type: Bug >Reporter: Ignacio Vera >Assignee: Ignacio Vera >Priority: Major > Fix For: 8.5 > > Attachments: screenshot-1.png > > Time Spent: 20m > Remaining Estimate: 0h > > The failure can be reproduced with: > {code} > ant test -Dtestcase=TestLatLonMultiPolygonShapeQueries > -Dtests.method=testRandomBig -Dtests.seed=844FBD6099212BE8 > -Dtests.multiplier=2 -Dtests.nightly=true -Dtests.slow=true > -Dtests.linedocsfile=/home/jenkins/jenkins-slave/workspace/Lucene-Solr-NightlyTests-8.x/test-data/enwiki.random.lines.txt > -Dtests.locale=sr-BA -Dtests.timezone=Asia/Ashkhabad -Dtests.asserts=true > -Dtests.file.encoding=UTF-8 > {code} > The error message: > {code} > query=LatLonShapeQuery: > field=shape:[CIRCLE([78.01086555431775,0.9513280497489234] radius = > 1097753.4254892308 meters),] docID=43632 > shape=[[-22.350172194105966, 49.931598911327825] [90.0, 49.931598911327825] > [90.0, 51.408196891378765] [-22.350172194105966, 51.408196891378765] > [-22.350172194105966, 49.931598911327825] , [76.12283244781244, > -28.218674420982268] [81.7520930577503, -28.218674420982268] > [81.7520930577503, -1.0286448278003566E-32] [76.12283244781244, > -1.0286448278003566E-32] [76.12283244781244, -28.218674420982268] ] > deleted?=false distanceQuery=CIRCLE([78.01086555431775,0.9513280497489234] > radius = 1097753.4254892308 meters) > {code}
[jira] [Created] (SOLR-14300) Some conditional clauses on unindexed field will be ignored by query parser in some specific cases
KOUTA SETSU created SOLR-14300: -- Summary: Some conditional clauses on unindexed field will be ignored by query parser in some specific cases Key: SOLR-14300 URL: https://issues.apache.org/jira/browse/SOLR-14300 Project: Solr Issue Type: Bug Security Level: Public (Default Security Level. Issues are Public) Components: query parsers Affects Versions: 7.3.1 Environment: Solr 7.3.1 centos7.5 Reporter: KOUTA SETSU
[jira] [Commented] (SOLR-13942) /api/cluster/zk/* to fetch raw ZK data
[ https://issues.apache.org/jira/browse/SOLR-13942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049930#comment-17049930 ] Ishan Chattopadhyaya commented on SOLR-13942: - bq. I would like to merge it soon +1, this is a convenience to have.
[jira] [Commented] (SOLR-13942) /api/cluster/zk/* to fetch raw ZK data
[ https://issues.apache.org/jira/browse/SOLR-13942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049929#comment-17049929 ] Ishan Chattopadhyaya commented on SOLR-13942: - bq. I would like you to explain the true benefit of this, explain why it is not a security risk and then gain consensus before continuing. Why is this a security risk? Nothing in this issue makes Solr more insecure than it already is.
[jira] [Commented] (LUCENE-9251) Polygon tessellator fails to detect some collinear points
[ https://issues.apache.org/jira/browse/LUCENE-9251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049927#comment-17049927 ] ASF subversion and git services commented on LUCENE-9251: - Commit e5036e963c93960a968c1d02025910590e9b242a in lucene-solr's branch refs/heads/branch_8x from Ignacio Vera [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=e5036e9 ] LUCENE-9251: Filter equal edges with different value on isEdgeFromPolygon (#1290) Fix bug in the polygon tessellator where edges with different value on #isEdgeFromPolygon were not filtered out properly
[jira] [Commented] (LUCENE-9251) Polygon tessellator fails to detect some collinear points
[ https://issues.apache.org/jira/browse/LUCENE-9251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049926#comment-17049926 ] ASF subversion and git services commented on LUCENE-9251: - Commit c313365c5ffe76192e6179f3dfe23d056f7076c5 in lucene-solr's branch refs/heads/master from Ignacio Vera [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=c313365 ] LUCENE-9251: Filter equal edges with different value on isEdgeFromPolygon (#1290) Fix bug in the polygon tessellator where edges with different value on #isEdgeFromPolygon were not filtered out properly
[GitHub] [lucene-solr] iverase merged pull request #1290: LUCENE-9251: Filter equal edges with different value on isEdgeFromPolygon
iverase merged pull request #1290: LUCENE-9251: Filter equal edges with different value on isEdgeFromPolygon URL: https://github.com/apache/lucene-solr/pull/1290
[jira] [Commented] (LUCENE-9239) TestLatLonMultiPolygonShapeQueries error with CIRCLE queries
[ https://issues.apache.org/jira/browse/LUCENE-9239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049924#comment-17049924 ] ASF subversion and git services commented on LUCENE-9239: - Commit 3fb10787910ad57959fc636bd66d83d9fdde7ea5 in lucene-solr's branch refs/heads/branch_8x from Ignacio Vera [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=3fb1078 ] LUCENE-9239: Circle2D#WithinTriangle detects properly if a triangle is Within distance. (#1280) > TestLatLonMultiPolygonShapeQueries error with CIRCLE queries > > > Key: LUCENE-9239 > URL: https://issues.apache.org/jira/browse/LUCENE-9239 > Project: Lucene - Core > Issue Type: Bug >Reporter: Ignacio Vera >Priority: Major > Attachments: screenshot-1.png > > Time Spent: 20m > Remaining Estimate: 0h > > The failure can be reproduced with: > {code} > ant test -Dtestcase=TestLatLonMultiPolygonShapeQueries > -Dtests.method=testRandomBig -Dtests.seed=844FBD6099212BE8 > -Dtests.multiplier=2 -Dtests.nightly=true -Dtests.slow=true > -Dtests.linedocsfile=/home/jenkins/jenkins-slave/workspace/Lucene-Solr-NightlyTests-8.x/test-data/enwiki.random.lines.txt > -Dtests.locale=sr-BA -Dtests.timezone=Asia/Ashkhabad -Dtests.asserts=true > -Dtests.file.encoding=UTF-8 > {code} > The error message: > {code} > query=LatLonShapeQuery: > field=shape:[CIRCLE([78.01086555431775,0.9513280497489234] radius = > 1097753.4254892308 meters),] docID=43632 > shape=[[-22.350172194105966, 49.931598911327825] [90.0, 49.931598911327825] > [90.0, 51.408196891378765] [-22.350172194105966, 51.408196891378765] > [-22.350172194105966, 49.931598911327825] , [76.12283244781244, > -28.218674420982268] [81.7520930577503, -28.218674420982268] > [81.7520930577503, -1.0286448278003566E-32] [76.12283244781244, > -1.0286448278003566E-32] [76.12283244781244, -28.218674420982268] ] > deleted?=false distanceQuery=CIRCLE([78.01086555431775,0.9513280497489234] > radius = 1097753.4254892308 meters) > {code} -- This message 
[jira] [Commented] (LUCENE-9239) TestLatLonMultiPolygonShapeQueries error with CIRCLE queries
[ https://issues.apache.org/jira/browse/LUCENE-9239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049923#comment-17049923 ] ASF subversion and git services commented on LUCENE-9239: - Commit b732ce700258ab05590f3904c8d2cd332aa4e0cb in lucene-solr's branch refs/heads/master from Ignacio Vera [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=b732ce7 ] LUCENE-9239: Circle2D#WithinTriangle detects properly if a triangle is Within distance. (#1280) > TestLatLonMultiPolygonShapeQueries error with CIRCLE queries > > > Key: LUCENE-9239 > URL: https://issues.apache.org/jira/browse/LUCENE-9239 > Project: Lucene - Core > Issue Type: Bug >Reporter: Ignacio Vera >Priority: Major > Attachments: screenshot-1.png > > Time Spent: 20m > Remaining Estimate: 0h > > The failure can be reproduced with: > {code} > ant test -Dtestcase=TestLatLonMultiPolygonShapeQueries > -Dtests.method=testRandomBig -Dtests.seed=844FBD6099212BE8 > -Dtests.multiplier=2 -Dtests.nightly=true -Dtests.slow=true > -Dtests.linedocsfile=/home/jenkins/jenkins-slave/workspace/Lucene-Solr-NightlyTests-8.x/test-data/enwiki.random.lines.txt > -Dtests.locale=sr-BA -Dtests.timezone=Asia/Ashkhabad -Dtests.asserts=true > -Dtests.file.encoding=UTF-8 > {code} > The error message: > {code} > query=LatLonShapeQuery: > field=shape:[CIRCLE([78.01086555431775,0.9513280497489234] radius = > 1097753.4254892308 meters),] docID=43632 > shape=[[-22.350172194105966, 49.931598911327825] [90.0, 49.931598911327825] > [90.0, 51.408196891378765] [-22.350172194105966, 51.408196891378765] > [-22.350172194105966, 49.931598911327825] , [76.12283244781244, > -28.218674420982268] [81.7520930577503, -28.218674420982268] > [81.7520930577503, -1.0286448278003566E-32] [76.12283244781244, > -1.0286448278003566E-32] [76.12283244781244, -28.218674420982268] ] > deleted?=false distanceQuery=CIRCLE([78.01086555431775,0.9513280497489234] > radius = 1097753.4254892308 meters) > {code} -- This message was 
[GitHub] [lucene-solr] iverase merged pull request #1280: LUCENE-9239: Change withinTriangle logic for Circles
iverase merged pull request #1280: LUCENE-9239: Change withinTriangle logic for Circles URL: https://github.com/apache/lucene-solr/pull/1280
[GitHub] [lucene-solr] noblepaul opened a new pull request #1308: SOLR-13942 /api/cluster/zk/* to fetch raw ZK data
noblepaul opened a new pull request #1308: SOLR-13942 /api/cluster/zk/* to fetch raw ZK data URL: https://github.com/apache/lucene-solr/pull/1308
[jira] [Commented] (SOLR-11746) numeric fields need better error handling for prefix/wildcard syntax -- consider uniform support for "foo:* == foo:[* TO *]"
[ https://issues.apache.org/jira/browse/SOLR-11746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049901#comment-17049901 ] Houston Putman commented on SOLR-11746: --- I'm not sure why that'd be an issue, those links do work both on the released 8.4 ref guide, and on the master branch. [https://lucene.apache.org/solr/guide/8_4/the-standard-query-parser.html#differences-between-lucenes-classic-query-parser-and-solrs-standard-query-parser] [https://github.com/apache/lucene-solr/blob/master/solr/solr-ref-guide/src/the-standard-query-parser.adoc#differences-between-lucenes-classic-query-parser-and-solrs-standard-query-parser] I also tested {{ant build-site}}, which works for me off of master. > numeric fields need better error handling for prefix/wildcard syntax -- > consider uniform support for "foo:* == foo:[* TO *]" > > > Key: SOLR-11746 > URL: https://issues.apache.org/jira/browse/SOLR-11746 > Project: Solr > Issue Type: Bug >Affects Versions: 7.0 >Reporter: Chris M. Hostetter >Assignee: Houston Putman >Priority: Major > Fix For: master (9.0), 8.5 > > Attachments: SOLR-11746.patch, SOLR-11746.patch, SOLR-11746.patch, > SOLR-11746.patch, SOLR-11746.patch, SOLR-11746.patch, SOLR-11746.patch, > SOLR-11746.patch, SOLR-11746.patch, SOLR-11746.patch, SOLR-11746.patch > > > On the solr-user mailing list, Torsten Krah pointed out that with Trie > numeric fields, query syntax such as {{foo_d:\*}} has been functionality > equivilent to {{foo_d:\[\* TO \*]}} and asked why this was not also supported > for Point based numeric fields. > The fact that this type of syntax works (for {{indexed="true"}} Trie fields) > appears to have been an (untested, undocumented) fluke of Trie fields given > that they use indexed terms for the (encoded) numeric terms and inherit the > default implementation of {{FieldType.getPrefixQuery}} which produces a > prefix query against the {{""}} (empty string) term. 
> (Note that this syntax has apparently _*never*_ worked for Trie fields with > {{indexed="false" docValues="true"}} ) > In general, we should assess the behavior when users attempt a prefix/wildcard > syntax query against numeric fields, as currently the behavior is largely > nonsensical: prefix/wildcard syntax frequently matches no docs w/o any sort > of error, and the aforementioned {{numeric_field:*}} behaves inconsistently > between points/trie fields and between indexed/docValued trie fields.
[jira] [Commented] (SOLR-14291) OldAnalyticsRequestConverter should support fields names with dots
[ https://issues.apache.org/jira/browse/SOLR-14291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049897#comment-17049897 ] Houston Putman commented on SOLR-14291: --- This looks good to me, especially with the test. Thanks for the fix [~anatolii_siuniaev]! > OldAnalyticsRequestConverter should support fields names with dots > -- > > Key: SOLR-14291 > URL: https://issues.apache.org/jira/browse/SOLR-14291 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: search, SearchComponents - other >Reporter: Anatolii Siuniaev >Priority: Trivial > Attachments: SOLR-14291.patch > > > If you send a query with range facets using old olap-style syntax (see pdf > [here|https://issues.apache.org/jira/browse/SOLR-5302]), > OldAnalyticsRequestConverter just silently (no exception thrown) omits > parameters like > {code:java} > olap..rangefacet..start > {code} > in case if __ has dots inside (for instance field name is > _Project.Value_). And thus no range facets are returned in response. > Probably the same happens in case of field faceting. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (SOLR-13942) /api/cluster/zk/* to fetch raw ZK data
[ https://issues.apache.org/jira/browse/SOLR-13942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noble Paul updated SOLR-13942: -- Description: example download the {{state.json}} of {code} GET http://localhost:8983/api/cluster/zk/collections/gettingstarted/state.json {code} get a list of all children under {{/live_nodes}} {code} GET http://localhost:8983/api/cluster/zk/live_nodes {code} If the requested path is a node with children show the list of child nodes and their meta data was:If the requested path is a node with children show the list of child nodes and their meta data > /api/cluster/zk/* to fetch raw ZK data > -- > > Key: SOLR-13942 > URL: https://issues.apache.org/jira/browse/SOLR-13942 > Project: Solr > Issue Type: Bug >Reporter: Noble Paul >Assignee: Noble Paul >Priority: Major > > example > download the {{state.json}} of > {code} > GET http://localhost:8983/api/cluster/zk/collections/gettingstarted/state.json > {code} > get a list of all children under {{/live_nodes}} > {code} > GET http://localhost:8983/api/cluster/zk/live_nodes > {code} > If the requested path is a node with children show the list of child nodes > and their meta data -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
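The raw-ZK endpoints described above can be exercised from the command line; a minimal sketch, assuming a Solr node listening on localhost:8983 (the curl call is left commented out because it requires a live cluster):

```shell
# Base URL of an assumed local Solr node; adjust for your cluster.
SOLR_BASE="http://localhost:8983"

# ZK path to fetch -- here the state.json of the "gettingstarted" collection.
ZK_PATH="/collections/gettingstarted/state.json"

# The proposed API simply appends the ZK path to /api/cluster/zk.
URL="${SOLR_BASE}/api/cluster/zk${ZK_PATH}"
echo "$URL"

# Against a running cluster this would return the raw znode data:
# curl -s "$URL"
```

Requesting a path that is a parent node (e.g. `/api/cluster/zk/live_nodes`) would instead list the children and their metadata, per the description above.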
[jira] [Commented] (SOLR-14270) export command to have an option to write to a zip file
[ https://issues.apache.org/jira/browse/SOLR-14270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049883#comment-17049883 ] Noble Paul commented on SOLR-14270: --- Thanks [~ctargett] > export command to have an option to write to a zip file > --- > > Key: SOLR-14270 > URL: https://issues.apache.org/jira/browse/SOLR-14270 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Noble Paul >Assignee: Noble Paul >Priority: Major > Labels: cli > Fix For: 8.5 > > Time Spent: 50m > Remaining Estimate: 0h > > Plain json files are too big. Export to a compressed file > {{bin/solr export -url http://localhost:8983/solr/gettingstarted -out > gettingstarted.json.gz}} > This will write the data to a file called {{gettingstarted.json.gz}} in a zip > format -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
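A compressed export like the one above can be consumed with standard tools; a minimal sketch, assuming the output is plain gzip-compressed JSON (the file name and document contents here are made up for illustration, not produced by bin/solr):

```shell
# Simulate a compressed export: one JSON document, gzip-compressed.
printf '{"id":"doc1","title":"hello"}\n' > gettingstarted.json
gzip -f gettingstarted.json            # produces gettingstarted.json.gz

# Stream the export without decompressing to disk (gunzip -c is portable).
gunzip -c gettingstarted.json.gz
```

Note that the `.gz` suffix implies gzip (a stream format), not a zip archive, which matters when picking a tool to read it back.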
[jira] [Commented] (LUCENE-9150) Restore support for dynamic PlanetModel in Geo3D
[ https://issues.apache.org/jira/browse/LUCENE-9150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049866#comment-17049866 ] ASF subversion and git services commented on LUCENE-9150: - Commit ab6fb77c63df5a8d347c0d67335e8919df683f64 in lucene-solr's branch refs/heads/branch_8x from Nicholas Knize [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=ab6fb77 ] LUCENE-9150: Restore support for dynamic PlanetModel in spatial3d > Restore support for dynamic PlanetModel in Geo3D > > > Key: LUCENE-9150 > URL: https://issues.apache.org/jira/browse/LUCENE-9150 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Nick Knize >Priority: Major > Time Spent: 2h > Remaining Estimate: 0h > > LUCENE-7072 removed dynamic planet model support in Geo3D. This was logical > at the time (given the state of Lucene and spatial projections and coordinate > reference systems). Since then, however, there have been a lot of new > developments within the OGC community around [Coordinate Reference > Systems|https://docs.opengeospatial.org/as/18-005r4/18-005r4.html], [Dynamic > Coordinate Reference > Systems|http://docs.opengeospatial.org/DRAFTS/18-058.html], and [Updated ISO > Standards|https://www.iso.org/obp/ui/#iso:std:iso:19111:ed-3:v1:en]. > It would be useful for Geo3D (and eventually LatLon*) to support different > geographic datums to make lucene a viable option for indexing/searching in > different spatial reference systems (e.g., more accurately computing query > shape relations to BKD's internal nodes using datum consistent with the > spatial projection). This would also provide an alternative to other > limitations of the {{LatLon*/XY*}} implementation (e.g., pole/dateline > crossing, quantization of small polygons). > I'd like to propose keeping the current WGS84 static datum as the default for > Geo3D but adding back the constructors to accept custom planet models. > Perhaps this could be listed as an "expert" API feature? 
[jira] [Resolved] (SOLR-13502) Investigate using something other than ZooKeeper's "4 letter words" for the admin UI status
[ https://issues.apache.org/jira/browse/SOLR-13502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson resolved SOLR-13502. --- Resolution: Won't Fix I don't find anything that looks useful. Plus, this is very low-benefit, we already have a workable solution that nobody's complained about since 8.2, so I don't think this is worth the effort. If someone else wants to take it over, it can be reopened. > Investigate using something other than ZooKeeper's "4 letter words" for the > admin UI status > --- > > Key: SOLR-13502 > URL: https://issues.apache.org/jira/browse/SOLR-13502 > Project: Solr > Issue Type: Improvement >Reporter: Erick Erickson >Assignee: Erick Erickson >Priority: Major > > ZooKeeper 3.5.5 requires a whitelist of allowed "4 letter words". The only > place I see on a quick look at the Solr code where 4lws are used is in the > admin UI "ZK Status" link. > In order to use the admin UI "ZK Status" link, users will have to modify > their zoo.cfg file with > {code} > 4lw.commands.whitelist=mntr,conf,ruok > {code} > This JIRA is to see if there are alternatives to using 4lw for the admin UI. > This depends on SOLR-8346. If we find an alternative, we need to remove the > additions to the ref guide that mention changing zoo.cfg (just scan for 4lw > in all the .adoc files) and remove SolrZkServer.ZK_WHITELIST_PROPERTY and all > references to it (SolrZkServer and SolrTestCaseJ4). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-12325) introduce uniqueBlockQuery(parent:true) aggregation for JSON Facet
[ https://issues.apache.org/jira/browse/SOLR-12325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049810#comment-17049810 ] Lucene/Solr QA commented on SOLR-12325: --- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:green}+1{color} | {color:green} compile {color} | {color:green} 3m 3s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 59s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 1m 34s{color} | {color:red} solr_core generated 1 new + 99 unchanged - 0 fixed = 100 total (was 99) {color} | | {color:green}+1{color} | {color:green} Release audit (RAT) {color} | {color:green} 1m 44s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} Check forbidden APIs {color} | {color:green} 1m 34s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} Validate source patterns {color} | {color:green} 1m 34s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 76m 16s{color} | {color:green} core in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 45s{color} | {color:green} test-framework in the patch passed. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 85m 46s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | SOLR-12325 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12995363/SOLR-12325.patch | | Optional Tests | compile javac unit ratsources checkforbiddenapis validatesourcepatterns | | uname | Linux lucene2-us-west.apache.org 4.4.0-170-generic #199-Ubuntu SMP Thu Nov 14 01:45:04 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | ant | | Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-SOLR-Build/sourcedir/dev-tools/test-patch/lucene-solr-yetus-personality.sh | | git revision | master / e308e53 | | ant | version: Apache Ant(TM) version 1.9.6 compiled on July 20 2018 | | Default Java | LTS | | javac | https://builds.apache.org/job/PreCommit-SOLR-Build/693/artifact/out/diff-compile-javac-solr_core.txt | | Test Results | https://builds.apache.org/job/PreCommit-SOLR-Build/693/testReport/ | | modules | C: solr/core solr/test-framework U: solr | | Console output | https://builds.apache.org/job/PreCommit-SOLR-Build/693/console | | Powered by | Apache Yetus 0.7.0 http://yetus.apache.org | This message was automatically generated. > introduce uniqueBlockQuery(parent:true) aggregation for JSON Facet > -- > > Key: SOLR-12325 > URL: https://issues.apache.org/jira/browse/SOLR-12325 > Project: Solr > Issue Type: New Feature > Components: Facet Module >Reporter: Mikhail Khludnev >Assignee: Mikhail Khludnev >Priority: Major > Fix For: 8.5 > > Attachments: SOLR-12325.patch, SOLR-12325.patch, SOLR-12325.patch, > SOLR-12325.patch, SOLR-12325.patch, > SOLR-12325_Random_test_for_uniqueBlockQuery (1).patch > > Time Spent: 1.5h > Remaining Estimate: 0h > > It might be faster twin for {{uniqueBlock(\_root_)}}. Please utilise buildin > query parsing method, don't invent your own. 
[jira] [Resolved] (SOLR-14299) IndexFetcher doesnt' reset errorCount to 0 after the last packet is received
[ https://issues.apache.org/jira/browse/SOLR-14299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Drob resolved SOLR-14299. -- Fix Version/s: 8.5 Assignee: Mike Drob Resolution: Fixed Thanks for the patch [~praste], I added a line to CHANGES and committed this. > IndexFetcher doesnt' reset errorCount to 0 after the last packet is received > > > Key: SOLR-14299 > URL: https://issues.apache.org/jira/browse/SOLR-14299 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: replication (java) >Affects Versions: 7.7.1 >Reporter: Pushkar Raste >Assignee: Mike Drob >Priority: Minor > Fix For: 8.5 > > Time Spent: 0.5h > Remaining Estimate: 0h > > While fetching the files from master `IndexFetcher` retries 5 times before > giving up. It resets the errorCount after successfully receiving the packet > except for the last packet. Seems like an oversight. > > [https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/handler/IndexFetcher.java#L1739-L1742] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14299) IndexFetcher doesnt' reset errorCount to 0 after the last packet is received
[ https://issues.apache.org/jira/browse/SOLR-14299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049803#comment-17049803 ] ASF subversion and git services commented on SOLR-14299: Commit fa2ce13fde08a8cfbf4f18279a13e273039d6eb6 in lucene-solr's branch refs/heads/branch_8x from Pushkar Raste [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=fa2ce13 ] SOLR-14299 IndexFetcher doesn't reset count to 0 after the last packet is received > IndexFetcher doesnt' reset errorCount to 0 after the last packet is received > > > Key: SOLR-14299 > URL: https://issues.apache.org/jira/browse/SOLR-14299 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: replication (java) >Affects Versions: 7.7.1 >Reporter: Pushkar Raste >Priority: Minor > Time Spent: 0.5h > Remaining Estimate: 0h > > While fetching the files from master `IndexFetcher` retries 5 times before > giving up. It resets the errorCount after successfully receiving the packet > except for the last packet. Seems like an oversight. > > [https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/handler/IndexFetcher.java#L1739-L1742] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14299) IndexFetcher doesnt' reset errorCount to 0 after the last packet is received
[ https://issues.apache.org/jira/browse/SOLR-14299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049789#comment-17049789 ] ASF subversion and git services commented on SOLR-14299: Commit 17c576a36f7419166554c1cfd3438d063b751e2b in lucene-solr's branch refs/heads/master from Pushkar Raste [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=17c576a ] SOLR-14299 IndexFetcher doesn't reset count to 0 after the last packet is received > IndexFetcher doesnt' reset errorCount to 0 after the last packet is received > > > Key: SOLR-14299 > URL: https://issues.apache.org/jira/browse/SOLR-14299 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: replication (java) >Affects Versions: 7.7.1 >Reporter: Pushkar Raste >Priority: Minor > Time Spent: 0.5h > Remaining Estimate: 0h > > While fetching the files from master `IndexFetcher` retries 5 times before > giving up. It resets the errorCount after successfully receiving the packet > except for the last packet. Seems like an oversight. > > [https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/handler/IndexFetcher.java#L1739-L1742] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] madrob commented on a change in pull request #1297: SOLR-14253 Replace various sleep calls with ZK waits
madrob commented on a change in pull request #1297: SOLR-14253 Replace various sleep calls with ZK waits
URL: https://github.com/apache/lucene-solr/pull/1297#discussion_r386726709

File path: solr/core/src/java/org/apache/solr/cloud/ZkController.java

@@ -1684,58 +1685,37 @@ private void doGetShardIdAndNodeNameProcess(CoreDescriptor cd) { }

 private void waitForCoreNodeName(CoreDescriptor descriptor) {
-    int retryCount = 320;
-    log.debug("look for our core node name");
-    while (retryCount-- > 0) {
-      final DocCollection docCollection = zkStateReader.getClusterState()
-          .getCollectionOrNull(descriptor.getCloudDescriptor().getCollectionName());
-      if (docCollection != null && docCollection.getSlicesMap() != null) {
-        final Map slicesMap = docCollection.getSlicesMap();
-        for (Slice slice : slicesMap.values()) {
-          for (Replica replica : slice.getReplicas()) {
-            // TODO: for really large clusters, we could 'index' on this
-            String nodeName = replica.getStr(ZkStateReader.NODE_NAME_PROP);
-            String core = replica.getStr(ZkStateReader.CORE_NAME_PROP);
-            String msgNodeName = getNodeName();
-            String msgCore = descriptor.getName();
-            if (msgNodeName.equals(nodeName) && core.equals(msgCore)) {
-              descriptor.getCloudDescriptor().setCoreNodeName(replica.getName());
-              getCoreContainer().getCoresLocator().persist(getCoreContainer(), descriptor);
-              return;
-            }
-          }
-        }
-      }
-      try {
-        Thread.sleep(1000);
-      } catch (InterruptedException e) {
-        Thread.currentThread().interrupt();
-      }
-    }
+    log.debug("waitForCoreNodeName >>> look for our core node name");
+    try {
+      zkStateReader.waitForState(descriptor.getCollectionName(), 320, TimeUnit.SECONDS, c -> {
+        String name = ClusterStateMutator.getAssignedCoreNodeName(c, getNodeName(), descriptor.getName());
+        if (name == null) {
+          return false;
+        }
+        descriptor.getCloudDescriptor().setCoreNodeName(name);

Review comment: Dug into this deeper, and I believe that the latch in `waitForState` will guarantee data visibility.
[GitHub] [lucene-solr] madrob commented on a change in pull request #1297: SOLR-14253 Replace various sleep calls with ZK waits
madrob commented on a change in pull request #1297: SOLR-14253 Replace various sleep calls with ZK waits
URL: https://github.com/apache/lucene-solr/pull/1297#discussion_r386726758

File path: solr/core/src/java/org/apache/solr/cloud/ZkController.java

@@ -1684,58 +1685,37 @@ private void doGetShardIdAndNodeNameProcess(CoreDescriptor cd) { }

 private void waitForCoreNodeName(CoreDescriptor descriptor) {
-    int retryCount = 320;
-    log.debug("look for our core node name");
-    while (retryCount-- > 0) {
-      final DocCollection docCollection = zkStateReader.getClusterState()
-          .getCollectionOrNull(descriptor.getCloudDescriptor().getCollectionName());
-      if (docCollection != null && docCollection.getSlicesMap() != null) {
-        final Map slicesMap = docCollection.getSlicesMap();
-        for (Slice slice : slicesMap.values()) {
-          for (Replica replica : slice.getReplicas()) {
-            // TODO: for really large clusters, we could 'index' on this
-            String nodeName = replica.getStr(ZkStateReader.NODE_NAME_PROP);
-            String core = replica.getStr(ZkStateReader.CORE_NAME_PROP);
-            String msgNodeName = getNodeName();
-            String msgCore = descriptor.getName();
-            if (msgNodeName.equals(nodeName) && core.equals(msgCore)) {
-              descriptor.getCloudDescriptor().setCoreNodeName(replica.getName());
-              getCoreContainer().getCoresLocator().persist(getCoreContainer(), descriptor);
-              return;
-            }
-          }
-        }
-      }
-      try {
-        Thread.sleep(1000);
-      } catch (InterruptedException e) {
-        Thread.currentThread().interrupt();
-      }
-    }
+    log.debug("waitForCoreNodeName >>> look for our core node name");
+    try {
+      zkStateReader.waitForState(descriptor.getCollectionName(), 320, TimeUnit.SECONDS, c -> {
+        String name = ClusterStateMutator.getAssignedCoreNodeName(c, getNodeName(), descriptor.getName());
+        if (name == null) {
+          return false;
+        }
+        descriptor.getCloudDescriptor().setCoreNodeName(name);
+        return true;
+      });
+    } catch (TimeoutException | InterruptedException e) {
+      throw new SolrException(ErrorCode.SERVER_ERROR, "Timeout waiting for collection state", e);

Review comment: Will do!
[jira] [Resolved] (LUCENE-8962) Can we merge small segments during refresh, for faster searching?
[ https://issues.apache.org/jira/browse/LUCENE-8962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Sokolov resolved LUCENE-8962. - Fix Version/s: 8.5 Resolution: Fixed pushed to master, and also backported to branch8x > Can we merge small segments during refresh, for faster searching? > - > > Key: LUCENE-8962 > URL: https://issues.apache.org/jira/browse/LUCENE-8962 > Project: Lucene - Core > Issue Type: Improvement > Components: core/index >Reporter: Michael McCandless >Priority: Major > Fix For: 8.5 > > Attachments: LUCENE-8962_demo.png > > Time Spent: 6h > Remaining Estimate: 0h > > With near-real-time search we ask {{IndexWriter}} to write all in-memory > segments to disk and open an {{IndexReader}} to search them, and this is > typically a quick operation. > However, when you use many threads for concurrent indexing, {{IndexWriter}} > will accumulate write many small segments during {{refresh}} and this then > adds search-time cost as searching must visit all of these tiny segments. > The merge policy would normally quickly coalesce these small segments if > given a little time ... so, could we somehow improve {{IndexWriter'}}s > refresh to optionally kick off merge policy to merge segments below some > threshold before opening the near-real-time reader? It'd be a bit tricky > because while we are waiting for merges, indexing may continue, and new > segments may be flushed, but those new segments shouldn't be included in the > point-in-time segments returned by refresh ... > One could almost do this on top of Lucene today, with a custom merge policy, > and some hackity logic to have the merge policy target small segments just > written by refresh, but it's tricky to then open a near-real-time reader, > excluding newly flushed but including newly merged segments since the refresh > originally finished ... > I'm not yet sure how best to solve this, so I wanted to open an issue for > discussion! 
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
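The refresh-time coalescing idea in LUCENE-8962 can be illustrated with a small, Lucene-free sketch. All names here are hypothetical (this is not the Lucene API): segments are modeled only by their byte sizes, and segments below a threshold are grouped into a single merge candidate, mirroring what a refresh-time merge policy would select.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not the Lucene API): given the byte sizes of segments
 * just flushed by refresh, pick the ones below a threshold as one merge
 * candidate, as a refresh-time merge policy might.
 */
public class RefreshMergeSketch {

    /** Segments smaller than thresholdBytes are grouped into one merge. */
    static List<Long> selectSmallSegments(List<Long> segmentSizes, long thresholdBytes) {
        List<Long> merge = new ArrayList<>();
        for (long size : segmentSizes) {
            if (size < thresholdBytes) {
                merge.add(size);
            }
        }
        // Only worth merging if at least two small segments exist.
        return merge.size() >= 2 ? merge : List.of();
    }

    public static void main(String[] args) {
        // Sizes of segments produced by a refresh under concurrent indexing.
        List<Long> flushed = List.of(64_000L, 900L, 512L, 120_000L, 300L);
        System.out.println(selectSmallSegments(flushed, 1_024L)); // prints [900, 512, 300]
    }
}
```

The tricky part the issue describes is not the selection itself but doing it without letting segments flushed *after* the refresh started leak into the point-in-time view.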
[jira] [Commented] (LUCENE-8962) Can we merge small segments during refresh, for faster searching?
[ https://issues.apache.org/jira/browse/LUCENE-8962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049754#comment-17049754 ]

ASF subversion and git services commented on LUCENE-8962:
---------------------------------------------------------
Commit e308e538731f392eb81ba81cfb3ec5fc526fd383 in lucene-solr's branch refs/heads/master from Michael Sokolov
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=e308e53 ]

Add CHANGES entry for LUCENE-8962
[jira] [Commented] (LUCENE-8962) Can we merge small segments during refresh, for faster searching?
[ https://issues.apache.org/jira/browse/LUCENE-8962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049750#comment-17049750 ]

ASF subversion and git services commented on LUCENE-8962:
---------------------------------------------------------
Commit a1791e77143aa8087c0b5ee0e8eb57422e59a09a in lucene-solr's branch refs/heads/branch_8x from msfroh
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=a1791e7 ]

LUCENE-8962: Add ability to selectively merge on commit (#1155)

* LUCENE-8962: Add ability to selectively merge on commit

  This adds a new "findCommitMerges" method to MergePolicy, which can specify
  merges to be executed before the IndexWriter.prepareCommitInternal method
  returns. If we have many index writer threads, they will flush their DWPT
  buffers on commit, resulting in many small segments, which can be merged
  before the commit returns.

* Add missing Javadoc
* Fix incorrect comment
* Refactoring and fix intermittent test failure
  1. Made some changes to the callback to update toCommit, leveraging
     SegmentInfos.applyMergeChanges.
  2. I realized that we'll never end up with 0 registered merges, because we
     throw an exception if we fail to register a merge.
  3. Moved the IndexWriterEvents.beginMergeOnCommit notification to before we
     call MergeScheduler.merge, since we may not be merging on another thread.
  4. There was an intermittent test failure due to randomness in the time it
     takes for merges to complete. Before doing the final commit, we wait for
     pending merges to finish. We may still end up abandoning the final merge,
     but we can detect that and assert that either the merge was abandoned
     (and we have > 1 segment) or we did merge down to 1 segment.
* Fix typo
* Fix/improve comments based on PR feedback
* More comment improvements from PR feedback
* Rename method and add new MergeTrigger
  1. Renamed findCommitMerges -> findFullFlushMerges.
  2. Added MergeTrigger.COMMIT, passed to findFullFlushMerges and to
     MergeScheduler when merging on commit.
* Update renamed method name in strings and comments
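The flow this commit describes — each indexing thread flushes its buffer as a small segment at commit time, and the small segments are coalesced before the commit point is published — can be sketched without the Lucene API. Names and the docs-per-segment model below are hypothetical simplifications, not Lucene classes:

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of "merge on commit" (not the Lucene API): each
 * indexing thread flushes its buffer as one small segment, and before the
 * commit returns the small segments are coalesced into a single one.
 */
public class MergeOnCommitSketch {

    /** One flushed segment per indexing thread, sized by buffered doc count. */
    static List<Integer> flushAll(List<Integer> bufferedDocsPerThread) {
        return new ArrayList<>(bufferedDocsPerThread);
    }

    /** Coalesce segments under maxDocs into a single merged segment. */
    static List<Integer> mergeSmall(List<Integer> segments, int maxDocs) {
        List<Integer> result = new ArrayList<>();
        int merged = 0;
        for (int docs : segments) {
            if (docs < maxDocs) {
                merged += docs;   // absorbed into the commit-time merge
            } else {
                result.add(docs); // large segments are left alone
            }
        }
        if (merged > 0) {
            result.add(merged);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> flushed = flushAll(List.of(10, 3, 5000, 7));
        // Before the commit returns, the three tiny segments become one of 20 docs.
        System.out.println(mergeSmall(flushed, 100)); // prints [5000, 20]
    }
}
```

In real Lucene the decision of which flushed segments to combine is delegated to the merge policy via the renamed findFullFlushMerges hook, triggered with MergeTrigger.COMMIT, rather than hard-coded as here.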
[jira] [Commented] (LUCENE-8962) Can we merge small segments during refresh, for faster searching?
[ https://issues.apache.org/jira/browse/LUCENE-8962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049753#comment-17049753 ]

ASF subversion and git services commented on LUCENE-8962:
---------------------------------------------------------
Commit fdac6d866344611290c45c164112277581328bc9 in lucene-solr's branch refs/heads/branch_8x from Michael Sokolov
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=fdac6d8 ]

Add CHANGES entry for LUCENE-8962
[jira] [Commented] (LUCENE-8962) Can we merge small segments during refresh, for faster searching?
[ https://issues.apache.org/jira/browse/LUCENE-8962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049752#comment-17049752 ]

ASF subversion and git services commented on LUCENE-8962:
---------------------------------------------------------
Commit a5475de57fed6b339cd5565bd1bd2650f265a537 in lucene-solr's branch refs/heads/branch_8x from Michael Froh
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=a5475de ]

LUCENE-8962: Fix intermittent test failures

1. TestIndexWriterMergePolicy.testMergeOnCommit will fail if the last commit
   (the one that should trigger the full merge) doesn't have any pending
   changes (which could occur if the last indexing thread commits at the end).
   We can fix that by adding one more document before that commit.
2. The previous implementation was throwing IOException if the commit thread
   gets interrupted while waiting for merges to complete. This violates
   IndexWriter's documented behavior of throwing ThreadInterruptedException.
[jira] [Commented] (LUCENE-8962) Can we merge small segments during refresh, for faster searching?
[ https://issues.apache.org/jira/browse/LUCENE-8962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049748#comment-17049748 ]

ASF subversion and git services commented on LUCENE-8962:
---------------------------------------------------------
Commit f017ae465ec416b9cf5ac91f9aa12ff71abd7de0 in lucene-solr's branch refs/heads/master from msfroh
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=f017ae4 ]

LUCENE-8962: Fix intermittent test failures (#1307)

1. TestIndexWriterMergePolicy.testMergeOnCommit will fail if the last commit
   (the one that should trigger the full merge) doesn't have any pending
   changes (which could occur if the last indexing thread commits at the end).
   We can fix that by adding one more document before that commit.
2. The previous implementation was throwing IOException if the commit thread
   gets interrupted while waiting for merges to complete. This violates
   IndexWriter's documented behavior of throwing ThreadInterruptedException.
[GitHub] [lucene-solr] msokolov commented on issue #1307: LUCENE-8962: Fix intermittent test failures
msokolov commented on issue #1307: LUCENE-8962: Fix intermittent test failures
URL: https://github.com/apache/lucene-solr/pull/1307#issuecomment-593676692

OK, phew, after this change all the renewed testing effort failed to turn up any failures, so I'll push.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [lucene-solr] msokolov merged pull request #1307: LUCENE-8962: Fix intermittent test failures
msokolov merged pull request #1307: LUCENE-8962: Fix intermittent test failures
URL: https://github.com/apache/lucene-solr/pull/1307
[GitHub] [lucene-solr] dsmiley commented on issue #1303: LUCENE-9114: Improve ValueSourceScorer's Default Cost Implementation
dsmiley commented on issue #1303: LUCENE-9114: Improve ValueSourceScorer's Default Cost Implementation
URL: https://github.com/apache/lucene-solr/pull/1303#issuecomment-593675019

Ehh; never mind my ill-thought-out idea of a cost on the Map context. There are many ValueSource.getValues impls that would need to parse it, and there's also the concern that we wouldn't want it to propagate to sub-FunctionValues.

Alternative proposal: when FunctionRangeQuery calls functionValues.getRangeScorer, it gets back a ValueSourceScorer. We could just add a mutable cost on VSC that, if set, is returned by VSC, and if not, VSC delegates to the proposed `FV.cost`. While the mutability isn't pretty, it's also quite minor, and it saves FRQ from having to wrap the scorer only to specify a matchCost.
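The proposal above — a settable cost on the scorer that falls back to a default derived from the function values when unset — could look roughly like the following. This is a simplified sketch under assumed names; the stub classes only echo, and are not, the actual Lucene classes:

```java
import java.util.OptionalLong;

/**
 * Simplified sketch of the proposal (hypothetical stand-ins, not Lucene):
 * a scorer whose match cost can be set externally; when unset it falls
 * back to a default cost derived from the underlying function values.
 */
public class MutableCostScorerSketch {

    /** Stand-in for FunctionValues with the proposed cost() method. */
    interface FunctionValuesStub {
        long cost();
    }

    static class ValueSourceScorerStub {
        private final FunctionValuesStub values;
        private OptionalLong overrideCost = OptionalLong.empty();

        ValueSourceScorerStub(FunctionValuesStub values) {
            this.values = values;
        }

        /** A caller such as FunctionRangeQuery may set a more accurate cost. */
        void setMatchCost(long cost) {
            this.overrideCost = OptionalLong.of(cost);
        }

        /** Use the override if present, else delegate to the values' default. */
        long matchCost() {
            return overrideCost.orElseGet(values::cost);
        }
    }

    public static void main(String[] args) {
        ValueSourceScorerStub scorer = new ValueSourceScorerStub(() -> 100L);
        System.out.println(scorer.matchCost()); // default from the values: 100
        scorer.setMatchCost(42L);
        System.out.println(scorer.matchCost()); // override wins: 42
    }
}
```

The design choice being weighed in the comment is exactly this trade: a small piece of mutable state on the scorer versus wrapping the scorer solely to carry a matchCost.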
[GitHub] [lucene-solr] msokolov commented on issue #1307: LUCENE-8962: Fix intermittent test failures
msokolov commented on issue #1307: LUCENE-8962: Fix intermittent test failures
URL: https://github.com/apache/lucene-solr/pull/1307#issuecomment-593674134

I re-ran the whole test suite locally and will beast these: TestIndexWriter.testThreadInterruptDeadlock and TestIndexWriterMergePolicy.testMergeOnCommit, and verify failing seeds from Jenkins: 5760178D4A8250A6:73E60D8AAB67B286 (failed on master) and EF4E611015C3B5B0:CBC87B17F4265790.
[jira] [Commented] (LUCENE-9258) DocTermsIndexDocValues should not assume it's operating on a SortedDocValues field
[ https://issues.apache.org/jira/browse/LUCENE-9258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049738#comment-17049738 ]

Michele Palmia commented on LUCENE-9258:
----------------------------------------
I added a patch with the fix, together with a(n addition to a) test that fails with the current implementation. Any advice on improving the testing would be greatly appreciated (is it ok to test the Scorer independently? Should I mock the Weight?).

> DocTermsIndexDocValues should not assume it's operating on a SortedDocValues
> field
> ----------------------------------------------------------------------------
>
>                 Key: LUCENE-9258
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9258
>             Project: Lucene - Core
>          Issue Type: Bug
>    Affects Versions: 8.4
>            Reporter: Michele Palmia
>            Priority: Minor
>         Attachments: LUCENE-9258.patch
>
> When requesting a new _ValueSourceScorer_ (with _getRangeScorer_) from
> _DocTermsIndexDocValues_, the latter instantiates a new iterator on
> _SortedDocValues_ regardless of the fact that the underlying field can
> actually be of a different type (e.g. a _SortedSetDocValues_ processed
> through a _SortedSetSelector_).
[jira] [Updated] (LUCENE-9258) DocTermsIndexDocValues should not assume it's operating on a SortedDocValues field
[ https://issues.apache.org/jira/browse/LUCENE-9258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michele Palmia updated LUCENE-9258:
-----------------------------------
    Attachment: LUCENE-9258.patch
 Lucene Fields: New,Patch Available  (was: New)
 Review Patch?: Yes
[jira] [Created] (LUCENE-9258) DocTermsIndexDocValues should not assume it's operating on a SortedDocValues field
Michele Palmia created LUCENE-9258:
--------------------------------------
             Summary: DocTermsIndexDocValues should not assume it's operating on a SortedDocValues field
                 Key: LUCENE-9258
                 URL: https://issues.apache.org/jira/browse/LUCENE-9258
             Project: Lucene - Core
          Issue Type: Bug
    Affects Versions: 8.4
            Reporter: Michele Palmia

When requesting a new _ValueSourceScorer_ (with _getRangeScorer_) from _DocTermsIndexDocValues_, the latter instantiates a new iterator on _SortedDocValues_ regardless of the fact that the underlying field can actually be of a different type (e.g. a _SortedSetDocValues_ processed through a _SortedSetSelector_).
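The bug pattern described in LUCENE-9258 — code reaching for one concrete DocValues subtype when the field may be backed by another — can be illustrated generically. The types below are hypothetical stand-ins, not the Lucene classes:

```java
/**
 * Hypothetical stand-ins (not the Lucene classes) illustrating the bug:
 * the broken path casts to one concrete subtype, which fails when the
 * field is actually backed by another; the fix uses the shared interface.
 */
public class SubtypeAssumptionSketch {

    interface DocValuesStub { String lookup(int doc); }

    static class SortedStub implements DocValuesStub {
        public String lookup(int doc) { return "sorted:" + doc; }
    }

    static class SortedSetSelectedStub implements DocValuesStub {
        public String lookup(int doc) { return "sortedset:" + doc; }
    }

    /** Broken: assumes every field is a SortedStub, like the reported bug. */
    static String lookupBroken(DocValuesStub values, int doc) {
        return ((SortedStub) values).lookup(doc); // ClassCastException for other impls
    }

    /** Fixed: rely only on the shared interface. */
    static String lookupFixed(DocValuesStub values, int doc) {
        return values.lookup(doc);
    }

    public static void main(String[] args) {
        DocValuesStub values = new SortedSetSelectedStub();
        System.out.println(lookupFixed(values, 7)); // prints sortedset:7
        try {
            lookupBroken(values, 7);
        } catch (ClassCastException e) {
            System.out.println("broken path throws ClassCastException");
        }
    }
}
```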
[jira] [Commented] (SOLR-14241) Streaming Expression for deleting documents by IDs (from tuples)
[ https://issues.apache.org/jira/browse/SOLR-14241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049729#comment-17049729 ]

ASF subversion and git services commented on SOLR-14241:
--------------------------------------------------------
Commit f2a6ff1494c33d9e70e73864ca892958103e170a in lucene-solr's branch refs/heads/branch_8x from Cassandra Targett
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=f2a6ff1 ]

SOLR-14241: fix typos & incorrect example param

> Streaming Expression for deleting documents by IDs (from tuples)
> ----------------------------------------------------------------
>
>                 Key: SOLR-14241
>                 URL: https://issues.apache.org/jira/browse/SOLR-14241
>             Project: Solr
>          Issue Type: New Feature
>      Security Level: Public (Default Security Level. Issues are Public)
>          Components: streaming expressions
>            Reporter: Chris M. Hostetter
>            Assignee: Chris M. Hostetter
>            Priority: Major
>             Fix For: master (9.0), 8.5
>
>         Attachments: DELQ-adds-and-deletes.png, DELQ-only-adds.png,
> SOLR-14241.patch, STREAM-adds-and-deletes.png, STREAM-only-adds.png,
> microbenchmark_scripts.zip
>
> Streaming expressions currently support an {{update(...)}} decorator
> function for wrapping another stream and treating each Tuple from the inner
> stream as a document to be added to an index.
> I've implemented an analogous subclass of the {{UpdateStream}} called
> {{DeleteStream}} that uses the tuples from the inner stream to identify the
> uniqueKeys of documents that should be deleted.
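Since {{DeleteStream}} mirrors the {{update(...)}} decorator, usage would presumably pair a {{delete(...)}} wrapper with an inner stream producing the uniqueKeys. A hypothetical invocation (the collection, query, and field names are illustrative, and the exact decorator parameters should be checked against the 8.5 streaming-expression docs):

```
delete(myCollection,
       batchSize=500,
       search(myCollection,
              q="category:discontinued",
              fl="id",
              sort="id asc"))
```

Each tuple emitted by the inner search supplies the uniqueKey of a document to delete, just as each tuple supplies a full document in the update(...) case.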
[jira] [Commented] (SOLR-14270) export command to have an option to write to a zip file
[ https://issues.apache.org/jira/browse/SOLR-14270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049728#comment-17049728 ]

ASF subversion and git services commented on SOLR-14270:
--------------------------------------------------------
Commit 1f549dc4742e9adc5d8354c812e05f662e195309 in lucene-solr's branch refs/heads/branch_8x from Cassandra Targett
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=1f549dc ]

SOLR-14270: Move .gz example to CLI page; Remove bin/solr export from command-line-utilities.adoc

> export command to have an option to write to a zip file
> --------------------------------------------------------
>
>                 Key: SOLR-14270
>                 URL: https://issues.apache.org/jira/browse/SOLR-14270
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public (Default Security Level. Issues are Public)
>            Reporter: Noble Paul
>            Assignee: Noble Paul
>            Priority: Major
>              Labels: cli
>             Fix For: 8.5
>
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> Plain JSON files are too big. Export to a compressed file:
> {{bin/solr export -url http://localhost:8983/solr/gettingstarted -out
> gettingstarted.json.gz}}
> This will write the data to a file called {{gettingstarted.json.gz}} in
> gzip format.
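A {{.gz}} export can be consumed without unpacking to disk. The sketch below only simulates the file such an export would produce (the JSON line is illustrative, not real export output); the standard gzip tools then stream it back:

```shell
# Simulate a compressed export like gettingstarted.json.gz (contents illustrative)
printf '{"id":"doc1"}\n' > gettingstarted.json
gzip -f gettingstarted.json          # yields gettingstarted.json.gz
zcat gettingstarted.json.gz          # stream the documents without decompressing to disk
```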
[jira] [Commented] (SOLR-14241) Streaming Expression for deleting documents by IDs (from tuples)
[ https://issues.apache.org/jira/browse/SOLR-14241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049726#comment-17049726 ] ASF subversion and git services commented on SOLR-14241: Commit 422d994612280ab5c4e13ac34260ddf05e3f7ad5 in lucene-solr's branch refs/heads/master from Cassandra Targett [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=422d994 ] SOLR-14241: fix typos & incorrect example param > Streaming Expression for deleting documents by IDs (from tuples) > > > Key: SOLR-14241 > URL: https://issues.apache.org/jira/browse/SOLR-14241 > Project: Solr > Issue Type: New Feature > Security Level: Public(Default Security Level. Issues are Public) > Components: streaming expressions >Reporter: Chris M. Hostetter >Assignee: Chris M. Hostetter >Priority: Major > Fix For: master (9.0), 8.5 > > Attachments: DELQ-adds-and-deletes.png, DELQ-only-adds.png, > SOLR-14241.patch, STREAM-adds-and-deletes.png, STREAM-only-adds.png, > microbenchmark_scripts.zip > > > Streaming expressions currently supports an {{update(...)}} decorator > function for wrapping another stream and treating each Tuple from the inner > stream as a document to be added to an index. > I've implemented an analogous subclass of the {{UpdateStream}} called > {{DeleteStream}} that uses the tuples from the inner stream to identify the > uniqueKeys of documents that should be deleted. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14270) export command to have an option to write to a zip file
[ https://issues.apache.org/jira/browse/SOLR-14270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049725#comment-17049725 ] ASF subversion and git services commented on SOLR-14270: Commit 27523b5e40921499212bd0c5f7f56c35cdebe073 in lucene-solr's branch refs/heads/master from Cassandra Targett [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=27523b5 ] SOLR-14270: Move .gz example to CLI page; Remove bin/solr export from command-line-utilities.adoc > export command to have an option to write to a zip file > --- > > Key: SOLR-14270 > URL: https://issues.apache.org/jira/browse/SOLR-14270 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Noble Paul >Assignee: Noble Paul >Priority: Major > Labels: cli > Fix For: 8.5 > > Time Spent: 50m > Remaining Estimate: 0h > > Plain json files are too big. Export to a compressed file > {{bin/solr export -url http://localhost:8983/solr/gettingstarted -out > gettingstarted.json.gz}} > This will write the data to a file called {{gettingstarted.json.gz}} in a zip > format -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14270) export command to have an option to write to a zip file
[ https://issues.apache.org/jira/browse/SOLR-14270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049718#comment-17049718 ] Cassandra Targett commented on SOLR-14270: -- Just FYI, the docs here got a little messed up. Docs for the bin/solr export command were already added where they belonged (solr-control-script.adoc - about bin/solr) back in SOLR-13862, but this issue adds slightly different and less-detailed docs to the wrong place (command-line-utilities.adoc - about zkcli.sh), while not updating the original docs to include the ability to .zip the output. I'll fix it (by removing the wrong docs from the wrong place but copying the one new example that's needed in the right place), but just wanted to mention it. > export command to have an option to write to a zip file > --- > > Key: SOLR-14270 > URL: https://issues.apache.org/jira/browse/SOLR-14270 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Noble Paul >Assignee: Noble Paul >Priority: Major > Labels: cli > Fix For: 8.5 > > Time Spent: 50m > Remaining Estimate: 0h > > Plain json files are too big. Export to a compressed file > {{bin/solr export -url http://localhost:8983/solr/gettingstarted -out > gettingstarted.json.gz}} > This will write the data to a file called {{gettingstarted.json.gz}} in a zip > format -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
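The `-out gettingstarted.json.gz` output discussed in this thread appears to be gzip-compressed (a single `.gz` stream, despite the issue text saying "zip format"), so the mechanics can be sketched with nothing but `java.util.zip` from the JDK. The class and file names below are illustrative, not Solr code:

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;
import java.util.zip.*;

/** Illustrative sketch only: writes export output through a gzip stream,
 *  the way "bin/solr export ... -out gettingstarted.json.gz" produces a .gz file. */
public class GzipExportSketch {
  public static void writeGzip(Path path, String json) throws IOException {
    try (Writer w = new OutputStreamWriter(
        new GZIPOutputStream(Files.newOutputStream(path)), StandardCharsets.UTF_8)) {
      w.write(json);
    }
  }

  public static String readGzip(Path path) throws IOException {
    try (BufferedReader r = new BufferedReader(new InputStreamReader(
        new GZIPInputStream(Files.newInputStream(path)), StandardCharsets.UTF_8))) {
      StringBuilder sb = new StringBuilder();
      int c;
      while ((c = r.read()) != -1) sb.append((char) c);
      return sb.toString();
    }
  }

  public static void main(String[] args) throws IOException {
    Path p = Files.createTempFile("gettingstarted", ".json.gz");
    String docs = "{\"id\":\"1\"}\n{\"id\":\"2\"}\n";
    writeGzip(p, docs);
    if (!readGzip(p).equals(docs)) throw new AssertionError("gzip roundtrip failed");
    System.out.println("roundtrip ok, compressed bytes on disk: " + Files.size(p));
    Files.delete(p);
  }
}
```

For the repetitive JSON that an export produces, the gzip stream typically shrinks the file dramatically, which is the motivation stated in the issue.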
[jira] [Commented] (LUCENE-9150) Restore support for dynamic PlanetModel in Geo3D
[ https://issues.apache.org/jira/browse/LUCENE-9150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049716#comment-17049716 ] ASF subversion and git services commented on LUCENE-9150: - Commit a6e80d004d84213886a5ce52fd220d2e5112e43e in lucene-solr's branch refs/heads/master from Nicholas Knize [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=a6e80d0 ] LUCENE-9150: Restore support for dynamic PlanetModel in spatial3d > Restore support for dynamic PlanetModel in Geo3D > > > Key: LUCENE-9150 > URL: https://issues.apache.org/jira/browse/LUCENE-9150 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Nick Knize >Priority: Major > Time Spent: 1h 50m > Remaining Estimate: 0h > > LUCENE-7072 removed dynamic planet model support in Geo3D. This was logical > at the time (given the state of Lucene and spatial projections and coordinate > reference systems). Since then, however, there have been a lot of new > developments within the OGC community around [Coordinate Reference > Systems|https://docs.opengeospatial.org/as/18-005r4/18-005r4.html], [Dynamic > Coordinate Reference > Systems|http://docs.opengeospatial.org/DRAFTS/18-058.html], and [Updated ISO > Standards|https://www.iso.org/obp/ui/#iso:std:iso:19111:ed-3:v1:en]. > It would be useful for Geo3D (and eventually LatLon*) to support different > geographic datums to make lucene a viable option for indexing/searching in > different spatial reference systems (e.g., more accurately computing query > shape relations to BKD's internal nodes using datum consistent with the > spatial projection). This would also provide an alternative to other > limitations of the {{LatLon*/XY*}} implementation (e.g., pole/dateline > crossing, quantization of small polygons). > I'd like to propose keeping the current WGS84 static datum as the default for > Geo3D but adding back the constructors to accept custom planet models. > Perhaps this could be listed as an "expert" API feature? 
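For readers unfamiliar with what a non-WGS84 datum changes, the heart of a planet model is mapping geodetic latitude/longitude onto an ellipsoid surface. A minimal sketch using the standard geodetic-to-geocentric formulas (class name hypothetical, not the spatial3d API):

```java
/** Hypothetical sketch: geodetic (lat, lon) -> surface point on an arbitrary ellipsoid.
 *  A dynamic PlanetModel generalizes exactly this kind of computation beyond a
 *  hard-coded WGS84 datum. */
public class EllipsoidSketch {
  final double a;   // semi-major (equatorial) axis
  final double e2;  // first eccentricity squared, derived from flattening

  public EllipsoidSketch(double semiMajor, double flattening) {
    this.a = semiMajor;
    this.e2 = flattening * (2.0 - flattening);
  }

  /** Surface point (x, y, z) for geodetic latitude/longitude given in radians. */
  public double[] surfacePoint(double lat, double lon) {
    double sinLat = Math.sin(lat), cosLat = Math.cos(lat);
    double n = a / Math.sqrt(1.0 - e2 * sinLat * sinLat); // prime vertical radius
    return new double[] {
        n * cosLat * Math.cos(lon),
        n * cosLat * Math.sin(lon),
        n * (1.0 - e2) * sinLat
    };
  }

  public static void main(String[] args) {
    // WGS84 constants; a different datum just plugs in different numbers here
    EllipsoidSketch wgs84 = new EllipsoidSketch(6378137.0, 1.0 / 298.257223563);
    double[] pole = wgs84.surfacePoint(Math.PI / 2, 0.0);
    System.out.println("polar z = " + pole[2]); // close to the semi-minor axis
  }
}
```

Swapping the two constructor arguments is all it takes to model another datum, which is why restoring the constructors (with WGS84 as the default) is a small API change with real reach.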
[GitHub] [lucene-solr] asfgit merged pull request #1253: LUCENE-9150: Restore support for dynamic PlanetModel in spatial3d
asfgit merged pull request #1253: LUCENE-9150: Restore support for dynamic PlanetModel in spatial3d URL: https://github.com/apache/lucene-solr/pull/1253 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] dsmiley commented on issue #1303: LUCENE-9114: Improve ValueSourceScorer's Default Cost Implementation
dsmiley commented on issue #1303: LUCENE-9114: Improve ValueSourceScorer's Default Cost Implementation URL: https://github.com/apache/lucene-solr/pull/1303#issuecomment-593657932 Why a separate PR for my proposed test? Your proposal is better than the status quo but I think is rather lacking if that's it. If your proposal can also accommodate a query-time user supplied cost, especially by FunctionRangeQuery somehow, then I think we're then in good shape as it'll allow a user to set this on the fly. (BTW ignore the identical named class in Solr, which I plan on removing). Perhaps this cost could sneak in by putting the cost on the "context" Map supplied to ValueSource.getValues ? Yeah; that'd be cool :-) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
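The idea floated above, letting a query-time caller supply a cost through the "context" Map handed to `ValueSource.getValues`, can be sketched in isolation. Everything here is illustrative: the key name and default are assumptions, not Lucene API:

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of a query-time cost hint carried in the context Map,
 *  with a conservative default when no hint is present. The key name and the
 *  default value are invented for illustration; this is not Lucene code. */
public class CostHintSketch {
  static final String COST_KEY = "valueSourceCost"; // assumed key, not a real constant
  static final long DEFAULT_COST = 100L;

  public static long resolveCost(Map<String, Object> context) {
    Object hint = context.get(COST_KEY);
    return (hint instanceof Number) ? ((Number) hint).longValue() : DEFAULT_COST;
  }

  public static void main(String[] args) {
    Map<String, Object> context = new HashMap<>();
    System.out.println(resolveCost(context)); // no hint: falls back to the default
    context.put(COST_KEY, 5L);                // e.g. set by FunctionRangeQuery on the fly
    System.out.println(resolveCost(context));
  }
}
```

Because the context Map already flows from the query down to the values, this pattern would let a user override the cost per query without any new plumbing, which seems to be the appeal of the suggestion.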
[jira] [Commented] (LUCENE-8962) Can we merge small segments during refresh, for faster searching?
[ https://issues.apache.org/jira/browse/LUCENE-8962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049708#comment-17049708 ] Michael Sokolov commented on LUCENE-8962: - Thanks, [~msfroh] I'll take a look > Can we merge small segments during refresh, for faster searching? > - > > Key: LUCENE-8962 > URL: https://issues.apache.org/jira/browse/LUCENE-8962 > Project: Lucene - Core > Issue Type: Improvement > Components: core/index >Reporter: Michael McCandless >Priority: Major > Attachments: LUCENE-8962_demo.png > > Time Spent: 5.5h > Remaining Estimate: 0h > > With near-real-time search we ask {{IndexWriter}} to write all in-memory > segments to disk and open an {{IndexReader}} to search them, and this is > typically a quick operation. > However, when you use many threads for concurrent indexing, {{IndexWriter}} > will accumulate write many small segments during {{refresh}} and this then > adds search-time cost as searching must visit all of these tiny segments. > The merge policy would normally quickly coalesce these small segments if > given a little time ... so, could we somehow improve {{IndexWriter'}}s > refresh to optionally kick off merge policy to merge segments below some > threshold before opening the near-real-time reader? It'd be a bit tricky > because while we are waiting for merges, indexing may continue, and new > segments may be flushed, but those new segments shouldn't be included in the > point-in-time segments returned by refresh ... > One could almost do this on top of Lucene today, with a custom merge policy, > and some hackity logic to have the merge policy target small segments just > written by refresh, but it's tricky to then open a near-real-time reader, > excluding newly flushed but including newly merged segments since the refresh > originally finished ... > I'm not yet sure how best to solve this, so I wanted to open an issue for > discussion! 
[jira] [Commented] (LUCENE-8962) Can we merge small segments during refresh, for faster searching?
[ https://issues.apache.org/jira/browse/LUCENE-8962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049700#comment-17049700 ] Michael Froh commented on LUCENE-8962: -- Posted a PR with fixes for the above test failures: [https://github.com/apache/lucene-solr/pull/1307] > Can we merge small segments during refresh, for faster searching? > - > > Key: LUCENE-8962 > URL: https://issues.apache.org/jira/browse/LUCENE-8962 > Project: Lucene - Core > Issue Type: Improvement > Components: core/index >Reporter: Michael McCandless >Priority: Major > Attachments: LUCENE-8962_demo.png > > Time Spent: 5.5h > Remaining Estimate: 0h > > With near-real-time search we ask {{IndexWriter}} to write all in-memory > segments to disk and open an {{IndexReader}} to search them, and this is > typically a quick operation. > However, when you use many threads for concurrent indexing, {{IndexWriter}} > will accumulate write many small segments during {{refresh}} and this then > adds search-time cost as searching must visit all of these tiny segments. > The merge policy would normally quickly coalesce these small segments if > given a little time ... so, could we somehow improve {{IndexWriter'}}s > refresh to optionally kick off merge policy to merge segments below some > threshold before opening the near-real-time reader? It'd be a bit tricky > because while we are waiting for merges, indexing may continue, and new > segments may be flushed, but those new segments shouldn't be included in the > point-in-time segments returned by refresh ... > One could almost do this on top of Lucene today, with a custom merge policy, > and some hackity logic to have the merge policy target small segments just > written by refresh, but it's tricky to then open a near-real-time reader, > excluding newly flushed but including newly merged segments since the refresh > originally finished ... 
> I'm not yet sure how best to solve this, so I wanted to open an issue for > discussion!
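The "merge segments below some threshold" step that this issue keeps coming back to reduces to a simple selection over the newly flushed segments. A sketch of just that step, under the assumption that segments are identified by index and byte size (not the actual MergePolicy API):

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the selection a merge-on-refresh policy would need:
 *  from the segments flushed during refresh, pick only those under a size
 *  threshold to coalesce before the near-real-time reader opens. */
public class SmallSegmentSelector {
  /** Returns indices of segments whose byte size is below maxSmallSegmentBytes. */
  public static List<Integer> selectSmall(long[] segmentSizes, long maxSmallSegmentBytes) {
    List<Integer> picked = new ArrayList<>();
    for (int i = 0; i < segmentSizes.length; i++) {
      if (segmentSizes[i] < maxSmallSegmentBytes) picked.add(i);
    }
    return picked;
  }

  public static void main(String[] args) {
    // one big segment plus the tiny per-thread flush segments the issue describes
    long[] sizes = {512, 100_000_000L, 2048, 64};
    System.out.println(selectSmall(sizes, 1_000_000L));
  }
}
```

The hard part the thread identifies is not this selection but the bookkeeping around it: excluding segments flushed after the refresh started while still including the freshly merged ones in the point-in-time reader.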
[GitHub] [lucene-solr] msfroh opened a new pull request #1307: LUCENE-8962: Fix intermittent test failures
msfroh opened a new pull request #1307: LUCENE-8962: Fix intermittent test failures URL: https://github.com/apache/lucene-solr/pull/1307 1. TestIndexWriterMergePolicy.testMergeOnCommit will fail if the last commit (the one that should trigger the full merge) doesn't have any pending changes (which could occur if the last indexing thread commits at the end). We can fix that by adding one more document before that commit. 2. The previous implementation was throwing IOException if the commit thread gets interrupted while waiting for merges to complete. This violates IndexWriter's documented behavior of throwing ThreadInterruptedException. # Description This fixes intermittent test failures related to the previous commit on LUCENE-8962. # Solution There were two separate bugs in the previous commit: 1. TestIndexWriterMergePolicy.testMergeOnCommit could sometimes fail the last assertion, because the final commit in the test method triggered no merges. This could happen if multiple indexing threads committed after adding their last documents. To guarantee that the final commit in the test method triggers a merge, we can add one more document (so there is a change to commit). 2. TestIndexWriter. testThreadInterruptDeadlock verifies IndexWriter's documented behavior of throwing ThreadInterruptedException when interrupted. The previous commit for LUCENE-8962 violated this behavior. This commit fixes that. # Tests After applying these fixes, I have run both TestIndexWriter and TestIndexWriterMergePolicy multiple times with previously-failing seeds and random seeds, and have not seen the test failures occur again. # Checklist Please review the following and check all that apply: - [X] I have reviewed the guidelines for [How to Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms to the standards described there to the best of my ability. - [X] I have created a Jira issue and added the issue ID to my pull request title. 
- [X] I have given Solr maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended) - [X] I have developed this patch against the `master` branch. - [X] I have run `ant precommit` and the appropriate test suite. - [X] I have added tests for my changes. - [ ] I have added documentation for the [Ref Guide](https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide) (for Solr changes only). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] nknize commented on a change in pull request #1253: LUCENE-9150: Restore support for dynamic PlanetModel in spatial3d
nknize commented on a change in pull request #1253: LUCENE-9150: Restore support for dynamic PlanetModel in spatial3d URL: https://github.com/apache/lucene-solr/pull/1253#discussion_r386676232 ## File path: lucene/spatial3d/src/java/org/apache/lucene/spatial3d/geom/PlanetModel.java ## @@ -383,30 +509,233 @@ public GeoPoint surfacePointOnBearing(final GeoPoint from, final double dist, fi Δσ = B * sinσ * (cos2σM + B / 4.0 * (cosσ * (-1.0 + 2.0 * cos2σM * cos2σM) - B / 6.0 * cos2σM * (-3.0 + 4.0 * sinσ * sinσ) * (-3.0 + 4.0 * cos2σM * cos2σM))); σʹ = σ; - σ = dist / (c * inverseScale * A) + Δσ; + σ = dist / (zScaling * inverseScale * A) + Δσ; } while (Math.abs(σ - σʹ) >= Vector.MINIMUM_RESOLUTION && ++iterations < 100); double x = sinU1 * sinσ - cosU1 * cosσ * cosα1; -double φ2 = Math.atan2(sinU1 * cosσ + cosU1 * sinσ * cosα1, (1.0 - flattening) * Math.sqrt(sinα * sinα + x * x)); +double φ2 = Math.atan2(sinU1 * cosσ + cosU1 * sinσ * cosα1, (1.0 - scaledFlattening) * Math.sqrt(sinα * sinα + x * x)); double λ = Math.atan2(sinσ * sinα1, cosU1 * cosσ - sinU1 * sinσ * cosα1); -double C = flattening / 16.0 * cosSqα * (4.0 + flattening * (4.0 - 3.0 * cosSqα)); -double L = λ - (1.0 - C) * flattening * sinα * +double C = scaledFlattening / 16.0 * cosSqα * (4.0 + scaledFlattening * (4.0 - 3.0 * cosSqα)); +double L = λ - (1.0 - C) * scaledFlattening * sinα * (σ + C * sinσ * (cos2σM + C * cosσ * (-1.0 + 2.0 * cos2σM * cos2σM))); double λ2 = (lon + L + 3.0 * Math.PI) % (2.0 * Math.PI) - Math.PI; // normalise to -180..+180 return new GeoPoint(this, φ2, λ2); } + /** Utility class for encoding / decoding from lat/lon (decimal degrees) into sortable doc value numerics (integers) */ + public static class DocValueEncoder { +private final PlanetModel planetModel; + +// These are the multiplicative constants we need to use to arrive at values that fit in 21 bits. 
+// The formula we use to go from double to encoded value is: Math.floor((value - minimum) * factor + 0.5) +// If we plug in maximum for value, we should get 0x1FFFFF. +// So, 0x1FFFFF = Math.floor((maximum - minimum) * factor + 0.5) +// We factor out the 0.5 and Math.floor by stating instead: +// 0x1FFFFF = (maximum - minimum) * factor +// So, factor = 0x1FFFFF / (maximum - minimum) + +private final static double inverseMaximumValue = 1.0 / (double)(0x1FFFFF); + +private final double inverseXFactor; +private final double inverseYFactor; +private final double inverseZFactor; + +private final double xFactor; +private final double yFactor; +private final double zFactor; + +// Fudge factor for step adjustments. This is here solely to handle inaccuracies in bounding boxes +// that occur because of quantization. For unknown reasons, the fudge factor needs to be +// 10.0 rather than 1.0. See LUCENE-7430. + +private final static double STEP_FUDGE = 10.0; + +// These values are the delta between a value and the next value in each specific dimension + +private final double xStep; +private final double yStep; +private final double zStep; + +/** construct an encoder/decoder instance from the provided PlanetModel definition */ +public DocValueEncoder(final PlanetModel planetModel) { Review comment: :+1: good call! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
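The factor arithmetic in the diff comments above (fit a double range into 21-bit integers via `factor = maxEncoded / (maximum - minimum)` and `encoded = floor((value - minimum) * factor + 0.5)`) can be checked in isolation. A self-contained sketch, with illustrative constants rather than the real Geo3D planet-model bounds:

```java
/** Sketch of the DocValueEncoder factor arithmetic quoted in the review above.
 *  Constants and the class itself are illustrative, not the spatial3d code. */
public class DocValueFactorSketch {
  final double min, factor, step;

  public DocValueFactorSketch(double min, double max, long maxEncoded) {
    this.min = min;
    this.factor = maxEncoded / (max - min); // plugging max into encode() yields maxEncoded
    this.step = 1.0 / factor;               // gap between adjacent decodable values
  }

  public long encode(double value) {
    return (long) Math.floor((value - min) * factor + 0.5);
  }

  public double decode(long encoded) {
    return encoded * step + min;
  }

  public static void main(String[] args) {
    // 21-bit range as in the diff: the maximum encoded value is 0x1FFFFF
    DocValueFactorSketch enc = new DocValueFactorSketch(-1.0, 1.0, 0x1FFFFF);
    long e = enc.encode(0.5);
    // rounding-to-nearest keeps the round-trip error within half a step
    System.out.println(Math.abs(enc.decode(e) - 0.5) <= enc.step); // prints true
  }
}
```

This also shows why the step values matter: decoded coordinates land on a grid of spacing `1 / factor`, and it is those quantized grid points that the bounding-box fudge factor in the diff has to absorb.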
[GitHub] [lucene-solr] nknize commented on a change in pull request #1290: LUCENE-9251: Filter equal edges with different value on isEdgeFromPolygon
nknize commented on a change in pull request #1290: LUCENE-9251: Filter equal edges with different value on isEdgeFromPolygon URL: https://github.com/apache/lucene-solr/pull/1290#discussion_r386674661 ## File path: lucene/core/src/test/org/apache/lucene/geo/TestTessellator.java ## @@ -561,6 +561,18 @@ public void testComplexPolygon39() throws Exception { checkPolygon(wkt); } + @Nightly + public void testComplexPolygon40() throws Exception { +String wkt = GeoTestUtil.readShape("lucene-9251.wkt.gz"); Review comment: nice! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] nknize commented on a change in pull request #1258: LUCENE-9225: Rectangle should extend LatLonGeometry
nknize commented on a change in pull request #1258: LUCENE-9225: Rectangle should extend LatLonGeometry URL: https://github.com/apache/lucene-solr/pull/1258#discussion_r380381973 ## File path: lucene/core/src/java/org/apache/lucene/geo/Rectangle2D.java ## @@ -415,16 +217,63 @@ public boolean equals(Object o) { return minX == that.minX && maxX == that.maxX && minY == that.minY && -maxY == that.maxY && -Arrays.equals(bbox, that.bbox) && -Arrays.equals(west, that.west); +maxY == that.maxY; } @Override public int hashCode() { int result = Objects.hash(minX, maxX, minY, maxY); -result = 31 * result + Arrays.hashCode(bbox); -result = 31 * result + Arrays.hashCode(west); return result; } -} + + @Override + public String toString() { +final StringBuilder sb = new StringBuilder(); +sb.append("XYRectangle(x="); +sb.append(minX); +sb.append(" TO "); +sb.append(maxX); +sb.append(" y="); +sb.append(minY); +sb.append(" TO "); +sb.append(maxY); +sb.append(")"); +return sb.toString(); + } + + /** create a component2D from the provided XY rectangle */ + static Component2D create(XYRectangle rectangle) { +return new Rectangle2D(rectangle.minX, rectangle.maxX, rectangle.minY, rectangle.maxY); + } + + private static double MIN_LON_INCL_QUANTIZE = decodeLongitude(encodeLongitude(MIN_LON_INCL)); Review comment: ```suggestion private static double MIN_LON_INCL_QUANTIZE = decodeLongitude(GeoEncodingUtils.MIN_LON_ENCODED); ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] nknize commented on a change in pull request #1258: LUCENE-9225: Rectangle should extend LatLonGeometry
nknize commented on a change in pull request #1258: LUCENE-9225: Rectangle should extend LatLonGeometry URL: https://github.com/apache/lucene-solr/pull/1258#discussion_r380385091 ## File path: lucene/core/src/java/org/apache/lucene/document/LatLonShapeBoundingBoxQuery.java ## @@ -108,4 +115,385 @@ public String toString(String field) { sb.append(rectangle.toString()); return sb.toString(); } + + /** Holds spatial logic for a bounding box that works in the encoded space */ + private static class EncodedRectangle { Review comment: Is this class needed because `Rectangle2D` is package private? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] nknize commented on a change in pull request #1258: LUCENE-9225: Rectangle should extend LatLonGeometry
nknize commented on a change in pull request #1258: LUCENE-9225: Rectangle should extend LatLonGeometry URL: https://github.com/apache/lucene-solr/pull/1258#discussion_r380382109 ## File path: lucene/core/src/java/org/apache/lucene/geo/Rectangle2D.java ## @@ -415,16 +217,63 @@ public boolean equals(Object o) { return minX == that.minX && maxX == that.maxX && minY == that.minY && -maxY == that.maxY && -Arrays.equals(bbox, that.bbox) && -Arrays.equals(west, that.west); +maxY == that.maxY; } @Override public int hashCode() { int result = Objects.hash(minX, maxX, minY, maxY); -result = 31 * result + Arrays.hashCode(bbox); -result = 31 * result + Arrays.hashCode(west); return result; } -} + + @Override + public String toString() { +final StringBuilder sb = new StringBuilder(); +sb.append("XYRectangle(x="); +sb.append(minX); +sb.append(" TO "); +sb.append(maxX); +sb.append(" y="); +sb.append(minY); +sb.append(" TO "); +sb.append(maxY); +sb.append(")"); +return sb.toString(); + } + + /** create a component2D from the provided XY rectangle */ + static Component2D create(XYRectangle rectangle) { +return new Rectangle2D(rectangle.minX, rectangle.maxX, rectangle.minY, rectangle.maxY); + } + + private static double MIN_LON_INCL_QUANTIZE = decodeLongitude(encodeLongitude(MIN_LON_INCL)); + private static double MAX_LON_INCL_QUANTIZE = decodeLongitude(encodeLongitude(MAX_LON_INCL)); Review comment: ```suggestion private static double MAX_LON_INCL_QUANTIZE = decodeLongitude(GeoEncodingUtils.MAX_LON_ENCODED); ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
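The `decodeLongitude(encodeLongitude(...))` round-trips in this diff exist because longitudes are snapped to a 32-bit integer grid, so even constants like MIN/MAX_LON_INCL move slightly once encoded. A hedged sketch of the round-trip (the scale mirrors the idea of a 2^32-step grid over 360 degrees; it is not claimed to be Lucene's exact GeoEncodingUtils code):

```java
/** Illustrative quantization sketch: longitude snapped to a 2^32-step grid.
 *  Not the real GeoEncodingUtils implementation, just the same shape of idea. */
public class LonQuantizeSketch {
  static final double SCALE = (1L << 32) / 360.0;

  public static int encodeLongitude(double lon) {
    return (int) Math.floor(lon * SCALE);
  }

  public static double decodeLongitude(int encoded) {
    return encoded / SCALE;
  }

  public static void main(String[] args) {
    double lon = 12.3456789;
    double quantized = decodeLongitude(encodeLongitude(lon));
    // grid spacing is 360 / 2^32 ~ 8.4e-8 degrees, so the snap error stays below that
    System.out.println(Math.abs(quantized - lon) < 360.0 / (1L << 32) + 1e-12);
  }
}
```

Precomputing the quantized bounds once, as the `*_QUANTIZE` constants in the diff do, keeps every later comparison consistent with what was actually stored in the index.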
[jira] [Comment Edited] (LUCENE-8674) UnsupportedOperationException due to call to o.a.l.q.f.FunctionValues.floatVal
[ https://issues.apache.org/jira/browse/LUCENE-8674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049681#comment-17049681 ] Michele Palmia edited comment on LUCENE-8674 at 3/2/20 9:49 PM: This is due to a _VectorValueSource_ being fed to a _FunctionRangeQuery_, that is therefore trying to use its _floatVal_. By default, requesting the _floatVal(int doc)_ of a _VectorValueSource_ throws an _UnsupportedOperationException_, since no algorithm for merging the (possibly multiple) values is implemented. For reference, the query Solr tries to do is the following, {code:java} new ConstantScoreQuery( new FunctionRangeQuery( new VectorValueSource( new BytesRefFieldSource("any_field"), new SortedSetFieldSource("another_field") ), 0, 100, true, true)); {code} that always throws an exception if there are documents in the index. >From the way it's implemented (with the _UnsupportedOperationException_) it >doesn't look like this kind of inconsistencies are meant to be fixed in >Lucene. But not sure about that. Any suggestions are appreciated! was (Author: micpalmia): This is due to a _VectorValueSource_ being fed to a _FunctionRangeQuery_, that is therefore trying to use its _floatVal_. By default, requesting the _floatVal(int doc)_ of a _VectorValueSource_ throws an _UnsupportedOperationException_, since no algorithm for merging the (possibly multiple) values is implemented. For reference, the query Solr tries to do is the following, {code:java} final ConstantScoreQuery query = new ConstantScoreQuery( new FunctionRangeQuery( new VectorValueSource( new BytesRefFieldSource("any_field"), new SortedSetFieldSource("another_field") ), 0, 100, true, true)); {code} that always throws an exception if there are documents in the index. >From the way it's implemented (with the _UnsupportedOperationException_) it >doesn't look like this kind of inconsistencies are meant to be fixed in >Lucene. But not sure about that. 
Any suggestions are appreciated! > UnsupportedOperationException due to call to o.a.l.q.f.FunctionValues.floatVal > -- > > Key: LUCENE-8674 > URL: https://issues.apache.org/jira/browse/LUCENE-8674 > Project: Lucene - Core > Issue Type: Bug > Components: core/query/scoring >Affects Versions: master (9.0) > Environment: h1. Steps to reproduce > * Use a Linux machine. > * Build commit {{ea2c8ba}} of Solr as described in the section below. > * Build the films collection as described below. > * Start the server using the command {{./bin/solr start -f -p 8983 -s > /tmp/home}} > * Request the URL given in the bug description. > h1. Compiling the server > {noformat} > git clone https://github.com/apache/lucene-solr > cd lucene-solr > git checkout ea2c8ba > ant compile > cd solr > ant server > {noformat} > h1. Building the collection and reproducing the bug > We followed [Exercise > 2|http://lucene.apache.org/solr/guide/7_5/solr-tutorial.html#exercise-2] from > the [Solr > Tutorial|http://lucene.apache.org/solr/guide/7_5/solr-tutorial.html]. > {noformat} > mkdir -p /tmp/home > echo '' > > /tmp/home/solr.xml > {noformat} > In one terminal start a Solr instance in foreground: > {noformat} > ./bin/solr start -f -p 8983 -s /tmp/home > {noformat} > In another terminal, create a collection of movies, with no shards and no > replication, and initialize it: > {noformat} > bin/solr create -c films > curl -X POST -H 'Content-type:application/json' --data-binary '{"add-field": > {"name":"name", "type":"text_general", "multiValued":false, "stored":true}}' > http://localhost:8983/solr/films/schema > curl -X POST -H 'Content-type:application/json' --data-binary > '{"add-copy-field" : {"source":"*","dest":"_text_"}}' > http://localhost:8983/solr/films/schema > ./bin/post -c films example/films/films.json > curl -v “URL_BUG” > {noformat} > Please check the issue description below to find the “URL_BUG” that will > allow you to reproduce the issue reported. 
>Reporter: Johannes Kloos >Priority: Minor > Labels: diffblue, newdev > > Requesting the following URL causes Solr to return an HTTP 500 error response: > {noformat} > http://localhost:8983/solr/films/select?fq={!frange%20l=10%20u=100}or_version_s,directed_by > {noformat} > The error response seems to be caused by the following uncaught exception: > {noformat} > java.lang.UnsupportedOperationException > at > org.apache.lucene.queries.function.FunctionValues.floatVal(FunctionValues.java:47) > at > org.apache.lucene.queries.function.FunctionValues$3.match
[jira] [Comment Edited] (LUCENE-8674) UnsupportedOperationException due to call to o.a.l.q.f.FunctionValues.floatVal
[ https://issues.apache.org/jira/browse/LUCENE-8674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049681#comment-17049681 ] Michele Palmia edited comment on LUCENE-8674 at 3/2/20 9:47 PM: This is due to a _VectorValueSource_ being fed to a _FunctionRangeQuery_, that is therefore trying to use its _floatVal_. By default, requesting the _floatVal(int doc)_ of a _VectorValueSource_ throws an _UnsupportedOperationException_, since no algorithm for merging the (possibly multiple) values is implemented. For reference, the query Solr tries to do is the following, {code:java} final ConstantScoreQuery query = new ConstantScoreQuery( new FunctionRangeQuery( new VectorValueSource( new BytesRefFieldSource("any_field"), new SortedSetFieldSource("another_field") ), 0, 100, true, true)); {code} that always throws an exception if there are documents in the index. >From the way it's implemented (with the _UnsupportedOperationException_) it >doesn't look like this kind of inconsistencies are meant to be fixed in >Lucene. But not sure about that. Any suggestions are appreciated! was (Author: micpalmia): This is due to a `VectorValueSource` being fed to a `FunctionRangeQuery`, that is therefore trying to use its `floatVal`. By default, requesting the `floatVal(int doc)` of a `VectorValueSource` throws an `UnsupportedOperationException`, since no algorithm for merging the (possibly multiple) values is implemented. For reference, the query Solr tries to do is the following, ```java final ConstantScoreQuery query = new ConstantScoreQuery( new FunctionRangeQuery( new VectorValueSource( new BytesRefFieldSource("any_field"), new SortedSetFieldSource("another_field") ), 0, 100, true, true)); ``` that always throws an exception if there are documents in the index. >From the way it's implemented (with the `UnsupportedOperationException`) it >doesn't look like this kind of inconsistencies are meant to be fixed in >Lucene. But not sure about that. 
Any suggestions are appreciated! > UnsupportedOperationException due to call to o.a.l.q.f.FunctionValues.floatVal > -- > > Key: LUCENE-8674 > URL: https://issues.apache.org/jira/browse/LUCENE-8674 > Project: Lucene - Core > Issue Type: Bug > Components: core/query/scoring >Affects Versions: master (9.0) > Environment: h1. Steps to reproduce > * Use a Linux machine. > * Build commit {{ea2c8ba}} of Solr as described in the section below. > * Build the films collection as described below. > * Start the server using the command {{./bin/solr start -f -p 8983 -s > /tmp/home}} > * Request the URL given in the bug description. > h1. Compiling the server > {noformat} > git clone https://github.com/apache/lucene-solr > cd lucene-solr > git checkout ea2c8ba > ant compile > cd solr > ant server > {noformat} > h1. Building the collection and reproducing the bug > We followed [Exercise > 2|http://lucene.apache.org/solr/guide/7_5/solr-tutorial.html#exercise-2] from > the [Solr > Tutorial|http://lucene.apache.org/solr/guide/7_5/solr-tutorial.html]. > {noformat} > mkdir -p /tmp/home > echo '' > > /tmp/home/solr.xml > {noformat} > In one terminal start a Solr instance in foreground: > {noformat} > ./bin/solr start -f -p 8983 -s /tmp/home > {noformat} > In another terminal, create a collection of movies, with no shards and no > replication, and initialize it: > {noformat} > bin/solr create -c films > curl -X POST -H 'Content-type:application/json' --data-binary '{"add-field": > {"name":"name", "type":"text_general", "multiValued":false, "stored":true}}' > http://localhost:8983/solr/films/schema > curl -X POST -H 'Content-type:application/json' --data-binary > '{"add-copy-field" : {"source":"*","dest":"_text_"}}' > http://localhost:8983/solr/films/schema > ./bin/post -c films example/films/films.json > curl -v “URL_BUG” > {noformat} > Please check the issue description below to find the “URL_BUG” that will > allow you to reproduce the issue reported. 
>Reporter: Johannes Kloos >Priority: Minor > Labels: diffblue, newdev > > Requesting the following URL causes Solr to return an HTTP 500 error response: > {noformat} > http://localhost:8983/solr/films/select?fq={!frange%20l=10%20u=100}or_version_s,directed_by > {noformat} > The error response seems to be caused by the following uncaught exception: > {noformat} > java.lang.UnsupportedOperationException > at > org.ap
[jira] [Commented] (LUCENE-8674) UnsupportedOperationException due to call to o.a.l.q.f.FunctionValues.floatVal
[ https://issues.apache.org/jira/browse/LUCENE-8674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049681#comment-17049681 ] Michele Palmia commented on LUCENE-8674:
This is due to a `VectorValueSource` being fed to a `FunctionRangeQuery`, that is therefore trying to use its `floatVal`. By default, requesting the `floatVal(int doc)` of a `VectorValueSource` throws an `UnsupportedOperationException`, since no algorithm for merging the (possibly multiple) values is implemented. For reference, the query Solr tries to do is the following,
```java
final ConstantScoreQuery query = new ConstantScoreQuery(
    new FunctionRangeQuery(
        new VectorValueSource(
            new BytesRefFieldSource("any_field"),
            new SortedSetFieldSource("another_field")),
        0, 100, true, true));
```
that always throws an exception if there are documents in the index. From the way it's implemented (with the `UnsupportedOperationException`) it doesn't look like this kind of inconsistencies are meant to be fixed in Lucene. But not sure about that. Any suggestions are appreciated!

> UnsupportedOperationException due to call to o.a.l.q.f.FunctionValues.floatVal
> --
>
> Key: LUCENE-8674
> URL: https://issues.apache.org/jira/browse/LUCENE-8674
> Project: Lucene - Core
> Issue Type: Bug
> Components: core/query/scoring
> Affects Versions: master (9.0)
> Environment: h1. Steps to reproduce
> * Use a Linux machine.
> * Build commit {{ea2c8ba}} of Solr as described in the section below.
> * Build the films collection as described below.
> * Start the server using the command {{./bin/solr start -f -p 8983 -s /tmp/home}}
> * Request the URL given in the bug description.
> h1. Compiling the server
> {noformat}
> git clone https://github.com/apache/lucene-solr
> cd lucene-solr
> git checkout ea2c8ba
> ant compile
> cd solr
> ant server
> {noformat}
> h1. 
Building the collection and reproducing the bug > We followed [Exercise > 2|http://lucene.apache.org/solr/guide/7_5/solr-tutorial.html#exercise-2] from > the [Solr > Tutorial|http://lucene.apache.org/solr/guide/7_5/solr-tutorial.html]. > {noformat} > mkdir -p /tmp/home > echo '' > > /tmp/home/solr.xml > {noformat} > In one terminal start a Solr instance in foreground: > {noformat} > ./bin/solr start -f -p 8983 -s /tmp/home > {noformat} > In another terminal, create a collection of movies, with no shards and no > replication, and initialize it: > {noformat} > bin/solr create -c films > curl -X POST -H 'Content-type:application/json' --data-binary '{"add-field": > {"name":"name", "type":"text_general", "multiValued":false, "stored":true}}' > http://localhost:8983/solr/films/schema > curl -X POST -H 'Content-type:application/json' --data-binary > '{"add-copy-field" : {"source":"*","dest":"_text_"}}' > http://localhost:8983/solr/films/schema > ./bin/post -c films example/films/films.json > curl -v “URL_BUG” > {noformat} > Please check the issue description below to find the “URL_BUG” that will > allow you to reproduce the issue reported. 
>Reporter: Johannes Kloos >Priority: Minor > Labels: diffblue, newdev > > Requesting the following URL causes Solr to return an HTTP 500 error response: > {noformat} > http://localhost:8983/solr/films/select?fq={!frange%20l=10%20u=100}or_version_s,directed_by > {noformat} > The error response seems to be caused by the following uncaught exception: > {noformat} > java.lang.UnsupportedOperationException > at > org.apache.lucene.queries.function.FunctionValues.floatVal(FunctionValues.java:47) > at > org.apache.lucene.queries.function.FunctionValues$3.matches(FunctionValues.java:188) > at > org.apache.lucene.queries.function.ValueSourceScorer$1.matches(ValueSourceScorer.java:53) > at > org.apache.lucene.search.TwoPhaseIterator$TwoPhaseIteratorAsDocIdSetIterator.doNext(TwoPhaseIterator.java:89) > at > org.apache.lucene.search.TwoPhaseIterator$TwoPhaseIteratorAsDocIdSetIterator.nextDoc(TwoPhaseIterator.java:77) > at org.apache.lucene.search.Weight$DefaultBulkScorer.scoreAll(Weight.java:261) > at org.apache.lucene.search.Weight$DefaultBulkScorer.score(Weight.java:214) > at org.apache.lucene.search.BulkScorer.score(BulkScorer.java:39) > at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:652) > at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:443) > at org.apache.solr.search.DocSetUtil.createDocSetGeneric(DocSetUtil.java:151) > at org.apache.solr.search.DocSetUtil.createDocSet(DocSetUtil.java:140) > at > org.apache.solr.search.SolrIndexSearcher.getDocSetNC(SolrIndexSearcher.java:1177) > at > org
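The failure mode described in the comment above can be reduced to a small self-contained sketch. These are simplified stand-ins, not the real Lucene classes: a multi-source wrapper has no defined rule for collapsing several per-document values into one float, so its `floatVal` deliberately throws `UnsupportedOperationException`, and any range filter built on top of it fails on the first document it checks.

```java
import java.util.List;
import java.util.function.IntToDoubleFunction;

// Simplified stand-ins for FunctionValues / VectorValueSource, illustrating
// why a FunctionRangeQuery over a multi-valued source must fail: there is
// no defined rule for merging multiple per-doc values into a single float.
public class VectorValueSketch {

    interface Values {
        // Mirrors the FunctionValues.floatVal(int doc) default: unsupported.
        default float floatVal(int doc) {
            throw new UnsupportedOperationException(
                "no merge rule for multiple per-doc values");
        }
    }

    // A "vector" of several per-document sources, like VectorValueSource:
    // it inherits the throwing default instead of choosing a merge rule.
    static class VectorValues implements Values {
        final List<IntToDoubleFunction> sources;
        VectorValues(List<IntToDoubleFunction> sources) {
            this.sources = sources;
        }
        // floatVal intentionally NOT overridden.
    }

    // What a range filter does per matching document: read one float value,
    // compare it against the bounds.
    static boolean inRange(Values v, int doc, float lo, float hi) {
        float f = v.floatVal(doc);   // throws for VectorValues
        return lo <= f && f <= hi;
    }

    public static void main(String[] args) {
        Values vec = new VectorValues(
            List.<IntToDoubleFunction>of(d -> d, d -> d * 2.0));
        try {
            inRange(vec, 0, 0f, 100f);
            System.out.println("matched");
        } catch (UnsupportedOperationException e) {
            System.out.println("UnsupportedOperationException: " + e.getMessage());
        }
    }
}
```

Any fix would have to either define a merge rule (first value, min, max, ...) for multi-valued sources or reject them up front with a clearer error, which is essentially the open question in the comment.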
[jira] [Commented] (LUCENE-8962) Can we merge small segments during refresh, for faster searching?
[ https://issues.apache.org/jira/browse/LUCENE-8962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049666#comment-17049666 ] Michael Froh commented on LUCENE-8962: -- I was able to reproduce the {{testMergeOnCommit}} failure on master sometimes with the following options: {{-Dtestcase=TestIndexWriterMergePolicy -Dtests.method=testMergeOnCommit -Dtests.seed=F8DD5AD20994FDDF -Dtests.slow=true -Dtests.badapples=true -Dtests.locale=fi-FI -Dtests.timezone=America/Danmarkshavn -Dtests.asserts=true -Dtests.file.encoding=UTF-8}} > Can we merge small segments during refresh, for faster searching? > - > > Key: LUCENE-8962 > URL: https://issues.apache.org/jira/browse/LUCENE-8962 > Project: Lucene - Core > Issue Type: Improvement > Components: core/index >Reporter: Michael McCandless >Priority: Major > Attachments: LUCENE-8962_demo.png > > Time Spent: 5h 20m > Remaining Estimate: 0h > > With near-real-time search we ask {{IndexWriter}} to write all in-memory > segments to disk and open an {{IndexReader}} to search them, and this is > typically a quick operation. > However, when you use many threads for concurrent indexing, {{IndexWriter}} > will accumulate write many small segments during {{refresh}} and this then > adds search-time cost as searching must visit all of these tiny segments. > The merge policy would normally quickly coalesce these small segments if > given a little time ... so, could we somehow improve {{IndexWriter'}}s > refresh to optionally kick off merge policy to merge segments below some > threshold before opening the near-real-time reader? It'd be a bit tricky > because while we are waiting for merges, indexing may continue, and new > segments may be flushed, but those new segments shouldn't be included in the > point-in-time segments returned by refresh ... 
> One could almost do this on top of Lucene today, with a custom merge policy, > and some hackity logic to have the merge policy target small segments just > written by refresh, but it's tricky to then open a near-real-time reader, > excluding newly flushed but including newly merged segments since the refresh > originally finished ... > I'm not yet sure how best to solve this, so I wanted to open an issue for > discussion! -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-13749) Implement support for joining across collections with multiple shards ( XCJF )
[ https://issues.apache.org/jira/browse/SOLR-13749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049655#comment-17049655 ] David Smiley commented on SOLR-13749:
-
I was about to suggest adding a new enum option, but as I look at this, the existing "method=index" (default) seems appropriate; it's just that we're now able to handle scenarios that were not handled before -- multiple shards, and also when the collection isn't on the same node. Ideally, the existing code would detect when the current code can be used and, if not, use XCJF. It'll take some work to make this transition; not "hard", but I don't want to ask for more of your time than you have. If we're not ready to do what's best for Solr for 8.5, then I think we should un-document it with a big comment or something like that, so that people don't start using a feature that isn't quite ready. In my experience with Solr, once something is released in a certain way, it tends to be set in stone (sadly). I'm sorry if this is unsatisfying to everyone who put awesome work into this, but I want to do what's best for Solr in the long term.

> Implement support for joining across collections with multiple shards ( XCJF )
> --
>
> Key: SOLR-13749
> URL: https://issues.apache.org/jira/browse/SOLR-13749
> Project: Solr
> Issue Type: New Feature
> Reporter: Kevin Watters
> Assignee: Gus Heck
> Priority: Major
> Fix For: 8.5
>
> Time Spent: 1.5h
> Remaining Estimate: 0h
>
> This ticket includes 2 query parsers.
> The first one is the "Cross-collection join filter" (XCJF) query parser. It can do a call out to a remote collection to get a set of join keys to be used as a filter against the local collection.
> The second one is the Hash Range query parser, where you can specify a field name and a hash range; the result is that only the documents that would have hashed to that range will be returned. 
> This query parser will do an intersection based on join keys between 2 > collections. > The local collection is the collection that you are searching against. > The remote collection is the collection that contains the join keys that you > want to use as a filter. > Each shard participating in the distributed request will execute a query > against the remote collection. If the local collection is setup with the > compositeId router to be routed on the join key field, a hash range query is > applied to the remote collection query to only match the documents that > contain a potential match for the documents that are in the local shard/core. > > > Here's some vocab to help with the descriptions of the various parameters. > ||Term||Description|| > |Local Collection|This is the main collection that is being queried.| > |Remote Collection|This is the collection that the XCJFQuery will query to > resolve the join keys.| > |XCJFQuery|The lucene query that executes a search to get back a set of join > keys from a remote collection| > |HashRangeQuery|The lucene query that matches only the documents whose hash > code on a field falls within a specified range.| > > > ||Param ||Required ||Description|| > |collection|Required|The name of the external Solr collection to be queried > to retrieve the set of join key values ( required )| > |zkHost|Optional|The connection string to be used to connect to Zookeeper. > zkHost and solrUrl are both optional parameters, and at most one of them > should be specified. > If neither of zkHost or solrUrl are specified, the local Zookeeper cluster > will be used. ( optional )| > |solrUrl|Optional|The URL of the external Solr node to be queried ( optional > )| > |from|Required|The join key field name in the external collection ( required > )| > |to|Required|The join key field name in the local collection| > |v|See Note|The query to be executed against the external Solr collection to > retrieve the set of join key values. 
> Note: The original query can be passed at the end of the string or as the > "v" parameter. > It's recommended to use query parameter substitution with the "v" parameter > to ensure no issues arise with the default query parsers.| > |routed| |true / false. If true, the XCJF query will use each shard's hash > range to determine the set of join keys to retrieve for that shard. > This parameter improves the performance of the cross-collection join, but > it depends on the local collection being routed by the toField. If this > parameter is not specified, > the XCJF query will try to determine the correct value automatically.| > |ttl| |The length of time that an XCJF query in the cache will be considered > valid, in seconds. Defaults to 3600 (one hour). > The
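Putting the parameter table above together, a filter query for the local collection can be assembled as a local-params string. This is a sketch: the collection and field names (`products_remote`, `product_id`) are hypothetical examples, and the join query travels via parameter substitution through the "v" parameter, as the description recommends.

```java
// Assembling an XCJF filter query from the documented parameters
// (collection, from, to, v). Names used here are hypothetical examples.
public class XcjfQueryExample {

    /** Builds the local-params string to send as Solr's fq parameter. */
    static String xcjfFilter(String collection, String from, String to,
                             String vParamName) {
        return "{!xcjf collection=" + collection
                + " from=" + from
                + " to=" + to
                + " v=$" + vParamName + "}";
    }

    public static void main(String[] args) {
        // The actual join query would be sent as a separate request
        // parameter, e.g. joinQuery=color:red
        System.out.println("fq=" + xcjfFilter(
            "products_remote", "product_id", "product_id", "joinQuery"));
    }
}
```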
[jira] [Commented] (LUCENE-8962) Can we merge small segments during refresh, for faster searching?
[ https://issues.apache.org/jira/browse/LUCENE-8962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049651#comment-17049651 ] Michael Froh commented on LUCENE-8962: -- Regarding {{TestIndexWriter.testThreadInterruptDeadlock}}, I think that's a bug in the implementation. When waiting for merges to complete, I added a {{catch}} for {{InterruptedException}} that sets the interrupt flag and throws an {{IOException}}. The documented behavior of {{IndexWriter}} is to clear the interrupt flag and throw {{ThreadInterruptedException}}. Again, not sure why the tests on master didn't fail. Maybe we just got lucky with the branch_8x tests. > Can we merge small segments during refresh, for faster searching? > - > > Key: LUCENE-8962 > URL: https://issues.apache.org/jira/browse/LUCENE-8962 > Project: Lucene - Core > Issue Type: Improvement > Components: core/index >Reporter: Michael McCandless >Priority: Major > Attachments: LUCENE-8962_demo.png > > Time Spent: 5h 20m > Remaining Estimate: 0h > > With near-real-time search we ask {{IndexWriter}} to write all in-memory > segments to disk and open an {{IndexReader}} to search them, and this is > typically a quick operation. > However, when you use many threads for concurrent indexing, {{IndexWriter}} > will accumulate write many small segments during {{refresh}} and this then > adds search-time cost as searching must visit all of these tiny segments. > The merge policy would normally quickly coalesce these small segments if > given a little time ... so, could we somehow improve {{IndexWriter'}}s > refresh to optionally kick off merge policy to merge segments below some > threshold before opening the near-real-time reader? It'd be a bit tricky > because while we are waiting for merges, indexing may continue, and new > segments may be flushed, but those new segments shouldn't be included in the > point-in-time segments returned by refresh ... 
> One could almost do this on top of Lucene today, with a custom merge policy, > and some hackity logic to have the merge policy target small segments just > written by refresh, but it's tricky to then open a near-real-time reader, > excluding newly flushed but including newly merged segments since the refresh > originally finished ... > I'm not yet sure how best to solve this, so I wanted to open an issue for > discussion! -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
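The contract difference Froh describes can be shown without Lucene. In this sketch (the method names are illustrative, not real IndexWriter internals), the buggy path re-sets the interrupt flag and wraps the exception in an `IOException`, while IndexWriter's documented behavior is to *clear* the flag and throw `ThreadInterruptedException`:

```java
import java.io.IOException;

// Sketch of the two interrupt-handling conventions discussed above.
// Method names are illustrative; this is not the real IndexWriter code.
public class InterruptContract {

    // Stand-in for o.a.l.util.ThreadInterruptedException.
    static class ThreadInterruptedException extends RuntimeException {
        ThreadInterruptedException(InterruptedException cause) { super(cause); }
    }

    // Buggy variant: restore the interrupt flag, wrap in IOException.
    static void waitRestoringFlag() throws IOException {
        try {
            Thread.sleep(10);  // stand-in for "wait for merges to finish"
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();  // flag stays set
            throw new IOException(e);
        }
    }

    // Documented IndexWriter behavior: leave the flag cleared and throw
    // the dedicated unchecked exception.
    static void waitClearingFlag() {
        try {
            Thread.sleep(10);
        } catch (InterruptedException e) {
            // catching InterruptedException already cleared the flag;
            // do NOT re-set it, just convert the exception type
            throw new ThreadInterruptedException(e);
        }
    }

    public static void main(String[] args) throws Exception {
        Thread.currentThread().interrupt();  // simulate an interrupt
        try { waitRestoringFlag(); } catch (IOException expected) { }
        // Thread.interrupted() reports true here (and clears the flag)
        System.out.println("flag after buggy variant: " + Thread.interrupted());

        Thread.currentThread().interrupt();
        try { waitClearingFlag(); } catch (ThreadInterruptedException expected) { }
        System.out.println("flag after documented variant: "
                + Thread.currentThread().isInterrupted());
    }
}
```

A caller testing for interruption with `Thread.interrupted()` sees opposite answers from the two variants, which is exactly the kind of divergence that makes `testThreadInterruptDeadlock` fail.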
[jira] [Commented] (LUCENE-8962) Can we merge small segments during refresh, for faster searching?
[ https://issues.apache.org/jira/browse/LUCENE-8962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049641#comment-17049641 ] Michael Froh commented on LUCENE-8962: -- I think the failure in {{testMergeOnCommit}} occurs because of a difference in the random behavior of the test. Specifically, sometimes the last writing thread happens to choose to {{commit()}} at the end, so there are no pending changes by the time we do the last {{commit()}} which should merge all segments (or abandon the merge, if it takes too long). If we add one more doc before that last commit (ensuring that the {{anyChanges}} check in {{IndexWriter.prepareCommitInternal()}} is {{true}}), the test passes consistently. I'm not sure why we don't see the same failure sometimes on master, though. > Can we merge small segments during refresh, for faster searching? > - > > Key: LUCENE-8962 > URL: https://issues.apache.org/jira/browse/LUCENE-8962 > Project: Lucene - Core > Issue Type: Improvement > Components: core/index >Reporter: Michael McCandless >Priority: Major > Attachments: LUCENE-8962_demo.png > > Time Spent: 5h 20m > Remaining Estimate: 0h > > With near-real-time search we ask {{IndexWriter}} to write all in-memory > segments to disk and open an {{IndexReader}} to search them, and this is > typically a quick operation. > However, when you use many threads for concurrent indexing, {{IndexWriter}} > will accumulate write many small segments during {{refresh}} and this then > adds search-time cost as searching must visit all of these tiny segments. > The merge policy would normally quickly coalesce these small segments if > given a little time ... so, could we somehow improve {{IndexWriter'}}s > refresh to optionally kick off merge policy to merge segments below some > threshold before opening the near-real-time reader? 
It'd be a bit tricky > because while we are waiting for merges, indexing may continue, and new > segments may be flushed, but those new segments shouldn't be included in the > point-in-time segments returned by refresh ... > One could almost do this on top of Lucene today, with a custom merge policy, > and some hackity logic to have the merge policy target small segments just > written by refresh, but it's tricky to then open a near-real-time reader, > excluding newly flushed but including newly merged segments since the refresh > originally finished ... > I'm not yet sure how best to solve this, so I wanted to open an issue for > discussion! -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
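The race Froh describes in the test can be modeled with a few lines (a sketch, not real IndexWriter code): a commit only triggers the merge-on-commit path when there are pending changes, so if the last writer thread happens to commit on its own, the test's final commit is a no-op; indexing one more document first guarantees the `anyChanges` check passes.

```java
// Minimal model of the anyChanges check in prepareCommitInternal():
// commit() only runs the merge-on-commit hook when there are pending
// changes since the previous commit. Not real IndexWriter code.
public class MergeOnCommitModel {
    private int pendingDocs = 0;
    private int mergeHookRuns = 0;

    void addDocument() { pendingDocs++; }

    /** Returns true if the merge-on-commit hook actually ran. */
    boolean commit() {
        boolean anyChanges = pendingDocs > 0;
        if (anyChanges) {
            mergeHookRuns++;   // stand-in for "merge small segments"
            pendingDocs = 0;
        }
        return anyChanges;     // a no-op commit skips the hook entirely
    }

    int mergeHookRuns() { return mergeHookRuns; }

    public static void main(String[] args) {
        MergeOnCommitModel w = new MergeOnCommitModel();
        w.addDocument();
        w.commit();            // the writer thread's own commit
        System.out.println("final commit ran hook: " + w.commit()); // false
        w.addDocument();       // the proposed test fix: one more doc
        System.out.println("final commit ran hook: " + w.commit()); // true
    }
}
```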
[jira] [Commented] (SOLR-14291) OldAnalyticsRequestConverter should support fields names with dots
[ https://issues.apache.org/jira/browse/SOLR-14291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049635#comment-17049635 ] Mikhail Khludnev commented on SOLR-14291: - [~houstonputman], what's your opinion? > OldAnalyticsRequestConverter should support fields names with dots > -- > > Key: SOLR-14291 > URL: https://issues.apache.org/jira/browse/SOLR-14291 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: search, SearchComponents - other >Reporter: Anatolii Siuniaev >Priority: Trivial > Attachments: SOLR-14291.patch > > > If you send a query with range facets using old olap-style syntax (see pdf > [here|https://issues.apache.org/jira/browse/SOLR-5302]), > OldAnalyticsRequestConverter just silently (no exception thrown) omits > parameters like > {code:java} > olap..rangefacet..start > {code} > in case if __ has dots inside (for instance field name is > _Project.Value_). And thus no range facets are returned in response. > Probably the same happens in case of field faceting. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-12325) introduce uniqueBlockQuery(parent:true) aggregation for JSON Facet
[ https://issues.apache.org/jira/browse/SOLR-12325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049634#comment-17049634 ] Mikhail Khludnev commented on SOLR-12325:
-
Attaching a random test patch by [~anatolii_siuniaev]; I'd like to reduce its footprint on the existing test codebase.

> introduce uniqueBlockQuery(parent:true) aggregation for JSON Facet
> --
>
> Key: SOLR-12325
> URL: https://issues.apache.org/jira/browse/SOLR-12325
> Project: Solr
> Issue Type: New Feature
> Components: Facet Module
> Reporter: Mikhail Khludnev
> Assignee: Mikhail Khludnev
> Priority: Major
> Fix For: 8.5
>
> Attachments: SOLR-12325.patch, SOLR-12325.patch, SOLR-12325.patch, SOLR-12325.patch, SOLR-12325.patch, SOLR-12325_Random_test_for_uniqueBlockQuery (1).patch
>
> Time Spent: 1.5h
> Remaining Estimate: 0h
>
> It might be a faster twin for {{uniqueBlock(\_root_)}}. Please utilise the built-in query parsing method; don't invent your own.
--
This message was sent by Atlassian Jira (v8.3.4#803005)
-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
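For context, a sketch of how the proposed aggregation might appear in a JSON Facet request, by analogy with the existing {{uniqueBlock(\_root_)}} usage; the terms-facet field name (`category_s`) is a hypothetical example, and `parent:true` is taken from the issue title:

```json
{
  "query": "*:*",
  "facet": {
    "categories": {
      "type": "terms",
      "field": "category_s",
      "facet": {
        "parentCount": "uniqueBlockQuery(parent:true)"
      }
    }
  }
}
```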
[jira] [Updated] (SOLR-12325) introduce uniqueBlockQuery(parent:true) aggregation for JSON Facet
[ https://issues.apache.org/jira/browse/SOLR-12325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Khludnev updated SOLR-12325:
Attachment: SOLR-12325.patch
Status: Patch Available (was: Patch Available)

> introduce uniqueBlockQuery(parent:true) aggregation for JSON Facet
> --
>
> Key: SOLR-12325
> URL: https://issues.apache.org/jira/browse/SOLR-12325
> Project: Solr
> Issue Type: New Feature
> Components: Facet Module
> Reporter: Mikhail Khludnev
> Assignee: Mikhail Khludnev
> Priority: Major
> Fix For: 8.5
>
> Attachments: SOLR-12325.patch, SOLR-12325.patch, SOLR-12325.patch, SOLR-12325.patch, SOLR-12325.patch, SOLR-12325_Random_test_for_uniqueBlockQuery (1).patch
>
> Time Spent: 1.5h
> Remaining Estimate: 0h
>
> It might be a faster twin for {{uniqueBlock(\_root_)}}. Please utilise the built-in query parsing method; don't invent your own.
--
This message was sent by Atlassian Jira (v8.3.4#803005)
-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Comment Edited] (SOLR-13942) /api/cluster/zk/* to fetch raw ZK data
[ https://issues.apache.org/jira/browse/SOLR-13942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049600#comment-17049600 ] Noble Paul edited comment on SOLR-13942 at 3/2/20 8:38 PM: --- [~erickerickson] *why this is needed?* bin/solr is not something that everyone has everywhere. People have curl & wget & a browser everywhere. If I want to script something up and read data in ZK there is no alternative *Security aspect* There is already a REST API to fetch data stored in ZK ({{/admin/zookeeper}} is the end point). It has no proper API docs. It is used by our admin UI all the time I would like to merge it soon was (Author: noble.paul): [~erickerickson] *why this is needed?* bin/solr is not everyone has everywhere. People have curl & wget & a browser everywhere. If I want to script something up and read data in ZK there is no alternative *Security aspect* There is already a REST API to fetch data stored in ZK ({{/admin/zookeeper}} is the end point). It has no proper API docs. It is used by our admin UI all the time I would like to merge it soon > /api/cluster/zk/* to fetch raw ZK data > -- > > Key: SOLR-13942 > URL: https://issues.apache.org/jira/browse/SOLR-13942 > Project: Solr > Issue Type: Bug >Reporter: Noble Paul >Assignee: Noble Paul >Priority: Major > > If the requested path is a node with children show the list of child nodes > and their meta data -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Comment Edited] (SOLR-13942) /api/cluster/zk/* to fetch raw ZK data
[ https://issues.apache.org/jira/browse/SOLR-13942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049600#comment-17049600 ] Noble Paul edited comment on SOLR-13942 at 3/2/20 8:22 PM: --- [~erickerickson] *why this is needed?* bin/solr is not everyone has everywhere. People have curl & wget & a browser everywhere. If I want to script something up and read data in ZK there is no alternative *Security aspect* There is already a REST API to fetch data stored in ZK ({{/admin/zookeeper}} is the end point). It has no proper API docs. It is used by our admin UI all the time I would like to merge it soon was (Author: noble.paul): [~erickerickson] *why this is needed?* bin/solr is not everyone has everywhere. People have curl & wget & a browser everywhere. If I want to script something up and read data in ZK there is no alternative *Security aspect* There is already a REST API to fetch data stored in ZK. It ha no proper API docs. It is used by our admin UI all the time I would like to merge it soon > /api/cluster/zk/* to fetch raw ZK data > -- > > Key: SOLR-13942 > URL: https://issues.apache.org/jira/browse/SOLR-13942 > Project: Solr > Issue Type: Bug >Reporter: Noble Paul >Assignee: Noble Paul >Priority: Major > > If the requested path is a node with children show the list of child nodes > and their meta data -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-13942) /api/cluster/zk/* to fetch raw ZK data
[ https://issues.apache.org/jira/browse/SOLR-13942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049621#comment-17049621 ] Noble Paul commented on SOLR-13942:
---
[~janhoy] I feel like I'm talking to a wall. The comment I posted 20 minutes back already addresses these 2 questions.

> /api/cluster/zk/* to fetch raw ZK data
> --
>
> Key: SOLR-13942
> URL: https://issues.apache.org/jira/browse/SOLR-13942
> Project: Solr
> Issue Type: Bug
> Reporter: Noble Paul
> Assignee: Noble Paul
> Priority: Major
>
> If the requested path is a node with children show the list of child nodes
> and their meta data
--
This message was sent by Atlassian Jira (v8.3.4#803005)
-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-13749) Implement support for joining across collections with multiple shards ( XCJF )
[ https://issues.apache.org/jira/browse/SOLR-13749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049617#comment-17049617 ] Gus Heck commented on SOLR-13749: - Can you be more explicit about what you want here? Do you have a suggestion for how that logic to "know when to use" should work? > Implement support for joining across collections with multiple shards ( XCJF ) > -- > > Key: SOLR-13749 > URL: https://issues.apache.org/jira/browse/SOLR-13749 > Project: Solr > Issue Type: New Feature >Reporter: Kevin Watters >Assignee: Gus Heck >Priority: Major > Fix For: 8.5 > > Time Spent: 1.5h > Remaining Estimate: 0h > > This ticket includes 2 query parsers. > The first one is the "Cross collection join filter" (XCJF) parser. This is > the "Cross-collection join filter" query parser. It can do a call out to a > remote collection to get a set of join keys to be used as a filter against > the local collection. > The second one is the Hash Range query parser that you can specify a field > name and a hash range, the result is that only the documents that would have > hashed to that range will be returned. > This query parser will do an intersection based on join keys between 2 > collections. > The local collection is the collection that you are searching against. > The remote collection is the collection that contains the join keys that you > want to use as a filter. > Each shard participating in the distributed request will execute a query > against the remote collection. If the local collection is setup with the > compositeId router to be routed on the join key field, a hash range query is > applied to the remote collection query to only match the documents that > contain a potential match for the documents that are in the local shard/core. > > > Here's some vocab to help with the descriptions of the various parameters. 
> ||Term||Description|| > |Local Collection|This is the main collection that is being queried.| > |Remote Collection|This is the collection that the XCJFQuery will query to > resolve the join keys.| > |XCJFQuery|The lucene query that executes a search to get back a set of join > keys from a remote collection| > |HashRangeQuery|The lucene query that matches only the documents whose hash > code on a field falls within a specified range.| > > > ||Param ||Required ||Description|| > |collection|Required|The name of the external Solr collection to be queried > to retrieve the set of join key values ( required )| > |zkHost|Optional|The connection string to be used to connect to Zookeeper. > zkHost and solrUrl are both optional parameters, and at most one of them > should be specified. > If neither of zkHost or solrUrl are specified, the local Zookeeper cluster > will be used. ( optional )| > |solrUrl|Optional|The URL of the external Solr node to be queried ( optional > )| > |from|Required|The join key field name in the external collection ( required > )| > |to|Required|The join key field name in the local collection| > |v|See Note|The query to be executed against the external Solr collection to > retrieve the set of join key values. > Note: The original query can be passed at the end of the string or as the > "v" parameter. > It's recommended to use query parameter substitution with the "v" parameter > to ensure no issues arise with the default query parsers.| > |routed| |true / false. If true, the XCJF query will use each shard's hash > range to determine the set of join keys to retrieve for that shard. > This parameter improves the performance of the cross-collection join, but > it depends on the local collection being routed by the toField. If this > parameter is not specified, > the XCJF query will try to determine the correct value automatically.| > |ttl| |The length of time that an XCJF query in the cache will be considered > valid, in seconds. 
Defaults to 3600 (one hour).
> The XCJF query will not be aware of changes to the remote collection, so if the remote collection is updated, cached XCJF queries may give inaccurate results.
> After the ttl period has expired, the XCJF query will re-execute the join against the remote collection.|
> |_All others_| |Any normal Solr parameter can also be specified as a local param.|
>
> Example solrconfig.xml changes:
>
> <cache name="hash_vin"
>        class="solr.LRUCache"
>        size="128"
>        initialSize="0"
>        regenerator="solr.NoOpRegenerator"/>
>
> <queryParser name="xcjf"
>              class="org.apache.solr.search.join.XCJFQueryParserPlugin">
>   <str name="rou
[jira] [Commented] (SOLR-13942) /api/cluster/zk/* to fetch raw ZK data
[ https://issues.apache.org/jira/browse/SOLR-13942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049608#comment-17049608 ] Jan Høydahl commented on SOLR-13942: {quote}I would like to merge it soon {quote} I would like you to explain the true benefit of this, explain why it is not a security risk and then gain consensus before continuing. > /api/cluster/zk/* to fetch raw ZK data > -- > > Key: SOLR-13942 > URL: https://issues.apache.org/jira/browse/SOLR-13942 > Project: Solr > Issue Type: Bug >Reporter: Noble Paul >Assignee: Noble Paul >Priority: Major > > If the requested path is a node with children show the list of child nodes > and their meta data -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Comment Edited] (SOLR-13942) /api/cluster/zk/* to fetch raw ZK data
[ https://issues.apache.org/jira/browse/SOLR-13942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049600#comment-17049600 ] Noble Paul edited comment on SOLR-13942 at 3/2/20 8:05 PM: --- [~erickerickson] *Why is this needed?* bin/solr is not something everyone has everywhere. People have curl & wget & a browser everywhere. If I want to script something up and read data in ZK, there is no alternative. *Security aspect* There is already a REST API to fetch data stored in ZK. It has no proper API docs. It is used by our admin UI all the time. I would like to merge it soon. was (Author: noble.paul): [~erickerickson] bin/solr is not everyone has everywhere. People have curl & wget & a browser everywhere. If I want to script something up and read data in ZK there is no alternative I would like to merge it soon > /api/cluster/zk/* to fetch raw ZK data > -- > > Key: SOLR-13942 > URL: https://issues.apache.org/jira/browse/SOLR-13942 > Project: Solr > Issue Type: Bug >Reporter: Noble Paul >Assignee: Noble Paul >Priority: Major > > If the requested path is a node with children, show the list of child nodes > and their meta data -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Assigned] (SOLR-13942) /api/cluster/zk/* to fetch raw ZK data
[ https://issues.apache.org/jira/browse/SOLR-13942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noble Paul reassigned SOLR-13942: - Assignee: Noble Paul > /api/cluster/zk/* to fetch raw ZK data > -- > > Key: SOLR-13942 > URL: https://issues.apache.org/jira/browse/SOLR-13942 > Project: Solr > Issue Type: Bug >Reporter: Noble Paul >Assignee: Noble Paul >Priority: Major > > If the requested path is a node with children show the list of child nodes > and their meta data -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-13942) /api/cluster/zk/* to fetch raw ZK data
[ https://issues.apache.org/jira/browse/SOLR-13942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049600#comment-17049600 ] Noble Paul commented on SOLR-13942: --- [~erickerickson] bin/solr is not something everyone has everywhere. People have curl & wget & a browser everywhere. If I want to script something up and read data in ZK, there is no alternative. I would like to merge it soon. > /api/cluster/zk/* to fetch raw ZK data > -- > > Key: SOLR-13942 > URL: https://issues.apache.org/jira/browse/SOLR-13942 > Project: Solr > Issue Type: Bug >Reporter: Noble Paul >Priority: Major > > If the requested path is a node with children, show the list of child nodes > and their meta data -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] praste commented on a change in pull request #1306: SOLR-14299: IndexFetcher doesnt' reset count to 0 after the last packet is received
praste commented on a change in pull request #1306: SOLR-14299: IndexFetcher doesnt' reset count to 0 after the last packet is received URL: https://github.com/apache/lucene-solr/pull/1306#discussion_r386591132

## File path: solr/core/src/java/org/apache/solr/handler/IndexFetcher.java

```
@@ -1728,20 +1728,21 @@ private int fetchPackets(FastInputStream fis) throws Exception {
           long checkSumClient = checksum.getValue();
           if (checkSumClient != checkSumServer) {
             log.error("Checksum not matched between client and server for file: {}", fileName);
-            //if checksum is wrong it is a problem return for retry
+            //if checksum is wrong it is a problem return (there doesn't seem to be a retry in this case.)
             return 1;
           }
         }
         //if everything is fine, write down the packet to the file
         file.write(buf, packetSize);
         bytesDownloaded += packetSize;
         log.debug("Fetched and wrote {} bytes of file: {}", bytesDownloaded, fileName);
-        if (bytesDownloaded >= size)
-          return 0;
         //errorCount is always set to zero after a successful packet
         errorCount = 0;
+        if (bytesDownloaded >= size)
+          return 0;
       }
     } catch (ReplicationHandlerException e) {
+      log.warn("Aborting index replication", e);
```

Review comment: agreed. Logging here is not needed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8962) Can we merge small segments during refresh, for faster searching?
[ https://issues.apache.org/jira/browse/LUCENE-8962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049564#comment-17049564 ] Michael Froh commented on LUCENE-8962: -- I'm looking into the {{branch_8x}} failures. I'm able to reproduce on my machine and will step through to see what's different. > Can we merge small segments during refresh, for faster searching? > - > > Key: LUCENE-8962 > URL: https://issues.apache.org/jira/browse/LUCENE-8962 > Project: Lucene - Core > Issue Type: Improvement > Components: core/index >Reporter: Michael McCandless >Priority: Major > Attachments: LUCENE-8962_demo.png > > Time Spent: 5h 20m > Remaining Estimate: 0h > > With near-real-time search we ask {{IndexWriter}} to write all in-memory > segments to disk and open an {{IndexReader}} to search them, and this is > typically a quick operation. > However, when you use many threads for concurrent indexing, {{IndexWriter}} > will accumulate write many small segments during {{refresh}} and this then > adds search-time cost as searching must visit all of these tiny segments. > The merge policy would normally quickly coalesce these small segments if > given a little time ... so, could we somehow improve {{IndexWriter'}}s > refresh to optionally kick off merge policy to merge segments below some > threshold before opening the near-real-time reader? It'd be a bit tricky > because while we are waiting for merges, indexing may continue, and new > segments may be flushed, but those new segments shouldn't be included in the > point-in-time segments returned by refresh ... > One could almost do this on top of Lucene today, with a custom merge policy, > and some hackity logic to have the merge policy target small segments just > written by refresh, but it's tricky to then open a near-real-time reader, > excluding newly flushed but including newly merged segments since the refresh > originally finished ... 
> I'm not yet sure how best to solve this, so I wanted to open an issue for > discussion! -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] madrob commented on a change in pull request #1306: SOLR-14299: IndexFetcher doesnt' reset count to 0 after the last packet is received
madrob commented on a change in pull request #1306: SOLR-14299: IndexFetcher doesnt' reset count to 0 after the last packet is received URL: https://github.com/apache/lucene-solr/pull/1306#discussion_r386585563

## File path: solr/core/src/java/org/apache/solr/handler/IndexFetcher.java

```
@@ -1728,20 +1728,21 @@ private int fetchPackets(FastInputStream fis) throws Exception {
           long checkSumClient = checksum.getValue();
           if (checkSumClient != checkSumServer) {
             log.error("Checksum not matched between client and server for file: {}", fileName);
-            //if checksum is wrong it is a problem return for retry
+            //if checksum is wrong it is a problem return (there doesn't seem to be a retry in this case.)
             return 1;
           }
         }
         //if everything is fine, write down the packet to the file
         file.write(buf, packetSize);
         bytesDownloaded += packetSize;
         log.debug("Fetched and wrote {} bytes of file: {}", bytesDownloaded, fileName);
-        if (bytesDownloaded >= size)
-          return 0;
         //errorCount is always set to zero after a successful packet
         errorCount = 0;
+        if (bytesDownloaded >= size)
+          return 0;
       }
     } catch (ReplicationHandlerException e) {
+      log.warn("Aborting index replication", e);
```

Review comment: Is log-and-throw necessary here? It looks like we already log this higher up in the call stack. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] praste opened a new pull request #1306: SOLR-14299: IndexFetcher doesnt' reset count to 0 after the last packet is received
praste opened a new pull request #1306: SOLR-14299: IndexFetcher doesnt' reset count to 0 after the last packet is received URL: https://github.com/apache/lucene-solr/pull/1306

* SOLR-14299: IndexFetcher should reset the `errorCount` to 0 after successfully receiving the last packet while fetching a file.

# Description
While fetching files from the master, `IndexFetcher` retries 5 times before giving up. It resets the errorCount after successfully receiving each packet, except for the last packet. This seems like an oversight.

# Solution
Reset the errorCount to 0 before checking whether it is the last packet for the file.

# Tests
This is a trivial change; no test cases added for now.

# Checklist
Please review the following and check all that apply:
- [ ] I have reviewed the guidelines for [How to Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms to the standards described there to the best of my ability.
- [ ] I have created a Jira issue and added the issue ID to my pull request title.
- [ ] I have given Solr maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended)
- [ ] I have developed this patch against the `master` branch.
- [ ] I have run `ant precommit` and the appropriate test suite.
- [ ] I have added tests for my changes.
- [ ] I have added documentation for the [Ref Guide](https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide) (for Solr changes only).

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-13502) Investigate using something other than ZooKeeper's "4 letter words" for the admin UI status
[ https://issues.apache.org/jira/browse/SOLR-13502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049544#comment-17049544 ] Jan Høydahl commented on SOLR-13502: Check whether the Zk java client supports admin commands? We might have to support both old and new 4lw for as long as zk supports old? May zk admin server have SSL? Then we might need to support custom keyStore for it and mutualSsl. What about auth? Where to configure it’s host/port? Will it always run on all zk servers? This might be more work than it looks like on the surface... > Investigate using something other than ZooKeeper's "4 letter words" for the > admin UI status > --- > > Key: SOLR-13502 > URL: https://issues.apache.org/jira/browse/SOLR-13502 > Project: Solr > Issue Type: Improvement >Reporter: Erick Erickson >Assignee: Erick Erickson >Priority: Major > > ZooKeeper 3.5.5 requires a whitelist of allowed "4 letter words". The only > place I see on a quick look at the Solr code where 4lws are used is in the > admin UI "ZK Status" link. > In order to use the admin UI "ZK Status" link, users will have to modify > their zoo.cfg file with > {code} > 4lw.commands.whitelist=mntr,conf,ruok > {code} > This JIRA is to see if there are alternatives to using 4lw for the admin UI. > This depends on SOLR-8346. If we find an alternative, we need to remove the > additions to the ref guide that mention changing zoo.cfg (just scan for 4lw > in all the .adoc files) and remove SolrZkServer.ZK_WHITELIST_PROPERTY and all > references to it (SolrZkServer and SolrTestCaseJ4). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
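For reference, the whitelist workaround the issue mentions sits next to the HTTP AdminServer that ZooKeeper 3.5 ships, which exposes the same commands over HTTP. A hedged zoo.cfg sketch: the `admin.*` property names below are taken from the ZooKeeper administrator's guide and should be verified against the ZooKeeper version in use.

```properties
# Current workaround: whitelist the 4 letter words the Solr admin UI needs
4lw.commands.whitelist=mntr,conf,ruok

# Possible alternative: ZooKeeper 3.5's built-in AdminServer serves the same
# commands over HTTP (see http://<zk-host>:8080/commands for the list)
admin.enableServer=true
admin.serverPort=8080
```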
[jira] [Created] (SOLR-14299) IndexFetcher doesnt' reset errorCount to 0 after the last packet is received
Pushkar Raste created SOLR-14299: Summary: IndexFetcher doesnt' reset errorCount to 0 after the last packet is received Key: SOLR-14299 URL: https://issues.apache.org/jira/browse/SOLR-14299 Project: Solr Issue Type: Bug Security Level: Public (Default Security Level. Issues are Public) Components: replication (java) Affects Versions: 7.7.1 Reporter: Pushkar Raste While fetching the files from master `IndexFetcher` retries 5 times before giving up. It resets the errorCount after successfully receiving the packet except for the last packet. Seems like an oversight. [https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/handler/IndexFetcher.java#L1739-L1742] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
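The loop in question can be modeled with a small sketch (the names and the event encoding below are illustrative, not the actual IndexFetcher code): the point is that the error counter must be reset after every successfully received packet, including the final one.

```java
// Illustrative model of the packet-fetch retry loop, not the real IndexFetcher API.
class PacketFetchSketch {
    static final int MAX_RETRIES = 5;

    /**
     * events: a positive value is a good packet of that many bytes; a negative
     * value stands for a corrupted packet that the server will re-send.
     * Returns total bytes downloaded, or -1 if MAX_RETRIES errors accumulate
     * without a good packet in between.
     */
    static long fetchAll(int[] events, long expectedSize) {
        long bytesDownloaded = 0;
        int errorCount = 0;
        for (int e : events) {
            if (e < 0) {                    // bad checksum: count the error and retry
                if (++errorCount >= MAX_RETRIES) return -1;
                continue;
            }
            bytesDownloaded += e;
            errorCount = 0;                 // reset after EVERY good packet, including
            if (bytesDownloaded >= expectedSize) {
                return bytesDownloaded;     // the last one -- the case the bug missed
            }
        }
        return bytesDownloaded;
    }
}
```

With the reset placed before the last-packet check, a transient error early in the download no longer counts against a later, otherwise clean transfer.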
[GitHub] [lucene-solr] atris commented on issue #1303: LUCENE-9114: Improve ValueSourceScorer's Default Cost Implementation
atris commented on issue #1303: LUCENE-9114: Improve ValueSourceScorer's Default Cost Implementation URL: https://github.com/apache/lucene-solr/pull/1303#issuecomment-593544138

> Lets add the following test to TestFunctionRangeQuery:
>
> ```
> @Test
> public void testTwoRangeQueries() throws IOException {
>   Query rq1 = new FunctionRangeQuery(INT_VALUESOURCE, 2, 4, true, true);
>   Query rq2 = new FunctionRangeQuery(INT_VALUESOURCE, 8, 10, true, true);
>   Query bq = new BooleanQuery.Builder()
>       .add(rq1, BooleanClause.Occur.SHOULD)
>       .add(rq2, BooleanClause.Occur.SHOULD)
>       .build();
>
>   ScoreDoc[] scoreDocs = indexSearcher.search(bq, N_DOCS).scoreDocs;
>   expectScores(scoreDocs, 10, 9, 8, 4, 3, 2);
> }
> ```

Thanks, will add it in a separate PR.

> > Maybe have FunctionValues expose an abstract cost() method, have all FV derivatives implement it and then simply let VSC's matchCost use that method?
>
> Yes; we certainly need the FV to provide the cost; the TPI.matchCost should simply look it up. By the FV (or VS) having a cost, it then becomes straightforward for anyone's custom FV/VS to specify what their cost is. It's debatable whether this cost should be on the VS vs FV.

Ok, so my next iteration will have a cost method in FV (I am inclined to add the method in FV since VSC has a direct member of that type?) and have the VSC-owned TPI's matchCost() refer to the cost of the delegated FV. I also feel that the FV class itself should have a default implementation of the cost method that just returns a dumb value (100?) as the cost. The idea is that if you are using the default FV, you probably don't care about costs. If you plug in your own FV with a smarter costing algorithm, VSC will automatically pick it up. WDYT? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
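The design being converged on in the thread — a cost() on the FunctionValues base class with a dumb default (100), which the scorer's TwoPhaseIterator.matchCost() simply delegates to — can be sketched like this. Class names are illustrative, not Lucene's actual API:

```java
// Illustrative sketch of the cost-delegation idea, not Lucene's real classes.
abstract class FunctionValuesSketch {
    /** Dumb default for implementations that don't care about cost; override to be smarter. */
    float cost() { return 100f; }
}

class CheapValues extends FunctionValuesSketch {
    @Override float cost() { return 5f; }   // a custom FV with a smarter costing algorithm
}

class ValueSourceScorerSketch {
    private final FunctionValuesSketch values;
    ValueSourceScorerSketch(FunctionValuesSketch values) { this.values = values; }

    /** What the scorer's TwoPhaseIterator.matchCost() would return. */
    float matchCost() { return values.cost(); }
}
```

Putting the default on the base class keeps existing FV implementations working unchanged, while custom ones are picked up automatically by the scorer.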
[jira] [Commented] (SOLR-13749) Implement support for joining across collections with multiple shards ( XCJF )
[ https://issues.apache.org/jira/browse/SOLR-13749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049517#comment-17049517 ] David Smiley commented on SOLR-13749: - I had commented on SOLR-11384 but I think I should have placed it here. Basically, let's enhance JoinQParserPlugin to know when to use this new implementation instead of adding a new query parser that looks like the current one. The existing one already has a "method" and branches. Can we get this in ASAP for 8.5 please?
> Implement support for joining across collections with multiple shards ( XCJF )
> --
>
> Key: SOLR-13749
> URL: https://issues.apache.org/jira/browse/SOLR-13749
> Project: Solr
> Issue Type: New Feature
> Reporter: Kevin Watters
> Assignee: Gus Heck
> Priority: Major
> Fix For: 8.5
>
> Time Spent: 1.5h
> Remaining Estimate: 0h
>
> This ticket includes 2 query parsers.
> The first one is the "Cross-collection join filter" (XCJF) query parser. It can do a call out to a remote collection to get a set of join keys to be used as a filter against the local collection.
> The second one is the Hash Range query parser, with which you specify a field name and a hash range; only the documents that would have hashed to that range will be returned.
> The XCJF query parser will do an intersection based on join keys between 2 collections. The local collection is the collection that you are searching against. The remote collection is the collection that contains the join keys that you want to use as a filter.
> Each shard participating in the distributed request will execute a query against the remote collection. If the local collection is set up with the compositeId router to be routed on the join key field, a hash range query is applied to the remote collection query to only match the documents that contain a potential match for the documents that are in the local shard/core.
>
> Here's some vocab to help with the descriptions of the various parameters.
> ||Term||Description||
> |Local Collection|The main collection that is being queried.|
> |Remote Collection|The collection that the XCJFQuery will query to resolve the join keys.|
> |XCJFQuery|The Lucene query that executes a search to get back a set of join keys from a remote collection.|
> |HashRangeQuery|The Lucene query that matches only the documents whose hash code on a field falls within a specified range.|
>
> ||Param||Required||Description||
> |collection|Required|The name of the external Solr collection to be queried to retrieve the set of join key values.|
> |zkHost|Optional|The connection string to be used to connect to ZooKeeper. zkHost and solrUrl are both optional, and at most one of them should be specified. If neither is specified, the local ZooKeeper cluster will be used.|
> |solrUrl|Optional|The URL of the external Solr node to be queried.|
> |from|Required|The join key field name in the external collection.|
> |to|Required|The join key field name in the local collection.|
> |v|See Note|The query to be executed against the external Solr collection to retrieve the set of join key values. Note: the query can be passed at the end of the string or as the "v" parameter. Using query parameter substitution with the "v" parameter is recommended, to avoid issues with the default query parsers.|
> |routed| |true / false. If true, the XCJF query will use each shard's hash range to determine the set of join keys to retrieve for that shard. This improves the performance of the cross-collection join, but it depends on the local collection being routed by the "to" field. If this parameter is not specified, the XCJF query will try to determine the correct value automatically.|
> |ttl| |The length of time, in seconds, that an XCJF query in the cache is considered valid. Defaults to 3600 (one hour). The XCJF query will not be aware of changes to the remote collection, so if the remote collection is updated, cached XCJF queries may give inaccurate results. After the ttl period has expired, the XCJF query will re-execute the join against the remote collection.|
> |_All others_| |Any normal Solr parameter can also be specified as a local param.|
>
> Example solrconfig.xml changes:
>
> <cache name="hash_vin"
>        class="solr.LRUCache"
>        size="128"
>        initialSize="0"
>        regenerator="solr.NoOpRegenerator"/>
>
> <queryParser name="xcjf"
>              class="org.apache.solr.search.join.XCJFQueryParserPlugin">
>   <str name="rou
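The hash-range optimization behind the routed parameter can be sketched as follows: each local shard asks the remote collection only for join keys whose hash falls into that shard's own range, so it never fetches keys it could not match anyway. Solr's compositeId router actually uses MurmurHash3; String.hashCode() below is only a stand-in, and all names are illustrative:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of hash-range filtering of join keys; not Solr's real router.
class HashRangeSketch {
    /** Stand-in for the document router's hash (Solr really uses MurmurHash3). */
    static int hash(String joinKey) { return joinKey.hashCode(); }

    /** The keys a shard owning the range [min, max] would fetch from the remote collection. */
    static List<String> keysForShard(List<String> joinKeys, int min, int max) {
        List<String> mine = new ArrayList<>();
        for (String k : joinKeys) {
            int h = hash(k);
            if (h >= min && h <= max) mine.add(k);
        }
        return mine;
    }
}
```

Because the shard ranges partition the full hash space, every join key is fetched by exactly one shard, which is what makes the per-shard remote queries cheaper than each shard pulling the whole key set.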
[jira] [Created] (SOLR-14298) LBSolrClient.checkAZombieServer should be less stupid
Chris M. Hostetter created SOLR-14298: - Summary: LBSolrClient.checkAZombieServer should be less stupid Key: SOLR-14298 URL: https://issues.apache.org/jira/browse/SOLR-14298 Project: Solr Issue Type: Bug Security Level: Public (Default Security Level. Issues are Public) Reporter: Chris M. Hostetter LBSolrClient.checkAZombieServer() currently does a /select query for {{\*:\*}} with distrib=false, rows=0, sort=\_docid\_ ... but this can still chew up a lot of time if the shard is big, and it's not self-evident wtf is going on in the server logs. At a minimum, these requests should include some sort of tracing param to identify the point of the query (ie: {{_zombieservercheck=true}}) and should probably be changed to hit something like the /ping handler, or the node status handler, or, if it's important to folks that it do a "search" that actually uses the index searcher, then it should use options like timeAllowed / segmentTerminateEarly, and/or {{q=-\*:\*}} instead ... or maybe a cursorMark ... something to make it not have the overhead of counting all the hits. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Comment Edited] (LUCENE-8962) Can we merge small segments during refresh, for faster searching?
[ https://issues.apache.org/jira/browse/LUCENE-8962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049504#comment-17049504 ] Michael Sokolov edited comment on LUCENE-8962 at 3/2/20 6:01 PM: - Tests seemed to pass consistently on master, so I pushed there, but now I cherry-picked to 8x branch and see consistent failures there: [junit4] Tests with failures [seed: 5036466712CCD5FE]: [junit4] - org.apache.lucene.index.TestIndexWriter.testThreadInterruptDeadlock [junit4] - org.apache.lucene.index.TestIndexWriterMergePolicy.testMergeOnCommit was (Author: sokolov): Tests seemed to pass consistently on master, so I pushed there, but now I cherry-picked to 8x branch and see consistent failures there: {{ [junit4] Tests with failures [seed: 5036466712CCD5FE]:}} {{ [junit4] - org.apache.lucene.index.TestIndexWriter.testThreadInterruptDeadlock}} {{ [junit4] - org.apache.lucene.index.TestIndexWriterMergePolicy.testMergeOnCommit}} > Can we merge small segments during refresh, for faster searching? > - > > Key: LUCENE-8962 > URL: https://issues.apache.org/jira/browse/LUCENE-8962 > Project: Lucene - Core > Issue Type: Improvement > Components: core/index >Reporter: Michael McCandless >Priority: Major > Attachments: LUCENE-8962_demo.png > > Time Spent: 5h 20m > Remaining Estimate: 0h > > With near-real-time search we ask {{IndexWriter}} to write all in-memory > segments to disk and open an {{IndexReader}} to search them, and this is > typically a quick operation. > However, when you use many threads for concurrent indexing, {{IndexWriter}} > will accumulate write many small segments during {{refresh}} and this then > adds search-time cost as searching must visit all of these tiny segments. > The merge policy would normally quickly coalesce these small segments if > given a little time ... 
so, could we somehow improve {{IndexWriter'}}s > refresh to optionally kick off merge policy to merge segments below some > threshold before opening the near-real-time reader? It'd be a bit tricky > because while we are waiting for merges, indexing may continue, and new > segments may be flushed, but those new segments shouldn't be included in the > point-in-time segments returned by refresh ... > One could almost do this on top of Lucene today, with a custom merge policy, > and some hackity logic to have the merge policy target small segments just > written by refresh, but it's tricky to then open a near-real-time reader, > excluding newly flushed but including newly merged segments since the refresh > originally finished ... > I'm not yet sure how best to solve this, so I wanted to open an issue for > discussion! -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Comment Edited] (LUCENE-8962) Can we merge small segments during refresh, for faster searching?
[ https://issues.apache.org/jira/browse/LUCENE-8962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049504#comment-17049504 ] Michael Sokolov edited comment on LUCENE-8962 at 3/2/20 6:00 PM: - Tests seemed to pass consistently on master, so I pushed there, but now I cherry-picked to 8x branch and see consistent failures there: {{ [junit4] Tests with failures [seed: 5036466712CCD5FE]:}} {{ [junit4] - org.apache.lucene.index.TestIndexWriter.testThreadInterruptDeadlock}} {{ [junit4] - org.apache.lucene.index.TestIndexWriterMergePolicy.testMergeOnCommit}} was (Author: sokolov): Tests seemed to pass consistently on master, so I pushed there, but now I cherry-picked to 8x branch and see consistent failures there: ``` [junit4] Tests with failures [seed: 5036466712CCD5FE]: [junit4] - org.apache.lucene.index.TestIndexWriter.testThreadInterruptDeadlock [junit4] - org.apache.lucene.index.TestIndexWriterMergePolicy.testMergeOnCommit ``` > Can we merge small segments during refresh, for faster searching? > - > > Key: LUCENE-8962 > URL: https://issues.apache.org/jira/browse/LUCENE-8962 > Project: Lucene - Core > Issue Type: Improvement > Components: core/index >Reporter: Michael McCandless >Priority: Major > Attachments: LUCENE-8962_demo.png > > Time Spent: 5h 20m > Remaining Estimate: 0h > > With near-real-time search we ask {{IndexWriter}} to write all in-memory > segments to disk and open an {{IndexReader}} to search them, and this is > typically a quick operation. > However, when you use many threads for concurrent indexing, {{IndexWriter}} > will accumulate write many small segments during {{refresh}} and this then > adds search-time cost as searching must visit all of these tiny segments. > The merge policy would normally quickly coalesce these small segments if > given a little time ... 
so, could we somehow improve {{IndexWriter'}}s > refresh to optionally kick off merge policy to merge segments below some > threshold before opening the near-real-time reader? It'd be a bit tricky > because while we are waiting for merges, indexing may continue, and new > segments may be flushed, but those new segments shouldn't be included in the > point-in-time segments returned by refresh ... > One could almost do this on top of Lucene today, with a custom merge policy, > and some hackity logic to have the merge policy target small segments just > written by refresh, but it's tricky to then open a near-real-time reader, > excluding newly flushed but including newly merged segments since the refresh > originally finished ... > I'm not yet sure how best to solve this, so I wanted to open an issue for > discussion! -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8962) Can we merge small segments during refresh, for faster searching?
[ https://issues.apache.org/jira/browse/LUCENE-8962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049504#comment-17049504 ] Michael Sokolov commented on LUCENE-8962: - Tests seemed to pass consistently on master, so I pushed there, but now I cherry-picked to 8x branch and see consistent failures there: ``` [junit4] Tests with failures [seed: 5036466712CCD5FE]: [junit4] - org.apache.lucene.index.TestIndexWriter.testThreadInterruptDeadlock [junit4] - org.apache.lucene.index.TestIndexWriterMergePolicy.testMergeOnCommit ``` > Can we merge small segments during refresh, for faster searching? > - > > Key: LUCENE-8962 > URL: https://issues.apache.org/jira/browse/LUCENE-8962 > Project: Lucene - Core > Issue Type: Improvement > Components: core/index >Reporter: Michael McCandless >Priority: Major > Attachments: LUCENE-8962_demo.png > > Time Spent: 5h 20m > Remaining Estimate: 0h > > With near-real-time search we ask {{IndexWriter}} to write all in-memory > segments to disk and open an {{IndexReader}} to search them, and this is > typically a quick operation. > However, when you use many threads for concurrent indexing, {{IndexWriter}} > will accumulate write many small segments during {{refresh}} and this then > adds search-time cost as searching must visit all of these tiny segments. > The merge policy would normally quickly coalesce these small segments if > given a little time ... so, could we somehow improve {{IndexWriter'}}s > refresh to optionally kick off merge policy to merge segments below some > threshold before opening the near-real-time reader? It'd be a bit tricky > because while we are waiting for merges, indexing may continue, and new > segments may be flushed, but those new segments shouldn't be included in the > point-in-time segments returned by refresh ... 
> One could almost do this on top of Lucene today, with a custom merge policy, > and some hackity logic to have the merge policy target small segments just > written by refresh, but it's tricky to then open a near-real-time reader, > excluding newly flushed but including newly merged segments since the refresh > originally finished ... > I'm not yet sure how best to solve this, so I wanted to open an issue for > discussion! -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
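The threshold idea discussed in this issue — merge the tiny segments just flushed by refresh before opening the near-real-time reader — reduces to a selection step like the following sketch. Segment sizes are modeled as a plain long[], and the names are illustrative rather than Lucene's MergePolicy API:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative selection step for a "merge small segments on refresh/commit" policy.
class SmallSegmentMergeSketch {
    /** Indexes of segments worth merging before the refresh/commit returns. */
    static List<Integer> selectSmallSegments(long[] segmentSizes, long threshold) {
        List<Integer> candidates = new ArrayList<>();
        for (int i = 0; i < segmentSizes.length; i++) {
            if (segmentSizes[i] < threshold) candidates.add(i);
        }
        // A merge of a single segment is pointless; require at least two.
        return candidates.size() >= 2 ? candidates : new ArrayList<>();
    }
}
```

The hard part the issue describes is not this selection but the bookkeeping around it: waiting for the proposed merge while excluding segments flushed after the refresh began from the point-in-time view.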
[jira] [Commented] (LUCENE-9253) Support custom dictionaries in KoreanTokenizer
[ https://issues.apache.org/jira/browse/LUCENE-9253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049483#comment-17049483 ] ASF subversion and git services commented on LUCENE-9253: - Commit 0de87de90039032b595a44a0e31c0a7da5a4064d in lucene-solr's branch refs/heads/branch_8x from Namgyu Kim [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=0de87de ] LUCENE-9253: Support custom dictionaries in KoreanTokenizer Signed-off-by: Namgyu Kim > Support custom dictionaries in KoreanTokenizer > -- > > Key: LUCENE-9253 > URL: https://issues.apache.org/jira/browse/LUCENE-9253 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Namgyu Kim >Assignee: Namgyu Kim >Priority: Major > Time Spent: 1h 10m > Remaining Estimate: 0h > > KoreanTokenizer does not currently support custom dictionaries (system, > unknown), even though Nori provides a DictionaryBuilder that creates custom > dictionaries. > In the current state, it is very difficult for Nori users to use a custom > dictionary. > Therefore, we need to open a new constructor that accepts one. > Kuromoji already supports this (LUCENE-8971), and I referenced it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8962) Can we merge small segments during refresh, for faster searching?
[ https://issues.apache.org/jira/browse/LUCENE-8962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049458#comment-17049458 ] ASF subversion and git services commented on LUCENE-8962: - Commit 043c5dff6f44c9bb2415005ac97db3c2c561ab45 in lucene-solr's branch refs/heads/master from msfroh [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=043c5df ] LUCENE-8962: Add ability to selectively merge on commit (#1155) * LUCENE-8962: Add ability to selectively merge on commit This adds a new "findCommitMerges" method to MergePolicy, which can specify merges to be executed before the IndexWriter.prepareCommitInternal method returns. If we have many index writer threads, they will flush their DWPT buffers on commit, resulting in many small segments, which can be merged before the commit returns. * Add missing Javadoc * Fix incorrect comment * Refactoring and fix intermittent test failure 1. Made some changes to the callback to update toCommit, leveraging SegmentInfos.applyMergeChanges. 2. I realized that we'll never end up with 0 registered merges, because we throw an exception if we fail to register a merge. 3. Moved the IndexWriterEvents.beginMergeOnCommit notification to before we call MergeScheduler.merge, since we may not be merging on another thread. 4. There was an intermittent test failure due to randomness in the time it takes for merges to complete. Before doing the final commit, we wait for pending merges to finish. We may still end up abandoning the final merge, but we can detect that and assert that either the merge was abandoned (and we have > 1 segment) or we did merge down to 1 segment. * Fix typo * Fix/improve comments based on PR feedback * More comment improvements from PR feedback * Rename method and add new MergeTrigger 1. Renamed findCommitMerges -> findFullFlushMerges. 2. Added MergeTrigger.COMMIT, passed to findFullFlushMerges and to MergeScheduler when merging on commit. 
* Update renamed method name in strings and comments > Can we merge small segments during refresh, for faster searching? > - > > Key: LUCENE-8962 > URL: https://issues.apache.org/jira/browse/LUCENE-8962 > Project: Lucene - Core > Issue Type: Improvement > Components: core/index >Reporter: Michael McCandless >Priority: Major > Attachments: LUCENE-8962_demo.png > > Time Spent: 5h 20m > Remaining Estimate: 0h > > With near-real-time search we ask {{IndexWriter}} to write all in-memory > segments to disk and open an {{IndexReader}} to search them, and this is > typically a quick operation. > However, when you use many threads for concurrent indexing, {{IndexWriter}} > will accumulate many small segments during {{refresh}}, and this then > adds search-time cost because searching must visit all of these tiny segments. > The merge policy would normally quickly coalesce these small segments if > given a little time ... so, could we somehow improve {{IndexWriter}}'s > refresh to optionally kick off the merge policy to merge segments below some > threshold before opening the near-real-time reader? It'd be a bit tricky > because while we are waiting for merges, indexing may continue, and new > segments may be flushed, but those new segments shouldn't be included in the > point-in-time segments returned by refresh ... > One could almost do this on top of Lucene today, with a custom merge policy, > and some hackity logic to have the merge policy target small segments just > written by refresh, but it's tricky to then open a near-real-time reader, > excluding newly flushed but including newly merged segments since the refresh > originally finished ... > I'm not yet sure how best to solve this, so I wanted to open an issue for > discussion!
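The mechanism described in the commit message above can be sketched roughly as follows: a wrapper {{MergePolicy}} that, when triggered by a commit, collects the small segments just flushed and proposes merging them before the commit returns. This is a minimal illustration only; the hook name ({{findFullFlushMerges}}) and {{MergeTrigger.COMMIT}} follow the commit message, but the exact signatures and the size threshold are assumptions, not the committed Lucene source.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.index.FilterMergePolicy;
import org.apache.lucene.index.MergePolicy;
import org.apache.lucene.index.MergeTrigger;
import org.apache.lucene.index.SegmentCommitInfo;
import org.apache.lucene.index.SegmentInfos;

/**
 * Sketch of a merge policy that merges the many tiny segments produced by a
 * full flush before the commit returns. Signatures are assumptions modeled
 * on the LUCENE-8962 commit message.
 */
public class SmallSegmentCommitMergePolicy extends FilterMergePolicy {
  private final long smallSegmentBytes; // segments below this size are merged on commit

  public SmallSegmentCommitMergePolicy(MergePolicy in, long smallSegmentBytes) {
    super(in);
    this.smallSegmentBytes = smallSegmentBytes;
  }

  @Override
  public MergeSpecification findFullFlushMerges(MergeTrigger mergeTrigger,
                                                SegmentInfos segmentInfos,
                                                MergeContext mergeContext) throws IOException {
    // Only act on commit-triggered full flushes; other triggers fall through
    // to the wrapped policy's normal behavior.
    if (mergeTrigger != MergeTrigger.COMMIT) {
      return null;
    }
    List<SegmentCommitInfo> small = new ArrayList<>();
    for (SegmentCommitInfo sci : segmentInfos) {
      if (sci.sizeInBytes() < smallSegmentBytes) {
        small.add(sci);
      }
    }
    if (small.size() < 2) {
      return null; // nothing worth merging before the commit returns
    }
    MergeSpecification spec = new MergeSpecification();
    spec.add(new OneMerge(small)); // coalesce all the tiny flushed segments
    return spec;
  }
}
```

As the thread notes, the tricky part is not proposing these merges but the bookkeeping around them: the commit must wait for (or abandon) the proposed merges, and segments flushed by concurrently indexing threads after the flush must not leak into the committed point-in-time view.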
[GitHub] [lucene-solr] msokolov merged pull request #1155: LUCENE-8962: Add ability to selectively merge on commit
msokolov merged pull request #1155: LUCENE-8962: Add ability to selectively merge on commit URL: https://github.com/apache/lucene-solr/pull/1155
[jira] [Commented] (LUCENE-9253) Support custom dictionaries in KoreanTokenizer
[ https://issues.apache.org/jira/browse/LUCENE-9253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049452#comment-17049452 ] ASF subversion and git services commented on LUCENE-9253: - Commit b2dbd18f96e05478146a838108d096df7348d1b4 in lucene-solr's branch refs/heads/master from Namgyu Kim [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=b2dbd18 ] LUCENE-9253: Support custom dictionaries in KoreanTokenizer Signed-off-by: Namgyu Kim > Support custom dictionaries in KoreanTokenizer > -- > > Key: LUCENE-9253 > URL: https://issues.apache.org/jira/browse/LUCENE-9253 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Namgyu Kim >Assignee: Namgyu Kim >Priority: Major > Time Spent: 1h 10m > Remaining Estimate: 0h > > KoreanTokenizer does not currently support custom dictionaries (system, unknown), even though Nori provides a DictionaryBuilder that creates them. > In the current state, it is very difficult for Nori users to use a custom dictionary. > Therefore, we need to add a new constructor that accepts one. > Kuromoji already supports this (LUCENE-8971), and I used it as a reference.
[GitHub] [lucene-solr] danmuzi merged pull request #1296: LUCENE-9253: Support custom dictionaries in KoreanTokenizer
danmuzi merged pull request #1296: LUCENE-9253: Support custom dictionaries in KoreanTokenizer URL: https://github.com/apache/lucene-solr/pull/1296