[jira] [Commented] (SOLR-12833) Use timed-out lock in DistributedUpdateProcessor
[ https://issues.apache.org/jira/browse/SOLR-12833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16833207#comment-16833207 ]

jefferyyuan commented on SOLR-12833:

[~ichattopadhyaya] Please check the PR at [https://github.com/apache/lucene-solr/pull/663]

// this test checks the behavior of VersionBucket or TimedVersionBucket,
// it doesn't make sense if there is no updateLog, and thus no VersionBucket

> Use timed-out lock in DistributedUpdateProcessor
>
> Key: SOLR-12833
> URL: https://issues.apache.org/jira/browse/SOLR-12833
> Project: Solr
> Issue Type: Improvement
> Security Level: Public (Default Security Level. Issues are Public)
> Components: update, UpdateRequestProcessors
> Affects Versions: 7.5, 8.0
> Reporter: jefferyyuan
> Assignee: Mark Miller
> Priority: Blocker
> Fix For: 7.7, 8.0, 8.1
> Attachments: SOLR-12833-noint.patch, SOLR-12833.patch, SOLR-12833.patch, threadDump.txt
> Time Spent: 20m
> Remaining Estimate: 0h
>
> There is a synchronized block that blocks other update requests whose IDs fall in the same hash bucket. An update waits forever until it gets the lock at the synchronized block, which can be a problem in some cases.
>
> Some add/update requests (for example, updates with spatial/shape analysis) may take a long time (30+ seconds or even more), which would make the request time out and fail.
> The client may retry the same request multiple times or for several minutes, which makes things worse.
> The server side receives all the update requests, but all except one can do nothing and have to wait there. This wastes precious memory and CPU resources.
> We have seen cases where 2000+ threads were blocked at the synchronized lock while only a few updates were making progress. Each thread takes 3+ MB of memory, which causes OOM.
> Also, if the update can't get the lock in the expected time range, it's better to fail fast.
>
> We can have one configuration in solrconfig.xml, updateHandler/versionLock/timeInMill, so users can specify how long they want to wait for the version bucket lock.
> The default value can be -1, so it behaves the same as today: wait forever until it gets the lock.

-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
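The proposed behavior (a configurable wait, with -1 meaning "wait forever" as before) can be sketched roughly as below. This is an illustrative sketch only; `TimedBucketLock` and its method names are stand-ins, not the actual Solr classes from the PR.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Illustrative sketch of a timed version-bucket lock: a negative
// timeout preserves the old behavior (block until acquired); any
// non-negative value fails fast once the timeout elapses.
class TimedBucketLock {
    private final ReentrantLock lock = new ReentrantLock(true);

    /** Returns true if the lock was acquired within the timeout. */
    boolean lockForUpdate(long timeoutMs) {
        if (timeoutMs < 0) {   // -1: wait forever, the old default
            lock.lock();
            return true;
        }
        try {
            return lock.tryLock(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;      // treat interruption as a failed acquire
        }
    }

    void unlockForUpdate() {
        lock.unlock();
    }
}
```

A caller would then reject the update quickly (instead of piling up blocked threads) whenever `lockForUpdate` returns false.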
[jira] [Comment Edited] (SOLR-12833) Use timed-out lock in DistributedUpdateProcessor
[ https://issues.apache.org/jira/browse/SOLR-12833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16833207#comment-16833207 ]

jefferyyuan edited comment on SOLR-12833 at 5/5/19 3:55 AM:

[~ichattopadhyaya] Please check the PR at [https://github.com/apache/lucene-solr/pull/663]

With this test config, the solr cluster doesn't define an updateLog.

// this test checks the behavior of VersionBucket or TimedVersionBucket,
// it doesn't make sense if there is no updateLog, and thus no VersionBucket

was (Author: yuanyun.cn): [~ichattopadhyaya] Please check the PR at [https://github.com/apache/lucene-solr/pull/663] // this test checks the behavior of VersionBucket or TimedVersionBucket, // it doesn't make sense if there is no updateLog, and thus no VersionBucket
[jira] [Commented] (SOLR-12833) Use timed-out lock in DistributedUpdateProcessor
[ https://issues.apache.org/jira/browse/SOLR-12833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16832031#comment-16832031 ]

jefferyyuan commented on SOLR-12833:

[~ichattopadhyaya] The error was related to unreleased tracked resources. I pulled the latest code and ran it again multiple times (just finished another run); all passed with no problem.

[beaster] Beast round 50 results: /Users/jyuan/apple/code-new/apple/solr/lucene-solr/solr/build/solr-core/test/50
[beaster] Beasting finished Successfully.

- I meant to delete that part, but didn't delete all of it : )
[jira] [Comment Edited] (SOLR-12833) Use timed-out lock in DistributedUpdateProcessor
[ https://issues.apache.org/jira/browse/SOLR-12833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16831976#comment-16831976 ]

jefferyyuan edited comment on SOLR-12833 at 5/2/19 9:52 PM:

[~ab] [~ichattopadhyaya] Please check the PR at [https://github.com/apache/lucene-solr/pull/661] - I saw Ishan already committed the change, so just ignore this : )

I ran "ant beast -Dbeast.iters=50 -Dtestcase=PeerSyncTest -Dtests.method=test -Dtests.slow=true -Dtests.badapples=true -Dtests.asserts=true" locally multiple times; the problem/error is gone.

was (Author: yuanyun.cn): [~ab] [~ichattopadhyaya] Please check the PR at [https://github.com/apache/lucene-solr/pull/661] - I saw Ishan already committed the change, so just ignore this : ) I ran "ant beast -Dbeast.iters=50 -Dtestcase=PeerSyncTest -Dtests.method=test -Dtests.slow=true -Dtests.badapples=true -Dtests.asserts=true" locally multiple times; the problem/error is gone, but I saw some errors: not sure whether they were related/expected or...
[jira] [Commented] (SOLR-12833) Use timed-out lock in DistributedUpdateProcessor
[ https://issues.apache.org/jira/browse/SOLR-12833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16831976#comment-16831976 ]

jefferyyuan commented on SOLR-12833:

[~ab] [~ichattopadhyaya] Sorry for the bug. Please check the PR at [https://github.com/apache/lucene-solr/pull/661] - I saw Ishan already committed the change, so just ignore this : )

I ran "ant beast -Dbeast.iters=50 -Dtestcase=PeerSyncTest -Dtests.method=test -Dtests.slow=true -Dtests.badapples=true -Dtests.asserts=true" locally multiple times; the problem/error is gone, but I saw some errors: not sure whether they were related/expected or...
[jira] [Comment Edited] (SOLR-12833) Use timed-out lock in DistributedUpdateProcessor
[ https://issues.apache.org/jira/browse/SOLR-12833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16831976#comment-16831976 ]

jefferyyuan edited comment on SOLR-12833 at 5/2/19 9:11 PM:

[~ab] [~ichattopadhyaya] Please check the PR at [https://github.com/apache/lucene-solr/pull/661] - I saw Ishan already committed the change, so just ignore this : )

I ran "ant beast -Dbeast.iters=50 -Dtestcase=PeerSyncTest -Dtests.method=test -Dtests.slow=true -Dtests.badapples=true -Dtests.asserts=true" locally multiple times; the problem/error is gone, but I saw some errors: not sure whether they were related/expected or...

was (Author: yuanyun.cn): [~ab] [~ichattopadhyaya] Sorry for the bug. Please check the PR at [https://github.com/apache/lucene-solr/pull/661] - I saw Ishan already committed the change, so just ignore this : ) I ran "ant beast -Dbeast.iters=50 -Dtestcase=PeerSyncTest -Dtests.method=test -Dtests.slow=true -Dtests.badapples=true -Dtests.asserts=true" locally multiple times; the problem/error is gone, but I saw some errors: not sure whether they were related/expected or...
[jira] [Comment Edited] (SOLR-12833) Use timed-out lock in DistributedUpdateProcessor
[ https://issues.apache.org/jira/browse/SOLR-12833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16831976#comment-16831976 ]

jefferyyuan edited comment on SOLR-12833 at 5/2/19 9:10 PM:

[~ab] [~ichattopadhyaya] Sorry for the bug. Please check the PR at [https://github.com/apache/lucene-solr/pull/661] - I saw Ishan already committed the change, so just ignore this : )

I ran "ant beast -Dbeast.iters=50 -Dtestcase=PeerSyncTest -Dtests.method=test -Dtests.slow=true -Dtests.badapples=true -Dtests.asserts=true" locally multiple times; the problem/error is gone, but I saw some errors: not sure whether they were related/expected or...

was (Author: yuanyun.cn): [~ab] [~ichattopadhyaya] Sorry for the bug. Please check the PR at [https://github.com/apache/lucene-solr/pull/661] - I saw Ishan already committed the change, so just ignore this : ) I ran "ant beast -Dbeast.iters=50 -Dtestcase=PeerSyncTest -Dtests.method=test -Dtests.slow=true -Dtests.badapples=true -Dtests.asserts=true" locally multiple times; the problem/error is gone, but I saw some errors: not sure whether they were related/expected or...
[jira] [Commented] (SOLR-12833) Use timed-out lock in DistributedUpdateProcessor
[ https://issues.apache.org/jira/browse/SOLR-12833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16831922#comment-16831922 ]

jefferyyuan commented on SOLR-12833:

[~ichattopadhyaya] [~aivanise] Found the issue; I will create a PR in a second. It's related to: wait(TimeUnit.NANOSECONDS.toMillis(nanosTimeout)) - when converting nanoseconds to milliseconds, the result may be 0, which would cause the wait to block forever.
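The pitfall described in that comment is easy to reproduce: TimeUnit.NANOSECONDS.toMillis truncates, so any remaining timeout under 1 ms becomes 0, and Object.wait(0) means "wait indefinitely". A small standalone sketch (the class and method names are illustrative, not the code from the PR):

```java
import java.util.concurrent.TimeUnit;

public class NanosWaitPitfall {
    // Waits on mon for roughly nanosTimeout, avoiding the trap where
    // toMillis truncates a sub-millisecond timeout to 0 and wait(0)
    // then blocks forever. Returns false if interrupted.
    static boolean boundedWait(Object mon, long nanosTimeout) {
        long millis = TimeUnit.NANOSECONDS.toMillis(nanosTimeout); // 500_000 ns -> 0 !
        int nanosRemainder = (int) (nanosTimeout % 1_000_000L);
        synchronized (mon) {
            try {
                // BUG would be: mon.wait(millis)  -- waits forever when millis == 0.
                // Passing the sub-millisecond remainder keeps the wait bounded.
                mon.wait(millis, nanosRemainder);
                return true;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(TimeUnit.NANOSECONDS.toMillis(500_000L)); // prints 0
        boundedWait(new Object(), 500_000L); // returns quickly instead of hanging
        System.out.println("bounded wait returned");
    }
}
```

Per the Object.wait(long, int) contract, a nonzero nanos argument rounds the millisecond timeout up, so wait(0, 500000) is a short bounded wait rather than an infinite one.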
[jira] [Commented] (SOLR-12833) Use timed-out lock in DistributedUpdateProcessor
[ https://issues.apache.org/jira/browse/SOLR-12833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16831786#comment-16831786 ]

jefferyyuan commented on SOLR-12833:

Thanks for the info [~ab], I am checking it now and will focus on this today.
[jira] [Commented] (SOLR-12833) Use timed-out lock in DistributedUpdateProcessor
[ https://issues.apache.org/jira/browse/SOLR-12833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16825415#comment-16825415 ]

jefferyyuan commented on SOLR-12833:

[~ab] [~markrmil...@gmail.com] I cleaned up the code and added test cases; please check the PR: [https://github.com/apache/lucene-solr/pull/641/files]
* all the doXXX methods assume they already own the lock (either the intrinsic monitor or the lock object) and unlock it in the finally block.
* their caller calls vinfo.lockForUpdate() before and vinfo.unlockForUpdate() in the finally block.
* so it's clear who owns the lock and should release it: symmetric : )
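The ownership convention in those bullets can be sketched as follows; the classes and method names here are illustrative stand-ins for the Solr code, not the actual implementation:

```java
import java.util.concurrent.locks.ReentrantLock;

// Sketch of symmetric lock ownership: whoever acquires a lock is the
// one that releases it, always in a finally block.
class UpdateSketch {
    private final ReentrantLock vinfoLock = new ReentrantLock();
    private final ReentrantLock bucketLock = new ReentrantLock();

    // Caller-level pattern: lock before the call, unlock in finally.
    void versionedAdd() {
        vinfoLock.lock();            // stands in for vinfo.lockForUpdate()
        try {
            bucketLock.lock();       // acquire the bucket lock, then hand
            doLocalAdd();            // ownership to the doXXX method
        } finally {
            vinfoLock.unlock();      // stands in for vinfo.unlockForUpdate()
        }
    }

    // The doXXX method assumes it already owns the bucket lock and
    // releases it in its own finally block, keeping acquire/release paired.
    private void doLocalAdd() {
        try {
            // ... apply the update while holding the bucket lock ...
        } finally {
            bucketLock.unlock();
        }
    }

    boolean locksFree() {
        return !vinfoLock.isLocked() && !bucketLock.isLocked();
    }
}
```

The benefit of the convention is that every lock is released on every path, including exceptional ones, and a reader can verify pairing locally within each method.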
[jira] [Comment Edited] (SOLR-12833) Use timed-out lock in DistributedUpdateProcessor
[ https://issues.apache.org/jira/browse/SOLR-12833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16824565#comment-16824565 ]

jefferyyuan edited comment on SOLR-12833 at 4/23/19 10:02 PM:

[~ab] [~markrmil...@gmail.com] Thanks for your update, and sorry for the late response. I have updated the PR with two implementations of VersionBucket: TimedVersionBucket. Please check [https://github.com/apache/lucene-solr/pull/641/files]
* The code is not cleaned up yet and is just a proof of concept.
* If the approach looks good to you, I will clean up and improve the code.
* The change will also make the code a little cleaner: smaller methods : )
Thanks.

was (Author: yuanyun.cn): [~ab] [~markrmil...@gmail.com] Thanks for your update, and sorry for the late response. I have updated the PR with two implementations of VersionBucket: TimedVersionBucket. Please check [https://github.com/apache/lucene-solr/pull/641/files]
* The code is not cleaned up yet and is just a proof of concept.
* If the approach looks good to you, I will clean up and improve the code.
Thanks.
[jira] [Commented] (SOLR-12833) Use timed-out lock in DistributedUpdateProcessor
[ https://issues.apache.org/jira/browse/SOLR-12833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16824565#comment-16824565 ]

jefferyyuan commented on SOLR-12833:

[~ab] [~markrmil...@gmail.com] Thanks for your update, and sorry for the late response. I have updated the PR with two implementations of VersionBucket: TimedVersionBucket. Please check [https://github.com/apache/lucene-solr/pull/641/files]
* The code is not cleaned up yet and is just a proof of concept.
* If the approach looks good to you, I will clean up and improve the code.
Thanks.
[jira] [Commented] (SOLR-12833) Use timed-out lock in DistributedUpdateProcessor
[ https://issues.apache.org/jira/browse/SOLR-12833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16815081#comment-16815081 ] jefferyyuan commented on SOLR-12833: [~ab] Based on your suggestion, I removed versionBucketLockTimeoutMs from VersionBucket. [https://github.com/apache/lucene-solr/pull/641] Thanks.
[jira] [Commented] (SOLR-12833) Use timed-out lock in DistributedUpdateProcessor
[ https://issues.apache.org/jira/browse/SOLR-12833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16814993#comment-16814993 ] jefferyyuan commented on SOLR-12833: [~ab] I will create the PR by the end of tomorrow. Thanks.
[jira] [Commented] (SOLR-12833) Use timed-out lock in DistributedUpdateProcessor
[ https://issues.apache.org/jira/browse/SOLR-12833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16810203#comment-16810203 ] jefferyyuan commented on SOLR-12833: [~ab] vinfo.lockForUpdate() uses readLock().lock(), so multiple threads can still execute versionAdd and versionDelete simultaneously. The read/write lock in VersionInfo is used to make sure no update comes in while Solr is doing recovery or switching the tlog, etc. The problem we are trying to solve here is that when users update docs in the same bucket and the update takes time, only the first gets processed; all other updates on the same bucket have to wait, and these threads pile up, eventually causing OOM or leaving Solr unable to handle other requests because all threads are used up. This is even worse when clients retry updates (for example, in a cross-DC environment the consumer will re-execute the commands multiple times if they fail). By default, customers don't enable this feature; if a customer hits OOM and finds that a lot of threads are waiting for the lock on a VersionBucket, they can enable this feature to make the Solr cluster more stable: fail fast. We added the test at [https://github.com/apache/lucene-solr/pull/463/files#diff-7b816a919f7a0caf8119a684a3e71c84], but to make the method testable, we needed to change the code: the tryLockElseThrow method in DistributedUpdateProcessor. We can definitely re-add the test.
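The read-lock point above can be shown with a standalone model (not the Solr code itself): lockForUpdate() takes the READ side of VersionInfo's read/write lock, so many update threads can hold it at once, and it only blocks when recovery or a tlog switch takes the write lock.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Standalone model of VersionInfo's read/write lock usage: readers (update
// threads) do not exclude each other; only a writer (recovery/tlog switch)
// blocks them.
class ReadLockDemo {
    static final ReentrantReadWriteLock rw = new ReentrantReadWriteLock();

    // Returns true if a second thread can acquire the read lock while the
    // first thread is still holding it.
    static boolean twoReadersAtOnce() {
        rw.readLock().lock();                    // first "update" thread
        try {
            final boolean[] ok = {false};
            Thread t = new Thread(() -> {
                ok[0] = rw.readLock().tryLock(); // not blocked by the reader
                if (ok[0]) {
                    rw.readLock().unlock();
                }
            });
            t.start();
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return ok[0];
        } finally {
            rw.readLock().unlock();
        }
    }
}
```

This is why the read/write lock alone cannot provide the per-bucket fail-fast behavior discussed in this issue: read acquisitions never contend with each other.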
[jira] [Commented] (SOLR-12833) Use timed-out lock in DistributedUpdateProcessor
[ https://issues.apache.org/jira/browse/SOLR-12833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16809012#comment-16809012 ] jefferyyuan commented on SOLR-12833: Thanks for the findings, [~ab], [~markrmil...@gmail.com]. As you said, we don't need versionBucketLockTimeoutMs in every VersionBucket; I can create one PR to remove it from VersionBucket. One approach to fix this problem: * If there is not a lot of contention on the same version bucket and updates usually finish fast, customers don't specify a versionBucketLockTimeoutMs value; then we use the old VersionBucket, which has no Lock or Condition objects, and its lock, signalAll, and awaitNanos methods keep working the old way, using the intrinsic monitor. * If there is a lot of contention on the same version bucket and updates (like geo-related updates) take time, the customer can specify versionBucketLockTimeoutMs explicitly; then we create and use another class, TimedVersionBucket, that extends VersionBucket and uses a Lock and Condition, so only one update on the same bucket actually goes forward and gets processed, and other updates fail fast.
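The timed variant described above can be sketched as follows. This is a hypothetical sketch, not the actual Solr patch: the class name follows the comment, but the method names and signatures here are illustrative, and the real classes differ in detail.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

// Hypothetical sketch of a timed version bucket: a ReentrantLock plus
// Condition, failing fast when the lock cannot be acquired within
// versionBucketLockTimeoutMs.
class TimedVersionBucketSketch {
    private final ReentrantLock lock = new ReentrantLock();
    // In the real class, awaitNanos/signalAll would go through this Condition.
    private final Condition condition = lock.newCondition();
    private final long timeoutMs;

    TimedVersionBucketSketch(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    // Runs the update while holding the bucket lock, or throws if the lock
    // cannot be obtained in time (the "fail fast" behavior).
    <T> T runWithLock(Supplier<T> update) {
        boolean acquired = false;
        try {
            acquired = lock.tryLock(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        if (!acquired) {
            throw new IllegalStateException(
                "Unable to get version bucket lock in " + timeoutMs + " ms");
        }
        try {
            return update.get();
        } finally {
            lock.unlock();
        }
    }
}
```

The design choice mirrors the two-bucket split above: the plain bucket keeps the cheap intrinsic monitor, and only users who opt in by setting a timeout pay the (small) cost of an explicit Lock/Condition.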
[jira] [Resolved] (SOLR-13328) HostnameVerifier in HttpClientBuilder is ignored when HttpClientUtil creates connection
[ https://issues.apache.org/jira/browse/SOLR-13328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jefferyyuan resolved SOLR-13328. Resolution: Not A Problem > HostnameVerifier in HttpClientBuilder is ignored when HttpClientUtil creates > connection > --- > > Key: SOLR-13328 > URL: https://issues.apache.org/jira/browse/SOLR-13328 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: clients - java >Affects Versions: 8.0 >Reporter: jefferyyuan >Priority: Minor > Fix For: 8.0.1, 8.1 > > > In SolrHttpClientBuilder, we can configure a lot of things, including > HostnameVerifier. > We have code like the following: > HttpClientUtil.setHttpClientBuilder(new CommonNameVerifierClientConfigurer()); > CommonNameVerifierClientConfigurer sets our own HostnameVerifier, which > checks the subject DN name. > But this doesn't work: when we create the SSLConnectionSocketFactory in > HttpClientUtil.DefaultSchemaRegistryProvider.getSchemaRegistry(), we don't > check or use the HostnameVerifier in SolrHttpClientBuilder at all. > The fix would be very simple: in > HttpClientUtil.DefaultSchemaRegistryProvider.getSchemaRegistry, if the > HostnameVerifier in SolrHttpClientBuilder is not null, use it; otherwise keep the same > logic as before.
[jira] [Comment Edited] (SOLR-13328) HostnameVerifier in HttpClientBuilder is ignored when HttpClientUtil creates connection
[ https://issues.apache.org/jira/browse/SOLR-13328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16793981#comment-16793981 ] jefferyyuan edited comment on SOLR-13328 at 3/15/19 10:08 PM: -- We are using the latest Solr 7, but it seems Solr 8 removed HostnameVerifier from SolrHttpClientBuilder, so this Jira no longer applies. was (Author: yuanyun.cn): Seems Solr 8 removes HostnameVerifier from SolrHttpClientBuilder, so this Jira doesn't apply any more.
[jira] [Commented] (SOLR-13328) HostnameVerifier in HttpClientBuilder is ignored when HttpClientUtil creates connection
[ https://issues.apache.org/jira/browse/SOLR-13328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16793981#comment-16793981 ] jefferyyuan commented on SOLR-13328: Seems Solr 8 removes HostnameVerifier from SolrHttpClientBuilder, so this Jira doesn't apply any more.
[jira] [Created] (SOLR-13328) HostnameVerifier in HttpClientBuilder is ignored when HttpClientUtil creates connection
jefferyyuan created SOLR-13328: -- Summary: HostnameVerifier in HttpClientBuilder is ignored when HttpClientUtil creates connection Key: SOLR-13328 URL: https://issues.apache.org/jira/browse/SOLR-13328 Project: Solr Issue Type: Improvement Security Level: Public (Default Security Level. Issues are Public) Components: clients - java Affects Versions: 8.0 Reporter: jefferyyuan Fix For: 8.0.1, 8.1
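The proposed fix is a one-line fallback. The sketch below is a minimal, hypothetical model of that logic (the `choose` helper and class name are illustrative, not Solr code); the real change would live where `SSLConnectionSocketFactory` is built in HttpClientUtil.

```java
import javax.net.ssl.HostnameVerifier;

// Minimal sketch of the proposed fix: prefer the HostnameVerifier configured
// on SolrHttpClientBuilder, fall back to the previous default otherwise.
class VerifierSelection {
    static HostnameVerifier choose(HostnameVerifier fromBuilder,
                                   HostnameVerifier previousDefault) {
        return fromBuilder != null ? fromBuilder : previousDefault;
    }
}
```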
[jira] [Commented] (LUCENE-8662) Change TermsEnum.seekExact(BytesRef) to abstract + delegate seekExact(BytesRef) in FilterLeafReader.FilterTermsEnum
[ https://issues.apache.org/jira/browse/LUCENE-8662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16763028#comment-16763028 ] jefferyyuan commented on LUCENE-8662: - BTW, the performance (OOM) issues caused by seekExact happened when we searched or committed with the offending ids. Even worse, it also happened when Solr recovered and tried to replay a transaction log that included the offending ids. The recovery would fail and that Solr node would never come online - we had to delete it and recreate the replica. > Change TermsEnum.seekExact(BytesRef) to abstract + delegate > seekExact(BytesRef) in FilterLeafReader.FilterTermsEnum > --- > > Key: LUCENE-8662 > URL: https://issues.apache.org/jira/browse/LUCENE-8662 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: 5.5.5, 6.6.5, 7.6, 8.0 >Reporter: jefferyyuan >Priority: Major > Labels: query > Fix For: 8.0, 7.7 > > Attachments: output of test program.txt > > Time Spent: 50m > Remaining Estimate: 0h > > Recently in our production, we found that Solr uses a lot of memory (more than > 10 GB) during recovery or commit for a small index (3.5 GB). > The stack trace is: > > {code:java} > Thread 0x4d4b115c0 > at org.apache.lucene.store.DataInput.readVInt()I (DataInput.java:125) > at org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.loadBlock()V > (SegmentTermsEnumFrame.java:157) > at > org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.scanToTermNonLeaf(Lorg/apache/lucene/util/BytesRef;Z)Lorg/apache/lucene/index/TermsEnum$SeekStatus; > (SegmentTermsEnumFrame.java:786) > at > org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.scanToTerm(Lorg/apache/lucene/util/BytesRef;Z)Lorg/apache/lucene/index/TermsEnum$SeekStatus; > (SegmentTermsEnumFrame.java:538) > at > org.apache.lucene.codecs.blocktree.SegmentTermsEnum.seekCeil(Lorg/apache/lucene/util/BytesRef;)Lorg/apache/lucene/index/TermsEnum$SeekStatus; > 
(SegmentTermsEnum.java:757) > at > org.apache.lucene.index.FilterLeafReader$FilterTermsEnum.seekCeil(Lorg/apache/lucene/util/BytesRef;)Lorg/apache/lucene/index/TermsEnum$SeekStatus; > (FilterLeafReader.java:185) > at > org.apache.lucene.index.TermsEnum.seekExact(Lorg/apache/lucene/util/BytesRef;)Z > (TermsEnum.java:74) > at > org.apache.solr.search.SolrIndexSearcher.lookupId(Lorg/apache/lucene/util/BytesRef;)J > (SolrIndexSearcher.java:823) > at > org.apache.solr.update.VersionInfo.getVersionFromIndex(Lorg/apache/lucene/util/BytesRef;)Ljava/lang/Long; > (VersionInfo.java:204) > at > org.apache.solr.update.UpdateLog.lookupVersion(Lorg/apache/lucene/util/BytesRef;)Ljava/lang/Long; > (UpdateLog.java:786) > at > org.apache.solr.update.VersionInfo.lookupVersion(Lorg/apache/lucene/util/BytesRef;)Ljava/lang/Long; > (VersionInfo.java:194) > at > org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(Lorg/apache/solr/update/AddUpdateCommand;)Z > (DistributedUpdateProcessor.java:1051) > {code} > We reproduced the problem locally with the following code using Lucene code. > {code:java} > public static void main(String[] args) throws IOException { > FSDirectory index = FSDirectory.open(Paths.get("the-index")); > try (IndexReader reader = new > ExitableDirectoryReader(DirectoryReader.open(index), > new QueryTimeoutImpl(1000 * 60 * 5))) { > String id = "the-id"; > BytesRef text = new BytesRef(id); > for (LeafReaderContext lf : reader.leaves()) { > TermsEnum te = lf.reader().terms("id").iterator(); > System.out.println(te.seekExact(text)); > } > } > } > {code} > > I added System.out.println("ord: " + ord); in > codecs.blocktree.SegmentTermsEnum.getFrame(int). > Please check the attached output of test program.txt. > > We found out the root cause: > we didn't implement the seekExact(BytesRef) method in > FilterLeafReader.FilterTermsEnum, so it uses the base class > TermsEnum.seekExact(BytesRef) implementation, which is very inefficient in > this case. 
> {code:java} > public boolean seekExact(BytesRef text) throws IOException { > return seekCeil(text) == SeekStatus.FOUND; > } > {code} > The fix is simple: just override the seekExact(BytesRef) method in > FilterLeafReader.FilterTermsEnum: > {code:java} > @Override > public boolean seekExact(BytesRef text) throws IOException { > return in.seekExact(text); > } > {code}
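The delegation bug above can be demonstrated without Lucene. The toy model below uses illustrative class names (not the real Lucene types): a wrapper that forgets to override seekExact silently falls back to the base class's seekCeil-based implementation, so the delegate's optimized path is never reached; delegating seekExact restores it.

```java
// Toy model of the LUCENE-8662 bug and fix.
abstract class BaseTermsEnumModel {
    abstract String seekCeil(String term);   // the expensive scan in Lucene

    boolean seekExact(String term) {         // default: implemented via seekCeil
        return seekCeil(term).equals(term);
    }
}

class FastTermsEnumModel extends BaseTermsEnumModel {
    boolean fastPathUsed = false;

    @Override
    String seekCeil(String term) {
        return term;
    }

    @Override
    boolean seekExact(String term) {         // optimized direct lookup
        fastPathUsed = true;
        return true;
    }
}

class FilterTermsEnumModel extends BaseTermsEnumModel {
    final BaseTermsEnumModel in;

    FilterTermsEnumModel(BaseTermsEnumModel in) {
        this.in = in;
    }

    @Override
    String seekCeil(String term) {
        return in.seekCeil(term);
    }

    // The fix: delegate seekExact too, so the wrapped enum's optimized
    // implementation is used instead of the seekCeil fallback.
    @Override
    boolean seekExact(String term) {
        return in.seekExact(term);
    }
}
```

Without the seekExact override in the filter class, calling seekExact on the wrapper would route through seekCeil and the delegate's fast path would never run - which is exactly the blowup seen in the stack trace above.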
[jira] [Updated] (LUCENE-8662) Change TermsEnum.seekExact(BytesRef) to abstract + delegate seekExact(BytesRef) in FilterLeafReader.FilterTermsEnum
[ https://issues.apache.org/jira/browse/LUCENE-8662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jefferyyuan updated LUCENE-8662: Summary: Change TermsEnum.seekExact(BytesRef) to abstract + delegate seekExact(BytesRef) in FilterLeafReader.FilterTermsEnum (was: Change TermsEnum.seekExact(BytesRef) to abstract)
[jira] [Commented] (LUCENE-8662) Change TermsEnum.seekExact(BytesRef) to abstract
[ https://issues.apache.org/jira/browse/LUCENE-8662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16758746#comment-16758746 ] jefferyyuan commented on LUCENE-8662: - [~simonw] [~dsmiley] addressed your comments in the PR and thanks : )
[jira] [Updated] (LUCENE-8662) Change TermsEnum.seekExact(BytesRef) to abstract
[ https://issues.apache.org/jira/browse/LUCENE-8662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jefferyyuan updated LUCENE-8662: Description: Recently in our production, we found that Solr uses a lot of memory (more than 10 GB) during recovery or commit for a small index (3.5 GB). The stack trace is:
{code:java}
Thread 0x4d4b115c0
at org.apache.lucene.store.DataInput.readVInt()I (DataInput.java:125)
at org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.loadBlock()V (SegmentTermsEnumFrame.java:157)
at org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.scanToTermNonLeaf(Lorg/apache/lucene/util/BytesRef;Z)Lorg/apache/lucene/index/TermsEnum$SeekStatus; (SegmentTermsEnumFrame.java:786)
at org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.scanToTerm(Lorg/apache/lucene/util/BytesRef;Z)Lorg/apache/lucene/index/TermsEnum$SeekStatus; (SegmentTermsEnumFrame.java:538)
at org.apache.lucene.codecs.blocktree.SegmentTermsEnum.seekCeil(Lorg/apache/lucene/util/BytesRef;)Lorg/apache/lucene/index/TermsEnum$SeekStatus; (SegmentTermsEnum.java:757)
at org.apache.lucene.index.FilterLeafReader$FilterTermsEnum.seekCeil(Lorg/apache/lucene/util/BytesRef;)Lorg/apache/lucene/index/TermsEnum$SeekStatus; (FilterLeafReader.java:185)
at org.apache.lucene.index.TermsEnum.seekExact(Lorg/apache/lucene/util/BytesRef;)Z (TermsEnum.java:74)
at org.apache.solr.search.SolrIndexSearcher.lookupId(Lorg/apache/lucene/util/BytesRef;)J (SolrIndexSearcher.java:823)
at org.apache.solr.update.VersionInfo.getVersionFromIndex(Lorg/apache/lucene/util/BytesRef;)Ljava/lang/Long; (VersionInfo.java:204)
at org.apache.solr.update.UpdateLog.lookupVersion(Lorg/apache/lucene/util/BytesRef;)Ljava/lang/Long; (UpdateLog.java:786)
at org.apache.solr.update.VersionInfo.lookupVersion(Lorg/apache/lucene/util/BytesRef;)Ljava/lang/Long; (VersionInfo.java:194)
at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(Lorg/apache/solr/update/AddUpdateCommand;)Z (DistributedUpdateProcessor.java:1051)
{code}
We reproduced the problem locally with the following code using the Lucene API:
{code:java}
public static void main(String[] args) throws IOException {
  FSDirectory index = FSDirectory.open(Paths.get("the-index"));
  try (IndexReader reader = new ExitableDirectoryReader(DirectoryReader.open(index),
      new QueryTimeoutImpl(1000 * 60 * 5))) {
    String id = "the-id";
    BytesRef text = new BytesRef(id);
    for (LeafReaderContext lf : reader.leaves()) {
      TermsEnum te = lf.reader().terms("id").iterator();
      System.out.println(te.seekExact(text));
    }
  }
}
{code}
I added System.out.println("ord: " + ord); in codecs.blocktree.SegmentTermsEnum.getFrame(int); please check the attached output of test program.txt.
We found the root cause: seekExact(BytesRef) is not overridden in FilterLeafReader.FilterTermsEnum, so it falls back to the base-class TermsEnum.seekExact(BytesRef) implementation, which is very inefficient in this case:
{code:java}
public boolean seekExact(BytesRef text) throws IOException {
  return seekCeil(text) == SeekStatus.FOUND;
}
{code}
The fix is simple: override seekExact(BytesRef) in FilterLeafReader.FilterTermsEnum to delegate to the wrapped enum:
{code:java}
@Override
public boolean seekExact(BytesRef text) throws IOException {
  return in.seekExact(text);
}
{code}
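The cost of the missing override can be sketched in plain Java. The classes below (MiniTermsEnum, SegmentEnum, BrokenFilter, FixedFilter) are hypothetical stand-ins for the Lucene types, not the real API; they only model how a filter that forwards seekCeil but not seekExact silently routes exact lookups through the expensive path:

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for TermsEnum: the default seekExact falls back to
// seekCeil, mirroring TermsEnum.seekExact(BytesRef) at TermsEnum.java:74.
abstract class MiniTermsEnum {
    public boolean seekExact(String term) {
        return seekCeil(term);
    }
    public abstract boolean seekCeil(String term);
}

// Stand-in for the segment-level enum, which has a cheap exact lookup.
class SegmentEnum extends MiniTermsEnum {
    final Set<String> terms;
    int ceilCalls = 0;   // counts trips through the expensive seekCeil path
    int exactCalls = 0;  // counts trips through the cheap seekExact path
    SegmentEnum(Collection<String> t) { terms = new HashSet<>(t); }
    @Override public boolean seekCeil(String term) { ceilCalls++; return terms.contains(term); }
    @Override public boolean seekExact(String term) { exactCalls++; return terms.contains(term); }
}

// Models the bug: only seekCeil is forwarded, so the inherited seekExact
// default runs and the wrapped enum's fast seekExact is never used.
class BrokenFilter extends MiniTermsEnum {
    final SegmentEnum in;
    BrokenFilter(SegmentEnum in) { this.in = in; }
    @Override public boolean seekCeil(String term) { return in.seekCeil(term); }
}

// Models the fix: seekExact is also delegated to the wrapped enum.
class FixedFilter extends BrokenFilter {
    FixedFilter(SegmentEnum in) { super(in); }
    @Override public boolean seekExact(String term) { return in.seekExact(term); }
}

public class SeekExactDemo {
    public static void main(String[] args) {
        SegmentEnum seg = new SegmentEnum(List.of("a", "b", "id-42"));
        new BrokenFilter(seg).seekExact("id-42");
        System.out.println("broken: ceilCalls=" + seg.ceilCalls + " exactCalls=" + seg.exactCalls);
        new FixedFilter(seg).seekExact("id-42");
        System.out.println("fixed: exactCalls=" + seg.exactCalls);
    }
}
```

Running the demo shows the broken filter increments only ceilCalls while the fixed one reaches the wrapped enum's seekExact, which is the whole point of the one-line patch.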
[jira] [Commented] (LUCENE-8662) Change TermsEnum.seekExact(BytesRef) to abstract
[ https://issues.apache.org/jira/browse/LUCENE-8662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16755323#comment-16755323 ] jefferyyuan commented on LUCENE-8662: Thanks for the comments and suggestions. Changed TermsEnum.seekExact(BytesRef) to abstract; where needed, all subclasses call the default implementation for now. https://github.com/apache/lucene-solr/pull/551/files#diff-bdfed242b7c2c62e7df628f47532dfd9 Maybe we can check which subclasses should have their own implementation of seekExact for the sake of better performance, and change those in another PR(s).
> Change TermsEnum.seekExact(BytesRef) to abstract
>
> Key: LUCENE-8662
> URL: https://issues.apache.org/jira/browse/LUCENE-8662
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/search
> Affects Versions: 5.5.5, 6.6.5, 7.6, 8.0
> Reporter: jefferyyuan
> Priority: Major
> Labels: query
> Fix For: 8.0, 7.7
> Attachments: output of test program.txt
> Time Spent: 10m
> Remaining Estimate: 0h
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
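The shape of the change discussed in the comment above — making seekExact abstract while keeping the old seekCeil-based behavior available for subclasses to call explicitly — can be sketched as follows. BaseTermsEnum, seekExactViaCeil, and LegacyEnum are hypothetical names for illustration, not the actual Lucene patch:

```java
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical base class: seekExact is now abstract, so every subclass must
// make an explicit choice instead of silently inheriting the slow fallback.
abstract class BaseTermsEnum {
    public abstract boolean seekCeil(String term);   // stand-in membership check
    public abstract boolean seekExact(String term);  // abstract: no silent default
    // The previous default implementation, preserved as a helper that
    // subclasses can opt into deliberately.
    protected final boolean seekExactViaCeil(String term) {
        return seekCeil(term);
    }
}

// A subclass that keeps the old behavior, but now does so visibly.
class LegacyEnum extends BaseTermsEnum {
    private final NavigableSet<String> terms = new TreeSet<>(List.of("apple", "banana"));
    @Override public boolean seekCeil(String term) { return terms.contains(term); }
    @Override public boolean seekExact(String term) { return seekExactViaCeil(term); }
}

public class AbstractSeekDemo {
    public static void main(String[] args) {
        LegacyEnum e = new LegacyEnum();
        System.out.println(e.seekExact("apple"));
        System.out.println(e.seekExact("cherry"));
    }
}
```

The design benefit is that subclasses with a faster exact lookup can no longer forget to override seekExact, which is exactly the class of bug hit in FilterTermsEnum.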
[jira] [Updated] (LUCENE-8662) Change TermsEnum.seekExact(BytesRef) to abstract
[ https://issues.apache.org/jira/browse/LUCENE-8662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jefferyyuan updated LUCENE-8662: Summary: Change TermsEnum.seekExact(BytesRef) to abstract (was: Make TermsEnum.seekExact(BytesRef) abstract) -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-8662) Make TermsEnum.seekExact(BytesRef) abstract
[ https://issues.apache.org/jira/browse/LUCENE-8662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jefferyyuan updated LUCENE-8662: Summary: Make TermsEnum.seekExact(BytesRef) abstract (was: Override seekExact(BytesRef) in FilterLeafReader.FilterTermsEnum) -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-8662) Override seekExact(BytesRef) in FilterLeafReader.FilterTermsEnum
[ https://issues.apache.org/jira/browse/LUCENE-8662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jefferyyuan updated LUCENE-8662: Attachment: output of test program.txt
> Override seekExact(BytesRef) in FilterLeafReader.FilterTermsEnum
>
> Key: LUCENE-8662
> URL: https://issues.apache.org/jira/browse/LUCENE-8662
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/search
> Affects Versions: 5.5.5, 6.6.5, 7.6, 8.0
> Reporter: jefferyyuan
> Priority: Major
> Labels: query
> Fix For: 8.0, 7.7
> Attachments: output of test program.txt
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Recently in our production, we found that Solr uses a lot of memory (more than 10 GB) during recovery or commit for a small index (3.5 GB).
> The stack trace is:
> {code:java}
> Thread 0x4d4b115c0
> at org.apache.lucene.store.DataInput.readVInt()I (DataInput.java:125)
> at org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.loadBlock()V (SegmentTermsEnumFrame.java:157)
> at org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.scanToTermNonLeaf(Lorg/apache/lucene/util/BytesRef;Z)Lorg/apache/lucene/index/TermsEnum$SeekStatus; (SegmentTermsEnumFrame.java:786)
> at org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.scanToTerm(Lorg/apache/lucene/util/BytesRef;Z)Lorg/apache/lucene/index/TermsEnum$SeekStatus; (SegmentTermsEnumFrame.java:538)
> at org.apache.lucene.codecs.blocktree.SegmentTermsEnum.seekCeil(Lorg/apache/lucene/util/BytesRef;)Lorg/apache/lucene/index/TermsEnum$SeekStatus; (SegmentTermsEnum.java:757)
> at org.apache.lucene.index.FilterLeafReader$FilterTermsEnum.seekCeil(Lorg/apache/lucene/util/BytesRef;)Lorg/apache/lucene/index/TermsEnum$SeekStatus; (FilterLeafReader.java:185)
> at org.apache.lucene.index.TermsEnum.seekExact(Lorg/apache/lucene/util/BytesRef;)Z (TermsEnum.java:74)
> at org.apache.solr.search.SolrIndexSearcher.lookupId(Lorg/apache/lucene/util/BytesRef;)J (SolrIndexSearcher.java:823)
> at org.apache.solr.update.VersionInfo.getVersionFromIndex(Lorg/apache/lucene/util/BytesRef;)Ljava/lang/Long; (VersionInfo.java:204)
> at org.apache.solr.update.UpdateLog.lookupVersion(Lorg/apache/lucene/util/BytesRef;)Ljava/lang/Long; (UpdateLog.java:786)
> at org.apache.solr.update.VersionInfo.lookupVersion(Lorg/apache/lucene/util/BytesRef;)Ljava/lang/Long; (VersionInfo.java:194)
> at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(Lorg/apache/solr/update/AddUpdateCommand;)Z (DistributedUpdateProcessor.java:1051)
> {code}
> We reproduced the problem locally with the following code against the Lucene API:
> {code:java}
> public static void main(String[] args) throws IOException {
>   FSDirectory index = FSDirectory.open(Paths.get("the-index"));
>   try (IndexReader reader = new ExitableDirectoryReader(DirectoryReader.open(index),
>       new QueryTimeoutImpl(1000 * 60 * 5))) {
>     String id = "the-id";
>     BytesRef text = new BytesRef(id);
>     for (LeafReaderContext lf : reader.leaves()) {
>       TermsEnum te = lf.reader().terms("id").iterator();
>       System.out.println(te.seekExact(text));
>     }
>   }
> }
> {code}
> I added System.out.println("ord: " + ord); in org.apache.lucene.codecs.blocktree.SegmentTermsEnum.getFrame(int); please check the attached output of test program.txt.
> We found the root cause: seekExact(BytesRef) is not overridden in FilterLeafReader.FilterTermsEnum, so it falls back to the base-class TermsEnum.seekExact(BytesRef) implementation, which is very inefficient in this case:
> {code:java}
> public boolean seekExact(BytesRef text) throws IOException {
>   return seekCeil(text) == SeekStatus.FOUND;
> }
> {code}
> The fix is simple: override seekExact(BytesRef) in FilterLeafReader.FilterTermsEnum so it delegates to the wrapped enum.
> {code:java}
> @Override
> public boolean seekExact(BytesRef text) throws IOException {
>   return in.seekExact(text);
> }
> {code}
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
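The slow path comes from virtual dispatch: the filter overrides seekCeil but not seekExact, so a seekExact call lands in the base class, which routes back through the filter's seekCeil and bypasses the wrapped enum's optimized exact-match path. A self-contained sketch of that dispatch pattern (BaseEnum, SegmentEnum, FilterEnum, and FixedFilterEnum are simplified stand-ins for illustration, not the real Lucene classes):

```java
// Simplified model of the TermsEnum / FilterTermsEnum dispatch problem.
public class SeekExactDemo {
    // Counts how often the expensive general scan is taken.
    static int ceilCalls = 0;

    static abstract class BaseEnum {
        // Base-class fallback: exact seek implemented via the general seekCeil.
        boolean seekExact(String text) { return seekCeil(text); }
        abstract boolean seekCeil(String text);
    }

    static class SegmentEnum extends BaseEnum {
        // The wrapped enum has a cheap dedicated exact-match path...
        @Override boolean seekExact(String text) { return "the-id".equals(text); }
        // ...and an expensive general scan.
        @Override boolean seekCeil(String text) { ceilCalls++; return "the-id".equals(text); }
    }

    static class FilterEnum extends BaseEnum {
        final BaseEnum in;
        FilterEnum(BaseEnum in) { this.in = in; }
        // Forwarding only seekCeil: seekExact calls fall into the slow base-class path.
        @Override boolean seekCeil(String text) { return in.seekCeil(text); }
    }

    static class FixedFilterEnum extends FilterEnum {
        FixedFilterEnum(BaseEnum in) { super(in); }
        // The fix: delegate seekExact too, so the wrapped enum's fast path is used.
        @Override boolean seekExact(String text) { return in.seekExact(text); }
    }

    public static void main(String[] args) {
        SegmentEnum seg = new SegmentEnum();
        new FilterEnum(seg).seekExact("the-id");      // routed through seekCeil
        int slowCalls = ceilCalls;
        new FixedFilterEnum(seg).seekExact("the-id"); // uses the fast path
        System.out.println(slowCalls == 1 && ceilCalls == 1);
    }
}
```

Running main prints true: the unfixed filter triggers one expensive seekCeil scan, while the fixed filter never touches it.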
[jira] [Updated] (LUCENE-8662) Override seekExact(BytesRef) in FilterLeafReader.FilterTermsEnum
[ https://issues.apache.org/jira/browse/LUCENE-8662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jefferyyuan updated LUCENE-8662: Attachment: (was: output of test program.txt)
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-8662) Override seekExact(BytesRef) in FilterLeafReader.FilterTermsEnum
[ https://issues.apache.org/jira/browse/LUCENE-8662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jefferyyuan updated LUCENE-8662: Attachment: output of test program.txt
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-8662) Override seekExact(BytesRef) in FilterLeafReader.FilterTermsEnum
[ https://issues.apache.org/jira/browse/LUCENE-8662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jefferyyuan updated LUCENE-8662: Description: (updated; the full text is quoted in the first message above)
[jira] [Comment Edited] (LUCENE-8662) Override seekExact(BytesRef) in FilterLeafReader.FilterTermsEnum
[ https://issues.apache.org/jira/browse/LUCENE-8662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16754405#comment-16754405 ] jefferyyuan edited comment on LUCENE-8662 at 1/28/19 10:06 PM:
- At https://issues.apache.org/jira/browse/LUCENE-4874 (Don't override non-abstract methods that have an impl through other abstract methods in FilterAtomicReader and related classes), [https://github.com/apache/lucene-solr/commit/9588a84dec9fe5da210a9210cb0efbe3221c9f9e]
- Should we add an exception for seekExact(BytesRef) in FilterLeafReader.FilterTermsEnum due to this performance issue?
- FilterLeafReader.FilterTermsEnum delegates all calls to its field TermsEnum in, so it seems to make sense to override seekExact(BytesRef) as well.
[jira] [Commented] (LUCENE-8662) Override seekExact(BytesRef) in FilterLeafReader.FilterTermsEnum
[ https://issues.apache.org/jira/browse/LUCENE-8662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16754405#comment-16754405 ] jefferyyuan commented on LUCENE-8662:
- At https://issues.apache.org/jira/browse/LUCENE-4874 (Don't override non-abstract methods that have an impl through other abstract methods in FilterAtomicReader and related classes), [https://github.com/apache/lucene-solr/commit/9588a84dec9fe5da210a9210cb0efbe3221c9f9e]
- Should we add an exception for seekExact(BytesRef) in FilterLeafReader.FilterTermsEnum due to this performance issue?
- FilterLeafReader.FilterTermsEnum delegates all calls to its field TermsEnum in, so it seems to make sense to override seekExact(BytesRef) as well.
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8662) Override seekExact(BytesRef) in FilterLeafReader.FilterTermsEnum
[ https://issues.apache.org/jira/browse/LUCENE-8662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16754356#comment-16754356 ] jefferyyuan commented on LUCENE-8662: PR here: [https://github.com/apache/lucene-solr/pull/551] Thanks.
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-8662) Override seekExact(BytesRef) in FilterLeafReader.FilterTermsEnum
[ https://issues.apache.org/jira/browse/LUCENE-8662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jefferyyuan updated LUCENE-8662: Description: (updated; the full text is quoted in the first message above)
[jira] [Created] (LUCENE-8662) Override seekExact(BytesRef) in FilterLeafReader.FilterTermsEnum
jefferyyuan created LUCENE-8662: --- Summary: Override seekExact(BytesRef) in FilterLeafReader.FilterTermsEnum Key: LUCENE-8662 URL: https://issues.apache.org/jira/browse/LUCENE-8662 Project: Lucene - Core Issue Type: Improvement Components: core/search Affects Versions: 7.6, 6.6.5, 5.5.5, 8.0 Reporter: jefferyyuan Fix For: 8.0, 7.7
Recently in our production, we found that Solr uses a lot of memory (more than 10 GB) during recovery or commit for a small index (3.5 GB). The stack trace is:
{code:java}
Thread 0x4d4b115c0
  at org.apache.lucene.store.DataInput.readVInt()I (DataInput.java:125)
  at org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.loadBlock()V (SegmentTermsEnumFrame.java:157)
  at org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.scanToTermNonLeaf(Lorg/apache/lucene/util/BytesRef;Z)Lorg/apache/lucene/index/TermsEnum$SeekStatus; (SegmentTermsEnumFrame.java:786)
  at org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.scanToTerm(Lorg/apache/lucene/util/BytesRef;Z)Lorg/apache/lucene/index/TermsEnum$SeekStatus; (SegmentTermsEnumFrame.java:538)
  at org.apache.lucene.codecs.blocktree.SegmentTermsEnum.seekCeil(Lorg/apache/lucene/util/BytesRef;)Lorg/apache/lucene/index/TermsEnum$SeekStatus; (SegmentTermsEnum.java:757)
  at org.apache.lucene.index.FilterLeafReader$FilterTermsEnum.seekCeil(Lorg/apache/lucene/util/BytesRef;)Lorg/apache/lucene/index/TermsEnum$SeekStatus; (FilterLeafReader.java:185)
  at org.apache.lucene.index.TermsEnum.seekExact(Lorg/apache/lucene/util/BytesRef;)Z (TermsEnum.java:74)
  at org.apache.solr.search.SolrIndexSearcher.lookupId(Lorg/apache/lucene/util/BytesRef;)J (SolrIndexSearcher.java:823)
  at org.apache.solr.update.VersionInfo.getVersionFromIndex(Lorg/apache/lucene/util/BytesRef;)Ljava/lang/Long; (VersionInfo.java:204)
  at org.apache.solr.update.UpdateLog.lookupVersion(Lorg/apache/lucene/util/BytesRef;)Ljava/lang/Long; (UpdateLog.java:786)
  at org.apache.solr.update.VersionInfo.lookupVersion(Lorg/apache/lucene/util/BytesRef;)Ljava/lang/Long; (VersionInfo.java:194)
  at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(Lorg/apache/solr/update/AddUpdateCommand;)Z (DistributedUpdateProcessor.java:1051)
{code}
We reproduced the problem locally with the following code using the Lucene API:
{code:java}
public static void main(String[] args) throws IOException {
  FSDirectory index = FSDirectory.open(Paths.get("the-index"));
  try (IndexReader reader = new ExitableDirectoryReader(DirectoryReader.open(index),
      new QueryTimeoutImpl(1000 * 60 * 5))) {
    String id = "the-id";
    BytesRef text = new BytesRef(id);
    for (LeafReaderContext lf : reader.leaves()) {
      TermsEnum te = lf.reader().terms("id").iterator();
      System.out.println(te.seekExact(text));
    }
  }
}
{code}
We found the root cause: seekExact(BytesRef) is not implemented in FilterLeafReader.FilterTermsEnum, so it falls back to the base-class TermsEnum.seekExact(BytesRef) implementation, which is very inefficient in this case:
{code:java}
public boolean seekExact(BytesRef text) throws IOException {
  return seekCeil(text) == SeekStatus.FOUND;
}
{code}
The fix is simple: override seekExact(BytesRef) in FilterLeafReader.FilterTermsEnum so it delegates to the wrapped enum:
{code:java}
@Override
public boolean seekExact(BytesRef text) throws IOException {
  return in.seekExact(text);
}
{code}
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
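The effect of the missing override can be shown without Lucene at all. The sketch below is a minimal, self-contained analogue (every class name here is a hypothetical stand-in, not a Lucene class): a wrapper that inherits the seekCeil-based default pays the expensive path on every exact seek, while a wrapper that forwards seekExact reaches the wrapped enum's fast path.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SeekExactSketch {

    /** Stand-in for TermsEnum: the default seekExact is implemented via seekCeil. */
    public static abstract class BaseEnum {
        public int seekCeilCalls = 0;   // counter that makes the expensive path visible
        public abstract boolean seekCeilFound(String term);
        // Analogous to the default TermsEnum.seekExact(BytesRef):
        public boolean seekExact(String term) { return seekCeilFound(term); }
    }

    /** Stand-in for a segment-level enum that has a cheap exact lookup of its own. */
    public static class FastEnum extends BaseEnum {
        public int fastExactCalls = 0;
        private final Set<String> terms;
        public FastEnum(Collection<String> t) { terms = new HashSet<>(t); }
        @Override public boolean seekCeilFound(String term) { seekCeilCalls++; return terms.contains(term); }
        @Override public boolean seekExact(String term) { fastExactCalls++; return terms.contains(term); }
    }

    /** A filter enum with no seekExact override: every exact seek pays the seekCeil path. */
    public static class NaiveFilterEnum extends BaseEnum {
        protected final BaseEnum in;
        public NaiveFilterEnum(BaseEnum in) { this.in = in; }
        @Override public boolean seekCeilFound(String term) { return in.seekCeilFound(term); }
    }

    /** The fix from this issue, in miniature: forward seekExact to the wrapped enum. */
    public static class FixedFilterEnum extends NaiveFilterEnum {
        public FixedFilterEnum(BaseEnum in) { super(in); }
        @Override public boolean seekExact(String term) { return in.seekExact(term); }
    }

    public static void main(String[] args) {
        FastEnum a = new FastEnum(List.of("a", "the-id"));
        new NaiveFilterEnum(a).seekExact("the-id");
        System.out.println("naive: seekCeilCalls=" + a.seekCeilCalls + " fastExactCalls=" + a.fastExactCalls);

        FastEnum b = new FastEnum(List.of("a", "the-id"));
        new FixedFilterEnum(b).seekExact("the-id");
        System.out.println("fixed: seekCeilCalls=" + b.seekCeilCalls + " fastExactCalls=" + b.fastExactCalls);
    }
}
```

The one-line delegation is all the fix needs because dynamic dispatch then routes seekExact to the concrete enum's optimized implementation instead of the generic seekCeil fallback.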
[jira] [Commented] (SOLR-12833) Use timed-out lock in DistributedUpdateProcessor
[ https://issues.apache.org/jira/browse/SOLR-12833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16709103#comment-16709103 ] jefferyyuan commented on SOLR-12833: It looks great to me, thanks, [~markrmil...@gmail.com].
> Use timed-out lock in DistributedUpdateProcessor
>
> Key: SOLR-12833
> URL: https://issues.apache.org/jira/browse/SOLR-12833
> Project: Solr
> Issue Type: Improvement
> Security Level: Public (Default Security Level. Issues are Public)
> Components: update, UpdateRequestProcessors
> Affects Versions: 7.5, master (8.0)
> Reporter: jefferyyuan
> Assignee: Mark Miller
> Priority: Minor
> Fix For: master (8.0)
> Attachments: SOLR-12833.patch, SOLR-12833.patch
>
> There is a synchronized block that blocks other update requests whose IDs fall in the same hash bucket. An update waits forever until it gets the lock at the synchronized block, which can be a problem in some cases.
> Some add/update requests (for example, updates with spatial/shape analysis) may take a long time (30+ seconds or even more), which makes the request time out and fail.
> Clients may retry the same request multiple times over several minutes, which makes things worse.
> The server side receives all the update requests, but all except one can do nothing and have to wait. This wastes precious memory and CPU resources.
> We have seen cases where 2000+ threads were blocked at the synchronized lock while only a few updates were making progress. Each thread takes 3+ MB of memory, which causes OOM.
> Also, if an update can't get the lock in the expected time range, it's better to fail fast.
> We can have one configuration in solrconfig.xml, updateHandler/versionLock/timeInMill, so users can specify how long they want to wait for the version bucket lock.
> The default value can be -1, so it behaves the same: wait forever until it gets the lock.
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-12833) Use timed-out lock in DistributedUpdateProcessor
[ https://issues.apache.org/jira/browse/SOLR-12833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16688670#comment-16688670 ] jefferyyuan commented on SOLR-12833: Hi, [~markrmil...@gmail.com], tryGetVersionBucketLock will throw an exception if it is not able to get the lock.
* It's kind of confusing, as it returns true if it is able to get the lock, and otherwise throws an exception.
* The reason I did it this way:
** I want it to return a value (true or false), so we can unlock it in the finally block if it is true.
** I don't want to add another if/else.
* I changed the method name to tryGetLockElseThrow to be a little more readable.
> Use timed-out lock in DistributedUpdateProcessor
>
> Key: SOLR-12833
> URL: https://issues.apache.org/jira/browse/SOLR-12833
> Project: Solr
> Issue Type: Improvement
> Security Level: Public (Default Security Level. Issues are Public)
> Components: update, UpdateRequestProcessors
> Affects Versions: 7.5, master (8.0)
> Reporter: jefferyyuan
> Assignee: Mark Miller
> Priority: Minor
> Fix For: master (8.0)
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
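The return-true-or-throw shape discussed in this comment can be sketched with a plain ReentrantLock. Everything below, including the class and method names, is an illustrative assumption rather than the actual Solr patch; it only shows why returning a boolean (that is never false) lets the caller guard the unlock in a finally block:

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class LockElseThrowSketch {

    /** Hypothetical exception type standing in for whatever the patch throws on timeout. */
    public static class LockTimeoutException extends RuntimeException {
        public LockTimeoutException(String msg) { super(msg); }
    }

    public final ReentrantLock lock = new ReentrantLock();

    /** Returns true when the lock was acquired; otherwise throws. It never returns false. */
    public boolean tryGetLockElseThrow(long timeoutMs) {
        boolean acquired;
        try {
            acquired = lock.tryLock(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new LockTimeoutException("interrupted while waiting for lock");
        }
        if (!acquired) {
            throw new LockTimeoutException("could not get lock in " + timeoutMs + " ms");
        }
        return true;
    }

    public String guardedUpdate(long timeoutMs) {
        boolean locked = false;
        try {
            locked = tryGetLockElseThrow(timeoutMs);
            return "updated";                  // the critical section would run here
        } finally {
            if (locked) lock.unlock();         // unlock only if we actually got the lock
        }
    }

    public static void main(String[] args) {
        LockElseThrowSketch s = new LockElseThrowSketch();
        System.out.println(s.guardedUpdate(100));
    }
}
```

Because a timeout surfaces as an exception, the `locked` flag stays false and the finally block skips the unlock, which is exactly the bookkeeping the boolean return value exists to support.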
[jira] [Commented] (SOLR-12833) Use timed-out lock in DistributedUpdateProcessor
[ https://issues.apache.org/jira/browse/SOLR-12833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16665771#comment-16665771 ] jefferyyuan commented on SOLR-12833: Thanks [~markrmil...@gmail.com], changed the default timeout to 10 minutes, the same as the default client read timeout.
> Use timed-out lock in DistributedUpdateProcessor
>
> Key: SOLR-12833
> URL: https://issues.apache.org/jira/browse/SOLR-12833
> Project: Solr
> Issue Type: Improvement
> Security Level: Public (Default Security Level. Issues are Public)
> Components: update, UpdateRequestProcessors
> Affects Versions: 7.5, master (8.0)
> Reporter: jefferyyuan
> Assignee: Mark Miller
> Priority: Minor
> Fix For: master (8.0)
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-12833) Use timed-out lock in DistributedUpdateProcessor
[ https://issues.apache.org/jira/browse/SOLR-12833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16639043#comment-16639043 ] jefferyyuan commented on SOLR-12833: Here is the PR: [https://github.com/apache/lucene-solr/pull/463/files]
> Use timed-out lock in DistributedUpdateProcessor
>
> Key: SOLR-12833
> URL: https://issues.apache.org/jira/browse/SOLR-12833
> Project: Solr
> Issue Type: Improvement
> Security Level: Public (Default Security Level. Issues are Public)
> Components: update, UpdateRequestProcessors
> Affects Versions: 7.5, master (8.0)
> Reporter: jefferyyuan
> Priority: Minor
> Fix For: master (8.0)
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-12833) Use timed-out lock in DistributedUpdateProcessor
jefferyyuan created SOLR-12833: -- Summary: Use timed-out lock in DistributedUpdateProcessor Key: SOLR-12833 URL: https://issues.apache.org/jira/browse/SOLR-12833 Project: Solr Issue Type: Improvement Security Level: Public (Default Security Level. Issues are Public) Components: update, UpdateRequestProcessors Affects Versions: 7.5, master (8.0) Reporter: jefferyyuan Fix For: master (8.0)
There is a synchronized block that blocks other update requests whose IDs fall in the same hash bucket. An update waits forever until it gets the lock at the synchronized block, which can be a problem in some cases.
Some add/update requests (for example, updates with spatial/shape analysis) may take a long time (30+ seconds or even more), which makes the request time out and fail.
Clients may retry the same request multiple times over several minutes, which makes things worse.
The server side receives all the update requests, but all except one can do nothing and have to wait. This wastes precious memory and CPU resources.
We have seen cases where 2000+ threads were blocked at the synchronized lock while only a few updates were making progress. Each thread takes 3+ MB of memory, which causes OOM.
Also, if an update can't get the lock in the expected time range, it's better to fail fast.
We can have one configuration in solrconfig.xml, updateHandler/versionLock/timeInMill, so users can specify how long they want to wait for the version bucket lock.
The default value can be -1, so it behaves the same: wait forever until it gets the lock.
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
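The proposal above can be sketched in a few lines, assuming a plain ReentrantLock guarding each version bucket. The names here (versionLockTimeoutMs, withBucketLock) are hypothetical, not Solr's API; the point is only the timeout semantics: a non-negative value bounds the wait, while -1 keeps the old wait-forever behavior.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

public class TimedBucketLockSketch {
    private final ReentrantLock bucketLock = new ReentrantLock();
    // In the proposal this would come from solrconfig.xml (updateHandler/versionLock/timeInMill).
    private final long versionLockTimeoutMs;

    public TimedBucketLockSketch(long versionLockTimeoutMs) {
        this.versionLockTimeoutMs = versionLockTimeoutMs;
    }

    /** Runs the given action while holding the bucket lock, or fails fast on timeout. */
    public <T> T withBucketLock(Supplier<T> action) throws TimeoutException, InterruptedException {
        if (versionLockTimeoutMs < 0) {
            bucketLock.lockInterruptibly();          // -1: legacy behavior, wait forever
        } else if (!bucketLock.tryLock(versionLockTimeoutMs, TimeUnit.MILLISECONDS)) {
            // Failing fast here frees the thread (and its ~3 MB stack) instead of piling up.
            throw new TimeoutException("could not acquire version bucket lock in "
                + versionLockTimeoutMs + " ms");
        }
        try {
            return action.get();                     // the versioned add/update would run here
        } finally {
            bucketLock.unlock();
        }
    }

    public static void main(String[] args) throws Exception {
        TimedBucketLockSketch bucket = new TimedBucketLockSketch(100);
        System.out.println(bucket.withBucketLock(() -> "add applied"));
    }
}
```

With a bounded wait, a slow update in one bucket turns into prompt timeout errors for its queued peers rather than thousands of threads parked indefinitely on a synchronized block.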
[jira] [Commented] (SOLR-12612) Accept any key in cluster properties
[ https://issues.apache.org/jira/browse/SOLR-12612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16587760#comment-16587760 ] jefferyyuan commented on SOLR-12612: Thanks [~janhoy], ext. is better, and I changed the code accordingly.
> Accept any key in cluster properties
>
> Key: SOLR-12612
> URL: https://issues.apache.org/jira/browse/SOLR-12612
> Project: Solr
> Issue Type: Improvement
> Security Level: Public (Default Security Level. Issues are Public)
> Affects Versions: 7.4, master (8.0)
> Reporter: jefferyyuan
> Priority: Minor
> Fix For: master (8.0)
>
> Cluster properties are a good place to store configuration data that's shared across the whole cluster: Solr and other (authorized) apps can easily read and update them.
> It would be very useful if we could store extra data in cluster properties, which would act as a centralized property-management system between Solr and its related apps (like manager or monitor apps).
> And the change would also be very simple.
> We can also require that all extra properties start with a prefix like: extra_
> PR: https://github.com/apache/lucene-solr/pull/429
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-12612) Accept any key in cluster properties
[ https://issues.apache.org/jira/browse/SOLR-12612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16584504#comment-16584504 ] jefferyyuan commented on SOLR-12612: Thanks [~tomasflobbe] and [~anshumg]. I changed the prefix to plugin. and added the tests; please check.
> Accept any key in cluster properties
>
> Key: SOLR-12612
> URL: https://issues.apache.org/jira/browse/SOLR-12612
> Project: Solr
> Issue Type: Improvement
> Security Level: Public (Default Security Level. Issues are Public)
> Affects Versions: 7.4, master (8.0)
> Reporter: jefferyyuan
> Priority: Minor
> Fix For: master (8.0)
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-12612) Accept any key in cluster properties
[ https://issues.apache.org/jira/browse/SOLR-12612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jefferyyuan updated SOLR-12612: --- Description: Cluster properties are a good place to store configuration data that's shared across the whole cluster: Solr and other (authorized) apps can easily read and update them. It would be very useful if we could store extra data in cluster properties, which would act as a centralized property-management system between Solr and its related apps (like manager or monitor apps). And the change would also be very simple. We can also require that all extra properties start with a prefix like: extra_ PR: https://github.com/apache/lucene-solr/pull/429 was: Cluster properties is a good place to store configuration data that's shared in the whole cluster: solr and other (authorized) apps can easily read and update them. It would be very useful if we can store extra data in cluster properties which would act as a centralized property management system between solr and its related apps (like manager or monitor apps). And the change would be also very simple. We can also require all extra property starts with prefix like: extra_
> Accept any key in cluster properties
>
> Key: SOLR-12612
> URL: https://issues.apache.org/jira/browse/SOLR-12612
> Project: Solr
> Issue Type: Improvement
> Security Level: Public (Default Security Level. Issues are Public)
> Affects Versions: 7.4, master (8.0)
> Reporter: jefferyyuan
> Priority: Minor
> Fix For: master (8.0)
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-12612) Accept any key in cluster properties
jefferyyuan created SOLR-12612: -- Summary: Accept any key in cluster properties Key: SOLR-12612 URL: https://issues.apache.org/jira/browse/SOLR-12612 Project: Solr Issue Type: Improvement Security Level: Public (Default Security Level. Issues are Public) Affects Versions: 7.4, master (8.0) Reporter: jefferyyuan Fix For: master (8.0)
Cluster properties are a good place to store configuration data that's shared across the whole cluster: Solr and other (authorized) apps can easily read and update them.
It would be very useful if we could store extra data in cluster properties, which would act as a centralized property-management system between Solr and its related apps (like manager or monitor apps).
And the change would also be very simple.
We can also require that all extra properties start with a prefix like: extra_
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
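The key check proposed here is small enough to sketch directly. The snippet below is an illustration, not the real ClusterProperties code: the known-keys set is a partial, assumed list, and the prefix shown is the ext. form the thread later settles on (the issue description itself suggests extra_).

```java
import java.util.Set;

public class ClusterPropKeyCheck {

    // Assumed subset of Solr's built-in cluster property names, for illustration only.
    private static final Set<String> KNOWN_PROPERTIES =
        Set.of("urlScheme", "autoAddReplicas", "location", "legacyCloud", "maxCoresPerNode");

    // Agreed namespace for app-specific properties, so arbitrary keys stay recognizable.
    private static final String EXT_PREFIX = "ext.";

    /** Accept built-in property names plus any key carrying the extension prefix. */
    public static boolean isAllowedKey(String name) {
        return KNOWN_PROPERTIES.contains(name) || name.startsWith(EXT_PREFIX);
    }

    public static void main(String[] args) {
        System.out.println(isAllowedKey("urlScheme"));       // built-in, accepted
        System.out.println(isAllowedKey("ext.monitor.url")); // app-specific, accepted via prefix
        System.out.println(isAllowedKey("monitor.url"));     // unprefixed unknown key, rejected
    }
}
```

Requiring a prefix keeps the open namespace from silently swallowing typos of built-in property names: a misspelled built-in key is still rejected instead of being stored as an "extra" property.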
[jira] [Commented] (SOLR-12477) Return server error(500) for AlreadyClosedException instead of client Errors(400)
[ https://issues.apache.org/jira/browse/SOLR-12477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16561385#comment-16561385 ] jefferyyuan commented on SOLR-12477: [~varunthacker] It makes sense (as Mockito doesn't work with newer Java), and I have reverted the change in DirectUpdateHandlerTest#testAddDocThrowAlreadyClosedException. Please check, and thanks.
> Return server error(500) for AlreadyClosedException instead of client Errors(400)
>
> Key: SOLR-12477
> URL: https://issues.apache.org/jira/browse/SOLR-12477
> Project: Solr
> Issue Type: Improvement
> Security Level: Public (Default Security Level. Issues are Public)
> Components: update
> Reporter: jefferyyuan
> Assignee: Varun Thacker
> Priority: Minor
> Labels: update
> Fix For: 7.3.2, master (8.0)
> Attachments: SOLR-12477.patch
> Time Spent: 40m
> Remaining Estimate: 0h
>
> In some cases (for example: a corrupt index), addDoc0 throws AlreadyClosedException, but the Solr server returns client error 400 to the client.
> This will confuse customers and especially monitoring tools.
> Patch: [https://github.com/apache/lucene-solr/pull/402]
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-12477) Return server error(500) for AlreadyClosedException instead of client Errors(400)
[ https://issues.apache.org/jira/browse/SOLR-12477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16560599#comment-16560599 ] jefferyyuan commented on SOLR-12477: Thanks [~varunthacker]. Addressed the comments in GitHub and changed the code as you suggested :)
> Return server error(500) for AlreadyClosedException instead of client Errors(400)
>
> Key: SOLR-12477
> URL: https://issues.apache.org/jira/browse/SOLR-12477
> Project: Solr
> Issue Type: Improvement
> Security Level: Public (Default Security Level. Issues are Public)
> Components: update
> Reporter: jefferyyuan
> Assignee: Varun Thacker
> Priority: Minor
> Labels: update
> Fix For: 7.3.2, master (8.0)
> Time Spent: 40m
> Remaining Estimate: 0h
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-12477) Return server error(500) for AlreadyClosedException instead of client Errors(400)
[ https://issues.apache.org/jira/browse/SOLR-12477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16560517#comment-16560517 ] jefferyyuan commented on SOLR-12477: Thanks [~varunthacker]. Changed CoreContainer.checkTragicException(SolrCore) to return true when there was a tragic exception. Please check the PR: [https://github.com/apache/lucene-solr/pull/402/files]
> Return server error(500) for AlreadyClosedException instead of client Errors(400)
>
> Key: SOLR-12477
> URL: https://issues.apache.org/jira/browse/SOLR-12477
> Project: Solr
> Issue Type: Improvement
> Security Level: Public (Default Security Level. Issues are Public)
> Components: update
> Reporter: jefferyyuan
> Assignee: Varun Thacker
> Priority: Minor
> Labels: update
> Fix For: 7.3.2, master (8.0)
> Time Spent: 10m
> Remaining Estimate: 0h
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-12477) Return server error(500) for AlreadyClosedException instead of client Errors(400)
[ https://issues.apache.org/jira/browse/SOLR-12477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16551474#comment-16551474 ] jefferyyuan commented on SOLR-12477: Thanks, [~varunthacker]. Made the change as you suggested; please check. Just one exception: corruptLeader may throw RemoteSolrException when called by the test method, so the test code changes accordingly.
> Return server error(500) for AlreadyClosedException instead of client Errors(400)
>
> Key: SOLR-12477
> URL: https://issues.apache.org/jira/browse/SOLR-12477
> Project: Solr
> Issue Type: Improvement
> Security Level: Public (Default Security Level. Issues are Public)
> Components: update
> Affects Versions: 7.3.1, master (8.0)
> Reporter: jefferyyuan
> Assignee: Varun Thacker
> Priority: Minor
> Labels: update
> Fix For: 7.3.2, master (8.0)
> Time Spent: 10m
> Remaining Estimate: 0h
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-12477) Return server error(500) for AlreadyClosedException instead of client Errors(400)
[ https://issues.apache.org/jira/browse/SOLR-12477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jefferyyuan updated SOLR-12477: --- Environment: (was: In some cases(for example: corrupt index), addDoc0 throws AlreadyClosedException, but solr server returns client error 400 to client This will confuse customers and especially monitoring tool. Patch: https://github.com/apache/lucene-solr/pull/402) Labels: update (was: ) Description: In some cases(for example: corrupt index), addDoc0 throws AlreadyClosedException, but solr server returns client error 400 to client This will confuse customers and especially monitoring tool. Patch: [https://github.com/apache/lucene-solr/pull/402] > Return server error(500) for AlreadyClosedException instead of client > Errors(400) > - > > Key: SOLR-12477 > URL: https://issues.apache.org/jira/browse/SOLR-12477 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: update >Affects Versions: 7.3.1, master (8.0) >Reporter: jefferyyuan >Priority: Minor > Labels: update > Fix For: 7.3.2, master (8.0) > > Time Spent: 10m > Remaining Estimate: 0h > > In some cases(for example: corrupt index), addDoc0 throws > AlreadyClosedException, but solr server returns client error 400 to client > This will confuse customers and especially monitoring tool. > Patch: [https://github.com/apache/lucene-solr/pull/402] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-12477) Return server error(500) for AlreadyClosedException instead of client Errors(400)
[ https://issues.apache.org/jira/browse/SOLR-12477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jefferyyuan updated SOLR-12477: --- Environment: In some cases(for example: corrupt index), addDoc0 throws AlreadyClosedException, but solr server returns client error 400 to client This will confuse customers and especially monitoring tool. Patch: https://github.com/apache/lucene-solr/pull/402 was: In some cases(for example: corrupt index), addDoc0 throws AlreadyClosedException, but solr server returns client error 400 to client This will confuse customers and especially monitoring tool. > Return server error(500) for AlreadyClosedException instead of client > Errors(400) > - > > Key: SOLR-12477 > URL: https://issues.apache.org/jira/browse/SOLR-12477 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: update >Affects Versions: 7.3.1, master (8.0) > Environment: In some cases(for example: corrupt index), addDoc0 > throws AlreadyClosedException, but solr server returns client error 400 to > client > This will confuse customers and especially monitoring tool. > Patch: https://github.com/apache/lucene-solr/pull/402 >Reporter: jefferyyuan >Priority: Minor > Fix For: 7.3.2, master (8.0) > > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-12477) Return server error(500) for AlreadyClosedException instead of client Errors(400)
jefferyyuan created SOLR-12477: -- Summary: Return server error(500) for AlreadyClosedException instead of client Errors(400) Key: SOLR-12477 URL: https://issues.apache.org/jira/browse/SOLR-12477 Project: Solr Issue Type: Improvement Security Level: Public (Default Security Level. Issues are Public) Components: update Affects Versions: 7.3.1, master (8.0) Environment: In some cases (for example: a corrupt index), addDoc0 throws AlreadyClosedException, but the Solr server returns client error 400 to the client. This will confuse customers and especially monitoring tools. Reporter: jefferyyuan Fix For: 7.3.2, master (8.0)
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
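The mapping this issue asks for can be sketched as a simple exception-to-status function. This is an illustration of the intent, not the actual Solr error-handling code; the nested AlreadyClosedException below is a local stand-in for org.apache.lucene.store.AlreadyClosedException (which likewise extends IllegalStateException), and the status choices only mirror the argument in the description.

```java
public class ErrorStatusSketch {

    /** Minimal stand-in for org.apache.lucene.store.AlreadyClosedException. */
    public static class AlreadyClosedException extends IllegalStateException {
        public AlreadyClosedException(String msg) { super(msg); }
    }

    /** Pick an HTTP status for an update failure. */
    public static int statusFor(RuntimeException e) {
        if (e instanceof AlreadyClosedException) {
            // Server-side condition (e.g. a corrupt index closed the writer):
            // the client did nothing wrong, so report a server error.
            return 500;
        }
        if (e instanceof IllegalArgumentException) {
            return 400;   // genuinely bad input from the client
        }
        return 500;       // unknown failures default to server error
    }

    public static void main(String[] args) {
        System.out.println(statusFor(new AlreadyClosedException("this IndexWriter is closed")));
        System.out.println(statusFor(new IllegalArgumentException("unknown field")));
    }
}
```

Returning 500 here matters beyond correctness: monitoring tools typically alert on 5xx rates, so a corrupt-index condition masked as a 400 looks like client misuse instead of a server fault.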
[jira] [Commented] (SOLR-10885) NullPointerException when run collapse filter
[ https://issues.apache.org/jira/browse/SOLR-10885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16257855#comment-16257855 ] jefferyyuan commented on SOLR-10885: Thanks, it makes sense. Could we update the doc to explicitly state that the collapse parser only supports collapsing on one field? This could prevent others from misusing it and later wondering why it doesn't work.
> NullPointerException when run collapse filter
> --
>
> Project: Solr
> Issue Type: Bug
> Security Level: Public (Default Security Level. Issues are Public)
> Components: search
> Affects Versions: 6.4.1
> Reporter: jefferyyuan
> Assignee: Varun Thacker
> Priority: Critical
>
> Solr collapse is a great function to collapse data that is related so we only show one in the search result. Just found one issue related to it: it throws NullPointerException in some cases.
> To reproduce it, first ingest some data - AND commit multiple times.
> 1. When there is no data that matches the query:
> http://localhost:8983/solr/thecollection/select?defType=edismax&q=non-existType:*&fq={!collapse field=seriesId nullPolicy=expand}&fq={!collapse field=programId nullPolicy=expand}
> - But the problem only happens if I use both collapse fqs; if I just use one of them, it is fine.
> *2. When the data that matches the query doesn't have the collapse fields
> - This is kind of a big problem, as we may store different kinds of docs in one collection, and one query may match different kinds of docs.
> If some docs (docType1) have same value for field1, we want to collapse > them, if other dosc(docType2) have some value for field2, do same things.* > - channel data doesn't have seriesId or programId > http://localhost:8983/solr/thecollection/select?defType=edismax&q=docType:channel&fq={!collapse > field=seriesId nullPolicy=expand}&fq={!collapse field=programId > nullPolicy=expand} > - But the problem only happens if I use both collapse fqs, if I just use one > of them, it would be fine. > Exception from log: > Caused by: > org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error > from server at http://localhost:8983/solr/searchItems_shard1_replica3: > java.lang.NullPointerException > at > org.apache.solr.search.CollapsingQParserPlugin$OrdScoreCollector.finish(CollapsingQParserPlugin.java:617) > at > org.apache.solr.search.CollapsingQParserPlugin$OrdScoreCollector.finish(CollapsingQParserPlugin.java:667) > at > org.apache.solr.search.SolrIndexSearcher.buildAndRunCollectorChain(SolrIndexSearcher.java:256) > at > org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1823) > at > org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1640) > at > org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:611) > at > org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:533) > at > org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:295) > at > org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:166) > at org.apache.solr.core.SolrCore.execute(SolrCore.java:2299) > at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:658) > at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:464) > at > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345) > at > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:296) > at > 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691) > at > org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582) > at > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) > at > org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548) > at > org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226) > at > org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180) > at > org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512) > at > org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185) > at > org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112) > at > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) > at > org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213) > at > org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerColl
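The line at CollapsingQParserPlugin.java:617 (quoted later in this thread) dereferences this.contexts[currentContext + 1] without a null check; the per-segment contexts array is filled lazily, so after multiple commits an entry for a segment that never collected a document can remain null. A hedged, self-contained sketch of one possible guard (SegmentContext and nextDocBase are simplified stand-ins for illustration, not the actual Solr classes):

```java
// Simplified sketch of guarding lazily-populated per-segment contexts:
// skip null entries instead of dereferencing them.
public class ContextGuardSketch {

    // Stand-in for the per-segment context; only docBase matters here.
    public static final class SegmentContext {
        final int docBase;
        public SegmentContext(int docBase) { this.docBase = docBase; }
    }

    // Returns the docBase of the next non-null segment context after
    // `current`, or maxDoc when there is none.
    public static int nextDocBase(SegmentContext[] contexts, int current, int maxDoc) {
        for (int i = current + 1; i < contexts.length; i++) {
            if (contexts[i] != null) {
                return contexts[i].docBase;
            }
        }
        return maxDoc;
    }

    public static void main(String[] args) {
        // Segment 1 was never visited (null): it is skipped, not dereferenced.
        SegmentContext[] contexts = {
            new SegmentContext(0), null, new SegmentContext(200)
        };
        System.out.println(nextDocBase(contexts, 0, 300)); // 200
        System.out.println(nextDocBase(contexts, 2, 300)); // 300
    }
}
```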
[jira] [Comment Edited] (SOLR-6205) Make SolrCloud Data-center, rack or zone aware
[ https://issues.apache.org/jira/browse/SOLR-6205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16191698#comment-16191698 ] jefferyyuan edited comment on SOLR-6205 at 10/6/17 9:56 PM: It seems this functionality (at least part of it) is already in Solr. Rule-based Replica Placement: http://lucene.apache.org/solr/guide/7_0/rule-based-replica-placement.html https://issues.apache.org/jira/browse/SOLR-6220 was (Author: yuanyun.cn): Make Solr rack awareness can help prevent data loss and improve query performance. Elastic-search already supported it: https://www.elastic.co/guide/en/elasticsearch/reference/5.4/allocation-awareness.html And a lot of projects support this: Hadoop, Cassandra, Kafka etc. > Make SolrCloud Data-center, rack or zone aware > -- > > Key: SOLR-6205 > URL: https://issues.apache.org/jira/browse/SOLR-6205 > Project: Solr > Issue Type: New Feature > Components: SolrCloud >Affects Versions: 4.8.1 >Reporter: Arcadius Ahouansou >Assignee: Noble Paul > > Use case: > Let's say we have SolrCloud deployed across 2 Datacenters, racks or zones A > and B > There is a need to have a SolrCloud deployment that will make it possible to > have a working system even if one of the Datacenter/rack/zone A or B is lost. > - This has been discussed on the mailing list at > http://lucene.472066.n3.nabble.com/SolrCloud-multiple-data-center-support-td4115097.html > and there are many workarounds that require adding more moving parts to the > system. > - On the above thread, Daniel Collins mentioned > https://issues.apache.org/jira/browse/ZOOKEEPER-107 > which could help solve this issue. > - Note that this is a very important feature that is overlooked most of the > time. > - Note that this feature is available in ElasticSearch. 
> See > http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-cluster.html#allocation-awareness > and > http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-cluster.html#forced-awareness
[jira] [Comment Edited] (SOLR-6205) Make SolrCloud Data-center, rack or zone aware
[ https://issues.apache.org/jira/browse/SOLR-6205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16191698#comment-16191698 ] jefferyyuan edited comment on SOLR-6205 at 10/4/17 5:53 PM: Make Solr rack awareness can help prevent data loss and improve query performance. Elastic-search already supported it: https://www.elastic.co/guide/en/elasticsearch/reference/5.4/allocation-awareness.html And a lot of projects support this: Hadoop, Cassandra, Kafka etc. was (Author: yuanyun.cn): Make Solr rack awareness can help prevent data loss and improve query performance. Elastic-search already supported it: https://www.elastic.co/guide/en/elasticsearch/reference/5.4/allocation-awareness.html > Make SolrCloud Data-center, rack or zone aware > -- > > Key: SOLR-6205 > URL: https://issues.apache.org/jira/browse/SOLR-6205 > Project: Solr > Issue Type: New Feature > Components: SolrCloud >Affects Versions: 4.8.1 >Reporter: Arcadius Ahouansou >Assignee: Noble Paul > > Use case: > Let's say we have SolrCloud deployed across 2 Datacenters, racks or zones A > and B > There is a need to have a SolrCloud deployment that will make it possible to > have a working system even if one of the Datacenter/rack/zone A or B is lost. > - This has been discussed on the mailing list at > http://lucene.472066.n3.nabble.com/SolrCloud-multiple-data-center-support-td4115097.html > and there are many workarounds that require adding more moving parts to the > system. > - On the above thread, Daniel Collins mentioned > https://issues.apache.org/jira/browse/ZOOKEEPER-107 > which could help solve this issue. > - Note that this is a very important feature that is overlooked most of the > time. > - Note that this feature is available in ElasticSearch. 
> See > http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-cluster.html#allocation-awareness > and > http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-cluster.html#forced-awareness
[jira] [Commented] (SOLR-6205) Make SolrCloud Data-center, rack or zone aware
[ https://issues.apache.org/jira/browse/SOLR-6205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16191698#comment-16191698 ] jefferyyuan commented on SOLR-6205: --- Making Solr rack-aware can help prevent data loss and improve query performance. Elasticsearch already supports it: https://www.elastic.co/guide/en/elasticsearch/reference/5.4/allocation-awareness.html > Make SolrCloud Data-center, rack or zone aware > -- > > Key: SOLR-6205 > URL: https://issues.apache.org/jira/browse/SOLR-6205 > Project: Solr > Issue Type: New Feature > Components: SolrCloud >Affects Versions: 4.8.1 >Reporter: Arcadius Ahouansou >Assignee: Noble Paul > > Use case: > Let's say we have SolrCloud deployed across 2 Datacenters, racks or zones A > and B > There is a need to have a SolrCloud deployment that will make it possible to > have a working system even if one of the Datacenter/rack/zone A or B is lost. > - This has been discussed on the mailing list at > http://lucene.472066.n3.nabble.com/SolrCloud-multiple-data-center-support-td4115097.html > and there are many workarounds that require adding more moving parts to the > system. > - On the above thread, Daniel Collins mentioned > https://issues.apache.org/jira/browse/ZOOKEEPER-107 > which could help solve this issue. > - Note that this is a very important feature that is overlooked most of the > time. > - Note that this feature is available in ElasticSearch. > See > http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-cluster.html#allocation-awareness > and > http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-cluster.html#forced-awareness
[jira] [Created] (SOLR-10950) Support context filtering for FuzzyLookupFactory
jefferyyuan created SOLR-10950: -- Summary: Support context filtering for FuzzyLookupFactory Key: SOLR-10950 URL: https://issues.apache.org/jira/browse/SOLR-10950 Project: Solr Issue Type: Bug Security Level: Public (Default Security Level. Issues are Public) Components: Suggester Environment: FuzzyLookupFactory is great, as it can still find matches even if users misspell. Context filtering is also great, as we can show only the suggestions matching the user's language, doc type, etc. But it's a pity that FuzzyLookupFactory and context filtering (apparently) don't work together. From http://lucene.472066.n3.nabble.com/Is-it-possible-to-support-context-filtering-for-FuzzyLookupFactory-td4342051.html Reporter: jefferyyuan Priority: Critical Fix For: 6.6.1
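For comparison, context filtering does work with the infix suggesters. A sketch of a solrconfig.xml fragment using AnalyzingInfixLookupFactory with a contextField (the field names `title` and `language` and the field type `text_general` are assumptions for illustration):

```xml
<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">infixSuggester</str>
    <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">title</str>
    <!-- contextField enables filtering suggestions, e.g. by language -->
    <str name="contextField">language</str>
    <str name="suggestAnalyzerFieldType">text_general</str>
  </lst>
</searchComponent>
```

Queries then pass the context filter via suggest.cfq, e.g. /suggest?suggest=true&suggest.dictionary=infixSuggester&suggest.q=brin&suggest.cfq=en. With lookupImpl set to FuzzyLookupFactory instead, the same suggest.cfq filter is (apparently) not applied, which is the gap this issue asks to close.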
[jira] [Updated] (SOLR-10928) Support elevate.q in QueryElevationComponent
[ https://issues.apache.org/jira/browse/SOLR-10928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jefferyyuan updated SOLR-10928: --- Summary: Support elevate.q in QueryElevationComponent (was: Support elevate.q () in QueryElevationComponent) > Support elevate.q in QueryElevationComponent > > > Key: SOLR-10928 > URL: https://issues.apache.org/jira/browse/SOLR-10928 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: SearchComponents - other >Reporter: jefferyyuan >Priority: Critical > Fix For: 6.6.1 > > > QueryElevationComponent uses the query in the q parameter to match entries in elevate.xml. > "query text" from elevate.xml > : has to match the query (q=...). So in this case, elevation works only for > : http://localhost:8080/solr/elevate?q=brain, but not for > : http://localhost:8080/solr/elevate?q=indexingabstract:brain type of > queries. > But sometimes the query is more complex; we may use a nested query or > complexphrase. > It would also be fairly easy to make QEC support an "elevate.q" param, > similar to how there is a "spellcheck.q" param and an "hl.q" param, to let the > client specify an alternate, simplified string for the feature to use. > Content copied from: > http://lucene.472066.n3.nabble.com/Problems-with-elevation-component-configuration-td3993204.html
[jira] [Created] (SOLR-10928) Support elevate.q () in QueryElevationComponent
jefferyyuan created SOLR-10928: -- Summary: Support elevate.q () in QueryElevationComponent Key: SOLR-10928 URL: https://issues.apache.org/jira/browse/SOLR-10928 Project: Solr Issue Type: Improvement Security Level: Public (Default Security Level. Issues are Public) Components: SearchComponents - other Reporter: jefferyyuan Priority: Critical Fix For: 6.6.1 QueryElevationComponent uses the query in the q parameter to match entries in elevate.xml. "query text" from elevate.xml : has to match the query (q=...). So in this case, elevation works only for : http://localhost:8080/solr/elevate?q=brain, but not for : http://localhost:8080/solr/elevate?q=indexingabstract:brain type of queries. But sometimes the query is more complex; we may use a nested query or complexphrase. It would also be fairly easy to make QEC support an "elevate.q" param, similar to how there is a "spellcheck.q" param and an "hl.q" param, to let the client specify an alternate, simplified string for the feature to use. Content copied from: http://lucene.472066.n3.nabble.com/Problems-with-elevation-component-configuration-td3993204.html
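To make the proposal concrete: elevate.xml matches on the literal query text, so a request with a complex q would instead pass a simplified string via the proposed parameter. Note that elevate.q does not exist in Solr at the time of this issue; the request below is a hypothetical sketch of the proposal (doc IDs are made up):

```xml
<!-- elevate.xml: "query text" must match the incoming query string -->
<elevate>
  <query text="brain">
    <doc id="doc-brain-overview"/>
  </query>
</elevate>
```

With the proposed parameter, a complex query could still trigger the "brain" elevation rule, e.g.: /solr/thecollection/elevate?q=indexingabstract:brain&elevate.q=brain - analogous to how spellcheck.q and hl.q let the client hand a simplified string to those components.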
[jira] [Updated] (SOLR-10927) Support position to context.xml in Query Elevation Component
[ https://issues.apache.org/jira/browse/SOLR-10927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jefferyyuan updated SOLR-10927: --- Summary: Support position to context.xml in Query Elevation Component (was: Suuport position to context.xml in Query Elevation Component) > Support position to context.xml in Query Elevation Component > > > Key: SOLR-10927 > URL: https://issues.apache.org/jira/browse/SOLR-10927 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: SearchComponents - other >Reporter: jefferyyuan > Fix For: 6.7, 6.6.1 > > > Query Elevation Component is useful but kind of limited. > Usually we want to boost one document for one query, but not necessarily at > the first position. > For example, a user searches for walking dead - we want to boost our show "Deadly > Vampire", but we don't want to show it first, as that may annoy > the user. > We want to show "Deadly Vampire" at the 2nd, or maybe the 3rd or 4th, position. > It seems the original draft implementation of the Editorial Query Boosting Component > [https://issues.apache.org/jira/browse/SOLR-418] actually supports this - the > priority property.
[jira] [Created] (SOLR-10927) Suuport position to context.xml in Query Elevation Component
jefferyyuan created SOLR-10927: -- Summary: Suuport position to context.xml in Query Elevation Component Key: SOLR-10927 URL: https://issues.apache.org/jira/browse/SOLR-10927 Project: Solr Issue Type: Improvement Security Level: Public (Default Security Level. Issues are Public) Components: SearchComponents - other Reporter: jefferyyuan Fix For: 6.7, 6.6.1 Query Elevation Component is useful but kind of limited. Usually we want to boost one document for one query, but not necessarily at the first position. For example, a user searches for walking dead - we want to boost our show "Deadly Vampire", but we don't want to show it first, as that may annoy the user. We want to show "Deadly Vampire" at the 2nd, or maybe the 3rd or 4th, position. It seems the original draft implementation of the Editorial Query Boosting Component [https://issues.apache.org/jira/browse/SOLR-418] actually supports this - the priority property.
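A sketch of what the proposed configuration might look like. The position attribute below is hypothetical - stock elevate.xml does not support it; the SOLR-418 draft used a similar "priority" property (doc ID made up):

```xml
<!-- Hypothetical: pin an elevated doc at a specific rank, not just first -->
<elevate>
  <query text="walking dead">
    <doc id="deadly-vampire" position="2"/>
  </query>
</elevate>
```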
[jira] [Comment Edited] (SOLR-6092) Provide a REST managed QueryElevationComponent
[ https://issues.apache.org/jira/browse/SOLR-6092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15810679#comment-15810679 ] jefferyyuan edited comment on SOLR-6092 at 6/20/17 7:14 PM: Vote for this. We can manage stop words and synonyms via REST, so why not QueryElevation, which is much more useful? Also, the content in elevate.xml - what we want to upsell for different queries - changes very frequently. Thanks was (Author: yuanyun.cn): Vote for this. We can manage stop words, synonyms, why not QueryElevation which are much more useful. Thanks > Provide a REST managed QueryElevationComponent > -- > > Key: SOLR-6092 > URL: https://issues.apache.org/jira/browse/SOLR-6092 > Project: Solr > Issue Type: New Feature >Reporter: Timothy Potter >Priority: Minor > > Provide a managed query elevation component to allow CRUD operations from a > REST API.
[jira] [Updated] (SOLR-10885) NullPointerException when run collapse filter
[ https://issues.apache.org/jira/browse/SOLR-10885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jefferyyuan updated SOLR-10885: --- Description: Solr collapse is a great function for collapsing related data so we only show one document in the search result. Just found one issue with it - it throws NullPointerException in some cases. To reproduce it, first ingest some data - AND commit multiple times. 1. When there is no data that matches the query: http://localhost:8983/solr/thecollection/select?defType=edismax&q=non-existType:*&fq={!collapse field=seriesId nullPolicy=expand}&fq={!collapse field=programId nullPolicy=expand} - The problem only happens if I use both collapse fqs; if I just use one of them, it is fine. *2. When the data that matches the query doesn't have the collapse fields - This is kind of a big problem, as we may store different kinds of docs in one collection, and one query may match different kinds of docs. If some docs (docType1) have the same value for field1, we want to collapse them; if other docs (docType2) have the same value for field2, do the same.* - channel data doesn't have seriesId or programId http://localhost:8983/solr/thecollection/select?defType=edismax&q=docType:channel&fq={!collapse field=seriesId nullPolicy=expand}&fq={!collapse field=programId nullPolicy=expand} - The problem only happens if I use both collapse fqs; if I just use one of them, it is fine. 
Exception from log: Caused by: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://localhost:8983/solr/searchItems_shard1_replica3: java.lang.NullPointerException at org.apache.solr.search.CollapsingQParserPlugin$OrdScoreCollector.finish(CollapsingQParserPlugin.java:617) at org.apache.solr.search.CollapsingQParserPlugin$OrdScoreCollector.finish(CollapsingQParserPlugin.java:667) at org.apache.solr.search.SolrIndexSearcher.buildAndRunCollectorChain(SolrIndexSearcher.java:256) at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1823) at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1640) at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:611) at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:533) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:295) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:166) at org.apache.solr.core.SolrCore.execute(SolrCore.java:2299) at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:658) at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:464) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:296) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512) at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213) at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134) The failing line - line 617 of CollapsingQParserPlugin.java in Solr 6.4.1: int nextDocBase = currentContext + 1 < this.contexts.length ? this.contexts[(currentContext + 1)].docBase : this.maxDoc; Seems related to https://issues.apache.org/jira/browse/SOLR-8807 - but SOLR-8807 only fixes an issue related to the spell checker. I may test this with the latest Solr 6.6.0 when I have time. Updated: Does Solr support multiple collapse fields? - The query occasionally works (maybe 1 in 10 times), but other times it throws NullPointerException: http://localhost:18983/solr/thecollection/select?q=programId:* AND id:*&defType=edismax&fq={!collapse+field=id }&fq={!collapse+field=programId } was: Solr collapse is a great function to collapse data that is rela
[jira] [Updated] (SOLR-10885) NullPointerException when run collapse filter
[ https://issues.apache.org/jira/browse/SOLR-10885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jefferyyuan updated SOLR-10885: --- Description: Solr collapse is a great function to collapse data that is related so we only show one in search result. Just found one issue related with it - It throw NullPointerException in some cases. To reproduce it, first ingest some data - AND commit multiple times. 1. When there is no data that matches the query: http://localhost:8983/solr/thecollection/select?defType=edismax&q=non-existType:*&fq={!collapse field=seriesId nullPolicy=expand}&fq={!collapse field=programId nullPolicy=expand} - But the problem only happens if I use both collapse fqs, if I just use one of them, it would be fine. *2. When the data that matches the query doesn't have the collapse fields - This is kind of a big problem as we may store different kinds of docs in one collection, one query may match different kinds of docs. If some docs (docType1) have same value for field1, we want to collapse them, if other dosc(docType2) have some value for field2, do same things.* - channel data doesn't have seriesId or programId http://localhost:8983/solr/thecollection/select?defType=edismax&q=docType:channel&fq={!collapse field=seriesId nullPolicy=expand}&fq={!collapse field=programId nullPolicy=expand} - But the problem only happens if I use both collapse fqs, if I just use one of them, it would be fine. 
Exception from log: Caused by: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://localhost:8983/solr/searchItems_shard1_replica3: java.lang.NullPointerException at org.apache.solr.search.CollapsingQParserPlugin$OrdScoreCollector.finish(CollapsingQParserPlugin.java:617) at org.apache.solr.search.CollapsingQParserPlugin$OrdScoreCollector.finish(CollapsingQParserPlugin.java:667) at org.apache.solr.search.SolrIndexSearcher.buildAndRunCollectorChain(SolrIndexSearcher.java:256) at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1823) at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1640) at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:611) at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:533) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:295) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:166) at org.apache.solr.core.SolrCore.execute(SolrCore.java:2299) at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:658) at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:464) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:296) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512) at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213) at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134) int nextDocBase = currentContext + 1 < this.contexts.length ? this.contexts[(currentContext + 1)].docBase : this.maxDoc; - 617 from solr 6.4.1 CollapsingQParserPlugin.java Seems related with https://issues.apache.org/jira/browse/SOLR-8807 - But SOLR-8807 only fixes issue related with spell checker. I may test this with latest solr 6.6.0 when I have time. was: Solr collapse is a great function to collapse data that is related so we only show one in search result. Just found one issue related with it - It throw NullPointerException in some cases. To reproduce it, first ingest some data - AND commit multiple times. 1. When there is no data that matches the query: http://localhost:8983/solr/thecollection/select?defT
[jira] [Created] (SOLR-10885) NullPointerException when run collapse filter
jefferyyuan created SOLR-10885: -- Summary: NullPointerException when run collapse filter Key: SOLR-10885 URL: https://issues.apache.org/jira/browse/SOLR-10885 Project: Solr Issue Type: Bug Security Level: Public (Default Security Level. Issues are Public) Components: search Affects Versions: 6.4.1 Reporter: jefferyyuan Priority: Critical Solr collapse is a great function to collapse data that is related so we only show one in search result. Just found one issue related with it - It throw NullPointerException in some cases. To reproduce it, first ingest some data - AND commit multiple times. 1. When there is no data that matches the query: http://localhost:8983/solr/thecollection/select?defType=edismax&q=non-existType:*&fq={!collapse field=seriesId nullPolicy=expand}&fq={!collapse field=programId nullPolicy=expand} - But the problem only happens if I use both collapse fqs, if I just use one of them, it would be fine. 2. When the data that matches the query doesn't have the collapse fields - channel data doesn't have seriesId or programId http://localhost:8983/solr/thecollection/select?defType=edismax&q=docType:channel&fq={!collapse field=seriesId nullPolicy=expand}&fq={!collapse field=programId nullPolicy=expand} - But the problem only happens if I use both collapse fqs, if I just use one of them, it would be fine. 
Exception from log: Caused by: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://localhost:8983/solr/searchItems_shard1_replica3: java.lang.NullPointerException at org.apache.solr.search.CollapsingQParserPlugin$OrdScoreCollector.finish(CollapsingQParserPlugin.java:617) at org.apache.solr.search.CollapsingQParserPlugin$OrdScoreCollector.finish(CollapsingQParserPlugin.java:667) at org.apache.solr.search.SolrIndexSearcher.buildAndRunCollectorChain(SolrIndexSearcher.java:256) at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1823) at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1640) at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:611) at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:533) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:295) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:166) at org.apache.solr.core.SolrCore.execute(SolrCore.java:2299) at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:658) at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:464) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:296) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512) at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213) at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134) The failing line - line 617 of CollapsingQParserPlugin.java in Solr 6.4.1: int nextDocBase = currentContext + 1 < this.contexts.length ? this.contexts[(currentContext + 1)].docBase : this.maxDoc; Seems related to https://issues.apache.org/jira/browse/SOLR-8807 - but SOLR-8807 only fixes an issue related to the spell checker. I may test this with the latest Solr 6.6.0 when I have time.
[jira] [Commented] (SOLR-6096) Support Update and Delete on nested documents
[ https://issues.apache.org/jira/browse/SOLR-6096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16009519#comment-16009519 ] jefferyyuan commented on SOLR-6096: --- [~mkhludnev] How can we specify childfree=true when using SolrJ? - There seems to be no childfree in Solr's code at all: https://github.com/apache/lucene-solr/search?utf8=%E2%9C%93&q=childfree&type= And how can we define the special-purpose /blockupdate/ handler with explicit block semantics for all the cases above? Thanks a lot. > Support Update and Delete on nested documents > - > > Key: SOLR-6096 > URL: https://issues.apache.org/jira/browse/SOLR-6096 > Project: Solr > Issue Type: Improvement >Affects Versions: 4.7.2 >Reporter: Thomas Scheffler > Labels: blockjoin, nested > > When using nested or child document. Update and delete operation on the root > document should also affect the nested documents, as no child can exist > without its parent :-) > Example > {code:xml|title=First Import} > > 1 > Article with author > > Smith, John > author > > > {code} > If I change my mind and the author was not named *John* but *_Jane_*: > {code:xml|title=Changed name of author of '1'} > > 1 > Article with author > > Smith, Jane > author > > > {code} > I would expect that John is not in the index anymore. Currently he is. There > might also be the case that any subdocument is removed by an update: > {code:xml|title=Remove author} > > 1 > Article without author > > {code} > This should affect a delete on all nested documents, too. The same way all > nested documents should be deleted if I delete the root document: > {code:xml|title=Deletion of '1'} > > 1 > > > {code} > This is currently possible to do all this stuff on client side by issuing > additional request to delete document before every update. It would be more > efficient if this could be handled on SOLR side. One would benefit on atomic > update. The biggest plus shows when using "delete-by-query". 
> {code:xml|title=Deletion of '1' by query} > > title:* > > > {code} > In that case one would not have to first query all documents and issue > deletes by those id and every document that are nested. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
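The client-side workaround the description mentions - delete the block, then re-add the parent with its current children - can be sketched as two plain XML update requests. Field names here are illustrative assumptions, not Solr-mandated names; because block joins store a parent and its children contiguously, re-adding the whole block is the safe way to change any part of it:

{code:xml|title=Request 1: delete the old block}
<delete>
  <id>1</id>
</delete>
{code}
{code:xml|title=Request 2: re-add the parent with its current children}
<add>
  <doc>
    <field name="id">1</field>
    <field name="title">Article with author</field>
    <doc>
      <field name="name">Smith, Jane</field>
      <field name="role">author</field>
    </doc>
  </doc>
</add>
{code}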
[jira] [Commented] (SOLR-6246) Core fails to reload when AnalyzingInfixSuggester is used as a Suggester
[ https://issues.apache.org/jira/browse/SOLR-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15856300#comment-15856300 ] jefferyyuan commented on SOLR-6246:
---
This is great news. Thanks so much [~steve_rowe] for clarifying my questions. - I should have searched more before asking here; my fault. I read the release notes for Solr 6.4.1 but not those for Lucene 6.4.1, which I should have. I should also have read the issues this Jira depends on.

> Core fails to reload when AnalyzingInfixSuggester is used as a Suggester
> ------------------------------------------------------------------------
>
> Key: SOLR-6246
> URL: https://issues.apache.org/jira/browse/SOLR-6246
> Project: Solr
> Issue Type: Sub-task
> Components: SearchComponents - other
> Affects Versions: 4.8, 4.8.1, 4.9, 5.0, 5.1, 5.2, 5.3, 5.4
> Reporter: Varun Thacker
> Assignee: Steve Rowe
> Fix For: 6.5, master (7.0)
>
> Attachments: SOLR-6246.patch, SOLR-6246.patch, SOLR-6246-test.patch, SOLR-6246-test.patch, SOLR-6246-test.patch, SOLR-6246-test.patch
>
> LUCENE-5477 added near-real-time suggest building to AnalyzingInfixSuggester. One of the changes that went in was that a writer is now persisted to support real-time updates via the add() and update() methods.
> When we call Solr's reload command, a new instance of AnalyzingInfixSuggester is created. When trying to create a new writer on the same Directory, a lock cannot be obtained and Solr fails to reload the core.
> Also, when AnalyzingInfixLookupFactory throws a RuntimeException we should pass along the original message.
> I am not sure what the approach to fix it should be. Should we have a reloadHook where we close the writer?

-- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Comment Edited] (SOLR-6246) Core fails to reload when AnalyzingInfixSuggester is used as a Suggester
[ https://issues.apache.org/jira/browse/SOLR-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15855637#comment-15855637 ] jefferyyuan edited comment on SOLR-6246 at 2/7/17 9:34 AM:
---
Thanks [~steve_rowe]. I am wondering whether there is any plan to also fix this issue in a 6.4.x release? This fix is very valuable; without it we can't really use AnalyzingInfixSuggester, as we always reload the collections to update the schema, config, etc. And it takes time to release 6.5 - usually several (2 or 3) months.

was (Author: yuanyun.cn):
Thanks [~steve_rowe]. I am wondering whether there is any plan to also fix this issue in a 6.4.x release? This fix is very valuable; without it we can't really use AnalyzingInfixSuggester, as we always reload the collections to update the schema, config, etc.

> Core fails to reload when AnalyzingInfixSuggester is used as a Suggester
> ------------------------------------------------------------------------
>
> Key: SOLR-6246
> URL: https://issues.apache.org/jira/browse/SOLR-6246
> Project: Solr
> Issue Type: Sub-task
> Components: SearchComponents - other
> Affects Versions: 4.8, 4.8.1, 4.9, 5.0, 5.1, 5.2, 5.3, 5.4
> Reporter: Varun Thacker
> Assignee: Steve Rowe
> Fix For: 6.5, master (7.0)
>
> Attachments: SOLR-6246.patch, SOLR-6246.patch, SOLR-6246-test.patch, SOLR-6246-test.patch, SOLR-6246-test.patch, SOLR-6246-test.patch
>
> LUCENE-5477 added near-real-time suggest building to AnalyzingInfixSuggester. One of the changes that went in was that a writer is now persisted to support real-time updates via the add() and update() methods.
> When we call Solr's reload command, a new instance of AnalyzingInfixSuggester is created. When trying to create a new writer on the same Directory, a lock cannot be obtained and Solr fails to reload the core.
> Also, when AnalyzingInfixLookupFactory throws a RuntimeException we should pass along the original message.
> I am not sure what the approach to fix it should be. Should we have a reloadHook where we close the writer?

-- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (SOLR-6246) Core fails to reload when AnalyzingInfixSuggester is used as a Suggester
[ https://issues.apache.org/jira/browse/SOLR-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15855637#comment-15855637 ] jefferyyuan commented on SOLR-6246:
---
Thanks [~steve_rowe]. I am wondering whether there is any plan to also fix this issue in a 6.4.x release? This fix is very valuable; without it we can't really use AnalyzingInfixSuggester, as we always reload the collections to update the schema, config, etc.

> Core fails to reload when AnalyzingInfixSuggester is used as a Suggester
> ------------------------------------------------------------------------
>
> Key: SOLR-6246
> URL: https://issues.apache.org/jira/browse/SOLR-6246
> Project: Solr
> Issue Type: Sub-task
> Components: SearchComponents - other
> Affects Versions: 4.8, 4.8.1, 4.9, 5.0, 5.1, 5.2, 5.3, 5.4
> Reporter: Varun Thacker
> Assignee: Steve Rowe
> Fix For: 6.5, master (7.0)
>
> Attachments: SOLR-6246.patch, SOLR-6246.patch, SOLR-6246-test.patch, SOLR-6246-test.patch, SOLR-6246-test.patch, SOLR-6246-test.patch
>
> LUCENE-5477 added near-real-time suggest building to AnalyzingInfixSuggester. One of the changes that went in was that a writer is now persisted to support real-time updates via the add() and update() methods.
> When we call Solr's reload command, a new instance of AnalyzingInfixSuggester is created. When trying to create a new writer on the same Directory, a lock cannot be obtained and Solr fails to reload the core.
> Also, when AnalyzingInfixLookupFactory throws a RuntimeException we should pass along the original message.
> I am not sure what the approach to fix it should be. Should we have a reloadHook where we close the writer?

-- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (SOLR-6246) Core fails to reload when AnalyzingInfixSuggester is used as a Suggester
[ https://issues.apache.org/jira/browse/SOLR-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15851104#comment-15851104 ] jefferyyuan commented on SOLR-6246:
---
First I reproduced the issue in the current 6.4, then verified that the 6.4.1 release candidate fixes it. Thanks for solving this issue; looking forward to the 6.4.1 release. [~steve_rowe]

> Core fails to reload when AnalyzingInfixSuggester is used as a Suggester
> ------------------------------------------------------------------------
>
> Key: SOLR-6246
> URL: https://issues.apache.org/jira/browse/SOLR-6246
> Project: Solr
> Issue Type: Sub-task
> Components: SearchComponents - other
> Affects Versions: 4.8, 4.8.1, 4.9, 5.0, 5.1, 5.2, 5.3, 5.4
> Reporter: Varun Thacker
>
> Attachments: SOLR-6246.patch, SOLR-6246-test.patch, SOLR-6246-test.patch, SOLR-6246-test.patch, SOLR-6246-test.patch
>
> LUCENE-5477 added near-real-time suggest building to AnalyzingInfixSuggester. One of the changes that went in was that a writer is now persisted to support real-time updates via the add() and update() methods.
> When we call Solr's reload command, a new instance of AnalyzingInfixSuggester is created. When trying to create a new writer on the same Directory, a lock cannot be obtained and Solr fails to reload the core.
> Also, when AnalyzingInfixLookupFactory throws a RuntimeException we should pass along the original message.
> I am not sure what the approach to fix it should be. Should we have a reloadHook where we close the writer?

-- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Comment Edited] (SOLR-6246) Core fails to reload when AnalyzingInfixSuggester is used as a Suggester
[ https://issues.apache.org/jira/browse/SOLR-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15822691#comment-15822691 ] jefferyyuan edited comment on SOLR-6246 at 1/14/17 5:15 AM:
---
I tested with the latest build - solr-6.4.0-222; reloading collections/cores with AnalyzingInfixSuggester still failed with LockObtainFailedException. It failed with the same error even after restarting Solr.
It can be easily reproduced: add a suggest component, then ***build the suggester***: suggest?suggest.build=true, then reload the collection or cores. - It seems the key to reproducing the issue is that we need to build the suggester.
infixSuggester BlendedInfixLookupFactory DocumentDictionaryFactory position_linear suggester suggesterContextField 4 textSuggest infix_suggestions true false false true infixSuggester true 10 true suggest

was (Author: yuanyun.cn):
I tested with the latest build - solr-6.4.0-222; reloading collections/cores with AnalyzingInfixSuggester still failed with LockObtainFailedException. It failed with the same error even after restarting Solr.
It can be easily reproduced: add a suggest component, then build the suggester: suggest?suggest.build=true, then reload the collection or cores.
infixSuggester BlendedInfixLookupFactory DocumentDictionaryFactory position_linear suggester suggesterContextField 4 textSuggest infix_suggestions true false false true infixSuggester true 10 true suggest

> Core fails to reload when AnalyzingInfixSuggester is used as a Suggester
> ------------------------------------------------------------------------
>
> Key: SOLR-6246
> URL: https://issues.apache.org/jira/browse/SOLR-6246
> Project: Solr
> Issue Type: Sub-task
> Components: SearchComponents - other
> Affects Versions: 4.8, 4.8.1, 4.9, 5.0, 5.1, 5.2, 5.3, 5.4
> Reporter: Varun Thacker
>
> Attachments: SOLR-6246.patch, SOLR-6246-test.patch, SOLR-6246-test.patch, SOLR-6246-test.patch
>
> LUCENE-5477 added near-real-time suggest building to AnalyzingInfixSuggester. One of the changes that went in was that a writer is now persisted to support real-time updates via the add() and update() methods.
> When we call Solr's reload command, a new instance of AnalyzingInfixSuggester is created. When trying to create a new writer on the same Directory, a lock cannot be obtained and Solr fails to reload the core.
> Also, when AnalyzingInfixLookupFactory throws a RuntimeException we should pass along the original message.
> I am not sure what the approach to fix it should be. Should we have a reloadHook where we close the writer?

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (SOLR-6246) Core fails to reload when AnalyzingInfixSuggester is used as a Suggester
[ https://issues.apache.org/jira/browse/SOLR-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15822691#comment-15822691 ] jefferyyuan commented on SOLR-6246:
---
I tested with the latest build - solr-6.4.0-222; reloading collections/cores with AnalyzingInfixSuggester still failed with LockObtainFailedException. It failed with the same error even after restarting Solr.
It can be easily reproduced: add a suggest component, then build the suggester: suggest?suggest.build=true, then reload the collection or cores.
infixSuggester BlendedInfixLookupFactory DocumentDictionaryFactory position_linear suggester suggesterContextField 4 textSuggest infix_suggestions true false false true infixSuggester true 10 true suggest

> Core fails to reload when AnalyzingInfixSuggester is used as a Suggester
> ------------------------------------------------------------------------
>
> Key: SOLR-6246
> URL: https://issues.apache.org/jira/browse/SOLR-6246
> Project: Solr
> Issue Type: Sub-task
> Components: SearchComponents - other
> Affects Versions: 4.8, 4.8.1, 4.9, 5.0, 5.1, 5.2, 5.3, 5.4
> Reporter: Varun Thacker
>
> Attachments: SOLR-6246.patch, SOLR-6246-test.patch, SOLR-6246-test.patch, SOLR-6246-test.patch
>
> LUCENE-5477 added near-real-time suggest building to AnalyzingInfixSuggester. One of the changes that went in was that a writer is now persisted to support real-time updates via the add() and update() methods.
> When we call Solr's reload command, a new instance of AnalyzingInfixSuggester is created. When trying to create a new writer on the same Directory, a lock cannot be obtained and Solr fails to reload the core.
> Also, when AnalyzingInfixLookupFactory throws a RuntimeException we should pass along the original message.
> I am not sure what the approach to fix it should be. Should we have a reloadHook where we close the writer?

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
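The suggester configuration pasted in the reproduction steps above lost its XML markup in the mail archive; only the bare values survived. As a rough sketch only - the element placement below is an assumption based on typical Solr suggester setups, not the reporter's exact solrconfig.xml - the component and handler likely resembled:

{code:xml}
<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">infixSuggester</str>
    <str name="lookupImpl">BlendedInfixLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="blenderType">position_linear</str>
    <str name="indexPath">infix_suggestions</str>
    <str name="minPrefixChars">4</str>
    <str name="contextField">suggesterContextField</str>
    <str name="field">textSuggest</str>
    <str name="buildOnStartup">false</str>
    <str name="buildOnCommit">false</str>
  </lst>
</searchComponent>

<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="suggest">true</str>
    <str name="suggest.dictionary">infixSuggester</str>
    <str name="suggest.count">10</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>
{code}

Any configuration along these lines that builds an on-disk infix suggester index (indexPath) would hold the write.lock that the reload then trips over.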
[jira] [Commented] (SOLR-6246) Core fails to reload when AnalyzingInfixSuggester is used as a Suggester
[ https://issues.apache.org/jira/browse/SOLR-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15822377#comment-15822377 ] jefferyyuan commented on SOLR-6246:
---
From https://builds.apache.org/job/Solr-Artifacts-6.x/lastSuccessfulBuild/artifact/solr/package/ - it was build 195 when I downloaded it at that time. I will try the newest build and check whether this works.

> Core fails to reload when AnalyzingInfixSuggester is used as a Suggester
> ------------------------------------------------------------------------
>
> Key: SOLR-6246
> URL: https://issues.apache.org/jira/browse/SOLR-6246
> Project: Solr
> Issue Type: Sub-task
> Components: SearchComponents - other
> Affects Versions: 4.8, 4.8.1, 4.9, 5.0, 5.1, 5.2, 5.3, 5.4
> Reporter: Varun Thacker
>
> Attachments: SOLR-6246.patch, SOLR-6246-test.patch, SOLR-6246-test.patch, SOLR-6246-test.patch
>
> LUCENE-5477 added near-real-time suggest building to AnalyzingInfixSuggester. One of the changes that went in was that a writer is now persisted to support real-time updates via the add() and update() methods.
> When we call Solr's reload command, a new instance of AnalyzingInfixSuggester is created. When trying to create a new writer on the same Directory, a lock cannot be obtained and Solr fails to reload the core.
> Also, when AnalyzingInfixLookupFactory throws a RuntimeException we should pass along the original message.
> I am not sure what the approach to fix it should be. Should we have a reloadHook where we close the writer?

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (SOLR-6246) Core fails to reload when AnalyzingInfixSuggester is used as a Suggester
[ https://issues.apache.org/jira/browse/SOLR-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15822340#comment-15822340 ] jefferyyuan commented on SOLR-6246:
---
I am running on Solr 6.4 - solr-6.4.0-195, but the problem still exists. Even restarting Solr doesn't help - after restarting Solr, reloading the collection or the current node still fails with LockObtainFailedException. I even tried to manually delete the write.lock and then call reload on the collection/cores; it failed again with the same error.

INFO - 2017-01-12 16:55:42.392; [c:myCollection s:shard2 r:core_node3 x:searchItems_shard2_replica1] org.apache.solr.servlet.HttpSolrCall; [admin] webapp=null path=/admin/cores params={core=searchItems_shard2_replica1&qt=/admin/cores&action=RELOAD&wt=javabin&version=2} status=500 QTime=592
ERROR - 2017-01-12 16:55:42.393; [c:myCollection s:shard2 r:core_node3 x:searchItems_shard2_replica1] org.apache.solr.common.SolrException; null:org.apache.solr.common.SolrException: Error handling 'reload' action
	at org.apache.solr.handler.admin.CoreAdminOperation.lambda$static$2(CoreAdminOperation.java:114)
	at org.apache.solr.handler.admin.CoreAdminOperation$$Lambda$23/265321659.execute(Unknown Source)
	at org.apache.solr.handler.admin.CoreAdminOperation.execute(CoreAdminOperation.java:377)
	at org.apache.solr.handler.admin.CoreAdminHandler$CallInfo.call(CoreAdminHandler.java:365)
	at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:156)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:152)
	at org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:664)
	at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:445)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:303)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:254)
	at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691)
	at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
	at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
	at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
	at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
	at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
	at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
	at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
	at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
	at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
	at org.eclipse.jetty.server.Server.handle(Server.java:534)
	at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
	at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
	at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
	at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
	at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
	at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
	at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
	at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
	at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
	at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.solr.common.SolrException: Unable to reload core [searchItems_shard2_replica1]
	at org.apache.solr.core.CoreContainer.reload(CoreContainer.java:950)
	at org.apache.solr.handler.admin.CoreAdminOperation.lambda$static$2(CoreAdminOperation.java:112)
	... 34 more
Caused by: org.apache.solr.common.SolrException: org.apache.lucene.store.LockObtainFailedException: Lock held by this virtual machine: /Applications/solr-6.4.0/example/cloud/node2/solr/searchItems_shard2_replica1/data/infix_suggestions/write.lock
	at org.apache.solr.core.SolrCore.<init>(SolrCore.java:899)
[jira] [Closed] (LUCENE-7625) Support Multiple (AND) Context Filter Query in Suggestor
[ https://issues.apache.org/jira/browse/LUCENE-7625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jefferyyuan closed LUCENE-7625.
---
Resolution: Not A Problem

Already supported in Lucene/Solr: http://stackoverflow.com/questions/36079395/how-to-configure-multiple-contextfields-in-single-solr-suggester

> Support Multiple (AND) Context Filter Query in Suggestor
> --------------------------------------------------------
>
> Key: LUCENE-7625
> URL: https://issues.apache.org/jira/browse/LUCENE-7625
> Project: Lucene - Core
> Issue Type: Improvement
> Components: modules/suggest
> Reporter: jefferyyuan
> Labels: lucene, solr, suggester
>
> Just as with normal queries, we usually want to use multiple filter queries when running auto-completion.
> It would be great if the suggester could return (the title of) docs that are meaningful to the current user when we need multiple filters.
> Thanks
> Jeffery Yuan

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (LUCENE-7625) Support Multiple (AND) Context Filter Query in Suggestor
[ https://issues.apache.org/jira/browse/LUCENE-7625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15819194#comment-15819194 ] jefferyyuan commented on LUCENE-7625:
---
My mistake - Lucene/Solr actually already supports this: http://stackoverflow.com/questions/36079395/how-to-configure-multiple-contextfields-in-single-solr-suggester

> Support Multiple (AND) Context Filter Query in Suggestor
> --------------------------------------------------------
>
> Key: LUCENE-7625
> URL: https://issues.apache.org/jira/browse/LUCENE-7625
> Project: Lucene - Core
> Issue Type: Improvement
> Components: modules/suggest
> Reporter: jefferyyuan
> Labels: lucene, solr, suggester
>
> Just as with normal queries, we usually want to use multiple filter queries when running auto-completion.
> It would be great if the suggester could return (the title of) docs that are meaningful to the current user when we need multiple filters.
> Thanks
> Jeffery Yuan

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
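For reference, the existing support referred to here is the suggester's context filtering query (suggest.cfq / suggest.contextFilterQuery), which accepts a boolean query over the configured context field. A request ANDing two contexts might look like the following sketch (collection name, suggester name, and context values are illustrative assumptions):

{code}
/solr/mycollection/suggest?suggest=true&suggest.dictionary=mySuggester&suggest.q=jav&suggest.cfq=ctx1 AND ctx2
{code}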
[jira] [Created] (LUCENE-7625) Support Multiple (AND) Context Filter Query in Suggestor
jefferyyuan created LUCENE-7625:
---
Summary: Support Multiple (AND) Context Filter Query in Suggestor
Key: LUCENE-7625
URL: https://issues.apache.org/jira/browse/LUCENE-7625
Project: Lucene - Core
Issue Type: Improvement
Components: modules/suggest
Reporter: jefferyyuan

Just as with normal queries, we usually want to use multiple filter queries when running auto-completion.
It would be great if the suggester could return (the title of) docs that are meaningful to the current user when we need multiple filters.
Thanks
Jeffery Yuan

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (SOLR-6092) Provide a REST managed QueryElevationComponent
[ https://issues.apache.org/jira/browse/SOLR-6092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15810679#comment-15810679 ] jefferyyuan commented on SOLR-6092:
---
Vote for this. We can manage stop words and synonyms; why not QueryElevation, which is much more useful? Thanks

> Provide a REST managed QueryElevationComponent
> ----------------------------------------------
>
> Key: SOLR-6092
> URL: https://issues.apache.org/jira/browse/SOLR-6092
> Project: Solr
> Issue Type: New Feature
> Reporter: Timothy Potter
> Priority: Minor
>
> Provide a managed query elevation component to allow CRUD operations from a REST API.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (SOLR-9929) Documentation and sample code about how to train the model using user clicks when use ltr module
[ https://issues.apache.org/jira/browse/SOLR-9929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jefferyyuan updated SOLR-9929:
---
Summary: Documentation and sample code about how to train the model using user clicks when use ltr module (was: Documentation and smaple code about how to train the model using user clicks when use ltr module)

> Documentation and sample code about how to train the model using user clicks when use ltr module
> ------------------------------------------------------------------------------------------------
>
> Key: SOLR-9929
> URL: https://issues.apache.org/jira/browse/SOLR-9929
> Project: Solr
> Issue Type: Improvement
> Security Level: Public (Default Security Level. Issues are Public)
> Reporter: jefferyyuan
> Labels: learning-to-rank, machine_learning, solr
>
> Thanks very much for integrating machine learning into Solr: https://issues.apache.org/jira/browse/SOLR-8542
> I tried to integrate it, but had difficulty figuring out how to translate the partial pairwise feedback into the importance or relevance of a doc.
> https://github.com/apache/lucene-solr/blob/f62874e47a0c790b9e396f58ef6f14ea04e2280b/solr/contrib/ltr/README.md
> In the "Assemble training data" part, the third column indicates the relative importance or relevance of that doc.
> Could you please give more info about how to compute a score based on what the user clicks?
> I have read
> https://static.aminer.org/pdf/PDF/000/472/865/optimizing_search_engines_using_clickthrough_data.pdf
> http://www.cs.cornell.edu/people/tj/publications/joachims_etal_05a.pdf
> http://alexbenedetti.blogspot.com/2016/07/solr-is-learning-to-rank-better-part-1.html
> but still have no clue yet.
> From a user's perspective, the steps such as setting up the feature and model in Solr are simple, but collecting the feedback data and training/updating the model is much more complex. Without it, we can't really use the learning-to-rank function in Solr.
> It would be great if Solr could provide some detailed instructions and sample code about how to translate the partial pairwise feedback and use it to train and update the model.
> Thanks

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (SOLR-9929) Documentation and smaple code about how to train the model using user clicks when use ltr module
jefferyyuan created SOLR-9929:
---
Summary: Documentation and smaple code about how to train the model using user clicks when use ltr module
Key: SOLR-9929
URL: https://issues.apache.org/jira/browse/SOLR-9929
Project: Solr
Issue Type: Improvement
Security Level: Public (Default Security Level. Issues are Public)
Reporter: jefferyyuan

Thanks very much for integrating machine learning into Solr: https://issues.apache.org/jira/browse/SOLR-8542
I tried to integrate it, but had difficulty figuring out how to translate the partial pairwise feedback into the importance or relevance of a doc.
https://github.com/apache/lucene-solr/blob/f62874e47a0c790b9e396f58ef6f14ea04e2280b/solr/contrib/ltr/README.md
In the "Assemble training data" part, the third column indicates the relative importance or relevance of that doc.
Could you please give more info about how to compute a score based on what the user clicks?
I have read
https://static.aminer.org/pdf/PDF/000/472/865/optimizing_search_engines_using_clickthrough_data.pdf
http://www.cs.cornell.edu/people/tj/publications/joachims_etal_05a.pdf
http://alexbenedetti.blogspot.com/2016/07/solr-is-learning-to-rank-better-part-1.html
but still have no clue yet.
From a user's perspective, the steps such as setting up the feature and model in Solr are simple, but collecting the feedback data and training/updating the model is much more complex. Without it, we can't really use the learning-to-rank function in Solr.
It would be great if Solr could provide some detailed instructions and sample code about how to translate the partial pairwise feedback and use it to train and update the model.
Thanks

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
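The clickthrough papers cited above (Joachims et al.) describe how to turn partial click feedback into pairwise preferences: a clicked result is implicitly preferred over every higher-ranked result that was shown but skipped ("click > skip above"). The following pure-Java sketch of that heuristic is illustrative only - it is not part of the ltr contrib, and the class and method names are invented for this example:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class ClickPairs {
    /** A pairwise preference: the user implicitly judged 'preferred' more relevant than 'over'. */
    public record Preference(String preferred, String over) {}

    /**
     * Joachims' "click > skip above" heuristic: each clicked document is
     * preferred over every higher-ranked document that was shown but not clicked.
     *
     * @param ranking doc ids in the order the engine returned them
     * @param clicked doc ids the user clicked
     */
    public static List<Preference> extract(List<String> ranking, Set<String> clicked) {
        List<Preference> prefs = new ArrayList<>();
        for (int i = 0; i < ranking.size(); i++) {
            if (!clicked.contains(ranking.get(i))) continue;   // only clicked docs generate pairs
            for (int j = 0; j < i; j++) {                      // everything ranked above position i
                if (!clicked.contains(ranking.get(j))) {       // ...that was skipped
                    prefs.add(new Preference(ranking.get(i), ranking.get(j)));
                }
            }
        }
        return prefs;
    }

    public static void main(String[] args) {
        // Ranking d1..d4; the user skipped d1 and d2 and clicked d3,
        // so d3 is preferred over d1 and over d2.
        System.out.println(extract(List.of("d1", "d2", "d3", "d4"), Set.of("d3")));
    }
}
```

These pairs (or per-doc scores aggregated from them) could then feed the "relative importance" column of the ltr training data; note the papers also discuss de-biasing steps that this sketch deliberately omits.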
[jira] [Updated] (SOLR-5701) Allow DocTransformer to add arbitrary fields
[ https://issues.apache.org/jira/browse/SOLR-5701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jefferyyuan updated SOLR-5701:
---
Issue Type: Improvement (was: Bug)

> Allow DocTransformer to add arbitrary fields
> --------------------------------------------
>
> Key: SOLR-5701
> URL: https://issues.apache.org/jira/browse/SOLR-5701
> Project: Solr
> Issue Type: Improvement
> Components: search
> Reporter: jefferyyuan
> Labels: search
> Fix For: 4.9, 6.0
>
> Original Estimate: 72h
> Remaining Estimate: 72h
>
> DocTransformer is very powerful and allows us to add, remove, or update fields before returning.
> One limit we don't like is that it can only add one field, and the field name must be [transformer_name].
> We may want to add multiple fields in one DocTransformer.
> One possible solution is to add a method getFieldNames to DocTransformer:
> public abstract class DocTransformer {
>   public List<String> getFieldNames() { return null; }
> }
> Then in SolrReturnFields.add(String, NamedList, DocTransformers, SolrQueryRequest), change augmenters.addTransformer( factory.create(disp, augmenterParams, req) ); like below:
> DocTransformer docTransformer = factory.create(disp, augmenterParams, req);
> SolrReturnFields.add(docTransformer);
> then read the field names via docTransformer.getFieldNames() and add them into SolrReturnFields.
> The DocTransformer implementation would add all fields via doc.addField.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (SOLR-4736) Support group.mincount for Result Grouping
[ https://issues.apache.org/jira/browse/SOLR-4736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jefferyyuan updated SOLR-4736:
---
Issue Type: Improvement (was: Bug)

> Support group.mincount for Result Grouping
> ------------------------------------------
>
> Key: SOLR-4736
> URL: https://issues.apache.org/jira/browse/SOLR-4736
> Project: Solr
> Issue Type: Improvement
> Components: search
> Affects Versions: 4.2
> Reporter: jefferyyuan
> Priority: Minor
> Labels: group, solr
> Fix For: 4.9, 6.0
>
> Original Estimate: 48h
> Remaining Estimate: 48h
>
> Result Grouping is a very useful feature: we can use it to find duplicate data in the index, but it lacks one feature - group.mincount.
> With group.mincount, we could specify that only groups that have ${mincount} or more documents for the group field will be returned.
> In particular, we could use group.mincount=2 to only return duplicate data.
> Could we add this in a future release? Thanks.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
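If the proposed parameter existed, a duplicate-detection request might look like the following sketch. Note that group.mincount is the proposed, not-yet-existing parameter, and dupKey is an illustrative field name:

{code}
/select?q=*:*&group=true&group.field=dupKey&group.mincount=2&wt=json
{code}

Without such a parameter, the client has to page through all groups and discard the singleton ones itself.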
[jira] [Updated] (SOLR-9463) SolrJ: Support Converter and make it easier to extend DocumentObjectBinder
[ https://issues.apache.org/jira/browse/SOLR-9463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jefferyyuan updated SOLR-9463:
---
Priority: Major (was: Minor)

> SolrJ: Support Converter and make it easier to extend DocumentObjectBinder
> --------------------------------------------------------------------------
>
> Key: SOLR-9463
> URL: https://issues.apache.org/jira/browse/SOLR-9463
> Project: Solr
> Issue Type: Improvement
> Security Level: Public (Default Security Level. Issues are Public)
> Reporter: jefferyyuan
> Labels: extensibility, solrj
>
> In our old project we used Spring-Solr; it provides some good functions, such as letting us define converters to serialize a Java enum to a Solr string and vice versa, or an object as a JSON string and vice versa.
> But it doesn't support the latest Solr, SolrCloud, or child documents.
> We would like to use pure SolrJ, but we do like Spring-Solr's converter function.
> Is it possible for SolrJ to support custom converters?
> Also, SolrJ should make it easier to extend DocumentObjectBinder, e.g. by making DocField, infocache, getDocFields, etc. accessible in a subclass.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (SOLR-9602) Support Bucket Filters in Facet Functions
[ https://issues.apache.org/jira/browse/SOLR-9602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jefferyyuan closed SOLR-9602. - Resolution: Duplicate Yonik Seeley already created https://issues.apache.org/jira/browse/SOLR-9603. > Support Bucket Filters in Facet Functions > - > > Key: SOLR-9602 > URL: https://issues.apache.org/jira/browse/SOLR-9602 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: Facet Module, faceting >Reporter: jefferyyuan > Labels: facet, faceted-search, faceting, function > Fix For: 5.5.4, 6.3, 6.x, 6.2.2 > > > Original link: > http://lucene.472066.n3.nabble.com/Facet-Stats-MinCount-How-to-use-mincount-filter-when-use-facet-stats-td4299367.html > we need bucket filters in general (beyond mincount). - Yonik Seeley > We store some events data such as accountId, startTime, endTime, timeSpent, and some other searchable fields. > We want to get all accountIds that spent more than x hours between startTime and endTime, plus some other criteria that are not important here. > We use a Solr facet function like the one below. > It's very powerful; the only missing part is that it doesn't support minValue and maxValue filters. > http://localhost:8983/solr/events/select?q=*:*&json.facet={ >categories:{ > type : terms, > field : accountId, > numBuckets: true, > facet:{ >sum : "sum(timeSpent)" >// it would be great if minValue and maxValue filters were supported here > } >} > }
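The requested bucket filter can be illustrated in plain Java: aggregate timeSpent per accountId, then keep only buckets whose sum meets a minimum value. Here minValue is the proposed, hypothetical facet option, and the method names are illustrative:

```java
import java.util.*;
import java.util.stream.*;

public class BucketFilterSketch {
    // Sum timeSpent per accountId, then drop buckets below minValue -- the
    // post-aggregation filter the JSON Facet API does not offer today.
    static Map<String, Integer> bucketsAboveMin(Map<String, int[]> events, int minValue) {
        return events.entrySet().stream()
                .collect(Collectors.toMap(Map.Entry::getKey,
                        e -> IntStream.of(e.getValue()).sum()))
                .entrySet().stream()
                .filter(e -> e.getValue() >= minValue)
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
    }

    public static void main(String[] args) {
        Map<String, int[]> events = new LinkedHashMap<>();
        events.put("acct1", new int[]{5, 7}); // total 12
        events.put("acct2", new int[]{1, 2}); // total 3
        System.out.println(bucketsAboveMin(events, 10)); // only acct1 survives
    }
}
```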
[jira] [Commented] (SOLR-9603) Facet bucket filters
[ https://issues.apache.org/jira/browse/SOLR-9603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15547369#comment-15547369 ] jefferyyuan commented on SOLR-9603: --- Original link: http://lucene.472066.n3.nabble.com/Facet-Stats-MinCount-How-to-use-mincount-filter-when-use-facet-stats-td4299367.html https://issues.apache.org/jira/browse/SOLR-9602 > Facet bucket filters > > > Key: SOLR-9603 > URL: https://issues.apache.org/jira/browse/SOLR-9603 > Project: Solr > Issue Type: New Feature > Security Level: Public(Default Security Level. Issues are Public) > Components: Facet Module >Reporter: Yonik Seeley > > "filter" may be a bit of an overloaded term, but it would be nice to be able > to filter facet buckets by additional things, like the metrics that are > calculated per bucket. > This is like the HAVING clause in SQL. > Example of a facet that would group by author, find the average review rating > for that author, and filter out authors (buckets) with less than a 3.5 > average. > > {code} > reviews : { > type : terms, > field: author, > sort: "x desc", > having: "x >= 3.5", > facet : { > x : avg(rating) > } > } > {code} > > This functionality would also be useful for "pushing down" more calculations > to the endpoints for streaming expressions / SQL.
[jira] [Created] (SOLR-9602) Support Bucket Filters in Facet Functions
jefferyyuan created SOLR-9602: - Summary: Support Bucket Filters in Facet Functions Key: SOLR-9602 URL: https://issues.apache.org/jira/browse/SOLR-9602 Project: Solr Issue Type: Improvement Security Level: Public (Default Security Level. Issues are Public) Components: Facet Module, faceting Reporter: jefferyyuan Fix For: 5.5.4, 6.3, 6.x, 6.2.2 Original link: http://lucene.472066.n3.nabble.com/Facet-Stats-MinCount-How-to-use-mincount-filter-when-use-facet-stats-td4299367.html we need bucket filters in general (beyond mincount). - Yonik Seeley We store some events data such as accountId, startTime, endTime, timeSpent, and some other searchable fields. We want to get all accountIds that spent more than x hours between startTime and endTime, plus some other criteria that are not important here. We use a Solr facet function like the one below. It's very powerful; the only missing part is that it doesn't support minValue and maxValue filters. http://localhost:8983/solr/events/select?q=*:*&json.facet={ categories:{ type : terms, field : accountId, numBuckets: true, facet:{ sum : "sum(timeSpent)" // it would be great if minValue and maxValue filters were supported here } } }
[jira] [Created] (SOLR-9463) SolrJ: Support Converter and make it easier to extend DocumentObjectBinder
jefferyyuan created SOLR-9463: - Summary: SolrJ: Support Converter and make it easier to extend DocumentObjectBinder Key: SOLR-9463 URL: https://issues.apache.org/jira/browse/SOLR-9463 Project: Solr Issue Type: Improvement Security Level: Public (Default Security Level. Issues are Public) Reporter: jefferyyuan Priority: Minor In our old project we used Spring-Solr; it provides some nice features, such as letting us define converters that serialize a Java enum to a Solr string and vice versa, or serialize an object as a JSON string and vice versa. But it doesn't support the latest Solr, SolrCloud, or child documents. We would like to use pure SolrJ, but we do like Spring-Solr's converter feature. Could SolrJ support custom converters? SolrJ should also make it easier to extend DocumentObjectBinder, for example by making DocField, infocache, getDocFields, etc. accessible in subclasses.
[jira] [Commented] (SOLR-5005) JavaScriptRequestHandler
[ https://issues.apache.org/jira/browse/SOLR-5005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15290100#comment-15290100 ] jefferyyuan commented on SOLR-5005: --- Very useful feature; this would make it easier for developers to extend Solr: they would only need to change solrconfig.xml and add one script file. > JavaScriptRequestHandler > > > Key: SOLR-5005 > URL: https://issues.apache.org/jira/browse/SOLR-5005 > Project: Solr > Issue Type: New Feature >Reporter: David Smiley >Assignee: Noble Paul > Attachments: SOLR-5005.patch, SOLR-5005.patch, SOLR-5005.patch, > SOLR-5005_ScriptRequestHandler_take3.patch, > SOLR-5005_ScriptRequestHandler_take3.patch, patch > > > A user customizable script based request handler would be very useful. It's > inspired from the ScriptUpdateRequestProcessor, but on the search end. A user > could write a script that submits searches to Solr (in-VM) and can react to > the results of one search before making another that is formulated > dynamically. And it can assemble the response data, potentially reducing > both the latency and data that would move over the wire if this feature > didn't exist. It could also be used to easily add a user-specifiable search > API at the Solr server with request parameters governed by what the user > wants to advertise -- especially useful within enterprises. And, it could be > used to enforce security requirements on allowable parameter values to > Solr, so a javascript based Solr client could be allowed to talk to only a > script based request handler which enforces the rules.
[jira] [Updated] (SOLR-7131) Sort Group Ascendingly(asc_max) by Max Value in Each Group
[ https://issues.apache.org/jira/browse/SOLR-7131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jefferyyuan updated SOLR-7131: -- Description: Solr grouping supports asc and desc sorts on a field; let's take sort=time asc as an example. In asc mode, groups are sorted by the min value in the group; in desc mode, groups are sorted by the max value in the group. But users may want more: in asc_max mode, sort groups by the max (not min) value in the group ==> this should be a common requirement. Conversely, in desc_min mode, sort groups by the min (not max) value in the group. We have this requirement in our product, and we implemented it in a somewhat cumbersome way: by creating a new kind of FieldComparator, LongAbnormalComparator. It would be great if Solr could support this. was: Solr grouping supports asc and desc sorts on a field; let's take sort=time asc as an example. In asc mode, groups are sorted by the min value in the group; in desc mode, groups are sorted by the max value in the group. But users may want more: in asc_max mode, sort groups by the max (not min) value in the group ==> this should be a common requirement. Conversely, in desc_min mode, sort groups by the min (not max) value in the group. We have this requirement in our product, and we implemented it in a somewhat cumbersome way: by creating a new kind of FieldComparator, LongAbnormalComparator. I am sure we are not alone, and it would be great if Solr could support this. > Sort Group Ascendingly(asc_max) by Max Value in Each Group > -- > > Key: SOLR-7131 > URL: https://issues.apache.org/jira/browse/SOLR-7131 > Project: Solr > Issue Type: Improvement > Components: search >Reporter: jefferyyuan >Priority: Minor > Labels: group, search > Fix For: 5.2, Trunk > > > Solr grouping supports asc and desc sorts on a field: > let's take sort=time asc as an example. > In asc mode, groups are sorted by the min value in the group; > in desc mode, groups are sorted by the max value in the group.
> But users may want more: > in asc_max mode, sort groups by the max (not min) value in the group > ==> this should be a common requirement. > Conversely, in desc_min mode, sort groups by the min (not max) value in the group. > We have this requirement in our product, and we implemented it in a somewhat > cumbersome way: by creating a new kind of FieldComparator, > LongAbnormalComparator > It would be great if Solr could support this.
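The proposed asc_max ordering (a hypothetical mode, not existing Solr syntax) can be sketched with plain Java collections: sort groups ascending by each group's maximum value rather than its minimum:

```java
import java.util.*;

public class AscMaxSortSketch {
    // Sort group keys ascending by the max value inside each group -- the
    // proposed asc_max mode. Standard "asc" would compare group mins instead.
    static List<String> sortGroupsAscByMax(Map<String, List<Long>> groups) {
        List<String> keys = new ArrayList<>(groups.keySet());
        keys.sort(Comparator.comparing((String k) -> Collections.max(groups.get(k))));
        return keys;
    }

    public static void main(String[] args) {
        Map<String, List<Long>> groups = new LinkedHashMap<>();
        groups.put("g1", Arrays.asList(1L, 100L)); // min 1, max 100
        groups.put("g2", Arrays.asList(50L, 60L)); // min 50, max 60
        // Plain asc (by min) would put g1 first; asc_max puts g2 first.
        System.out.println(sortGroupsAscByMax(groups)); // [g2, g1]
    }
}
```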
[jira] [Created] (SOLR-7131) Sort Group Ascendingly(asc_max) by Max Value in Each Group
jefferyyuan created SOLR-7131: - Summary: Sort Group Ascendingly(asc_max) by Max Value in Each Group Key: SOLR-7131 URL: https://issues.apache.org/jira/browse/SOLR-7131 Project: Solr Issue Type: Improvement Components: search Reporter: jefferyyuan Priority: Minor Fix For: 5.1 Solr grouping supports asc and desc sorts on a field; let's take sort=time asc as an example. In asc mode, groups are sorted by the min value in the group; in desc mode, groups are sorted by the max value in the group. But users may want more: in asc_max mode, sort groups by the max (not min) value in the group ==> this should be a common requirement. Conversely, in desc_min mode, sort groups by the min (not max) value in the group. We have this requirement in our product, and we implemented it in a somewhat cumbersome way: by creating a new kind of FieldComparator, LongAbnormalComparator. I am sure we are not alone, and it would be great if Solr could support this.
[jira] [Commented] (SOLR-7097) Update other Document in DocTransformer
[ https://issues.apache.org/jira/browse/SOLR-7097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14328034#comment-14328034 ] jefferyyuan commented on SOLR-7097: --- Hi, Noble: In some cases, we may want to change previous documents: add, update, or remove fields. Or remove previous documents entirely. One use case: in Solr's flat group mode (group.main=true), add a groupCount to the first document in each group. We chose not to change Solr code in our product; instead we wrote a new CachedXMLWriter whose writeSolrDocument caches the SolrDocument and writes all documents out in writeEndDocumentList. http://lifelongprogrammer.blogspot.com/2015/01/use-solr-transformer-to-gen-groupcount.html It would be great if Solr allowed us to change previous documents. > Update other Document in DocTransformer > --- > > Key: SOLR-7097 > URL: https://issues.apache.org/jira/browse/SOLR-7097 > Project: Solr > Issue Type: Improvement >Reporter: jefferyyuan >Priority: Minor > Labels: searcher, transformers > > Solr DocTransformer is good, but it only allows us to change the current > document: add, remove, or update fields. > It would be great if we could update other documents (previous ones especially), or > better, delete docs (especially useful during tests) or add docs in a > DocTransformer. > Use case: > We can use flat group mode (group.main=true) to put parent and child close to > each other (parent first), then use a DocTransformer to update the parent > document when accessing its child documents. > Some thoughts about the implementation: > org.apache.solr.response.TextResponseWriter.writeDocuments(String, > ResultContext, ReturnFields) > When cachMode=true, in the for loop after the transform we can store the > SolrDocument in a list and write these docs out at the end.
> cachMode = req.getParams().getBool("cachMode", false);
> SolrDocument[] cachedDocs = new SolrDocument[sz];
> for (int i = 0; i < sz; i++) {
>   SolrDocument sdoc = toSolrDocument(doc);
>   if (transformer != null) {
>     transformer.transform(sdoc, id);
>   }
>   if (cachMode) {
>     cachedDocs[i] = sdoc;
>   } else {
>     writeSolrDocument(null, sdoc, returnFields, i);
>   }
> }
> if (transformer != null) {
>   transformer.setContext(null);
> }
> if (cachMode) {
>   for (int i = 0; i < sz; i++) {
>     writeSolrDocument(null, cachedDocs[i], returnFields, i);
>   }
> }
> writeEndDocumentList();
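The cache-then-write idea in the snippet above boils down to a simple buffering pattern; here is a self-contained sketch of it outside Solr (transformAll and the lambda transformer are illustrative names, not Solr API):

```java
import java.util.*;
import java.util.function.*;

public class CachedWriteSketch {
    // Buffer every transformed doc before writing, so a later doc's
    // transformer can still mutate an earlier doc -- the behavior a
    // streaming write loop cannot provide.
    static List<Map<String, Object>> transformAll(List<Map<String, Object>> docs,
            BiConsumer<List<Map<String, Object>>, Integer> transformer) {
        List<Map<String, Object>> cached = new ArrayList<>();
        for (Map<String, Object> d : docs) cached.add(new HashMap<>(d));
        for (int i = 0; i < cached.size(); i++) {
            transformer.accept(cached, i); // may update any previous doc
        }
        return cached; // "write out" only after all transforms have run
    }

    public static void main(String[] args) {
        List<Map<String, Object>> docs = new ArrayList<>();
        docs.add(new HashMap<>(Map.of("id", "parent")));
        docs.add(new HashMap<>(Map.of("id", "child")));
        // When visiting the child, bump a groupCount field on the parent.
        List<Map<String, Object>> out = transformAll(docs, (all, i) -> {
            if ("child".equals(all.get(i).get("id"))) {
                all.get(0).merge("groupCount", 1, (a, b) -> (int) a + (int) b);
            }
        });
        System.out.println(out.get(0));
    }
}
```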