[jira] [Commented] (SOLR-12833) Use timed-out lock in DistributedUpdateProcessor
[ https://issues.apache.org/jira/browse/SOLR-12833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16833207#comment-16833207 ]

jefferyyuan commented on SOLR-12833:

[~ichattopadhyaya] Please check the PR at [https://github.com/apache/lucene-solr/pull/663]

// this test checks the behavior of VersionBucket or TimedVersionBucket,
// it doesn't make sense if there is no updateLog, and thus no VersionBucket

> Use timed-out lock in DistributedUpdateProcessor
>
> Key: SOLR-12833
> URL: https://issues.apache.org/jira/browse/SOLR-12833
> Project: Solr
> Issue Type: Improvement
> Security Level: Public (Default Security Level. Issues are Public)
> Components: update, UpdateRequestProcessors
> Affects Versions: 7.5, 8.0
> Reporter: jefferyyuan
> Assignee: Mark Miller
> Priority: Blocker
> Fix For: 7.7, 8.0, 8.1
> Attachments: SOLR-12833-noint.patch, SOLR-12833.patch, SOLR-12833.patch, threadDump.txt
> Time Spent: 20m
> Remaining Estimate: 0h
>
> There is a synchronized block that blocks other update requests whose IDs fall in the same hash bucket. An update waits forever until it gets the lock at the synchronized block, which can be a problem in some cases.
>
> Some add/update requests (for example, updates with spatial/shape analysis) may take a long time (30+ seconds or even more), which would make the request time out and fail.
> The client may retry the same request multiple times or for several minutes, which makes things worse.
> The server side receives all the update requests, but all except one can do nothing and have to wait there. This wastes precious memory and CPU resources.
> We have seen cases where 2000+ threads were blocked at the synchronized lock while only a few updates were making progress. Each thread takes 3+ MB of memory, which causes OOM.
> Also, if the update can't get the lock in the expected time range, it's better to fail fast.
>
> We can have one configuration in solrconfig.xml, updateHandler/versionLock/timeInMill, so users can specify how long they want to wait for the version bucket lock.
> The default value can be -1, so it behaves the same as today: wait forever until it gets the lock.

-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
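The proposed behavior (a configurable wait, with -1 meaning "wait forever" as before) can be sketched roughly as below. This is an illustrative sketch only; `TimedBucketLock` and its method names are stand-ins, not the actual Solr classes from the PR.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Illustrative sketch of a timed version-bucket lock: a negative
// timeout preserves the old behavior (block until acquired); any
// non-negative value fails fast once the timeout elapses.
class TimedBucketLock {
    private final ReentrantLock lock = new ReentrantLock(true);

    /** Returns true if the lock was acquired within the timeout. */
    boolean lockForUpdate(long timeoutMs) {
        if (timeoutMs < 0) {   // -1: wait forever, the old default
            lock.lock();
            return true;
        }
        try {
            return lock.tryLock(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;      // treat interruption as a failed acquire
        }
    }

    void unlockForUpdate() {
        lock.unlock();
    }
}
```

A caller would then reject the update quickly (instead of piling up blocked threads) whenever `lockForUpdate` returns false.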
[jira] [Comment Edited] (SOLR-12833) Use timed-out lock in DistributedUpdateProcessor
[ https://issues.apache.org/jira/browse/SOLR-12833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16833207#comment-16833207 ]

jefferyyuan edited comment on SOLR-12833 at 5/5/19 3:55 AM:

[~ichattopadhyaya] Please check the PR at [https://github.com/apache/lucene-solr/pull/663]

With this test config, the solr cluster doesn't define an updateLog.

// this test checks the behavior of VersionBucket or TimedVersionBucket,
// it doesn't make sense if there is no updateLog, and thus no VersionBucket

was (Author: yuanyun.cn): [~ichattopadhyaya] Please check the PR at [https://github.com/apache/lucene-solr/pull/663] // this test checks the behavior of VersionBucket or TimedVersionBucket, // it doesn't make sense if there is no updateLog, and thus no VersionBucket
[jira] [Commented] (SOLR-12833) Use timed-out lock in DistributedUpdateProcessor
[ https://issues.apache.org/jira/browse/SOLR-12833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16832031#comment-16832031 ]

jefferyyuan commented on SOLR-12833:

[~ichattopadhyaya] The error was related to unreleased tracked resources. I pulled the latest code and ran it again multiple times (just finished another run); all passed with no problem.

[beaster] Beast round 50 results: /Users/jyuan/apple/code-new/apple/solr/lucene-solr/solr/build/solr-core/test/50
[beaster] Beasting finished Successfully.

- I meant to delete that part, but didn't delete all of it : )
[jira] [Comment Edited] (SOLR-12833) Use timed-out lock in DistributedUpdateProcessor
[ https://issues.apache.org/jira/browse/SOLR-12833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16831976#comment-16831976 ]

jefferyyuan edited comment on SOLR-12833 at 5/2/19 9:52 PM:

[~ab] [~ichattopadhyaya] Please check the PR at [https://github.com/apache/lucene-solr/pull/661] - I saw Ishan already committed the change, so just ignore this : )

I ran "ant beast -Dbeast.iters=50 -Dtestcase=PeerSyncTest -Dtests.method=test -Dtests.slow=true -Dtests.badapples=true -Dtests.asserts=true" locally multiple times; the problem/error is gone.

was (Author: yuanyun.cn): [~ab] [~ichattopadhyaya] Please check the PR at [https://github.com/apache/lucene-solr/pull/661] - I saw Ishan already committed the change, so just ignore this : ) I ran "ant beast -Dbeast.iters=50 -Dtestcase=PeerSyncTest -Dtests.method=test -Dtests.slow=true -Dtests.badapples=true -Dtests.asserts=true" locally multiple times; the problem/error is gone, but I saw some errors: not sure whether they were related/expected or...
[jira] [Commented] (SOLR-12833) Use timed-out lock in DistributedUpdateProcessor
[ https://issues.apache.org/jira/browse/SOLR-12833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16831976#comment-16831976 ]

jefferyyuan commented on SOLR-12833:

[~ab] [~ichattopadhyaya] Sorry for the bug. Please check the PR at [https://github.com/apache/lucene-solr/pull/661] - I saw Ishan already committed the change, so just ignore this : )

I ran "ant beast -Dbeast.iters=50 -Dtestcase=PeerSyncTest -Dtests.method=test -Dtests.slow=true -Dtests.badapples=true -Dtests.asserts=true" locally multiple times; the problem/error is gone, but I saw some errors: not sure whether they were related/expected or...
[jira] [Comment Edited] (SOLR-12833) Use timed-out lock in DistributedUpdateProcessor
[ https://issues.apache.org/jira/browse/SOLR-12833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16831976#comment-16831976 ]

jefferyyuan edited comment on SOLR-12833 at 5/2/19 9:11 PM:

[~ab] [~ichattopadhyaya] Please check the PR at [https://github.com/apache/lucene-solr/pull/661] - I saw Ishan already committed the change, so just ignore this : )

I ran "ant beast -Dbeast.iters=50 -Dtestcase=PeerSyncTest -Dtests.method=test -Dtests.slow=true -Dtests.badapples=true -Dtests.asserts=true" locally multiple times; the problem/error is gone, but I saw some errors: not sure whether they were related/expected or...

was (Author: yuanyun.cn): [~ab] [~ichattopadhyaya] Sorry for the bug. Please check the PR at [https://github.com/apache/lucene-solr/pull/661] - I saw Ishan already committed the change, so just ignore this : ) I ran "ant beast -Dbeast.iters=50 -Dtestcase=PeerSyncTest -Dtests.method=test -Dtests.slow=true -Dtests.badapples=true -Dtests.asserts=true" locally multiple times; the problem/error is gone, but I saw some errors: not sure whether they were related/expected or...
[jira] [Comment Edited] (SOLR-12833) Use timed-out lock in DistributedUpdateProcessor
[ https://issues.apache.org/jira/browse/SOLR-12833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16831976#comment-16831976 ]

jefferyyuan edited comment on SOLR-12833 at 5/2/19 9:10 PM:

[~ab] [~ichattopadhyaya] Sorry for the bug. Please check the PR at [https://github.com/apache/lucene-solr/pull/661] - I saw Ishan already committed the change, so just ignore this : )

I ran "ant beast -Dbeast.iters=50 -Dtestcase=PeerSyncTest -Dtests.method=test -Dtests.slow=true -Dtests.badapples=true -Dtests.asserts=true" locally multiple times; the problem/error is gone, but I saw some errors: not sure whether they were related/expected or...

was (Author: yuanyun.cn): [~ab] [~ichattopadhyaya] Sorry for the bug. Please check the PR at [https://github.com/apache/lucene-solr/pull/661] - I saw Ishan already committed the change, so just ignore this : ) I ran "ant beast -Dbeast.iters=50 -Dtestcase=PeerSyncTest -Dtests.method=test -Dtests.slow=true -Dtests.badapples=true -Dtests.asserts=true" locally multiple times; the problem/error is gone, but I saw some errors: not sure whether they were related/expected or...
[jira] [Commented] (SOLR-12833) Use timed-out lock in DistributedUpdateProcessor
[ https://issues.apache.org/jira/browse/SOLR-12833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16831922#comment-16831922 ]

jefferyyuan commented on SOLR-12833:

[~ichattopadhyaya] [~aivanise] Found the issue; I will create a PR in a second. It's related to: wait(TimeUnit.NANOSECONDS.toMillis(nanosTimeout)) - when converting nanoseconds to milliseconds, the result may be 0, which would cause the wait to block forever.
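The pitfall described in that comment is easy to reproduce: TimeUnit.NANOSECONDS.toMillis truncates, so any remaining timeout under 1 ms becomes 0, and Object.wait(0) means "wait indefinitely". A small standalone sketch (the class and method names are illustrative, not the code from the PR):

```java
import java.util.concurrent.TimeUnit;

public class NanosWaitPitfall {
    // Waits on mon for roughly nanosTimeout, avoiding the trap where
    // toMillis truncates a sub-millisecond timeout to 0 and wait(0)
    // then blocks forever. Returns false if interrupted.
    static boolean boundedWait(Object mon, long nanosTimeout) {
        long millis = TimeUnit.NANOSECONDS.toMillis(nanosTimeout); // 500_000 ns -> 0 !
        int nanosRemainder = (int) (nanosTimeout % 1_000_000L);
        synchronized (mon) {
            try {
                // BUG would be: mon.wait(millis)  -- waits forever when millis == 0.
                // Passing the sub-millisecond remainder keeps the wait bounded.
                mon.wait(millis, nanosRemainder);
                return true;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(TimeUnit.NANOSECONDS.toMillis(500_000L)); // prints 0
        boundedWait(new Object(), 500_000L); // returns quickly instead of hanging
        System.out.println("bounded wait returned");
    }
}
```

Per the Object.wait(long, int) contract, a nonzero nanos argument rounds the millisecond timeout up, so wait(0, 500000) is a short bounded wait rather than an infinite one.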
[jira] [Commented] (SOLR-12833) Use timed-out lock in DistributedUpdateProcessor
[ https://issues.apache.org/jira/browse/SOLR-12833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16831786#comment-16831786 ]

jefferyyuan commented on SOLR-12833:

Thanks for the info [~ab], I am checking it now and will focus on this today.
[jira] [Commented] (SOLR-12833) Use timed-out lock in DistributedUpdateProcessor
[ https://issues.apache.org/jira/browse/SOLR-12833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16825415#comment-16825415 ]

jefferyyuan commented on SOLR-12833:

[~ab] [~markrmil...@gmail.com] I cleaned up the code and added test cases; please check the PR: [https://github.com/apache/lucene-solr/pull/641/files]
* all the doXXX methods assume they already own the lock (either the intrinsic monitor or the lock object) and unlock it in the finally block.
* their caller calls vinfo.lockForUpdate() before and vinfo.unlockForUpdate() in the finally block.
* so it's clear who owns the lock and should release it: symmetric : )
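The ownership convention in those bullets can be sketched as follows; the classes and method names here are illustrative stand-ins for the Solr code, not the actual implementation:

```java
import java.util.concurrent.locks.ReentrantLock;

// Sketch of symmetric lock ownership: whoever acquires a lock is the
// one that releases it, always in a finally block.
class UpdateSketch {
    private final ReentrantLock vinfoLock = new ReentrantLock();
    private final ReentrantLock bucketLock = new ReentrantLock();

    // Caller-level pattern: lock before the call, unlock in finally.
    void versionedAdd() {
        vinfoLock.lock();            // stands in for vinfo.lockForUpdate()
        try {
            bucketLock.lock();       // acquire the bucket lock, then hand
            doLocalAdd();            // ownership to the doXXX method
        } finally {
            vinfoLock.unlock();      // stands in for vinfo.unlockForUpdate()
        }
    }

    // The doXXX method assumes it already owns the bucket lock and
    // releases it in its own finally block, keeping acquire/release paired.
    private void doLocalAdd() {
        try {
            // ... apply the update while holding the bucket lock ...
        } finally {
            bucketLock.unlock();
        }
    }

    boolean locksFree() {
        return !vinfoLock.isLocked() && !bucketLock.isLocked();
    }
}
```

The benefit of the convention is that every lock is released on every path, including exceptional ones, and a reader can verify pairing locally within each method.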
[jira] [Comment Edited] (SOLR-12833) Use timed-out lock in DistributedUpdateProcessor
[ https://issues.apache.org/jira/browse/SOLR-12833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16824565#comment-16824565 ]

jefferyyuan edited comment on SOLR-12833 at 4/23/19 10:02 PM:

[~ab] [~markrmil...@gmail.com] Thanks for your update, and sorry for the late response. I have updated the PR with two implementations of VersionBucket: TimedVersionBucket. Please check [https://github.com/apache/lucene-solr/pull/641/files]
* The code is not cleaned up yet and is just a proof of concept.
* If the approach looks good to you, I will clean up and improve the code.
* The change will also make the code a little cleaner: smaller methods : )
Thanks.

was (Author: yuanyun.cn): [~ab] [~markrmil...@gmail.com] Thanks for your update, and sorry for the late response. I have updated the PR with two implementations of VersionBucket: TimedVersionBucket. Please check [https://github.com/apache/lucene-solr/pull/641/files]
* The code is not cleaned up yet and is just a proof of concept.
* If the approach looks good to you, I will clean up and improve the code.
Thanks.
[jira] [Commented] (SOLR-12833) Use timed-out lock in DistributedUpdateProcessor
[ https://issues.apache.org/jira/browse/SOLR-12833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16824565#comment-16824565 ]

jefferyyuan commented on SOLR-12833:

[~ab] [~markrmil...@gmail.com] Thanks for your update, and sorry for the late response. I have updated the PR with two implementations of VersionBucket: TimedVersionBucket. Please check [https://github.com/apache/lucene-solr/pull/641/files]
* The code is not cleaned up yet and is just a proof of concept.
* If the approach looks good to you, I will clean up and improve the code.
Thanks.
[jira] [Commented] (SOLR-12833) Use timed-out lock in DistributedUpdateProcessor
[ https://issues.apache.org/jira/browse/SOLR-12833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16815081#comment-16815081 ] jefferyyuan commented on SOLR-12833: [~ab] Based on your suggestion, I removed versionBucketLockTimeoutMs from VersionBucket. [https://github.com/apache/lucene-solr/pull/641] Thanks.
[jira] [Commented] (SOLR-12833) Use timed-out lock in DistributedUpdateProcessor
[ https://issues.apache.org/jira/browse/SOLR-12833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16814993#comment-16814993 ] jefferyyuan commented on SOLR-12833: [~ab] I will create the PR by the end of tomorrow. Thanks.
[jira] [Commented] (SOLR-12833) Use timed-out lock in DistributedUpdateProcessor
[ https://issues.apache.org/jira/browse/SOLR-12833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16810203#comment-16810203 ] jefferyyuan commented on SOLR-12833: [~ab] vinfo.lockForUpdate() uses readLock().lock(), so multiple threads can still execute versionAdd and versionDelete simultaneously. The read/write lock in VersionInfo is used to make sure no update comes in while Solr is doing recovery or switching the tlog, etc. The problem we are trying to solve here is that when users update docs in the same bucket and the update takes time, only the first gets processed; all other updates on the same bucket have to wait, and these threads pile up, eventually causing OOM or leaving Solr unable to handle other requests because all threads are used up. This is even worse when clients retry updates (for example, in a cross-DC environment the consumer will re-execute the commands multiple times if they fail). By default, customers don't enable this feature; if a customer hits OOM and finds that a lot of threads are waiting for the lock on a VersionBucket, they can enable this feature to make the Solr cluster more stable: fail fast. We added the test at [https://github.com/apache/lucene-solr/pull/463/files#diff-7b816a919f7a0caf8119a684a3e71c84], but to make the method testable, we needed to change the code: the tryLockElseThrow method in DistributedUpdateProcessor. We can definitely re-add the test.
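The read-lock point above can be shown with a standalone model (not the Solr code itself): lockForUpdate() takes the READ side of VersionInfo's read/write lock, so many update threads can hold it at once, and it only blocks when recovery or a tlog switch takes the write lock.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Standalone model of VersionInfo's read/write lock usage: readers (update
// threads) do not exclude each other; only a writer (recovery/tlog switch)
// blocks them.
class ReadLockDemo {
    static final ReentrantReadWriteLock rw = new ReentrantReadWriteLock();

    // Returns true if a second thread can acquire the read lock while the
    // first thread is still holding it.
    static boolean twoReadersAtOnce() {
        rw.readLock().lock();                    // first "update" thread
        try {
            final boolean[] ok = {false};
            Thread t = new Thread(() -> {
                ok[0] = rw.readLock().tryLock(); // not blocked by the reader
                if (ok[0]) {
                    rw.readLock().unlock();
                }
            });
            t.start();
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return ok[0];
        } finally {
            rw.readLock().unlock();
        }
    }
}
```

This is why the read/write lock alone cannot provide the per-bucket fail-fast behavior discussed in this issue: read acquisitions never contend with each other.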
[jira] [Commented] (SOLR-12833) Use timed-out lock in DistributedUpdateProcessor
[ https://issues.apache.org/jira/browse/SOLR-12833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16809012#comment-16809012 ] jefferyyuan commented on SOLR-12833: Thanks for the findings, [~ab], [~markrmil...@gmail.com]. As you said, we don't need versionBucketLockTimeoutMs in every VersionBucket; I can create one PR to remove it from VersionBucket. One approach to fix this problem: * If there is not a lot of contention on the same version bucket and updates usually finish fast, customers don't specify a versionBucketLockTimeoutMs value; then we use the old VersionBucket, which has no Lock or Condition objects, and its lock, signalAll, and awaitNanos methods keep working the old way, using the intrinsic monitor. * If there is a lot of contention on the same version bucket and updates (like geo-related updates) take time, the customer can specify versionBucketLockTimeoutMs explicitly; then we create and use another class, TimedVersionBucket, that extends VersionBucket and uses a Lock and Condition, so only one update on the same bucket actually goes forward and gets processed, and other updates fail fast.
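The timed variant described above can be sketched as follows. This is a hypothetical sketch, not the actual Solr patch: the class name follows the comment, but the method names and signatures here are illustrative, and the real classes differ in detail.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

// Hypothetical sketch of a timed version bucket: a ReentrantLock plus
// Condition, failing fast when the lock cannot be acquired within
// versionBucketLockTimeoutMs.
class TimedVersionBucketSketch {
    private final ReentrantLock lock = new ReentrantLock();
    // In the real class, awaitNanos/signalAll would go through this Condition.
    private final Condition condition = lock.newCondition();
    private final long timeoutMs;

    TimedVersionBucketSketch(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    // Runs the update while holding the bucket lock, or throws if the lock
    // cannot be obtained in time (the "fail fast" behavior).
    <T> T runWithLock(Supplier<T> update) {
        boolean acquired = false;
        try {
            acquired = lock.tryLock(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        if (!acquired) {
            throw new IllegalStateException(
                "Unable to get version bucket lock in " + timeoutMs + " ms");
        }
        try {
            return update.get();
        } finally {
            lock.unlock();
        }
    }
}
```

The design choice mirrors the two-bucket split above: the plain bucket keeps the cheap intrinsic monitor, and only users who opt in by setting a timeout pay the (small) cost of an explicit Lock/Condition.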
[jira] [Resolved] (SOLR-13328) HostnameVerifier in HttpClientBuilder is ignored when HttpClientUtil creates connection
[ https://issues.apache.org/jira/browse/SOLR-13328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jefferyyuan resolved SOLR-13328. Resolution: Not A Problem > HostnameVerifier in HttpClientBuilder is ignored when HttpClientUtil creates > connection > --- > > Key: SOLR-13328 > URL: https://issues.apache.org/jira/browse/SOLR-13328 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: clients - java >Affects Versions: 8.0 >Reporter: jefferyyuan >Priority: Minor > Fix For: 8.0.1, 8.1 > > > In SolrHttpClientBuilder, we can configure a lot of things, including > HostnameVerifier. > We have code like the following: > HttpClientUtil.setHttpClientBuilder(new CommonNameVerifierClientConfigurer()); > CommonNameVerifierClientConfigurer sets our own HostnameVerifier, which > checks the subject DN name. > But this doesn't work: when we create the SSLConnectionSocketFactory in > HttpClientUtil.DefaultSchemaRegistryProvider.getSchemaRegistry(), we don't > check or use the HostnameVerifier in SolrHttpClientBuilder at all. > The fix would be very simple: in > HttpClientUtil.DefaultSchemaRegistryProvider.getSchemaRegistry, if the > HostnameVerifier in SolrHttpClientBuilder is not null, use it; otherwise keep the same > logic as before.
[jira] [Comment Edited] (SOLR-13328) HostnameVerifier in HttpClientBuilder is ignored when HttpClientUtil creates connection
[ https://issues.apache.org/jira/browse/SOLR-13328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16793981#comment-16793981 ] jefferyyuan edited comment on SOLR-13328 at 3/15/19 10:08 PM: -- We are using the latest Solr 7, but it seems Solr 8 removed HostnameVerifier from SolrHttpClientBuilder, so this Jira no longer applies. was (Author: yuanyun.cn): Seems Solr 8 removes HostnameVerifier from SolrHttpClientBuilder, so this Jira doesn't apply any more.
[jira] [Commented] (SOLR-13328) HostnameVerifier in HttpClientBuilder is ignored when HttpClientUtil creates connection
[ https://issues.apache.org/jira/browse/SOLR-13328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16793981#comment-16793981 ] jefferyyuan commented on SOLR-13328: Seems Solr 8 removes HostnameVerifier from SolrHttpClientBuilder, so this Jira doesn't apply any more.
[jira] [Created] (SOLR-13328) HostnameVerifier in HttpClientBuilder is ignored when HttpClientUtil creates connection
jefferyyuan created SOLR-13328: -- Summary: HostnameVerifier in HttpClientBuilder is ignored when HttpClientUtil creates connection Key: SOLR-13328 URL: https://issues.apache.org/jira/browse/SOLR-13328 Project: Solr Issue Type: Improvement Security Level: Public (Default Security Level. Issues are Public) Components: clients - java Affects Versions: 8.0 Reporter: jefferyyuan Fix For: 8.0.1, 8.1
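The proposed fix is a one-line fallback. The sketch below is a minimal, hypothetical model of that logic (the `choose` helper and class name are illustrative, not Solr code); the real change would live where `SSLConnectionSocketFactory` is built in HttpClientUtil.

```java
import javax.net.ssl.HostnameVerifier;

// Minimal sketch of the proposed fix: prefer the HostnameVerifier configured
// on SolrHttpClientBuilder, fall back to the previous default otherwise.
class VerifierSelection {
    static HostnameVerifier choose(HostnameVerifier fromBuilder,
                                   HostnameVerifier previousDefault) {
        return fromBuilder != null ? fromBuilder : previousDefault;
    }
}
```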
[jira] [Commented] (LUCENE-8662) Change TermsEnum.seekExact(BytesRef) to abstract + delegate seekExact(BytesRef) in FilterLeafReader.FilterTermsEnum
[ https://issues.apache.org/jira/browse/LUCENE-8662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16763028#comment-16763028 ] jefferyyuan commented on LUCENE-8662: - BTW, the performance (OOM) issues caused by seekExact happened when we searched or committed with the offending ids. Even worse, it also happened when Solr recovered and tried to replay a transaction log that included the offending ids. The recovery would fail and that Solr node would never come online - we had to delete it and recreate the replica. > Change TermsEnum.seekExact(BytesRef) to abstract + delegate > seekExact(BytesRef) in FilterLeafReader.FilterTermsEnum > --- > > Key: LUCENE-8662 > URL: https://issues.apache.org/jira/browse/LUCENE-8662 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: 5.5.5, 6.6.5, 7.6, 8.0 >Reporter: jefferyyuan >Priority: Major > Labels: query > Fix For: 8.0, 7.7 > > Attachments: output of test program.txt > > Time Spent: 50m > Remaining Estimate: 0h > > Recently in our production, we found that Solr uses a lot of memory (more than > 10 GB) during recovery or commit for a small index (3.5 GB). > The stack trace is: > > {code:java} > Thread 0x4d4b115c0 > at org.apache.lucene.store.DataInput.readVInt()I (DataInput.java:125) > at org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.loadBlock()V > (SegmentTermsEnumFrame.java:157) > at > org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.scanToTermNonLeaf(Lorg/apache/lucene/util/BytesRef;Z)Lorg/apache/lucene/index/TermsEnum$SeekStatus; > (SegmentTermsEnumFrame.java:786) > at > org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.scanToTerm(Lorg/apache/lucene/util/BytesRef;Z)Lorg/apache/lucene/index/TermsEnum$SeekStatus; > (SegmentTermsEnumFrame.java:538) > at > org.apache.lucene.codecs.blocktree.SegmentTermsEnum.seekCeil(Lorg/apache/lucene/util/BytesRef;)Lorg/apache/lucene/index/TermsEnum$SeekStatus; > 
(SegmentTermsEnum.java:757) > at > org.apache.lucene.index.FilterLeafReader$FilterTermsEnum.seekCeil(Lorg/apache/lucene/util/BytesRef;)Lorg/apache/lucene/index/TermsEnum$SeekStatus; > (FilterLeafReader.java:185) > at > org.apache.lucene.index.TermsEnum.seekExact(Lorg/apache/lucene/util/BytesRef;)Z > (TermsEnum.java:74) > at > org.apache.solr.search.SolrIndexSearcher.lookupId(Lorg/apache/lucene/util/BytesRef;)J > (SolrIndexSearcher.java:823) > at > org.apache.solr.update.VersionInfo.getVersionFromIndex(Lorg/apache/lucene/util/BytesRef;)Ljava/lang/Long; > (VersionInfo.java:204) > at > org.apache.solr.update.UpdateLog.lookupVersion(Lorg/apache/lucene/util/BytesRef;)Ljava/lang/Long; > (UpdateLog.java:786) > at > org.apache.solr.update.VersionInfo.lookupVersion(Lorg/apache/lucene/util/BytesRef;)Ljava/lang/Long; > (VersionInfo.java:194) > at > org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(Lorg/apache/solr/update/AddUpdateCommand;)Z > (DistributedUpdateProcessor.java:1051) > {code} > We reproduced the problem locally with the following code using Lucene code. > {code:java} > public static void main(String[] args) throws IOException { > FSDirectory index = FSDirectory.open(Paths.get("the-index")); > try (IndexReader reader = new > ExitableDirectoryReader(DirectoryReader.open(index), > new QueryTimeoutImpl(1000 * 60 * 5))) { > String id = "the-id"; > BytesRef text = new BytesRef(id); > for (LeafReaderContext lf : reader.leaves()) { > TermsEnum te = lf.reader().terms("id").iterator(); > System.out.println(te.seekExact(text)); > } > } > } > {code} > > I added System.out.println("ord: " + ord); in > codecs.blocktree.SegmentTermsEnum.getFrame(int). > Please check the attached output of test program.txt. > > We found out the root cause: > we didn't implement the seekExact(BytesRef) method in > FilterLeafReader.FilterTermsEnum, so it uses the base class > TermsEnum.seekExact(BytesRef) implementation, which is very inefficient in > this case. 
> {code:java} > public boolean seekExact(BytesRef text) throws IOException { > return seekCeil(text) == SeekStatus.FOUND; > } > {code} > The fix is simple: just override the seekExact(BytesRef) method in > FilterLeafReader.FilterTermsEnum: > {code:java} > @Override > public boolean seekExact(BytesRef text) throws IOException { > return in.seekExact(text); > } > {code}
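The delegation bug above can be demonstrated without Lucene. The toy model below uses illustrative class names (not the real Lucene types): a wrapper that forgets to override seekExact silently falls back to the base class's seekCeil-based implementation, so the delegate's optimized path is never reached; delegating seekExact restores it.

```java
// Toy model of the LUCENE-8662 bug and fix.
abstract class BaseTermsEnumModel {
    abstract String seekCeil(String term);   // the expensive scan in Lucene

    boolean seekExact(String term) {         // default: implemented via seekCeil
        return seekCeil(term).equals(term);
    }
}

class FastTermsEnumModel extends BaseTermsEnumModel {
    boolean fastPathUsed = false;

    @Override
    String seekCeil(String term) {
        return term;
    }

    @Override
    boolean seekExact(String term) {         // optimized direct lookup
        fastPathUsed = true;
        return true;
    }
}

class FilterTermsEnumModel extends BaseTermsEnumModel {
    final BaseTermsEnumModel in;

    FilterTermsEnumModel(BaseTermsEnumModel in) {
        this.in = in;
    }

    @Override
    String seekCeil(String term) {
        return in.seekCeil(term);
    }

    // The fix: delegate seekExact too, so the wrapped enum's optimized
    // implementation is used instead of the seekCeil fallback.
    @Override
    boolean seekExact(String term) {
        return in.seekExact(term);
    }
}
```

Without the seekExact override in the filter class, calling seekExact on the wrapper would route through seekCeil and the delegate's fast path would never run - which is exactly the blowup seen in the stack trace above.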
[jira] [Updated] (LUCENE-8662) Change TermsEnum.seekExact(BytesRef) to abstract + delegate seekExact(BytesRef) in FilterLeafReader.FilterTermsEnum
[ https://issues.apache.org/jira/browse/LUCENE-8662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jefferyyuan updated LUCENE-8662: Summary: Change TermsEnum.seekExact(BytesRef) to abstract + delegate seekExact(BytesRef) in FilterLeafReader.FilterTermsEnum (was: Change TermsEnum.seekExact(BytesRef) to abstract)
[jira] [Commented] (LUCENE-8662) Change TermsEnum.seekExact(BytesRef) to abstract
[ https://issues.apache.org/jira/browse/LUCENE-8662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16758746#comment-16758746 ] jefferyyuan commented on LUCENE-8662: - [~simonw] [~dsmiley] addressed your comments in the PR and thanks : )
[jira] [Updated] (LUCENE-8662) Change TermsEnum.seekExact(BytesRef) to abstract
[ https://issues.apache.org/jira/browse/LUCENE-8662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jefferyyuan updated LUCENE-8662: Description: Recently in our production, we found that Solr uses a lot of memory (more than 10 GB) during recovery or commit for a small index (3.5 GB). The stack trace is:
{code:java}
Thread 0x4d4b115c0
at org.apache.lucene.store.DataInput.readVInt()I (DataInput.java:125)
at org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.loadBlock()V (SegmentTermsEnumFrame.java:157)
at org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.scanToTermNonLeaf(Lorg/apache/lucene/util/BytesRef;Z)Lorg/apache/lucene/index/TermsEnum$SeekStatus; (SegmentTermsEnumFrame.java:786)
at org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.scanToTerm(Lorg/apache/lucene/util/BytesRef;Z)Lorg/apache/lucene/index/TermsEnum$SeekStatus; (SegmentTermsEnumFrame.java:538)
at org.apache.lucene.codecs.blocktree.SegmentTermsEnum.seekCeil(Lorg/apache/lucene/util/BytesRef;)Lorg/apache/lucene/index/TermsEnum$SeekStatus; (SegmentTermsEnum.java:757)
at org.apache.lucene.index.FilterLeafReader$FilterTermsEnum.seekCeil(Lorg/apache/lucene/util/BytesRef;)Lorg/apache/lucene/index/TermsEnum$SeekStatus; (FilterLeafReader.java:185)
at org.apache.lucene.index.TermsEnum.seekExact(Lorg/apache/lucene/util/BytesRef;)Z (TermsEnum.java:74)
at org.apache.solr.search.SolrIndexSearcher.lookupId(Lorg/apache/lucene/util/BytesRef;)J (SolrIndexSearcher.java:823)
at org.apache.solr.update.VersionInfo.getVersionFromIndex(Lorg/apache/lucene/util/BytesRef;)Ljava/lang/Long; (VersionInfo.java:204)
at org.apache.solr.update.UpdateLog.lookupVersion(Lorg/apache/lucene/util/BytesRef;)Ljava/lang/Long; (UpdateLog.java:786)
at org.apache.solr.update.VersionInfo.lookupVersion(Lorg/apache/lucene/util/BytesRef;)Ljava/lang/Long; (VersionInfo.java:194)
at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(Lorg/apache/solr/update/AddUpdateCommand;)Z (DistributedUpdateProcessor.java:1051)
{code}
We reproduced the problem locally with the following code using the Lucene API:
{code:java}
public static void main(String[] args) throws IOException {
  FSDirectory index = FSDirectory.open(Paths.get("the-index"));
  try (IndexReader reader = new ExitableDirectoryReader(DirectoryReader.open(index),
      new QueryTimeoutImpl(1000 * 60 * 5))) {
    String id = "the-id";
    BytesRef text = new BytesRef(id);
    for (LeafReaderContext lf : reader.leaves()) {
      TermsEnum te = lf.reader().terms("id").iterator();
      System.out.println(te.seekExact(text));
    }
  }
}
{code}
I added System.out.println("ord: " + ord); in codecs.blocktree.SegmentTermsEnum.getFrame(int); please check the attached output of test program.txt.
We found the root cause: seekExact(BytesRef) is not overridden in FilterLeafReader.FilterTermsEnum, so it falls back to the base-class TermsEnum.seekExact(BytesRef) implementation, which is very inefficient in this case:
{code:java}
public boolean seekExact(BytesRef text) throws IOException {
  return seekCeil(text) == SeekStatus.FOUND;
}
{code}
The fix is simple: override seekExact(BytesRef) in FilterLeafReader.FilterTermsEnum to delegate to the wrapped enum:
{code:java}
@Override
public boolean seekExact(BytesRef text) throws IOException {
  return in.seekExact(text);
}
{code}
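The cost of the missing override can be sketched in plain Java. The classes below (MiniTermsEnum, SegmentEnum, BrokenFilter, FixedFilter) are hypothetical stand-ins for the Lucene types, not the real API; they only model how a filter that forwards seekCeil but not seekExact silently routes exact lookups through the expensive path:

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for TermsEnum: the default seekExact falls back to
// seekCeil, mirroring TermsEnum.seekExact(BytesRef) at TermsEnum.java:74.
abstract class MiniTermsEnum {
    public boolean seekExact(String term) {
        return seekCeil(term);
    }
    public abstract boolean seekCeil(String term);
}

// Stand-in for the segment-level enum, which has a cheap exact lookup.
class SegmentEnum extends MiniTermsEnum {
    final Set<String> terms;
    int ceilCalls = 0;   // counts trips through the expensive seekCeil path
    int exactCalls = 0;  // counts trips through the cheap seekExact path
    SegmentEnum(Collection<String> t) { terms = new HashSet<>(t); }
    @Override public boolean seekCeil(String term) { ceilCalls++; return terms.contains(term); }
    @Override public boolean seekExact(String term) { exactCalls++; return terms.contains(term); }
}

// Models the bug: only seekCeil is forwarded, so the inherited seekExact
// default runs and the wrapped enum's fast seekExact is never used.
class BrokenFilter extends MiniTermsEnum {
    final SegmentEnum in;
    BrokenFilter(SegmentEnum in) { this.in = in; }
    @Override public boolean seekCeil(String term) { return in.seekCeil(term); }
}

// Models the fix: seekExact is also delegated to the wrapped enum.
class FixedFilter extends BrokenFilter {
    FixedFilter(SegmentEnum in) { super(in); }
    @Override public boolean seekExact(String term) { return in.seekExact(term); }
}

public class SeekExactDemo {
    public static void main(String[] args) {
        SegmentEnum seg = new SegmentEnum(List.of("a", "b", "id-42"));
        new BrokenFilter(seg).seekExact("id-42");
        System.out.println("broken: ceilCalls=" + seg.ceilCalls + " exactCalls=" + seg.exactCalls);
        new FixedFilter(seg).seekExact("id-42");
        System.out.println("fixed: exactCalls=" + seg.exactCalls);
    }
}
```

Running the demo shows the broken filter increments only ceilCalls while the fixed one reaches the wrapped enum's seekExact, which is the whole point of the one-line patch.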
[jira] [Commented] (LUCENE-8662) Change TermsEnum.seekExact(BytesRef) to abstract
[ https://issues.apache.org/jira/browse/LUCENE-8662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16755323#comment-16755323 ] jefferyyuan commented on LUCENE-8662: Thanks for the comments and suggestions. Changed TermsEnum.seekExact(BytesRef) to abstract; where needed, all subclasses call the default implementation for now. https://github.com/apache/lucene-solr/pull/551/files#diff-bdfed242b7c2c62e7df628f47532dfd9 Maybe we can check which subclasses should have their own implementation of seekExact for the sake of better performance, and change those in another PR(s).
> Change TermsEnum.seekExact(BytesRef) to abstract
>
> Key: LUCENE-8662
> URL: https://issues.apache.org/jira/browse/LUCENE-8662
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/search
> Affects Versions: 5.5.5, 6.6.5, 7.6, 8.0
> Reporter: jefferyyuan
> Priority: Major
> Labels: query
> Fix For: 8.0, 7.7
> Attachments: output of test program.txt
> Time Spent: 10m
> Remaining Estimate: 0h
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
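The shape of the change discussed in the comment above — making seekExact abstract while keeping the old seekCeil-based behavior available for subclasses to call explicitly — can be sketched as follows. BaseTermsEnum, seekExactViaCeil, and LegacyEnum are hypothetical names for illustration, not the actual Lucene patch:

```java
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical base class: seekExact is now abstract, so every subclass must
// make an explicit choice instead of silently inheriting the slow fallback.
abstract class BaseTermsEnum {
    public abstract boolean seekCeil(String term);   // stand-in membership check
    public abstract boolean seekExact(String term);  // abstract: no silent default
    // The previous default implementation, preserved as a helper that
    // subclasses can opt into deliberately.
    protected final boolean seekExactViaCeil(String term) {
        return seekCeil(term);
    }
}

// A subclass that keeps the old behavior, but now does so visibly.
class LegacyEnum extends BaseTermsEnum {
    private final NavigableSet<String> terms = new TreeSet<>(List.of("apple", "banana"));
    @Override public boolean seekCeil(String term) { return terms.contains(term); }
    @Override public boolean seekExact(String term) { return seekExactViaCeil(term); }
}

public class AbstractSeekDemo {
    public static void main(String[] args) {
        LegacyEnum e = new LegacyEnum();
        System.out.println(e.seekExact("apple"));
        System.out.println(e.seekExact("cherry"));
    }
}
```

The design benefit is that subclasses with a faster exact lookup can no longer forget to override seekExact, which is exactly the class of bug hit in FilterTermsEnum.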
[jira] [Updated] (LUCENE-8662) Change TermsEnum.seekExact(BytesRef) to abstract
[ https://issues.apache.org/jira/browse/LUCENE-8662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jefferyyuan updated LUCENE-8662: Summary: Change TermsEnum.seekExact(BytesRef) to abstract (was: Make TermsEnum.seekExact(BytesRef) abstract) -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-8662) Make TermsEnum.seekExact(BytesRef) abstract
[ https://issues.apache.org/jira/browse/LUCENE-8662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jefferyyuan updated LUCENE-8662: Summary: Make TermsEnum.seekExact(BytesRef) abstract (was: Override seekExact(BytesRef) in FilterLeafReader.FilterTermsEnum) -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-8662) Override seekExact(BytesRef) in FilterLeafReader.FilterTermsEnum
[ https://issues.apache.org/jira/browse/LUCENE-8662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jefferyyuan updated LUCENE-8662: Attachment: output of test program.txt
> Override seekExact(BytesRef) in FilterLeafReader.FilterTermsEnum
>
> Key: LUCENE-8662
> URL: https://issues.apache.org/jira/browse/LUCENE-8662
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/search
> Affects Versions: 5.5.5, 6.6.5, 7.6, 8.0
> Reporter: jefferyyuan
> Priority: Major
> Labels: query
> Fix For: 8.0, 7.7
> Attachments: output of test program.txt
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Recently in our production, we found that Solr uses a lot of memory (more than 10 GB) during recovery or commit for a small index (3.5 GB).
> The stack trace is:
> {code:java}
> Thread 0x4d4b115c0
> at org.apache.lucene.store.DataInput.readVInt()I (DataInput.java:125)
> at org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.loadBlock()V (SegmentTermsEnumFrame.java:157)
> at org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.scanToTermNonLeaf(Lorg/apache/lucene/util/BytesRef;Z)Lorg/apache/lucene/index/TermsEnum$SeekStatus; (SegmentTermsEnumFrame.java:786)
> at org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.scanToTerm(Lorg/apache/lucene/util/BytesRef;Z)Lorg/apache/lucene/index/TermsEnum$SeekStatus; (SegmentTermsEnumFrame.java:538)
> at org.apache.lucene.codecs.blocktree.SegmentTermsEnum.seekCeil(Lorg/apache/lucene/util/BytesRef;)Lorg/apache/lucene/index/TermsEnum$SeekStatus; (SegmentTermsEnum.java:757)
> at org.apache.lucene.index.FilterLeafReader$FilterTermsEnum.seekCeil(Lorg/apache/lucene/util/BytesRef;)Lorg/apache/lucene/index/TermsEnum$SeekStatus; (FilterLeafReader.java:185)
> at org.apache.lucene.index.TermsEnum.seekExact(Lorg/apache/lucene/util/BytesRef;)Z (TermsEnum.java:74)
> at org.apache.solr.search.SolrIndexSearcher.lookupId(Lorg/apache/lucene/util/BytesRef;)J (SolrIndexSearcher.java:823)
> at org.apache.solr.update.VersionInfo.getVersionFromIndex(Lorg/apache/lucene/util/BytesRef;)Ljava/lang/Long; (VersionInfo.java:204)
> at org.apache.solr.update.UpdateLog.lookupVersion(Lorg/apache/lucene/util/BytesRef;)Ljava/lang/Long; (UpdateLog.java:786)
> at org.apache.solr.update.VersionInfo.lookupVersion(Lorg/apache/lucene/util/BytesRef;)Ljava/lang/Long; (VersionInfo.java:194)
> at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(Lorg/apache/solr/update/AddUpdateCommand;)Z (DistributedUpdateProcessor.java:1051)
> {code}
> We reproduced the problem locally with the following code against the Lucene API:
> {code:java}
> public static void main(String[] args) throws IOException {
>   FSDirectory index = FSDirectory.open(Paths.get("the-index"));
>   try (IndexReader reader = new ExitableDirectoryReader(DirectoryReader.open(index),
>       new QueryTimeoutImpl(1000 * 60 * 5))) {
>     String id = "the-id";
>     BytesRef text = new BytesRef(id);
>     for (LeafReaderContext lf : reader.leaves()) {
>       TermsEnum te = lf.reader().terms("id").iterator();
>       System.out.println(te.seekExact(text));
>     }
>   }
> }
> {code}
> I added System.out.println("ord: " + ord); in org.apache.lucene.codecs.blocktree.SegmentTermsEnum.getFrame(int); please check the attached output of test program.txt.
> We found the root cause: seekExact(BytesRef) is not overridden in FilterLeafReader.FilterTermsEnum, so it falls back to the base-class TermsEnum.seekExact(BytesRef) implementation, which is very inefficient in this case:
> {code:java}
> public boolean seekExact(BytesRef text) throws IOException {
>   return seekCeil(text) == SeekStatus.FOUND;
> }
> {code}
> The fix is simple: override seekExact(BytesRef) in FilterLeafReader.FilterTermsEnum so it delegates to the wrapped enum.
> {code:java}
> @Override
> public boolean seekExact(BytesRef text) throws IOException {
>   return in.seekExact(text);
> }
> {code}
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
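The slow path comes from virtual dispatch: the filter overrides seekCeil but not seekExact, so a seekExact call lands in the base class, which routes back through the filter's seekCeil and bypasses the wrapped enum's optimized exact-match path. A self-contained sketch of that dispatch pattern (BaseEnum, SegmentEnum, FilterEnum, and FixedFilterEnum are simplified stand-ins for illustration, not the real Lucene classes):

```java
// Simplified model of the TermsEnum / FilterTermsEnum dispatch problem.
public class SeekExactDemo {
    // Counts how often the expensive general scan is taken.
    static int ceilCalls = 0;

    static abstract class BaseEnum {
        // Base-class fallback: exact seek implemented via the general seekCeil.
        boolean seekExact(String text) { return seekCeil(text); }
        abstract boolean seekCeil(String text);
    }

    static class SegmentEnum extends BaseEnum {
        // The wrapped enum has a cheap dedicated exact-match path...
        @Override boolean seekExact(String text) { return "the-id".equals(text); }
        // ...and an expensive general scan.
        @Override boolean seekCeil(String text) { ceilCalls++; return "the-id".equals(text); }
    }

    static class FilterEnum extends BaseEnum {
        final BaseEnum in;
        FilterEnum(BaseEnum in) { this.in = in; }
        // Forwarding only seekCeil: seekExact calls fall into the slow base-class path.
        @Override boolean seekCeil(String text) { return in.seekCeil(text); }
    }

    static class FixedFilterEnum extends FilterEnum {
        FixedFilterEnum(BaseEnum in) { super(in); }
        // The fix: delegate seekExact too, so the wrapped enum's fast path is used.
        @Override boolean seekExact(String text) { return in.seekExact(text); }
    }

    public static void main(String[] args) {
        SegmentEnum seg = new SegmentEnum();
        new FilterEnum(seg).seekExact("the-id");      // routed through seekCeil
        int slowCalls = ceilCalls;
        new FixedFilterEnum(seg).seekExact("the-id"); // uses the fast path
        System.out.println(slowCalls == 1 && ceilCalls == 1);
    }
}
```

Running main prints true: the unfixed filter triggers one expensive seekCeil scan, while the fixed filter never touches it.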
[jira] [Updated] (LUCENE-8662) Override seekExact(BytesRef) in FilterLeafReader.FilterTermsEnum
[ https://issues.apache.org/jira/browse/LUCENE-8662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jefferyyuan updated LUCENE-8662: Attachment: (was: output of test program.txt)
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-8662) Override seekExact(BytesRef) in FilterLeafReader.FilterTermsEnum
[ https://issues.apache.org/jira/browse/LUCENE-8662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jefferyyuan updated LUCENE-8662: Attachment: output of test program.txt
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-8662) Override seekExact(BytesRef) in FilterLeafReader.FilterTermsEnum
[ https://issues.apache.org/jira/browse/LUCENE-8662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jefferyyuan updated LUCENE-8662: Description: (updated; the full text is quoted in the first message above)
[jira] [Comment Edited] (LUCENE-8662) Override seekExact(BytesRef) in FilterLeafReader.FilterTermsEnum
[ https://issues.apache.org/jira/browse/LUCENE-8662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16754405#comment-16754405 ] jefferyyuan edited comment on LUCENE-8662 at 1/28/19 10:06 PM:
- At https://issues.apache.org/jira/browse/LUCENE-4874 (Don't override non-abstract methods that have an impl through other abstract methods in FilterAtomicReader and related classes), [https://github.com/apache/lucene-solr/commit/9588a84dec9fe5da210a9210cb0efbe3221c9f9e]
- Should we add an exception for seekExact(BytesRef) in FilterLeafReader.FilterTermsEnum due to this performance issue?
- FilterLeafReader.FilterTermsEnum delegates all calls to its field TermsEnum in, so it seems to make sense to override seekExact(BytesRef) as well.
[jira] [Commented] (LUCENE-8662) Override seekExact(BytesRef) in FilterLeafReader.FilterTermsEnum
[ https://issues.apache.org/jira/browse/LUCENE-8662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16754405#comment-16754405 ] jefferyyuan commented on LUCENE-8662:
- At https://issues.apache.org/jira/browse/LUCENE-4874 (Don't override non-abstract methods that have an impl through other abstract methods in FilterAtomicReader and related classes), [https://github.com/apache/lucene-solr/commit/9588a84dec9fe5da210a9210cb0efbe3221c9f9e]
- Should we add an exception for seekExact(BytesRef) in FilterLeafReader.FilterTermsEnum due to this performance issue?
- FilterLeafReader.FilterTermsEnum delegates all calls to its field TermsEnum in, so it seems to make sense to override seekExact(BytesRef) as well.
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8662) Override seekExact(BytesRef) in FilterLeafReader.FilterTermsEnum
[ https://issues.apache.org/jira/browse/LUCENE-8662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16754356#comment-16754356 ] jefferyyuan commented on LUCENE-8662: PR here: [https://github.com/apache/lucene-solr/pull/551] Thanks.
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-8662) Override seekExact(BytesRef) in FilterLeafReader.FilterTermsEnum
[ https://issues.apache.org/jira/browse/LUCENE-8662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jefferyyuan updated LUCENE-8662: Description: (updated; the full text is quoted in the first message above)
[jira] [Created] (LUCENE-8662) Override seekExact(BytesRef) in FilterLeafReader.FilterTermsEnum
jefferyyuan created LUCENE-8662: --- Summary: Override seekExact(BytesRef) in FilterLeafReader.FilterTermsEnum Key: LUCENE-8662 URL: https://issues.apache.org/jira/browse/LUCENE-8662 Project: Lucene - Core Issue Type: Improvement Components: core/search Affects Versions: 7.6, 6.6.5, 5.5.5, 8.0 Reporter: jefferyyuan Fix For: 8.0, 7.7
Recently in our production, we found that Solr uses a lot of memory (more than 10 GB) during recovery or commit for a small index (3.5 GB). The stack trace is:
{code:java}
Thread 0x4d4b115c0
  at org.apache.lucene.store.DataInput.readVInt()I (DataInput.java:125)
  at org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.loadBlock()V (SegmentTermsEnumFrame.java:157)
  at org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.scanToTermNonLeaf(Lorg/apache/lucene/util/BytesRef;Z)Lorg/apache/lucene/index/TermsEnum$SeekStatus; (SegmentTermsEnumFrame.java:786)
  at org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.scanToTerm(Lorg/apache/lucene/util/BytesRef;Z)Lorg/apache/lucene/index/TermsEnum$SeekStatus; (SegmentTermsEnumFrame.java:538)
  at org.apache.lucene.codecs.blocktree.SegmentTermsEnum.seekCeil(Lorg/apache/lucene/util/BytesRef;)Lorg/apache/lucene/index/TermsEnum$SeekStatus; (SegmentTermsEnum.java:757)
  at org.apache.lucene.index.FilterLeafReader$FilterTermsEnum.seekCeil(Lorg/apache/lucene/util/BytesRef;)Lorg/apache/lucene/index/TermsEnum$SeekStatus; (FilterLeafReader.java:185)
  at org.apache.lucene.index.TermsEnum.seekExact(Lorg/apache/lucene/util/BytesRef;)Z (TermsEnum.java:74)
  at org.apache.solr.search.SolrIndexSearcher.lookupId(Lorg/apache/lucene/util/BytesRef;)J (SolrIndexSearcher.java:823)
  at org.apache.solr.update.VersionInfo.getVersionFromIndex(Lorg/apache/lucene/util/BytesRef;)Ljava/lang/Long; (VersionInfo.java:204)
  at org.apache.solr.update.UpdateLog.lookupVersion(Lorg/apache/lucene/util/BytesRef;)Ljava/lang/Long; (UpdateLog.java:786)
  at org.apache.solr.update.VersionInfo.lookupVersion(Lorg/apache/lucene/util/BytesRef;)Ljava/lang/Long; (VersionInfo.java:194)
  at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(Lorg/apache/solr/update/AddUpdateCommand;)Z (DistributedUpdateProcessor.java:1051)
{code}
We reproduced the problem locally with the following code using the Lucene API:
{code:java}
public static void main(String[] args) throws IOException {
  FSDirectory index = FSDirectory.open(Paths.get("the-index"));
  try (IndexReader reader = new ExitableDirectoryReader(DirectoryReader.open(index),
      new QueryTimeoutImpl(1000 * 60 * 5))) {
    String id = "the-id";
    BytesRef text = new BytesRef(id);
    for (LeafReaderContext lf : reader.leaves()) {
      TermsEnum te = lf.reader().terms("id").iterator();
      System.out.println(te.seekExact(text));
    }
  }
}
{code}
We found the root cause: seekExact(BytesRef) is not implemented in FilterLeafReader.FilterTermsEnum, so it falls back to the base-class TermsEnum.seekExact(BytesRef) implementation, which is very inefficient in this case:
{code:java}
public boolean seekExact(BytesRef text) throws IOException {
  return seekCeil(text) == SeekStatus.FOUND;
}
{code}
The fix is simple: override seekExact(BytesRef) in FilterLeafReader.FilterTermsEnum so it delegates to the wrapped enum:
{code:java}
@Override
public boolean seekExact(BytesRef text) throws IOException {
  return in.seekExact(text);
}
{code}
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
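The effect of the missing override can be shown without Lucene at all. The sketch below is a minimal, self-contained analogue (every class name here is a hypothetical stand-in, not a Lucene class): a wrapper that inherits the seekCeil-based default pays the expensive path on every exact seek, while a wrapper that forwards seekExact reaches the wrapped enum's fast path.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SeekExactSketch {

    /** Stand-in for TermsEnum: the default seekExact is implemented via seekCeil. */
    public static abstract class BaseEnum {
        public int seekCeilCalls = 0;   // counter that makes the expensive path visible
        public abstract boolean seekCeilFound(String term);
        // Analogous to the default TermsEnum.seekExact(BytesRef):
        public boolean seekExact(String term) { return seekCeilFound(term); }
    }

    /** Stand-in for a segment-level enum that has a cheap exact lookup of its own. */
    public static class FastEnum extends BaseEnum {
        public int fastExactCalls = 0;
        private final Set<String> terms;
        public FastEnum(Collection<String> t) { terms = new HashSet<>(t); }
        @Override public boolean seekCeilFound(String term) { seekCeilCalls++; return terms.contains(term); }
        @Override public boolean seekExact(String term) { fastExactCalls++; return terms.contains(term); }
    }

    /** A filter enum with no seekExact override: every exact seek pays the seekCeil path. */
    public static class NaiveFilterEnum extends BaseEnum {
        protected final BaseEnum in;
        public NaiveFilterEnum(BaseEnum in) { this.in = in; }
        @Override public boolean seekCeilFound(String term) { return in.seekCeilFound(term); }
    }

    /** The fix from this issue, in miniature: forward seekExact to the wrapped enum. */
    public static class FixedFilterEnum extends NaiveFilterEnum {
        public FixedFilterEnum(BaseEnum in) { super(in); }
        @Override public boolean seekExact(String term) { return in.seekExact(term); }
    }

    public static void main(String[] args) {
        FastEnum a = new FastEnum(List.of("a", "the-id"));
        new NaiveFilterEnum(a).seekExact("the-id");
        System.out.println("naive: seekCeilCalls=" + a.seekCeilCalls + " fastExactCalls=" + a.fastExactCalls);

        FastEnum b = new FastEnum(List.of("a", "the-id"));
        new FixedFilterEnum(b).seekExact("the-id");
        System.out.println("fixed: seekCeilCalls=" + b.seekCeilCalls + " fastExactCalls=" + b.fastExactCalls);
    }
}
```

The one-line delegation is all the fix needs because dynamic dispatch then routes seekExact to the concrete enum's optimized implementation instead of the generic seekCeil fallback.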
[jira] [Commented] (SOLR-12833) Use timed-out lock in DistributedUpdateProcessor
[ https://issues.apache.org/jira/browse/SOLR-12833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16709103#comment-16709103 ] jefferyyuan commented on SOLR-12833: It looks great to me, thanks, [~markrmil...@gmail.com].
> Use timed-out lock in DistributedUpdateProcessor
>
> Key: SOLR-12833
> URL: https://issues.apache.org/jira/browse/SOLR-12833
> Project: Solr
> Issue Type: Improvement
> Security Level: Public (Default Security Level. Issues are Public)
> Components: update, UpdateRequestProcessors
> Affects Versions: 7.5, master (8.0)
> Reporter: jefferyyuan
> Assignee: Mark Miller
> Priority: Minor
> Fix For: master (8.0)
> Attachments: SOLR-12833.patch, SOLR-12833.patch
>
> There is a synchronized block that blocks other update requests whose IDs fall in the same hash bucket. An update waits forever until it gets the lock at the synchronized block, which can be a problem in some cases.
> Some add/update requests (for example, updates with spatial/shape analysis) may take a long time (30+ seconds or even more), which makes the request time out and fail.
> Clients may retry the same request multiple times over several minutes, which makes things worse.
> The server side receives all the update requests, but all except one can do nothing and have to wait. This wastes precious memory and CPU resources.
> We have seen cases where 2000+ threads were blocked at the synchronized lock while only a few updates were making progress. Each thread takes 3+ MB of memory, which causes OOM.
> Also, if an update can't get the lock in the expected time range, it's better to fail fast.
> We can have one configuration in solrconfig.xml, updateHandler/versionLock/timeInMill, so users can specify how long they want to wait for the version bucket lock.
> The default value can be -1, so it behaves the same: wait forever until it gets the lock.
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-12833) Use timed-out lock in DistributedUpdateProcessor
[ https://issues.apache.org/jira/browse/SOLR-12833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16688670#comment-16688670 ] jefferyyuan commented on SOLR-12833: Hi, [~markrmil...@gmail.com], tryGetVersionBucketLock will throw an exception if it is not able to get the lock.
* It's kind of confusing, as it returns true if it is able to get the lock, and otherwise throws an exception.
* The reason I did it this way:
** I want it to return a value (true or false), so we can unlock it in the finally block if it is true.
** I don't want to add another if/else.
* I changed the method name to tryGetLockElseThrow to be a little more readable.
> Use timed-out lock in DistributedUpdateProcessor
>
> Key: SOLR-12833
> URL: https://issues.apache.org/jira/browse/SOLR-12833
> Project: Solr
> Issue Type: Improvement
> Security Level: Public (Default Security Level. Issues are Public)
> Components: update, UpdateRequestProcessors
> Affects Versions: 7.5, master (8.0)
> Reporter: jefferyyuan
> Assignee: Mark Miller
> Priority: Minor
> Fix For: master (8.0)
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
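The return-true-or-throw shape discussed in this comment can be sketched with a plain ReentrantLock. Everything below, including the class and method names, is an illustrative assumption rather than the actual Solr patch; it only shows why returning a boolean (that is never false) lets the caller guard the unlock in a finally block:

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class LockElseThrowSketch {

    /** Hypothetical exception type standing in for whatever the patch throws on timeout. */
    public static class LockTimeoutException extends RuntimeException {
        public LockTimeoutException(String msg) { super(msg); }
    }

    public final ReentrantLock lock = new ReentrantLock();

    /** Returns true when the lock was acquired; otherwise throws. It never returns false. */
    public boolean tryGetLockElseThrow(long timeoutMs) {
        boolean acquired;
        try {
            acquired = lock.tryLock(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new LockTimeoutException("interrupted while waiting for lock");
        }
        if (!acquired) {
            throw new LockTimeoutException("could not get lock in " + timeoutMs + " ms");
        }
        return true;
    }

    public String guardedUpdate(long timeoutMs) {
        boolean locked = false;
        try {
            locked = tryGetLockElseThrow(timeoutMs);
            return "updated";                  // the critical section would run here
        } finally {
            if (locked) lock.unlock();         // unlock only if we actually got the lock
        }
    }

    public static void main(String[] args) {
        LockElseThrowSketch s = new LockElseThrowSketch();
        System.out.println(s.guardedUpdate(100));
    }
}
```

Because a timeout surfaces as an exception, the `locked` flag stays false and the finally block skips the unlock, which is exactly the bookkeeping the boolean return value exists to support.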
[jira] [Commented] (SOLR-12833) Use timed-out lock in DistributedUpdateProcessor
[ https://issues.apache.org/jira/browse/SOLR-12833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16665771#comment-16665771 ] jefferyyuan commented on SOLR-12833: Thanks [~markrmil...@gmail.com], changed the default timeout to 10 minutes, the same as the default client read timeout.
> Use timed-out lock in DistributedUpdateProcessor
>
> Key: SOLR-12833
> URL: https://issues.apache.org/jira/browse/SOLR-12833
> Project: Solr
> Issue Type: Improvement
> Security Level: Public (Default Security Level. Issues are Public)
> Components: update, UpdateRequestProcessors
> Affects Versions: 7.5, master (8.0)
> Reporter: jefferyyuan
> Assignee: Mark Miller
> Priority: Minor
> Fix For: master (8.0)
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-12833) Use timed-out lock in DistributedUpdateProcessor
[ https://issues.apache.org/jira/browse/SOLR-12833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16639043#comment-16639043 ] jefferyyuan commented on SOLR-12833: Here is the PR: [https://github.com/apache/lucene-solr/pull/463/files]
> Use timed-out lock in DistributedUpdateProcessor
>
> Key: SOLR-12833
> URL: https://issues.apache.org/jira/browse/SOLR-12833
> Project: Solr
> Issue Type: Improvement
> Security Level: Public (Default Security Level. Issues are Public)
> Components: update, UpdateRequestProcessors
> Affects Versions: 7.5, master (8.0)
> Reporter: jefferyyuan
> Priority: Minor
> Fix For: master (8.0)
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-12833) Use timed-out lock in DistributedUpdateProcessor
jefferyyuan created SOLR-12833: -- Summary: Use timed-out lock in DistributedUpdateProcessor Key: SOLR-12833 URL: https://issues.apache.org/jira/browse/SOLR-12833 Project: Solr Issue Type: Improvement Security Level: Public (Default Security Level. Issues are Public) Components: update, UpdateRequestProcessors Affects Versions: 7.5, master (8.0) Reporter: jefferyyuan Fix For: master (8.0)
There is a synchronized block that blocks other update requests whose IDs fall in the same hash bucket. An update waits forever until it gets the lock at the synchronized block, which can be a problem in some cases.
Some add/update requests (for example, updates with spatial/shape analysis) may take a long time (30+ seconds or even more), which makes the request time out and fail.
Clients may retry the same request multiple times over several minutes, which makes things worse.
The server side receives all the update requests, but all except one can do nothing and have to wait. This wastes precious memory and CPU resources.
We have seen cases where 2000+ threads were blocked at the synchronized lock while only a few updates were making progress. Each thread takes 3+ MB of memory, which causes OOM.
Also, if an update can't get the lock in the expected time range, it's better to fail fast.
We can have one configuration in solrconfig.xml, updateHandler/versionLock/timeInMill, so users can specify how long they want to wait for the version bucket lock.
The default value can be -1, so it behaves the same: wait forever until it gets the lock.
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
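The proposal above can be sketched in a few lines, assuming a plain ReentrantLock guarding each version bucket. The names here (versionLockTimeoutMs, withBucketLock) are hypothetical, not Solr's API; the point is only the timeout semantics: a non-negative value bounds the wait, while -1 keeps the old wait-forever behavior.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

public class TimedBucketLockSketch {
    private final ReentrantLock bucketLock = new ReentrantLock();
    // In the proposal this would come from solrconfig.xml (updateHandler/versionLock/timeInMill).
    private final long versionLockTimeoutMs;

    public TimedBucketLockSketch(long versionLockTimeoutMs) {
        this.versionLockTimeoutMs = versionLockTimeoutMs;
    }

    /** Runs the given action while holding the bucket lock, or fails fast on timeout. */
    public <T> T withBucketLock(Supplier<T> action) throws TimeoutException, InterruptedException {
        if (versionLockTimeoutMs < 0) {
            bucketLock.lockInterruptibly();          // -1: legacy behavior, wait forever
        } else if (!bucketLock.tryLock(versionLockTimeoutMs, TimeUnit.MILLISECONDS)) {
            // Failing fast here frees the thread (and its ~3 MB stack) instead of piling up.
            throw new TimeoutException("could not acquire version bucket lock in "
                + versionLockTimeoutMs + " ms");
        }
        try {
            return action.get();                     // the versioned add/update would run here
        } finally {
            bucketLock.unlock();
        }
    }

    public static void main(String[] args) throws Exception {
        TimedBucketLockSketch bucket = new TimedBucketLockSketch(100);
        System.out.println(bucket.withBucketLock(() -> "add applied"));
    }
}
```

With a bounded wait, a slow update in one bucket turns into prompt timeout errors for its queued peers rather than thousands of threads parked indefinitely on a synchronized block.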
[jira] [Commented] (SOLR-12612) Accept any key in cluster properties
[ https://issues.apache.org/jira/browse/SOLR-12612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16587760#comment-16587760 ] jefferyyuan commented on SOLR-12612: Thanks [~janhoy], ext. is better, and I changed the code accordingly.
> Accept any key in cluster properties
>
> Key: SOLR-12612
> URL: https://issues.apache.org/jira/browse/SOLR-12612
> Project: Solr
> Issue Type: Improvement
> Security Level: Public (Default Security Level. Issues are Public)
> Affects Versions: 7.4, master (8.0)
> Reporter: jefferyyuan
> Priority: Minor
> Fix For: master (8.0)
>
> Cluster properties are a good place to store configuration data that's shared across the whole cluster: Solr and other (authorized) apps can easily read and update them.
> It would be very useful if we could store extra data in cluster properties, which would act as a centralized property-management system between Solr and its related apps (like manager or monitor apps).
> And the change would also be very simple.
> We can also require that all extra properties start with a prefix like: extra_
> PR: https://github.com/apache/lucene-solr/pull/429
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-12612) Accept any key in cluster properties
[ https://issues.apache.org/jira/browse/SOLR-12612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16584504#comment-16584504 ] jefferyyuan commented on SOLR-12612: Thanks [~tomasflobbe] and [~anshumg]. I changed the prefix to plugin. and added the tests; please check.
> Accept any key in cluster properties
>
> Key: SOLR-12612
> URL: https://issues.apache.org/jira/browse/SOLR-12612
> Project: Solr
> Issue Type: Improvement
> Security Level: Public (Default Security Level. Issues are Public)
> Affects Versions: 7.4, master (8.0)
> Reporter: jefferyyuan
> Priority: Minor
> Fix For: master (8.0)
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-12612) Accept any key in cluster properties
[ https://issues.apache.org/jira/browse/SOLR-12612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jefferyyuan updated SOLR-12612: --- Description: Cluster properties are a good place to store configuration data that's shared across the whole cluster: Solr and other (authorized) apps can easily read and update them. It would be very useful if we could store extra data in cluster properties, which would act as a centralized property-management system between Solr and its related apps (like manager or monitor apps). And the change would also be very simple. We can also require that all extra properties start with a prefix like: extra_ PR: https://github.com/apache/lucene-solr/pull/429 was: Cluster properties is a good place to store configuration data that's shared in the whole cluster: solr and other (authorized) apps can easily read and update them. It would be very useful if we can store extra data in cluster properties which would act as a centralized property management system between solr and its related apps (like manager or monitor apps). And the change would be also very simple. We can also require all extra property starts with prefix like: extra_
> Accept any key in cluster properties
>
> Key: SOLR-12612
> URL: https://issues.apache.org/jira/browse/SOLR-12612
> Project: Solr
> Issue Type: Improvement
> Security Level: Public (Default Security Level. Issues are Public)
> Affects Versions: 7.4, master (8.0)
> Reporter: jefferyyuan
> Priority: Minor
> Fix For: master (8.0)
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-12612) Accept any key in cluster properties
jefferyyuan created SOLR-12612: -- Summary: Accept any key in cluster properties Key: SOLR-12612 URL: https://issues.apache.org/jira/browse/SOLR-12612 Project: Solr Issue Type: Improvement Security Level: Public (Default Security Level. Issues are Public) Affects Versions: 7.4, master (8.0) Reporter: jefferyyuan Fix For: master (8.0)
Cluster properties are a good place to store configuration data that's shared across the whole cluster: Solr and other (authorized) apps can easily read and update them.
It would be very useful if we could store extra data in cluster properties, which would act as a centralized property-management system between Solr and its related apps (like manager or monitor apps).
And the change would also be very simple.
We can also require that all extra properties start with a prefix like: extra_
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
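The key check proposed here is small enough to sketch directly. The snippet below is an illustration, not the real ClusterProperties code: the known-keys set is a partial, assumed list, and the prefix shown is the ext. form the thread later settles on (the issue description itself suggests extra_).

```java
import java.util.Set;

public class ClusterPropKeyCheck {

    // Assumed subset of Solr's built-in cluster property names, for illustration only.
    private static final Set<String> KNOWN_PROPERTIES =
        Set.of("urlScheme", "autoAddReplicas", "location", "legacyCloud", "maxCoresPerNode");

    // Agreed namespace for app-specific properties, so arbitrary keys stay recognizable.
    private static final String EXT_PREFIX = "ext.";

    /** Accept built-in property names plus any key carrying the extension prefix. */
    public static boolean isAllowedKey(String name) {
        return KNOWN_PROPERTIES.contains(name) || name.startsWith(EXT_PREFIX);
    }

    public static void main(String[] args) {
        System.out.println(isAllowedKey("urlScheme"));       // built-in, accepted
        System.out.println(isAllowedKey("ext.monitor.url")); // app-specific, accepted via prefix
        System.out.println(isAllowedKey("monitor.url"));     // unprefixed unknown key, rejected
    }
}
```

Requiring a prefix keeps the open namespace from silently swallowing typos of built-in property names: a misspelled built-in key is still rejected instead of being stored as an "extra" property.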
[jira] [Commented] (SOLR-12477) Return server error(500) for AlreadyClosedException instead of client Errors(400)
[ https://issues.apache.org/jira/browse/SOLR-12477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16561385#comment-16561385 ] jefferyyuan commented on SOLR-12477: [~varunthacker] It makes sense (as Mockito doesn't work with newer Java), and I have reverted the change in DirectUpdateHandlerTest#testAddDocThrowAlreadyClosedException. Please check, and thanks.
> Return server error(500) for AlreadyClosedException instead of client Errors(400)
>
> Key: SOLR-12477
> URL: https://issues.apache.org/jira/browse/SOLR-12477
> Project: Solr
> Issue Type: Improvement
> Security Level: Public (Default Security Level. Issues are Public)
> Components: update
> Reporter: jefferyyuan
> Assignee: Varun Thacker
> Priority: Minor
> Labels: update
> Fix For: 7.3.2, master (8.0)
> Attachments: SOLR-12477.patch
> Time Spent: 40m
> Remaining Estimate: 0h
>
> In some cases (for example: a corrupt index), addDoc0 throws AlreadyClosedException, but the Solr server returns client error 400 to the client.
> This will confuse customers and especially monitoring tools.
> Patch: [https://github.com/apache/lucene-solr/pull/402]
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-12477) Return server error(500) for AlreadyClosedException instead of client Errors(400)
[ https://issues.apache.org/jira/browse/SOLR-12477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16560599#comment-16560599 ] jefferyyuan commented on SOLR-12477: Thanks [~varunthacker]. Addressed the comments in GitHub and changed the code as you suggested :)
> Return server error(500) for AlreadyClosedException instead of client Errors(400)
>
> Key: SOLR-12477
> URL: https://issues.apache.org/jira/browse/SOLR-12477
> Project: Solr
> Issue Type: Improvement
> Security Level: Public (Default Security Level. Issues are Public)
> Components: update
> Reporter: jefferyyuan
> Assignee: Varun Thacker
> Priority: Minor
> Labels: update
> Fix For: 7.3.2, master (8.0)
> Time Spent: 40m
> Remaining Estimate: 0h
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-12477) Return server error(500) for AlreadyClosedException instead of client Errors(400)
[ https://issues.apache.org/jira/browse/SOLR-12477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16560517#comment-16560517 ] jefferyyuan commented on SOLR-12477: Thanks [~varunthacker]. Changed CoreContainer.checkTragicException(SolrCore) to return true when there was a tragic exception. Please check the PR: [https://github.com/apache/lucene-solr/pull/402/files]
> Return server error(500) for AlreadyClosedException instead of client Errors(400)
>
> Key: SOLR-12477
> URL: https://issues.apache.org/jira/browse/SOLR-12477
> Project: Solr
> Issue Type: Improvement
> Security Level: Public (Default Security Level. Issues are Public)
> Components: update
> Reporter: jefferyyuan
> Assignee: Varun Thacker
> Priority: Minor
> Labels: update
> Fix For: 7.3.2, master (8.0)
> Time Spent: 10m
> Remaining Estimate: 0h
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-12477) Return server error(500) for AlreadyClosedException instead of client Errors(400)
[ https://issues.apache.org/jira/browse/SOLR-12477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16551474#comment-16551474 ] jefferyyuan commented on SOLR-12477: Thanks, [~varunthacker]. Made the change as you suggested; please check. Just one exception: corruptLeader may throw RemoteSolrException when called by the test method, so the test code changes accordingly.
> Return server error(500) for AlreadyClosedException instead of client Errors(400)
>
> Key: SOLR-12477
> URL: https://issues.apache.org/jira/browse/SOLR-12477
> Project: Solr
> Issue Type: Improvement
> Security Level: Public (Default Security Level. Issues are Public)
> Components: update
> Affects Versions: 7.3.1, master (8.0)
> Reporter: jefferyyuan
> Assignee: Varun Thacker
> Priority: Minor
> Labels: update
> Fix For: 7.3.2, master (8.0)
> Time Spent: 10m
> Remaining Estimate: 0h
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-12477) Return server error(500) for AlreadyClosedException instead of client Errors(400)
[ https://issues.apache.org/jira/browse/SOLR-12477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jefferyyuan updated SOLR-12477: --- Environment: (was: In some cases(for example: corrupt index), addDoc0 throws AlreadyClosedException, but solr server returns client error 400 to client This will confuse customers and especially monitoring tool. Patch: https://github.com/apache/lucene-solr/pull/402) Labels: update (was: ) Description: In some cases(for example: corrupt index), addDoc0 throws AlreadyClosedException, but solr server returns client error 400 to client This will confuse customers and especially monitoring tool. Patch: [https://github.com/apache/lucene-solr/pull/402] > Return server error(500) for AlreadyClosedException instead of client > Errors(400) > - > > Key: SOLR-12477 > URL: https://issues.apache.org/jira/browse/SOLR-12477 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: update >Affects Versions: 7.3.1, master (8.0) >Reporter: jefferyyuan >Priority: Minor > Labels: update > Fix For: 7.3.2, master (8.0) > > Time Spent: 10m > Remaining Estimate: 0h > > In some cases(for example: corrupt index), addDoc0 throws > AlreadyClosedException, but solr server returns client error 400 to client > This will confuse customers and especially monitoring tool. > Patch: [https://github.com/apache/lucene-solr/pull/402] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-12477) Return server error(500) for AlreadyClosedException instead of client Errors(400)
[ https://issues.apache.org/jira/browse/SOLR-12477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jefferyyuan updated SOLR-12477: --- Environment: In some cases(for example: corrupt index), addDoc0 throws AlreadyClosedException, but solr server returns client error 400 to client This will confuse customers and especially monitoring tool. Patch: https://github.com/apache/lucene-solr/pull/402 was: In some cases(for example: corrupt index), addDoc0 throws AlreadyClosedException, but solr server returns client error 400 to client This will confuse customers and especially monitoring tool. > Return server error(500) for AlreadyClosedException instead of client > Errors(400) > - > > Key: SOLR-12477 > URL: https://issues.apache.org/jira/browse/SOLR-12477 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: update >Affects Versions: 7.3.1, master (8.0) > Environment: In some cases(for example: corrupt index), addDoc0 > throws AlreadyClosedException, but solr server returns client error 400 to > client > This will confuse customers and especially monitoring tool. > Patch: https://github.com/apache/lucene-solr/pull/402 >Reporter: jefferyyuan >Priority: Minor > Fix For: 7.3.2, master (8.0) > > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-12477) Return server error(500) for AlreadyClosedException instead of client Errors(400)
jefferyyuan created SOLR-12477: -- Summary: Return server error(500) for AlreadyClosedException instead of client Errors(400) Key: SOLR-12477 URL: https://issues.apache.org/jira/browse/SOLR-12477 Project: Solr Issue Type: Improvement Security Level: Public (Default Security Level. Issues are Public) Components: update Affects Versions: 7.3.1, master (8.0) Environment: In some cases (for example: a corrupt index), addDoc0 throws AlreadyClosedException, but the Solr server returns client error 400 to the client. This will confuse customers and especially monitoring tools. Reporter: jefferyyuan Fix For: 7.3.2, master (8.0)
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
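The mapping this issue asks for can be sketched as a simple exception-to-status function. This is an illustration of the intent, not the actual Solr error-handling code; the nested AlreadyClosedException below is a local stand-in for org.apache.lucene.store.AlreadyClosedException (which likewise extends IllegalStateException), and the status choices only mirror the argument in the description.

```java
public class ErrorStatusSketch {

    /** Minimal stand-in for org.apache.lucene.store.AlreadyClosedException. */
    public static class AlreadyClosedException extends IllegalStateException {
        public AlreadyClosedException(String msg) { super(msg); }
    }

    /** Pick an HTTP status for an update failure. */
    public static int statusFor(RuntimeException e) {
        if (e instanceof AlreadyClosedException) {
            // Server-side condition (e.g. a corrupt index closed the writer):
            // the client did nothing wrong, so report a server error.
            return 500;
        }
        if (e instanceof IllegalArgumentException) {
            return 400;   // genuinely bad input from the client
        }
        return 500;       // unknown failures default to server error
    }

    public static void main(String[] args) {
        System.out.println(statusFor(new AlreadyClosedException("this IndexWriter is closed")));
        System.out.println(statusFor(new IllegalArgumentException("unknown field")));
    }
}
```

Returning 500 here matters beyond correctness: monitoring tools typically alert on 5xx rates, so a corrupt-index condition masked as a 400 looks like client misuse instead of a server fault.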
[jira] [Commented] (SOLR-10885) NullPointerException when run collapse filter
[ https://issues.apache.org/jira/browse/SOLR-10885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16257855#comment-16257855 ] jefferyyuan commented on SOLR-10885: Thanks, it makes sense. Could we update the doc to explicitly state that the collapse parser only supports collapsing on one field? This could prevent others from misusing it and later wondering why it doesn't work.
> NullPointerException when run collapse filter
> --
>
> Project: Solr
> Issue Type: Bug
> Security Level: Public (Default Security Level. Issues are Public)
> Components: search
> Affects Versions: 6.4.1
> Reporter: jefferyyuan
> Assignee: Varun Thacker
> Priority: Critical
>
> Solr collapse is a great function to collapse data that is related so we only show one in the search result. Just found one issue related to it: it throws NullPointerException in some cases.
> To reproduce it, first ingest some data - AND commit multiple times.
> 1. When there is no data that matches the query:
> http://localhost:8983/solr/thecollection/select?defType=edismax&q=non-existType:*&fq={!collapse field=seriesId nullPolicy=expand}&fq={!collapse field=programId nullPolicy=expand}
> - But the problem only happens if I use both collapse fqs; if I just use one of them, it is fine.
> *2. When the data that matches the query doesn't have the collapse fields
> - This is kind of a big problem, as we may store different kinds of docs in one collection, and one query may match different kinds of docs.
> If some docs (docType1) have same value for field1, we want to collapse > them, if other dosc(docType2) have some value for field2, do same things.* > - channel data doesn't have seriesId or programId > http://localhost:8983/solr/thecollection/select?defType=edismax&q=docType:channel&fq={!collapse > field=seriesId nullPolicy=expand}&fq={!collapse field=programId > nullPolicy=expand} > - But the problem only happens if I use both collapse fqs, if I just use one > of them, it would be fine. > Exception from log: > Caused by: > org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error > from server at http://localhost:8983/solr/searchItems_shard1_replica3: > java.lang.NullPointerException > at > org.apache.solr.search.CollapsingQParserPlugin$OrdScoreCollector.finish(CollapsingQParserPlugin.java:617) > at > org.apache.solr.search.CollapsingQParserPlugin$OrdScoreCollector.finish(CollapsingQParserPlugin.java:667) > at > org.apache.solr.search.SolrIndexSearcher.buildAndRunCollectorChain(SolrIndexSearcher.java:256) > at > org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1823) > at > org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1640) > at > org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:611) > at > org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:533) > at > org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:295) > at > org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:166) > at org.apache.solr.core.SolrCore.execute(SolrCore.java:2299) > at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:658) > at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:464) > at > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345) > at > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:296) > at > 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691) > at > org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582) > at > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) > at > org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548) > at > org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226) > at > org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180) > at > org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512) > at > org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185) > at > org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112) > at > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) > at > org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213) > at > org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerColl
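The line at CollapsingQParserPlugin.java:617 (quoted later in this thread) dereferences this.contexts[currentContext + 1] without a null check; the per-segment contexts array is filled lazily, so after multiple commits an entry for a segment that never collected a document can remain null. A hedged, self-contained sketch of one possible guard (SegmentContext and nextDocBase are simplified stand-ins for illustration, not the actual Solr classes):

```java
// Simplified sketch of guarding lazily-populated per-segment contexts:
// skip null entries instead of dereferencing them.
public class ContextGuardSketch {

    // Stand-in for the per-segment context; only docBase matters here.
    public static final class SegmentContext {
        final int docBase;
        public SegmentContext(int docBase) { this.docBase = docBase; }
    }

    // Returns the docBase of the next non-null segment context after
    // `current`, or maxDoc when there is none.
    public static int nextDocBase(SegmentContext[] contexts, int current, int maxDoc) {
        for (int i = current + 1; i < contexts.length; i++) {
            if (contexts[i] != null) {
                return contexts[i].docBase;
            }
        }
        return maxDoc;
    }

    public static void main(String[] args) {
        // Segment 1 was never visited (null): it is skipped, not dereferenced.
        SegmentContext[] contexts = {
            new SegmentContext(0), null, new SegmentContext(200)
        };
        System.out.println(nextDocBase(contexts, 0, 300)); // 200
        System.out.println(nextDocBase(contexts, 2, 300)); // 300
    }
}
```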
[jira] [Comment Edited] (SOLR-6205) Make SolrCloud Data-center, rack or zone aware
[ https://issues.apache.org/jira/browse/SOLR-6205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16191698#comment-16191698 ] jefferyyuan edited comment on SOLR-6205 at 10/6/17 9:56 PM: It seems this functionality (at least part of it) is already in Solr. Rule-based Replica Placement: http://lucene.apache.org/solr/guide/7_0/rule-based-replica-placement.html https://issues.apache.org/jira/browse/SOLR-6220 was (Author: yuanyun.cn): Make Solr rack awareness can help prevent data loss and improve query performance. Elastic-search already supported it: https://www.elastic.co/guide/en/elasticsearch/reference/5.4/allocation-awareness.html And a lot of projects support this: Hadoop, Cassandra, Kafka etc. > Make SolrCloud Data-center, rack or zone aware > -- > > Key: SOLR-6205 > URL: https://issues.apache.org/jira/browse/SOLR-6205 > Project: Solr > Issue Type: New Feature > Components: SolrCloud >Affects Versions: 4.8.1 >Reporter: Arcadius Ahouansou >Assignee: Noble Paul > > Use case: > Let's say we have SolrCloud deployed across 2 Datacenters, racks or zones A > and B > There is a need to have a SolrCloud deployment that will make it possible to > have a working system even if one of the Datacenter/rack/zone A or B is lost. > - This has been discussed on the mailing list at > http://lucene.472066.n3.nabble.com/SolrCloud-multiple-data-center-support-td4115097.html > and there are many workarounds that require adding more moving parts to the > system. > - On the above thread, Daniel Collins mentioned > https://issues.apache.org/jira/browse/ZOOKEEPER-107 > which could help solve this issue. > - Note that this is a very important feature that is overlooked most of the > time. > - Note that this feature is available in ElasticSearch. 
> See > http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-cluster.html#allocation-awareness > and > http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-cluster.html#forced-awareness
[jira] [Comment Edited] (SOLR-6205) Make SolrCloud Data-center, rack or zone aware
[ https://issues.apache.org/jira/browse/SOLR-6205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16191698#comment-16191698 ] jefferyyuan edited comment on SOLR-6205 at 10/4/17 5:53 PM: Make Solr rack awareness can help prevent data loss and improve query performance. Elastic-search already supported it: https://www.elastic.co/guide/en/elasticsearch/reference/5.4/allocation-awareness.html And a lot of projects support this: Hadoop, Cassandra, Kafka etc. was (Author: yuanyun.cn): Make Solr rack awareness can help prevent data loss and improve query performance. Elastic-search already supported it: https://www.elastic.co/guide/en/elasticsearch/reference/5.4/allocation-awareness.html > Make SolrCloud Data-center, rack or zone aware > -- > > Key: SOLR-6205 > URL: https://issues.apache.org/jira/browse/SOLR-6205 > Project: Solr > Issue Type: New Feature > Components: SolrCloud >Affects Versions: 4.8.1 >Reporter: Arcadius Ahouansou >Assignee: Noble Paul > > Use case: > Let's say we have SolrCloud deployed across 2 Datacenters, racks or zones A > and B > There is a need to have a SolrCloud deployment that will make it possible to > have a working system even if one of the Datacenter/rack/zone A or B is lost. > - This has been discussed on the mailing list at > http://lucene.472066.n3.nabble.com/SolrCloud-multiple-data-center-support-td4115097.html > and there are many workarounds that require adding more moving parts to the > system. > - On the above thread, Daniel Collins mentioned > https://issues.apache.org/jira/browse/ZOOKEEPER-107 > which could help solve this issue. > - Note that this is a very important feature that is overlooked most of the > time. > - Note that this feature is available in ElasticSearch. 
> See > http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-cluster.html#allocation-awareness > and > http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-cluster.html#forced-awareness
[jira] [Commented] (SOLR-6205) Make SolrCloud Data-center, rack or zone aware
[ https://issues.apache.org/jira/browse/SOLR-6205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16191698#comment-16191698 ] jefferyyuan commented on SOLR-6205: --- Making Solr rack-aware can help prevent data loss and improve query performance. Elasticsearch already supports it: https://www.elastic.co/guide/en/elasticsearch/reference/5.4/allocation-awareness.html > Make SolrCloud Data-center, rack or zone aware > -- > > Key: SOLR-6205 > URL: https://issues.apache.org/jira/browse/SOLR-6205 > Project: Solr > Issue Type: New Feature > Components: SolrCloud >Affects Versions: 4.8.1 >Reporter: Arcadius Ahouansou >Assignee: Noble Paul > > Use case: > Let's say we have SolrCloud deployed across 2 Datacenters, racks or zones A > and B > There is a need to have a SolrCloud deployment that will make it possible to > have a working system even if one of the Datacenter/rack/zone A or B is lost. > - This has been discussed on the mailing list at > http://lucene.472066.n3.nabble.com/SolrCloud-multiple-data-center-support-td4115097.html > and there are many workarounds that require adding more moving parts to the > system. > - On the above thread, Daniel Collins mentioned > https://issues.apache.org/jira/browse/ZOOKEEPER-107 > which could help solve this issue. > - Note that this is a very important feature that is overlooked most of the > time. > - Note that this feature is available in ElasticSearch. > See > http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-cluster.html#allocation-awareness > and > http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-cluster.html#forced-awareness
[jira] [Created] (SOLR-10950) Support context filtering for FuzzyLookupFactory
jefferyyuan created SOLR-10950: -- Summary: Support context filtering for FuzzyLookupFactory Key: SOLR-10950 URL: https://issues.apache.org/jira/browse/SOLR-10950 Project: Solr Issue Type: Bug Security Level: Public (Default Security Level. Issues are Public) Components: Suggester Environment: FuzzyLookupFactory is great, as it can still find matches even if users misspell. Context filtering is also great, as we can show only the suggestions matching the user's language, doc type, etc. But it's a pity that FuzzyLookupFactory and context filtering (apparently) don't work together. From http://lucene.472066.n3.nabble.com/Is-it-possible-to-support-context-filtering-for-FuzzyLookupFactory-td4342051.html Reporter: jefferyyuan Priority: Critical Fix For: 6.6.1
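For comparison, context filtering does work with the infix suggesters. A sketch of a solrconfig.xml fragment using AnalyzingInfixLookupFactory with a contextField (the field names `title` and `language` and the field type `text_general` are assumptions for illustration):

```xml
<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">infixSuggester</str>
    <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">title</str>
    <!-- contextField enables filtering suggestions, e.g. by language -->
    <str name="contextField">language</str>
    <str name="suggestAnalyzerFieldType">text_general</str>
  </lst>
</searchComponent>
```

Queries then pass the context filter via suggest.cfq, e.g. /suggest?suggest=true&suggest.dictionary=infixSuggester&suggest.q=brin&suggest.cfq=en. With lookupImpl set to FuzzyLookupFactory instead, the same suggest.cfq filter is (apparently) not applied, which is the gap this issue asks to close.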
[jira] [Updated] (SOLR-10928) Support elevate.q in QueryElevationComponent
[ https://issues.apache.org/jira/browse/SOLR-10928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jefferyyuan updated SOLR-10928: --- Summary: Support elevate.q in QueryElevationComponent (was: Support elevate.q () in QueryElevationComponent) > Support elevate.q in QueryElevationComponent > > > Key: SOLR-10928 > URL: https://issues.apache.org/jira/browse/SOLR-10928 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: SearchComponents - other >Reporter: jefferyyuan >Priority: Critical > Fix For: 6.6.1 > > > QueryElevationComponent uses the query in the q parameter to match entries in elevate.xml. > "query text" from elevate.xml > : has to match the query (q=...). So in this case, elevation works only for > : http://localhost:8080/solr/elevate?q=brain, but not for > : http://localhost:8080/solr/elevate?q=indexingabstract:brain type of > queries. > But sometimes the query is more complex; we may use a nested query or > complexphrase. > It would also be fairly easy to make QEC support an "elevate.q" param, > similar to how there is a "spellcheck.q" param and an "hl.q" param, to let the > client specify an alternate, simplified string for the feature to use. > Content copied from: > http://lucene.472066.n3.nabble.com/Problems-with-elevation-component-configuration-td3993204.html
[jira] [Created] (SOLR-10928) Support elevate.q () in QueryElevationComponent
jefferyyuan created SOLR-10928: -- Summary: Support elevate.q () in QueryElevationComponent Key: SOLR-10928 URL: https://issues.apache.org/jira/browse/SOLR-10928 Project: Solr Issue Type: Improvement Security Level: Public (Default Security Level. Issues are Public) Components: SearchComponents - other Reporter: jefferyyuan Priority: Critical Fix For: 6.6.1 QueryElevationComponent uses the query in the q parameter to match entries in elevate.xml. "query text" from elevate.xml : has to match the query (q=...). So in this case, elevation works only for : http://localhost:8080/solr/elevate?q=brain, but not for : http://localhost:8080/solr/elevate?q=indexingabstract:brain type of queries. But sometimes the query is more complex; we may use a nested query or complexphrase. It would also be fairly easy to make QEC support an "elevate.q" param, similar to how there is a "spellcheck.q" param and an "hl.q" param, to let the client specify an alternate, simplified string for the feature to use. Content copied from: http://lucene.472066.n3.nabble.com/Problems-with-elevation-component-configuration-td3993204.html
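To make the proposal concrete: elevate.xml matches on the literal query text, so a request with a complex q would instead pass a simplified string via the proposed parameter. Note that elevate.q does not exist in Solr at the time of this issue; the request below is a hypothetical sketch of the proposal (doc IDs are made up):

```xml
<!-- elevate.xml: "query text" must match the incoming query string -->
<elevate>
  <query text="brain">
    <doc id="doc-brain-overview"/>
  </query>
</elevate>
```

With the proposed parameter, a complex query could still trigger the "brain" elevation rule, e.g.: /solr/thecollection/elevate?q=indexingabstract:brain&elevate.q=brain - analogous to how spellcheck.q and hl.q let the client hand a simplified string to those components.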
[jira] [Updated] (SOLR-10927) Support position to context.xml in Query Elevation Component
[ https://issues.apache.org/jira/browse/SOLR-10927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jefferyyuan updated SOLR-10927: --- Summary: Support position to context.xml in Query Elevation Component (was: Suuport position to context.xml in Query Elevation Component) > Support position to context.xml in Query Elevation Component > > > Key: SOLR-10927 > URL: https://issues.apache.org/jira/browse/SOLR-10927 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: SearchComponents - other >Reporter: jefferyyuan > Fix For: 6.7, 6.6.1 > > > Query Elevation Component is useful but kind of limited. > Usually we want to boost one document for one query, but not necessarily at > the first position. > For example, a user searches for walking dead - we want to boost our show "Deadly > Vampire", but we don't want to show it first, as that may annoy > the user. > We want to show "Deadly Vampire" at the 2nd, or maybe the 3rd or 4th, position. > It seems the original draft implementation of the Editorial Query Boosting Component > [https://issues.apache.org/jira/browse/SOLR-418] actually supports this - the > priority property.
[jira] [Created] (SOLR-10927) Suuport position to context.xml in Query Elevation Component
jefferyyuan created SOLR-10927: -- Summary: Suuport position to context.xml in Query Elevation Component Key: SOLR-10927 URL: https://issues.apache.org/jira/browse/SOLR-10927 Project: Solr Issue Type: Improvement Security Level: Public (Default Security Level. Issues are Public) Components: SearchComponents - other Reporter: jefferyyuan Fix For: 6.7, 6.6.1 Query Elevation Component is useful but kind of limited. Usually we want to boost one document for one query, but not necessarily at the first position. For example, a user searches for walking dead - we want to boost our show "Deadly Vampire", but we don't want to show it first, as that may annoy the user. We want to show "Deadly Vampire" at the 2nd, or maybe the 3rd or 4th, position. It seems the original draft implementation of the Editorial Query Boosting Component [https://issues.apache.org/jira/browse/SOLR-418] actually supports this - the priority property.
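A sketch of what the proposed configuration might look like. The position attribute below is hypothetical - stock elevate.xml does not support it; the SOLR-418 draft used a similar "priority" property (doc ID made up):

```xml
<!-- Hypothetical: pin an elevated doc at a specific rank, not just first -->
<elevate>
  <query text="walking dead">
    <doc id="deadly-vampire" position="2"/>
  </query>
</elevate>
```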
[jira] [Comment Edited] (SOLR-6092) Provide a REST managed QueryElevationComponent
[ https://issues.apache.org/jira/browse/SOLR-6092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15810679#comment-15810679 ] jefferyyuan edited comment on SOLR-6092 at 6/20/17 7:14 PM: Vote for this. We can manage stop words and synonyms via REST, so why not QueryElevation, which is much more useful? Also, the content in elevate.xml - what we want to upsell for different queries - changes very frequently. Thanks was (Author: yuanyun.cn): Vote for this. We can manage stop words, synonyms, why not QueryElevation which are much more useful. Thanks > Provide a REST managed QueryElevationComponent > -- > > Key: SOLR-6092 > URL: https://issues.apache.org/jira/browse/SOLR-6092 > Project: Solr > Issue Type: New Feature >Reporter: Timothy Potter >Priority: Minor > > Provide a managed query elevation component to allow CRUD operations from a > REST API.
[jira] [Updated] (SOLR-10885) NullPointerException when run collapse filter
[ https://issues.apache.org/jira/browse/SOLR-10885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jefferyyuan updated SOLR-10885: --- Description: Solr collapse is a great function for collapsing related data so we only show one document in the search result. Just found one issue with it - it throws NullPointerException in some cases. To reproduce it, first ingest some data - AND commit multiple times. 1. When there is no data that matches the query: http://localhost:8983/solr/thecollection/select?defType=edismax&q=non-existType:*&fq={!collapse field=seriesId nullPolicy=expand}&fq={!collapse field=programId nullPolicy=expand} - The problem only happens if I use both collapse fqs; if I just use one of them, it is fine. *2. When the data that matches the query doesn't have the collapse fields - This is kind of a big problem, as we may store different kinds of docs in one collection, and one query may match different kinds of docs. If some docs (docType1) have the same value for field1, we want to collapse them; if other docs (docType2) have the same value for field2, do the same.* - channel data doesn't have seriesId or programId http://localhost:8983/solr/thecollection/select?defType=edismax&q=docType:channel&fq={!collapse field=seriesId nullPolicy=expand}&fq={!collapse field=programId nullPolicy=expand} - The problem only happens if I use both collapse fqs; if I just use one of them, it is fine. 
Exception from log: Caused by: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://localhost:8983/solr/searchItems_shard1_replica3: java.lang.NullPointerException at org.apache.solr.search.CollapsingQParserPlugin$OrdScoreCollector.finish(CollapsingQParserPlugin.java:617) at org.apache.solr.search.CollapsingQParserPlugin$OrdScoreCollector.finish(CollapsingQParserPlugin.java:667) at org.apache.solr.search.SolrIndexSearcher.buildAndRunCollectorChain(SolrIndexSearcher.java:256) at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1823) at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1640) at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:611) at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:533) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:295) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:166) at org.apache.solr.core.SolrCore.execute(SolrCore.java:2299) at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:658) at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:464) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:296) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512) at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213) at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134) The failing line - line 617 of CollapsingQParserPlugin.java in Solr 6.4.1: int nextDocBase = currentContext + 1 < this.contexts.length ? this.contexts[(currentContext + 1)].docBase : this.maxDoc; Seems related to https://issues.apache.org/jira/browse/SOLR-8807 - but SOLR-8807 only fixes an issue related to the spell checker. I may test this with the latest Solr 6.6.0 when I have time. Updated: Does Solr support multiple collapse fields? - The query occasionally works (maybe 1 in 10 times), but other times it throws NullPointerException: http://localhost:18983/solr/thecollection/select?q=programId:* AND id:*&defType=edismax&fq={!collapse+field=id }&fq={!collapse+field=programId } was: Solr collapse is a great function to collapse data that is rela
[jira] [Updated] (SOLR-10885) NullPointerException when run collapse filter
[ https://issues.apache.org/jira/browse/SOLR-10885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jefferyyuan updated SOLR-10885: --- Description: Solr collapse is a great function to collapse data that is related so we only show one in search result. Just found one issue related with it - It throw NullPointerException in some cases. To reproduce it, first ingest some data - AND commit multiple times. 1. When there is no data that matches the query: http://localhost:8983/solr/thecollection/select?defType=edismax&q=non-existType:*&fq={!collapse field=seriesId nullPolicy=expand}&fq={!collapse field=programId nullPolicy=expand} - But the problem only happens if I use both collapse fqs, if I just use one of them, it would be fine. *2. When the data that matches the query doesn't have the collapse fields - This is kind of a big problem as we may store different kinds of docs in one collection, one query may match different kinds of docs. If some docs (docType1) have same value for field1, we want to collapse them, if other dosc(docType2) have some value for field2, do same things.* - channel data doesn't have seriesId or programId http://localhost:8983/solr/thecollection/select?defType=edismax&q=docType:channel&fq={!collapse field=seriesId nullPolicy=expand}&fq={!collapse field=programId nullPolicy=expand} - But the problem only happens if I use both collapse fqs, if I just use one of them, it would be fine. 
Exception from log: Caused by: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://localhost:8983/solr/searchItems_shard1_replica3: java.lang.NullPointerException at org.apache.solr.search.CollapsingQParserPlugin$OrdScoreCollector.finish(CollapsingQParserPlugin.java:617) at org.apache.solr.search.CollapsingQParserPlugin$OrdScoreCollector.finish(CollapsingQParserPlugin.java:667) at org.apache.solr.search.SolrIndexSearcher.buildAndRunCollectorChain(SolrIndexSearcher.java:256) at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1823) at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1640) at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:611) at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:533) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:295) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:166) at org.apache.solr.core.SolrCore.execute(SolrCore.java:2299) at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:658) at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:464) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:296) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512) at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213) at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134) int nextDocBase = currentContext + 1 < this.contexts.length ? this.contexts[(currentContext + 1)].docBase : this.maxDoc; - 617 from solr 6.4.1 CollapsingQParserPlugin.java Seems related with https://issues.apache.org/jira/browse/SOLR-8807 - But SOLR-8807 only fixes issue related with spell checker. I may test this with latest solr 6.6.0 when I have time. was: Solr collapse is a great function to collapse data that is related so we only show one in search result. Just found one issue related with it - It throw NullPointerException in some cases. To reproduce it, first ingest some data - AND commit multiple times. 1. When there is no data that matches the query: http://localhost:8983/solr/thecollection/select?defT
[jira] [Created] (SOLR-10885) NullPointerException when run collapse filter
jefferyyuan created SOLR-10885: -- Summary: NullPointerException when run collapse filter Key: SOLR-10885 URL: https://issues.apache.org/jira/browse/SOLR-10885 Project: Solr Issue Type: Bug Security Level: Public (Default Security Level. Issues are Public) Components: search Affects Versions: 6.4.1 Reporter: jefferyyuan Priority: Critical Solr collapse is a great function to collapse data that is related so we only show one in search result. Just found one issue related with it - It throw NullPointerException in some cases. To reproduce it, first ingest some data - AND commit multiple times. 1. When there is no data that matches the query: http://localhost:8983/solr/thecollection/select?defType=edismax&q=non-existType:*&fq={!collapse field=seriesId nullPolicy=expand}&fq={!collapse field=programId nullPolicy=expand} - But the problem only happens if I use both collapse fqs, if I just use one of them, it would be fine. 2. When the data that matches the query doesn't have the collapse fields - channel data doesn't have seriesId or programId http://localhost:8983/solr/thecollection/select?defType=edismax&q=docType:channel&fq={!collapse field=seriesId nullPolicy=expand}&fq={!collapse field=programId nullPolicy=expand} - But the problem only happens if I use both collapse fqs, if I just use one of them, it would be fine. 
Exception from log: Caused by: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://localhost:8983/solr/searchItems_shard1_replica3: java.lang.NullPointerException at org.apache.solr.search.CollapsingQParserPlugin$OrdScoreCollector.finish(CollapsingQParserPlugin.java:617) at org.apache.solr.search.CollapsingQParserPlugin$OrdScoreCollector.finish(CollapsingQParserPlugin.java:667) at org.apache.solr.search.SolrIndexSearcher.buildAndRunCollectorChain(SolrIndexSearcher.java:256) at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1823) at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1640) at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:611) at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:533) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:295) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:166) at org.apache.solr.core.SolrCore.execute(SolrCore.java:2299) at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:658) at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:464) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:296) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512) at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213) at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134) The failing line - line 617 of CollapsingQParserPlugin.java in Solr 6.4.1: int nextDocBase = currentContext + 1 < this.contexts.length ? this.contexts[(currentContext + 1)].docBase : this.maxDoc; Seems related to https://issues.apache.org/jira/browse/SOLR-8807 - but SOLR-8807 only fixes an issue related to the spell checker. I may test this with the latest Solr 6.6.0 when I have time.
[jira] [Commented] (SOLR-6096) Support Update and Delete on nested documents
[ https://issues.apache.org/jira/browse/SOLR-6096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16009519#comment-16009519 ] jefferyyuan commented on SOLR-6096: --- [~mkhludnev] How can we specify childfree=true when using SolrJ? - There seems to be no childfree in Solr's code at all: https://github.com/apache/lucene-solr/search?utf8=%E2%9C%93&q=childfree&type= And how can we define the special-purpose /blockupdate/ handler with explicit block semantics for all the cases above? Thanks a lot. > Support Update and Delete on nested documents > - > > Key: SOLR-6096 > URL: https://issues.apache.org/jira/browse/SOLR-6096 > Project: Solr > Issue Type: Improvement >Affects Versions: 4.7.2 >Reporter: Thomas Scheffler > Labels: blockjoin, nested > > When using nested or child document. Update and delete operation on the root > document should also affect the nested documents, as no child can exist > without its parent :-) > Example > {code:xml|title=First Import} > > 1 > Article with author > > Smith, John > author > > > {code} > If I change my mind and the author was not named *John* but *_Jane_*: > {code:xml|title=Changed name of author of '1'} > > 1 > Article with author > > Smith, Jane > author > > > {code} > I would expect that John is not in the index anymore. Currently he is. There > might also be the case that any subdocument is removed by an update: > {code:xml|title=Remove author} > > 1 > Article without author > > {code} > This should affect a delete on all nested documents, too. The same way all > nested documents should be deleted if I delete the root document: > {code:xml|title=Deletion of '1'} > > 1 > > > {code} > This is currently possible to do all this stuff on client side by issuing > additional request to delete document before every update. It would be more > efficient if this could be handled on SOLR side. One would benefit on atomic > update. The biggest plus shows when using "delete-by-query". 
> {code:xml|title=Deletion of '1' by query} > > title:* > > > {code} > In that case one would not have to first query all documents and issue > deletes by those id and every document that are nested. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
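The client-side workaround the description mentions - delete the block, then re-add the parent with its current children - can be sketched as two plain XML update requests. Field names here are illustrative assumptions, not Solr-mandated names; because block joins store a parent and its children contiguously, re-adding the whole block is the safe way to change any part of it:

{code:xml|title=Request 1: delete the old block}
<delete>
  <id>1</id>
</delete>
{code}
{code:xml|title=Request 2: re-add the parent with its current children}
<add>
  <doc>
    <field name="id">1</field>
    <field name="title">Article with author</field>
    <doc>
      <field name="name">Smith, Jane</field>
      <field name="role">author</field>
    </doc>
  </doc>
</add>
{code}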
[jira] [Commented] (SOLR-6246) Core fails to reload when AnalyzingInfixSuggester is used as a Suggester
[ https://issues.apache.org/jira/browse/SOLR-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15856300#comment-15856300 ] jefferyyuan commented on SOLR-6246:
---
This is great news. Thanks so much [~steve_rowe] for clarifying my questions. - I should have searched more before asking here; my fault. I read the release notes for Solr 6.4.1 but not those for Lucene 6.4.1, which I should have. I should also have read the issues this Jira depends on.

> Core fails to reload when AnalyzingInfixSuggester is used as a Suggester
> ------------------------------------------------------------------------
>
> Key: SOLR-6246
> URL: https://issues.apache.org/jira/browse/SOLR-6246
> Project: Solr
> Issue Type: Sub-task
> Components: SearchComponents - other
> Affects Versions: 4.8, 4.8.1, 4.9, 5.0, 5.1, 5.2, 5.3, 5.4
> Reporter: Varun Thacker
> Assignee: Steve Rowe
> Fix For: 6.5, master (7.0)
>
> Attachments: SOLR-6246.patch, SOLR-6246.patch, SOLR-6246-test.patch, SOLR-6246-test.patch, SOLR-6246-test.patch, SOLR-6246-test.patch
>
> LUCENE-5477 added near-real-time suggest building to AnalyzingInfixSuggester. One of the changes that went in was that a writer is now persisted to support real-time updates via the add() and update() methods.
> When we call Solr's reload command, a new instance of AnalyzingInfixSuggester is created. When trying to create a new writer on the same Directory, a lock cannot be obtained and Solr fails to reload the core.
> Also, when AnalyzingInfixLookupFactory throws a RuntimeException we should pass along the original message.
> I am not sure what the approach to fix it should be. Should we have a reloadHook where we close the writer?

-- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Comment Edited] (SOLR-6246) Core fails to reload when AnalyzingInfixSuggester is used as a Suggester
[ https://issues.apache.org/jira/browse/SOLR-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15855637#comment-15855637 ] jefferyyuan edited comment on SOLR-6246 at 2/7/17 9:34 AM:
---
Thanks [~steve_rowe]. I am wondering whether there is any plan to also fix this issue in a 6.4.x release? This fix is very valuable; without it we can't really use AnalyzingInfixSuggester, as we always reload the collections to update the schema, config, etc. And it takes time to release 6.5 - usually several (2 or 3) months.

was (Author: yuanyun.cn):
Thanks [~steve_rowe]. I am wondering whether there is any plan to also fix this issue in a 6.4.x release? This fix is very valuable; without it we can't really use AnalyzingInfixSuggester, as we always reload the collections to update the schema, config, etc.

> Core fails to reload when AnalyzingInfixSuggester is used as a Suggester
> ------------------------------------------------------------------------
>
> Key: SOLR-6246
> URL: https://issues.apache.org/jira/browse/SOLR-6246
> Project: Solr
> Issue Type: Sub-task
> Components: SearchComponents - other
> Affects Versions: 4.8, 4.8.1, 4.9, 5.0, 5.1, 5.2, 5.3, 5.4
> Reporter: Varun Thacker
> Assignee: Steve Rowe
> Fix For: 6.5, master (7.0)
>
> Attachments: SOLR-6246.patch, SOLR-6246.patch, SOLR-6246-test.patch, SOLR-6246-test.patch, SOLR-6246-test.patch, SOLR-6246-test.patch
>
> LUCENE-5477 added near-real-time suggest building to AnalyzingInfixSuggester. One of the changes that went in was that a writer is now persisted to support real-time updates via the add() and update() methods.
> When we call Solr's reload command, a new instance of AnalyzingInfixSuggester is created. When trying to create a new writer on the same Directory, a lock cannot be obtained and Solr fails to reload the core.
> Also, when AnalyzingInfixLookupFactory throws a RuntimeException we should pass along the original message.
> I am not sure what the approach to fix it should be. Should we have a reloadHook where we close the writer?

-- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (SOLR-6246) Core fails to reload when AnalyzingInfixSuggester is used as a Suggester
[ https://issues.apache.org/jira/browse/SOLR-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15855637#comment-15855637 ] jefferyyuan commented on SOLR-6246:
---
Thanks [~steve_rowe]. I am wondering whether there is any plan to also fix this issue in a 6.4.x release? This fix is very valuable; without it we can't really use AnalyzingInfixSuggester, as we always reload the collections to update the schema, config, etc.

> Core fails to reload when AnalyzingInfixSuggester is used as a Suggester
> ------------------------------------------------------------------------
>
> Key: SOLR-6246
> URL: https://issues.apache.org/jira/browse/SOLR-6246
> Project: Solr
> Issue Type: Sub-task
> Components: SearchComponents - other
> Affects Versions: 4.8, 4.8.1, 4.9, 5.0, 5.1, 5.2, 5.3, 5.4
> Reporter: Varun Thacker
> Assignee: Steve Rowe
> Fix For: 6.5, master (7.0)
>
> Attachments: SOLR-6246.patch, SOLR-6246.patch, SOLR-6246-test.patch, SOLR-6246-test.patch, SOLR-6246-test.patch, SOLR-6246-test.patch
>
> LUCENE-5477 added near-real-time suggest building to AnalyzingInfixSuggester. One of the changes that went in was that a writer is now persisted to support real-time updates via the add() and update() methods.
> When we call Solr's reload command, a new instance of AnalyzingInfixSuggester is created. When trying to create a new writer on the same Directory, a lock cannot be obtained and Solr fails to reload the core.
> Also, when AnalyzingInfixLookupFactory throws a RuntimeException we should pass along the original message.
> I am not sure what the approach to fix it should be. Should we have a reloadHook where we close the writer?

-- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (SOLR-6246) Core fails to reload when AnalyzingInfixSuggester is used as a Suggester
[ https://issues.apache.org/jira/browse/SOLR-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15851104#comment-15851104 ] jefferyyuan commented on SOLR-6246:
---
First I reproduced the issue in the current 6.4, then verified that the 6.4.1 release candidate fixes it. Thanks for solving this issue; looking forward to the 6.4.1 release. [~steve_rowe]

> Core fails to reload when AnalyzingInfixSuggester is used as a Suggester
> ------------------------------------------------------------------------
>
> Key: SOLR-6246
> URL: https://issues.apache.org/jira/browse/SOLR-6246
> Project: Solr
> Issue Type: Sub-task
> Components: SearchComponents - other
> Affects Versions: 4.8, 4.8.1, 4.9, 5.0, 5.1, 5.2, 5.3, 5.4
> Reporter: Varun Thacker
>
> Attachments: SOLR-6246.patch, SOLR-6246-test.patch, SOLR-6246-test.patch, SOLR-6246-test.patch, SOLR-6246-test.patch
>
> LUCENE-5477 added near-real-time suggest building to AnalyzingInfixSuggester. One of the changes that went in was that a writer is now persisted to support real-time updates via the add() and update() methods.
> When we call Solr's reload command, a new instance of AnalyzingInfixSuggester is created. When trying to create a new writer on the same Directory, a lock cannot be obtained and Solr fails to reload the core.
> Also, when AnalyzingInfixLookupFactory throws a RuntimeException we should pass along the original message.
> I am not sure what the approach to fix it should be. Should we have a reloadHook where we close the writer?

-- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Comment Edited] (SOLR-6246) Core fails to reload when AnalyzingInfixSuggester is used as a Suggester
[ https://issues.apache.org/jira/browse/SOLR-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15822691#comment-15822691 ] jefferyyuan edited comment on SOLR-6246 at 1/14/17 5:15 AM:
---
I tested with the latest build - solr-6.4.0-222; reloading collections/cores with AnalyzingInfixSuggester still failed with LockObtainFailedException. It failed with the same error even after restarting Solr.
It can be easily reproduced: add a suggest component, then ***build the suggester***: suggest?suggest.build=true, then reload the collection or cores. - It seems the key to reproducing the issue is that we need to build the suggester.
infixSuggester BlendedInfixLookupFactory DocumentDictionaryFactory position_linear suggester suggesterContextField 4 textSuggest infix_suggestions true false false true infixSuggester true 10 true suggest

was (Author: yuanyun.cn):
I tested with the latest build - solr-6.4.0-222; reloading collections/cores with AnalyzingInfixSuggester still failed with LockObtainFailedException. It failed with the same error even after restarting Solr.
It can be easily reproduced: add a suggest component, then build the suggester: suggest?suggest.build=true, then reload the collection or cores.
infixSuggester BlendedInfixLookupFactory DocumentDictionaryFactory position_linear suggester suggesterContextField 4 textSuggest infix_suggestions true false false true infixSuggester true 10 true suggest

> Core fails to reload when AnalyzingInfixSuggester is used as a Suggester
> ------------------------------------------------------------------------
>
> Key: SOLR-6246
> URL: https://issues.apache.org/jira/browse/SOLR-6246
> Project: Solr
> Issue Type: Sub-task
> Components: SearchComponents - other
> Affects Versions: 4.8, 4.8.1, 4.9, 5.0, 5.1, 5.2, 5.3, 5.4
> Reporter: Varun Thacker
>
> Attachments: SOLR-6246.patch, SOLR-6246-test.patch, SOLR-6246-test.patch, SOLR-6246-test.patch
>
> LUCENE-5477 added near-real-time suggest building to AnalyzingInfixSuggester. One of the changes that went in was that a writer is now persisted to support real-time updates via the add() and update() methods.
> When we call Solr's reload command, a new instance of AnalyzingInfixSuggester is created. When trying to create a new writer on the same Directory, a lock cannot be obtained and Solr fails to reload the core.
> Also, when AnalyzingInfixLookupFactory throws a RuntimeException we should pass along the original message.
> I am not sure what the approach to fix it should be. Should we have a reloadHook where we close the writer?

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (SOLR-6246) Core fails to reload when AnalyzingInfixSuggester is used as a Suggester
[ https://issues.apache.org/jira/browse/SOLR-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15822691#comment-15822691 ] jefferyyuan commented on SOLR-6246:
---
I tested with the latest build - solr-6.4.0-222; reloading collections/cores with AnalyzingInfixSuggester still failed with LockObtainFailedException. It failed with the same error even after restarting Solr.
It can be easily reproduced: add a suggest component, then build the suggester: suggest?suggest.build=true, then reload the collection or cores.
infixSuggester BlendedInfixLookupFactory DocumentDictionaryFactory position_linear suggester suggesterContextField 4 textSuggest infix_suggestions true false false true infixSuggester true 10 true suggest

> Core fails to reload when AnalyzingInfixSuggester is used as a Suggester
> ------------------------------------------------------------------------
>
> Key: SOLR-6246
> URL: https://issues.apache.org/jira/browse/SOLR-6246
> Project: Solr
> Issue Type: Sub-task
> Components: SearchComponents - other
> Affects Versions: 4.8, 4.8.1, 4.9, 5.0, 5.1, 5.2, 5.3, 5.4
> Reporter: Varun Thacker
>
> Attachments: SOLR-6246.patch, SOLR-6246-test.patch, SOLR-6246-test.patch, SOLR-6246-test.patch
>
> LUCENE-5477 added near-real-time suggest building to AnalyzingInfixSuggester. One of the changes that went in was that a writer is now persisted to support real-time updates via the add() and update() methods.
> When we call Solr's reload command, a new instance of AnalyzingInfixSuggester is created. When trying to create a new writer on the same Directory, a lock cannot be obtained and Solr fails to reload the core.
> Also, when AnalyzingInfixLookupFactory throws a RuntimeException we should pass along the original message.
> I am not sure what the approach to fix it should be. Should we have a reloadHook where we close the writer?

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
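The suggester configuration pasted in the reproduction steps above lost its XML markup in the mail archive; only the bare values survived. As a rough sketch only - the element placement below is an assumption based on typical Solr suggester setups, not the reporter's exact solrconfig.xml - the component and handler likely resembled:

{code:xml}
<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">infixSuggester</str>
    <str name="lookupImpl">BlendedInfixLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="blenderType">position_linear</str>
    <str name="indexPath">infix_suggestions</str>
    <str name="minPrefixChars">4</str>
    <str name="contextField">suggesterContextField</str>
    <str name="field">textSuggest</str>
    <str name="buildOnStartup">false</str>
    <str name="buildOnCommit">false</str>
  </lst>
</searchComponent>

<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="suggest">true</str>
    <str name="suggest.dictionary">infixSuggester</str>
    <str name="suggest.count">10</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>
{code}

Any configuration along these lines that builds an on-disk infix suggester index (indexPath) would hold the write.lock that the reload then trips over.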
[jira] [Commented] (SOLR-6246) Core fails to reload when AnalyzingInfixSuggester is used as a Suggester
[ https://issues.apache.org/jira/browse/SOLR-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15822377#comment-15822377 ] jefferyyuan commented on SOLR-6246:
---
From https://builds.apache.org/job/Solr-Artifacts-6.x/lastSuccessfulBuild/artifact/solr/package/ - it was build 195 when I downloaded it at that time. I will try the newest build and check whether this works.

> Core fails to reload when AnalyzingInfixSuggester is used as a Suggester
> ------------------------------------------------------------------------
>
> Key: SOLR-6246
> URL: https://issues.apache.org/jira/browse/SOLR-6246
> Project: Solr
> Issue Type: Sub-task
> Components: SearchComponents - other
> Affects Versions: 4.8, 4.8.1, 4.9, 5.0, 5.1, 5.2, 5.3, 5.4
> Reporter: Varun Thacker
>
> Attachments: SOLR-6246.patch, SOLR-6246-test.patch, SOLR-6246-test.patch, SOLR-6246-test.patch
>
> LUCENE-5477 added near-real-time suggest building to AnalyzingInfixSuggester. One of the changes that went in was that a writer is now persisted to support real-time updates via the add() and update() methods.
> When we call Solr's reload command, a new instance of AnalyzingInfixSuggester is created. When trying to create a new writer on the same Directory, a lock cannot be obtained and Solr fails to reload the core.
> Also, when AnalyzingInfixLookupFactory throws a RuntimeException we should pass along the original message.
> I am not sure what the approach to fix it should be. Should we have a reloadHook where we close the writer?

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (SOLR-6246) Core fails to reload when AnalyzingInfixSuggester is used as a Suggester
[ https://issues.apache.org/jira/browse/SOLR-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15822340#comment-15822340 ] jefferyyuan commented on SOLR-6246:
---
I am running on Solr 6.4 - solr-6.4.0-195, but the problem still exists. Even restarting Solr doesn't help - after restarting Solr, reloading the collection or the current node still fails with LockObtainFailedException. I even tried to manually delete the write.lock and then call reload on the collection/cores; it failed again with the same error.

INFO - 2017-01-12 16:55:42.392; [c:myCollection s:shard2 r:core_node3 x:searchItems_shard2_replica1] org.apache.solr.servlet.HttpSolrCall; [admin] webapp=null path=/admin/cores params={core=searchItems_shard2_replica1&qt=/admin/cores&action=RELOAD&wt=javabin&version=2} status=500 QTime=592
ERROR - 2017-01-12 16:55:42.393; [c:myCollection s:shard2 r:core_node3 x:searchItems_shard2_replica1] org.apache.solr.common.SolrException; null:org.apache.solr.common.SolrException: Error handling 'reload' action
	at org.apache.solr.handler.admin.CoreAdminOperation.lambda$static$2(CoreAdminOperation.java:114)
	at org.apache.solr.handler.admin.CoreAdminOperation$$Lambda$23/265321659.execute(Unknown Source)
	at org.apache.solr.handler.admin.CoreAdminOperation.execute(CoreAdminOperation.java:377)
	at org.apache.solr.handler.admin.CoreAdminHandler$CallInfo.call(CoreAdminHandler.java:365)
	at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:156)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:152)
	at org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:664)
	at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:445)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:303)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:254)
	at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691)
	at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
	at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
	at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
	at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
	at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
	at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
	at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
	at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
	at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
	at org.eclipse.jetty.server.Server.handle(Server.java:534)
	at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
	at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
	at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
	at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
	at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
	at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
	at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
	at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
	at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
	at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.solr.common.SolrException: Unable to reload core [searchItems_shard2_replica1]
	at org.apache.solr.core.CoreContainer.reload(CoreContainer.java:950)
	at org.apache.solr.handler.admin.CoreAdminOperation.lambda$static$2(CoreAdminOperation.java:112)
	... 34 more
Caused by: org.apache.solr.common.SolrException: org.apache.lucene.store.LockObtainFailedException: Lock held by this virtual machine: /Applications/solr-6.4.0/example/cloud/node2/solr/searchItems_shard2_replica1/data/infix_suggestions/write.lock
	at org.apache.solr.core.SolrCore.<init>(SolrCore.java:899)
[jira] [Closed] (LUCENE-7625) Support Multiple (AND) Context Filter Query in Suggestor
[ https://issues.apache.org/jira/browse/LUCENE-7625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jefferyyuan closed LUCENE-7625.
---
Resolution: Not A Problem

Already supported in Lucene/Solr: http://stackoverflow.com/questions/36079395/how-to-configure-multiple-contextfields-in-single-solr-suggester

> Support Multiple (AND) Context Filter Query in Suggestor
> --------------------------------------------------------
>
> Key: LUCENE-7625
> URL: https://issues.apache.org/jira/browse/LUCENE-7625
> Project: Lucene - Core
> Issue Type: Improvement
> Components: modules/suggest
> Reporter: jefferyyuan
> Labels: lucene, solr, suggester
>
> Just as with normal queries, we usually want to use multiple filter queries when running auto-completion.
> It would be great if the suggester could return (the title of) docs that are meaningful to the current user when we need multiple filters.
> Thanks
> Jeffery Yuan

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (LUCENE-7625) Support Multiple (AND) Context Filter Query in Suggestor
[ https://issues.apache.org/jira/browse/LUCENE-7625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15819194#comment-15819194 ] jefferyyuan commented on LUCENE-7625:
---
My mistake - Lucene/Solr actually already supports this: http://stackoverflow.com/questions/36079395/how-to-configure-multiple-contextfields-in-single-solr-suggester

> Support Multiple (AND) Context Filter Query in Suggestor
> --------------------------------------------------------
>
> Key: LUCENE-7625
> URL: https://issues.apache.org/jira/browse/LUCENE-7625
> Project: Lucene - Core
> Issue Type: Improvement
> Components: modules/suggest
> Reporter: jefferyyuan
> Labels: lucene, solr, suggester
>
> Just as with normal queries, we usually want to use multiple filter queries when running auto-completion.
> It would be great if the suggester could return (the title of) docs that are meaningful to the current user when we need multiple filters.
> Thanks
> Jeffery Yuan

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
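For reference, the existing support referred to here is the suggester's context filtering query (suggest.cfq / suggest.contextFilterQuery), which accepts a boolean query over the configured context field. A request ANDing two contexts might look like the following sketch (collection name, suggester name, and context values are illustrative assumptions):

{code}
/solr/mycollection/suggest?suggest=true&suggest.dictionary=mySuggester&suggest.q=jav&suggest.cfq=ctx1 AND ctx2
{code}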
[jira] [Created] (LUCENE-7625) Support Multiple (AND) Context Filter Query in Suggestor
jefferyyuan created LUCENE-7625:
---
Summary: Support Multiple (AND) Context Filter Query in Suggestor
Key: LUCENE-7625
URL: https://issues.apache.org/jira/browse/LUCENE-7625
Project: Lucene - Core
Issue Type: Improvement
Components: modules/suggest
Reporter: jefferyyuan

Just as with normal queries, we usually want to use multiple filter queries when running auto-completion.
It would be great if the suggester could return (the title of) docs that are meaningful to the current user when we need multiple filters.
Thanks
Jeffery Yuan

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (SOLR-6092) Provide a REST managed QueryElevationComponent
[ https://issues.apache.org/jira/browse/SOLR-6092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15810679#comment-15810679 ] jefferyyuan commented on SOLR-6092:
---
Vote for this. We can manage stop words and synonyms; why not QueryElevation, which is much more useful? Thanks

> Provide a REST managed QueryElevationComponent
> ----------------------------------------------
>
> Key: SOLR-6092
> URL: https://issues.apache.org/jira/browse/SOLR-6092
> Project: Solr
> Issue Type: New Feature
> Reporter: Timothy Potter
> Priority: Minor
>
> Provide a managed query elevation component to allow CRUD operations from a REST API.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (SOLR-9929) Documentation and sample code about how to train the model using user clicks when use ltr module
[ https://issues.apache.org/jira/browse/SOLR-9929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jefferyyuan updated SOLR-9929:
---
Summary: Documentation and sample code about how to train the model using user clicks when use ltr module (was: Documentation and smaple code about how to train the model using user clicks when use ltr module)

> Documentation and sample code about how to train the model using user clicks when use ltr module
> ------------------------------------------------------------------------------------------------
>
> Key: SOLR-9929
> URL: https://issues.apache.org/jira/browse/SOLR-9929
> Project: Solr
> Issue Type: Improvement
> Security Level: Public (Default Security Level. Issues are Public)
> Reporter: jefferyyuan
> Labels: learning-to-rank, machine_learning, solr
>
> Thanks very much for integrating machine learning into Solr: https://issues.apache.org/jira/browse/SOLR-8542
> I tried to integrate it, but had difficulty figuring out how to translate the partial pairwise feedback into the importance or relevance of a doc.
> https://github.com/apache/lucene-solr/blob/f62874e47a0c790b9e396f58ef6f14ea04e2280b/solr/contrib/ltr/README.md
> In the "Assemble training data" part, the third column indicates the relative importance or relevance of that doc.
> Could you please give more info about how to compute a score based on what the user clicks?
> I have read
> https://static.aminer.org/pdf/PDF/000/472/865/optimizing_search_engines_using_clickthrough_data.pdf
> http://www.cs.cornell.edu/people/tj/publications/joachims_etal_05a.pdf
> http://alexbenedetti.blogspot.com/2016/07/solr-is-learning-to-rank-better-part-1.html
> but still have no clue yet.
> From a user's perspective, the steps such as setting up the feature and model in Solr are simple, but collecting the feedback data and training/updating the model is much more complex. Without it, we can't really use the learning-to-rank function in Solr.
> It would be great if Solr could provide some detailed instructions and sample code about how to translate the partial pairwise feedback and use it to train and update the model.
> Thanks

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (SOLR-9929) Documentation and smaple code about how to train the model using user clicks when use ltr module
jefferyyuan created SOLR-9929:
---
Summary: Documentation and smaple code about how to train the model using user clicks when use ltr module
Key: SOLR-9929
URL: https://issues.apache.org/jira/browse/SOLR-9929
Project: Solr
Issue Type: Improvement
Security Level: Public (Default Security Level. Issues are Public)
Reporter: jefferyyuan

Thanks very much for integrating machine learning into Solr: https://issues.apache.org/jira/browse/SOLR-8542
I tried to integrate it, but had difficulty figuring out how to translate the partial pairwise feedback into the importance or relevance of a doc.
https://github.com/apache/lucene-solr/blob/f62874e47a0c790b9e396f58ef6f14ea04e2280b/solr/contrib/ltr/README.md
In the "Assemble training data" part, the third column indicates the relative importance or relevance of that doc.
Could you please give more info about how to compute a score based on what the user clicks?
I have read
https://static.aminer.org/pdf/PDF/000/472/865/optimizing_search_engines_using_clickthrough_data.pdf
http://www.cs.cornell.edu/people/tj/publications/joachims_etal_05a.pdf
http://alexbenedetti.blogspot.com/2016/07/solr-is-learning-to-rank-better-part-1.html
but still have no clue yet.
From a user's perspective, the steps such as setting up the feature and model in Solr are simple, but collecting the feedback data and training/updating the model is much more complex. Without it, we can't really use the learning-to-rank function in Solr.
It would be great if Solr could provide some detailed instructions and sample code about how to translate the partial pairwise feedback and use it to train and update the model.
Thanks

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
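The clickthrough papers cited above (Joachims et al.) describe how to turn partial click feedback into pairwise preferences: a clicked result is implicitly preferred over every higher-ranked result that was shown but skipped ("click > skip above"). The following pure-Java sketch of that heuristic is illustrative only - it is not part of the ltr contrib, and the class and method names are invented for this example:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class ClickPairs {
    /** A pairwise preference: the user implicitly judged 'preferred' more relevant than 'over'. */
    public record Preference(String preferred, String over) {}

    /**
     * Joachims' "click > skip above" heuristic: each clicked document is
     * preferred over every higher-ranked document that was shown but not clicked.
     *
     * @param ranking doc ids in the order the engine returned them
     * @param clicked doc ids the user clicked
     */
    public static List<Preference> extract(List<String> ranking, Set<String> clicked) {
        List<Preference> prefs = new ArrayList<>();
        for (int i = 0; i < ranking.size(); i++) {
            if (!clicked.contains(ranking.get(i))) continue;   // only clicked docs generate pairs
            for (int j = 0; j < i; j++) {                      // everything ranked above position i
                if (!clicked.contains(ranking.get(j))) {       // ...that was skipped
                    prefs.add(new Preference(ranking.get(i), ranking.get(j)));
                }
            }
        }
        return prefs;
    }

    public static void main(String[] args) {
        // Ranking d1..d4; the user skipped d1 and d2 and clicked d3,
        // so d3 is preferred over d1 and over d2.
        System.out.println(extract(List.of("d1", "d2", "d3", "d4"), Set.of("d3")));
    }
}
```

These pairs (or per-doc scores aggregated from them) could then feed the "relative importance" column of the ltr training data; note the papers also discuss de-biasing steps that this sketch deliberately omits.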
[jira] [Updated] (SOLR-5701) Allow DocTransformer to add arbitrary fields
[ https://issues.apache.org/jira/browse/SOLR-5701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jefferyyuan updated SOLR-5701:
---
Issue Type: Improvement (was: Bug)

> Allow DocTransformer to add arbitrary fields
> --------------------------------------------
>
> Key: SOLR-5701
> URL: https://issues.apache.org/jira/browse/SOLR-5701
> Project: Solr
> Issue Type: Improvement
> Components: search
> Reporter: jefferyyuan
> Labels: search
> Fix For: 4.9, 6.0
>
> Original Estimate: 72h
> Remaining Estimate: 72h
>
> DocTransformer is very powerful and allows us to add, remove, or update fields before returning.
> One limit we don't like is that it can only add one field, and the field name must be [transformer_name].
> We may want to add multiple fields in one DocTransformer.
> One possible solution is to add a method getFieldNames to DocTransformer:
> public abstract class DocTransformer {
>   public List<String> getFieldNames() { return null; }
> }
> Then in SolrReturnFields.add(String, NamedList, DocTransformers, SolrQueryRequest), change augmenters.addTransformer( factory.create(disp, augmenterParams, req) ); like below:
> DocTransformer docTransformer = factory.create(disp, augmenterParams, req);
> SolrReturnFields.add(docTransformer);
> then read the field names via docTransformer.getFieldNames() and add them into SolrReturnFields.
> The DocTransformer implementation would add all fields via doc.addField.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (SOLR-4736) Support group.mincount for Result Grouping
[ https://issues.apache.org/jira/browse/SOLR-4736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jefferyyuan updated SOLR-4736:
---
Issue Type: Improvement (was: Bug)

> Support group.mincount for Result Grouping
> ------------------------------------------
>
> Key: SOLR-4736
> URL: https://issues.apache.org/jira/browse/SOLR-4736
> Project: Solr
> Issue Type: Improvement
> Components: search
> Affects Versions: 4.2
> Reporter: jefferyyuan
> Priority: Minor
> Labels: group, solr
> Fix For: 4.9, 6.0
>
> Original Estimate: 48h
> Remaining Estimate: 48h
>
> Result Grouping is a very useful feature: we can use it to find duplicate data in the index, but it lacks one feature - group.mincount.
> With group.mincount, we could specify that only groups that have ${mincount} or more documents for the group field will be returned.
> In particular, we could use group.mincount=2 to only return duplicate data.
> Could we add this in a future release? Thanks.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
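If the proposed parameter existed, a duplicate-detection request might look like the following sketch. Note that group.mincount is the proposed, not-yet-existing parameter, and dupKey is an illustrative field name:

{code}
/select?q=*:*&group=true&group.field=dupKey&group.mincount=2&wt=json
{code}

Without such a parameter, the client has to page through all groups and discard the singleton ones itself.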
[jira] [Updated] (SOLR-9463) SolrJ: Support Converter and make it easier to extend DocumentObjectBinder
[ https://issues.apache.org/jira/browse/SOLR-9463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jefferyyuan updated SOLR-9463:
---
Priority: Major (was: Minor)

> SolrJ: Support Converter and make it easier to extend DocumentObjectBinder
> --------------------------------------------------------------------------
>
> Key: SOLR-9463
> URL: https://issues.apache.org/jira/browse/SOLR-9463
> Project: Solr
> Issue Type: Improvement
> Security Level: Public (Default Security Level. Issues are Public)
> Reporter: jefferyyuan
> Labels: extensibility, solrj
>
> In our old project we used Spring-Solr; it provides some good functions, such as letting us define converters to serialize a Java enum to a Solr string and vice versa, or an object as a JSON string and vice versa.
> But it doesn't support the latest Solr, SolrCloud, or child documents.
> We would like to use pure SolrJ, but we do like Spring-Solr's converter function.
> Is it possible for SolrJ to support custom converters?
> Also, SolrJ should make it easier to extend DocumentObjectBinder, e.g. by making DocField, infocache, getDocFields, etc. accessible in a subclass.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (SOLR-9602) Support Bucket Filters in Facet Functions
[ https://issues.apache.org/jira/browse/SOLR-9602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jefferyyuan closed SOLR-9602. - Resolution: Duplicate Yonik Seeley already created https://issues.apache.org/jira/browse/SOLR-9603. > Support Bucket Filters in Facet Functions > - > > Key: SOLR-9602 > URL: https://issues.apache.org/jira/browse/SOLR-9602 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: Facet Module, faceting >Reporter: jefferyyuan > Labels: facet, faceted-search, faceting, function > Fix For: 5.5.4, 6.3, 6.x, 6.2.2 > > > Original link: > http://lucene.472066.n3.nabble.com/Facet-Stats-MinCount-How-to-use-mincount-filter-when-use-facet-stats-td4299367.html > we need bucket filters in general (beyond mincount). - Yonik Seeley > We store some events data such as accountId, startTime, endTime, timeSpent, and some other searchable fields. > We want to get all accountIds that spent more than x hours between startTime and endTime, plus some other criteria that are not important here. > We use a Solr facet function like the one below. > It's very powerful; the only missing part is that it doesn't support minValue and maxValue filters. > http://localhost:8983/solr/events/select?q=*:*&json.facet={ >categories:{ > type : terms, > field : accountId, > numBuckets: true, > facet:{ >sum : "sum(timeSpent)" >// it would be great if minValue and maxValue filters were supported here > } >} > }
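The requested bucket filter can be illustrated in plain Java: aggregate timeSpent per accountId, then keep only buckets whose sum meets a minimum value. Here minValue is the proposed, hypothetical facet option, and the method names are illustrative:

```java
import java.util.*;
import java.util.stream.*;

public class BucketFilterSketch {
    // Sum timeSpent per accountId, then drop buckets below minValue -- the
    // post-aggregation filter the JSON Facet API does not offer today.
    static Map<String, Integer> bucketsAboveMin(Map<String, int[]> events, int minValue) {
        return events.entrySet().stream()
                .collect(Collectors.toMap(Map.Entry::getKey,
                        e -> IntStream.of(e.getValue()).sum()))
                .entrySet().stream()
                .filter(e -> e.getValue() >= minValue)
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
    }

    public static void main(String[] args) {
        Map<String, int[]> events = new LinkedHashMap<>();
        events.put("acct1", new int[]{5, 7}); // total 12
        events.put("acct2", new int[]{1, 2}); // total 3
        System.out.println(bucketsAboveMin(events, 10)); // only acct1 survives
    }
}
```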
[jira] [Commented] (SOLR-9603) Facet bucket filters
[ https://issues.apache.org/jira/browse/SOLR-9603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15547369#comment-15547369 ] jefferyyuan commented on SOLR-9603: --- Original link: http://lucene.472066.n3.nabble.com/Facet-Stats-MinCount-How-to-use-mincount-filter-when-use-facet-stats-td4299367.html https://issues.apache.org/jira/browse/SOLR-9602 > Facet bucket filters > > > Key: SOLR-9603 > URL: https://issues.apache.org/jira/browse/SOLR-9603 > Project: Solr > Issue Type: New Feature > Security Level: Public(Default Security Level. Issues are Public) > Components: Facet Module >Reporter: Yonik Seeley > > "filter" may be a bit of an overloaded term, but it would be nice to be able > to filter facet buckets by additional things, like the metrics that are > calculated per bucket. > This is like the HAVING clause in SQL. > Example of a facet that would group by author, find the average review rating > for that author, and filter out authors (buckets) with less than a 3.5 > average. > > {code} > reviews : { > type : terms, > field: author, > sort: "x desc", > having: "x >= 3.5", > facet : { > x : avg(rating) > } > } > {code} > > This functionality would also be useful for "pushing down" more calculations > to the endpoints for streaming expressions / SQL.
[jira] [Created] (SOLR-9602) Support Bucket Filters in Facet Functions
jefferyyuan created SOLR-9602: - Summary: Support Bucket Filters in Facet Functions Key: SOLR-9602 URL: https://issues.apache.org/jira/browse/SOLR-9602 Project: Solr Issue Type: Improvement Security Level: Public (Default Security Level. Issues are Public) Components: Facet Module, faceting Reporter: jefferyyuan Fix For: 5.5.4, 6.3, 6.x, 6.2.2 Original link: http://lucene.472066.n3.nabble.com/Facet-Stats-MinCount-How-to-use-mincount-filter-when-use-facet-stats-td4299367.html we need bucket filters in general (beyond mincount). - Yonik Seeley We store some events data such as accountId, startTime, endTime, timeSpent, and some other searchable fields. We want to get all accountIds that spent more than x hours between startTime and endTime, plus some other criteria that are not important here. We use a Solr facet function like the one below. It's very powerful; the only missing part is that it doesn't support minValue and maxValue filters. http://localhost:8983/solr/events/select?q=*:*&json.facet={ categories:{ type : terms, field : accountId, numBuckets: true, facet:{ sum : "sum(timeSpent)" // it would be great if minValue and maxValue filters were supported here } } }
[jira] [Created] (SOLR-9463) SolrJ: Support Converter and make it easier to extend DocumentObjectBinder
jefferyyuan created SOLR-9463: - Summary: SolrJ: Support Converter and make it easier to extend DocumentObjectBinder Key: SOLR-9463 URL: https://issues.apache.org/jira/browse/SOLR-9463 Project: Solr Issue Type: Improvement Security Level: Public (Default Security Level. Issues are Public) Reporter: jefferyyuan Priority: Minor In our old project we used Spring-Solr; it provides some nice features, such as letting us define converters that serialize a Java enum to a Solr string and vice versa, or serialize an object as a JSON string and vice versa. But it doesn't support the latest Solr, SolrCloud, or child documents. We would like to use pure SolrJ, but we do like Spring-Solr's converter feature. Could SolrJ support custom converters? SolrJ should also make it easier to extend DocumentObjectBinder, for example by making DocField, infocache, getDocFields, etc. accessible in subclasses.
[jira] [Commented] (SOLR-5005) JavaScriptRequestHandler
[ https://issues.apache.org/jira/browse/SOLR-5005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15290100#comment-15290100 ] jefferyyuan commented on SOLR-5005: --- Very useful feature; this would make it easier for developers to extend Solr: they would only need to change solrconfig.xml and add one script file. > JavaScriptRequestHandler > > > Key: SOLR-5005 > URL: https://issues.apache.org/jira/browse/SOLR-5005 > Project: Solr > Issue Type: New Feature >Reporter: David Smiley >Assignee: Noble Paul > Attachments: SOLR-5005.patch, SOLR-5005.patch, SOLR-5005.patch, > SOLR-5005_ScriptRequestHandler_take3.patch, > SOLR-5005_ScriptRequestHandler_take3.patch, patch > > > A user customizable script based request handler would be very useful. It's > inspired from the ScriptUpdateRequestProcessor, but on the search end. A user > could write a script that submits searches to Solr (in-VM) and can react to > the results of one search before making another that is formulated > dynamically. And it can assemble the response data, potentially reducing > both the latency and data that would move over the wire if this feature > didn't exist. It could also be used to easily add a user-specifiable search > API at the Solr server with request parameters governed by what the user > wants to advertise -- especially useful within enterprises. And, it could be > used to enforce security requirements on allowable parameter values to > Solr, so a javascript based Solr client could be allowed to talk to only a > script based request handler which enforces the rules.
[jira] [Updated] (SOLR-7131) Sort Group Ascendingly(asc_max) by Max Value in Each Group
[ https://issues.apache.org/jira/browse/SOLR-7131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jefferyyuan updated SOLR-7131: -- Description: Solr grouping supports asc and desc sorts on a field; let's take sort=time asc as an example. In asc mode, groups are sorted by the min value in the group; in desc mode, groups are sorted by the max value in the group. But users may want more: in asc_max mode, sort groups by the max (not min) value in the group ==> this should be a common requirement. Conversely, in desc_min mode, sort groups by the min (not max) value in the group. We have this requirement in our product, and we implemented it in a somewhat cumbersome way: by creating a new kind of FieldComparator, LongAbnormalComparator. It would be great if Solr could support this. was: Solr grouping supports asc and desc sorts on a field; let's take sort=time asc as an example. In asc mode, groups are sorted by the min value in the group; in desc mode, groups are sorted by the max value in the group. But users may want more: in asc_max mode, sort groups by the max (not min) value in the group ==> this should be a common requirement. Conversely, in desc_min mode, sort groups by the min (not max) value in the group. We have this requirement in our product, and we implemented it in a somewhat cumbersome way: by creating a new kind of FieldComparator, LongAbnormalComparator. I am sure we are not alone, and it would be great if Solr could support this. > Sort Group Ascendingly(asc_max) by Max Value in Each Group > -- > > Key: SOLR-7131 > URL: https://issues.apache.org/jira/browse/SOLR-7131 > Project: Solr > Issue Type: Improvement > Components: search >Reporter: jefferyyuan >Priority: Minor > Labels: group, search > Fix For: 5.2, Trunk > > > Solr grouping supports asc and desc sorts on a field: > let's take sort=time asc as an example. > In asc mode, groups are sorted by the min value in the group; > in desc mode, groups are sorted by the max value in the group.
> But users may want more: > in asc_max mode, sort groups by the max (not min) value in the group > ==> this should be a common requirement. > Conversely, in desc_min mode, sort groups by the min (not max) value in the group. > We have this requirement in our product, and we implemented it in a somewhat > cumbersome way: by creating a new kind of FieldComparator, > LongAbnormalComparator > It would be great if Solr could support this.
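The proposed asc_max ordering (a hypothetical mode, not existing Solr syntax) can be sketched with plain Java collections: sort groups ascending by each group's maximum value rather than its minimum:

```java
import java.util.*;

public class AscMaxSortSketch {
    // Sort group keys ascending by the max value inside each group -- the
    // proposed asc_max mode. Standard "asc" would compare group mins instead.
    static List<String> sortGroupsAscByMax(Map<String, List<Long>> groups) {
        List<String> keys = new ArrayList<>(groups.keySet());
        keys.sort(Comparator.comparing((String k) -> Collections.max(groups.get(k))));
        return keys;
    }

    public static void main(String[] args) {
        Map<String, List<Long>> groups = new LinkedHashMap<>();
        groups.put("g1", Arrays.asList(1L, 100L)); // min 1, max 100
        groups.put("g2", Arrays.asList(50L, 60L)); // min 50, max 60
        // Plain asc (by min) would put g1 first; asc_max puts g2 first.
        System.out.println(sortGroupsAscByMax(groups)); // [g2, g1]
    }
}
```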
[jira] [Created] (SOLR-7131) Sort Group Ascendingly(asc_max) by Max Value in Each Group
jefferyyuan created SOLR-7131: - Summary: Sort Group Ascendingly(asc_max) by Max Value in Each Group Key: SOLR-7131 URL: https://issues.apache.org/jira/browse/SOLR-7131 Project: Solr Issue Type: Improvement Components: search Reporter: jefferyyuan Priority: Minor Fix For: 5.1 Solr grouping supports asc and desc sorts on a field; let's take sort=time asc as an example. In asc mode, groups are sorted by the min value in the group; in desc mode, groups are sorted by the max value in the group. But users may want more: in asc_max mode, sort groups by the max (not min) value in the group ==> this should be a common requirement. Conversely, in desc_min mode, sort groups by the min (not max) value in the group. We have this requirement in our product, and we implemented it in a somewhat cumbersome way: by creating a new kind of FieldComparator, LongAbnormalComparator. I am sure we are not alone, and it would be great if Solr could support this.
[jira] [Commented] (SOLR-7097) Update other Document in DocTransformer
[ https://issues.apache.org/jira/browse/SOLR-7097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14328034#comment-14328034 ] jefferyyuan commented on SOLR-7097: --- Hi, Noble: In some cases, we may want to change previous documents: add, update, or remove fields. Or remove previous documents entirely. One use case: in Solr's flat group mode (group.main=true), add a groupCount to the first document in each group. We chose not to change Solr code in our product; instead we wrote a new CachedXMLWriter whose writeSolrDocument caches the SolrDocument and writes all documents out in writeEndDocumentList. http://lifelongprogrammer.blogspot.com/2015/01/use-solr-transformer-to-gen-groupcount.html It would be great if Solr allowed us to change previous documents. > Update other Document in DocTransformer > --- > > Key: SOLR-7097 > URL: https://issues.apache.org/jira/browse/SOLR-7097 > Project: Solr > Issue Type: Improvement >Reporter: jefferyyuan >Priority: Minor > Labels: searcher, transformers > > Solr DocTransformer is good, but it only allows us to change the current > document: add, remove, or update fields. > It would be great if we could update other documents (previous ones especially), or > better, delete docs (especially useful during tests) or add docs in a > DocTransformer. > Use case: > We can use flat group mode (group.main=true) to put parent and child close to > each other (parent first), then use a DocTransformer to update the parent > document when accessing its child documents. > Some thoughts about the implementation: > org.apache.solr.response.TextResponseWriter.writeDocuments(String, > ResultContext, ReturnFields) > When cachMode=true, in the for loop after the transform we can store the > SolrDocument in a list and write these docs out at the end.
> cachMode = req.getParams().getBool("cachMode", false);
> SolrDocument[] cachedDocs = new SolrDocument[sz];
> for (int i = 0; i < sz; i++) {
>   SolrDocument sdoc = toSolrDocument(doc);
>   if (transformer != null) {
>     transformer.transform(sdoc, id);
>   }
>   if (cachMode) {
>     cachedDocs[i] = sdoc;
>   } else {
>     writeSolrDocument(null, sdoc, returnFields, i);
>   }
> }
> if (transformer != null) {
>   transformer.setContext(null);
> }
> if (cachMode) {
>   for (int i = 0; i < sz; i++) {
>     writeSolrDocument(null, cachedDocs[i], returnFields, i);
>   }
> }
> writeEndDocumentList();
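The cache-then-write idea in the snippet above boils down to a simple buffering pattern; here is a self-contained sketch of it outside Solr (transformAll and the lambda transformer are illustrative names, not Solr API):

```java
import java.util.*;
import java.util.function.*;

public class CachedWriteSketch {
    // Buffer every transformed doc before writing, so a later doc's
    // transformer can still mutate an earlier doc -- the behavior a
    // streaming write loop cannot provide.
    static List<Map<String, Object>> transformAll(List<Map<String, Object>> docs,
            BiConsumer<List<Map<String, Object>>, Integer> transformer) {
        List<Map<String, Object>> cached = new ArrayList<>();
        for (Map<String, Object> d : docs) cached.add(new HashMap<>(d));
        for (int i = 0; i < cached.size(); i++) {
            transformer.accept(cached, i); // may update any previous doc
        }
        return cached; // "write out" only after all transforms have run
    }

    public static void main(String[] args) {
        List<Map<String, Object>> docs = new ArrayList<>();
        docs.add(new HashMap<>(Map.of("id", "parent")));
        docs.add(new HashMap<>(Map.of("id", "child")));
        // When visiting the child, bump a groupCount field on the parent.
        List<Map<String, Object>> out = transformAll(docs, (all, i) -> {
            if ("child".equals(all.get(i).get("id"))) {
                all.get(0).merge("groupCount", 1, (a, b) -> (int) a + (int) b);
            }
        });
        System.out.println(out.get(0));
    }
}
```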