[ https://issues.apache.org/jira/browse/SOLR-12833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16809012#comment-16809012 ]

jefferyyuan commented on SOLR-12833:
------------------------------------

Thanks for the findings, [~ab], [~markrmil...@gmail.com].

As you said, we don't need versionBucketLockTimeoutMs in every VersionBucket; I
can create a PR to remove it from VersionBucket.

One approach to fix this problem:
 * If there is little contention on the same version bucket and updates
usually finish quickly, customers don't specify versionBucketLockTimeoutMs.
In that case we use the old VersionBucket, which has no Lock or Condition
objects; its lock, signalAll and awaitNanos methods keep the old behavior and
use the intrinsic lock.
 * If there is heavy contention on the same version bucket and updates (such as
geo-related updates) take a long time, customers can specify
versionBucketLockTimeoutMs explicitly. We then create and use another class,
TimedVersionBucket, that extends VersionBucket and uses a Lock and Condition,
so only one update on the same bucket actually proceeds and gets processed,
while other updates fail fast.
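The two-class approach could look roughly like the minimal Java sketch below. The method name runWithLock, its Supplier-based signature, and the use of TimeoutException to signal failure are assumptions made for illustration, not Solr's actual API. The timeout is passed per call instead of being stored in each bucket, in line with removing versionBucketLockTimeoutMs from VersionBucket.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

/** Default bucket (no timeout configured): no Lock/Condition fields, uses the intrinsic monitor. */
class VersionBucket {
    /** Waits as long as needed, like the old synchronized block; lockTimeoutMs is ignored here. */
    <R> R runWithLock(long lockTimeoutMs, Supplier<R> action)
            throws TimeoutException, InterruptedException {
        synchronized (this) {
            return action.get();
        }
    }

    /** Call only from inside runWithLock, i.e. while holding this object's monitor. */
    void signalAll() {
        notifyAll();
    }

    /** Call only from inside runWithLock; waits on the intrinsic monitor for up to nanosTimeout. */
    void awaitNanos(long nanosTimeout) throws InterruptedException {
        long millis = TimeUnit.NANOSECONDS.toMillis(nanosTimeout);
        if (millis > 0) {
            wait(millis);
        }
    }
}

/** Hypothetical: used only when versionBucketLockTimeoutMs is set; explicit Lock + Condition. */
class TimedVersionBucket extends VersionBucket {
    private final ReentrantLock lock = new ReentrantLock(true); // fair ordering for timed waiters
    private final Condition condition = lock.newCondition();

    @Override
    <R> R runWithLock(long lockTimeoutMs, Supplier<R> action)
            throws TimeoutException, InterruptedException {
        if (!lock.tryLock(lockTimeoutMs, TimeUnit.MILLISECONDS)) {
            // Fail fast instead of parking yet another thread on this bucket.
            throw new TimeoutException("Unable to get version bucket lock in " + lockTimeoutMs + " ms");
        }
        try {
            return action.get();
        } finally {
            lock.unlock();
        }
    }

    @Override
    void signalAll() {
        condition.signalAll(); // caller must hold `lock`, i.e. be inside runWithLock
    }

    @Override
    void awaitNanos(long nanosTimeout) throws InterruptedException {
        condition.awaitNanos(nanosTimeout); // caller must hold `lock`
    }
}
```

In this sketch the subclass would be chosen once, when the buckets are created, depending on whether versionBucketLockTimeoutMs is configured, so the common no-timeout path carries no Lock or Condition objects.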

> Use timed-out lock in DistributedUpdateProcessor
> ------------------------------------------------
>
>                 Key: SOLR-12833
>                 URL: https://issues.apache.org/jira/browse/SOLR-12833
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: update, UpdateRequestProcessors
>    Affects Versions: 7.5, 8.0
>            Reporter: jefferyyuan
>            Assignee: Mark Miller
>            Priority: Minor
>             Fix For: 7.7, 8.0
>
>         Attachments: SOLR-12833-noint.patch, SOLR-12833.patch, 
> SOLR-12833.patch
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> There is a synchronized block that blocks other update requests whose IDs fall
> in the same hash bucket. An update waits forever until it acquires the lock at
> the synchronized block, which can be a problem in some cases.
>  
> Some add/update requests (for example, updates with spatial/shape analysis)
> may take a long time (30+ seconds or more), which can cause the request to
> time out and fail.
> Clients may retry the same requests multiple times or for several minutes,
> which makes things worse.
> The server receives all the update requests, but all except one can do
> nothing and have to wait. This wastes precious memory and CPU resources.
> We have seen a case where 2000+ threads were blocked at the synchronized lock
> and only a few updates were making progress. Each thread takes 3+ MB of
> memory, so 2000+ blocked threads hold roughly 6+ GB, which caused an OOM.
> Also, if an update can't get the lock within the expected time, it's better
> to fail fast.
>  
> We can add a configuration option in solrconfig.xml,
> updateHandler/versionLock/timeInMill, so users can specify how long they want
> to wait for the version bucket lock.
> The default value can be -1, so it behaves the same as it does now: wait
> forever until it gets the lock.
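To make the quoted proposal concrete, here is a hypothetical Java sketch of how document ids hash into shared buckets and how a timeInMill value could govern the wait. The class BucketLocks, its methods, and the use of one ReentrantLock per bucket are illustrative assumptions, not Solr's implementation. As an example of unrelated ids contending, "Aa" and "BB" have the same String.hashCode() (2112), so they always land in the same bucket.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

/**
 * Hypothetical sketch: ids are hashed into a fixed number of buckets, and
 * timeInMill (as proposed for updateHandler/versionLock/timeInMill) bounds
 * how long an update waits for its bucket's lock.
 */
final class BucketLocks {
    /** Proposed default: any negative value means "wait forever" (current behavior). */
    static final long WAIT_FOREVER = -1;

    private final ReentrantLock[] buckets;
    private final long timeInMill;

    BucketLocks(int numBuckets, long timeInMill) {
        this.buckets = new ReentrantLock[numBuckets];
        for (int i = 0; i < numBuckets; i++) {
            buckets[i] = new ReentrantLock();
        }
        this.timeInMill = timeInMill;
    }

    /** Different ids can map to the same bucket and therefore contend for one lock. */
    int bucketIndex(String id) {
        // floorMod keeps the index non-negative even for negative hash codes.
        return Math.floorMod(id.hashCode(), buckets.length);
    }

    /**
     * Runs work under the id's bucket lock. Returns false (fail fast) if the
     * lock was not acquired within timeInMill; with WAIT_FOREVER it blocks instead.
     */
    boolean tryUpdate(String id, Runnable work) throws InterruptedException {
        ReentrantLock lock = buckets[bucketIndex(id)];
        if (timeInMill < 0) {
            lock.lock(); // current behavior: block until the lock is free
        } else if (!lock.tryLock(timeInMill, TimeUnit.MILLISECONDS)) {
            return false; // timed out: report an error instead of piling up threads
        }
        try {
            work.run();
            return true;
        } finally {
            lock.unlock();
        }
    }
}
```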



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)