[ 
https://issues.apache.org/jira/browse/BLUR-74?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13637620#comment-13637620
 ] 

Aaron McCurry commented on BLUR-74:
-----------------------------------

Let me discuss how we got here.  In earlier versions of Blur, the index locks 
(Lucene API LockFactory) were actually controlled by ZooKeeper.  This made a 
lot of sense when I wrote it.  Basically there was one ephemeral node per 
shard per table.  When a failure was detected and shards were relocated, it was 
assumed that the ephemeral nodes held by the node that went offline would have 
been released (removed by ZK).  Thus the locks would have been released, 
and the server that was opening the shard would be able to obtain the lock 
immediately and start the process of opening the writer.  In that 
implementation, waiting for the table to enable or disable was a matter of 
waiting for the ephemeral nodes (the locks) to be present or not.

However, in practice it did not work that well.  The problem was that when 
running a large cluster with thousands of shards, ZK would not react that 
fast to individual ephemeral nodes.  The result was that during a failure the 
server trying to open the down shard would wait for seconds to minutes to 
obtain the lock to start opening the index.  So the ZK LockFactory was replaced 
with an HDFS version that allows any writer to obtain the lock; however, it 
validates that the writer has the lock before committing any new data to the 
index.
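
To make the "anyone can obtain, validate before commit" idea concrete, here 
is a minimal sketch.  It is not the Blur lock factory: the class names are 
hypothetical, and an in-memory map stands in for a lock file in HDFS.  It 
only shows that a stale writer is rejected at commit time rather than at 
lock time.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory stand-in for a per-shard lock file stored in HDFS.
// None of these names come from Blur; they only illustrate the idea.
final class LastWriterWinsLock {
    // shard path -> id of the writer that most recently took the lock
    private final Map<String, String> owners = new ConcurrentHashMap<>();

    // Obtaining never blocks: the newest writer simply overwrites the owner.
    void obtain(String shard, String writerId) {
        owners.put(shard, writerId);
    }

    // A writer still holds the lock only if it is the recorded owner.
    boolean isHeldBy(String shard, String writerId) {
        return writerId.equals(owners.get(shard));
    }
}

final class ShardWriter {
    private final LastWriterWinsLock lock;
    private final String shard;
    private final String writerId;

    ShardWriter(LastWriterWinsLock lock, String shard, String writerId) {
        this.lock = lock;
        this.shard = shard;
        this.writerId = writerId;
        lock.obtain(shard, writerId); // take over the shard immediately
    }

    // Validate ownership before committing; a stale writer is rejected.
    boolean commit() {
        if (!lock.isHeldBy(shard, writerId)) {
            return false; // another server has taken over this shard
        }
        // A real writer would flush and commit index data here.
        return true;
    }
}
```

With this shape, a server that takes over a shard never waits on the old 
owner; the old owner finds out it lost the lock the next time it tries to 
commit.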

So the problem is that currently we really don't have any idea which shards 
are actually open on any given server.  We only know which shards "should" be 
open, and that may be the answer.  Perhaps we should add another call to the 
Blur service in Thrift and extend the "shardServerLayout" method behavior.  We 
should leave the existing call and its behavior in place and add another 
"shardServerLayout" method that takes a parameter, maybe an enum of ACTUAL and 
CALCULATED, where CALCULATED is the current result and ACTUAL is what is 
really open.  Then we can have the enable and disable calls key off the results 
of that call and block appropriately.
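
A rough sketch of how that could look, purely as an illustration.  Nothing 
here is the existing Thrift API: the enum name, the waiter class, and the 
lookup function (standing in for the proposed "shardServerLayout" call that 
returns shard name to server name) are all assumptions.

```java
import java.util.Map;
import java.util.function.BooleanSupplier;
import java.util.function.Function;

// Hypothetical names throughout; this is not the Blur Thrift API.
enum LayoutType { CALCULATED, ACTUAL }

final class TableStateWaiter {
    // Disable is complete once no shard of the table is actually open.
    static boolean waitUntilDisabled(Function<LayoutType, Map<String, String>> layout,
                                     long timeoutMillis, long pollMillis) {
        return poll(() -> layout.apply(LayoutType.ACTUAL).isEmpty(),
                    timeoutMillis, pollMillis);
    }

    // Enable is complete once what is actually open matches what should be open.
    static boolean waitUntilEnabled(Function<LayoutType, Map<String, String>> layout,
                                    long timeoutMillis, long pollMillis) {
        return poll(() -> layout.apply(LayoutType.ACTUAL)
                                .equals(layout.apply(LayoutType.CALCULATED)),
                    timeoutMillis, pollMillis);
    }

    // Re-check the condition until it holds or the timeout elapses.
    private static boolean poll(BooleanSupplier done, long timeoutMillis, long pollMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (true) {
            if (done.getAsBoolean()) {
                return true;
            }
            if (System.currentTimeMillis() >= deadline) {
                return false; // caller decides how to report the timeout
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

The enable and disable calls would then return only after the matching wait 
method reports true, or fail after the timeout.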

Aaron
                
> Make the disabling and enabling of tables blocking calls.
> ---------------------------------------------------------
>
>                 Key: BLUR-74
>                 URL: https://issues.apache.org/jira/browse/BLUR-74
>             Project: Apache Blur
>          Issue Type: Bug
>    Affects Versions: 0.1.5
>            Reporter: Aaron McCurry
>             Fix For: 0.1.5
>
>
> Currently the calls return, and then the action is carried out 
> asynchronously.  This is an issue for the writers when someone calls disable 
> and remove in quick succession, because the indexes are deleted out from 
> underneath the writers.  This causes the shard servers to throw errors.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
