[jira] [Commented] (SOLR-5859) Harden the Overseer restart mechanism

2018-04-13 Thread Mikhail Khludnev (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16437946#comment-16437946
 ] 

Mikhail Khludnev commented on SOLR-5859:


[~noble.paul], I want to clarify this line: 
https://github.com/apache/lucene-solr/commit/3fd292234166105f96fcb5acd3999c9c2abff737#diff-9ed614eee66b9e685d73446b775dc043R287
{quote}
//do this in a separate thread because any wait is interrupted in this main thread
new Thread(this::checkIfIamStillLeader, "OverseerExitThread").start();
{quote}
Can't we clear the interrupt flag with {{Thread.interrupted()}} and avoid 
spawning a new thread?
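
To make the question concrete, here is a minimal, self-contained sketch of the JDK behaviour being discussed. It is not the Overseer code: {{InterruptFlagDemo}}, {{blockingCheck()}} and both {{check...}} methods are hypothetical names, and {{Thread.sleep}} merely stands in for whatever blocking wait the leadership check performs. It only shows that a blocking call fails at once while the interrupt flag is set, and that {{Thread.interrupted()}} reads and clears the flag so the same call can complete on the current thread.

```java
class InterruptFlagDemo {

    /** Hypothetical stand-in for a check that has to block for a moment. */
    static boolean blockingCheck() {
        try {
            Thread.sleep(10); // throws at once if the interrupt flag is already set
            return true;
        } catch (InterruptedException e) {
            return false;     // the wait was cut short; the throw also cleared the flag
        }
    }

    /** The wait is attempted while the current thread is still flagged as interrupted. */
    static boolean checkWithoutClearingFlag() {
        Thread.currentThread().interrupt(); // simulate an interrupted main thread
        return blockingCheck();
    }

    /** The flag is cleared first, so the wait can run on the current thread. */
    static boolean checkAfterClearingFlag() {
        Thread.currentThread().interrupt();            // simulate an interrupted main thread
        boolean wasInterrupted = Thread.interrupted(); // reads AND clears the flag
        try {
            return blockingCheck();
        } finally {
            if (wasInterrupted) {
                // Restore the flag so code further up the stack still sees the interrupt.
                Thread.currentThread().interrupt();
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("without clearing: " + checkWithoutClearingFlag());
        System.out.println("after clearing:   " + checkAfterClearingFlag());
        Thread.interrupted(); // leave the demo thread in a clean state
    }
}
```

Whether clearing the flag is acceptable in the Overseer depends on why the main thread was interrupted in the first place (for example a shutdown in progress), which this sketch does not address.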

> Harden the Overseer restart mechanism
> -
>
> Key: SOLR-5859
> URL: https://issues.apache.org/jira/browse/SOLR-5859
> Project: Solr
>  Issue Type: Improvement
>Reporter: Noble Paul
>Assignee: Noble Paul
>Priority: Major
> Fix For: 4.8, 6.0
>
> Attachments: SOLR-5859.patch, SOLR-5859.patch, SOLR-5859.patch, 
> SOLR-5859.patch
>
>
> SOLR-5476 depends on Overseer restart. The current strategy is to remove the 
> zk node for leader election, wait for STATUS_UPDATE_DELAY + 100 ms, and 
> start the new overseer.
> Though overseer ops are short running, this is not a 100% foolproof strategy: 
> if an operation takes longer than the wait period, there can be a race 
> condition. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5859) Harden the Overseer restart mechanism

2014-04-06 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13961372#comment-13961372
 ] 

ASF subversion and git services commented on SOLR-5859:
---

Commit 1585276 from no...@apache.org in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1585276 ]

SOLR-5859 Fixing test errors

> Harden the Overseer restart mechanism
> -
>
> Key: SOLR-5859
> URL: https://issues.apache.org/jira/browse/SOLR-5859
> Project: Solr
>  Issue Type: Improvement
>Reporter: Noble Paul
>Assignee: Noble Paul
> Fix For: 4.8, 5.0
>
> Attachments: SOLR-5859.patch, SOLR-5859.patch, SOLR-5859.patch, 
> SOLR-5859.patch






[jira] [Commented] (SOLR-5859) Harden the Overseer restart mechanism

2014-04-06 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13961370#comment-13961370
 ] 

ASF subversion and git services commented on SOLR-5859:
---

Commit 1585274 from no...@apache.org in branch 'dev/trunk'
[ https://svn.apache.org/r1585274 ]

SOLR-5859 Fixing test errors







[jira] [Commented] (SOLR-5859) Harden the Overseer restart mechanism

2014-04-03 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13958599#comment-13958599
 ] 

ASF subversion and git services commented on SOLR-5859:
---

Commit 1584273 from [~noble.paul] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1584273 ]

SOLR-5859 improved logging, and fix a potential bug







[jira] [Commented] (SOLR-5859) Harden the Overseer restart mechanism

2014-04-03 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13958586#comment-13958586
 ] 

ASF subversion and git services commented on SOLR-5859:
---

Commit 1584271 from [~noble.paul] in branch 'dev/trunk'
[ https://svn.apache.org/r1584271 ]

SOLR-5859 improved logging, and fix a potential bug







[jira] [Commented] (SOLR-5859) Harden the Overseer restart mechanism

2014-04-02 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13957986#comment-13957986
 ] 

ASF subversion and git services commented on SOLR-5859:
---

Commit 1584120 from [~noble.paul] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1584120 ]

SOLR-5859 removing accidental removal of SOLR-5908 changes







[jira] [Commented] (SOLR-5859) Harden the Overseer restart mechanism

2014-04-02 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13957969#comment-13957969
 ] 

ASF subversion and git services commented on SOLR-5859:
---

Commit 1584115 from [~noble.paul] in branch 'dev/trunk'
[ https://svn.apache.org/r1584115 ]

SOLR-5859 removing accidental removal of SOLR-5908 changes







[jira] [Commented] (SOLR-5859) Harden the Overseer restart mechanism

2014-04-02 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13957951#comment-13957951
 ] 

ASF subversion and git services commented on SOLR-5859:
---

Commit 1584110 from [~steve_rowe] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1584110 ]

SOLR-5859: add OCP.getCollectionStatus() param description for 'clusterState' 
to stop 'ant precommit' bitching 'Javadoc: Description expected after this 
reference' and failing the build (merged trunk r1584108)







[jira] [Commented] (SOLR-5859) Harden the Overseer restart mechanism

2014-04-02 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13957944#comment-13957944
 ] 

ASF subversion and git services commented on SOLR-5859:
---

Commit 1584108 from [~steve_rowe] in branch 'dev/trunk'
[ https://svn.apache.org/r1584108 ]

SOLR-5859: add OCP.getCollectionStatus() param description for 'clusterState' 
to stop 'ant precommit' bitching 'Javadoc: Description expected after this 
reference' and failing the build







[jira] [Commented] (SOLR-5859) Harden the Overseer restart mechanism

2014-04-02 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13957855#comment-13957855
 ] 

ASF subversion and git services commented on SOLR-5859:
---

Commit 1584085 from [~noble.paul] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1584085 ]

SOLR-5859 Harden Overseer restart

> Harden the Overseer restart mechanism
> -
>
> Key: SOLR-5859
> URL: https://issues.apache.org/jira/browse/SOLR-5859
> Project: Solr
>  Issue Type: Improvement
>Reporter: Noble Paul
>Assignee: Noble Paul
> Attachments: SOLR-5859.patch, SOLR-5859.patch, SOLR-5859.patch, 
> SOLR-5859.patch






[jira] [Commented] (SOLR-5859) Harden the Overseer restart mechanism

2014-04-02 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13957786#comment-13957786
 ] 

ASF subversion and git services commented on SOLR-5859:
---

Commit 1584069 from [~noble.paul] in branch 'dev/trunk'
[ https://svn.apache.org/r1584069 ]

SOLR-5859 Harden Overseer restart

> Harden the Overseer restart mechanism
> -
>
> Key: SOLR-5859
> URL: https://issues.apache.org/jira/browse/SOLR-5859
> Project: Solr
>  Issue Type: Improvement
>Reporter: Noble Paul
>Assignee: Noble Paul
> Attachments: SOLR-5859.patch, SOLR-5859.patch, SOLR-5859.patch






[jira] [Commented] (SOLR-5859) Harden the Overseer restart mechanism

2014-03-30 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13954824#comment-13954824
 ] 

Mark Miller commented on SOLR-5859:
---

Yes, very nice change. This approach is great.

Patch looks good, but some nits with the current version listed below:

bq. OCP

We log closing OCP at info level; we probably should not abbreviate it, though, 
since a user won't know what it is.

bq.   } else if( QUIT.equals(operation)){

{code}
  }
  String getId(){
return myId;
  }
{code}

There are also some project formatting violations, e.g. spacing and a missing 
newline: 

{code}
log.info("IsClosed  :{} , {}", isClosed, this);
log.warn("OverseerCollectionProcessor.processMessage : "+ operation + " , "+ message.toString());
{code}

I think both of those are wrong; this should be a single log line at debug level.

bq. import org.apache.zookeeper.data.Stat;

Unused import added.

> Harden the Overseer restart mechanism
> -
>
> Key: SOLR-5859
> URL: https://issues.apache.org/jira/browse/SOLR-5859
> Project: Solr
>  Issue Type: Improvement
>Reporter: Noble Paul
>Assignee: Noble Paul
> Attachments: SOLR-5859.patch, SOLR-5859.patch






[jira] [Commented] (SOLR-5859) Harden the Overseer restart mechanism

2014-03-25 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13946745#comment-13946745
 ] 

Shalin Shekhar Mangar commented on SOLR-5859:
-

This seems to be a much better way of killing an overseer.

> Harden the Overseer restart mechanism
> -
>
> Key: SOLR-5859
> URL: https://issues.apache.org/jira/browse/SOLR-5859
> Project: Solr
>  Issue Type: Improvement
>Reporter: Noble Paul
>Assignee: Noble Paul






[jira] [Commented] (SOLR-5859) Harden the Overseer restart mechanism

2014-03-25 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13946663#comment-13946663
 ] 

Noble Paul commented on SOLR-5859:
--

New strategy: implement a new message called _quit_. On receiving the message, 
the Overseer sets isClosed=true, and the loop exits as soon as the current 
in-flight message is done. 
After exiting the loop, it checks whether it is still the leader (most likely it 
is); if so, it removes the leader node from ZK and removes itself from the 
front of the election queue.
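
The control flow described above can be sketched roughly as follows. This is an illustrative model only, not the patch: the class name, the in-memory queue, and the {{leaderNodePresent}} boolean standing in for the leader node in ZK are all assumptions, and the real Overseer reads its messages from a ZooKeeper-backed queue. The point is that _quit_ is just another queued message, so every message ahead of it is fully processed before the loop exits and leadership is given up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class QuitMessageSketch {
    static final String QUIT = "quit";

    // Hypothetical in-memory stand-ins for the work queue and the leader node in ZK.
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final List<String> processed = new ArrayList<>();
    private volatile boolean isClosed = false;
    private boolean leaderNodePresent = true;

    void offer(String operation) { queue.add(operation); }
    List<String> processed() { return processed; }
    boolean isLeaderNodePresent() { return leaderNodePresent; }
    int pending() { return queue.size(); }

    /** Main loop: handles one message at a time and exits only between messages. */
    void run() {
        while (!isClosed) {
            String operation;
            try {
                operation = queue.take();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // keep the interrupt visible to callers
                return;
            }
            if (QUIT.equals(operation)) {
                isClosed = true;          // loop ends after this message, never mid-operation
            } else {
                processed.add(operation); // stands in for the real state update
            }
        }
        // After the loop: give up leadership only if we still hold it.
        if (leaderNodePresent) {
            leaderNodePresent = false;    // stands in for deleting the leader node in ZK
        }
    }

    public static void main(String[] args) {
        QuitMessageSketch overseer = new QuitMessageSketch();
        overseer.offer("op1");
        overseer.offer("op2");
        overseer.offer(QUIT);
        overseer.offer("op3"); // left in the queue for the next overseer
        overseer.run();
        System.out.println("processed=" + overseer.processed()
                + " pending=" + overseer.pending()
                + " leaderNodePresent=" + overseer.isLeaderNodePresent());
    }
}
```

Compared with the timed wait in the issue description, nothing here depends on how long an operation takes, which is what removes the race.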



