Re: killing a Solr instance leaves the state in Zookeeper "active"

2015-10-23 Thread Jason Gerlowski
Is this documented anywhere outside of the JIRAs you mentioned, Erick
(or anyone else)?  I can only speak for myself, but I don't think I
would've expected/caught that as a potential Solr consumer, even
though it is working as designed.  If it doesn't make sense to
actually change this, ensuring this is covered by the documentation
might be a good compromise/follow-up.

On Fri, Oct 23, 2015 at 1:55 PM, Erick Erickson wrote:
> Not so much a problem as behavior I wasn't fully expecting. It does
> seem a little trappy to have this thing that's supposed to be the
> state of the collection but then require that another znode be
> checked to see if state.json is telling the truth.
>
> In the particular case that came up, a monitoring system was trying
> to generate alerts when a node went down by relying on the state.json
> znode, but no alert was being generated in this case.
>
> BTW, this is 4.6, I suspect the eventual answer is to upgrade and
> use the collections API CLUSTERSTATUS...
>
> I don't have strong feelings about this, mostly throwing it out for
> discussion. I suppose the goal here is to keep any client from having
> to directly look at the state.json file and provide APIs that conceal
> this kind of thing.
>
> Your point about the complexity of publishing state for other nodes
> is well taken...
>
>
> On Fri, Oct 23, 2015 at 10:29 AM, Shalin Shekhar Mangar wrote:
>> This is expected and works as designed. We have enough complexity in
>> publishing state for other nodes (LIR) and we shouldn't add any more.
>> Besides what if the leader itself was killed, who changes the state
>> then?
>>
>> What problem are you trying to solve?
>>
>> On Fri, Oct 23, 2015 at 10:19 PM, Erick Erickson wrote:
>>> If I kill a replica with -9, the state.json node never gets updated,
>>> the node shows as "active"
>>>
>>> There is code around that checks the live_nodes to see whether the
>>> state.json node can be believed, and Varun pointed me at Solr JIRAs
>>> for making sure CLUSTERSTATUS consults live_nodes, indicating that
>>> this is something that's expected. But it seems trappy.
>>>
>>> My question is whether it's worth raising a JIRA. The leader could
>>> notice a mismatch and update state.json or something like that.
>>>
>>> I'll raise a JIRA if it seems like something that should be discussed.
>>>
>>> Let me know,
>>> Erick
>>>
>>> -
>>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>>> For additional commands, e-mail: dev-h...@lucene.apache.org
>>>
>>
>>
>>
>> --
>> Regards,
>> Shalin Shekhar Mangar.
>>
>




Re: killing a Solr instance leaves the state in Zookeeper "active"

2015-10-23 Thread Erick Erickson
Not so much a problem as behavior I wasn't fully expecting. It does
seem a little trappy to have this thing that's supposed to be the
state of the collection but then require that another znode be
checked to see if state.json is telling the truth.

In the particular case that came up, a monitoring system was trying
to generate alerts when a node went down by relying on the state.json
znode, but no alert was being generated in this case.
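
For a monitoring script, the cross-check described above could look
roughly like the sketch below. This is not Solr code, just an
illustration: the function name and the sample host names are made up,
and it assumes the collection state has already been read from
ZooKeeper and parsed into a dict with the usual shards/replicas layout,
with the children of /live_nodes passed in as a list.

```python
def effective_replica_states(collection_state, live_nodes):
    """Return {replica_name: state}, forcing "down" when the node is not live.

    collection_state: parsed JSON for one collection (assumed layout:
        {"shards": {shard: {"replicas": {name: {"state", "node_name"}}}}})
    live_nodes: node names currently registered under /live_nodes
    """
    live = set(live_nodes)
    result = {}
    for shard in collection_state.get("shards", {}).values():
        for name, replica in shard.get("replicas", {}).items():
            state = replica.get("state", "down")
            # The published state is only trustworthy if the node is live.
            if replica.get("node_name") not in live:
                state = "down"
            result[name] = state
    return result


# Hypothetical data: both replicas are published as "active", but the
# node hosting core_node2 was killed with -9 and left live_nodes.
sample_state = {
    "shards": {
        "shard1": {
            "replicas": {
                "core_node1": {"state": "active", "node_name": "host1:8983_solr"},
                "core_node2": {"state": "active", "node_name": "host2:8983_solr"},
            }
        }
    }
}
sample_live = ["host1:8983_solr"]
states = effective_replica_states(sample_state, sample_live)
```

An alerting rule would then fire on any replica whose effective state
is not "active", rather than on the published state alone.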

BTW, this is 4.6, I suspect the eventual answer is to upgrade and
use the collections API CLUSTERSTATUS...
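
On a version that has CLUSTERSTATUS, a client could skip ZooKeeper and
ask Solr directly. A rough sketch, assuming the response is requested
as JSON and contains a "cluster" object with "collections" and
"live_nodes"; the helper names and the base URL are hypothetical, and
the HTTP call itself is left out so the example stays self-contained.

```python
from urllib.parse import urlencode


def clusterstatus_url(base_url, collection=None):
    """Build a Collections API CLUSTERSTATUS request URL."""
    params = {"action": "CLUSTERSTATUS", "wt": "json"}
    if collection:
        params["collection"] = collection
    return base_url.rstrip("/") + "/admin/collections?" + urlencode(params)


def nodes_not_live(response):
    """List node names that host replicas but are missing from live_nodes."""
    cluster = response["cluster"]
    live = set(cluster.get("live_nodes", []))
    referenced = set()
    for coll in cluster.get("collections", {}).values():
        for shard in coll.get("shards", {}).values():
            for replica in shard.get("replicas", {}).values():
                referenced.add(replica["node_name"])
    return sorted(referenced - live)


# Hypothetical base URL and collection name.
url = clusterstatus_url("http://localhost:8983/solr/", "c1")
```

The parsed response would be passed to nodes_not_live() to get the
nodes an alert should be raised for.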

I don't have strong feelings about this, mostly throwing it out for
discussion. I suppose the goal here is to keep any client from having
to directly look at the state.json file and provide APIs that conceal
this kind of thing.

Your point about the complexity of publishing state for other nodes
is well taken...


On Fri, Oct 23, 2015 at 10:29 AM, Shalin Shekhar Mangar wrote:
> This is expected and works as designed. We have enough complexity in
> publishing state for other nodes (LIR) and we shouldn't add any more.
> Besides what if the leader itself was killed, who changes the state
> then?
>
> What problem are you trying to solve?
>
> On Fri, Oct 23, 2015 at 10:19 PM, Erick Erickson wrote:
>> If I kill a replica with -9, the state.json node never gets updated,
>> the node shows as "active"
>>
>> There is code around that checks the live_nodes to see whether the
>> state.json node can be believed, and Varun pointed me at Solr JIRAs
>> for making sure CLUSTERSTATUS consults live_nodes, indicating that
>> this is something that's expected. But it seems trappy.
>>
>> My question is whether it's worth raising a JIRA. The leader could
>> notice a mismatch and update state.json or something like that.
>>
>> I'll raise a JIRA if it seems like something that should be discussed.
>>
>> Let me know,
>> Erick
>>
>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>




Re: killing a Solr instance leaves the state in Zookeeper "active"

2015-10-23 Thread Shalin Shekhar Mangar
This is expected and works as designed. We have enough complexity in
publishing state for other nodes (LIR) and we shouldn't add any more.
Besides what if the leader itself was killed, who changes the state
then?

What problem are you trying to solve?

On Fri, Oct 23, 2015 at 10:19 PM, Erick Erickson wrote:
> If I kill a replica with -9, the state.json node never gets updated,
> the node shows as "active"
>
> There is code around that checks the live_nodes to see whether the
> state.json node can be believed, and Varun pointed me at Solr JIRAs
> for making sure CLUSTERSTATUS consults live_nodes, indicating that
> this is something that's expected. But it seems trappy.
>
> My question is whether it's worth raising a JIRA. The leader could
> notice a mismatch and update state.json or something like that.
>
> I'll raise a JIRA if it seems like something that should be discussed.
>
> Let me know,
> Erick
>



-- 
Regards,
Shalin Shekhar Mangar.
