Re: Replica and node states

2015-03-25 Thread Shalin Shekhar Mangar
On Wed, Mar 25, 2015 at 9:24 PM, Shai Erera  wrote:

> >
> > There's even a param onlyIfDown=true which will remove a
> > replica only if it's already 'down'.
> >
>
> That will only work if the replica is in the DOWN state, correct? That is,
> if the Solr JVM was killed and the replica stays ACTIVE, but its node is
> not under /live_nodes, it won't get deleted? What I chose to do is delete
> the replica if its node is not under /live_nodes and I'm sure the node
> will never return.
>

Probably not, and we should fix it. It should be possible to delete replicas
that are not live, I guess. But there are more behaviors that need to be
defined, e.g. what happens if a node was down, you deleted the replica that
was supposed to be on it, and then the node came back up? Should we
re-create the replica automatically, or ask the node to delete the local
core and have something new assigned to it? Some of these behaviors are
what we informally call 'ZK as Truth' features, where we want to move to a
world where ZK is the source of truth and nodes modify their state and
cores depending on what's inside ZK.
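
To make that cleanup concrete, here is a rough SolrJ sketch of the approach
Shai describes (delete any replica whose node is missing from /live_nodes).
It assumes the 5.x-era CloudSolrClient constructor, an illustrative ZK
address and collection name, and issues a plain DELETEREPLICA call rather
than any official "cleanup" API -- treat it as a sketch, not as code from
this thread:

    import java.util.Set;
    import org.apache.solr.client.solrj.impl.CloudSolrClient;
    import org.apache.solr.client.solrj.request.QueryRequest;
    import org.apache.solr.common.cloud.ClusterState;
    import org.apache.solr.common.cloud.DocCollection;
    import org.apache.solr.common.cloud.Replica;
    import org.apache.solr.common.cloud.Slice;
    import org.apache.solr.common.params.ModifiableSolrParams;

    public class OrphanReplicaCleanup {
      public static void main(String[] args) throws Exception {
        // ZK address and collection name are placeholders.
        try (CloudSolrClient client = new CloudSolrClient("localhost:2181")) {
          client.connect();
          ClusterState state = client.getZkStateReader().getClusterState();
          Set<String> liveNodes = state.getLiveNodes();
          DocCollection coll = state.getCollection("mycollection");

          for (Slice slice : coll.getSlices()) {
            for (Replica replica : slice.getReplicas()) {
              // Keep replicas whose node is still under /live_nodes.
              if (liveNodes.contains(replica.getNodeName())) {
                continue;
              }
              // Issue DELETEREPLICA for the orphaned replica. Note that
              // onlyIfDown=true would not help here: such replicas are
              // often still recorded as 'active', as discussed above.
              ModifiableSolrParams params = new ModifiableSolrParams();
              params.set("action", "DELETEREPLICA");
              params.set("collection", coll.getName());
              params.set("shard", slice.getName());
              params.set("replica", replica.getName());
              QueryRequest request = new QueryRequest(params);
              request.setPath("/admin/collections");
              client.request(request);
            }
          }
        }
      }
    }

Only delete a replica this way when you are reasonably sure its node is gone
for good; as noted above, the behavior when such a node later rejoins the
cluster is not fully defined yet.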


>
> > No, there is no penalty because we always check for state=active and
> > liveness before routing any requests to a replica.
> >
>
> Well, that's also a penalty :), though I agree it's a minor one. There is
> also a penalty ZK-wise -- clusterstate.json still records these orphaned
> replicas, so I'll make sure I do this cleanup from time to time.
>
>
Yeah, but just to avoid any misunderstanding -- the live nodes are watched
by ZK, so checking liveness is a hash-set lookup, which is the cost, but a
small one. But yeah, you do need to clean up from time to time.
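
Just to make that check concrete, it boils down to something like the
following (an illustrative helper, not Solr's actual routing code):

    import java.util.Set;
    import org.apache.solr.common.cloud.Replica;

    final class ReplicaChecks {
      // A replica should receive requests only when its recorded state is
      // 'active' AND its node appears under /live_nodes. The live-nodes set
      // is watched and cached, so the check is just a hash-set lookup.
      static boolean isUsable(Replica replica, Set<String> liveNodes) {
        return "active".equals(replica.getStr("state"))
            && liveNodes.contains(replica.getNodeName());
      }
    }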


> Thanks for the responses and clarifications!
>
> Shai
>
> On Wed, Mar 25, 2015 at 11:39 PM, Shalin Shekhar Mangar <
> shalinman...@gmail.com> wrote:
>
> > On Wed, Mar 25, 2015 at 12:51 PM, Shai Erera  wrote:
> >
> > > Thanks.
> > >
> > > Does Solr ever clean up those states? I.e. does it ever remove "down"
> > > replicas, or replicas belonging to non-live nodes after some time? Or
> > > will these remain in the cluster state forever (assuming they never
> > > come back up)?
> > >
> >
> > No, they remain there forever. You can still call the DELETEREPLICA API
> > to clean them up. There's even a param onlyIfDown=true which will remove
> > a replica only if it's already 'down'.
> >
> >
> > >
> > > If they remain there, is there any penalty? E.g. Solr tries to send
> > > them updates, maybe tries to route search requests to them? I'm
> > > talking about replicas that stay in ACTIVE state, but their nodes
> > > aren't under /live_nodes.
> > >
> >
> > No, there is no penalty because we always check for state=active and
> > liveness before routing any requests to a replica.
> >
> >
> > >
> > > Shai
> > >
> > > On Wed, Mar 25, 2015 at 8:05 PM, Shalin Shekhar Mangar <
> > > shalinman...@gmail.com> wrote:
> > >
> > > > Comments inline:
> > > >
> > > > On Wed, Mar 25, 2015 at 8:30 AM, Shai Erera  wrote:
> > > >
> > > > > Hi
> > > > >
> > > > > Is it possible for a replica to be DOWN, while the node it
> > > > > resides on is under /live_nodes? If so, what can lead to it,
> > > > > aside from someone unloading a core?
> > > > >
> > > >
> > > > Yes, aside from someone unloading the index, this can happen in two
> > > > ways: 1) during startup each core publishes its state as 'down'
> > > > before it enters recovery, and 2) the leader force-publishes a
> > > > replica as 'down' if it is not able to forward updates to that
> > > > replica (this mechanism is called Leader-Initiated Recovery, or LIR
> > > > for short).
> > > >
> > > > The #2 above can happen when the replica is partitioned from the
> > > > leader but both are able to talk to ZooKeeper.
> > > >
> > > >
> > > > >
> > > > > I don't know if each SolrCore reports status to ZK independently,
> > > > > or it's done by the Solr process as a whole.
> > > > >
> > > > >
> > > > It is done on a per-core basis for now. But the 'live' node is
> > > > maintained once per Solr instance (JVM).
> > > >
> > > >
> > > > > Also, is it possible for a replica to report ACTIVE, while the
> > > > > node it lives on is no longer under /live_nodes? Are there any ZK
> > > > > timings that can cause that?
> > > > >
> > > >
> > > > Yes, this can happen if the JVM crashed. A replica publishes itself
> > > > as 'down' on shutdown, so if the graceful shutdown step is skipped
> > > > then the replica will continue to be 'active' in the cluster state.
> > > > Even LIR doesn't apply here because there's no point in the leader
> > > > marking a replica as 'down' if its node is not 'live' already.
> > > >
> > > >
> > > > >
> > > > > Shai
> > > > >
> > > >
> > > >
> > > > --
> > > > Regards,
> > > > Shalin Shekhar Mangar.
> > > >
> > >
> >
> >
> > --
> > Regards,
> > Shalin Shekhar Mangar.
> >
>



-- 
Regards,
Shalin Shekhar Mangar.

