Re: Functionality of legacyCloud=false
Right, it seems like DELETEREPLICA could handle this case. I know there has been some hardening done there lately, but I don't know whether it would cover this case. Or maybe I'm thinking of deleting collections...
-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Functionality of legacyCloud=false
how is copying a core dir from one node to another a normal use case?

--
Noble Paul
Re: Functionality of legacyCloud=false
It is totally possible. The point is, it was not a security feature and it is extremely easy to spoof. The question is, was it a normal scenario, or was it an effort to prove that the system was not foolproof?

--Noble
Re: Functionality of legacyCloud=false
Hi Noble,

Well, I was just playing around to see if there were scenarios where different coreNodeNames could register themselves even if they weren't created using the Collections API, so I was doing it intentionally here to see what happens. But I can totally imagine users running into the second scenario, where an old node comes back up and accidentally ends up messing up that replica in the collection.

--
Regards,
Varun Thacker
http://www.vthacker.in/
Re: Functionality of legacyCloud=false
bq. Or they're testing out restoring backups

This is in the context of the ZK-as-truth functionality. I guess in that case you expect those nodes to work exactly like the other replicas.

--
Noble Paul
Re: Functionality of legacyCloud=false
bq: how is copying a core dir from one node to another a normal use case ?

A user trying to move a replica from one place to another, although I agree they should use ADDREPLICA for the new replica and then DELETEREPLICA on the old one. Or they're testing out restoring backups. I've had clients do both of these things.
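The supported move Erick describes is two Collections API calls: ADDREPLICA on the target node, then DELETEREPLICA for the old replica. A minimal sketch that only builds the request URLs rather than issuing them; the host names, collection name, and coreNodeName are illustrative placeholders:

```python
from urllib.parse import urlencode

# Hypothetical cluster details -- substitute your own.
SOLR = "http://nodeB:8983/solr"

def collections_api(action, **params):
    """Build a Collections API request URL for the given action."""
    return f"{SOLR}/admin/collections?{urlencode({'action': action, **params})}"

# 1. Create the new replica of shard1 on the target node.
add = collections_api("ADDREPLICA", collection="test", shard="shard1",
                      node="nodeB:8983_solr")

# 2. Once it is active, remove the old replica by its coreNodeName.
delete = collections_api("DELETEREPLICA", collection="test", shard="shard1",
                         replica="core_node1")

print(add)
print(delete)
```

Issuing the ADDREPLICA first and waiting for the new replica to go active before the DELETEREPLICA means the shard never drops below its replica count during the move.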
Re: Functionality of legacyCloud=false
bq. how is copying a core dir from one node to another a normal use case ?

That was just for testing what happens. Okay, here is a real-world scenario:
- I create a collection.
- The collection fails to create since it had a bad config. The empty folders for the replicas get left behind.
- Now I fix the config and issue a create again. The replicas get created, but on different nodes of my cluster.
- In the future, if I bounce the nodes which had the leftover folders, they end up interfering with the healthy replicas for that collection.

So apart from checking coreNodeName, we should also check against baseUrl and make sure they are the same when legacyCloud=false. I will create a Jira for it.

--
Regards,
Varun Thacker
http://www.vthacker.in/
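The stricter check proposed above, requiring both coreNodeName and base_url to match clusterstate before a core may register, could look roughly like this. This is a minimal sketch in Python rather than Solr's actual Java code; the data shapes and function name are illustrative, not Solr's API:

```python
def may_register(clusterstate_replicas, core_node_name, base_url):
    """Return True only if clusterstate lists this exact replica on this node.

    clusterstate_replicas: mapping of coreNodeName -> replica properties,
    e.g. {"core_node1": {"base_url": "http://nodeA:8983/solr"}}.
    """
    replica = clusterstate_replicas.get(core_node_name)
    if replica is None:
        # Existing behaviour under legacyCloud=false: an unknown
        # coreNodeName is rejected ('no_such_replica in clusterstate').
        return False
    # Proposed extra check: the registering node must match base_url too,
    # so a core dir copied to another node cannot take over the replica.
    return replica["base_url"] == base_url

state = {"core_node1": {"base_url": "http://nodeA:8983/solr"}}

# Legitimate registration from node A passes...
print(may_register(state, "core_node1", "http://nodeA:8983/solr"))  # True
# ...but the copied core dir registering from node B is refused.
print(may_register(state, "core_node1", "http://nodeB:8983/solr"))  # False
```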
Functionality of legacyCloud=false
Two scenarios I observed where we can bring up a replica even when I think we shouldn't be able to. legacyCloud is set to false.

Scenario 1:
- I have two nodes, A and B.
- CREATE collection 'test' with 1 shard, 1 replica. It gets created on node A.
- Manually copy the test_shard1_replica1 folder to node B's solr home.
- Bring down node A.
- Restart node B.
The shard comes up, registers itself on node B, and becomes 'active'.

Scenario 2:
- I have two nodes, A and B (B is down currently).
- CREATE collection 'test' with 1 shard, 1 replica. It gets created on node A.
- Manually copy the test_shard1_replica1 folder to node B's solr home.
- Start node B.
The shard comes up, registers itself on node B, and stays 'down'. The reason is that the leader is still node A, but clusterstate has the base_url of node B. This is the error in the logs: "Error getting leader from zk for shard shard1".

With legacyCloud=false you get a 'no_such_replica in clusterstate' error if the coreNodeName is not present in clusterstate. But in my two observations the coreNodeNames were the same, hence I ran into these scenarios.

Should we make the check more stringent to not allow this to happen? Check against base_url also?

Also, should we make legacyCloud=false the default in 5.x?

--
Regards,
Varun Thacker
http://www.vthacker.in/
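The hole in both scenarios is that the legacyCloud=false check keys only on coreNodeName. A toy model of that check (illustrative names, not Solr's actual code) shows why the copied core dir slips through:

```python
def current_check(clusterstate_core_node_names, core_node_name):
    """legacyCloud=false today: reject only unknown coreNodeNames."""
    return core_node_name in clusterstate_core_node_names

# Clusterstate knows 'core_node1' (created on node A).
known = {"core_node1"}

# A core with a made-up coreNodeName is refused:
print(current_check(known, "core_node_spoof"))  # False -> 'no_such_replica'

# But the copied core dir carries the same core.properties, so its
# coreNodeName matches and node B is allowed to register it:
print(current_check(known, "core_node1"))  # True
```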