Re: Functionality of legacyCloud=false

2015-03-14 Thread Erick Erickson
Right, it seems like DELETEREPLICA could handle this case. I know
there has been some hardening done there lately, but I don't know if
it'd cover this case. Or maybe I'm thinking of deleting collections...

On Thu, Mar 12, 2015 at 10:26 AM, Varun Thacker
varunthacker1...@gmail.com wrote:
 bq. how is copying a core dir from one node to another a normal use case?

 That was just for testing what happens.

 Okay here is a real world scenario -

 I create a collection.
 The collection fails to create because it had a bad config. The empty
 folders for the replicas get left behind.
 Now I fix the config and issue a create again. The replicas get created but
 on different nodes of my cluster.
 In the future, if I bounce the nodes which had the leftover folders, they
 end up interfering with the healthy replicas for that collection.

 So apart from checking coreNodeName, we should also check against baseUrl
 and make sure they are the same when legacyCloud=false. I will create a
 Jira for it.

 On Thu, Mar 12, 2015 at 9:52 PM, Noble Paul noble.p...@gmail.com wrote:

 bq. Or they're testing out restoring backups

 This is in the context of the ZK-as-truth functionality. I guess, in that
 case, you expect those nodes to work exactly like the other replicas.

 On Thu, Mar 12, 2015 at 8:36 PM, Erick Erickson erickerick...@gmail.com
 wrote:

 bq: how is copying a core dir from one node to another a normal use case?

 A user is trying to move a replica from one place to another. While I
 agree they should use ADDREPLICA for the new one and then DELETEREPLICA on
 the old replica...

 Or they're testing out restoring backups.

 I've had clients do both of these things.

 On Thu, Mar 12, 2015 at 7:00 AM, Noble Paul noble.p...@gmail.com wrote:
  how is copying a core dir from one node to another a normal use case?
 
  On Mar 12, 2015 7:22 PM, Varun Thacker varunthacker1...@gmail.com
  wrote:
 
  Hi Noble,
 
  Well I was just playing around to see if there were scenarios where
  different coreNodeNames could register themselves even if they weren't
  created using the Collections API.
 
  So I was doing it intentionally here to see what happens. But I can
  totally imagine users running into the second scenario where an old
  node
  comes back up and ends up messing up that replica in the collection
  accidentally.
 
  On Thu, Mar 12, 2015 at 7:01 PM, Noble Paul noble.p...@gmail.com
  wrote:
 
  It is totally possible.
  The point is, it was not a security feature and it is extremely easy to
  spoof it.
  The question is, was it a normal scenario or was it an effort to prove
  that the system was not foolproof?
 
  --Noble
 
  On Thu, Mar 12, 2015 at 6:23 PM, Varun Thacker
  varunthacker1...@gmail.com wrote:
 
  Two scenarios I observed where we can bring up a replica even when I
  think we shouldn't be able to. legacyCloud is set to false.
 
  I have two nodes A and B.
  CREATE collection 'test' with 1 shard, 1 replica. It gets created on
  node A.
  manually copy test_shard1_replica1 folder to node B's solr home.
  Bring down node A.
  Restart node B. The shard comes up registering itself on node B and
  becomes 'active'
 
  I have two nodes A and B ( this is down currently ).
  CREATE collection 'test' with 1 shard, 1 replica. It gets created on
  node A.
  manually copy test_shard1_replica1 folder to node B's solr home.
  Start node B. The shard comes up, registering itself on node B, and
  stays 'down'. The reason is that the leader is still node A but the
  clusterstate has the base_url of node B. This is the error in the
  logs - Error getting leader from zk for shard shard1
 
  In legacyCloud=false you get a 'no_such_replica in clusterstate'
  error
  if the 'coreNodeName' is not present in clusterstate.
 
  But in my two observations the 'coreNodeName' was the same, hence I ran
  into that scenario.
 
  Should we make the check more stringent to not allow this to happen?
  Check against base_url also?
 
  Also, should we make legacyCloud=false the default in 5.x?
  --
 
 
  Regards,
  Varun Thacker
  http://www.vthacker.in/
 
 
 
 
  --
  -
  Noble Paul
 
 
 
 
  --
 
 
  Regards,
  Varun Thacker
  http://www.vthacker.in/

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org




 --
 -
 Noble Paul




 --


 Regards,
 Varun Thacker
 http://www.vthacker.in/




Re: Functionality of legacyCloud=false

2015-03-12 Thread Noble Paul
how is copying a core dir from one node to another a normal use case?
On Mar 12, 2015 7:22 PM, Varun Thacker varunthacker1...@gmail.com wrote:

 Hi Noble,

 Well I was just playing around to see if there were scenarios where
 different coreNodeNames could register themselves even if they weren't
 created using the Collections API.

 So I was doing it intentionally here to see what happens. But I can
 totally imagine users running into the second scenario where an old node
 comes back up and ends up messing up that replica in the collection
 accidentally.

 On Thu, Mar 12, 2015 at 7:01 PM, Noble Paul noble.p...@gmail.com wrote:

 It is totally possible.
 The point is, it was not a security feature and it is extremely easy to
 spoof it.
 The question is, was it a normal scenario or was it an effort to prove
 that the system was not foolproof?

 --Noble

 On Thu, Mar 12, 2015 at 6:23 PM, Varun Thacker 
 varunthacker1...@gmail.com wrote:

 Two scenarios I observed where we can bring up a replica even when I
 think we shouldn't be able to. legacyCloud is set to false.

- I have two nodes A and B.
- CREATE collection 'test' with 1 shard, 1 replica. It gets created
on node A.
- manually copy test_shard1_replica1 folder to node B's solr home.
- Bring down node A.
- Restart node B. The shard comes up registering itself on node B
and becomes 'active'


- I have two nodes A and B ( this is down currently ).
- CREATE collection 'test' with 1 shard, 1 replica. It gets created
on node A.
- manually copy test_shard1_replica1 folder to node B's solr home.
- Start node B. The shard comes up, registering itself on node B, and
stays 'down'. The reason is that the leader is still node A but the
clusterstate has the base_url of node B. This is the error in the logs -
Error getting leader from zk for shard shard1

 In legacyCloud=false you get a 'no_such_replica in clusterstate' error
 if the 'coreNodeName' is not present in clusterstate.

 But in my two observations the 'coreNodeName' was the same, hence I ran
 into that scenario.

 Should we make the check more stringent to not allow this to happen?
 Check against base_url also?

 Also, should we make legacyCloud=false the default in 5.x?
 --


 Regards,
 Varun Thacker
 http://www.vthacker.in/




 --
 -
 Noble Paul




 --


 Regards,
 Varun Thacker
 http://www.vthacker.in/



Re: Functionality of legacyCloud=false

2015-03-12 Thread Noble Paul
It is totally possible.
The point is, it was not a security feature and it is extremely easy to
spoof it.
The question is, was it a normal scenario or was it an effort to prove
that the system was not foolproof?

--Noble

On Thu, Mar 12, 2015 at 6:23 PM, Varun Thacker varunthacker1...@gmail.com
wrote:

 Two scenarios I observed where we can bring up a replica even when I think
 we shouldn't be able to. legacyCloud is set to false.

- I have two nodes A and B.
- CREATE collection 'test' with 1 shard, 1 replica. It gets created on
node A.
- manually copy test_shard1_replica1 folder to node B's solr home.
- Bring down node A.
- Restart node B. The shard comes up registering itself on node B and
becomes 'active'


- I have two nodes A and B ( this is down currently ).
- CREATE collection 'test' with 1 shard, 1 replica. It gets created on
node A.
- manually copy test_shard1_replica1 folder to node B's solr home.
- Start node B. The shard comes up, registering itself on node B, and
stays 'down'. The reason is that the leader is still node A but the
clusterstate has the base_url of node B. This is the error in the logs -
Error getting leader from zk for shard shard1

 In legacyCloud=false you get a 'no_such_replica in clusterstate' error if
 the 'coreNodeName' is not present in clusterstate.

 But in my two observations the 'coreNodeName' was the same, hence I ran
 into that scenario.

 Should we make the check more stringent to not allow this to happen? Check
 against base_url also?

 Also, should we make legacyCloud=false the default in 5.x?
 --


 Regards,
 Varun Thacker
 http://www.vthacker.in/




-- 
-
Noble Paul


Re: Functionality of legacyCloud=false

2015-03-12 Thread Varun Thacker
Hi Noble,

Well I was just playing around to see if there were scenarios where
different coreNodeNames could register themselves even if they weren't
created using the Collections API.

So I was doing it intentionally here to see what happens. But I can totally
imagine users running into the second scenario where an old node comes back
up and ends up messing up that replica in the collection accidentally.

On Thu, Mar 12, 2015 at 7:01 PM, Noble Paul noble.p...@gmail.com wrote:

 It is totally possible.
 The point is, it was not a security feature and it is extremely easy to
 spoof it.
 The question is, was it a normal scenario or was it an effort to prove
 that the system was not foolproof?

 --Noble

 On Thu, Mar 12, 2015 at 6:23 PM, Varun Thacker varunthacker1...@gmail.com
  wrote:

 Two scenarios I observed where we can bring up a replica even when I
 think we shouldn't be able to. legacyCloud is set to false.

- I have two nodes A and B.
- CREATE collection 'test' with 1 shard, 1 replica. It gets created
on node A.
- manually copy test_shard1_replica1 folder to node B's solr home.
- Bring down node A.
- Restart node B. The shard comes up registering itself on node B and
becomes 'active'


- I have two nodes A and B ( this is down currently ).
- CREATE collection 'test' with 1 shard, 1 replica. It gets created
on node A.
- manually copy test_shard1_replica1 folder to node B's solr home.
- Start node B. The shard comes up, registering itself on node B, and
stays 'down'. The reason is that the leader is still node A but the
clusterstate has the base_url of node B. This is the error in the logs -
Error getting leader from zk for shard shard1

 In legacyCloud=false you get a 'no_such_replica in clusterstate' error if
 the 'coreNodeName' is not present in clusterstate.

 But in my two observations the 'coreNodeName' was the same, hence I ran
 into that scenario.

 Should we make the check more stringent to not allow this to happen?
 Check against base_url also?

 Also, should we make legacyCloud=false the default in 5.x?
 --


 Regards,
 Varun Thacker
 http://www.vthacker.in/




 --
 -
 Noble Paul




-- 


Regards,
Varun Thacker
http://www.vthacker.in/


Re: Functionality of legacyCloud=false

2015-03-12 Thread Noble Paul
bq. Or they're testing out restoring backups

This is in the context of the ZK-as-truth functionality. I guess, in that
case, you expect those nodes to work exactly like the other replicas.

On Thu, Mar 12, 2015 at 8:36 PM, Erick Erickson erickerick...@gmail.com
wrote:

 bq: how is copying a core dir from one node to another a normal use case?

 A user is trying to move a replica from one place to another. While I
 agree they should use ADDREPLICA for the new one and then DELETEREPLICA on
 the old replica...

 Or they're testing out restoring backups.

 I've had clients do both of these things.

 On Thu, Mar 12, 2015 at 7:00 AM, Noble Paul noble.p...@gmail.com wrote:
  how is copying a core dir from one node to another a normal use case?
 
  On Mar 12, 2015 7:22 PM, Varun Thacker varunthacker1...@gmail.com
 wrote:
 
  Hi Noble,
 
  Well I was just playing around to see if there were scenarios where
  different coreNodeNames could register themselves even if they weren't
  created using the Collections API.
 
  So I was doing it intentionally here to see what happens. But I can
  totally imagine users running into the second scenario where an old node
  comes back up and ends up messing up that replica in the collection
  accidentally.
 
  On Thu, Mar 12, 2015 at 7:01 PM, Noble Paul noble.p...@gmail.com
 wrote:
 
  It is totally possible.
  The point is, it was not a security feature and it is extremely easy to
  spoof it.
  The question is, was it a normal scenario or was it an effort to prove
  that the system was not foolproof?
 
  --Noble
 
  On Thu, Mar 12, 2015 at 6:23 PM, Varun Thacker
  varunthacker1...@gmail.com wrote:
 
  Two scenarios I observed where we can bring up a replica even when I
  think we shouldn't be able to. legacyCloud is set to false.
 
  I have two nodes A and B.
  CREATE collection 'test' with 1 shard, 1 replica. It gets created on
  node A.
  manually copy test_shard1_replica1 folder to node B's solr home.
  Bring down node A.
  Restart node B. The shard comes up registering itself on node B and
  becomes 'active'
 
  I have two nodes A and B ( this is down currently ).
  CREATE collection 'test' with 1 shard, 1 replica. It gets created on
  node A.
  manually copy test_shard1_replica1 folder to node B's solr home.
  Start node B. The shard comes up, registering itself on node B, and
  stays 'down'. The reason is that the leader is still node A but the
  clusterstate has the base_url of node B. This is the error in the
  logs - Error getting leader from zk for shard shard1
 
  In legacyCloud=false you get a 'no_such_replica in clusterstate' error
  if the 'coreNodeName' is not present in clusterstate.
 
  But in my two observations the 'coreNodeName' was the same, hence I ran
  into that scenario.
 
  Should we make the check more stringent to not allow this to happen?
  Check against base_url also?
 
  Also, should we make legacyCloud=false the default in 5.x?
  --
 
 
  Regards,
  Varun Thacker
  http://www.vthacker.in/
 
 
 
 
  --
  -
  Noble Paul
 
 
 
 
  --
 
 
  Regards,
  Varun Thacker
  http://www.vthacker.in/





-- 
-
Noble Paul


Re: Functionality of legacyCloud=false

2015-03-12 Thread Erick Erickson
bq: how is copying a core dir from one node to another a normal use case?

A user is trying to move a replica from one place to another. While I
agree they should use ADDREPLICA for the new one and then DELETEREPLICA on
the old replica...

Or they're testing out restoring backups.

I've had clients do both of these things.
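
The two supported steps can be sketched as Collections API request URLs (a
sketch only; the node address, collection name, and coreNodeName below are
hypothetical placeholders, and nothing is actually sent):

```python
from urllib.parse import urlencode

def collections_api_url(base_url, action, **params):
    """Build a Solr Collections API request URL (nothing is sent here)."""
    query = urlencode(dict(action=action, **params))
    return base_url.rstrip("/") + "/admin/collections?" + query

# Moving shard1 of collection 'test' in two supported steps:
base = "http://nodeB:8983/solr"  # hypothetical node address
add = collections_api_url(base, "ADDREPLICA", collection="test",
                          shard="shard1", node="nodeB:8983_solr")
delete = collections_api_url(base, "DELETEREPLICA", collection="test",
                             shard="shard1", replica="core_node1")
print(add)
print(delete)
```

Issuing the ADDREPLICA first means the new replica recovers from the leader
before the old one is removed, so the shard never loses its only copy.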

On Thu, Mar 12, 2015 at 7:00 AM, Noble Paul noble.p...@gmail.com wrote:
 how is copying a core dir from one node to another a normal use case?

 On Mar 12, 2015 7:22 PM, Varun Thacker varunthacker1...@gmail.com wrote:

 Hi Noble,

 Well I was just playing around to see if there were scenarios where
 different coreNodeNames could register themselves even if they weren't
 created using the Collections API.

 So I was doing it intentionally here to see what happens. But I can
 totally imagine users running into the second scenario where an old node
 comes back up and ends up messing up that replica in the collection
 accidentally.

 On Thu, Mar 12, 2015 at 7:01 PM, Noble Paul noble.p...@gmail.com wrote:

 It is totally possible.
 The point is, it was not a security feature and it is extremely easy to
 spoof it.
 The question is, was it a normal scenario or was it an effort to prove
 that the system was not foolproof?

 --Noble

 On Thu, Mar 12, 2015 at 6:23 PM, Varun Thacker
 varunthacker1...@gmail.com wrote:

 Two scenarios I observed where we can bring up a replica even when I
 think we shouldn't be able to. legacyCloud is set to false.

 I have two nodes A and B.
 CREATE collection 'test' with 1 shard, 1 replica. It gets created on
 node A.
 manually copy test_shard1_replica1 folder to node B's solr home.
 Bring down node A.
 Restart node B. The shard comes up registering itself on node B and
 becomes 'active'

 I have two nodes A and B ( this is down currently ).
 CREATE collection 'test' with 1 shard, 1 replica. It gets created on
 node A.
 manually copy test_shard1_replica1 folder to node B's solr home.
 Start node B. The shard comes up, registering itself on node B, and stays
 'down'. The reason is that the leader is still node A but the clusterstate
 has the base_url of node B. This is the error in the logs - Error getting
 leader from zk for shard shard1

 In legacyCloud=false you get a 'no_such_replica in clusterstate' error
 if the 'coreNodeName' is not present in clusterstate.

 But in my two observations the 'coreNodeName' was the same, hence I ran
 into that scenario.

 Should we make the check more stringent to not allow this to happen?
 Check against base_url also?

 Also, should we make legacyCloud=false the default in 5.x?
 --


 Regards,
 Varun Thacker
 http://www.vthacker.in/




 --
 -
 Noble Paul




 --


 Regards,
 Varun Thacker
 http://www.vthacker.in/




Re: Functionality of legacyCloud=false

2015-03-12 Thread Varun Thacker
bq. how is copying a core dir from one node to another a normal use case?

That was just for testing what happens.

Okay here is a real world scenario -

   - I create a collection.
   - The collection fails to create because it had a bad config. The empty
   folders for the replicas get left behind.
   - Now I fix the config and issue a create again. The replicas get
   created but on different nodes of my cluster.
   - In the future, if I bounce the nodes which had the leftover folders,
   they end up interfering with the healthy replicas for that collection.

So apart from checking coreNodeName, we should also check against baseUrl
and make sure they are the same when legacyCloud=false. I will create a
Jira for it.
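
The proposed stricter check could look roughly like this (a simplified
sketch, not Solr's actual code; the clusterstate is modeled as a plain dict
and the node URLs are hypothetical):

```python
def may_register(clusterstate, core_node_name, base_url):
    """Return True only if this core is a known replica AND it registers
    from the node the clusterstate records for it (the proposed check)."""
    replica = clusterstate.get(core_node_name)
    if replica is None:
        return False  # today's legacyCloud=false check: no_such_replica
    # proposed addition: the registering node must match the recorded base_url
    return replica["base_url"] == base_url

state = {"core_node1": {"base_url": "http://nodeA:8983/solr"}}
print(may_register(state, "core_node1", "http://nodeA:8983/solr"))  # legitimate replica
print(may_register(state, "core_node1", "http://nodeB:8983/solr"))  # copied/leftover dir
print(may_register(state, "core_node2", "http://nodeA:8983/solr"))  # unknown coreNodeName
```

With this, a leftover folder bounced on the wrong node would be rejected
even though its coreNodeName is valid.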

On Thu, Mar 12, 2015 at 9:52 PM, Noble Paul noble.p...@gmail.com wrote:

 bq. Or they're testing out restoring backups

 This is in the context of the ZK-as-truth functionality. I guess, in that
 case, you expect those nodes to work exactly like the other replicas.

 On Thu, Mar 12, 2015 at 8:36 PM, Erick Erickson erickerick...@gmail.com
 wrote:

 bq: how is copying a core dir from one node to another a normal use case?

 A user is trying to move a replica from one place to another. While I
 agree they should use ADDREPLICA for the new one and then DELETEREPLICA on
 the old replica...

 Or they're testing out restoring backups.

 I've had clients do both of these things.

 On Thu, Mar 12, 2015 at 7:00 AM, Noble Paul noble.p...@gmail.com wrote:
  how is copying a core dir from one node to another a normal use case?
 
  On Mar 12, 2015 7:22 PM, Varun Thacker varunthacker1...@gmail.com
 wrote:
 
  Hi Noble,
 
  Well I was just playing around to see if there were scenarios where
  different coreNodeNames could register themselves even if they weren't
  created using the Collections API.
 
  So I was doing it intentionally here to see what happens. But I can
  totally imagine users running into the second scenario where an old
 node
  comes back up and ends up messing up that replica in the collection
  accidentally.
 
  On Thu, Mar 12, 2015 at 7:01 PM, Noble Paul noble.p...@gmail.com
 wrote:
 
  It is totally possible.
  The point is, it was not a security feature and it is extremely easy to
  spoof it.
  The question is, was it a normal scenario or was it an effort to prove
  that the system was not foolproof?
 
  --Noble
 
  On Thu, Mar 12, 2015 at 6:23 PM, Varun Thacker
  varunthacker1...@gmail.com wrote:
 
  Two scenarios I observed where we can bring up a replica even when I
  think we shouldn't be able to. legacyCloud is set to false.
 
  I have two nodes A and B.
  CREATE collection 'test' with 1 shard, 1 replica. It gets created on
  node A.
  manually copy test_shard1_replica1 folder to node B's solr home.
  Bring down node A.
  Restart node B. The shard comes up registering itself on node B and
  becomes 'active'
 
  I have two nodes A and B ( this is down currently ).
  CREATE collection 'test' with 1 shard, 1 replica. It gets created on
  node A.
  manually copy test_shard1_replica1 folder to node B's solr home.
  Start node B. The shard comes up, registering itself on node B, and
  stays 'down'. The reason is that the leader is still node A but the
  clusterstate has the base_url of node B. This is the error in the
  logs - Error getting leader from zk for shard shard1
 
  In legacyCloud=false you get a 'no_such_replica in clusterstate'
 error
  if the 'coreNodeName' is not present in clusterstate.
 
  But in my two observations the 'coreNodeName' was the same, hence I ran
  into that scenario.
 
  Should we make the check more stringent to not allow this to happen?
  Check against base_url also?
 
  Also, should we make legacyCloud=false the default in 5.x?
  --
 
 
  Regards,
  Varun Thacker
  http://www.vthacker.in/
 
 
 
 
  --
  -
  Noble Paul
 
 
 
 
  --
 
 
  Regards,
  Varun Thacker
  http://www.vthacker.in/





 --
 -
 Noble Paul




-- 


Regards,
Varun Thacker
http://www.vthacker.in/


Functionality of legacyCloud=false

2015-03-12 Thread Varun Thacker
Two scenarios I observed where we can bring up a replica even when I think
we shouldn't be able to. legacyCloud is set to false.

   - I have two nodes A and B.
   - CREATE collection 'test' with 1 shard, 1 replica. It gets created on
   node A.
   - manually copy test_shard1_replica1 folder to node B's solr home.
   - Bring down node A.
   - Restart node B. The shard comes up registering itself on node B and
   becomes 'active'


   - I have two nodes A and B ( this is down currently ).
   - CREATE collection 'test' with 1 shard, 1 replica. It gets created on
   node A.
   - manually copy test_shard1_replica1 folder to node B's solr home.
   - Start node B. The shard comes up, registering itself on node B, and
   stays 'down'. The reason is that the leader is still node A but the
   clusterstate has the base_url of node B. This is the error in the logs -
   Error getting leader from zk for shard shard1

In legacyCloud=false you get a 'no_such_replica in clusterstate' error if
the 'coreNodeName' is not present in clusterstate.

But in my two observations the 'coreNodeName' was the same, hence I ran
into that scenario.
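
A sketch of why the copied core slips through today (with the clusterstate
simplified to a dict, as an illustration only: membership of coreNodeName is
the only thing tested, so a base_url mismatch is never seen):

```python
def current_check(clusterstate, core_node_name):
    """legacyCloud=false today: reject only if coreNodeName is unknown."""
    return core_node_name in clusterstate

state = {"core_node1": {"base_url": "http://nodeA:8983/solr"}}
# The core dir copied to node B carries the same coreNodeName, so it
# registers successfully even though its base_url differs from the state.
print(current_check(state, "core_node1"))
```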

Should we make the check more stringent to not allow this to happen? Check
against base_url also?

Also, should we make legacyCloud=false the default in 5.x?
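
For reference, legacyCloud is a cluster property and can be toggled through
the Collections API's CLUSTERPROP action; a sketch of the request URL (the
host is a placeholder, and nothing is sent here):

```python
from urllib.parse import urlencode

# CLUSTERPROP sets a cluster-wide property; here, legacyCloud=false.
params = urlencode({"action": "CLUSTERPROP",
                    "name": "legacyCloud",
                    "val": "false"})
url = "http://localhost:8983/solr/admin/collections?" + params
print(url)
```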
--


Regards,
Varun Thacker
http://www.vthacker.in/