Re: Clone (or Restore) Solrcloud

2014-02-03 Thread Shalin Shekhar Mangar
Hi David,

The parent metadata persists only until the sub-shards become active.
The logic that makes the sub-shards active depends on knowing
when all 'sibling' sub-shards' replicas have recovered successfully.
We store the parent to make that easier to look up. Once all replicas
of all sub-shards have recovered, the shard states are updated. The
'updateshardstate' command also removes the 'parent' key from the
sub-shards while switching them to 'active'.
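Here is a simplified Python model of the rule described above. It is a hypothetical sketch, not Solr's Overseer code. The function name `promote_sub_shards`, the dict layout (shard name mapped to `state`, `parent`, and `replicas`, loosely modeled on clusterstate.json), and the `construction` state for not-yet-active sub-shards are all assumptions made for illustration.

```python
from typing import Any, Dict


def promote_sub_shards(shards: Dict[str, Dict[str, Any]], parent: str) -> bool:
    """Hypothetical model: activate all sub-shards of `parent` at once.

    Mirrors the rule in the thread: sibling sub-shards switch to "active"
    together, only after every replica of every sibling has recovered,
    and the switch also drops their "parent" key.
    """
    siblings = [name for name, s in shards.items() if s.get("parent") == parent]
    if not siblings:
        return False
    # The parent key is what lets us find all siblings; every replica of
    # every sibling must report "active" before anything changes.
    for name in siblings:
        replicas = shards[name].get("replicas", {})
        if not replicas or any(r.get("state") != "active" for r in replicas.values()):
            return False
    # Flip states: parent becomes inactive, sub-shards become active
    # and no longer carry a "parent" reference.
    if parent in shards:
        shards[parent]["state"] = "inactive"
    for name in siblings:
        shards[name]["state"] = "active"
        del shards[name]["parent"]
    return True
```

If one sibling still has a recovering replica, the call returns `False` and leaves every shard untouched; once it recovers, a second call flips all siblings together.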

If you're seeing the 'parent' key on an 'active' sub-shard, then it may
be a bug. Please paste your clusterstate.json and I'll look into why it was
left over.
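One quick way to look for such leftovers before pasting the whole file is to scan it. The sketch below assumes the Solr 4.x clusterstate.json layout (collection name, then `shards`, then a per-shard `state` and an optional `parent`). `find_leftover_parents` is a hypothetical helper and not part of Solr.

```python
import json
from typing import List, Tuple


def find_leftover_parents(clusterstate_json: str) -> List[Tuple[str, str, str]]:
    """Return (collection, shard, parent) for every shard that is already
    'active' but still carries a 'parent' key (hypothetical helper)."""
    state = json.loads(clusterstate_json)
    leftovers = []
    for coll_name, coll in state.items():
        for shard_name, shard in coll.get("shards", {}).items():
            # Per the explanation above, 'parent' should be gone once active.
            if shard.get("state") == "active" and "parent" in shard:
                leftovers.append((coll_name, shard_name, shard["parent"]))
    return leftovers
```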

On Mon, Feb 3, 2014 at 10:19 AM, David Smiley (@MITRE.org)
dsmi...@mitre.org wrote:
 I think I figured this out; I hope people find this useful.

 It may not be possible to declare the hash ranges when you create
 the collection, but you *can* do so when you split a shard, via the 'ranges'
 parameter, which takes a comma-delimited list. This means you can create a
 new collection with one shard and then immediately split it into the desired
 ranges to line up with those of your backup. I also observed that if you
 create a collection and then split every shard in two, the result is
 equivalent to a collection that was created with twice as many shards to
 begin with. I hoped that was so, and I verified that the ranges end up being
 the same both ways.

 The only thing I'm not 100% certain is benign: when you split a shard,
 each new sub-shard gets a 'parent' reference to the name of the shard it
 was split from, and that reference remains even if you delete the parent
 shard (which is no longer needed once it becomes inactive). I'm not sure
 why this metadata is recorded because, at least after the split, I can't
 see why it's pertinent to anything.

 ~ David


 David Smiley (@MITRE.org) wrote
 Hi,

 I'm attempting to come up with a SolrCloud restore / clone process, either
 to recover to a known good state or to clone the environment for
 experimentation. At the moment my process involves either creating a new
 ZooKeeper environment or at least deleting the existing collection so that
 I can create a new one. This works. I use the Core API: the first command
 defines the collection parameters, and I invoke it once for each replica.
 I don't use the Collection API because I don't want SolrCloud to go off
 creating all the replicas on its own; I know where each one is pre-positioned.

 What I'm concerned about is what happens once I start using shard
 splitting, *especially* if I don't want to split all shards because they
 are uneven due to custom routing (e.g. id:customer!myid). In that case
 I don't know how to create the collection with the post-split hash
 ranges. Solr doesn't have an API for me to explicitly say what the hash
 range of each shard should be (to match up with a backup). And I'm
 concerned about undocumented pitfalls in manually constructing a
 clusterstate.json, as another approach.

 Any ideas?

 ~ David
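For readers reconstructing the Core API approach described above, here is a sketch that builds one CoreAdmin CREATE URL per pre-positioned replica. It assumes the Solr 4.x CoreAdmin parameters `name`, `collection`, `shard`, `numShards` and `collection.configName`. The host names, core names, config name, and the helper `core_create_url` are hypothetical. Only the URLs are built here; nothing is sent to Solr.

```python
from typing import Optional
from urllib.parse import urlencode


def core_create_url(solr_base: str, core: str, collection: str, shard: str,
                    num_shards: Optional[int] = None,
                    config_name: Optional[str] = None) -> str:
    """Build a CoreAdmin CREATE URL for one replica (hypothetical helper)."""
    params = {"action": "CREATE", "name": core,
              "collection": collection, "shard": shard}
    # Assumption: collection-level settings are passed only on the first
    # call, the one that defines the collection parameters.
    if num_shards is not None:
        params["numShards"] = str(num_shards)
    if config_name is not None:
        params["collection.configName"] = config_name
    return f"{solr_base}/admin/cores?{urlencode(params)}"


# Hypothetical placement: (node base URL, core name, shard) for each replica.
placements = [
    ("http://host1:8983/solr", "coll_shard1_replica1", "shard1"),
    ("http://host2:8983/solr", "coll_shard1_replica2", "shard1"),
    ("http://host2:8983/solr", "coll_shard2_replica1", "shard2"),
]
urls = [
    core_create_url(base, core, "coll", shard,
                    num_shards=2 if i == 0 else None,
                    config_name="myconf" if i == 0 else None)
    for i, (base, core, shard) in enumerate(placements)
]
```

Because each replica is created by an explicit call against the node that should host it, placement stays fully under the operator's control.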





 -
  Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Clone-or-Restore-Solrcloud-tp4114773p4114983.html
 Sent from the Solr - User mailing list archive at Nabble.com.



-- 
Regards,
Shalin Shekhar Mangar.


Re: Clone (or Restore) Solrcloud

2014-02-02 Thread David Smiley (@MITRE.org)
I think I figured this out; I hope people find this useful.

It may not be possible to declare the hash ranges when you create
the collection, but you *can* do so when you split a shard, via the 'ranges'
parameter, which takes a comma-delimited list. This means you can create a
new collection with one shard and then immediately split it into the desired
ranges to line up with those of your backup. I also observed that if you
create a collection and then split every shard in two, the result is
equivalent to a collection that was created with twice as many shards to
begin with. I hoped that was so, and I verified that the ranges end up being
the same both ways.
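The arithmetic behind both observations can be sketched in Python, under several assumptions. `partition_range` is a simplified even split modeled loosely on Solr's `DocRouter.partitionRange`; some Solr versions round range boundaries, so real splits may differ slightly. `to_hex` mimics how clusterstate.json prints ranges, as hex of the unsigned 32-bit value. The collection, shard, and host names are hypothetical. The sketch checks that splitting each half of the 32-bit hash space in two yields the same four ranges as partitioning into four directly. It also builds a SPLITSHARD URL whose `ranges` value would recreate the two halves.

```python
from typing import List, Tuple
from urllib.parse import urlencode

INT_MIN, INT_MAX = -2**31, 2**31 - 1  # full signed 32-bit hash space


def partition_range(lo: int, hi: int, n: int) -> List[Tuple[int, int]]:
    """Evenly divide the inclusive range [lo, hi] into n contiguous pieces.

    Simplified after DocRouter.partitionRange (no boundary rounding); the
    last piece always ends exactly on hi.
    """
    step = (hi - lo) // n
    ranges, start = [], lo
    for i in range(n):
        end = hi if i == n - 1 else start + step
        ranges.append((start, end))
        start = end + 1
    return ranges


def to_hex(r: Tuple[int, int]) -> str:
    # Print each bound like Java's Integer.toHexString (two's complement).
    return "-".join(format(v & 0xFFFFFFFF, "x") for v in r)


def split_url(solr_base: str, collection: str, shard: str,
              ranges: List[Tuple[int, int]]) -> str:
    """SPLITSHARD URL with an explicit comma-delimited 'ranges' value."""
    params = {"action": "SPLITSHARD", "collection": collection, "shard": shard,
              "ranges": ",".join(to_hex(r) for r in ranges)}
    return f"{solr_base}/admin/collections?{urlencode(params)}"


two = partition_range(INT_MIN, INT_MAX, 2)
four = partition_range(INT_MIN, INT_MAX, 4)
# Splitting every shard of a 2-shard layout in two gives the 4-shard layout.
assert [r for half in two for r in partition_range(*half, 2)] == four
```

To mirror a backup, you would pass the backup's own range strings, taken from its clusterstate.json, rather than computed ones.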

The only thing I'm not 100% certain is benign: when you split a shard,
each new sub-shard gets a 'parent' reference to the name of the shard it
was split from, and that reference remains even if you delete the parent
shard (which is no longer needed once it becomes inactive). I'm not sure
why this metadata is recorded because, at least after the split, I can't
see why it's pertinent to anything.

~ David

