Re: Setting up new SolrCloud - need some guidance

2013-01-11 Thread Mark Miller

On Jan 10, 2013, at 12:06 PM, Shawn Heisey s...@elyograg.org wrote:

 On 1/9/2013 8:54 PM, Mark Miller wrote:
 I'd put everything into one. You can upload different named sets of config 
 files and point collections either to the same sets or different sets.
 
 You can really think about it the same way you would setting up a single 
 node with multiple cores. The main difference is that it's easier to share 
 sets of config files across collections if you want to. You don't need to at 
 all though.
 
 I'm not sure if xinclude works with zk, but I don't think it does.
 
 Thank you for your assistance.  I'll work on recombining my solrconfig.xml.  
 Are there any available full examples of how to set up and start both 
 zookeeper and Solr?  I'll be using the included Jetty 8.

I'm not sure - there are a few blog posts out there. The wiki does a decent job 
for Solr but doesn't get into ZooKeeper - the ZooKeeper site has a pretty 
simple setup guide though.

 
 Specific questions that have come to mind:
 
 If I'm planning multiple collections with their own configs, do I still need 
 to bootstrap zookeeper when I start Solr, or should I start it up with the 
 zkHost parameter and then use the collection admin to upload information?  I 
 have not looked closely at the collection admin yet, I just know that it 
 exists.

Currently, there are two main options: either use the bootstrap param on first 
startup, or use the zkcli command-line tool to upload config sets and link 
them to collections.
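
To make the two options concrete, here is a minimal Python sketch that only builds the command lines and does not run anything. The system properties (zkHost, bootstrap_confdir, collection.configName) and the zkcli flags (-cmd, -zkhost, -confdir, -confname, -collection) are written as I recall them from the Solr 4.x example setup, so check them against your version. The ZooKeeper addresses, the script path, and the config and collection names are hypothetical.

```python
# Sketch only: builds argument lists for the two options, never executes them.
ZK_HOST = "zk1:2181,zk2:2181,zk3:2181"  # hypothetical ensemble address
ZKCLI = "cloud-scripts/zkcli.sh"        # assumed path, relative to the example dir


def bootstrap_command(conf_dir, conf_name, zk_host=ZK_HOST):
    """Option 1: upload one config set via system properties on first startup."""
    return [
        "java",
        f"-DzkHost={zk_host}",
        f"-Dbootstrap_confdir={conf_dir}",
        f"-Dcollection.configName={conf_name}",
        "-jar", "start.jar",
    ]


def upconfig_command(conf_dir, conf_name, zk_host=ZK_HOST):
    """Option 2, step 1: upload a named config set with zkcli."""
    return [ZKCLI, "-cmd", "upconfig", "-zkhost", zk_host,
            "-confdir", conf_dir, "-confname", conf_name]


def linkconfig_command(collection, conf_name, zk_host=ZK_HOST):
    """Option 2, step 2: point a collection at an uploaded config set."""
    return [ZKCLI, "-cmd", "linkconfig", "-zkhost", zk_host,
            "-collection", collection, "-confname", conf_name]


if __name__ == "__main__":
    # Hypothetical names: one config set "typeA" used by collection "coll1".
    for cmd in (
        bootstrap_command("./solr/collection1/conf", "typeA"),
        upconfig_command("/path/to/config/typeA", "typeA"),
        linkconfig_command("coll1", "typeA"),
    ):
        print(" ".join(cmd))
```

With several collections that have their own configs, the second option repeats the upconfig step once per config set and the linkconfig step once per collection.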

 
 I have heard that if a replica node is down long enough that transaction logs 
 are not enough to fully fix that node, SolrCloud will initiate a full 
 replication.  Is that the case?  If so, is it necessary to configure the 
 replication handler with a specific path for the name, or does SolrCloud 
 handle that itself?

The replication handler should be defined as you see it in the default example 
solrconfig.xml file. Very bare bones.

 
 Is there an option on updateLog that controls how many transactions are kept, 
 or is that managed automatically by SolrCloud?  I have read some things that 
 talk about 100 updates.  I expect updates on this to be extremely frequent 
 and small, so 100 updates isn't much, and I may want to increase that.

No option - 100 is it, because raising it has implications for the recovery 
strategy. I'd like to see it configurable in the future, but that would 
require changing some other knobs as well, if I remember right.

 
 Is it expected with future versions of Solr that I could upgrade one of my 
 nodes to 4.2 or 4.3 and have it work with the other node still at 4.1?  I 
 would also hope that would mean that the last 4.x release would work with 
 5.0.  That would make it possible to do rolling upgrades with no downtime.

I don't think we have committed to anything here yet. It seems like something 
we need to hash out, but we have not wanted to be too limited initially. For 
example, the Solr 4.0 to 4.1 upgrade with SolrCloud still needs some 
explanation and might require some downtime.

- Mark

 
 Thanks,
 Shawn
 



RE: Setting up new SolrCloud - need some guidance

2013-01-11 Thread Markus Jelsma
FYI: XInclude works fine. We have all request handlers in solrconfig in 
separate files and include them via XInclude on a running SolrCloud cluster. 
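
For readers unfamiliar with the mechanism, here is a small Python sketch of how an XInclude reference is resolved against a set of named files. It uses Python's standard xml.etree.ElementInclude, not Solr's own resolver, so it illustrates the general idea only. The file names and contents are hypothetical, and the dict merely stands in for one config set whose files are stored side by side.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical config set: file name -> content.
CONFIG_SET = {
    "solrconfig.xml": (
        '<config xmlns:xi="http://www.w3.org/2001/XInclude">'
        '<xi:include href="include/indexConfig.xml"/>'
        "</config>"
    ),
    "include/indexConfig.xml": (
        "<indexConfig><ramBufferSizeMB>32</ramBufferSizeMB></indexConfig>"
    ),
}


def make_loader(config_set):
    """Return an XInclude loader that looks hrefs up in the given config set."""
    def loader(href, parse, encoding=None):
        text = config_set[href]  # KeyError if the href is not in this set
        return ET.fromstring(text) if parse == "xml" else text
    return loader


def load_config(config_set, name="solrconfig.xml"):
    """Parse the named file and expand its xi:include elements in place."""
    root = ET.fromstring(config_set[name])
    ElementInclude.include(root, loader=make_loader(config_set))
    return root


if __name__ == "__main__":
    root = load_config(CONFIG_SET)
    print(ET.tostring(root, encoding="unicode"))
```

Because the loader only sees the files of the set it was built from, an href in one set cannot pick up a file from another set in this sketch. Whether SolrCloud partitions config sets the same way is not something this example establishes.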
 


Re: Setting up new SolrCloud - need some guidance

2013-01-11 Thread Shawn Heisey

On 1/11/2013 9:15 AM, Markus Jelsma wrote:

FYI: XInclude works fine. We have all request handlers in solrconfig in 
separate files and include them via XInclude on a running SolrCloud cluster.


Good to know.  I'm still deciding whether I want to recombine or 
continue to use xinclude.  Is the xinclude path relative to 
solrconfig.xml just as it is now, so I could link to 
include/indexConfig.xml?  Are things partitioned well enough that one 
collection's config will not overlap into another config when using 
xinclude and relative paths?


The way I do things now, all files in cores/corename/conf (relative to 
solr.home) are symlinks, such as solrconfig.xml -> 
../../../config/X/solrconfig.xml, where X is a general 
designation for a type of config.  I have good separation between 
instanceDir, data, and real config files.  The paths in the xinclude 
elements are relative to the location of the symlink.


Thanks,
Shawn



Re: Setting up new SolrCloud - need some guidance

2013-01-10 Thread Shawn Heisey

On 1/9/2013 8:54 PM, Mark Miller wrote:

I'd put everything into one. You can upload different named sets of config 
files and point collections either to the same sets or different sets.

You can really think about it the same way you would setting up a single node 
with multiple cores. The main difference is that it's easier to share sets of 
config files across collections if you want to. You don't need to at all though.

I'm not sure if xinclude works with zk, but I don't think it does.


Thank you for your assistance.  I'll work on recombining my 
solrconfig.xml.  Are there any available full examples of how to set up 
and start both zookeeper and Solr?  I'll be using the included Jetty 8.


Specific questions that have come to mind:

If I'm planning multiple collections with their own configs, do I still 
need to bootstrap zookeeper when I start Solr, or should I start it up 
with the zkHost parameter and then use the collection admin to upload 
information?  I have not looked closely at the collection admin yet, I 
just know that it exists.


I have heard that if a replica node is down long enough that transaction 
logs are not enough to fully fix that node, SolrCloud will initiate a 
full replication.  Is that the case?  If so, is it necessary to 
configure the replication handler with a specific path for the name, or 
does SolrCloud handle that itself?


Is there an option on updateLog that controls how many transactions are 
kept, or is that managed automatically by SolrCloud?  I have read some 
things that talk about 100 updates.  I expect updates on this to be 
extremely frequent and small, so 100 updates isn't much, and I may want 
to increase that.


Is it expected with future versions of Solr that I could upgrade one of 
my nodes to 4.2 or 4.3 and have it work with the other node still at 
4.1?  I would also hope that would mean that the last 4.x release would 
work with 5.0.  That would make it possible to do rolling upgrades with 
no downtime.


Thanks,
Shawn



Re: Setting up new SolrCloud - need some guidance

2013-01-09 Thread Mark Miller
I'd put everything into one. You can upload different named sets of config 
files and point collections either to the same sets or different sets.

You can really think about it the same way you would setting up a single node 
with multiple cores. The main difference is that it's easier to share sets of 
config files across collections if you want to. You don't need to at all though.

I'm not sure if xinclude works with zk, but I don't think it does.

- Mark

On Jan 9, 2013, at 10:31 PM, Shawn Heisey s...@elyograg.org wrote:

 I have a lot of experience with Solr, starting with 1.4.0 and currently 
 running 3.5.0 in production.  I am working on a 4.1 upgrade, but I have not 
 touched SolrCloud at all.
 
 I now need to set up a brand new Solr deployment to replace a custom Lucene 
 system, and due to the way the client works, SolrCloud is going to be the 
 only reasonable way to have redundancy.  I am planning to have two Solr 
 servers (each also running standalone zookeeper) plus a third low-end machine 
 that will complete the zookeeper ensemble.  I'm planning to set it up with 
 numShards=1, replica 2.
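
The reason for the third, low-end ZooKeeper machine can be shown with a little arithmetic. ZooKeeper needs a strict majority of the ensemble to be up, so this minimal Python sketch computes the quorum size and the number of failures an ensemble survives. The function names are illustrative only.

```python
def quorum_size(ensemble_size):
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """How many servers can be down while a majority is still up."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    for n in (2, 3, 5):
        print(f"ensemble={n} quorum={quorum_size(n)} "
              f"tolerated_failures={tolerated_failures(n)}")
```

A two-server ensemble tolerates no failures, while three servers tolerate one, which is why two Solr machines alone would not give ZooKeeper any redundancy.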
 
 It will need to support several different collections.  Although it's 
 possible that those collections will all use the same schema and config at 
 first, it's likely that they will diverge before too long.
 
 What would be the best practice for setting up zookeeper for this? Would I 
 use multiple zk chroots, or put everything into one?  I've been trying to 
 figure this out on my own, without much luck.  Can anyone share some known 
 good ZK/SolrCloud configs?
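
For reference, a chroot is expressed as a path suffix on the ZooKeeper connection string passed as zkHost. The Python sketch below only assembles such a string. The host names and the /solr chroot are hypothetical.

```python
def zk_host(servers, chroot=""):
    """Join host:port pairs and append an optional chroot path once at the end."""
    if chroot and not chroot.startswith("/"):
        raise ValueError("chroot must start with '/'")
    return ",".join(servers) + chroot.rstrip("/")


if __name__ == "__main__":
    servers = ["zk1:2181", "zk2:2181", "zk3:2181"]  # hypothetical ensemble
    print(zk_host(servers))           # everything at the ZooKeeper root
    print(zk_host(servers, "/solr"))  # everything under one chroot
```

The chroot appears once, after the last host, and applies to the whole ensemble rather than to each server.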
 
 What gotchas am I likely to run into?  The existing config that I've come up 
 with for this system heavily uses xinclude in solrconfig.xml. Is it possible 
 to use xinclude when the config files are in zookeeper, or will I have to 
 re-combine it?
 
 Thanks,
 Shawn