Re: Setting up new SolrCloud - need some guidance
On Jan 10, 2013, at 12:06 PM, Shawn Heisey s...@elyograg.org wrote:

> On 1/9/2013 8:54 PM, Mark Miller wrote:
>> I'd put everything into one. You can upload different named sets of config files and point collections either to the same sets or different sets. You can really think about it the same way you would setting up a single node with multiple cores. The main difference is that it's easier to share sets of config files across collections if you want to. You don't need to at all though.
>>
>> I'm not sure if xinclude works with zk, but I don't think it does.
>
> Thank you for your assistance. I'll work on recombining my solrconfig.xml.
>
> Are there any available full examples of how to set up and start both zookeeper and Solr? I'll be using the included Jetty 8.

I'm not sure - there are a few blog posts out there. The wiki does a decent job for Solr but doesn't get into ZooKeeper. The ZooKeeper site has a pretty simple setup guide though.

> Specific questions that have come to mind:
>
> If I'm planning multiple collections with their own configs, do I still need to bootstrap zookeeper when I start Solr, or should I start it up with the zkHost parameter and then use the collection admin to upload information? I have not looked closely at the collection admin yet, I just know that it exists.

Currently, there are two main options. Either use the bootstrap param on first startup, or use the zkcli command-line tool to upload config sets and link them to collections.

> I have heard that if a replica node is down long enough that transaction logs are not enough to fully fix that node, SolrCloud will initiate a full replication. Is that the case? If so, is it necessary to configure the replication handler with a specific path for the name, or does SolrCloud handle that itself?

The replication handler should be defined as you see it in the default example solrconfig.xml file. Very bare bones.

> Is there an option on updateLog that controls how many transactions are kept, or is that managed automatically by SolrCloud? I have read some things that talk about 100 updates. I expect updates on this to be extremely frequent and small, so 100 updates isn't much, and I may want to increase that.

No option. 100 is it, because raising it has implications for the recovery strategy. I'd like to see it configurable in the future, but that would require changing some other knobs as well, if I remember right.

> Is it expected with future versions of Solr that I could upgrade one of my nodes to 4.2 or 4.3 and have it work with the other node still at 4.1? I would also hope that would mean that the last 4.x release would work with 5.0. That would make it possible to do rolling upgrades with no downtime.

I don't think we have committed to anything here yet. It seems like something we need to hash out, but we have not wanted to be too limited initially. For example, the Solr 4.0 to 4.1 upgrade with SolrCloud still needs some explanation and might require some downtime.

- Mark

> Thanks,
> Shawn
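
As an illustration of the second option mentioned above (uploading config sets with zkcli and linking them to collections), here is a minimal Python sketch that only assembles the command lines; it does not run them. It assumes the `org.apache.solr.cloud.ZkCLI` class with its `upconfig` and `linkconfig` commands as shipped with Solr 4.x. The host names, classpath, directory, collection name, and config set name are hypothetical placeholders, so check them against your own installation before using anything like this.

```python
import shlex

# Main class behind Solr's zkcli script in Solr 4.x.
ZKCLI_CLASS = "org.apache.solr.cloud.ZkCLI"


def upconfig_cmd(zk_host, conf_dir, conf_name, classpath):
    """Build the command that uploads a local conf directory as a named config set."""
    return [
        "java", "-classpath", classpath, ZKCLI_CLASS,
        "-cmd", "upconfig",
        "-zkhost", zk_host,
        "-confdir", conf_dir,
        "-confname", conf_name,
    ]


def linkconfig_cmd(zk_host, collection, conf_name, classpath):
    """Build the command that points a collection at an uploaded config set."""
    return [
        "java", "-classpath", classpath, ZKCLI_CLASS,
        "-cmd", "linkconfig",
        "-zkhost", zk_host,
        "-collection", collection,
        "-confname", conf_name,
    ]


if __name__ == "__main__":
    # Every value below is a hypothetical placeholder, not taken from the thread.
    zk = "zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181"
    cp = "/opt/solr/zkcli-lib/*"
    for cmd in (
        upconfig_cmd(zk, "/opt/solr/config/X/conf", "configX", cp),
        linkconfig_cmd(zk, "collectionA", "configX", cp),
    ):
        # Quote each argument so the printed line is safe to paste into a shell.
        print(" ".join(shlex.quote(arg) for arg in cmd))
```

Several collections can be linked to the same config set name, or each can get its own, which matches the "same sets or different sets" description earlier in the thread.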
RE: Setting up new SolrCloud - need some guidance
FYI: XInclude works fine. We have all request handlers in solrconfig in separate files and include them via XInclude on a running SolrCloud cluster.

-----Original message-----
From: Mark Miller markrmil...@gmail.com
Sent: Fri 11-Jan-2013 17:13
To: solr-user@lucene.apache.org
Subject: Re: Setting up new SolrCloud - need some guidance
Re: Setting up new SolrCloud - need some guidance
On 1/11/2013 9:15 AM, Markus Jelsma wrote:

> FYI: XInclude works fine. We have all request handlers in solrconfig in separate files and include them via XInclude on a running SolrCloud cluster.

Good to know. I'm still deciding whether I want to recombine or continue to use xinclude. Is the xinclude path relative to solrconfig.xml just as it is now, so I could link to include/indexConfig.xml? Are things partitioned well enough that one collection's config will not overlap into another config when using xinclude and relative paths?

The way I do things now, all files in cores/corename/conf (relative to solr.home) are symlinks, such as solrconfig.xml -> ../../../config/X/solrconfig.xml, where X is a general designation for a type of config. I have good separation between instanceDir, data, and real config files. The paths in the xinclude elements are relative to the location of the symlink.

Thanks,
Shawn
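
To make the relative-path question concrete, here is a small sketch of generic XInclude expansion using Python's standard `xml.etree.ElementInclude` with a custom loader. It is not Solr's resolver and says nothing definitive about how Solr resolves hrefs against ZooKeeper. It only shows the general idea that an `href` such as `include/indexConfig.xml` is looked up by name inside one set of files. The file contents and the in-memory `CONFIG_FILES` dictionary are assumptions made up for the example.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical stand-in for one named config set; keys are paths relative to its root.
CONFIG_FILES = {
    "solrconfig.xml": (
        '<config xmlns:xi="http://www.w3.org/2001/XInclude">'
        '<xi:include href="include/indexConfig.xml"/>'
        "</config>"
    ),
    "include/indexConfig.xml": (
        "<indexConfig><ramBufferSizeMB>64</ramBufferSizeMB></indexConfig>"
    ),
}


def load_from_config_set(href, parse, encoding=None):
    """Loader called by ElementInclude: resolve href inside this config set only."""
    if parse != "xml":
        raise ValueError("this sketch only handles parse='xml'")
    return ET.fromstring(CONFIG_FILES[href])


def expand(name):
    """Parse a file from the config set and expand its xi:include elements in place."""
    root = ET.fromstring(CONFIG_FILES[name])
    ElementInclude.include(root, loader=load_from_config_set)
    return root


if __name__ == "__main__":
    expanded = expand("solrconfig.xml")
    print(ET.tostring(expanded, encoding="unicode"))
```

Because the loader only ever sees names from its own dictionary, a second config set with its own dictionary could reuse the same relative paths without the two overlapping.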
Re: Setting up new SolrCloud - need some guidance
On 1/9/2013 8:54 PM, Mark Miller wrote:

> I'd put everything into one. You can upload different named sets of config files and point collections either to the same sets or different sets. You can really think about it the same way you would setting up a single node with multiple cores. The main difference is that it's easier to share sets of config files across collections if you want to. You don't need to at all though.
>
> I'm not sure if xinclude works with zk, but I don't think it does.

Thank you for your assistance. I'll work on recombining my solrconfig.xml.

Are there any available full examples of how to set up and start both zookeeper and Solr? I'll be using the included Jetty 8.

Specific questions that have come to mind:

If I'm planning multiple collections with their own configs, do I still need to bootstrap zookeeper when I start Solr, or should I start it up with the zkHost parameter and then use the collection admin to upload information? I have not looked closely at the collection admin yet, I just know that it exists.

I have heard that if a replica node is down long enough that transaction logs are not enough to fully fix that node, SolrCloud will initiate a full replication. Is that the case? If so, is it necessary to configure the replication handler with a specific path for the name, or does SolrCloud handle that itself?

Is there an option on updateLog that controls how many transactions are kept, or is that managed automatically by SolrCloud? I have read some things that talk about 100 updates. I expect updates on this to be extremely frequent and small, so 100 updates isn't much, and I may want to increase that.

Is it expected with future versions of Solr that I could upgrade one of my nodes to 4.2 or 4.3 and have it work with the other node still at 4.1? I would also hope that would mean that the last 4.x release would work with 5.0. That would make it possible to do rolling upgrades with no downtime.

Thanks,
Shawn
Re: Setting up new SolrCloud - need some guidance
I'd put everything into one. You can upload different named sets of config files and point collections either to the same sets or different sets. You can really think about it the same way you would setting up a single node with multiple cores. The main difference is that it's easier to share sets of config files across collections if you want to. You don't need to at all though.

I'm not sure if xinclude works with zk, but I don't think it does.

- Mark

On Jan 9, 2013, at 10:31 PM, Shawn Heisey s...@elyograg.org wrote:

> I have a lot of experience with Solr, starting with 1.4.0 and currently running 3.5.0 in production. I am working on a 4.1 upgrade, but I have not touched SolrCloud at all.
>
> I now need to set up a brand new Solr deployment to replace a custom Lucene system, and due to the way the client works, SolrCloud is going to be the only reasonable way to have redundancy. I am planning to have two Solr servers (each also running standalone zookeeper) plus a third low-end machine that will complete the zookeeper ensemble. I'm planning to set it up with numShards=1 and two replicas.
>
> It will need to support several different collections. Although it's possible that those collections will all use the same schema and config at first, it's likely that they will diverge before too long.
>
> What would be the best practice for setting up zookeeper for this? Would I use multiple zk chroots, or put everything into one? I've been trying to figure this out on my own, without much luck. Can anyone share some known good ZK/SolrCloud configs? What gotchas am I likely to run into?
>
> The existing config that I've come up with for this system heavily uses xinclude in solrconfig.xml. Is it possible to use xinclude when the config files are in zookeeper, or will I have to re-combine it?
>
> Thanks,
> Shawn
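
The three-machine ZooKeeper layout described above can be sketched as follows. ZooKeeper needs a strict majority of ensemble members to be up, which is why a third low-end machine is worth adding to the two Solr servers. The host names and the `/solr` chroot are hypothetical; 2181 is ZooKeeper's default client port. The helper functions are illustrative and not part of any Solr or ZooKeeper API.

```python
def zk_host_string(hosts, port=2181, chroot=""):
    """Build a zkHost-style connection string; a chroot, if any, is appended once at the end."""
    servers = ",".join(f"{host}:{port}" for host in hosts)
    return servers + chroot


def quorum_size(ensemble_size):
    """Smallest number of ZooKeeper servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """How many servers can be lost while a majority is still available."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    # Hypothetical host names: two Solr machines plus one small ZooKeeper-only box.
    hosts = ["solr1.example.com", "solr2.example.com", "zk3.example.com"]
    print(zk_host_string(hosts))
    print(zk_host_string(hosts, chroot="/solr"))
    for n in (2, 3):
        print(n, "servers: quorum", quorum_size(n), "tolerates", tolerated_failures(n))
```

With only the two Solr machines in the ensemble, losing either one would leave ZooKeeper without a majority. With three, any single machine can go down.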