Re: SolrCloud "master mode" planned?

Upayavira Wed, 26 Apr 2017 14:17:37 -0700

On Wed, 26 Apr 2017, at 10:06 PM, David Smiley wrote:
> 
>> On Apr 26, 2017, at 4:35 PM, Upayavira <u...@odoko.co.uk> wrote:
>> 
>> I have done a *lot* of automating this. Redoing it recently it was
>> quite embarrassing to realise how much complexity there is involved
>> in it - it is crazy hard to get a basic, production ready SolrCloud
>> setup running.
>
> Would you mind enumerating a list of what sort of issues you ran into
> deploying ZooKeeper in a production config?  A quick draft list of
> sorts just to get a sense of what sort of stuff generally you had to
> contend with.  I recently did it in a Docker/Kontena infrastructure.
> I did not find it to be hard; maybe medium :-).  I got the nodes
> working out of the box with minimal effort but had to make changes to
> harden it.
> * I found the existing official Docker image for ZooKeeper lacking in
>   that I couldn't easily specify the "auto purge" settings, which
>   default to no purging, which is unacceptable.
> * I set "-XX:+CrashOnOutOfMemoryError" so that the process would end
>   when an OOM occurs so that Kontena (Docker orchestrator) would
>   notice it's down so it could restart it (a rare event obviously).
>   Users not using a container environment might not care about this I
>   guess.  This was merely a configuration setting; no Docker image
>   hack needed.
> * I also ensured I used the latest ZK 3.4.6 release.... I recall 3.4.4
>   (or maybe even 3.4.5?) cached DNS entries without re-looking up if
>   it failed, which is particularly problematic in a container
>   environment where it's common for services to get a new IP when they
>   are restarted.  Thankfully I did not learn that issue the hard way;
>   I recall a blog warning of this issue by Shalin or Martijn Koster.
>   No action from me here other than ensuring I used an appropriate new
>   version.  Originally out of laziness I used Confluent's Docker image
>   but I knew I would have to switch because of this issue.

I used it for a test case for an app I built when learning more about
deployments. It is all on GitHub at
http://github.com/odoko-devops/uberstack. There's an example there in
examples/apache-solr. I gave up on that effort because (a) I was making
it more complex than it needed to be and (b) I couldn't compete with the
big guns in the devops industry.

What I needed was to have three ZooKeeper nodes start up and
autodiscover each other. That, I handled with Exhibitor (from Netflix).
They provide a (not production-ready!!) Docker image that, whilst it
takes a few minutes, does result in ZK nodes that are working as an
ensemble, even though none of them know about each other at startup. It
requires an S3 bucket to provide the co-ordination.

The real lesson was "don't fail, retry". I built a wrapper around Solr
so that if ZK wasn't available (and in the correct ensemble) Solr
wouldn't start yet. It just kept retrying. Similarly, in
https://github.com/odoko-devops/solr-utils there's a tool that, when
containerised, will allow you to create a chroot (must be done after ZK
but before Solr), to upload configs (must be done after Solr, but
before collection creation), create a collection (only once configs
are present), etc.
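
A minimal sketch of the "don't fail, retry" idea, not the actual wrapper or the solr-utils tool: wait_until polls a readiness check instead of exiting on the first failure, and bootstrap runs steps in a fixed order (for example chroot, then configs, then collection), waiting for each step's precondition first. zk_is_ok uses ZooKeeper's "ruok" four-letter command, which only shows that the server is running, not that the ensemble has a quorum; newer ZooKeeper releases may also need that command whitelisted. All function names here are hypothetical. In use, each step would pair a check such as lambda: zk_is_ok("zookeeper") with whatever action creates the chroot, uploads the configs, or creates the collection.

```python
import socket
import time


def zk_is_ok(host, port=2181, timeout=2.0):
    """Send ZooKeeper's 'ruok' four-letter command; True if the reply is 'imok'."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"ruok")
            return sock.recv(16) == b"imok"
    except OSError:
        # Connection refused, DNS failure, timeout: treat all as "not ready yet".
        return False


def wait_until(check, interval=5.0, max_attempts=None, sleep=time.sleep):
    """Call check() until it returns True, sleeping between attempts.

    Returns the number of attempts used. Raises TimeoutError only if
    max_attempts is set and exhausted; by default it retries forever.
    """
    attempts = 0
    while True:
        attempts += 1
        if check():
            return attempts
        if max_attempts is not None and attempts >= max_attempts:
            raise TimeoutError(f"not ready after {attempts} attempts")
        sleep(interval)


def bootstrap(steps, **wait_kwargs):
    """Run (name, check, action) steps in order.

    For each step, wait until its precondition check passes, then run
    its action. Returns the names of the completed steps.
    """
    done = []
    for name, check, action in steps:
        wait_until(check, **wait_kwargs)
        action()
        done.append(name)
    return done
```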

The setup used Rancher to provide an overlay network with DNS used for
service discovery - the ZK nodes were accessible to Solr via the
hostname "zookeeper", so Solr didn't have to do anything fancy in order
to find them.

The outcome of this was a reliable one-click install of three ZK nodes,
three SolrCloud nodes, indexed content, and a webapp showing the
content. Was pretty cool.

Or should I say, the outcome was cool; the process to get there
was painful.

Happy to share more of this if it is useful.

Upayavira

>> One thing that is hard is getting a ZooKeeper ensemble going - using
>> Exhibitor makes it much easier.
>>
>> Something that has often occurred to me is, why do we require people
>> to go download a separate ZooKeeper, and work out how to install and
>> configure it, when we have it embedded already? Why can't we just
>> have a 'bin/solr zk start' command which starts an "embedded"
>> ZooKeeper, but without Solr? To really make it neat, we could offer
>> some way (a la Exhibitor) for multiple concurrently started ZK nodes
>> to autodiscover each other; then getting our three ZK nodes up won't
>> be quite so treacherous.
>
> I've often thought the same -- why not just embed it.  People say it's
> not a "production config" but this is only because we all keep telling
> ourselves this in an echo chamber and we believe ourselves :-P
>
> ~ David
> 
>> 
>> On Wed, 26 Apr 2017, at 03:58 PM, Mike Drob wrote:
>>> Could the zk role also be guaranteed to run the Overseer (and no
>>> collections)? If we already have that separated out, it would make
>>> sense to put it with the embedded zk. I think you can already
>>> configure and place things manually this way, but it would be a huge
>>> win to package it all up nicely for users and set it to turnkey
>>> operation.
>>>
>>> I think it was a great improvement for deployment when we dropped
>>> Tomcat; this is the next logical step.
>>>
>>> Mike
>>> 
>>> On Wed, Apr 26, 2017, 4:22 AM Jan Høydahl <jan....@cominvent.com>
>>> wrote:
>>>> There have been suggestions to add a “node controller” process
>>>> which again could start Solr and perhaps ZK on a node.
>>>>
>>>> But adding a new “zk” role which would let that node start
>>>> (embedded) ZK I cannot recall. It would of course make a deploy
>>>> simpler if ZK was hidden as a solr role/feature and perhaps
>>>> assigned to N nodes, moved if needed etc. If I’m not mistaken ZK
>>>> 3.5 would make such more dynamic setups easier but is currently in
>>>> beta.
>>>>
>>>> Also, in these days of containers, I kind of like the concept of
>>>> spinning up N ZK containers that the Solr containers connect to and
>>>> let Kubernetes or whatever you use take care of placement, versions
>>>> etc. So perhaps the need for a production-ready solr-managed zk is
>>>> not as big as it used to be, or maybe even undesirable? For
>>>> production Windows installs I could still clearly see a need
>>>> though.
>>>>
>>>> --
>>>> Jan Høydahl, search solution architect
>>>> Cominvent AS - www.cominvent.com[1]
>>>> 
>>>>> On 25 Apr 2017, at 23:30, Ishan Chattopadhyaya
>>>>>     <ichattopadhy...@gmail.com> wrote:
>>>>>
>>>>> Hi Otis,
>>>>> I've been working on, and shall be working on, a few issues along
>>>>> the lines of "hide ZK".
>>>>>
>>>>> SOLR-6736: Uploading configsets can now be done through Solr nodes
>>>>> instead of uploading them to ZK.
>>>>> SOLR-10272: Use a _default configset, with the intention of not
>>>>> needing the user to bother about the concept of configsets unless
>>>>> he needs to
>>>>> SOLR-10446 (SOLR-9057): User can use CloudSolrClient without
>>>>> access to ZK
>>>>> SOLR-8440: Enabling BasicAuth security through bin/solr script
>>>>> Ability to edit security.json through the bin/solr script
>>>>> With all this in place, and perhaps some more that I may be
>>>>> missing, the user should hopefully not need to know much about ZK.
>>>>>
>>>>> 1. Do you have suggestions on what more needs to be done for
>>>>>    "hiding ZK"?
>>>>> 2. Do you have suggestions on how to track this overall theme of
>>>>>    "hiding ZK"? Some of these issues I mentioned are associated
>>>>>    with other epics, so I don't know if creating a "hiding ZK"
>>>>>    epic and having these (and other issues) as sub-tasks is a good
>>>>>    idea (maybe it is). Alternatively, how about tracking these
>>>>>    (and other issues) using some label?
>>>>>
>>>>> Regards,
>>>>> Ishan
>>>>> 
>>>>> 
>>>>> 
>>>>> On Wed, Apr 26, 2017 at 2:39 AM, Otis Gospodnetić
>>>>> <otis.gospodne...@gmail.com> wrote:
>>>>>> Hi,
>>>>>> 
>>>>>> This thread about Solr master-slave vs. SolrCloud deployment poll
>>>>>> seems to point out people find SolrCloud (the ZK part of it)
>>>>>> deployment complex:
>>>>>>
>>>>>> http://search-lucene.com/m/Solr/eHNlfm4WpJPVR92?subj=Re+Poll+Master+Slave+or+SolrCloud+
>>>>>>
>>>>>> It could be just how information is presented...
>>>>>> ... or how ZK is exposed as something external, which it is...
>>>>>> 
>>>>>> Are there plans to "hide ZK"?  Or maybe have the notion of master-
>>>>>> only (not as in master-slave, but as in running ZK only, not
>>>>>> hosting data) mode for SolrCloud nodes (a la ES)?
>>>>>>
>>>>>> I peeked at JIRA, but couldn't find anything about that, although
>>>>>> I seem to recall some mention of embedding ZK to make things
>>>>>> easier for SolrCloud users.  I think I saw that at some Lucene
>>>>>> Revolution talk?
>>>>>>
>>>>>> Thanks,
>>>>>> Otis
>>>>>> --
>>>>>> Monitoring - Log Management - Alerting - Anomaly Detection
>>>>>> Solr & Elasticsearch Consulting Support Training -
>>>>>> http://sematext.com/
>>>>>>
>> 


Links:

  1. http://www.cominvent.com/