Re: When does Solr write in Zookeeper ?

2019-11-18 Thread Dominique Bejean
Thank you, Shawn




Re: When does Solr write in Zookeeper ?

2019-11-18 Thread Shawn Heisey

On 11/18/2019 8:39 AM, Dominique Bejean wrote:

> How do Solr nodes know that something was changed in ZooKeeper by another
> node? Is there any notification from ZK, or do Solr nodes systematically
> read from ZK (without local caching)?


This is built-in functionality of ZooKeeper.  The client allows setting 
what's called watches, which trigger when the watched node changes.


https://zookeeper.apache.org/doc/r3.5.6/zookeeperProgrammers.html#sc_zkDataMode_watches

This functionality is used extensively in SolrCloud.
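
A toy illustration of the watch mechanism described above. This is an in-memory Python model, not the ZooKeeper client API: the class `ToyZk` and its methods `get_data` and `set_data` are made-up names. It assumes the one-shot behaviour of standard ZooKeeper watches, where a watch fires once on the next change and then has to be set again.

```python
from collections import defaultdict


class ToyZk:
    """In-memory stand-in for a ZooKeeper ensemble (hypothetical, for illustration)."""

    def __init__(self):
        self._data = {}                      # path -> bytes
        self._watches = defaultdict(list)    # path -> callbacks waiting for a change

    def get_data(self, path, watch=None):
        """Return the value at path; optionally register a one-shot watch."""
        if watch is not None:
            self._watches[path].append(watch)
        return self._data.get(path)

    def set_data(self, path, value):
        """Store a value, then fire and clear every watch set on that path."""
        self._data[path] = value
        callbacks, self._watches[path] = self._watches[path], []
        for callback in callbacks:
            callback(path)


events = []
zk = ToyZk()
zk.set_data("/collections/demo/state.json", b"v1")

# A "Solr node" reads the znode and leaves a watch behind.
zk.get_data("/collections/demo/state.json", watch=events.append)

zk.set_data("/collections/demo/state.json", b"v2")  # watch fires
zk.set_data("/collections/demo/state.json", b"v3")  # no watch left, nothing fires
print(events)  # ['/collections/demo/state.json']
```

Because the watch is consumed by the first change, a client that wants to keep following a znode re-registers the watch when it re-reads the data.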

Thanks,
Shawn


Re: When does Solr write in Zookeeper ?

2019-11-18 Thread Dominique Bejean
How do Solr nodes know that something was changed in ZooKeeper by another
node? Is there any notification from ZK, or do Solr nodes systematically
read from ZK (without local caching)?

Dominique





Re: When does Solr write in Zookeeper ?

2019-11-16 Thread Vincenzo D'Amore
Hi Dominique,

In my experience with Solr 4.8.1, this setting is related to garbage
collection. When a “stop the world” pause lasts more than 15 seconds, the
Solr node disconnects from ZooKeeper, the node’s replicas go down and
sometimes, I don’t know exactly why, you need to restart the node to get the
replicas back. As I said, this is my own personal experience, and it relates
to an old version of Solr running on Java 8 (CMS) with a collection of 8 to 10
million documents and 4 to 5 million updates per day.
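
To make the timing argument concrete, here is a small Python sketch. It is a simplified model, not Solr or ZooKeeper code: the function name `session_survives` and the timestamps are invented for illustration. It assumes only that a session is lost when the gap between two client heartbeats exceeds the session timeout, which is what a long stop-the-world pause produces.

```python
def session_survives(heartbeat_times_ms, timeout_ms):
    """Return True if no gap between consecutive heartbeats exceeds the timeout."""
    gaps = (
        later - earlier
        for earlier, later in zip(heartbeat_times_ms, heartbeat_times_ms[1:])
    )
    return all(gap <= timeout_ms for gap in gaps)


TIMEOUT_MS = 15000  # the 15-second value discussed in this thread

# Heartbeats every 5 s: the session stays alive.
healthy = [0, 5000, 10000, 15000, 20000]

# A 20 s GC pause after the second heartbeat: nothing is sent in between.
paused = [0, 5000, 25000, 30000]

print(session_survives(healthy, TIMEOUT_MS))  # True
print(session_survives(paused, TIMEOUT_MS))   # False
```

In this model a timeout longer than the pause keeps the session alive, at the cost of noticing a node that is really dead more slowly.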

I think the size of the collection and the number of updates play an
important role in this scenario, I mean in terms of memory fragmentation.

With newer versions of Solr I don’t know whether this still happens, partly
because I have always worked with smaller collections, so I never had this
kind of trouble.

Ciao,
Vincenzo



Re: When does Solr write in Zookeeper ?

2019-11-15 Thread Dominique Bejean
Thank you, Erick, for this fast answer.
Why is it a best practice to set the ZooKeeper connection timeout to 3
instead of the default value of 15000?

Regards

Dominique



Re: When does Solr write in Zookeeper ?

2019-11-15 Thread Erick Erickson
Dominique:

In a word, “yes”. You’ve got it. A common misunderstanding is that ZK is
actively involved in queries/updates/whatever. Basically, ZK is responsible
for maintaining collection-wide resources, i.e. the current state of all the
replicas, config files, etc. Your “global configuration” and “collection
configuration” should change very rarely and thus rarely generate writes.

The “collection state” (including your “nodes state”) information changes more
frequently and generates more writes as nodes come up and down, go into
recovery, etc. That said, for a cluster where all the replicas are “active” and
don’t go away or go into recovery, etc., there won’t be any writes to ZK.

So the consequence is that when you power up a cluster, there will be a flurry 
of write operations managed by the Overseer, but after all the replicas are up, 
write activity should pretty much cease.

As long as the state is steady, i.e. no replicas are changing state, each
individual Solr node has a copy of the relevant collection’s “state.json” znode
and has all the information it needs to query or index without asking
ZooKeeper, that is, without _either_ reading from or writing to ZK.
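
A sketch of why steady-state traffic does not touch ZK, under stated assumptions. The names `CountingStore` and `CachingNode` are hypothetical and the model runs in memory; it is not how Solr's code is organised. The node reads `state.json` once, keeps the copy, and re-reads only when it is told the znode changed.

```python
class CountingStore:
    """Stand-in for ZK that counts reads and writes (hypothetical)."""

    def __init__(self, state):
        self.state = state
        self.reads = 0
        self.writes = 0
        self.listeners = []

    def read(self):
        self.reads += 1
        return self.state

    def write(self, state):
        self.writes += 1
        self.state = state
        for notify in self.listeners:
            notify()


class CachingNode:
    """Stand-in for a Solr node holding a local copy of state.json."""

    def __init__(self, store):
        self.store = store
        store.listeners.append(self.refresh)
        self.refresh()

    def refresh(self):
        self.cached = self.store.read()

    def route_query(self, shard):
        # Served entirely from the local copy: no read, no write.
        return self.cached[shard]


store = CountingStore({"shard1": "node-a"})
node = CachingNode(store)

for _ in range(1000):
    node.route_query("shard1")
print(store.reads, store.writes)  # 1 0

store.write({"shard1": "node-b"})  # a replica changes state
print(node.route_query("shard1"), store.reads, store.writes)  # node-b 2 1
```

A thousand lookups cost one initial read; only the state change produces a write and a single refresh.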

One rather obscure cause of ZK writes is “schemaless” mode. When a new field
is detected, the schema (and thus the collection’s configuration) is changed,
which generates writes.
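
The schemaless case can be sketched as a toy Python model. The names `ToySchema` and `index` are invented for illustration and this is not Solr's managed-schema code; it only shows that documents with known fields cause no write, while a previously unseen field causes one.

```python
class ToySchema:
    """Stand-in for a schema stored in ZK (hypothetical)."""

    def __init__(self, fields):
        self.fields = set(fields)
        self.zk_writes = 0

    def index(self, doc):
        """Index a document; persist the schema only if new fields appear."""
        new_fields = set(doc) - self.fields
        if new_fields:
            self.fields |= new_fields
            self.zk_writes += 1  # the changed schema has to be written back


schema = ToySchema(["id", "title"])

schema.index({"id": "1", "title": "first"})
print(schema.zk_writes)  # 0

schema.index({"id": "2", "title": "second", "price": 9.5})
print(schema.zk_writes)  # 1

schema.index({"id": "3", "price": 4.0})
print(schema.zk_writes)  # 1
```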

Best,
Erick





When does Solr write in Zookeeper ?

2019-11-15 Thread Dominique Bejean
Hi,

I would like to be certain that I understand how Solr uses ZooKeeper and, more
precisely, when Solr writes to ZooKeeper.

Solr stores various kinds of information in ZK:

   - global configuration (autoscaling, security.json)
   - collection configuration (configs)
   - collections state (state.json, leaders, ...)
   - nodes state (live_nodes, overseer)


Writes to ZK occur when

   - a ZooKeeper member starts or stops
   - a Solr node starts or stops
   - a configuration is loaded
   - a collection is created, deleted or updated (nearly all calls to the
   collection, core or config API)


Writes do not occur during

   - SolrJ client creation
   - indexing data (Solrj, HTTP, DIH, ...)
   - searching (Solrj, HTTP)


In conclusion, if Solr nodes are stable (no failure, no maintenance) and no
calls are made to the collection, core or config API, there are nearly no
writes to ZK.

Is this correct?


Regards

Dominique