Re: faster than load-server-state-from-file?

2018-10-09 Baptiste
On Mon, Oct 8, 2018 at 7:57 PM Aleksandar Lazic  wrote:

> On 08.10.2018 at 19:35, Willy Tarreau wrote:
> > On Mon, Oct 08, 2018 at 07:27:39PM +0200, Aleksandar Lazic wrote:
> >> Hi Baptiste.
> >>
> >> On 08.10.2018 at 16:20, Baptiste wrote:
> >>> Hello gentlemen,
> >>>
> >>> (I'm switching to French, going off the ML, and top-posting!!!).
> >>
> >> Just out of curiosity, why not answer in English?
> >
> > He thought he responded privately and excluded the mailing list from
> > the CC but apparently he was facing an ENOCOFFEE type of error :-)
>
> Oh yes we all know this error code ;-)
>
> > Cheers,
> > Willy
>
> Regards
> Aleks
>

That's it. Furthermore, Willy, Pierre and I speak the same protocol (the
French language)...
I "switched" to private because I'm asking for some "internal" information
to be able to reproduce the behavior and troubleshoot it.

Sorry for the noise, beers, tea or coffee are on me!

Baptiste


Re: faster than load-server-state-from-file?

2018-10-08 Aleksandar Lazic
On 08.10.2018 at 19:35, Willy Tarreau wrote:
> On Mon, Oct 08, 2018 at 07:27:39PM +0200, Aleksandar Lazic wrote:
>> Hi Baptiste.
>>
>> On 08.10.2018 at 16:20, Baptiste wrote:
>>> Hello gentlemen,
>>>
>>> (I'm switching to French, going off the ML, and top-posting!!!).
>>
>> Just out of curiosity, why not answer in English?
> 
> He thought he responded privately and excluded the mailing list from
> the CC but apparently he was facing an ENOCOFFEE type of error :-)

Oh yes we all know this error code ;-)

> Cheers,
> Willy

Regards
Aleks



Re: faster than load-server-state-from-file?

2018-10-08 Willy Tarreau
On Mon, Oct 08, 2018 at 07:27:39PM +0200, Aleksandar Lazic wrote:
> Hi Baptiste.
> 
> On 08.10.2018 at 16:20, Baptiste wrote:
> > Hello gentlemen,
> > 
> > (I'm switching to French, going off the ML, and top-posting!!!).
> 
> Just out of curiosity, why not answer in English?

He thought he responded privately and excluded the mailing list from
the CC but apparently he was facing an ENOCOFFEE type of error :-)

Cheers,
Willy



Re: faster than load-server-state-from-file?

2018-10-08 Aleksandar Lazic
Hi Baptiste.

On 08.10.2018 at 16:20, Baptiste wrote:
> Hello gentlemen,
> 
> (I'm switching to French, going off the ML, and top-posting!!!).

Just out of curiosity, why not answer in English?

Best regards
aleks





Re: faster than load-server-state-from-file?

2018-10-08 Baptiste
Hello gentlemen,

(I'm switching to French, going off the ML, and top-posting!!!).

Pierre, I'm already in touch with several other Pierres at Criteo
(is the first name a hiring criterion over there???)
As the "dev" and "maintainer" of the server state feature, I'm not surprised
by the slow loading, but the extent of that slowness surprises me a lot.
In fact, from memory, it's a list traversal based on strcmp, so if you have
many backends which themselves have many servers, it's indeed not really
optimal.
We did it that way because we thought the server state would not be used
"at scale" the way you're using it.

Pierre: how many backends do you have, and how many servers (on average) per
backend, in one single configuration?
There is only one way to get rid of all these warnings: force the id of the
backends and servers in your configuration (the 'id' parameter).
(I'm planning to drop by Criteo next week; I'll let you know and we can look
at your problem live.)
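
A minimal sketch of what forcing the ids could look like, assuming a backend
laid out like the one in Pierre's template; the backend name, addresses and id
values below are hypothetical. The server state file records backend and
server ids next to their names, so keeping the ids fixed across reloads is what
should let the new process match each saved line to the right server, even if
the generator emits backends or servers in a different order.

```
backend be_foo
    # hypothetical persistent backend id: must be unique among all proxies
    id 9
    # hypothetical persistent server ids: must be unique within this backend
    server srv0 10.0.0.1:8080 id 1 check
    server srv1 10.0.0.2:8080 id 2 check
```

For this to help, the generator would have to remember which id it assigned to
each backend and server (for example in its own datastore) so that the same
object gets the same id on every reload.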

Willy: it seems to me that backends are already stored in ebtrees.
Could we also store the servers in ebtrees to speed up the lookup?
Or better, build a single tree with "/" as its entry point?

Baptiste






RE: faster than load-server-state-from-file?

2018-10-03 Pierre Cheynier
Hi Willy,

> Not really. Maybe we should see how the state file parser works, because
> multiple seconds to parse only 30K lines seems extremely long.

I would even say multiple minutes :)

> I'm just thinking about a few things. Maybe most of these 30K servers are in
> fact tracking other ones? In this case it could make sense to have an option
> to only dump servers which are not tracking others, as for a reload it can
> make quite some sense. Is this the case for you?

What do you mean by "tracking other ones"?

What I can tell is that, for historical reasons, we named all servers the same
way in every backend (i.e. srvN) in the configuration template, and we are
using "server templates" to add MAINT servers to the pool so that they can be
added at runtime later.

This naming scheme can be changed now, but I don't know whether this issue
could be related or not.

What we're doing basically when getting a new event:
* if it requires deleting / updating / adding server(s) in one or multiple
  pools, we only use the runtime API and try to reuse free slots.
* if a backend/frontend has to be created / updated / deleted, OR if a given
  backend has no free slots left, we reload using a configuration template.
* in Jinja2 this template looks like this (simplified):

backend be_foo
   
  {%- for server in servers %}
   server srv{{loop.index0}} {{server.address}}:{{server.port}} weight {{server.weight}}{%- if server.tls %} ssl{%- endif %} check port 8500
  {%- endfor %}
   # Create 25 free slots, servers are numbered from N to N+24 (the range is inclusive)
   server-template srv {{ servers|length }}-{{ servers|length + 24 }} 0.0.0.0:0 check disabled
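
For reference, a `server-template` range includes both of its ends. A minimal
hypothetical example with two regular servers and three free slots (names and
addresses are made up) could look like this:

```
backend be_foo
    server srv0 10.0.0.1:8080 check
    server srv1 10.0.0.2:8080 check
    # creates srv2, srv3 and srv4 (three slots): both ends of 2-4 are included
    server-template srv 2-4 0.0.0.0:0 check disabled

# the server-template line above behaves roughly like:
#   server srv2 0.0.0.0:0 check disabled
#   server srv3 0.0.0.0:0 check disabled
#   server srv4 0.0.0.0:0 check disabled
```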

Doing this I noticed that we have a lot of 'bad reconciliations' triggering 
warning logs, such as:

[WARNING] can't find server 'srv28' with id '29' in backend with id '9' or name 'be_test'
[WARNING] backend name mismatch: from server state file: 'be_foo', from running config 'be_bar'

I don't know if these inconsistencies (that clearly have to be fixed) can cause 
additional delays.

Thanks,

Pierre


Re: faster than load-server-state-from-file?

2018-10-01 Willy Tarreau
Hi Pierre,

On Fri, Sep 21, 2018 at 03:50:19PM +, Pierre Cheynier wrote:
> I'm extensively using server-templates to avoid reloading too much but still,
> backend creation or deletion has to be done by reloading as far as I know. In
> my specific context, it can happen every 5/10s or so.
> As a consequence, I have a lot of servers in the server-state file (>30K 
> lines).
> 
> Trying to use load-server-state-from-file to prevent sending traffic to KO
> servers and to restore stats numbers, I feel that it slows down the reload a
> lot (multiple seconds).
> 
> Any known hint or alternative?
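
For context, a typical server-state setup looks roughly like the sketch below;
the socket and state file paths are assumptions. The current state is dumped
with the `show servers state` runtime command just before the reload, and the
new process reads it back at startup.

```
global
    # hypothetical paths; adapt to your layout
    stats socket /var/run/haproxy.sock mode 600 level admin
    server-state-file /var/lib/haproxy/server-state

defaults
    # every backend looks up its saved state in the global file at startup
    load-server-state-from-file global

# Before each reload, dump the current state, for example:
#   echo "show servers state" | socat stdio /var/run/haproxy.sock > /var/lib/haproxy/server-state
```

HAProxy also accepts `load-server-state-from-file local` together with a
per-backend `server-state-file-name`, which splits the state into one smaller
file per backend; whether that changes the loading time in a case like this
one is untested here.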

Not really. Maybe we should see how the state file parser works, because
multiple seconds to parse only 30K lines seems extremely long.

I'm just thinking about a few things. Maybe most of these 30K servers are in
fact tracking other ones? In this case it could make sense to have an option
to only dump servers which are not tracking others, as for a reload it can
make quite some sense. Is this the case for you?
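
To illustrate what "tracking" means here: a server can take its UP/DOWN state
from another server through the `track` parameter instead of running its own
health checks. The backend names and addresses below are hypothetical.

```
backend be_app
    # this server runs its own health checks
    server srv1 10.0.0.1:8080 check

backend be_app_admin
    # same instance, no check of its own: its state follows be_app/srv1
    server srv1 10.0.0.1:8080 track be_app/srv1
```

In a layout like this, only `be_app/srv1` has a health-check state of its own,
which is why leaving tracking servers out of the state dump could shrink the
file for configurations where most servers track others.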

Thanks,
Willy