On 25 February 2015 at 02:35, Robert Collins <robe...@robertcollins.net> wrote:
> On 24 February 2015 at 01:07, Salvatore Orlando <sorla...@nicira.com>
> wrote:
> > Lazy-Stacker summary:
> ...
> > In the medium term, there are a few things we might consider for
> > Neutron's "built-in IPAM".
> > 1) Move the allocation logic out of the driver, thus making IPAM an
> > independent service. The API workers will then communicate with the
> > IPAM service through a message bus, where IP allocation requests will
> > be "naturally serialized".
> > 2) Use third-party software such as dogpile, zookeeper, or even
> > memcached to implement distributed coordination. I have nothing
> > against it, and I reckon Neutron can only benefit from it (in case
> > you're considering arguing that "it does not scale", please also
> > provide solid arguments to support your claim!). Nevertheless, I do
> > believe API request processing should proceed undisturbed as much as
> > possible. If processing an API request requires distributed
> > coordination among several components, then it probably means that an
> > asynchronous paradigm is more suitable for that API request.
>
> So data is great. It sounds like as long as we have an appropriate
> retry decorator in place, write locks are better here, at least for up
> to 30 threads. But can we trust the data?

Not unless you can prove the process used to obtain them is correct.
Otherwise we'd still think the sun rotates around the earth.

> One thing I'm not clear on is the SQL statement count. You say 100
> queries for A-1 with a time on Galera of 0.06*1.2=0.072 seconds per
> allocation? So is that 2 queries over 50 allocations over 20 threads?

The query number reported in the thread is for a single-node test. The
numbers for the Galera tests are on github, and if you have a Galera
environment you can try and run the experiment there too.
The algorithm should indeed perform a single SELECT query for each IP
allocation, so the reported number does look far too high. It is coming
from SQLAlchemy hooks, so I think it's reliable (a sketch of the kind of
hook I mean is below). It's worth noting that the count covers all
queries, including those for setting up the environment and verifying
that the algorithm completed successfully, so those should be
subtracted. I can easily enable debug logging and provide a detailed
breakdown of db operations for every algorithm.

> I'm not clear on what the request parameter in the test json files
> does, and AFAICT your threads each do one request each. As such I
> suspect that you may be seeing less concurrency - and thus contention
> - than real-world setups where APIs are deployed to run worker
> processes in separate processes and requests are coming in
> willy-nilly. The size of each algorithm's workload is so small that
> it's feasible to imagine the thread completing before the GIL's
> bytecode check interval triggers (see
> https://docs.python.org/2/library/sys.html#sys.setcheckinterval) and
> the GIL's lack of fairness would exacerbate that.

I have a retry counter which shows that contention is actually
occurring. Indeed, the algorithms which do sequential allocation see a
lot of contention, so I do not think I'm just fooling myself and the
tests are actually running serially!
Anyway, the multiprocess suggestion is very valid and I will repeat the
experiments (I'm afraid that won't happen before Friday), because I did
not consider the GIL aspect you mention: I naively expected that Python
would simply spawn a different pthread for each thread and let the OS do
the scheduling.
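For reference, the hook in question is roughly of this shape. This is
only a sketch: the in-memory engine URL and the counter dict are
placeholders, not the actual test harness code.

# Rough sketch of per-run query counting with a SQLAlchemy event hook.
# The engine URL is a placeholder; the real tests run against MySQL/Galera.
from sqlalchemy import create_engine, event

engine = create_engine("sqlite://")

query_count = {"total": 0}


@event.listens_for(engine, "after_cursor_execute")
def _count_query(conn, cursor, statement, parameters, context, executemany):
    # Fires for every statement the engine emits, including environment
    # set-up and verification queries, which is why the raw total is
    # higher than the per-allocation cost.
    query_count["total"] += 1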
> If I may suggest:
>  - use multiprocessing or some other worker-pool approach rather than
>    threads
>  - or set setcheckinterval down low (e.g. to 20 or something)
>  - do multiple units of work (in separate transactions) within each
>    worker, aim for e.g. 10 seconds of work or some such.

This last suggestion also makes sense (a rough sketch of such a harness
is below, after the quoted text).

>  - log with enough detail that we can report on the actual concurrency
>    achieved. E.g. log the time in us when each transaction starts and
>    finishes, then we can assess how many concurrent requests were
>    actually running.

I put only simple output on github, but full debug logging can be
enabled by simply changing a constant. However, I'm also collecting the
number of retries for each thread as an indirect marker of the
concurrency level.

> If the results are still the same - great, full steam ahead. If not,
> well let's revisit :)

Obviously. We're not religious here. We'll simply do what the data
suggest is the best way forward.

> -Rob
>
> --
> Robert Collins <rbtcoll...@hp.com>
> Distinguished Technologist
> HP Converged Cloud
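The multiprocess harness I have in mind is something along these lines.
It is only a sketch: do_allocation() is a placeholder for the allocation
algorithm under test, and the worker/iteration counts are arbitrary
knobs; for the threaded variant, lowering sys.setcheckinterval() as you
suggest would be the equivalent tweak.

# Rough sketch only: a worker-pool harness with per-transaction timing.
# do_allocation() stands in for the IP allocation algorithm under test;
# WORKERS and ALLOCATIONS_PER_WORKER are arbitrary.
import logging
import multiprocessing
import time

WORKERS = 20
ALLOCATIONS_PER_WORKER = 50  # size this so each worker runs for ~10 seconds


def do_allocation():
    # Placeholder: each call would run one allocation in its own transaction.
    time.sleep(0.01)


def worker(worker_id):
    logging.basicConfig(level=logging.DEBUG, format="%(message)s")
    for i in range(ALLOCATIONS_PER_WORKER):
        start_us = int(time.time() * 1e6)
        do_allocation()
        end_us = int(time.time() * 1e6)
        # Log start/end in microseconds so the actual overlap between
        # requests can be measured afterwards.
        logging.debug("worker=%d alloc=%d start_us=%d end_us=%d",
                      worker_id, i, start_us, end_us)


if __name__ == "__main__":
    procs = [multiprocessing.Process(target=worker, args=(w,))
             for w in range(WORKERS)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()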
__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev