On 25 February 2015 at 02:35, Robert Collins <robe...@robertcollins.net> wrote:
> On 24 February 2015 at 01:07, Salvatore Orlando <sorla...@nicira.com>
> wrote:
> > Lazy-Stacker summary:
> ...
> > In the medium term, there are a few things we might consider for
> > Neutron's "built-in IPAM".
> > 1) Move the allocation logic out of the driver, thus making IPAM an
> > independent service. The API workers will then communicate with the
> > IPAM service through a message bus, where IP allocation requests will
> > be "naturally serialized".
> > 2) Use third-party software such as dogpile, zookeeper, or even
> > memcached to implement distributed coordination. I have nothing
> > against it, and I reckon Neutron can only benefit from it (in case
> > you're considering arguing that "it does not scale", please also
> > provide solid arguments to support your claim!). Nevertheless, I do
> > believe API request processing should proceed undisturbed as much as
> > possible. If processing an API request requires distributed
> > coordination among several components, then it probably means that an
> > asynchronous paradigm is more suitable for that API request.
>
> So data is great. It sounds like as long as we have an appropriate
> retry decorator in place, write locks are better here, at least for up
> to 30 threads. But can we trust the data?

Not unless you can prove the process used to obtain them is correct.
Otherwise we'd still think the sun rotates around the earth.

> One thing I'm not clear on is the SQL statement count. You say 100
> queries for A-1 with a time on Galera of 0.06*1.2=0.072 seconds per
> allocation? So is that 2 queries over 50 allocations over 20 threads?

The query number reported in the thread is for a single-node test. The
numbers for the Galera tests are on github, and if you have a Galera
environment you can try and run the experiment there too.
The algorithm should indeed perform a single SELECT query for each IP
allocation, so the reported number does look far too high. It is coming
from SQLAlchemy hooks, so I think it's reliable (a sketch of the kind of
hook I mean is below). It's worth noting that the count covers all
queries, including those for setting up the environment and verifying
that the algorithm completed successfully, so those should be
subtracted. I can easily enable debug logging and provide a detailed
breakdown of db operations for every algorithm.

> I'm not clear on what the request parameter in the test json files
> does, and AFAICT your threads each do one request each. As such I
> suspect that you may be seeing less concurrency - and thus contention
> - than real-world setups where APIs are deployed to run worker
> processes in separate processes and requests are coming in
> willy-nilly. The size of each algorithm's workload is so small that
> it's feasible to imagine the thread completing before the GIL's
> bytecode check interval triggers (see
> https://docs.python.org/2/library/sys.html#sys.setcheckinterval) and
> the GIL's lack of fairness would exacerbate that.

I have a retry counter which shows that contention is actually
occurring. Indeed, the algorithms which do sequential allocation see a
lot of contention, so I do not think I'm just fooling myself and the
tests are actually running serially!
Anyway, the multiprocess suggestion is very valid and I will repeat the
experiments (I'm afraid that won't happen before Friday), because I did
not consider the GIL aspect you mention: I naively expected that Python
would simply spawn a different pthread for each thread and let the OS do
the scheduling.
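For reference, the hook in question is roughly of this shape. This is
only a sketch: the in-memory engine URL and the counter dict are
placeholders, not the actual test harness code.

# Rough sketch of per-run query counting with a SQLAlchemy event hook.
# The engine URL is a placeholder; the real tests run against MySQL/Galera.
from sqlalchemy import create_engine, event

engine = create_engine("sqlite://")

query_count = {"total": 0}


@event.listens_for(engine, "after_cursor_execute")
def _count_query(conn, cursor, statement, parameters, context, executemany):
    # Fires for every statement the engine emits, including environment
    # set-up and verification queries, which is why the raw total is
    # higher than the per-allocation cost.
    query_count["total"] += 1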
> If I may suggest:
>  - use multiprocessing or some other worker-pool approach rather than
>    threads
>  - or set setcheckinterval down low (e.g. to 20 or something)
>  - do multiple units of work (in separate transactions) within each
>    worker, aim for e.g. 10 seconds of work or some such.

This last suggestion also makes sense (a rough sketch of such a harness
is below, after the quoted text).

>  - log with enough detail that we can report on the actual concurrency
>    achieved. E.g. log the time in us when each transaction starts and
>    finishes, then we can assess how many concurrent requests were
>    actually running.

I put only simple output on github, but full debug logging can be
enabled by simply changing a constant. However, I'm also collecting the
number of retries for each thread as an indirect marker of the
concurrency level.

> If the results are still the same - great, full steam ahead. If not,
> well let's revisit :)

Obviously. We're not religious here. We'll simply do what the data
suggest is the best way forward.

> -Rob
>
> --
> Robert Collins <rbtcoll...@hp.com>
> Distinguished Technologist
> HP Converged Cloud
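The multiprocess harness I have in mind is something along these lines.
It is only a sketch: do_allocation() is a placeholder for the allocation
algorithm under test, and the worker/iteration counts are arbitrary
knobs; for the threaded variant, lowering sys.setcheckinterval() as you
suggest would be the equivalent tweak.

# Rough sketch only: a worker-pool harness with per-transaction timing.
# do_allocation() stands in for the IP allocation algorithm under test;
# WORKERS and ALLOCATIONS_PER_WORKER are arbitrary.
import logging
import multiprocessing
import time

WORKERS = 20
ALLOCATIONS_PER_WORKER = 50  # size this so each worker runs for ~10 seconds


def do_allocation():
    # Placeholder: each call would run one allocation in its own transaction.
    time.sleep(0.01)


def worker(worker_id):
    logging.basicConfig(level=logging.DEBUG, format="%(message)s")
    for i in range(ALLOCATIONS_PER_WORKER):
        start_us = int(time.time() * 1e6)
        do_allocation()
        end_us = int(time.time() * 1e6)
        # Log start/end in microseconds so the actual overlap between
        # requests can be measured afterwards.
        logging.debug("worker=%d alloc=%d start_us=%d end_us=%d",
                      worker_id, i, start_us, end_us)


if __name__ == "__main__":
    procs = [multiprocessing.Process(target=worker, args=(w,))
             for w in range(WORKERS)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()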
__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev