Linux-generic solves this problem with watermarks. As the shared pool depletes and hits its low watermark, free() calls always return elements back to the shared pool rather than to the local cache, until the shared pool reaches its high watermark again. This ensures that elements cannot get "lost" in the local caches while the shared pool starves.
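In rough C terms the free path looks something like the sketch below; the types, fields and the shared_pool_put() helper are illustrative names, not the actual linux-generic internals:

#include <stdbool.h>
#include <stdint.h>

#define CACHE_SIZE 32

typedef struct {
	void    *elem[CACHE_SIZE];
	uint32_t count;          /* elements currently cached locally */
} local_cache_t;

typedef struct {
	uint32_t shared_count;   /* elements currently in the shared pool */
	uint32_t low_wm;         /* low watermark */
	uint32_t high_wm;        /* high watermark */
	bool     below_low_wm;   /* sticky flag: shared pool has starved */
	/* lock/ring protecting the shared free list omitted */
} pool_t;

/* Stand-in for pushing an element back onto the shared free list. */
static void shared_pool_put(pool_t *pool, void *elem)
{
	(void)elem;
	pool->shared_count++;
}

static void pool_free(pool_t *pool, local_cache_t *cache, void *elem)
{
	/* Latch "starving" when the shared pool hits its low watermark and
	 * clear it only once the shared pool refills to its high watermark. */
	if (pool->shared_count <= pool->low_wm)
		pool->below_low_wm = true;
	else if (pool->shared_count >= pool->high_wm)
		pool->below_low_wm = false;

	/* While starving (or when the local cache is full), return the
	 * element to the shared pool so it cannot get stuck locally. */
	if (pool->below_low_wm || cache->count == CACHE_SIZE)
		shared_pool_put(pool, elem);
	else
		cache->elem[cache->count++] = elem;
}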
Obviously you need sufficient elements in the pool to account for a certain number being held in the local caches, but with this scheme the shared pool will never fully deplete as long as frees are occurring anywhere. Per ODP semantics, num does need to be exact; however, the application is expected to pick an appropriately sized num to meet its needs, based on its understanding of how it intends to use the pool. Originally we had APIs to allow these watermarks to be configured, but they were never added. I still think this is a worthwhile concept to expose. Perhaps something to discuss in today's ARCH call.

On Tue, Jan 12, 2016 at 12:23 PM, Zoltan Kiss <zoltan.k...@linaro.org> wrote:
> Hi,
>
> During the discussion of user area init a separate topic came up: the
> number of buffers in the various pool types is defined by 'num' in all 3
> current pool types. One could automatically think this is an exact number,
> but that changes with caches.
> Implementations might want to implement a per-thread object cache, as
> without it threads could starve on locks when demand for buffers is high.
> DPDK does that by default as well, and so does ODP-DPDK. But that also
> means elements in the local cache are inherently not accessible to other
> threads, and making them accessible would probably take away a big chunk
> of the performance gains.
> There is a unit test for timers, timer_test_odp_timer_all, which uses
> several threads, and these threads want to allocate all 'num' elements of
> the pool. With an object cache this fails, because at the end some threads
> will idle with elements in their cache while others fail to allocate.
> The current way ODP-DPDK deals with this is to handle 'num' flexibly:
> based on odp_cpumask_default_worker() it increases 'num' so that even if
> all the other threads' thread-local caches are filled up to the max, a
> single thread is able to allocate the original 'num' elements. But that
> also means that if the other threads' thread-local caches are empty, a
> single thread can allocate even more than 'num' elements.
> Petri said the latter is not good; 'num' should be an exact value, the
> minimum and maximum number of elements available. I can see the following
> options to handle this problem:
> - prohibit per-thread object caches: that would be the easiest to
> implement, but e.g. ODP-DPDK would lose a lot of performance, and I would
> have to recall my performance results published at last Connect as well.
> The throughput in that case would drop from 13.8 to 13 even though only
> one thread is involved.
> - check in ODP-DPDK that a thread can't allocate more than 'num'
> elements: it would be quite expensive to track that; we would probably
> lose the benefits of the caching.
> - relax Petri's requirement about 'num' being an exact number: this is
> the current scenario with ODP-DPDK. It's a minimum, but not a maximum. I'm
> not sure why it would be a problem if you could allocate more; probably
> Petri has a scenario in mind.
> - or the other way around (somewhat): 'num' is an exact number, but if
> there are more threads, the application shouldn't make any assumptions
> about how many elements it could allocate. I'm also fine with this. It
> also means we should probably ditch or seriously rework that unit test;
> I'm not sure it would make much sense in this case.
>
> Opinions?
>
> Zoli
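For reference, the 'num' padding Zoltan describes could look roughly like this; adjusted_pool_size() and LOCAL_CACHE_SIZE are illustrative names rather than the actual ODP-DPDK code, and the cache depth is an assumed value:

#include <odp.h>
#include <stdint.h>

/* Per-thread cache depth; an assumed stand-in, not the real ODP-DPDK
 * configuration symbol. */
#define LOCAL_CACHE_SIZE 256

static uint32_t adjusted_pool_size(uint32_t num)
{
	odp_cpumask_t mask;
	int workers = odp_cpumask_default_worker(&mask, 0);

	/* Pad 'num' so that even if every other worker's local cache is
	 * full, one thread can still allocate the 'num' elements it asked
	 * for. The flip side, as noted above: with the caches empty, a
	 * single thread can allocate more than 'num'. */
	return num + (uint32_t)(workers > 1 ? workers - 1 : 0) * LOCAL_CACHE_SIZE;
}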
_______________________________________________
lng-odp mailing list
lng-odp@lists.linaro.org
https://lists.linaro.org/mailman/listinfo/lng-odp