On Tue, Nov 1, 2011 at 6:12 PM, Keith Massey <[email protected]> wrote:
> On 11/1/11 3:31 PM, Keith Turner wrote:
>>
>> On Tue, Nov 1, 2011 at 4:00 PM, Keith Massey
>> <[email protected]> wrote:
>>>
>>> We're querying accumulo through a web application. After it had been hit
>>> with one of our test scripts for a few minutes with the debugger attached
>>> I noticed that there were hundreds and hundreds of threads being garbage
>>> collected. Eventually it crashes my IDE and the server becomes
>>> unresponsive. The server recovers eventually. After looking through the
>>> code a little bit, it appears that these threads are coming from
>>> org.apache.accumulo.core.client.impl.ScannerIterator.initiateReadAhead().
>>> We actually get many threads per iterator. Is there any reason that it
>>> can't use a thread pool instead of creating a new thread for every call
>>> to that method?
>>> Thanks.
>>>
>>> Keith
>>>
>> The reason I did not use a thread pool is because the scanner does not
>> have a close method. I suppose we could use a thread pool where the
>> threads timeout when not used. This could still lead to a lot of
>> threads depending on the timeout and how many scanner iterators are
>> created.
>>
>> The BatchScanner and BatchWriter interfaces use thread pools and have
>> close methods.
>>
>> Do you think this issue needs a ticket?
>
> I'm not incredibly familiar with this code, but it could be a static thread
> pool right? And just let all ScannerIterators share some configurable thread
> pool? The thread would just be returned to the pool when the Reader
> completed.
>

A static thread pool may limit a user's ability to control the behavior when they have multiple threads using scanners. Along that line of thought, letting the user pass in a thread pool is a more flexible solution, since it gives the user a lot of control. The scanner factory method could accept a thread pool as an argument. The drawback is that it makes the simple case more cumbersome for the user.
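
A rough sketch of the idea, not actual Accumulo code: every class and method name below (ReadAheadScanner, ScannerFactory, newTimeoutPool, createScanner) is hypothetical, and the pool size and idle timeout are arbitrary. It assumes the read-ahead work can be expressed as a Callable. One way to soften the drawback would be an overload that takes no pool and falls back to a shared one whose idle threads time out, so nothing needs to be closed.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a scanner; not part of the Accumulo API.
class ReadAheadScanner {
    private final ExecutorService readAheadPool;

    ReadAheadScanner(ExecutorService readAheadPool) {
        this.readAheadPool = readAheadPool;
    }

    // Submits the fetch to the pool instead of starting a new thread per call.
    Future<List<String>> initiateReadAhead(Callable<List<String>> fetchNextBatch) {
        return readAheadPool.submit(fetchNextBatch);
    }
}

// Hypothetical factory showing both the simple and the user-controlled path.
class ScannerFactory {
    // Shared fallback pool (size and timeout are arbitrary assumptions).
    private static final ExecutorService DEFAULT_POOL = newTimeoutPool(4, 3);

    // Builds a bounded pool whose threads exit after sitting idle,
    // so an unused pool holds no threads even without a close() call.
    static ExecutorService newTimeoutPool(int maxThreads, long idleSeconds) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                maxThreads, maxThreads, idleSeconds, TimeUnit.SECONDS,
                new LinkedBlockingQueue<Runnable>());
        pool.allowCoreThreadTimeOut(true);
        return pool;
    }

    // Simple case: the caller does not have to think about threads.
    static ReadAheadScanner createScanner() {
        return new ReadAheadScanner(DEFAULT_POOL);
    }

    // Flexible case: the caller owns the pool, its size, and its shutdown.
    static ReadAheadScanner createScanner(ExecutorService pool) {
        return new ReadAheadScanner(pool);
    }
}
```

With something like this, a web application could create one pool per request handler group and pass it to createScanner(pool), while a simple client would just call createScanner(). The shared fallback still has the limitation mentioned above: all scanners that use it compete for the same threads.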
