On 2/25/2023 4:41 PM, Skip Montanaro wrote:
Thanks for the responses.
Peter wrote:
Which OS is this?
macOS Ventura 13.1, M1 MacBook Pro (eight cores).
Thomas wrote:
> I'm no expert on locks, but you don't usually want to keep a lock while
> some long-running computation goes on. You want the computation to be
> done by a separate thread, put its results somewhere, and then notify
> the choreographing thread that the result is ready.
In this case I'm extracting the noun phrases from the body of an email
message (returned as a list). I have a collection of email messages
organized by month (typically 1000 to 3000 messages per month). I'm using
concurrent.futures.ThreadPoolExecutor() with the default number of
workers (min(32, os.cpu_count() + 4), or 12 threads on my system) to
process each month, so 12 active threads at a time. Given that the
process is pretty much CPU-bound, maybe reducing the number of workers to
the CPU count would make sense. Processing of each email message enters
that with block once. That's about as minimal as I can make it. I thought
for a bit about pushing the textblob stuff into a separate worker thread,
but it wasn't obvious how to set up queues to handle the communication
between the threads created by ThreadPoolExecutor() and the worker
thread. Maybe I'll think about it harder. (I have a related problem with
SQLite, since an open database connection can't be used from multiple
threads by default. That makes much of the program's end-of-run
processing single-threaded.)
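For what it's worth, one common shape for the queue setup is a single
worker thread that owns the non-thread-safe resource (the extractor, or
the SQLite connection), with the pool threads feeding it through a jobs
queue and reading replies off a results queue. A minimal sketch, with a
dummy extract_noun_phrases() standing in for the textblob call (the
function names and messages here are made up for illustration):

```python
import queue
import threading

def extract_noun_phrases(text):
    # Stand-in for the TextBlob/ConllExtractor call: pretend
    # capitalized words are the noun phrases.
    return [w for w in text.split() if w.istitle()]

def worker(jobs, results):
    # The single worker thread owns the non-thread-safe resource,
    # so no lock is needed around the extractor call.
    while True:
        item = jobs.get()
        if item is None:          # sentinel: shut down
            break
        key, text = item
        results.put((key, extract_noun_phrases(text)))

jobs = queue.Queue()
results = queue.Queue()
t = threading.Thread(target=worker, args=(jobs, results))
t.start()

messages = {1: "Hello From Skip", 2: "plain words only"}
for key, text in messages.items():
    jobs.put((key, text))
jobs.put(None)                    # tell the worker to stop
t.join()

out = dict(results.get() for _ in messages)
```

The keys let the submitting threads match results back to the messages
they came from, so the pool threads never touch the extractor directly.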
If the noun extractor is single-threaded (which I think you mentioned),
no amount of parallel access is going to help. The best you can do is
to queue up requests so that as soon as the noun extractor returns from
one call, it gets handed another blob. The CPU will be busy all the
time running the noun-extraction code.
If that's the case, you might just as well eliminate all the threads and
just do it sequentially in the most obvious and simple manner.
It would possibly be worthwhile to try this approach and see what
happens to the CPU usage and overall computation time.
> This link may be helpful -
>
> https://anandology.com/blog/using-iterators-and-generators/
I don't think that's where my problem is. The lock protects only the
generation of the noun phrases; the loop which does the yielding operates
outside of the lock's control. The version of the code above is my
latest, in which I tossed out a bunch of phrase-processing code
(effectively dead-end ideas for processing the phrases). Replacing the
for loop with a simple return seems not to have any effect. In any case,
the caller which uses the phrases does a fair amount of extra work with
them, populating a SQLite database, so I don't think the time it takes to
process a single email message is dominated by the phrase generation.
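To be concrete about the shape I mean, here's a sketch of that lock
arrangement, with a hypothetical extract() standing in for the
TextBlob(text, np_extractor=ext).noun_phrases call:

```python
import threading

_ext_lock = threading.Lock()

def extract(text):
    # Hypothetical stand-in for the TextBlob/ConllExtractor call.
    return [w.lower() for w in text.split() if w.istitle()]

def noun_phrases(text):
    # The lock covers only the extractor call...
    with _ext_lock:
        phrases = extract(text)
    # ...while the yielding loop runs outside the lock, so consumers
    # never hold it while iterating.
    for p in phrases:
        yield p

result = list(noun_phrases("Big Data pipeline"))
```

So a slow consumer of the generator can't extend how long the lock is
held; only the extractor call itself is serialized.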
Here's timeit output for the noun_phrases code:

% python -m timeit \
    -s 'text = """`python -m timeit --help`""" ; from textblob import TextBlob ; from textblob.np_extractors import ConllExtractor ; ext = ConllExtractor() ; phrases = TextBlob(text, np_extractor=ext).noun_phrases' \
    'phrases = TextBlob(text, np_extractor=ext).noun_phrases'
5000 loops, best of 5: 98.7 usec per loop
The text I process is the output of timeit's help message, which looks
to be about the same length as a typical email message, certainly the
same order of magnitude. Also, note that I call it once in the setup to
eliminate the initial training of the ConllExtractor instance. I don't
know whether ~100us qualifies as long-running or not.
I'll keep messing with it.
Skip
--
https://mail.python.org/mailman/listinfo/python-list