Just did some analysis of the paragraph structure, and in my 3 tests (a biography book, an archive of fiction books, and a newspaper archive) only 10% of paragraphs have 8 or more sentences, so on an 8-core CPU we don't utilize all cores for sentence analysis 90% of the time. More than that, only 50% of paragraphs have 4 or more sentences, so half of the time we don't even utilize half of the cores. And the worst part: in the newspaper archive 30% of the paragraphs contain only 1 sentence (I guess those would be titles, notes, authors, semi-empty paragraphs etc.)!
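For reference, the kind of quick count behind these numbers can be sketched like this (a rough standalone sketch with naive sentence splitting; not LT code, and the sample paragraphs are made up):

```java
import java.util.*;

// Count sentences per paragraph and see what fraction of paragraphs
// could keep N cores busy if we parallelize per sentence.
public class ParagraphStats {
    // Naive sentence split on ./!/? - good enough for a rough histogram.
    static int sentenceCount(String paragraph) {
        int count = 0;
        for (String s : paragraph.split("[.!?]+")) {
            if (!s.trim().isEmpty()) count++;
        }
        return count;
    }

    // Fraction of paragraphs with at least minSentences sentences.
    static double fractionWithAtLeast(List<String> paragraphs, int minSentences) {
        long n = paragraphs.stream()
                .filter(p -> sentenceCount(p) >= minSentences).count();
        return (double) n / paragraphs.size();
    }

    public static void main(String[] args) {
        List<String> paragraphs = Arrays.asList(
            "Title",                      // 1 "sentence" (title-like paragraph)
            "One. Two. Three. Four.",     // 4 sentences
            "A. B. C. D. E. F. G. H.");   // 8 sentences
        // Only 1 of 3 sample paragraphs could keep 8 cores busy:
        System.out.printf("%.2f%n", fractionWithAtLeast(paragraphs, 8)); // prints 0.33
    }
}
```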
So without going to a constant-data-flow approach, the quick remedy could be to run file reading/splitting in parallel with analyze/check, and to feed more than 1 paragraph at a time to the analyze/check block.

Regards,
Andriy

2016-01-28 14:16 GMT-05:00 Silvan Jegen <m...@sillymon.ch>:
> Hi Andriy
>
> Thanks for the writeup! It is a very good knowledge base for someone
> who wants to work on improving LT's multithreaded performance.
>
> On Thu, Jan 28, 2016 at 01:29:13PM -0500, Andriy Rysin wrote:
>> So currently what we (approximately) do is:
>> 1) read the file (line by line in the general case) (main thread)
>> 2) on a paragraph boundary, split it into sentences (main thread)
>> 2a) only 1 thread is used up to this point
>> 3) send the list of sentences to a threaded executor for analysis
>> (tokenization/tagging/disambiguation); here the # of callables fed to
>> the thread pool = # of sentences
>> 3a) now we wait for all threads to finish
>> 4) the analyzed sentences collected for the paragraph are then sent to
>> a threaded executor for the rule check; here currently the # of
>> callables = # of threads. If we increase the # of callables, up to a
>> certain point we get a speedup, but beyond that point the increased
>> granularity adds too much overhead, so at least for me it caused a
>> slowdown (between 5-8% worse than at the peak)
>> 4a) we wait for all threads to finish
>> 5) we collect all the rule matches and go to 1)
>>
>> We lose time with idle threads in 1-2), 3a) and 4a). Ideally we
>> should have all stages work in parallel, so e.g. while we run the
>> check some thread should already read the next chunk of the file and
>> feed it to the analyzer, and while we're reading the file it would be
>> ideal to feed sentences one by one to the analysis.
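The "quick remedy" above - running reading/splitting in parallel with analyze/check - could be sketched with a bounded queue between a reader thread and the checking side, e.g. (hypothetical pipeline shape, not actual LT code):

```java
import java.util.*;
import java.util.concurrent.*;

// Sketch of running file reading in parallel with analyze/check via a
// bounded queue: the reader keeps filling the queue while the consumer
// checks paragraphs, so neither side sits fully idle.
public class PipelineSketch {
    private static final String EOF = new String("EOF"); // unique end-of-input marker

    static List<String> run(List<String> fileParagraphs) throws InterruptedException {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(16);

        // Producer: stands in for reading the file and splitting paragraphs.
        Thread reader = new Thread(() -> {
            try {
                for (String p : fileParagraphs) queue.put(p); // blocks when queue is full
                queue.put(EOF);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        reader.start();

        // Consumer: checks paragraphs while the reader is still reading.
        // Comparing against EOF by reference, so a literal "EOF" paragraph
        // in the input would still be processed normally.
        List<String> results = new ArrayList<>();
        for (String p = queue.take(); p != EOF; p = queue.take()) {
            results.add("checked: " + p); // stand-in for analyze + rule check
        }
        reader.join();
        return results;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(Arrays.asList("Para one.", "Para two.")));
        // prints [checked: Para one., checked: Para two.]
    }
}
```

The bounded capacity matters: it keeps the reader from racing ahead and buffering the whole file in memory when the checkers are the bottleneck.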
>> In reality it's not that straightforward, for several reasons:
>> 1) current code logic:
>> * we have some code to be parallelized that is not quite functional,
>> and we have a lot of auxiliary code that takes care of output
>> formatting etc. which needs to be refactored
>> 2) analysis/check specifics:
>> * we can't easily grab sentences continuously, as we need to feed
>> reasonably sized blocks of data to the sentence tokenizer first (one
>> may argue whether a paragraph is a good chunking size)
>> * some languages have rules that check on the paragraph level
>> (inter-sentence checks), so they need to be fed the whole paragraph
>
> I did not know that and that definitely makes things more complicated.
>
>> 3) texts fed into LT may have quite different characteristics
>> depending on how paragraphs are split, how many sentences there are
>> per paragraph, how long the sentences are, and how many and how
>> complex our rules are, so to write "ideal" code you need to consider
>> which approach will benefit which case
>>
>> Most of those problems are solvable (to a different degree), but for
>> some the effort may not be trivial. If anybody is willing to dive
>> deeper into the subject I'll be glad to share my knowledge.
>>
>> The only thing I can say is that if we are not rewriting for a new
>> multithreaded architecture (and that would take changes in both -core
>> and -commandline) but rather taking incremental improvements, I would
>> look into running analysis/check in parallel with file
>> reading/sentence splitting - we wait for this block an amount of time
>> comparable to the wait time in the analysis and check blocks, but in
>> this case all but one thread are idle, while in the other two cases
>> only some threads are waiting.
>> Also I feel that due to the specifics of most texts we get a lot of
>> small paragraphs (the default logic splits them at two newlines, which
>> produces many small paragraphs for titles, notes, dialogs etc.), which
>> means we start the analysis/check blocks too often with small chunks
>> of data, thus losing performance.
>>
>> For now I thought that changing one number in 1 line that leads to
>> about a 25% speedup is worth applying until we have a better solution. :)
>
> Definitely. It sounds like solving some of the issues would involve a
> substantial refactoring of certain parts of LT's processing pipeline.
> Interesting work but potentially very time consuming.
>
> Cheers,
>
> Silvan
>
>> Andriy
>>
>> 2016-01-28 12:46 GMT-05:00 Silvan Jegen <m...@sillymon.ch>:
>> > Heyhey
>> >
>> > On Thu, Jan 28, 2016 at 08:57:49AM -0500, Andriy Rysin wrote:
>> >> Yes, that's the case I tried with # of callables = # of rules (see
>> >> my previous email). The wait time went down quite a bit (as
>> >> expected) but the overall processing time went up, I suspect
>> >> because of split/merge overhead. But this depends heavily on the
>> >> type/number of rules, the text and the CPU (e.g. if rule processing
>> >> time is more unbalanced than in the Ukrainian case then increasing
>> >> the # of callables will help; otherwise the effect could be the
>> >> reverse), so we would have to try other languages with
>> >
>> > I may have missed a statistic in your earlier mail, but wouldn't
>> > splitting up the text into sentences and then sending batches of
>> > them to different threads result in the most even load distribution?
>> > Because of the result-merging overhead this would only make sense
>> > when the number of lines crosses a certain threshold.
>> >
>> > Cheers,
>> >
>> > Silvan
>> >
>> >> different numbers of callables to see what's the best approach.
>> >> I know we have regular Wikipedia checks for some languages - that
>> >> could be a good benchmarking test.
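Silvan's batching idea - splitting the text into sentences and sending batches of them to different threads - might look roughly like this (hypothetical sketch, not LT code):

```java
import java.util.*;
import java.util.concurrent.*;

// Sketch of batching sentences and submitting each batch to a thread pool,
// instead of one callable per sentence.
public class BatchSketch {
    // Split sentences into at most nBatches roughly equal batches.
    static List<List<String>> partition(List<String> sentences, int nBatches) {
        List<List<String>> batches = new ArrayList<>();
        int size = (sentences.size() + nBatches - 1) / nBatches; // ceiling division
        for (int i = 0; i < sentences.size(); i += size) {
            batches.add(sentences.subList(i, Math.min(i + size, sentences.size())));
        }
        return batches;
    }

    public static void main(String[] args) throws Exception {
        List<String> sentences = Arrays.asList("s1", "s2", "s3", "s4", "s5");
        ExecutorService pool = Executors.newFixedThreadPool(2);

        // One callable per batch; the body stands in for checking a batch.
        List<Future<Integer>> futures = new ArrayList<>();
        for (List<String> batch : partition(sentences, 2)) {
            futures.add(pool.submit(() -> batch.size()));
        }
        int total = 0;
        for (Future<Integer> f : futures) total += f.get(); // merge results
        pool.shutdown();
        System.out.println(total); // prints 5
    }
}
```

As noted in the thread, the merge step is exactly where the overhead comes from, so batching only pays off above some input size.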
>> >>
>> >> Regards,
>> >> Andriy
>> >>
>> >> 2016-01-28 8:47 GMT-05:00 Dominique Pellé <dominique.pe...@gmail.com>:
>> >> > Andriy Rysin wrote:
>> >> >
>> >> >> Then I realized that in the check method we split the rules into
>> >> >> callables whose count is the # of cores available (in my case 8).
>> >> >> As I have 347 rules, this means each bucket is ~43 rules, and
>> >> >> since rules are not equal in complexity this could lead to quite
>> >> >> unequal time for each thread.
>> >> >
>> >> > Hi Andriy
>> >> >
>> >> > Thanks for having a look at multi-thread performance.
>> >> > I don't know the code as much as you do. But if we indeed
>> >> > split the number of rules equally before processing them, then
>> >> > it seems bad for balancing the work.
>> >> >
>> >> > Can't we instead have a queue with all rules to be processed?
>> >> > When a thread is ready to do work, it picks the next rule to
>> >> > process from the queue. So the load would be well balanced, even
>> >> > if some rules are 10x slower than others. With such a queue, a
>> >> > thread that picks up an expensive rule would end up processing
>> >> > fewer rules than a thread that picks up fast rules, keeping all
>> >> > CPUs as busy as possible.
>> >> >
>> >> > Regards
>> >> > Dominique
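Dominique's shared-queue idea could be sketched like this (hypothetical; not the actual LT check method):

```java
import java.util.*;
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of a shared work queue of rules: worker threads pull the next
// rule from one queue instead of getting fixed, equally sized buckets,
// so a slow rule can't stall a whole bucket.
public class RuleQueueSketch {

    static int processAll(int nRules, int nThreads) throws InterruptedException {
        // Stand-in "rules": just integer ids; real rules would be checked
        // against the analyzed sentences.
        Queue<Integer> rules = new ConcurrentLinkedQueue<>();
        for (int i = 1; i <= nRules; i++) rules.add(i);

        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        AtomicInteger processed = new AtomicInteger();
        for (int t = 0; t < nThreads; t++) {
            pool.submit(() -> {
                // Each worker keeps picking rules until the queue is empty;
                // a thread that hits an expensive rule simply takes fewer.
                while (rules.poll() != null) {
                    processed.incrementAndGet(); // stand-in for applying one rule
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return processed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(processAll(347, 8)); // prints 347
    }
}
```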
>> >> > _______________________________________________
>> >> > Languagetool-devel mailing list
>> >> > Languagetool-devel@lists.sourceforge.net
>> >> > https://lists.sourceforge.net/lists/listinfo/languagetool-devel