Just did some analysis of the paragraph structure, and in my 3 tests (a biography book, an archive of fiction books, and a newspaper archive) only 10% of paragraphs have 8 or more sentences, so on an 8-core CPU we don't utilize all cores for sentence analysis 90% of the time. More than that, only 50% of paragraphs have 4 or more sentences, so half of the time we don't even utilize half of the cores. And the worst part: in the newspaper archive 30% of the paragraphs contain only 1 sentence (I guess those would be titles, notes, authors, semi-empty paragraphs etc.)!
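For reference, the kind of quick count behind these numbers can be sketched like this (a rough standalone sketch with naive sentence splitting; not LT code, and the sample paragraphs are made up):

```java
import java.util.*;

// Count sentences per paragraph and see what fraction of paragraphs
// could keep N cores busy if we parallelize per sentence.
public class ParagraphStats {
    // Naive sentence split on ./!/? - good enough for a rough histogram.
    static int sentenceCount(String paragraph) {
        int count = 0;
        for (String s : paragraph.split("[.!?]+")) {
            if (!s.trim().isEmpty()) count++;
        }
        return count;
    }

    // Fraction of paragraphs with at least minSentences sentences.
    static double fractionWithAtLeast(List<String> paragraphs, int minSentences) {
        long n = paragraphs.stream()
                .filter(p -> sentenceCount(p) >= minSentences).count();
        return (double) n / paragraphs.size();
    }

    public static void main(String[] args) {
        List<String> paragraphs = Arrays.asList(
            "Title",                      // 1 "sentence" (title-like paragraph)
            "One. Two. Three. Four.",     // 4 sentences
            "A. B. C. D. E. F. G. H.");   // 8 sentences
        // Only 1 of 3 sample paragraphs could keep 8 cores busy:
        System.out.printf("%.2f%n", fractionWithAtLeast(paragraphs, 8)); // prints 0.33
    }
}
```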
So without going to a constant-data-flow approach, the quick remedy could be to run file reading/splitting in parallel with analyze/check, and to feed more than 1 paragraph at a time to the analyze/check block.

Regards,
Andriy

2016-01-28 14:16 GMT-05:00 Silvan Jegen <m...@sillymon.ch>:
> Hi Andriy
>
> Thanks for the writeup! It is a very good knowledge base for someone
> who wants to work on improving LT's multithreaded performance.
>
> On Thu, Jan 28, 2016 at 01:29:13PM -0500, Andriy Rysin wrote:
>> So currently what we (approximately) do is:
>> 1) read the file (line by line in the general case) (main thread)
>> 2) on a paragraph boundary, split it into sentences (main thread)
>> 2a) only 1 thread is used up to this point
>> 3) send the list of sentences to a threaded executor for analysis
>> (tokenization/tagging/disambiguation); here the # of callables fed to
>> the thread pool = # of sentences
>> 3a) now we wait for all threads to finish
>> 4) the analyzed sentences collected for the paragraph are then sent to
>> a threaded executor for the rule check; here currently the # of
>> callables = # of threads. If we increase the # of callables, up to a
>> certain point we get a speedup, but beyond that point the increased
>> granularity adds too much overhead, so at least for me it caused a
>> slowdown (between 5-8% worse than at the peak)
>> 4a) we wait for all threads to finish
>> 5) we collect all the rule matches and go to 1)
>>
>> We lose time with idle threads in 1-2), 3a) and 4a). Ideally we
>> should have all stages work in parallel, so e.g. while we run the
>> check some thread should already read the next chunk of the file and
>> feed it to the analyzer, and while we're reading the file it would be
>> ideal to feed sentences one by one to the analysis.
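The "quick remedy" above - running reading/splitting in parallel with analyze/check - could be sketched with a bounded queue between a reader thread and the checking side, e.g. (hypothetical pipeline shape, not actual LT code):

```java
import java.util.*;
import java.util.concurrent.*;

// Sketch of running file reading in parallel with analyze/check via a
// bounded queue: the reader keeps filling the queue while the consumer
// checks paragraphs, so neither side sits fully idle.
public class PipelineSketch {
    private static final String EOF = new String("EOF"); // unique end-of-input marker

    static List<String> run(List<String> fileParagraphs) throws InterruptedException {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(16);

        // Producer: stands in for reading the file and splitting paragraphs.
        Thread reader = new Thread(() -> {
            try {
                for (String p : fileParagraphs) queue.put(p); // blocks when queue is full
                queue.put(EOF);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        reader.start();

        // Consumer: checks paragraphs while the reader is still reading.
        // Comparing against EOF by reference, so a literal "EOF" paragraph
        // in the input would still be processed normally.
        List<String> results = new ArrayList<>();
        for (String p = queue.take(); p != EOF; p = queue.take()) {
            results.add("checked: " + p); // stand-in for analyze + rule check
        }
        reader.join();
        return results;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(Arrays.asList("Para one.", "Para two.")));
        // prints [checked: Para one., checked: Para two.]
    }
}
```

The bounded capacity matters: it keeps the reader from racing ahead and buffering the whole file in memory when the checkers are the bottleneck.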
>> In reality it's not that straightforward, for several reasons:
>> 1) current code logic:
>> * we have some code to be parallelized that is not quite functional,
>> and we have a lot of auxiliary code that takes care of output
>> formatting etc. which needs to be refactored
>> 2) analysis/check specifics:
>> * we can't easily grab sentences continuously, as we need to feed
>> reasonably sized blocks of data to the sentence tokenizer first (one
>> may argue whether a paragraph is a good chunking size)
>> * some languages have rules that check on the paragraph level
>> (inter-sentence checks), so they need to be fed the whole paragraph
>
> I did not know that and that definitely makes things more complicated.
>
>> 3) texts fed into LT may have quite different characteristics
>> depending on how paragraphs are split, how many sentences there are
>> per paragraph, how long the sentences are, and how many and how
>> complex our rules are, so to write "ideal" code you need to consider
>> which approach will benefit which case
>>
>> Most of those problems are solvable (to a different degree), but for
>> some the effort may not be trivial. If anybody is willing to dive
>> deeper into the subject I'll be glad to share my knowledge.
>>
>> The only thing I can say is that if we are not rewriting for a new
>> multithreaded architecture (and that would take changes in both -core
>> and -commandline) but rather taking incremental improvements, I would
>> look into running analysis/check in parallel with file
>> reading/sentence splitting - we wait for this block an amount of time
>> comparable to the wait time in the analysis and check blocks, but in
>> this case all but one thread are idle, while in the other two cases
>> only some threads are waiting.
>> Also I feel that due to the specifics of most texts we get a lot of
>> small paragraphs (the default logic splits them at two newlines, which
>> produces many small paragraphs for titles, notes, dialogs etc.), which
>> means we start the analysis/check blocks too often with small chunks
>> of data, thus losing performance.
>>
>> For now I thought that changing one number in 1 line that leads to
>> about a 25% speedup is worth applying until we have a better solution. :)
>
> Definitely. It sounds like solving some of the issues would involve a
> substantial refactoring of certain parts of LT's processing pipeline.
> Interesting work but potentially very time consuming.
>
> Cheers,
>
> Silvan
>
>> Andriy
>>
>> 2016-01-28 12:46 GMT-05:00 Silvan Jegen <m...@sillymon.ch>:
>> > Heyhey
>> >
>> > On Thu, Jan 28, 2016 at 08:57:49AM -0500, Andriy Rysin wrote:
>> >> Yes, that's the case I tried with # of callables = # of rules (see
>> >> my previous email). The wait time went down quite a bit (as
>> >> expected) but the overall processing time went up, I suspect
>> >> because of split/merge overhead. But this depends heavily on the
>> >> type/number of rules, the text and the CPU (e.g. if rule processing
>> >> time is more unbalanced than in the Ukrainian case then increasing
>> >> the # of callables will help; otherwise the effect could be the
>> >> reverse), so we would have to try other languages with
>> >
>> > I may have missed a statistic in your earlier mail, but wouldn't
>> > splitting up the text into sentences and then sending batches of
>> > them to different threads result in the most even load distribution?
>> > Because of the result-merging overhead this would only make sense
>> > when the number of lines crosses a certain threshold.
>> >
>> > Cheers,
>> >
>> > Silvan
>> >
>> >> different numbers of callables to see what's the best approach.
>> >> I know we have regular Wikipedia checks for some languages - that
>> >> could be a good benchmarking test.
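Silvan's batching idea - splitting the text into sentences and sending batches of them to different threads - might look roughly like this (hypothetical sketch, not LT code):

```java
import java.util.*;
import java.util.concurrent.*;

// Sketch of batching sentences and submitting each batch to a thread pool,
// instead of one callable per sentence.
public class BatchSketch {
    // Split sentences into at most nBatches roughly equal batches.
    static List<List<String>> partition(List<String> sentences, int nBatches) {
        List<List<String>> batches = new ArrayList<>();
        int size = (sentences.size() + nBatches - 1) / nBatches; // ceiling division
        for (int i = 0; i < sentences.size(); i += size) {
            batches.add(sentences.subList(i, Math.min(i + size, sentences.size())));
        }
        return batches;
    }

    public static void main(String[] args) throws Exception {
        List<String> sentences = Arrays.asList("s1", "s2", "s3", "s4", "s5");
        ExecutorService pool = Executors.newFixedThreadPool(2);

        // One callable per batch; the body stands in for checking a batch.
        List<Future<Integer>> futures = new ArrayList<>();
        for (List<String> batch : partition(sentences, 2)) {
            futures.add(pool.submit(() -> batch.size()));
        }
        int total = 0;
        for (Future<Integer> f : futures) total += f.get(); // merge results
        pool.shutdown();
        System.out.println(total); // prints 5
    }
}
```

As noted in the thread, the merge step is exactly where the overhead comes from, so batching only pays off above some input size.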
>> >>
>> >> Regards,
>> >> Andriy
>> >>
>> >> 2016-01-28 8:47 GMT-05:00 Dominique Pellé <dominique.pe...@gmail.com>:
>> >> > Andriy Rysin wrote:
>> >> >
>> >> >> Then I realized that in the check method we split the rules into
>> >> >> callables whose count is the # of cores available (in my case 8).
>> >> >> As I have 347 rules, this means each bucket is ~43 rules, and
>> >> >> since rules are not equal in complexity this could lead to quite
>> >> >> unequal time for each thread.
>> >> >
>> >> > Hi Andriy
>> >> >
>> >> > Thanks for having a look at multi-thread performance.
>> >> > I don't know the code as much as you do. But if we indeed
>> >> > split the number of rules equally before processing them, then
>> >> > it seems bad for balancing the work.
>> >> >
>> >> > Can't we instead have a queue with all rules to be processed?
>> >> > When a thread is ready to do work, it picks the next rule to
>> >> > process from the queue. So the load would be well balanced, even
>> >> > if some rules are 10x slower than others. With such a queue, a
>> >> > thread that picks up an expensive rule would end up processing
>> >> > fewer rules than a thread that picks up fast rules, keeping all
>> >> > CPUs as busy as possible.
>> >> >
>> >> > Regards
>> >> > Dominique
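Dominique's shared-queue idea could be sketched like this (hypothetical; not the actual LT check method):

```java
import java.util.*;
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of a shared work queue of rules: worker threads pull the next
// rule from one queue instead of getting fixed, equally sized buckets,
// so a slow rule can't stall a whole bucket.
public class RuleQueueSketch {

    static int processAll(int nRules, int nThreads) throws InterruptedException {
        // Stand-in "rules": just integer ids; real rules would be checked
        // against the analyzed sentences.
        Queue<Integer> rules = new ConcurrentLinkedQueue<>();
        for (int i = 1; i <= nRules; i++) rules.add(i);

        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        AtomicInteger processed = new AtomicInteger();
        for (int t = 0; t < nThreads; t++) {
            pool.submit(() -> {
                // Each worker keeps picking rules until the queue is empty;
                // a thread that hits an expensive rule simply takes fewer.
                while (rules.poll() != null) {
                    processed.incrementAndGet(); // stand-in for applying one rule
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return processed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(processAll(347, 8)); // prints 347
    }
}
```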
>> >> > _______________________________________________
>> >> > Languagetool-devel mailing list
>> >> > Languagetool-devel@lists.sourceforge.net
>> >> > https://lists.sourceforge.net/lists/listinfo/languagetool-devel