On 5/15/15 2:33 PM, Ole Laursen wrote: > I did a profile now on examples/fileman with perf: > > https://perf.wiki.kernel.org/index.php/Tutorial#Sampling_with_perf_record > > Here are the results from pasting 5000 x "aaa " with my default locale > da_DK.UTF-8. As is evident most of the time is spent measuring the > widths of the a's:
Yes, the redisplay code computes all character widths to determine where to break lines and manage terminal line wrapping. > > 21,22% fileman libc-2.19.so [.] __gconv_transform_utf8_internal > 20,60% fileman libc-2.19.so [.] wcwidth > 19,80% fileman libc-2.19.so [.] __mbrtowc That's apparently very expensive using glibc on Linux. > I'm assuming that readline does a redisplay on each inserted char? In > this redisplay, it goes through all characters on the line inserted so > far, so the end result is O(n^2). > > In a multi-byte case, examining each character is obviously much more > expensive. Yes, apparently. > > Notice that in even with LANG=C, __ctype_get_mb_cur_max accounts for > 61.7% of the time. As far as I can tell, this comes from MB_CUR_MAX > which despite the naming is not a constant: > > http://man7.org/linux/man-pages/man3/MB_CUR_MAX.3.html Other systems, such as Mac OS X or FreeBSD, manage to avoid a function call, but it's definitely not guaranteed to be a compile-time constant. Since it only changes on a successful call to setlocale(), it's possible to avoid function calls here. glibc just doesn't, so we should work around that. > As a quickfix for that, I'm attaching a patch to only compute > MB_CUR_MAX once in the top of the function. With this patch, the > LANG=C profile is noticeable faster and looks like this Thanks, this is part of the fix for this. A larger part is to make the calls to wcwidth cheaper on Linux. > I suppose one could try to make the loop content faster somehow, but > to really fix this to not be O(n^2) one should only do the work > necessary to take the single added character into account instead of > starting off from scratch each time. Or perhaps try to tackle this > case of copy-pasting a bucket-load of characters by batching them? The readline redisplay engine is very decoupled from the buffer manipulation engine: it gets only the contents of the buffer that need to be displayed and is responsible for keeping track of the current physical screen contents, computing the differences, and performing the screen updates. It makes no assumptions about what might have happened to the line buffer between calls to rl_redisplay. It's pretty efficient: it knows how to suppress redisplay for identical screen lines and to only append new characters to existing lines when that makes sense. There aren't currently any mechanisms to hint that the only difference between the previous line buffer and the current line buffer is some appended characters; it has to compute the line(s) that will be displayed on the screen before it knows that. As I said above, part of the process involves figuring out character widths to correctly compute line breaks. It might be possible to add an internal interface to tell the redisplay engine that the only thing changing is adding characters at the end of the line and change behavior accordingly, but I am not sure that is worth the effort. If you're going to paste 20,000 characters, you might as well spend the extra 5 or 6 seconds to turn off line editing before you do it. Chet -- ``The lyf so short, the craft so long to lerne.'' - Chaucer ``Ars longa, vita brevis'' - Hippocrates Chet Ramey, ITS, CWRU [email protected] http://cnswww.cns.cwru.edu/~chet/ _______________________________________________ Bug-readline mailing list [email protected] https://lists.gnu.org/mailman/listinfo/bug-readline
