Re: Slow operations on buffers of tens of megabytes

Reiner Steib Mon, 06 Nov 2006 01:22:10 -0800

On Mon, Nov 06 2006, Katsumi Yamaoka wrote:

>>>>>> In <[EMAIL PROTECTED]> Richard Stallman wrote:
>
>>     Scoring of the messages closer to the beginning of the buffer is fast,
>>     but as we move to higher-numbered messages, that are closer to the end
>>     of such big files/buffers, gnus will only score 2-3 messages per
>>     minute, and that's what kills performance.
[...]
> (setq gnus-article-button-face nil
>       gnus-signature-face nil
>       gnus-summary-selected-face nil
>       gnus-treat-highlight-citation nil
>       gnus-treat-emphasize nil)
>
> If it makes Gnus fast, improving the performance will be worth
> trying.  However, I didn't feel any difference, though it might
> be because I don't have huge mail folders.


I don't think this matches the problem description.  When scanning big
mbox files, article display isn't involved.  Or am I missing
something?

My guess is that it's a problem with case-fold-search when searching
for "X-Gnus-Article-Number" in mbox files in Emacs 22, as analyzed by
Elias Oltmanns back in May:

,----[ http://thread.gmane.org/gmane.emacs.devel/53901/focus=54013 ]
| From: Elias Oltmanns <oltmanns <at> uni-bonn.de>
| Subject: Re: New buffer-case-table makes search_buffer painfully slow
| Newsgroups: gmane.emacs.devel
| Date: 2006-05-06 19:10:08 GMT
| 
| Elias Oltmanns <oltmanns <at> uni-bonn.de> wrote:
| > Hi all,
| >
| > switching from emacs 21 to emacs 22 has a very significant performance
| > impact on packages that make heavy use of search_buffer. An example
| > that actually made me aware of this problem is gnus processing large
| > mbox files. Further analysis of this problem revealed that in emacs 22
| > an "i" in the search string makes search_buffer use simple_search()
| > instead of boyer_moore(). 
| 
| Emacs 22's EQUIVALENCES table relates i, and thus I as well, to two
| more characters with character codes 331857 and 331856. On
| www.unicode.org the character look up engine couldn't find a match for
| U+51051 or U+51050 saying that most likely those codes weren't
| assigned to any characters yet.
| 
| So, here is a plain question: Is there a bug in the case-table in
| emacs 22 or does the search engine on www.unicode.org for some reason
| miss certain character ranges? Slightly biassed, I'm disregarding the
| possibility of me being unable to use www.unicode.org properly, which,
| in fact, might well be the reason for my confusion.
| 
| Second question: If the case-table was right, what would be the right
| way to tackle the problem described in my original post? For me the
| following snippet in .emacs solves the problem:
| --- ~/.emacs ---
| (unless (< emacs-major-version 22)
|   (set-case-syntax 331856 "w" (standard-case-table))
|   (set-case-syntax 331857 "w" (standard-case-table)))
| --- ~/.emacs ---
| 
| This, of course, is a dirty hack and I'm wondering whether emacs
| should provide a feature to "clean up" the EQUIVALENCES table in the
| ascii range in order to avoid falling back to a slow search
| algorithm when we are searching for pure ascii strings. Or do you
| think that packages like gnus which make heavy use of
| re-search-forward should handle these performance issues
| themselves---or indeed the users.
`----

Alexandre, could you please check whether the hack suggested by Elias
makes your problem go away?
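
To compare before and after, something like the following might help.
It is only a rough sketch: `my-time-search' is a hypothetical helper
(not part of Gnus or Emacs), and the filler text and the default size
of about 10 MB are arbitrary assumptions.  It times a single failing
`search-forward' over a large ASCII buffer with `case-fold-search'
bound to a given value.

```elisp
;; Hypothetical helper for illustration; not part of Gnus or Emacs.
(require 'benchmark)

(defun my-time-search (needle fold &optional size)
  "Return the seconds one failing search for NEEDLE takes.
The search runs over a temporary buffer of roughly SIZE characters
\(default 10000000) with `case-fold-search' bound to FOLD."
  (with-temp-buffer
    ;; Fill the buffer with ASCII text that never matches NEEDLE,
    ;; so the search has to scan all the way to the end.
    (let ((line "From nobody Mon Nov  6 00:00:00 2006\n"))
      (dotimes (_ (/ (or size 10000000) (length line)))
        (insert line)))
    (goto-char (point-min))
    (let ((case-fold-search fold))
      ;; `benchmark-run' returns (ELAPSED GC-COUNT GC-TIME).
      (car (benchmark-run 1 (search-forward needle nil t))))))

;; Compare a case-folding search with a case-sensitive one:
;; (my-time-search "X-Gnus-Article-Number" t)
;; (my-time-search "X-Gnus-Article-Number" nil)
```

If Elias's analysis applies, I would expect the first call to be much
slower than the second on an affected Emacs 22, and the gap to shrink
once his workaround is in place.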

Richard proposed a fix for this, but AFAICS, this has not been
implemented:

,----[ http://thread.gmane.org/gmane.emacs.devel/53901/focus=54025 ]
| From: Richard Stallman <rms <at> gnu.org>
| Subject: Re: New buffer-case-table makes search_buffer painfully slow
| Newsgroups: gmane.emacs.devel
| Date: 2006-05-07 05:01:27 GMT
|
| I think this has to do with the special characters for Turkish,
| lower-case i without dot and upper-case I with dot.  In Turkish,
| upcasing and downcasing preserve the dot, or the absence of the dot.
| 
| I think these lines in characters.el are the cause of the problem.
| 
|   (set-downcase-syntax  ?İ ?i tbl)
|   (set-upcase-syntax    ?I ?ı tbl)
| 
| They set up only half of what Turkish needs.
| They make dotless-i upcase into I, and they make
| I-with-dot downcase into i.  They can't do vice versa
| because that would break things for other languages.
| So they are not really useful.  We could simply delete them.
| 
| We could also add a minor mode to set up the case table all the way
| for Turkish.
| 
| Would someone like to do that?
`----

Looking at the ChangeLog, it seems that the relevant code in
`characters.el' ...

,----[ international/characters.el ]
| ;; In some languages, U+0049 LATIN CAPITAL LETTER I and U+0131 LATIN
| ;; SMALL LETTER DOTLESS I make a case pair, and so do U+0130 LATIN
| ;; CAPITAL LETTER I WITH DOT ABOVE and U+0069 LATIN SMALL LETTER I.
| ;; Thus we have to check language-environment to handle casing
| ;; correctly.  Currently only I<->i is available.
| [...] 
|   (set-downcase-syntax  ?İ ?i tbl)
|   (set-upcase-syntax    ?I ?ı tbl)
`----

... has been changed back and forth several times:

,----[ ChangeLog ]
| 2005-04-01  Kenichi Handa  <[EMAIL PROTECTED]>
| 
|       * international/characters.el: Enable the correct case setting for
|       dotless-i and dotted-I.
| 
| 2005-02-02  Kenichi Handa  <[EMAIL PROTECTED]>
| 
|       * international/characters.el: Cancel previous change for
|       I-WITH-DOT-ABOVE and DOTLESS-i.
| 
| 2005-02-02  Kenichi Handa  <[EMAIL PROTECTED]>
| 
|       * international/latin-5.el (tbl): Setup cases of I-WITH-DOT-ABOVE,
|       DOTLESS-i.
| 
|       * international/characters.el: Setup cases of GREEK-FINAL-SIGMA,
|       Y-WITH-DIAERESIS, I-WITH-DOT-ABOVE, DOTLESS-i.
`----

Bye, Reiner.
-- 
       ,,,
      (o o)
---ooO-(_)-Ooo---  |  PGP key available  |  http://rsteib.home.pages.de/

