Bug#317524: infinite loop when using charset_default=iso-8859-1
Hi David, on Fri, Nov 04, 2005 at 23:23:11 -0500, you wrote: Thanks! :) Your observations about the code are very good, though not totally correct. If you look at function process_config_option(), you'll see that the original config line is duplicated and split at the '=' character (which has been replaced by '\0'). The *t-- code is applied to the part after '=' and the loop to erase whitespace will never backup to the part before the '='. I'm sorry, but apart from the fact that relying upon such circumstances does not lead to robust code, I still beg to differ regarding these circumstances. :) process_config_option() indeed duplicates the original line, splits it as you described and passes a pointer to the second part (val) down through process_config_option_as_arg() and process_arg() to get_string() as the second parameter arg. get_string() as per your patch however then duplicates arg to s, which thus is a new block of memory starting with the beginning of arg and, unlike the one arg points into, the former '=' is not before it. t gets to point to s and then is eventually decreased by one and accessed by isspace(*t) during the first iteration as I described previously. elmar -- .'`./\ | :' : Elmar Hoffmann [EMAIL PROTECTED]ASCII Ribbon Campaign \ / `. `'GPG key available via pgp.netagainst HTML email X `- vCards / \ signature.asc Description: Digital signature
Bug#317524: infinite loop when using charset_default=iso-8859-1
Package: bogofilter-bdb Version: 0.96.4-1 Followup-For: Bug #317524 reopen 317524 ! I can reproduce the problem. bogofilter 0.94.13-1 did not have this bug. Like for the original submitter, it does not seem to depend on the db - fails with both existing, and newly created one (removed ~/.bogofilter) - and it also does not seem to depend on the message either: --8--- $ echo foo | bogofilter -c /etc/bogofilter.cf~ -vvxdi probing /home/elho/.bogofilter and wordlist.db for environment...T_ENABLED open_lockfile: open(/home/elho/.bogofilter/lockfile-p) succeeded, fd #3 db_env_create: 0x807318c DB_ENV-set_cachesize(4) DB_ENV-set_lg_max(1048576) DB_ENV-open(home=/home/elho/.bogofilter) db_version: Header version 4.3, library version 4.3 [pid 8598] DB-open(db=0x8073f48, file=wordlist.db, database=NIL, type=1, flags=0x110=DB_RDONLY DB_AUTO_COMMIT , mode=0664) - 0 Successful return: 0 DB-get_byteswapped: false DB-fd: 5 DB-stat success, pagesize: 4096 converting iso-8859-1 # (alternate) to UTF-8 Conversion from 'iso-8859-1 # (alternate)' to 'UTF-8' is not supported. --8--- At this point bogofilter sits in an infinite loop using up all available CPU time. The only difference of /etc/bogofilter.cf~ to the default config is the charset_default=iso-8859-1 setting. HTH, elmar -- System Information: Debian Release: testing/unstable APT prefers unstable APT policy: (500, 'unstable') Architecture: i386 (i686) Shell: /bin/sh linked to /bin/bash Kernel: Linux 2.6.12-bdclaim Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8) Versions of packages bogofilter-bdb depends on: ii bogofilter-common 0.96.4-1 a fast Bayesian spam filter (commo ii libc6 2.3.5-7GNU C Library: Shared libraries an ii libdb4.3 4.3.29-1 Berkeley v4.3 Database Libraries [ ii libgsl0 1.7-2 GNU Scientific Library (GSL) -- li bogofilter-bdb recommends no packages. -- no debconf information -- .'`./\ | :' : Elmar Hoffmann [EMAIL PROTECTED]ASCII Ribbon Campaign \ / `. `'GPG key available via pgp.netagainst HTML email X `- vCards / \ signature.asc Description: Digital signature
Bug#317524: infinite loop when using charset_default=iso-8859-1
I can reproduce the problem. bogofilter 0.94.13-1 did not have this bug. Like for the original submitter, it does not seem to depend on the db - fails with both existing, and newly created one (removed ~/.bogofilter) - and it also does not seem to depend on the message either: --8--- $ echo foo | bogofilter -c /etc/bogofilter.cf~ -vvxdi probing /home/elho/.bogofilter and wordlist.db for environment...T_ENABLED open_lockfile: open(/home/elho/.bogofilter/lockfile-p) succeeded, fd #3 db_env_create: 0x807318c DB_ENV-set_cachesize(4) DB_ENV-set_lg_max(1048576) DB_ENV-open(home=/home/elho/.bogofilter) db_version: Header version 4.3, library version 4.3 [pid 8598] DB-open(db=0x8073f48, file=wordlist.db, database=NIL, type=1, flags=0x110=DB_RDONLY DB_AUTO_COMMIT , mode=0664) - 0 Successful return: 0 DB-get_byteswapped: false DB-fd: 5 DB-stat success, pagesize: 4096 converting iso-8859-1 # (alternate) to UTF-8 Conversion from 'iso-8859-1 # (alternate)' to 'UTF-8' is not supported. --8--- At this point bogofilter sits in an infinite loop using up all available CPU time. The only difference of /etc/bogofilter.cf~ to the default config is the charset_default=iso-8859-1 setting. Does it work fine if you remove the trailing spaces and comment from the config line? -- To UNSUBSCRIBE, email to [EMAIL PROTECTED] with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]
Bug#317524: infinite loop when using charset_default=iso-8859-1
Hi Clint, on Fri, Nov 04, 2005 at 15:39:04 -0500, you wrote: Conversion from 'iso-8859-1 # (alternate)' to 'UTF-8' is not supported. Does it work fine if you remove the trailing spaces and comment from the config line? Ah, yes it does. I somehow didn't look properly at the output I pasted. ;-X elmar -- .'`./\ | :' : Elmar Hoffmann [EMAIL PROTECTED]ASCII Ribbon Campaign \ / `. `'GPG key available via pgp.netagainst HTML email X `- vCards / \ signature.asc Description: Digital signature
Bug#317524: infinite loop when using charset_default=iso-8859-1
Ah, yes it does. I somehow didn't look properly at the output I pasted. ;-X Obviously we have a config-parsing problem. -- To UNSUBSCRIBE, email to [EMAIL PROTECTED] with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]
Bug#317524: infinite loop when using charset_default=iso-8859-1
On Fri, 4 Nov 2005 16:34:29 -0500 Clint Adams wrote: Ah, yes it does. I somehow didn't look properly at the output I pasted. ;-X Obviously we have a config-parsing problem. Emar, Thank you for reporting this problem! Clint, The config parsing problem is the first half of the story. Once bogofilter has the bad charset name it tries to open the charset. Naturally that fails. However the failure returns -1 rather than a NULL and that causes the loop. The -1 vs NULL problem was caused by a fix last June. The attached patch fixes both problems. Regards, David patch.1104.charset.name.error Description: Binary data
Bug#317524: [PATCH] Bug#317524: infinite loop when using charset_default=iso-8859-1
Renamed the attachment so that mailman won't delete it.diff -u -r1.232 bogoconfig.c --- bogoconfig.c15 Oct 2005 20:52:51 - 1.232 +++ bogoconfig.c5 Nov 2005 00:14:39 - @@ -169,6 +169,23 @@ static char *get_string(const char *name, const char *arg) { char *s = xstrdup(arg); + +/* scan option to delete comment (after '#') and preceding whitespace */ +char *t = s; +bool quote = false; +for (t = s; *t != '\0' ; t += 1) +{ + char c = *t; + if (c == '') + quote ^= true; + if (!quote c == '#') { + *t-- = '\0'; + while (isspace(*t)) + *t-- = '\0'; + break; + } +} + if (DEBUG_CONFIG(2)) fprintf(dbgout, %s - '%s'\n, name, s); return s; diff -u -r1.90 bogolexer.c --- bogolexer.c 17 Jun 2005 03:31:06 - 1.90 +++ bogolexer.c 5 Nov 2005 00:14:39 - @@ -92,6 +92,23 @@ static char *get_string(const char *name, const char *arg) { char *s = xstrdup(arg); + +/* scan option to delete comment (after '#') and preceding whitespace */ +char *t = s; +bool quote = false; +for (t = s; *t != '\0' ; t += 1) +{ + char c = *t; + if (c == '') + quote ^= true; + if (!quote c == '#') { + *t-- = '\0'; + while (isspace(*t)) + *t-- = '\0'; + break; + } +} + if (DEBUG_CONFIG(2)) fprintf(dbgout, %s - '%s'\n, name, s); return s; diff -u -r1.13 convert_unicode.c --- convert_unicode.c 26 Jun 2005 01:06:26 - 1.13 +++ convert_unicode.c 5 Nov 2005 00:14:39 - @@ -33,7 +33,7 @@ #defineSP ' ' #include iconv.h -iconv_t cd; +iconv_t cd = NULL; static void map_nonascii_characters(void) { @@ -135,6 +135,8 @@ from_charset = charset_default; cd = bf_iconv_open( to_charset, from_charset ); +if (cd == (iconv_t) -1) + cd = NULL; for (idx = 0; idx COUNTOF(charsets); idx += 1) {
Bug#317524: infinite loop when using charset_default=iso-8859-1
Hi David, on Fri, Nov 04, 2005 at 19:22:38 -0500, you wrote: The attached patch fixes both problems. But unfortunately introduces new ones: static char *get_string(const char *name, const char *arg) { char *s = xstrdup(arg); + +/* scan option to delete comment (after '#') and preceding whitespace */ +char *t = s; t initially points to s, a copy of arg... +bool quote = false; +for (t = s; *t != '\0' ; t += 1) +{ + char c = *t; + if (c == '') + quote ^= true; + if (!quote c == '#') { + *t-- = '\0'; ...if arg starts with a '#' (ie. something like option=# foo in the config), t will now point one byte before the beginning of s... + while (isspace(*t)) ...and thus a faulty memory access will happen here. + *t-- = '\0'; + break; + } +} The attached fixed version of the patch avoids this (and further code duplication) by using the existing remove_comment() function, which already is used by other get_*() functions. The other problem is that init_charset_table_iconv() is not the only place bf_iconv_open() is used without checking for a result of -1. text_decode() in lexer.c contains the lines cd = bf_iconv_open( charset_unicode, charset ); iconvert_cd(cd, src, buf); and iconvert_cd() checks for cd == NULL only. Thus I think it makes more sense to fix bf_iconv_open() itself to always return NULL on failure, like in the attached patch. elmar -- .'`./\ | :' : Elmar Hoffmann [EMAIL PROTECTED]ASCII Ribbon Campaign \ / `. `'GPG key available via pgp.netagainst HTML email X `- vCards / \ diff -ru bogofilter-0.96.4.orig/src/bogoconfig.c bogofilter-0.96.4/src/bogoconfig.c --- bogofilter-0.96.4.orig/src/bogoconfig.c 2005-10-31 13:07:12.0 +0100 +++ bogofilter-0.96.4/src/bogoconfig.c 2005-11-05 02:14:59.396592326 +0100 @@ -169,6 +169,7 @@ static char *get_string(const char *name, const char *arg) { char *s = xstrdup(arg); +remove_comment(s); if (DEBUG_CONFIG(2)) fprintf(dbgout, %s - '%s'\n, name, s); return s; diff -ru bogofilter-0.96.4.orig/src/bogolexer.c bogofilter-0.96.4/src/bogolexer.c --- bogofilter-0.96.4.orig/src/bogolexer.c 2005-10-31 13:07:12.0 +0100 +++ bogofilter-0.96.4/src/bogolexer.c 2005-11-05 02:15:46.826805751 +0100 @@ -92,6 +92,7 @@ static char *get_string(const char *name, const char *arg) { char *s = xstrdup(arg); +remove_comment(s); if (DEBUG_CONFIG(2)) fprintf(dbgout, %s - '%s'\n, name, s); return s; diff -ru bogofilter-0.96.4.orig/src/convert_unicode.c bogofilter-0.96.4/src/convert_unicode.c --- bogofilter-0.96.4.orig/src/convert_unicode.c2005-07-27 00:11:20.0 +0200 +++ bogofilter-0.96.4/src/convert_unicode.c 2005-11-05 02:56:14.253757661 +0100 @@ -33,7 +33,7 @@ #defineSP ' ' #include iconv.h -iconv_t cd; +iconv_t cd = NULL; static void map_nonascii_characters(void) { @@ -115,7 +115,10 @@ from_charset, to_charset ); /* error - map default charset to unicode */ xd = iconv_open( charset_unicode, charset_default ); - } + if (xd == (iconv_t)(-1)) + xd = NULL; + } else + xd = NULL; } return xd; signature.asc Description: Digital signature
Bug#317524: infinite loop when using charset_default=iso-8859-1
On Sat, 5 Nov 2005 03:14:38 +0100 Elmar Hoffmann wrote: Hi David, on Fri, Nov 04, 2005 at 19:22:38 -0500, you wrote: The attached patch fixes both problems. But unfortunately introduces new ones: static char *get_string(const char *name, const char *arg) { char *s = xstrdup(arg); + +/* scan option to delete comment (after '#') and preceding whitespace */ +char *t = s; t initially points to s, a copy of arg... +bool quote = false; +for (t = s; *t != '\0' ; t += 1) +{ + char c = *t; + if (c == '') + quote ^= true; + if (!quote c == '#') { + *t-- = '\0'; ...if arg starts with a '#' (ie. something like option=# foo in the config), t will now point one byte before the beginning of s... + while (isspace(*t)) ...and thus a faulty memory access will happen here. + *t-- = '\0'; + break; + } +} The attached fixed version of the patch avoids this (and further code duplication) by using the existing remove_comment() function, which already is used by other get_*() functions. The other problem is that init_charset_table_iconv() is not the only place bf_iconv_open() is used without checking for a result of -1. text_decode() in lexer.c contains the lines cd = bf_iconv_open( charset_unicode, charset ); iconvert_cd(cd, src, buf); and iconvert_cd() checks for cd == NULL only. Thus I think it makes more sense to fix bf_iconv_open() itself to always return NULL on failure, like in the attached patch. elmar Nice work, elmar! Your observations about the code are very good, though not totally correct. If you look at function process_config_option(), you'll see that the original config line is duplicated and split at the '=' character (which has been replaced by '\0'). The *t-- code is applied to the part after '=' and the loop to erase whitespace will never backup to the part before the '='. When I made the quick change to get_string(), I hadn't fully analyzed the situation and had assumed (rather than determined) that all was safe. In any case, using remove_comment() is the better solution, as is using xd=NULL. CVS now has the new version. In honor of your contribution to the code base, your name has been added to the AUTHORS list. Well done! David -- To UNSUBSCRIBE, email to [EMAIL PROTECTED] with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]