Bug#317524: infinite loop when using charset_default=iso-8859-1

2005-11-05 Thread Elmar Hoffmann
Hi David,

on Fri, Nov 04, 2005 at 23:23:11 -0500, you wrote:

Thanks! :)

 Your observations about the code are very good, though not totally
 correct.  If you look at function process_config_option(), you'll see
 that the original config line is duplicated and split at the '='
 character (which has been replaced by '\0').  The *t-- code is
 applied to the part after '=' and the loop to erase whitespace will
 never backup to the part before the '='.

I'm sorry, but apart from the fact that relying upon such
circumstances does not lead to robust code, I still beg to differ
regarding these circumstances. :)
process_config_option() indeed duplicates the original line, splits it
as you described and passes a pointer to the second part (val) down
through process_config_option_as_arg() and process_arg() to
get_string() as the second parameter arg.
get_string() as per your patch however then duplicates arg to s, which
thus is a new block of memory starting with the beginning of arg and,
unlike the one arg points into, the former '=' is not before it.
t gets to point to s and then is eventually decreased by one and
accessed by isspace(*t) during the first iteration as I described
previously.

elmar

-- 

 .'`./\
| :' :   Elmar Hoffmann [EMAIL PROTECTED]ASCII Ribbon Campaign  \ /
`. `'GPG key available via pgp.netagainst HTML email   X
  `- vCards  / \


signature.asc
Description: Digital signature


Bug#317524: infinite loop when using charset_default=iso-8859-1

2005-11-04 Thread Elmar Hoffmann
Package: bogofilter-bdb
Version: 0.96.4-1
Followup-For: Bug #317524

reopen 317524 !

I can reproduce the problem. bogofilter 0.94.13-1 did not have this
bug.

Like for the original submitter, it does not seem to depend on the db
- fails with both existing, and newly created one (removed
~/.bogofilter) - and it also does not seem to depend on the message
either:

--8---
$ echo foo | bogofilter -c /etc/bogofilter.cf~ -vvxdi
probing /home/elho/.bogofilter and wordlist.db for
environment...T_ENABLED
open_lockfile: open(/home/elho/.bogofilter/lockfile-p) succeeded, fd
#3
db_env_create: 0x807318c
DB_ENV-set_cachesize(4)
DB_ENV-set_lg_max(1048576)
DB_ENV-open(home=/home/elho/.bogofilter)
db_version: Header version 4.3, library version 4.3
[pid 8598] DB-open(db=0x8073f48, file=wordlist.db, database=NIL,
type=1, flags=0x110=DB_RDONLY DB_AUTO_COMMIT , mode=0664) - 0
Successful return: 0
DB-get_byteswapped: false
DB-fd: 5
DB-stat success, pagesize: 4096
converting iso-8859-1   # (alternate) to UTF-8
Conversion from 'iso-8859-1 # (alternate)' to 'UTF-8' is
not supported.
--8---

At this point bogofilter sits in an infinite loop using up all
available CPU time.

The only difference of /etc/bogofilter.cf~ to the default config is
the charset_default=iso-8859-1 setting.

HTH,
elmar

-- System Information:
Debian Release: testing/unstable
  APT prefers unstable
  APT policy: (500, 'unstable')
Architecture: i386 (i686)
Shell:  /bin/sh linked to /bin/bash
Kernel: Linux 2.6.12-bdclaim
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)

Versions of packages bogofilter-bdb depends on:
ii  bogofilter-common 0.96.4-1   a fast Bayesian spam filter (commo
ii  libc6 2.3.5-7GNU C Library: Shared libraries an
ii  libdb4.3  4.3.29-1   Berkeley v4.3 Database Libraries [
ii  libgsl0   1.7-2  GNU Scientific Library (GSL) -- li

bogofilter-bdb recommends no packages.

-- no debconf information

-- 

 .'`./\
| :' :   Elmar Hoffmann [EMAIL PROTECTED]ASCII Ribbon Campaign  \ /
`. `'GPG key available via pgp.netagainst HTML email   X
  `- vCards  / \


signature.asc
Description: Digital signature


Bug#317524: infinite loop when using charset_default=iso-8859-1

2005-11-04 Thread Clint Adams
 I can reproduce the problem. bogofilter 0.94.13-1 did not have this
 bug.
 
 Like for the original submitter, it does not seem to depend on the db
 - fails with both existing, and newly created one (removed
 ~/.bogofilter) - and it also does not seem to depend on the message
 either:
 
 --8---
 $ echo foo | bogofilter -c /etc/bogofilter.cf~ -vvxdi
 probing /home/elho/.bogofilter and wordlist.db for
 environment...T_ENABLED
 open_lockfile: open(/home/elho/.bogofilter/lockfile-p) succeeded, fd
 #3
 db_env_create: 0x807318c
 DB_ENV-set_cachesize(4)
 DB_ENV-set_lg_max(1048576)
 DB_ENV-open(home=/home/elho/.bogofilter)
 db_version: Header version 4.3, library version 4.3
 [pid 8598] DB-open(db=0x8073f48, file=wordlist.db, database=NIL,
 type=1, flags=0x110=DB_RDONLY DB_AUTO_COMMIT , mode=0664) - 0
 Successful return: 0
 DB-get_byteswapped: false
 DB-fd: 5
 DB-stat success, pagesize: 4096
 converting iso-8859-1   # (alternate) to UTF-8
 Conversion from 'iso-8859-1 # (alternate)' to 'UTF-8' is
 not supported.
 --8---
 
 At this point bogofilter sits in an infinite loop using up all
 available CPU time.
 
 The only difference of /etc/bogofilter.cf~ to the default config is
 the charset_default=iso-8859-1 setting.

Does it work fine if you remove the trailing spaces and comment from the
config line?


-- 
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]



Bug#317524: infinite loop when using charset_default=iso-8859-1

2005-11-04 Thread Elmar Hoffmann
Hi Clint,

on Fri, Nov 04, 2005 at 15:39:04 -0500, you wrote:

  Conversion from 'iso-8859-1 # (alternate)' to 'UTF-8' is
  not supported.
 
 Does it work fine if you remove the trailing spaces and comment from the
 config line?

Ah, yes it does. I somehow didn't look properly at the output I
pasted. ;-X

elmar

-- 

 .'`./\
| :' :   Elmar Hoffmann [EMAIL PROTECTED]ASCII Ribbon Campaign  \ /
`. `'GPG key available via pgp.netagainst HTML email   X
  `- vCards  / \


signature.asc
Description: Digital signature


Bug#317524: infinite loop when using charset_default=iso-8859-1

2005-11-04 Thread Clint Adams
 Ah, yes it does. I somehow didn't look properly at the output I
 pasted. ;-X

Obviously we have a config-parsing problem.


-- 
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]



Bug#317524: infinite loop when using charset_default=iso-8859-1

2005-11-04 Thread David Relson
On Fri, 4 Nov 2005 16:34:29 -0500
Clint Adams wrote:

  Ah, yes it does. I somehow didn't look properly at the output I
  pasted. ;-X
 
 Obviously we have a config-parsing problem.

Emar,

Thank you for reporting this problem!

Clint,

The config parsing problem is the first half of the story.  Once
bogofilter has the bad charset name it tries to open the charset.
Naturally that fails.  However the failure returns -1 rather than a
NULL and that causes the loop.  The -1 vs NULL problem was caused by a
fix last June.

The attached patch fixes both problems.

Regards,

David


patch.1104.charset.name.error
Description: Binary data


Bug#317524: [PATCH] Bug#317524: infinite loop when using charset_default=iso-8859-1

2005-11-04 Thread David Relson
Renamed the attachment so that mailman won't delete it.diff -u -r1.232 bogoconfig.c
--- bogoconfig.c15 Oct 2005 20:52:51 -  1.232
+++ bogoconfig.c5 Nov 2005 00:14:39 -
@@ -169,6 +169,23 @@
 static char *get_string(const char *name, const char *arg)
 {
 char *s = xstrdup(arg);
+
+/* scan option to delete comment (after '#') and preceding whitespace */
+char *t = s;
+bool quote = false;
+for (t = s; *t != '\0' ; t += 1)
+{
+   char c = *t;
+   if (c == '')
+   quote ^= true;
+   if (!quote  c == '#') {
+   *t-- = '\0';
+   while (isspace(*t))
+   *t-- = '\0';
+   break;
+   }
+}
+
 if (DEBUG_CONFIG(2))
fprintf(dbgout, %s - '%s'\n, name, s);
 return s;
diff -u -r1.90 bogolexer.c
--- bogolexer.c 17 Jun 2005 03:31:06 -  1.90
+++ bogolexer.c 5 Nov 2005 00:14:39 -
@@ -92,6 +92,23 @@
 static char *get_string(const char *name, const char *arg)
 {
 char *s = xstrdup(arg);
+
+/* scan option to delete comment (after '#') and preceding whitespace */
+char *t = s;
+bool quote = false;
+for (t = s; *t != '\0' ; t += 1)
+{
+   char c = *t;
+   if (c == '')
+   quote ^= true;
+   if (!quote  c == '#') {
+   *t-- = '\0';
+   while (isspace(*t))
+   *t-- = '\0';
+   break;
+   }
+}
+
 if (DEBUG_CONFIG(2))
fprintf(dbgout, %s - '%s'\n, name, s);
 return s;
diff -u -r1.13 convert_unicode.c
--- convert_unicode.c   26 Jun 2005 01:06:26 -  1.13
+++ convert_unicode.c   5 Nov 2005 00:14:39 -
@@ -33,7 +33,7 @@
 #defineSP  ' '
 
 #include iconv.h
-iconv_t cd;
+iconv_t cd = NULL;
 
 static void map_nonascii_characters(void)
 {
@@ -135,6 +135,8 @@
from_charset = charset_default;
 
 cd = bf_iconv_open( to_charset, from_charset );
+if (cd == (iconv_t) -1)
+   cd = NULL;
 
 for (idx = 0; idx  COUNTOF(charsets); idx += 1)
 {


Bug#317524: infinite loop when using charset_default=iso-8859-1

2005-11-04 Thread Elmar Hoffmann
Hi David,

on Fri, Nov 04, 2005 at 19:22:38 -0500, you wrote:

 The attached patch fixes both problems.

But unfortunately introduces new ones:

  static char *get_string(const char *name, const char *arg)
  {
  char *s = xstrdup(arg);
 +
 +/* scan option to delete comment (after '#') and preceding
 whitespace */
 +char *t = s;

t initially points to s, a copy of arg...

 +bool quote = false;
 +for (t = s; *t != '\0' ; t += 1)
 +{
 +   char c = *t;
 +   if (c == '')
 +   quote ^= true;
 +   if (!quote  c == '#') {
 +   *t-- = '\0';

...if arg starts with a '#' (ie. something like option=# foo in the
config), t will now point one byte before the beginning of s...

 +   while (isspace(*t))

...and thus a faulty memory access will happen here.

 +   *t-- = '\0';
 +   break;
 +   }
 +}

The attached fixed version of the patch avoids this (and further code
duplication) by using the existing remove_comment() function, which
already is used by other get_*() functions.

The other problem is that init_charset_table_iconv() is not the only
place bf_iconv_open() is used without checking for a result of -1.
text_decode() in lexer.c contains the lines
cd = bf_iconv_open( charset_unicode, charset );
iconvert_cd(cd, src, buf);
and iconvert_cd() checks for cd == NULL only.

Thus I think it makes more sense to fix bf_iconv_open() itself to
always return NULL on failure, like in the attached patch.

elmar

-- 

 .'`./\
| :' :   Elmar Hoffmann [EMAIL PROTECTED]ASCII Ribbon Campaign  \ /
`. `'GPG key available via pgp.netagainst HTML email   X
  `- vCards  / \
diff -ru bogofilter-0.96.4.orig/src/bogoconfig.c 
bogofilter-0.96.4/src/bogoconfig.c
--- bogofilter-0.96.4.orig/src/bogoconfig.c 2005-10-31 13:07:12.0 
+0100
+++ bogofilter-0.96.4/src/bogoconfig.c  2005-11-05 02:14:59.396592326 +0100
@@ -169,6 +169,7 @@
 static char *get_string(const char *name, const char *arg)
 {
 char *s = xstrdup(arg);
+remove_comment(s);
 if (DEBUG_CONFIG(2))
fprintf(dbgout, %s - '%s'\n, name, s);
 return s;
diff -ru bogofilter-0.96.4.orig/src/bogolexer.c 
bogofilter-0.96.4/src/bogolexer.c
--- bogofilter-0.96.4.orig/src/bogolexer.c  2005-10-31 13:07:12.0 
+0100
+++ bogofilter-0.96.4/src/bogolexer.c   2005-11-05 02:15:46.826805751 +0100
@@ -92,6 +92,7 @@
 static char *get_string(const char *name, const char *arg)
 {
 char *s = xstrdup(arg);
+remove_comment(s);
 if (DEBUG_CONFIG(2))
fprintf(dbgout, %s - '%s'\n, name, s);
 return s;
diff -ru bogofilter-0.96.4.orig/src/convert_unicode.c 
bogofilter-0.96.4/src/convert_unicode.c
--- bogofilter-0.96.4.orig/src/convert_unicode.c2005-07-27 
00:11:20.0 +0200
+++ bogofilter-0.96.4/src/convert_unicode.c 2005-11-05 02:56:14.253757661 
+0100
@@ -33,7 +33,7 @@
 #defineSP  ' '
 
 #include iconv.h
-iconv_t cd;
+iconv_t cd = NULL;
 
 static void map_nonascii_characters(void)
 {
@@ -115,7 +115,10 @@
from_charset, to_charset );
/* error - map default charset to unicode */
xd = iconv_open( charset_unicode, charset_default );
-   }
+   if (xd == (iconv_t)(-1))
+   xd = NULL;
+   } else
+   xd = NULL;
 }
 
 return xd;


signature.asc
Description: Digital signature


Bug#317524: infinite loop when using charset_default=iso-8859-1

2005-11-04 Thread David Relson
On Sat, 5 Nov 2005 03:14:38 +0100
Elmar Hoffmann wrote:

 Hi David,
 
 on Fri, Nov 04, 2005 at 19:22:38 -0500, you wrote:
 
  The attached patch fixes both problems.
 
 But unfortunately introduces new ones:
 
   static char *get_string(const char *name, const char *arg)
   {
   char *s = xstrdup(arg);
  +
  +/* scan option to delete comment (after '#') and preceding
  whitespace */
  +char *t = s;
 
 t initially points to s, a copy of arg...
 
  +bool quote = false;
  +for (t = s; *t != '\0' ; t += 1)
  +{
  +   char c = *t;
  +   if (c == '')
  +   quote ^= true;
  +   if (!quote  c == '#') {
  +   *t-- = '\0';
 
 ...if arg starts with a '#' (ie. something like option=# foo in the
 config), t will now point one byte before the beginning of s...
 
  +   while (isspace(*t))
 
 ...and thus a faulty memory access will happen here.
 
  +   *t-- = '\0';
  +   break;
  +   }
  +}
 
 The attached fixed version of the patch avoids this (and further code
 duplication) by using the existing remove_comment() function, which
 already is used by other get_*() functions.
 
 The other problem is that init_charset_table_iconv() is not the only
 place bf_iconv_open() is used without checking for a result of -1.
 text_decode() in lexer.c contains the lines
 cd = bf_iconv_open( charset_unicode, charset );
 iconvert_cd(cd, src, buf);
 and iconvert_cd() checks for cd == NULL only.
 
 Thus I think it makes more sense to fix bf_iconv_open() itself to
 always return NULL on failure, like in the attached patch.
 
 elmar

Nice work, elmar!

Your observations about the code are very good, though not totally
correct.  If you look at function process_config_option(), you'll see
that the original config line is duplicated and split at the '='
character (which has been replaced by '\0').  The *t-- code is
applied to the part after '=' and the loop to erase whitespace will
never backup to the part before the '='.  When I made the quick change
to get_string(), I hadn't fully analyzed the situation and had assumed
(rather than determined) that all was safe.

In any case, using remove_comment() is the better solution, as is using
xd=NULL.  CVS now has the new version.  In honor of your contribution
to the code base, your name has been added to the AUTHORS list.

Well done!

David


-- 
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]