Sorry for the delayed answer.
On Mon, 10 Oct 2011, Jean-Francois Dockes wrote:
It would seem that there is some file in your document set which is
crashing recoll. We need to determine which it is, get it out of the
indexed set so that you can begin to use recoll again, and if at all
possible, I would very much like to get a copy to fix the bug (if this
is confidential data, we'll try other ways to get details about the
issue).
For a beginning, we need to have a look at the log file before the point
where recoll crashes.
I rebuilt the package with noopt,nostrip,debug. I debugged it down to
recoll-1.13.04/common/unacpp.cpp, function unacmaybefold(). It is called
with dofold = true. unacfold_string() returns -1 with errno set to 12
(ENOMEM). Then unacmaybefold() goes on to format an error message:
45 if (status < 0) {
46 if (cout)
47 free(cout);
48 char cerrno[20];
49 sprintf(cerrno, "%d", errno);
50 out = string("unac_string failed, errno : ") + cerrno;
51 return false;
52 }
However, on line 50 the string concatenation itself runs out of memory
(not surprising after unacfold_string() already failed with ENOMEM), and
that is the source of the std::bad_alloc exception object.
This happens right after a million or so lines of
:3:../rcldb/rcldb.cpp:813:Db::splitter::takeword: unac failed for [...]
have been printed; during that phase the VSZ of the recollindex process
grows constantly. When the process finally reaches the code above, the
VSZ is 1,521,648 KB.
I followed unacfold_string() to unacmaybefold_string() and started to
suspect that it leaks somewhere. The code was very hard to follow in
gdb/ddd (I guess some optimization remained enabled, because the line
numbers kept jumping around and it was hard to set breakpoints). After
a while I got tired and started it under valgrind, and thankfully valgrind
completed the top of the stack: it is indeed convert(), called by
unacmaybefold_string(), that leaks an iconv() conversion descriptor (and
therefore, memory) in the error path(s). (I think it's very wasteful to
open/close a descriptor for the same conversion thousands of times, but I
digress.)
I identified the file that caused this huge number of conversion errors --
it's a Maildir file with a zip and a rar attachment. Both compressed files
have the same contents: two latin2-encoded text files (tables, actually),
1.3 and 1.4 MB in size. In total, 5.4 MB of latin2-encoded text caused
90,228 conversion failures (and presumably leaked the same number of
conversion descriptors).
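As a sketch of the "don't reopen the descriptor thousands of times" aside above (a hypothetical helper, not part of the proposed patch): a descriptor can be cached for the whole run and reset between inputs, since passing all-NULL buffers to iconv() resets its shift state.

```c
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper: keep one latin2->UTF-8 descriptor alive instead
 * of opening and closing one per string. iconv(cd, NULL, NULL, NULL,
 * NULL) resets the descriptor's conversion state, so it is safe to
 * reuse it for the next independent input. */
static iconv_t cached_cd = (iconv_t)-1;

iconv_t get_latin2_to_utf8(void)
{
    if (cached_cd == (iconv_t)-1)
        cached_cd = iconv_open("UTF-8", "ISO-8859-2");   /* may still fail */
    else
        iconv(cached_cd, NULL, NULL, NULL, NULL);        /* reset state */
    return cached_cd;
}
```

This would sidestep the open/close cost entirely, at the price of a little global state; the patch below is the minimal fix for the leak itself.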
The following patch fixed my problem. VSZ peaks around 160 MB.
Laszlo
--- build/recoll-1.13.04/unac/unac.c 2010-01-30 08:58:40.000000000 +0100
+++ build2/recoll-1.13.04/unac/unac.c 2011-10-11 23:05:21.000000000 +0200
@@ -10661,7 +10661,7 @@ static int convert(const char* from, con
if(errno == E2BIG)
/* fall thru to the E2BIG case below */;
else
- return -1;
+ goto err;
} else {
/* The offending character was replaced by a SPACE, skip it. */
in += 2;
@@ -10670,7 +10670,7 @@ static int convert(const char* from, con
break;
}
} else {
- return -1;
+ goto err;
}
case E2BIG:
{
@@ -10690,7 +10690,7 @@ static int convert(const char* from, con
DEBUG("realloc %d bytes failed\n", out_size+1);
free(saved);
*outp = 0;
- return -1;
+ goto err;
}
}
out = out_base + length;
@@ -10698,7 +10698,7 @@ static int convert(const char* from, con
}
break;
default:
- return -1;
+ goto err;
break;
}
}
@@ -10710,6 +10710,9 @@ static int convert(const char* from, con
(*outp)[*out_lengthp] = '\0';
return 0;
+err:
+ iconv_close(cd);
+ return -1;
}
int unacmaybefold_string(const char* charset,