According to Marcus Valentine:
> At 12:32 23/11/01 -0600, Gilles Detillieux wrote:
> >According to Marcus Valentine:
> >> At 11:04 23/11/01 -0600, Gilles Detillieux wrote:
> >> >According to Marcus Valentine:
> >> >> On my intranet, there is an unfortunate xls file. Although the xls file
> >> >> is only 266 kb, converting it with xlhtml 0.3 at the command line
> >> >> results in a 37 Mb html file. (Running with the -a option [aggressive
> >> >> html optimization] reduces the file size to 23 Mb).
> >> >> 
> >> >> Running htdig 3.1.5 with doc2html.pl version 3 calling xlhtml 0.3
> >> >> results in an htdig core dump when it gets to this document. Htdig runs
> >> >> on Linux Redhat 6.2
...
> Here's another back trace. This one was generated when htdig encountered a
> file that began:
> 
> -0.348096
> -0.070797
> 0.204147
> 0.393852
> 0.449417
> 
> and then continued in a similar vein for 921595 lines (file size was 8.3M).
> This time no external convertors were involved. There seems to be a problem
> when htdig encounters big files. I've got max_doc_size set big (20 000 000)
> as I've got some sizable pdfs on my system. I will exclude the files just
> containing numbers (I missed them previously), but I still have the
> previous problem with xlhtml.
...
> #0  0x400a6d21 in __kill () from /lib/libc.so.6
> (gdb) bt
> #0  0x400a6d21 in __kill () from /lib/libc.so.6
> #1  0x400a6996 in raise (sig=6) at ../sysdeps/posix/raise.c:27
> #2  0x400a80b8 in abort () at ../sysdeps/generic/abort.c:88
> #3  0x40057e55 in __default_terminate () from /usr/lib/libstdc++-libc6.1-1.so.2
> #4  0x40058c1a in terminate () from /usr/lib/libstdc++-libc6.1-1.so.2
> #5  0x40058cf8 in __eh_alloc (size=36) from /usr/lib/libstdc++-libc6.1-1.so.2
> #6  0x40058d88 in __cp_push_exception (value=0xc1d9fd0, type=0x4006af84,
>     cleanup=0x4005b604 <bad_alloc::~bad_alloc(void)>) from /usr/lib/libstdc++-libc6.1-1.so.2
> #7  0x4005a252 in __builtin_new (sz=40) from /usr/lib/libstdc++-libc6.1-1.so.2
> #8  0x805a86b in strcpy () at ../sysdeps/generic/strcpy.c:30
> #9  0x80521db in strcpy () at ../sysdeps/generic/strcpy.c:30
> #10 0x804f531 in strcpy () at ../sysdeps/generic/strcpy.c:30
> #11 0x8050d25 in strcpy () at ../sysdeps/generic/strcpy.c:30
> #12 0x805099a in strcpy () at ../sysdeps/generic/strcpy.c:30
> #13 0x805036d in strcpy () at ../sysdeps/generic/strcpy.c:30
> #14 0x8054b60 in strcpy () at ../sysdeps/generic/strcpy.c:30
> #15 0x400a09cb in __libc_start_main (main=0x80543f0 <strcpy+40380>, argc=7, argv=0xbffffb64,
>     init=0x8049da4 <_init>, fini=0x8090eac <_fini>, rtld_fini=0x4000aea0 <_dl_fini>, stack_end=0xbffffb5c)
>     at ../sysdeps/generic/libc-start.c:92
> (gdb)

Well, neither of the backtraces you sent seems to point to any part of
the htdig code, so it's pretty hard to make sense of them.  I'd guess
that the stack is getting corrupted somewhere, causing the program to
run amok.  So, we don't have anything conclusive yet, but it is
interesting to know that the problem happens even for files that don't
use external parsers or converters.

However, I can't reproduce the problem on my Red Hat 6.2 system.
I tried a 56 MB HTML file with max_doc_size set to 40000000, and
htdig indexed it without trouble.  Can you get htdig to crash on just one
file, or does it only happen after indexing many files?  If it fails
reliably on just one file, please let me know where I can pick up a copy
of it (please don't e-mail the file to me!) and I'll see if I can
reproduce the problem.  If it requires indexing many files, maybe I could
try indexing your site from my system to see if my htdig crashes too.
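
If the original file can't easily be shared, a similar numbers-only file
could be generated locally as a first test.  What follows is only a rough
sketch in Python, not the file from the report: the function name, the
output file name, and the value range (-0.5 to 0.5) are assumptions,
guessed from the five sample lines quoted above.  Only the line count
(921595) comes from the report.

```python
import random


def write_numeric_file(path, num_lines=921595, seed=0):
    """Write num_lines lines, each holding one float with six decimals.

    Returns the number of characters written.
    """
    rng = random.Random(seed)  # fixed seed so the file is reproducible
    total = 0
    with open(path, "w", newline="\n") as out:
        for _ in range(num_lines):
            # The -0.5..0.5 range is a guess based on the quoted sample lines.
            line = "%.6f\n" % rng.uniform(-0.5, 0.5)
            out.write(line)
            total += len(line)
    return total


if __name__ == "__main__":
    # "numbers.txt" is a hypothetical name; put it wherever htdig will find it.
    size = write_numeric_file("numbers.txt")
    print("wrote %d characters" % size)
```

A file produced this way only imitates the shape of the original, so a
clean run on it would not rule out a problem with the real file.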

Have you ruled out the possibility of a hardware problem on your
Linux box?  If you have a bad memory chip, it could lead to all sorts
of weirdness.

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930

