John Grubb wrote:
I have posted a similar query on the pdftohtml list.
I'm attempting to crawl portions of the web with aspseek. HTML output
is working fine and is very stable. I have configured pdftohtml as a
converter. It indexes most PDFs fine, so I don't think it's a config
problem, but it crashes the crawl on some. When I download the file and
try it from the command line, it works fine. I'm currently running the
latest sources from CVS, having first tried 1.2.6 and 1.2.10. The aspseek
log output is as follows:
( 2 20 20 182 12 29 7 20) Adding URL: http://www.lsic.com/fin/annual01.pdf
exec /usr/bin/pdftohtml -i -noframes -stdout /tmp/asi5dQRXA >/tmp/asoXjR7TX
Address of param: ba072d20
Address of param: ba07a560
All 20 threads then crash.
I just started using pdftohtml yesterday. Do I need different parameters?
Try to run pdftohtml from command line:
/usr/bin/pdftohtml -i -noframes -stdout /tmp/asi5dQRXA > out
and see what's in the file 'out'.
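When checking the converter by hand, it may also help to record its exit
status and anything it prints to stderr, not just the contents of 'out'.
The following is only a sketch: run_converter is a hypothetical helper
name, not part of aspseek or pdftohtml, and the temporary file from the
log may no longer exist, in which case the downloaded PDF can be used
instead.

```shell
# Hypothetical helper (not part of aspseek or pdftohtml): run a converter
# command, save stdout to "$1" and stderr to "$1.err", then report the
# exit status and the number of bytes written to stdout.
run_converter() {
    out=$1
    shift
    "$@" >"$out" 2>"$out.err"
    status=$?
    size=$(wc -c <"$out" | tr -d ' ')
    echo "exit status: $status, output bytes: $size"
    return "$status"
}

# Example call with the command line from the aspseek log:
#   run_converter out /usr/bin/pdftohtml -i -noframes -stdout /tmp/asi5dQRXA
# In most shells an exit status above 128 means the command was killed
# by a signal (for example 139 for SIGSEGV).
```

An empty 'out' together with a non-zero status would point at the
converter; a clean run would suggest the problem is on the indexer side.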
Also, it would be great if you could send us the output of gdb's
'bt full', like this:
$ ulimit -c unlimited
$ /path/to/index [your flags]
...it crashes...
$ gdb /path/to/index core | tee crashfile
....gdb starts up...
(gdb) bt full
....backtrace is shown, press 'space' when asked
(gdb) quit
And send the file 'crashfile' to this list, or to [EMAIL PROTECTED]
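If the interactive session is inconvenient, the same backtrace can
probably be collected in one step. This is a sketch under assumptions:
collect_backtrace is a hypothetical helper name, and it assumes a gdb
that supports the -batch and -ex options. Since all threads crash, it
also asks for a backtrace of every thread.

```shell
# Hypothetical helper: write a full backtrace to a report file without an
# interactive gdb session. Assumes gdb supports -batch and -ex.
collect_backtrace() {
    binary=$1
    core=$2
    report=${3:-crashfile}
    if [ ! -f "$core" ]; then
        echo "no core file at $core (check 'ulimit -c')" >&2
        return 1
    fi
    # 'bt full' covers the crashing thread; 'thread apply all bt' adds
    # a short backtrace for every other thread.
    gdb -batch -ex 'bt full' -ex 'thread apply all bt' \
        "$binary" "$core" >"$report" 2>&1
}

# Example call, using the same placeholder path as above:
#   collect_backtrace /path/to/index core crashfile
```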
--
== kir_at_asplinux.ru == 7551596_at_ICQ == 6722750_at_sms.beemail.ru ==
Dream like you'll live forever...Love like you've never been hurt...
Work like you don't need the money...and Dance like nobody is watching!
-- Satchel Paige