Re: [aseek-users] Problems with pdftohtml

Kir Kolyshkin Wed, 04 Dec 2002 08:35:20 -0800

John Grubb wrote:

I have posted a similar query on the pdftohtml list.

I'm attempting to crawl portions of the web with aspseek. HTML output is working fine and is very stable. I have configured pdftohtml as a converter. It indexes most PDFs fine, so I don't think it's a config problem, but it crashes the crawl on some. When I download the file and try it from the command line, it works fine. I'm currently running the latest sources from CVS, having first tried 1.2.6 and 1.2.10. The aspseek log output is as follows:

( 2 20 20 182 12 29 7 20) Adding URL: http://www.lsic.com/fin/annual01.pdf
exec /usr/bin/pdftohtml -i -noframes -stdout /tmp/asi5dQRXA >/tmp/asoXjR7TX
Address of param: ba072d20
Address of param: ba07a560

All 20 threads then crash.

I just started using pdftohtml yesterday. Do I need different params?

Try running pdftohtml from the command line:


/usr/bin/pdftohtml -i -noframes -stdout /tmp/asi5dQRXA > out

and see what's in the file 'out'.
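
If it helps, the check above can be wrapped in a small shell function that also reports the exit status and the output size. This is only a sketch: the name check_converter and the files 'out' and 'err' are made up for illustration and are not part of aspseek or pdftohtml.

```shell
# check_converter CMD [ARGS...]
# Hypothetical helper: runs the given command, saves its stdout to 'out'
# and its stderr to 'err' in the current directory, then prints the exit
# status and the number of bytes written to 'out'.
check_converter() {
    "$@" > out 2> err
    cc_status=$?
    # wc may pad its output with spaces on some systems, so strip them
    cc_size=$(wc -c < out | tr -d ' ')
    echo "exit status: $cc_status, output bytes: $cc_size"
    return "$cc_status"
}
```

For example: check_converter /usr/bin/pdftohtml -i -noframes -stdout /tmp/asi5dQRXA. A non-zero exit status or an empty 'out' would suggest the converter itself fails on that file; anything it printed to stderr ends up in 'err'.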

Also, it would be great if you could send us the output of gdb's 'bt full',
obtained like this:

$ ulimit -c unlimited
$ /path/to/index [your flags]
...it crashes...
$ gdb /path/to/index core | tee crashfile
....gdb starts up...
(gdb) bt full
....backtrace is shown, press 'space' when asked
(gdb) quit

And send the file 'crashfile' to this list, or to [EMAIL PROTECTED]

--
== kir_at_asplinux.ru == 7551596_at_ICQ == 6722750_at_sms.beemail.ru ==

Dream like you'll live forever...Love like you've never been hurt...
Work like you don't need the money...and Dance like nobody is watching!
       -- Satchel Paige
