Hi Jasmen,

I believe the external parser works through parse_doc.pl, which in turn picks up the file you want to parse. In other words, it's a three-step process:

1) ht://dig finds the XLS file and invokes parse_doc.pl on it.
2) parse_doc.pl invokes a program (I think it's catdoc for XLS files, though maybe that's just for DOC and RTF files) to convert the file into plain text, which ht://dig can parse directly.
3) The converted XLS file, which is now plain text, is returned to ht://dig for parsing.
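The three steps can be sketched roughly as follows. This is a hypothetical Python illustration of the idea, not ht://dig's or parse_doc.pl's actual code: the content-type strings and converter paths in `CONVERTERS` are assumptions (xlhtml is the converter you mentioned; the catdoc entry reflects the guess in step 2).

```python
import subprocess
from typing import Dict, List, Optional

# Hypothetical mapping from MIME type to converter command (step 2).
# These paths are illustrative assumptions; use the real absolute paths on your system.
CONVERTERS: Dict[str, List[str]] = {
    "application/msword": ["/usr/local/bin/catdoc"],
    "application/vnd.ms-excel": ["/usr/local/bin/xlhtml"],
}

def pick_converter(content_type: str) -> Optional[List[str]]:
    """Step 1: the indexer has found a file; choose a converter by its type."""
    # Drop parameters such as "; charset=..." and normalise case before lookup.
    return CONVERTERS.get(content_type.split(";")[0].strip().lower())

def convert(path: str, content_type: str) -> str:
    """Steps 2 and 3: run the converter and hand its stdout back as text."""
    cmd = pick_converter(content_type)
    if cmd is None:
        raise ValueError(f"no converter configured for {content_type}")
    result = subprocess.run(cmd + [path], capture_output=True, text=True, check=True)
    return result.stdout  # this converted text is what the indexer actually parses
```

The point of the sketch is that if the converter path is wrong, step 2 fails and nothing useful reaches step 3, even though step 1 (finding the file) worked.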


From the sound of your problem, ht://dig is 'seeing' the XLS file, since rundig reports that test.xls was parsed, but something goes wrong when it tries to 'read' it, i.e. the file is not being converted into plain text, so there are no words to index.
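One way to confirm that the conversion step is the culprit is to run the converter by hand on the same file and check whether the word you searched for appears in its output. Below is a hypothetical Python helper for that check; its name is made up, and it assumes the converter takes the file path as its last argument and writes its result to stdout.

```python
import subprocess
from typing import List

def converter_output_contains(cmd: List[str], path: str, word: str) -> bool:
    """Run a converter by hand and report whether `word` survives conversion."""
    try:
        result = subprocess.run(cmd + [path], capture_output=True, text=True)
    except OSError as exc:
        # The converter could not be started at all (e.g. missing file or
        # no execute permission): a typical bad-path symptom.
        print(f"could not run {cmd[0]}: {exc}")
        return False
    if result.returncode != 0:
        # The converter ran but failed, so no usable text came out.
        print(f"converter exited with {result.returncode}: {result.stderr.strip()}")
        return False
    # Case-insensitive substring check on the converted text.
    return word.lower() in result.stdout.lower()
```

For example, `converter_output_contains(["/usr/local/bin/xlhtml"], "test.xls", "test")` (with the path adjusted to wherever xlhtml really lives) should return True if conversion works; if it returns False, the search result "Nothing found" is expected.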

This is probably a problem with the pathnames to your external parsers; I have had the same trouble myself. If the paths in the parsers section (external_parsers) of htdig.conf are correct, then check the converter paths inside parse_doc.pl.

Check your pathnames, and use absolute paths where possible.
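A quick way to check the htdig.conf side is a small script like the hypothetical one below. It assumes the external_parsers value is a whitespace-separated list of "content-type command" pairs (a content type written as "type->type" still counts as one token); quoted commands with extra arguments are not handled, and the converter paths inside parse_doc.pl still need checking by hand.

```python
import os
from typing import List, Tuple

def check_external_parsers(value: str) -> List[Tuple[str, str, str]]:
    """Return (content_type, command, problem) for each entry that looks wrong."""
    # Treat backslash-newline continuations as plain whitespace, then split.
    tokens = value.replace("\\\n", " ").split()
    problems = []
    for content_type, command in zip(tokens[0::2], tokens[1::2]):
        if not os.path.isabs(command):
            problems.append((content_type, command, "not an absolute path"))
        elif not os.path.isfile(command):
            problems.append((content_type, command, "file does not exist"))
        elif not os.access(command, os.X_OK):
            problems.append((content_type, command, "not executable"))
    if len(tokens) % 2:
        # A content type with no command after it.
        problems.append((tokens[-1], "", "missing command"))
    return problems
```

Paste the value from your htdig.conf as a string; an empty list means every command is an absolute path to an existing executable.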

Regards,

Rupert.


[EMAIL PROTECTED] wrote:

Hello,

I use the xlhtml-0.5 parser to parse my Excel files.
If I set start_url to test.xls (which contains "test test" many times) and
run rundig -ivvv, the output says that test.xls was parsed. But if I search
for "test" in the search form, I get the message "Nothing found"!

What is wrong here?

Thanks and best regards!
jasmen





_______________________________________________
ht://Dig general mailing list: <[EMAIL PROTECTED]>
ht://Dig FAQ: http://htdig.sourceforge.net/FAQ.html
List information (subscribe/unsubscribe, etc.)
https://lists.sourceforge.net/lists/listinfo/htdig-general
