Hi Jasmen,
I believe the external parser works through parse_doc.pl, which in turn picks up the file you want to parse. That is, it's a three-step process:
1) ht://dig finds the XLS file, and invokes parse_doc.pl with the XLS file
2) parse_doc.pl invokes a converter program (I think it's catdoc for XLS files, though that may only be for DOC and RTF files) to convert the file into plain text, the only format ht://dig understands.
3) The converted XLS file, which is now plain text, is returned to ht://dig for parsing.
From the sound of your problem, ht://dig is 'seeing' the XLS file, since it reports it as parsed, but something is going wrong when it tries to 'read' it, i.e. the file is not being converted into plain text.
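One way to test that theory is to run the converter by hand on the same file and see whether any searchable text comes out. Below is a minimal Python sketch of that check. The function names are my own, and the converter path in the usage note is only a placeholder, so substitute whatever your setup really calls.

```python
import subprocess
import sys


def convert_to_text(command, document):
    """Run an external converter on a document and return what it prints."""
    result = subprocess.run(
        list(command) + [document],
        capture_output=True,
        timeout=60,
    )
    if result.returncode != 0:
        raise RuntimeError(
            "converter exited with status %d: %s"
            % (result.returncode, result.stderr.decode(errors="replace").strip())
        )
    return result.stdout.decode(errors="replace")


def looks_indexable(text, word):
    """True if the converted output contains the word (case-insensitive)."""
    return word.lower() in text.lower()


if __name__ == "__main__":
    # Usage: python check_convert.py <converter> <document> <word>
    converter, document, word = sys.argv[1:4]
    output = convert_to_text([converter], document)
    print("bytes of output:", len(output))
    print("contains %r:" % word, looks_indexable(output, word))
```

For example, something like `python check_convert.py /path/to/xlhtml test.xls test` (path hypothetical). If the output is empty or does not contain the word, the conversion step is the place to look.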
This is probably something to do with the pathnames to your external parsers, either in the parsers section of htdig.conf or in parse_doc.pl itself. I have had such problems myself.
Check your pathnames, and use absolute paths where possible.
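If it helps, the path check can be scripted. This is a rough Python sketch, not anything that ships with ht://dig: the helper name is made up, and you pass it whatever paths appear in your own htdig.conf and parse_doc.pl.

```python
import os
import sys


def check_parser_path(path):
    """Return a list of problems found with one external parser path."""
    problems = []
    if not os.path.isabs(path):
        problems.append("not an absolute path")
    if not os.path.isfile(path):
        problems.append("file does not exist")
    elif not os.access(path, os.X_OK):
        problems.append("file is not executable")
    return problems


if __name__ == "__main__":
    # Usage: python check_paths.py <path> [<path> ...]
    for path in sys.argv[1:]:
        problems = check_parser_path(path)
        print(path, "->", ", ".join(problems) if problems else "ok")
```

Run it as the same user that runs rundig, since the executable check depends on that user's permissions.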
Regards,
Rupert.
[EMAIL PROTECTED] wrote:
Hello,
I use the xlhtml-0.5 parser to parse my Excel files! If I set start_url to test.xls (which contains "test test" many times) and run rundig -ivvv, the output says that test.xls was parsed. But if I enter "test" in the search form, I get the message "Nothing found"!
What is wrong here?
Thanks and best regards! jasmen

