I have a search setup that searches up to 6 different DBs.
Right now I have a rundig script which updates each of them and finally merges them into a global one, so I can select via the config parameter in htsearch which DBs should get spidered.
Now that everything is up and running, I want to make it easier to maintain,
for example by having
a centralized file with all domains and their start_url, limit_urls_to, exclude_urls and url_rewrite rules.
My approach would be a separate file containing the URLs, which gets read by the bash script that calls htdig.
For that I would have to pass the above values (or some variables for the include path and dbdir) to htdig on the command line or, if that doesn't work, make use of a dynamically created config file.
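To make the idea more concrete, here is a rough sketch of the "dynamically created config file" variant. The sites.list format (name|start_url|limit_urls_to|exclude_urls), the directory layout, the common.conf file and the function names are all made up for illustration; only the htdig options -i and -c and the config attributes themselves are taken from ht://Dig. Whether the include: line works depends on the ht://Dig version in use, so treat this as a sketch, not a tested setup.

```shell
#!/bin/bash
# Sketch only: generate one htdig config per site from a central list.
# ASSUMPTIONS (not from any real setup): the sites.list format, the
# directory layout, common.conf and all function/variable names below.

SITES_FILE="${SITES_FILE:-./sites.list}"    # name|start_url|limit_urls_to|exclude_urls
CONF_DIR="${CONF_DIR:-./conf}"              # where generated configs go
DB_ROOT="${DB_ROOT:-./db}"                  # one database_dir per site below this
COMMON_CONF="${COMMON_CONF:-./common.conf}" # shared settings pulled in via include:
HTDIG="${HTDIG:-htdig}"                     # htdig binary to call

# Write the config file for one site and print its path.
write_conf() {
    local name="$1" start="$2" limit="$3" exclude="$4"
    local conf="$CONF_DIR/$name.conf"
    mkdir -p "$CONF_DIR" "$DB_ROOT/$name"
    {
        echo "include: $COMMON_CONF"
        echo "database_dir: $DB_ROOT/$name"
        echo "start_url: $start"
        echo "limit_urls_to: $limit"
        if [ -n "$exclude" ]; then
            echo "exclude_urls: $exclude"
        fi
    } > "$conf"
    echo "$conf"
}

# Read the central list, generate each config and dig the site.
build_all() {
    local name start limit exclude conf
    while IFS='|' read -r name start limit exclude; do
        # Skip blank lines and comment lines starting with '#'.
        case "$name" in ''|'#'*) continue ;; esac
        conf=$(write_conf "$name" "$start" "$limit" "$exclude")
        # Only call htdig if it can actually be found.
        if command -v "$HTDIG" >/dev/null 2>&1; then
            "$HTDIG" -i -c "$conf" </dev/null
        fi
    done < "$SITES_FILE"
}
```

Calling build_all from the rundig-style script would then replace the hand-maintained per-site configs; the url_rewrite rules could be added as a fifth field in the same way.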
Another problem I have is passing arguments to external parser scripts,
as the following leads to htsearch ignoring my config or parts of it:
external_parsers: application/pdf->text/html /opt/doc2html/doc2html.pl \
text/html->text/html-internal "/opt/htdig/bin/checkLang.pl de"
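One workaround I could imagine, if the quoted argument is really what breaks the config, is a tiny wrapper script per language so that the external_parsers line only needs a plain path. The sketch below generates such a wrapper; make_wrapper and the wrapper path are made-up names, and it assumes checkLang.pl expects the language code before the arguments htdig passes in.

```shell
#!/bin/bash
# Sketch only: create a wrapper that hides a fixed extra argument, so the
# config can reference the wrapper without quotes or arguments.
# ASSUMPTIONS: make_wrapper and the wrapper file names are hypothetical;
# the wrapped script is assumed to take the extra argument first.

# make_wrapper OUTFILE SCRIPT EXTRA_ARG
make_wrapper() {
    local out="$1" script="$2" extra="$3"
    # The generated wrapper execs the real script with the fixed argument,
    # followed by whatever arguments the caller (htdig) supplies.
    printf '#!/bin/sh\nexec "%s" "%s" "$@"\n' "$script" "$extra" > "$out"
    chmod +x "$out"
}
```

For example, make_wrapper /opt/htdig/bin/checkLang-de.sh /opt/htdig/bin/checkLang.pl de would produce a script that could be listed in external_parsers in place of the quoted command.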
The same goes for htsearch: with 7 config files right now, I'd like to pass at least the dbdir as an additional parameter. Even better would be some custom variables.
Related questions:
Do I need the endings list and the like here?
Can I change the config dir for htsearch?
That's it for now. Any thoughts, tips and ideas are greatly appreciated.
Thanks, Sam

