On Thu, 8 Apr 2004, Tinni wrote:

> - We have almost 99 sites, and I want all 99 sites to be merged into a
> single database. I want each site's database to be created separately,
> so that I can build the databases separately - this is for server
> load/space etc. Finally I will merge all sites into a single database.
> Could you please give me an idea of how to run 'rundig' and 'htmerge'
> for the above requirement?
You first need to create a separate configuration file for each set of databases. It is probably easiest to start with copies of the default configuration file and make edits as necessary. At a minimum you should take a look at the following attributes:

    database_base
    database_dir
    start_url
    limit_urls_to

After creating the appropriate configuration files, you can build each database set by using the standard rundig script and passing the corresponding configuration file with the '-c' option.

In order to merge everything together, you need to call htmerge repeatedly with the '-m' option. The merge step is explained in the documentation for htmerge. See

    http://www.htdig.org/htmerge.html

I would suggest that you start with two or three sets of databases and work with those until you are comfortable with the process and have worked out any kinks that you run into.

> - I am seeing, while running rundig with one of my sites, that it is
> spidering all the sites. I have set the start_url parameter as follows:
>
> start_url: http://www.example.org/
>
> where example.org is the main site. I want only sites related to our
> sites to be spidered, but it is spidering many, many sites which are
> not related to ours. So how can I configure this type of setting?

Is the problem that htdig is hitting sites that are not part of www.example.org, or that it is hitting parts of www.example.org that you don't want it to hit?

If the former, then check your limit_urls_to attribute. The default setting (${start_url}) limits the dig to URLs that contain the start URL string, here "http://www.example.org/".

If the problem is the latter, then it is hard to provide a general answer. You will most likely need to play with combinations of start_url, limit_urls_to and exclude_urls to get the effect you are looking for.
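As a rough illustration of the steps above, here is a minimal shell sketch. The site names, the ./conf and ./db directories, and the two helper functions are all made up for the example, and it uses the first site's database as the merge target, which is only one possible layout. It writes one small configuration file per site and prints the rundig and htmerge commands instead of running them, so you can review them first. Check the htmerge documentation for your version before relying on the exact '-m' behaviour.

```shell
#!/bin/sh
# Sketch only: directory layout, site names and function names are
# hypothetical, not part of ht://Dig.

CONF_DIR=${CONF_DIR:-./conf}   # where the per-site config files are written
DB_ROOT=${DB_ROOT:-./db}       # parent directory of the per-site databases

# Write a minimal per-site configuration file.
#   $1 = short site name (used for file and directory names)
#   $2 = start URL for that site
write_site_conf() {
    mkdir -p "$CONF_DIR" "$DB_ROOT/$1"
    # \$ keeps ${...} literal so that ht://Dig expands it, not the shell.
    cat > "$CONF_DIR/$1.conf" <<EOF
database_dir: $DB_ROOT/$1
database_base: \${database_dir}/db
start_url: $2
limit_urls_to: \${start_url}
EOF
}

# Print (do not run) the commands for the given site names:
# one rundig per site, then one htmerge per additional site,
# each merging that site into the first site's databases.
plan_commands() {
    target=$1
    for site in "$@"; do
        echo "rundig -c $CONF_DIR/$site.conf"
    done
    shift
    for site in "$@"; do
        echo "htmerge -c $CONF_DIR/$target.conf -m $CONF_DIR/$site.conf"
    done
}
```

For example, after calling write_site_conf once per site, "plan_commands site1 site2 site3" prints three rundig lines followed by two htmerge lines. In a real setup you would start from a copy of your default configuration file rather than these four attributes alone, and absolute paths are safer than the relative ones used here.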
Jim

