On Thu, 8 Apr 2004, Tinni wrote:

> - We have almost 99 sites, and I want all 99 merged into a single
>   database. I want each site's database to be created separately
>   first, to manage server load and disk space, and then merge all
>   of them into a single database.
>   Could you please explain how I should run 'rundig' and 'htmerge'
>   for this requirement?

You first need to create a separate configuration file for each set of
databases. It is probably easiest to start with copies of the default
configuration file and make edits as necessary. At a minimum you should
take a look at the following attributes:

 database_base
 database_dir
 start_url
 limit_urls_to
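
As a rough sketch of what differs from one per-site file to the next,
the shell function below prints those four attributes for a single site.
The function name site_attrs, the /var/lib/htdig path and the site name
are made up for illustration; everything else would still come from
your copy of the default configuration file.

```shell
#!/bin/sh
# Hypothetical helper: print the per-site attribute lines for one site.
#   $1 = short site name (used for the database directory)
#   $2 = start URL of that site
# The /var/lib/htdig base directory is an assumption; use whatever
# location suits your server.
site_attrs() {
    name=$1
    url=$2
    # \$ keeps ${...} literal so that htdig, not the shell, expands it.
    cat <<EOF
database_dir:   /var/lib/htdig/$name
database_base:  \${database_dir}/db
start_url:      $url
limit_urls_to:  \${start_url}
EOF
}
```

Running "site_attrs site1 http://www.example.org/" prints the four lines
to use when editing site1.conf, so each site ends up with its own
database directory and its own crawl limits.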

After creating the appropriate configuration files, you can build each
database set by using the standard rundig script and passing the
corresponding configuration file with the '-c' option. In order to merge
everything together, you need to call htmerge repeatedly with the '-m'
option. The merge step is explained in the documentation for htmerge.
See http://www.htdig.org/htmerge.html
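
To make that sequence concrete, here is a minimal sketch of building
each database set and then merging them. It assumes rundig and htmerge
are on your PATH, that the per-site files are called site1.conf,
site2.conf and so on (made-up names), and that the databases of the
first file act as the merge target. By default it only prints the
commands, so you can inspect them before running anything.

```shell
#!/bin/sh
# Dry run by default: RUN=echo prints each command instead of running it.
# Export RUN= (empty) to execute the commands for real.
RUN=${RUN-echo}

# Build one database set per configuration file given as an argument.
build_all() {
    for conf in "$@"; do
        $RUN rundig -c "$conf" || return 1
    done
}

# Merge every remaining database set into the one described by the
# first configuration file, one htmerge call per set.
merge_all() {
    target=$1
    shift
    for conf in "$@"; do
        $RUN htmerge -c "$target" -m "$conf" || return 1
    done
}
```

For three sites this would be used as

  build_all site1.conf site2.conf site3.conf
  merge_all site1.conf site2.conf site3.conf

Merging into the first site's databases is only one possible choice of
target; the htmerge documentation describes what '-m' expects.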

I would suggest that you start with two or three sets of databases and
work with that until you are comfortable with the process and verify
that you have worked out any kinks that you might run into.

> - When I run rundig for one of my sites, it spiders all kinds of
>   sites. I have set the start_url parameter as follows:
> 
>   start_url:              http://www.example.org/
> 
>   where example.org is the main site. I want only the sites related to
>   ours to be spidered, but it is spidering many sites that have nothing
>   to do with us. How can I configure this?

Is the problem that htdig is hitting sites that are not part of
www.example.org, or that it is hitting parts of www.example.org that you
don't want it to hit? If the former, then check your limit_urls_to
attribute. The default setting (${start_url}) would limit the dig to
URLs that include the string "www.example.org". If the problem is the
latter, then it is hard to provide a general answer. You will most
likely need to play with combinations of start_url, limit_urls_to and
exclude_urls to get the effect you are looking for.
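
One way to reason about which URLs a given combination lets through is
the simplified model below. It is not htdig's code: it assumes a URL is
followed when it contains at least one limit_urls_to pattern and none of
the exclude_urls patterns, using plain substring matching, and it
ignores every other attribute. The function name url_allowed and the
sample patterns are made up for illustration.

```shell
#!/bin/sh
# Simplified model, NOT htdig itself.
#   $1 = URL to test
#   $2 = space-separated limit patterns (like limit_urls_to)
#   $3 = space-separated exclude patterns (like exclude_urls)
# Returns 0 if the URL would be followed under this model.
# Patterns are assumed to contain no shell wildcard characters.
url_allowed() {
    url=$1
    matched=no
    # The URL must contain at least one of the limit patterns.
    for pat in $2; do
        case $url in *"$pat"*) matched=yes ;; esac
    done
    [ "$matched" = yes ] || return 1
    # The URL must not contain any of the exclude patterns.
    for pat in $3; do
        case $url in *"$pat"*) return 1 ;; esac
    done
    return 0
}
```

With "http://www.example.org/" as the only limit pattern, a URL on
another host fails the first check, while a URL such as
http://www.example.org/cgi-bin/search passes it and is then rejected by
an exclude pattern of "/cgi-bin/".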

Jim


