Re: [htdig] SQL handling start_url

2000-12-07 Thread Bill Carlson

On Wed, 6 Dec 2000, Curtis Ireland wrote:

 2) Before htDig starts its database build, dump all the links to a text
 file and have the htdig.conf include this file

 The one problem with these two solutions is how would the limit_urls_to
 variable work? I want to make sure the links are properly indexed
 without going past the linked site.

This is the method I used, though in my case the backend was an email full
of links from the person directing the crawl. :)

Write 2 files, one for start_url and one for limit_urls_to, and include
both in the conf file like so:

start_url:  `/home/htdig/conf/start_url_file`

limit_urls_to:  `/home/htdig/conf/limit_url_file`


The contents of both files are just links.
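
For anyone pulling the links out of a database rather than an email, here
is a minimal Python sketch of the dump step. It is only an illustration
under stated assumptions: the `links` table, its `url` column, and the use
of sqlite3 as the database are hypothetical, not part of the setup
described above. The limit file is written as one scheme-and-host prefix
per start URL, on the assumption that the crawl should stay on each linked
site; writing the same links into both files would work just as well if
that is what you want.

```python
import sqlite3
from urllib.parse import urlsplit


def fetch_links(conn):
    """Return all non-empty URLs from the (hypothetical) links table."""
    rows = conn.execute("SELECT url FROM links ORDER BY url")
    return [row[0].strip() for row in rows if row[0] and row[0].strip()]


def site_prefixes(urls):
    """Reduce each URL to scheme://host/, dropping duplicates, order kept."""
    seen = []
    for url in urls:
        parts = urlsplit(url)
        prefix = f"{parts.scheme}://{parts.netloc}/"
        if prefix not in seen:
            seen.append(prefix)
    return seen


def write_url_file(path, urls):
    """Write one URL per line, for inclusion via a backquoted path."""
    with open(path, "w", encoding="utf-8") as fh:
        fh.write("\n".join(urls) + "\n")


def dump_links(conn, start_path, limit_path):
    """Write the start_url file and the limit_urls_to file from the table."""
    urls = fetch_links(conn)
    write_url_file(start_path, urls)
    write_url_file(limit_path, site_prefixes(urls))
    return urls
```

Run something like dump_links(conn, "/home/htdig/conf/start_url_file",
"/home/htdig/conf/limit_url_file") just before the dig, so the files are
fresh each time the database is rebuilt.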

Good Luck,

Bill Carlson
-- 
Systems Programmer                       [EMAIL PROTECTED]    |  Opinions are mine,
Virtual Hospital                         http://www.vh.org/   |  not my employer's.
University of Iowa Hospitals and Clinics                      |



To unsubscribe from the htdig mailing list, send a message to
[EMAIL PROTECTED]
You will receive a message to confirm this.
List archives:  http://www.htdig.org/mail/menu.html
FAQ:            http://www.htdig.org/FAQ.html




Re: [htdig] SQL handling start_url

2000-12-07 Thread Gilles Detillieux

According to Curtis Ireland:
 Is there any way to have start_url get its list from an SQL back-end?
 Has anyone already built a patch to handle this?
 
 Here are a couple of solutions I can think of to bypass the problem,
 but I'm sure I'm not alone in desiring this feature.
 
 1) Build a PHP page with links to all the sites we want to index.
 Have htDig use this as its start_url
 2) Before htDig starts its database build, dump all the links to a text
 file and have the htdig.conf include this file
 
 The one problem with these two solutions is how would the limit_urls_to
 variable work? I want to make sure the links are properly indexed
 without going past the linked site.

Either solution seems workable - it all depends on what your preference
is.  For the first solution, you'd need to have a limit_urls_to setting
that's liberal enough to allow through all the links that the PHP script
will spit out.  You should probably set your max_hop_count to 1 to avoid
having htdig go beyond the first hop, from the PHP output to the documents
it references.
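
To make the first option concrete, here is a rough sketch of the kind of
link page htdig would be pointed at. The original suggestion was a PHP
script; this is a swapped-in Python illustration, and the function name
and page layout are hypothetical.

```python
from html import escape


def render_link_page(urls, title="Links to index"):
    """Build a bare HTML page with one anchor per URL for the crawler."""
    items = "\n".join(
        f'  <li><a href="{escape(u, quote=True)}">{escape(u)}</a></li>'
        for u in urls
    )
    return (
        "<html><head><title>" + escape(title) + "</title></head>\n"
        "<body>\n<ul>\n" + items + "\n</ul>\n</body></html>\n"
    )
```

With max_hop_count set to 1, the page itself is hop 0 and each document
it links to is hop 1, so nothing beyond those documents is followed.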

For the second solution, you could probably just leave limit_urls_to as
the default, which is the same as the value of start_url, and set your
max_hop_count to 0.
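
As a side-by-side summary of the two setups, here is a small sketch that
prints the relevant conf lines for each. The helper name and the example
URLs are hypothetical; only the attribute names start_url, limit_urls_to
and max_hop_count come from htdig itself.

```python
def conf_lines(start_url, limit_urls_to=None, max_hop_count=0):
    """Render the conf attributes discussed above as 'name: value' lines."""
    lines = [f"start_url: {start_url}"]
    if limit_urls_to is not None:
        # Omitted for the second solution, where the default is kept.
        lines.append(f"limit_urls_to: {limit_urls_to}")
    lines.append(f"max_hop_count: {max_hop_count}")
    return "\n".join(lines) + "\n"


# First solution: crawl from a generated link page, one hop deep.
option_1 = conf_lines(
    "http://www.example.com/links.php",
    limit_urls_to="http://",  # placeholder for a suitably liberal limit
    max_hop_count=1,
)

# Second solution: the start URLs come from a dumped file, no hops.
option_2 = conf_lines("`/home/htdig/conf/start_url_file`")

print(option_1)
print(option_2)
```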

-- 
Gilles R. Detillieux              E-mail: [EMAIL PROTECTED]
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930






[htdig] SQL handling start_url

2000-12-06 Thread Curtis Ireland

Hypothetical Situation:

I have an SQL database table of links that I wish to present to anyone
visiting my site. I would also like to make these links searchable from
my site. Normally, if these links were static, I would just list them in
the htdig.conf file.

Is there any way to have start_url get its list from an SQL back-end?
Has anyone already built a patch to handle this?

Here are a couple of solutions I can think of to bypass the problem,
but I'm sure I'm not alone in desiring this feature.

1) Build a PHP page with links to all the sites we want to index.
Have htDig use this as its start_url
2) Before htDig starts its database build, dump all the links to a text
file and have the htdig.conf include this file

The one problem with these two solutions is how would the limit_urls_to
variable work? I want to make sure the links are properly indexed
without going past the linked site.

Just something for everyone to wrap your heads around.
-C

--
Curtis Ireland  - [EMAIL PROTECTED]
Solidum Systems - http://www.solidum.com
(T) (613)724-6004 x284  - (F) (613)724-6008

