Re: [htdig] SQL handling start_url
On Wed, 6 Dec 2000, Curtis Ireland wrote:

> 2) Before htDig starts its database build, dump all the links to a text
> file and have the htdig.conf include this file
>
> The one problem with these two solutions is how would the limit_urls_to
> variable work? I want to make sure the links are properly indexed without
> going past the linked site.

This is the method I used, though in my case the backend was an email full of links from the person directing the crawl. :)

Write 2 files, one for start_url and one for limit_urls_to, and include both in the conf file like so:

start_url: `/home/htdig/conf/start_url_file`
limit_urls_to: `/home/htdig/conf/limit_url_file`

The contents of both files are just links.

Good Luck,

Bill Carlson
--
Systems Programmer                        [EMAIL PROTECTED] | Opinions are mine,
Virtual Hospital                          http://www.vh.org/ | not my employer's.
University of Iowa Hospitals and Clinics                     |

To unsubscribe from the htdig mailing list, send a message to [EMAIL PROTECTED] You will receive a message to confirm this.
List archives: http://www.htdig.org/mail/menu.html
FAQ: http://www.htdig.org/FAQ.html
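For anyone whose links live in SQL rather than in an email, the two-file approach above could be scripted roughly as follows and run before each dig. This is only a sketch under stated assumptions: the `links` table with a `url` column is hypothetical, sqlite3 stands in for whatever SQL back-end is actually used, and writing one scheme-and-host prefix per site into the limit file is one possible way to keep the crawl from going past the linked sites, not something the thread prescribes.

```python
import sqlite3
import tempfile
from pathlib import Path
from urllib.parse import urlsplit


def fetch_links(conn):
    """Return the distinct URLs from the hypothetical `links` table, sorted."""
    rows = conn.execute("SELECT DISTINCT url FROM links ORDER BY url").fetchall()
    return [row[0] for row in rows]


def site_prefix(url):
    """Reduce a URL to scheme://host/ (assumed policy for limit_urls_to)."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}/"


def write_url_files(urls, start_path, limit_path):
    """Write one URL per line to start_path and one site prefix per line to limit_path."""
    Path(start_path).write_text("\n".join(urls) + "\n")
    prefixes = sorted({site_prefix(u) for u in urls})
    Path(limit_path).write_text("\n".join(prefixes) + "\n")
    return prefixes


if __name__ == "__main__":
    # Stand-in database; a real setup would connect to the existing back-end.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE links (url TEXT)")
    conn.executemany(
        "INSERT INTO links VALUES (?)",
        [("http://www.vh.org/",), ("http://www.htdig.org/FAQ.html",)],
    )
    # Written to a temporary directory here; the conf file would point at
    # the real locations, e.g. the paths shown in the message above.
    with tempfile.TemporaryDirectory() as tmp:
        start_file = Path(tmp) / "start_url_file"
        limit_file = Path(tmp) / "limit_url_file"
        write_url_files(fetch_links(conn), start_file, limit_file)
        print(start_file.read_text(), end="")
        print(limit_file.read_text(), end="")
```

The conf file itself stays unchanged between runs, since it only names the two files; the script just has to finish before htdig starts.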
Re: [htdig] SQL handling start_url
According to Curtis Ireland:

> Is there any way to have start_url get its list from an SQL back-end? Has
> anyone already built a patch to handle this?
>
> Here are a couple of solutions I can think of to bypass the problem, but
> I'm sure I'm not alone in desiring this feature.
>
> 1) Build a PHP page with links to all the sites we want to index. Have
> htDig use this as its start_url
> 2) Before htDig starts its database build, dump all the links to a text
> file and have the htdig.conf include this file
>
> The one problem with these two solutions is how would the limit_urls_to
> variable work? I want to make sure the links are properly indexed without
> going past the linked site.

Either solution seems workable - it all depends on what your preference is.

For the first solution, you'd need to have a limit_urls_to setting that's liberal enough to allow through all the links that the PHP script will spit out. You should probably set your max_hop_count to 1 to avoid having htdig go beyond the first hop, from the PHP output to the documents it references.

For the second solution, you could probably just leave limit_urls_to as the default, which is the same as the value of start_url, and set your max_hop_count to 0.

--
Gilles R. Detillieux              E-mail: [EMAIL PROTECTED]
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930
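To make the first solution concrete, here is a rough sketch of the page that htdig would be given as its start_url. It is written in Python rather than PHP purely for illustration, the function name `render_link_page` is made up, and the URLs are assumed to have been fetched from the SQL table already. With max_hop_count set to 1 as suggested above, htdig would follow each link on this page and stop there.

```python
from html import escape


def render_link_page(urls):
    """Return a minimal HTML page with one anchor per URL."""
    items = "\n".join(
        # quote=True also escapes double quotes so the href attribute stays valid
        f'<li><a href="{escape(u, quote=True)}">{escape(u)}</a></li>'
        for u in urls
    )
    return "<html><body>\n<ul>\n" + items + "\n</ul>\n</body></html>\n"


if __name__ == "__main__":
    # Example URLs only; a real page would list the rows from the link table.
    print(render_link_page(["http://www.htdig.org/", "http://www.vh.org/"]), end="")
```

Because the page is generated on request, new rows in the table show up on the next dig without touching the conf file; the trade-off is that limit_urls_to has to be loose enough to admit every site the page may list.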
[htdig] SQL handling start_url
Hypothetical Situation:

I have an SQL database table of links I wish to present to someone visiting my site. However, I would also like to make these links searchable from my site. Normally, if these links were static, I would just list them in the htdig.conf file.

Is there any way to have start_url get its list from an SQL back-end? Has anyone already built a patch to handle this?

Here are a couple of solutions I can think of to bypass the problem, but I'm sure I'm not alone in desiring this feature.

1) Build a PHP page with links to all the sites we want to index. Have htDig use this as its start_url
2) Before htDig starts its database build, dump all the links to a text file and have the htdig.conf include this file

The one problem with these two solutions is how would the limit_urls_to variable work? I want to make sure the links are properly indexed without going past the linked site.

Just something for everyone to wrap your heads around.

-C
--
Curtis Ireland - [EMAIL PROTECTED]
Solidum Systems - http://www.solidum.com
(T) (613)724-6004 x284 - (F) (613)724-6008