On Sun, 28 Mar 2004 16:11:21 -0500  Douglas Kline wrote:

> Thanks for your responses.  How were you able to distinguish the error
> message I reported from one generated by ht-Dig?  

30 years of looking at error messages.  And a good memory.


> The same Web pages have been and are still being processed by ht-Dig
> version 3.1.5 without these errors.  What is version 3.2.0b5 doing
> differently?  

Possibly your 3.1.5 installation has user_agent: set in its config file,
but your 3.2.0b5 does not.


> Does htdig put these error messages it generates in the databases
> where htsearch finds them?  

Htdig puts whatever the remote sites return into the db.  AFAIK neither
htdig nor any other spidering software tries to distinguish errors on
the remote server from "good" data.


> Also, why are some pages not reported by htsearch which should be
> reported?  The tests I ran suggest (but don't prove) that the key
> difference between reporting matching pages and not doing so is that
> the search terms which lead to reports are found in the top-level page
> of the URL given as argument to htdig while search terms which lead to
> the htsearch report of no matching pages when in fact there are some
> aren't in the top-level page.  

Sorry, I failed to parse this.  


> Does the error returned by htdig from some pages explain this or is it
> an independent issue?

If you (or htdig) see the error, then you can be pretty sure that the
proper content of the page has _not_ been returned.  So a search won't
find it.  You can also be pretty sure that any links from the page in
question have _not_ been followed.


What you need to do, if it is essential to you to index 
ColdFusion-generated pages, is to put this line into your htdig.conf
    user_agent: "Mozilla/4.0 (htdig 3.2.b5)"
It should not affect other remote pages, unless they are doing a 
robot exclusion based on the User-Agent header, in which case you'll
index things you shouldn't, and they'll get upset with you.
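
If you want to confirm outside htdig that the User-Agent header is what
triggers the error, a rough Python sketch along these lines may help.  It
fetches the same URL under two different User-Agent strings and reports
whether the bodies differ.  The function names are made up for
illustration, and the agent strings you pass in are assumptions: use
whatever your two installations actually send.

```python
import urllib.error
import urllib.request


def build_request(url, user_agent):
    """Return a GET request that announces itself with the given User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_body(url, user_agent, timeout=10):
    """Fetch url and return the body as text, even for 4xx/5xx responses."""
    request = build_request(url, user_agent)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            raw = response.read()
    except urllib.error.HTTPError as err:
        # An error page may arrive with an error status; keep its body
        # so it can still be compared.
        raw = err.read()
    return raw.decode("utf-8", errors="replace")


def bodies_differ(url, agent_a, agent_b):
    """Return True if the server sends different content to the two agents."""
    return fetch_body(url, agent_a) != fetch_body(url, agent_b)
```

Calling bodies_differ() with one of the affected URLs, the agent string
from the line above, and the agent string your current config sends should
return True if the server really is keying on the header.  Pages with
dynamic content (timestamps, session IDs) can differ for unrelated
reasons, so look at the two bodies before drawing conclusions.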

Sadly, by doing this you are encouraging Macromedia to continue to ignore
the HTTP spec, but quick fixes generally win over doing it right :-(



Mike
-- 
Mike Causer                          Email - mailto:[EMAIL PROTECTED]
GPG KeyID 1C2DDA07                       WWW - http://www.mikecauser.org
Flood the fen again! - Wicken Fen enlargement - http://www.wicken.org.uk

