According to Toby Thain:
> >> David Adams wrote:
> >>
> >>> Toby,
> >>>
> >>> Did you have a valid_extensions: statement originally?  If you did,
> >>> then it might be worthwhile trying without it,
> >>
> >> No, I had no valid_extensions originally. The query URLs were ignored
> >> regardless.
> >>
> >>> as then all extensions not listed in
> >>> your bad_extensions: will be valid.
> >>
> >> I don't think you're correct in the above: the doc says, "This is a
> >> list of extensions on URLs which are the only ones considered
> >> acceptable."
> >>
> >
> > The 3.1.6 documentation says: "If the list is empty, then all
> > extensions are acceptable, provided they pass other criteria for
> > acceptance or rejection. If the list is not empty, only documents
> > with one of the extensions in the list are parsed."
> 
> That refers to an empty directive. I was using ".html .php3", so 
> bad_extensions would still have been rejected.

David is correct about what the documentation says, and about how
htdig is intended to operate.  valid_extensions is only needed when you
want to limit htdig to a few specific extensions and exclude all
others.  In cases where you know exactly which extensions you want to
exclude, bad_extensions _should_ be sufficient.
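
As a rough illustration of the intended rule, here is a minimal Python
sketch.  It is not htdig's actual code: the function and argument names
are hypothetical, and it only models the two extension lists, not the
other acceptance criteria or URLs that have no extension at all.

```python
def extension_allowed(ext, valid_extensions, bad_extensions):
    """Return True if a URL with this extension passes the extension checks.

    Hypothetical helper: models the documented behaviour of the
    bad_extensions and valid_extensions attributes, nothing more.
    """
    ext = ext.lower()
    # An extension listed in bad_extensions is always rejected.
    if ext in bad_extensions:
        return False
    # A non-empty valid_extensions list is exclusive:
    # anything not listed in it is rejected.
    if valid_extensions and ext not in valid_extensions:
        return False
    # An empty valid_extensions list places no further restriction.
    return True
```

With valid_extensions set to ".html .php3", a ".jpg" in bad_extensions is
still rejected, and so is any other extension missing from the list; with
valid_extensions empty, only bad_extensions matters.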

Trouble is, the coding bug that affects bad_extensions also affects how
valid_extensions is used.  The patch I just submitted should fix htdig
so that it completely strips off the query string before looking for the
extension, so it won't get confused by extensions in query string
parameters.
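
To make the difference concrete, here is a small Python sketch of the
idea.  It is only an illustration, not the patch itself and not the code
in Retriever.cc; both function names are hypothetical.

```python
import posixpath
from urllib.parse import urlsplit


def naive_extension(url):
    """Extension taken from the whole URL, query string included.

    This models the confusing case: a '.' inside a query parameter
    is mistaken for the start of the document's extension.
    """
    dot = url.rfind(".")
    return url[dot:].lower() if dot != -1 else ""


def url_extension(url):
    """Extension of the URL's path only, with the query string stripped."""
    path = urlsplit(url).path  # drops '?query' and '#fragment'
    return posixpath.splitext(path)[1].lower()


url = ("http://www.soton.ac.uk/~lopsoc/gallery.php"
       "?gallery=sorcerer1&photo=CNV00023.jpg")
print(naive_extension(url))  # .jpg
print(url_extension(url))    # .php
```

With .jpg in bad_extensions, the first result would cause the page to be
rejected, while the second lets the .php page through.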

David Adams also wrote:
> I was trying to find some possible reason why htdig 3.1.6, with .jpg in the
> bad_extensions: list, indexed pages like
> 
> http://www.soton.ac.uk/~lopsoc/gallery.php?gallery=sorcerer1&photo=CNV00023.jpg
> 
> for me but not for you.  Gilles Detillieux has suggested that my source code
> might have been patched (any news on that Gilles?).

I don't have any news for you, other than that there wasn't a patch
in the ftp.ccsf.org archives to do this.  The only way to know for sure
what patches you have is to run "diff -up" to compare a vanilla 3.1.6
htdig/Retriever.cc with the one you're using.
You reported that you were only using my metadate.0 patch, which has no
effect on Retriever::IsValidURL()'s handling of extensions.

> While I, having eliminated several other configuration file statements, was
> checking out a wild guess on my part that the valid_extensions: statement
> might be relevant.  It seems not.

No, I wouldn't have expected that.  However, I'm at a bit of a loss to
figure out why your site doesn't exhibit the same problem Toby reported.
Regardless, would you mind trying out my patch to see if anything changes
(or breaks) when you use it?  As near as I can tell, this patch should
make htdig do what was intended all along.

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)

