According to Toby Thain:
> >> David Adams wrote:
> >>
> >>> Toby,
> >>>
> >>> Did you have a valid_extensions: statement originally? If you did, then it
> >>> might be worthwhile trying without it,
> >>
> >> No, I had no valid_extensions originally. The query URLs were ignored
> >> regardless.
> >>
> >>> as then all extensions not listed in
> >>> your bad_extensions: will be valid.
> >>
> >> I don't think you're correct in the above: the doc says, "This is a list
> >> of extensions on URLs which are the only ones considered acceptable."
> >
> > The 3.1.6 documentation says: "If the list is empty, then all extensions are
> > acceptable, provided they pass other criteria for acceptance or rejection.
> > If the list is not empty, only documents with one of the extensions in the
> > list are parsed."
>
> That refers to an empty directive. I was using ".html .php3", so
> bad_extensions would still have been rejected.

David is correct as far as what the documentation says, and as far as
htdig is intended to operate. valid_extensions is only needed when you
want to limit htdig to only a few specific extensions and exclude all
others. In cases where you know exactly which extensions you want to
exclude, bad_extensions _should_ be sufficient. Trouble is the coding
bug that affects bad_extensions also affects how valid_extensions is used.
The patch I just submitted should fix htdig to completely strip off the
query string before looking for the extension, so it won't get confused
by extensions in query string parameters.

David Adams also wrote:
> I was trying to find some possible reason why htdig 3.1.6, with .jpg in the
> bad_extensions: list, indexed pages like
>
> http://www.soton.ac.uk/~lopsoc/gallery.php?gallery=sorcerer1&photo=CNV00023.jpg
>
> for me but not for you. Gilles Detillieux has suggested that my source code
> might have been patched (any news on that Gilles?).

I don't know what news I can give you, other than there wasn't a patch in
the ftp.ccsf.org archives to do this. If you do a "diff -up" to compare
a vanilla 3.1.6 htdig/Retriever.cc with the one you're using, that would
be the only way to know for sure what patches you have. You reported
that you were only using my metadate.0 patch, which has no effect on
Retriever::IsValidURL()'s handling of extensions.

> While I, having eliminated several other configuration file statements, was
> checking out a wild guess on my part that the valid_extensions: statement
> might be relevant. It seems not.

No, I wouldn't have expected that. However, I'm at a bit of a loss to
figure out why your site doesn't exhibit the same problem Toby reported.
Regardless, would you mind trying out my patch to see if anything changes
(or breaks) when you use it? As near as I can tell, this patch should
make htdig do what was intended all along.

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)
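
For anyone following the thread, here is a minimal Python sketch of the behaviour discussed above. It is not htdig's code (htdig is C++ and the real logic lives in Retriever::IsValidURL()), and the function names are hypothetical. It assumes that the extension should be taken from the URL path only, that bad_extensions is checked first, and that a non-empty valid_extensions list then restricts what is accepted, as the quoted 3.1.6 documentation describes. How htdig treats URLs with no extension at all is not covered in this thread, so the sketch's handling of that case is only a guess.

```python
from urllib.parse import urlsplit
import posixpath


def naive_extension(url):
    """Hypothetical 'confused' lookup: text after the last dot in the whole URL."""
    dot = url.rfind(".")
    return url[dot:].lower() if dot != -1 else ""


def url_extension(url):
    """Extension of the URL path only; query string and fragment are stripped."""
    path = urlsplit(url).path  # drops "?query" and "#fragment"
    return posixpath.splitext(path)[1].lower()


def is_acceptable(url, bad_extensions, valid_extensions=()):
    """Apply bad_extensions, then valid_extensions if that list is non-empty."""
    ext = url_extension(url)
    if ext in bad_extensions:
        return False
    # Assumption: an empty valid_extensions list means "no restriction".
    if valid_extensions and ext not in valid_extensions:
        return False
    return True


url = ("http://www.soton.ac.uk/~lopsoc/gallery.php"
       "?gallery=sorcerer1&photo=CNV00023.jpg")
print(naive_extension(url))   # extension found in the query string parameter
print(url_extension(url))     # extension of the script itself
print(is_acceptable(url, {".jpg"}))
print(is_acceptable(url, {".jpg"}, {".html", ".php3"}))
```

With the query string stripped, the gallery URL is judged by the extension of the script itself rather than by the .jpg in its photo parameter, so a bad_extensions list containing .jpg no longer rejects it. A valid_extensions list of ".html .php3" would still reject it in this sketch, because the script's extension is not in that list.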

