From: "David Adams" <[EMAIL PROTECTED]>
To: "Toby Thain" <[EMAIL PROTECTED]>, <[EMAIL PROTECTED]>
Subject: Re: Fwd: [htdig] query parameters should be ignored by extension filter?
Date: Fri, 26 Mar 2004 10:10:01 -0000



----- Original Message -----
From: "Toby Thain" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Thursday, March 25, 2004 10:49 PM
Subject: Re: Fwd: [htdig] query parameters should be ignored by extension filter?



David Adams wrote:

Toby,

Did you have a valid_extensions: statement originally? If you did, then it
might be worthwhile trying without it,

No, I had no valid_extensions originally. The query URLs were ignored regardless.

as then all extensions not listed in
your bad_extensions: will be valid.

I don't think you're correct in the above: the doc says, "This is a list
of extensions on URLs which are the only ones considered acceptable."



The 3.1.6 documentation says: "If the list is empty, then all extensions are
acceptable, provided they pass other criteria for acceptance or rejection.
If the list is not empty, only documents with one of the extensions in the
list are parsed."

That refers to an empty directive. I was using ".html .php3", so bad_extensions would still have been rejected.
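
For illustration only, here is a minimal Python sketch of the acceptance rule as the 3.1.6 documentation quoted above describes it. It is not ht://Dig's actual code. The function names are hypothetical, and two behaviours are assumptions rather than documented facts: that the extension is taken from the URL path with any query string dropped, and that URLs with no extension at all are let through.

```python
from urllib.parse import urlsplit
import posixpath


def extension_of(url):
    """Return the lower-cased extension of the URL path, e.g. '.php3'.

    Assumption: the query string is ignored, so 'page.php3?id=1'
    yields '.php3'.
    """
    path = urlsplit(url).path
    return posixpath.splitext(path)[1].lower()


def is_acceptable(url, valid_extensions=(), bad_extensions=()):
    """Hypothetical filter combining bad_extensions and valid_extensions."""
    ext = extension_of(url)
    # A listed bad extension always rejects the URL.
    if ext in bad_extensions:
        return False
    # Assumption: URLs without any extension (e.g. directories) pass.
    if not ext:
        return True
    # An empty valid list accepts everything; otherwise only listed ones.
    if valid_extensions and ext not in valid_extensions:
        return False
    return True


valid = (".html", ".php3")
bad = (".ico", ".swf")
print(is_acceptable("http://www.foo.bar/page.php3?id=1", valid, bad))
print(is_acceptable("http://www.foo.bar/index.htm", valid, bad))
```

Under these assumptions a non-empty valid list such as ".html .php3" still rejects anything in the bad list, and a start URL ending in index.htm is rejected because ".htm" is not in the valid list.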




Good point, but beware of the trap of setting http://www.foo.bar/index.htm
as your starting point.

I didn't :-)




I have a list of 108 bad extensions if anyone is interested, but I make no
claims that it is anywhere near complete.

I needed to add .ico & .swf because they were actually used on the site.
I don't need a catch-all list as we are responsible for all site content.

No list is going to make everybody happy. I would not want .swf as a bad
extension because we parse .swf files for links.

Would this not risk the parser picking up spurious and/or useless text inside the swf?


Toby


