> Toby,
> Did you have a valid_extensions: statement originally? If you did, then it might be worthwhile trying without it,
No, I had no valid_extensions originally. The query URLs were ignored regardless.
> as then all extensions not listed in your bad_extensions: will be valid.
I don't think you're correct in the above: the doc says, "This is a list of extensions on URLs which are the only ones considered acceptable."
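For illustration, here is a minimal htdig.conf fragment using the setting tried later in this thread. The comment reflects a reading of the documentation quoted here, not of the htdig source. Once valid_extensions: is set, only URLs with a listed extension count as acceptable.

```
# Per the valid_extensions documentation quoted above: when this attribute
# is set, only URLs whose extension appears in the list are acceptable
valid_extensions:	.php3 .html
```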
> Do you really have no limit_urls_to: statement? That doesn't strike me as a good idea.
It's not needed, because "the value of start_url will be the default value for limit_urls_to," (see doc).
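Here is a sketch of what relying on that default looks like. The start URL is hypothetical: the host name is borrowed from the rejection log later in this thread, and the exact start address is an assumption.

```
# Hypothetical start URL, for illustration only
start_url:	http://stegbar.intranet/
# No limit_urls_to: line here. Per the documentation quoted above, it then
# defaults to the value of start_url, so only URLs containing it are followed
```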
> I have a list of 108 bad extensions if anyone is interested, but I make no claims that it is anywhere near complete.
I needed to add .ico & .swf because they were actually used on the site. I don't need a catch-all list as we are responsible for all site content.
Toby
----- Original Message -----
From: "Toby Thain" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Thursday, March 25, 2004 12:49 AM
Subject: Re: Fwd: [htdig] query parameters should be ignored by extension filter?
Toby Thain wrote:
Begin forwarded message:
From: "David Adams" <[EMAIL PROTECTED]>
Date: 24 March 2004 9:18:17 PM
To: <[EMAIL PROTECTED]>, "Toby Thain" <[EMAIL PROTECTED]>
Subject: Re: [htdig] query parameters should be ignored by extension filter?

I am also using ht://Dig version 3.1.6, and for me it IS indexing URLs like
http://www.soton.ac.uk/~lopsoc/gallery.php?gallery=sorcerer1&photo=CNV00023.jpg
even though I have .jpg in my bad_extensions: list.
I suggest that you take a hard look at your configuration file and check that none of
exclude_urls: limit_urls_to: bad_querystr: url_rewrite_rules:
is excluding them.
David,
Thanks for your suggestions.
I am not using any of those directives; the .conf is vanilla except for customising the search results wrapper.
I did need to add .swf and .ico to the bad extensions list. IMHO these should really be in there by default (perhaps this is fixed in a later version?).
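Here is a sketch of that change. The entries before .ico are placeholders for whatever the existing bad_extensions: list already holds, and they are not necessarily the shipped default. The only point shown is appending the two extensions found on this site.

```
# Placeholder entries: keep whatever your existing list already contains,
# then append the two extensions actually used on the site
bad_extensions:	.gif .jpg .jpeg .css .ico .swf
```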
Adding "valid_extensions: .php3 .html" did not help either; the URLs are still not being indexed. Even adding a fake "&q" to the end of the URL doesn't stop htdig from rejecting it. Here is a sample rejection from rundig -vvv:
-----
href: http://stegbar.intranet/php/photo.php3?f=s_rc_pl_wd_t_aw_1.jpg&q (Thumbnail: windows_and_doors Enlarge)
Rejected: Extension is not valid!
-----
Toby
Personally, I don't need those ~lopsoc/...jpg files and will be adding them to exclude_urls: if they publish many more of them!
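Here is a sketch of that exclusion. exclude_urls: rejects any URL containing one of its space-separated patterns, so one entry covers every URL served by the gallery script, including its non-photo pages. The /cgi-bin/ and .cgi entries are assumed to be the existing defaults and are repeated so that setting the attribute does not drop them.

```
# Assumed existing defaults kept, plus the (hypothetical) gallery exclusion
exclude_urls:	/cgi-bin/ .cgi /~lopsoc/gallery.php
```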
David Adams
Corporate Information Services
Information Systems Services
University of Southampton
----- Original Message -----
From: "Toby Thain" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Wednesday, March 24, 2004 9:58 AM
Subject: [htdig] query parameters should be ignored by extension filter?
List,
I noticed today that htdig is not indexing URLs like:
/foo/page.php3?f=bar.jpg
because it notices the URL ends with ".jpg". I am surprised that it's not smart enough to realise that the fetched object is actually a ".php3", and I definitely want that URL followed.
Is this fixed in a recent version (I am using ht://Dig 3.1.6)? Or is there a simple configuration fix?
Toby
-------------------------------------------------------
This SF.Net email is sponsored by: IBM Linux Tutorials
Free Linux tutorial presented by Daniel Robbins, President and CEO of GenToo technologies. Learn everything from fundamentals to system administration.
http://ads.osdn.com/?ad_id=1470&alloc_id=3638&op=click
_______________________________________________
ht://Dig general mailing list: <[EMAIL PROTECTED]>
ht://Dig FAQ: http://htdig.sourceforge.net/FAQ.html
List information (subscribe/unsubscribe, etc.) https://lists.sourceforge.net/lists/listinfo/htdig-general

