Toby,

Did you have a valid_extensions: statement originally? If you did, then it might be worthwhile trying without it, as then all extensions not listed in your bad_extensions: will be valid.
Do you really have no limit_urls_to: statement? That doesn't strike me as a good idea.

I have a list of 108 bad extensions if anyone is interested, but I make no claims that it is anywhere near complete.

----- Original Message -----
From: "Toby Thain" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Thursday, March 25, 2004 12:49 AM
Subject: Re: Fwd: [htdig] query parameters should be ignored by extension filter?

> Toby Thain wrote:
>
> > Begin forwarded message:
> >
> > *From: *"David Adams" <[EMAIL PROTECTED]>
> > *Date: *24 March 2004 9:18:17 PM
> > *To: *<[EMAIL PROTECTED]>, "Toby Thain" <[EMAIL PROTECTED]>
> > *Subject: Re: [htdig] query parameters should be ignored by extension filter?*
> >
> > I am also using ht://Dig version 3.1.6 and for me it IS indexing URLs like
> >
> > http://www.soton.ac.uk/~lopsoc/gallery.php?gallery=sorcerer1&photo=CNV00023.jpg
> >
> > even though I have .jpg in my bad_extensions: list.
> >
> > I suggest that you take a hard look at your configuration file and check
> > that one of:
> >
> > exclude_urls:
> > limit_urls_to:
> > bad_querystr:
> > url_rewrite_rules:
> >
> > isn't excluding them.
>
> David,
>
> Thanks for your suggestions.
>
> I am not using any of those directives; the .conf is vanilla except for
> customising the search results wrapper.
>
> I did need to add .swf and .ico to the bad extensions list. IMHO these
> should really be in there by default (may be fixed in later version?)
>
> Adding "valid_extensions: .php3 .html" did not help either; the URLs are
> still not being indexed. Even adding a fake "&q" to the end of the URL
> doesn't stop htdig rejecting it - a sample rejection from rundig -vvv:
>
> -----
> href: http://stegbar.intranet/php/photo.php3?f=s_rc_pl_wd_t_aw_1.jpg&q
> (Thumbnail: windows_and_doors
> Enlarge)
>
> Rejected: Extension is not valid!
> -----
>
> Toby
>
> > Personally, I don't need those ~lopsoc/...jpg files and will be adding them
> > to exclude_urls: if they publish many more of them!
> >
> > David Adams
> > Corporate Information Services
> > Information Systems Services
> > University of Southampton
> >
> > ----- Original Message -----
> > From: "Toby Thain" <[EMAIL PROTECTED]>
> > To: <[EMAIL PROTECTED]>
> > Sent: Wednesday, March 24, 2004 9:58 AM
> > Subject: [htdig] query parameters should be ignored by extension filter?
> >
> > List,
> >
> > I noticed today that htdig is not indexing URLs like:
> >
> > /foo/page.php3?f=bar.jpg
> >
> > because it notices the URL ends with ".jpg". I am surprised that it's
> > not smart enough to realise that the fetched object is actually a
> > ".php3", and I definitely want that URL followed.
> >
> > Is this fixed in a recent version (I am using ht://Dig 3.1.6)? Or is
> > there a simple configuration fix?
> >
> > Toby
> >
> > -------------------------------------------------------
> > This SF.Net email is sponsored by: IBM Linux Tutorials
> > Free Linux tutorial presented by Daniel Robbins, President and CEO of
> > GenToo technologies. Learn everything from fundamentals to system
> > administration.http://ads.osdn.com/?ad_id=1470&alloc_id=3638&op=click
> >
> > _______________________________________________
> > ht://Dig general mailing list: <[EMAIL PROTECTED]>
> > ht://Dig FAQ: http://htdig.sourceforge.net/FAQ.html
> > List information (subscribe/unsubscribe, etc.)
> > https://lists.sourceforge.net/lists/listinfo/htdig-general
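
For anyone following the thread, here is a rough Python sketch of the two ways an extension filter could read a URL. It is only a model of the behaviour discussed above, not ht://Dig's actual code (which is not Python), and all the function names are made up for illustration. naive_extension assumes the filter looks at the whole URL string, query included, which would be consistent with the rejection Toby reports but is not confirmed here; path_extension ignores everything after the "?". is_accepted follows the rule stated at the top of this message: when no valid extensions are given, anything not in the bad list passes.

```python
import posixpath
from urllib.parse import urlsplit


def naive_extension(url):
    """Hypothetical: take the extension from the whole URL, query included."""
    dot = url.rfind(".")
    # No dot, or the last dot belongs to an earlier path segment or the host.
    if dot == -1 or "/" in url[dot:]:
        return ""
    return url[dot:].lower()


def path_extension(url):
    """Hypothetical: take the extension from the path only, ignoring ?query."""
    return posixpath.splitext(urlsplit(url).path)[1].lower()


def is_accepted(url, bad_extensions, valid_extensions=(),
                extension_of=path_extension):
    """Model of a bad_extensions / valid_extensions check.

    An empty valid_extensions means "no whitelist": every extension that
    is not in bad_extensions is accepted.
    """
    ext = extension_of(url)
    if ext in bad_extensions:
        return False
    if valid_extensions and ext not in valid_extensions:
        return False
    return True


if __name__ == "__main__":
    url = "http://stegbar.intranet/php/photo.php3?f=s_rc_pl_wd_t_aw_1.jpg&q"
    bad = {".jpg", ".swf", ".ico"}
    valid = {".php3", ".html"}
    print(naive_extension(url), is_accepted(url, bad, valid, naive_extension))
    print(path_extension(url), is_accepted(url, bad, valid, path_extension))
```

With the whole-URL reading, the fake "&q" just turns the extension into ".jpg&q", which is still not in the valid list, so the URL stays rejected. With the path-only reading the extension is ".php3" and the URL goes through.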

