Toby,

Did you have a valid_extensions: statement originally?  If you did, it might
be worth trying without it, since then every extension not listed in your
bad_extensions: will be valid.
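To illustrate why dropping valid_extensions: might help, here is a rough
Python model of an extension filter. It is only a sketch, not the ht://Dig
3.1.6 code. I am assuming that the extension is taken from the last '.' in
the whole URL, query string included, and that valid_extensions:, when set,
acts as a whitelist checked after bad_extensions:. The function names are
made up for illustration. Under those assumptions your "&q" trick produces
the extension ".jpg&q", which is not in bad_extensions: but is not in your
valid_extensions: list either. That would explain the "Extension is not
valid!" rejection you posted.

```python
from urllib.parse import urlsplit

# Rough model of an extension filter, for illustration only.
# ASSUMPTION: this mimics the behaviour described in this thread; it is
# not taken from the ht://Dig source, and the function names are made up.


def naive_extension(url: str) -> str:
    """Text from the last '.' in the whole URL (query string included).

    Returns '' when there is no dot, or when the text after the last dot
    contains a '/', i.e. the final path segment has no extension.
    """
    dot = url.rfind(".")
    if dot == -1:
        return ""
    ext = url[dot:]
    return "" if "/" in ext else ext


def path_extension(url: str) -> str:
    """Extension of the last path segment only; the query string is ignored."""
    last_segment = urlsplit(url).path.rsplit("/", 1)[-1]
    dot = last_segment.rfind(".")
    return last_segment[dot:] if dot != -1 else ""


def is_accepted(ext: str, bad: set[str], valid: set[str]) -> tuple[bool, str]:
    """Apply a bad-extension blacklist, then an optional valid-extension whitelist.

    An empty `valid` set stands for "no valid_extensions: statement", so
    every extension not in `bad` is accepted. URLs with no extension pass.
    """
    if not ext:
        return True, "no extension"
    if ext in bad:
        return False, "in bad_extensions"
    if valid and ext not in valid:
        return False, "not in valid_extensions"
    return True, "accepted"


if __name__ == "__main__":
    bad = {".jpg", ".gif", ".png", ".swf", ".ico"}
    valid = {".php3", ".html"}
    url = "http://stegbar.intranet/php/photo.php3?f=s_rc_pl_wd_t_aw_1.jpg&q"

    ext = naive_extension(url)
    print(ext)                           # .jpg&q
    print(is_accepted(ext, bad, valid))  # (False, 'not in valid_extensions')
    print(is_accepted(ext, bad, set()))  # (True, 'accepted')
    print(path_extension(url))           # .php3
```

If that model is roughly right, the proper fix would be for the filter to
look only at the path, as path_extension() does, so that
/foo/page.php3?f=bar.jpg would be treated as a .php3.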

Do you really have no limit_urls_to: statement?  That doesn't strike me as a
good idea.

I have a list of 108 bad extensions if anyone is interested, but I make no
claims that it is anywhere near complete.

----- Original Message ----- 
From: "Toby Thain" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Thursday, March 25, 2004 12:49 AM
Subject: Re: Fwd: [htdig] query parameters should be ignored by extension filter?


> Toby Thain wrote:
>
> >
> >
> >
> > Begin forwarded message:
> >
> >     From: "David Adams" <[EMAIL PROTECTED]>
> >     Date: 24 March 2004 9:18:17 PM
> >     To: <[EMAIL PROTECTED]>, "Toby Thain" <[EMAIL PROTECTED]>
> >     Subject: Re: [htdig] query parameters should be ignored by
> >     extension filter?
> >     I am also using ht://Dig version 3.1.6 and for me it IS indexing
> >     URLs like
> >
> >     http://www.soton.ac.uk/~lopsoc/gallery.php?gallery=sorcerer1&photo=CNV00023.jpg
> >
> >     even though I have .jpg in my bad_extensions: list.
> >
> >     I suggest that you take a hard look at your configuration file and
> >     check that none of the following is excluding them:
> >
> >     exclude_urls:
> >     limit_urls_to:
> >     bad_querystr:
> >     url_rewrite_rules:
>
> David,
>
> Thanks for your suggestions.
>
> I am not using any of those directives; the .conf is vanilla except for
> customising the search results wrapper.
>
> I did need to add .swf and .ico to the bad_extensions list. IMHO these
> should really be in there by default (perhaps fixed in a later version?)
>
> Adding "valid_extensions: .php3 .html" did not help either; the URLs are
> still not being indexed. Even adding a fake "&q" to the end of the URL
> doesn't stop htdig from rejecting it. Here is a sample rejection from
> rundig -vvv:
>
> -----
> href: http://stegbar.intranet/php/photo.php3?f=s_rc_pl_wd_t_aw_1.jpg&q
> (Thumbnail: windows_and_doors
>   Enlarge)
>
>     Rejected: Extension is not valid!
> -----
>
> Toby
>
> >
> >     Personally, I don't need those ~lopsoc/...jpg files and will be
> >     adding them
> >     to exclude_urls: if they publish many more of them!
> >
> >     David Adams
> >     Corporate Information Services
> >     Information Systems Services
> >     University of Southampton
> >
> >     ----- Original Message -----
> >     From: "Toby Thain" <[EMAIL PROTECTED]>
> >     To: <[EMAIL PROTECTED]>
> >     Sent: Wednesday, March 24, 2004 9:58 AM
> >     Subject: [htdig] query parameters should be ignored by extension
> >     filter?
> >
> >
> >         List,
> >
> >         I noticed today that htdig is not indexing URLs like:
> >
> >         /foo/page.php3?f=bar.jpg
> >
> >         because it notices the URL ends with ".jpg". I am surprised that
> >         it's not smart enough to realise that the fetched object is
> >         actually a ".php3", and I definitely want that URL followed.
> >
> >         Is this fixed in a recent version (I am using ht://Dig 3.1.6)?
> >         Or is
> >         there a simple configuration fix?
> >
> >         Toby
> >
> >
> >
> >         -------------------------------------------------------
> >         This SF.Net email is sponsored by: IBM Linux Tutorials
> >         Free Linux tutorial presented by Daniel Robbins, President and
> >         CEO of GenToo technologies. Learn everything from fundamentals
> >         to system administration.
> >         http://ads.osdn.com/?ad_id=1470&alloc_id=3638&op=click
> >
> >         _______________________________________________
> >         ht://Dig general mailing list:
> >         <[EMAIL PROTECTED]>
> >         ht://Dig FAQ: http://htdig.sourceforge.net/FAQ.html
> >         List information (subscribe/unsubscribe, etc.)
> >         https://lists.sourceforge.net/lists/listinfo/htdig-general
> >
> >
>


