David Adams wrote:

> Toby,
>
> Did you have a valid_extensions: statement originally? If you did, then it
> might be worthwhile trying without it,

No, I had no valid_extensions originally. The query URLs were ignored regardless.


> as then all extensions not listed in
> your bad_extensions: will be valid.

I don't think you're correct in the above: the doc says, "This is a list of extensions on URLs which are the only ones considered acceptable."
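A minimal Python sketch may help pin down the disagreement. It models the two attributes under an assumed reading of the documentation: an extension listed in bad_extensions is always rejected, and when valid_extensions is non-empty only the extensions it lists are accepted. It also assumes (not confirmed in this thread) that an empty valid_extensions imposes no restriction. The function is_acceptable is hypothetical and is not ht://Dig's code.

```python
def is_acceptable(ext: str, valid_extensions: set[str], bad_extensions: set[str]) -> bool:
    """Hypothetical model of the documented extension filter (not htdig source)."""
    # Anything on the bad list is rejected outright.
    if ext in bad_extensions:
        return False
    # A non-empty valid list means "only these are acceptable".
    if valid_extensions and ext not in valid_extensions:
        return False
    # No valid list configured: everything not on the bad list passes.
    return True


bad = {".jpg", ".gif", ".ico", ".swf"}
print(is_acceptable(".php3", set(), bad))             # True: no valid list, not a bad extension
print(is_acceptable(".php3", {".html"}, bad))         # False: a valid list exists and .php3 is not on it
print(is_acceptable(".jpg", {".php3", ".jpg"}, bad))  # False: the bad list still rejects it
```

Under this model, a configuration with no valid_extensions: line accepts every extension not on the bad list, and adding valid_extensions: can only narrow what is accepted, never widen it.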



> Do you really have no limit_urls_to: statement? That doesn't strike me as a good idea.

It's not needed, because "the value of start_url will be the default value for limit_urls_to," (see doc).



> I have a list of 108 bad extensions if anyone is interested, but I make no claims that it is anywhere near complete.

I needed to add .ico & .swf because they were actually used on the site. I don't need a catch-all list as we are responsible for all site content.


Toby


----- Original Message -----
From: "Toby Thain" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Thursday, March 25, 2004 12:49 AM
Subject: Re: Fwd: [htdig] query parameters should be ignored by extension filter?




Toby Thain wrote:




Begin forwarded message:

   From: "David Adams" <[EMAIL PROTECTED]>
   Date: 24 March 2004 9:18:17 PM
   To: <[EMAIL PROTECTED]>, "Toby Thain" <[EMAIL PROTECTED]>
   Subject: Re: [htdig] query parameters should be ignored by extension filter?

   I am also using ht://Dig version 3.1.6 and for me it IS indexing URLs like

   http://www.soton.ac.uk/~lopsoc/gallery.php?gallery=sorcerer1&photo=CNV00023.jpg

   even though I have .jpg in my bad_extensions: list.


   I suggest that you take a hard look at your configuration file and
   check that none of:

   exclude_urls:
   limit_urls_to:
   bad_querystr:
   url_rewrite_rules:

   is excluding them.

David,


Thanks for your suggestions.

I am not using any of those directives; the .conf is vanilla except for
customising the search results wrapper.

I did need to add .swf and .ico to the bad extensions list. IMHO these
should really be in there by default (maybe fixed in a later version?)

Adding "valid_extensions: .php3 .html" did not help either; the URLs are
still not being indexed. Even adding a fake "&q" to the end of the URL
doesn't stop htdig rejecting it. Here is a sample rejection from rundig -vvv:

-----
href: http://stegbar.intranet/php/photo.php3?f=s_rc_pl_wd_t_aw_1.jpg&q
(Thumbnail: windows_and_doors
 Enlarge)

   Rejected: Extension is not valid!
-----
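One hypothesis that fits both rejections, offered as an assumption rather than a reading of the 3.1.6 source: the extension may be taken from the text after the last "." anywhere in the URL, query string included. On that assumption, photo.php3?f=...jpg yields ".jpg" (caught by bad_extensions), and appending "&q" yields ".jpg&q", which is neither .php3 nor .html and would therefore fail the valid_extensions check, consistent with the "Extension is not valid!" message above. The sketch contrasts that with taking the extension from the path component only, using Python's standard urllib.parse; both function names are hypothetical.

```python
import posixpath
from urllib.parse import urlsplit


def naive_extension(url: str) -> str:
    """Assumed behaviour: everything from the last '.' in the whole URL."""
    dot = url.rfind(".")
    return url[dot:] if dot != -1 else ""


def path_extension(url: str) -> str:
    """Alternative: drop the query string and look only at the path."""
    path = urlsplit(url).path
    return posixpath.splitext(path)[1]


for url in (
    "http://stegbar.intranet/php/photo.php3?f=s_rc_pl_wd_t_aw_1.jpg",
    "http://stegbar.intranet/php/photo.php3?f=s_rc_pl_wd_t_aw_1.jpg&q",
):
    print(naive_extension(url), path_extension(url))
# .jpg .php3
# .jpg&q .php3
```

For the two URLs from the log, the naive version returns ".jpg" and ".jpg&q", while the path-based version returns ".php3" for both, which is the behaviour the original question asks for.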

Toby


   Personally, I don't need those ~lopsoc/...jpg files and will be adding
   them to exclude_urls: if they publish many more of them!

   David Adams
   Corporate Information Services
   Information Systems Services
   University of Southampton

   ----- Original Message -----
   From: "Toby Thain" <[EMAIL PROTECTED]>
   To: <[EMAIL PROTECTED]>
   Sent: Wednesday, March 24, 2004 9:58 AM
   Subject: [htdig] query parameters should be ignored by extension filter?


List,


I noticed today that htdig is not indexing URLs like:

/foo/page.php3?f=bar.jpg

because it notices the URL ends with ".jpg". I am surprised that it's
not smart enough to realise that the fetched object is actually a
".php3", and I definitely want that URL followed.

Is this fixed in a recent version (I am using ht://Dig 3.1.6)? Or is
there a simple configuration fix?

Toby



       -------------------------------------------------------
       This SF.Net email is sponsored by: IBM Linux Tutorials
       Free Linux tutorial presented by Daniel Robbins, President and
       CEO of GenToo technologies. Learn everything from fundamentals to
       system administration. http://ads.osdn.com/?ad_id=1470&alloc_id=3638&op=click

       _______________________________________________
       ht://Dig general mailing list:
       <[EMAIL PROTECTED]>
       ht://Dig FAQ: http://htdig.sourceforge.net/FAQ.html
       List information (subscribe/unsubscribe, etc.)
       https://lists.sourceforge.net/lists/listinfo/htdig-general




