Gilles,

I can confirm that I am using ht://Dig version 3.1.6.

My htdig/Retriever.cc is:

// $Id: Retriever.cc,v 1.36.2.28 2002/01/25 04:44:33 ghutchis Exp $

It has one patch applied, written by you, whose description reads:

"This patch fixes a problem introduced in 3.1.6's handling of use_doc_date,
which wasn't in the 3.1.5 patches for this feature.  The new date parsing
code in 3.1.6 didn't allow a '-' character after the year in the content
attribute of meta date tags, but only allowed white space, which is
obviously not in accordance with the ISO 8601 date format standard."

which does not sound relevant.

I do have .jpg in the bad_extensions list in my configuration file.

The only URLs containing .jpg that appear in the -v output are ones like
http://www.soton.ac.uk/~lopsoc/gallery.php?gallery=sorcerer1&photo=CNV00023.jpg
and no other pages containing .jpg are listed.

I do not have a valid_extensions: statement.

David Adams
Corporate Information Services
Information Systems Services
University of Southampton

----- Original Message ----- 
From: "Gilles Detillieux" <[EMAIL PROTECTED]>
To: "David Adams" <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>; "Toby Thain" <[EMAIL PROTECTED]>
Sent: Wednesday, March 24, 2004 9:42 PM
Subject: Re: [htdig] query parameters should be ignored by extension filter?


> According to David Adams:
> > I am also using ht://Dig version 3.1.6 and for me it IS indexing URLs like
> >
> > http://www.soton.ac.uk/~lopsoc/gallery.php?gallery=sorcerer1&photo=CNV00023.jpg
> >
> > even though I have .jpg in my bad_extensions: list.
>
> Actually, I find this surprising.  Upon looking at the code that handles
> bad_extensions, in both 3.1.6 and 3.2.0b5, it seems to me that there is
> indeed a bug in the way htdig locates filename extensions in URLs, as
> Toby described.  Can you confirm that you're running vanilla 3.1.6 with
> no patches to htdig/Retriever.cc which might correct this bug?
>
> The fix to the code should be pretty simple, but I haven't had the time
> to sit down and stare at it long enough to get the fix coded yet.  I'll
> try to get around to it by Friday, so it'll be in the next development
> snapshot for the 3.2 betas, and posted to the list.
>
> > ----- Original Message ----- 
> > From: "Toby Thain" <[EMAIL PROTECTED]>
> ...
> > > I noticed today that htdig is not indexing URLs like:
> > >
> > > /foo/page.php3?f=bar.jpg
> > >
> > > because it notices the URL ends with ".jpg". I am surprised that it's
> > > not smart enough to realise that the fetched object is actually a
> > > ".php3", and I definitely want that URL followed.
> > >
> > > Is this fixed in a recent version (I am using ht://Dig 3.1.6)? Or is
> > > there a simple configuration fix?
>
>
> -- 
> Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
> Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
> Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)
>



_______________________________________________
ht://Dig general mailing list: <[EMAIL PROTECTED]>
ht://Dig FAQ: http://htdig.sourceforge.net/FAQ.html
List information (subscribe/unsubscribe, etc.)
https://lists.sourceforge.net/lists/listinfo/htdig-general