Great!  Thanks.  I've committed the fix to CVS just now.

According to David Adams:
> I installed the patch and ran htdig to re-index our pages.  I don't see any
> difference in what is indexed, which I think is good news.
> 
> David Adams
> Corporate Information Services
> Information Systems Services
> University of Southampton
> 
> ----- Original Message ----- 
> From: "Gilles Detillieux" <[EMAIL PROTECTED]>
> To: "ht://Dig mailing list" <[EMAIL PROTECTED]>
> Sent: Tuesday, March 30, 2004 7:20 PM
> Subject: Re: [htdig] query parameters should be ignored by extension filter? - PATCH for 3.2.0b5
> 
> 
> > According to me:
> > > Last week, I wrote:
> > > > According to David Adams:
> > > > > I am also using ht://Dig version 3.1.6 and for me it IS indexing
> > > > > URLs like
> > > > >
> > > > > http://www.soton.ac.uk/~lopsoc/gallery.php?gallery=sorcerer1&photo=CNV00023.jpg
> > > > >
> > > > > even though I have .jpg in my bad_extensions: list.
> > > >
> > > > Actually, I find this surprising.  Upon looking at the code that handles
> > > > bad_extensions, in both 3.1.6 and 3.2.0b5, it seems to me that there is
> > > > indeed a bug in the way htdig locates filename extensions in URLs, as
> > > > Toby described.  Can you confirm that you're running vanilla 3.1.6 with
> > > > no patches to htdig/Retriever.cc which might correct this bug?
> > > >
> > > > The fix to the code should be pretty simple, but I haven't had the time
> > > > to sit down and stare at it long enough to get the fix coded yet.  I'll
> > > > try to get around to it by Friday, so it'll be in the next development
> > > > snapshot for the 3.2 betas, and posted to the list.
> > >
> > > OK, last week got a bit crazy, so I wrote the patch yesterday afternoon,
> > > just before the end of my work day.  Here it is.  Apply it in your main
> > > 3.1.6 source directory using "patch -p0 < this-message-file".  Please
> > > let me know if it solves the problem for you and/or causes others.  I've
> > > made sure the code compiles with the patch, but haven't tested it beyond
> > > that.  Thanks.
> >
> > And this is the same patch for the 3.2.0b5 version, in case anyone wants
> > to give it a shot.  Same story.
> >
> > --- htdig/Retriever.cc.orig 2003-10-23 12:40:20.000000000 -0500
> > +++ htdig/Retriever.cc 2004-03-29 17:47:25.000000000 -0600
> > @@ -1023,16 +1023,17 @@ int Retriever::IsValidURL(const String &
> >   //
> >   // See if the file extension is in the list of invalid ones
> >   //
> > - ext = strrchr((char *) url, '.');
> > + String urlpath = url.get();
> > + int parm = urlpath.indexOf('?'); // chop off URL parameter
> > + if (parm >= 0)
> > + urlpath.chop(urlpath.length() - parm);
> > + ext = strrchr((char *) urlpath.get(), '.');
> >   String lowerext;
> >   if (ext && strchr(ext, '/')) // Ignore a dot if it's not in the
> >   ext = NULL;   // final component of the path.
> >   if (ext)
> >   {
> >   lowerext.set(ext);
> > - int parm = lowerext.indexOf('?'); // chop off URL parameter
> > - if (parm >= 0)
> > - lowerext.chop(lowerext.length() - parm);
> >   lowerext.lowercase();
> >   if (invalids.Exists(lowerext))
> >   {
> >
> > -- 
> > Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
> > Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
> > Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)
> >
> >
> > -------------------------------------------------------
> > This SF.Net email is sponsored by: IBM Linux Tutorials
> > Free Linux tutorial presented by Daniel Robbins, President and CEO of
> > GenToo technologies. Learn everything from fundamentals to system
> > administration.http://ads.osdn.com/?ad_id=1470&alloc_id=3638&op=click
> > _______________________________________________
> > ht://Dig general mailing list: <[EMAIL PROTECTED]>
> > ht://Dig FAQ: http://htdig.sourceforge.net/FAQ.html
> > List information (subscribe/unsubscribe, etc.)
> > https://lists.sourceforge.net/lists/listinfo/htdig-general
> >
> 
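
For anyone who wants to check the logic outside of htdig, here is a rough
standalone sketch of what the patched extension check does, written with
std::string rather than ht://Dig's own String class.  The function names
urlExtension and hasBadExtension are made up for illustration and do not
exist in Retriever.cc; the set of extensions stands in for the
bad_extensions list.  Treat it as an approximation of the patch above,
not as the committed code.

```cpp
#include <algorithm>
#include <cctype>
#include <set>
#include <string>

// Hypothetical helper (not part of ht://Dig): returns the lowercased
// extension, including the dot, of the final path component of a URL,
// or an empty string if there is none.
std::string urlExtension(const std::string &url)
{
    // Chop off the query string first, so that dots and slashes inside
    // the URL parameters cannot be mistaken for part of the path.
    std::string path = url.substr(0, url.find('?'));

    std::string::size_type dot = path.rfind('.');
    if (dot == std::string::npos)
        return "";

    // Ignore a dot if it's not in the final component of the path.
    if (path.find('/', dot) != std::string::npos)
        return "";

    std::string ext = path.substr(dot);
    std::transform(ext.begin(), ext.end(), ext.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return ext;
}

// Hypothetical stand-in for the bad_extensions lookup.
bool hasBadExtension(const std::string &url, const std::set<std::string> &bad)
{
    std::string ext = urlExtension(url);
    return !ext.empty() && bad.count(ext) > 0;
}
```

With this logic, David's gallery.php URL yields the extension ".php",
because everything from the '?' onward is dropped before the last dot is
located, so a ".jpg" entry in the list no longer has any bearing on it.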


-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)

