Great!  Thanks.  I've committed the fix to CVS just now.

According to David Adams:
> I installed the patch and ran htdig to re-index our pages.  I don't see any
> difference in what is indexed, which I think is good news.
>
> David Adams
> Corporate Information Services
> Information Systems Services
> University of Southampton
>
> ----- Original Message -----
> From: "Gilles Detillieux" <[EMAIL PROTECTED]>
> To: "ht://Dig mailing list" <[EMAIL PROTECTED]>
> Sent: Tuesday, March 30, 2004 7:20 PM
> Subject: Re: [htdig] query parameters should be ignored by extension
> filter? - PATCH for 3.2.0b5
>
> > According to me:
> > > Last week, I wrote:
> > > > According to David Adams:
> > > > > I am also using ht://Dig version 3.1.6 and for me it IS indexing
> > > > > URLs like
> > > > >
> > > > > http://www.soton.ac.uk/~lopsoc/gallery.php?gallery=sorcerer1&photo=CNV00023.jpg
> > > > >
> > > > > even though I have .jpg in my bad_extensions: list.
> > > >
> > > > Actually, I find this surprising.  Upon looking at the code that
> > > > handles bad_extensions, in both 3.1.6 and 3.2.0b5, it seems to me
> > > > that there is indeed a bug in the way htdig locates filename
> > > > extensions in URLs, as Toby described.  Can you confirm that you're
> > > > running vanilla 3.1.6 with no patches to htdig/Retriever.cc which
> > > > might correct this bug?
> > > >
> > > > The fix to the code should be pretty simple, but I haven't had the
> > > > time to sit down and stare at it long enough to get the fix coded
> > > > yet.  I'll try to get around to it by Friday, so it'll be in the
> > > > next development snapshot for the 3.2 betas, and posted to the list.
> > >
> > > OK, last week got a bit crazy, so I wrote the patch yesterday afternoon,
> > > just before the end of my work day.  Here it is.  Apply it in your main
> > > 3.1.6 source directory using "patch -p0 < this-message-file".  Please
> > > let me know if it solves the problem for you and/or causes others.  I've
> > > made sure the code compiles with the patch, but haven't tested it beyond
> > > that.  Thanks.
> >
> > And this is the same patch for the 3.2.0b5 version, in case anyone wants
> > to give it a shot.  Same story.
> >
> > --- htdig/Retriever.cc.orig     2003-10-23 12:40:20.000000000 -0500
> > +++ htdig/Retriever.cc  2004-03-29 17:47:25.000000000 -0600
> > @@ -1023,16 +1023,17 @@ int Retriever::IsValidURL(const String &
> >      //
> >      // See if the file extension is in the list of invalid ones
> >      //
> > -    ext = strrchr((char *) url, '.');
> > +    String urlpath = url.get();
> > +    int parm = urlpath.indexOf('?');        // chop off URL parameter
> > +    if (parm >= 0)
> > +        urlpath.chop(urlpath.length() - parm);
> > +    ext = strrchr((char *) urlpath.get(), '.');
> >      String lowerext;
> >      if (ext && strchr(ext, '/'))     // Ignore a dot if it's not in the
> >          ext = NULL;                  // final component of the path.
> >      if (ext)
> >      {
> >          lowerext.set(ext);
> > -        int parm = lowerext.indexOf('?');    // chop off URL parameter
> > -        if (parm >= 0)
> > -            lowerext.chop(lowerext.length() - parm);
> >          lowerext.lowercase();
> >          if (invalids.Exists(lowerext))
> >          {
> >
> > --
> > Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
> > Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
> > Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)
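
For anyone reading along without the source handy, the logic of the patch can be sketched in standalone C++.  This is only an illustration written against std::string, not the ht://Dig String class, and the names urlExtension, hasBadExtension and selfTest are made up for this sketch; they do not exist in Retriever.cc.  The idea is the same, though: cut the URL at the first '?' before looking for the last dot, then ignore a dot that is not in the final path component.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <set>
#include <string>

// Hypothetical helper (not part of ht://Dig): return the lowercased
// extension of the final path component, including the dot, or "" if
// there is none.
std::string urlExtension(const std::string &url)
{
    // Chop off the URL parameters first, so that a dot inside the query
    // string is never mistaken for the start of a file extension.
    // find() returns npos when there is no '?', and substr(0, npos)
    // then yields the whole string.
    std::string path = url.substr(0, url.find('?'));

    std::string::size_type dot = path.rfind('.');
    if (dot == std::string::npos)
        return "";

    // Ignore a dot if it's not in the final component of the path.
    if (path.find('/', dot) != std::string::npos)
        return "";

    std::string ext = path.substr(dot);
    std::transform(ext.begin(), ext.end(), ext.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return ext;
}

// Hypothetical helper: true if the URL's extension is in the given set
// of lowercased extensions such as ".jpg".
bool hasBadExtension(const std::string &url, const std::set<std::string> &bad)
{
    std::string ext = urlExtension(url);
    return !ext.empty() && bad.count(ext) > 0;
}

// Small usage example; the example.com URL is invented for illustration.
void selfTest()
{
    std::set<std::string> bad;
    bad.insert(".jpg");

    // The extension here is ".php"; the ".jpg" is only in the query.
    assert(!hasBadExtension(
        "http://www.soton.ac.uk/~lopsoc/gallery.php?gallery=sorcerer1&photo=CNV00023.jpg",
        bad));

    // A real .jpg file is still caught, regardless of case or parameters.
    assert(hasBadExtension("http://example.com/pic.JPG?x=1", bad));
}
```

With this ordering, a URL like the gallery.php one above is judged by its ".php" extension only, which is what the subject line of this thread asks for.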

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)

_______________________________________________
ht://Dig general mailing list: <[EMAIL PROTECTED]>
ht://Dig FAQ: http://htdig.sourceforge.net/FAQ.html
List information (subscribe/unsubscribe, etc.)
https://lists.sourceforge.net/lists/listinfo/htdig-general

