According to me:
> Last week, I wrote:
> > According to David Adams:
> > > I am also using ht://Dig version 3.1.6 and for me it IS indexing URLs like
> > >
> > > http://www.soton.ac.uk/~lopsoc/gallery.php?gallery=sorcerer1&photo=CNV00023.jpg
> > >
> > > even though I have .jpg in my bad_extensions: list.
> >
> > Actually, I find this surprising. Upon looking at the code that handles
> > bad_extensions, in both 3.1.6 and 3.2.0b5, it seems to me that there is
> > indeed a bug in the way htdig locates filename extensions in URLs, as
> > Toby described. Can you confirm that you're running vanilla 3.1.6 with
> > no patches to htdig/Retriever.cc which might correct this bug?
> >
> > The fix to the code should be pretty simple, but I haven't had the time
> > to sit down and stare at it long enough to get the fix coded yet. I'll
> > try to get around to it by Friday, so it'll be in the next development
> > snapshot for the 3.2 betas, and posted to the list.
>
> OK, last week got a bit crazy, so I wrote the patch yesterday afternoon,
> just before the end of my work day. Here it is. Apply it in your main
> 3.1.6 source directory using "patch -p0 < this-message-file". Please
> let me know if it solves the problem for you and/or causes others. I've
> made sure the code compiles with the patch, but haven't tested it beyond
> that. Thanks.
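
Here is the same patch for the 3.2.0b5 version, in case anyone wants
to give it a shot. Same story: apply it with "patch -p0" in your main
3.2.0b5 source directory. It compiles, but I haven't tested it beyond
that, so please let me know how it goes.
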
--- htdig/Retriever.cc.orig 2003-10-23 12:40:20.000000000 -0500
+++ htdig/Retriever.cc 2004-03-29 17:47:25.000000000 -0600
@@ -1023,16 +1023,17 @@ int Retriever::IsValidURL(const String &
//
// See if the file extension is in the list of invalid ones
//
- ext = strrchr((char *) url, '.');
+ String urlpath = url.get();
+ int parm = urlpath.indexOf('?'); // chop off URL parameter
+ if (parm >= 0)
+ urlpath.chop(urlpath.length() - parm);
+ ext = strrchr((char *) urlpath.get(), '.');
String lowerext;
if (ext && strchr(ext, '/')) // Ignore a dot if it's not in the
ext = NULL; // final component of the path.
if (ext)
{
lowerext.set(ext);
- int parm = lowerext.indexOf('?'); // chop off URL parameter
- if (parm >= 0)
- lowerext.chop(lowerext.length() - parm);
lowerext.lowercase();
if (invalids.Exists(lowerext))
{
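
For anyone who would rather see the intent than read the diff: the old code
looked for the last dot in the whole URL and only chopped the parameters off
afterward, while the patched code chops the parameters first and then looks
for the extension. Below is a rough standalone sketch of that order of
operations. It uses std::string instead of ht://Dig's String class, and the
function name extension_of is made up for illustration; it is not in the
htdig source.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper, not part of ht://Dig: returns the lowercased
// extension (including the dot) of the final path component of a URL,
// or an empty string if there is none.
std::string extension_of(const std::string &url)
{
    // Chop off the URL parameters first, so that a dot in the query
    // string is never mistaken for the start of an extension.
    // (If there is no '?', find() returns npos and the URL is kept whole.)
    std::string path = url.substr(0, url.find('?'));

    std::string::size_type dot = path.rfind('.');
    if (dot == std::string::npos)
        return "";

    // Ignore a dot if it's not in the final component of the path.
    if (path.find('/', dot) != std::string::npos)
        return "";

    // Lowercase the extension so it can be compared against a
    // lowercase list such as bad_extensions.
    std::string ext = path.substr(dot);
    std::transform(ext.begin(), ext.end(), ext.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return ext;
}
```

With this sketch, the gallery.php URL quoted above yields ".php" rather than
".jpg", so a ".jpg" entry in bad_extensions would no longer apply to it.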
--
Gilles R. Detillieux E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba Winnipeg, MB R3E 3J7 (Canada)