Toby Thain wrote:




Begin forwarded message:


    From: "David Adams" <[EMAIL PROTECTED]>
    Date: 24 March 2004 9:18:17 PM
    To: <[EMAIL PROTECTED]>, "Toby Thain" <[EMAIL PROTECTED]>
    Subject: Re: [htdig] query parameters should be ignored by extension filter?

    I am also using ht://Dig version 3.1.6 and for me it IS indexing URLs like

http://www.soton.ac.uk/~lopsoc/gallery.php?gallery=sorcerer1&photo=CNV00023.jpg


even though I have .jpg in my bad_extensions: list.


    I suggest that you take a hard look at your configuration file and
    check that one of:

    exclude_urls:
    limit_urls_to:
    bad_querystr:
    url_rewrite_rules:

isn't excluding them.
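
A quick way to run the check suggested above is to scan the configuration file for those four attributes. This is only a minimal sketch: the function name and the sample config text are hypothetical, and it assumes each attribute sits on a single "name: value" line (continuation lines and included files are not handled).

```python
import re

# The four attributes named in the message above.
SUSPECT_DIRECTIVES = (
    "exclude_urls",
    "limit_urls_to",
    "bad_querystr",
    "url_rewrite_rules",
)


def find_suspect_directives(conf_text):
    """Return {attribute: value} for each suspect attribute set in conf_text.

    Assumption: one "name: value" pair per line, "#" starts a comment line.
    """
    found = {}
    for line in conf_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        match = re.match(r"([A-Za-z_]+)\s*:\s*(.*)$", stripped)
        if match and match.group(1) in SUSPECT_DIRECTIVES:
            found[match.group(1)] = match.group(2).strip()
    return found


if __name__ == "__main__":
    # Hypothetical config text, for illustration only.
    sample = (
        "# sample configuration\n"
        "exclude_urls: /cgi-bin/ .cgi\n"
        "bad_extensions: .jpg .gif\n"
    )
    for name, value in find_suspect_directives(sample).items():
        print(f"{name}: {value}")
```

An empty result means none of the four attributes is set explicitly in that text; the built-in defaults would still apply.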

David,


Thanks for your suggestions.

I am not using any of those directives; the .conf is vanilla except for customising the search results wrapper.

I did need to add .swf and .ico to the bad extensions list. IMHO these should really be in there by default (maybe fixed in a later version?)

Adding "valid_extensions: .php3 .html" did not help either; the URLs are still not being indexed. Even adding a fake "&q" to the end of the URL doesn't stop htdig rejecting it - a sample rejection from rundig -vvv:

-----
href: http://stegbar.intranet/php/photo.php3?f=s_rc_pl_wd_t_aw_1.jpg&q (Thumbnail: windows_and_doors
Enlarge)


   Rejected: Extension is not valid!
-----

Toby


Personally, I don't need those ~lopsoc/...jpg files and will be adding them to exclude_urls: if they publish many more of them!

    David Adams
    Corporate Information Services
    Information Systems Services
    University of Southampton

    ----- Original Message -----
    From: "Toby Thain" <[EMAIL PROTECTED]>
    To: <[EMAIL PROTECTED]>
    Sent: Wednesday, March 24, 2004 9:58 AM
    Subject: [htdig] query parameters should be ignored by extension filter?


List,


I noticed today that htdig is not indexing URLs like:

/foo/page.php3?f=bar.jpg

        because it notices the URL ends with ".jpg". I am surprised that it's
        not smart enough to realise that the fetched object is actually a
        ".php3", and I definitely want that URL followed.

        Is this fixed in a recent version (I am using ht://Dig 3.1.6)? Or is
        there a simple configuration fix?
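
To make the behaviour described above concrete, here is a small sketch of the two ways an extension filter could read such a URL. It is an illustration only and is not taken from the ht://Dig source: the function names and the BAD_EXTENSIONS set are hypothetical. The first function looks at the whole URL, query string included; the second looks only at the path, which is the behaviour being asked for.

```python
import posixpath
from urllib.parse import urlsplit

# Hypothetical list, standing in for a bad_extensions setting.
BAD_EXTENSIONS = {".jpg", ".swf", ".ico"}


def extension_of_whole_url(url):
    """Text from the last '.' of the full URL, query string included."""
    dot = url.rfind(".")
    if dot == -1:
        return ""
    ext = url[dot:]
    # A '/' after the last dot means the dot was in a host or directory name.
    return "" if "/" in ext else ext.lower()


def extension_of_path(url):
    """Extension of the path component only; the query string is ignored."""
    path = urlsplit(url).path
    return posixpath.splitext(path)[1].lower()


def is_rejected(url, extension_fn):
    """True if the extension found by extension_fn is in BAD_EXTENSIONS."""
    return extension_fn(url) in BAD_EXTENSIONS


if __name__ == "__main__":
    url = "/foo/page.php3?f=bar.jpg"
    print(extension_of_whole_url(url), is_rejected(url, extension_of_whole_url))
    print(extension_of_path(url), is_rejected(url, extension_of_path))
```

Under these assumptions the whole-URL check sees ".jpg" and rejects the link, while the path-only check sees ".php3" and lets it through.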

Toby



        -------------------------------------------------------
        This SF.Net email is sponsored by: IBM Linux Tutorials
        Free Linux tutorial presented by Daniel Robbins, President and
        CEO of
        GenToo technologies. Learn everything from fundamentals to system
        administration.http://ads.osdn.com/?ad_id=1470&alloc_id=3638&op=click

        _______________________________________________
        ht://Dig general mailing list:
        <[EMAIL PROTECTED]>
        ht://Dig FAQ: http://htdig.sourceforge.net/FAQ.html
        List information (subscribe/unsubscribe, etc.)
        https://lists.sourceforge.net/lists/listinfo/htdig-general




