I looked through the patch thoroughly today; my notes are below.
At 4:54 PM -0600 11/25/99, Gilles Detillieux wrote:
>1) htcommon/DocumentDB.cc & htdig/Retriever.cc: allow file:... as well
>as http:... URLs. (This doesn't change anything in htlib/URL.cc, so I'm
>not sure about how well it'll handle hrefs in documents from the file:
Yeah, this is why I think moving file:// access to 3.2 is a better
idea. A lot of work has been done on that tree (esp. in URL.cc) to
allow multiple URL schemes. I still need to commit my URL test suite,
but the URL class is much more robust.
One thing I didn't consider when revising the URL class was the use
of 'localhost' in a file:// URL. The ones I've seen are mostly of the form:
file:///home/ghutchis/www/index.html
So I'll need to add some file://localhost tests to my collection--I
bet the current URL parser won't like them.
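
For reference, here is a minimal sketch of the two forms side by side. It uses Python's standard urllib.parse, not the ht://Dig URL class, so it says nothing about what URL.cc actually does. The helper name local_path is hypothetical, and treating an empty host and 'localhost' as equivalent is an assumption made for illustration:

```python
from urllib.parse import urlsplit


def local_path(url):
    """Return the filesystem path for a file: URL, or None if it is not local.

    Assumption for illustration: an empty host and 'localhost' both mean
    "this machine"; any other host is treated as non-local.
    """
    parts = urlsplit(url)
    if parts.scheme != "file":
        return None
    host = parts.netloc.lower()
    if host not in ("", "localhost"):
        return None
    return parts.path


if __name__ == "__main__":
    for u in ("file:///home/ghutchis/www/index.html",
              "file://localhost/home/ghutchis/www/index.html"):
        print(u, "->", local_path(u))
```

Both forms should resolve to the same path, which is the kind of case the file://localhost tests would need to cover.
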
>2) htdig/HTML.cc: add support for an ignore_noindex attribute. This is
>undocumented and no default is defined, but I think the behaviour is
>pretty obvious from the code. I'd question the desirability/need for
>this, but it seems harmless enough. The value should be set in a static
I actually disagree on this. I don't think the indexer should ever
ignore the directive of the page author. If the author intended that
the page should not be indexed, then ht://Dig should follow those
wishes. I'd have a similar opinion about something that ignored
robots.txt.
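
To make the point concrete, here is a rough sketch of what honoring the author's wishes could look like. It is written in Python with the standard html.parser module and is not taken from HTML.cc. It assumes the directive in question is a robots META tag with a noindex value; the names RobotsMetaParser and should_index are made up for illustration:

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Record whether a page carries a robots META tag that lists noindex."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute names arrive lowercased; values may be None.
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        # The content value is a comma-separated list, e.g. "noindex, nofollow".
        tokens = [t.strip().lower() for t in attr.get("content", "").split(",")]
        if "noindex" in tokens:
            self.noindex = True


def should_index(html_text):
    """Return False when the page author asked not to be indexed."""
    parser = RobotsMetaParser()
    parser.feed(html_text)
    parser.close()
    return not parser.noindex
```

In this sketch there is deliberately no switch to override the result: if the page says noindex, should_index returns False.
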
>3) htdig/Retriever.cc & htdig/Server.cc: modified to allow local file
>access to persist even if the HTTP server is down. Looks good to me.
I'm still thinking about this. It looks good, but I'm wondering why
every server needs to have a boolean, when only 'localhost' is going
to allow local file access. Otherwise, this is fine. The
Retriever/Server classes are still in need of some work.
>6) htlib/cgi.cc & htsearch/htsearch.cc: add a -a option to htsearch, to
>add name=value parameters to those in query string. This is undocumented
>as well. I'm not sure how it relates to the other changes, but it seems
>simple enough.
I have to think about this one too. It seems reasonable, but I always
think through changes that might allow the CGI to be cracked. Since I
can't think of a way to send command-line arguments to the CGI, this
seems OK.
>I'm all for these fixes, but some documentation explaining the less
>clear aspects, or even a note explaining the rationale for these,
>...
>Separate patches for these would be a big help, as fixes are generally
>committed one at a time.
Or, as I mentioned, a full ChangeLog-style list of rationales would help.
-Geoff