On Tue, 6 Jul 1999, Gilles Detillieux wrote:
> According to S. Hayles:
> > This is with ht://Dig 3.1.2 under IRIX 6.5
> >
> > I used a script to build a list of all files on our server that were not
> > externally accessible - so they could be excluded from the externally
> > accessible index. It ended up ~250Kb with ~5000 entries. I wasn't too
> > surprised that it didn't seem to work.
> Any list of about 250 Kb is going to be somewhat unwieldy, but I can't
> see anything in the code that would prevent it. The buffer length
> in Configuration::Read only limits the length of individual lines
> in your config file to that length. If you continue your lines onto
> separate lines with backslashes, the total length should be unlimited.
My mistake.
> Similarly, if in your config file you use the `file` mechanism to set
> the exclude_urls attribute from the contents of another file, then the
> buffer size in ParsedString::getFileContents will limit the length of any
> individual line in that file to 1000 bytes (longer lines get "folded",
> i.e. the string will be split in two). Other than that, I think the
> string length is limited only by available virtual memory. Using the
> numbers you gave above, it would seem the average URL length in your
> exclude_urls is about 50 bytes, so it seems unlikely that the problem
> would be URLs over 1000 bytes getting split in two.
>
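The line-length point above can be checked directly before blaming the list size. What follows is only a rough sketch, not anything taken from the ht://Dig source: the function names are made up for illustration, and whether the 1000-byte limit counts the trailing newline is an assumption.

```python
def find_long_lines(lines, limit=1000):
    """Return (line_number, byte_length) for every line longer than `limit` bytes.

    `limit` defaults to the 1000-byte figure quoted above; whether ht://Dig
    counts the trailing newline is an assumption (it is ignored here).
    """
    long_lines = []
    for number, line in enumerate(lines, start=1):
        length = len(line.rstrip("\n").encode("utf-8"))
        if length > limit:
            long_lines.append((number, length))
    return long_lines


def average_entry_length(total_bytes, entry_count):
    """Average bytes per entry, e.g. roughly 250 KB spread over 5000 URLs."""
    return total_bytes / entry_count
```

With the figures quoted above, `average_entry_length(250_000, 5000)` gives 50 bytes per entry, so a file written with one short entry per line should never come near the limit. A file with everything on a single line would.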
> When you say it didn't seem to work, do you mean it wasn't excluding
> what you wanted it to, it was excluding stuff you wanted included, or
> was it failing in some other way?
It wasn't excluding what I wanted it to.
> How were you setting exclude_urls?
I tried a variety of approaches. Initially I created a file starting:
exclude_urls: /cgi-bin/ .cgi \
/ad/gem/gem.html \
/adultedu/gem/gem.html \
/ad/ars1/ \
/adultedu/ars1/ \
/ad/info/ \
/adultedu/info/ \
/ad/rs50/rs50.html \
/adultedu/rs50/rs50.html \
/ad/rs50/index.html \
/adultedu/rs50/index.html \
/ad/jrs12/jrs12.html \
/adultedu/jrs12/jrs12.html \
/ad/adflag \
/adultedu/adflag \
/ad/test1.html~ \
/adultedu/test1.html~ \
/ad/test.html \
/adultedu/test.html \
and used
include: file
I also tried embedding the data in the config file, removing the
backslashes and putting everything on one line, and including the file list
using
exclude_urls: `file`
I never saw it reject any URL matching an entry after the first 9, and in
most cases it didn't seem to match anything beyond the first 2.
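One way to narrow this down is to reproduce the expected behaviour outside htdig and see which entry a given URL ought to hit. The sketch below assumes two things that are not confirmed by the code discussed in this thread: that a trailing backslash simply joins the next line onto the current one, and that each exclude_urls entry is treated as a plain substring of the URL. All function names are hypothetical.

```python
def join_continuations(text):
    """Join lines that end in a backslash into one logical line (assumed rule)."""
    logical, pending = [], ""
    for raw in text.splitlines():
        line = raw.rstrip()
        if line.endswith("\\"):
            pending += line[:-1] + " "
        else:
            logical.append(pending + line)
            pending = ""
    if pending:  # file ended on a continued line
        logical.append(pending)
    return logical


def parse_exclude_urls(text):
    """Return the whitespace-separated entries of the exclude_urls attribute."""
    for line in join_continuations(text):
        name, sep, value = line.partition(":")
        if sep and name.strip() == "exclude_urls":
            return value.split()
    return []


def first_match(url, patterns):
    """Return the first pattern contained in `url`, or None (substring match assumed)."""
    for pattern in patterns:
        if pattern in url:
            return pattern
    return None
```

Running `parse_exclude_urls` over the generated file and comparing the number of entries it returns with the number expected would show whether entries are being lost at the parsing stage or at the matching stage.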
If you can see no reason why it shouldn't work, I'll check everything
and give it one more go.
Thanks
Steven