According to S. Hayles:
> This is with ht://Dig 3.1.2 under IRIX 6.5
> 
> I used a script to build a list of all files on our server that were not
> externally accessible - so they could be excluded from the externally
> accessible index. It ended up ~250Kb with ~5000 entries. I wasn't too
> surprised that it didn't seem to work.
> 
> On a quick examination, the only limit in this area I could find was the 
> buffer length in Configuration::Read - but increasing this didn't seem to
> help. I tried using robots.txt to restrict indexing, and once max_doc_size
> was adjusted this worked fine - but it seems an unwieldy solution. 
> 
> Since they both appear to use StringMatch for comparisons I would have
> expected exclude_urls to work if robots.txt works. Has anyone else had
> problems with exclude_urls?

Any list of about 250 KB is going to be somewhat unwieldy, but I can't
see anything in the code that would prevent it from working.  The buffer
length in Configuration::Read only limits the length of each individual
line in your config file.  If you split a long value across several
lines, ending each one with a backslash, the total length should be
unlimited.
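
If you're generating the list from a script anyway, something like the
following could write it out with one pattern per physical line, so that
no single line gets anywhere near the buffer length.  This is only a
sketch: the function name and the sample patterns are made up for
illustration, and it assumes the usual "attribute: value" syntax with
whitespace-separated patterns and a trailing backslash for continuation.

```python
def format_exclude_urls(patterns, attribute="exclude_urls"):
    """Return a config entry with one pattern per physical line.

    Every line except the last ends in a backslash, on the assumption
    that the config parser joins such lines into one logical value.
    """
    # Drop empty entries and surrounding whitespace.
    patterns = [p.strip() for p in patterns if p.strip()]
    if not patterns:
        return attribute + ":\n"
    lines = [attribute + ": " + patterns[0]]
    # Indent continuation lines with a tab, purely for readability.
    lines.extend("\t" + p for p in patterns[1:])
    return " \\\n".join(lines) + "\n"


# Hypothetical patterns, not taken from the original list.
print(format_exclude_urls(["/private/", "/internal/docs/"]), end="")
```

The result can be pasted into the config file in place of the existing
exclude_urls entry.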

Similarly, if in your config file you use the `file` mechanism to set
the exclude_urls attribute from the contents of another file, then the
buffer size in ParsedString::getFileContents will limit the length of any
individual line in that file to 1000 bytes (longer lines get "folded",
i.e. the string will be split in two).  Other than that, I think the
string length is limited only by available virtual memory.  Using the
numbers you gave above, it would seem the average URL length in your
exclude_urls is about 50 bytes, so it seems unlikely that the problem
would be URLs over 1000 bytes getting split in two.
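
To rule that out properly rather than going by the average, a quick
check of the list file along these lines would do.  Again just a sketch:
the function name is hypothetical, the 1000-byte figure is the buffer
size mentioned above, and the exact cutoff may be off by a byte or two
depending on how the buffer is read.

```python
def summarize_lines(data, limit=1000):
    """Summarize line lengths in the raw bytes of a URL list file.

    Returns the number of lines, their average length in bytes, and the
    1-based numbers of any lines longer than `limit` bytes (the ones
    that would presumably get split in two).
    """
    lengths = [len(line) for line in data.splitlines()]
    too_long = [i + 1 for i, n in enumerate(lengths) if n > limit]
    average = sum(lengths) / len(lengths) if lengths else 0.0
    return {"entries": len(lengths), "average": average, "too_long": too_long}


# Small made-up sample: the second line is over the limit.
sample = b"/private/\n" + b"x" * 1001 + b"\n/internal/docs/\n"
print(summarize_lines(sample)["too_long"])
```

For the real list, read the file in binary mode and pass its contents to
summarize_lines(); an empty "too_long" list would mean line folding isn't
the problem.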

When you say it didn't seem to work, do you mean it wasn't excluding
what you wanted it to, it was excluding stuff you wanted included, or
was it failing in some other way?  How were you setting exclude_urls?

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930