At 4:36 PM -0500 2/22/00, Robert Marchand wrote:
>1) We badly need the 'fuzzy accent' algorithm or whatever the solution
>would be to be able to search a word with and without accents: like
>"Montr�al" and "Montreal" and get the same results. This is very
>important for us. I've look at some discussion on this topic here and
>would like to know if it is soon to be released. If not, then we will
>have to find a quick-and-dirty solution like patch some files by
>ourselves.
It is not likely to be released soon. However, it won't require
patching files--it will require a new class in htfuzzy/ along the
lines of the Substring class (or the Speling class in 3.2.0b1). If
you'd like some suggestions about how to do that, let me know.
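In case it helps, here is a rough standalone sketch (not htdig code; the
function names and the ISO-8859-1 assumption are mine) of the kind of
accent folding such a fuzzy class would need: every word is reduced to an
unaccented, lower-case key, so that "Montréal" and "Montreal" look up the
same entry, in the same spirit as the existing fuzzy classes map words to
alternate forms.

// strip_accents.cc -- minimal sketch of accent folding for ISO-8859-1 text.
// Illustration only: the names below are not the actual htfuzzy interface.

#include <cctype>
#include <iostream>
#include <string>

// Map one ISO-8859-1 character to an unaccented, lower-case equivalent.
// Characters without an entry are simply lower-cased and returned.
char fold_latin1(unsigned char c)
{
    static const struct { unsigned char from; char to; } table[] = {
        {0xE0,'a'},{0xE1,'a'},{0xE2,'a'},{0xE3,'a'},{0xE4,'a'},{0xE5,'a'},
        {0xE7,'c'},
        {0xE8,'e'},{0xE9,'e'},{0xEA,'e'},{0xEB,'e'},
        {0xEC,'i'},{0xED,'i'},{0xEE,'i'},{0xEF,'i'},
        {0xF1,'n'},
        {0xF2,'o'},{0xF3,'o'},{0xF4,'o'},{0xF5,'o'},{0xF6,'o'},
        {0xF9,'u'},{0xFA,'u'},{0xFB,'u'},{0xFC,'u'},
        {0xFD,'y'},{0xFF,'y'},
    };
    // Fold upper-case accented letters (0xC0-0xDE, except 0xD7 '×')
    // onto the corresponding lower-case range first.
    if (c >= 0xC0 && c <= 0xDE && c != 0xD7)
        c = (unsigned char)(c + 0x20);
    int n = (int)(sizeof(table) / sizeof(table[0]));
    for (int i = 0; i < n; ++i)
        if (table[i].from == c)
            return table[i].to;
    return (char)std::tolower(c);
}

// Return the accent-stripped, lower-cased key for a word, so that
// accented and unaccented spellings share one entry in the fuzzy index.
std::string accent_key(const std::string &word)
{
    std::string key;
    for (std::string::size_type i = 0; i < word.size(); ++i)
        key += fold_latin1((unsigned char)word[i]);
    return key;
}

int main()
{
    std::cout << accent_key("Montr\xE9" "al") << "\n";   // prints "montreal"
    std::cout << accent_key("Montreal")       << "\n";   // prints "montreal"
    return 0;
}

A real htfuzzy class would, like Substring or Speling, hook this mapping
into both the indexing and the search side so that queries with or without
accents find the same documents.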
>2) We have a problem with robots.txt and the database. It seems that if
>the file robots.txt is modified or added after a complete reindex from
>scratch and BEFORE an update reindex, some files that are no longer
>accepted are kept in the database. Does it mean that a complete
>reindex has to be done after a change in robots.txt? That seems a bit
>harsh. We have no control over all the sites to index.
Yes, but think of it like this. You tell me that you want me to make
a map of your house. You give me a certain set of keys (i.e. I can
only get into certain rooms). I go off and do this and then you want
me to give back some keys. I still have the map that I made though!
The point of the analogy is that for a change in robots.txt to affect
the database, the database would have to "forget" parts of what it has
already indexed!
In short, it may sound harsh, but it isn't easy to work out, just from
a change in robots.txt (remember, we don't store robots.txt files),
which URLs now need to be removed. And what if a page that's now
disallowed linked to a section that is still allowed but is no longer
reachable from any other URL?
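To make the problem concrete: even a minimal cleanup pass would have to
re-read the new robots.txt and compare every URL already in the database
against its Disallow prefixes, roughly as in the standalone sketch below
(illustration only, not htdig code; it ignores User-agent sections and the
other robots.txt subtleties). And even that would not catch the
reachability problem above, which is why a complete reindex is the only
clean answer right now.

// robots_purge.cc -- illustration only: which stored URL paths would a
// new robots.txt disallow?  Not htdig code.

#include <iostream>
#include <sstream>
#include <string>
#include <vector>

// Collect the path prefixes named on "Disallow:" lines.
std::vector<std::string> parse_disallow(const std::string &robots_txt)
{
    std::vector<std::string> prefixes;
    std::istringstream in(robots_txt);
    std::string line;
    const std::string key = "Disallow:";
    while (std::getline(in, line)) {
        if (line.compare(0, key.size(), key) == 0) {
            std::string path = line.substr(key.size());
            std::string::size_type start = path.find_first_not_of(" \t");
            if (start != std::string::npos && path[start] == '/')
                prefixes.push_back(path.substr(start));
        }
    }
    return prefixes;
}

// A stored URL path is disallowed if it starts with any Disallow prefix.
bool disallowed(const std::string &path,
                const std::vector<std::string> &prefixes)
{
    for (std::vector<std::string>::size_type i = 0; i < prefixes.size(); ++i)
        if (path.compare(0, prefixes[i].size(), prefixes[i]) == 0)
            return true;
    return false;
}

int main()
{
    // Hypothetical new robots.txt and a few URL paths already in the database.
    std::string robots = "User-agent: *\nDisallow: /private/\nDisallow: /tmp/\n";
    std::vector<std::string> prefixes = parse_disallow(robots);

    const char *stored[] = { "/index.html", "/private/report.html", "/tmp/x.html" };
    for (int i = 0; i < 3; ++i)
        std::cout << stored[i]
                  << (disallowed(stored[i], prefixes) ? "  -> remove" : "  -> keep")
                  << "\n";
    return 0;
}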
-Geoff