Hello all,

I am using Lucene 1.2 (Java 1.4 on Solaris 7) and the XML indexer to index ~24000 small XML documents. The finished, optimized index takes up around 340 MB of disk space. The documents are reindexed once a week, and this has worked without any trouble for months. Recently the free space on the hard drive dropped to 1.36 GB and the optimization crashed with "no space left on device". Deleting the index directory freed up 1.36 GB.

My questions:
1) Is it normal for the optimization process to require this much extra space?
2) Did I miss an option somewhere to limit the space usage of the optimization process?
3) More philosophically, do I really need the optimization?
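
As a stopgap, a pre-flight check along the following lines could abort the weekly run before the optimization starts, instead of letting it die half way. This is only a sketch: the function names are hypothetical, and the default headroom factor of 5 is purely an assumption chosen to be a bit above the roughly 4x ratio at which the run above failed (340 MB index, 1.36 GB free). It is not a documented Lucene requirement.

```python
import os
import shutil


def directory_size(path):
    """Return the total size in bytes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.isfile(full):
                total += os.path.getsize(full)
    return total


def has_headroom(index_bytes, free_bytes, factor=5):
    """True if free space is at least factor times the index size.

    The factor is an assumed safety margin, not a value taken from Lucene.
    """
    return free_bytes >= factor * index_bytes


def safe_to_optimize(index_dir, factor=5):
    """Compare the free space on the index's file system with the index size."""
    free = shutil.disk_usage(index_dir).free
    return has_headroom(directory_size(index_dir), free, factor)
```

With the figures from the failed run, has_headroom(340 * 10**6, 1360 * 10**6) is false, so the job would have been skipped rather than crashing.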

Also, in the archives I came across a message about an Ispell-based stemmer, to which Doug Cutting replied:

[sender name unreadable in the archive] wrote:
> http://www.halyava.ru/do/org.apache.lucene.analysis.zip

This looks great!  If I understand correctly, it can be used to quickly
build stemmers for lots of languages.  For example, the following page
lists the location of ispell dictionaries for over 30 languages!

   http://fmg-www.cs.ucla.edu/geoff/ispell-dictionaries.html

This page should probably be referenced from the documentation.

I have not found the code anywhere on the Lucene site, and the link to the code above no longer works. Does someone have this code, or could the original author please repost it? I am using the French stemmer from Snowball and it does some strange things, such as stemming "paris" to "par" and failing to stem many verbs properly. I would like to try a different stemmer to see whether it is more usable.

I would also like to take this opportunity to thank the Lucene developers for their work.

Konrad Scherer

