On Tue, Nov 9, 2010 at 3:53 PM, Marvin Humphrey <[email protected]> wrote:
> On Tue, Nov 09, 2010 at 04:51:33AM -0500, Robert Muir wrote:
>> Some quick notes, from lucene-java:
>

One more note that I forgot to mention: in Snowball's svn (but I think
not in the libstemmer package) there is actually vocabulary test data:
input files containing a sample vocabulary for each language, the
expected output, and combined files called 'diffs' that show what the
stemmer changes.

These provide pretty good coverage for tests to ensure your
integration is working. When they make a change to the algorithms,
these are updated too (though it seems not always in the same commit):

example: 
http://svn.tartarus.org/snowball/trunk/data/german/diffs.txt?r1=527&r2=526&pathrev=527
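
A rough sketch of how a test against this data might look, in Python.
It assumes the vocabulary file and the expected-output file hold one
entry per line, in the same order, and are UTF-8 encoded; I have not
confirmed either detail here. The names find_stem_mismatches,
check_files and toy_stem are made up for illustration, and toy_stem is
only a stand-in for whatever stemmer binding you are integrating.

```python
from itertools import zip_longest
from typing import Callable, Iterable, List, Tuple

Mismatch = Tuple[int, str, str, str]


def find_stem_mismatches(
    words: Iterable[str],
    expected_stems: Iterable[str],
    stem: Callable[[str], str],
) -> List[Mismatch]:
    """Return (line_number, word, expected, actual) for each disagreement."""
    mismatches = []
    pairs = zip_longest(words, expected_stems)
    for line_number, (word, expected) in enumerate(pairs, start=1):
        # zip_longest pads the shorter input with None, so a None here
        # means the two files do not have the same number of lines.
        if word is None or expected is None:
            raise ValueError(f"inputs differ in length at line {line_number}")
        word = word.rstrip("\r\n")
        expected = expected.rstrip("\r\n")
        actual = stem(word)
        if actual != expected:
            mismatches.append((line_number, word, expected, actual))
    return mismatches


def check_files(voc_path: str, output_path: str,
                stem: Callable[[str], str]) -> List[Mismatch]:
    """Compare a vocabulary file against its expected-output file.

    Assumption: both files are UTF-8 with one entry per line.
    """
    with open(voc_path, encoding="utf-8") as voc, \
            open(output_path, encoding="utf-8") as out:
        return find_stem_mismatches(voc, out, stem)


def toy_stem(word: str) -> str:
    """Hypothetical stand-in stemmer: strips one trailing 's'."""
    return word[:-1] if word.endswith("s") else word
```

An empty list from check_files means the integration agrees with the
reference data for that language. Reporting the line number alongside
the word makes it easier to compare a failure against the matching
entry in the 'diffs' file.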
