Grant,
I'm trying to generate the sequence vectors using the SnowballAnalyzer as
opposed to the StandardAnalyzer. I've already gone through this process using
the StandardAnalyzer and plotted the output clusters using the k-means dump
file, so I'm familiar with clustering in Mahout. I'd like to repeat this
exercise with the SnowballAnalyzer by running the following command:
./mahout seq2sparse -s 2 -a
org.apache.lucene.analysis.snowball.SnowballAnalyzer -chunk 100 -i
/home/hadoop/tmp/trecdata-seqfiles/chunk-0 -o
/home/hadoop/tmp/trecdata-vectors -md 1 -x 75 -wt TFIDF -n 0
1) I've placed the lucene-snowball jar in the m2 repository at
/home/delroy/.m2/repository/org/apache/lucene/lucene-snowball/2.9.1
2) I also updated Mahout_CORE/pom.xml to reflect the dependency:
<!-- updated by Delroy to use Snowball Analyzer -->
<dependency>
<groupId>org.apache.lucene</groupId>
<artifactId>lucene-snowball</artifactId>
<version>2.9.1</version>
</dependency>
3) Then I ran mvn install on Mahout_CORE and on Mahout_ROOT, which
downloaded the lucene-snowball pom and the lucene-snowball pom sha1 to the m2
repository.
The error below seems to stem from developer code, which incidentally notes that
you should not instantiate the analyzer at
SparseVectorsFromSequenceFiles.java:176. Any suggestions here?
Output:
Exception in thread "main" java.lang.InstantiationException:
org.apache.lucene.analysis.snowball.SnowballAnalyzer
at java.lang.Class.newInstance0(Class.java:357)
at java.lang.Class.newInstance(Class.java:325)
at org.apache.mahout.text.SparseVectorsFromSequenceFiles.main()
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:172)
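
My tentative reading of the trace, sketched with plain JDK classes only: Class.newInstance() needs a public no-argument constructor, and as far as I can tell the SnowballAnalyzer in Lucene 2.9.1 only offers constructors that take at least a stemmer name. In the sketch below, NameOnlyAnalyzer is a made-up stand-in (not the Lucene class) and the helper method names are hypothetical as well. It only illustrates the reflection behaviour, not what Mahout actually does at that line.

```java
import java.lang.reflect.Constructor;

public class NewInstanceDemo {

    // Hypothetical stand-in for an analyzer whose only constructor takes a name.
    // It is NOT the Lucene SnowballAnalyzer; it just has no no-arg constructor.
    public static class NameOnlyAnalyzer {
        private final String name;

        public NameOnlyAnalyzer(String name) {
            this.name = name;
        }

        public String getName() {
            return name;
        }
    }

    // Returns true when Class.newInstance() throws InstantiationException,
    // which is what happens for a class without a no-argument constructor.
    @SuppressWarnings("deprecation")
    public static boolean defaultInstantiationFails(Class<?> cls) {
        try {
            cls.newInstance();
            return false;
        } catch (InstantiationException e) {
            return true;
        } catch (IllegalAccessException e) {
            return false;
        }
    }

    // Looks up the one-argument (String) constructor explicitly and calls it.
    public static NameOnlyAnalyzer createWithName(String name) {
        try {
            Constructor<NameOnlyAnalyzer> ctor =
                    NameOnlyAnalyzer.class.getConstructor(String.class);
            return ctor.newInstance(name);
        } catch (Exception e) {
            throw new IllegalStateException("could not build analyzer", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("newInstance() failed: "
                + defaultInstantiationFails(NameOnlyAnalyzer.class));
        System.out.println("explicit constructor gave: "
                + createWithName("English").getName());
    }
}
```

If this reading is right, the jar and pom changes are probably fine, and the class would need to be built through a constructor that takes the stemmer name (or wrapped in a small subclass with a no-arg constructor) rather than through newInstance().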
PS: I just love the spam filter. It won't let me write too many variants of the
word Analyzer because it contains the word anal.
-----
--cheers
Delroy