Joseph K. Bradley created SPARK-9578:
----------------------------------------
             Summary: Stemmer feature transformer
                 Key: SPARK-9578
                 URL: https://issues.apache.org/jira/browse/SPARK-9578
             Project: Spark
          Issue Type: New Feature
          Components: ML
            Reporter: Joseph K. Bradley
            Priority: Minor

Transformer mentioned first in [SPARK-5571] based on suggestion from [~aloknsingh]. Very standard NLP preprocessing task.

From [~aloknsingh]:
{quote}
We have one scala stemmer in scalanlp%chalk https://github.com/scalanlp/chalk/tree/master/src/main/scala/chalk/text/analyze which can easily copied (as it is apache project) and is in scala too.
I think this will be better alternative than lucene englishAnalyzer or opennlp.
Note: we already use the scalanlp%breeze via the maven dependency so I think adding scalanlp%chalk dependency is also the options. But as you had said we can copy the code as it is small.
{quote}
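
For illustration only, here is a minimal Python sketch of what a stemming step does to a sequence of tokens. It is not the scalanlp chalk stemmer, not the Porter algorithm, and not a Spark API: the names `naive_stem` and `stem_tokens`, the suffix list, and the minimum stem length are all hypothetical choices made for this example.

```python
from typing import List

# Hypothetical suffix rules, checked in this order (longest first).
# This is a toy rule set, NOT the Porter algorithm.
SUFFIXES = ("ing", "ed", "es", "s")

# Assumed lower bound on the stem length, so short words such as "is"
# are left untouched.
MIN_STEM_LEN = 3


def naive_stem(token: str) -> str:
    """Lowercase the token and strip the first matching suffix, if any."""
    word = token.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LEN:
            return word[: -len(suffix)]
    return word


def stem_tokens(tokens: List[str]) -> List[str]:
    """Apply naive_stem to every token, preserving order and length."""
    return [naive_stem(t) for t in tokens]


if __name__ == "__main__":
    print(stem_tokens(["Jumping", "jumped", "jumps", "is"]))
```

A transformer along these lines would presumably take an array-of-strings column (for example, tokenizer output) and emit a column of the same shape with each token replaced by its stem. A real implementation would use an established algorithm rather than the toy rules above.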