[ https://issues.apache.org/jira/browse/MAHOUT-1252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13984801#comment-13984801 ]

Drew Farris commented on MAHOUT-1252:
-------------------------------------

Ok. Are we preferring the Scala or Java Spark APIs moving forward?

As for word2vec, I haven't worked with it directly, but it looks very 
interesting (Sumeet Vij and colleagues presented on it at BigConf.io).

The functionality would be great to have as part of the Mahout tooling. Radim 
Řehůřek has written about his experience porting word2vec to Python/gensim, so 
his posts at 
http://radimrehurek.com/2013/09/deep-learning-with-word2vec-and-gensim/ (and 
parts 2 and 3) should be a useful reference for a port.

I think that providing basic tf and tf-idf bag-of-words vectorization will be 
useful and may be more straightforward to implement in the short term. That 
said, I have no sense yet of how complex a word2vec port would be.
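For reference, here is a minimal sketch of what basic tf and tf-idf bag-of-words vectorization involves, in plain Java with no Mahout or Spark dependencies. The class and method names (TfIdfSketch, buildDictionary, inverseDocumentFrequencies, tfIdf) are hypothetical. It uses the textbook weighting tf * ln(N / df); Mahout's seq2sparse may use a different tf-idf formula, and it stores its output as sparse vectors rather than the dense arrays used here for simplicity.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

public class TfIdfSketch {

    // Assign each distinct term an integer id (sorted order, so ids are deterministic).
    static Map<String, Integer> buildDictionary(List<List<String>> docs) {
        TreeSet<String> terms = new TreeSet<>();
        for (List<String> doc : docs) {
            terms.addAll(doc);
        }
        Map<String, Integer> dict = new HashMap<>();
        int id = 0;
        for (String term : terms) {
            dict.put(term, id++);
        }
        return dict;
    }

    // Raw term frequencies: tf[id] = number of times the term occurs in the document.
    static double[] termFrequencies(List<String> doc, Map<String, Integer> dict) {
        double[] tf = new double[dict.size()];
        for (String term : doc) {
            Integer id = dict.get(term);
            if (id != null) { // ignore out-of-vocabulary terms
                tf[id] += 1.0;
            }
        }
        return tf;
    }

    // Textbook inverse document frequency: idf[id] = ln(N / df(term)).
    static double[] inverseDocumentFrequencies(List<List<String>> docs, Map<String, Integer> dict) {
        double[] df = new double[dict.size()];
        for (List<String> doc : docs) {
            // Count each term at most once per document.
            for (String term : new HashSet<>(doc)) {
                Integer id = dict.get(term);
                if (id != null) {
                    df[id] += 1.0;
                }
            }
        }
        double[] idf = new double[dict.size()];
        for (int i = 0; i < idf.length; i++) {
            idf[i] = df[i] > 0 ? Math.log(docs.size() / df[i]) : 0.0;
        }
        return idf;
    }

    // tf-idf weight of each term: tf * idf, element-wise.
    static double[] tfIdf(List<String> doc, Map<String, Integer> dict, double[] idf) {
        double[] v = termFrequencies(doc, dict);
        for (int i = 0; i < v.length; i++) {
            v[i] *= idf[i];
        }
        return v;
    }

    public static void main(String[] args) {
        List<List<String>> docs = Arrays.asList(
                Arrays.asList("mahout", "spark", "spark"),
                Arrays.asList("mahout", "vector"));
        Map<String, Integer> dict = buildDictionary(docs);
        double[] idf = inverseDocumentFrequencies(docs, dict);
        System.out.println(Arrays.toString(tfIdf(docs.get(0), dict, idf)));
    }
}
```

In the two-document example in main, "mahout" gets a tf-idf weight of 0 in the first document because it appears in every document, while "spark" (two occurrences, one document) gets 2 * ln 2. The dictionary step is the part an FST-backed DictionaryType would replace.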


> Add support for Finite State Transducers (FST) as a DictionaryType.
> -------------------------------------------------------------------
>
>                 Key: MAHOUT-1252
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1252
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Integration
>    Affects Versions: 0.7
>            Reporter: Suneel Marthi
>            Assignee: Suneel Marthi
>             Fix For: 1.0
>
>
> Add support for Finite State Transducers (FST) as a DictionaryType; this 
> should result in an order-of-magnitude speedup of seq2sparse.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
