Re: ml.feature.Word2Vec.transform() very slow issue

2015-11-09 Thread Sean Owen
Since it's a fairly expensive operation to build the Map, I tend to agree it should not happen in the loop.

On Tue, Nov 10, 2015 at 5:08 AM, Yuming Wang wrote:
> Hi
>
> I found org.apache.spark.ml.feature.Word2Vec.transform() very slow.
>
> I think we should not read
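Sean's point can be illustrated with a minimal Scala sketch that does not use Spark. It assumes a hypothetical `FlatModel` that stores words in an index map and their vectors in one flat array, plus a hypothetical `buildVectorMap()` that turns them into a `Map`. These stand in for the map-building step discussed in the thread and are not Spark's API. `transformSlow` rebuilds the Map for every sentence, while `transformFast` builds it once and reuses it. The averaging rule (divide by the full sentence length, so unknown words count as zeros) is also an assumption of this sketch.

```scala
object HoistMapSketch {
  // Hypothetical flat storage: the slice [i * vectorSize, (i + 1) * vectorSize)
  // of `vectors` belongs to the word whose wordIndex entry is i.
  final case class FlatModel(wordIndex: Map[String, Int], vectors: Array[Float], vectorSize: Int) {
    // The expensive step: allocates a new Map plus one sliced array per word.
    def buildVectorMap(): Map[String, Array[Float]] =
      wordIndex.map { case (word, i) =>
        word -> vectors.slice(i * vectorSize, (i + 1) * vectorSize)
      }
  }

  // Sum of the known word vectors divided by the full sentence length;
  // an empty sentence yields an all-zero vector.
  private def average(sentence: Seq[String], lookup: Map[String, Array[Float]], size: Int): Array[Double] = {
    val sum = new Array[Double](size)
    if (sentence.nonEmpty) {
      for (word <- sentence; vec <- lookup.get(word); j <- 0 until size) sum(j) += vec(j)
      for (j <- 0 until size) sum(j) /= sentence.size
    }
    sum
  }

  // Slow: the Map is rebuilt inside the per-sentence loop.
  def transformSlow(model: FlatModel, sentences: Seq[Seq[String]]): Seq[Array[Double]] =
    sentences.map(s => average(s, model.buildVectorMap(), model.vectorSize))

  // Fast: the Map is built once, outside the loop, and reused for every sentence.
  def transformFast(model: FlatModel, sentences: Seq[Seq[String]]): Seq[Array[Double]] = {
    val lookup = model.buildVectorMap()
    sentences.map(s => average(s, lookup, model.vectorSize))
  }

  def main(args: Array[String]): Unit = {
    val model = FlatModel(Map("a" -> 0, "b" -> 1), Array(1f, 0f, 0f, 1f), 2)
    val sentences = Seq(Seq("a", "b"), Seq("a", "zzz"), Seq.empty[String])
    val fast = transformFast(model, sentences)
    // Both versions compute the same vectors; only the number of Map builds differs.
    assert(fast.map(_.toSeq) == transformSlow(model, sentences).map(_.toSeq))
    println(fast.map(_.mkString("[", ", ", "]")).mkString(" "))
  }
}
```

The two functions return identical results; the slow one performs one Map construction per sentence, the fast one performs exactly one in total. In the real Spark code path the data is shipped to executors through a broadcast variable, which this sketch leaves out.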

Re: ml.feature.Word2Vec.transform() very slow issue

2015-11-09 Thread Nick Pentreath
Seems like a straightforward change that purely improves efficiency, so yes, please submit a JIRA and a PR for this.

On Tue, Nov 10, 2015 at 8:56 AM, Sean Owen wrote:
> Since it's a fairly expensive operation to build the Map, I tend to agree
> it should not happen in the loop.

ml.feature.Word2Vec.transform() very slow issue

2015-11-09 Thread Yuming Wang
Hi

I found org.apache.spark.ml.feature.Word2Vec.transform() to be very slow. I think we should not read the broadcast variable for every sentence, so I fixed this in my fork: https://github.com/979969786/spark/commit/a9f894df3671bb8df2f342de1820dab3185598f3

I have used 2 number rows to test it. Original version