Re: running lda in spark throws exception

2016-01-19 Thread Li Li
…term and should be the same size as the vocabulary (so if the vocabulary, or dictionary, has 10 words, each vector should have a size of 10). This probably means that there will be some elements with zero counts, and a sparse vector might be a good way to handle that. …

Re: running lda in spark throws exception

2016-01-14 Thread Li Li
…the same size as the vocabulary (so if the vocabulary, or dictionary, has 10 words, each vector should have a size of 10). This probably means that there will be some elements with zero counts, and a sparse vector might be a good way to handle that. …
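
As a concrete illustration of the advice quoted above — term-count vectors sized to the vocabulary, sparse where counts are zero — here is a minimal Java sketch. The helper name, the Map input type, and the use of a TreeMap are assumptions for illustration, not code from the thread:

    import java.util.Map;
    import java.util.TreeMap;

    import org.apache.spark.mllib.linalg.Vector;
    import org.apache.spark.mllib.linalg.Vectors;

    // Build one term-count vector per document. Every vector has length
    // vocabSize, so terms absent from a document are implicit zeros in
    // the sparse representation.
    public static Vector countsToVector(Map<Integer, Double> termCounts, int vocabSize) {
        // TreeMap guarantees ascending term indices, which Vectors.sparse expects.
        TreeMap<Integer, Double> sorted = new TreeMap<>(termCounts);
        int[] indices = new int[sorted.size()];
        double[] values = new double[sorted.size()];
        int i = 0;
        for (Map.Entry<Integer, Double> e : sorted.entrySet()) {
            indices[i] = e.getKey();
            values[i] = e.getValue();
            i++;
        }
        return Vectors.sparse(vocabSize, indices, values);
    }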

Re: running lda in spark throws exception

2016-01-13 Thread Li Li
I will try Spark 1.6.0 to see if it is a bug in 1.5.2. On Wed, Jan 13, 2016 at 3:58 PM, Li Li <fancye...@gmail.com> wrote: > I have set up a standalone Spark cluster and used the same code. It still failed with the same exception. I also preprocessed the data to lines of i…

Re: running lda in spark throws exception

2016-01-13 Thread Li Li
…improper vector size, it will not throw an exception, but the term indices will start to be incorrect. For a small number of iterations it is OK, but increasing iterations causes the indices to get larger also. Maybe that is what is going on in the JIRA you linked to? …
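
If wrong-sized vectors corrupt term indices silently rather than failing fast, as described above, one defensive option is to validate the corpus before running LDA. A minimal sketch, assuming a corpus of (docId, vector) pairs; the helper name is hypothetical:

    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.mllib.linalg.Vector;

    // Fail fast if any document vector does not match the vocabulary size,
    // instead of letting LDA produce silently wrong term indices.
    static void checkVectorSizes(JavaPairRDD<Long, Vector> corpus, final int vocabSize) {
        long bad = corpus.values()
                         .filter(v -> v.size() != vocabSize)
                         .count();
        if (bad > 0) {
            throw new IllegalArgumentException(
                bad + " document vectors do not match vocabSize=" + vocabSize);
        }
    }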

Re: running lda in spark throws exception

2016-01-08 Thread Li Li
…and it ran correctly with output that looks good. Sorry, I don't have a YARN cluster set up right now, so maybe the error you are seeing is specific to that. Btw, I am running the latest Spark code from the master branch. Hope that helps some! Bryan …

Re: running lda in spark throws exception

2016-01-04 Thread Li Li
Can anyone help? The problem is very easy to reproduce. What's wrong? On Wed, Dec 30, 2015 at 8:59 PM, Li Li <fancye...@gmail.com> wrote: > I used a small data set and reproduced the problem. But I don't know whether my code is correct or not, because I am not familiar with Spark. …

Re: running lda in spark throws exception

2015-12-30 Thread Li Li
            return new Tuple2(id2doc._1, id2doc._2.vec);
        }
    }));
    corpus.cache();

    // Cluster the documents into topicNumber topics using LDA
    DistributedLDAModel ldaModel = (DistributedLDAModel) new LDA()
        .setMaxIterations(iterNumber)
        .setK(topicNumber)
        .run(corpus);
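
The fragment above is cut off mid-pipeline. For context, a self-contained sketch of that kind of driver against the Spark 1.5-era MLlib Java API — the input format, class name, and variable values are assumptions, not recovered from the thread:

    import scala.Tuple2;

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.mllib.clustering.DistributedLDAModel;
    import org.apache.spark.mllib.clustering.LDA;
    import org.apache.spark.mllib.linalg.Vector;
    import org.apache.spark.mllib.linalg.Vectors;

    public class LdaExample {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf().setAppName("LdaExample");
            JavaSparkContext sc = new JavaSparkContext(conf);

            int topicNumber = 3;
            int iterNumber = 50;

            // Each input line: docId followed by space-separated term counts,
            // one count per vocabulary entry (dense for simplicity here).
            JavaPairRDD<Long, Vector> corpus = sc.textFile(args[0])
                .mapToPair(line -> {
                    String[] parts = line.split("\\s+");
                    double[] counts = new double[parts.length - 1];
                    for (int i = 1; i < parts.length; i++) {
                        counts[i - 1] = Double.parseDouble(parts[i]);
                    }
                    return new Tuple2<>(Long.parseLong(parts[0]), Vectors.dense(counts));
                });
            corpus.cache();

            // The cast is valid with the default EM optimizer, which
            // produces a DistributedLDAModel.
            DistributedLDAModel ldaModel = (DistributedLDAModel) new LDA()
                .setMaxIterations(iterNumber)
                .setK(topicNumber)
                .run(corpus);

            sc.stop();
        }
    }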

Re: running lda in spark throws exception

2015-12-29 Thread Li Li
…https://issues.apache.org/jira/browse/SPARK-12488 I haven't figured out yet what is causing it. Do you have a small corpus which reproduces this error, and which you can share on the JIRA? If so, that would help a lot in debugging this failure. Thanks! Joseph …

running lda in spark throws exception

2015-12-27 Thread Li Li
I ran my LDA example on a YARN 2.6.2 cluster with Spark 1.5.2. It throws an exception at the line: Matrix topics = ldaModel.topicsMatrix(); But in the YARN job history UI the job shows as successful. What's wrong? I submit the job with:

    ./bin/spark-submit --class Myclass \
        --master yarn-client \
        …
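
Unrelated to the root cause, but relevant to that failing line: topicsMatrix materializes the full term-by-topic matrix on the driver, while describeTopics, also available on the 1.5-era LDAModel, returns only the top terms per topic. A sketch of inspecting results that way; printTopics is a hypothetical helper, not a confirmed fix for the exception above:

    import scala.Tuple2;

    import org.apache.spark.mllib.clustering.DistributedLDAModel;

    // Print the top-weighted terms of each topic. describeTopics returns,
    // per topic, parallel arrays of term indices and weights, sorted by weight.
    static void printTopics(DistributedLDAModel ldaModel) {
        Tuple2<int[], double[]>[] topics = ldaModel.describeTopics(10);
        for (int t = 0; t < topics.length; t++) {
            int[] termIndices = topics[t]._1();
            double[] weights = topics[t]._2();
            StringBuilder sb = new StringBuilder("topic " + t + ":");
            for (int i = 0; i < termIndices.length; i++) {
                sb.append(' ').append(termIndices[i]).append('=').append(weights[i]);
            }
            System.out.println(sb);
        }
    }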