Re: running lda in spark throws exception

2016-04-04 Thread Joseph Bradley
It's possible this was caused by incorrect Graph creation, fixed in [SPARK-13355]. Could you retry your dataset using the current master to see if the problem is fixed? Thanks! On Tue, Jan 19, 2016 at 5:31 AM, Li Li wrote: > I have modified my codes. I can get the total

Re: running lda in spark throws exception

2016-01-19 Thread Li Li
I have modified my code. I can get the total vocabulary size, index array, and frequency array from the JsonObject. JsonArray idxArr = jo.get("idxArr").getAsJsonArray(); JsonArray freqArr = jo.get("freqArr").getAsJsonArray(); int total = jo.get("vocabSize").getAsInt();
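The idxArr/freqArr/vocabSize fields above map naturally onto a dense term-count array. A minimal sketch of that conversion (my own illustration, not Li Li's actual code; the names TermCounts and toCounts are hypothetical), assuming idxArr holds term indices and freqArr the matching frequencies:

```java
// Hypothetical sketch: convert parallel index/frequency arrays plus the
// vocabulary size into a dense count array whose length always equals
// vocabSize, so every document vector matches the dictionary size.
public class TermCounts {
    public static double[] toCounts(int[] idxArr, double[] freqArr, int vocabSize) {
        double[] counts = new double[vocabSize]; // unseen terms stay at 0.0
        for (int i = 0; i < idxArr.length; i++) {
            counts[idxArr[i]] += freqArr[i];
        }
        return counts;
    }
}
```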

Re: running lda in spark throws exception

2016-01-14 Thread Li Li
I got it. I mistakenly thought that each line was a word-id list. On Fri, Jan 15, 2016 at 3:24 AM, Bryan Cutler wrote: > What I mean is the input to LDA.run() is an RDD[(Long, Vector)] and the > Vector is a vector of counts of each term and should be the same size as the >

Re: running lda in spark throws exception

2016-01-14 Thread Bryan Cutler
What I mean is that the input to LDA.run() is an RDD[(Long, Vector)], and the Vector is a vector of counts of each term and should be the same size as the vocabulary (so if the vocabulary, or dictionary, has 10 words, each vector should have a size of 10). This probably means that there will be some

Re: running lda in spark throws exception

2016-01-13 Thread Li Li
I will try Spark 1.6.0 to see if it is a bug in 1.5.2. On Wed, Jan 13, 2016 at 3:58 PM, Li Li wrote: > I have set up a standalone Spark cluster and used the same code. It > still failed with the same exception. > I also preprocessed the data to lines of integers and use the

Re: running lda in spark throws exception

2016-01-13 Thread Bryan Cutler
I was now able to reproduce the exception using the master branch and local mode. It looks like the problem is that the vectors of term counts in the corpus are not always the vocabulary size. Once I padded these with zero counts to the vocab size, it ran without the exception. Joseph, I also tried
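Bryan's zero-padding fix can be sketched as follows (a hypothetical helper of my own; not the code he actually ran): copy a short count array into a new array of the full vocabulary length, leaving the tail entries as zero counts.

```java
// Hypothetical illustration of the zero-padding workaround described above.
public class PadCounts {
    public static double[] padToVocabSize(double[] counts, int vocabSize) {
        if (counts.length >= vocabSize) {
            return counts; // already the full vocabulary length
        }
        double[] padded = new double[vocabSize]; // tail entries default to 0.0
        System.arraycopy(counts, 0, padded, 0, counts.length);
        return padded;
    }
}
```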

Re: running lda in spark throws exception

2016-01-13 Thread Li Li
It looks like the problem is the vectors of term counts in the corpus are not always the vocabulary size. Do you mean some integers do not occur in the corpus? For example, the dictionary is 0 - 9 (10 words in total). The docs are: 0 2 4 6 8 1 3 5 7 9 Then it will be correct. If the docs are: 0
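For the 10-word dictionary example above, each document given as a list of word ids becomes a count vector of length 10; ids that never occur in the document simply get a zero entry. A sketch (names DocVector and fromWordIds are my own, hypothetical):

```java
// Hypothetical sketch: a document given as a list of word ids becomes a
// count vector of length vocabSize, regardless of which ids actually occur.
public class DocVector {
    public static double[] fromWordIds(int[] wordIds, int vocabSize) {
        double[] v = new double[vocabSize];
        for (int id : wordIds) {
            v[id] += 1.0; // count each occurrence of the word id
        }
        return v;
    }
}
```

With vocabSize = 10, the doc "0 2 4 6 8" maps to [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]; the vector length stays 10 even though only five ids occur.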

Re: running lda in spark throws exception

2016-01-08 Thread Li Li
I am running it in 1.5.2. I will try running it in a small standalone cluster to see whether it's correct. On Sat, Jan 9, 2016 at 6:21 AM, Bryan Cutler wrote: > Hi Li, > > I tried out your code and sample data in both local mode and Spark > Standalone and it ran correctly with

Re: running lda in spark throws exception

2016-01-08 Thread Bryan Cutler
Hi Li, I tried out your code and sample data in both local mode and Spark Standalone and it ran correctly with output that looks good. Sorry, I don't have a YARN cluster setup right now, so maybe the error you are seeing is specific to that. Btw, I am running the latest Spark code from the

Re: running lda in spark throws exception

2016-01-04 Thread Li Li
Could anyone help? The problem is very easy to reproduce. What's wrong? On Wed, Dec 30, 2015 at 8:59 PM, Li Li wrote: > I used a small dataset and reproduced the problem. > But I don't know whether my code is correct because I am not familiar > with Spark. > So I first post my

Re: running lda in spark throws exception

2015-12-30 Thread Li Li
I used a small dataset and reproduced the problem. But I don't know whether my code is correct because I am not familiar with Spark, so I will first post my code here. If it's correct, then I will post the data. One line of my data looks like: { "time":"08-09-17","cmtUrl":"2094361"

Re: running lda in spark throws exception

2015-12-29 Thread Joseph Bradley
Hi Li, I'm wondering if you're running into the same bug reported here: https://issues.apache.org/jira/browse/SPARK-12488 I haven't figured out yet what is causing it. Do you have a small corpus which reproduces this error, and which you can share on the JIRA? If so, that would help a lot in

Re: running lda in spark throws exception

2015-12-29 Thread Li Li
I will use a portion of the data and try. Will the HDFS block size affect Spark? (If so, it's hard to reproduce.) On Wed, Dec 30, 2015 at 3:22 AM, Joseph Bradley wrote: > Hi Li, > > I'm wondering if you're running into the same bug reported here: >

running lda in spark throws exception

2015-12-27 Thread Li Li
I ran my LDA example on a YARN 2.6.2 cluster with Spark 1.5.2. It throws an exception at the line: Matrix topics = ldaModel.topicsMatrix(); But in the YARN job history UI, it shows as successful. What's wrong? I submit the job with ./bin/spark-submit --class Myclass \ --master yarn-client \