I will use a portion of the data and try. Will the HDFS block size affect Spark? (If so, it's hard to reproduce.)
On Wed, Dec 30, 2015 at 3:22 AM, Joseph Bradley <jos...@databricks.com> wrote:
> Hi Li,
>
> I'm wondering if you're running into the same bug reported here:
> https://issues.apache.org/jira/browse/SPARK-12488
>
> I haven't figured out yet what is causing it. Do you have a small corpus
> that reproduces this error and that you can share on the JIRA? If so,
> that would help a lot in debugging this failure.
>
> Thanks!
> Joseph
>
> On Sun, Dec 27, 2015 at 7:26 PM, Li Li <fancye...@gmail.com> wrote:
>>
>> I ran my LDA example on a YARN 2.6.2 cluster with Spark 1.5.2.
>> It throws an exception at the line `Matrix topics = ldaModel.topicsMatrix();`,
>> but the YARN job history UI shows the job as successful. What's wrong with it?
>>
>> I submit the job with:
>>
>> ./bin/spark-submit --class Myclass \
>>   --master yarn-client \
>>   --num-executors 2 \
>>   --driver-memory 4g \
>>   --executor-memory 4g \
>>   --executor-cores 1 \
>>
>> My code:
>>
>> corpus.cache();
>>
>> // Cluster the documents into topics using LDA
>> DistributedLDAModel ldaModel = (DistributedLDAModel) new LDA()
>>     .setOptimizer("em")
>>     .setMaxIterations(iterNumber)
>>     .setK(topicNumber)
>>     .run(corpus);
>>
>> // Output topics. Each is a distribution over words (matching word count vectors)
>> System.out.println("Learned topics (as distributions over vocab of "
>>     + ldaModel.vocabSize() + " words):");
>> Matrix topics = ldaModel.topicsMatrix(); // line 81, the exception is thrown here
>> for (int topic = 0; topic < topicNumber; topic++) {
>>   System.out.print("Topic " + topic + ":");
>>   for (int word = 0; word < ldaModel.vocabSize(); word++) {
>>     System.out.print(" " + topics.apply(word, topic));
>>   }
>>   System.out.println();
>> }
>>
>> ldaModel.save(sc.sc(), modelPath);
>>
>> Exception in thread "main" java.lang.IndexOutOfBoundsException:
>> (1025,0) not in [-58,58) x [-100,100)
>>     at breeze.linalg.DenseMatrix$mcD$sp.update$mcD$sp(DenseMatrix.scala:112)
>>     at org.apache.spark.mllib.clustering.DistributedLDAModel$$anonfun$topicsMatrix$1.apply(LDAModel.scala:534)
>>     at org.apache.spark.mllib.clustering.DistributedLDAModel$$anonfun$topicsMatrix$1.apply(LDAModel.scala:531)
>>     at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>>     at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
>>     at org.apache.spark.mllib.clustering.DistributedLDAModel.topicsMatrix$lzycompute(LDAModel.scala:531)
>>     at org.apache.spark.mllib.clustering.DistributedLDAModel.topicsMatrix(LDAModel.scala:523)
>>     at com.mobvoi.knowledgegraph.textmining.lda.ReviewLDA.main(ReviewLDA.java:81)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>     at java.lang.reflect.Method.invoke(Method.java:606)
>>     at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:674)
>>     at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
>>     at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
>>     at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
>>     at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>>
>> 15/12/23 00:01:16 INFO spark.SparkContext: Invoking stop() from shutdown hook
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
>> For additional commands, e-mail: dev-h...@spark.apache.org
>>
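[Editor's note on the error shape: the Breeze message "(1025,0) not in [-58,58) x [-100,100)" is a bounds check on a dense matrix allocated as vocabSize x k (here 58 x 100); topicsMatrix tried to write term index 1025, i.e. a term index in the model's topic counts exceeds the recorded vocabSize, which is consistent with SPARK-12488. A minimal sketch of that bounds check in plain Java — the class name is hypothetical and this is not Breeze's actual implementation:]

```java
// Sketch of the bounds check a column-major dense matrix update performs.
// Breeze also accepts negative indices (wrapping from the end), which is
// why the valid range is printed as [-rows,rows) x [-cols,cols).
public class DenseMatrixSketch {
    final int rows, cols;
    final double[] data;

    DenseMatrixSketch(int rows, int cols) {
        this.rows = rows;
        this.cols = cols;
        this.data = new double[rows * cols];
    }

    void update(int r, int c, double v) {
        if (r < -rows || r >= rows || c < -cols || c >= cols) {
            throw new IndexOutOfBoundsException(
                "(" + r + "," + c + ") not in [-" + rows + "," + rows
                + ") x [-" + cols + "," + cols + ")");
        }
        if (r < 0) r += rows;          // wrap negative indices
        if (c < 0) c += cols;
        data[c * rows + r] = v;        // column-major storage
    }

    public static void main(String[] args) {
        // vocabSize = 58, k = 100, matching the matrix in the reported trace
        DenseMatrixSketch m = new DenseMatrixSketch(58, 100);
        m.update(10, 5, 1.0);          // in bounds: fine
        try {
            m.update(1025, 0, 1.0);    // term index 1025 >= vocabSize 58
        } catch (IndexOutOfBoundsException e) {
            // prints "(1025,0) not in [-58,58) x [-100,100)"
            System.out.println(e.getMessage());
        }
    }
}
```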