I will use a portion of the data and try. Will the HDFS block size affect Spark? (If so, it's hard to reproduce.)
On Wed, Dec 30, 2015 at 3:22 AM, Joseph Bradley <jos...@databricks.com> wrote:
> Hi Li,
>
> I'm wondering if you're running into the same bug reported here:
> https://issues.apache.org/jira/browse/SPARK-12488
>
> I haven't figured out yet what is causing it. Do you have a small corpus
> that reproduces this error and that you can share on the JIRA? If so,
> that would help a lot in debugging this failure.
>
> Thanks!
> Joseph
>
> On Sun, Dec 27, 2015 at 7:26 PM, Li Li <fancye...@gmail.com> wrote:
>>
>> I ran my LDA example on a YARN 2.6.2 cluster with Spark 1.5.2.
>> It throws an exception at the line `Matrix topics = ldaModel.topicsMatrix();`,
>> but the YARN job history UI shows the job as successful. What's wrong with it?
>>
>> I submit the job with:
>>
>> ./bin/spark-submit --class Myclass \
>>   --master yarn-client \
>>   --num-executors 2 \
>>   --driver-memory 4g \
>>   --executor-memory 4g \
>>   --executor-cores 1 \
>>
>> My code:
>>
>> corpus.cache();
>>
>> // Cluster the documents into topics using LDA
>> DistributedLDAModel ldaModel = (DistributedLDAModel) new LDA()
>>     .setOptimizer("em")
>>     .setMaxIterations(iterNumber)
>>     .setK(topicNumber)
>>     .run(corpus);
>>
>> // Output topics. Each is a distribution over words (matching word count vectors)
>> System.out.println("Learned topics (as distributions over vocab of "
>>     + ldaModel.vocabSize() + " words):");
>> Matrix topics = ldaModel.topicsMatrix(); // line 81, the exception is thrown here
>> for (int topic = 0; topic < topicNumber; topic++) {
>>   System.out.print("Topic " + topic + ":");
>>   for (int word = 0; word < ldaModel.vocabSize(); word++) {
>>     System.out.print(" " + topics.apply(word, topic));
>>   }
>>   System.out.println();
>> }
>>
>> ldaModel.save(sc.sc(), modelPath);
>>
>> Exception in thread "main" java.lang.IndexOutOfBoundsException:
>> (1025,0) not in [-58,58) x [-100,100)
>>     at breeze.linalg.DenseMatrix$mcD$sp.update$mcD$sp(DenseMatrix.scala:112)
>>     at org.apache.spark.mllib.clustering.DistributedLDAModel$$anonfun$topicsMatrix$1.apply(LDAModel.scala:534)
>>     at org.apache.spark.mllib.clustering.DistributedLDAModel$$anonfun$topicsMatrix$1.apply(LDAModel.scala:531)
>>     at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>>     at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
>>     at org.apache.spark.mllib.clustering.DistributedLDAModel.topicsMatrix$lzycompute(LDAModel.scala:531)
>>     at org.apache.spark.mllib.clustering.DistributedLDAModel.topicsMatrix(LDAModel.scala:523)
>>     at com.mobvoi.knowledgegraph.textmining.lda.ReviewLDA.main(ReviewLDA.java:81)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>     at java.lang.reflect.Method.invoke(Method.java:606)
>>     at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:674)
>>     at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
>>     at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
>>     at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
>>     at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>>
>> 15/12/23 00:01:16 INFO spark.SparkContext: Invoking stop() from shutdown hook
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
>> For additional commands, e-mail: dev-h...@spark.apache.org
>>
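[Editor's note on the error shape: the Breeze message "(1025,0) not in [-58,58) x [-100,100)" is a bounds check on a dense matrix allocated as vocabSize x k (here 58 x 100); topicsMatrix tried to write term index 1025, i.e. a term index in the model's topic counts exceeds the recorded vocabSize, which is consistent with SPARK-12488. A minimal sketch of that bounds check in plain Java — the class name is hypothetical and this is not Breeze's actual implementation:]

```java
// Sketch of the bounds check a column-major dense matrix update performs.
// Breeze also accepts negative indices (wrapping from the end), which is
// why the valid range is printed as [-rows,rows) x [-cols,cols).
public class DenseMatrixSketch {
    final int rows, cols;
    final double[] data;

    DenseMatrixSketch(int rows, int cols) {
        this.rows = rows;
        this.cols = cols;
        this.data = new double[rows * cols];
    }

    void update(int r, int c, double v) {
        if (r < -rows || r >= rows || c < -cols || c >= cols) {
            throw new IndexOutOfBoundsException(
                "(" + r + "," + c + ") not in [-" + rows + "," + rows
                + ") x [-" + cols + "," + cols + ")");
        }
        if (r < 0) r += rows;          // wrap negative indices
        if (c < 0) c += cols;
        data[c * rows + r] = v;        // column-major storage
    }

    public static void main(String[] args) {
        // vocabSize = 58, k = 100, matching the matrix in the reported trace
        DenseMatrixSketch m = new DenseMatrixSketch(58, 100);
        m.update(10, 5, 1.0);          // in bounds: fine
        try {
            m.update(1025, 0, 1.0);    // term index 1025 >= vocabSize 58
        } catch (IndexOutOfBoundsException e) {
            // prints "(1025,0) not in [-58,58) x [-100,100)"
            System.out.println(e.getMessage());
        }
    }
}
```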