Jeff Stein created SPARK-17975:
----------------------------------

             Summary: EMLDAOptimizer fails with ClassCastException on YARN
                 Key: SPARK-17975
                 URL: https://issues.apache.org/jira/browse/SPARK-17975
             Project: Spark
          Issue Type: Bug
          Components: MLlib
    Affects Versions: 2.0.1
         Environment: Centos 6, CDH 5.7, Java 1.7u80
            Reporter: Jeff Stein


I can reproduce the error consistently with a 2,000-record text file, each 
record having 1-5 terms, with checkpointing enabled. The problem appears to 
have been introduced by the resolution for SPARK-13355.

The EdgeRDD class seems to be lying about its type in a way that makes the 
RDD.mapPartitionsWithIndex method unusable when the EdgeRDD is referenced as 
an RDD of Edge elements.
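
To illustrate the kind of failure described above, here is a minimal, Spark-free sketch of how a container can declare one element type while holding another, with the mismatch surfacing only when an element is read. The names Edge, Dataset, and ErasureDemo are hypothetical stand-ins, not Spark classes, and this is not a claim about how EdgeRDD is actually implemented.

```scala
// Hypothetical stand-ins for illustration only; none of these are Spark classes.
final case class Edge(src: Long, dst: Long)

// A container that declares elements of type T but stores untyped values.
class Dataset[T](private val items: Seq[Any]) {
  // Unchecked cast: because of JVM type erasure it succeeds for any contents.
  def iterator: Iterator[T] = items.iterator.asInstanceOf[Iterator[T]]
}

object ErasureDemo {
  // Returns true if reading an element throws ClassCastException.
  def failsOnElementAccess(): Boolean = {
    // Declared as Dataset[Edge], but it really holds (Int, String) pairs.
    val lying = new Dataset[Edge](Seq((0, "partition-0")))

    // Obtaining the iterator works; nothing has been checked yet.
    val it: Iterator[Edge] = lying.iterator

    try {
      // The compiler inserts a cast to Edge here, and that cast fails.
      val e: Edge = it.next()
      println(e.src)
      false
    } catch {
      case _: ClassCastException => true
    }
  }

  def main(args: Array[String]): Unit =
    println(s"ClassCastException on element access: ${failsOnElementAccess()}")
}
```

The point of the sketch is that the declared type is trusted at compile time, so code written against the declared element type compiles cleanly and fails only at runtime, which matches the symptom reported here.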

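{code}
import org.apache.spark.mllib.clustering.{EMLDAOptimizer, LDA}
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("lda").getOrCreate()
// Checkpointing must be enabled to reproduce the failure.
spark.sparkContext.setCheckpointDir("hdfs:///tmp/checkpoints")
// Corpus of (document ID, term-count vector) pairs; construction omitted.
val data: RDD[(Long, Vector)] = // snip
data.setName("data").cache()
val lda = new LDA
val optimizer = new EMLDAOptimizer
lda.setOptimizer(optimizer)
  .setK(10)
  .setMaxIterations(400)
  .setAlpha(-1)  // -1 selects the default document concentration
  .setBeta(-1)   // -1 selects the default topic concentration
  .setCheckpointInterval(7)
val ldaModel = lda.run(data)
{code}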



