Jeff Stein created SPARK-17975:
----------------------------------

Summary: EMLDAOptimizer fails with ClassCastException on YARN
Key: SPARK-17975
URL: https://issues.apache.org/jira/browse/SPARK-17975
Project: Spark
Issue Type: Bug
Components: MLlib
Affects Versions: 2.0.1
Environment: CentOS 6, CDH 5.7, Java 1.7u80
Reporter: Jeff Stein

I can reproduce the error consistently with a 2000-record text file in which each record has 1-5 terms, with checkpointing enabled. The problem appears to have been introduced by the resolution of SPARK-13355. The EdgeRDD class seems to be lying about its type in a way that makes the RDD.mapPartitionsWithIndex method unusable when the EdgeRDD is referenced as an RDD of Edge elements.

{code}
import org.apache.spark.mllib.clustering.{EMLDAOptimizer, LDA}
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("lda").getOrCreate()
// Checkpointing is enabled, as in the failing runs
spark.sparkContext.setCheckpointDir("hdfs:///tmp/checkpoints")

// (document id, term-count vector) pairs
val data: RDD[(Long, Vector)] = // snip
data.setName("data").cache()

val lda = new LDA
val optimizer = new EMLDAOptimizer
lda.setOptimizer(optimizer)
  .setK(10)
  .setMaxIterations(400)
  .setAlpha(-1)
  .setBeta(-1)
  .setCheckpointInterval(7)

val ldaModel = lda.run(data)
{code}
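
For readers unfamiliar with this class of failure, here is a minimal, Spark-free sketch of how a collection can "lie about its type" on the JVM. It is only an illustration of the general mechanism (type erasure plus an unchecked cast), not Spark's actual EdgeRDD code, and it makes no claim about where exactly the mismatch arises in GraphX. The names `ErasureSketch`, `Edge`, `EdgePartition`, and the standalone `mapPartitionsWithIndex` helper are all hypothetical stand-ins.

```scala
import scala.util.Try

object ErasureSketch {
  // Hypothetical stand-ins; these are NOT the GraphX classes.
  final case class Edge(srcId: Long, dstId: Long)
  final case class EdgePartition(edges: List[Edge])

  // Single-partition helper shaped like RDD.mapPartitionsWithIndex (assumption:
  // simplified to one in-memory iterator so the sketch runs without Spark).
  def mapPartitionsWithIndex[T, U](index: Int, part: Iterator[T])(
      f: (Int, Iterator[T]) => Iterator[U]): List[U] =
    f(index, part).toList

  def run(): Try[List[Long]] = {
    // What is actually stored: (partition id, EdgePartition) pairs.
    val stored: Iterator[(Int, EdgePartition)] =
      Iterator((0, EdgePartition(List(Edge(1L, 2L)))))

    // The unchecked cast succeeds because element types are erased at runtime.
    val claimed: Iterator[Edge] = stored.asInstanceOf[Iterator[Edge]]

    // The failure only surfaces when an element is used as an Edge.
    Try(mapPartitionsWithIndex(0, claimed)((_, it) => it.map(_.srcId)))
  }

  def main(args: Array[String]): Unit =
    println(run()) // a Failure wrapping a ClassCastException
}
```

The point of the sketch is that the cast itself never fails; the exception appears later, inside whatever function first touches an element, which is consistent with an error that only shows up once mapPartitionsWithIndex is applied.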