Davide Benedetto created SPARK-36917:
----------------------------------------
             Summary: java.lang.ClassCastException: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.rdd.MapPartitionsRDD.f of type scala.Function3 in instance of org.apache.spark.rdd.MapPartitionsRDD
                 Key: SPARK-36917
                 URL: https://issues.apache.org/jira/browse/SPARK-36917
             Project: Spark
          Issue Type: Bug
          Components: Spark Core, Spark Submit
    Affects Versions: 3.1.2
         Environment: Ubuntu 20, Spark 3.1.2 (hadoop3.2 build), Hadoop 3.1
            Reporter: Davide Benedetto


My Spark job fails with this error:

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3) (davben-lubuntu executor 2): java.lang.ClassCastException: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.rdd.MapPartitionsRDD.f of type scala.Function3 in instance of org.apache.spark.rdd.MapPartitionsRDD

My Linux Ubuntu 20 machine is organized as follows: there are two users, /home/davben and /home/hadoop. Under the hadoop user I have installed Hadoop 3.1 and spark-3.1.2-hadoop3.2. Both users point to the same java-8-openjdk installation. The Spark job is launched by user davben from the Eclipse IDE as follows.

First I create the Spark conf and the Spark session:

{code:java}
System.setProperty("hadoop.home.dir", "/home/hadoop/hadoop");
SparkConf sparkConf = new SparkConf()
    .setAppName("simple")
    .setMaster("yarn")
    .set("spark.executor.memory", "1g")
    .set("deploy.mode", "cluster")
    .set("spark.yarn.stagingDir", "hdfs://localhost:9000/user/hadoop/")
    .set("spark.hadoop.fs.defaultFS", "hdfs://localhost:9000")
    .set("spark.hadoop.yarn.resourcemanager.hostname", "localhost")
    .set("spark.hadoop.yarn.resourcemanager.scheduler.address", "localhost:8030")
    .set("spark.hadoop.yarn.resourcemanager.address", "localhost:8032")
    .set("spark.hadoop.yarn.resourcemanager.webapp.address", "localhost:8088")
    .set("spark.hadoop.yarn.resourcemanager.admin.address", "localhost:8083");
SparkSession spark = SparkSession.builder().config(sparkConf).getOrCreate();
{code}

Then I create a dataset with two rows:

{code:java}
List<Row> rows = new ArrayList<>();
rows.add(RowFactory.create("a", "b"));
rows.add(RowFactory.create("a", "a"));

StructType structType = new StructType();
structType = structType.add("edge_1", DataTypes.StringType, false);
structType = structType.add("edge_2", DataTypes.StringType, false);
ExpressionEncoder<Row> edgeEncoder = RowEncoder.apply(structType);

Dataset<Row> edge = spark.createDataset(rows, edgeEncoder);
{code}

Then I print the content of the dataset edge:

{code:java}
edge.show();
{code}

Then I perform a map transformation on edge that upper-cases the values of the two columns and returns the result in edge2:

{code:java}
Dataset<Row> edge2 = edge.map(new MyFunction2(), edgeEncoder);
{code}

The following is the code of MyFunction2:

{code:java}
public class MyFunction2 implements MapFunction<Row, Row>, scala.Serializable {

    private static final long serialVersionUID = 1L;

    @Override
    public Row call(Row v1) throws Exception {
        String el1 = v1.get(0).toString().toUpperCase();
        String el2 = v1.get(1).toString().toUpperCase();
        return RowFactory.create(el1, el2);
    }
}
{code}

Finally I show the content of edge2:

{code:java}
edge2.show();
{code}

I can confirm, by checking the Hadoop UI at localhost:8088, that the job is submitted correctly. What is strange is that the first show() returns correctly in my console, while the second one fails with the error above.
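One possible cause, which I have not verified, is that the classes compiled in Eclipse (MyFunction2 and the driver code) are never shipped to the YARN executors, so the executor cannot deserialize the map function and the lambda stays a SerializedLambda. The snippet below is only a sketch of that guess; the jar path is a placeholder for wherever the application jar would be built:

{code:java}
// Unverified sketch: explicitly ship the application jar (containing
// MyFunction2 and the driver code) to the YARN executors so the map
// function can be deserialized there. The jar path is a placeholder.
SparkConf sparkConf = new SparkConf()
    .setAppName("simple")
    .setMaster("yarn")
    .setJars(new String[] { "/home/davben/eclipse-workspace/simple/target/simple.jar" });
{code}

Packaging the job into a jar and launching it with spark-submit, instead of running it directly from Eclipse, should have the same effect of putting the application classes on the executors' classpath.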