[ https://issues.apache.org/jira/browse/SPARK-33103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Justin Mays resolved SPARK-33103.
---------------------------------
    Resolution: Not A Problem

> Custom Schema with Custom RDD reorders columns when more than 4 added
> ---------------------------------------------------------------------
>
>                 Key: SPARK-33103
>                 URL: https://issues.apache.org/jira/browse/SPARK-33103
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.0.1
>         Environment: Java Application
>            Reporter: Justin Mays
>            Priority: Major
>
> I have a custom RDD written in Java that uses a custom schema. Everything
> appears to work fine with 4 columns, but when I add a 5th column, calling
> show() fails with:
>
> java.lang.RuntimeException: Error while encoding: java.lang.RuntimeException:
> java.lang.Long is not a valid external type for schema of double
>
> Here is the schema definition in Java:
>
> StructType schema = new StructType()
>     .add("recordId", DataTypes.LongType, false)
>     .add("col1", DataTypes.DoubleType, false)
>     .add("col2", DataTypes.DoubleType, false)
>     .add("col3", DataTypes.IntegerType, false)
>     .add("col4", DataTypes.IntegerType, false);
>
> Here is the physical plan of the scan (the declared column order is intact):
>
> == Physical Plan ==
> *(1) Scan dw [recordId#0L,col1#1,col2#2,col3#3,col4#4] PushedFilters: [],
> ReadSchema: struct<recordId:bigint,col1:double,col2:double,col3:int,col4:int>
>
> I hardcoded a return in my Row object with values matching the schema:
>
> @Override
> public Object get(int i) {
>     switch (i) {
>         case 0: return 0L;
>         case 1: return 1.1911950001644689D;
>         case 2: return 9.100000949955666E9D;
>         case 3: return 476;
>         case 4: return 500;
>     }
>     return 0L;
> }
>
> Here is the output of the show() command:
>
> 15:30:26.875 ERROR org.apache.spark.executor.Executor - Exception in task 0.0
> in stage 0.0 (TID 0)
> java.lang.RuntimeException: Error while encoding: java.lang.RuntimeException:
> java.lang.Long is not a
> valid external type for schema of double
> validateexternaltype(getexternalrowfield(assertnotnull(input[0, org.apache.spark.sql.Row, true]), 0, col1), DoubleType) AS col1#30
> validateexternaltype(getexternalrowfield(assertnotnull(input[0, org.apache.spark.sql.Row, true]), 1, recordId), LongType) AS recordId#31L
> validateexternaltype(getexternalrowfield(assertnotnull(input[0, org.apache.spark.sql.Row, true]), 2, col2), DoubleType) AS col2#32
> validateexternaltype(getexternalrowfield(assertnotnull(input[0, org.apache.spark.sql.Row, true]), 3, col3), IntegerType) AS col3#33
> validateexternaltype(getexternalrowfield(assertnotnull(input[0, org.apache.spark.sql.Row, true]), 4, col4), IntegerType) AS col4#34
> at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$Serializer.apply(ExpressionEncoder.scala:215) ~[spark-catalyst_2.12-3.0.1.jar:3.0.1]
> at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$Serializer.apply(ExpressionEncoder.scala:197) ~[spark-catalyst_2.12-3.0.1.jar:3.0.1]
> at scala.collection.Iterator$$anon$10.next(Iterator.scala:459) ~[scala-library-2.12.10.jar:?]
> at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source) ~[?:?]
> at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) ~[spark-sql_2.12-3.0.1.jar:3.0.1]
> at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:729) ~[spark-sql_2.12-3.0.1.jar:3.0.1]
> at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:340) ~[spark-sql_2.12-3.0.1.jar:3.0.1]
> at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:872) ~[spark-core_2.12-3.0.1.jar:3.0.1]
> at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:872) ~[spark-core_2.12-3.0.1.jar:3.0.1]
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) ~[spark-core_2.12-3.0.1.jar:3.0.1]
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349) ~[spark-core_2.12-3.0.1.jar:3.0.1]
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:313) ~[spark-core_2.12-3.0.1.jar:3.0.1]
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90) ~[spark-core_2.12-3.0.1.jar:3.0.1]
> at org.apache.spark.scheduler.Task.run(Task.scala:127) ~[spark-core_2.12-3.0.1.jar:3.0.1]
> at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:446) ~[spark-core_2.12-3.0.1.jar:3.0.1]
> at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377) ~[spark-core_2.12-3.0.1.jar:3.0.1]
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:449) [spark-core_2.12-3.0.1.jar:3.0.1]
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_265]
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_265]
> at java.lang.Thread.run(Thread.java:748) [?:1.8.0_265]
> Caused by: java.lang.RuntimeException: java.lang.Long is not a valid external type for schema of double
> at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.writeFields_0_0$(Unknown Source) ~[?:?]
> at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source) ~[?:?]
> at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$Serializer.apply(ExpressionEncoder.scala:211) ~[spark-catalyst_2.12-3.0.1.jar:3.0.1]
> ... 19 more

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
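Note on the failure mode reported above: the serializer expressions in the trace bind ordinal 0 to col1 (DoubleType) while the hardcoded get(0) returns a long, which matches the exception text. The effect can be illustrated without Spark. The following is a minimal, self-contained Java sketch (hypothetical OrdinalOrderDemo class and validate helper, not Spark's API) showing how ordinal-based row access fails once the field order bound by a serializer diverges from the order in which the row returns its values:

```java
import java.util.LinkedHashMap;
import java.util.List;

class OrdinalOrderDemo {
    // Stand-in for a schema: field name -> expected Java type, in the
    // declaration order used by the StructType.add(...) calls above.
    static final LinkedHashMap<String, Class<?>> SCHEMA = new LinkedHashMap<>();
    static {
        SCHEMA.put("recordId", Long.class);
        SCHEMA.put("col1", Double.class);
        SCHEMA.put("col2", Double.class);
        SCHEMA.put("col3", Integer.class);
        SCHEMA.put("col4", Integer.class);
    }

    // Values returned by the custom Row, keyed only by ordinal,
    // mirroring the hardcoded get(int i) in the report.
    static Object get(int i) {
        switch (i) {
            case 0: return 0L;                   // recordId
            case 1: return 1.1911950001644689D;  // col1
            case 2: return 9.100000949955666E9D; // col2
            case 3: return 476;                  // col3
            case 4: return 500;                  // col4
        }
        return 0L;
    }

    // Check each ordinal's value against the type of the field the caller
    // has bound to that ordinal, the way an encoder validates external types.
    static String validate(List<String> boundFieldOrder) {
        int i = 0;
        for (String field : boundFieldOrder) {
            Class<?> expected = SCHEMA.get(field);
            Object value = get(i++);
            if (!expected.isInstance(value)) {
                return value.getClass().getName()
                        + " is not a valid external type for schema of " + field;
            }
        }
        return "OK";
    }

    public static void main(String[] args) {
        // Bound order matches the order get(i) emits values: passes.
        System.out.println(validate(List.of("recordId", "col1", "col2", "col3", "col4")));
        // Bound order puts col1 first (as in the serializer expressions
        // above): the recordId long lands in a double slot and fails.
        System.out.println(validate(List.of("col1", "recordId", "col2", "col3", "col4")));
    }
}
```

Under these assumptions, the second call reproduces the shape of the reported error (a java.lang.Long arriving where a double is expected), without any Spark dependency.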