[ https://issues.apache.org/jira/browse/SPARK-21230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16065347#comment-16065347 ]
Michael Kunkel commented on SPARK-21230:
----------------------------------------

The problem is with the Spark Encoder's handling of the enum type, so the problem is in Spark itself, as the post attempts to point out.

Spark Encoder with mysql Enum and data truncated Error
-------------------------------------------------------

                Key: SPARK-21230
                URL: https://issues.apache.org/jira/browse/SPARK-21230
            Project: Spark
         Issue Type: Bug
         Components: Java API
   Affects Versions: 2.1.1
        Environment: macosX
           Reporter: Michael Kunkel

I am using Spark via Java for a MySQL/ML (machine learning) project.

In the MySQL database I have a column "status_change_type" of type enum = {broke, fixed} in a table called "status_change" in a DB called "test". I have an object StatusChangeDB that models the structure of that table; however, I declared "status_change_type" as a String. I know a MySQL enum and a Java String are represented quite differently, but I am using Spark, and the encoder does not recognize enums properly. When I try to set the value of the enum column via a Java String, I receive the "data truncated" error:

h5. org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 4.0 failed 1 times, most recent failure: Lost task 0.0 in stage 4.0 (TID 9, localhost, executor driver): java.sql.BatchUpdateException: Data truncated for column 'status_change_type' at row 1
    at com.mysql.jdbc.PreparedStatement.executeBatchSerially(PreparedStatement.java:2055)
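For reference, here is a minimal sketch of roughly what the String-based bean and the write path look like. The accessor names (chosen so the bean encoder derives the column name "status_change_type"), the local SparkSession setup, and the sample value are assumptions for illustration, and the real StatusChangeDB has more fields:

    import java.io.Serializable;
    import java.util.Arrays;
    import java.util.Properties;

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Encoders;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SaveMode;
    import org.apache.spark.sql.SparkSession;

    // Sketch only: the real StatusChangeDB has more fields. The accessor names are an
    // assumption chosen so the bean encoder derives the column name "status_change_type".
    public class StatusChangeDB implements Serializable {
        private String status_change_type;

        public String getStatus_change_type() { return status_change_type; }
        public void setStatus_change_type(String v) { this.status_change_type = v; }

        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                    .master("local[*]").appName("status-change-demo").getOrCreate();

            StatusChangeDB change = new StatusChangeDB();
            change.setStatus_change_type("broke"); // one of the MySQL enum labels {broke, fixed}

            Dataset<Row> changeDF = spark
                    .createDataset(Arrays.asList(change), Encoders.bean(StatusChangeDB.class))
                    .toDF();

            // Same URL that jdbcAppendOptions() (shown further down) builds; this append is
            // where the BatchUpdateException above surfaces.
            String url = "jdbc:mysql://localhost:3306/test?jdbcCompliantTruncation=false"
                    + "&user=root&password=";
            changeDF.write().mode(SaveMode.Append).jdbc(url, "status_change", new Properties());
        }
    }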
I have tried to use an enum for "status_change_type"; however, that fails with the following stack trace:

h5. Exception in thread "AWT-EventQueue-0" java.lang.NullPointerException
    at org.spark_project.guava.reflect.TypeToken.method(TypeToken.java:465)
    at org.apache.spark.sql.catalyst.JavaTypeInference$$anonfun$2.apply(JavaTypeInference.scala:126)
    at org.apache.spark.sql.catalyst.JavaTypeInference$$anonfun$2.apply(JavaTypeInference.scala:125)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
    at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
    at org.apache.spark.sql.catalyst.JavaTypeInference$.org$apache$spark$sql$catalyst$JavaTypeInference$$inferDataType(JavaTypeInference.scala:125)
    at org.apache.spark.sql.catalyst.JavaTypeInference$$anonfun$2.apply(JavaTypeInference.scala:127)
    at org.apache.spark.sql.catalyst.JavaTypeInference$$anonfun$2.apply(JavaTypeInference.scala:125)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
    at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
    at org.apache.spark.sql.catalyst.JavaTypeInference$.org$apache$spark$sql$catalyst$JavaTypeInference$$inferDataType(JavaTypeInference.scala:125)
    at org.apache.spark.sql.catalyst.JavaTypeInference$$anonfun$2.apply(JavaTypeInference.scala:127)
    at org.apache.spark.sql.catalyst.JavaTypeInference$$anonfun$2.apply(JavaTypeInference.scala:125)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
    at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
    ... ...

I have tried the JDBC setting "jdbcCompliantTruncation=false", but this does nothing; I get the same "data truncated" error as first stated. Here is my JDBC options map, in case I am using "jdbcCompliantTruncation=false" incorrectly:

    public static Map<String, String> jdbcOptions() {
        Map<String, String> jdbcOptions = new HashMap<String, String>();
        jdbcOptions.put("url", "jdbc:mysql://localhost:3306/test?jdbcCompliantTruncation=false");
        jdbcOptions.put("driver", "com.mysql.jdbc.Driver");
        jdbcOptions.put("dbtable", "status_change");
        jdbcOptions.put("user", "root");
        jdbcOptions.put("password", "");
        return jdbcOptions;
    }

Here is the Spark method for inserting into the MySQL DB:

    private void insertMYSQLQuery(Dataset<Row> changeDF) {
        try {
            changeDF.write().mode(SaveMode.Append)
                    .jdbc(SparkManager.jdbcAppendOptions(), "status_change", new java.util.Properties());
        } catch (Exception e) {
            System.out.println(e);
        }
    }

where jdbcAppendOptions() builds the JDBC URL from the jdbcOptions() map by appending the credentials:

    public static String jdbcAppendOptions() {
        return SparkManager.jdbcOptions().get("url")
                + "&user=" + SparkManager.jdbcOptions().get("user")
                + "&password=" + SparkManager.jdbcOptions().get("password");
    }

How do I get values into a MySQL column of type enum using Spark, or otherwise avoid this "data truncated" error? My only other thought would be to change the DB itself to use VARCHAR, but the project leader is not too happy with that idea.
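For completeness, here is a minimal sketch of the enum-typed variant described above. The enum definition, the nested class layout, and the SparkSession setup are assumptions for illustration, but creating a bean encoder over an enum field like this is what produces the NullPointerException in JavaTypeInference quoted earlier:

    import java.io.Serializable;
    import java.util.Arrays;

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Encoders;
    import org.apache.spark.sql.SparkSession;

    public class StatusChangeEnumDemo {

        // Assumed Java enum mirroring the MySQL column enum('broke','fixed').
        public enum StatusChangeType { broke, fixed }

        // Variant of the bean with the column modeled as an enum instead of a String.
        public static class StatusChangeDB implements Serializable {
            private StatusChangeType status_change_type;

            public StatusChangeType getStatus_change_type() { return status_change_type; }
            public void setStatus_change_type(StatusChangeType v) { this.status_change_type = v; }
        }

        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                    .master("local[*]").appName("status-change-enum-demo").getOrCreate();

            StatusChangeDB change = new StatusChangeDB();
            change.setStatus_change_type(StatusChangeType.broke);

            // Spark 2.1.1's JavaTypeInference has no special handling for Java enums, so
            // building the bean encoder here recurses into the enum's own properties and
            // ends in the NullPointerException shown in the stack trace above.
            Dataset<StatusChangeDB> ds = spark.createDataset(
                    Arrays.asList(change), Encoders.bean(StatusChangeDB.class));
            ds.show();
        }
    }

This matches the point in the comment above: the enum path fails inside the Spark encoder, before anything is sent to MySQL.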