[ https://issues.apache.org/jira/browse/SPARK-21230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16065347#comment-16065347 ]
Michael Kunkel commented on SPARK-21230:
----------------------------------------

The problem is with the Spark Encoder's handling of the enum type, so the problem is in Spark itself, as the post attempts to point out.

Spark Encoder with mysql Enum and data truncated Error
-------------------------------------------------------

                Key: SPARK-21230
                URL: https://issues.apache.org/jira/browse/SPARK-21230
            Project: Spark
         Issue Type: Bug
         Components: Java API
   Affects Versions: 2.1.1
        Environment: macosX
           Reporter: Michael Kunkel

I am using Spark via Java for a MySQL/ML (machine learning) project.

In the MySQL database I have a column "status_change_type" of type enum = {broke, fixed} in a table called "status_change" in a DB called "test". I have an object StatusChangeDB that models the structure of that table; however, I declared "status_change_type" as a String. I know a MySQL enum and a Java String are represented quite differently, but I am using Spark, and the encoder does not recognize enums properly. When I try to set the value of the enum column via a Java String, I receive the "data truncated" error:

h5. org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 4.0 failed 1 times, most recent failure: Lost task 0.0 in stage 4.0 (TID 9, localhost, executor driver): java.sql.BatchUpdateException: Data truncated for column 'status_change_type' at row 1
    at com.mysql.jdbc.PreparedStatement.executeBatchSerially(PreparedStatement.java:2055)
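For reference, here is a minimal sketch of roughly what the String-based bean and the write path look like. The accessor names (chosen so the bean encoder derives the column name "status_change_type"), the local SparkSession setup, and the sample value are assumptions for illustration, and the real StatusChangeDB has more fields:

    import java.io.Serializable;
    import java.util.Arrays;
    import java.util.Properties;

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Encoders;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SaveMode;
    import org.apache.spark.sql.SparkSession;

    // Sketch only: the real StatusChangeDB has more fields. The accessor names are an
    // assumption chosen so the bean encoder derives the column name "status_change_type".
    public class StatusChangeDB implements Serializable {
        private String status_change_type;

        public String getStatus_change_type() { return status_change_type; }
        public void setStatus_change_type(String v) { this.status_change_type = v; }

        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                    .master("local[*]").appName("status-change-demo").getOrCreate();

            StatusChangeDB change = new StatusChangeDB();
            change.setStatus_change_type("broke"); // one of the MySQL enum labels {broke, fixed}

            Dataset<Row> changeDF = spark
                    .createDataset(Arrays.asList(change), Encoders.bean(StatusChangeDB.class))
                    .toDF();

            // Same URL that jdbcAppendOptions() (shown further down) builds; this append is
            // where the BatchUpdateException above surfaces.
            String url = "jdbc:mysql://localhost:3306/test?jdbcCompliantTruncation=false"
                    + "&user=root&password=";
            changeDF.write().mode(SaveMode.Append).jdbc(url, "status_change", new Properties());
        }
    }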
I have tried to use an enum for "status_change_type"; however, that fails with the following stack trace:

h5. Exception in thread "AWT-EventQueue-0" java.lang.NullPointerException
    at org.spark_project.guava.reflect.TypeToken.method(TypeToken.java:465)
    at org.apache.spark.sql.catalyst.JavaTypeInference$$anonfun$2.apply(JavaTypeInference.scala:126)
    at org.apache.spark.sql.catalyst.JavaTypeInference$$anonfun$2.apply(JavaTypeInference.scala:125)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
    at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
    at org.apache.spark.sql.catalyst.JavaTypeInference$.org$apache$spark$sql$catalyst$JavaTypeInference$$inferDataType(JavaTypeInference.scala:125)
    at org.apache.spark.sql.catalyst.JavaTypeInference$$anonfun$2.apply(JavaTypeInference.scala:127)
    at org.apache.spark.sql.catalyst.JavaTypeInference$$anonfun$2.apply(JavaTypeInference.scala:125)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
    at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
    at org.apache.spark.sql.catalyst.JavaTypeInference$.org$apache$spark$sql$catalyst$JavaTypeInference$$inferDataType(JavaTypeInference.scala:125)
    at org.apache.spark.sql.catalyst.JavaTypeInference$$anonfun$2.apply(JavaTypeInference.scala:127)
    at org.apache.spark.sql.catalyst.JavaTypeInference$$anonfun$2.apply(JavaTypeInference.scala:125)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
    at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
    ... ...

I have tried the JDBC setting "jdbcCompliantTruncation=false", but this does nothing; I get the same "data truncated" error as first stated. Here is my JDBC options map, in case I am using "jdbcCompliantTruncation=false" incorrectly:

    public static Map<String, String> jdbcOptions() {
        Map<String, String> jdbcOptions = new HashMap<String, String>();
        jdbcOptions.put("url", "jdbc:mysql://localhost:3306/test?jdbcCompliantTruncation=false");
        jdbcOptions.put("driver", "com.mysql.jdbc.Driver");
        jdbcOptions.put("dbtable", "status_change");
        jdbcOptions.put("user", "root");
        jdbcOptions.put("password", "");
        return jdbcOptions;
    }

Here is the Spark method for inserting into the MySQL DB:

    private void insertMYSQLQuery(Dataset<Row> changeDF) {
        try {
            changeDF.write().mode(SaveMode.Append)
                    .jdbc(SparkManager.jdbcAppendOptions(), "status_change", new java.util.Properties());
        } catch (Exception e) {
            System.out.println(e);
        }
    }

where jdbcAppendOptions() builds the JDBC URL from the jdbcOptions() map by appending the credentials:

    public static String jdbcAppendOptions() {
        return SparkManager.jdbcOptions().get("url")
                + "&user=" + SparkManager.jdbcOptions().get("user")
                + "&password=" + SparkManager.jdbcOptions().get("password");
    }

How do I get values into a MySQL column of type enum using Spark, or otherwise avoid this "data truncated" error? My only other thought would be to change the DB itself to use VARCHAR, but the project leader is not too happy with that idea.
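For completeness, here is a minimal sketch of the enum-typed variant described above. The enum definition, the nested class layout, and the SparkSession setup are assumptions for illustration, but creating a bean encoder over an enum field like this is what produces the NullPointerException in JavaTypeInference quoted earlier:

    import java.io.Serializable;
    import java.util.Arrays;

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Encoders;
    import org.apache.spark.sql.SparkSession;

    public class StatusChangeEnumDemo {

        // Assumed Java enum mirroring the MySQL column enum('broke','fixed').
        public enum StatusChangeType { broke, fixed }

        // Variant of the bean with the column modeled as an enum instead of a String.
        public static class StatusChangeDB implements Serializable {
            private StatusChangeType status_change_type;

            public StatusChangeType getStatus_change_type() { return status_change_type; }
            public void setStatus_change_type(StatusChangeType v) { this.status_change_type = v; }
        }

        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                    .master("local[*]").appName("status-change-enum-demo").getOrCreate();

            StatusChangeDB change = new StatusChangeDB();
            change.setStatus_change_type(StatusChangeType.broke);

            // Spark 2.1.1's JavaTypeInference has no special handling for Java enums, so
            // building the bean encoder here recurses into the enum's own properties and
            // ends in the NullPointerException shown in the stack trace above.
            Dataset<StatusChangeDB> ds = spark.createDataset(
                    Arrays.asList(change), Encoders.bean(StatusChangeDB.class));
            ds.show();
        }
    }

This matches the point in the comment above: the enum path fails inside the Spark encoder, before anything is sent to MySQL.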