Michael Kunkel created SPARK-21230:
--------------------------------------

             Summary: Spark Encoder with MySQL ENUM and "Data truncated" error
                 Key: SPARK-21230
                 URL: https://issues.apache.org/jira/browse/SPARK-21230
             Project: Spark
          Issue Type: Bug
          Components: Java API
    Affects Versions: 2.1.1
         Environment: macOS X
            Reporter: Michael Kunkel




I am using Spark via Java for a MySQL/ML (machine learning) project.

In the MySQL database, I have a column "status_change_type" of type ENUM('broke', 'fixed') in a table called "status_change" in a database called "test".

I have an object StatusChangeDB that mirrors the structure of the table; however, I declared the "status_change_type" field as a String. I know a MySQL ENUM and a Java String are represented very differently, but I am using Spark, and the encoder does not recognize enums properly. When I try to write the enum column's value via a Java String, I receive a "Data truncated" error:

{noformat}
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 4.0 failed 1 times, most recent failure: Lost task 0.0 in stage 4.0 (TID 9, localhost, executor driver): java.sql.BatchUpdateException: Data truncated for column 'status_change_type' at row 1
    at com.mysql.jdbc.PreparedStatement.executeBatchSerially(PreparedStatement.java:2055)
{noformat}
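
For reference, here is roughly what StatusChangeDB looks like (a minimal sketch; only status_change_type comes from the table description above, the id field is assumed for illustration):

{code:java}
import java.io.Serializable;

// A minimal sketch of the bean. Getter names are chosen so the bean encoder
// produces DataFrame columns matching the MySQL column names on JDBC append.
public class StatusChangeDB implements Serializable {

    private int id;                     // assumed surrogate key
    private String status_change_type;  // backs the ENUM('broke','fixed') column

    public StatusChangeDB() {}          // no-arg constructor required by Encoders.bean()

    public int getId() { return id; }
    public void setId(int id) { this.id = id; }

    public String getStatus_change_type() { return status_change_type; }
    public void setStatus_change_type(String status_change_type) {
        this.status_change_type = status_change_type;
    }
}
{code}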

I have also tried declaring "status_change_type" as a Java enum, but that fails during encoding with the following stack trace:

{noformat}
Exception in thread "AWT-EventQueue-0" java.lang.NullPointerException
    at org.spark_project.guava.reflect.TypeToken.method(TypeToken.java:465)
    at org.apache.spark.sql.catalyst.JavaTypeInference$$anonfun$2.apply(JavaTypeInference.scala:126)
    at org.apache.spark.sql.catalyst.JavaTypeInference$$anonfun$2.apply(JavaTypeInference.scala:125)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
    at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
    at org.apache.spark.sql.catalyst.JavaTypeInference$.org$apache$spark$sql$catalyst$JavaTypeInference$$inferDataType(JavaTypeInference.scala:125)
    at org.apache.spark.sql.catalyst.JavaTypeInference$$anonfun$2.apply(JavaTypeInference.scala:127)
    at org.apache.spark.sql.catalyst.JavaTypeInference$$anonfun$2.apply(JavaTypeInference.scala:125)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
    at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
    at org.apache.spark.sql.catalyst.JavaTypeInference$.org$apache$spark$sql$catalyst$JavaTypeInference$$inferDataType(JavaTypeInference.scala:125)
    at org.apache.spark.sql.catalyst.JavaTypeInference$$anonfun$2.apply(JavaTypeInference.scala:127)
    at org.apache.spark.sql.catalyst.JavaTypeInference$$anonfun$2.apply(JavaTypeInference.scala:125)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
    at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
    ... ...
{noformat}
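For completeness, this is the kind of change that triggers the NullPointerException above (a sketch; the enum name StatusChangeType is assumed):

{code:java}
// Hypothetical Java enum mirroring the MySQL ENUM('broke', 'fixed') labels.
public enum StatusChangeType {
    broke,
    fixed
}

// Declaring the bean field with the enum type instead of String, i.e.
//
//     private StatusChangeType status_change_type;
//
// is what makes Encoders.bean(StatusChangeDB.class) fail inside
// JavaTypeInference, as shown in the stack trace above.
{code}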
I have also tried the JDBC setting "jdbcCompliantTruncation=false", but it changes nothing; I get the same "Data truncated" error as first stated. Here is my JDBC options map, in case I am using "jdbcCompliantTruncation=false" incorrectly:

{code:java}
public static Map<String, String> jdbcOptions() {
    Map<String, String> jdbcOptions = new HashMap<String, String>();
    jdbcOptions.put("url", "jdbc:mysql://localhost:3306/test?jdbcCompliantTruncation=false");
    jdbcOptions.put("driver", "com.mysql.jdbc.Driver");
    jdbcOptions.put("dbtable", "status_change");
    jdbcOptions.put("user", "root");
    jdbcOptions.put("password", "");
    return jdbcOptions;
}
{code}
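
For what it's worth, this map is passed wholesale to Spark's JDBC reader; a minimal sketch, assuming a SparkSession variable named spark:

{code:java}
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// Sketch: how the options map above is consumed when reading the table.
// The SparkSession variable (spark) is assumed.
Dataset<Row> statusChanges = spark.read()
        .format("jdbc")
        .options(SparkManager.jdbcOptions())  // url, driver, dbtable, user, password
        .load();
{code}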

Here is the Spark method for inserting into the MySQL database:

{code:java}
private void insertMYSQLQuery(Dataset<Row> changeDF) {
    try {
        changeDF.write()
                .mode(SaveMode.Append)
                .jdbc(SparkManager.jdbcAppendOptions(), "status_change", new java.util.Properties());
    } catch (Exception e) {
        System.out.println(e);
    }
}
{code}

where jdbcAppendOptions() builds the connection URL from the jdbcOptions() map:

{code:java}
public static String jdbcAppendOptions() {
    return SparkManager.jdbcOptions().get("url")
            + "&user=" + SparkManager.jdbcOptions().get("user")
            + "&password=" + SparkManager.jdbcOptions().get("password");
}
{code}
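
Note that DataFrameWriter.jdbc() also accepts the credentials and driver through its java.util.Properties argument, so the manual URL concatenation above could be replaced by something like the following (a sketch; behavior should be equivalent for this setup):

{code:java}
// Sketch: pass user/password/driver via Properties instead of splicing
// them into the URL by hand.
java.util.Properties props = new java.util.Properties();
props.setProperty("user", SparkManager.jdbcOptions().get("user"));
props.setProperty("password", SparkManager.jdbcOptions().get("password"));
props.setProperty("driver", SparkManager.jdbcOptions().get("driver"));

changeDF.write()
        .mode(SaveMode.Append)
        .jdbc(SparkManager.jdbcOptions().get("url"), "status_change", props);
{code}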

How do I write values into a column of type ENUM in the MySQL database using Spark, or otherwise avoid this "Data truncated" error?

My only other thought would be to change the column itself to VARCHAR, but the project leader is not too happy with that idea.
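
One observation: MySQL reports "Data truncated" for an ENUM column whenever an incoming string does not match one of the declared labels (under the column's collation), so normalizing the values before the write might sidestep the error; a sketch, assuming the hypothetical StatusChangeType enum above and Spark's built-in column functions:

{code:java}
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.lower;
import static org.apache.spark.sql.functions.trim;

// Sketch: force every value to line up exactly with the declared ENUM
// labels ('broke', 'fixed') before the JDBC append. Column and table names
// come from the report; the normalization itself is an assumption.
Dataset<Row> normalized = changeDF.withColumn(
        "status_change_type",
        lower(trim(col("status_change_type"))));

insertMYSQLQuery(normalized);
{code}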



