[ https://issues.apache.org/jira/browse/SPARK-10392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Maciej Bryński updated SPARK-10392:
-----------------------------------
    Summary: Pyspark - Wrong DateType support on JDBC connection  (was: Pyspark - Wrong DateType support)

> Pyspark - Wrong DateType support on JDBC connection
> ---------------------------------------------------
>
>                 Key: SPARK-10392
>                 URL: https://issues.apache.org/jira/browse/SPARK-10392
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark, SQL
>    Affects Versions: 1.4.1
>            Reporter: Maciej Bryński
>
> I have the following problem.
> I created a table.
> {code}
> CREATE TABLE `spark_test` (
>   `id` INT(11) NULL,
>   `date` DATE NULL
> )
> COLLATE='utf8_general_ci'
> ENGINE=InnoDB
> ;
> INSERT INTO `spark_test` (`id`, `date`) VALUES (1, '1970-01-01');
> {code}
> When I read the data back, the date '1970-01-01' is returned as an int. This
> makes the data frame incompatible with its own schema.
> {code}
> df = sqlCtx.read.jdbc("jdbc:mysql://host/sandbox?user=user&password=password", 'spark_test')
> print(df.collect())
> df = sqlCtx.createDataFrame(df.rdd, df.schema)
> [Row(id=1, date=0)]
> ---------------------------------------------------------------------------
> TypeError                                 Traceback (most recent call last)
> <ipython-input-36-ebc1d94e0d8c> in <module>()
>       1 df = sqlCtx.read.jdbc("jdbc:mysql://a2.adpilot.co/sandbox?user=mbrynski&password=CebO3ax4", 'spark_test')
>       2 print(df.collect())
> ----> 3 df = sqlCtx.createDataFrame(df.rdd, df.schema)
> /mnt/spark/spark/python/pyspark/sql/context.py in createDataFrame(self, data, schema, samplingRatio)
>     402
>     403         if isinstance(data, RDD):
> --> 404             rdd, schema = self._createFromRDD(data, schema, samplingRatio)
>     405         else:
>     406             rdd, schema = self._createFromLocal(data, schema)
> /mnt/spark/spark/python/pyspark/sql/context.py in _createFromRDD(self, rdd, schema, samplingRatio)
>     296             rows = rdd.take(10)
>     297             for row in rows:
> --> 298                 _verify_type(row, schema)
>     299
>     300         else:
> /mnt/spark/spark/python/pyspark/sql/types.py in _verify_type(obj, dataType)
>    1152                                  "length of fields (%d)" % (len(obj), len(dataType.fields)))
>    1153         for v, f in zip(obj, dataType.fields):
> -> 1154             _verify_type(v, f.dataType)
>    1155
>    1156
> /mnt/spark/spark/python/pyspark/sql/types.py in _verify_type(obj, dataType)
>    1136         # subclass of them can not be fromInternald in JVM
>    1137         if type(obj) not in _acceptable_types[_type]:
> -> 1138             raise TypeError("%s can not accept object in type %s" % (dataType, type(obj)))
>    1139
>    1140     if isinstance(dataType, ArrayType):
> TypeError: DateType can not accept object in type <class 'int'>
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
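The `date=0` in the collected row is consistent with a date stored as a count of days since 1970-01-01, which is how Spark SQL represents DateType values internally. Assuming that raw day count is what leaks through here (the report itself does not confirm the cause), a minimal sketch of turning such a value back into a Python date could look like the following. The helper name `days_to_date` is hypothetical and is not part of PySpark.

```python
import datetime

# Assumption: the int is a day offset from the Unix epoch (1970-01-01).
EPOCH = datetime.date(1970, 1, 1)


def days_to_date(value):
    """Hypothetical helper: convert an int day offset into a datetime.date.

    Values that are already dates, or None, are passed through unchanged.
    """
    if value is None or isinstance(value, datetime.date):
        return value
    return EPOCH + datetime.timedelta(days=value)


# A plain dict stands in for the Row(id=1, date=0) shown in the report.
row = {"id": 1, "date": 0}
fixed = dict(row, date=days_to_date(row["date"]))
print(fixed)
```

With a live DataFrame, the same function could presumably be mapped over `df.rdd` before calling `createDataFrame`, so that every row passes the schema's type check. That is a workaround sketch, not a fix for the underlying conversion.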