[ https://issues.apache.org/jira/browse/SPARK-16542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Takuya Ueshin reassigned SPARK-16542:
-------------------------------------

    Assignee: Xiang Gao

> bugs about types that result an array of null when creating dataframe using python
> ----------------------------------------------------------------------------------
>
>                 Key: SPARK-16542
>                 URL: https://issues.apache.org/jira/browse/SPARK-16542
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark, SQL
>            Reporter: Xiang Gao
>            Assignee: Xiang Gao
>
> This is a bug about types that results in an array of nulls when creating
> a DataFrame using Python.
> Python's array.array has richer types than Python itself, e.g. we can have
> {{array('f',[1,2,3])}} and {{array('d',[1,2,3])}}. The code in spark-sql does
> not take this into consideration, which can cause a problem: you get an
> array of null values when you have {{array('f')}} in your rows.
> A simple way to reproduce this is:
> {code}
> from array import array
>
> from pyspark import SparkContext
> from pyspark.sql import SQLContext, Row
>
> sc = SparkContext()
> sqlContext = SQLContext(sc)
>
> # array('f') stores single-precision floats, array('d') double-precision.
> row1 = Row(floatarray=array('f', [1, 2, 3]), doublearray=array('d', [1, 2, 3]))
> rows = sc.parallelize([row1])
> df = sqlContext.createDataFrame(rows)  # schema is inferred from the rows
> df.show()
> {code}
> which has the output
> {code}
> +---------------+------------------+
> |    doublearray|        floatarray|
> +---------------+------------------+
> |[1.0, 2.0, 3.0]|[null, null, null]|
> +---------------+------------------+
> {code}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
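One plausible way to see the mismatch, sketched in plain Python without Spark: an element read out of {{array('f')}} comes back as an ordinary Python float, so any schema inference that inspects the elements rather than the array's typecode picks the same type (double) for both columns, even though {{array('f')}} stores 4-byte C floats. The functions `infer_from_elements` and `infer_from_typecode` and the `TYPECODE_TO_SQL` table are hypothetical illustrations, not PySpark's actual code, and the real fix for this issue may differ in detail.

```python
from array import array

# Hypothetical typecode -> Spark SQL type-name table (illustration only;
# only the two typecodes from the report are listed).
TYPECODE_TO_SQL = {
    'f': 'FloatType',   # C float (single precision)
    'd': 'DoubleType',  # C double (double precision)
}


def infer_from_elements(arr):
    """Naive inference: look only at the Python type of the first element."""
    first = arr[0]
    if isinstance(first, float):
        # Both array('f') and array('d') yield Python floats here,
        # so the array's storage type is lost.
        return 'ArrayType(DoubleType)'
    if isinstance(first, int):
        return 'ArrayType(LongType)'
    raise TypeError('unsupported element type: %r' % type(first))


def infer_from_typecode(arr):
    """Typecode-aware inference: keep the storage type of the array."""
    if arr.typecode not in TYPECODE_TO_SQL:
        raise TypeError('unsupported typecode: %r' % arr.typecode)
    return 'ArrayType(%s)' % TYPECODE_TO_SQL[arr.typecode]


floats = array('f', [1, 2, 3])
doubles = array('d', [1, 2, 3])

for name, arr in [('floatarray', floats), ('doublearray', doubles)]:
    # itemsize shows the per-element storage width in bytes.
    print(name, arr.itemsize, infer_from_elements(arr), infer_from_typecode(arr))
```

In the reported output it is the floatarray column that comes back as nulls, which is consistent with single-precision values being tagged with a double-precision type; treat this as an interpretation rather than a confirmed root-cause analysis.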
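On affected versions, one workaround sketch is to turn {{array.array}} fields into plain Python lists before building the DataFrame, accepting that single-precision values are widened to Python floats (so the column would be inferred as an array of doubles). The helper `normalize_arrays` below is hypothetical and only converts top-level fields.

```python
from array import array


def normalize_arrays(fields):
    """Return a copy of a field dict with array.array values turned into lists.

    Hypothetical helper: only top-level values are converted; nested
    structures are passed through unchanged.
    """
    return {
        key: value.tolist() if isinstance(value, array) else value
        for key, value in fields.items()
    }


row_fields = {
    'floatarray': array('f', [1, 2, 3]),
    'doublearray': array('d', [1, 2, 3]),
}
print(normalize_arrays(row_fields))
```

With PySpark this could be applied per row before inference, e.g. {{rows.map(lambda r: Row(**normalize_arrays(r.asDict())))}}, and the result passed to {{sqlContext.createDataFrame}}.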