This was fixed in 1.5 — could you download 1.5-RC3 to test it?

On Thu, Sep 3, 2015 at 4:45 PM, Wei Chen <wei.chen.ri...@gmail.com> wrote:
> Hey Friends,
>
> Recently I have been using Spark 1.3.1, mainly pyspark.sql. I noticed that
> a Row object collected directly from a DataFrame is different from a Row
> object defined directly via Row(*args, **kwargs).
>
> >>> from pyspark.sql.types import Row
> >>> aaa = Row(a=1, b=2, c=Row(a=1, b=2))
> >>> tuple(sc.parallelize([aaa]).toDF().collect()[0])
> (1, 2, (1, 2))
> >>> tuple(aaa)
> (1, 2, Row(a=1, b=2))
>
> This matters to me because I want to create a DataFrame with one of the
> columns being a Row object via sqlContext.createDataFrame(data, schema),
> where I pass in the schema explicitly. However, if the data is an RDD of
> Row objects like "aaa" in my example, it fails in the __verify_type
> function.
>
> Thank you,
> Wei
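For anyone hitting this without a Spark cluster handy, here is a minimal sketch of the Python-side behavior, using collections.namedtuple as a stand-in for pyspark.sql.types.Row (both are tuple subclasses; the names Inner and flattened below are illustrative, not Spark APIs). It shows that tuple() itself does not flatten nested rows — the degradation Wei observed happens during Spark 1.3's collect()/serialization round-trip, which is what 1.5 fixes.

```python
# Stand-in for pyspark.sql.types.Row: both are tuple subclasses with named fields.
from collections import namedtuple

Row = namedtuple("Row", ["a", "b", "c"])
Inner = namedtuple("Inner", ["a", "b"])  # plays the role of the nested Row

aaa = Row(a=1, b=2, c=Inner(a=1, b=2))

# tuple() does not recurse: the nested named tuple survives intact,
# matching Wei's tuple(aaa) -> (1, 2, Row(a=1, b=2)).
direct = tuple(aaa)
print(direct)  # (1, 2, Inner(a=1, b=2))

# What collect() returned in 1.3: the nested Row degraded to a plain tuple,
# matching (1, 2, (1, 2)). Simulated here by flattening one level by hand.
collected = tuple(tuple(v) if isinstance(v, tuple) else v for v in aaa)
print(collected)  # (1, 2, (1, 2))

# The two compare equal by value (named tuples compare as tuples), but only
# the directly constructed one still carries the named-field type — which is
# why schema verification can treat them differently.
print(type(direct[2]) is Inner, type(collected[2]) is tuple)  # True True
```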