Hey Friends,

Recently I have been using Spark 1.3.1, mainly pyspark.sql. I noticed that a Row object collected directly from a DataFrame is different from a Row object defined directly via Row(*args, **kwargs).
>>> from pyspark.sql.types import Row
>>> aaa = Row(a=1, b=2, c=Row(a=1, b=2))
>>> tuple(sc.parallelize([aaa]).toDF().collect()[0])
(1, 2, (1, 2))
>>> tuple(aaa)
(1, 2, Row(a=1, b=2))

This matters to me because I want to create a DataFrame with one of its columns being a Row object, via sqlContext.createDataFrame(data, schema), where I explicitly pass in the schema. However, if the data is an RDD of Row objects like "aaa" in my example, it fails in the _verify_type function.

Thank you,
Wei
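One hedged workaround sketch, in case it helps anyone hitting the same verification failure: since Row is a tuple subclass, nested Rows can be recursively converted to plain tuples before calling createDataFrame, matching what collect() returns. The helper name to_plain_tuple is my own, and I use collections.namedtuple here as a stand-in for pyspark.sql.types.Row (both are tuple subclasses) so the snippet runs without a SparkContext:

```python
from collections import namedtuple

# Stand-ins for pyspark.sql.types.Row; like Row, these are tuple subclasses.
Outer = namedtuple("Outer", ["a", "b", "c"])
Inner = namedtuple("Inner", ["a", "b"])

def to_plain_tuple(value):
    """Recursively convert Row-like tuple subclasses to plain tuples.

    Non-tuple values pass through unchanged, so this can be mapped
    over an RDD's records before createDataFrame(data, schema).
    """
    if isinstance(value, tuple):
        return tuple(to_plain_tuple(v) for v in value)
    return value

aaa = Outer(a=1, b=2, c=Inner(a=1, b=2))
print(to_plain_tuple(aaa))  # (1, 2, (1, 2))
```

With real Spark data this would be applied as something like data.map(to_plain_tuple) before passing the RDD and schema to createDataFrame, though I have not verified this against every struct layout.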