different Row objects?

2015-09-03 Thread Wei Chen
Hey Friends,

Recently I have been using Spark 1.3.1, mainly pyspark.sql. I noticed that
the Row object collected directly from a DataFrame is different from the
Row object we define directly with Row(*args, **kwargs).

>>> from pyspark.sql.types import Row
>>> aaa = Row(a=1, b=2, c=Row(a=1, b=2))
>>> tuple(sc.parallelize([aaa]).toDF().collect()[0])

(1, 2, (1, 2))

>>> tuple(aaa)

(1, 2, Row(a=1, b=2))

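For what it's worth, a minimal way to see the difference directly (a sketch that assumes the same shell session as above, i.e. a running SparkContext `sc` on 1.3.x):

from pyspark.sql.types import Row

aaa = Row(a=1, b=2, c=Row(a=1, b=2))
collected = sc.parallelize([aaa]).toDF().collect()[0]

print(repr(aaa.c))        # Row(a=1, b=2): the locally built nested Row keeps its field names
print(repr(collected.c))  # on 1.3.1 this prints as a bare (1, 2); the field names are gone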

This matters to me because I wanted to be able to create a DataFrame
with one of the columns being a Row object via
sqlContext.createDataFrame(data, schema), where I specifically pass in
the schema. However, if the data is an RDD of Row objects like "aaa" in
my example, it fails in the _verify_type function.
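
Concretely, what I am trying looks roughly like the sketch below (the field names and IntegerType are just taken from the toy example above, and `sqlContext` is the usual SQLContext from the PySpark shell):

from pyspark.sql.types import Row, StructType, StructField, IntegerType

# schema with a nested struct for column "c", mirroring Row(a=1, b=2, c=Row(a=1, b=2))
schema = StructType([
    StructField("a", IntegerType(), True),
    StructField("b", IntegerType(), True),
    StructField("c", StructType([
        StructField("a", IntegerType(), True),
        StructField("b", IntegerType(), True),
    ]), True),
])

aaa = Row(a=1, b=2, c=Row(a=1, b=2))
df = sqlContext.createDataFrame(sc.parallelize([aaa]), schema)  # this is where the verification error shows up on 1.3.1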



Thank you,

Wei


Re: different Row objects?

2015-09-03 Thread Davies Liu
This was fixed in 1.5; could you download 1.5-RC3 to test this?
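
A quick way to check on a 1.5 build, assuming the usual `sc` and `sqlContext` in the PySpark shell:

from pyspark.sql.types import Row

aaa = Row(a=1, b=2, c=Row(a=1, b=2))
row = sqlContext.createDataFrame([aaa]).collect()[0]
print(row)   # on 1.5 the nested value should come back as a named Row again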

On Thu, Sep 3, 2015 at 4:45 PM, Wei Chen wrote:
> Hey Friends,
>
> Recently I have been using Spark 1.3.1, mainly pyspark.sql. I noticed that
> the Row object collected directly from a DataFrame is different from the Row
> object we define directly with Row(*args, **kwargs).
>
> >>> from pyspark.sql.types import Row
> >>> aaa = Row(a=1, b=2, c=Row(a=1, b=2))
> >>> tuple(sc.parallelize([aaa]).toDF().collect()[0])
>
> (1, 2, (1, 2))
>
> >>> tuple(aaa)
>
> (1, 2, Row(a=1, b=2))
>
>
> This matters to me because I wanted to be able to create a DataFrame with
> one of the columns being a Row object via sqlContext.createDataFrame(data,
> schema), where I specifically pass in the schema. However, if the data is an
> RDD of Row objects like "aaa" in my example, it fails in the _verify_type
> function.
>
>
>
> Thank you,
>
> Wei

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org