Re: Column renaming after DataFrame.groupBy

2015-04-21 Thread Reynold Xin
You can use the more verbose syntax:

import org.apache.spark.sql.functions._  // provides sum()
d.groupBy("_1").agg(d("_1"), sum("_1").as("sum_1"), sum("_2").as("sum_2"))
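For reference, the plain-Python sketch below (not Spark code; group_and_sum is a hypothetical helper) computes what the aggregation above should return for Justin's sample data, keyed by the aliased names sum_1 and sum_2. Spark does not guarantee the order of the groups; the sketch simply keeps first-seen order.

```python
# Justin's sample data: tuples whose columns Spark names _1 and _2
rows = [(1, 2), (1, 3), (2, 10)]


def group_and_sum(data):
    """Emulate groupBy("_1") with sum("_1") AS sum_1 and sum("_2") AS sum_2."""
    groups = {}  # dicts keep insertion order in Python 3.7+
    for c1, c2 in data:
        acc = groups.setdefault(c1, {"_1": c1, "sum_1": 0, "sum_2": 0})
        acc["sum_1"] += c1
        acc["sum_2"] += c2
    return list(groups.values())


result = group_and_sum(rows)
for r in result:
    print(r)
```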

On Tue, Apr 21, 2015 at 1:06 AM, Justin Yip  wrote:

> Hello,
>
> I would like to rename a column after aggregation. In the following code, the
> column name is "SUM(_1#179)", is there a way to rename it to a more
> friendly name?
>
> scala> val d = sqlContext.createDataFrame(Seq((1, 2), (1, 3), (2, 10)))
> scala> d.groupBy("_1").sum().printSchema
> root
>  |-- _1: integer (nullable = false)
>  |-- SUM(_1#179): long (nullable = true)
>  |-- SUM(_2#180): long (nullable = true)
>
> Thanks.
>
> Justin
>
> --
> View this message in context: Column renaming after DataFrame.groupBy
> <http://apache-spark-user-list.1001560.n3.nabble.com/Column-renaming-after-DataFrame-groupBy-tp22586.html>
> Sent from the Apache Spark User List mailing list archive
> <http://apache-spark-user-list.1001560.n3.nabble.com/> at Nabble.com.
>


Re: Column renaming after DataFrame.groupBy

2015-04-21 Thread ayan guha
Hi

There are 2 ways of doing it.

1. Using SQL: the query returns a new DataFrame whose columns already carry
the aliases you chose.
2. Using methods of the DataFrame object. In that case you have to supply the
column names through Row objects and then call createDataFrame again, which
infers the schema from those Rows.

Here is the Python code.
Method 1:
# ssc is assumed to be a SQLContext with a "ratings" table registered
userStat = ssc.sql("select userId, sum(rating) total from ratings group by userId")
print(userStat.collect()[10])
userStat.printSchema()

Method 2:
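from pyspark.sql import Row

# Rebuild each aggregated row with friendly field names
userStatDF = userStat.groupBy("userId").sum().rdd.map(
    lambda t: Row(userId=t[0], total=t[1]))
# createDataFrame infers the schema from the Row objects
userStatDFSchema = ssc.createDataFrame(userStatDF)
print(type(userStatDFSchema))
userStatDFSchema.printSchema()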

Output:
Row(userId=233, total=478)
root
 |-- userId: long (nullable = true)
 |-- total: long (nullable = true)


root
 |-- total: long (nullable = true)
 |-- userId: long (nullable = true)

As you can see, the downside of Method 2 is that the original field order is
lost: when a Row is built from keyword arguments, PySpark (before Spark 3.0)
sorts the field names alphabetically, so total now comes before userId.
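The reordering can be reproduced without Spark. The plain-Python sketch below only mimics (it is not PySpark's Row class) how Row ordered keyword-argument fields before Spark 3.0, using the sample values 233 and 478 from the output above, and then restores the intended order by name. make_row is a hypothetical helper; in Spark itself, selecting the columns explicitly, e.g. userStatDFSchema.select("userId", "total"), should give the same effect.

```python
def make_row(**fields):
    """Hypothetical stand-in for pre-3.0 PySpark Row: keyword fields sorted by name."""
    names = sorted(fields)
    return names, tuple(fields[n] for n in names)


names, values = make_row(userId=233, total=478)
print(names, values)

# Put the fields back in the order we actually want, looking each one up by name
wanted = ["userId", "total"]
reordered = tuple(values[names.index(n)] for n in wanted)
print(wanted, reordered)
```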

Hope this helps

Best
Ayan

-- 
Best Regards,
Ayan Guha


Column renaming after DataFrame.groupBy

2015-04-21 Thread Justin Yip
Hello,

I would like to rename a column after aggregation. In the following code, the
column name is "SUM(_1#179)", is there a way to rename it to a more
friendly name?

scala> val d = sqlContext.createDataFrame(Seq((1, 2), (1, 3), (2, 10)))
scala> d.groupBy("_1").sum().printSchema
root
 |-- _1: integer (nullable = false)
 |-- SUM(_1#179): long (nullable = true)
 |-- SUM(_2#180): long (nullable = true)

Thanks.

Justin



