Here you go:

    from pyspark.sql import Row

    # Sample data: (name, age, other)
    t = [["A", 10, "A10"], ["A", 20, "A20"], ["A", 30, "A30"],
         ["B", 15, "B15"], ["C", 10, "C10"], ["C", 20, "C20"]]

    # Build an RDD of Rows and convert it to a DataFrame
    TRDD = sc.parallelize(t).map(
        lambda r: Row(name=str(r[0]), age=int(r[1]), other=str(r[2])))
    TDF = ssc.createDataFrame(TRDD)
    TDF.printSchema()

    # Self-join: keep only the rows whose age equals the per-name max
    TDF.registerTempTable("tab")
    JN = ssc.sql("select t.name, t.age, t.other from tab t "
                 "inner join (select name, max(age) age from tab group by name) t1 "
                 "on t.name = t1.name and t.age = t1.age")
    for i in JN.collect():
        print i

Result:

    Row(name=u'A', age=30, other=u'A30')
    Row(name=u'B', age=15, other=u'B15')
    Row(name=u'C', age=20, other=u'C20')
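The same query can also be expressed with the DataFrame API instead of raw SQL. A minimal sketch, assuming the TDF DataFrame built above; the max_name/max_age column names are mine, introduced only to keep the join condition unambiguous:

    from pyspark.sql import functions as F

    # Per-name maximum age, with renamed columns so the join is unambiguous
    mx = (TDF.groupBy("name")
             .agg(F.max("age").alias("max_age"))
             .withColumnRenamed("name", "max_name"))

    # Join back to recover the full row carrying each per-name maximum age
    JN2 = (TDF.join(mx, (TDF.name == mx.max_name) & (TDF.age == mx.max_age))
              .select("name", "age", "other"))

    for i in JN2.collect():
        print i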
On Sat, Apr 25, 2015 at 2:48 PM, Wenlei Xie <wenlei....@gmail.com> wrote:

> Sure. A simple example of the data would be (there might be many other
> columns):
>
>     Name  Age  Other
>     A     10   A10
>     A     20   A20
>     A     30   A30
>     B     15   B15
>     C     10   C10
>     C     20   C20
>
> The desired output would be:
>
>     Name  Age  Other
>     A     30   A30
>     B     15   B15
>     C     20   C20
>
> Thank you so much for the help!
>
> On Sat, Apr 25, 2015 at 12:41 AM, ayan guha <guha.a...@gmail.com> wrote:
>
>> Can you give an example set of data and the desired output?
>>
>> On Sat, Apr 25, 2015 at 2:32 PM, Wenlei Xie <wenlei....@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I would like to run the following customized aggregation query on
>>> Spark SQL:
>>> 1. Group the table by the value of Name.
>>> 2. For each group, choose the tuple with the max value of Age (the
>>> ages are distinct for every name).
>>>
>>> I am wondering what's the best way to do this on Spark SQL. Should I
>>> use a UDAF? Previously I was doing something like the following on
>>> Spark:
>>>
>>>     personRDD.map(t => (t.name, t))
>>>       .reduceByKey((a, b) => if (a.age > b.age) a else b)
>>>
>>> Thank you!
>>>
>>> Best,
>>> Wenlei
>
> --
> Wenlei Xie (谢文磊)
>
> Ph.D. Candidate
> Department of Computer Science
> 456 Gates Hall, Cornell University
> Ithaca, NY 14853, USA
> Email: wenlei....@gmail.com

--
Best Regards,
Ayan Guha
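For completeness, the reduceByKey approach mentioned in the original question ports directly to PySpark. A minimal sketch against the same sample data (the variable names and tuple layout are illustrative, not from the thread):

    # Key each record by name, then keep the record with the larger age per key
    t = [("A", 10, "A10"), ("A", 20, "A20"), ("A", 30, "A30"),
         ("B", 15, "B15"), ("C", 10, "C10"), ("C", 20, "C20")]

    best = (sc.parallelize(t)
              .map(lambda r: (r[0], r))
              .reduceByKey(lambda a, b: a if a[1] > b[1] else b)
              .values())

    for i in best.collect():
        print i

Unlike the self-join, this makes a single pass over the data with one shuffle (with map-side combining), which is typically cheaper than aggregating and then joining back.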