>>> SELECT * FROM
>>>   (SELECT time_bucket,
>>>           identifier1,
>>>           identifier2,
>>>           myCount,
>>>           row_number() OVER (PARTITION BY time_bucket, identifier1
>>>                              ORDER BY myCount DESC) as rowNum
>>>    FROM tablename) tmp
>>> WHERE rowNum <= 4
>>> ORDER BY time_bucket, identifier1, rowNum
>>>
>>> The count and order by:
>>>
>>> SELECT time_bucket,
>>>        identifier1,
>>>        identifier2,
>>>        count(identifier2) as myCount
>>> FROM table
>>> GROUP BY time_bucket,
>>>          identifier1,
>>>          identifier2
>>> ORDER BY time_bucket,
>>>          identifier1,
>>>          count(identifier2) DESC
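For readers following along, the logic of the quoted SQL (count per (time_bucket, identifier1, identifier2), rank within each (time_bucket, identifier1) by count descending, keep rowNum <= k) can be sketched on plain Scala collections. This is a toy stand-in with made-up sample data, not Spark code:

```scala
object TopKByCount {
  type Row = (String, String, String) // (time_bucket, identifier1, identifier2)

  // Emulates: count(identifier2) grouped by the full key, then
  // row_number() over (partition by time_bucket, identifier1 order by count desc),
  // keeping only rows with rowNum <= k.
  def topK(rows: Seq[Row], k: Int): Seq[(Row, Int)] =
    rows.groupBy(identity).map { case (key, vs) => (key, vs.size) }.toSeq
      .groupBy { case ((tb, id1, _), _) => (tb, id1) }   // the window partition
      .flatMap { case (_, grp) => grp.sortBy { case (_, c) => -c }.take(k) }
      .toSeq
}
```

Note that each partition is only sorted in order to pick its k rows, which mirrors what the window query asks the engine to do.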
Sent: Tuesday, January 03, 2017 8:03 PM
To: Mendelson, Assaf
Cc: user
Subject: Re: top-k function for Window
> Furthermore, in your example you don’t even need a window function, you can
> simply use groupby and explode
Can you clarify? You need to sort somehow (be it map-side sorting or
reduce-side sorting).
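One way to read the groupby suggestion — my own sketch, not the poster's code — is to aggregate each key into a bounded top-k structure, so even a heavily skewed key is never fully sorted: only k elements are kept per key at any time. A plain Scala illustration:

```scala
import scala.collection.mutable

object BoundedTopK {
  // Fold each value into a size-k min-heap per key. Memory stays O(k) per key,
  // which is the property you want when some keys are heavily skewed.
  def topKValues(events: Seq[(String, Long)], k: Int): Map[String, List[Long]] =
    events.groupBy(_._1).map { case (key, vs) =>
      val heap = mutable.PriorityQueue.empty[Long](Ordering.Long.reverse) // min-heap
      vs.foreach { case (_, v) =>
        heap.enqueue(v)
        if (heap.size > k) heap.dequeue() // evict the current smallest
      }
      key -> heap.dequeueAll.toList.sorted(Ordering.Long.reverse)
    }
}
```

In Spark terms this shape corresponds to a custom aggregation (e.g. an Aggregator over a bounded heap) rather than a window, avoiding the full per-partition sort a window imposes.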
From: Andy Dang <nam...@gmail.com>
Date: Tuesday, January 3, 2017 at 7:06 AM
To: user <user@spark.apache.org>
Subject: top-k function for Window
Hi all,
What's the best way to do top-k with Windowing in Dataset world?
I have a snippet of code that filters the data to the top-k, but with
skewed keys:
val windowSpec = Window.partitionBy(skewedKeys).orderBy(dateTime)
val rank = row_number().over(windowSpec)
input.withColumn("rank",
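The snippet above is cut off in the archive. To show the intended semantics only — an assumed completion sketched on plain collections, not the poster's actual code — row_number over Window.partitionBy(key).orderBy(dateTime), filtered to rank <= k, behaves like:

```scala
object WindowRankSketch {
  // Toy equivalent of:
  //   input.withColumn("rank", row_number().over(windowSpec)).where("rank <= k")
  // Rows are (key, dateTime); returns (key, dateTime, rank) with 1-based ranks,
  // keeping the k earliest rows per key.
  def firstKPerKey(rows: Seq[(String, Long)], k: Int): Seq[(String, Long, Int)] =
    rows.groupBy(_._1).values.toSeq.flatMap { grp =>
      grp.sortBy(_._2).zipWithIndex.collect {
        case ((key, t), i) if i < k => (key, t, i + 1)
      }
    }
}
```

Note that every row of a skewed key still lands in one partition before the sort — which is exactly the skew problem this thread is about.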