Re: top-k function for Window

2017-01-04 Thread Georg Heiler
...ECT time_bucket,
> > identifier1,
> > identifier2,
> > count(identifier2) as myCount
> > FROM table
> > GROUP BY time_bucket,
> > identifier1,
> > identifier2
> > ORDER BY time_bucket, ...

Re: top-k function for Window

2017-01-04 Thread Koert Kuipers
...unt DESC) as rowNum
>>> FROM tablename) tmp
>>> WHERE rowNum <= 4
>>> ORDER BY time_bucket, identifier1, rowNum
>>>
>>> The count and order by:
>>> ...
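The quoted query counts each (time_bucket, identifier1, identifier2) combination, numbers the rows within a group by descending count, and keeps rowNum <= 4. Assuming the truncated part of the query partitions by time_bucket, identifier1 and orders by the count, a minimal sketch of the same semantics on plain Scala collections (not Spark) might look like this. `Event` and `topKPerGroup` are hypothetical names, and the tie-break on identifier2 is an added assumption, since row_number() alone leaves tied counts in arbitrary order.

```scala
// Hypothetical row type; field names mirror the columns in the SQL.
case class Event(timeBucket: Int, identifier1: String, identifier2: String)

// Returns (time_bucket, identifier1, identifier2, myCount, rowNum) tuples,
// keeping rowNum <= k within each (time_bucket, identifier1) group.
def topKPerGroup(events: Seq[Event], k: Int): Seq[(Int, String, String, Long, Int)] = {
  // GROUP BY time_bucket, identifier1, identifier2 with count(identifier2)
  val counts: Map[(Int, String, String), Long] =
    events
      .groupBy(e => (e.timeBucket, e.identifier1, e.identifier2))
      .map { case (key, es) => key -> es.size.toLong }

  counts.toSeq
    .groupBy { case ((tb, id1, _), _) => (tb, id1) } // assumed PARTITION BY time_bucket, identifier1
    .toSeq
    .sortBy(_._1) // ORDER BY time_bucket, identifier1
    .flatMap { case ((tb, id1), rows) =>
      rows
        .sortBy { case ((_, _, id2), c) => (-c, id2) } // count DESC; id2 tie-break is an added assumption
        .take(k) // WHERE rowNum <= k
        .zipWithIndex
        .map { case (((_, _, id2), c), i) => (tb, id1, id2, c, i + 1) } // rowNum starts at 1
    }
}
```

With k = 4 this corresponds to the `WHERE rowNum <= 4` filter in the quoted query.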

RE: top-k function for Window

2017-01-03 Thread Mendelson, Assaf
...03, 2017 8:03 PM
To: Mendelson, Assaf
Cc: user
Subject: Re: top-k function for Window

> Furthermore, in your example you don’t even need a window function, you can
> simply use groupby and explode

Can you clarify? You need to sort somehow (be it map-side sorting or reduce-side s...
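One reading of the "groupby" suggestion is that a per-key top-k does not need a full sort of each group. You can keep a bounded min-heap of size k per key, build partial heaps within each partition, and then merge them per key, much like a map-side combine. The sketch below models this with plain Scala collections, where each inner Seq stands in for a partition. `TopK` and `topKByKey` are hypothetical helpers, not Spark APIs.

```scala
import scala.collection.mutable

// Keeps at most k of the largest values seen so far (a bounded min-heap).
// Hypothetical helper, not a Spark API.
final class TopK[A](k: Int)(implicit ord: Ordering[A]) {
  // PriorityQueue dequeues its maximum; with the reversed ordering the head
  // is the smallest retained element, i.e. the one to evict first.
  private val heap = mutable.PriorityQueue.empty[A](ord.reverse)

  def add(a: A): this.type = {
    if (heap.size < k) heap.enqueue(a)
    else if (k > 0 && ord.gt(a, heap.head)) { heap.dequeue(); heap.enqueue(a) }
    this
  }

  // Merging partial results is just re-adding the other heap's elements.
  def merge(other: TopK[A]): this.type = {
    other.heap.foreach(add)
    this
  }

  // Largest first.
  def result: List[A] = heap.toList.sorted(ord.reverse)
}

// Builds one partial TopK per key in each "partition", then merges per key.
def topKByKey[K, V: Ordering](partitions: Seq[Seq[(K, V)]], k: Int): Map[K, List[V]] = {
  val partials: Seq[Map[K, TopK[V]]] = partitions.map { part =>
    part.foldLeft(Map.empty[K, TopK[V]]) { case (acc, (key, v)) =>
      val t = acc.getOrElse(key, new TopK[V](k))
      acc.updated(key, t.add(v))
    }
  }
  partials.flatten.groupBy(_._1).map { case (key, ts) =>
    key -> ts.map(_._2).reduce((a, b) => a.merge(b)).result
  }
}
```

Each partial heap holds at most k elements, so even a heavily skewed key costs O(n log k) work spread across partitions, rather than an O(n log n) sort of its whole group in one place.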

Re: top-k function for Window

2017-01-03 Thread Koert Kuipers
>> >>
>> >> SELECT time_bucket,
>> >> identifier1,
>> >> identifier2,
>> >> count(identifier2) as myCount
>> >> FROM table
>> >> GROUP BY time_bucket,
>> >> identifier1, ...

Re: top-k function for Window

2017-01-03 Thread Andy Dang
...t and order by:
> >
> > SELECT time_bucket,
> > identifier1,
> > identifier2,
> > count(identifier2) as myCount
> > FROM table
> > GROUP BY time_bucket,
> > identifier1,
> > identifier2
> > OR...

Re: top-k function for Window

2017-01-03 Thread HENSLEE, AUSTIN L
...bucket, identifier1, identifier2
ORDER BY time_bucket, identifier1, count(identifier2) DESC

From: Andy Dang <nam...@gmail.com>
Date: Tuesday, January 3, 2017 at 7:06 AM
To: user <user@spark.apache.org>
Subject: top-k function for Window

Hi all, W...

RE: top-k function for Window

2017-01-03 Thread Mendelson, Assaf
...[mailto:nam...@gmail.com]
Sent: Tuesday, January 03, 2017 3:07 PM
To: user
Subject: top-k function for Window

Hi all, What's the best way to do top-k with Windowing in Dataset world? I have a snippet of code that filters the data to the top-k, but with skewed keys: val windowSpec...

top-k function for Window

2017-01-03 Thread Andy Dang
Hi all,

What's the best way to do top-k with windowing in the Dataset world? I have a snippet of code that filters the data to the top-k, but with skewed keys:

val windowSpec = Window.partitionBy(skewedKeys).orderBy(dateTime)
val rank = row_number().over(windowSpec)
input.withColumn("rank", ...
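To make the snippet's semantics concrete, here is a plain-Scala model (not Spark) of row_number() over a window partitioned by the key and ordered by dateTime, followed by a rank <= k filter. `Rec` and `rankAndFilter` are hypothetical names. The model sorts each group in full before keeping the first k rows. This is a rough picture of why a skewed key is costly with a window-based approach, since all rows for that key end up ranked together.

```scala
// Hypothetical record type standing in for the rows of `input`.
case class Rec(key: String, dateTime: Long, payload: String)

// Models withColumn("rank", row_number().over(Window.partitionBy(key).orderBy(dateTime)))
// followed by filter(rank <= k), on in-memory collections.
def rankAndFilter(input: Seq[Rec], k: Int): Seq[(Rec, Int)] =
  input.groupBy(_.key).toSeq.sortBy(_._1).flatMap { case (_, rows) =>
    rows
      .sortBy(_.dateTime)                // orderBy(dateTime), ascending; full sort of the group
      .zipWithIndex
      .map { case (r, i) => (r, i + 1) } // row_number() starts at 1
      .filter { case (_, rank) => rank <= k }
  }
```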