Re: SQL query in scala API

2014-12-06 Thread Arun Luthra
Thanks, I will try this. On Fri, Dec 5, 2014 at 1:19 AM, Cheng Lian lian.cs@gmail.com wrote: Oh, sorry. So neither SQL nor Spark SQL is preferred. Then you may write you own aggregation with aggregateByKey: users.aggregateByKey((0, Set.empty[String]))({ case ((count, seen), user) =

Re: SQL query in scala API

2014-12-05 Thread Cheng Lian
Oh, sorry. So neither SQL nor Spark SQL is preferred. Then you may write you own aggregation with |aggregateByKey|: |users.aggregateByKey((0,Set.empty[String]))({case ((count, seen), user) = (count +1, seen + user) }, {case ((count0, seen0), (count1, seen1)) = (count0 + count1, seen0 ++

Re: SQL query in scala API

2014-12-04 Thread Arun Luthra
Is that Spark SQL? I'm wondering if it's possible without spark SQL. On Wed, Dec 3, 2014 at 8:08 PM, Cheng Lian lian.cs@gmail.com wrote: You may do this: table(users).groupBy('zip)('zip, count('user), countDistinct('user)) On 12/4/14 8:47 AM, Arun Luthra wrote: I'm wondering how to

Re: SQL query in scala API

2014-12-04 Thread Stéphane Verlet
Disclaimer : I am new at Spark I did something similar in a prototype which works but I that did not test at scale yet val agg =3D users.mapValues(_ =3D 1)..aggregateByKey(new CustomAggregation())(CustomAggregation.sequenceOp, CustomAggregation.comboO= p) class CustomAggregation() extends

SQL query in scala API

2014-12-03 Thread Arun Luthra
I'm wondering how to do this kind of SQL query with PairRDDFunctions. SELECT zip, COUNT(user), COUNT(DISTINCT user) FROM users GROUP BY zip In the Spark scala API, I can make an RDD (called users) of key-value pairs where the keys are zip (as in ZIP code) and the values are user id's. Then I can

Re: SQL query in scala API

2014-12-03 Thread Cheng Lian
You may do this: |table(users).groupBy('zip)('zip, count('user), countDistinct('user)) | On 12/4/14 8:47 AM, Arun Luthra wrote: I'm wondering how to do this kind of SQL query with PairRDDFunctions. SELECT zip, COUNT(user), COUNT(DISTINCT user) FROM users GROUP BY zip In the Spark scala API,