Re: SQL query in scala API

Cheng Lian Wed, 03 Dec 2014 20:11:17 -0800

You may do this:

table("users").groupBy('zip)('zip, count('user), countDistinct('user))


On 12/4/14 8:47 AM, Arun Luthra wrote:

I'm wondering how to do this kind of SQL query with PairRDDFunctions.

SELECT zip, COUNT(user), COUNT(DISTINCT user)
FROM users
GROUP BY zip
In the Spark Scala API, I can make an RDD (called "users") of key-value pairs where the keys are zip (as in ZIP code) and the values are user IDs. Then I can compute the count and distinct count like this:
val count = users.mapValues(_ => 1).reduceByKey(_ + _)
val countDistinct = users.distinct().mapValues(_ => 1).reduceByKey(_ + _)
Then, if I want count and countDistinct in the same table, I have to join them on the key.
Is there a way to do this without doing a join (and without using SQL or Spark SQL)?
Arun
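
For the join-free, non-SQL variant asked about above, one possible approach is a single aggregation per key that carries both the running count and the set of users seen so far. PairRDDFunctions has aggregateByKey(zeroValue)(seqOp, combOp), which takes exactly this pair of functions. The sketch below is not from the thread: the names Acc, zero, seqOp, combOp and aggregateLocally are made up for illustration, user IDs are assumed to be Strings, and the aggregation is simulated on plain Scala collections so it can be run without a cluster.

```scala
object CountAndDistinct {
  // Per-key accumulator: (total count, set of distinct users seen so far).
  type Acc = (Long, Set[String])
  val zero: Acc = (0L, Set.empty[String])

  // Fold one value into a partition-local accumulator.
  def seqOp(acc: Acc, user: String): Acc = (acc._1 + 1, acc._2 + user)

  // Merge the accumulators produced by two partitions.
  def combOp(a: Acc, b: Acc): Acc = (a._1 + b._1, a._2 ++ b._2)

  // On an RDD[(String, String)] named `users` the call would presumably be:
  //   users.aggregateByKey(zero)(seqOp, combOp)
  //        .mapValues { case (n, seen) => (n, seen.size) }
  // The method below is only a local stand-in for that call (hypothetical helper).
  def aggregateLocally(partitions: Seq[Seq[(String, String)]]): Map[String, (Long, Int)] = {
    // Step 1: aggregate within each partition using seqOp.
    val perPartition: Seq[Map[String, Acc]] = partitions.map { part =>
      part.groupBy(_._1).map { case (zip, pairs) =>
        (zip, pairs.map(_._2).foldLeft(zero)(seqOp))
      }
    }
    // Step 2: merge the per-partition results key by key using combOp.
    val merged = perPartition.foldLeft(Map.empty[String, Acc]) { (m, p) =>
      p.foldLeft(m) { case (acc, (zip, a)) =>
        acc.updated(zip, acc.get(zip).map(combOp(_, a)).getOrElse(a))
      }
    }
    // Step 3: keep the count and reduce the set to its size (the distinct count).
    merged.map { case (zip, (n, seen)) => (zip, (n, seen.size)) }
  }
}
```

The trade-off is memory: each key holds the full set of its distinct users until the final step, so this only suits data where the number of distinct users per zip is modest.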
