Is that Spark SQL? I'm wondering if it's possible without Spark SQL.
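
One pure-RDD sketch I can think of (variable names are just illustrative, and it assumes the set of distinct user ids per zip fits in memory) is a single aggregateByKey that keeps a running count plus a set of user ids, so no join is needed:

import org.apache.spark.SparkContext._   // PairRDDFunctions implicits
import scala.collection.mutable.HashSet

// users: RDD[(String, String)] of (zip, userId) pairs, as described below
val both = users.aggregateByKey((0L, HashSet.empty[String]))(
    (acc, user) => (acc._1 + 1, acc._2 += user),  // within a partition: bump the count, remember the id
    (a, b) => (a._1 + b._1, a._2 ++= b._2))       // across partitions: add counts, union the id sets
  .mapValues { case (n, ids) => (n, ids.size) }   // (zip, (count, distinctCount))

For zips with a very large number of distinct users the per-key HashSet could get big; countApproxDistinctByKey would trade exactness for bounded memory in that case.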

On Wed, Dec 3, 2014 at 8:08 PM, Cheng Lian <lian.cs....@gmail.com> wrote:

>  You may do this:
>
> table("users").groupBy('zip)('zip, count('user), countDistinct('user))
>
> On 12/4/14 8:47 AM, Arun Luthra wrote:
>
>   I'm wondering how to do this kind of SQL query with PairRDDFunctions.
>
>  SELECT zip, COUNT(user), COUNT(DISTINCT user)
> FROM users
> GROUP BY zip
>
>  In the Spark Scala API, I can make an RDD (called "users") of key-value
> pairs where the keys are zips (as in ZIP codes) and the values are user ids.
> Then I can compute the count and distinct count like this:
>
>  val count = users.mapValues(_ => 1).reduceByKey(_ + _)
> val countDistinct = users.distinct().mapValues(_ => 1).reduceByKey(_ + _)
>
>  Then, if I want count and countDistinct in the same table, I have to
> join them on the key.
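>
>  For example (with an illustrative variable name):
>
>  val both = count.join(countDistinct)   // RDD of (zip, (count, distinctCount))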
>
>  Is there a way to do this without doing a join (and without using SQL or
> Spark SQL)?
>
>  Arun
>
>
