Hi Sonu,
You could use a query similar to the one below. You could optimize it
further by adding a WHERE clause. I would suggest that you benchmark the
performance of both approaches (multiple group-by queries vs. a single
query with multiple window functions), before
Shiva Prashanth Vallabhaneni would like to recall the message, "spark sql
in-clause problem".
Assuming the list of values in the “IN” clause is small, you could try using
sparkSqlContext.sql("select * from mytable where key = 1 and ( (X,Y) = (1,2) OR (X,Y) = (3,4) )")
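If the pairs are held in a program variable, the OR chain can be generated instead of typed by hand. Here is a minimal Python sketch. Note that build_pair_filter and its parameters are names made up for illustration (they are not part of Spark), and it assumes the values are integers supplied by trusted code, since no SQL escaping is done.

```python
def build_pair_filter(columns, pairs):
    """Return a SQL predicate that matches any of the given value tuples.

    Hypothetical helper, not a Spark API. Assumes integer values only.
    """
    if not pairs:
        # An empty list should match nothing rather than produce invalid SQL.
        return "(1 = 0)"
    clauses = []
    for values in pairs:
        if len(values) != len(columns):
            raise ValueError("each tuple must have one value per column")
        # int() raises on non-numeric strings, so arbitrary text cannot
        # end up inside the generated SQL.
        parts = [f"{col} = {int(val)}" for col, val in zip(columns, values)]
        clauses.append("(" + " AND ".join(parts) + ")")
    return "(" + " OR ".join(clauses) + ")"


predicate = build_pair_filter(["X", "Y"], [(1, 2), (3, 4)])
query = f"select * from mytable where key = 1 and {predicate}"
```

The resulting string can be handed to sparkSqlContext.sql(query). It spells each pair out as X = ... AND Y = ..., which for non-null values selects the same rows as the (X,Y) = (1,2) form.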
Another solution could be to load the possible values for X & Y into a table
and then use this table in a sub-query.
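A rough sketch of what the lookup-table query could look like follows. To keep it runnable without a cluster it uses an in-memory SQLite database as a stand-in for Spark, which is an assumption of this example only. The table name xy_pairs and the sample rows are also made up for illustration.

```python
import sqlite3

# In-memory SQLite is only a stand-in so the sketch runs without Spark.
conn = sqlite3.connect(":memory:")
conn.execute("create table mytable (key integer, X integer, Y integer)")
# Hypothetical lookup table holding the allowed (X, Y) combinations.
conn.execute("create table xy_pairs (X integer, Y integer)")

conn.executemany(
    "insert into mytable values (?, ?, ?)",
    [(1, 1, 2), (1, 3, 4), (1, 1, 4), (2, 1, 2)],
)
conn.executemany("insert into xy_pairs values (?, ?)", [(1, 2), (3, 4)])

# Keep only rows whose (X, Y) combination appears in the lookup table.
LOOKUP_QUERY = """
select t.key, t.X, t.Y
from mytable t
where t.key = 1
  and exists (select 1 from xy_pairs p where p.X = t.X and p.Y = t.Y)
order by t.X
"""

rows = conn.execute(LOOKUP_QUERY).fetchall()
```

In Spark the pairs would instead be loaded into a DataFrame and registered as a temporary table or view before running the query. Older Spark releases may not accept a sub-query in the WHERE clause, in which case an inner join against the distinct pairs on X and Y expresses the same filter.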