RE: Multiple column aggregations

2019-02-11 Thread Shiva Prashanth Vallabhaneni
Hi Sonu, You could use a query similar to the one below, and optimize it further by adding a WHERE clause. I would suggest that you benchmark the performance of both approaches (multiple group-by queries vs. a single query with multiple window functions) before

Recall: spark sql in-clause problem

2018-05-23 Thread Shiva Prashanth Vallabhaneni
Shiva Prashanth Vallabhaneni would like to recall the message, "spark sql in-clause problem".

RE: spark sql in-clause problem

2018-05-22 Thread Shiva Prashanth Vallabhaneni
Assuming the list of values in the “IN” clause is small, you could try using sparkSqlContext.sql("select * from mytable where key = 1 and ( (X,Y) = (1,2) OR (X,Y) = (3,4) )"). Another solution could be to load the possible values for X and Y into a table and then use this table in the sub-query;
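
For a short list of pairs, the OR-of-pairs predicate suggested above can be generated rather than typed by hand. The following is only a minimal sketch in plain Python: the helper names build_pair_predicate and build_query are hypothetical, the table and column names (mytable, key, X, Y) are taken from the example query, and it assumes the pair values are integers.

```python
from typing import Iterable, Tuple


def build_pair_predicate(pairs: Iterable[Tuple[int, int]],
                         left: str = "X", right: str = "Y") -> str:
    """Build '(X,Y) = (1,2) OR (X,Y) = (3,4)' style text from integer pairs.

    Hypothetical helper for illustration; not part of any Spark API.
    """
    clauses = []
    for a, b in pairs:
        # int() raises ValueError on non-numeric text, so arbitrary strings
        # are never spliced into the SQL.
        clauses.append(f"({left},{right}) = ({int(a)},{int(b)})")
    if not clauses:
        # With no pairs, no row can match.
        return "1 = 0"
    return " OR ".join(clauses)


def build_query(pairs: Iterable[Tuple[int, int]],
                table: str = "mytable", key: int = 1) -> str:
    """Assemble the full query text in the shape used in the reply above."""
    predicate = build_pair_predicate(pairs)
    return f"select * from {table} where key = {int(key)} and ( {predicate} )"


if __name__ == "__main__":
    print(build_query([(1, 2), (3, 4)]))
```

The resulting string could then be passed to sparkSqlContext.sql(...). If the list of pairs grows large, the second suggestion (loading the pairs into a table and using it in the sub-query) is likely the better fit.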