My bad. Aggregation Query:
# Write your MySQL query statement below
SELECT D.Name AS Department, E.Name AS Employee, E.Salary AS Salary
FROM Employee E
INNER JOIN Department D ON E.DepartmentId = D.Id
WHERE (SELECT COUNT(DISTINCT(Salary)) FROM Employee
       WHERE DepartmentId = E.DepartmentId AND Salary > E.Salary) < 3
ORDER BY E.DepartmentId, E.Salary DESC

Time taken: 1212 ms

Windowing Query:

select Department, Employee, Salary from (
  select d.name as Department, e.name as Employee, e.salary as Salary,
         dense_rank() over (partition by d.name order by e.salary desc) as rnk
  from Department d join Employee e on e.departmentId = d.id
) a
where rnk <= 3

Time taken: 790 ms

Thanks,
Sid

On Sun, Feb 27, 2022 at 11:35 PM Sean Owen <sro...@gmail.com> wrote:

> Those two queries are identical?
>
> On Sun, Feb 27, 2022 at 11:30 AM Sid <flinkbyhe...@gmail.com> wrote:
>
>> Hi Team,
>>
>> I am aware that if windowing functions are used, the entire dataset is
>> first loaded into one window, scanned, and then the other mentioned
>> operations are performed for that particular window, which could be
>> slower when dealing with billions or trillions of records.
>>
>> I did a POC where I used an example to find the top 3 highest salaries
>> per department. I wrote the two queries below and compared their times:
>>
>> Windowing Query:
>>
>> select Department, Employee, Salary from (
>>   select d.name as Department, e.name as Employee, e.salary as Salary,
>>          dense_rank() over (partition by d.name order by e.salary desc) as rnk
>>   from Department d join Employee e on e.departmentId = d.id
>> ) a
>> where rnk <= 3
>>
>> Time taken: 790 ms
>>
>> Aggregation Query:
>>
>> select Department, Employee, Salary from (
>>   select d.name as Department, e.name as Employee, e.salary as Salary,
>>          dense_rank() over (partition by d.name order by e.salary desc) as rnk
>>   from Department d join Employee e on e.departmentId = d.id
>> ) a
>> where rnk <= 3
>>
>> Time taken: 1212 ms
>>
>> But as per my understanding, the aggregation should have run faster.
>> So, my whole point is: if the dataset is huge, should I force some kind
>> of map-reduce job, like the option we have called
>> df.groupby().reduceByGroups()?
>>
>> So I think the aggregation query is taking more time since the dataset
>> size here is smaller, and as we all know, map-reduce works faster when
>> there is a huge volume of data. I haven't tested it yet on big data, but
>> I needed some expert guidance here.
>>
>> Please correct me if I am wrong.
>>
>> TIA,
>> Sid
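[Editor's note: the two approaches in the thread (dense_rank window function vs. correlated COUNT subquery) can be reproduced end-to-end with SQLite, which also supports window functions. This is a sketch, not from the thread itself: the table and column names mirror the schema in the queries above, but the sample rows are invented.]

```python
# Sketch: run both top-3-salaries-per-department queries from the thread
# against an in-memory SQLite database and check they agree.
# Sample data is invented; schema follows the thread's Employee/Department.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE Department (Id INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Employee (Id INTEGER PRIMARY KEY, Name TEXT,
                       Salary INTEGER, DepartmentId INTEGER);
INSERT INTO Department VALUES (1, 'IT'), (2, 'Sales');
INSERT INTO Employee VALUES
  (1, 'Joe',   85000, 1), (2, 'Henry', 80000, 2), (3, 'Sam',   60000, 2),
  (4, 'Max',   90000, 1), (5, 'Janet', 69000, 1), (6, 'Randy', 85000, 1),
  (7, 'Will',  70000, 1);
""")

# Window-function version: rank distinct salaries within each department.
windowing = """
SELECT Department, Employee, Salary FROM (
  SELECT d.Name AS Department, e.Name AS Employee, e.Salary AS Salary,
         DENSE_RANK() OVER (PARTITION BY d.Name ORDER BY e.Salary DESC) AS rnk
  FROM Department d JOIN Employee e ON e.DepartmentId = d.Id
) a WHERE rnk <= 3
"""

# Correlated-subquery version: a row is in the top 3 if fewer than 3
# distinct salaries in the same department are strictly higher.
subquery = """
SELECT D.Name AS Department, E.Name AS Employee, E.Salary AS Salary
FROM Employee E JOIN Department D ON E.DepartmentId = D.Id
WHERE (SELECT COUNT(DISTINCT Salary) FROM Employee
       WHERE DepartmentId = E.DepartmentId AND Salary > E.Salary) < 3
"""

a = sorted(cur.execute(windowing).fetchall())
b = sorted(cur.execute(subquery).fetchall())
assert a == b  # both formulations return the same rows
print(a)
```

Broadly, the window version can rank each partition in a single sorted pass, while the correlated subquery re-scans Employee once per output row, which is one plausible reason the relative timings shift with data volume and indexing; actual behavior depends on the engine's optimizer.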