Hi all,

As we know, Parquet is stored in a columnar format, so filtering on a column should require reading only that column instead of the complete record.
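To make the Dataset vs DataFrame comparison concrete, here is a minimal sketch of the two variants. The case class `Event`, its fields, and the path are hypothetical placeholders, not the actual job, and the typed version is assumed to use `groupByKey`. Comparing the two physical plans is one way to check whether the same columns are read in both cases.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical record type; the field names must match the Parquet columns.
case class Event(userId: String, payload: String, ts: Long)

object GroupByComparison {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("groupby-comparison")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Placeholder path (assumption): point this at a real Parquet dataset.
    val path = "/path/to/events.parquet"

    // DataFrame variant: untyped groupBy on a named column. The optimizer sees
    // the column reference, so it should be able to read only `userId`.
    val dfCounts = spark.read.parquet(path).groupBy("userId").count()
    dfCounts.explain()

    // Dataset variant: typed groupByKey. The key comes from an opaque Scala
    // lambda, so each row is deserialized into a full Event object first.
    val dsCounts = spark.read.parquet(path).as[Event].groupByKey(_.userId).count()
    dsCounts.explain()

    spark.stop()
  }
}
```

In each plan printed by `explain()`, the `ReadSchema` entry of the Parquet scan node shows which columns are actually read.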
But when we create a Dataset[Class] and do a group by on a column, it performs differently from the same steps on a DataFrame. The operations on the Dataset cause OOM issues with the same execution parameters.

Thanks
--
Shivam Sharma
Indian Institute Of Information Technology, Design and Manufacturing Jabalpur
Email: 28shivamsha...@gmail.com
LinkedIn: https://www.linkedin.com/in/28shivamsharma