Hi all,

As we know, Parquet is a columnar format, so filtering on a column only
requires reading that column instead of the complete record.

When I create a Dataset[Class] and do a group by on a column, it performs
differently from the same steps on a DataFrame. The Dataset operations cause
OOM issues with the same execution parameters.
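
To make the comparison above concrete, here is a minimal sketch of the two
variants. The Event class, its fields, the userId grouping column, and the
path argument are all hypothetical placeholders, since the real class and job
are not shown here. It also assumes the Dataset side uses the typed
groupByKey with a lambda; if it uses the untyped groupBy("col") instead, the
two plans should look much more alike.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}

// Hypothetical record type standing in for the real class, which is not shown.
case class Event(userId: String, payload: String, ts: Long)

object GroupByComparison {
  // DataFrame variant: untyped groupBy on a column name. The grouping is a
  // Catalyst expression, so the optimizer can see which columns are needed.
  def countWithDataFrame(spark: SparkSession, path: String): DataFrame =
    spark.read.parquet(path).groupBy("userId").count()

  // Dataset variant: typed groupByKey with a lambda. Each row is first
  // deserialized into an Event object before the lambda is applied.
  def countWithDataset(spark: SparkSession, path: String): Dataset[(String, Long)] = {
    import spark.implicits._ // encoders for Event and String
    spark.read.parquet(path).as[Event].groupByKey(_.userId).count()
  }
}
```

Calling .explain() on both results and comparing the ReadSchema of the
Parquet scan in each physical plan should show whether the Dataset variant
reads more columns than the DataFrame one.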

Thanks

-- 
Shivam Sharma
Indian Institute Of Information Technology, Design and Manufacturing
Jabalpur
Email:- 28shivamsha...@gmail.com
LinkedIn: https://www.linkedin.com/in/28shivamsharma
