Spark Doubts

Sid Fri, 24 Jun 2022 23:42:47 -0700

Hi Team,

I have a few questions:


1) Can I use predicate pushdown filters when my data is stored in S3, or
does it apply only when reading from databases?

2) When a job runs in distributed mode, is my code copied to each and
every executor? I assume it is, since code.zip is small enough to be
copied to each worker node.

3) My understanding of shuffling is: "It moves data from one partition
to another, i.e. records with a given key are moved to the partition
that holds that key. It increases memory usage because the data is
copied into memory before it is transferred to the other partition."
Is this correct? If not, please correct me.
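
To make my mental model in 3) concrete, here is a rough plain-Python
sketch (not Spark code) of what I think happens: each record is sent to
a target partition chosen from its key, so all records with the same key
end up together. The function name shuffle_by_key, the sample data, and
the use of Python's built-in hash() on integer keys are my own
assumptions for illustration; Spark's actual partitioner and shuffle
mechanics may well differ, and the sketch does not model memory use.

```python
from collections import defaultdict


def shuffle_by_key(partitions, num_output_partitions):
    """Redistribute (key, value) records so equal keys share a partition.

    Hypothetical helper for illustration only; not a Spark API.
    """
    output = defaultdict(list)
    for partition in partitions:
        for key, value in partition:
            # The target partition depends only on the key, so every
            # record with this key lands in the same output partition.
            target = hash(key) % num_output_partitions
            output[target].append((key, value))
    return [output[i] for i in range(num_output_partitions)]


# Two input partitions with key 1 present in both (made-up data).
before = [[(1, "a"), (2, "b")], [(1, "c"), (3, "d")]]
after = shuffle_by_key(before, 2)
```

In this sketch both records with key 1 start in different input
partitions and finish in the same output partition, which is what I
mean by "moving data (keys) of one partition to another".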

Please help me understand these things in layman's terms if my
assumptions are not correct.

Thanks,
Sid
