Hi Team, I have a few questions, listed below:
1) Can I apply predicate pushdown filters when my data is stored in S3, or can they be used only when reading from databases?

2) When a job runs in distributed mode, is my code copied to every executor? I assume it is, since code.zip is small enough to be copied to each worker node.

3) My understanding of shuffling is: "It moves one partition to another partition, or moves the data (keys) of one partition to another partition that holds those keys. It increases memory usage because the data is copied into memory before it is transferred to the other partition." Is this correct? If not, please correct me.

Please explain in layman's terms wherever my assumptions are wrong.

Thanks,
Sid
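
P.S. To make question 3 concrete, here is a toy sketch in plain Python of how I picture a shuffle. It is not Spark code and not how Spark actually implements it; the function name shuffle_by_key, the integer keys, and the modulo partitioner are all made up for illustration. It only shows records being regrouped so that equal keys land in the same output partition.

```python
from collections import defaultdict


def shuffle_by_key(partitions, num_output_partitions):
    """Toy model of a shuffle: regroup (key, value) records so that all
    records with the same key end up in the same output partition."""
    output = defaultdict(list)
    for partition in partitions:
        for key, value in partition:
            # Hypothetical partitioner: integer key modulo partition count.
            target = key % num_output_partitions
            output[target].append((key, value))
    return [output[i] for i in range(num_output_partitions)]


# Three input partitions; each key is spread over more than one of them.
before = [
    [(1, "a"), (2, "b")],
    [(1, "c"), (3, "d")],
    [(2, "e"), (3, "f")],
]

after = shuffle_by_key(before, 2)
for index, records in enumerate(after):
    print(index, records)
```

In this picture every input partition may send records to every output partition, which is the data movement I am asking about.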