I am new to the Spark source code and looking to see if I can add push-down
support for Spark filters to the storage layer (in my
case an object store). I would also like to consider how this can be done
generically for any store that we might want to
integrate with Spark. I would like to know which areas of the code I should
look into to provide support for a new data store in
this context. Below are some of the questions I have to start with:
1. Do we need to create a new RDD class for the new store that we want to
support? From the SparkContext we create an RDD, and the operations on the
data, including filters, are performed through the RDD methods. (I have
included a rough sketch of what I imagine such an RDD might look like below,
after question 3.)
2. When we specify the predicate passed to the RDD.filter() method, how does
it get communicated to the Executor on the data node? Does the Executor need
to compile this code on the fly and execute it, or how does it work? (I have
looked at the code for some time but have not yet figured this out, so I am
looking for some pointers that can help me come up to speed on this part of
the code. The second sketch below shows my current guess.)
3. How long does the Executor hold on to memory, and how does it decide when
to release the memory/cache? (The third sketch below shows the explicit
persist/unpersist control I am aware of.)
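
For question 1, here is a rough sketch of what I imagine a custom RDD for the
object store might look like. ObjectStoreRDD, ObjectStorePartition and
ObjectStoreClient are names I made up, and handing the predicate to the store
as a string is only a guess on my part; the RDD, Partition and TaskContext
pieces are the actual Spark classes.

import org.apache.spark.{Partition, SparkContext, TaskContext}
import org.apache.spark.rdd.RDD

// Hypothetical partition describing a chunk of objects in the store.
class ObjectStorePartition(val index: Int, val objectKeys: Seq[String])
  extends Partition

// Minimal sketch of a custom RDD that asks the store to apply a pushed-down
// predicate; ObjectStoreClient is a placeholder for the store's client library.
class ObjectStoreRDD(
    sc: SparkContext,
    bucket: String,
    pushedPredicate: Option[String]) // e.g. "age > 30", evaluated in the store
  extends RDD[String](sc, Nil) {

  override protected def getPartitions: Array[Partition] = {
    // A real implementation would list the bucket and split the keys into
    // partitions; a single hard-coded partition keeps the sketch short.
    Array(new ObjectStorePartition(0, Seq("part-00000")))
  }

  override def compute(split: Partition, context: TaskContext): Iterator[String] = {
    val p = split.asInstanceOf[ObjectStorePartition]
    // Hypothetical client call: the store applies the predicate server-side,
    // so only matching records travel back over the network.
    ObjectStoreClient.scan(bucket, p.objectKeys, pushedPredicate)
  }
}

// Placeholder for the object store's client library.
object ObjectStoreClient {
  def scan(bucket: String, keys: Seq[String],
           predicate: Option[String]): Iterator[String] =
    Iterator.empty // stub
}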
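
For question 2, my current understanding (please correct me if this is wrong)
is that the predicate passed to RDD.filter() is ordinary pre-compiled bytecode:
the driver cleans and serializes the closure together with any variables it
captures, ships the bytes to the executors as part of the task, and the
executors deserialize and invoke it, with no compilation at run time. The
snippet below is just a plain Spark program marking where I think that happens.

import org.apache.spark.{SparkConf, SparkContext}

object ClosureShippingExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("closure-example").setMaster("local[2]"))

    val threshold = 10 // captured by the closure below
    val rdd = sc.parallelize(1 to 100)

    // The predicate is a normal compiled function object. As I understand it,
    // Spark serializes it (plus the captured `threshold`) into the task, and
    // the executor deserializes and calls it on each record.
    val filtered = rdd.filter(x => x > threshold)

    println(filtered.count())
    sc.stop()
  }
}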
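
For question 3, the only mechanism I know of so far is explicit caching:
nothing is kept in executor memory across actions unless it is persisted, and
a persisted RDD stays cached until it is unpersisted, the application ends, or
the executor evicts blocks under memory pressure. A small sketch of that
explicit control:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object CacheLifetimeExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("cache-example").setMaster("local[2]"))

    // Nothing is cached unless we ask for it; intermediate task output is
    // only held for as long as the task needs it.
    val cached = sc.parallelize(1 to 1000000).map(_ * 2)
      .persist(StorageLevel.MEMORY_ONLY)

    cached.count()     // first action computes the partitions and caches them
    cached.count()     // second action reads the cached blocks, no recompute

    cached.unpersist() // explicitly releases the cached blocks

    sc.stop()
  }
}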
Thank you in advance.
Regards,
Rajendran.