Greetings! I am looking into the possibility of JRuby support for Spark, and could use some pointers (references?) to orient myself a bit better within the codebase.
JRuby fat jars load just fine in Spark, but where things start to get predictably dicey is with object serialization for RDDs getting sent to the workers. Having worked on something similar for Apache Storm (https://github.com/jruby-gradle/redstorm), what we ended up doing was shimming some classes to handle Ruby object/class serialization properly.

I'm expecting to do something similar in Spark, but I'm not entirely sure which interfaces/classes describe the serialization of RDDs. I'm figuring that I'll need to implement a Ruby equivalent of the org.apache.spark.api.java.function namespaces, but I'm not entirely sure where the pieces come together to turn those into serialized objects.

Appreciate any direction you all might be able to share; in the meantime, I've got my miner's cap on and am presently digging through core/ :)

Cheers
-- 
GitHub: https://github.com/rtyler
GPG Key ID: 0F2298A980EE31ACCA0A7825E5C92681BEF6CEA2