I've been looking at EclairJS (Spark + JavaScript), which takes a really interesting approach: the driver program runs in Node and the workers run in Nashorn. I was wondering if anyone has given much thought to optionally exposing an interface for PySpark in a similar fashion. For some UDFs and UDAFs we could keep the data entirely in the JVM (by evaluating the Python code with Jython on the executors), and still fall back to our old PipelinedRDD based interface for operations which require native libraries or otherwise aren't supported in Jython. Have I had too much coffee and this is actually a bad idea, or is this something people think would be worth investigating?
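To make the idea a bit more concrete, here's a rough sketch (not a proposal for the actual API, just an illustration) of what evaluating a Python lambda via Jython inside the executor JVM could look like. The object name, the lambda-as-string approach, and the per-call interpreter are all mine; only the Jython embedding calls (PythonInterpreter, Py.java2py, __tojava__) are the real Jython API, and a real version would need interpreter caching, error handling, and a way to ship the Python source to executors:

  import org.apache.spark.sql.SparkSession
  import org.apache.spark.sql.functions.udf
  import org.python.core.Py
  import org.python.util.PythonInterpreter

  object JythonUdfSketch {
    def main(args: Array[String]): Unit = {
      val spark = SparkSession.builder()
        .master("local[*]")
        .appName("jython-udf-sketch")
        .getOrCreate()
      import spark.implicits._

      // User-supplied Python source; with Jython it is evaluated inside the JVM.
      val pySrc = "lambda s: s.upper()"

      val jythonUpper = udf { (s: String) =>
        // Hypothetical: a real version would cache one interpreter per executor
        // thread instead of constructing one per call.
        val interp = new PythonInterpreter()
        val fn = interp.eval(pySrc)
        fn.__call__(Py.java2py(s)).__tojava__(classOf[String]).asInstanceOf[String]
      }

      Seq("spark", "jython").toDF("word")
        .select(jythonUpper($"word").as("upper"))
        .show()
    }
  }

The win would be that the row data never leaves the executor JVM, so we'd skip the pickling and Python-worker round trip that PySpark UDFs pay today; anything that needs NumPy or other native libraries would still take the existing PipelinedRDD path.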
--
Cell : 425-233-8271
Twitter: https://twitter.com/holdenkarau