@Mark Hamstra: Thanks, good to know. @Ognen Duzlevski:
2013/12/24 Ognen Duzlevski <og...@nengoiksvelzud.com>

> Hello,
>
> On Mon, Dec 23, 2013 at 3:23 PM, Jie Deng <deng113...@gmail.com> wrote:
>
>> I am using Java, and Spark has APIs for Java as well. It is sometimes
>> said that Java in Spark is slower than the Scala shell; whether that
>> matters depends on your requirements.
>> I am not an expert in Spark, but as far as I know, Spark provides
>> different levels of storage, including memory and disk. For the disk
>> part, HDFS is just one choice. I am not using HDFS myself, but without
>> it you lose the benefits of HDFS as well. In other words, it again
>> comes down to your requirements.
>> And MongoDB or S3 are also doable, at least with the Java APIs, I suppose.
>
> I guess that answers the question of whether it is doable. Where/how do I
> find out how it is doable? :)
>
> I am guessing every pipeline is a "custom job" of sorts - hence it is the
> developer's job to write the "connectors" to 0mq or DynamoDB, for example?
> Or....? Is there some kind of a "plug-in" system for Spark?
>
> Thanks!
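To make the storage-level point above concrete, here is a minimal sketch using Spark's Java API. It assumes spark-core is on the classpath and that a local file `data.txt` exists (both are assumptions for illustration, not part of the thread); the input path could just as well be an S3 URL or any source with a Hadoop InputFormat, so HDFS is only one choice:

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.storage.StorageLevel;

public class StorageLevelSketch {
    public static void main(String[] args) {
        // Run locally with two worker threads; no cluster or HDFS required.
        SparkConf conf = new SparkConf()
                .setAppName("storage-level-sketch")
                .setMaster("local[2]");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // "data.txt" is a hypothetical local file; an s3:// or hdfs:// URL
        // would work here too, depending on which storage you choose.
        JavaRDD<String> lines = sc.textFile("data.txt");

        // Choose where cached partitions live: memory only, memory spilling
        // to disk, or disk only.
        lines.persist(StorageLevel.MEMORY_AND_DISK());

        System.out.println("line count: " + lines.count());
        sc.stop();
    }
}
```

Connectors to other systems (0mq, DynamoDB, MongoDB) are typically wired up the same way: read the data into an RDD yourself, or use a Hadoop InputFormat for that system if one exists.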