For pluggable serialization, I think if there is no JIRA opened yet, I could open one as recommended by Lewis.
As for low-hanging fruit, I am currently not sure. Maybe we could add a Gora store manager to Spark to allow reading from and persisting to different NoSQL databases.

- Henry

On Wed, Jul 9, 2014 at 2:45 AM, Renato Marroquín Mogrovejo <renatoj.marroq...@gmail.com> wrote:
> 2014-07-09 11:10 GMT+02:00 Henry Saputra <henry.sapu...@gmail.com>:
>
>> Internally, Apache Spark can use Hadoop input formats for its
>> distributed data structure (a.k.a. RDD).
>> So, I guess we could still join the cool kids with Spark via our input
>> format implementation.
>
> Cool Henry! I didn't know we could use Hadoop input formats for
> Spark's RDD :)
>
>> However, I can think of other improvements that could be useful
>> (apologies to Lewis if I have hijacked his discussion):
>> 1. A pluggable serialization mechanism to allow others like Thrift or
>> Protocol Buffers instead of just Avro.
>
> Yes, we have been talking about this as well for quite some time. I think
> we have two options here: a) changing the way we hold objects in memory
> so that it is not only Avro, or b) keeping the Avro objects for in-memory
> processing and serializing using different formats (including the
> native/datastore format). I think both options should be doable at some
> point.
>
>> 2. Work directly with DAG frameworks like Spark or Flink (incubating)
>> to provide a client module that uses Gora via their abstractions,
>> i.e. RDD for Spark and DataSet for Flink.
>
> Yes! We have to continue integrating with other projects, especially with
> popular projects which could give Gora more visibility in the open source
> space.
> So what do you think is the "low hanging" fruit here, Henry? I mean, there
> is a lot to do, but we should start putting things into our roadmap so at
> least we know what we have to do.
>
> Renato M.
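To make Renato's option (b) above concrete — keep the in-memory objects as they are and make the wire format pluggable — a codec registry keyed by configuration is one common shape for this. The sketch below is only a toy Python illustration of that shape, not Gora code: the `Serializer` interface, the `CODECS` registry, and the two codecs (using stdlib `json` and `pickle` as stand-ins for Avro/Thrift/Protocol Buffers) are all hypothetical names invented for this example.

```python
import json
import pickle
from abc import ABC, abstractmethod

class Serializer(ABC):
    """Hypothetical pluggable codec interface; an Avro-, Thrift-, or
    Protocol Buffers-backed implementation would satisfy the same contract."""

    @abstractmethod
    def serialize(self, obj) -> bytes: ...

    @abstractmethod
    def deserialize(self, data: bytes): ...

class JsonSerializer(Serializer):
    def serialize(self, obj) -> bytes:
        return json.dumps(obj).encode("utf-8")

    def deserialize(self, data: bytes):
        return json.loads(data.decode("utf-8"))

class PickleSerializer(Serializer):
    def serialize(self, obj) -> bytes:
        return pickle.dumps(obj)

    def deserialize(self, data: bytes):
        return pickle.loads(data)

# A registry lets the datastore pick a codec by name from configuration;
# this lookup-by-name step is the "pluggable" part of the proposal.
CODECS = {"json": JsonSerializer(), "pickle": PickleSerializer()}

def roundtrip(codec_name: str, record: dict) -> dict:
    """Serialize and deserialize a record with the configured codec."""
    codec = CODECS[codec_name]
    return codec.deserialize(codec.serialize(record))
```

A record survives a round trip through either codec unchanged, and a new format plugs in by adding one entry to the registry rather than touching the in-memory model.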
>
>> - Henry
>>
>> On Mon, Jul 7, 2014 at 8:19 AM, Lewis John Mcgibbney
>> <lewis.mcgibb...@gmail.com> wrote:
>> > Hi Folks,
>> > Many people know the way that things are going with regards to in-memory
>> > computing being 'the' hot topic on the planet right now (outside of the
>> > World Cup).
>> > We have made good strides in Gora to get it to where it is as a top-level
>> > project. It has also become apparent to me that something we embrace very
>> > well is the notion of abstraction and flexibility in the way our modules
>> > are implemented via the DataStore API.
>> > One thing which is apparent to me, though, is that we may be restricting
>> > the project's scope and capabilities if we do not embrace new
>> > technologies within our development model.
>> > I am of course talking about embracing the Spark paradigm within Gora and
>> > abstracting ourselves away from the traditional MapReduce Input/Output
>> > Formats which we currently use.
>> > A colleague of mine was at Spark Summit last week in San Francisco and
>> > mentioned that there is ongoing work to move towards a connector-based
>> > approach for IO so that different datastores can be used within Spark
>> > SQL.
>> > The point I want to pose here is: where can we take advantage of this in
>> > an attempt to further grow the Gora community and improve the project?
>> > Thanks in advance for any thoughts, folks.
>> > Lewis
>> >
>> > --
>> > *Lewis*
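On Henry's point about Spark consuming Hadoop input formats: conceptually, Spark builds an RDD from an input format by asking it for splits and then iterating one record reader per split, with one RDD partition per split. The toy Python model below mimics that two-method contract only for illustration — the real interfaces are Java (`InputFormat.getSplits()` / `createRecordReader()`, consumed through `SparkContext.newAPIHadoopRDD`), and every name here (`ToySplit`, `ToyInputFormat`, `toy_hadoop_rdd`) is a simplified stand-in, not actual Spark or Gora code.

```python
from typing import Iterator, List, Tuple

Record = Tuple[str, int]

class ToySplit:
    """Stands in for Hadoop's InputSplit: one partition of the input."""
    def __init__(self, records: List[Record]):
        self.records = records

class ToyInputFormat:
    """Mimics the two-method contract Spark relies on: produce splits,
    then produce a record iterator for each split."""
    def __init__(self, data: List[Record], num_splits: int):
        self.data = data
        self.num_splits = num_splits

    def get_splits(self) -> List[ToySplit]:
        # Round-robin the records into num_splits partitions.
        buckets: List[List[Record]] = [[] for _ in range(self.num_splits)]
        for i, rec in enumerate(self.data):
            buckets[i % self.num_splits].append(rec)
        return [ToySplit(b) for b in buckets]

    def create_record_reader(self, split: ToySplit) -> Iterator[Record]:
        return iter(split.records)

def toy_hadoop_rdd(fmt: ToyInputFormat) -> List[Record]:
    """What an RDD built on an input format does conceptually: one
    partition per split, each backed by the format's record reader."""
    out: List[Record] = []
    for split in fmt.get_splits():
        out.extend(fmt.create_record_reader(split))
    return out
```

The takeaway for Gora is that a datastore only has to speak this splits-plus-reader contract once, and any engine that consumes Hadoop input formats (Spark among them) can partition and read it.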