[ https://issues.apache.org/jira/browse/S2GRAPH-15?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15213974#comment-15213974 ]
Minseok Kim commented on S2GRAPH-15:
------------------------------------

As I wrote in http://markmail.org/message/6ykv7uxhreo2bkmm, I would like to break this issue down into the following sub-tasks:

1. Configurable Spark launcher, like PredictionIO's
2. Resumable Kafka stream (at-least-once or exactly-once)
3. HA and fault-tolerant scheduling using, for example, Marathon or Chronos

> S2Lambda, speed and batch layers of the lambda architecture
> -----------------------------------------------------------
>
>                 Key: S2GRAPH-15
>                 URL: https://issues.apache.org/jira/browse/S2GRAPH-15
>             Project: S2Graph
>          Issue Type: New Feature
>            Reporter: Minseok Kim
>              Labels: features
>         Attachments: s2lambda.001.png
>
>
> h4. Background
> From the lambda architecture point of view, S2Graph provides a great
> real-time view with its serving layer on HBase.
> The input stream coming from the REST API is stored in HBase, and it can be
> served by graph queries in real time.
> This stream, which is a write-ahead log, is also written to Kafka, which
> lets us build many other things on top of it.
> Several works (or sub-projects) use this stream:
> * S2Counter - computes real-time counts over combinations of
> properties, consuming the Kafka stream directly.
> * WalToHdfs - turns the Kafka stream into the incremental view.
> * S2ML - runs machine learning algorithms on the incremental view.
> * …
> h4. S2Lambda
> Because the above works were developed separately, they use different
> Spark versions and duplicate code.
> This makes them hard to build and limits code reuse.
> S2Lambda should solve this problem by providing a general framework for
> both the speed and batch layers.
> IMHO, we should first design a JSON-formatted job description that is
> compatible with both the speed and batch layers, and then implement
> S2Lambda against it.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
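The issue proposes a JSON-formatted job description that both the speed and batch layers can share. The following is only an illustration of that idea, not a design from the issue. It uses Python for brevity, whereas the real jobs would run on Spark. Every field name (`source`, `transforms`, `filter`, `count_by`, `sink`) and every function (`run_transforms`, `run_batch`, `run_speed`) is hypothetical and not part of S2Graph. The point it shows is the code-reuse argument from the issue: one description drives both a whole-dataset batch run and per-micro-batch speed runs through the same transform code.

```python
import json

# Hypothetical job description. All field names are assumptions for illustration.
# "source" and "sink" would be interpreted by a launcher; this sketch ignores them.
JOB_JSON = """
{
  "name": "wal-label-count",
  "source": {"type": "kafka", "topic": "s2graph-wal"},
  "transforms": [
    {"type": "filter", "field": "operation", "equals": "insert"},
    {"type": "count_by", "field": "label"}
  ],
  "sink": {"type": "console"}
}
"""


def run_transforms(records, transforms):
    """Apply the transform steps in order; shared by both layers."""
    for step in transforms:
        if step["type"] == "filter":
            # Keep only records whose field equals the configured value.
            records = [r for r in records if r.get(step["field"]) == step["equals"]]
        elif step["type"] == "count_by":
            # Count records per distinct value of the field, sorted by key.
            counts = {}
            for r in records:
                key = r[step["field"]]
                counts[key] = counts.get(key, 0) + 1
            records = [{"key": k, "count": c} for k, c in sorted(counts.items())]
        else:
            raise ValueError(f"unknown transform: {step['type']}")
    return records


def run_batch(job, records):
    """Batch layer: apply the job to a whole dataset at once."""
    return run_transforms(records, job["transforms"])


def run_speed(job, micro_batches):
    """Speed layer: apply the same job to each micro-batch independently."""
    return [run_transforms(batch, job["transforms"]) for batch in micro_batches]


if __name__ == "__main__":
    job = json.loads(JOB_JSON)
    wal = [
        {"operation": "insert", "label": "friend"},
        {"operation": "delete", "label": "friend"},
        {"operation": "insert", "label": "like"},
        {"operation": "insert", "label": "friend"},
    ]
    print(run_batch(job, wal))
    # [{'key': 'friend', 'count': 2}, {'key': 'like', 'count': 1}]
    print(run_speed(job, [wal[:2], wal[2:]]))
```

Because the layers differ only in how records are grouped before `run_transforms` is called, a fix to a transform automatically applies to both.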
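Sub-task 2 asks for a resumable Kafka stream with at-least-once semantics. The key idea is to commit the consumed offset only after a record has been handled, so that a restart resumes from the last committed offset. The sketch below simulates this without a broker. `InMemoryLog`, `OffsetStore`, and `consume` are hypothetical stand-ins for a Kafka partition, durable offset storage, and a consumer loop; they are not Kafka or Spark APIs.

```python
class InMemoryLog:
    """Stand-in for a Kafka partition: an append-only list addressed by offset."""

    def __init__(self, records):
        self.records = list(records)


class OffsetStore:
    """Stand-in for durable storage of the next offset to consume."""

    def __init__(self):
        self.committed = 0


def consume(log, store, handle, fail_before_commit_at=None):
    """Handle records from the last committed offset, committing after each one.

    Committing only after handle() returns gives at-least-once delivery: a crash
    between handling and committing makes that record be handled again on restart.
    fail_before_commit_at simulates such a crash at the given offset.
    """
    offset = store.committed
    while offset < len(log.records):
        handle(log.records[offset])
        if fail_before_commit_at == offset:
            raise RuntimeError(f"simulated crash at offset {offset}")
        offset += 1
        store.committed = offset


if __name__ == "__main__":
    log = InMemoryLog(["a", "b", "c"])
    store = OffsetStore()
    seen = []
    try:
        consume(log, store, seen.append, fail_before_commit_at=1)
    except RuntimeError:
        pass
    consume(log, store, seen.append)  # restart resumes from the committed offset
    print(seen)             # ['a', 'b', 'b', 'c']
    print(store.committed)  # 3
```

Record `b` is handled twice, which is acceptable under at-least-once semantics. Exactly-once would additionally require idempotent handling, or committing the output and the offset atomically.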