[ https://issues.apache.org/jira/browse/KYLIN-3679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Shaofeng SHI reassigned KYLIN-3679:
-----------------------------------

    Assignee: weibin0516

Awesome! [~codingforfun] please go ahead; a pull request to the Kylin GitHub repository is welcome.

> Fetch Kafka topic with Spark streaming
> --------------------------------------
>
>                 Key: KYLIN-3679
>                 URL: https://issues.apache.org/jira/browse/KYLIN-3679
>             Project: Kylin
>          Issue Type: New Feature
>          Components: Spark Engine
>            Reporter: Shaofeng SHI
>            Assignee: weibin0516
>            Priority: Major
>
> Currently Kylin uses an MR job to fetch Kafka messages in parallel and then
> persists them to HDFS for subsequent processing. If the user selects the Spark
> engine, we can use the Spark Streaming API to do this instead. Spark Streaming
> can read the Kafka messages in a given offset range as an RDD, which is then
> easy to process:
> https://spark.apache.org/docs/latest/streaming-kafka-0-10-integration.html
> With Spark Streaming, Kylin can also easily connect to other data sources
> such as Kinesis, Flume, etc.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)