[ 
https://issues.apache.org/jira/browse/KYLIN-3679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shaofeng SHI reassigned KYLIN-3679:
-----------------------------------

    Assignee: weibin0516

Awesome! [~codingforfun], please go ahead; a pull request to the Kylin GitHub
repository is welcome.

> Fetch Kafka topic with Spark streaming
> --------------------------------------
>
>                 Key: KYLIN-3679
>                 URL: https://issues.apache.org/jira/browse/KYLIN-3679
>             Project: Kylin
>          Issue Type: New Feature
>          Components: Spark Engine
>            Reporter: Shaofeng SHI
>            Assignee: weibin0516
>            Priority: Major
>
> Kylin currently uses an MR job to fetch Kafka messages in parallel and then
> persists them to HDFS for subsequent processing. If the user selects the
> Spark engine, we can use the Spark Streaming API to do this instead. Spark
> Streaming can read the Kafka messages in a given offset range as an RDD,
> which makes them easy to process:
> https://spark.apache.org/docs/latest/streaming-kafka-0-10-integration.html
> With Spark Streaming, Kylin could also easily connect to other data sources
> such as Kinesis, Flume, etc.
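
To make the "given offset range" idea concrete, here is a minimal,
stdlib-only Java sketch of how one partition's offsets might be cut into
bounded slices that can be read in parallel. It does not call Spark or Kafka.
The class OffsetRangePlanner, its split method, and the topic name
kylin_events are hypothetical and are not taken from Kylin. The half-open
[fromOffset, untilOffset) convention mirrors the OffsetRange type in the
spark-streaming-kafka-0-10 module, whose KafkaUtils.createRDD accepts an
array of such ranges.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Kylin or Spark: plans half-open offset
// ranges [fromOffset, untilOffset) for a single Kafka topic partition.
final class OffsetRangePlanner {

    /** One contiguous slice of a topic partition; untilOffset is exclusive. */
    static final class Range {
        final String topic;
        final int partition;
        final long fromOffset;
        final long untilOffset;

        Range(String topic, int partition, long fromOffset, long untilOffset) {
            this.topic = topic;
            this.partition = partition;
            this.fromOffset = fromOffset;
            this.untilOffset = untilOffset;
        }

        /** Number of messages covered by this slice. */
        long count() {
            return untilOffset - fromOffset;
        }

        @Override
        public String toString() {
            return topic + "-" + partition + " [" + fromOffset + ", " + untilOffset + ")";
        }
    }

    /**
     * Splits [fromOffset, untilOffset) into consecutive slices that each
     * cover at most maxPerRange messages. An empty input range yields an
     * empty list.
     */
    static List<Range> split(String topic, int partition,
                             long fromOffset, long untilOffset, long maxPerRange) {
        if (maxPerRange <= 0) {
            throw new IllegalArgumentException("maxPerRange must be positive");
        }
        if (untilOffset < fromOffset) {
            throw new IllegalArgumentException("untilOffset must be >= fromOffset");
        }
        List<Range> ranges = new ArrayList<>();
        long start = fromOffset;
        while (start < untilOffset) {
            // Cap the slice at the remaining messages so the last one may be shorter.
            long end = start + Math.min(maxPerRange, untilOffset - start);
            ranges.add(new Range(topic, partition, start, end));
            start = end;
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Assumed example values: partition 0, offsets 100 (inclusive) to 350 (exclusive).
        for (Range r : split("kylin_events", 0, 100L, 350L, 100L)) {
            System.out.println(r + " -> " + r.count() + " messages");
        }
    }
}
```

In a Spark-based implementation, each planned slice would presumably be
turned into an OffsetRange and handed to KafkaUtils.createRDD, so that every
slice becomes one partition of the resulting RDD. The slice size is a tuning
choice that this sketch leaves to the caller.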



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
