Leonidas Fegaras created MRQL-66:
------------------------------------

             Summary: Add support for MRQL streaming in Flink streaming mode
                 Key: MRQL-66
                 URL: https://issues.apache.org/jira/browse/MRQL-66
             Project: MRQL
          Issue Type: New Feature
          Components: Run-Time/Flink, Streaming
    Affects Versions: 0.9.6
            Reporter: Leonidas Fegaras
            Priority: Critical


The new extension, MRQL Streaming, works fine with Spark Streaming (see 
MRQL-63), but it would be nice to make it work with Flink Streaming too.

It was easy to make it work with Spark Streaming: the data in one sliding 
window of a Spark DStream is viewed as an RDD, so a DStream can be viewed as a 
continuous sequence of RDDs. A DStream has a method foreachRDD that applies a 
function to each RDD in the stream. So to implement MRQL Streaming, we just had 
to pass the MRQL Spark evaluator (a function from RDD to RDD) as the argument 
to foreachRDD.

For Flink Streaming, the implementation will be more complicated. A Flink 
Streaming DataStream doesn't provide a hook to a DataSet object. I am guessing 
that this is because Flink Streaming is far more general than Spark Streaming 
(it's not just sliding windows) and because Flink Streaming needs to do special 
optimizations. So we need to copy the FlinkEvaluator class into a new class 
FlinkStreaming and change all its methods to work on DataStream instead of 
DataSet. Many DataSet methods have an equivalent in DataStream, but some are 
missing. I have already provided the input formats for streaming (method 
FlinkStreaming.stream_source), but we still need to write a stream evaluator 
for MRQL plans.
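
As a rough illustration of why the Spark case was easy, here is a sketch in 
plain Java. It is not the MRQL or Spark code: BatchStream, evalEachBatch, and 
evaluate are hypothetical names standing in for a DStream, foreachRDD, and the 
MRQL Spark evaluator. Unlike foreachRDD, the stand-in returns the per-batch 
results, purely so they can be inspected. The point is only that when a stream 
is exposed as a sequence of batches, an existing batch-to-batch evaluator can 
be reused unchanged.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

public class MicroBatchSketch {
    /** Hypothetical stand-in for a DStream: a finite sequence of batches. */
    static final class BatchStream<T> {
        private final List<List<T>> batches;

        BatchStream(List<List<T>> batches) {
            this.batches = batches;
        }

        /**
         * Hypothetical analogue of foreachRDD: applies the given batch
         * evaluator to every batch in the stream, in order, and collects
         * the result of each batch.
         */
        <R> List<List<R>> evalEachBatch(Function<List<T>, List<R>> batchEvaluator) {
            List<List<R>> results = new ArrayList<>();
            for (List<T> batch : batches) {
                results.add(batchEvaluator.apply(batch));
            }
            return results;
        }
    }

    /**
     * Stand-in for a batch evaluator (a function from batch to batch).
     * This toy "plan" keeps the even numbers and squares them.
     */
    static List<Integer> evaluate(List<Integer> batch) {
        List<Integer> out = new ArrayList<>();
        for (int x : batch) {
            if (x % 2 == 0) {
                out.add(x * x);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        BatchStream<Integer> stream = new BatchStream<>(Arrays.asList(
                Arrays.asList(1, 2, 3),
                Arrays.asList(4, 5, 6)));
        // The batch evaluator is passed unchanged to the streaming hook.
        System.out.println(stream.evalEachBatch(MicroBatchSketch::evaluate));
    }
}
```

A DataStream offers no such per-batch hook, which is why the Flink version 
needs an evaluator written directly against the streaming operators.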
Any volunteer?




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
