[ https://issues.apache.org/jira/browse/MRQL-63?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14292588#comment-14292588 ]
Hudson commented on MRQL-63: ---------------------------- SUCCESS: Integrated in mrql-master-snapshot #11 (See [https://builds.apache.org/job/mrql-master-snapshot/11/]) MRQL-63: Add support for MRQL streaming in spark streaming mode (fegaras: https://git-wip-us.apache.org/repos/asf?p=incubator-mrql.git&a=commit&h=e15285c2c97c7b14c25b660b85dee9ad52abf429) * conf/mrql-env.sh * core/src/main/java/org/apache/mrql/Streaming.gen * flink/src/main/java/org/apache/mrql/FlinkStreaming.gen * spark/pom.xml * core/src/main/java/org/apache/mrql/Evaluator.java * core/src/main/java/org/apache/mrql/Config.java * flink/pom.xml * queries/streaming.mrql * core/src/main/java/org/apache/mrql/TopLevel.gen * pom.xml * core/src/main/java/org/apache/mrql/PlanGeneration.gen * core/src/main/java/org/apache/mrql/TypeInference.gen * spark/src/main/java/org/apache/mrql/SparkFileInputStream.java * spark/src/main/java/org/apache/mrql/SparkStreaming.gen * flink/src/main/java/org/apache/mrql/FlinkEvaluator.gen * spark/src/main/java/org/apache/mrql/SparkEvaluator.gen * core/src/main/java/org/apache/mrql/DataSetFunction.java * core/src/main/java/org/apache/mrql/Interpreter.gen > Add support for MRQL streaming in spark streaming mode > ------------------------------------------------------ > > Key: MRQL-63 > URL: https://issues.apache.org/jira/browse/MRQL-63 > Project: MRQL > Issue Type: Improvement > Components: Streaming > Affects Versions: 0.9.4 > Reporter: Leonidas Fegaras > Assignee: Leonidas Fegaras > Attachments: MRQL-63.patch > > > This patch introduces a major extension to MRQL, called MRQL streaming. > We can now run continuous MRQL queries on streams of data. > Currently, it works on Spark Streaming only but we may add support for Flink > Streaming and/or Storm in the future. > It has been tested in Spark local mode and in Spark distributed mode on a > Yarn cluster. > MRQL now supports window-based streaming based on a sliding window during a > certain time interval. To support MRQL streaming, you need to add the > parameter "-stream t" to the mrql command, where t is the time interval in > milliseconds. Then MRQL will processes the new batch of data in the input > streams every t milliseconds. > A stream source in MRQL takes the form stream(...), which has the same > parameters as the source(...) form. For example: > {code:SQL} > select (k,avg(p.Y)) > from p in stream(binary,"tmp/points.bin") > group by k: p.X; > {code} > This query process all sequence files in the directory tmp/points.bin and > then checks this directory every t milliseconds for new files. When a new > file is inserted in the directory (or if the modification time of an existing > file changes), it processes the new files. One may work on multiple files and > the query may contain both stream and regular data sources. If there is at > least one stream source, the query becomes continuous (never stops). One may > dump the output stream to binary or CVS files using the existing MRQL syntax: > {code:SQL} > store "tmp/out" from e > {code} > This dumps the output of the continuous query e into tmp/out/f1, tmp/out/f2, > ... etc. > Example for testing: > First create data: > {quote} > mrql.spark -local queries/points.mrql 100 > {quote} > Then run the continuous query: > {quote} > mrql.spark -local -stream 1000 queries/streaming.mrql > {quote} > On a separate terminal, you can type: > {quote} > touch tmp/points.bin/part-00000 > {quote} > to process a new batch of data. -- This message was sent by Atlassian JIRA (v6.3.4#6332)