Is there a way to create multiple streams in spark streaming?
Hi all,

I wonder if there is a way to create some child streams while using Spark Streaming. For example, I create a netcat main stream that reads data from a socket, then create 3 different child streams on the main stream:
in stream1, we apply fun1 to the input data, then print the result to screen;
in stream2, we apply fun2 to the input data, then print the result to screen;
in stream3, we apply fun3 to the input data, then print the result to screen.
Does anyone have some hints?
Re: Is there a way to create multiple streams in spark streaming?
You can create as many functional derivatives of your original stream as you like by using transformations. That's exactly the model that Spark Streaming offers. In your example, that would become something like:

val stream = ssc.socketTextStream("localhost", 9999)
val stream1 = stream.map(fun1)
val stream2 = stream.map(fun2)
// you could also:
val stream3 = stream2.filter(predicate).flatMap(fun3)

// Then you need some action to materialize the streams:
stream1.print()
stream2.saveAsTextFiles("out")

-kr, Gerard.

On Tue, Oct 20, 2015 at 12:20 PM, LinQili wrote:
> Hi all,
> I wonder if there is a way to create some child streaming while using
> spark streaming?
> For example, I create a netcat main stream, read data from a socket, then
> create 3 different child streams on the main stream,
> in stream1, we do fun1 on the input data then print result to screen;
> in stream2, we do fun2 on the input data then print result to screen;
> in stream3, we do fun3 on the input data then print result to screen;
> Does anyone have some hints?
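The fan-out pattern Gerard describes can be sketched without a Spark cluster: each micro-batch of a DStream is just a collection of records, and the "child streams" are independent transformations of it. Here fun1, fun2, and fun3 are hypothetical stand-ins for the original poster's functions, applied to one simulated batch using plain Scala collections:

```scala
// Plain-Scala sketch of deriving several "child streams" from one source.
// fun1/fun2/fun3 are hypothetical examples, not from the original post.
object FanOutSketch {
  def fun1(line: String): Int           = line.length      // e.g. line length
  def fun2(line: String): String        = line.toUpperCase // e.g. normalization
  def fun3(line: String): Array[String] = line.split(" ")  // e.g. tokenization

  def main(args: Array[String]): Unit = {
    // One simulated micro-batch, as a DStream batch would deliver it.
    val batch = Seq("hello world", "spark streaming")

    // Three independent derivations of the same batch, like stream.map(...)
    val out1 = batch.map(fun1)
    val out2 = batch.map(fun2)
    val out3 = batch.flatMap(fun3)

    println(out1.mkString(","))
    println(out2.mkString(","))
    println(out3.mkString(","))
  }
}
```

In real Spark Streaming the same shape holds: each map/flatMap on the source DStream yields a new DStream, and each needs its own output action (print, saveAsTextFiles, etc.) to be materialized.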
Re: Multiple Streams with Spark Streaming
if you want to use true Spark Streaming (not the same as Hadoop Streaming/piping, as Mayur pointed out), you can use the DStream.union() method as described in the following docs:

http://spark.apache.org/docs/0.9.1/streaming-custom-receivers.html
http://spark.apache.org/docs/0.9.1/streaming-programming-guide.html

our friend, diana carroll, from cloudera recently posted a nice little utility for sending files to a Spark Streaming receiver to simulate a streaming scenario from disk. here's the link to her post:

http://apache-spark-user-list.1001560.n3.nabble.com/streaming-code-to-simulate-a-network-socket-data-source-tc3431.html

-chris

On Thu, May 1, 2014 at 3:09 AM, Mayur Rustagi mayur.rust...@gmail.com wrote:
> File as a stream? I think you are confusing Spark Streaming with a buffered
> reader. Spark Streaming is meant to process batches of data (files, packets,
> messages) as they come in, in fact utilizing the time of packet reception as
> a way to create windows etc. In your case you are better off reading the
> file, partitioning it, and operating on each column individually if that
> makes more sense to you.
>
> Mayur Rustagi
> Ph: +1 (760) 203 3257
> http://www.sigmoidanalytics.com
> @mayur_rustagi https://twitter.com/mayur_rustagi
>
> On Thu, May 1, 2014 at 3:24 PM, Laeeq Ahmed laeeqsp...@yahoo.com wrote:
>> Hi all,
>> Is it possible to read and process multiple streams with Spark? I have an
>> EEG (brain waves) csv file with 23 columns. Each column is one stream
>> (wave) and each column has one million values. I know one way to do it is
>> to take the transpose of the file and then give it to Spark, and each
>> mapper will get one or more waves out of the 23 waves, but then it will be
>> a non-streaming problem, and I want to read the file as a stream. Please
>> correct me if I am wrong. I have to apply simple operations (mean and SD)
>> on each window of a wave.
>> Regards,
>> Laeeq
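The per-window statistics Laeeq asks about (mean and SD per wave) can be sketched in plain Scala, assuming each window arrives as a Seq[Double] of samples for one EEG column; in Spark Streaming the same function would be applied inside a window operation on the per-column DStream. This is a minimal illustration, not Laeeq's actual pipeline:

```scala
// Sketch of per-window mean and standard deviation for one wave (column).
// Assumes a window of samples is available as a Seq[Double]; in Spark
// Streaming this would run on the contents of each windowed batch.
object WaveStats {
  def meanAndSd(window: Seq[Double]): (Double, Double) = {
    val n = window.size.toDouble
    val mean = window.sum / n
    // population variance: average squared deviation from the mean
    val variance = window.map(x => (x - mean) * (x - mean)).sum / n
    (mean, math.sqrt(variance))
  }

  def main(args: Array[String]): Unit = {
    val window = Seq(1.0, 2.0, 3.0, 4.0) // toy window of samples
    val (m, sd) = meanAndSd(window)
    println(f"mean=$m%.2f sd=$sd%.4f")
  }
}
```

With 23 columns, one would keep 23 such derived streams (or key each sample by column index and use a keyed window operation), computing meanAndSd per column per window.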