Is there a way to create multiple streams in spark streaming?

2015-10-20 Thread LinQili
Hi all,
I wonder if there is a way to create child streams while using Spark
Streaming? For example, I create a netcat main stream that reads data from a
socket, then create 3 different child streams on the main stream:
in stream1, we apply fun1 to the input data and print the result to the screen;
in stream2, we apply fun2 to the input data and print the result to the screen;
in stream3, we apply fun3 to the input data and print the result to the screen.
Does anyone have any hints?
 

Re: Is there a way to create multiple streams in spark streaming?

2015-10-20 Thread Gerard Maas
You can create as many functional derivatives of your original stream as you
like by applying transformations to it. That's exactly the model that Spark
Streaming offers.

In your example, that would become something like:

val stream = ssc.socketTextStream("localhost", 9999)  // 9999 is just an example port
val stream1 = stream.map(fun1)
val stream2 = stream.map(fun2)
// you could also derive a stream from another derived stream:
val stream3 = stream2.filter(predicate).flatMap(fun3)

// Then you need an output action to materialize the streams:
stream2.print()
stream2.saveAsTextFiles("output")  // the path prefix is required; "output" is just an example
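
For completeness, a minimal self-contained sketch of the original
three-branch setup could look like the following; fun1/fun2/fun3, the port
and the batch interval are placeholders, not anything prescribed:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().setAppName("ChildStreams").setMaster("local[2]")
val ssc = new StreamingContext(conf, Seconds(5))  // 5s batches (placeholder)

// placeholder per-record functions
def fun1(s: String): String = s.toUpperCase
def fun2(s: String): String = s.reverse
def fun3(s: String): Int = s.length

val stream = ssc.socketTextStream("localhost", 9999)  // example port

// three independent branches over the same parent stream;
// each print() is an output action, evaluated once per batch
stream.map(fun1).print()
stream.map(fun2).print()
stream.map(fun3).print()

ssc.start()
ssc.awaitTermination()

All three branches share the single socket receiver, so the input is only
read once.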

-kr, Gerard.


On Tue, Oct 20, 2015 at 12:20 PM, LinQili wrote:

> Hi all,
> I wonder if there is a way to create child streams while using
> Spark Streaming?
> For example, I create a netcat main stream that reads data from a socket,
> then create 3 different child streams on the main stream:
> in stream1, we apply fun1 to the input data and print the result to the screen;
> in stream2, we apply fun2 to the input data and print the result to the screen;
> in stream3, we apply fun3 to the input data and print the result to the screen.
> Does anyone have any hints?
>


Re: Multiple Streams with Spark Streaming

2014-05-03 Thread Chris Fregly
if you want to use true Spark Streaming (not the same as Hadoop
Streaming/Piping, as Mayur pointed out), you can use the DStream.union()
method as described in the following docs:

http://spark.apache.org/docs/0.9.1/streaming-custom-receivers.html
http://spark.apache.org/docs/0.9.1/streaming-programming-guide.html
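
as a rough sketch (the ports below are made up, not from those docs),
unioning a few receiver streams looks like:

// union several socket receivers into one DStream (hypothetical ports)
val ports = Seq(9999, 10000, 10001)
val streams = ports.map(p => ssc.socketTextStream("localhost", p))
val unioned = ssc.union(streams)  // one DStream carrying all receivers' data
unioned.print()

keep in mind that each socketTextStream gets its own receiver (and occupies
a core), so the cluster needs enough cores for all the receivers plus the
actual processing.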

our friend Diana Carroll from Cloudera recently posted a nice little
utility for sending files to a Spark Streaming receiver to simulate a
streaming scenario from disk.

here's the link to her post:
http://apache-spark-user-list.1001560.n3.nabble.com/streaming-code-to-simulate-a-network-socket-data-source-tc3431.html

-chris

On Thu, May 1, 2014 at 3:09 AM, Mayur Rustagi mayur.rust...@gmail.com wrote:

 File as a stream?
 I think you are confusing Spark Streaming with a buffered reader. Spark
 Streaming is meant to process batches of data (files, packets, messages) as
 they come in, in fact utilizing the time of packet reception as a way to
 create windows etc.

 In your case you are better off reading the file, partitioning it and
 operating on each column individually if that makes more sense to you.



 Mayur Rustagi
 Ph: +1 (760) 203 3257
 http://www.sigmoidanalytics.com
 @mayur_rustagi https://twitter.com/mayur_rustagi



 On Thu, May 1, 2014 at 3:24 PM, Laeeq Ahmed laeeqsp...@yahoo.com wrote:

 Hi all,

 Is it possible to read and process multiple streams with Spark? I have an
 EEG (brain waves) CSV file with 23 columns. Each column is one stream
 (wave) and each column has one million values.

 I know one way to do it is to take the transpose of the file and then give
 it to Spark, and each mapper will get one or more waves out of the 23
 waves, but then it becomes a non-streaming problem, and I want to read the
 file as a stream. Please correct me if I am wrong.

 I have to apply simple operations (mean and SD) on each window of a wave.

 Regards,
 Laeeq


Re: Multiple Streams with Spark Streaming

2014-05-01 Thread Mayur Rustagi
File as a stream?
I think you are confusing Spark Streaming with a buffered reader. Spark
Streaming is meant to process batches of data (files, packets, messages) as
they come in, in fact utilizing the time of packet reception as a way to
create windows etc.
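
(For reference, if you did feed the values in as a stream, per-window stats
would look roughly like this; the source, port and interval lengths are
made up, and ssc is your StreamingContext:)

val lines = ssc.socketTextStream("localhost", 9999)      // hypothetical source
val values = lines.map(_.toDouble)
val perWindow = values.window(Seconds(30), Seconds(10))  // 30s windows, 10s slide
perWindow.foreachRDD { rdd =>
  if (rdd.count() > 0) {
    val s = rdd.stats()  // Spark's StatCounter: count, mean, stdev, ...
    println(s"window mean=${s.mean} sd=${s.stdev}")
  }
}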

In your case you are better off reading the file, partitioning it and
operating on each column individually if that makes more sense to you.
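
A rough sketch of that batch approach, assuming sc is your SparkContext and
the file name is hypothetical:

import org.apache.spark.util.StatCounter

// key every value by its column index, then aggregate stats per column
val rows = sc.textFile("eeg.csv").map(_.split(",").map(_.toDouble))
val byColumn = rows.flatMap(_.zipWithIndex.map { case (v, i) => (i, v) })
val stats = byColumn.aggregateByKey(new StatCounter())(
  (acc, v) => acc.merge(v),
  (a, b) => a.merge(b))
stats.mapValues(s => (s.mean, s.stdev)).collect().foreach(println)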



Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi https://twitter.com/mayur_rustagi



On Thu, May 1, 2014 at 3:24 PM, Laeeq Ahmed laeeqsp...@yahoo.com wrote:

 Hi all,

 Is it possible to read and process multiple streams with Spark? I have an
 EEG (brain waves) CSV file with 23 columns. Each column is one stream
 (wave) and each column has one million values.

 I know one way to do it is to take the transpose of the file and then give
 it to Spark, and each mapper will get one or more waves out of the 23
 waves, but then it becomes a non-streaming problem, and I want to read the
 file as a stream. Please correct me if I am wrong.

 I have to apply simple operations (mean and SD) on each window of a wave.

 Regards,
 Laeeq