Re: Multiple Streams with Spark Streaming

2014-05-03 Thread Chris Fregly
if you want to use true Spark Streaming (not the same as Hadoop
Streaming/Piping, as Mayur pointed out), you can use the DStream.union()
method as described in the following docs:

http://spark.apache.org/docs/0.9.1/streaming-custom-receivers.html
http://spark.apache.org/docs/0.9.1/streaming-programming-guide.html
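
as a rough analogy (plain Python with invented toy data, no Spark): unioning
streams just combines several sources into one stream of timestamped records,
loosely what DStream.union() gives you on live DStreams:

```python
import heapq

# Two simulated timestamped streams: (timestamp, value) tuples, each sorted by time.
stream_a = [(0, "a0"), (2, "a2"), (4, "a4")]
stream_b = [(1, "b1"), (3, "b3")]

# Interleave them into a single time-ordered stream -- loosely analogous to
# combining two DStreams with DStream.union() and windowing over the result.
merged = list(heapq.merge(stream_a, stream_b))
print(merged)
```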

our friend, Diana Carroll, from Cloudera recently posted a nice little
utility for sending files to a Spark Streaming Receiver to simulate a
streaming scenario from disk.

here's the link to her post:
http://apache-spark-user-list.1001560.n3.nabble.com/streaming-code-to-simulate-a-network-socket-data-source-tc3431.html
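
and for a quick offline prototype of the per-window mean/SD computation from
the original question (plain Python with toy data, no Spark; the same
per-window reduction would be ported onto DStream.window() in real streaming
code):

```python
import statistics

def windowed_stats(wave, window_size):
    """Mean and sample SD for each non-overlapping, full window of one wave."""
    results = []
    for start in range(0, len(wave) - window_size + 1, window_size):
        window = wave[start:start + window_size]
        results.append((statistics.mean(window), statistics.stdev(window)))
    return results

# Toy stand-in for one of the 23 EEG waves (the real columns hold 1M values).
wave = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(windowed_stats(wave, window_size=3))  # windows [1,2,3] and [4,5,6]
```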

-chris

On Thu, May 1, 2014 at 3:09 AM, Mayur Rustagi wrote:

> File as a stream?
> I think you are confusing Spark Streaming with a buffered reader. Spark
> Streaming is meant to process batches of data (files, packets, messages)
> as they come in, in fact using the time of packet reception as a way to
> create windows, etc.
>
> In your case, you are better off reading the file, partitioning it, and
> operating on each column individually if that makes more sense to you.
>
>
>
> Mayur Rustagi
> Ph: +1 (760) 203 3257
> http://www.sigmoidanalytics.com
> @mayur_rustagi 
>
>
>
> On Thu, May 1, 2014 at 3:24 PM, Laeeq Ahmed  wrote:
>
>> Hi all,
>>
>> Is it possible to read and process multiple streams with Spark? I have
>> an EEG (brain wave) CSV file with 23 columns. Each column is one stream
>> (wave), and each column has one million values.
>>
>> I know one way to do it is to take the transpose of the file and give
>> it to Spark, so each mapper gets one or more of the 23 waves, but then
>> it becomes a non-streaming problem, and I want to read the file as a
>> stream. Please correct me if I am wrong.
>>
>> I have to apply simple operations (mean and SD) on each window of a wave.
>>
>> Regards,
>> Laeeq
>>
>>
>
>


Re: Multiple Streams with Spark Streaming

2014-05-01 Thread Mayur Rustagi
File as a stream?
I think you are confusing Spark Streaming with a buffered reader. Spark
Streaming is meant to process batches of data (files, packets, messages) as
they come in, in fact using the time of packet reception as a way to create
windows, etc.

In your case, you are better off reading the file, partitioning it, and
operating on each column individually if that makes more sense to you.
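
A rough, Spark-free sketch of that read-and-partition-by-column idea (toy
data and names, for illustration only): transpose the row-oriented CSV so
each list is one complete wave, then hand the columns to Spark (e.g.
sc.parallelize(columns)) to process each wave independently:

```python
import csv
import io

# Toy stand-in for the 23-column EEG CSV: 3 columns, 4 samples per wave.
raw = "1,10,100\n2,20,200\n3,30,300\n4,40,400\n"
rows = [[float(v) for v in r] for r in csv.reader(io.StringIO(raw))]

# Transpose rows -> columns: each entry is now one complete wave,
# ready to be distributed and processed per column.
columns = [list(col) for col in zip(*rows)]
print(columns[0])  # first wave: [1.0, 2.0, 3.0, 4.0]
```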



Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi 



On Thu, May 1, 2014 at 3:24 PM, Laeeq Ahmed  wrote:

> Hi all,
>
> Is it possible to read and process multiple streams with Spark? I have an
> EEG (brain wave) CSV file with 23 columns. Each column is one stream
> (wave), and each column has one million values.
>
> I know one way to do it is to take the transpose of the file and give it
> to Spark, so each mapper gets one or more of the 23 waves, but then it
> becomes a non-streaming problem, and I want to read the file as a stream.
> Please correct me if I am wrong.
>
> I have to apply simple operations (mean and SD) on each window of a wave.
>
> Regards,
> Laeeq
>
>


Multiple Streams with Spark Streaming

2014-05-01 Thread Laeeq Ahmed
Hi all,

Is it possible to read and process multiple streams with Spark? I have an
EEG (brain wave) CSV file with 23 columns. Each column is one stream (wave),
and each column has one million values.

I know one way to do it is to take the transpose of the file and give it to
Spark, so each mapper gets one or more of the 23 waves, but then it becomes
a non-streaming problem, and I want to read the file as a stream. Please
correct me if I am wrong.

I have to apply simple operations (mean and SD) on each window of a wave.

Regards,
Laeeq