if you want to use true Spark Streaming (not the same as Hadoop
Streaming/Piping, as Mayur pointed out), you can use the DStream.union()
method as described in the following docs:

http://spark.apache.org/docs/0.9.1/streaming-custom-receivers.html
http://spark.apache.org/docs/0.9.1/streaming-programming-guide.html
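
here's a rough sketch of what that could look like in scala (the ports, the
"waveId,value" line format, and the window sizes are placeholders i made up
for illustration, not something from the docs):

import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.StreamingContext._

object UnionWaves {
  def main(args: Array[String]) {
    val ssc = new StreamingContext("local[4]", "UnionWaves", Seconds(1))

    // one socket stream per wave; each incoming line looks like "waveId,value"
    val streams = (9001 to 9003).map(port => ssc.socketTextStream("localhost", port))
    val unioned = ssc.union(streams)

    // key each value by its wave id so windows are computed per wave
    val keyed = unioned.map { line =>
      val Array(id, v) = line.split(",")
      (id, v.toDouble)
    }

    // mean and SD per wave over a 10s window, sliding every 5s
    keyed.groupByKeyAndWindow(Seconds(10), Seconds(5)).map { case (id, vs) =>
      val n = vs.size
      val mean = vs.sum / n
      val sd = math.sqrt(vs.map(v => (v - mean) * (v - mean)).sum / n)
      (id, mean, sd)
    }.print()

    ssc.start()
    ssc.awaitTermination()
  }
}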

our friend, diana carroll, from cloudera recently posted a nice little
utility for sending files to a Spark Streaming Receiver to simulate a
streaming scenario from disk.

here's the link to her post:
http://apache-spark-user-list.1001560.n3.nabble.com/streaming-code-to-simulate-a-network-socket-data-source-tc3431.html
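
the gist of her approach is to serve a file's lines over a socket so that
ssc.socketTextStream(...) sees them as a live stream. a minimal sketch of
that idea (not her exact code; the file path, port, and 100 ms delay are
made up for illustration):

import java.io.PrintWriter
import java.net.ServerSocket
import scala.io.Source

object FileToSocket {
  def main(args: Array[String]) {
    val server = new ServerSocket(9999)
    val socket = server.accept() // block until the streaming receiver connects
    val out = new PrintWriter(socket.getOutputStream, true)
    for (line <- Source.fromFile("eeg.csv").getLines()) {
      out.println(line)
      Thread.sleep(100) // throttle to simulate data arriving over time
    }
    out.close()
    socket.close()
    server.close()
  }
}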

-chris

On Thu, May 1, 2014 at 3:09 AM, Mayur Rustagi <mayur.rust...@gmail.com> wrote:

> File as a stream?
> I think you are confusing Spark Streaming with a buffered reader. Spark
> Streaming is meant to process batches of data (files, packets, messages) as
> they come in; in fact, it uses the time of packet reception to create
> windows, etc.
>
> In your case you are better off reading the file, partitioning it, and
> operating on each column individually, if that makes more sense to you.
>
>
>
> Mayur Rustagi
> Ph: +1 (760) 203 3257
> http://www.sigmoidanalytics.com
> @mayur_rustagi <https://twitter.com/mayur_rustagi>
>
>
>
> On Thu, May 1, 2014 at 3:24 PM, Laeeq Ahmed <laeeqsp...@yahoo.com> wrote:
>
>> Hi all,
>>
>> Is it possible to read and process multiple streams with Spark? I have
>> an EEG (brain waves) CSV file with 23 columns. Each column is one stream
>> (wave), and each column has one million values.
>>
>> I know one way to do it is to take the transpose of the file and then
>> give it to Spark, so that each mapper gets one or more of the 23 waves,
>> but then it becomes a non-streaming problem, and I want to read the file
>> as a stream. Please correct me if I am wrong.
>>
>> I have to apply simple operations (mean and SD) on each window of a wave.
>>
>> Regards,
>> Laeeq
>>
>>
>
>
