if you want to use true Spark Streaming (not the same as Hadoop Streaming/Piping, as Mayur pointed out), you can use the DStream.union() method as described in the following docs:
http://spark.apache.org/docs/0.9.1/streaming-custom-receivers.html
http://spark.apache.org/docs/0.9.1/streaming-programming-guide.html

our friend, diana carroll, from cloudera recently posted a nice little utility for sending files to a Spark Streaming receiver to simulate a streaming scenario from disk. here's the link to her post:

http://apache-spark-user-list.1001560.n3.nabble.com/streaming-code-to-simulate-a-network-socket-data-source-tc3431.html

-chris

On Thu, May 1, 2014 at 3:09 AM, Mayur Rustagi <mayur.rust...@gmail.com> wrote:

> File as a stream?
> I think you are confusing Spark Streaming with a buffered reader. Spark
> Streaming is meant to process batches of data (files, packets, messages) as
> they come in, in fact utilizing the time of packet reception as a way to
> create windows, etc.
>
> In your case you are better off reading the file, partitioning it, and
> operating on each column individually, if that makes more sense to you.
>
> Mayur Rustagi
> Ph: +1 (760) 203 3257
> http://www.sigmoidanalytics.com
> @mayur_rustagi <https://twitter.com/mayur_rustagi>
>
>
> On Thu, May 1, 2014 at 3:24 PM, Laeeq Ahmed <laeeqsp...@yahoo.com> wrote:
>
>> Hi all,
>>
>> Is it possible to read and process multiple streams with Spark? I have
>> an EEG (brain waves) CSV file with 23 columns. Each column is one stream
>> (wave), and each column has one million values.
>>
>> I know one way to do it is to take the transpose of the file and then
>> give it to Spark, and each mapper will get one or more waves out of the
>> 23 waves, but then it would be a non-streaming problem, and I want to
>> read the file as a stream. Please correct me if I am wrong.
>>
>> I have to apply simple operations (mean and SD) on each window of a wave.
>>
>> Regards,
>> Laeeq
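to make the union idea concrete, here's a plain-Python sketch (no Spark required) of what unioning per-column streams into one tagged record stream looks like at the data level. the function names and the `(column, value)` record shape are my own illustration, not a Spark API — in real Spark Streaming you'd attach one receiver per wave and call `ssc.union(...)` / `DStream.union()` on the resulting DStreams. the example uses 3 columns instead of 23 for brevity:

```python
import csv
import io

# a tiny stand-in for the 23-column EEG file (3 columns here for brevity)
CSV_DATA = """c1,c2,c3
1.0,10.0,100.0
2.0,20.0,200.0
3.0,30.0,300.0
"""

def one_column(name, index, rows):
    """One simulated receiver: emits (column_name, value) records for one wave."""
    for row in rows:
        yield (name, float(row[index]))

def column_streams(csv_text):
    """Split a CSV into one record stream per column, mimicking 23 receivers."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    rows = list(reader)
    return [one_column(name, i, rows) for i, name in enumerate(header)]

def union(streams):
    """Interleave several streams into one, like DStream.union merges DStreams."""
    streams = [iter(s) for s in streams]
    while streams:
        alive = []
        for s in streams:
            try:
                yield next(s)
                alive.append(s)
            except StopIteration:
                pass
        streams = alive

merged = list(union(column_streams(CSV_DATA)))
# records from all columns arrive interleaved in one stream, tagged by column
print(merged[:3])  # → [('c1', 1.0), ('c2', 10.0), ('c3', 100.0)]
```

the point of the tagging is that once everything flows through a single stream, you can still group back by column name (in Spark: key the records by wave and use the pair-DStream operations) to compute per-wave statistics.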
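and the per-window math laeeq asked about (mean and SD over each window of a wave) is simple enough to sketch in plain python. in Spark Streaming you'd express the same computation with `window()` or `reduceByKeyAndWindow()` over the tagged records rather than slicing a list; the window size of 4 below is an arbitrary choice for illustration, and the SD is the population SD:

```python
import math

def windowed_stats(values, window_size):
    """Slide a non-overlapping window over one wave, emitting (mean, sd) per window.

    Population SD; in Spark Streaming the same math would run per DStream
    window instead of per list slice.
    """
    out = []
    for start in range(0, len(values) - window_size + 1, window_size):
        w = values[start:start + window_size]
        mean = sum(w) / len(w)
        sd = math.sqrt(sum((x - mean) ** 2 for x in w) / len(w))
        out.append((mean, sd))
    return out

wave = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
stats = windowed_stats(wave, 4)  # two windows of four samples each
print(stats)
```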