Re: Create RDD from output of unix command

2015-07-18 Thread Gylfi
You may want to look into using the pipe command: http://blog.madhukaraphatak.com/pipe-in-spark/ http://spark.apache.org/docs/0.6.0/api/core/spark/rdd/PipedRDD.html
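For reference, a minimal PySpark sketch of the pipe approach. It assumes a local Spark installation with the `pyspark` package available. The helper names `pipe_through_command` and `demo` and the `tr a-z A-Z` command are only illustrations and do not come from the thread. `RDD.pipe` writes each element of a partition to the command's standard input, one per line, and turns each line the command prints into an element of the resulting RDD.

```python
def pipe_through_command(rdd, command="tr a-z A-Z"):
    """Pipe every element of `rdd` through an external unix command.

    Each element is written to the command's stdin on its own line, and
    each line the command writes to stdout becomes one element of the
    returned RDD. The default command is only an illustration.
    """
    return rdd.pipe(command)


def demo():
    # Assumption: pyspark is installed and Spark can run in local mode.
    from pyspark import SparkContext

    sc = SparkContext("local[2]", "pipe-sketch")
    try:
        lines = sc.parallelize(["spark", "pipe", "example"])
        # Collect the piped result back to the driver and print it.
        print(pipe_through_command(lines).collect())
    finally:
        sc.stop()
```

Calling `demo()` runs the sketch end to end. Note that the command is started once per partition on the worker nodes, so it has to be installed on every worker.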

Re: Create RDD from output of unix command

2015-07-14 Thread Hafsa Asif
Your question is very interesting. I suggest copying your output into a text file, then reading that text file in your code and creating an RDD from it. Just consider the word count example from Spark; I love this example with the Java client. Well, Spark is an analytical engine and it has a slogan to analyze big big data
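The text-file suggestion could look roughly like the PySpark sketch below. It is written in Python rather than with the Java client mentioned above, and the helper names `save_command_output`, `rdd_from_command` and `word_counts` are hypothetical. It assumes the file path is readable by every Spark worker, for example in local mode or on a shared filesystem.

```python
import subprocess


def save_command_output(cmd, path):
    """Run `cmd` (a list of arguments) and write its stdout to `path`."""
    with open(path, "w") as out:
        # check=True raises CalledProcessError if the command fails.
        subprocess.run(cmd, stdout=out, check=True)
    return path


def rdd_from_command(sc, cmd, path):
    """Build an RDD of lines from the saved output of a unix command.

    `sc` is an existing SparkContext. Assumption: `path` is visible to
    all workers (local mode or a shared filesystem).
    """
    return sc.textFile(save_command_output(cmd, path))


def word_counts(lines_rdd):
    """Classic word count over an RDD of text lines."""
    return (lines_rdd
            .flatMap(lambda line: line.split())
            .map(lambda word: (word, 1))
            .reduceByKey(lambda a, b: a + b))
```

With a running SparkContext `sc`, something like `word_counts(rdd_from_command(sc, ["ls", "-l"], "/tmp/ls_output.txt")).collect()` would count the words in the command's output; the command and path here are placeholders.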

Re: Create RDD from output of unix command

2015-07-14 Thread Igor Berman
Have you thought about Spark Streaming? There is a thread that could help: https://www.mail-archive.com/user%40spark.apache.org/msg30105.html On 14 July 2015 at 18:20, Hafsa Asif hafsa.a...@matchinguu.com wrote: Your question is very interesting. What I suggest is, that copy your output in

Re: Create RDD from output of unix command

2015-07-08 Thread Richard Marscher
As a distributed data processing engine, Spark should be fine with millions of lines. It's built with the idea of massive data sets in mind. Do you have more details on how you anticipate the output of a unix command interacting with a running Spark application? Do you expect Spark to be