Re: Create RDD from output of unix command

2015-07-18 Thread Gylfi
You may want to look into the RDD pipe() transformation:
http://blog.madhukaraphatak.com/pipe-in-spark/
http://spark.apache.org/docs/0.6.0/api/core/spark/rdd/PipedRDD.html
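For reference, a Spark `rdd.pipe(command)` call starts the external command once per partition. It writes each element of that partition to the process's stdin as a line, and the lines the process prints to stdout become the elements of the resulting RDD. The Python sketch below imitates that per-partition behavior locally with the standard `subprocess` module, so the semantics can be tried without a cluster. It does not use Spark at all. The helper names `pipe_partition` and `pipe_partitions` are hypothetical, and `tr a-z A-Z` is only an example command, assuming a Unix-like system.

```python
import subprocess
from typing import Iterable, List


def pipe_partition(command: List[str], elements: Iterable[str]) -> List[str]:
    """Hypothetical helper: feed one partition's elements to `command` on
    stdin, one per line, and return the lines the command writes to stdout."""
    stdin_text = "".join(f"{element}\n" for element in elements)
    result = subprocess.run(
        command, input=stdin_text, capture_output=True, text=True, check=True
    )
    return result.stdout.splitlines()


def pipe_partitions(command: List[str], partitions: List[List[str]]) -> List[List[str]]:
    """Hypothetical helper: Spark would start one process per partition on the
    executors; here the partitions are simply processed one after another."""
    return [pipe_partition(command, partition) for partition in partitions]


if __name__ == "__main__":
    partitions = [["hello", "spark"], ["pipe", "example"]]
    print(pipe_partitions(["tr", "a-z", "A-Z"], partitions))
    # [['HELLO', 'SPARK'], ['PIPE', 'EXAMPLE']]
```

In PySpark the equivalent would be along the lines of `sc.parallelize(lines, 2).pipe("tr a-z A-Z")`. The command then has to be installed on every worker node, since each partition's process runs on the executor that holds the partition.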




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Create-RDD-from-output-of-unix-command-tp23723p23895.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Create RDD from output of unix command

2015-07-14 Thread Hafsa Asif
Your question is very interesting. What I suggest is that you copy your output
into a text file, then read that text file in your code to create an RDD. Just
consider the word count example that ships with Spark; I love this example with
the Java client. Spark is an analytical engine whose slogan is to analyze very
big data, so from my point of view your assumption about the output size is
wrong.

You can also save your data in any repository in some structured form. This
will give you more exposure to Spark's behavior.
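The word count example mentioned above boils down to three steps: split each line into words (`flatMap`), pair each word with a count of 1 (`map`), and sum the counts per word (`reduceByKey`). As a rough local illustration only, the plain-Python sketch below applies the same steps to a text file, such as one holding saved command output. `collections.Counter` stands in for the per-key reduction that Spark would distribute across partitions, and `word_count` / `word_count_file` are hypothetical helper names.

```python
import os
import tempfile
from collections import Counter
from typing import Iterable


def word_count(lines: Iterable[str]) -> Counter:
    # flatMap step: split every line into whitespace-separated words.
    words = (word for line in lines for word in line.split())
    # map + reduceByKey steps: Counter adds up one occurrence per word.
    return Counter(words)


def word_count_file(path: str) -> Counter:
    """Hypothetical helper: count words in a text file, reading it line by line."""
    with open(path, encoding="utf-8") as f:
        return word_count(f)


if __name__ == "__main__":
    # Stand-in for a file that holds saved command output.
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
        tmp.write("to be or\nnot to be\n")
    print(word_count_file(tmp.name))
    os.remove(tmp.name)
```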



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Create-RDD-from-output-of-unix-command-tp23723p23830.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




Re: Create RDD from output of unix command

2015-07-14 Thread Igor Berman
Haven't you thought about Spark Streaming? There is a thread that could help:
https://www.mail-archive.com/user%40spark.apache.org/msg30105.html
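Spark Streaming processes an unbounded input as a sequence of small batches (micro-batches), which fits a command whose output keeps growing. The sketch below is only a local analogy in plain Python, not the Spark Streaming API. It reads a running command's stdout line by line and groups the lines into batches of a fixed size, whereas Spark Streaming cuts batches by a time interval. The `micro_batches` name and the `seq 1 5` command are illustrative assumptions.

```python
import subprocess
from typing import Iterator, List


def micro_batches(command: List[str], batch_size: int) -> Iterator[List[str]]:
    """Hypothetical helper: yield the command's stdout lines in batches of
    `batch_size` as they are produced, without reading all output first."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    with subprocess.Popen(command, stdout=subprocess.PIPE, text=True) as proc:
        batch: List[str] = []
        for line in proc.stdout:
            batch.append(line.rstrip("\n"))
            if len(batch) == batch_size:
                yield batch
                batch = []
        if batch:
            yield batch  # final, possibly smaller batch
    # Leaving the `with` block waits for the process, so returncode is set.
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, command)


if __name__ == "__main__":
    for batch in micro_batches(["seq", "1", "5"], batch_size=2):
        print(batch)
```

In actual Spark Streaming, a source such as `StreamingContext.socketTextStream` or `textFileStream` would deliver those lines to each batch; which one fits depends on how the command's output is exposed.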

On 14 July 2015 at 18:20, Hafsa Asif hafsa.a...@matchinguu.com wrote:





Create RDD from output of unix command

2015-07-08 Thread foobar
What's the best practice for creating an RDD from the output of an external
unix command? I assume that if the output is large (say, millions of lines),
creating an RDD from an array of all the lines is not a good idea? Thanks!



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Create-RDD-from-output-of-unix-command-tp23723.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




Re: Create RDD from output of unix command

2015-07-08 Thread Richard Marscher
As a distributed data processing engine, Spark should be fine with millions
of lines; it's built with massive data sets in mind. Do you have more details
on how you anticipate the output of a unix command interacting with a running
Spark application? Do you expect Spark to run continuously and somehow observe
unix command output? Or are you thinking of running a unix command once and
then running a Spark job against its output, in whatever format that is? If
it's the latter, it should be as simple as writing the command output to a
file and then loading that file into an RDD in Spark.

On Wed, Jul 8, 2015 at 2:02 PM, foobar heath...@fb.com wrote:





-- 
*Richard Marscher*
Software Engineer
Localytics
Localytics.com http://localytics.com/