Re: Spark Streaming output cannot be used as input?

2015-02-18 Thread Emre Sevinc
Hello Jose, We hit the same issue a couple of months ago. It is possible to write directly to files instead of creating directories, but it is not straightforward, and I haven't seen any clear demonstration in books, tutorials, etc. We do something like: SparkConf sparkConf = new

Re: Spark Streaming output cannot be used as input?

2015-02-18 Thread Sean Owen
To clarify, sometimes in the world of Hadoop people freely refer to an output 'file' when it's really a directory containing 'part-*' files which are pieces of the file. It's imprecise but that's the meaning. I think the scaladoc may be referring to 'the path to the file, which includes this
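
A minimal sketch of the point above, using only the Java standard library rather than the Hadoop or Spark APIs: it treats an output directory as one logical file by concatenating its 'part-*' pieces in name order. The class name PartFiles, the method mergeParts, and any paths passed to it are hypothetical, chosen only for illustration.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper (not part of Spark or Hadoop): views an output
// "file" that is really a directory of part-* pieces as one file.
class PartFiles {

    // Concatenates every regular file named part-* in outputDir into target,
    // in lexicographic name order (part-00000, part-00001, ...).
    // Entries that do not match the glob, such as a _SUCCESS marker, are skipped.
    // Assumption: target lives outside outputDir or does not start with "part-".
    static void mergeParts(Path outputDir, Path target) throws IOException {
        List<Path> parts = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(outputDir, "part-*")) {
            for (Path p : stream) {
                if (Files.isRegularFile(p)) {
                    parts.add(p);
                }
            }
        }
        // Directory listings have no guaranteed order, so sort explicitly.
        Collections.sort(parts);

        try (OutputStream out = Files.newOutputStream(target)) {
            for (Path part : parts) {
                Files.copy(part, out); // append this piece's bytes to the merged file
            }
        }
    }
}
```

A downstream reader that expects a single file could call something like PartFiles.mergeParts(Paths.get("output"), Paths.get("merged.txt")) first; both paths here are placeholders. On HDFS itself the same idea would go through the Hadoop FileSystem API instead of java.nio.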

Re: Spark Streaming output cannot be used as input?

2015-02-18 Thread Tim Smith
*To:* Emre Sevinc *Cc:* Jose Fernandez; user@spark.apache.org *Subject:* Re: Spark Streaming output cannot be used as input? To clarify, sometimes in the world of Hadoop people freely refer to an output 'file' when it's really a directory containing 'part-*' files which are pieces of the file. It's

RE: Spark Streaming output cannot be used as input?

2015-02-18 Thread Jose Fernandez
, February 18, 2015 1:53 AM To: Emre Sevinc Cc: Jose Fernandez; user@spark.apache.org Subject: Re: Spark Streaming output cannot be used as input? To clarify, sometimes in the world of Hadoop people freely refer to an output 'file' when it's really a directory containing 'part-*' files which are pieces

Spark Streaming output cannot be used as input?

2015-02-17 Thread Jose Fernandez
Hello folks,

Our intended use case is:
- Spark Streaming app #1 reads from RabbitMQ and outputs to HDFS
- Spark Streaming app #2 reads #1's output and stores the data into Elasticsearch

The idea behind this architecture is that if Elasticsearch is down due to an upgrade or