RE: Handling worker batch processing during driver shutdown

2015-03-13 Thread Jose Fernandez
and fails silently. I really appreciate your help, but it looks like I’m back to the drawing board on this one.

RE: Handling worker batch processing during driver shutdown

2015-03-12 Thread Jose Fernandez
function calls and even used try/catch around it. I’m running in yarn-cluster mode using Spark 1.2 on CDH 5.3. I stop the application with yarn application -kill appID.
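
The direction this thread points at is stopping the context gracefully from inside the driver, since an external kill never reaches worker cleanup code. A minimal Scala sketch, assuming the Spark 1.2-era ssc.stop(stopSparkContext, stopGracefully) API; the marker-file trigger and queueStream source are hypothetical stand-ins for a real shutdown signal and input:

import scala.collection.mutable.Queue
import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext}

object GracefulStop {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("GracefulStop"), Seconds(10))
    // Placeholder input and output operation so the sketch runs locally.
    val queue = new Queue[RDD[Int]]()
    ssc.queueStream(queue).print()
    ssc.start()
    // Poll for a marker file instead of waiting to be killed;
    // yarn application -kill gives this code no chance to run.
    while (!new java.io.File("/tmp/stop-streaming").exists()) Thread.sleep(5000)
    // stopGracefully = true waits for queued batches to finish before
    // the executors (and any cleanup logic in them) are torn down.
    ssc.stop(true, true)
  }
}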

RE: Spark Streaming output cannot be used as input?

2015-02-18 Thread Jose Fernandez
To clarify, sometimes in the world of Hadoop people freely refer to an output 'file' when it's really a directory containing 'part-*' files which are pieces
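
Concretely, the downstream reader has to treat that "file" as a directory and glob over the pieces. A short Scala sketch; the paths are illustrative:

import org.apache.spark.{SparkConf, SparkContext}

object ReadPartFiles {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("ReadPartFiles"))
    // The "file" written upstream is really a directory such as
    // events-1424300000000/ containing part-00000, part-00001, ...
    val lines = sc.textFile("hdfs:///out/events-*/part-*")
    println(lines.count())
    sc.stop()
  }
}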

RE: NotSerializableException: org.apache.http.impl.client.DefaultHttpClient when trying to send documents to Solr

2015-02-18 Thread Jose Fernandez
You need to instantiate the server in the foreachPartition block or Spark will attempt to serialize it and ship it with the task. See the design patterns section in the Spark Streaming programming guide.
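
A minimal Scala sketch of that pattern, assuming SolrJ 4.x (HttpSolrServer) and a DStream of (id, text) pairs; the host, collection, and field names are illustrative:

import org.apache.solr.client.solrj.impl.HttpSolrServer
import org.apache.solr.common.SolrInputDocument

// dstream: DStream[(String, String)] of (id, text) pairs, defined elsewhere.
dstream.foreachRDD { rdd =>
  rdd.foreachPartition { records =>
    // Created on the executor, inside the partition loop, so the
    // non-serializable client is never shipped with the task closure.
    val server = new HttpSolrServer("http://solrhost:8983/solr/collection1")
    records.foreach { case (id, text) =>
      val doc = new SolrInputDocument()
      doc.addField("id", id)
      doc.addField("text", text)
      server.add(doc)
    }
    server.commit()
    server.shutdown()
  }
}

The design patterns section refines this further by reusing a pooled connection across batches rather than opening a fresh one per partition.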

Spark Streaming output cannot be used as input?

2015-02-17 Thread Jose Fernandez
Hello folks,

Our intended use case is:
- Spark Streaming app #1 reads from RabbitMQ and outputs to HDFS
- Spark Streaming app #2 reads #1's output and stores the data into Elasticsearch

The idea behind this architecture is that if Elasticsearch is down due to an upgrade or
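
A minimal sketch of app #2's read side, assuming app #1 drops text files into a single watched HDFS directory and that the elasticsearch-hadoop connector is on the classpath; the paths, host, and index names are illustrative:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.elasticsearch.spark._ // elasticsearch-hadoop connector

object HdfsToEs {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("HdfsToEs")
      .set("es.nodes", "eshost:9200")
    val ssc = new StreamingContext(conf, Seconds(30))
    // textFileStream only picks up files moved atomically into the watched
    // directory; it does not recurse into the per-batch subdirectories that
    // saveAsTextFiles creates, so app #1's output may need to be moved or
    // flattened into this path first.
    val lines = ssc.textFileStream("hdfs:///staging/app1-output")
    lines.foreachRDD { rdd =>
      rdd.map(line => Map("line" -> line)).saveToEs("app1/events")
    }
    ssc.start()
    ssc.awaitTermination()
  }
}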