Re: Writing to a single file from multiple executors

2015-03-12 Thread Tathagata Das
If you use DStream.saveAsHadoopFiles (or the equivalent RDD operations) with
the appropriate output format for Avro, then each partition of each RDD will
be written to a different file. However, this will probably produce a
large number of small files, and you may have to run a separate compaction
phase to coalesce them into larger files.
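
To make the shape of this concrete, here is a minimal sketch in plain Python
(not Spark code). It assumes JSON lines as a stand-in for Avro, and the names
write_batch, compact and demo are hypothetical helpers invented for the
illustration. It shows each partition of a batch going to its own part file
under a "prefix-TIME_IN_MS.suffix" directory, followed by a separate
compaction pass that rewrites the small files into one larger file.

```python
import json
import os
import tempfile


def write_batch(partitions, out_root, prefix, batch_time_ms, suffix="jsonl"):
    """Write each partition of one batch to its own part file.

    The directory name follows the "prefix-TIME_IN_MS.suffix" pattern used
    by the DStream save operations. JSON lines stand in for Avro here.
    """
    batch_dir = os.path.join(out_root, f"{prefix}-{batch_time_ms}.{suffix}")
    os.makedirs(batch_dir, exist_ok=True)
    paths = []
    for index, records in enumerate(partitions):
        # One writer per file, so no two tasks ever touch the same file.
        path = os.path.join(batch_dir, f"part-{index:05d}")
        with open(path, "w", encoding="utf-8") as handle:
            for record in records:
                handle.write(json.dumps(record) + "\n")
        paths.append(path)
    return paths


def compact(small_files, target_path):
    """Re-read records from many small files and rewrite them as one file.

    Records are decoded and re-encoded rather than concatenated as raw
    bytes, because container formats such as Avro carry a per-file header.
    """
    count = 0
    with open(target_path, "w", encoding="utf-8") as out:
        for path in sorted(small_files):
            with open(path, encoding="utf-8") as handle:
                for line in handle:
                    out.write(json.dumps(json.loads(line)) + "\n")
                    count += 1
    return count


def demo():
    """Write two batches of three partitions each, then compact them."""
    root = tempfile.mkdtemp()
    written = []
    for batch_time_ms in (1000, 2000):
        partitions = [
            [{"batch": batch_time_ms, "partition": p, "seq": s} for s in range(2)]
            for p in range(3)
        ]
        written += write_batch(partitions, root, "events", batch_time_ms)
    total = compact(written, os.path.join(root, "compacted.jsonl"))
    return len(written), total
```

Calling demo() writes six small part files and then merges their twelve
records into a single file. In a real job the compaction step would more
likely be a separate batch job that reads the small Avro files and writes
them back out with fewer partitions.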
On Mar 12, 2015 9:47 AM, Maiti, Samya samya.ma...@philips.com wrote:

  Hi TD,

  I want to append my records to an Avro file, which will later be used for
 querying.

  Having a single file is not mandatory for us, but then how can we make
 the executors append the Avro data to multiple files?

  Thanks,
 Sam
  On Mar 12, 2015, at 4:09 AM, Tathagata Das t...@databricks.com wrote:

  Why do you have to write a single file?



 On Wed, Mar 11, 2015 at 1:00 PM, SamyaMaiti samya.maiti2...@gmail.com
 wrote:

 Hi Experts,

 I have a scenario in which I want to write to an Avro file from a
 streaming job that reads data from Kafka.

 The issue is that there are multiple executors, and when they all try to
 write to a given file I get a concurrent exception.

 One way to mitigate the issue is to repartition and have a single writer
 task, but as my data is huge that is not a feasible option.

 Any suggestions are welcome.

 Regards,
 Sam



 --
 View this message in context:
 http://apache-spark-user-list.1001560.n3.nabble.com/Writing-to-a-single-file-from-multiple-executors-tp22003.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com.

 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org




 --
 The information contained in this message may be confidential and legally
 protected under applicable law. The message is intended solely for the
 addressee(s). If you are not the intended recipient, you are hereby
 notified that any use, forwarding, dissemination, or reproduction of this
 message is strictly prohibited and may be unlawful. If you are not the
 intended recipient, please contact the sender by return e-mail and destroy
 all copies of the original message.



Re: Writing to a single file from multiple executors

2015-03-12 Thread Maiti, Samya
Hi TD,

I want to append my records to an Avro file, which will later be used for
querying.

Having a single file is not mandatory for us, but then how can we make the
executors append the Avro data to multiple files?

Thanks,
Sam

Re: Writing to a single file from multiple executors

2015-03-11 Thread Tathagata Das
Why do you have to write a single file?


