Re: Spark Streaming Checkpointing solutions

2015-07-21 Thread Dean Wampler

TD's Spark Summit talk offers suggestions (
https://spark-summit.org/2015/events/recipes-for-running-spark-streaming-applications-in-production/).
He recommends HDFS because you get its three-way replication, albeit
with extra overhead. I believe the driver doesn't need visibility into the
checkpoint directory, e.g., if you're running in client mode, but all the
cluster nodes need to see it to recover a lost stage, which might be
restarted on a different node. Hence, I would think NFS could work if all
nodes have the same mount, although there would be a lot of network
overhead. In some situations, a high-performance file system appliance,
e.g., NAS, could suffice.
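
For illustration only, here is a minimal PySpark sketch of what pointing the
checkpoint directory at HDFS might look like. It is not taken from TD's talk
or from this thread. The namenode address, the path, the application name,
the socket source, and the helper is_hdfs_uri are all hypothetical
placeholders, and the sketch assumes the DStream-based pyspark.streaming API.

```python
from urllib.parse import urlparse

# Hypothetical checkpoint location: host, port, and path are placeholders.
CHECKPOINT_DIR = "hdfs://namenode:8020/spark/checkpoints/my-streaming-app"


def is_hdfs_uri(uri):
    """Return True if the URI uses the hdfs:// scheme (hypothetical helper)."""
    return urlparse(uri).scheme == "hdfs"


def create_context():
    """Build a new StreamingContext; used only when no checkpoint exists yet."""
    # PySpark is imported lazily so the helper above works without it installed.
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    sc = SparkContext(appName="checkpoint-sketch")
    ssc = StreamingContext(sc, 10)  # 10-second batch interval
    ssc.checkpoint(CHECKPOINT_DIR)  # enable checkpointing to the shared directory

    # Placeholder pipeline: count lines arriving on a local socket.
    lines = ssc.socketTextStream("localhost", 9999)
    lines.count().pprint()
    return ssc


def main():
    from pyspark.streaming import StreamingContext

    if not is_hdfs_uri(CHECKPOINT_DIR):
        print("Warning: checkpoint directory is not on HDFS:", CHECKPOINT_DIR)

    # Recover from an existing checkpoint, or call create_context() if none is found.
    ssc = StreamingContext.getOrCreate(CHECKPOINT_DIR, create_context)
    ssc.start()
    ssc.awaitTermination()
```

To try it, call main() at the bottom of a script submitted with spark-submit.
Note that in this sketch getOrCreate runs in the driver and reads from
CHECKPOINT_DIR, so the driver needs to reach that path as well.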

My $0.02,
dean

Dean Wampler, Ph.D.
Author: Programming Scala, 2nd Edition
http://shop.oreilly.com/product/0636920033073.do (O'Reilly)
Typesafe http://typesafe.com
@deanwampler http://twitter.com/deanwampler
http://polyglotprogramming.com

On Tue, Jul 21, 2015 at 10:43 AM, Emmanuel fortin.emman...@gmail.com
wrote:

Hi,

I'm working on a Spark Streaming application and I would like to know what
is the best storage to use for checkpointing.

For testing purposes we're using NFS between the worker, the master and
the driver program (in client mode), but we have some issues with the
CheckpointWriter (1 thread dedicated). *My understanding is that NFS is not
a good candidate for this usage.*

1. What is the best solution for checkpointing and what are the
alternatives?

2. Do checkpoint directories need to be shared by the driver application
and the workers too?

Thanks for your replies

--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-Checkpointing-solutions-tp23932.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org

Re: Spark Streaming Checkpointing solutions

2015-07-21 Thread Emmanuel Fortin

Thank you for your reply. I will consider HDFS for the checkpoint storage.

On Tue, Jul 21, 2015 at 5:51 PM, Dean Wampler deanwamp...@gmail.com
wrote:

My $0.02,
dean

