spark.local.dir and spark.worker.dir not used
Hi,

I am using Spark 1.0.0. In my Spark code I am trying to persist an RDD to disk with rdd.persist(DISK_ONLY), but unfortunately I could not find the location where the RDD was written to disk. I set SPARK_LOCAL_DIRS and SPARK_WORKER_DIR to a location other than the default /tmp directory, but I still could not see anything in either the worker directory or the Spark local directory. I also tried setting the local dir and worker dir from the Spark code while defining the SparkConf, as conf.set("spark.local.dir", "/home/padma/sparkdir"), but those directories are not used. In general, which directories does Spark use for map output files, intermediate writes, and persisting RDDs to disk?

Thanks,
Padma Ch
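For reference, a minimal sketch of the setup described above. The path is the one from the message and the app name is illustrative; note that spark.local.dir must be set before the SparkContext is constructed, since it is read at context start-up:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object PersistToDisk {
  def main(args: Array[String]): Unit = {
    // spark.local.dir must be set before the SparkContext is created;
    // changing it afterwards has no effect.
    val conf = new SparkConf()
      .setAppName("persist-example") // illustrative name
      .set("spark.local.dir", "/home/padma/sparkdir") // path from the message

    val sc = new SparkContext(conf)
    val rdd = sc.parallelize(1 to 1000)
    rdd.persist(StorageLevel.DISK_ONLY)
    rdd.count() // persist is lazy: blocks are written on the first action
    sc.stop()
  }
}
```

Also be aware that in a standalone cluster the SPARK_LOCAL_DIRS environment variable on the worker takes precedence over a value set in SparkConf, which can make the programmatic setting appear to be ignored.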
RE: spark.local.dir and spark.worker.dir not used
Hi,

spark.local.dir is the directory used to write map output data and persisted RDD blocks, but the file paths are hashed, so you cannot directly pick out the persisted RDD block files; they will definitely be in these folders on your worker nodes, though.

Thanks
Jerry

From: Priya Ch [mailto:learnings.chitt...@gmail.com]
Sent: Tuesday, September 23, 2014 6:31 PM
To: user@spark.apache.org; d...@spark.apache.org
Subject: spark.local.dir and spark.worker.dir not used
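Since the subdirectory paths are hashed, the block files are easier to find by name pattern than by path. A small sketch (the directory is the illustrative path from the original message) that walks spark.local.dir looking for persisted RDD block files, which Spark names rdd_&lt;rddId&gt;_&lt;partitionId&gt;:

```scala
import java.io.File

object FindRddBlocks {
  // Recursively collect files whose names look like persisted RDD blocks
  // (rdd_<rddId>_<partitionId>, buried inside hashed subdirectories).
  def findBlocks(dir: File): Seq[File] = {
    val children = Option(dir.listFiles()).getOrElse(Array.empty[File]).toSeq
    children.flatMap {
      case d if d.isDirectory                => findBlocks(d)
      case f if f.getName.startsWith("rdd_") => Seq(f)
      case _                                 => Seq.empty
    }
  }

  def main(args: Array[String]): Unit = {
    // Illustrative path taken from the original message; run this on a worker node.
    findBlocks(new File("/home/padma/sparkdir")).foreach(println)
  }
}
```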
Re: spark.local.dir and spark.worker.dir not used
Is it possible to view the persisted RDD blocks? If I use YARN and the RDD blocks are persisted to HDFS, will I then be able to read the HDFS blocks as I can in Hadoop?

On Tue, Sep 23, 2014 at 5:56 PM, Shao, Saisai [via Apache Spark User List] ml-node+s1001560n14885...@n3.nabble.com wrote:
Re: spark.local.dir and spark.worker.dir not used
I couldn't even see the spark-local-&lt;id&gt; folder in the default /tmp directory of spark.local.dir.

On Tue, Sep 23, 2014 at 6:01 PM, Priya Ch learnings.chitt...@gmail.com wrote:
RE: spark.local.dir and spark.worker.dir not used
This folder is created under your spark.local.dir when you start your Spark application, with "spark-local-xxx" as its name prefix. It is quite strange that you don't see this folder; maybe you are missing something. Besides, if Spark cannot create this folder at start-up, persisting an RDD to disk will fail. Also, I think there is no way to persist an RDD to HDFS, even on YARN; only an RDD's checkpoint can save data to HDFS.

Thanks
Jerry

From: Chitturi Padma [mailto:learnings.chitt...@gmail.com]
Sent: Tuesday, September 23, 2014 8:33 PM
To: u...@spark.incubator.apache.org
Subject: Re: spark.local.dir and spark.worker.dir not used
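As Jerry notes, checkpointing (not persist) is what writes RDD data to HDFS. A minimal sketch, assuming an HDFS path the application can write to (the path and app name below are placeholders):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object CheckpointToHdfs {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("checkpoint-example"))

    // Checkpoint data is written here, not under spark.local.dir.
    sc.setCheckpointDir("hdfs:///user/padma/checkpoints") // placeholder path

    val rdd = sc.parallelize(1 to 1000).map(_ * 2)
    rdd.checkpoint() // marks the RDD for checkpointing
    rdd.count()      // the checkpoint is materialized on the first action

    // The resulting files are visible with `hdfs dfs -ls`, but they are in
    // Spark's serialized block format, not plain text.
    sc.stop()
  }
}
```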