Re: Accessing log for lost executors

2016-12-02 Thread Benyi Wang
Usually your executors were killed by YARN for exceeding memory limits. You can check the NodeManager's log to see if your application got killed, or use the command "yarn logs -applicationId " to download the logs. On Thu, Dec 1, 2016 at 10:30 PM, Nisrina Luthfiyati <nisrina.luthfiy...@gmail.com>
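For reference, the full form of that command takes the application ID reported by the ResourceManager (shown here with a placeholder, since the ID is not included in the message above), and it only returns logs if YARN log aggregation is enabled on the cluster:

yarn logs -applicationId <application ID>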

Accessing log for lost executors

2016-12-01 Thread Nisrina Luthfiyati
Hi all, I'm trying to troubleshoot an ExecutorLostFailure issue. In the Spark UI I noticed that the executors tab only lists active executors; is there any way I can see the logs for dead executors so that I can find out why they're dead/lost? I'm using Spark 1.5.2 on YARN 2.7.1. Thanks! Nisrina

Lost executors failed job unable to execute spark examples Triangle Count (Analytics triangles)

2016-02-16 Thread Ovidiu-Cristian MARCU
Hi, I am able to run the Triangle Count example with some smaller graphs, but when I use http://snap.stanford.edu/data/com-Friendster.html I am not able to get the job to finish successfully. For some reason Spark loses its executors. No matter what

Re: PySpark Lost Executors

2015-11-19 Thread Ross.Cramblit
Thank you Ted and Sandy for getting me pointed in the right direction. From the logs: WARN yarn.YarnAllocator: Container killed by YARN for exceeding memory limits. 25.4 GB of 25.3 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead. On Nov 19, 2015, at 12:20 PM,
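A minimal PySpark sketch of the suggested fix, assuming the overhead is set from the driver before the SparkContext is created; the app name, 20g executor size, and 2048 MB overhead are illustrative values only, not ones taken from this thread:

from pyspark import SparkConf, SparkContext

# Reserve extra off-heap headroom per executor so YARN's physical-memory
# check is not tripped; must be set before the SparkContext exists.
conf = (SparkConf()
        .setAppName("memory-overhead-example")
        .set("spark.executor.memory", "20g")
        .set("spark.yarn.executor.memoryOverhead", "2048"))  # in MB
sc = SparkContext(conf=conf)

The same two settings can equally be passed as --conf arguments to spark-submit.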

Re: PySpark Lost Executors

2015-11-19 Thread Ross.Cramblit
Hmm, I guess I do not - I get 'application_1445957755572_0176 does not have any log files.' Where can I enable log aggregation? On Nov 19, 2015, at 11:07 AM, Ted Yu wrote: Do you have YARN log aggregation enabled? You can try retrieving log for

Re: PySpark Lost Executors

2015-11-19 Thread Ted Yu
Do you have YARN log aggregation enabled? You can try retrieving the log for the container using the following command: yarn logs -applicationId application_1445957755572_0176 -containerId container_1445957755572_0176_01_03 Cheers On Thu, Nov 19, 2015 at 8:02 AM,

PySpark Lost Executors

2015-11-19 Thread Ross.Cramblit
I am running Spark 1.5.2 on YARN. My job consists of a number of SparkSQL transforms on a JSON data set that I load into a data frame. The data set is not large (~100GB) and most stages execute without any issues. However, some more complex stages tend to lose executors/nodes regularly. What

Re: PySpark Lost Executors

2015-11-19 Thread Sandy Ryza
Hi Ross, This is most likely occurring because YARN is killing containers for exceeding physical memory limits. You can make this less likely to happen by bumping spark.yarn.executor.memoryOverhead to something higher than 10% of your spark.executor.memory. -Sandy On Thu, Nov 19, 2015 at 8:14

Re: PySpark Lost Executors

2015-11-19 Thread Ted Yu
Here are the parameters related to log aggregation: yarn.log-aggregation-enable = true, yarn.log-aggregation.retain-seconds = 2592000, yarn.nodemanager.log-aggregation.compression-type = gz

Re: Lost executors

2014-11-20 Thread Pala M Muthaia
these errors, and continues to show errors about lost executors and launching new executors, and this just continues for a long time. Could this be because the executors are running out of memory? In terms of memory usage, the intermediate data could be large (after the HBase lookup

Lost executors

2014-11-18 Thread Pala M Muthaia
-22] WARN org.apache.spark.storage.BlockManagerMasterActor - *Removing BlockManager BlockManagerId(9186, machine name, 54600, 0) with no recent heart beats: 82313ms exceeds 45000ms* Looking at the logs, the job never recovers from these errors, and continues to show errors about lost executors
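The 45000ms in that message is the default BlockManager slave timeout in these 1.x releases. A hedged PySpark sketch of raising it is below; the 120000 ms value is only an example, the thread's own code may be Scala, and a longer timeout only masks the missed heartbeats if heavy GC or overload is the real cause:

from pyspark import SparkConf, SparkContext

# Allow executors more time between BlockManager heartbeats before the
# driver removes them (illustrative value; find the root cause too).
conf = SparkConf().set("spark.storage.blockManagerSlaveTimeoutMs", "120000")
sc = SparkContext(conf=conf)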

Re: Lost executors

2014-11-18 Thread Sandy Ryza
org.apache.spark.storage.BlockManagerMasterActor - *Removing BlockManager BlockManagerId(9186, machine name, 54600, 0) with no recent heart beats: 82313ms exceeds 45000ms* Looking at the logs, the job never recovers from these errors, and continues to show errors about lost executors and launching new executors

Re: Lost executors

2014-11-18 Thread Pala M Muthaia
org.apache.spark.storage.BlockManagerMasterActor - *Removing BlockManager BlockManagerId(9186, machine name, 54600, 0) with no recent heart beats: 82313ms exceeds 45000ms* Looking at the logs, the job never recovers from these errors, and continues to show errors about lost executors

Re: Lost executors

2014-08-13 Thread rpandya
of memory if I tried to cache() the RDD, but I would hope that persist() is implemented so that it would stream to disk without trying to materialize too much data in RAM. Ravi -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Lost-executors-tp11722p12032.html
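A small PySpark illustration of the cache()-versus-persist() distinction being discussed (a later message notes this thread's code is Scala, where the same storage levels apply; the sample data and the existing SparkContext `sc` are assumptions for the sketch):

from pyspark import StorageLevel

rdd = sc.parallelize(range(1000000))       # stand-in for the real data set
# cache() keeps partitions in memory only; an explicit storage level lets
# partitions that don't fit spill to local disk instead of failing.
rdd.persist(StorageLevel.MEMORY_AND_DISK)
# rdd.persist(StorageLevel.DISK_ONLY)      # alternative: keep nothing in executor memory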

Re: Lost executors

2014-08-13 Thread Matei Zaharia
() is implemented so that it would stream to disk without trying to materialize too much data in RAM. Ravi -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Lost-executors-tp11722p12032.html

Re: Lost executors

2014-08-13 Thread Shivaram Venkataraman
that it would stream to disk without trying to materialize too much data in RAM. Ravi -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Lost-executors-tp11722p12032.html

Re: Lost executors

2014-08-13 Thread Andrew Or

Re: Lost executors

2014-08-13 Thread rpandya
seriously corrupted so I need to rebuild my HDP cluster... Thanks, Ravi -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Lost-executors-tp11722p12050.html

Re: Lost executors

2014-08-13 Thread Andrew Or
my HDFS got seriously corrupted so I need to rebuild my HDP cluster... Thanks, Ravi -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Lost-executors-tp11722p12050.html

Re: Lost executors

2014-08-08 Thread Avishek Saha

Re: Lost executors

2014-08-08 Thread rpandya
Hi Avishek, I'm running on a manual cluster setup, and all the code is Scala. The load averages don't seem high when I see these failures (about 12 on a 16-core machine). Ravi -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Lost-executors-tp11722p11819

Lost executors

2014-08-07 Thread rpandya
, Ravi Pandya Microsoft Research -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Lost-executors-tp11722.html

Lost executors

2014-07-23 Thread Eric Friedman
I'm using Spark 1.0.1 on a quite large cluster, with gobs of memory, etc. Cluster resources are available to me via YARN and I am seeing these errors quite often: ERROR YarnClientClusterScheduler: Lost executor 63 on host: remote Akka client disassociated. This is in an interactive shell

Re: Lost executors

2014-07-23 Thread Andrew Or
Hi Eric, Have you checked the executor logs? It is possible they died because of some exception, and the message you see is just a side effect. Andrew 2014-07-23 8:27 GMT-07:00 Eric Friedman eric.d.fried...@gmail.com: I'm using spark 1.0.1 on a quite large cluster, with gobs of memory, etc.

Re: Lost executors

2014-07-23 Thread Eric Friedman
Hi Andrew, Thanks for your note. Yes, I see a stack trace now. It seems to be an issue with Python interpreting a function I wish to apply to an RDD. The stack trace is below. The function is a simple factorial: def f(n): if n == 1: return 1 return n * f(n-1) and I'm trying to use it
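Reflowed, with a hypothetical usage line added (the thread does not show the actual RDD contents; `sc` is the shell's SparkContext), the function is:

def f(n):
    # simple recursive factorial; assumes n is a positive integer
    if n == 1:
        return 1
    return n * f(n - 1)

results = sc.parallelize([1, 2, 3, 4, 5]).map(f).collect()
# [1, 2, 6, 24, 120]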

Re: Lost executors

2014-07-23 Thread Eric Friedman
And... PEBCAK. I mistakenly believed I had set PYSPARK_PYTHON to a Python 2.7 install, but on the remote nodes it pointed to a Python 2.6 install, hence incompatible with what the master was sending. I have set this to point to the correct version everywhere and it works. Apologies for the false
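For anyone hitting the same mismatch: the worker-side interpreter is selected by the PYSPARK_PYTHON environment variable, which must resolve to the same Python version on the driver and on every worker node, for example (the path is only an illustrative assumption):

export PYSPARK_PYTHON=/usr/bin/python2.7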