Re: All masters are unresponsive issue

2015-07-05 Thread Aaron Davidson
Are you seeing this after the app has already been running for some time,
or just at the beginning? Generally, registration should only occur once
initially, and a timeout would be due to the master not being accessible.
Try telnetting to the master's IP/port from the machine on which the driver
will run.
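The telnet check Aaron suggests can also be scripted. Below is a minimal Python sketch of the same TCP reachability test; the hostname is a placeholder, and 7077 is the standalone master's default RPC port (use whatever appears in your `spark://host:port` master URL):

```python
import socket

def can_reach(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, and DNS failures
        return False

# Placeholder address: substitute the host/port from your master URL.
print(can_reach("spark-master.example.com", 7077))
```

If this returns False from the driver machine while the master process is up, the problem is network reachability (firewall, binding, or DNS), not Spark itself.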


Re: All masters are unresponsive issue

2015-07-04 Thread Ted Yu
Currently the number of retries is hardcoded.

You may want to open a JIRA which makes the retry count configurable.

Cheers

On Thu, Jul 2, 2015 at 8:35 PM, luohui20...@sina.com wrote:

 Hi there,

 I checked the source code and found that in
 org.apache.spark.deploy.client.AppClient there is a parameter (line 52):

   val REGISTRATION_TIMEOUT = 20.seconds

   val REGISTRATION_RETRIES = 3

 If I want to increase the retry count, must I modify this value, rebuild
 the entire Spark project, and then redeploy the Spark cluster with my
 modified version?

 Or is there a better way to solve this issue?

 Thanks.




 

 Thanks & Best regards!
 San.Luo

 - Original Message -
 From: luohui20...@sina.com
 To: user user@spark.apache.org
 Subject: All masters are unresponsive issue
 Date: 2015-07-02 17:31

 Hi there:

   I got a problem: the application was killed with "All masters are
 unresponsive! Giving up." I checked the network I/O and found it is
 sometimes very high while my app is running. Please refer to the attached
 picture for more info.

 I also checked
 http://databricks.gitbooks.io/databricks-spark-knowledge-base/content/troubleshooting/connectivity_issues.html,
 and set SPARK_LOCAL_IP in every node's spark-env.sh of my Spark cluster.
 However, it did not help solve this problem.

 I am not sure whether this parameter is set correctly; my setting is like this:

 On node1:

 export SPARK_LOCAL_IP={node1's IP}

 On node2:

 export SPARK_LOCAL_IP={node2's IP}

 ..

 BTW, I guess Akka retries 3 times when the master and slaves communicate.
 Is it possible to increase the Akka retries?

 Aside from expanding the network bandwidth, is there another way to solve
 this problem?

 Thanks for any ideas.

 

 Thanks & Best regards!
 San.Luo



All masters are unresponsive issue

2015-07-02 Thread luohui20001
Hi there:

  I got a problem: the application was killed with "All masters are
unresponsive! Giving up." I checked the network I/O and found it is
sometimes very high while my app is running. Please refer to the attached
picture for more info.

I also checked
http://databricks.gitbooks.io/databricks-spark-knowledge-base/content/troubleshooting/connectivity_issues.html,
and set SPARK_LOCAL_IP in every node's spark-env.sh of my Spark cluster.
However, it did not help solve this problem.

I am not sure whether this parameter is set correctly; my setting is like this:

On node1:

export SPARK_LOCAL_IP={node1's IP}

On node2:

export SPARK_LOCAL_IP={node2's IP}

..

BTW, I guess Akka retries 3 times when the master and slaves communicate.
Is it possible to increase the Akka retries?

Aside from expanding the network bandwidth, is there another way to solve
this problem?

Thanks for any ideas.
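A common cause of this symptom is a node whose hostname resolves to a different interface than the one set in SPARK_LOCAL_IP. A quick sketch to see what the local hostname actually resolves to, so it can be compared against the exported value (this is a diagnostic aid, not part of Spark):

```python
import socket

def local_addresses() -> list:
    """Return the sorted IPv4 addresses the local hostname resolves to."""
    hostname = socket.gethostname()
    infos = socket.getaddrinfo(hostname, None, socket.AF_INET)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})

print(socket.gethostname(), local_addresses())
```

If the printed address differs from the SPARK_LOCAL_IP you exported on that node, the daemons may be binding to an unreachable interface.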



 

Thanks & Best regards!
San.Luo

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org

Re: All masters are unresponsive issue

2015-07-02 Thread luohui20001
Hi there,

I checked the source code and found that in
org.apache.spark.deploy.client.AppClient there is a parameter (line 52):

  val REGISTRATION_TIMEOUT = 20.seconds

  val REGISTRATION_RETRIES = 3

If I want to increase the retry count, must I modify this value, rebuild the
entire Spark project, and then redeploy the Spark cluster with my modified
version?

Or is there a better way to solve this issue?

Thanks.
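For reference, the registration logic those two constants drive amounts to a bounded retry loop. The sketch below mirrors that pattern in Python with the same constants; it is an illustration of the technique, not Spark's actual code:

```python
import time

REGISTRATION_TIMEOUT_S = 20  # mirrors AppClient's REGISTRATION_TIMEOUT (20.seconds)
REGISTRATION_RETRIES = 3     # mirrors the hardcoded REGISTRATION_RETRIES

def register_with_retries(attempt_register,
                          retries=REGISTRATION_RETRIES,
                          timeout_s=REGISTRATION_TIMEOUT_S):
    """Call attempt_register up to `retries` times, waiting `timeout_s`
    between failed attempts; return True on first success, else False."""
    for attempt in range(1, retries + 1):
        if attempt_register():
            return True
        if attempt < retries:
            time.sleep(timeout_s)
    return False
```

Because both values are compile-time constants in this Spark version, raising them does indeed mean patching and rebuilding, which is why making them configurable via a JIRA is the cleaner path.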





 

Thanks & Best regards!
San.Luo

- Original Message -
From: luohui20...@sina.com
To: user user@spark.apache.org
Subject: All masters are unresponsive issue
Date: 2015-07-02 17:31

Hi there:

  I got a problem: the application was killed with "All masters are
unresponsive! Giving up." I checked the network I/O and found it is
sometimes very high while my app is running. Please refer to the attached
picture for more info.

I also checked
http://databricks.gitbooks.io/databricks-spark-knowledge-base/content/troubleshooting/connectivity_issues.html,
and set SPARK_LOCAL_IP in every node's spark-env.sh of my Spark cluster.
However, it did not help solve this problem.

I am not sure whether this parameter is set correctly; my setting is like this:

On node1:

export SPARK_LOCAL_IP={node1's IP}

On node2:

export SPARK_LOCAL_IP={node2's IP}

..

BTW, I guess Akka retries 3 times when the master and slaves communicate.
Is it possible to increase the Akka retries?

Aside from expanding the network bandwidth, is there another way to solve
this problem?

Thanks for any ideas.



 

Thanks & Best regards!
San.Luo