[ https://issues.apache.org/jira/browse/SPARK-22382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
DUC LIEM NGUYEN updated SPARK-22382:
------------------------------------
    Description:

I've installed a system as follows:
--Mesos master: private IP 10.x.x.2, public IP 35.x.x.6
--Mesos slave: private IP 192.x.x.10, public IP 111.x.x.2

The master assigns the task to the slave successfully; however, the task fails. The error message is as follows:

{color:#d04437}{{Exception in thread "main" 17/10/11 22:38:01 ERROR RpcOutboxMessage: Ask timeout before connecting successfully
Caused by: org.apache.spark.rpc.RpcTimeoutException: Cannot receive any reply in 120 seconds. This timeout is controlled by spark.rpc.askTimeout}}{color}

When I look at the environment, spark.driver.host points to the private IP address of the master (10.x.x.2) instead of its public IP address (35.x.x.6). A Wireshark capture confirms this: there were failed TCP packets sent to the master's private IP address.

Now if I set spark.driver.bindAddress on the master to its local IP address and spark.driver.host on the master to its public IP address, I get the following message:

{{ERROR TaskSchedulerImpl: Lost executor 1 on myhostname.singnet.com.sg: Unable to create executor due to Cannot assign requested address.}}

From my understanding, spark.driver.bindAddress is applied on both the master and the slave, hence the slave gets that error. I'm really wondering how to properly set up Spark to work in this cluster over public IPs.


> Spark on mesos: doesn't support public IP setup for agent and master.
> ----------------------------------------------------------------------
>
> Key: SPARK-22382
> URL: https://issues.apache.org/jira/browse/SPARK-22382
> Project: Spark
> Issue Type: Question
> Components: Mesos
> Affects Versions: 2.1.0
> Reporter: DUC LIEM NGUYEN
>
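For reference, the configuration described above corresponds roughly to the Scala sketch below. This is only a restatement of the attempted setup, not a confirmed fix; the addresses are the placeholders from this report, and the Mesos master port 5050 is an assumption (the Mesos default). spark.driver.bindAddress is available since Spark 2.1.0, and the same properties can equivalently be passed to spark-submit via --conf.

{code:scala}
// Minimal sketch of the configuration described in this report (not a confirmed fix).
// Addresses are the placeholders used above; port 5050 is assumed (default Mesos master port).
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("public-ip-test")
  .setMaster("mesos://35.x.x.6:5050")           // reach the Mesos master on its public IP
  .set("spark.driver.bindAddress", "10.x.x.2")  // local interface the driver binds to
  .set("spark.driver.host", "35.x.x.6")         // public address advertised to executors
  .set("spark.rpc.askTimeout", "120s")          // the timeout reported in the stack trace

val sc = new SparkContext(conf)
{code}

With only the default spark.driver.host, executors try to reach the driver on the private 10.x.x.2 address and hit the RPC ask timeout; with spark.driver.bindAddress also set as above, the report states the executors fail with "Cannot assign requested address".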