[ 
https://issues.apache.org/jira/browse/SINGA-201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15346050#comment-15346050
 ] 

ASF subversion and git services commented on SINGA-201:
-------------------------------------------------------

Commit 1ca8c638b132009e213fda8e02e77cc2d09fb824 in incubator-singa's branch 
refs/heads/master from [~ug93tad]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-singa.git;h=1ca8c63 ]

SINGA-201 Error when running Mesos

A bug was reported (https://issues.apache.org/jira/browse/SINGA-201) when
launching SINGA on Mesos in fully distributed mode.

The root cause was ZeroMQ binding to localhost. In fully distributed mode, the
SINGA process on each node should be passed a `-host` flag specifying that
node's public IP address.
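
To illustrate why a localhost endpoint is a problem in fully distributed mode,
here is a minimal sketch. It is not SINGA code: `is_loopback_endpoint` is a
hypothetical helper, and the endpoint strings simply follow the
`tcp://host:port` form seen in the log. It only reports whether an endpoint's
host is a loopback address, which processes on other nodes cannot reach.

```python
import ipaddress


def is_loopback_endpoint(endpoint: str) -> bool:
    """Return True if the host part of a tcp://host:port endpoint is loopback.

    Hypothetical helper for illustration; not part of SINGA or ZeroMQ.
    """
    # Drop the scheme, then the port ("*" asks ZeroMQ for any free port).
    host = endpoint.split("://", 1)[1].rsplit(":", 1)[0]
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal (e.g. a DNS name); this sketch does not resolve it.
        return False
```

With this check, `tcp://localhost:*` is flagged as loopback, while an endpoint
built from a routable address (203.0.113.7 is used here purely as a
documentation example) is not.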

The Mesos scheduler is modified accordingly:

1. When a Mesos slave starts connecting to the master, it passes the
`--hostname` flag specifying its public IP address.

2. The scheduler now sends each executor a command of the form:

          `singa -conf ./job.conf -singa_conf ./singa.conf -singa_job XX -host XX`
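
As a rough sketch of how such a command line could be assembled per executor
(the real scheduler is not shown in this message, so the function name
`build_executor_command` and its parameters are assumptions made for
illustration; only the flags and file paths come from the form above):

```python
def build_executor_command(job_id: str, host: str) -> str:
    """Assemble the executor command in the form quoted above.

    job_id and host fill the two XX placeholders. The function and
    parameter names are illustrative, not taken from the scheduler source.
    """
    args = [
        "singa",
        "-conf", "./job.conf",
        "-singa_conf", "./singa.conf",
        "-singa_job", job_id,
        "-host", host,  # public IP address of the node running the executor
    ]
    return " ".join(args)
```

For example, `build_executor_command("17", "203.0.113.7")` yields a command
ending in `-singa_job 17 -host 203.0.113.7`; both values are placeholders.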


> Error while running singa on mesos in fully distributed mode
> ------------------------------------------------------------
>
>                 Key: SINGA-201
>                 URL: https://issues.apache.org/jira/browse/SINGA-201
>             Project: Singa
>          Issue Type: Bug
>         Environment: Linux 
>            Reporter: Venkata Satish Katta
>            Assignee: Anh Dinh
>            Priority: Blocker
>              Labels: mesos, singa
>
> Log file created at: 2016/06/17 10:00:43
> Running on machine: ip-172-31-52-12
> Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
> I0617 10:00:43.202184  2751 zk_service.cc:215] GLOBAL_WATCHER connected to zookeeper successfully!
> W0617 10:00:43.203711  2742 zk_service.cc:109] zookeeper node /singa already exists
> W0617 10:00:43.205016  2742 zk_service.cc:109] zookeeper node /singa/app already exists
> W0617 10:00:43.206166  2742 zk_service.cc:109] zookeeper node /singa/app/job-0000000017 already exists
> W0617 10:00:43.207147  2742 zk_service.cc:109] zookeeper node /singa/app/job-0000000017/group already exists
> W0617 10:00:43.208237  2742 zk_service.cc:109] zookeeper node /singa/app/job-0000000017/proc already exists
> W0617 10:00:43.209300  2742 zk_service.cc:109] zookeeper node /singa/app/job-0000000017/proc-lock already exists
> F0617 10:00:43.862246  2742 socket.cc:98] Check failed: port != -1 (-1 vs. -1) tcp://localhost:*
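
For anyone triaging similar logs, the line format quoted above
(`[IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg`) can be parsed with a
short regular expression. This is a sketch only: `parse_log_line` is a
hypothetical helper, and it expects each log entry on a single line with the
`> ` quote prefix already removed.

```python
import re

# Mirrors "[IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg".
LOG_LINE = re.compile(
    r"^(?P<level>[IWEF])(?P<mmdd>\d{4}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+"
    r"(?P<thread>\d+) (?P<file>[^:\s]+):(?P<line>\d+)\] (?P<msg>.*)$"
)


def parse_log_line(line: str):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_LINE.match(line)
    return match.groupdict() if match else None
```

Filtering on `level == "F"` then isolates the fatal check in `socket.cc`.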



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
