Re: Error while running singa on mesos

Anh Dinh Tue, 21 Jun 2016 23:53:57 -0700

We had problems with Docker version >= 1.9 (yours is even newer), as noted
in https://singa.incubator.apache.org/docs/docker.html#launch_pseudo


Basically, newer versions of Docker changed the DNS resolution mechanism: the
Docker daemon no longer updates the /etc/hosts file of existing containers
when a new one is launched.
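As a rough aid, the sketch below checks whether a given Docker version falls in the range reported as problematic above (>= 1.9). The function name `docker_version_affected` is hypothetical, and the comparison assumes GNU coreutils' `sort -V` for version ordering.

```bash
# docker_version_affected VERSION
# Returns 0 (true) if VERSION >= 1.9, the range reported as problematic.
docker_version_affected() {
  local version="$1"
  local threshold="1.9"
  # sort -V orders version strings; if the threshold sorts first (or the two
  # are equal), then VERSION is at least the threshold.
  [ "$(printf '%s\n%s\n' "$threshold" "$version" | sort -V | head -n1)" = "$threshold" ]
}
```

For example, `docker_version_affected "$(docker --version | sed -E 's/^Docker version ([0-9.]+).*/\1/')" && echo affected` would flag the 1.11.2 install mentioned later in this thread.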

One suggestion is to downgrade Docker to 1.8:

sudo apt-get install docker-engine=1.8.3-0~trusty

Another option is to enter the IP addresses manually into the /etc/hosts
files. But we have not tried this with Weave, so there is a good chance that
it won't work with Weave.
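If you try the manual route anyway, a minimal sketch of an idempotent append is shown below; it would be run as root inside each container. The helper `add_host_entry` is hypothetical, and any IPs you pass to it (e.g. 10.0.0.10) are placeholders for your own nodes. Entries added this way are not managed by Docker, so they may need to be re-applied whenever a container is recreated.

```bash
# add_host_entry FILE IP HOSTNAME
# Appends "IP<TAB>HOSTNAME" to FILE unless HOSTNAME is already listed as a
# name on some non-comment line. Pass /etc/hosts for real use.
add_host_entry() {
  local file="$1" ip="$2" name="$3"
  # Fields 2..NF of a hosts line are the names for the address in field 1.
  if awk -v n="$name" '$1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
                       END { exit !found }' "$file"; then
    echo "skip: $name already listed in $file"
  else
    # Append rather than rewrite, so the file is modified in place.
    printf '%s\t%s\n' "$ip" "$name" >> "$file"
  fi
}
```

Usage, with placeholder addresses: `add_host_entry /etc/hosts 10.0.0.10 node0` and `add_host_entry /etc/hosts 10.0.0.12 node2`.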


On 22 June 2016 at 14:39, Venkat Katta <ska...@adobe.com> wrote:

> Docker version: 1.11.2
>
> regards,
> venkat satish katta
> ------------------------------
> *From:* Anh Dinh <dinh...@comp.nus.edu.sg>
> *Sent:* Wednesday, June 22, 2016 12:04:56 PM
> *To:* Wang Wei; Venkat Katta
>
> *Cc:* dev@singa.incubator.apache.org
> *Subject:* Re: Error while running singa on mesos
>
> What version of Docker are you running?
>
> Anh.
>
>
> On 22 June 2016 at 14:26, Wang Wei <wang...@apache.org> wrote:
>
>>
>> ---------- Forwarded message ----------
>> From: Venkat Katta <ska...@adobe.com>
>> Date: Wed, Jun 22, 2016 at 1:31 PM
>> Subject: Re: Error while running singa on mesos
>> To: Wang Wei <wang...@apache.org>
>>
>>
>> It works fine if I replace node0 and node2 with their IP addresses. I am
>> using Weave for transparent communication between the containers. In
>> singa.conf I used node0, not the IP address of node0, to connect to
>> ZooKeeper, and it is able to connect, so why can't SINGA resolve the
>> hostname? Also, when running SINGA with Mesos it uses localhost rather than
>> the IP addresses of node1 and node2, and we are not passing any argument
>> about the IP addresses of the slaves when running SINGA.
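To see whether names such as node0 and node2 actually resolve inside a given container, a check like the sketch below could be run in each one. It assumes `getent` is available (it ships with glibc on most Linux images) and queries the system's name-service configuration, normally /etc/hosts and then DNS; `check_resolution` is a hypothetical helper name.

```bash
# check_resolution NAME...
# Prints "NAME -> ADDRESS" for each name that resolves; reports failures on
# stderr and returns 1 if any name does not resolve.
check_resolution() {
  local failed=0 name addr
  for name in "$@"; do
    # Take the first address getent returns; empty if lookup fails.
    addr=$(getent hosts "$name" 2>/dev/null | awk '{ print $1; exit }')
    if [ -n "$addr" ]; then
      echo "$name -> $addr"
    else
      echo "$name: NOT resolvable" >&2
      failed=1
    fi
  done
  return "$failed"
}
```

For example, running `check_resolution node0 node1 node2` on each node shows which containers are missing entries.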
>>
>>
>> F0622 05:18:28.932391  1513 socket.cc:98] Check failed: port != -1 (-1
>> vs. -1) tcp://localhost:*
>>
>>
>> Thanks,
>>
>> Venkat satish katta
>> ------------------------------
>> *From:* Wang Wei <wang...@apache.org>
>> *Sent:* Wednesday, June 22, 2016 8:46:36 AM
>> *To:* Venkat Katta
>>
>> *Subject:* Re: Error while running singa on mesos
>>
>> If you are using Docker (without Mesos), it could be a network routing
>> problem. You may need to configure Docker's networking so that node0 and
>> node2 can be reached from node1.
>> We are trying your configuration.
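One way to check routing from node1 is a plain TCP connection test. This sketch uses bash's built-in `/dev/tcp` redirection and GNU `timeout`; `can_reach` is a hypothetical helper, and port 2181 (ZooKeeper's default client port) is only an example of a port worth probing.

```bash
# can_reach HOST PORT [SECONDS]
# Returns 0 if a TCP connection to HOST:PORT succeeds within SECONDS
# (default 3), non-zero otherwise.
can_reach() {
  local host="$1" port="$2" secs="${3:-3}"
  # Opening /dev/tcp/HOST/PORT for reading makes bash attempt a TCP connect.
  timeout "$secs" bash -c ": < /dev/tcp/$host/$port" 2>/dev/null
}
```

For example, from node1: `can_reach node0 2181 && echo "node0:2181 reachable"`.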
>>
>> regards,
>> wang wei
>>
>>
>> On Wed, Jun 22, 2016 at 10:32 AM, Wang Wei <wang...@apache.org> wrote:
>>
>>> Hi Venkat,
>>>
>>> It is probably a problem with the node addresses.
>>> Please replace node0 and node2 with their IP addresses.
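Doing the replacement by hand on several hosts is error-prone, so here is a minimal sketch that rewrites hostnames in a config file such as conf/hostfile. The helper name and the `name=IP` argument format are hypothetical, and it relies on GNU sed (`-i` without a suffix, and `\b` word boundaries).

```bash
# replace_hosts_with_ips FILE NAME=IP...
# Replaces each whole-word NAME in FILE with IP; keeps a FILE.bak copy.
replace_hosts_with_ips() {
  local file="$1" pair name ip
  shift
  cp "$file" "$file.bak"   # original, for comparison or rollback
  for pair in "$@"; do
    name="${pair%%=*}"
    ip="${pair#*=}"
    # \b (GNU sed) stops node0 from also matching inside node01.
    # Assumes plain names without regex metacharacters such as '.'.
    sed -i "s/\b${name}\b/${ip}/g" "$file"
  done
}
```

Usage, with placeholder addresses: `replace_hosts_with_ips conf/hostfile node0=10.0.0.10 node2=10.0.0.12`.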
>>>
>>> regards,
>>> wei
>>>
>>> On Wed, Jun 22, 2016 at 2:40 AM, Venkat Katta <ska...@adobe.com> wrote:
>>>
>>>> I tried running without Mesos and got the same error.
>>>>
>>>>
>>>> root@node0:~/incubator-singa# ./bin/singa-run.sh -conf examples/cifar10/hybrid.conf
>>>> Unique JOB_ID is 4
>>>> Record job information to /tmp/singa-log/job-info/job-4-20160621-183305
>>>> Executing @ node2 : cd /root/incubator-singa; source
>>>> /root/incubator-singa/conf/profile; ./singa -singa_conf
>>>> /root/incubator-singa/conf/singa.conf -singa_job 4 -conf
>>>> /root/incubator-singa/examples/cifar10/hybrid.conf
>>>> Executing @ node0 : cd /root/incubator-singa; source
>>>> /root/incubator-singa/conf/profile; ./singa -singa_conf
>>>> /root/incubator-singa/conf/singa.conf -singa_job 4 -conf
>>>> /root/incubator-singa/examples/cifar10/hybrid.conf
>>>> F0621 18:33:24.171468   725 socket.cc:98] Check failed: port != -1 (-1
>>>> vs. -1) tcp://node2:*
>>>> *** Check failure stack trace: ***
>>>>     @     0x7f10d0a6b9fd  google::LogMessage::Fail()
>>>>     @     0x7f10d0a6d89d  google::LogMessage::SendToLog()
>>>>     @     0x7f10d0a6b5ec  google::LogMessage::Flush()
>>>>     @     0x7f10d0a6e1be  google::LogMessageFatal::~LogMessageFatal()
>>>>     @     0x7f10d0e05d79  singa::Router::Bind()
>>>>     @     0x7f10d0d7a8bc  singa::Driver::Train()
>>>>     @     0x7f10d0d7f48b  singa::Driver::Train()
>>>>     @           0x40c915  main
>>>>     @     0x7f10c5f13f45  (unknown)
>>>>     @           0x40cb7e  (unknown)
>>>> F0621 18:33:06.244278  1042 socket.cc:98] Check failed: port != -1 (-1
>>>> vs. -1) tcp://node0:*
>>>> *** Check failure stack trace: ***
>>>>     @     0x7f6d4516d9fd  google::LogMessage::Fail()
>>>>     @     0x7f6d4516f89d  google::LogMessage::SendToLog()
>>>>     @     0x7f6d4516d5ec  google::LogMessage::Flush()
>>>>     @     0x7f6d451701be  google::LogMessageFatal::~LogMessageFatal()
>>>>     @     0x7f6d45507d79  singa::Router::Bind()
>>>>     @     0x7f6d4547c8bc  singa::Driver::Train()
>>>>     @     0x7f6d4548148b  singa::Driver::Train()
>>>>     @           0x40c915  main
>>>>     @     0x7f6d3a615f45  (unknown)
>>>>     @           0x40cb7e  (unknown)
>>>> bash: line 1:   725 Aborted                 (core dumped) ./singa
>>>> -singa_conf /root/incubator-singa/conf/singa.conf -singa_job 4 -conf
>>>> /root/incubator-singa/examples/cifar10/hybrid.conf -host node2
>>>> bash: line 1:  1042 Aborted                 (core dumped) ./singa
>>>> -singa_conf /root/incubator-singa/conf/singa.conf -singa_job 4 -conf
>>>> /root/incubator-singa/examples/cifar10/hybrid.conf -host node0
>>>> E0621 18:33:07.467438  1067 job_manager.cc:156] job 4 not exists
>>>>
>>>>
>>>> ------------------------------
>>>> *From:* Wang Wei <wang...@apache.org>
>>>> *Sent:* Tuesday, June 21, 2016 7:09:46 PM
>>>> *To:* Venkat Katta
>>>> *Cc:* dev@singa.incubator.apache.org
>>>> *Subject:* Re: Error while running singa on mesos
>>>>
>>>> Hi,
>>>>
>>>> Can you try to run it without Mesos?
>>>> 1. Compile SINGA with enable-dist.
>>>> 2. Change conf/singa.conf to set the ZooKeeper host.
>>>> 3. Update conf/hostfile with one line per machine.
>>>> 4. Update conf/profile to export LD_LIBRARY_PATH.
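Steps 2 to 4 can be scripted roughly as follows. This is a sketch under assumptions: the helper name is hypothetical; it assumes singa.conf already contains a line starting with `zookeeper_host:` and edits that line in place; and the library directory is wherever the SINGA dependencies were installed. Step 1 (building with enable-dist) is not covered.

```bash
# write_singa_dist_conf CONF_DIR ZK_HOST:PORT LIB_DIR HOST...
write_singa_dist_conf() {
  local conf_dir="$1" zk="$2" lib_dir="$3"
  shift 3
  mkdir -p "$conf_dir"
  # Step 2: point singa.conf at ZooKeeper (assumes a zookeeper_host: line).
  if [ -f "$conf_dir/singa.conf" ]; then
    sed -i "s|^zookeeper_host:.*|zookeeper_host: \"$zk\"|" "$conf_dir/singa.conf"
  fi
  # Step 3: conf/hostfile, one machine (hostname or IP) per line.
  printf '%s\n' "$@" > "$conf_dir/hostfile"
  # Step 4: append the export to conf/profile; $LD_LIBRARY_PATH is written
  # literally so it expands when the profile is sourced.
  printf 'export LD_LIBRARY_PATH=%s:$LD_LIBRARY_PATH\n' "$lib_dir" >> "$conf_dir/profile"
}
```

For example, from the incubator-singa directory (the log earlier in this thread shows each node running `source /root/incubator-singa/conf/profile`): `write_singa_dist_conf conf node0:2181 /usr/local/lib node0 node2`. IPs can be passed instead of hostnames.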
>>>>
>>>> regards,
>>>> Wei
>>>>
>>>> On Tue, Jun 21, 2016 at 8:52 PM, Venkat Katta <ska...@adobe.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>>
>>>>> I am trying to run SINGA on Mesos in a fully distributed
>>>>> architecture. I built the Docker images as described in the
>>>>> documentation. I am using Mesos 0.28.2 and SINGA 0.3-rc3. I am running
>>>>> each Docker container with the --net=host flag so that it takes the IP
>>>>> of the host system. SINGA works as long as the workers are all on one
>>>>> machine. When I try to use two machines for training, it shows this error:
>>>>>
>>>>>
>>>>> F0617 10:00:43.862246 2742 socket.cc:98] Check failed: port != -1 (-1
>>>>> vs. -1) tcp://localhost:*
>>>>>
>>>>>
>>>>> So, when running the scheduler, do we need to give it a hostfile
>>>>> containing all the hosts? How does it know about the remaining hosts in
>>>>> the cluster?
>>>>>
>>>>>
>>>>> Thanks,
>>>>>
>>>>>
>>>>> Venkat Satish Katta.
>>>>>
>>>>
>>>>
>>>
>>
>>
>
