Re: Hadoop on Mesos Memory Configuration Question

2015-10-02 Thread Adam Bordelon
Mesos will take the resources from each slave (total-1GB) and offer that to
the various frameworks registered on your cluster. If Hadoop is the only
framework, it will get offered all of the resources. If there are other
frameworks running, Mesos will offer resources to the framework furthest
below its fair share, to try to maintain Dominant Resource Fairness (see
the DRF paper). It is up to the Hadoop framework scheduler to decide how
much of each offer to use to launch its tasks (TaskTrackers/NodeManagers).
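
As a rough worked example of the DRF idea (the numbers are purely illustrative):
with 36 CPUs and 72 GB of memory in the cluster, a framework holding 9 CPUs and
12 GB has a CPU share of 9/36 = 25% and a memory share of 12/72 ~= 17%, so its
dominant share is 25% (CPU). Mesos compares dominant shares across frameworks and
sends the next offers to whichever framework's dominant share is currently lowest.
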
Are you using the Hadoop1 framework (https://github.com/mesos/hadoop)
or the Hadoop2/YARN framework (https://github.com/apache/incubator-myriad)?
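
If you want a slave to advertise a fixed amount of memory instead of the
auto-detected total minus 1GB, you can set the agent's --resources flag
explicitly. A minimal sketch, assuming a packaging that reads flag files from
/etc/mesos-slave/ (the sizes below are just examples, not recommendations):

  # advertise 8 CPUs and 14 GB of memory to frameworks from this slave
  echo "cpus:8;mem:14336" > /etc/mesos-slave/resources
  # equivalent to starting the slave with:
  #   mesos-slave --master=<master>:5050 --resources="cpus:8;mem:14336"

Whatever memory the Hadoop framework requests per TaskTracker then has to fit
inside what the slave advertises.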

On Thu, Oct 1, 2015 at 6:26 PM, Ajit Jagdale  wrote:

> Hi all,
>
> I'm new to Mesos and to using Hadoop over Mesos.  I've been trying to
> determine if Mesos memory configurations are affecting the memory that I
> allocate to Hadoop mappers and reducers (in Hadoop's mapred-site.xml
> file).  When I set values to the mappers, something seems to interfere with
> allocating that memory.
>
> Cluster setup:
> - 1 master node and 6 slave nodes
> - There is no /etc/mesos-slave/resources file, so memory is auto-detected by
> Mesos.  My understanding of this is that since there are no explicit memory
> settings on each slave node, Mesos is giving the asking application
> (Hadoop) all of the available memory minus 1GB for running the OS.
>
> But there still must be some mesos memory configuration somewhere, right?
> Something that knows how much a slice of memory is.  I'm not sure if I know
> where that is.
>
> Any suggestions on how Mesos' memory allocation process could be
> affecting Hadoop's own memory allocation would be appreciated.
>
> Thanks,
> Ajit
>


Re: Viewing old versions of docs

2015-10-02 Thread Benjamin Mahler
We'd like to provide versioned docs on the website, as follows:

http://mesos.apache.org/documentation/latest/
http://mesos.apache.org/documentation/0.24.0/
http://mesos.apache.org/documentation/0.23.0/
etc.

This is why "latest" was put into the link format.

On Fri, Oct 2, 2015 at 2:29 PM, Alan Braithwaite 
wrote:

> Thanks Joseph.
>
> To your point of not reading older versions of the docs though: that's
> pretty silly.  You must realize how quickly the project is moving and as
> such there's no sense in being misled about the feature set of the version
> which you're using.
>
> In my case I was looking through the docs to find the new 0.24 API which
> doesn't exist on the version I'm currently using: 0.23. :-|
>
> Also, as a user if I see /latest/ in the url, I'm going to assume that the
> older versions are hosted online as well.  Maybe I'm just weird though. :-)
>
> Thanks for the pointers,
> - Alan
>
> On Fri, Oct 2, 2015 at 11:34 AM, Joseph Wu  wrote:
>
>> Hi Alan,
>>
>> I don't think it's recommended to refer to older versions of the docs.
>> But if you absolutely need to, you can find those by browsing the source.
>>
>> Take the version of Mesos you're looking for, and substitute it for
>> "<version>" below:
>> https://github.com/apache/mesos/blob/<version>/docs/
>>
>> i.e. For the most recent release:
>> https://github.com/apache/mesos/blob/0.24.1/docs/
>>
>> ~Joseph
>>
>> On Fri, Oct 2, 2015 at 11:02 AM, Alan Braithwaite 
>> wrote:
>>
>>> Hey All,
>>>
>>> Trying to figure out how to view older versions of the docs on the web.
>>> Can't find an index or link to versioned docs from google.
>>>
>>> Can anyone point me in the right direction?
>>>
>>> Thanks,
>>> - Alan
>>>
>>
>>
>


Re: Viewing old versions of docs

2015-10-02 Thread Alan Braithwaite
Thanks Joseph.

To your point of not reading older versions of the docs though: that's
pretty silly.  You must realize how quickly the project is moving and as
such there's no sense in being misled about the feature set of the version
which you're using.

In my case I was looking through the docs to find the new 0.24 API which
doesn't exist on the version I'm currently using: 0.23. :-|

Also, as a user if I see /latest/ in the url, I'm going to assume that the
older versions are hosted online as well.  Maybe I'm just weird though. :-)

Thanks for the pointers,
- Alan

On Fri, Oct 2, 2015 at 11:34 AM, Joseph Wu  wrote:

> Hi Alan,
>
> I don't think it's recommended to refer to older versions of the docs.
> But if you absolutely need to, you can find those by browsing the source.
>
> Take the version of Mesos you're looking for, and substitute it for
> "<version>" below:
> https://github.com/apache/mesos/blob/<version>/docs/
>
> i.e. For the most recent release:
> https://github.com/apache/mesos/blob/0.24.1/docs/
>
> ~Joseph
>
> On Fri, Oct 2, 2015 at 11:02 AM, Alan Braithwaite 
> wrote:
>
>> Hey All,
>>
>> Trying to figure out how to view older versions of the docs on the web.
>> Can't find an index or link to versioned docs from google.
>>
>> Can anyone point me in the right direction?
>>
>> Thanks,
>> - Alan
>>
>
>


Files not being copied to all slaves from hdfs w/spark-submit

2015-10-02 Thread Rodrick Brown
For some reason my jobs are not being copied to all the slaves when they're 
downloaded from HDFS; am I missing something obvious? 
They only seem to be copied to the node where the job is submitted. 
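
(For context, a submission of this kind looks roughly like the following; the
master URL, class name and HDFS path here are placeholders, not the real ones:

  spark-submit \
    --master mesos://zk://zk1:2181,zk2:2181,zk3:2181/mesos \
    --class com.example.MyJob \
    hdfs://namenode:8020/jars/my-job.jar

The expectation is that every executor fetches the application jar from HDFS
itself, rather than it only being present on the submitting node.)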

-- 
 
Rodrick Brown / DevOps Engineer 
+1 917 445 6839 / rodr...@orchardplatform.com 

Orchard Platform 
101 5th Avenue, 4th Floor, New York, NY 10003 
http://www.orchardplatform.com 
Orchard Blog | Marketplace Lending Meetup 



Re: Viewing old versions of docs

2015-10-02 Thread Joseph Wu
Hi Alan,

I don't think it's recommended to refer to older versions of the docs.  But
if you absolutely need to, you can find those by browsing the source.

Take the version of Mesos you're looking for, and substitute it for
"<version>" below:
https://github.com/apache/mesos/blob/<version>/docs/

i.e. For the most recent release:
https://github.com/apache/mesos/blob/0.24.1/docs/
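
If you would rather have a local copy, checking out the release tag works too
(this just assumes you have git; the docs are plain markdown files in the tree):

  git clone https://github.com/apache/mesos.git
  cd mesos
  git checkout 0.23.0    # or whichever released version you need
  ls docs/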

~Joseph

On Fri, Oct 2, 2015 at 11:02 AM, Alan Braithwaite 
wrote:

> Hey All,
>
> Trying to figure out how to view older versions of the docs on the web.
> Can't find an index or link to versioned docs from google.
>
> Can anyone point me in the right direction?
>
> Thanks,
> - Alan
>


Viewing old versions of docs

2015-10-02 Thread Alan Braithwaite
Hey All,

Trying to figure out how to view older versions of the docs on the web.
Can't find an index or link to versioned docs from google.

Can anyone point me in the right direction?

Thanks,
- Alan


Re: Tasks not shown in "Completed Tasks" after agent reinstall.

2015-10-02 Thread Joris Van Remoortere
Hi Mauricio,
When you remove the workdir, that means the next agent launched on that
machine will have a different agent-id.
When this new agent registers with the master, it will be a totally new
agent.
Since most of the data on the master is reconstructed from agent
reconnections, you will not see the completed tasks of an agent
that has been removed due to the issue above.
You can likely still see some of the completed tasks at the framework
level, as those are kept around until a master fail-over or until the buffer
that keeps the history is truncated.
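
For what it's worth, the agent id that gets lost lives under the work dir
metadata, so you can check it before wiping a node. A quick sketch, assuming
the default /var/lib/mesos work dir (the exact layout can vary between
versions):

  # 'latest' is a symlink to the directory named after the current agent id
  ls -l /var/lib/mesos/meta/slaves/

Removing that directory is what causes the next agent to register under a
brand new id.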

Joris

On Fri, Oct 2, 2015 at 6:27 AM, Mauricio Garavaglia 
wrote:

> Hi guys,
>
> If I remove the workdir (/var/lib/mesos) entries in the agents, does it
> mean I lost the "Completed Tasks" view in the masters dashboard?
>
> I'm debugging a case in which some agent nodes got recreated from scratch
> and the tasks that they ran disappeared from the dashboard.
>
> Thanks,
> Mauricio
>
>
>
>


Re: [VOTE] Release Apache Mesos 0.21.2 (rc1)

2015-10-02 Thread Niklas Nielsen
+1 (binding)

On 1 October 2015 at 20:27, Michael Park  wrote:

> +1 (binding)
>
> *make distcheck* passed with the Jenkins Build script with
>
> *OS=ubuntu:15.04 COMPILER=gcc CONFIGURATION="--enable-optimize"
> ./support/jenkins_build.sh*
>
> On Fri, Sep 25, 2015 at 5:29 PM Vinod Kone  wrote:
>
>> +1 (binding)
>>
>> Tested on CI for CentOS5/6.
>>
>>
>> On Thu, Sep 24, 2015 at 6:12 PM, Adam Bordelon 
>> wrote:
>>
>>> +1 (binding) Tested on CI for CentOS7 and Ubuntu 14.04.
>>>
>>> On Thu, Sep 24, 2015 at 5:44 PM, Adam Bordelon 
>>> wrote:
>>>
 Hi friends,

 Here's a candidate for the last of the docker patch releases
 (0.21.x-0.24.x).
 Please vote on releasing the following candidate as Apache Mesos 0.21.2.

 0.21.2 is a bug fix release and includes the following:

 
 * [MESOS-2986] - Docker version output is not compatible with Mesos

 The CHANGELOG for the release is available at:

 https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob_plain;f=CHANGELOG;hb=0.21.2-rc1

 

 The candidate for Mesos 0.21.2 release is available at:

 https://dist.apache.org/repos/dist/dev/mesos/0.21.2-rc1/mesos-0.21.2.tar.gz

 The tag to be voted on is 0.21.2-rc1:

 https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=commit;h=0.21.2-rc1

 The MD5 checksum of the tarball can be found at:

 https://dist.apache.org/repos/dist/dev/mesos/0.21.2-rc1/mesos-0.21.2.tar.gz.md5

 The signature of the tarball can be found at:

 https://dist.apache.org/repos/dist/dev/mesos/0.21.2-rc1/mesos-0.21.2.tar.gz.asc

 The PGP key used to sign the release is here:
 https://dist.apache.org/repos/dist/release/mesos/KEYS

 The JAR is up in Maven in a staging repository here:
 https://repository.apache.org/content/repositories/orgapachemesos-1074
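
For anyone who wants to check the artifacts before voting, signature
verification along these lines should work (standard gpg usage, nothing
Mesos-specific):

  wget https://dist.apache.org/repos/dist/dev/mesos/0.21.2-rc1/mesos-0.21.2.tar.gz
  wget https://dist.apache.org/repos/dist/dev/mesos/0.21.2-rc1/mesos-0.21.2.tar.gz.asc
  wget https://dist.apache.org/repos/dist/release/mesos/KEYS
  gpg --import KEYS
  gpg --verify mesos-0.21.2.tar.gz.asc mesos-0.21.2.tar.gz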

 Please vote on releasing this package as Apache Mesos 0.21.2!

 The vote is open until Tue Sep 29 18:00 PDT 2015 and passes if a
 majority of at least 3 +1 PMC votes are cast.

 [ ] +1 I tested this package. Release this package as Apache Mesos
 0.21.2
 [ ] -1 Do not release this package because ...

 Thanks,
 -Adam-

>>>
>>>
>>


Tasks not shown in "Completed Tasks" after agent reinstall.

2015-10-02 Thread Mauricio Garavaglia
Hi guys,

If I remove the workdir (/var/lib/mesos) entries in the agents, does it
mean I lost the "Completed Tasks" view in the masters dashboard?

I'm debugging a case in which some agent nodes got recreated from scratch
and the tasks that they ran disappeared from the dashboard.

Thanks,
Mauricio


Re: Running a task in Mesos cluster

2015-10-02 Thread Pradeep Kiruvale
Hi Ondrej,

Thanks for your reply

I did solve that issue; yes, you are right, there was an issue with the slave IP
address setting.

Now I am facing an issue with scheduling the tasks. When I try to schedule
a task using

/src/mesos-execute --master=192.168.0.102:5050 --name="cluster-test"
--command="/usr/bin/hackbench -s 4096 -l 10845760 -g 2 -f 2 -P"
--resources="cpus(*):3;mem(*):2560"

The tasks always get scheduled on the same node. The resources from the
other nodes are not getting used to schedule the tasks.

 I just start the mesos slaves like below

./bin/mesos-slave.sh --master=192.168.0.102:5050/mesos  --hostname=slave1

If I submit the task using the above (mesos-execute) command from one of the
slaves, it runs on that system.

But when I submit the task from a different system, it uses just that
system and queues the tasks instead of running them on the other slaves.
Sometimes I see the message "Failed to getgid: unknown user".

Do I need to start some process to push the tasks to all the slaves equally?
Am I missing something here?
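
(For reference, per the advice quoted below, a launch where each slave
advertises its own address would look roughly like this for one of the nodes --
the IP here is just an example:

  ./bin/mesos-slave.sh --master=192.168.0.102:5050 --ip=192.168.0.116 --hostname=slave1

so that the master no longer sees every slave at 127.0.1.1.)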

Regards,
Pradeep



On 2 October 2015 at 15:07, Ondrej Smola  wrote:

> Hi Pradeep,
>
> the problem is with the IP your slave advertises - Mesos by default resolves
> your hostname - there are several solutions (let's say your node IP is
> 192.168.56.128)
>
> 1)  export LIBPROCESS_IP=192.168.56.128
> 2)  set mesos options - ip, hostname
>
> one way to do this is to create files
>
> echo "192.168.56.128" > /etc/mesos-slave/ip
> echo "abc.mesos.com" > /etc/mesos-slave/hostname
>
> for more configuration options see
> http://mesos.apache.org/documentation/latest/configuration
>
>
>
>
>
> 2015-10-02 10:06 GMT+02:00 Pradeep Kiruvale :
>
>> Hi Guangya,
>>
>> Thanks for reply. I found one interesting log message.
>>
>>  7410 master.cpp:5977] Removed slave
>> 6a11063e-b8ff-43bd-86cf-e6eef0de06fd-S52 (192.168.0.178): a new slave
>> registered at the same address
>>
>> Mostly because of this issue, the systems/slave nodes are getting
>> registered and de-registered to make room for the next node. I can even
>> see this on
>> the UI interface, for some time one node got added and after some time
>> that will be replaced with the new slave node.
>>
>> The above log is followed by the below log messages.
>>
>>
>> I1002 10:01:12.753865  7416 leveldb.cpp:343] Persisting action (18 bytes)
>> to leveldb took 104089ns
>> I1002 10:01:12.753885  7416 replica.cpp:679] Persisted action at 384
>> E1002 10:01:12.753891  7417 process.cpp:1912] Failed to shutdown socket
>> with fd 15: Transport endpoint is not connected
>> I1002 10:01:12.753988  7413 master.cpp:3930] Registered slave
>> 6a11063e-b8ff-43bd-86cf-e6eef0de06fd-S62 at slave(1)@127.0.1.1:5051
>> (192.168.0.116) with cpus(*):8; mem(*):14930; disk(*):218578;
>> ports(*):[31000-32000]
>> I1002 10:01:12.754065  7413 master.cpp:1080] Slave
>> 6a11063e-b8ff-43bd-86cf-e6eef0de06fd-S62 at slave(1)@127.0.1.1:5051
>> (192.168.0.116) disconnected
>> I1002 10:01:12.754072  7416 hierarchical.hpp:675] Added slave
>> 6a11063e-b8ff-43bd-86cf-e6eef0de06fd-S62 (192.168.0.116) with cpus(*):8;
>> mem(*):14930; disk(*):218578; ports(*):[31000-32000] (allocated: )
>> I1002 10:01:12.754084  7413 master.cpp:2534] Disconnecting slave
>> 6a11063e-b8ff-43bd-86cf-e6eef0de06fd-S62 at slave(1)@127.0.1.1:5051
>> (192.168.0.116)
>> E1002 10:01:12.754118  7417 process.cpp:1912] Failed to shutdown socket
>> with fd 16: Transport endpoint is not connected
>> I1002 10:01:12.754132  7413 master.cpp:2553] Deactivating slave
>> 6a11063e-b8ff-43bd-86cf-e6eef0de06fd-S62 at slave(1)@127.0.1.1:5051
>> (192.168.0.116)
>> I1002 10:01:12.754237  7416 hierarchical.hpp:768] Slave
>> 6a11063e-b8ff-43bd-86cf-e6eef0de06fd-S62 deactivated
>> I1002 10:01:12.754240  7413 replica.cpp:658] Replica received learned
>> notice for position 384
>> I1002 10:01:12.754360  7413 leveldb.cpp:343] Persisting action (20 bytes)
>> to leveldb took 95171ns
>> I1002 10:01:12.754395  7413 leveldb.cpp:401] Deleting ~2 keys from
>> leveldb took 20333ns
>> I1002 10:01:12.754406  7413 replica.cpp:679] Persisted action at 384
>>
>>
>> Thanks,
>> Pradeep
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> On 2 October 2015 at 02:35, Guangya Liu  wrote:
>>
>>> Hi Pradeep,
>>>
>>> Please check some of my questions in line.
>>>
>>> Thanks,
>>>
>>> Guangya
>>>
>>> On Fri, Oct 2, 2015 at 12:55 AM, Pradeep Kiruvale <
>>> pradeepkiruv...@gmail.com> wrote:
>>>
 Hi All,

 I am new to Mesos. I have set up a Mesos cluster with 1 Master and 3
 Slaves.

 One slave runs on the Master Node itself and Other slaves run on
 different nodes. Here node means the physical boxes.

 I tried running the tasks by configuring one Node cluster. Tested the
 task scheduling using mesos-execute, works fine.

 When I configure three Node cluster (1master and 3 slaves) and try to
 see the resources on the master (in GUI) only the Master node resources are
 visible.
  The other nodes re

Re: Running a task in Mesos cluster

2015-10-02 Thread Ondrej Smola
Hi Pradeep,

the problem is with the IP your slave advertises - Mesos by default resolves
your hostname - there are several solutions (let's say your node IP is
192.168.56.128)

1)  export LIBPROCESS_IP=192.168.56.128
2)  set mesos options - ip, hostname

one way to do this is to create files

echo "192.168.56.128" > /etc/mesos-slave/ip
echo "abc.mesos.com" > /etc/mesos-slave/hostname

for more configuration options see
http://mesos.apache.org/documentation/latest/configuration
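
Alternatively, the same two settings can be passed directly as command line
flags when starting the slave (flag names as in the configuration docs above;
substitute your own master address):

  ./bin/mesos-slave.sh --master=<master-ip>:5050 --ip=192.168.56.128 --hostname=abc.mesos.com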





2015-10-02 10:06 GMT+02:00 Pradeep Kiruvale :

> Hi Guangya,
>
> Thanks for reply. I found one interesting log message.
>
>  7410 master.cpp:5977] Removed slave
> 6a11063e-b8ff-43bd-86cf-e6eef0de06fd-S52 (192.168.0.178): a new slave
> registered at the same address
>
> Mostly because of this issue, the systems/slave nodes are getting
> registered and de-registered to make room for the next node. I can even
> see this on
> the UI interface, for some time one node got added and after some time
> that will be replaced with the new slave node.
>
> The above log is followed by the below log messages.
>
>
> I1002 10:01:12.753865  7416 leveldb.cpp:343] Persisting action (18 bytes)
> to leveldb took 104089ns
> I1002 10:01:12.753885  7416 replica.cpp:679] Persisted action at 384
> E1002 10:01:12.753891  7417 process.cpp:1912] Failed to shutdown socket
> with fd 15: Transport endpoint is not connected
> I1002 10:01:12.753988  7413 master.cpp:3930] Registered slave
> 6a11063e-b8ff-43bd-86cf-e6eef0de06fd-S62 at slave(1)@127.0.1.1:5051
> (192.168.0.116) with cpus(*):8; mem(*):14930; disk(*):218578;
> ports(*):[31000-32000]
> I1002 10:01:12.754065  7413 master.cpp:1080] Slave
> 6a11063e-b8ff-43bd-86cf-e6eef0de06fd-S62 at slave(1)@127.0.1.1:5051
> (192.168.0.116) disconnected
> I1002 10:01:12.754072  7416 hierarchical.hpp:675] Added slave
> 6a11063e-b8ff-43bd-86cf-e6eef0de06fd-S62 (192.168.0.116) with cpus(*):8;
> mem(*):14930; disk(*):218578; ports(*):[31000-32000] (allocated: )
> I1002 10:01:12.754084  7413 master.cpp:2534] Disconnecting slave
> 6a11063e-b8ff-43bd-86cf-e6eef0de06fd-S62 at slave(1)@127.0.1.1:5051
> (192.168.0.116)
> E1002 10:01:12.754118  7417 process.cpp:1912] Failed to shutdown socket
> with fd 16: Transport endpoint is not connected
> I1002 10:01:12.754132  7413 master.cpp:2553] Deactivating slave
> 6a11063e-b8ff-43bd-86cf-e6eef0de06fd-S62 at slave(1)@127.0.1.1:5051
> (192.168.0.116)
> I1002 10:01:12.754237  7416 hierarchical.hpp:768] Slave
> 6a11063e-b8ff-43bd-86cf-e6eef0de06fd-S62 deactivated
> I1002 10:01:12.754240  7413 replica.cpp:658] Replica received learned
> notice for position 384
> I1002 10:01:12.754360  7413 leveldb.cpp:343] Persisting action (20 bytes)
> to leveldb took 95171ns
> I1002 10:01:12.754395  7413 leveldb.cpp:401] Deleting ~2 keys from leveldb
> took 20333ns
> I1002 10:01:12.754406  7413 replica.cpp:679] Persisted action at 384
>
>
> Thanks,
> Pradeep
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
> On 2 October 2015 at 02:35, Guangya Liu  wrote:
>
>> Hi Pradeep,
>>
>> Please check some of my questions in line.
>>
>> Thanks,
>>
>> Guangya
>>
>> On Fri, Oct 2, 2015 at 12:55 AM, Pradeep Kiruvale <
>> pradeepkiruv...@gmail.com> wrote:
>>
>>> Hi All,
>>>
>>> I am new to Mesos. I have set up a Mesos cluster with 1 Master and 3
>>> Slaves.
>>>
>>> One slave runs on the Master Node itself and Other slaves run on
>>> different nodes. Here node means the physical boxes.
>>>
>>> I tried running the tasks by configuring one Node cluster. Tested the
>>> task scheduling using mesos-execute, works fine.
>>>
>>> When I configure three Node cluster (1master and 3 slaves) and try to
>>> see the resources on the master (in GUI) only the Master node resources are
>>> visible.
>>>  The other nodes' resources are not visible. Sometimes visible but in a
>>> de-activated state.
>>>
>> Can you please append some logs from mesos-slave and mesos-master? There
>> should be some logs in either master or slave telling you what is wrong.
>>
>>>
>>> *Please let me know what could be the reason. All the nodes are in the
>>> same network. *
>>>
>>> When I try to schedule a task using
>>>
>>> /src/mesos-execute --master=192.168.0.102:5050 --name="cluster-test"
>>> --command="/usr/bin/hackbench -s 4096 -l 10845760 -g 2 -f 2 -P"
>>> --resources="cpus(*):3;mem(*):2560"
>>>
>>> The tasks always get scheduled on the same node. The resources from the
>>> other nodes are not getting used to schedule the tasks.
>>>
>> Based on your previous question, there is only one node in your cluster,
>> that's why other nodes are not available. We need first identify what is
>> wrong with other three nodes first.
>>
>>>
>>> I*s it required to register the frameworks from every slave node on the
>>> Master?*
>>>
>> It is not required.
>>
>>>
>>> *I have configured this cluster using the git-hub code.*
>>>
>>>
>>> Thanks & Regards,
>>> Pradeep
>>>
>>>
>>
>


Re: resource fragmentation/mesos-master offers too low

2015-10-02 Thread Eren Güven
Hey Vinod,

Mesos: 0.23.0-1.0.ubuntu1404
Marathon: 0.10.1-1.0.416.ubuntu1404
Chronos: 2.4.0-0.1.20150828104228.ubuntu1404

Single master, 16 slaves, each with 6 CPUs / 6 GB mem; many (perhaps not all)
had more resources than mentioned in the logs (Mesos offers with 6 MB of mem,
etc.) or than what was required for the Marathon tasks (0.25 CPU / 128 MB mem).

# cat /etc/mesos-master/offer_timeout
30secs

Marathon is running 200-400 tasks at any given time, chronos <20.

There isn't much of interest in the logs; prior to this, many lines of
Marathon app definition, and then the 'Not all basic resources satisfied'
spam starts, something like:


Sep 29 09:35:35 lab-mesos-master1 marathon[60609]: [INFO] [09/29/2015
09:35:35.938] [marathon-akka.actor.default-dispatcher-1396]
[akka://marathon/user/MarathonScheduler/$a/DeploymentManager/6d87463d-38
d6-4f20-86ba-9caefce12ad7/$a] Successfully started 0 instances of 
Sep 29 09:35:36 lab-mesos-master1 mesos-master[60442]: I0929
09:35:36.444304 60462 master.cpp:4290] Sending 1 offers to framework
20150928-140031-2581335306-5050-60442- (marathon) at scheduler-3cced
735-b1d8-4739-adae-b479a0411593@127.0.1.1:50185
Sep 29 09:35:36 lab-mesos-master1 mesos-master[60442]: I0929
09:35:36.445269 60462 master.cpp:4290] Sending 3 offers to framework
20150928-104536-2581335306-5050-25731- (chronos-2.4.0) at scheduler-
16cd75ef-d383-4c4c-8239-89a0c1fac536@127.0.1.1:57205
Sep 29 09:35:36 lab-mesos-master1 chronos[60791]: [2015-09-29 09:35:36,447]
INFO Received resource offers
(org.apache.mesos.chronos.scheduler.mesos.MesosJobFramework:82)
Sep 29 09:35:36 lab-mesos-master1 chronos[60791]: [2015-09-29 09:35:36,447]
INFO No tasks scheduled or next task has been disabled.
Sep 29 09:35:36 lab-mesos-master1 chronos[60791]:
 (org.apache.mesos.chronos.scheduler.mesos.MesosJobFramework:131)
Sep 29 09:35:36 lab-mesos-master1 chronos[60791]: [2015-09-29 09:35:36,447]
INFO Declining unused offers.
(org.apache.mesos.chronos.scheduler.mesos.MesosJobFramework:89)
Sep 29 09:35:36 lab-mesos-master1 chronos[60791]: [2015-09-29 09:35:36,448]
INFO Declined unused offers with filter refuseSeconds=5.0 (use
--decline_offer_duration to reconfigure) (org.apache.mesos.chro
nos.scheduler.mesos.MesosJobFramework:97)
Sep 29 09:35:36 lab-mesos-master1 mesos-master[60442]: I0929
09:35:36.450083 60465 master.cpp:2884] Processing DECLINE call for offers:
[ 20150928-140031-2581335306-5050-60442-O436519 ] for framework 20
150928-104536-2581335306-5050-25731- (chronos-2.4.0) at
scheduler-16cd75ef-d383-4c4c-8239-89a0c1fac536@127.0.1.1:57205
Sep 29 09:35:36 lab-mesos-master1 mesos-master[60442]: I0929
09:35:36.451763 60465 master.cpp:2884] Processing DECLINE call for offers:
[ 20150928-140031-2581335306-5050-60442-O436520 ] for framework 20
150928-104536-2581335306-5050-25731- (chronos-2.4.0) at
scheduler-16cd75ef-d383-4c4c-8239-89a0c1fac536@127.0.1.1:57205
Sep 29 09:35:36 lab-mesos-master1 mesos-master[60442]: I0929
09:35:36.453340 60465 master.cpp:2884] Processing DECLINE call for offers:
[ 20150928-140031-2581335306-5050-60442-O436521 ] for framework 20
150928-104536-2581335306-5050-25731- (chronos-2.4.0) at
scheduler-16cd75ef-d383-4c4c-8239-89a0c1fac536@127.0.1.1:57205
Sep 29 09:35:36 lab-mesos-master1 mesos-master[60442]: I0929
09:35:36.457660 60467 hierarchical.hpp:761] Recovered cpus(*):0.05;
mem(*):12; disk(*):41211; ports(*):[31000-31005, 31007-31014, 31016-31063
, 31065-31088, 31090-31146, 31148-31176, 31178-31193, 31195-31226,
31228-31239, 31241-31321, 31323-31367, 31369-31524, 31526-31544,
31546-31562, 31564-31564, 31566-31598, 31600-31667, 31669-31709, 31711
-31711, 31713-31732, 31734-31750, 31752-31850, 31852-31864, 31866-31910,
31912-31992, 31994-32000] (total: cpus(*):6; mem(*):2922; disk(*):41211;
ports(*):[31000-32000], allocated: cpus(*):5.95; mem(*):
2910; ports(*):[31006-31006, 31015-31015, 31064-31064, 31089-31089,
31147-31147, 31177-31177, 31194-31194, 31227-31227, 31240-31240,
31322-31322, 31368-31368, 31525-31525, 31545-31545, 31563-31563, 3156
5-31565, 31599-31599, 31668-31668, 31710-31710, 31712-31712, 31733-31733,
31751-31751, 31851-31851, 31865-31865, 31911-31911, 31993-31993]) on slave
20150827-122131-2581335306-5050-34525-S274 from frame
work 20150928-104536-2581335306-5050-25731-
Sep 29 09:35:36 lab-mesos-master1 mesos-master[60442]: I0929
09:35:36.459028 60467 hierarchical.hpp:761] Recovered cpus(*):1.45;
mem(*):70; disk(*):41211; ports(*):[31000-31236, 31238-31242, 31244-31256
, 31258-31262, 31264-31267, 31269-31307, 31309-31313, 31315-31450,
31452-31486, 31488-31618, 31620-31622, 31624-31634, 31636-31653,
31655-31751, 31753-31826, 31828-31837, 31839-31896, 31898-31962, 31964
-31967, 31969-31983, 31985-32000] (total: cpus(*):6; mem(*):2922;
disk(*):41211; ports(*):[31000-32000], allocated: cpus(*):4.55;
mem(*):2852; ports(*):[31237-31237, 31243-31243, 31257-31257, 31263-3126
3, 31268-31268, 31308-31308, 31314-31314, 31451-31451, 31487-31487,
31619-31619, 31623-31623, 31635-31635, 316

Re: Running a task in Mesos cluster

2015-10-02 Thread Pradeep Kiruvale
Hi Guangya,

Thanks for reply. I found one interesting log message.

 7410 master.cpp:5977] Removed slave
6a11063e-b8ff-43bd-86cf-e6eef0de06fd-S52 (192.168.0.178): a new slave
registered at the same address

Mostly because of this issue, the systems/slave nodes are getting
registered and de-registered to make room for the next node. I can even
see this on
the UI interface, for some time one node got added and after some time that
will be replaced with the new slave node.

The above log is followed by the below log messages.


I1002 10:01:12.753865  7416 leveldb.cpp:343] Persisting action (18 bytes)
to leveldb took 104089ns
I1002 10:01:12.753885  7416 replica.cpp:679] Persisted action at 384
E1002 10:01:12.753891  7417 process.cpp:1912] Failed to shutdown socket
with fd 15: Transport endpoint is not connected
I1002 10:01:12.753988  7413 master.cpp:3930] Registered slave
6a11063e-b8ff-43bd-86cf-e6eef0de06fd-S62 at slave(1)@127.0.1.1:5051
(192.168.0.116) with cpus(*):8; mem(*):14930; disk(*):218578;
ports(*):[31000-32000]
I1002 10:01:12.754065  7413 master.cpp:1080] Slave
6a11063e-b8ff-43bd-86cf-e6eef0de06fd-S62 at slave(1)@127.0.1.1:5051
(192.168.0.116) disconnected
I1002 10:01:12.754072  7416 hierarchical.hpp:675] Added slave
6a11063e-b8ff-43bd-86cf-e6eef0de06fd-S62 (192.168.0.116) with cpus(*):8;
mem(*):14930; disk(*):218578; ports(*):[31000-32000] (allocated: )
I1002 10:01:12.754084  7413 master.cpp:2534] Disconnecting slave
6a11063e-b8ff-43bd-86cf-e6eef0de06fd-S62 at slave(1)@127.0.1.1:5051
(192.168.0.116)
E1002 10:01:12.754118  7417 process.cpp:1912] Failed to shutdown socket
with fd 16: Transport endpoint is not connected
I1002 10:01:12.754132  7413 master.cpp:2553] Deactivating slave
6a11063e-b8ff-43bd-86cf-e6eef0de06fd-S62 at slave(1)@127.0.1.1:5051
(192.168.0.116)
I1002 10:01:12.754237  7416 hierarchical.hpp:768] Slave
6a11063e-b8ff-43bd-86cf-e6eef0de06fd-S62 deactivated
I1002 10:01:12.754240  7413 replica.cpp:658] Replica received learned
notice for position 384
I1002 10:01:12.754360  7413 leveldb.cpp:343] Persisting action (20 bytes)
to leveldb took 95171ns
I1002 10:01:12.754395  7413 leveldb.cpp:401] Deleting ~2 keys from leveldb
took 20333ns
I1002 10:01:12.754406  7413 replica.cpp:679] Persisted action at 384


Thanks,
Pradeep



















On 2 October 2015 at 02:35, Guangya Liu  wrote:

> Hi Pradeep,
>
> Please check some of my questions in line.
>
> Thanks,
>
> Guangya
>
> On Fri, Oct 2, 2015 at 12:55 AM, Pradeep Kiruvale <
> pradeepkiruv...@gmail.com> wrote:
>
>> Hi All,
>>
>> I am new to Mesos. I have set up a Mesos cluster with 1 Master and 3
>> Slaves.
>>
>> One slave runs on the Master Node itself and Other slaves run on
>> different nodes. Here node means the physical boxes.
>>
>> I tried running the tasks by configuring one Node cluster. Tested the
>> task scheduling using mesos-execute, works fine.
>>
>> When I configure three Node cluster (1master and 3 slaves) and try to see
>> the resources on the master (in GUI) only the Master node resources are
>> visible.
>>  The other nodes' resources are not visible. Sometimes visible but in a
>> de-activated state.
>>
> Can you please append some logs from mesos-slave and mesos-master? There
> should be some logs in either master or slave telling you what is wrong.
>
>>
>> *Please let me know what could be the reason. All the nodes are in the
>> same network. *
>>
>> When I try to schedule a task using
>>
>> /src/mesos-execute --master=192.168.0.102:5050 --name="cluster-test"
>> --command="/usr/bin/hackbench -s 4096 -l 10845760 -g 2 -f 2 -P"
>> --resources="cpus(*):3;mem(*):2560"
>>
>> The tasks always get scheduled on the same node. The resources from the
>> other nodes are not getting used to schedule the tasks.
>>
> Based on your previous question, there is only one node in your cluster,
> that's why other nodes are not available. We need first identify what is
> wrong with other three nodes first.
>
>>
>> I*s it required to register the frameworks from every slave node on the
>> Master?*
>>
> It is not required.
>
>>
>> *I have configured this cluster using the git-hub code.*
>>
>>
>> Thanks & Regards,
>> Pradeep
>>
>>
>