Re: Failed to authenticate

2015-11-10 Thread Pradeep Kiruvale
This issue occurs only on CentOS 7; on Ubuntu it works fine.

Any idea?
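
One thing I still need to check on the CentOS 7 node (just a guess on my side:
the CRAM-MD5 mechanism there comes from the cyrus-sasl plugin packages, which
are not always installed by default) is whether the SASL CRAM-MD5 module is
actually present, e.g.:

ls /usr/lib64/sasl2/ | grep -i cram    # SASL plugin directory on 64-bit CentOS
sudo yum install cyrus-sasl-md5        # package providing the CRAM-MD5 plugin (name assumed)

If that plugin is missing, the authenticator/authenticatee handshake could keep
timing out exactly as in the logs below.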

Regards,
Pradeep


On 9 November 2015 at 17:32, Pradeep Kiruvale 
wrote:

> Hi All,
>
> I am getting an authentication issue on my Mesos cluster.
>
> Please find the slave-side and master-side logs below.
>
> Regards,
> Pradeep
>
> *Slave logs*
>
> W1110 01:54:18.641191 111550 slave.cpp:877] Authentication timed out
> W1110 01:54:18.641309 111550 slave.cpp:841] Failed to authenticate with
> master master@192.168.0.102:5050: Authentication discarded
> I1110 01:54:18.641355 111550 slave.cpp:792] Authenticating with master
> master@192.168.0.102:5050
> I1110 01:54:18.641369 111550 slave.cpp:797] Using default CRAM-MD5
> authenticatee
> I1110 01:54:18.641616 111539 authenticatee.cpp:123] Creating new client
> SASL connection
> W1110 01:54:23.646075 111555 slave.cpp:877] Authentication timed out
> W1110 01:54:23.646205 111555 slave.cpp:841] Failed to authenticate with
> master master@192.168.0.102:5050: Authentication discarded
> I1110 01:54:23.646266 111555 slave.cpp:792] Authenticating with master
> master@192.168.0.102:5050
> I1110 01:54:23.646286 111555 slave.cpp:797] Using default CRAM-MD5
> authenticatee
> I1110 01:54:23.646406 111544 authenticatee.cpp:123] Creating new client
> SASL connection
> W1110 01:54:28.651070 111554 slave.cpp:877] Authentication timed out
> W1110 01:54:28.651206 111554 slave.cpp:841] Failed to authenticate with
> master master@192.168.0.102:5050: Authentication discarded
> I1110 01:54:28.651257 111554 slave.cpp:792] Authenticating with master
> master@192.168.0.102:5050
>
>
> *Master logs*
>
> E1109 17:27:36.455260 27950 process.cpp:1911] Failed to shutdown socket
> with fd 11: Transport endpoint is not connected
> W1109 17:27:36.455517 27949 master.cpp:5177] Failed to authenticate
> slave(1)@192.168.0.169:5051: Failed to communicate with authenticatee
> E1109 17:27:36.455602 27950 process.cpp:1911] Failed to shutdown socket
> with fd 12: Transport endpoint is not connected
> I1109 17:27:41.459787 27946 master.cpp:5150] Authenticating slave(1)@
> 192.168.0.169:5051
> I1109 17:27:41.460211 27946 authenticator.cpp:100] Creating new server
> SASL connection
> E1109 17:27:41.460376 27950 process.cpp:1911] Failed to shutdown socket
> with fd 11: Transport endpoint is not connected
> W1109 17:27:41.460578 27947 master.cpp:5177] Failed to authenticate
> slave(1)@192.168.0.169:5051: Failed to communicate with authenticatee
> E1109 17:27:41.460695 27950 process.cpp:1911] Failed to shutdown socket
> with fd 12: Transport endpoint is not connected
> I1109 17:27:46.460510 27948 master.cpp:5150] Authenticating slave(1)@
> 192.168.0.169:5051
> I1109 17:27:46.460930 27944 authenticator.cpp:100] Creating new server
> SASL connection
> E1109 17:27:46.461139 27950 process.cpp:1911] Failed to shutdown socket
> with fd 11: Transport endpoint is not connected
> W1109 17:27:46.461392 27944 master.cpp:5177] Failed to authenticate
> slave(1)@192.168.0.169:5051: Failed to communicate with authenticatee
> E1109 17:27:46.461444 27950 process.cpp:1911] Failed to shutdown socket
> with fd 12: Transport endpoint is not connected
> I1109 17:27:51.466349 27945 master.cpp:5150] Authenticating slave(1)@
> 192.168.0.169:5051
> I1109 17:27:51.466747 27945 authenticator.cpp:100] Creating new server
> SASL connection
>
>


Failed to authenticate

2015-11-09 Thread Pradeep Kiruvale
Hi All,

I am getting an authentication issue on my Mesos cluster.

Please find the slave-side and master-side logs below.

Regards,
Pradeep

*Slave logs*

W1110 01:54:18.641191 111550 slave.cpp:877] Authentication timed out
W1110 01:54:18.641309 111550 slave.cpp:841] Failed to authenticate with
master master@192.168.0.102:5050: Authentication discarded
I1110 01:54:18.641355 111550 slave.cpp:792] Authenticating with master
master@192.168.0.102:5050
I1110 01:54:18.641369 111550 slave.cpp:797] Using default CRAM-MD5
authenticatee
I1110 01:54:18.641616 111539 authenticatee.cpp:123] Creating new client
SASL connection
W1110 01:54:23.646075 111555 slave.cpp:877] Authentication timed out
W1110 01:54:23.646205 111555 slave.cpp:841] Failed to authenticate with
master master@192.168.0.102:5050: Authentication discarded
I1110 01:54:23.646266 111555 slave.cpp:792] Authenticating with master
master@192.168.0.102:5050
I1110 01:54:23.646286 111555 slave.cpp:797] Using default CRAM-MD5
authenticatee
I1110 01:54:23.646406 111544 authenticatee.cpp:123] Creating new client
SASL connection
W1110 01:54:28.651070 111554 slave.cpp:877] Authentication timed out
W1110 01:54:28.651206 111554 slave.cpp:841] Failed to authenticate with
master master@192.168.0.102:5050: Authentication discarded
I1110 01:54:28.651257 111554 slave.cpp:792] Authenticating with master
master@192.168.0.102:5050


*Master logs*

E1109 17:27:36.455260 27950 process.cpp:1911] Failed to shutdown socket
with fd 11: Transport endpoint is not connected
W1109 17:27:36.455517 27949 master.cpp:5177] Failed to authenticate
slave(1)@192.168.0.169:5051: Failed to communicate with authenticatee
E1109 17:27:36.455602 27950 process.cpp:1911] Failed to shutdown socket
with fd 12: Transport endpoint is not connected
I1109 17:27:41.459787 27946 master.cpp:5150] Authenticating slave(1)@
192.168.0.169:5051
I1109 17:27:41.460211 27946 authenticator.cpp:100] Creating new server SASL
connection
E1109 17:27:41.460376 27950 process.cpp:1911] Failed to shutdown socket
with fd 11: Transport endpoint is not connected
W1109 17:27:41.460578 27947 master.cpp:5177] Failed to authenticate
slave(1)@192.168.0.169:5051: Failed to communicate with authenticatee
E1109 17:27:41.460695 27950 process.cpp:1911] Failed to shutdown socket
with fd 12: Transport endpoint is not connected
I1109 17:27:46.460510 27948 master.cpp:5150] Authenticating slave(1)@
192.168.0.169:5051
I1109 17:27:46.460930 27944 authenticator.cpp:100] Creating new server SASL
connection
E1109 17:27:46.461139 27950 process.cpp:1911] Failed to shutdown socket
with fd 11: Transport endpoint is not connected
W1109 17:27:46.461392 27944 master.cpp:5177] Failed to authenticate
slave(1)@192.168.0.169:5051: Failed to communicate with authenticatee
E1109 17:27:46.461444 27950 process.cpp:1911] Failed to shutdown socket
with fd 12: Transport endpoint is not connected
I1109 17:27:51.466349 27945 master.cpp:5150] Authenticating slave(1)@
192.168.0.169:5051
I1109 17:27:51.466747 27945 authenticator.cpp:100] Creating new server SASL
connection
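
For context, a minimal sketch of how CRAM-MD5 authentication is wired up here
(roughly; paths and the principal/secret values below are placeholders, not the
real ones):

# master: require slaves to authenticate and point at a credentials file
./bin/mesos-master.sh --ip=192.168.0.102 --work_dir=/var/lib/mesos \
  --authenticate_slaves --credentials=/etc/mesos/credentials.json

# /etc/mesos/credentials.json
# { "credentials": [ { "principal": "slave-principal", "secret": "slave-secret" } ] }

# slave: present a matching credential
./bin/mesos-slave.sh --master=192.168.0.102:5050 \
  --credential=/etc/mesos/slave-credential.json

# /etc/mesos/slave-credential.json
# { "principal": "slave-principal", "secret": "slave-secret" }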


Re: Mesos .26 failing on centos7

2015-11-09 Thread Pradeep Kiruvale
Thanks, it helped me.
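
For anyone else hitting the same build failure: picking up the fix is just a
matter of updating the checkout so it contains the commit mentioned below and
rebuilding (a rough sketch, assuming a git clone of the Mesos master branch):

cd mesos
git pull origin master                 # pick up the fix (cee4958)
git log --oneline | grep cee4958       # confirm the commit is now in the history
cd build && make -j$(nproc)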

Regards,
Pradeep

On 9 November 2015 at 15:37, Plotka, Bartlomiej  wrote:

> I had the same issue (broken build) on Ubuntu 14.04. Commit “cee4958”
> helped.
>
>
>
> *Kind Regards,*
>
> Bartek Plotka
>
>
>
> *From:* Jan Schlicht [mailto:j...@mesosphere.io]
> *Sent:* Monday, November 9, 2015 3:27 PM
> *To:* user@mesos.apache.org
> *Cc:* dev 
> *Subject:* Re: Mesos .26 failing on centos7
>
>
>
> There were some build errors due to some reverts in `registry_puller.cpp`.
> Your error log hints that it may be related to this. They should be fixed
> now (with `cee4958`).
>
>
>
> On Mon, Nov 9, 2015 at 3:23 PM, haosdent  wrote:
>
> Could you show more details about the error log? I could build the current
> master branch on CentOS 7.
>
>
>
> On Mon, Nov 9, 2015 at 10:00 PM, Pradeep Kiruvale <
> pradeepkiruv...@gmail.com> wrote:
>
> Hi All,
>
>
>
> I am trying to compile Mesos on CentOS 7, but it's failing. Please let me
> know what the reason is.
>
>
>
> Find the logs below.
>
>
>
> Regards,
>
> Pradeep
>
>
>
> make[2]: ***
> [slave/containerizer/mesos/provisioner/docker/libmesos_no_3rdparty_la-registry_puller.lo]
> Error 1
>
> make[2]: *** Waiting for unfinished jobs
>
> mv -f
> slave/containerizer/mesos/isolators/cgroups/.deps/libmesos_no_3rdparty_la-cpushare.Tpo
> slave/containerizer/mesos/isolators/cgroups/.deps/libmesos_no_3rdparty_la-cpushare.Plo
>
> mv -f master/.deps/libmesos_no_3rdparty_la-master.Tpo
> master/.deps/libmesos_no_3rdparty_la-master.Plo
>
> mv -f java/jni/.deps/libjava_la-convert.Tpo
> java/jni/.deps/libjava_la-convert.Plo
>
> mv -f examples/.deps/libexamplemodule_la-example_module_impl.Tpo
> examples/.deps/libexamplemodule_la-example_module_impl.Plo
>
> mv -f
> slave/containerizer/mesos/isolators/namespaces/.deps/libmesos_no_3rdparty_la-pid.Tpo
> slave/containerizer/mesos/isolators/namespaces/.deps/libmesos_no_3rdparty_la-pid.Plo
>
> mv -f
> slave/containerizer/mesos/isolators/cgroups/.deps/libmesos_no_3rdparty_la-perf_event.Tpo
> slave/containerizer/mesos/isolators/cgroups/.deps/libmesos_no_3rdparty_la-perf_event.Plo
>
> mv -f
> slave/containerizer/mesos/provisioner/backends/.deps/libmesos_no_3rdparty_la-bind.Tpo
> slave/containerizer/mesos/provisioner/backends/.deps/libmesos_no_3rdparty_la-bind.Plo
>
> mv -f
> slave/containerizer/mesos/isolators/cgroups/.deps/libmesos_no_3rdparty_la-mem.Tpo
> slave/containerizer/mesos/isolators/cgroups/.deps/libmesos_no_3rdparty_la-mem.Plo
>
> mv -f linux/.deps/libmesos_no_3rdparty_la-perf.Tpo
> linux/.deps/libmesos_no_3rdparty_la-perf.Plo
>
> mv -f
> slave/containerizer/.deps/libmesos_no_3rdparty_la-external_containerizer.Tpo
> slave/containerizer/.deps/libmesos_no_3rdparty_la-external_containerizer.Plo
>
> mv -f log/.deps/liblog_la-replica.Tpo log/.deps/liblog_la-replica.Plo
>
> mv -f slave/.deps/libmesos_no_3rdparty_la-slave.Tpo
> slave/.deps/libmesos_no_3rdparty_la-slave.Plo
>
> mv -f
> slave/containerizer/mesos/.deps/libmesos_no_3rdparty_la-containerizer.Tpo
> slave/containerizer/mesos/.deps/libmesos_no_3rdparty_la-containerizer.Plo
>
> mv -f
> slave/resource_estimators/.deps/libfixed_resource_estimator_la-fixed.Tpo
> slave/resource_estimators/.deps/libfixed_resource_estimator_la-fixed.Plo
>
> mv -f
> slave/containerizer/mesos/isolators/filesystem/.deps/libmesos_no_3rdparty_la-linux.Tpo
> slave/containerizer/mesos/isolators/filesystem/.deps/libmesos_no_3rdparty_la-linux.Plo
>
> mv -f log/.deps/liblog_la-coordinator.Tpo
> log/.deps/liblog_la-coordinator.Plo
>
> mv -f log/.deps/liblog_la-recover.Tpo log/.deps/liblog_la-recover.Plo
>
>
>
>
>
>
>
> --
>
> Best Regards,
>
> Haosdent Huang
>
>
>
>
>
> --
>
> *Jan Schlicht*
>
> Distributed Systems Engineer, Mesosphere
>
>


Re: Mesos .26 failing on centos7

2015-11-09 Thread Pradeep Kiruvale
I only have the logs below.



libtool: compile:  g++ -DPACKAGE_NAME=\"mesos\" -DPACKAGE_TARNAME=\"mesos\"
-DPACKAGE_VERSION=\"0.26.0\" "-DPACKAGE_STRING=\"mesos 0.26.0\""
-DPACKAGE_BUGREPORT=\"\" -DPACKAGE_URL=\"\" -DPACKAGE=\"mesos\"
-DVERSION=\"0.26.0\" -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1
-DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1
-DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1
-DHAVE_DLFCN_H=1 -DLT_OBJDIR=\".libs/\" -DHAVE_PTHREAD_PRIO_INHERIT=1
-DHAVE_PTHREAD=1 -DHAVE_LIBZ=1 -DHAVE_LIBCURL=1 -DHAVE_APR_POOLS_H=1
-DHAVE_LIBAPR_1=1 -DHAVE_SVN_VERSION_H=1 -DHAVE_LIBSVN_SUBR_1=1
-DHAVE_SVN_DELTA_H=1 -DHAVE_LIBSVN_DELTA_1=1 -DHAVE_LIBSASL2=1
-DMESOS_HAS_JAVA=1 -DHAVE_PYTHON=\"2.7\" -DMESOS_HAS_PYTHON=1 -I.
-I../../src -Wall -Werror -DLIBDIR=\"/usr/local/lib\"
-DPKGLIBEXECDIR=\"/usr/local/libexec/mesos\"
-DPKGDATADIR=\"/usr/local/share/mesos\" -I../../include
-I../../3rdparty/libprocess/include
-I../../3rdparty/libprocess/3rdparty/stout/include -I../include
-I../include/mesos -I../3rdparty/libprocess/3rdparty/boost-1.53.0
-I../3rdparty/libprocess/3rdparty/picojson-1.3.0 -DPICOJSON_USE_INT64
-D__STDC_FORMAT_MACROS -I../3rdparty/libprocess/3rdparty/protobuf-2.5.0/src
-I../3rdparty/libprocess/3rdparty/glog-0.3.3/src
-I../3rdparty/libprocess/3rdparty/glog-0.3.3/src
-I../3rdparty/leveldb/include -I../3rdparty/zookeeper-3.4.5/src/c/include
-I../3rdparty/zookeeper-3.4.5/src/c/generated
-I../3rdparty/libprocess/3rdparty/protobuf-2.5.0/src
-I/usr/include/subversion-1 -I/usr/include/apr-1 -I/usr/include/apr-1.0
-pthread -g1 -O0 -Wno-unused-local-typedefs -std=c++11 -MT
v1/scheduler/libmesos_no_3rdparty_la-scheduler.pb.lo -MD -MP -MF
v1/scheduler/.deps/libmesos_no_3rdparty_la-scheduler.pb.Tpo -c v1/scheduler/
scheduler.pb.cc  -fPIC -DPIC -o
v1/scheduler/.libs/libmesos_no_3rdparty_la-scheduler.pb.o
make[2]: ***
[slave/containerizer/mesos/provisioner/docker/libmesos_no_3rdparty_la-registry_puller.lo]
Error 1
make[2]: *** Waiting for unfinished jobs
mv -f containerizer/.deps/libmesos_no_3rdparty_la-containerizer.pb.Tpo
containerizer/.deps/libmesos_no_3rdparty_la-containerizer.pb.Plo
mv -f zookeeper/.deps/libmesos_no_3rdparty_la-zookeeper.Tpo
zookeeper/.deps/libmesos_no_3rdparty_la-zookeeper.Plo
mv -f fetcher/.deps/libmesos_no_3rdparty_la-fetcher.pb.Tpo
fetcher/.deps/libmesos_no_3rdparty_la-fetcher.pb.Plo
mv -f maintenance/.deps/libmesos_no_3rdparty_la-maintenance.pb.Tpo
maintenance/.deps/libmesos_no_3rdparty_la-maintenance.pb.Plo
mv -f executor/.deps/libmesos_no_3rdparty_la-executor.pb.Tpo
executor/.deps/libmesos_no_3rdparty_la-executor.pb.Plo
mv -f v1/executor/.deps/libmesos_no_3rdparty_la-executor.pb.Tpo
v1/executor/.deps/libmesos_no_3rdparty_la-executor.pb.Plo
mv -f v1/scheduler/.deps/libmesos_no_3rdparty_la-scheduler.pb.Tpo
v1/scheduler/.deps/libmesos_no_3rdparty_la-scheduler.pb.Plo
mv -f
slave/containerizer/.deps/libmesos_no_3rdparty_la-external_containerizer.Tpo
slave/containerizer/.deps/libmesos_no_3rdparty_la-external_containerizer.Plo
mv -f zookeeper/.deps/libmesos_no_3rdparty_la-group.Tpo
zookeeper/.deps/libmesos_no_3rdparty_la-group.Plo
mv -f slave/.deps/libmesos_no_3rdparty_la-slave.Tpo
slave/.deps/libmesos_no_3rdparty_la-slave.Plo
mv -f
slave/containerizer/mesos/isolators/namespaces/.deps/libmesos_no_3rdparty_la-pid.Tpo
slave/containerizer/mesos/isolators/namespaces/.deps/libmesos_no_3rdparty_la-pid.Plo
mv -f
slave/containerizer/mesos/.deps/libmesos_no_3rdparty_la-containerizer.Tpo
slave/containerizer/mesos/.deps/libmesos_no_3rdparty_la-containerizer.Plo
mv -f ../include/mesos/.deps/libmesos_no_3rdparty_la-mesos.pb.Tpo
../include/mesos/.deps/libmesos_no_3rdparty_la-mesos.pb.Plo
mv -f
slave/containerizer/mesos/isolators/cgroups/.deps/libmesos_no_3rdparty_la-cpushare.Tpo
slave/containerizer/mesos/isolators/cgroups/.deps/libmesos_no_3rdparty_la-cpushare.Plo
mv -f
slave/containerizer/mesos/isolators/cgroups/.deps/libmesos_no_3rdparty_la-perf_event.Tpo
slave/containerizer/mesos/isolators/cgroups/.deps/libmesos_no_3rdparty_la-perf_event.Plo
mv -f ../include/mesos/v1/.deps/libmesos_no_3rdparty_la-mesos.pb.Tpo
../include/mesos/v1/.deps/libmesos_no_3rdparty_la-mesos.pb.Plo
mv -f linux/.deps/libmesos_no_3rdparty_la-cgroups.Tpo
linux/.deps/libmesos_no_3rdparty_la-cgroups.Plo
mv -f
slave/containerizer/mesos/isolators/cgroups/.deps/libmesos_no_3rdparty_la-mem.Tpo
slave/containerizer/mesos/isolators/cgroups/.deps/libmesos_no_3rdparty_la-mem.Plo
mv -f linux/.deps/libmesos_no_3rdparty_la-perf.Tpo
linux/.deps/libmesos_no_3rdparty_la-perf.Plo
make[2]: Leaving directory `/root/DSKB/mesos/build/src'
make[1]: *** [all] Error 2
make[1]: Leaving directory `/root/DSKB/mesos/build/src'
make: *** [all-recursive] Error 1
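
The make output above only shows which target failed, not the compiler
diagnostic itself. A serial rebuild usually surfaces the actual error (plain
make options, nothing Mesos-specific):

cd build
make -j1 2>&1 | tee build.log                        # serial, so the error is not interleaved
grep -n -B 20 'registry_puller.*Error 1' build.log   # the g++ diagnostic sits just above the failing target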


On 9 November 2015 at 15:23, haosdent  wrote:

> Could you show more details about error log? I

Mesos .26 failing on centos7

2015-11-09 Thread Pradeep Kiruvale
Hi All,

I am trying to compile Mesos on CentOS 7, but it's failing. Please let me
know what the reason is.

Find the logs below.

Regards,
Pradeep

make[2]: ***
[slave/containerizer/mesos/provisioner/docker/libmesos_no_3rdparty_la-registry_puller.lo]
Error 1
make[2]: *** Waiting for unfinished jobs
mv -f
slave/containerizer/mesos/isolators/cgroups/.deps/libmesos_no_3rdparty_la-cpushare.Tpo
slave/containerizer/mesos/isolators/cgroups/.deps/libmesos_no_3rdparty_la-cpushare.Plo
mv -f master/.deps/libmesos_no_3rdparty_la-master.Tpo
master/.deps/libmesos_no_3rdparty_la-master.Plo
mv -f java/jni/.deps/libjava_la-convert.Tpo
java/jni/.deps/libjava_la-convert.Plo
mv -f examples/.deps/libexamplemodule_la-example_module_impl.Tpo
examples/.deps/libexamplemodule_la-example_module_impl.Plo
mv -f
slave/containerizer/mesos/isolators/namespaces/.deps/libmesos_no_3rdparty_la-pid.Tpo
slave/containerizer/mesos/isolators/namespaces/.deps/libmesos_no_3rdparty_la-pid.Plo
mv -f
slave/containerizer/mesos/isolators/cgroups/.deps/libmesos_no_3rdparty_la-perf_event.Tpo
slave/containerizer/mesos/isolators/cgroups/.deps/libmesos_no_3rdparty_la-perf_event.Plo
mv -f
slave/containerizer/mesos/provisioner/backends/.deps/libmesos_no_3rdparty_la-bind.Tpo
slave/containerizer/mesos/provisioner/backends/.deps/libmesos_no_3rdparty_la-bind.Plo
mv -f
slave/containerizer/mesos/isolators/cgroups/.deps/libmesos_no_3rdparty_la-mem.Tpo
slave/containerizer/mesos/isolators/cgroups/.deps/libmesos_no_3rdparty_la-mem.Plo
mv -f linux/.deps/libmesos_no_3rdparty_la-perf.Tpo
linux/.deps/libmesos_no_3rdparty_la-perf.Plo
mv -f
slave/containerizer/.deps/libmesos_no_3rdparty_la-external_containerizer.Tpo
slave/containerizer/.deps/libmesos_no_3rdparty_la-external_containerizer.Plo
mv -f log/.deps/liblog_la-replica.Tpo log/.deps/liblog_la-replica.Plo
mv -f slave/.deps/libmesos_no_3rdparty_la-slave.Tpo
slave/.deps/libmesos_no_3rdparty_la-slave.Plo
mv -f
slave/containerizer/mesos/.deps/libmesos_no_3rdparty_la-containerizer.Tpo
slave/containerizer/mesos/.deps/libmesos_no_3rdparty_la-containerizer.Plo
mv -f
slave/resource_estimators/.deps/libfixed_resource_estimator_la-fixed.Tpo
slave/resource_estimators/.deps/libfixed_resource_estimator_la-fixed.Plo
mv -f
slave/containerizer/mesos/isolators/filesystem/.deps/libmesos_no_3rdparty_la-linux.Tpo
slave/containerizer/mesos/isolators/filesystem/.deps/libmesos_no_3rdparty_la-linux.Plo
mv -f log/.deps/liblog_la-coordinator.Tpo
log/.deps/liblog_la-coordinator.Plo
mv -f log/.deps/liblog_la-recover.Tpo log/.deps/liblog_la-recover.Plo


Re: Executor can't start d-bus communication

2015-10-14 Thread Pradeep Kiruvale
Hi Greg,

I solved the issue. The problem was with credentials: I was running the Master
and Slave as root and the Framework as a normal user.

After I started all components as the same user, it worked.

I wrote the executor using the Executor API, by creating a new class.
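
In other words, the fix was simply to keep the user consistent across every
component, roughly like this (a sketch; the framework binary name is a
placeholder and the IP is taken from my earlier mails):

# all three started under the same non-root account, so the executor runs with
# a user/D-Bus session context matching the application it talks to
./bin/mesos-master.sh --ip=192.168.0.102 --work_dir=/tmp/mesos
./bin/mesos-slave.sh --master=192.168.0.102:5050 --work_dir=/tmp/mesos
./my-framework --master=192.168.0.102:5050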

Thanks for your reply.

Regards,
Pradeep


On 14 October 2015 at 17:18, Greg Mann  wrote:

> Hi Pradeep,
> Can you tell us a bit about your executor: in particular, did you write it
> against the Executor API
> <http://mesos.apache.org/documentation/latest/app-framework-development-guide/>,
> or is it a plain executable that gets run through the command executor?
>
> Cheers,
> Greg
>
> On Wed, Oct 14, 2015 at 1:26 AM, Pradeep Kiruvale <
> pradeepkiruv...@gmail.com> wrote:
>
>> Hi All,
>>
>> I have written a new framework and an executor. I launch the task from the
>> framework, and the task gets pushed to a specific slave/executor based on
>> resource availability. On the executor side it has to connect to an
>> application that is running and providing a service through the D-Bus
>> interface, but the executor fails to connect to the application through the
>> D-Bus interfaces.
>>
>> If I initiate the D-Bus connection from a standalone application, it works
>> fine. I just cannot connect from the executor.
>>
>> Any idea how I can fix this?
>>
>> Thanks & Regards,
>> Pradeep
>>
>>
>


Executor can't start d-bus communication

2015-10-14 Thread Pradeep Kiruvale
Hi All,

I have written a new framework and an executor. I launch the task from the
framework, and the task gets pushed to a specific slave/executor based on
resource availability. On the executor side it has to connect to an application
that is running and providing a service through the D-Bus interface, but the
executor fails to connect to the application through the D-Bus interfaces.

If I initiate the D-Bus connection from a standalone application, it works
fine. I just cannot connect from the executor.

Any idea how I can fix this?

Thanks & Regards,
Pradeep


Re: Running a task in Mesos cluster

2015-10-07 Thread Pradeep Kiruvale
lloon Framework (C++)' at
scheduler-a8deafaa-cf10-401c-a61c-515340560c49@127.0.1.1:58843
I1007 12:16:41.853281  8002 master.cpp:2250] Subscribing framework Balloon
Framework (C++) with checkpointing disabled and capabilities [  ]
E1007 12:16:41.853806  8007 process.cpp:1912] Failed to shutdown socket
with fd 13: Transport endpoint is not connected
I1007 12:16:41.853833  8004 hierarchical.hpp:515] Added framework
0ccab17d-20e8-4ab8-9de4-ae60691f8c8e-0002
I1007 12:16:41.854032  8002 master.cpp:1119] Framework
0ccab17d-20e8-4ab8-9de4-ae60691f8c8e-0002 (Balloon Framework (C++)) at
scheduler-a8deafaa-cf10-401c-a61c-515340560c49@127.0.1.1:58843 disconnected
I1007 12:16:41.854063  8002 master.cpp:2475] Disconnecting framework
0ccab17d-20e8-4ab8-9de4-ae60691f8c8e-0002 (Balloon Framework (C++)) at
scheduler-a8deafaa-cf10-401c-a61c-515340560c49@127.0.1.1:58843
I1007 12:16:41.854076  8002 master.cpp:2499] Deactivating framework
0ccab17d-20e8-4ab8-9de4-ae60691f8c8e-0002 (Balloon Framework (C++)) at
scheduler-a8deafaa-cf10-401c-a61c-515340560c49@127.0.1.1:58843
E1007 12:16:41.854080  8007 process.cpp:1912] Failed to shutdown socket
with fd 13: Transport endpoint is not connected
I1007 12:16:41.854126  8005 hierarchical.hpp:599] Deactivated framework
0ccab17d-20e8-4ab8-9de4-ae60691f8c8e-0002
I1007 12:16:41.854121  8002 master.cpp:1143] Giving framework
0ccab17d-20e8-4ab8-9de4-ae60691f8c8e-0002 (Balloon Framework (C++)) at
scheduler-a8deafaa-cf10-401c-a61c-515340560c49@127.0.1.1:58843 0ns to
failover
I1007 12:16:41.855482  8006 master.cpp:4815] Framework failover timeout,
removing framework 0ccab17d-20e8-4ab8-9de4-ae60691f8c8e-0002 (Balloon
Framework (C++)) at
scheduler-a8deafaa-cf10-401c-a61c-515340560c49@127.0.1.1:58843
I1007 12:16:41.855515  8006 master.cpp:5571] Removing framework
0ccab17d-20e8-4ab8-9de4-ae60691f8c8e-0002 (Balloon Framework (C++)) at
scheduler-a8deafaa-cf10-401c-a61c-515340560c49@127.0.1.1:58843
I1007 12:16:41.855692  8001 hierarchical.hpp:552] Removed framework
0ccab17d-20e8-4ab8-9de4-ae60691f8c8e-0002
I1007 12:16:42.772830  8000 master.cpp:2179] Received SUBSCRIBE call for
framework 'Balloon Framework (C++)' at
scheduler-a8deafaa-cf10-401c-a61c-515340560c49@127.0.1.1:58843
I1007 12:16:42.772974  8000 master.cpp:2250] Subscribing framework Balloon
Framework (C++) with checkpointing disabled and capabilities [  ]
I1007 12:16:42.773470  8004 hierarchical.hpp:515] Added framework
0ccab17d-20e8-4ab8-9de4-ae60691f8c8e-0003
E1007 12:16:42.773495  8007 process.cpp:1912] Failed to shutdown socket
with fd 13: Transport endpoint is not connected
I1007 12:16:42.773679  8000 master.cpp:1119] Framework
0ccab17d-20e8-4ab8-9de4-ae60691f8c8e-0003 (Balloon Framework (C++)) at
scheduler-a8deafaa-cf10-401c-a61c-515340560c49@127.0.1.1:58843 disconnected
I1007 12:16:42.773697  8000 master.cpp:2475] Disconnecting framework
0ccab17d-20e8-4ab8-9de4-ae60691f8c8e-0003 (Balloon Framework (C++)) at
scheduler-a8deafaa-cf10-401c-a61c-515340560c49@127.0.1.1:58843
I1007 12:16:42.773708  8000 master.cpp:2499] Deactivating framework
0ccab17d-20e8-4ab8-9de4-ae60691f8c8e-0003 (Balloon Framework (C++)) at
scheduler-a8deafaa-cf10-401c-a61c-515340560c49@127.0.1.1:58843
E1007 12:16:42.773710  8007 process.cpp:1912] Failed to shutdown socket
with fd 14: Transport endpoint is not connected
I1007 12:16:42.773761  8000 master.cpp:1143] Giving framework
0ccab17d-20e8-4ab8-9de4-ae60691f8c8e-0003 (Balloon Framework (C++)) at
scheduler-a8deafaa-cf10-401c-a61c-515340560c49@127.0.1.1:58843 0ns to
failover
I1007 12:16:42.773779  8001 hierarchical.hpp:599] Deactivated framework
0ccab17d-20e8-4ab8-9de4-ae60691f8c8e-0003
I1007 12:16:42.775089  8005 master.cpp:4815] Framework failover timeout,
removing framework 0ccab17d-20e8-4ab8-9de4-ae60691f8c8e-0003 (Balloon
Framework (C++)) at
scheduler-a8deafaa-cf10-401c-a61c-515340560c49@127.0.1.1:58843
I1007 12:16:42.775126  8005 master.cpp:5571] Removing framework
0ccab17d-20e8-4ab8-9de4-ae60691f8c8e-0003 (Balloon Framework (C++)) at
scheduler-a8deafaa-cf10-401c-a61c-515340560c49@127.0.1.1:58843
I1007 12:16:42.775324  8005 hierarchical.hpp:552] Removed framework
0ccab17d-20e8-4ab8-9de4-ae60691f8c8e-0003
I1007 12:16:47.665941  8001 http.cpp:336] HTTP GET for /master/state.json
from 192.168.0.102:40722 with User-Agent='Mozilla/5.0 (X11; Linux x86_64)
AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.52 Safari/537.36'


On 7 October 2015 at 12:12, Guangya Liu  wrote:

> Hi Pradeep,
>
> Can you please append more logs from your master node? I just want to see
> what is wrong with your master and why the framework starts to fail over.
>
> Thanks,
>
> Guangya
>
> On Wed, Oct 7, 2015 at 5:27 PM, Pradeep Kiruvale <
> pradeepkiruv...@gmail.com> wrote:
>
>> Hi Guangya,
>>
>> I am running a frame work from some other physical node, which is part of
>> the same network. Still I am getting below messages and the framework not
>> 

Re: Running a task in Mesos cluster

2015-10-07 Thread Pradeep Kiruvale
Hi Guangya,

I am running a framework from some other physical node, which is part of the
same network. Still, I am getting the messages below and the framework is not
getting registered.

Any idea what the reason is?

I1007 11:24:58.781914 32392 master.cpp:4815] Framework failover timeout,
removing framework 89b179d8-9fb7-4a61-ad03-a9a5525482ff-0019 (Balloon
Framework (C++)) at
scheduler-3848d80c-8d27-48e0-a6b7-7e1678d5401d@127.0.1.1:54203
I1007 11:24:58.781968 32392 master.cpp:5571] Removing framework
89b179d8-9fb7-4a61-ad03-a9a5525482ff-0019 (Balloon Framework (C++)) at
scheduler-3848d80c-8d27-48e0-a6b7-7e1678d5401d@127.0.1.1:54203
I1007 11:24:58.782352 32392 hierarchical.hpp:552] Removed framework
89b179d8-9fb7-4a61-ad03-a9a5525482ff-0019
E1007 11:24:58.782577 32399 process.cpp:1912] Failed to shutdown socket
with fd 13: Transport endpoint is not connected
I1007 11:24:59.699587 32396 master.cpp:2179] Received SUBSCRIBE call for
framework 'Balloon Framework (C++)' at
scheduler-3848d80c-8d27-48e0-a6b7-7e1678d5401d@127.0.1.1:54203
I1007 11:24:59.699717 32396 master.cpp:2250] Subscribing framework Balloon
Framework (C++) with checkpointing disabled and capabilities [  ]
I1007 11:24:59.700251 32393 hierarchical.hpp:515] Added framework
89b179d8-9fb7-4a61-ad03-a9a5525482ff-0020
E1007 11:24:59.700253 32399 process.cpp:1912] Failed to shutdown socket
with fd 13: Transport endpoint is not connected


Regards,
Pradeep


On 5 October 2015 at 13:51, Guangya Liu  wrote:

> Hi Pradeep,
>
> I think the problem might be caused by the fact that you are running the LXC
> container on the master node; I am not sure if there is a port conflict or
> something else wrong.
>
> In my case, I was running the client on a new node, not on the master node;
> perhaps you can try putting your client on a separate node rather than on the
> master node.
>
> Thanks,
>
> Guangya
>
>
> On Mon, Oct 5, 2015 at 7:30 PM, Pradeep Kiruvale <
> pradeepkiruv...@gmail.com> wrote:
>
>> Hi Guangya,
>>
>> Hmm!...That is strange in my case!
>>
>> If I run from the mesos-execute on one of the slave/master node then the
>> tasks get their resources and they get scheduled well.
>> But if I start the mesos-execute on another node which is neither
>> slave/master then I have this issue.
>>
>> I am using an lxc container on master as a client to launch the tasks.
>> This is also in the same network as master/slaves.
>> And I just launch the task as you did. But the tasks are not getting
>> scheduled.
>>
>>
>> On master the logs are same as I sent you before
>>
>> Deactivating framework 77539063-89ce-4efa-a20b-ca788abbd912-0066
>>
>> On both of the slaves I can see the below logs
>>
>> I1005 13:23:32.547987  4831 slave.cpp:1980] Asked to shut down framework
>> 77539063-89ce-4efa-a20b-ca788abbd912-0060 by master@192.168.0.102:5050
>> W1005 13:23:32.548135  4831 slave.cpp:1995] Cannot shut down unknown
>> framework 77539063-89ce-4efa-a20b-ca788abbd912-0060
>> I1005 13:23:33.697707  4833 slave.cpp:3926] Current disk usage 3.60%. Max
>> allowed age: 6.047984349521910days
>> I1005 13:23:34.098599  4829 slave.cpp:1980] Asked to shut down framework
>> 77539063-89ce-4efa-a20b-ca788abbd912-0061 by master@192.168.0.102:5050
>> W1005 13:23:34.098740  4829 slave.cpp:1995] Cannot shut down unknown
>> framework 77539063-89ce-4efa-a20b-ca788abbd912-0061
>> I1005 13:23:35.274569  4831 slave.cpp:1980] Asked to shut down framework
>> 77539063-89ce-4efa-a20b-ca788abbd912-0062 by master@192.168.0.102:5050
>> W1005 13:23:35.274683  4831 slave.cpp:1995] Cannot shut down unknown
>> framework 77539063-89ce-4efa-a20b-ca788abbd912-0062
>> I1005 13:23:36.193964  4829 slave.cpp:1980] Asked to shut down framework
>> 77539063-89ce-4efa-a20b-ca788abbd912-0063 by master@192.168.0.102:5050
>> W1005 13:23:36.194090  4829 slave.cpp:1995] Cannot shut down unknown
>> framework 77539063-89ce-4efa-a20b-ca788abbd912-0063
>> I1005 13:24:01.914788  4827 slave.cpp:1980] Asked to shut down framework
>> 77539063-89ce-4efa-a20b-ca788abbd912-0064 by master@192.168.0.102:5050
>> W1005 13:24:01.914937  4827 slave.cpp:1995] Cannot shut down unknown
>> framework 77539063-89ce-4efa-a20b-ca788abbd912-0064
>> I1005 13:24:03.469974  4833 slave.cpp:1980] Asked to shut down framework
>> 77539063-89ce-4efa-a20b-ca788abbd912-0065 by master@192.168.0.102:5050
>> W1005 13:24:03.470118  4833 slave.cpp:1995] Cannot shut down unknown
>> framework 77539063-89ce-4efa-a20b-ca788abbd912-0065
>> I1005 13:24:04.642654  4826 slave.cpp:1980] Asked to shut down framework
>> 77539063-89ce-4efa-a20b-ca788abbd912-0066 by master@192.168.0.102:5050
>> W1005 13:24:04.642812  4826 slave

Re: Scheduling tasks based on dependancy

2015-10-07 Thread Pradeep Kiruvale
Hi Sharma,

Thanks for the clarification. Now I got it!
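
For the record, this is what made it click for me: the attributes are attached
to the slave and repeated in every resource offer made from it, so the quickest
way to eyeball them is the master's state endpoint (a sketch; the hostname and
attribute values below are made up):

curl -s http://192.168.0.102:5050/master/state.json | python -m json.tool
#   ...
#   "slaves": [ { "hostname": "slave-1",
#                 "attributes": { "rackid": "r1", "groupid": "g1" }, ... } ]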

Regards,
Pradeep

On 6 October 2015 at 19:35, Sharma Podila  wrote:

> Pradeep, attributes show up as name-value pairs in the offers. Custom
> attributes can also be used in Fenzo for assignment optimizations. For
> example, we set custom attributes for AWS EC2 ZONE names and ASG names. We use
> the ZONE name custom attribute to balance tasks of a job across zones via the
> built-in constraint plugin, BalancedHostAttrConstraint
> <https://github.com/Netflix/Fenzo/blob/master/fenzo-core/src/main/java/com/netflix/fenzo/plugins/BalancedHostAttrConstraint.java>
>
>
> On Tue, Oct 6, 2015 at 4:03 AM, Pradeep Kiruvale <
> pradeepkiruv...@gmail.com> wrote:
>
>> Hi Guangya,
>>
>> One doubt about the  --attributes=rackid:r1;groupid:g1 option.
>>
>> How does the master provisions the resources? How will be the resource
>> offer?
>>
>> Is it like (Rack 1 , G1, System)? how does this way of  doing resource
>> offer will help?
>>
>> Can you please give me more information?
>>
>>
>> -Pradeep
>>
>>
>>
>> On 5 October 2015 at 17:45, Guangya Liu  wrote:
>>
>>> Hi Pradeep,
>>>
>>> I think that you can try Chronos and Marathon which can help you.
>>>
>>> *Marathon:* https://github.com/mesosphere/marathon
>>> You can try Marathon + Mesos + Mesos Resource Attribute
>>>
>>> When you start up mesos slave, uses --attributes option, here is an
>>> example:
>>> ./bin/mesos-slave.sh --master=9.21.61.21:5050 --quiet
>>> --log_dir=/tmp/mesos --attributes=rackid:r1;groupid:g1
>>> This basically defines two attributes for this mesos slave host. rackid
>>> with value r1 and groupid with value g1.
>>>
>>> marathon start -i "like_test" -C "sleep 100" -n 4 -c 1 -m 50 -o
>>> "rackid:LIKE:r1"
>>>
>>> this will place applications on the slave node whose rackid is r1
>>>
>>> *Chronos:* https://github.com/mesos/chronos , Chronos supports the
>>> definition of jobs triggered by the completion of other jobs. It supports
>>> arbitrarily long dependency chains.
>>>
>>> Thanks,
>>>
>>> Guangya
>>>
>>> On Mon, Oct 5, 2015 at 11:21 PM, Pradeep Kiruvale <
>>> pradeepkiruv...@gmail.com> wrote:
>>>
>>>> Hi All,
>>>>
>>>> Are there any frameworks that exists with the Mesos to schedule the
>>>> bigger apps?
>>>> I mean to say scheduling a app which has many services and will not fit
>>>> into one physical node.
>>>>
>>>> Is there any frame work that can be used to
>>>>  schedule tasks based on the underlying hardware constraints like
>>>> Network bandwidth ?
>>>>
>>>  Schedule the tasks based on their dependencies and proximity to each
>>>> other in a cluster or a rack?
>>>>
>>>> Thanks & Regards,
>>>> Pradeep
>>>>
>>>
>>>
>>
>


Re: ARM64 version Mesos

2015-10-06 Thread Pradeep Kiruvale
Hi Haosdent,

Thanks, I will have a look.

-Pradeep

On 6 October 2015 at 18:22, haosdent  wrote:

> I think this issue may be helpful for you:
> https://issues.apache.org/jira/browse/MESOS-2786. The patches may be out of
> date.
>
> On Wed, Oct 7, 2015 at 12:13 AM, Pradeep Kiruvale <
> pradeepkiruv...@gmail.com> wrote:
>
>> Hi All,
>>
>> Is there a Mesos that runs on ARM64? I just tried compiling, its not
>> working, has some issues.
>>
>> Please let me know if some one already tried on ARM64. I am trying mesos
>> on a physical box, ubuntu running on it.
>>
>> Regards,
>> Pradeep
>>
>
>
>
> --
> Best Regards,
> Haosdent Huang
>


Re: ARM64 version Mesos

2015-10-06 Thread Pradeep Kiruvale
As of now I am using an Applied Micro box.
We also have an AMD (Seattle) box, a Cavium ThunderX, and our own home-grown
ARM boxes.

-Pradeep

On 6 October 2015 at 18:24, Michael Schenck  wrote:

> out of curiosity, which ARM64 server(s) are you guys using?
>
> On Tue, Oct 6, 2015 at 12:22 PM, haosdent  wrote:
>
>> I think this issue may be helpful for you:
>> https://issues.apache.org/jira/browse/MESOS-2786. The patches may be out
>> of date.
>>
>> On Wed, Oct 7, 2015 at 12:13 AM, Pradeep Kiruvale <
>> pradeepkiruv...@gmail.com> wrote:
>>
>>> Hi All,
>>>
>>> Is there a Mesos that runs on ARM64? I just tried compiling, its not
>>> working, has some issues.
>>>
>>> Please let me know if some one already tried on ARM64. I am trying mesos
>>> on a physical box, ubuntu running on it.
>>>
>>> Regards,
>>> Pradeep
>>>
>>
>>
>>
>> --
>> Best Regards,
>> Haosdent Huang
>>
>
>


ARM64 version Mesos

2015-10-06 Thread Pradeep Kiruvale
Hi All,

Is there a version of Mesos that runs on ARM64? I just tried compiling it; it's
not working and has some issues.

Please let me know if someone has already tried this on ARM64. I am trying
Mesos on a physical box with Ubuntu running on it.

Regards,
Pradeep


Re: Scheduling tasks based on dependancy

2015-10-06 Thread Pradeep Kiruvale
Hi Guangya,

I have a doubt about the --attributes=rackid:r1;groupid:g1 option.

How does the master provision these resources? What will the resource offer
look like?

Is it something like (Rack 1, G1, System)? How does this way of making resource
offers help?

Can you please give me more information?


-Pradeep



On 5 October 2015 at 17:45, Guangya Liu  wrote:

> Hi Pradeep,
>
> I think that you can try Chronos and Marathon which can help you.
>
> *Marathon:* https://github.com/mesosphere/marathon
> You can try Marathon + Mesos + Mesos Resource Attribute
>
> When you start up the mesos slave, use the --attributes option; here is an
> example:
> ./bin/mesos-slave.sh --master=9.21.61.21:5050 --quiet
> --log_dir=/tmp/mesos --attributes=rackid:r1;groupid:g1
> This defines two attributes for this mesos slave host: rackid with value r1
> and groupid with value g1.
>
> marathon start -i "like_test" -C "sleep 100" -n 4 -c 1 -m 50 -o
> "rackid:LIKE:r1"
>
> this will place applications on the slave node whose rackid is r1
>
> *Chronos:* https://github.com/mesos/chronos , Chronos supports the
> definition of jobs triggered by the completion of other jobs. It supports
> arbitrarily long dependency chains.
>
> Thanks,
>
> Guangya
>
> On Mon, Oct 5, 2015 at 11:21 PM, Pradeep Kiruvale <
> pradeepkiruv...@gmail.com> wrote:
>
>> Hi All,
>>
>> Are there any frameworks that exists with the Mesos to schedule the
>> bigger apps?
>> I mean to say scheduling a app which has many services and will not fit
>> into one physical node.
>>
>> Is there any frame work that can be used to
>>  schedule tasks based on the underlying hardware constraints like Network
>> bandwidth ?
>>
>  Schedule the tasks based on their dependencies and proximity to each
>> other in a cluster or a rack?
>>
>> Thanks & Regards,
>> Pradeep
>>
>
>


Re: Scheduling tasks based on dependancy

2015-10-06 Thread Pradeep Kiruvale
Hi Sharma,

Is this how you collect the network info from the VMs?

First you get the resource offers from Mesos, then you collect the network
bandwidth info and use that when assigning your tasks? Or does the mesos-slave
collect the resource information? I don't see any code for that, and the
existing mesos-slave does not collect this resource information by itself.

Am I missing something here?
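
(From your reply quoted below, my understanding is that the bandwidth is
advertised statically on each agent's command line rather than measured by the
slave itself; a sketch of what I think that looks like, where the
attribute/resource name and the value are my own guesses:)

./bin/mesos-slave.sh --master=192.168.0.102:5050 \
  --attributes="network_bw_mbps:1000;rackid:r1"      # exposed as an offer attribute

# or, if it should be accounted like cpus/mem, as a custom scalar resource:
./bin/mesos-slave.sh --master=192.168.0.102:5050 \
  --resources="cpus:8;mem:14868;network_bw_mbps:1000"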

Regards,
Pradeep

On 5 October 2015 at 18:28, Sharma Podila  wrote:

> Pradeep,
>
> We recently open sourced Fenzo <https://github.com/Netflix/Fenzo> (wiki
> <https://github.com/Netflix/Fenzo/wiki>) to handle these scenarios. We
> add a custom attribute for network bandwidth for each agent's "mesos-slave"
> command line. And we have Fenzo assign resources to tasks based on CPU,
> memory, disk, ports, and network bandwidth requirements. With Fenzo you can
> define affinity, locality, and any other custom scheduling objectives using
> plugins. Some of the plugins are already built in. It is also easy to add
> additional plugins to cover other objectives you care about.
>
> "Dependencies" can mean multiple things. Do you mean dependencies on
> certain attributes of resources/agents? Dependencies on where other tasks
> are assigned? All of these are covered. However, if you mean workflow type
> of dependencies on completion of other tasks, then, there are no built in
> plugins. You could write one using Fenzo. It is also common for such
> workflow dependencies to be covered by an entity external to the scheduler.
> Both techniques can be made to work.
>
> Fenzo has the concept of hard Vs soft constraints. You could specify, for
> example, resource affinity and/or task locality as a soft constraint or a
> hard constraint. See the wiki docs link I provided above for details.
>
> Sharma
>
>
> On Mon, Oct 5, 2015 at 8:21 AM, Pradeep Kiruvale <
> pradeepkiruv...@gmail.com> wrote:
>
>> Hi All,
>>
>> Are there any frameworks that exists with the Mesos to schedule the
>> bigger apps?
>> I mean to say scheduling a app which has many services and will not fit
>> into one physical node.
>>
>> Is there any frame work that can be used to
>>  schedule tasks based on the underlying hardware constraints like Network
>> bandwidth ?
>>  Schedule the tasks based on their dependencies and proximity to each
>> other in a cluster or a rack?
>>
>> Thanks & Regards,
>> Pradeep
>>
>
>


Re: Scheduling tasks based on dependancy

2015-10-06 Thread Pradeep Kiruvale
Hi Sharma,

Awesome! This is what I was looking for. Thanks for the reply.

I will have a look at the wiki for more info.

Regards,
Pradeep

On 5 October 2015 at 18:28, Sharma Podila  wrote:

> Pradeep,
>
> We recently open sourced Fenzo <https://github.com/Netflix/Fenzo> (wiki
> <https://github.com/Netflix/Fenzo/wiki>) to handle these scenarios. We
> add a custom attribute for network bandwidth for each agent's "mesos-slave"
> command line. And we have Fenzo assign resources to tasks based on CPU,
> memory, disk, ports, and network bandwidth requirements. With Fenzo you can
> define affinity, locality, and any other custom scheduling objectives using
> plugins. Some of the plugins are already built in. It is also easy to add
> additional plugins to cover other objectives you care about.
>
> "Dependencies" can mean multiple things. Do you mean dependencies on
> certain attributes of resources/agents? Dependencies on where other tasks
> are assigned? All of these are covered. However, if you mean workflow type
> of dependencies on completion of other tasks, then, there are no built in
> plugins. You could write one using Fenzo. It is also common for such
> workflow dependencies to be covered by an entity external to the scheduler.
> Both techniques can be made to work.
>
> Fenzo has the concept of hard Vs soft constraints. You could specify, for
> example, resource affinity and/or task locality as a soft constraint or a
> hard constraint. See the wiki docs link I provided above for details.
>
> Sharma
>
>
> On Mon, Oct 5, 2015 at 8:21 AM, Pradeep Kiruvale <
> pradeepkiruv...@gmail.com> wrote:
>
>> Hi All,
>>
>> Are there any frameworks that exists with the Mesos to schedule the
>> bigger apps?
>> I mean to say scheduling a app which has many services and will not fit
>> into one physical node.
>>
>> Is there any frame work that can be used to
>>  schedule tasks based on the underlying hardware constraints like Network
>> bandwidth ?
>>  Schedule the tasks based on their dependencies and proximity to each
>> other in a cluster or a rack?
>>
>> Thanks & Regards,
>> Pradeep
>>
>
>


Re: Scheduling tasks based on dependancy

2015-10-05 Thread Pradeep Kiruvale
Hi  Guangya,

Thanks for the information.

Regards,
Pradeep

On 5 October 2015 at 17:57, Guangya Liu  wrote:

> Pradeep,
>
> There is a JIRA ticket covering your requirement, but it is not finished yet;
> please refer to https://issues.apache.org/jira/browse/MESOS-3366 for details.
> The basic idea is to use customized hook modules to collect custom metrics.
>
> For now, you have to set the metrics/resources manually for each slave as a
> workaround.
>
> Thanks,
>
> Guangya
>
> On Mon, Oct 5, 2015 at 11:49 PM, Pradeep Kiruvale <
> pradeepkiruv...@gmail.com> wrote:
>
>> Hi Guangya,
>>
>> Thanks for your reply. That is nice feature to group the slaves into
>> different racks and etc...But is there any way I can get metric of other
>> hardware
>> features other than CPU,MEM, DISK like IO, PCI devices that exists with
>> the node etc?
>>
>> Thanks,
>> Pradeep
>>
>> On 5 October 2015 at 17:45, Guangya Liu  wrote:
>>
>>> Hi Pradeep,
>>>
>>> I think that you can try Chronos and Marathon which can help you.
>>>
>>> *Marathon:* https://github.com/mesosphere/marathon
>>> You can try Marathon + Mesos + Mesos Resource Attribute
>>>
>>> When you start up mesos slave, uses --attributes option, here is an
>>> example:
>>> ./bin/mesos-slave.sh --master=9.21.61.21:5050 --quiet
>>> --log_dir=/tmp/mesos --attributes=rackid:r1;groupid:g1
>>> This basically defines two attributes for this mesos slave host. rackid
>>> with value r1 and groupid with value g1.
>>>
>>> marathon start -i "like_test" -C "sleep 100" -n 4 -c 1 -m 50 -o
>>> "rackid:LIKE:r1"
>>>
>>> this will place applications on the slave node whose rackid is r1
>>>
>>> *Chronos:* https://github.com/mesos/chronos , Chronos supports the
>>> definition of jobs triggered by the completion of other jobs. It supports
>>> arbitrarily long dependency chains.
>>>
>>> Thanks,
>>>
>>> Guangya
>>>
>>> On Mon, Oct 5, 2015 at 11:21 PM, Pradeep Kiruvale <
>>> pradeepkiruv...@gmail.com> wrote:
>>>
>>>> Hi All,
>>>>
>>>> Are there any frameworks that exists with the Mesos to schedule the
>>>> bigger apps?
>>>> I mean to say scheduling a app which has many services and will not fit
>>>> into one physical node.
>>>>
>>>> Is there any frame work that can be used to
>>>>  schedule tasks based on the underlying hardware constraints like
>>>> Network bandwidth ?
>>>>
>>>  Schedule the tasks based on their dependencies and proximity to each
>>>> other in a cluster or a rack?
>>>>
>>>> Thanks & Regards,
>>>> Pradeep
>>>>
>>>
>>>
>>
>


Re: Scheduling tasks based on dependancy

2015-10-05 Thread Pradeep Kiruvale
Hi Guangya,

Thanks for your reply. That is a nice feature for grouping the slaves into
different racks, etc. But is there any way I can get metrics for other hardware
features besides CPU, MEM, and DISK, such as IO or the PCI devices that exist
on the node?

Thanks,
Pradeep

On 5 October 2015 at 17:45, Guangya Liu  wrote:

> Hi Pradeep,
>
> I think that you can try Chronos and Marathon which can help you.
>
> *Marathon:* https://github.com/mesosphere/marathon
> You can try Marathon + Mesos + Mesos Resource Attribute
>
> When you start up the mesos slave, use the --attributes option; here is an
> example:
> ./bin/mesos-slave.sh --master=9.21.61.21:5050 --quiet
> --log_dir=/tmp/mesos --attributes=rackid:r1;groupid:g1
> This defines two attributes for this mesos slave host: rackid with value r1
> and groupid with value g1.
>
> marathon start -i "like_test" -C "sleep 100" -n 4 -c 1 -m 50 -o
> "rackid:LIKE:r1"
>
> this will place applications on the slave node whose rackid is r1
>
> *Chronos:* https://github.com/mesos/chronos , Chronos supports the
> definition of jobs triggered by the completion of other jobs. It supports
> arbitrarily long dependency chains.
>
> Thanks,
>
> Guangya
>
> On Mon, Oct 5, 2015 at 11:21 PM, Pradeep Kiruvale <
> pradeepkiruv...@gmail.com> wrote:
>
>> Hi All,
>>
>> Are there any frameworks that exists with the Mesos to schedule the
>> bigger apps?
>> I mean to say scheduling a app which has many services and will not fit
>> into one physical node.
>>
>> Is there any frame work that can be used to
>>  schedule tasks based on the underlying hardware constraints like Network
>> bandwidth ?
>>
>  Schedule the tasks based on their dependencies and proximity to each
>> other in a cluster or a rack?
>>
>> Thanks & Regards,
>> Pradeep
>>
>
>


Scheduling tasks based on dependancy

2015-10-05 Thread Pradeep Kiruvale
Hi All,

Are there any frameworks for Mesos that can schedule bigger apps?
I mean scheduling an app which has many services and will not fit into one
physical node.

Is there any framework that can be used to
 schedule tasks based on underlying hardware constraints like network
bandwidth?
 Schedule tasks based on their dependencies and proximity to each other in a
cluster or a rack?

Thanks & Regards,
Pradeep


Re: Running a task in Mesos cluster

2015-10-05 Thread Pradeep Kiruvale
Hi Guangya,

Thanks for the reply.

I think the same. I found an old e-mail thread where the same thing was
discussed.

He set up the client on a separate physical system, and then it started working
fine.

I will also try that and see.

Regards,
Pradeep


On 5 October 2015 at 13:51, Guangya Liu  wrote:

> Hi Pradeep,
>
> I think that the problem might be caused by that you are running the lxc
> container on master node and not sure if there are any port conflict or
> what else wrong.
>
> For my case, I was running the client in a new node but not on master
> node, perhaps you can have a try to put your client on a new node but not
> on master node.
>
> Thanks,
>
> Guangya
>
>
> On Mon, Oct 5, 2015 at 7:30 PM, Pradeep Kiruvale <
> pradeepkiruv...@gmail.com> wrote:
>
>> Hi Guangya,
>>
>> Hmm!...That is strange in my case!
>>
>> If I run from the mesos-execute on one of the slave/master node then the
>> tasks get their resources and they get scheduled well.
>> But if I start the mesos-execute on another node which is neither
>> slave/master then I have this issue.
>>
>> I am using an lxc container on master as a client to launch the tasks.
>> This is also in the same network as master/slaves.
>> And I just launch the task as you did. But the tasks are not getting
>> scheduled.
>>
>>
>> On master the logs are same as I sent you before
>>
>> Deactivating framework 77539063-89ce-4efa-a20b-ca788abbd912-0066
>>
>> On both of the slaves I can see the below logs
>>
>> I1005 13:23:32.547987  4831 slave.cpp:1980] Asked to shut down framework
>> 77539063-89ce-4efa-a20b-ca788abbd912-0060 by master@192.168.0.102:5050
>> W1005 13:23:32.548135  4831 slave.cpp:1995] Cannot shut down unknown
>> framework 77539063-89ce-4efa-a20b-ca788abbd912-0060
>> I1005 13:23:33.697707  4833 slave.cpp:3926] Current disk usage 3.60%. Max
>> allowed age: 6.047984349521910days
>> I1005 13:23:34.098599  4829 slave.cpp:1980] Asked to shut down framework
>> 77539063-89ce-4efa-a20b-ca788abbd912-0061 by master@192.168.0.102:5050
>> W1005 13:23:34.098740  4829 slave.cpp:1995] Cannot shut down unknown
>> framework 77539063-89ce-4efa-a20b-ca788abbd912-0061
>> I1005 13:23:35.274569  4831 slave.cpp:1980] Asked to shut down framework
>> 77539063-89ce-4efa-a20b-ca788abbd912-0062 by master@192.168.0.102:5050
>> W1005 13:23:35.274683  4831 slave.cpp:1995] Cannot shut down unknown
>> framework 77539063-89ce-4efa-a20b-ca788abbd912-0062
>> I1005 13:23:36.193964  4829 slave.cpp:1980] Asked to shut down framework
>> 77539063-89ce-4efa-a20b-ca788abbd912-0063 by master@192.168.0.102:5050
>> W1005 13:23:36.194090  4829 slave.cpp:1995] Cannot shut down unknown
>> framework 77539063-89ce-4efa-a20b-ca788abbd912-0063
>> I1005 13:24:01.914788  4827 slave.cpp:1980] Asked to shut down framework
>> 77539063-89ce-4efa-a20b-ca788abbd912-0064 by master@192.168.0.102:5050
>> W1005 13:24:01.914937  4827 slave.cpp:1995] Cannot shut down unknown
>> framework 77539063-89ce-4efa-a20b-ca788abbd912-0064
>> I1005 13:24:03.469974  4833 slave.cpp:1980] Asked to shut down framework
>> 77539063-89ce-4efa-a20b-ca788abbd912-0065 by master@192.168.0.102:5050
>> W1005 13:24:03.470118  4833 slave.cpp:1995] Cannot shut down unknown
>> framework 77539063-89ce-4efa-a20b-ca788abbd912-0065
>> I1005 13:24:04.642654  4826 slave.cpp:1980] Asked to shut down framework
>> 77539063-89ce-4efa-a20b-ca788abbd912-0066 by master@192.168.0.102:5050
>> W1005 13:24:04.642812  4826 slave.cpp:1995] Cannot shut down unknown
>> framework 77539063-89ce-4efa-a20b-ca788abbd912-0066
>>
>>
>>
>> On 5 October 2015 at 13:09, Guangya Liu  wrote:
>>
>>> Hi Pradeep,
>>>
>>> From your log, seems that the master process is exiting and this caused
>>> the framework fail over to another mesos master. Can you please show more
>>> detail for your issue reproduced steps?
>>>
>>> I did some test by running mesos-execute on a client host which does not
>>> have any mesos service and the task can schedule well.
>>>
>>> root@mesos008:~/src/mesos/m1/mesos/build# ./src/mesos-execute --master=
>>> 192.168.0.107:5050 --name="cluster-test" --command="/bin/sleep 10"
>>> --resources="cpus(*):1;mem(*):256"
>>> I1005 18:59:47.974123  1233 sched.cpp:164] Version: 0.26.0
>>> I1005 18:59:47.990890  1248 sched.cpp:262] New master detected at
>>> master@192.168.0.107:5050
>>> I1005 18:59:47.993074  1248 sched.cpp:272] No credentials provided.
>>> Attempting to register wit

Re: Running a task in Mesos cluster

2015-10-05 Thread Pradeep Kiruvale
Hi Guangya,

Hmm! That is strange in my case!

If I run mesos-execute on one of the slave/master nodes, the tasks get their
resources and are scheduled fine.
But if I start mesos-execute on another node, which is neither a slave nor the
master, then I have this issue.

I am using an LXC container on the master as a client to launch the tasks. It
is also in the same network as the master/slaves.
And I launch the task just as you did. But the tasks are not getting scheduled.


On the master, the logs are the same as I sent you before:

Deactivating framework 77539063-89ce-4efa-a20b-ca788abbd912-0066

On both of the slaves I can see the logs below:

I1005 13:23:32.547987  4831 slave.cpp:1980] Asked to shut down framework
77539063-89ce-4efa-a20b-ca788abbd912-0060 by master@192.168.0.102:5050
W1005 13:23:32.548135  4831 slave.cpp:1995] Cannot shut down unknown
framework 77539063-89ce-4efa-a20b-ca788abbd912-0060
I1005 13:23:33.697707  4833 slave.cpp:3926] Current disk usage 3.60%. Max
allowed age: 6.047984349521910days
I1005 13:23:34.098599  4829 slave.cpp:1980] Asked to shut down framework
77539063-89ce-4efa-a20b-ca788abbd912-0061 by master@192.168.0.102:5050
W1005 13:23:34.098740  4829 slave.cpp:1995] Cannot shut down unknown
framework 77539063-89ce-4efa-a20b-ca788abbd912-0061
I1005 13:23:35.274569  4831 slave.cpp:1980] Asked to shut down framework
77539063-89ce-4efa-a20b-ca788abbd912-0062 by master@192.168.0.102:5050
W1005 13:23:35.274683  4831 slave.cpp:1995] Cannot shut down unknown
framework 77539063-89ce-4efa-a20b-ca788abbd912-0062
I1005 13:23:36.193964  4829 slave.cpp:1980] Asked to shut down framework
77539063-89ce-4efa-a20b-ca788abbd912-0063 by master@192.168.0.102:5050
W1005 13:23:36.194090  4829 slave.cpp:1995] Cannot shut down unknown
framework 77539063-89ce-4efa-a20b-ca788abbd912-0063
I1005 13:24:01.914788  4827 slave.cpp:1980] Asked to shut down framework
77539063-89ce-4efa-a20b-ca788abbd912-0064 by master@192.168.0.102:5050
W1005 13:24:01.914937  4827 slave.cpp:1995] Cannot shut down unknown
framework 77539063-89ce-4efa-a20b-ca788abbd912-0064
I1005 13:24:03.469974  4833 slave.cpp:1980] Asked to shut down framework
77539063-89ce-4efa-a20b-ca788abbd912-0065 by master@192.168.0.102:5050
W1005 13:24:03.470118  4833 slave.cpp:1995] Cannot shut down unknown
framework 77539063-89ce-4efa-a20b-ca788abbd912-0065
I1005 13:24:04.642654  4826 slave.cpp:1980] Asked to shut down framework
77539063-89ce-4efa-a20b-ca788abbd912-0066 by master@192.168.0.102:5050
W1005 13:24:04.642812  4826 slave.cpp:1995] Cannot shut down unknown
framework 77539063-89ce-4efa-a20b-ca788abbd912-0066



On 5 October 2015 at 13:09, Guangya Liu  wrote:

> Hi Pradeep,
>
> From your log, it seems that the master process is exiting and this caused
> the framework to fail over to another mesos master. Can you please show more
> detail on the steps to reproduce your issue?
>
> I did some tests by running mesos-execute on a client host which does not run
> any mesos service, and the task scheduled fine.
>
> root@mesos008:~/src/mesos/m1/mesos/build# ./src/mesos-execute --master=
> 192.168.0.107:5050 --name="cluster-test" --command="/bin/sleep 10"
> --resources="cpus(*):1;mem(*):256"
> I1005 18:59:47.974123  1233 sched.cpp:164] Version: 0.26.0
> I1005 18:59:47.990890  1248 sched.cpp:262] New master detected at
> master@192.168.0.107:5050
> I1005 18:59:47.993074  1248 sched.cpp:272] No credentials provided.
> Attempting to register without authentication
> I1005 18:59:48.001194  1249 sched.cpp:641] Framework registered with
> 04b9af5e-e9b6-4c59-8734-eba407163922-0002
> Framework registered with 04b9af5e-e9b6-4c59-8734-eba407163922-0002
> task cluster-test submitted to slave
> c0e5fdde-595e-4768-9d04-25901d4523b6-S0
> Received status update TASK_RUNNING for task cluster-test
> Received status update TASK_FINISHED for task cluster-test
> I1005 18:59:58.431144  1249 sched.cpp:1771] Asked to stop the driver
> I1005 18:59:58.431591  1249 sched.cpp:1040] Stopping framework
> '04b9af5e-e9b6-4c59-8734-eba407163922-0002'
> root@mesos008:~/src/mesos/m1/mesos/build# ps -ef | grep mesos
> root      1259  1159  0 19:06 pts/0    00:00:00 grep --color=auto mesos
>
> Thanks,
>
> Guangya
>
>
> On Mon, Oct 5, 2015 at 6:50 PM, Pradeep Kiruvale <
> pradeepkiruv...@gmail.com> wrote:
>
>> Hi Guangya,
>>
>> I am facing one more issue. If I try to schedule the tasks from some
>> external client system running the same cli mesos-execute.
>> The tasks are not getting launched. The tasks reach the Master and it
>> just drops the requests, below are the logs related to that
>>
>> I1005 11:33:35.025594 21369 master.cpp:2250] Subscribing framework  with
>> checkpointing disabled and capabilities [  ]
>> E1005 11:33:35.026100 21373 process.cpp:1912] Failed to shutdow

Re: Running a task in Mesos cluster

2015-10-05 Thread Pradeep Kiruvale
Hi Guangya,

I am facing one more issue. If I try to schedule tasks from an external client
system running the same mesos-execute CLI, the tasks are not getting launched.
The tasks reach the Master and it just drops the requests; below are the
related logs:

I1005 11:33:35.025594 21369 master.cpp:2250] Subscribing framework  with
checkpointing disabled and capabilities [  ]
E1005 11:33:35.026100 21373 process.cpp:1912] Failed to shutdown socket
with fd 14: Transport endpoint is not connected
I1005 11:33:35.026129 21372 hierarchical.hpp:515] Added framework
77539063-89ce-4efa-a20b-ca788abbd912-0055
I1005 11:33:35.026298 21369 master.cpp:1119] Framework
77539063-89ce-4efa-a20b-ca788abbd912-0055 () at
scheduler-b1bc0243-b5be-44ae-894c-ca318c24ce6d@127.0.1.1:47259 disconnected
I1005 11:33:35.026329 21369 master.cpp:2475] Disconnecting framework
77539063-89ce-4efa-a20b-ca788abbd912-0055 () at
scheduler-b1bc0243-b5be-44ae-894c-ca318c24ce6d@127.0.1.1:47259
I1005 11:33:35.026340 21369 master.cpp:2499] Deactivating framework
77539063-89ce-4efa-a20b-ca788abbd912-0055 () at
scheduler-b1bc0243-b5be-44ae-894c-ca318c24ce6d@127.0.1.1:47259
E1005 11:33:35.026345 21373 process.cpp:1912] Failed to shutdown socket
with fd 14: Transport endpoint is not connected
I1005 11:33:35.026376 21369 master.cpp:1143] Giving framework
77539063-89ce-4efa-a20b-ca788abbd912-0055 () at
scheduler-b1bc0243-b5be-44ae-894c-ca318c24ce6d@127.0.1.1:47259 0ns to
failover
I1005 11:33:35.026743 21372 hierarchical.hpp:599] Deactivated framework
77539063-89ce-4efa-a20b-ca788abbd912-0055
W1005 11:33:35.026757 21368 master.cpp:4828] Master returning resources
offered to framework 77539063-89ce-4efa-a20b-ca788abbd912-0055 because the
framework has terminated or is inactive
I1005 11:33:35.027014 21371 hierarchical.hpp:1103] Recovered cpus(*):8;
mem(*):14868; disk(*):218835; ports(*):[31000-32000] (total: cpus(*):8;
mem(*):14868; disk(*):218835; ports(*):[31000-32000], allocated: ) on slave
77539063-89ce-4efa-a20b-ca788abbd912-S2 from framework
77539063-89ce-4efa-a20b-ca788abbd912-0055
I1005 11:33:35.027159 21371 hierarchical.hpp:1103] Recovered cpus(*):8;
mem(*):14930; disk(*):218578; ports(*):[31000-32000] (total: cpus(*):8;
mem(*):14930; disk(*):218578; ports(*):[31000-32000], allocated: ) on slave
77539063-89ce-4efa-a20b-ca788abbd912-S1 from framework
77539063-89ce-4efa-a20b-ca788abbd912-0055
I1005 11:33:35.027668 21366 master.cpp:4815] Framework failover timeout,
removing framework 77539063-89ce-4efa-a20b-ca788abbd912-0055 () at
scheduler-b1bc0243-b5be-44ae-894c-ca318c24ce6d@127.0.1.1:47259
I1005 11:33:35.027715 21366 master.cpp:5571] Removing framework
77539063-89ce-4efa-a20b-ca788abbd912-0055 () at
scheduler-b1bc0243-b5be-44ae-894c-ca318c24ce6d@127.0.1.1:47259


Can you please tell me the reason? The client is on the same network, but
it does not run any master or slave processes.
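
One thing worth ruling out on the client (a hedged sketch only; 192.168.0.103
is a hypothetical address, substitute the client's real one): the scheduler in
the logs above advertises itself at 127.0.1.1, so the master cannot connect
back to it and immediately disconnects the framework. Exporting LIBPROCESS_IP
before running mesos-execute, as suggested elsewhere in this thread, makes
libprocess bind to a routable address:

# Hypothetical client address on the 192.168.0.x network; adjust as needed.
# LIBPROCESS_IP tells libprocess (and therefore the mesos-execute scheduler)
# which address to bind to and advertise to the master.
export LIBPROCESS_IP=192.168.0.103

# Same style of invocation as Guangya's example quoted above.
./src/mesos-execute --master=192.168.0.102:5050 --name="cluster-test" \
  --command="/bin/sleep 10" --resources="cpus(*):1;mem(*):256"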

Thanks & Regards,
Pradeep

On 5 October 2015 at 12:13, Guangya Liu  wrote:

> Hi Pradeep,
>
> Glad it finally works! Not sure if you are using systemd.slice or not, are
> you running to this issue:
> https://issues.apache.org/jira/browse/MESOS-1195
>
> Hope Jie Yu can give you some help on this ;-)
>
> Thanks,
>
> Guangya
>
> On Mon, Oct 5, 2015 at 5:25 PM, Pradeep Kiruvale <
> pradeepkiruv...@gmail.com> wrote:
>
>> Hi Guangya,
>>
>>
>> Thanks for sharing the information.
>>
>> Now I could launch the tasks. The problem was with the permission. If I
>> start all the slaves and Master as root it works fine.
>> Else I have problem with launching the tasks.
>>
>> But on one of the slave I could not launch the slave as root, I am facing
>> the following issue.
>>
>> Failed to create a containerizer: Could not create MesosContainerizer:
>> Failed to create launcher: Failed to create Linux launcher: Failed to mount
>> cgroups hierarchy at '/sys/fs/cgroup/freezer': 'freezer' is already
>> attached to another hierarchy
>>
>> I took that out from the cluster for now. The tasks are getting scheduled
>> on the other two slave nodes.
>>
>> Thanks for your timely help
>>
>> -Pradeep
>>
>> On 5 October 2015 at 10:54, Guangya Liu  wrote:
>>
>>> Hi Pradeep,
>>>
>>> My steps was pretty simple just as
>>> https://github.com/apache/mesos/blob/master/docs/getting-started.md#examples
>>>
>>> On Master node: root@mesos1:~/src/mesos/m1/mesos/build# GLOG_v=1
>>>  ./bin/mesos-master.sh --ip=192.168.0.107 --work_dir=/var/lib/mesos
>>> On 3 Slave node: root@mesos007:~/src/mesos/m1/mesos/build# GLOG_v=1
>>> ./bin/mesos-slave.sh --master=192.168.0.107:5050
>>>
>>> Then schedule a task on any of the node, here I was using slave node
>>> mes

Re: Running a task in Mesos cluster

2015-10-05 Thread Pradeep Kiruvale
Hi Guangya,


Thanks for sharing the information.

Now I can launch the tasks. The problem was with permissions: if I start
all the slaves and the Master as root, it works fine; otherwise I have
problems launching the tasks.

But on one of the nodes I could not launch the slave as root; I am facing
the following issue:

Failed to create a containerizer: Could not create MesosContainerizer:
Failed to create launcher: Failed to create Linux launcher: Failed to mount
cgroups hierarchy at '/sys/fs/cgroup/freezer': 'freezer' is already
attached to another hierarchy
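
A quick way to see which hierarchy 'freezer' is already attached to on that
node (a diagnostic sketch using standard tools only; mount points vary by
distribution):

# Show where the freezer cgroup controller is currently mounted. Mesos wants
# to manage /sys/fs/cgroup/freezer itself, so a pre-existing mount owned by
# another service (e.g. the systemd.slice scenario tracked in MESOS-1195,
# mentioned elsewhere in this thread) will produce exactly this error.
grep freezer /proc/mounts

# Alternative view: list every cgroup mount on the node.
findmnt -t cgroup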

I took that node out of the cluster for now. The tasks are getting
scheduled on the other two slave nodes.

Thanks for your timely help

-Pradeep

On 5 October 2015 at 10:54, Guangya Liu  wrote:

> Hi Pradeep,
>
> My steps was pretty simple just as
> https://github.com/apache/mesos/blob/master/docs/getting-started.md#examples
>
> On Master node: root@mesos1:~/src/mesos/m1/mesos/build# GLOG_v=1
>  ./bin/mesos-master.sh --ip=192.168.0.107 --work_dir=/var/lib/mesos
> On 3 Slave node: root@mesos007:~/src/mesos/m1/mesos/build# GLOG_v=1
> ./bin/mesos-slave.sh --master=192.168.0.107:5050
>
> Then schedule a task on any of the node, here I was using slave node
> mesos007, you can see that the two tasks was launched on different host.
>
> root@mesos007:~/src/mesos/m1/mesos/build# ./src/mesos-execute --master=
> 192.168.0.107:5050 --name="cluster-test" --command="/bin/sleep 100"
> --resources="cpus(*):1;mem(*):256"
> I1005 16:49:11.013432  2971 sched.cpp:164] Version: 0.26.0
> I1005 16:49:11.027802  2992 sched.cpp:262] New master detected at
> master@192.168.0.107:5050
> I1005 16:49:11.029579  2992 sched.cpp:272] No credentials provided.
> Attempting to register without authentication
> I1005 16:49:11.038182  2985 sched.cpp:641] Framework registered with
> c0e5fdde-595e-4768-9d04-25901d4523b6-0002
> Framework registered with c0e5fdde-595e-4768-9d04-25901d4523b6-0002
> task cluster-test submitted to slave
> c0e5fdde-595e-4768-9d04-25901d4523b6-S0  <<<<<<<<<<<<<<<<<<
> Received status update TASK_RUNNING for task cluster-test
> ^C
> root@mesos007:~/src/mesos/m1/mesos/build# ./src/mesos-execute --master=
> 192.168.0.107:5050 --name="cluster-test" --command="/bin/sleep 100"
> --resources="cpus(*):1;mem(*):256"
> I1005 16:50:18.346984  3036 sched.cpp:164] Version: 0.26.0
> I1005 16:50:18.366114  3055 sched.cpp:262] New master detected at
> master@192.168.0.107:5050
> I1005 16:50:18.368010  3055 sched.cpp:272] No credentials provided.
> Attempting to register without authentication
> I1005 16:50:18.376338  3056 sched.cpp:641] Framework registered with
> c0e5fdde-595e-4768-9d04-25901d4523b6-0003
> Framework registered with c0e5fdde-595e-4768-9d04-25901d4523b6-0003
> task cluster-test submitted to slave
> c0e5fdde-595e-4768-9d04-25901d4523b6-S1 <<<<<<<<<<<<<<<<<<<<
> Received status update TASK_RUNNING for task cluster-test
>
> Thanks,
>
> Guangya
>
> On Mon, Oct 5, 2015 at 4:21 PM, Pradeep Kiruvale <
> pradeepkiruv...@gmail.com> wrote:
>
>> Hi Guangya,
>>
>> Thanks for your reply.
>>
>> I just want to know how did you launch the tasks.
>>
>> 1. What processes you have started on Master?
>> 2. What are the processes you have started on Slaves?
>>
>> I am missing something here, otherwise all my slave have enough memory
>> and cpus to launch the tasks I mentioned.
>> What I am missing is some configuration steps.
>>
>> Thanks & Regards,
>> Pradeep
>>
>>
>> On 3 October 2015 at 13:14, Guangya Liu  wrote:
>>
>>> Hi Pradeep,
>>>
>>> I did some test with your case and found that the task can run randomly
>>> on the three slave hosts, every time may have different result. The logic
>>> is here:
>>> https://github.com/apache/mesos/blob/master/src/master/allocator/mesos/hierarchical.hpp#L1263-#L1266
>>> The allocator will help random shuffle the slaves every time when
>>> allocate resources for offers.
>>>
>>> I see that every of your task need the minimum resources as "
>>> resources="cpus(*):3;mem(*):2560", can you help check if all of your
>>> slaves have enough resources? If you want your task run on other slaves,
>>> then those slaves need to have at least 3 cpus and 2550M memory.
>>>
>>> Thanks
>>>
>>> On Fri, Oct 2, 2015 at 9:26 PM, Pradeep Kiruvale <
>>> pradeepkiruv...@gmail.com> wrote:
>>>
>>>> Hi On

Re: Running a task in Mesos cluster

2015-10-05 Thread Pradeep Kiruvale
Hi Guangya,

Thanks for your reply.

I just want to know how you launched the tasks.

1. What processes have you started on the Master?
2. What processes have you started on the Slaves?

I am missing something here; otherwise, all my slaves have enough memory
and CPUs to launch the tasks I mentioned. What I am missing is probably
some configuration step.

Thanks & Regards,
Pradeep


On 3 October 2015 at 13:14, Guangya Liu  wrote:

> Hi Pradeep,
>
> I did some test with your case and found that the task can run randomly on
> the three slave hosts, every time may have different result. The logic is
> here:
> https://github.com/apache/mesos/blob/master/src/master/allocator/mesos/hierarchical.hpp#L1263-#L1266
> The allocator will help random shuffle the slaves every time when
> allocate resources for offers.
>
> I see that every of your task need the minimum resources as "
> resources="cpus(*):3;mem(*):2560", can you help check if all of your
> slaves have enough resources? If you want your task run on other slaves,
> then those slaves need to have at least 3 cpus and 2550M memory.
>
> Thanks
>
> On Fri, Oct 2, 2015 at 9:26 PM, Pradeep Kiruvale <
> pradeepkiruv...@gmail.com> wrote:
>
>> Hi Ondrej,
>>
>> Thanks for your reply
>>
>> I did solve that issue, yes you are right there was an issue with slave
>> IP address setting.
>>
>> Now I am facing issue with the scheduling the tasks. When I try to
>> schedule a task using
>>
>> /src/mesos-execute --master=192.168.0.102:5050 --name="cluster-test"
>> --command="/usr/bin/hackbench -s 4096 -l 10845760 -g 2 -f 2 -P"
>> --resources="cpus(*):3;mem(*):2560"
>>
>> The tasks always get scheduled on the same node. The resources from the
>> other nodes are not getting used to schedule the tasks.
>>
>>  I just start the mesos slaves like below
>>
>> ./bin/mesos-slave.sh --master=192.168.0.102:5050/mesos  --hostname=slave1
>>
>> If I submit the task using the above (mesos-execute) command from same as
>> one of the slave it runs on that system.
>>
>> But when I submit the task from some different system. It uses just that
>> system and queues the tasks not runs on the other slaves.
>> Some times I see the message "Failed to getgid: unknown user"
>>
>> Do I need to start some process to push the task on all the slaves
>> equally? Am I missing something here?
>>
>> Regards,
>> Pradeep
>>
>>
>>
>> On 2 October 2015 at 15:07, Ondrej Smola  wrote:
>>
>>> Hi Pradeep,
>>>
>>> the problem is with IP your slave advertise - mesos by default resolves
>>> your hostname - there are several solutions  (let say your node ip is
>>> 192.168.56.128)
>>>
>>> 1)  export LIBPROCESS_IP=192.168.56.128
>>> 2)  set mesos options - ip, hostname
>>>
>>> one way to do this is to create files
>>>
>>> echo "192.168.56.128" > /etc/mesos-slave/ip
>>> echo "abc.mesos.com" > /etc/mesos-slave/hostname
>>>
>>> for more configuration options see
>>> http://mesos.apache.org/documentation/latest/configuration
>>>
>>>
>>>
>>>
>>>
>>> 2015-10-02 10:06 GMT+02:00 Pradeep Kiruvale :
>>>
>>>> Hi Guangya,
>>>>
>>>> Thanks for reply. I found one interesting log message.
>>>>
>>>>  7410 master.cpp:5977] Removed slave
>>>> 6a11063e-b8ff-43bd-86cf-e6eef0de06fd-S52 (192.168.0.178): a new slave
>>>> registered at the same address
>>>>
>>>> Mostly because of this issue, the systems/slave nodes are getting
>>>> registered and de-registered to make a room for the next node. I can even
>>>> see this on
>>>> the UI interface, for some time one node got added and after some time
>>>> that will be replaced with the new slave node.
>>>>
>>>> The above log is followed by the below log messages.
>>>>
>>>>
>>>> I1002 10:01:12.753865  7416 leveldb.cpp:343] Persisting action (18
>>>> bytes) to leveldb took 104089ns
>>>> I1002 10:01:12.753885  7416 replica.cpp:679] Persisted action at 384
>>>> E1002 10:01:12.753891  7417 process.cpp:1912] Failed to shutdown socket
>>>> with fd 15: Transport endpoint is not connected
>>>> I1002 10:01:12.753988  7413 master.cpp:3930] Registered slave
>>>> 6a11063e-b8ff-43bd-86cf-e6eef0de06fd-S62 at slave(1)@127.0.1.1:5051
>>>> (192.168.0.116) with c

Re: Running a task in Mesos cluster

2015-10-03 Thread Pradeep Kiruvale
I have different login names on different systems. I have a client system
from which I launch the tasks, but these tasks are not getting any
resources, so they are not getting scheduled.

What I mean is that my cluster arrangement is 1 client, 1 Master and 3
slaves, all on different physical systems.

Is there any way to run the tasks under one unified user?
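
One common workaround, a sketch only (slave1/slave2/slave3 and mesosuser are
hypothetical names): create the same UNIX account on the client and on every
slave. mesos-execute launches the task under the user who ran the command (see
Ondrej's pointer to execute.cpp quoted below), so that account has to resolve
on whichever slave picks the task up; otherwise you get "Failed to getgid:
unknown user".

# Hypothetical account and host names; adjust to your environment.
for host in slave1 slave2 slave3; do
  ssh root@"$host" 'useradd --create-home mesosuser'
done

# Then switch to that shared user on the client and submit as usual.
su - mesosuser
./src/mesos-execute --master=192.168.0.102:5050 --name="cluster-test" \
  --command="/bin/sleep 10" --resources="cpus(*):1;mem(*):256"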

Regards,
Pradeep

On 3 October 2015 at 10:43, Ondrej Smola  wrote:

>
> mesos framework receive offers and based on those offers it decides where
> to run tasks.
>
>
> mesos-execute is little framework that executes your task (hackbench) -
> see here https://github.com/apache/mesos/blob/master/src/cli/execute.cpp
>
> https://github.com/apache/mesos/blob/master/src/cli/execute.cpp#L320 you
> can see that it uses user that run mesos-execute command
>
> error you can see should be from here (su command)
>
> https://github.com/apache/mesos/blob/master/3rdparty/libprocess/3rdparty/stout/include/stout/posix/os.hpp#L520
>
> under which user do you run mesos-execute and mesos daemons?
>
> 2015-10-02 15:26 GMT+02:00 Pradeep Kiruvale :
>
>> Hi Ondrej,
>>
>> Thanks for your reply
>>
>> I did solve that issue, yes you are right there was an issue with slave
>> IP address setting.
>>
>> Now I am facing issue with the scheduling the tasks. When I try to
>> schedule a task using
>>
>> /src/mesos-execute --master=192.168.0.102:5050 --name="cluster-test"
>> --command="/usr/bin/hackbench -s 4096 -l 10845760 -g 2 -f 2 -P"
>> --resources="cpus(*):3;mem(*):2560"
>>
>> The tasks always get scheduled on the same node. The resources from the
>> other nodes are not getting used to schedule the tasks.
>>
>>  I just start the mesos slaves like below
>>
>> ./bin/mesos-slave.sh --master=192.168.0.102:5050/mesos  --hostname=slave1
>>
>> If I submit the task using the above (mesos-execute) command from same as
>> one of the slave it runs on that system.
>>
>> But when I submit the task from some different system. It uses just that
>> system and queues the tasks not runs on the other slaves.
>> Some times I see the message "Failed to getgid: unknown user"
>>
>> Do I need to start some process to push the task on all the slaves
>> equally? Am I missing something here?
>>
>> Regards,
>> Pradeep
>>
>>
>>
>> On 2 October 2015 at 15:07, Ondrej Smola  wrote:
>>
>>> Hi Pradeep,
>>>
>>> the problem is with IP your slave advertise - mesos by default resolves
>>> your hostname - there are several solutions  (let say your node ip is
>>> 192.168.56.128)
>>>
>>> 1)  export LIBPROCESS_IP=192.168.56.128
>>> 2)  set mesos options - ip, hostname
>>>
>>> one way to do this is to create files
>>>
>>> echo "192.168.56.128" > /etc/mesos-slave/ip
>>> echo "abc.mesos.com" > /etc/mesos-slave/hostname
>>>
>>> for more configuration options see
>>> http://mesos.apache.org/documentation/latest/configuration
>>>
>>>
>>>
>>>
>>>
>>> 2015-10-02 10:06 GMT+02:00 Pradeep Kiruvale :
>>>
>>>> Hi Guangya,
>>>>
>>>> Thanks for reply. I found one interesting log message.
>>>>
>>>>  7410 master.cpp:5977] Removed slave
>>>> 6a11063e-b8ff-43bd-86cf-e6eef0de06fd-S52 (192.168.0.178): a new slave
>>>> registered at the same address
>>>>
>>>> Mostly because of this issue, the systems/slave nodes are getting
>>>> registered and de-registered to make a room for the next node. I can even
>>>> see this on
>>>> the UI interface, for some time one node got added and after some time
>>>> that will be replaced with the new slave node.
>>>>
>>>> The above log is followed by the below log messages.
>>>>
>>>>
>>>> I1002 10:01:12.753865  7416 leveldb.cpp:343] Persisting action (18
>>>> bytes) to leveldb took 104089ns
>>>> I1002 10:01:12.753885  7416 replica.cpp:679] Persisted action at 384
>>>> E1002 10:01:12.753891  7417 process.cpp:1912] Failed to shutdown socket
>>>> with fd 15: Transport endpoint is not connected
>>>> I1002 10:01:12.753988  7413 master.cpp:3930] Registered slave
>>>> 6a11063e-b8ff-43bd-86cf-e6eef0de06fd-S62 at slave(1)@127.0.1.1:5051
>>>> (192.168.0.116) with cpus(*):8; mem(*):14930; disk(*):218578;
>>>> ports(*):[31000-32000]
>>>> I1002 10:01:12.754065  7

Re: Running a task in Mesos cluster

2015-10-02 Thread Pradeep Kiruvale
Hi Ondrej,

Thanks for your reply

I did solve that issue; yes, you are right, there was an issue with the
slave IP address setting.

Now I am facing an issue with scheduling the tasks. When I try to schedule
a task using

/src/mesos-execute --master=192.168.0.102:5050 --name="cluster-test"
--command="/usr/bin/hackbench -s 4096 -l 10845760 -g 2 -f 2 -P"
--resources="cpus(*):3;mem(*):2560"

The tasks always get scheduled on the same node. The resources from the
other nodes are not getting used to schedule the tasks.

I just start the Mesos slaves like below:

./bin/mesos-slave.sh --master=192.168.0.102:5050/mesos  --hostname=slave1

If I submit the task using the above mesos-execute command from one of
the slave nodes, it runs on that system.

But when I submit the task from a different system, it uses just that
system and queues the tasks instead of running them on the other slaves.
Sometimes I see the message "Failed to getgid: unknown user".

Do I need to start some process to spread the tasks across all the slaves
equally? Am I missing something here?
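
For anyone hitting the same advertised-address problem, a minimal sketch of a
slave start line that applies Ondrej's suggestion quoted below; 192.168.0.105
is a hypothetical per-node address, and the plain host:port master address
follows Guangya's working examples in this thread:

# Give each slave an explicit, routable IP and hostname so it does not
# advertise whatever its hostname happens to resolve to (often 127.0.1.1).
./bin/mesos-slave.sh --master=192.168.0.102:5050 \
  --ip=192.168.0.105 --hostname=slave1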

Regards,
Pradeep



On 2 October 2015 at 15:07, Ondrej Smola  wrote:

> Hi Pradeep,
>
> the problem is with IP your slave advertise - mesos by default resolves
> your hostname - there are several solutions  (let say your node ip is
> 192.168.56.128)
>
> 1)  export LIBPROCESS_IP=192.168.56.128
> 2)  set mesos options - ip, hostname
>
> one way to do this is to create files
>
> echo "192.168.56.128" > /etc/mesos-slave/ip
> echo "abc.mesos.com" > /etc/mesos-slave/hostname
>
> for more configuration options see
> http://mesos.apache.org/documentation/latest/configuration
>
>
>
>
>
> 2015-10-02 10:06 GMT+02:00 Pradeep Kiruvale :
>
>> Hi Guangya,
>>
>> Thanks for reply. I found one interesting log message.
>>
>>  7410 master.cpp:5977] Removed slave
>> 6a11063e-b8ff-43bd-86cf-e6eef0de06fd-S52 (192.168.0.178): a new slave
>> registered at the same address
>>
>> Mostly because of this issue, the systems/slave nodes are getting
>> registered and de-registered to make a room for the next node. I can even
>> see this on
>> the UI interface, for some time one node got added and after some time
>> that will be replaced with the new slave node.
>>
>> The above log is followed by the below log messages.
>>
>>
>> I1002 10:01:12.753865  7416 leveldb.cpp:343] Persisting action (18 bytes)
>> to leveldb took 104089ns
>> I1002 10:01:12.753885  7416 replica.cpp:679] Persisted action at 384
>> E1002 10:01:12.753891  7417 process.cpp:1912] Failed to shutdown socket
>> with fd 15: Transport endpoint is not connected
>> I1002 10:01:12.753988  7413 master.cpp:3930] Registered slave
>> 6a11063e-b8ff-43bd-86cf-e6eef0de06fd-S62 at slave(1)@127.0.1.1:5051
>> (192.168.0.116) with cpus(*):8; mem(*):14930; disk(*):218578;
>> ports(*):[31000-32000]
>> I1002 10:01:12.754065  7413 master.cpp:1080] Slave
>> 6a11063e-b8ff-43bd-86cf-e6eef0de06fd-S62 at slave(1)@127.0.1.1:5051
>> (192.168.0.116) disconnected
>> I1002 10:01:12.754072  7416 hierarchical.hpp:675] Added slave
>> 6a11063e-b8ff-43bd-86cf-e6eef0de06fd-S62 (192.168.0.116) with cpus(*):8;
>> mem(*):14930; disk(*):218578; ports(*):[31000-32000] (allocated: )
>> I1002 10:01:12.754084  7413 master.cpp:2534] Disconnecting slave
>> 6a11063e-b8ff-43bd-86cf-e6eef0de06fd-S62 at slave(1)@127.0.1.1:5051
>> (192.168.0.116)
>> E1002 10:01:12.754118  7417 process.cpp:1912] Failed to shutdown socket
>> with fd 16: Transport endpoint is not connected
>> I1002 10:01:12.754132  7413 master.cpp:2553] Deactivating slave
>> 6a11063e-b8ff-43bd-86cf-e6eef0de06fd-S62 at slave(1)@127.0.1.1:5051
>> (192.168.0.116)
>> I1002 10:01:12.754237  7416 hierarchical.hpp:768] Slave
>> 6a11063e-b8ff-43bd-86cf-e6eef0de06fd-S62 deactivated
>> I1002 10:01:12.754240  7413 replica.cpp:658] Replica received learned
>> notice for position 384
>> I1002 10:01:12.754360  7413 leveldb.cpp:343] Persisting action (20 bytes)
>> to leveldb took 95171ns
>> I1002 10:01:12.754395  7413 leveldb.cpp:401] Deleting ~2 keys from
>> leveldb took 20333ns
>> I1002 10:01:12.754406  7413 replica.cpp:679] Persisted action at 384
>>
>>
>> Thanks,
>> Pradeep
>>
>> On 2 October 2015 at 02:35, Guangya Liu  wrote:
>>
>>> Hi Pradeep,
>>>
>>> Please check some of my questions in line.
>>>
>>> Thanks,
>>>
>>> Guangya
>>>
>>> On

Re: Running a task in Mesos cluster

2015-10-02 Thread Pradeep Kiruvale
Hi Guangya,

Thanks for the reply. I found one interesting log message:

 7410 master.cpp:5977] Removed slave
6a11063e-b8ff-43bd-86cf-e6eef0de06fd-S52 (192.168.0.178): a new slave
registered at the same address

Mostly because of this issue, the slave nodes are getting registered and
de-registered to make room for the next node. I can even see this in the
UI: for some time one node is shown, and after a while it is replaced by
the new slave node.

The above log is followed by the log messages below.


I1002 10:01:12.753865  7416 leveldb.cpp:343] Persisting action (18 bytes)
to leveldb took 104089ns
I1002 10:01:12.753885  7416 replica.cpp:679] Persisted action at 384
E1002 10:01:12.753891  7417 process.cpp:1912] Failed to shutdown socket
with fd 15: Transport endpoint is not connected
I1002 10:01:12.753988  7413 master.cpp:3930] Registered slave
6a11063e-b8ff-43bd-86cf-e6eef0de06fd-S62 at slave(1)@127.0.1.1:5051
(192.168.0.116) with cpus(*):8; mem(*):14930; disk(*):218578;
ports(*):[31000-32000]
I1002 10:01:12.754065  7413 master.cpp:1080] Slave
6a11063e-b8ff-43bd-86cf-e6eef0de06fd-S62 at slave(1)@127.0.1.1:5051
(192.168.0.116) disconnected
I1002 10:01:12.754072  7416 hierarchical.hpp:675] Added slave
6a11063e-b8ff-43bd-86cf-e6eef0de06fd-S62 (192.168.0.116) with cpus(*):8;
mem(*):14930; disk(*):218578; ports(*):[31000-32000] (allocated: )
I1002 10:01:12.754084  7413 master.cpp:2534] Disconnecting slave
6a11063e-b8ff-43bd-86cf-e6eef0de06fd-S62 at slave(1)@127.0.1.1:5051
(192.168.0.116)
E1002 10:01:12.754118  7417 process.cpp:1912] Failed to shutdown socket
with fd 16: Transport endpoint is not connected
I1002 10:01:12.754132  7413 master.cpp:2553] Deactivating slave
6a11063e-b8ff-43bd-86cf-e6eef0de06fd-S62 at slave(1)@127.0.1.1:5051
(192.168.0.116)
I1002 10:01:12.754237  7416 hierarchical.hpp:768] Slave
6a11063e-b8ff-43bd-86cf-e6eef0de06fd-S62 deactivated
I1002 10:01:12.754240  7413 replica.cpp:658] Replica received learned
notice for position 384
I1002 10:01:12.754360  7413 leveldb.cpp:343] Persisting action (20 bytes)
to leveldb took 95171ns
I1002 10:01:12.754395  7413 leveldb.cpp:401] Deleting ~2 keys from leveldb
took 20333ns
I1002 10:01:12.754406  7413 replica.cpp:679] Persisted action at 384
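
One detail worth noting in these logs: every slave registers as
slave(1)@127.0.1.1:5051, i.e. they all advertise the address their hostname
resolves to rather than the address on their NIC, which would explain the
"Removed slave ... a new slave registered at the same address" messages. A
quick check on a slave node (a sketch using standard tools only, nothing
Mesos-specific):

# Show what this node's hostname resolves to; on stock Debian/Ubuntu installs
# /etc/hosts often maps it to 127.0.1.1, which is exactly what the slaves
# above are advertising to the master.
getent hosts "$(hostname)"

# List the addresses actually configured on the interfaces, to pick the
# routable one to pass to the slave (--ip) or export as LIBPROCESS_IP.
ip -4 addr show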


Thanks,
Pradeep

On 2 October 2015 at 02:35, Guangya Liu  wrote:

> Hi Pradeep,
>
> Please check some of my questions in line.
>
> Thanks,
>
> Guangya
>
> On Fri, Oct 2, 2015 at 12:55 AM, Pradeep Kiruvale <
> pradeepkiruv...@gmail.com> wrote:
>
>> Hi All,
>>
>> I am new to Mesos. I have set up a Mesos cluster with 1 Master and 3
>> Slaves.
>>
>> One slave runs on the Master Node itself and Other slaves run on
>> different nodes. Here node means the physical boxes.
>>
>> I tried running the tasks by configuring one Node cluster. Tested the
>> task scheduling using mesos-execute, works fine.
>>
>> When I configure three Node cluster (1master and 3 slaves) and try to see
>> the resources on the master (in GUI) only the Master node resources are
>> visible.
>>  The other nodes resources are not visible. Some times visible but in a
>> de-actived state.
>>
> Can you please append some logs from mesos-slave and mesos-master? There
> should be some logs in either master or slave telling you what is wrong.
>
>>
>> *Please let me know what could be the reason. All the nodes are in the
>> same network. *
>>
>> When I try to schedule a task using
>>
>> /src/mesos-execute --master=192.168.0.102:5050 --name="cluster-test"
>> --command="/usr/bin/hackbench -s 4096 -l 10845760 -g 2 -f 2 -P"
>> --resources="cpus(*):3;mem(*):2560"
>>
>> The tasks always get scheduled on the same node. The resources from the
>> other nodes are not getting used to schedule the tasks.
>>
> Based on your previous question, there is only one node in your cluster,
> that's why other nodes are not available. We need first identify what is
> wrong with other three nodes first.
>
>>
>> I*s it required to register the frameworks from every slave node on the
>> Master?*
>>
> It is not required.
>
>>
>> *I have configured this cluster using the git-hub code.*
>>
>>
>> Thanks & Regards,
>> Pradeep
>>
>>
>


Running a task in Mesos cluster

2015-10-01 Thread Pradeep Kiruvale
Hi All,

I am new to Mesos. I have set up a Mesos cluster with 1 Master and 3 Slaves.

One slave runs on the Master node itself and the other slaves run on
different nodes; here a node means a physical box.

I tried running tasks with a one-node cluster; task scheduling using
mesos-execute works fine.

When I configure a three-node cluster (1 Master and 3 slaves) and look at
the resources on the master (in the GUI), only the Master node's resources
are visible. The other nodes' resources are not visible, or are sometimes
visible but in a deactivated state.

Please let me know what could be the reason. All the nodes are on the
same network.

When I try to schedule a task using

/src/mesos-execute --master=192.168.0.102:5050 --name="cluster-test"
--command="/usr/bin/hackbench -s 4096 -l 10845760 -g 2 -f 2 -P"
--resources="cpus(*):3;mem(*):2560"

The tasks always get scheduled on the same node. The resources from the
other nodes are not getting used to schedule the tasks.

Is it required to register the frameworks from every slave node on the
Master?

I have configured this cluster using the GitHub code.


Thanks & Regards,
Pradeep


Re: [DISCUSS] Renaming Mesos Slave

2015-06-02 Thread Pradeep Kiruvale
1. Mesos Resource-Agent
2. mesos resource-Agent
3. Mesos Resource-Scheduler
4. Eventually

Thanks & Regards,
Pradeep


On 2 June 2015 at 15:39, Scott Rankin  wrote:

>   +1
>
>
>1. Mesos Worker
>2. mesos-worker
>3. No
>4. Change the docs as soon as the new command is available, perhaps
>provide a symlink for a while.
>
>
>   From: Adam Bordelon
> Reply-To: "user@mesos.apache.org"
> Date: Monday, June 1, 2015 at 5:18 PM
> To: dev, "user@mesos.apache.org"
> Subject: [DISCUSS] Renaming Mesos Slave
>
>   There has been much discussion about finding a less offensive name than
> "Slave", and many of these thoughts have been captured in
> https://issues.apache.org/jira/browse/MESOS-1478
>
> I would like to open up the discussion on this topic for one week, and if
> we cannot arrive at a lazy consensus, I will draft a proposal from the
> discussion and call for a VOTE.
> Here are the questions I would like us to answer:
> 1. What should we call the "Mesos Slave" node/host/machine?
> 2. What should we call the "mesos-slave" process (could be the same)?
> 3. Do we need to rename Mesos Master too?
>
> Another topic worth discussing is the deprecation process, but we don't
> necessarily need to decide on that at the same time as deciding the new
> name(s).
> 4. How will we phase in the new name and phase out the old name?
>
> Please voice your thoughts and opinions below.
>
> Thanks!
> -Adam-
>
> P.S. My personal thoughts:
> 1. Mesos Worker [Node]
> 2. Mesos Worker or Agent
> 3. No
> 4. Carefully
>


Resource Monitoring-run time optimization

2015-04-09 Thread Pradeep Kiruvale
Hi All,

Is there a way in Mesos to monitor the scheduled tasks and their resource
access patterns and reschedule the tasks onto better-suited resources? For
example, something like the NUMA balancer on a Linux NUMA system.

Regards,
Pradeep


Re: Optimal resource allocation

2015-02-05 Thread Pradeep Kiruvale
Hi Alex,

Thanks for the clarification.

Regards,
Pradeep

On 5 February 2015 at 15:23, Alex Rukletsov  wrote:

> Pradeep,
>
> by two level scheduling is meant that the decision on how to use a certain
> resource offer is delegated to the framework. It's a framework's scheduler
> that decides whether to make use of the incoming offer or to reject it and
> wait for another "more suitable" one based on the resource type it's
> offered and / or attributes associated with the offered slave.
>
> On Thu, Feb 5, 2015 at 3:02 PM, Pradeep Kiruvale <
> pradeepkiruv...@gmail.com> wrote:
>
>> Hi Timothy,
>>
>> Thanks for your reply.
>>
>> Ya, even I also would like to consider all the dimensions while
>> scheduling an application, but here I just mentioned one dimension.
>>
>> But I did not understand what do you mean by the two level scheduler?
>> You mean some scheduling decisions happen
>> in master level and some in slave level?
>>
>> And how these decisions are made? is it uses best fit algorithm at master?
>>
>> Regards,
>> Pradeep
>>
>> On 5 February 2015 at 14:55, Timothy Chen  wrote:
>>
>>> Hi Pradeep,
>>>
>>> First of all I think the notion of optimal is not just a single
>>> dimension of being task duration, but also considering lots of other
>>> dimensions such as throughput, fairness, latency, SLA and more.
>>>
>>> Mesos is a two level scheduler, which means it's not doing all the
>>> scheduling at a single point (master), but instead cooperate with
>>> frameworks to have a good scheduling decision.
>>>
>>> So Mesos can achieve it with multiple attributes or resources as you
>>> mentioned with the help of frameworks.
>>>
>>> Tim
>>>
>>> On Feb 5, 2015, at 9:09 PM, Dario Rexin  wrote:
>>>
>>> Hi Pradeep,
>>>
>>> I am actually working on a patch for ARM support. I already have Mesos
>>> running on ARMv7, just need to polish it a bit and I still have 1 failing
>>> test. Expect news about this soon.
>>>
>>> Cheers,
>>> Dario
>>>
>>> On Feb 5, 2015, at 1:46 PM, Pradeep Kiruvale 
>>> wrote:
>>>
>>> Hi Dario,
>>>
>>> Thanks for the reply and clarification.
>>>
>>>  How hard is to port to ARM? is there lot of architecture related code?
>>> Any idea?
>>>
>>> Regards,
>>> Pradeep
>>>
>>> On 5 February 2015 at 12:01, Dario Rexin  wrote:
>>>
>>>> There is currently no support for ARM cpus. GPUs and FPGAs could be
>>>> added to the resources in the future but are also not supported yet.
>>>> Scheduling tasks on machines that have a specific configuration (powerful
>>>> GPU or sth like that) can be done with attributes. There's however no way
>>>> to isolate those resources like we do with CPU and RAM.
>>>>
>>>>
>>>>
>>>> > On 05.02.2015, at 11:10, Chengwei Yang 
>>>> wrote:
>>>> >
>>>> >> On Thu, Feb 05, 2015 at 12:00:28AM +0100, Pradeep Kiruvale wrote:
>>>> >> Hi All,
>>>> >>
>>>> >> I am new to Mesos and I have heard and read lot about it.
>>>> >>
>>>> >> I have few doubts regarding the resource allocation by the mesos,
>>>> please help
>>>> >> me
>>>> >> to clarify my doubts.
>>>> >>
>>>> >> In a data center, if there are thousands of heterogeneous nodes
>>>> >> (x86,arm,gpu,fpgas) then is the mesos can really allocate a
>>>> co-located
>>>> >
>>>> > First, does mesos can run on arm, gpu, fpga?
>>>> >
>>>> > Seconds, does your tasks run on all archs?
>>>> >
>>>> > --
>>>> > Thanks,
>>>> > Chengwei
>>>> >
>>>> >> resources for any incoming application to finish the task faster?
>>>> >>
>>>> >> How these resource constraints are solved? what kind of a constraint
>>>> solver it
>>>> >> uses?
>>>> >>
>>>> >> Is the policy maker configurable?
>>>> >>
>>>> >> Thanks & Regards,
>>>> >> Pradeep
>>>> >
>>>>
>>>
>>>
>>>
>>
>


Re: Optimal resource allocation

2015-02-05 Thread Pradeep Kiruvale
Hi Tim,

Thanks for the clarification and for the URL; I will go through it.

Regards,
Pradeep

On 5 February 2015 at 15:23, Tim Chen  wrote:

> Hi Pradeep,
>
> You can find more information here
> http://mesos.apache.org/documentation/latest/mesos-architecture/, and
> also the paper that is in the web page.
>
> But yes, resource offers are chosen by the master based on the scheduler
> chosen (although the only implementation is based on dominant resource
> fairness), and frameworks based on the offers choose to launch tasks with
> them or not with local scheduling decisions.
>
> Tim
>
> On Thu, Feb 5, 2015 at 10:02 PM, Pradeep Kiruvale <
> pradeepkiruv...@gmail.com> wrote:
>
>> Hi Timothy,
>>
>> Thanks for your reply.
>>
>> Ya, even I also would like to consider all the dimensions while
>> scheduling an application, but here I just mentioned one dimension.
>>
>> But I did not understand what do you mean by the two level scheduler?
>> You mean some scheduling decisions happen
>> in master level and some in slave level?
>>
>> And how these decisions are made? is it uses best fit algorithm at master?
>>
>> Regards,
>> Pradeep
>>
>> On 5 February 2015 at 14:55, Timothy Chen  wrote:
>>
>>> Hi Pradeep,
>>>
>>> First of all I think the notion of optimal is not just a single
>>> dimension of being task duration, but also considering lots of other
>>> dimensions such as throughput, fairness, latency, SLA and more.
>>>
>>> Mesos is a two level scheduler, which means it's not doing all the
>>> scheduling at a single point (master), but instead cooperate with
>>> frameworks to have a good scheduling decision.
>>>
>>> So Mesos can achieve it with multiple attributes or resources as you
>>> mentioned with the help of frameworks.
>>>
>>> Tim
>>>
>>> On Feb 5, 2015, at 9:09 PM, Dario Rexin  wrote:
>>>
>>> Hi Pradeep,
>>>
>>> I am actually working on a patch for ARM support. I already have Mesos
>>> running on ARMv7, just need to polish it a bit and I still have 1 failing
>>> test. Expect news about this soon.
>>>
>>> Cheers,
>>> Dario
>>>
>>> On Feb 5, 2015, at 1:46 PM, Pradeep Kiruvale 
>>> wrote:
>>>
>>> Hi Dario,
>>>
>>> Thanks for the reply and clarification.
>>>
>>>  How hard is to port to ARM? is there lot of architecture related code?
>>> Any idea?
>>>
>>> Regards,
>>> Pradeep
>>>
>>> On 5 February 2015 at 12:01, Dario Rexin  wrote:
>>>
>>>> There is currently no support for ARM cpus. GPUs and FPGAs could be
>>>> added to the resources in the future but are also not supported yet.
>>>> Scheduling tasks on machines that have a specific configuration (powerful
>>>> GPU or sth like that) can be done with attributes. There's however no way
>>>> to isolate those resources like we do with CPU and RAM.
>>>>
>>>>
>>>>
>>>> > On 05.02.2015, at 11:10, Chengwei Yang 
>>>> wrote:
>>>> >
>>>> >> On Thu, Feb 05, 2015 at 12:00:28AM +0100, Pradeep Kiruvale wrote:
>>>> >> Hi All,
>>>> >>
>>>> >> I am new to Mesos and I have heard and read lot about it.
>>>> >>
>>>> >> I have few doubts regarding the resource allocation by the mesos,
>>>> please help
>>>> >> me
>>>> >> to clarify my doubts.
>>>> >>
>>>> >> In a data center, if there are thousands of heterogeneous nodes
>>>> >> (x86,arm,gpu,fpgas) then is the mesos can really allocate a
>>>> co-located
>>>> >
>>>> > First, does mesos can run on arm, gpu, fpga?
>>>> >
>>>> > Seconds, does your tasks run on all archs?
>>>> >
>>>> > --
>>>> > Thanks,
>>>> > Chengwei
>>>> >
>>>> >> resources for any incoming application to finish the task faster?
>>>> >>
>>>> >> How these resource constraints are solved? what kind of a constraint
>>>> solver it
>>>> >> uses?
>>>> >>
>>>> >> Is the policy maker configurable?
>>>> >>
>>>> >> Thanks & Regards,
>>>> >> Pradeep
>>>> >
>>>>
>>>
>>>
>>>
>>
>


Re: Optimal resource allocation

2015-02-05 Thread Pradeep Kiruvale
Hi Timothy,

Thanks for your reply.

Yes, I would also like to consider all the dimensions while scheduling an
application; here I just mentioned one dimension.

But I did not understand what you mean by a two-level scheduler. Do you
mean that some scheduling decisions happen at the master level and some at
the slave level?

And how are these decisions made? Does the master use a best-fit
algorithm?

Regards,
Pradeep

On 5 February 2015 at 14:55, Timothy Chen  wrote:

> Hi Pradeep,
>
> First of all I think the notion of optimal is not just a single dimension
> of being task duration, but also considering lots of other dimensions such
> as throughput, fairness, latency, SLA and more.
>
> Mesos is a two level scheduler, which means it's not doing all the
> scheduling at a single point (master), but instead cooperate with
> frameworks to have a good scheduling decision.
>
> So Mesos can achieve it with multiple attributes or resources as you
> mentioned with the help of frameworks.
>
> Tim
>
> On Feb 5, 2015, at 9:09 PM, Dario Rexin  wrote:
>
> Hi Pradeep,
>
> I am actually working on a patch for ARM support. I already have Mesos
> running on ARMv7, just need to polish it a bit and I still have 1 failing
> test. Expect news about this soon.
>
> Cheers,
> Dario
>
> On Feb 5, 2015, at 1:46 PM, Pradeep Kiruvale 
> wrote:
>
> Hi Dario,
>
> Thanks for the reply and clarification.
>
>  How hard is to port to ARM? is there lot of architecture related code?
> Any idea?
>
> Regards,
> Pradeep
>
> On 5 February 2015 at 12:01, Dario Rexin  wrote:
>
>> There is currently no support for ARM cpus. GPUs and FPGAs could be added
>> to the resources in the future but are also not supported yet. Scheduling
>> tasks on machines that have a specific configuration (powerful GPU or sth
>> like that) can be done with attributes. There's however no way to isolate
>> those resources like we do with CPU and RAM.
>>
>>
>>
>> > On 05.02.2015, at 11:10, Chengwei Yang 
>> wrote:
>> >
>> >> On Thu, Feb 05, 2015 at 12:00:28AM +0100, Pradeep Kiruvale wrote:
>> >> Hi All,
>> >>
>> >> I am new to Mesos and I have heard and read lot about it.
>> >>
>> >> I have few doubts regarding the resource allocation by the mesos,
>> please help
>> >> me
>> >> to clarify my doubts.
>> >>
>> >> In a data center, if there are thousands of heterogeneous nodes
>> >> (x86,arm,gpu,fpgas) then is the mesos can really allocate a co-located
>> >
>> > First, does mesos can run on arm, gpu, fpga?
>> >
>> > Seconds, does your tasks run on all archs?
>> >
>> > --
>> > Thanks,
>> > Chengwei
>> >
>> >> resources for any incoming application to finish the task faster?
>> >>
>> >> How these resource constraints are solved? what kind of a constraint
>> solver it
>> >> uses?
>> >>
>> >> Is the policy maker configurable?
>> >>
>> >> Thanks & Regards,
>> >> Pradeep
>> >
>>
>
>
>


Re: Optimal resource allocation

2015-02-05 Thread Pradeep Kiruvale
Hi Dario,

Cool! I am looking forward to that.

Regards,
Pradeep



On 5 February 2015 at 14:09, Dario Rexin  wrote:

> Hi Pradeep,
>
> I am actually working on a patch for ARM support. I already have Mesos
> running on ARMv7, just need to polish it a bit and I still have 1 failing
> test. Expect news about this soon.
>
> Cheers,
> Dario
>
> On Feb 5, 2015, at 1:46 PM, Pradeep Kiruvale 
> wrote:
>
> Hi Dario,
>
> Thanks for the reply and clarification.
>
>  How hard is to port to ARM? is there lot of architecture related code?
> Any idea?
>
> Regards,
> Pradeep
>
> On 5 February 2015 at 12:01, Dario Rexin  wrote:
>
>> There is currently no support for ARM cpus. GPUs and FPGAs could be added
>> to the resources in the future but are also not supported yet. Scheduling
>> tasks on machines that have a specific configuration (powerful GPU or sth
>> like that) can be done with attributes. There's however no way to isolate
>> those resources like we do with CPU and RAM.
>>
>>
>>
>> > On 05.02.2015, at 11:10, Chengwei Yang 
>> wrote:
>> >
>> >> On Thu, Feb 05, 2015 at 12:00:28AM +0100, Pradeep Kiruvale wrote:
>> >> Hi All,
>> >>
>> >> I am new to Mesos and I have heard and read lot about it.
>> >>
>> >> I have few doubts regarding the resource allocation by the mesos,
>> please help
>> >> me
>> >> to clarify my doubts.
>> >>
>> >> In a data center, if there are thousands of heterogeneous nodes
>> >> (x86,arm,gpu,fpgas) then is the mesos can really allocate a co-located
>> >
>> > First, does mesos can run on arm, gpu, fpga?
>> >
>> > Seconds, does your tasks run on all archs?
>> >
>> > --
>> > Thanks,
>> > Chengwei
>> >
>> >> resources for any incoming application to finish the task faster?
>> >>
>> >> How these resource constraints are solved? what kind of a constraint
>> solver it
>> >> uses?
>> >>
>> >> Is the policy maker configurable?
>> >>
>> >> Thanks & Regards,
>> >> Pradeep
>> >
>>
>
>
>


Re: Optimal resource allocation

2015-02-05 Thread Pradeep Kiruvale
Hi Dario,

Thanks for the reply and clarification.

How hard is it to port to ARM? Is there a lot of architecture-related
code? Any idea?

Regards,
Pradeep

On 5 February 2015 at 12:01, Dario Rexin  wrote:

> There is currently no support for ARM cpus. GPUs and FPGAs could be added
> to the resources in the future but are also not supported yet. Scheduling
> tasks on machines that have a specific configuration (powerful GPU or sth
> like that) can be done with attributes. There's however no way to isolate
> those resources like we do with CPU and RAM.
>
>
>
> > On 05.02.2015, at 11:10, Chengwei Yang 
> wrote:
> >
> >> On Thu, Feb 05, 2015 at 12:00:28AM +0100, Pradeep Kiruvale wrote:
> >> Hi All,
> >>
> >> I am new to Mesos and I have heard and read lot about it.
> >>
> >> I have few doubts regarding the resource allocation by the mesos,
> please help
> >> me
> >> to clarify my doubts.
> >>
> >> In a data center, if there are thousands of heterogeneous nodes
> >> (x86,arm,gpu,fpgas) then is the mesos can really allocate a co-located
> >
> > First, does mesos can run on arm, gpu, fpga?
> >
> > Seconds, does your tasks run on all archs?
> >
> > --
> > Thanks,
> > Chengwei
> >
> >> resources for any incoming application to finish the task faster?
> >>
> >> How these resource constraints are solved? what kind of a constraint
> solver it
> >> uses?
> >>
> >> Is the policy maker configurable?
> >>
> >> Thanks & Regards,
> >> Pradeep
> >
>


Re: Optimal resource allocation

2015-02-05 Thread Pradeep Kiruvale
Hi Chengwei,

Find my replies inline

On Thu, Feb 05, 2015 at 12:00:28AM +0100, Pradeep Kiruvale wrote:
> > Hi All,
> >
> > I am new to Mesos and I have heard and read lot about it.
> >
> > I have few doubts regarding the resource allocation by the mesos, please
> help
> > me
> > to clarify my doubts.
> >
> > In a data center, if there are thousands of heterogeneous nodes
> > (x86,arm,gpu,fpgas) then is the mesos can really allocate a co-located
>
> First, does mesos can run on arm, gpu, fpga?
>
>
I don't have any idea.

Seconds, does your tasks run on all archs?
>
>
Especially the HPC workloads will.

Regards,
Pradeep


> --
> Thanks,
> Chengwei
>
> > resources for any incoming application to finish the task faster?
> >
> > How these resource constraints are solved? what kind of a constraint
> solver it
> > uses?
> >
> > Is the policy maker configurable?
> >
> > Thanks & Regards,
> > Pradeep
> >
> >
>
>


Re: Optimal resource allocation

2015-02-05 Thread Pradeep Kiruvale
Hi Billy,

Thanks for the reply. This is helpful for understanding Mesos more deeply.

Regards,
Pradeep

On 5 February 2015 at 10:01, Billy Bones  wrote:

> WARNING: The following assumption is based on my little understanding of
> mesos architecture and internals, you should not take it as a definitive
> answer and may wait for more experimented suggestions.
>
> About my little understanding of the mesos ressource allocation process, I
> think that on your kind of environnements (ARM / x86 / GPU / FPGA) it will
> not really allocate them wisely as the default algorithm is not so smart
> and consider ressources as commodities and not their real speed etc.
>
> I read some topics earlier about the necessity to improve this specific
> part of the "kernel", but It didn't mention your archs and focused deeply
> on the x86 family.
> I think that integrate the GPUs and FPGAs would be awesome!
>
> One nice feature would be that the master look at the registered slaves
> deeper regarding their archs and ressources perfomance before offers any
> ressource to a task.
>
> This kind of feature could be implement when the slave try to register to
> the master as a pre-fly test (Benchmark??).
> That would then add smartest offers and ressource scheduling.
>
> Anyway, long story short, I don't think so, but you should way for more
> experimented answers.
>
> 2015-02-05 0:00 GMT+01:00 Pradeep Kiruvale :
>
>> Hi All,
>>
>> I am new to Mesos and I have heard and read lot about it.
>>
>> I have few doubts regarding the resource allocation by the mesos, please
>> help me
>> to clarify my doubts.
>>
>> In a data center, if there are thousands of heterogeneous nodes
>> (x86,arm,gpu,fpgas) then is the mesos can really allocate a co-located
>> resources for any incoming application to finish the task faster?
>>
>> How these resource constraints are solved? what kind of a constraint
>> solver it uses?
>>
>> Is the policy maker configurable?
>>
>> Thanks & Regards,
>> Pradeep
>>
>>
>>
>


Optimal resource allocation

2015-02-04 Thread Pradeep Kiruvale
Hi All,

I am new to Mesos and I have heard and read a lot about it.

I have a few doubts regarding resource allocation by Mesos; please help me
clarify them.

In a data center with thousands of heterogeneous nodes (x86, ARM, GPU,
FPGA), can Mesos really allocate co-located resources to an incoming
application so that it finishes its tasks faster?

How are these resource constraints solved? What kind of constraint solver
does it use?

Is the policy maker configurable?

Thanks & Regards,
Pradeep