Re: Multiple disks with Mesos

2014-10-08 Thread Damien Hardy
Hello,

I run Mesos on top of Hadoop HDFS.
Hadoop handles JBOD configurations well.

Today Mesos can only work on one of the disks and cannot take advantage
of the others (to use the non-HDFS space).

Handling JBOD too would be a great feature; it deals with disk failure
better than LVM, for example.
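
For now a slave can only be pointed at a single directory, so the best
one can do is pick one disk per slave. A minimal sketch (master address
and mount point illustrative):

```
mesos-slave --master=zk://192.168.255.2:2181/mesos \
  --work_dir=/data/1/mesos
```

This also answers both questions below: work_dir takes a single
directory, and task sandboxes (stdout, stderr, task-created directories)
are placed under it.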

Cheers,

On 08/10/2014 01:06, Arunabha Ghosh wrote:
> Hi,
>  I would like to run Mesos slaves on machines that have multiple
> disks. According to the Mesos configuration page
> <http://mesos.apache.org/documentation/latest/configuration/> I can
> specify a work_dir argument to the slaves. 
> 
> 1) Can the work_dir argument contain multiple directories?
> 
> 2) Is work_dir where Mesos will place all of its data? So if I
> started a task on Mesos, would the slave place the task's data (stderr,
> stdout, task-created directories) inside work_dir?
> 
> Thanks,
> Arunabha

-- 
Damien HARDY





Re: How can libmesos bind and declare specific network interface

2014-07-01 Thread Damien Hardy
Good one \o/

Many thanks.
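
For the record, a minimal sketch of the resulting setup, assuming the
VPN address 10.69.69.45 as in Tomas's suggestion below:

```
# spark.driver.host is already set to 10.69.69.45 in the Spark config
export LIBPROCESS_IP=10.69.69.45   # make libprocess (and so libmesos) bind to the VPN interface
spark-shell
```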

On 01/07/2014 14:52, Tomas Barton wrote:
> what about?
> 
> export LIBPROCESS_IP=10.69.69.45
> 
> or add an iptables rule for port range translation 3:6 to that
> interface
> 
> 
> On 1 July 2014 14:30, Damien Hardy <dha...@viadeoteam.com> wrote:
> 
> Hi,
> I am not talking about mesos-master or mesos-slave but about the Spark
> driver (using libmesos as a framework).
> It declares itself during Mesos registration as coming from the default
> interface of the desktop instead of the VPN one.
> 
> So mesos-master tries to reach an interface it cannot access.
> 
> Spark is using "spark.driver.host   10.69.69.45";
> see netstat:
> tcp0  0 0.0.0.0:44424   0.0.0.0:*
> LISTEN  1000   3076384 6779/java
> tcp0  0 10.69.69.45:39698   10.50.0.1:5050
> ESTABLISHED 1000   3068664 6779/java
> tcp6   0  0 :::43430:::*
> LISTEN  1000   3077940 6779/java
> tcp6   0  0 :::37926:::*
> LISTEN  1000   3077939 6779/java
> tcp6   0  0 :::4040 :::*
> LISTEN  1000   3077942 6779/java
> tcp6   0  0 :::51154:::*
> LISTEN  1000   3077938 6779/java
> tcp6   0  0 10.69.69.45:34610   :::*
> LISTEN  1000   3076383 6779/java
> tcp6   0  0 :::43122:::*
> LISTEN  1000   3077884 6779/java
> 
> We can see that Spark is correctly bound to 10.69.69.45, but the problem
> remains for port 44424, which is supposed to be reached by
> mesos-master during registration.
> 
> I would like to make libmesos, as used by the framework, bind to the
> right interface.
> 
> On 01/07/2014 13:34, Tomas Barton wrote:
> > Hi,
> >
> > have you tried setting '--ip 10.69.69.45' ?
> >
> > So, is mesos-master bound to the wrong interface? Or do you have a
> > problem with the mesos-slaves?
> >
> > Tomas
> >
> >
> > On 1 July 2014 12:16, Damien Hardy <dha...@viadeoteam.com> wrote:
> >
> > Hello,
> >
> > We would like to use Spark on Mesos, but the mesos cluster is
> > accessible via VPN.
> > When running spark-shell we can see registration attempts running
> > with the default public interface of the desktop:
> >
> > ```
> > I0701 12:07:34.710917  2440 master.cpp:820] Framework
> > 20140612-135938-16790026-5050-2407-0537
> > (scheduler(1)@192.168.2.92:42731) already registered, resending
> > acknowledgement
> > I0701 12:07:35.711632  2430 master.cpp:815] Received registration
> > request from scheduler(1)@192.168.2.92:42731
> > ```
> >
> > But we would like it to register with the VPN interface.
> >
> > This works when I change my /etc/hosts file and set the hostname to
> > my VPN address:
> > ```
> > I0701 12:03:54.193022  2441 master.cpp:815] Received registration
> > request from scheduler(1)@10.69.69.45:47440
> > I0701 12:03:54.193094  2441 master.cpp:833] Registering framework
> > 20140612-135938-16790026-5050-2407-0536 at
> > scheduler(1)@10.69.69.45:47440
> > ```
> >
> > I tried spark with
> > ```
> > spark.driver.host   10.69.69.45
> > ```
> > I can see Spark binding to the right interface, but Mesos keeps
> > registering with the default one (and fails).
> >
> > I hoped the envvar $MESOS_hostname would do the trick, but without
> > success...
> >
> > Thanks for the help.
> >
> > --
> > Damien HARDY
> > IT Infrastructure Architect
> > Viadeo - 30 rue de la Victoire - 75009 Paris - France
> > PGP : 45D7F89A
> >
> >
> 
> --
> Damien HARDY
> IT Infrastructure Architect
> Viadeo - 30 rue de la Victoire - 75009 Paris - France
> PGP : 45D7F89A
> 
> 

-- 
Damien HARDY
IT Infrastructure Architect
Viadeo - 30 rue de la Victoire - 75009 Paris - France
PGP : 45D7F89A





Re: How can libmesos bind and declare specific network interface

2014-07-01 Thread Damien Hardy
Hi,
I am not talking about mesos-master or mesos-slave but about the Spark
driver (using libmesos as a framework).
It declares itself during Mesos registration as coming from the default
interface of the desktop instead of the VPN one.

So mesos-master tries to reach an interface it cannot access.

Spark is using "spark.driver.host   10.69.69.45";
see netstat:
tcp0  0 0.0.0.0:44424   0.0.0.0:*
LISTEN  1000   3076384 6779/java
tcp0  0 10.69.69.45:39698   10.50.0.1:5050
ESTABLISHED 1000   3068664 6779/java
tcp6   0  0 :::43430:::*
LISTEN  1000   3077940 6779/java
tcp6   0  0 :::37926:::*
LISTEN  1000   3077939 6779/java
tcp6   0  0 :::4040 :::*
LISTEN  1000   3077942 6779/java
tcp6   0  0 :::51154:::*
LISTEN  1000   3077938 6779/java
tcp6   0  0 10.69.69.45:34610   :::*
LISTEN  1000   3076383 6779/java
tcp6   0  0 :::43122:::*
LISTEN  1000   3077884 6779/java

We can see that Spark is correctly bound to 10.69.69.45, but the problem
remains for port 44424, which is supposed to be reached by
mesos-master during registration.

I would like to make libmesos, as used by the framework, bind to the
right interface.

On 01/07/2014 13:34, Tomas Barton wrote:
> Hi, 
> 
> have you tried setting '--ip 10.69.69.45' ?
> 
> So, is mesos-master bound to the wrong interface? Or do you have a
> problem with the mesos-slaves?
> 
> Tomas
> 
> 
> On 1 July 2014 12:16, Damien Hardy <dha...@viadeoteam.com> wrote:
> 
> Hello,
> 
> We would like to use Spark on Mesos, but the mesos cluster is accessible
> via VPN.
> When running spark-shell we can see registration attempts running with
> the default public interface of the desktop:
> 
> ```
> I0701 12:07:34.710917  2440 master.cpp:820] Framework
> 20140612-135938-16790026-5050-2407-0537
> (scheduler(1)@192.168.2.92:42731) already registered, resending
> acknowledgement
> I0701 12:07:35.711632  2430 master.cpp:815] Received registration
> request from scheduler(1)@192.168.2.92:42731
> ```
> 
> But we would like it to register with the VPN interface.
> 
> This works when I change my /etc/hosts file and set the hostname to my
> VPN address:
> ```
> I0701 12:03:54.193022  2441 master.cpp:815] Received registration
> request from scheduler(1)@10.69.69.45:47440
> I0701 12:03:54.193094  2441 master.cpp:833] Registering framework
> 20140612-135938-16790026-5050-2407-0536 at scheduler(1)@10.69.69.45:47440
> ```
> 
> I tried spark with
> ```
> spark.driver.host   10.69.69.45
> ```
> I can see Spark binding to the right interface, but Mesos keeps
> registering with the default one (and fails).
> 
> I hoped the envvar $MESOS_hostname would do the trick, but without
> success...
> 
> Thanks for the help.
> 
> --
> Damien HARDY
> IT Infrastructure Architect
> Viadeo - 30 rue de la Victoire - 75009 Paris - France
> PGP : 45D7F89A
> 
> 

-- 
Damien HARDY
IT Infrastructure Architect
Viadeo - 30 rue de la Victoire - 75009 Paris - France
PGP : 45D7F89A





How can libmesos bind and declare specific network interface

2014-07-01 Thread Damien Hardy
Hello,

We would like to use Spark on Mesos, but the mesos cluster is accessible via VPN.
When running spark-shell we can see registration attempts running with
the default public interface of the desktop:

```
I0701 12:07:34.710917  2440 master.cpp:820] Framework
20140612-135938-16790026-5050-2407-0537
(scheduler(1)@192.168.2.92:42731) already registered, resending
acknowledgement
I0701 12:07:35.711632  2430 master.cpp:815] Received registration
request from scheduler(1)@192.168.2.92:42731
```

But we would like it to register with the VPN interface.

This works when I change my /etc/hosts file and set the hostname to my
VPN address:
```
I0701 12:03:54.193022  2441 master.cpp:815] Received registration
request from scheduler(1)@10.69.69.45:47440
I0701 12:03:54.193094  2441 master.cpp:833] Registering framework
20140612-135938-16790026-5050-2407-0536 at scheduler(1)@10.69.69.45:47440
```

I tried spark with
```
spark.driver.host   10.69.69.45
```
I can see Spark binding to the right interface, but Mesos keeps
registering with the default one (and fails).

I hoped the envvar $MESOS_hostname would do the trick, but without
success...

Thanks for the help.

-- 
Damien HARDY
IT Infrastructure Architect
Viadeo - 30 rue de la Victoire - 75009 Paris - France
PGP : 45D7F89A





Re: Log management

2014-05-30 Thread Damien Hardy
Hello,
Your point is right.
However, the --log_dir option provides a way to display logs to users in
the HTTP UI. Tailing stdout/stderr does not permit that.

Best regards,

On 15/05/2014 17:42, Dick Davies wrote:
> I'd try a newer version before you file bugs - but to be honest, log rotation
> is logrotate's job; it's really not very hard to set up.
> 
> In our stack we run under upstart, so things make it into syslog and we
> don't have to worry about rotation - it scales better too, as it's easier to
> centralize.
> 
> On 14 May 2014 09:46, Damien Hardy  wrote:
>> Hello,
>>
>> Logs in mesos are problematic for me so far.
>> We are used to the log4j facility in the Java world, which permits a lot of things.
>>
>> Mainly I would like log rotation (ideally with the logrotate tool, to be
>> homogeneous with other things) without restarting processes, because in
>> my experience that loses history (mesos 0.16.0 so far).
>>
>> Best regards,
>>
>> --
>> Damien HARDY

-- 
Damien HARDY





Re: Log management

2014-05-30 Thread Damien Hardy
Thank you, this is the way I will go too.
Don't you have problems with the file descriptor changing while the
process is running? (The deleted file keeps growing and filling disk
space until you restart the process.)
I suggest adding "copytruncate" to the logrotate configuration.
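
A minimal sketch of what that would look like, based on the rules quoted
below with copytruncate added:

```
/var/log/mesos/*.log {
daily
missingok
rotate 30
compress
delaycompress
copytruncate
notifempty
}
```

With copytruncate, logrotate copies the file and then truncates it in
place, so the running process keeps writing to the same file descriptor
and no restart (or history loss) is needed.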


On 30/05/2014 11:31, Tomas Barton wrote:
> I've already refactored the logging; I'm redirecting stdout and stderr
> to a master.log or slave.log:
> 
> https://github.com/deric/mesos-deb-packaging/blob/master/mesos-init-wrapper#L109
> 
> the logrotate config itself is quite simple:
> 
> 
> /var/log/mesos/*.log {
> daily
> missingok
> rotate 30
> compress
> delaycompress
> notifempty
> }
> 
> 
> 
> On 30 May 2014 11:14, Damien Hardy <dha...@viadeoteam.com> wrote:
> 
> Hello,
> 
> Yes I do.
> I thought this was the right thing to do for logs.
> But a never-ending file is not safely usable. This --log_dir option
> needs some rework, I suppose.
> I will go with a stdout/stderr pipeline instead (using logrotate's
> copytruncate to handle the open file descriptors).
> 
> Thank you
> 
> On 15/05/2014 22:02, Tomas Barton wrote:
> > Hi Damien,
> >
> > do you use the `--log_dir` switch? If so, mesos creates quite a lot of
> > files in a strange format:
> >
> > mesos-slave.{hostname}.invalid-user.log.INFO.20140409-155625.7545
> >
> > when you forward stdout of the service to a single file and afterwards
> > apply simple logrotate rules, you might get nicer logs.
> >
> > Tomas
> 
> --
> Damien HARDY
> 
> 

-- 
Damien HARDY
IT Infrastructure Architect
Viadeo - 30 rue de la Victoire - 75009 Paris - France
PGP : 45D7F89A





Re: Log management

2014-05-30 Thread Damien Hardy
Hello,

Yes I do.
I thought this was the right thing to do for logs.
But a never-ending file is not safely usable. This --log_dir option needs
some rework, I suppose.
I will go with a stdout/stderr pipeline instead (using logrotate's
copytruncate to handle the open file descriptors).

Thank you

On 15/05/2014 22:02, Tomas Barton wrote:
> Hi Damien,
> 
> do you use the `--log_dir` switch? If so, mesos creates quite a lot of
> files in a strange format:
> 
> mesos-slave.{hostname}.invalid-user.log.INFO.20140409-155625.7545
> 
> when you forward stdout of the service to a single file and afterwards
> apply simple logrotate rules, you might get nicer logs.
> 
> Tomas

-- 
Damien HARDY





Re: Log management

2014-05-16 Thread Damien Hardy
Hello,

I created https://issues.apache.org/jira/browse/MESOS-1375

Thank you,

Cheers,

On 14/05/2014 19:28, Adam Bordelon wrote:
> Hi Damien,
> 
> Log rotation sounds like a reasonable request. Please file a JIRA for
> it, and we can discuss details there.
> 
> Thanks,
> -Adam-

-- 
Damien HARDY





Log management

2014-05-14 Thread Damien Hardy
Hello,

Logs in mesos are problematic for me so far.
We are used to the log4j facility in the Java world, which permits a lot of things.

Mainly I would like log rotation (ideally with the logrotate tool, to be
homogeneous with other things) without restarting processes, because in
my experience that loses history (mesos 0.16.0 so far).

Best regards,

-- 
Damien HARDY
IT Infrastructure Architect
Viadeo - 30 rue de la Victoire - 75009 Paris - France
PGP : 45D7F89A





Re: run master and slave on same host (just for testing)?

2014-01-15 Thread Damien Hardy
Hello Jim,

Actually, the configuration options are mostly different between master
and slave. Only a few of them are common, and those mostly address
network or log settings that would be shared on a single host anyway.

BTW, you can use environment variables for settings:
any option can be set using the $MESOS_ prefix.

ex :
export MESOS_ip=192.168.255.2
./bin/mesos-slave
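
A fuller sketch for running both daemons on one host this way (addresses
and ports illustrative):

```
export MESOS_ip=192.168.255.2
./bin/mesos-master --port=5050 &

export MESOS_master=192.168.255.2:5050
./bin/mesos-slave --port=5051 &
```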


For my Debian packaging needs, I wrote init scripts that may inspire you
for running the services.
They use /etc/default/mesos, /etc/default/mesos-master and
/etc/default/mesos-slave respectively:

https://github.com/viadeo/mesos/blob/feature/Debian/debian/mesos-master.init
https://github.com/viadeo/mesos/blob/feature/Debian/debian/mesos-slave.init

Cheers,

On 15/01/2014 02:18, Jim Freeman wrote:
> I'm following the Mesos README's "Running a Mesos Cluster" section.  Is
> [prefix]/var/mesos/conf/mesos.conf used to configure both the master and
> the slave?  If so then I can't run master and slave on the same host, since
> the config would differ for master vs. slave.  BTW, I don't see this file
> installed, nor do I see a .template for it anywhere.
> 
> - Jim

-- 
Damien HARDY





Re: Hadoop on Mesos use local cdh4 installation instead of tar.gz

2014-01-03 Thread Damien Hardy
Hello,

It seems that a directory is not supported here ("cp omitting directory
/usr/lib/hadoop-0.20-mapreduce/", then failing with error 256 in stderr).

But this works (I was at first disappointed trying the file:// scheme as a URI):

```
  <property>
    <name>mapred.mesos.executor.uri</name>
    <value>/usr/lib/hadoop-0.20-mapreduce/bin/hadoop</value>
  </property>
  <property>
    <name>mapred.mesos.executor.directory</name>
    <value>./</value>
  </property>
  <property>
    <name>mapred.mesos.executor.command</name>
    <value>. /etc/default/hadoop-0.20; env ; ./hadoop
org.apache.hadoop.mapred.MesosExecutor</value>
  </property>
```

It just copies the `bin/hadoop` command into the framework directory and uses it.
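
Put differently, my understanding of what the slave ends up doing for
each task (a sketch; I assume the sandbox path is the one exported to
executors as $MESOS_DIRECTORY):

```
# the fetcher copies the executor URI into the task sandbox...
cp /usr/lib/hadoop-0.20-mapreduce/bin/hadoop "$MESOS_DIRECTORY/"
cd "$MESOS_DIRECTORY"
# ...then the executor command runs from there
. /etc/default/hadoop-0.20
./hadoop org.apache.hadoop.mapred.MesosExecutor
```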


On 02/01/2014 19:17, Florian Leibert wrote:
> Hi Damien -
> Sorry for responding to this a bit late.
> 
> Snappy should be enabled if you follow the steps in this tutorial:
> http://mesosphere.io/learn/run-hadoop-on-mesos/. You can set an
> alternate location for the hadoop distribution by changing the
> "mapred.mesos.executor.uri" - i.e. you can use a local directory / file
> here.
> 
> I hope this helps!
> 
> Cheers & happy new year,
> --Flo
> 
> 
> On Thu, Jan 2, 2014 at 8:45 AM, Damien Hardy <dha...@viadeoteam.com> wrote:
> 
> Hello,
> 
> Using the local Hadoop distribution is possible (here cdh4.1.2):
> an archive is mandatory for the hadoop-mesos framework, so I created and
> deployed a small dummy file that does not cost much to fetch and untar.
> 
> In mapred-site.xml, override mapred.mesos.executor.directory and
> mapred.mesos.executor.command so that my job uses the mesos task
> directory and executes the locally deployed Cloudera TaskTracker.
> 
> +  <property>
> +    <name>mapred.mesos.executor.uri</name>
> +    <value>hdfs://hdfscluster/tmp/dummy.tar.gz</value>
> +  </property>
> +  <property>
> +    <name>mapred.mesos.executor.directory</name>
> +    <value>./</value>
> +  </property>
> +  <property>
> +    <name>mapred.mesos.executor.command</name>
> +    <value>. /etc/default/hadoop-0.20; env ; $HADOOP_HOME/bin/hadoop
> org.apache.hadoop.mapred.MesosExecutor</value>
> +  </property>
> 
> Add some environment variables in /etc/default/hadoop-0.20 so Hadoop
> services can find the hadoop-mesos jar and libmesos:
> 
> +export HADOOP_CLASSPATH=/usr/lib/hadoop-mesos/hadoop-mesos.jar:$HADOOP_HOME/contrib/fairscheduler/hadoop-fairscheduler-2.0.0-mr1-cdh4.1.2.jar:$HADOOP_CLASSPATH
> +export MESOS_NATIVE_LIBRARY=/usr/lib/libmesos.so
> 
> I created a hadoop-mesos deb to be deployed alongside the Hadoop
> distribution. My goal is to avoid a -copyToLocal of the TaskTracker code
> for each mesos task, with no need for special manipulation of the Hadoop
> distribution code (config only).
> 
> Regards,
> 
> On 31/12/2013 16:45, Damien Hardy wrote:
> > I'm now able to use Snappy compression by adding
> >
> > export JAVA_LIBRARY_PATH=/usr/lib/hadoop/lib/native/
> >
> > in my /etc/default/mesos-slave (an environment variable for the
> > mesos-slave process, used by my init.d script).
> >
> > This envvar is propagated to the executor JVM, so the TaskTracker can
> > find libsnappy.so and use it.
> >
> > Starting to use the local deployment of cdh4 ...
> >
> > Reading the source, it seems that something could be done using
> > mapred.mesos.executor.directory and mapred.mesos.executor.command
> > to use the local hadoop.
> >
> >
> > On 31/12/2013 15:08, Damien Hardy wrote:
> >> Hello,
> >>
> >> Happy new year 2014 @mesos users.
> >>
> >> I am trying to get MapReduce cdh4.1.2 running on Mesos.
> >>
> >> It seems to work mostly, but a few things are still problematic:
> >>
> >>   * MR1 code is already deployed locally with HDFS; is there a way to
> >> use it instead of a tar.gz stored on HDFS being copied locally and
> >> untarred?
> >>
> >>   * If not, the tar.gz distribution of cdh4 seems not to support
> >> Snappy compression. Is there a way to correct it?
> >>
> >> Best regards,
> >>
> >
> 
> --
> Damien HARDY
> IT Infrastructure Architect
> Viadeo - 30 rue de la Victoire - 75009 Paris - France
> PGP : 45D7F89A
> 
> 

-- 
Damien HARDY
IT Infrastructure Architect
Viadeo - 30 rue de la Victoire - 75009 Paris - France
PGP : 45D7F89A





Re: Hadoop on Mesos use local cdh4 installation instead of tar.gz

2014-01-02 Thread Damien Hardy
Hello,

Using the local Hadoop distribution is possible (here cdh4.1.2):
an archive is mandatory for the hadoop-mesos framework, so I created and
deployed a small dummy file that does not cost much to fetch and untar.

In mapred-site.xml, override mapred.mesos.executor.directory and
mapred.mesos.executor.command so that my job uses the mesos task
directory and executes the locally deployed Cloudera TaskTracker.

+  <property>
+    <name>mapred.mesos.executor.uri</name>
+    <value>hdfs://hdfscluster/tmp/dummy.tar.gz</value>
+  </property>
+  <property>
+    <name>mapred.mesos.executor.directory</name>
+    <value>./</value>
+  </property>
+  <property>
+    <name>mapred.mesos.executor.command</name>
+    <value>. /etc/default/hadoop-0.20; env ; $HADOOP_HOME/bin/hadoop
org.apache.hadoop.mapred.MesosExecutor</value>
+  </property>

Add some environment variables in /etc/default/hadoop-0.20 so Hadoop
services can find the hadoop-mesos jar and libmesos:

+export HADOOP_CLASSPATH=/usr/lib/hadoop-mesos/hadoop-mesos.jar:$HADOOP_HOME/contrib/fairscheduler/hadoop-fairscheduler-2.0.0-mr1-cdh4.1.2.jar:$HADOOP_CLASSPATH
+export MESOS_NATIVE_LIBRARY=/usr/lib/libmesos.so

I created a hadoop-mesos deb to be deployed alongside the Hadoop
distribution. My goal is to avoid a -copyToLocal of the TaskTracker code
for each mesos task, with no need for special manipulation of the Hadoop
distribution code (config only).

Regards,

On 31/12/2013 16:45, Damien Hardy wrote:
> I'm now able to use Snappy compression by adding
> 
> export JAVA_LIBRARY_PATH=/usr/lib/hadoop/lib/native/
> 
> in my /etc/default/mesos-slave (an environment variable for the
> mesos-slave process, used by my init.d script).
> 
> This envvar is propagated to the executor JVM, so the TaskTracker can
> find libsnappy.so and use it.
> 
> Starting to use the local deployment of cdh4 ...
> 
> Reading the source, it seems that something could be done using
> mapred.mesos.executor.directory and mapred.mesos.executor.command
> to use the local hadoop.
> 
> 
> On 31/12/2013 15:08, Damien Hardy wrote:
>> Hello,
>>
>> Happy new year 2014 @mesos users.
>>
>> I am trying to get MapReduce cdh4.1.2 running on Mesos.
>>
>> It seems to work mostly, but a few things are still problematic:
>>
>>   * MR1 code is already deployed locally with HDFS; is there a way to
>> use it instead of a tar.gz stored on HDFS being copied locally and
>> untarred?
>>
>>   * If not, the tar.gz distribution of cdh4 seems not to support
>> Snappy compression. Is there a way to correct it?
>>
>> Best regards,
>>
> 

-- 
Damien HARDY
IT Infrastructure Architect
Viadeo - 30 rue de la Victoire - 75009 Paris - France
PGP : 45D7F89A





Re: Hadoop on Mesos use local cdh4 installation instead of tar.gz

2013-12-31 Thread Damien Hardy
I'm now able to use Snappy compression by adding

export JAVA_LIBRARY_PATH=/usr/lib/hadoop/lib/native/

in my /etc/default/mesos-slave (an environment variable for the
mesos-slave process, used by my init.d script).

This envvar is propagated to the executor JVM, so the TaskTracker can
find libsnappy.so and use it.
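
A quick way to confirm that a running slave carries the variable before
it launches executors (a sketch; the pgrep pattern is illustrative):

```
# inspect the environment of the running mesos-slave process
tr '\0' '\n' < /proc/"$(pgrep -f mesos-slave | head -1)"/environ \
  | grep JAVA_LIBRARY_PATH
```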

Starting to use the local deployment of cdh4 ...

Reading the source, it seems that something could be done using
mapred.mesos.executor.directory and mapred.mesos.executor.command
to use the local hadoop.


On 31/12/2013 15:08, Damien Hardy wrote:
> Hello,
> 
> Happy new year 2014 @mesos users.
> 
> I am trying to get MapReduce cdh4.1.2 running on Mesos.
> 
> It seems to work mostly, but a few things are still problematic:
> 
>   * MR1 code is already deployed locally with HDFS; is there a way to use
> it instead of a tar.gz stored on HDFS being copied locally and untarred?
> 
>   * If not, the tar.gz distribution of cdh4 seems not to support
> Snappy compression. Is there a way to correct it?
> 
> Best regards,
> 

-- 
Damien HARDY





Hadoop on Mesos use local cdh4 installation instead of tar.gz

2013-12-31 Thread Damien Hardy
Hello,

Happy new year 2014 @mesos users.

I am trying to get MapReduce cdh4.1.2 running on Mesos.

It seems to work mostly, but a few things are still problematic:

  * MR1 code is already deployed locally with HDFS; is there a way to use
it instead of a tar.gz stored on HDFS being copied locally and untarred?

  * If not, the tar.gz distribution of cdh4 seems not to support
Snappy compression. Is there a way to correct it?

Best regards,

-- 
Damien HARDY





Maven artifact index of Mesos

2013-10-28 Thread Damien Hardy
Hello,

I keep up to date with HEAD of Mesos for my test cluster,
so I try to build Chronos against a recent version of Mesos.
Is there any Maven repo that is updated with snapshot versions of the
Java mesos lib (0.15.0), where we can see an index of the available
versions?
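
In the meantime, a workaround sketch I would try: build the jar from
source and install it into the local Maven repository by hand (file path
and version illustrative):

```
mvn install:install-file \
  -Dfile=build/src/mesos-0.15.0.jar \
  -DgroupId=org.apache.mesos -DartifactId=mesos \
  -Dversion=0.15.0-SNAPSHOT -Dpackaging=jar
```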

Best regards,

-- 
Damien HARDY
IT Infrastructure Architect
Viadeo - 30 rue de la Victoire - 75009 Paris - France


Re: UI remote Task Sandbox displays error

2013-10-25 Thread Damien Hardy
Thank you Ben,

For my case, the culprit is here:
https://github.com/airbnb/chronos/blob/master/src/main/scala/com/airbnb/scheduler/jobs/TaskUtils.scala#L22 :)

Maybe some base64 encoding for the paths could be a way to go.
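
For illustration only (note that plain base64 output can itself contain
'/', which is unsafe in paths, so a URL-safe alphabet would be needed):

```
# turn a colon-ridden executor id into a filesystem-safe string
printf 'ct:1382546709043:0:test' | base64 | tr '+/' '-_'
```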

Best,


2013/10/23 Benjamin Mahler 

> There could be an issue related to using the ':' character in your
> executor id (which ultimately gets mapped to a path). A while back I filed
> this: https://issues.apache.org/jira/browse/MESOS-361
>
> That's all I have to go on with the given information; the slave log might
> be more informative here. Do you have it available?
>
>
> On Wed, Oct 23, 2013 at 9:50 AM, Damien Hardy wrote:
>
>> Hello,
>>
>> I suppose I'm doing something wrong.
>>
>> When running a task on a remote slave (not the master), clicking the
>> sandbox link in the UI results in an error:
>>
>> Error browsing path:
>> /var/lib/mesos//slaves/201310211635-3951143104-5050-5534-2/frameworks/201310211635-3951143104-5050-5534-0002/executors/ct:1382546709043:0:test/runs/60a53a92-0e92-4f33-a0ac-9577fefa2dd7
>>
>> /var/lib/mesos is the working dir for the slave.
>>
>> The local dir (on the master) is obviously empty, and the path exists on
>> the distant slave (with the expected stderr and stdout files).
>>
>> When the task runs locally on the master it displays fine.
>>
>> revision commit is b6405e20b1fc2163dada7072c4dd417518ce1efd
>>
>> What did I miss ?
>>
>> Regards,
>>
>> --
>> Damien HARDY
>>
>
>


-- 
Damien HARDY


UI remote Task Sandbox displays error

2013-10-23 Thread Damien Hardy
Hello,

I suppose I'm doing something wrong.

When running a task on a remote slave (not the master), clicking the
sandbox link in the UI results in an error:

Error browsing path:
/var/lib/mesos//slaves/201310211635-3951143104-5050-5534-2/frameworks/201310211635-3951143104-5050-5534-0002/executors/ct:1382546709043:0:test/runs/60a53a92-0e92-4f33-a0ac-9577fefa2dd7

/var/lib/mesos is the working dir for the slave.

The local dir (on the master) is obviously empty, and the path exists on
the distant slave (with the expected stderr and stdout files).

When the task runs locally on the master it displays fine.

revision commit is b6405e20b1fc2163dada7072c4dd417518ce1efd

What did I miss ?

Regards,

-- 
Damien HARDY


Re: parallel build fails on tests/mesos_tests-slave_recovery_tests.o

2013-10-18 Thread Damien Hardy
OK, I found the cause of the compilation error:

Mesos needs more than 2 GB of RAM to run make check with parallelism = 2
(my VM was limited to 2 GB); upgrading to 4 GB solved it.
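
For anyone hitting the same limit, lowering the build parallelism is a
workaround that trades time for memory:

```
make -j1 check
```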

Concerning the failing test, even after merging #14097
<https://reviews.apache.org/r/14097/> it keeps failing,
so I filed issue MESOS-747.

Best regards


2013/10/16 Vinod Kone 

> Mind filing a bug for the compiler error? We encountered these too before
> but were unable to nail down the root cause.
>
> The fix for the flaky fault tolerance test is at:
> https://reviews.apache.org/r/14097/
>
>
> On Wed, Oct 16, 2013 at 8:36 AM, Damien Hardy wrote:
>
>> Hello,
>>
>> The latest pull of the master branch HEAD fails to build with parallel
>> compilation, always on "tests/mesos_tests-slave_recovery_tests.o".
>> It used to work at commit d8da5f4 (October 2nd).
>>
>> [...]
>> g++ -DPACKAGE_NAME=\"mesos\" -DPACKAGE_TARNAME=\"mesos\"
>> -DPACKAGE_VERSION=\"0.15.0\" -DPACKAGE_STRING=\"mesos\ 0.15.0\"
>> -DPACKAGE_BUGREPORT=\"\" -DPACKAGE_URL=\"\" -DPACKAGE=\"mesos\"
>> -DVERSION=\"0.15.0\" -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1
>> -DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1
>> -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1
>> -DHAVE_DLFCN_H=1 -DLT_OBJDIR=\".libs/\" -DHAVE_PTHREAD=1 -DMESOS_HAS_JAVA=1
>> -DHAVE_PYTHON=\"2.6\" -DMESOS_HAS_PYTHON=1 -DHAVE_LIBZ=1 -DHAVE_LIBCRYPTO=1
>> -DHAVE_LIBSSL=1 -DHAVE_LIBCURL=1 -DHAVE_LIBSASL2=1 -I.   -Wall -Werror
>> -DLIBDIR=\"/usr/lib\" -DPKGLIBEXECDIR=\"/usr/lib/mesos\"
>> -DPKGDATADIR=\"/usr/share/mesos\" -I../include
>> -I../3rdparty/libprocess/include
>> -I../3rdparty/libprocess/3rdparty/stout/include -I../include
>> -I../3rdparty/libprocess/3rdparty/boost-1.53.0
>> -I../3rdparty/libprocess/3rdparty/protobuf-2.4.1/src
>> -I../3rdparty/libprocess/3rdparty/glog-0.3.3/src
>> -I../3rdparty/zookeeper-3.3.4/src/c/include
>> -I../3rdparty/zookeeper-3.3.4/src/c/generated
>> -DSOURCE_DIR=\"/home/vagrant/mesos\" -DBUILD_DIR=\"/home/vagrant/mesos\"
>> -I../3rdparty/libprocess/3rdparty/gmock-1.6.0/gtest/include
>> -I../3rdparty/libprocess/3rdparty/gmock-1.6.0/include
>> -I/usr/lib/jvm/java-6-openjdk/include
>> -I/usr/lib/jvm/java-6-openjdk/include/linux -DZOOKEEPER_VERSION=\"3.3.4\"
>> -g -O2 -fno-strict-aliasing -g2 -O2 -c -o tests/mesos_tests-state_tests.o
>> `test -f 'tests/state_tests.cpp' || echo './'`tests/state_tests.cpp
>> g++ -DPACKAGE_NAME=\"mesos\" -DPACKAGE_TARNAME=\"mesos\"
>> -DPACKAGE_VERSION=\"0.15.0\" -DPACKAGE_STRING=\"mesos\ 0.15.0\"
>> -DPACKAGE_BUGREPORT=\"\" -DPACKAGE_URL=\"\" -DPACKAGE=\"mesos\"
>> -DVERSION=\"0.15.0\" -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1
>> -DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1
>> -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1
>> -DHAVE_DLFCN_H=1 -DLT_OBJDIR=\".libs/\" -DHAVE_PTHREAD=1 -DMESOS_HAS_JAVA=1
>> -DHAVE_PYTHON=\"2.6\" -DMESOS_HAS_PYTHON=1 -DHAVE_LIBZ=1 -DHAVE_LIBCRYPTO=1
>> -DHAVE_LIBSSL=1 -DHAVE_LIBCURL=1 -DHAVE_LIBSASL2=1 -I.   -Wall -Werror
>> -DLIBDIR=\"/usr/lib\" -DPKGLIBEXECDIR=\"/usr/lib/mesos\"
>> -DPKGDATADIR=\"/usr/share/mesos\" -I../include
>> -I../3rdparty/libprocess/include
>> -I../3rdparty/libprocess/3rdparty/stout/include -I../include
>> -I../3rdparty/libprocess/3rdparty/boost-1.53.0
>> -I../3rdparty/libprocess/3rdparty/protobuf-2.4.1/src
>> -I../3rdparty/libprocess/3rdparty/glog-0.3.3/src
>> -I../3rdparty/zookeeper-3.3.4/src/c/include
>> -I../3rdparty/zookeeper-3.3.4/src/c/generated
>> -DSOURCE_DIR=\"/home/vagrant/mesos\" -DBUILD_DIR=\"/home/vagrant/mesos\"
>> -I../3rdparty/libprocess/3rdparty/gmock-1.6.0/gtest/include
>> -I../3rdparty/libprocess/3rdparty/gmock-1.6.0/include
>> -I/usr/lib/jvm/java-6-openjdk/include
>> -I/usr/lib/jvm/java-6-openjdk/include/linux -DZOOKEEPER_VERSION=\"3.3.4\"
>> -g -O2 -fno-strict-aliasing -g2 -O2 -c -o
>> tests/mesos_tests-status_update_manager_tests.o `test -f
>> 'tests/status_update_manager_tests.cpp' || echo
>> './'`tests/status_update_manager_tests.cpp
>> g++: Internal error: Processus arrêté (program cc1plus)
>> Please submit a full bug report.
>> See  for instructions.
>> make[4]: *** [tests/mesos_tests-slave_recovery_tests

parallel build fails on tests/mesos_tests-slave_recovery_tests.o

2013-10-16 Thread Damien Hardy
Hello,

The latest pull of the master branch HEAD fails to build with parallel
compilation, always on "tests/mesos_tests-slave_recovery_tests.o".
It used to work at commit d8da5f4 (October 2nd).

[...]
g++ -DPACKAGE_NAME=\"mesos\" -DPACKAGE_TARNAME=\"mesos\"
-DPACKAGE_VERSION=\"0.15.0\" -DPACKAGE_STRING=\"mesos\ 0.15.0\"
-DPACKAGE_BUGREPORT=\"\" -DPACKAGE_URL=\"\" -DPACKAGE=\"mesos\"
-DVERSION=\"0.15.0\" -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1
-DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1
-DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1
-DHAVE_DLFCN_H=1 -DLT_OBJDIR=\".libs/\" -DHAVE_PTHREAD=1 -DMESOS_HAS_JAVA=1
-DHAVE_PYTHON=\"2.6\" -DMESOS_HAS_PYTHON=1 -DHAVE_LIBZ=1 -DHAVE_LIBCRYPTO=1
-DHAVE_LIBSSL=1 -DHAVE_LIBCURL=1 -DHAVE_LIBSASL2=1 -I.   -Wall -Werror
-DLIBDIR=\"/usr/lib\" -DPKGLIBEXECDIR=\"/usr/lib/mesos\"
-DPKGDATADIR=\"/usr/share/mesos\" -I../include
-I../3rdparty/libprocess/include
-I../3rdparty/libprocess/3rdparty/stout/include -I../include
-I../3rdparty/libprocess/3rdparty/boost-1.53.0
-I../3rdparty/libprocess/3rdparty/protobuf-2.4.1/src
-I../3rdparty/libprocess/3rdparty/glog-0.3.3/src
-I../3rdparty/zookeeper-3.3.4/src/c/include
-I../3rdparty/zookeeper-3.3.4/src/c/generated
-DSOURCE_DIR=\"/home/vagrant/mesos\" -DBUILD_DIR=\"/home/vagrant/mesos\"
-I../3rdparty/libprocess/3rdparty/gmock-1.6.0/gtest/include
-I../3rdparty/libprocess/3rdparty/gmock-1.6.0/include
-I/usr/lib/jvm/java-6-openjdk/include
-I/usr/lib/jvm/java-6-openjdk/include/linux -DZOOKEEPER_VERSION=\"3.3.4\"
-g -O2 -fno-strict-aliasing -g2 -O2 -c -o tests/mesos_tests-state_tests.o
`test -f 'tests/state_tests.cpp' || echo './'`tests/state_tests.cpp
g++ -DPACKAGE_NAME=\"mesos\" -DPACKAGE_TARNAME=\"mesos\"
-DPACKAGE_VERSION=\"0.15.0\" -DPACKAGE_STRING=\"mesos\ 0.15.0\"
-DPACKAGE_BUGREPORT=\"\" -DPACKAGE_URL=\"\" -DPACKAGE=\"mesos\"
-DVERSION=\"0.15.0\" -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1
-DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1
-DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1
-DHAVE_DLFCN_H=1 -DLT_OBJDIR=\".libs/\" -DHAVE_PTHREAD=1 -DMESOS_HAS_JAVA=1
-DHAVE_PYTHON=\"2.6\" -DMESOS_HAS_PYTHON=1 -DHAVE_LIBZ=1 -DHAVE_LIBCRYPTO=1
-DHAVE_LIBSSL=1 -DHAVE_LIBCURL=1 -DHAVE_LIBSASL2=1 -I.   -Wall -Werror
-DLIBDIR=\"/usr/lib\" -DPKGLIBEXECDIR=\"/usr/lib/mesos\"
-DPKGDATADIR=\"/usr/share/mesos\" -I../include
-I../3rdparty/libprocess/include
-I../3rdparty/libprocess/3rdparty/stout/include -I../include
-I../3rdparty/libprocess/3rdparty/boost-1.53.0
-I../3rdparty/libprocess/3rdparty/protobuf-2.4.1/src
-I../3rdparty/libprocess/3rdparty/glog-0.3.3/src
-I../3rdparty/zookeeper-3.3.4/src/c/include
-I../3rdparty/zookeeper-3.3.4/src/c/generated
-DSOURCE_DIR=\"/home/vagrant/mesos\" -DBUILD_DIR=\"/home/vagrant/mesos\"
-I../3rdparty/libprocess/3rdparty/gmock-1.6.0/gtest/include
-I../3rdparty/libprocess/3rdparty/gmock-1.6.0/include
-I/usr/lib/jvm/java-6-openjdk/include
-I/usr/lib/jvm/java-6-openjdk/include/linux -DZOOKEEPER_VERSION=\"3.3.4\"
-g -O2 -fno-strict-aliasing -g2 -O2 -c -o
tests/mesos_tests-status_update_manager_tests.o `test -f
'tests/status_update_manager_tests.cpp' || echo
'./'`tests/status_update_manager_tests.cpp
g++: Internal error: Processus arrêté (program cc1plus)
Please submit a full bug report.
See  for instructions.
make[4]: *** [tests/mesos_tests-slave_recovery_tests.o] Erreur 1
make[4]: *** Attente des tâches non terminées
make[4]: quittant le répertoire « /home/vagrant/mesos/src »
make[3]: *** [check-am] Erreur 2
make[3]: quittant le répertoire « /home/vagrant/mesos/src »
make[2]: *** [check] Erreur 2
make[2]: quittant le répertoire « /home/vagrant/mesos/src »
make[1]: *** [check-recursive] Erreur 1
make[1]: quittant le répertoire « /home/vagrant/mesos »
dh_auto_test: make -j2 check returned exit code 2

The build was done on Debian Squeeze.

I tested without parallelism with success, except for a failure on
FaultToleranceTest.ReregisterFrameworkExitedExecutor, but it took twice
as long, as expected (Core 2 Duo /o\).


My test failure:

[ RUN  ] FaultToleranceTest.ReregisterFrameworkExitedExecutor
tests/fault_tolerance_tests.cpp:1112: Failure
Mock function called more times than expected - returning directly.
Function call: resourceOffers(0x7fffb7fcb820, @0x2b8b118d4c80 {
224-byte object <30-52 1A-0D 8B-2B 00-00 00-00 00-00 00-00 00-00 60-68
4B-18 8B-2B 00-00 10-2B 00-18 8B-2B 00-00 90-F6 0E-18 8B-2B 00-00 B0-26
05-18 8B-2B 00-00 78-48 4B-18 8B-2B 00-00 04-00 00-00 04-00 00-00 ... E8-48
4B-18 8B-2B 00-00 00-00 00-00 00-00 00-00 04-00 00-00 64-65 66-61 75-6C
74-20 61-63 74-69 6F-6E 2C-20 6F-72 20-61 73-73 69-67 6E-20 74-68 65-20
64-65 66-61 75-6C 00-00 00-00 0F-00 00-00> })
 Expected: to be called once
   Actual: called twice - over-saturated and active
[  FAILED  ] FaultToleranceTest.ReregisterFrameworkExitedExecutor (29 ms)


-- 
Damien HARDY
IT Infrastructure Architect
Viadeo - 30 rue de la Victoire - 75009 Paris - France


Re: mesos frameworks registration ?

2013-09-30 Thread Damien Hardy
Hello,

I gave it a try with mesos-0.14.0-rc4 and the two frameworks can register
with mesos and execute tasks (batch and long-running).



2013/9/24 Damien Hardy 

> Not yet,
>
> I suspected some misconfiguration on the mesos side because chronos has
> the same behaviour.
>
>
>
> 2013/9/23 Benjamin Mahler 
>
>> It looks like the Marathon framework is continually failing over, have
>> you sought help from the Marathon developers?
>>
>>
>> On Mon, Sep 23, 2013 at 2:52 AM, Damien Hardy wrote:
>>
>>> Hello there,
>>>
>>> I might be missing something about framework deployment on mesos.
>>>
>>> I am trying to get the chronos or marathon frameworks working with
>>> HEAD of mesos running distributed.
>>>
>>> The mesos topology seems OK: slaves report to the master and I can see
>>> resource offers (total available) on the mesos HTTP interface.
>>>
>>> 192.168.255.1 : marathon or chronos
>>> 192.168.255.2 : zookeeper + mesos master
>>> 192.168.255.3 : mesos slave
>>>
>>> Then I start marathon or chronos (HEAD version for both, with pom.xml
>>> using "0.15.0-20130910-2" for example).
>>>
>>> It seems to succeed in finding the master; I can see the frameworks listed.
>>> But the mesos services seem to complain permanently, flooding the logs
>>> on the slave with:
>>>
>>> ```
>>> 2013-09-23 
>>> 11:35:37,405:2264(0x7faf54a73700):ZOO_DEBUG@zookeeper_process@1983:
>>> Got ping response in 0 ms
>>> W0923 11:35:38.002933  2267 slave.cpp:1322] Ignoring updating pid for
>>> framework marathon-0.0.6 because it does not exist
>>> W0923 11:35:38.359627  2269 slave.cpp:1322] Ignoring updating pid for
>>> framework marathon-0.0.6 because it does not exist
>>> W0923 11:35:39.003171  2266 slave.cpp:1322] Ignoring updating pid for
>>> framework marathon-0.0.6 because it does not exist
>>> ```
>>>
>>> and master also with :
>>>
>>> I0923 11:35:33.420017  3685 master.cpp:734] Re-registering framework
>>> marathon-0.0.6 at scheduler(1)@192.168.3.224:58107
>>> I0923 11:35:33.420178  3685 master.cpp:753] Framework marathon-0.0.6
>>> failed over
>>> I0923 11:35:33.668504  3683 master.cpp:1445] Sending 1 offers to
>>> framework marathon-0.0.6
>>> W0923 11:35:33.708227  3686 master.cpp:80] No whitelist given.
>>> Advertising offers for all slaves
>>> I0923 11:35:33.776002  3686 master.cpp:734] Re-registering framework
>>> marathon-0.0.6 at scheduler(1)@192.168.3.224:58107
>>> I0923 11:35:33.776146  3686 master.cpp:753] Framework marathon-0.0.6
>>> failed over
>>> I0923 11:35:33.776432  3684 hierarchical_allocator_process.hpp:598]
>>> Recovered cpus(*):2; mem(*):2942; disk(*):35195; ports(*):[31000-32000]
>>> (total allocatable: cpus(*):2; mem(*):2942; disk(*):35195;
>>> ports(*):[31000-32000]) on slave 201309231034-50309312-5050--2 from
>>> framework marathon-0.0.6
>>> I0923 11:35:34.419661  3686 master.cpp:734] Re-registering framework
>>> marathon-0.0.6 at scheduler(1)@192.168.3.224:58107
>>> I0923 11:35:34.419801  3686 master.cpp:753] Framework marathon-0.0.6
>>> failed over
>>> I0923 11:35:34.669680  3684 master.cpp:1445] Sending 1 offers to
>>> framework marathon-0.0.6
>>> I0923 11:35:34.776325  3684 master.cpp:734] Re-registering framework
>>> marathon-0.0.6 at scheduler(1)@192.168.3.224:58107
>>> I0923 11:35:34.776445  3684 master.cpp:753] Framework marathon-0.0.6
>>> failed over
>>> I0923 11:35:34.776748  3684 hierarchical_allocator_process.hpp:598]
>>> Recovered cpus(*):2; mem(*):2942; disk(*):35195; ports(*):[31000-32000]
>>> (total allocatable: cpus(*):2; mem(*):2942; disk(*):35195;
>>> ports(*):[31000-32000]) on slave 201309231034-50309312-5050--2 from
>>> framework marathon-0.0.6
>>>
>>> When I try to start a service with marathon, based on the example given:
>>>
>>> marathon -H http://192.168.255.1:8080 start -i chronos -u
>>> https://s3.amazonaws.com/mesosphere-binaries-public/chronos/chronos.tgz
>>> -C "./chronos/bin/demo ./chronos/config/nomail.yml
>>> ./chronos/target/chronos-1.0-SNAPSHOT.jar"
>>> Starting app 'chronos'
>>> ERROR:
>>>
>>> It seems to be there:
>>>
>>> marathon -H http://192.168.255.1:8080 list
>>> App ID:chronos
>>> Command:   ./chronos/bin/demo ./chronos/config/nomail.yml
>>> ./chronos/target/chronos-1.0-SNAPSHOT.jar
>>> Instances: 1
>>> CPUs:  1.0
>>> Memory:10.0 MB
>>> URI:
>>> https://s3.amazonaws.com/mesosphere-binaries-public/chronos/chronos.tgz
>>>
>>> chronos has the same problem with a non-existing id on the slave; I can
>>> create a scheduled command but it is never executed.
>>>
>>> Thank you for any help understanding this.
>>>
>>> --
>>> Damien HARDY
>>>
>>
>>
>
>
> --
> Damien HARDY
> IT Infrastructure Architect
>  Viadeo - 30 rue de la Victoire - 75009 Paris - France
>



-- 
Damien HARDY
IT Infrastructure Architect
Viadeo - 30 rue de la Victoire - 75009 Paris - France


Blog post/Tuto on mesos

2013-09-30 Thread Damien Hardy
Hello,

I don't know if @riywo is on this ML, but I just landed on his blog post
series (1) about mesos.

Great tutorial work, plus a tiny useful tool, msh (2).

(1) http://tech.riywo.com/blog/2013/09/27/mesos-introduction-1/
(2) https://github.com/riywo/msh

-- 
Damien HARDY


Re: Aurora, Marathon and long lived job frameworks

2013-09-27 Thread Damien Hardy
Hello,

What about chronos http://airbnb.github.io/chronos/

Best regards,


2013/9/27 Dan Colish 

> I have been working on an internal project for executing a large number of
> jobs across a cluster for the past couple of months and I am currently
> doing a spike on using mesos for some of the cluster management tasks. The
> clear prior art winners are Aurora and Marathon, but in both cases they
> fall short of what I need.
>
> In Aurora's case, the software is clearly very early in the open-sourcing
> process and as a result is missing significant pieces. The biggest missing
> piece is the actual execution framework, Thermos. [That is what I assume
> Thermos does; I have no internal knowledge to verify that assumption.]
> Additionally, Aurora is heavily optimized for a high user count and large
> number of incoming jobs. My use case is much simpler. There is only one
> effective user and we have a small known set of jobs which need to run.
>
> On the other hand, Marathon is not designed for job execution if job is
> defined to be a smaller unit of work. Instead, Marathon self-describes as a
> meta-framework for deploying frameworks to a mesos cluster. A job to
> marathon is the framework that runs. I do not think Marathon would be a
> good fit for managing my task execution and retry logic. It is designed
> to run as a sub-layer of the cluster's resource allocation scheduler,
> and its abstractions follow suit.
>
> For my needs Aurora does appear to be a much closer fit than Marathon, but
> neither is ideal. Since that is the case, I find myself left with a rough
> choice. I am not thrilled with the prospect of yet another framework for
> Mesos, but there is a lot of work which I have already completed for my
> internal project that would need to reworked to fit with Aurora. Currently
> my project can support the following features.
>
> * Distributed job locking - jobs cannot overlap
> * Job execution delay queue - jobs can be run immediately or after a delay
> * Job preemption
> * Job success/failure tracking
> * Garbage collection of dead jobs
> * Job execution failover - job is retried on a new executor
> * Executor warming - min # of executors idle
> * Executor limits - max # of executors available
>
> My plan for integration with mesos is to adapt the job manager into a
> mesos scheduler and my execution slaves into a mesos executor. At that
> point, my framework will be able to run on the mesos cluster, but I have a
> few concerns about how to allocated and release resources that the
> executors will use over the lifetime of the cluster. I am not sure whether
> it is better to be greedy early on in the frameworks life-cycle or to
> decline resources initially and scale the framework's slaves when jobs
> start coming in. Additionally, the relationship between the executor and
> its associated driver are not immediately clear to me. If I am reading the
> code correctly, they do not provide a way to stop a task in progress short
> of killing the executor process.
>
> I think that mesos will be a nice feature to add to my project and I would
> really appreciate any feedback from the community. I will provide progress
> updates as I continue work on my experiments.
>



-- 
Damien HARDY


Re: Disk Resource Offer Control

2013-09-26 Thread Damien Hardy
Hello,

On the mesos-slave executable you can give it an option:

--work_dir=VALUE   Where to place framework work
directories
 (default: /tmp/mesos)
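
For example, to keep framework work directories off the small root
partition (master address and mount point illustrative):

```
mesos-slave --master=zk://192.168.255.2:2181/mesos \
  --work_dir=/mnt/mesos
```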

Best regards,



2013/9/26 Phil Siegrist 

> Hi, I'm currently running mesos on EC2. The root partition is small. Is
> there a way to specify the location(s) for disk resources?
>
> All my work is currently being deployed to /tmp.
>



-- 
Damien HARDY


Re: is mesos-submit broken on HEAD (0.15) ?

2013-09-24 Thread Damien Hardy
Hi,
I finally got the patch via JIRA, thanks to the Google cache.
mesos-submit ended without errors (but only on the local slave, because
the executor does not exist on remote slaves).

The task ends in STAGING status.
What does that mean?
Is there some description of the possible states?
I saw in a presentation about Jenkins on Mesos that the LOST status is
very important, but what about the others?

Thank you.

-- 
Damien



2013/9/23 Damien Hardy 

> Thank you Benjamin,
>
> I get 502 errors for now on https://reviews.apache.org /o\
>
>
> 2013/9/20 Benjamin Mahler 
>
>> mesos-submit is indeed broken and in need of some love, David Greenberg
>> has a review to fix it:
>> https://reviews.apache.org/r/13367/
>>
>>
>> On Fri, Sep 20, 2013 at 8:06 AM, Damien Hardy wrote:
>>
>>> Hello,
>>>
>>> mesos-submit seems broken (or maybe I missed something).
>>>
>>> I want to execute some helloworld on my deployed mesos cluster.
>>>
>>> ```
>>> vagrant@master01:~/mesos$ ./frameworks/mesos-submit/mesos_submit.py
>>> zk://192.168.255.2:2181/mesos 'echo plop'
>>> Connecting to mesos master zk://192.168.255.2:2181/mesos
>>> Traceback (most recent call last):
>>>   File "./frameworks/mesos-submit/mesos_submit.py", line 102, in 
>>> mesos.MesosSchedulerDriver(sched, master).run()
>>> TypeError: function takes exactly 3 arguments (2 given)
>>> ```
>>>
>>> The test frameworks assume that the whole build directory is deployed
>>> on every node (at the same place),
>>> and running them complains about the test-executor file not being
>>> found, because I deploy nodes using the Debian package of the slave
>>> service and its dependencies (without the test files).
>>>
>>>
>>
>


Re: mesos frameworks registration ?

2013-09-24 Thread Damien Hardy
Not yet.

I suspected some misconfiguration on the mesos side because chronos has
the same behaviour.



2013/9/23 Benjamin Mahler 

> It looks like the Marathon framework is continually failing over, have you
> sought help from the Marathon developers?
>
>
> On Mon, Sep 23, 2013 at 2:52 AM, Damien Hardy wrote:
>
>> Hello there,
>>
>> I might be missing something about framework deployment on mesos.
>>
>> I am trying to get the chronos or marathon frameworks working with HEAD
>> of mesos running distributed.
>>
>> The mesos topology seems OK: slaves report to the master and I can see
>> resource offers (total available) on the mesos HTTP interface.
>>
>> 192.168.255.1 : marathon or chronos
>> 192.168.255.2 : zookeeper + mesos master
>> 192.168.255.3 : mesos slave
>>
>> Then I start marathon or chronos (HEAD version for both, with pom.xml
>> using "0.15.0-20130910-2" for example).
>>
>> It seems to succeed in finding the master; I can see the frameworks listed.
>> But the mesos services seem to complain permanently, flooding the logs
>> on the slave with:
>>
>> ```
>> 2013-09-23 
>> 11:35:37,405:2264(0x7faf54a73700):ZOO_DEBUG@zookeeper_process@1983:
>> Got ping response in 0 ms
>> W0923 11:35:38.002933  2267 slave.cpp:1322] Ignoring updating pid for
>> framework marathon-0.0.6 because it does not exist
>> W0923 11:35:38.359627  2269 slave.cpp:1322] Ignoring updating pid for
>> framework marathon-0.0.6 because it does not exist
>> W0923 11:35:39.003171  2266 slave.cpp:1322] Ignoring updating pid for
>> framework marathon-0.0.6 because it does not exist
>> ```
>>
>> and master also with :
>>
>> I0923 11:35:33.420017  3685 master.cpp:734] Re-registering framework
>> marathon-0.0.6 at scheduler(1)@192.168.3.224:58107
>> I0923 11:35:33.420178  3685 master.cpp:753] Framework marathon-0.0.6
>> failed over
>> I0923 11:35:33.668504  3683 master.cpp:1445] Sending 1 offers to
>> framework marathon-0.0.6
>> W0923 11:35:33.708227  3686 master.cpp:80] No whitelist given.
>> Advertising offers for all slaves
>> I0923 11:35:33.776002  3686 master.cpp:734] Re-registering framework
>> marathon-0.0.6 at scheduler(1)@192.168.3.224:58107
>> I0923 11:35:33.776146  3686 master.cpp:753] Framework marathon-0.0.6
>> failed over
>> I0923 11:35:33.776432  3684 hierarchical_allocator_process.hpp:598]
>> Recovered cpus(*):2; mem(*):2942; disk(*):35195; ports(*):[31000-32000]
>> (total allocatable: cpus(*):2; mem(*):2942; disk(*):35195;
>> ports(*):[31000-32000]) on slave 201309231034-50309312-5050--2 from
>> framework marathon-0.0.6
>> I0923 11:35:34.419661  3686 master.cpp:734] Re-registering framework
>> marathon-0.0.6 at scheduler(1)@192.168.3.224:58107
>> I0923 11:35:34.419801  3686 master.cpp:753] Framework marathon-0.0.6
>> failed over
>> I0923 11:35:34.669680  3684 master.cpp:1445] Sending 1 offers to
>> framework marathon-0.0.6
>> I0923 11:35:34.776325  3684 master.cpp:734] Re-registering framework
>> marathon-0.0.6 at scheduler(1)@192.168.3.224:58107
>> I0923 11:35:34.776445  3684 master.cpp:753] Framework marathon-0.0.6
>> failed over
>> I0923 11:35:34.776748  3684 hierarchical_allocator_process.hpp:598]
>> Recovered cpus(*):2; mem(*):2942; disk(*):35195; ports(*):[31000-32000]
>> (total allocatable: cpus(*):2; mem(*):2942; disk(*):35195;
>> ports(*):[31000-32000]) on slave 201309231034-50309312-5050--2 from
>> framework marathon-0.0.6
>>
>> When I try to start a service with marathon, based on the example given:
>>
>> marathon -H http://192.168.255.1:8080 start -i chronos -u
>> https://s3.amazonaws.com/mesosphere-binaries-public/chronos/chronos.tgz
>> -C "./chronos/bin/demo ./chronos/config/nomail.yml
>> ./chronos/target/chronos-1.0-SNAPSHOT.jar"
>> Starting app 'chronos'
>> ERROR:
>>
>> It seems to be there:
>>
>> marathon -H http://192.168.255.1:8080 list
>> App ID:chronos
>> Command:   ./chronos/bin/demo ./chronos/config/nomail.yml
>> ./chronos/target/chronos-1.0-SNAPSHOT.jar
>> Instances: 1
>> CPUs:  1.0
>> Memory:10.0 MB
>> URI:
>> https://s3.amazonaws.com/mesosphere-binaries-public/chronos/chronos.tgz
>>
>> chronos has the same problem with a non-existing id on the slave; I can
>> create a scheduled command but it is never executed.
>>
>> Thank you for any help understanding this.
>>
>> --
>> Damien HARDY
>>
>
>


-- 
Damien HARDY
IT Infrastructure Architect
Viadeo - 30 rue de la Victoire - 75009 Paris - France


mesos frameworks registration ?

2013-09-23 Thread Damien Hardy
Hello there,

I might be missing something about framework deployment on mesos.

I am trying to get the chronos or marathon frameworks working with HEAD of
mesos running distributed.

The mesos topology seems OK: slaves report to the master and I can see
resource offers (total available) on the mesos HTTP interface.

192.168.255.1 : marathon or chronos
192.168.255.2 : zookeeper + mesos master
192.168.255.3 : mesos slave

Then I start marathon or chronos (HEAD version for both, with pom.xml using
"0.15.0-20130910-2" for example).

It seems to succeed in finding the master; I can see the frameworks listed.
But the mesos services seem to complain permanently, flooding the logs on
the slave with:

```
2013-09-23 11:35:37,405:2264(0x7faf54a73700):ZOO_DEBUG@zookeeper_process@1983:
Got ping response in 0 ms
W0923 11:35:38.002933  2267 slave.cpp:1322] Ignoring updating pid for
framework marathon-0.0.6 because it does not exist
W0923 11:35:38.359627  2269 slave.cpp:1322] Ignoring updating pid for
framework marathon-0.0.6 because it does not exist
W0923 11:35:39.003171  2266 slave.cpp:1322] Ignoring updating pid for
framework marathon-0.0.6 because it does not exist
```

and master also with :

I0923 11:35:33.420017  3685 master.cpp:734] Re-registering framework
marathon-0.0.6 at scheduler(1)@192.168.3.224:58107
I0923 11:35:33.420178  3685 master.cpp:753] Framework marathon-0.0.6 failed
over
I0923 11:35:33.668504  3683 master.cpp:1445] Sending 1 offers to framework
marathon-0.0.6
W0923 11:35:33.708227  3686 master.cpp:80] No whitelist given. Advertising
offers for all slaves
I0923 11:35:33.776002  3686 master.cpp:734] Re-registering framework
marathon-0.0.6 at scheduler(1)@192.168.3.224:58107
I0923 11:35:33.776146  3686 master.cpp:753] Framework marathon-0.0.6 failed
over
I0923 11:35:33.776432  3684 hierarchical_allocator_process.hpp:598]
Recovered cpus(*):2; mem(*):2942; disk(*):35195; ports(*):[31000-32000]
(total allocatable: cpus(*):2; mem(*):2942; disk(*):35195;
ports(*):[31000-32000]) on slave 201309231034-50309312-5050--2 from
framework marathon-0.0.6
I0923 11:35:34.419661  3686 master.cpp:734] Re-registering framework
marathon-0.0.6 at scheduler(1)@192.168.3.224:58107
I0923 11:35:34.419801  3686 master.cpp:753] Framework marathon-0.0.6 failed
over
I0923 11:35:34.669680  3684 master.cpp:1445] Sending 1 offers to framework
marathon-0.0.6
I0923 11:35:34.776325  3684 master.cpp:734] Re-registering framework
marathon-0.0.6 at scheduler(1)@192.168.3.224:58107
I0923 11:35:34.776445  3684 master.cpp:753] Framework marathon-0.0.6 failed
over
I0923 11:35:34.776748  3684 hierarchical_allocator_process.hpp:598]
Recovered cpus(*):2; mem(*):2942; disk(*):35195; ports(*):[31000-32000]
(total allocatable: cpus(*):2; mem(*):2942; disk(*):35195;
ports(*):[31000-32000]) on slave 201309231034-50309312-5050--2 from
framework marathon-0.0.6

When I try to start a service with marathon, based on the example given:

marathon -H http://192.168.255.1:8080 start -i chronos -u
https://s3.amazonaws.com/mesosphere-binaries-public/chronos/chronos.tgz -C
"./chronos/bin/demo ./chronos/config/nomail.yml
./chronos/target/chronos-1.0-SNAPSHOT.jar"
Starting app 'chronos'
ERROR:

It seems to be there:

marathon -H http://192.168.255.1:8080 list
App ID:chronos
Command:   ./chronos/bin/demo ./chronos/config/nomail.yml
./chronos/target/chronos-1.0-SNAPSHOT.jar
Instances: 1
CPUs:  1.0
Memory:10.0 MB
URI:
https://s3.amazonaws.com/mesosphere-binaries-public/chronos/chronos.tgz

chronos has the same problem with a non-existing id on the slave; I can
create a scheduled command but it is never executed.

Thank you for any help understanding this.

-- 
Damien HARDY


Re: is mesos-submit broken on HEAD (0.15) ?

2013-09-23 Thread Damien Hardy
Thank you Benjamin,

I get 502 errors for now on https://reviews.apache.org /o\


2013/9/20 Benjamin Mahler 

> mesos-submit is indeed broken and in need of some love, David Greenberg
> has a review to fix it:
> https://reviews.apache.org/r/13367/
>
>
> On Fri, Sep 20, 2013 at 8:06 AM, Damien Hardy wrote:
>
>> Hello,
>>
>> mesos-submit seems broken (or maybe I missed something).
>>
>> I want to execute some helloworld on my deployed mesos cluster.
>>
>> ```
>> vagrant@master01:~/mesos$ ./frameworks/mesos-submit/mesos_submit.py zk://
>> 192.168.255.2:2181/mesos 'echo plop'
>> Connecting to mesos master zk://192.168.255.2:2181/mesos
>> Traceback (most recent call last):
>>   File "./frameworks/mesos-submit/mesos_submit.py", line 102, in 
>> mesos.MesosSchedulerDriver(sched, master).run()
>> TypeError: function takes exactly 3 arguments (2 given)
>> ```
>>
>> The test frameworks assume that the whole build directory is deployed on
>> every node (at the same place),
>> and running them complains about the test-executor file not being found,
>> because I deploy nodes using the Debian package of the slave service and
>> its dependencies (without the test files).
>>
>> --
>> Damien
>>
>
>


-- 
Damien HARDY
IT Infrastructure Architect
Viadeo - 30 rue de la Victoire - 75009 Paris - France


is mesos-submit broken on HEAD (0.15) ?

2013-09-20 Thread Damien Hardy
Hello,

mesos-submit seems broken (or maybe I missed something).

I want to execute some helloworld on my deployed mesos cluster.

```
vagrant@master01:~/mesos$ ./frameworks/mesos-submit/mesos_submit.py zk://
192.168.255.2:2181/mesos 'echo plop'
Connecting to mesos master zk://192.168.255.2:2181/mesos
Traceback (most recent call last):
  File "./frameworks/mesos-submit/mesos_submit.py", line 102, in 
mesos.MesosSchedulerDriver(sched, master).run()
TypeError: function takes exactly 3 arguments (2 given)
```

The test frameworks assume that the whole build directory is deployed on
every node (at the same place),
and running them complains about the test-executor file not being found,
because I deploy nodes using the Debian package of the slave service and
its dependencies (without the test files).

-- 
Damien