Add to Powered by Mesos list?

2016-05-18 Thread Guillermo Rodriguez
Hi,
  
 May we ask to be included in the "Powered by Mesos" list?
  
 Name: CMCRC
 URL: http://www.cmcrc.com/
  
 Regards,
 Guillermo
  



Delete the /observe HTTP endpoint

2016-05-18 Thread Qian Zhang
Hi Folks,

We are going to delete the master "/observe" HTTP endpoint in the JIRA
ticket MESOS-5408 since this endpoint was introduced a long time ago
for supporting functionality that was never implemented.

Please let us know if you have any comments or concerns, thanks!


Thanks,
Qian Zhang


Re: Cannot pull from private docker v1 registry

2016-05-18 Thread Joseph Wu
The stderr you posted suggests that Mesos successfully fetched your
.dockercfg.  If the following docker pull fails, there should be additional
logs printed either in the Mesos agent logs, or in the task stderr.

Can you check those as well?  (And post them here.)
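
As a rough way to confirm that reading of the stderr output, the sketch below scans fetcher log text for "Fetched" lines and lists the URIs reported as fetched. The function name fetched_uris and the regular expression are illustrative assumptions derived only from the log lines quoted in this thread; they are not part of Mesos, and other versions may log a different format.

```python
import re

# Assumed pattern, derived from the "Fetched '<uri>' to '<path>'" lines quoted
# in this thread; it is not an official description of the Mesos log format.
FETCHED = re.compile(r"fetcher\.cpp:\d+\]\s+Fetched\s+'([^']+)'\s+to\s")


def fetched_uris(stderr_text):
    """Return the URIs that the fetcher reported as successfully fetched."""
    return FETCHED.findall(stderr_text)
```

An empty result would suggest the fetch itself failed; a non-empty result, as in this thread, points to a later step such as the docker pull.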

On Wed, May 18, 2016 at 2:29 PM, Scott Kinney  wrote:

> I have a valid .dockercfg credential file on the slave that I pass as a
> uri in the marathon app definition like...
>
>   "uris": [
>       "file:///root/.dockercfg"
>   ],
>
> it fails.
> Mesos sandbox stderr...
>
> I0517 21:45:04.104918  5512 fetcher.cpp:424] Fetcher Info:
> {"cache_directory":"\/tmp\/mesos\/fetch\/slaves\/cf607b5a-b629-46f1-a053-0659b78c4231-S454","items":[{"action":"BYPASS_CACHE","uri":{"cache":false,"executable":false,"extract":false,"value":"file:\/\/\/root\/.dockercfg"}}],"sandbox_directory":"\/tmp\/mesos\/slaves\/cf607b5a-b629-46f1-a053-0659b78c4231-S454\/frameworks\/cf607b5a-b629-46f1-a053-0659b78c4231-\/executors\/gridservice.9d35ca3e-1c78-11e6-8664-0242472674ba\/runs\/9c650d01-127c-416b-a00b-5ad09409c76e"}
> I0517 21:45:04.106462  5512 fetcher.cpp:379] Fetching URI 'file:///root/.dockercfg'
> I0517 21:45:04.106475  5512 fetcher.cpp:250] Fetching directly into the sandbox directory
> I0517 21:45:04.106487  5512 fetcher.cpp:187] Fetching URI 'file:///root/.dockercfg'
> I0517 21:45:04.106499  5512 fetcher.cpp:167] Copying resource with command:cp '/root/.dockercfg' '/tmp/mesos/slaves/cf607b5a-b629-46f1-a053-0659b78c4231-S454/frameworks/cf607b5a-b629-46f1-a053-0659b78c4231-/executors/gridservice.9d35ca3e-1c78-11e6-8664-0242472674ba/runs/9c650d01-127c-416b-a00b-5ad09409c76e/.dockercfg'
> I0517 21:45:04.107993  5512 fetcher.cpp:456] Fetched 'file:///root/.dockercfg' to '/tmp/mesos/slaves/cf607b5a-b629-46f1-a053-0659b78c4231-S454/frameworks/cf607b5a-b629-46f1-a053-0659b78c4231-/executors/gridservice.9d35ca3e-1c78-11e6-8664-0242472674ba/runs/9c650d01-127c-416b-a00b-5ad09409c76e/.dockercfg
>
>
> Marathon debug says it can't authenticate. I can pull manually on the
> slave with this credential file.
> Any idea what I'm doing wrong?
>
>
> Scott Kinney | DevOps
> stem   |   m  510.282.1299
> 100 Rollins Road, Millbrae, California 94030


Cannot pull from private docker v1 registry

2016-05-18 Thread Scott Kinney
I have a valid .dockercfg credential file on the slave that I pass as a uri in 
the marathon app definition like...

  "uris": [
      "file:///root/.dockercfg"
  ],

it fails. 
Mesos sandbox stderr...

I0517 21:45:04.104918  5512 fetcher.cpp:424] Fetcher Info: 
{"cache_directory":"\/tmp\/mesos\/fetch\/slaves\/cf607b5a-b629-46f1-a053-0659b78c4231-S454","items":[{"action":"BYPASS_CACHE","uri":{"cache":false,"executable":false,"extract":false,"value":"file:\/\/\/root\/.dockercfg"}}],"sandbox_directory":"\/tmp\/mesos\/slaves\/cf607b5a-b629-46f1-a053-0659b78c4231-S454\/frameworks\/cf607b5a-b629-46f1-a053-0659b78c4231-\/executors\/gridservice.9d35ca3e-1c78-11e6-8664-0242472674ba\/runs\/9c650d01-127c-416b-a00b-5ad09409c76e"}
I0517 21:45:04.106462  5512 fetcher.cpp:379] Fetching URI 'file:///root/.dockercfg'
I0517 21:45:04.106475  5512 fetcher.cpp:250] Fetching directly into the sandbox directory
I0517 21:45:04.106487  5512 fetcher.cpp:187] Fetching URI 'file:///root/.dockercfg'
I0517 21:45:04.106499  5512 fetcher.cpp:167] Copying resource with command:cp '/root/.dockercfg' '/tmp/mesos/slaves/cf607b5a-b629-46f1-a053-0659b78c4231-S454/frameworks/cf607b5a-b629-46f1-a053-0659b78c4231-/executors/gridservice.9d35ca3e-1c78-11e6-8664-0242472674ba/runs/9c650d01-127c-416b-a00b-5ad09409c76e/.dockercfg'
I0517 21:45:04.107993  5512 fetcher.cpp:456] Fetched 'file:///root/.dockercfg' to '/tmp/mesos/slaves/cf607b5a-b629-46f1-a053-0659b78c4231-S454/frameworks/cf607b5a-b629-46f1-a053-0659b78c4231-/executors/gridservice.9d35ca3e-1c78-11e6-8664-0242472674ba/runs/9c650d01-127c-416b-a00b-5ad09409c76e/.dockercfg


Marathon debug says it can't authenticate. I can pull manually on the slave 
with this credential file.
Any idea what I'm doing wrong?

 
Scott Kinney | DevOps 
stem   |   m  510.282.1299 
100 Rollins Road, Millbrae, California 94030  

 This e-mail and/or any attachments contain Stem, Inc. confidential and 
proprietary information and material for the sole use of the intended 
recipient(s). Any review, use or distribution that has not been expressly 
authorized by Stem, Inc. is strictly prohibited.  If you are not the intended 
recipient, please contact the sender and delete all copies. Thank you.

Re: How is the OS X environment created with Mesos

2016-05-18 Thread James Peach
This probably boils down to not being in the right launchd session.
launchd(8) discusses this at a high level. You can see what is going
on in your user session with "launchctl print user/$(id -u)".

I'm not sure what the right mechanics ought to be for Mesos. It used
to be that you would use the "bsexec" subcommand to run something in a
different session, but that is deprecated and I don't see an obvious
replacement in the new subcommands. Maybe worth asking on the
launchd-dev mailing list ...
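
The setuid point made elsewhere in this thread (Mesos changes the uid but runs no login scripts, so variables such as USER and HOME are simply inherited) can be illustrated with a small sketch. The helper names login_environment and stale_variables are hypothetical, and the set of variables is an assumption about what a login session typically sets. The sketch does not address the launchd session or keychain question.

```python
import pwd


def login_environment(uid):
    """Variables a login session would typically derive from the passwd entry."""
    entry = pwd.getpwuid(uid)
    return {
        "USER": entry.pw_name,
        "LOGNAME": entry.pw_name,
        "HOME": entry.pw_dir,
        "SHELL": entry.pw_shell,
    }


def stale_variables(env, uid):
    """Return the expected values for variables in env that do not match uid.

    After a bare setuid() the environment is inherited unchanged, so these are
    the settings a framework or plugin would have to override explicitly.
    """
    expected = login_environment(uid)
    return {name: value for name, value in expected.items() if env.get(name) != value}
```

Calling stale_variables(dict(os.environ), target_uid) from the process that is about to switch users would list the variables to set by hand, much like the USER= and HOME= settings mentioned for the Jenkins plugin.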


On 11 May 2016 at 12:10, DiGiorgio, Mr. Rinaldo S.  wrote:
>
> On May 5, 2016, at 13:28, haosdent  wrote:
>
>>There is no explicit statement about what Mesos means when it runs a task
>> as some other user.
> I think this just ensures that the task runs as the user you
> specified. Mesos just calls [setuid](http://linux.die.net/man/2/setuid)
> to change the user; it does not execute anything like the user's bashrc
> script.
>
>
> I have been unable to solve this problem for the last few days. I am
> wondering if you have any ideas.
>
>
>
> When Mesos starts a task on an OSX machine, the task is run with setuid to
> the user I have asked for.  When that user runs I cannot get that user to
> have a default login keychain.  I want to initialize the environment so that
> user has something that looks like this.
>
>  existinguser$ security login-keychain
>
>
>  "/Users/rinaldo/Library/Keychains/login.keychain"
>
>
> I have tried many options to create the above keychain for the other user
> that is running in a process that was created by mesos and changed to that
> user with setuid.
>
> I understand that is likely not a Mesos issue. I am hoping someone on this
> alias has come across this issue or something similar.  I have tried the
> following and they have all failed.
>
> su -c   as existinguser
>
> /bin/login as existinguser
>
> OSX is not Open Source so it is difficult to understand what it is they do
> to create a user environment.  The “security” application has many options
> to create keychains but when I use those options the Keychains end up in
>
>
> "/Library/Keychains/System.keychain"
>
> "/Library/Keychains/System.keychain"
>
>
>   I have not investigated how a user is able to create a keychain in the
> System.keychain when running as a user in a Mesos created process.
>
>
> Rinaldo
>
>
>
>
>
> On Thu, May 5, 2016 at 7:41 PM, DiGiorgio, Mr. Rinaldo S.
>  wrote:
>>
>> Hi,
>>
>> Recently I noticed that the Mesos Jenkins plugin supports the
>> setting of environment variables. Somewhere between 0.26 and 0.28.1,
>> settings like
>>
>> USER=
>> HOME=
>>
>> were required to get things to work the way they had worked. I
>> have been able to set the environment this way but I have some concerns
>> about it.
>>
>> There is no explicit statement about what Mesos means when it runs
>> a task as some other user.  Clearly it is not running some of the scripts
>> normally run during login.  This was a constant source of confusion with
>> Jenkins. If one can state what exactly is done to create the user
>> environment on each platform and how it differs from the others, it will save
>> countless hours of debugging IMO. I realize OSX is an odd system -- linux at
>> times, Apple specific at times in areas that conflict with Linux but this
>> will only get more complicated when Windows agents become available.
>>
>>
>>
>> Rinaldo
>
>
>
>
> --
> Best Regards,
> Haosdent Huang
>
>



-- 
James Peach | jor...@gmail.com


Re: Consequences of health-check timeouts?

2016-05-18 Thread Steven Schlansker

> On May 18, 2016, at 10:44 AM, haosdent  wrote:
> 
> >In re executor_shutdown_grace_period: how would this enable the task 
> >(MongoDB) to terminate gracefully? (BTW: I am fairly certain that the mongo 
> >STDOUT as captured by Mesos shows that it received signal 15 just before it 
> >said good-bye). My naive understanding of this grace period is that it 
> >simply delays the termination of the executor.

I'm not 100% sure this is related or helpful, but be aware that we believe
there is a bug in the Docker containerizer's handling of logs during shutdown:

https://issues.apache.org/jira/browse/MESOS-5195

We spent a lot of time debugging why our application was not shutting down as
we expected, only to find that the real problem was that Mesos was losing all
logs sent during shutdown.

> 
> If you use DockerContainerizer, Mesos uses executor_shutdown_grace_period as
> the graceful shutdown timeout for the task as well. If you use
> MesosContainerizer, it sends SIGTERM(15) first. After 3 seconds, if the
> task is still alive, Mesos sends SIGKILL(9) to the task.
> 
> >I'm not sure what the java task is. This took place on the mesos-master node 
> >and none of our applications runs there. It runs master, Marathon, and ZK. 
> >Maybe the java task is Marathon or ZK?
> 
> Not sure about this; maybe others have had a similar experience. Were
> Marathon or ZooKeeper behaving abnormally at that time? Could you also
> provide the mesos-master/mesos-slave logs from when the incident happened?
> 
> 
> On Wed, May 18, 2016 at 7:11 PM, Paul Bell  wrote:
> Hi Haosdent,
> 
> Thanks for your reply.
> 
> In re executor_shutdown_grace_period: how would this enable the task 
> (MongoDB) to terminate gracefully? (BTW: I am fairly certain that the mongo 
> STDOUT as captured by Mesos shows that it received signal 15 just before it 
> said good-bye). My naive understanding of this grace period is that it simply 
> delays the termination of the executor.
> 
> The following snippet is from /var/log/syslog. I believe it shows the stack
> trace (largely in the kernel) that led to mesos-master being blocked for more 
> than 120 seconds. Please note that immediately above (before) the blocked 
> mesos-master is a blocked jbd2/dm. Immediately below (after) the blocked 
> mesos-master is a blocked java task. I'm not sure what the java task is. This 
> took place on the mesos-master node and none of our applications runs there. 
> It runs master, Marathon, and ZK. Maybe the java task is Marathon or ZK?
> 
> Thanks again.
> 
> -Paul

Re: Consequences of health-check timeouts?

2016-05-18 Thread haosdent
>In re executor_shutdown_grace_period: how would this enable the task
(MongoDB) to terminate gracefully? (BTW: I am fairly certain that the mongo
STDOUT as captured by Mesos shows that it received signal 15 just before it
said good-bye). My naive understanding of this grace period is that it
simply delays the termination of the executor.

If you use DockerContainerizer, Mesos uses executor_shutdown_grace_period as
the graceful shutdown timeout for the task as well. If you use
MesosContainerizer, it sends SIGTERM(15) first. After 3 seconds, if
the task is still alive, Mesos sends SIGKILL(9) to the task.
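
The SIGTERM-then-SIGKILL escalation described above can be sketched roughly as follows. This is only an illustration of the general pattern using Python's subprocess module, not the Mesos implementation; the function name stop_gracefully is hypothetical and the 3-second default simply mirrors the figure mentioned above.

```python
import subprocess


def stop_gracefully(proc, grace_seconds=3.0):
    """Ask a child process to exit, then force it if it does not.

    Sends SIGTERM, waits up to grace_seconds, and falls back to SIGKILL.
    Returns the child's exit code (negative signal number on POSIX when the
    child was terminated by a signal).
    """
    proc.terminate()  # SIGTERM on POSIX
    try:
        return proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX: cannot be caught or ignored
        return proc.wait()
```

A task that wants to shut down cleanly has to finish its work between the two signals, which is why the length of the grace period matters for something like MongoDB.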

>I'm not sure what the java task is. This took place on the mesos-master
node and none of our applications runs there. It runs master, Marathon, and
ZK. Maybe the java task is Marathon or ZK?

Not sure about this; maybe others have had a similar experience. Were
Marathon or ZooKeeper behaving abnormally at that time? Could you also provide
the mesos-master/mesos-slave logs from when the incident happened?


On Wed, May 18, 2016 at 7:11 PM, Paul Bell  wrote:

> Hi Haosdent,
>
> Thanks for your reply.
>
> In re executor_shutdown_grace_period: how would this enable the task
> (MongoDB) to terminate gracefully? (BTW: I am fairly certain that the mongo
> STDOUT as captured by Mesos shows that it received signal 15 just before it
> said good-bye). My naive understanding of this grace period is that it
> simply delays the termination of the executor.
>
> The following snippet is from /var/log/syslog. I believe it shows the stack
> trace (largely in the kernel) that led to mesos-master being blocked for
> more than 120 seconds. Please note that immediately above (before) the
> blocked mesos-master is a blocked jbd2/dm. Immediately below (after) the
> blocked mesos-master is a blocked java task. I'm not sure what the java
> task is. This took place on the mesos-master node and none of our
> applications runs there. It runs master, Marathon, and ZK. Maybe the java
> task is Marathon or ZK?
>
> Thanks again.
>
> -Paul

Re: Consequences of health-check timeouts?

2016-05-18 Thread Paul Bell
Hi Haosdent,

Thanks for your reply.

In re executor_shutdown_grace_period: how would this enable the task
(MongoDB) to terminate gracefully? (BTW: I am fairly certain that the mongo
STDOUT as captured by Mesos shows that it received signal 15 just before it
said good-bye). My naive understanding of this grace period is that it
simply delays the termination of the executor.

The following snippet is from /var/log/syslog. I believe it shows the stack
trace (largely in the kernel) that led to mesos-master being blocked for
more than 120 seconds. Please note that immediately above (before) the
blocked mesos-master is a blocked jbd2/dm. Immediately below (after) the
blocked mesos-master is a blocked java task. I'm not sure what the java
task is. This took place on the mesos-master node and none of our
applications runs there. It runs master, Marathon, and ZK. Maybe the java
task is Marathon or ZK?

Thanks again.

-Paul

May 16 20:06:53 71 kernel: [193339.890848] INFO: task mesos-master:4013 blocked for more than 120 seconds.
May 16 20:06:53 71 kernel: [193339.890873]   Not tainted 3.13.0-32-generic #57-Ubuntu
May 16 20:06:53 71 kernel: [193339.890889] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 16 20:06:53 71 kernel: [193339.890912] mesos-masterD 88013fd94440 0  4013  1 0x
May 16 20:06:53 71 kernel: [193339.890914]  880137429a28 0002 880135778000 880137429fd8
May 16 20:06:53 71 kernel: [193339.890916]  00014440 00014440 880135778000 88013fd94cd8
May 16 20:06:53 71 kernel: [193339.890918]  88013ffd34b0 0002 81284630 880137429aa0
May 16 20:06:53 71 kernel: [193339.890919] Call Trace:
May 16 20:06:53 71 kernel: [193339.890922]  [] ? start_this_handle+0x590/0x590
May 16 20:06:53 71 kernel: [193339.890924]  [] io_schedule+0x9d/0x140
May 16 20:06:53 71 kernel: [193339.890925]  [] sleep_on_shadow_bh+0xe/0x20
May 16 20:06:53 71 kernel: [193339.890927]  [] __wait_on_bit+0x62/0x90
May 16 20:06:53 71 kernel: [193339.890929]  [] ? start_this_handle+0x590/0x590
May 16 20:06:53 71 kernel: [193339.890930]  [] out_of_line_wait_on_bit+0x77/0x90
May 16 20:06:53 71 kernel: [193339.890932]  [] ? autoremove_wake_function+0x40/0x40
May 16 20:06:53 71 kernel: [193339.890934]  [] ? wake_up_bit+0x25/0x30
May 16 20:06:53 71 kernel: [193339.890936]  [] do_get_write_access+0x2ad/0x4f0
May 16 20:06:53 71 kernel: [193339.890938]  [] ? __getblk+0x2d/0x2e0
May 16 20:06:53 71 kernel: [193339.890939]  [] jbd2_journal_get_write_access+0x27/0x40
May 16 20:06:53 71 kernel: [193339.890942]  [] __ext4_journal_get_write_access+0x3b/0x80
May 16 20:06:53 71 kernel: [193339.890946]  [] ext4_reserve_inode_write+0x70/0xa0
May 16 20:06:53 71 kernel: [193339.890948]  [] ? ext4_dirty_inode+0x40/0x60
May 16 20:06:53 71 kernel: [193339.890949]  [] ext4_mark_inode_dirty+0x44/0x1f0
May 16 20:06:53 71 kernel: [193339.890951]  [] ext4_dirty_inode+0x40/0x60
May 16 20:06:53 71 kernel: [193339.890953]  [] __mark_inode_dirty+0x10a/0x2d0
May 16 20:06:53 71 kernel: [193339.890956]  [] update_time+0x81/0xd0
May 16 20:06:53 71 kernel: [193339.890957]  [] file_update_time+0x80/0xd0
May 16 20:06:53 71 kernel: [193339.890961]  [] __generic_file_aio_write+0x180/0x3d0
May 16 20:06:53 71 kernel: [193339.890963]  [] generic_file_aio_write+0x58/0xa0
May 16 20:06:53 71 kernel: [193339.890965]  [] ext4_file_write+0x99/0x400
May 16 20:06:53 71 kernel: [193339.890967]  [] ? wake_up_state+0x10/0x20
May 16 20:06:53 71 kernel: [193339.890970]  [] ? wake_futex+0x66/0x90
May 16 20:06:53 71 kernel: [193339.890972]  [] ? futex_wake+0x1b1/0x1d0
May 16 20:06:53 71 kernel: [193339.890974]  [] do_sync_write+0x5a/0x90
May 16 20:06:53 71 kernel: [193339.890976]  [] vfs_write+0xb4/0x1f0
May 16 20:06:53 71 kernel: [193339.890978]  [] SyS_write+0x49/0xa0
May 16 20:06:53 71 kernel: [193339.890980]  [] tracesys+0xe1/0xe6



On Wed, May 18, 2016 at 2:33 AM, haosdent  wrote:

> >Is there some way to be given control (a callback, or an "exit" routine)
> so that the container about to be nuked can be given a chance to exit
> gracefully?
> The default value of executor_shutdown_grace_period is 5 seconds; you
> can change it by specifying the `--executor_shutdown_grace_period` flag when
> launching the Mesos agent.
>
> >Are there other steps I can take to avoid this mildly calamitous
> occurrence?
> >mesos-slaves get shutdown
> Do you know where your mesos-master got stuck when this happened? Is there
> any error log or related log about it? In addition, is there any log from
> when the mesos-slave shut down?
>
> On Wed, May 18, 2016 at 6:12 AM, Paul Bell  wrote:
>
>> Hi All,
>>
>> I probably have the following account partly wrong, but let me present it
>> just the same and those who know better can correct me as needed.
>>
>> I've an application that runs several MongoDB shards, each a Dockerized
>> container, each on a distinct node (VM); in fact, some of the VMs 

Re: Mesos Calico CNI

2016-05-18 Thread haosdent
>It's not yet clear to me what exactly I have to put in the directories
pointed by --network_cni_config_dir and --network_cni_plugins_dir in
order to create a network?

As far as I understand from the code, Mesos tries to parse every file under
--network_cni_config_dir, skipping directories.

--network_cni_plugins_dir is used to find the plugin executables. Suppose you
define your network in --network_cni_config_dir like

```
{
  "type": "foo",
  ...
  "ipam": {
    "type": "bar"
  }
  ...
}
```

After Mesos finishes parsing the network definition file under
--network_cni_config_dir, it tries to find the executables "foo" and
"bar" under --network_cni_plugins_dir, because the type of the network you
defined above is "foo" and the type of its "ipam" is "bar".

This is just a quick reply; you could wait for a more detailed reply from
Qian Zhang or Avinash.
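
The lookup described in this reply can be sketched as a small offline check. This is a minimal sketch based only on the description above, not on the Mesos source: the function names required_plugins and missing_plugins are hypothetical, and it assumes each file in the config directory is a complete JSON document with a top-level "type" and an optional "ipam"."type".

```python
import json
from pathlib import Path


def required_plugins(config_dir):
    """Map each config file name to the plugin names it refers to."""
    needed = {}
    for entry in sorted(Path(config_dir).iterdir()):
        if entry.is_dir():
            continue  # directories are skipped, as described above
        config = json.loads(entry.read_text())
        names = [config.get("type"), config.get("ipam", {}).get("type")]
        needed[entry.name] = [name for name in names if name]
    return needed


def missing_plugins(config_dir, plugins_dir):
    """List referenced plugin names with no executable file in plugins_dir."""
    missing = []
    for names in required_plugins(config_dir).values():
        for name in names:
            candidate = Path(plugins_dir) / name
            # Treat a plugin as present if it is a regular file with any
            # execute permission bit set.
            if not (candidate.is_file() and candidate.stat().st_mode & 0o111):
                missing.append(name)
    return missing
```

Running missing_plugins against the two directories passed to the agent gives a quick sanity check of the config and plugin files without starting a cluster; an empty list means every referenced plugin name has a matching executable.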


On Wed, May 18, 2016 at 4:40 PM, Frank Scholten 
wrote:

> Hi Avinash,
>
> Thanks for your response. I am following the steps at
> https://github.com/asridharan/mesos/blob/MESOS-4771/docs/cni.md and
> when I run the mesos-execute command on the cluster I started at
> https://github.com/ContainerSolutions/mesos-calico-cni-sandbox I get a
> message saying the network does not exist. This is ok because I have
> not created the network yet.
>
> It's not yet clear to me what exactly I have to put in the directories
> pointed by --network_cni_config_dir and --network_cni_plugins_dir in
> order to create a network?
>
>
> On Tue, May 17, 2016 at 5:16 PM, Avinash Sridharan
>  wrote:
> > Hi Frank,
> >  I am in the process of putting up the documentation for CNI support in
> > Mesos. You can find the RB patch for the documentation here:
> > https://reviews.apache.org/r/47463/
> >
> > You can find a rendering of the markdown on my github over here:
> > https://github.com/asridharan/mesos/blob/MESOS-4771/docs/cni.md
> >
> >
> > I have put up one example of using the `network/cni` isolator with a
> > "bridge" plugin. Working on adding some more examples, but given that
> > people have already started showing some interest thought would be a good
> > dry run for the documentation if someone could test out the instructions.
> >
> > Would be great if you can try following the instructions and leave any
> > feedback on the review board.
> >
> >
> > Thanks,
> > Avinash
> >
> > On Tue, May 17, 2016 at 6:51 AM, Frank Scholten 
> > wrote:
> >
> >> In the meantime I am looking at an alternative route trying to figure
> >> out how an ipAddress value on a Marathon app gets propagated into Mesos
> >> CNI.
> >>
> >> Marathon reads the ipAddress value from the AppDefinition and then
> >> publishes on the eventbus. I don't see what happens to it from that
> >> point.
> >>
> >>
> >>
> >>
> >>
> >>
> >> On Tue, May 17, 2016 at 2:58 PM, Jay JN Guo 
> wrote:
> >> > - net::links() -> stout/net.hpp
> >> > - Personally, I'm not very familiar with CLion build. Maybe somebody else
> >> > could answer that.
> >> > - I think this is very much related to dev mailing list, so +dev
> >> >
> >> > /J
> >> >
> >> > Frank Scholten  wrote on 05/17/2016 20:47:12:
> >> >
> >> >> From: Frank Scholten 
> >> >> To: user@mesos.apache.org
> >> >> Cc: Qian AZ Zhang/China/IBM@IBMCN, avin...@mesosphere.io
> >> >> Date: 05/17/2016 20:48
> >> >> Subject: Re: Mesos Calico CNI
> >> >>
> >> >> Thanks. I would now like to run that unit test via the debugger from the
> >> >> Mesos source tree in CLion. Is there a doc on how to build Mesos in
> >> >> CLion and debug a unit test?
> >> >>
> >> >> Also, where does net::links() come from? Can't find it in the sources.
> >> >>
> >> >>
> >> >> On Tue, May 17, 2016 at 11:00 AM, haosdent 
> wrote:
> >> >> >>Is there some user documentation for this feature?
> >> >> > Unfortunately, the document is not ready.
> >> >> > https://issues.apache.org/jira/browse/MESOS-4771
> >> >> >
> >> >> >>but I am not sure what I have to do to create an IP for a task.
> >> >> > Qian Zhang shows an example configuration in his test cases; I think
> >> >> > you may want to take a look at it first.
> >> >> > https://reviews.apache.org/r/46097/diff/7#index_header
> >> >> >
> >> >> > On Tue, May 17, 2016 at 4:20 PM, Frank Scholten
> >> > 
> >> >> > wrote:
> >> >> >>
> >> >> >> Hi all,
> >> >> >>
> >> >> >> I tried out CNI support with Calico but I am not sure what I have to
> >> >> >> do to create an IP for a task.
> >> >> >>
> >> >> >> See this sandbox repository on Github
> >> >> >>
> >> >> >> https://github.com/ContainerSolutions/mesos-calico-cni-sandbox
> >> >> >>
> >> >> >> In this repo I build the Mesos master branch and
> >> >> >> https://github.com/projectcalico/calico-cni, create a local cluster
> >> >> >> and deploy an application. I don't see anything in the logs about cni,
> >> >> >> except that it loads the cni isolator.
> >> >> >>
> >> >> >> Is there some user documentation for this feature? If not I am happy
> >> >> >> to write documentation once I figure out how this feature works.
> >> >> >>
> >> >> >> Cheers,

Re: Mesos Calico CNI

2016-05-18 Thread Frank Scholten
Is there a separate CLI for testing or checking the network
isolation config and plugin files without starting up an entire
cluster?





On Wed, May 18, 2016 at 10:40 AM, Frank Scholten  wrote:
> Hi Avinash,
>
> Thanks for your response. I am following the steps at
> https://github.com/asridharan/mesos/blob/MESOS-4771/docs/cni.md and
> when I run the mesos-execute command on the cluster I started at
> https://github.com/ContainerSolutions/mesos-calico-cni-sandbox I get a
> message saying the network does not exist. This is ok because I have
> not created the network yet.
>
> It's not yet clear to me what exactly I have to put in the directories
> pointed by --network_cni_config_dir and --network_cni_plugins_dir in
> order to create a network?
>
>
> On Tue, May 17, 2016 at 5:16 PM, Avinash Sridharan
>  wrote:
>> Hi Frank,
>>  I am in the process of putting up the documentation for CNI support in
>> Mesos. You can find the RB patch for the documentation here:
>> https://reviews.apache.org/r/47463/
>>
>> You can find a rendering of the markdown on my github over here:
>> https://github.com/asridharan/mesos/blob/MESOS-4771/docs/cni.md
>>
>>
>> I have put up one example of using the `network/cni` isolator with a
>> "bridge" plugin. Working on adding some more examples, but given that
>> people have already started showing some interest thought would be a good
>> dry run for the documentation if someone could test out the instructions.
>>
>> Would be great if you can try following the instructions and leave any
>> feedback on the review board.
>>
>>
>> Thanks,
>> Avinash
>>
>> On Tue, May 17, 2016 at 6:51 AM, Frank Scholten 
>> wrote:
>>
>>> In the meantime I am looking at an alternative route trying to figure
>>> out how an ipAddress value on a Marathon app gets propagated into Mesos
>>> CNI.
>>>
>>> Marathon reads the ipAddress value from the AppDefinition and then
>>> publishes on the eventbus. I don't see what happens to it from that
>>> point.
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Tue, May 17, 2016 at 2:58 PM, Jay JN Guo  wrote:
>>> > - net::links() -> stout/net.hpp
>>> > - Personally, I'm not very familiar with CLion build. Maybe somebody else
>>> > could answer that.
>>> > - I think this is very much related to dev mailing list, so +dev
>>> >
>>> > /J
>>> >
>>> > Frank Scholten  wrote on 05/17/2016 20:47:12:
>>> >
>>> >> From: Frank Scholten 
>>> >> To: user@mesos.apache.org
>>> >> Cc: Qian AZ Zhang/China/IBM@IBMCN, avin...@mesosphere.io
>>> >> Date: 05/17/2016 20:48
>>> >> Subject: Re: Mesos Calico CNI
>>> >>
>>> >> Thanks. I would now like to run that unit test via the debugger from the
>>> >> Mesos source tree in CLion. Is there a doc on how to build Mesos in
>>> >> CLion and debug a unit test?
>>> >>
>>> >> Also, where does net::links() come from? Can't find it in the sources.
>>> >>
>>> >>
>>> >> On Tue, May 17, 2016 at 11:00 AM, haosdent  wrote:
>>> >> >>Is there some user documentation for this feature?
>>> >> > Unfortunately, the document is not ready.
>>> >> > https://issues.apache.org/jira/browse/MESOS-4771
>>> >> >
>>> >> >>but I am not sure what I have to do to create an IP for a task.
>>> >> > Qian Zhang shows an example configuration in his test cases; I think
>>> >> > you may want to take a look at it first.
>>> >> > https://reviews.apache.org/r/46097/diff/7#index_header
>>> >> >
>>> >> > On Tue, May 17, 2016 at 4:20 PM, Frank Scholten wrote:
>>> >> >>
>>> >> >> Hi all,
>>> >> >>
>>> >> >> I tried out CNI support with Calico but I am not sure what I have to
>>> >> >> do to create an IP for a task.
>>> >> >>
>>> >> >> See this sandbox repository on Github
>>> >> >>
>>> >> >> https://github.com/ContainerSolutions/mesos-calico-cni-sandbox
>>> >> >>
>>> >> >> In this repo I build the Mesos master branch and
>>> >> >> https://github.com/projectcalico/calico-cni, create a local cluster
>>> >> >> and deploy an application. I don't see anything in the logs about cni,
>>> >> >> except that it loads the cni isolator.
>>> >> >>
>>> >> >> Is there some user documentation for this feature? If not I am happy
>>> >> >> to write documentation once I figure out how this feature works.
>>> >> >>
>>> >> >> Cheers,
>>> >> >>
>>> >> >> Frank
>>> >> >
>>> >> >
>>> >> >
>>> >> >
>>> >> > --
>>> >> > Best Regards,
>>> >> > Haosdent Huang
>>> >>
>>>
>>
>>
>>
>> --
>> Avinash Sridharan, Mesosphere
>> +1 (323) 702 5245


Re: Mesos Calico CNI

2016-05-18 Thread Frank Scholten
Hi Daniel,

Cool. I will have a look at your repository.

If you can add a PR to my repo to demonstrate how it works I would
really appreciate it.

On Tue, May 17, 2016 at 5:57 PM, Daniel Osborne  wrote:
> Frank,
>
> I’ve done some work to get Calico-CNI working in Mesos. The work is based
> on the net-modules repo (even though CNI doesn’t have much to do with
> net-modules) only because it already had a pretty slick docker demo set up.
> Here’s the branch: https://github.com/djosborne/net-modules/tree/cni
>
> I’d be happy to assist you in getting it working on your repo.  Let me know
> if you need any assistance.
>
> We haven't published much information on this as of yet, since Mesos CNI
> support is only just rolling out now.
>
> -Dan
>
> On Tue, May 17, 2016 at 8:16 AM, Avinash Sridharan 
> wrote:
>
>> Hi Frank,
>>  I am in the process of putting up the documentation for CNI support in
>> Mesos. You can find the RB patch for the documentation here:
>> https://reviews.apache.org/r/47463/
>>
>> You can find a rendering of the markdown on my github over here:
>> https://github.com/asridharan/mesos/blob/MESOS-4771/docs/cni.md
>>
>>
>> I have put up one example of using the `network/cni` isolator with a
>> "bridge" plugin. I am working on adding more examples, but since people
>> have already started showing interest, I thought it would be a good dry
>> run for the documentation if someone could test out the instructions.
>>
>> Would be great if you can try following the instructions and leave any
>> feedback on the review board.
>>
>>
>> Thanks,
>> Avinash
>>
>> On Tue, May 17, 2016 at 6:51 AM, Frank Scholten 
>> wrote:
>>
>>> In the meantime I am looking at an alternative route, trying to figure
>>> out how an ipAddress value on a Marathon app gets propagated into Mesos
>>> CNI.
>>>
>>> Marathon reads the ipAddress value from the AppDefinition and then
>>> publishes it on the event bus. I don't see what happens to it from that
>>> point.
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Tue, May 17, 2016 at 2:58 PM, Jay JN Guo 
>>> wrote:
>>> > - net::links() -> stout/net.hpp
>>> > - Personally, I'm not very familiar with CLion build. Maybe somebody else
>>> > could answer that.
>>> > - I think this is very much related to dev mailing list, so +dev
>>> >
>>> > /J
>>> >
>>> > Frank Scholten  wrote on 05/17/2016 20:47:12:
>>> >
>>> >> From: Frank Scholten 
>>> >> To: user@mesos.apache.org
>>> >> Cc: Qian AZ Zhang/China/IBM@IBMCN, avin...@mesosphere.io
>>> >> Date: 05/17/2016 20:48
>>> >> Subject: Re: Mesos Calico CNI
>>> >>
>>> >> Thanks. I would now like to run that unit test via the debugger from the
>>> >> Mesos source tree in CLion. Is there a doc on how to build Mesos in
>>> >> CLion and debug a unit test?
>>> >>
>>> >> Also, where does net::links() come from? Can't find it in the sources.
>>> >>
>>> >>
>>> >> On Tue, May 17, 2016 at 11:00 AM, haosdent  wrote:
>>> >> >>Is there some user documentation for this feature?
>>> >> > Unfortunately, the document is not ready.
>>> >> > https://issues.apache.org/jira/browse/MESOS-4771
>>> >> >
>>> >> >>but I am not sure what I have to do to create an IP for a task.
>>> >> > Qian Zhang shows an example configuration in his test cases; I think
>>> >> > you may want to take a look at it first.
>>> >> > https://reviews.apache.org/r/46097/diff/7#index_header
>>> >> >
>>> >> > On Tue, May 17, 2016 at 4:20 PM, Frank Scholten wrote:
>>> >> >>
>>> >> >> Hi all,
>>> >> >>
>>> >> >> I tried out CNI support with Calico but I am not sure what I have to
>>> >> >> do to create an IP for a task.
>>> >> >>
>>> >> >> See this sandbox repository on Github
>>> >> >>
>>> >> >> https://github.com/ContainerSolutions/mesos-calico-cni-sandbox
>>> >> >>
>>> >> >> In this repo I build the Mesos master branch and
>>> >> >> https://github.com/projectcalico/calico-cni, create a local cluster
>>> >> >> and deploy an application. I don't see anything in the logs about cni,
>>> >> >> except that it loads the cni isolator.
>>> >> >>
>>> >> >> Is there some user documentation for this feature? If not I am happy
>>> >> >> to write documentation once I figure out how this feature works.
>>> >> >>
>>> >> >> Cheers,
>>> >> >>
>>> >> >> Frank
>>> >> >
>>> >> >
>>> >> >
>>> >> >
>>> >> > --
>>> >> > Best Regards,
>>> >> > Haosdent Huang
>>> >>
>>>
>>
>>
>>
>> --
>> Avinash Sridharan, Mesosphere
>> +1 (323) 702 5245
>>


Re: Mesos Calico CNI

2016-05-18 Thread Frank Scholten
Hi Avinash,

Thanks for your response. I am following the steps at
https://github.com/asridharan/mesos/blob/MESOS-4771/docs/cni.md and
when I run the mesos-execute command on the cluster I started at
https://github.com/ContainerSolutions/mesos-calico-cni-sandbox I get a
message saying the network does not exist. This is ok because I have
not created the network yet.

It's not yet clear to me what exactly I have to put in the directories
pointed to by --network_cni_config_dir and --network_cni_plugins_dir in
order to create a network.
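
For reference, a rough sketch of what might go there, based on general CNI
conventions rather than on anything confirmed in this thread: the config
directory would hold one JSON network configuration file per network, and
the plugins directory would hold the plugin executables named in those
files (here the standard CNI "bridge" and "host-local" plugins). The Python
below only builds and writes such a file. The network name, bridge name,
subnet and file name are made-up placeholders, and the helper functions are
hypothetical, not part of Mesos or CNI.

```python
import json
import tempfile
from pathlib import Path


def bridge_network_config(name, bridge, subnet):
    """Build a CNI network configuration for the standard "bridge" plugin.

    The "type" values must match plugin executables found in the directory
    given by --network_cni_plugins_dir (assumed here: "bridge" and
    "host-local" from the CNI project).
    """
    return {
        "name": name,          # network name that tasks refer to
        "type": "bridge",      # plugin binary that sets up the network
        "bridge": bridge,      # Linux bridge device the plugin creates
        "isGateway": True,
        "ipMasq": True,
        "ipam": {
            "type": "host-local",  # IPAM plugin that hands out addresses
            "subnet": subnet,
            "routes": [{"dst": "0.0.0.0/0"}],
        },
    }


def write_network_config(config_dir, config):
    """Write the configuration as <name>.json into config_dir."""
    path = Path(config_dir) / f"{config['name']}.json"
    path.write_text(json.dumps(config, indent=2))
    return path


if __name__ == "__main__":
    # A temporary directory stands in for the real config directory.
    with tempfile.TemporaryDirectory() as config_dir:
        cfg = bridge_network_config("example-net", "example-br0", "192.168.0.0/16")
        print(write_network_config(config_dir, cfg).read_text())
```

On a real agent the file would be written into the directory passed as
--network_cni_config_dir instead of a temporary one, and the network name
used when launching a task would have to match the "name" field.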


On Tue, May 17, 2016 at 5:16 PM, Avinash Sridharan wrote:
> Hi Frank,
>  I am in the process of putting up the documentation for CNI support in
> Mesos. You can find the RB patch for the documentation here:
> https://reviews.apache.org/r/47463/
>
> You can find a rendering of the markdown on my github over here:
> https://github.com/asridharan/mesos/blob/MESOS-4771/docs/cni.md
>
>
> I have put up one example of using the `network/cni` isolator with a
> "bridge" plugin. I am working on adding more examples, but since people
> have already started showing interest, I thought it would be a good dry
> run for the documentation if someone could test out the instructions.
>
> Would be great if you can try following the instructions and leave any
> feedback on the review board.
>
>
> Thanks,
> Avinash
>
> On Tue, May 17, 2016 at 6:51 AM, Frank Scholten 
> wrote:
>
>> In the meantime I am looking at an alternative route, trying to figure
>> out how an ipAddress value on a Marathon app gets propagated into Mesos
>> CNI.
>>
>> Marathon reads the ipAddress value from the AppDefinition and then
>> publishes it on the event bus. I don't see what happens to it from that
>> point.
>>
>>
>>
>>
>>
>>
>> On Tue, May 17, 2016 at 2:58 PM, Jay JN Guo  wrote:
>> > - net::links() -> stout/net.hpp
>> > - Personally, I'm not very familiar with CLion build. Maybe somebody else
>> > could answer that.
>> > - I think this is very much related to dev mailing list, so +dev
>> >
>> > /J
>> >
>> > Frank Scholten  wrote on 05/17/2016 20:47:12:
>> >
>> >> From: Frank Scholten 
>> >> To: user@mesos.apache.org
>> >> Cc: Qian AZ Zhang/China/IBM@IBMCN, avin...@mesosphere.io
>> >> Date: 05/17/2016 20:48
>> >> Subject: Re: Mesos Calico CNI
>> >>
>> >> Thanks. I would now like to run that unit test via the debugger from the
>> >> Mesos source tree in CLion. Is there a doc on how to build Mesos in
>> >> CLion and debug a unit test?
>> >>
>> >> Also, where does net::links() come from? Can't find it in the sources.
>> >>
>> >>
>> >> On Tue, May 17, 2016 at 11:00 AM, haosdent  wrote:
>> >> >>Is there some user documentation for this feature?
>> >> > Unfortunately, the document is not ready.
>> >> > https://issues.apache.org/jira/browse/MESOS-4771
>> >> >
>> >> >>but I am not sure what I have to do to create an IP for a task.
>> >> > Qian Zhang shows an example configuration in his test cases; I think
>> >> > you may want to take a look at it first.
>> >> > https://reviews.apache.org/r/46097/diff/7#index_header
>> >> >
>> >> > On Tue, May 17, 2016 at 4:20 PM, Frank Scholten wrote:
>> >> >>
>> >> >> Hi all,
>> >> >>
>> >> >> I tried out CNI support with Calico but I am not sure what I have to
>> >> >> do to create an IP for a task.
>> >> >>
>> >> >> See this sandbox repository on Github
>> >> >>
>> >> >> https://github.com/ContainerSolutions/mesos-calico-cni-sandbox
>> >> >>
>> >> >> In this repo I build the Mesos master branch and
>> >> >> https://github.com/projectcalico/calico-cni, create a local cluster
>> >> >> and deploy an application. I don't see anything in the logs about cni,
>> >> >> except that it loads the cni isolator.
>> >> >>
>> >> >> Is there some user documentation for this feature? If not I am happy
>> >> >> to write documentation once I figure out how this feature works.
>> >> >>
>> >> >> Cheers,
>> >> >>
>> >> >> Frank
>> >> >
>> >> >
>> >> >
>> >> >
>> >> > --
>> >> > Best Regards,
>> >> > Haosdent Huang
>> >>
>>
>
>
>
> --
> Avinash Sridharan, Mesosphere
> +1 (323) 702 5245