[jira] [Commented] (MESOS-5308) ROOT_XFS_QuotaTest.NoCheckpointRecovery failed.

2016-04-29 Thread James Peach (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15264243#comment-15264243
 ] 

James Peach commented on MESOS-5308:


Looks like the resource statistics are not correct. I don't really see why that 
would happen. Probably the way to debug this is to leave the scratch filesystem 
mounted and poke at it with {{xfs_quota}}.

{code}
[01:07:51]W: [Step 10/10] 1048576 bytes (1.0 MB) copied, 0.00128219 s, 818 MB/s
[01:07:51] : [Step 10/10] ../../src/tests/containerizer/xfs_quota_tests.cpp:559: Failure
[01:07:51]W: [Step 10/10] I0429 01:07:51.865185 17604 slave.cpp:825] Agent terminating
[01:07:51] : [Step 10/10] Value of: usage1->executors(0).statistics().disk_used_bytes()
[01:07:51] : [Step 10/10]   Actual: 196608
{code}
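
For scale, the reported figure can be set against the 1 MiB that {{dd}} wrote. The sketch below only does that arithmetic, using the 4096-byte block size ({{bsize=4096}}) from the {{mkfs.xfs}} output in the issue log. The names {{kBlockSize}} and {{toBlocks}} are illustrative and not taken from the test, and the expected value is cut off in the log above, so it is not assumed here.

```cpp
#include <cstdint>

// Block size reported by mkfs.xfs in the issue log (bsize=4096).
constexpr std::uint64_t kBlockSize = 4096;

// Number of whole filesystem blocks covered by `bytes`.
constexpr std::uint64_t toBlocks(std::uint64_t bytes)
{
  return bytes / kBlockSize;
}

// Bytes written by `dd` versus bytes reported in the statistics.
constexpr std::uint64_t kWritten = 1048576;
constexpr std::uint64_t kReported = 196608;

// Both checks are evaluated at compile time.
static_assert(toBlocks(kWritten) == 256, "1 MiB is 256 blocks");
static_assert(toBlocks(kReported) == 48, "196608 bytes is 48 blocks");
```

So the statistics covered 48 of the 256 blocks when they were sampled. That could fit the usage being read before all of the written data was accounted for, but the log alone does not confirm it.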

> ROOT_XFS_QuotaTest.NoCheckpointRecovery failed.
> ---
>
> Key: MESOS-5308
> URL: https://issues.apache.org/jira/browse/MESOS-5308
> Project: Mesos
>  Issue Type: Bug
>  Components: isolation
> Environment: Fedora 23 with/without SSL
>Reporter: Gilbert Song
>  Labels: isolation
>
> Here is the log:
> {code}
> [01:07:51] :   [Step 10/10] [ RUN  ] 
> ROOT_XFS_QuotaTest.NoCheckpointRecovery
> [01:07:51] :   [Step 10/10] meta-data=/dev/loop0 isize=512
> agcount=2, agsize=5120 blks
> [01:07:51] :   [Step 10/10]  =   sectsz=512   
> attr=2, projid32bit=1
> [01:07:51] :   [Step 10/10]  =   crc=1
> finobt=1, sparse=0
> [01:07:51] :   [Step 10/10] data =   bsize=4096   
> blocks=10240, imaxpct=25
> [01:07:51] :   [Step 10/10]  =   sunit=0  
> swidth=0 blks
> [01:07:51] :   [Step 10/10] naming   =version 2  bsize=4096   
> ascii-ci=0 ftype=1
> [01:07:51] :   [Step 10/10] log  =internal log   bsize=4096   
> blocks=855, version=2
> [01:07:51] :   [Step 10/10]  =   sectsz=512   
> sunit=0 blks, lazy-count=1
> [01:07:51] :   [Step 10/10] realtime =none   extsz=4096   
> blocks=0, rtextents=0
> [01:07:51]W:   [Step 10/10] I0429 01:07:51.690585 17604 cluster.cpp:149] 
> Creating default 'local' authorizer
> [01:07:51]W:   [Step 10/10] I0429 01:07:51.706126 17604 leveldb.cpp:174] 
> Opened db in 15.452988ms
> [01:07:51]W:   [Step 10/10] I0429 01:07:51.707135 17604 leveldb.cpp:181] 
> Compacted db in 984939ns
> [01:07:51]W:   [Step 10/10] I0429 01:07:51.707154 17604 leveldb.cpp:196] 
> Created db iterator in 4159ns
> [01:07:51]W:   [Step 10/10] I0429 01:07:51.707159 17604 leveldb.cpp:202] 
> Seeked to beginning of db in 517ns
> [01:07:51]W:   [Step 10/10] I0429 01:07:51.707165 17604 leveldb.cpp:271] 
> Iterated through 0 keys in the db in 305ns
> [01:07:51]W:   [Step 10/10] I0429 01:07:51.707176 17604 replica.cpp:779] 
> Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned
> [01:07:51]W:   [Step 10/10] I0429 01:07:51.707320 17621 recover.cpp:447] 
> Starting replica recovery
> [01:07:51]W:   [Step 10/10] I0429 01:07:51.707381 17621 recover.cpp:473] 
> Replica is in EMPTY status
> [01:07:51]W:   [Step 10/10] I0429 01:07:51.707638 17619 replica.cpp:673] 
> Replica in EMPTY status received a broadcasted recover request from 
> (17889)@172.30.2.13:37618
> [01:07:51]W:   [Step 10/10] I0429 01:07:51.707732 17624 recover.cpp:193] 
> Received a recover response from a replica in EMPTY status
> [01:07:51]W:   [Step 10/10] I0429 01:07:51.707885 17624 recover.cpp:564] 
> Updating replica status to STARTING
> [01:07:51]W:   [Step 10/10] I0429 01:07:51.708389 17618 master.cpp:382] 
> Master 0c1e0a50-1212-4104-a148-661131b79f27 
> (ip-172-30-2-13.ec2.internal.mesosphere.io) started on 172.30.2.13:37618
> [01:07:51]W:   [Step 10/10] I0429 01:07:51.708406 17618 master.cpp:384] Flags 
> at startup: --acls="" --allocation_interval="1secs" 
> --allocator="HierarchicalDRF" --authenticate="true" 
> --authenticate_http="true" --authenticate_http_frameworks="true" 
> --authenticate_slaves="true" --authenticators="crammd5" --authorizers="local" 
> --credentials="/mnt/teamcity/temp/buildTmp/ROOT_XFS_QuotaTest_NoCheckpointRecovery_ZsRNg9/mnt/credentials"
>  --framework_sorter="drf" --help="false" --hostname_lookup="true" 
> --http_authenticators="basic" --http_framework_authenticators="basic" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" --max_completed_frameworks="50" 
> --max_completed_tasks_per_framework="1000" --max_slave_ping_timeouts="5" 
> --quiet="false" --recovery_slave_removal_limit="100%" 
> --registry="replicated_log" --registry_fetch_timeout="1mins" 
> --registry_store_timeout="100secs" --registry_strict="true" 
> --root_submissions="true" 

[jira] [Commented] (MESOS-5308) ROOT_XFS_QuotaTest.NoCheckpointRecovery failed.

2016-04-30 Thread James Peach (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15265390#comment-15265390
 ] 

James Peach commented on MESOS-5308:


Tested this on master with a Fedora 23 VM and couldn't get it to fail. Timing issue?


[jira] [Commented] (MESOS-5308) ROOT_XFS_QuotaTest.NoCheckpointRecovery failed.

2016-04-30 Thread James Peach (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15265404#comment-15265404
 ] 

James Peach commented on MESOS-5308:


Earlier I ran 1000 iterations with no errors. This time I got a different 
failure after 114 iterations:

{code}
I0430 11:10:54.62 11591 exec.cpp:150] Version: 0.29.0
I0430 11:10:54.678015 11620 exec.cpp:225] Executor registered on agent 
ccb4b4e9-46f5-48ec-bc72-ae5504a67357-S0
Registered executor on fedora-23
Starting task ea4ece2b-fb5a-40a2-bfcd-ff79a31566d8
sh -c 'dd if=/dev/zero of=file bs=1048576 count=1; sleep 1000'
Forked command at 11627
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.000696071 s, 1.5 GB/s
I0430 11:10:54.691599 11620 exec.cpp:399] Executor asked to shutdown
Shutting down
Sending SIGTERM to process tree at pid 11627
Sent SIGTERM to the following process trees:
[
-+- 11627 sh -c dd if=/dev/zero of=file bs=1048576 count=1; sleep 1000
 \--- 11629 sleep 1000
]
/opt/home/src/mesos.git/src/tests/containerizer/xfs_quota_tests.cpp:575: Failure
Failed to wait 15secs for _recover
*** Aborted at 1462039869 (unix time) try "date -d @1462039869" if you are using GNU date ***
PC: @  0x184da1e testing::UnitTest::AddTestPartResult()
*** SIGSEGV (@0x0) received by PID 4417 (TID 0x7f7794df78c0) from PID 0; stack trace: ***
@ 0x7f778d1e49f0 (unknown)
@  0x184da1e testing::UnitTest::AddTestPartResult()
@  0x1842197 testing::internal::AssertHelper::operator=()
@  0x17a0570 mesos::internal::tests::ROOT_XFS_QuotaTest_NoCheckpointRecovery_Test::TestBody()
@  0x186bbcc testing::internal::HandleSehExceptionsInMethodIfSupported<>()
@  0x1866bd2 testing::internal::HandleExceptionsInMethodIfSupported<>()
@  0x184761c testing::Test::Run()
@  0x1847dd4 testing::TestInfo::Run()
@  0x1848425 testing::TestCase::Run()
@  0x184ef63 testing::internal::UnitTestImpl::RunAllTests()
@  0x186c893 testing::internal::HandleSehExceptionsInMethodIfSupported<>()
@  0x1867748 testing::internal::HandleExceptionsInMethodIfSupported<>()
@  0x184dc3f testing::UnitTest::Run()
@   0xf640f1 RUN_ALL_TESTS()
@   0xf63ce9 main
@ 0x7f778bcfc580 __libc_start_main
@   0xa35039 _start
Segmentation fault
{code}

[jira] [Commented] (MESOS-5308) ROOT_XFS_QuotaTest.NoCheckpointRecovery failed.

2016-05-04 Thread James Peach (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15271864#comment-15271864
 ] 

James Peach commented on MESOS-5308:


Thanks for the help [~xujyan]. Patch is in https://reviews.apache.org/r/47001/.

> ROOT_XFS_QuotaTest.NoCheckpointRecovery failed.
> ---
>
> Key: MESOS-5308
> URL: https://issues.apache.org/jira/browse/MESOS-5308
> Project: Mesos
>  Issue Type: Bug
>  Components: isolation
> Environment: Fedora 23 with/without SSL
>Reporter: Gilbert Song
>Assignee: James Peach
>  Labels: isolation
>
> Here is the log:
> {code}
> [01:07:51] :   [Step 10/10] [ RUN  ] 
> ROOT_XFS_QuotaTest.NoCheckpointRecovery
> [01:07:51] :   [Step 10/10] meta-data=/dev/loop0 isize=512
> agcount=2, agsize=5120 blks
> [01:07:51] :   [Step 10/10]  =   sectsz=512   
> attr=2, projid32bit=1
> [01:07:51] :   [Step 10/10]  =   crc=1
> finobt=1, sparse=0
> [01:07:51] :   [Step 10/10] data =   bsize=4096   
> blocks=10240, imaxpct=25
> [01:07:51] :   [Step 10/10]  =   sunit=0  
> swidth=0 blks
> [01:07:51] :   [Step 10/10] naming   =version 2  bsize=4096   
> ascii-ci=0 ftype=1
> [01:07:51] :   [Step 10/10] log  =internal log   bsize=4096   
> blocks=855, version=2
> [01:07:51] :   [Step 10/10]  =   sectsz=512   
> sunit=0 blks, lazy-count=1
> [01:07:51] :   [Step 10/10] realtime =none   extsz=4096   
> blocks=0, rtextents=0
> [01:07:51]W:   [Step 10/10] I0429 01:07:51.690585 17604 cluster.cpp:149] 
> Creating default 'local' authorizer
> [01:07:51]W:   [Step 10/10] I0429 01:07:51.706126 17604 leveldb.cpp:174] 
> Opened db in 15.452988ms
> [01:07:51]W:   [Step 10/10] I0429 01:07:51.707135 17604 leveldb.cpp:181] 
> Compacted db in 984939ns
> [01:07:51]W:   [Step 10/10] I0429 01:07:51.707154 17604 leveldb.cpp:196] 
> Created db iterator in 4159ns
> [01:07:51]W:   [Step 10/10] I0429 01:07:51.707159 17604 leveldb.cpp:202] 
> Seeked to beginning of db in 517ns
> [01:07:51]W:   [Step 10/10] I0429 01:07:51.707165 17604 leveldb.cpp:271] 
> Iterated through 0 keys in the db in 305ns
> [01:07:51]W:   [Step 10/10] I0429 01:07:51.707176 17604 replica.cpp:779] 
> Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned
> [01:07:51]W:   [Step 10/10] I0429 01:07:51.707320 17621 recover.cpp:447] 
> Starting replica recovery
> [01:07:51]W:   [Step 10/10] I0429 01:07:51.707381 17621 recover.cpp:473] 
> Replica is in EMPTY status
> [01:07:51]W:   [Step 10/10] I0429 01:07:51.707638 17619 replica.cpp:673] 
> Replica in EMPTY status received a broadcasted recover request from 
> (17889)@172.30.2.13:37618
> [01:07:51]W:   [Step 10/10] I0429 01:07:51.707732 17624 recover.cpp:193] 
> Received a recover response from a replica in EMPTY status
> [01:07:51]W:   [Step 10/10] I0429 01:07:51.707885 17624 recover.cpp:564] 
> Updating replica status to STARTING
> [01:07:51]W:   [Step 10/10] I0429 01:07:51.708389 17618 master.cpp:382] 
> Master 0c1e0a50-1212-4104-a148-661131b79f27 
> (ip-172-30-2-13.ec2.internal.mesosphere.io) started on 172.30.2.13:37618
> [01:07:51]W:   [Step 10/10] I0429 01:07:51.708406 17618 master.cpp:384] Flags 
> at startup: --acls="" --allocation_interval="1secs" 
> --allocator="HierarchicalDRF" --authenticate="true" 
> --authenticate_http="true" --authenticate_http_frameworks="true" 
> --authenticate_slaves="true" --authenticators="crammd5" --authorizers="local" 
> --credentials="/mnt/teamcity/temp/buildTmp/ROOT_XFS_QuotaTest_NoCheckpointRecovery_ZsRNg9/mnt/credentials"
>  --framework_sorter="drf" --help="false" --hostname_lookup="true" 
> --http_authenticators="basic" --http_framework_authenticators="basic" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" --max_completed_frameworks="50" 
> --max_completed_tasks_per_framework="1000" --max_slave_ping_timeouts="5" 
> --quiet="false" --recovery_slave_removal_limit="100%" 
> --registry="replicated_log" --registry_fetch_timeout="1mins" 
> --registry_store_timeout="100secs" --registry_strict="true" 
> --root_submissions="true" --slave_ping_timeout="15secs" 
> --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" 
> --webui_dir="/usr/local/share/mesos/webui" 
> --work_dir="/mnt/teamcity/temp/buildTmp/ROOT_XFS_QuotaTest_NoCheckpointRecovery_ZsRNg9/mnt/master"
>  --zk_session_timeout="10secs"
> [01:07:51]W:   [Step 10/10] I0429 01:07:51.708510 17618 master.cpp:433] 
> Master only allowing authenticated frameworks to register
> [01:07:51]W:   [Step 10/10] I0429 01:07:51.708518 17618 master.cpp:439] 
> Master only allowing authenticated agents 

[jira] [Assigned] (MESOS-5308) ROOT_XFS_QuotaTest.NoCheckpointRecovery failed.

2016-05-04 Thread James Peach (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Peach reassigned MESOS-5308:
--

Assignee: James Peach


[jira] [Commented] (MESOS-5308) ROOT_XFS_QuotaTest.NoCheckpointRecovery failed.

2016-05-04 Thread James Peach (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15271901#comment-15271901
 ] 

James Peach commented on MESOS-5308:


Interesting: I got a recovery failure after 5000 iterations.

{code}
/opt/home/src/mesos.git/src/tests/containerizer/xfs_quota_tests.cpp:606: Failure
Value of: xfs::getProjectId(sandbox).isNone()
  Actual: false
Expected: true
*** Aborted at 1462425273 (unix time) try "date -d @1462425273" if you are using GNU date ***
PC: @  0x184d566 testing::UnitTest::AddTestPartResult()
*** SIGSEGV (@0x0) received by PID 8269 (TID 0x7f860ea948c0) from PID 0; stack trace: ***
@ 0x7f8606e819f0 (unknown)
@  0x184d566 testing::UnitTest::AddTestPartResult()
@  0x1841cdf testing::internal::AssertHelper::operator=()
@  0x17a0bfb mesos::internal::tests::ROOT_XFS_QuotaTest_NoCheckpointRecovery_Test::TestBody()
{code}

I verified the XFS project state and somehow the isolator failed to remove the 
project ID.

[jira] [Updated] (MESOS-7021) Consistent symlink behavior for os::stat accessors.

2017-02-01 Thread James Peach (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Peach updated MESOS-7021:
---
Shepherd: Michael Park

> Consistent symlink behavior for os::stat accessors.
> ---
>
> Key: MESOS-7021
> URL: https://issues.apache.org/jira/browse/MESOS-7021
> Project: Mesos
>  Issue Type: Improvement
>Reporter: James Peach
>Assignee: James Peach
>Priority: Trivial
>
> The various stat(2) accessors in the {{os::stat}} namespace are inconsistent 
> in whether they let the caller specify symlink-following behavior. Update 
> them so they consistently take a {{FollowSymlink}} option.
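
One possible shape for such an interface is sketched below, assuming the option simply selects between stat(2) and lstat(2). The enumerator names, the helper {{statPath}}, and the accessor signatures are assumptions made for illustration, not the actual stout API.

```cpp
#include <sys/stat.h>

#include <string>

// Hypothetical option type; the real definition may differ.
enum class FollowSymlink
{
  DO_NOT_FOLLOW_SYMLINK,
  FOLLOW_SYMLINK
};

// Calls stat(2) or lstat(2) depending on `follow`. Returns true on success.
inline bool statPath(
    const std::string& path,
    FollowSymlink follow,
    struct stat* s)
{
  switch (follow) {
    case FollowSymlink::FOLLOW_SYMLINK:
      return ::stat(path.c_str(), s) == 0;
    case FollowSymlink::DO_NOT_FOLLOW_SYMLINK:
      return ::lstat(path.c_str(), s) == 0;
  }
  return false;
}

// Every accessor takes the same option, so callers choose the behavior.
inline bool isdir(
    const std::string& path,
    FollowSymlink follow = FollowSymlink::FOLLOW_SYMLINK)
{
  struct stat s;
  return statPath(path, follow, &s) && S_ISDIR(s.st_mode);
}

// A symlink can only be observed when it is not followed.
inline bool islink(const std::string& path)
{
  struct stat s;
  return statPath(path, FollowSymlink::DO_NOT_FOLLOW_SYMLINK, &s) &&
         S_ISLNK(s.st_mode);
}
```

With this shape, {{isdir(path, FollowSymlink::DO_NOT_FOLLOW_SYMLINK)}} returns false for a symlink that points at a directory, while the default call returns true.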



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (MESOS-7049) CgroupsAnyHierarchyWithPerfEventTest.ROOT_CGROUPS_PERF_PerfTest is broken on Fedora 25.

2017-02-01 Thread James Peach (JIRA)
James Peach created MESOS-7049:
--

 Summary: 
CgroupsAnyHierarchyWithPerfEventTest.ROOT_CGROUPS_PERF_PerfTest is broken on 
Fedora 25.
 Key: MESOS-7049
 URL: https://issues.apache.org/jira/browse/MESOS-7049
 Project: Mesos
  Issue Type: Bug
  Components: isolation, tests
Reporter: James Peach


*Test output:*
{noformat}
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from CgroupsAnyHierarchyWithPerfEventTest
[ RUN      ] CgroupsAnyHierarchyWithPerfEventTest.ROOT_CGROUPS_PERF_PerfTest
../../src/tests/containerizer/cgroups_tests.cpp:1020: Failure
(statistics).failure(): Failed to parse perf sample: Failed to parse perf sample line '6186960975,,cycles,mesos_test,2000511515,100.00,3.093,GHz': Unexpected number of fields
../../src/tests/containerizer/cgroups_tests.cpp:193: Failure
(cgroups::destroy(hierarchy, cgroup)).failure(): Failed to remove cgroup '/sys/fs/cgroup/perf_event/mesos_test': Device or resource busy
[  FAILED  ] CgroupsAnyHierarchyWithPerfEventTest.ROOT_CGROUPS_PERF_PerfTest (2123 ms)
[----------] 1 test from CgroupsAnyHierarchyWithPerfEventTest (2123 ms total)

[----------] Global test environment tear-down
../../src/tests/environment.cpp:836: Failure
Failed
Tests completed with child processes remaining:
-+- 20455 /home/jpeach/upstream/mesos/build/src/.libs/mesos-tests --verbose --gtest_filter=CgroupsAnyHierarchyWithPerfEventTest.ROOT_CGROUPS_PERF_PerfTest
 \--- 20500 /home/jpeach/upstream/mesos/build/src/.libs/mesos-tests --verbose --gtest_filter=CgroupsAnyHierarchyWithPerfEventTest.ROOT_CGROUPS_PERF_PerfTest
[==========] 1 test from 1 test case ran. (2141 ms total)
[  PASSED  ] 0 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] CgroupsAnyHierarchyWithPerfEventTest.ROOT_CGROUPS_PERF_PerfTest
{noformat}

*Software versions:*
{noformat}
[jpeach@jpeach src]$ uname -a
Linux jpeach.apple.com 4.9.6-200.fc25.x86_64 #1 SMP Thu Jan 26 10:17:45 UTC 
2017 x86_64 x86_64 x86_64 GNU/Linux
[jpeach@jpeach src]$ perf -v
perf version 4.9.6.200.fc25.x86_64.g51a0
[jpeach@jpeach src]$ cat /etc/os-release
NAME=Fedora
VERSION="25 (Workstation Edition)"
ID=fedora
VERSION_ID=25
PRETTY_NAME="Fedora 25 (Workstation Edition)"
ANSI_COLOR="0;34"
CPE_NAME="cpe:/o:fedoraproject:fedora:25"
HOME_URL="https://fedoraproject.org/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="Fedora"
REDHAT_BUGZILLA_PRODUCT_VERSION=25
REDHAT_SUPPORT_PRODUCT="Fedora"
REDHAT_SUPPORT_PRODUCT_VERSION=25
PRIVACY_POLICY_URL=https://fedoraproject.org/wiki/Legal:PrivacyPolicy
VARIANT="Workstation Edition"
VARIANT_ID=workstation
{noformat}

The test then fails to clean up, leaving stale processes and cgroups.
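
To make the "Unexpected number of fields" error concrete, the sketch below splits the sample line from the failure message on commas and counts the fields. The helper {{splitFields}} is illustrative and is not the Mesos parser. Which field counts that parser accepts is not shown in this report, so none is assumed.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Splits `line` on `delim`, keeping empty fields.
inline std::vector<std::string> splitFields(
    const std::string& line,
    char delim = ',')
{
  std::vector<std::string> fields;
  std::size_t start = 0;

  while (true) {
    const std::size_t end = line.find(delim, start);
    if (end == std::string::npos) {
      // Last field: everything after the final delimiter.
      fields.push_back(line.substr(start));
      break;
    }
    fields.push_back(line.substr(start, end - start));
    start = end + 1;
  }

  return fields;
}

// The sample line quoted in the test failure.
inline std::string sampleLine()
{
  return "6186960975,,cycles,mesos_test,2000511515,100.00,3.093,GHz";
}
```

Splitting {{sampleLine()}} this way yields 8 fields, the second of which is empty. A parser written for fewer columns would reject the line as the test output shows.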





[jira] (MESOS-7041) Default CommandInfo usage to not use the shell.

2017-01-31 Thread James Peach (JIRA)
James Peach created MESOS-7041:
--

 Summary: Default CommandInfo usage to not use the shell.
 Key: MESOS-7041
 URL: https://issues.apache.org/jira/browse/MESOS-7041
 Project: Mesos
  Issue Type: Bug
  Components: security
Reporter: James Peach


One of the usage patterns of {{CommandInfo}} is to carry commands from 
isolators to launchers. The default (and easiest) way to use this is 
{{launchInfo.add_pre_exec_commands()->set_value(...)}}, which invokes the 
shell. To reduce the risk of shell injection attacks all isolators should 
default to not using the shell, which implies that this should be the 
easiest/default usage pattern.
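
As a rough illustration of the two modes, the sketch below models a command record with the same {{shell}}, {{value}} and {{arguments}} fields that show up in the pre-exec command logs, and shows which argv each mode would produce. It is Python for brevity; the {{Command}} class and {{to_argv}} helper are hypothetical and are not Mesos APIs.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Command:
    """Hypothetical stand-in for CommandInfo; not the Mesos type."""
    value: str
    arguments: List[str] = field(default_factory=list)
    shell: bool = False  # Safe default: no shell unless explicitly requested.


def to_argv(command: Command) -> List[str]:
    """Return the argv a launcher would exec for this command."""
    if command.shell:
        # The whole string is handed to the shell, so metacharacters
        # such as ';' or '$(...)' inside it are interpreted.
        return ["/bin/sh", "-c", command.value]
    # No shell: value names the executable, arguments is the literal argv.
    return list(command.arguments) if command.arguments else [command.value]


# A path that an attacker controls.
untrusted = "/tmp/sandbox; rm -rf /"

shell_form = to_argv(Command(value="mount --rbind " + untrusted, shell=True))
argv_form = to_argv(
    Command(value="mount", arguments=["mount", "--rbind", untrusted]))
```

In the shell form the injected {{rm -rf /}} becomes a second command; in the argv form the same bytes stay a single, inert argument to {{mount}}.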





[jira] [Created] (MESOS-7077) Check failed: resource.has_allocation_info().

2017-02-07 Thread James Peach (JIRA)
James Peach created MESOS-7077:
--

 Summary: Check failed: resource.has_allocation_info().
 Key: MESOS-7077
 URL: https://issues.apache.org/jira/browse/MESOS-7077
 Project: Mesos
  Issue Type: Bug
Reporter: James Peach
Priority: Critical


Seeing this {{CHECK}} fail with top-of-tree master:

{noformat}
F0207 16:00:44.657328 3351272 master.cpp:8980] Check failed: 
resource.has_allocation_info()
{noformat}

The symbolicated backtrace is:
{noformat}
(gdb) where
#0  0x7f009f1315e5 in raise () from /lib64/libc.so.6
#1  0x7f009f132dc5 in abort () from /lib64/libc.so.6
#2  0x7f00a168e496 in google::DumpStackTraceAndExit () at 
src/utilities.cc:147
#3  0x7f00a1685e7d in google::LogMessage::Fail () at src/logging.cc:1458
#4  0x7f00a1687c0d in google::LogMessage::SendToLog (this=Unhandled dwarf 
expression opcode 0xf3
) at src/logging.cc:1412
#5  0x7f00a1685a02 in google::LogMessage::Flush (this=0x7f00917ef560) at 
src/logging.cc:1281
#6  0x7f00a16885e9 in google::LogMessageFatal::~LogMessageFatal 
(this=Unhandled dwarf expression opcode 0xf3
) at src/logging.cc:1984
#7  0x7f00a0a1184c in mesos::internal::master::Slave::addTask 
(this=0x7f007c830280, task=0x7f0080835340)
at ../../src/master/master.cpp:8980
#8  0x7f00a0a18b53 in mesos::internal::master::Slave::Slave 
(this=0x7f007c830280, _master=Unhandled dwarf expression opcode 0xf3
)
at ../../src/master/master.cpp:8947
#9  0x7f00a0a19c57 in mesos::internal::master::Master::_reregisterSlave 
(this=0x7f00990bf000,
slaveInfo=..., pid=..., checkpointedResources=Unhandled dwarf expression 
opcode 0xf3
) at ../../src/master/master.cpp:5759
#10 0x7f00a0a1cb22 in operator() (__functor=Unhandled dwarf expression 
opcode 0xf3
)
at ../../3rdparty/libprocess/include/process/dispatch.hpp:229
#11 std::_Function_handler >::_M_invoke(const 
std::_Any_data &, process::ProcessBase *) (
__functor=Unhandled dwarf expression opcode 0xf3
{noformat}

I expect that this happened because the master moved to the latest version 
before all the agents had moved.





[jira] [Commented] (MESOS-7077) Check failed: resource.has_allocation_info().

2017-02-07 Thread James Peach (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15856394#comment-15856394
 ] 

James Peach commented on MESOS-7077:


We have a test environment which continuously deploys nightly builds of the ASF 
master repository. The deploy is a puppet change, so there's no guaranteed 
ordering. When the master is redeployed the expectation is that agents running 
potentially different versions will reconnect to the new master and that tasks 
won't be disrupted.

> Check failed: resource.has_allocation_info().
> -
>
> Key: MESOS-7077
> URL: https://issues.apache.org/jira/browse/MESOS-7077
> Project: Mesos
>  Issue Type: Bug
>Reporter: James Peach
>Priority: Critical
>
> Seeing this {{CHECK}} fail with top-of-tree master:
> {noformat}
> F0207 16:00:44.657328 3351272 master.cpp:8980] Check failed: 
> resource.has_allocation_info()
> {noformat}
> The symbolicated backtrace is:
> {noformat}
> (gdb) where
> #0  0x7f009f1315e5 in raise () from /lib64/libc.so.6
> #1  0x7f009f132dc5 in abort () from /lib64/libc.so.6
> #2  0x7f00a168e496 in google::DumpStackTraceAndExit () at 
> src/utilities.cc:147
> #3  0x7f00a1685e7d in google::LogMessage::Fail () at src/logging.cc:1458
> #4  0x7f00a1687c0d in google::LogMessage::SendToLog (this=Unhandled dwarf 
> expression opcode 0xf3
> ) at src/logging.cc:1412
> #5  0x7f00a1685a02 in google::LogMessage::Flush (this=0x7f00917ef560) at 
> src/logging.cc:1281
> #6  0x7f00a16885e9 in google::LogMessageFatal::~LogMessageFatal 
> (this=Unhandled dwarf expression opcode 0xf3
> ) at src/logging.cc:1984
> #7  0x7f00a0a1184c in mesos::internal::master::Slave::addTask 
> (this=0x7f007c830280, task=0x7f0080835340)
> at ../../src/master/master.cpp:8980
> #8  0x7f00a0a18b53 in mesos::internal::master::Slave::Slave 
> (this=0x7f007c830280, _master=Unhandled dwarf expression opcode 0xf3
> )
> at ../../src/master/master.cpp:8947
> #9  0x7f00a0a19c57 in mesos::internal::master::Master::_reregisterSlave 
> (this=0x7f00990bf000,
> slaveInfo=..., pid=..., checkpointedResources=Unhandled dwarf expression 
> opcode 0xf3
> ) at ../../src/master/master.cpp:5759
> #10 0x7f00a0a1cb22 in operator() (__functor=Unhandled dwarf expression 
> opcode 0xf3
> )
> at ../../3rdparty/libprocess/include/process/dispatch.hpp:229
> #11 std::_Function_handler process::dispatch(const process::PID&, void (T::*)(P0, P1, P2, P3, P4, P5, 
> P6, P7, P8, P9), A0, A1, A2, A3, A4, A5, A6, A7, A8, A9) [with T = 
> mesos::internal::master::Master; P0 = const mesos::SlaveInfo&; P1 = const 
> process::UPID&; P2 = const std::vector&; P3 = const 
> std::vector&; P4 = const std::vector&; P5 = 
> const std::vector&; P6 = const 
> std::vector&; P7 = const 
> std::basic_string&; P8 = const 
> std::vector&; P9 = const process::Future&; 
> A0 = mesos::SlaveInfo; A1 = process::UPID; A2 = std::vector; 
> A3 = std::vector; A4 = std::vector; A5 = 
> std::vector; A6 = 
> std::vector; A7 = 
> std::basic_string; A8 = std::vector; A9 = 
> process::Future]:: >::_M_invoke(const 
> std::_Any_data &, process::ProcessBase *) (
> __functor=Unhandled dwarf expression opcode 0xf3
> {noformat}
> I expect that this happened because the master moved to the latest version 
> before all the agents had moved.





[jira] [Commented] (MESOS-6982) PerfTest.Version fails on recent Arch Linux

2017-02-07 Thread James Peach (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15856793#comment-15856793
 ] 

James Peach commented on MESOS-6982:


Well I guess we can just look at the first 2 version components :-/
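
A minimal sketch of that idea, assuming the output always starts with {{perf version <major>.<minor>}} and that anything after the second component can be ignored. It is Python rather than the C++ used in Mesos, and {{parse_perf_version}} is a hypothetical name.

```python
import re
from typing import Tuple


def parse_perf_version(output: str) -> Tuple[int, int]:
    """Extract only (major, minor) from `perf --version` output."""
    match = re.search(r"perf version (\d+)\.(\d+)", output)
    if match is None:
        raise ValueError("unrecognized perf version output: %r" % output)
    # Later components (patch level, distro tags, git hashes such as
    # 'g69973b') are deliberately ignored.
    return int(match.group(1)), int(match.group(2))


arch = parse_perf_version("perf version 4.9.g69973b")
fedora = parse_perf_version("perf version 4.9.6.200.fc25.x86_64.g51a0")
```

Both version strings quoted in these tickets reduce to the same pair, so a comparison against a minimum supported version no longer depends on distro-specific suffixes.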

> PerfTest.Version fails on recent Arch Linux
> ---
>
> Key: MESOS-6982
> URL: https://issues.apache.org/jira/browse/MESOS-6982
> Project: Mesos
>  Issue Type: Bug
>Reporter: Neil Conway
>  Labels: mesosphere
>
> {noformat}
> [ RUN  ] PerfTest.Version
> ../../mesos/src/tests/containerizer/perf_tests.cpp:134: Failure
> (perf::version()).failure(): Invalid version component 'g69973b': Failed to 
> convert 'g69973b' to number
> [  FAILED  ] PerfTest.Version (50 ms)
> {noformat}
> {noformat}
> $ perf --version
> perf version 4.9.g69973b
> {noformat}





[jira] [Commented] (MESOS-7060) Tests depends on DockerArchive and LinuxRootfs failed.

2017-02-08 Thread James Peach (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15858348#comment-15858348
 ] 

James Peach commented on MESOS-7060:


The reason the {{ldd}} helper is different is that it is using the 
{{ld.so.cache}} entry for {{ld-linux-x86-64.so.2}}:

{noformat}
# ldconfig -p | grep ld-linux
ld-linux-x86-64.so.2 (libc6,x86-64) => 
/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
{noformat}

> Tests depends on DockerArchive and LinuxRootfs failed.
> --
>
> Key: MESOS-7060
> URL: https://issues.apache.org/jira/browse/MESOS-7060
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, test
> Environment: ubuntu16
>Reporter: Jie Yu
>Assignee: James Peach
>
> This issue was introduced by patches from MESOS-6588. Reverting the patches 
> in this ticket solves the issue.
> {noformat}
> [==] Running 1 test from 1 test case.
> [--] Global test environment set-up.
> [--] 1 test from LinuxFilesystemIsolatorTest
> [ RUN  ] LinuxFilesystemIsolatorTest.ROOT_ChangeRootFilesystem
> I0205 00:49:41.405323 98276 containerizer.cpp:220] Using isolation: 
> filesystem/linux,docker/runtime,network/cni,volume/image
> I0205 00:49:41.410899 98276 linux_launcher.cpp:150] Using 
> /sys/fs/cgroup/freezer as the freezer hierarchy for the Linux launcher
> E0205 00:49:41.413491 98276 shell.hpp:107] Command 'hadoop version 2>&1' 
> failed; this is the output:
> sh: 1: hadoop: not found
> I0205 00:49:41.413553 98276 fetcher.cpp:69] Skipping URI fetcher plugin 
> 'hadoop' as it could not be created: Failed to create HDFS client: Failed to 
> execute 'hadoop version 2>&1'; the command was either not found or exited 
> with a non-zero exit status: 127
> I0205 00:49:41.416126 98276 provisioner.cpp:249] Using default backend 'aufs'
> I0205 00:49:41.420802 98298 containerizer.cpp:992] Starting container 
> e41c62c5-c0e1-4cc0-9c59-4793d9d086bf for executor 'test_executor' of 
> framework 
> I0205 00:49:41.607056 98303 provisioner.cpp:453] Provisioning image rootfs 
> '/tmp/LinuxFilesystemIsolatorTest_ROOT_ChangeRootFilesystem_yYg8z0/provisioner/containers/e41c62c5-c0e1-4cc0-9c59-4793d9d086bf/backends/aufs/rootfses/b21ea6c2-bdcd-4511-b57b-86cfdc6722d1'
>  for container e41c62c5-c0e1-4cc0-9c59-4793d9d086bf using aufs backend
> I0205 00:49:41.614753 98291 linux_launcher.cpp:429] Launching container 
> e41c62c5-c0e1-4cc0-9c59-4793d9d086bf and cloning with namespaces CLONE_NEWNS
> Executing pre-exec command 
> '{"arguments":["mesos-containerizer","mount","--help=false","--operation=make-rslave","--path=\/"],"shell":false,"value":"\/home\/ubuntu\/workspace\/dist\/mesos\/build\/src\/mesos-containerizer"}'
> Executing pre-exec command 
> '{"arguments":["mount","-n","--rbind","\/tmp\/LinuxFilesystemIsolatorTest_ROOT_ChangeRootFilesystem_yYg8z0\/sandbox","\/tmp\/LinuxFilesystemIsolatorTest_ROOT_ChangeRootFilesystem_yYg8z0\/provisioner\/containers\/e41c62c5-c0e1-4cc0-9c59-4793d9d086bf\/backends\/aufs\/rootfses\/b21ea6c2-bdcd-4511-b57b-86cfdc6722d1\/mnt\/mesos\/sandbox"],"shell":false,"value":"mount"}'
> Changing root to 
> /tmp/LinuxFilesystemIsolatorTest_ROOT_ChangeRootFilesystem_yYg8z0/provisioner/containers/e41c62c5-c0e1-4cc0-9c59-4793d9d086bf/backends/aufs/rootfses/b21ea6c2-bdcd-4511-b57b-86cfdc6722d1
> Failed to execute command: No such file or directory
> I0205 00:49:41.904868 98293 containerizer.cpp:2482] Container 
> e41c62c5-c0e1-4cc0-9c59-4793d9d086bf has exited
> I0205 00:49:41.904922 98293 containerizer.cpp:2119] Destroying container 
> e41c62c5-c0e1-4cc0-9c59-4793d9d086bf in RUNNING state
> I0205 00:49:41.905582 98303 linux_launcher.cpp:505] Asked to destroy 
> container e41c62c5-c0e1-4cc0-9c59-4793d9d086bf
> I0205 00:49:41.906301 98303 linux_launcher.cpp:548] Using freezer to destroy 
> cgroup mesos/e41c62c5-c0e1-4cc0-9c59-4793d9d086bf
> I0205 00:49:41.907871 98298 cgroups.cpp:2726] Freezing cgroup 
> /sys/fs/cgroup/freezer/mesos/e41c62c5-c0e1-4cc0-9c59-4793d9d086bf
> I0205 00:49:41.909617 98294 cgroups.cpp:1439] Successfully froze cgroup 
> /sys/fs/cgroup/freezer/mesos/e41c62c5-c0e1-4cc0-9c59-4793d9d086bf after 
> 1.66016ms
> I0205 00:49:41.911558 98300 cgroups.cpp:2744] Thawing cgroup 
> /sys/fs/cgroup/freezer/mesos/e41c62c5-c0e1-4cc0-9c59-4793d9d086bf
> I0205 00:49:41.913187 98300 cgroups.cpp:1468] Successfully thawed cgroup 
> /sys/fs/cgroup/freezer/mesos/e41c62c5-c0e1-4cc0-9c59-4793d9d086bf after 
> 1.569024ms
> I0205 00:49:41.917798 98294 provisioner.cpp:534] Destroying container rootfs 
> at 
> '/tmp/LinuxFilesystemIsolatorTest_ROOT_ChangeRootFilesystem_yYg8z0/provisioner/containers/e41c62c5-c0e1-4cc0-9c59-4793d9d086bf/backends/aufs/rootfses/b21ea6c2-bdcd-4511-b57b-86cfdc6722d1'
>  for container e41c62c5-c0e1-4cc0-9c59-4793d9d086bf
> 

[jira] [Commented] (MESOS-7060) Tests depends on DockerArchive and LinuxRootfs failed.

2017-02-08 Thread James Peach (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15858394#comment-15858394
 ] 

James Peach commented on MESOS-7060:


OK, I see what is happening here. On RHEL, the ELF interpreter in the binary 
matches {{ld.so.cache}}, but it doesn't on Ubuntu. So on Ubuntu we have the 
file at the path specified by the cache, but the ELF {{.interp}} header is 
telling the loader to look for it at a different path that we don't have. 
Because we don't copy the path needed by the {{.interp}} header, exec(2) always 
fails with {{ENOENT}}.

*Fedora 25:*
{noformat}
$ readelf -p .interp /bin/sh
String dump of section '.interp':
  [ 0]  /lib64/ld-linux-x86-64.so.2
$ ldd /bin/sh
linux-vdso.so.1 (0x7ffe09956000)
libtinfo.so.6 => /lib64/libtinfo.so.6 (0x7feead539000)
libdl.so.2 => /lib64/libdl.so.2 (0x7feead335000)
libc.so.6 => /lib64/libc.so.6 (0x7feeacf6f000)
/lib64/ld-linux-x86-64.so.2 (0x55e72ef2a000)
$ ldconfig -p | grep ld-linux
ld-linux-x86-64.so.2 (libc6,x86-64) => /lib64/ld-linux-x86-64.so.2
{noformat}

*Ubuntu 14.04:*
{noformat}
# readelf -p .interp /bin/sh
String dump of section '.interp':
  [ 0]  /lib64/ld-linux-x86-64.so.2
# ldd /bin/sh
linux-vdso.so.1 =>  (0x7ffc249ba000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x7f12f5472000)
/lib64/ld-linux-x86-64.so.2 (0x7f12f5a57000)
#  ldconfig -p | grep ld-linux
ld-linux-x86-64.so.2 (libc6,x86-64) => 
/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
{noformat}
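
The {{.interp}} path that the kernel actually uses can also be read directly from the binary rather than inferred from {{ld.so.cache}}. The following is a minimal sketch, assuming a 64-bit little-endian ELF file; it is Python for brevity, and {{elf_interpreter}} is a hypothetical helper, not something in Mesos.

```python
import struct
from typing import Optional

PT_INTERP = 3  # Program header type that holds the interpreter path.


def elf_interpreter(data: bytes) -> Optional[str]:
    """Return the PT_INTERP path of an ELF64 little-endian image, if any."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if data[4] != 2 or data[5] != 1:
        raise ValueError("sketch only handles 64-bit little-endian ELF")

    # ELF64 header: e_phoff at byte 32, e_phentsize at 54, e_phnum at 56.
    (e_phoff,) = struct.unpack_from("<Q", data, 32)
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 54)

    for index in range(e_phnum):
        offset = e_phoff + index * e_phentsize
        # ELF64 program header: p_type, p_flags, p_offset, ..., p_filesz at +32.
        p_type, _p_flags, p_offset = struct.unpack_from("<IIQ", data, offset)
        (p_filesz,) = struct.unpack_from("<Q", data, offset + 32)
        if p_type == PT_INTERP:
            raw = data[p_offset:p_offset + p_filesz]
            return raw.rstrip(b"\0").decode()

    return None  # Statically linked binaries have no interpreter.
```

Calling it on the contents of {{/bin/sh}} should give the same path that {{readelf -p .interp}} prints, which is the path that has to exist inside the new root filesystem for exec(2) to succeed.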

> Tests depends on DockerArchive and LinuxRootfs failed.
> --
>
> Key: MESOS-7060
> URL: https://issues.apache.org/jira/browse/MESOS-7060
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, test
> Environment: ubuntu16
>Reporter: Jie Yu
>Assignee: James Peach
>
> This issue was introduced by patches from MESOS-6588. Reverting the patches 
> in this ticket solves the issue.
> {noformat}
> [==] Running 1 test from 1 test case.
> [--] Global test environment set-up.
> [--] 1 test from LinuxFilesystemIsolatorTest
> [ RUN  ] LinuxFilesystemIsolatorTest.ROOT_ChangeRootFilesystem
> I0205 00:49:41.405323 98276 containerizer.cpp:220] Using isolation: 
> filesystem/linux,docker/runtime,network/cni,volume/image
> I0205 00:49:41.410899 98276 linux_launcher.cpp:150] Using 
> /sys/fs/cgroup/freezer as the freezer hierarchy for the Linux launcher
> E0205 00:49:41.413491 98276 shell.hpp:107] Command 'hadoop version 2>&1' 
> failed; this is the output:
> sh: 1: hadoop: not found
> I0205 00:49:41.413553 98276 fetcher.cpp:69] Skipping URI fetcher plugin 
> 'hadoop' as it could not be created: Failed to create HDFS client: Failed to 
> execute 'hadoop version 2>&1'; the command was either not found or exited 
> with a non-zero exit status: 127
> I0205 00:49:41.416126 98276 provisioner.cpp:249] Using default backend 'aufs'
> I0205 00:49:41.420802 98298 containerizer.cpp:992] Starting container 
> e41c62c5-c0e1-4cc0-9c59-4793d9d086bf for executor 'test_executor' of 
> framework 
> I0205 00:49:41.607056 98303 provisioner.cpp:453] Provisioning image rootfs 
> '/tmp/LinuxFilesystemIsolatorTest_ROOT_ChangeRootFilesystem_yYg8z0/provisioner/containers/e41c62c5-c0e1-4cc0-9c59-4793d9d086bf/backends/aufs/rootfses/b21ea6c2-bdcd-4511-b57b-86cfdc6722d1'
>  for container e41c62c5-c0e1-4cc0-9c59-4793d9d086bf using aufs backend
> I0205 00:49:41.614753 98291 linux_launcher.cpp:429] Launching container 
> e41c62c5-c0e1-4cc0-9c59-4793d9d086bf and cloning with namespaces CLONE_NEWNS
> Executing pre-exec command 
> '{"arguments":["mesos-containerizer","mount","--help=false","--operation=make-rslave","--path=\/"],"shell":false,"value":"\/home\/ubuntu\/workspace\/dist\/mesos\/build\/src\/mesos-containerizer"}'
> Executing pre-exec command 
> '{"arguments":["mount","-n","--rbind","\/tmp\/LinuxFilesystemIsolatorTest_ROOT_ChangeRootFilesystem_yYg8z0\/sandbox","\/tmp\/LinuxFilesystemIsolatorTest_ROOT_ChangeRootFilesystem_yYg8z0\/provisioner\/containers\/e41c62c5-c0e1-4cc0-9c59-4793d9d086bf\/backends\/aufs\/rootfses\/b21ea6c2-bdcd-4511-b57b-86cfdc6722d1\/mnt\/mesos\/sandbox"],"shell":false,"value":"mount"}'
> Changing root to 
> /tmp/LinuxFilesystemIsolatorTest_ROOT_ChangeRootFilesystem_yYg8z0/provisioner/containers/e41c62c5-c0e1-4cc0-9c59-4793d9d086bf/backends/aufs/rootfses/b21ea6c2-bdcd-4511-b57b-86cfdc6722d1
> Failed to execute command: No such file or directory
> I0205 00:49:41.904868 98293 containerizer.cpp:2482] Container 
> e41c62c5-c0e1-4cc0-9c59-4793d9d086bf has exited
> I0205 00:49:41.904922 98293 containerizer.cpp:2119] Destroying container 
> e41c62c5-c0e1-4cc0-9c59-4793d9d086bf in RUNNING state
> I0205 00:49:41.905582 98303 

[jira] [Commented] (MESOS-5393) XFS disk isolator should disallow sandbox writes when no 'disk' is used in executor/task

2017-01-24 Thread James Peach (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15836565#comment-15836565
 ] 

James Peach commented on MESOS-5393:


Implemented as a 1-block quota. Note that this makes it impossible to run a 
task because the quota gets used by agent logs.
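
For illustration only, the clamping can be sketched as below. This assumes quota limits are expressed in 512-byte basic blocks and that byte counts are rounded up; both are assumptions of the sketch, and {{quota_blocks}} is a hypothetical name rather than the isolator's code.

```python
BASIC_BLOCK_SIZE = 512  # Assumed size of one XFS basic block, in bytes.


def quota_blocks(disk_bytes: int) -> int:
    """Convert a disk resource in bytes to a block quota, never below 1."""
    if disk_bytes < 0:
        raise ValueError("disk_bytes must be non-negative")
    # Ceiling division, so a partial block still counts as a whole one.
    blocks = -(-disk_bytes // BASIC_BLOCK_SIZE)
    # A limit of 0 would mean "no limit", so clamp to the 1-block minimum.
    return max(1, blocks)
```

With no disk resource the result is a single block, which is why anything written to the sandbox, including logs, exhausts the quota almost immediately.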

> XFS disk isolator should disallow sandbox writes when no 'disk' is used in 
> executor/task
> 
>
> Key: MESOS-5393
> URL: https://issues.apache.org/jira/browse/MESOS-5393
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Affects Versions: 1.0.0
>Reporter: Yan Xu
>Assignee: James Peach
>
> This is similar to MESOS-5081 and was left as a TODO in the first patch for 
> the XFS isolator.
> {noformat:title=}
> // TODO(jpeach) If there's no disk resource attached, we should set the
> // minimum quota (1 block), since a zero quota would be unconstrained.
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (MESOS-5116) Investigate supporting accounting only mode in XFS isolator

2017-01-24 Thread James Peach (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15836891#comment-15836891
 ] 

James Peach edited comment on MESOS-5116 at 1/25/17 1:10 AM:
-

| Stop storing agent flags in the XFS disk isolator. | https://reviews.apache.org/r/55896/ |
| Add support for not enforcing XFS quotas. | https://reviews.apache.org/r/55897/ |
| Update XFS disk isolator documentation. | https://reviews.apache.org/r/55903/ |


was (Author: jamespeach):
| Stop storing agent flags in the XFS disk isolator. | https://reviews.apache.org/r/55896/ |
| Add support for not enforcing XFS quotas. | https://reviews.apache.org/r/55897/ |

> Investigate supporting accounting only mode in XFS isolator
> ---
>
> Key: MESOS-5116
> URL: https://issues.apache.org/jira/browse/MESOS-5116
> Project: Mesos
>  Issue Type: Improvement
>  Components: containerization
>Reporter: Yan Xu
>Assignee: James Peach
>
> The initial implementation of XFS isolator always enforces the disk quota 
> limit. In contrast, Posix disk isolator supports optionally monitoring the 
> disk usage without enforcement. This eases the transition into disk quota 
> enforcement mode.
> Mesos agent provides a {{flags.enforce_container_disk_quota}} flag to turn on 
> enforcement when the Posix isolator is added. With XFS either we support it 
> as well or we need to change the flag so it's Posix disk isolator specific.





[jira] (MESOS-7031) Offline garbage collection.

2017-01-30 Thread James Peach (JIRA)
James Peach created an issue

Mesos / MESOS-7031
Offline garbage collection.

Issue Type: Bug
Assignee: Unassigned
Components: agent
Created: 30/Jan/17 17:04
Priority: Major
Reporter: James Peach

If you don't manage the agent disk carefully, it is possible to fill the disk 
to such an extent that the agent will not be able to start (it will fail to 
write some checkpoint file and exit). Recovering from this is a manual 
operations task, which is undesirable.

It would be helpful if there was a way to run an offline agent garbage 
collection, to free up enough space for the agent to recover. This probably 
needs at least 2 levels; Level 1 would clean up scratch space based on 
existing policy. Level 2 would clean up everything that doesn't have a 
running task.

[jira] [Comment Edited] (MESOS-7017) HTTP API responses can crash the master.

2017-01-27 Thread James Peach (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15843632#comment-15843632
 ] 

James Peach edited comment on MESOS-7017 at 1/27/17 11:10 PM:
--

Here's the partial stack trace:

{noformat}
#2  0x7fb830734696 in google::DumpStackTraceAndExit () at 
src/utilities.cc:147
#3  0x7fb83072c08d in google::LogMessage::Fail () at src/logging.cc:1458
#4  0x7fb83072de1d in google::LogMessage::SendToLog (this=Unhandled dwarf 
expression opcode 0xf3
) at src/logging.cc:1412
#5  0x7fb83072bc12 in google::LogMessage::Flush (this=0x7fb8227f3890) at 
src/logging.cc:1281
#6  0x7fb83072e7f9 in google::LogMessageFatal::~LogMessageFatal 
(this=Unhandled dwarf expression opcode 0xf3
) at src/logging.cc:1984
#7  0x7fb82fb35113 in evolve (response=...) at 
../../src/internal/evolve.cpp:63
#8  mesos::internal::evolve (response=...) at ../../src/internal/evolve.cpp:218
#9  0x7fb82fba8dd6 in mesos::internal::master::Master::Http::&)>::operator()(const 
std::tuple &) const (__closure=0x7fb720c7a940, 
approvers=Unhandled dwarf expression opcode 0xf3
)
at ../../src/master/http.cpp:3772
#10 0x7fb82fba9068 in operator() (__functor=Unhandled dwarf expression 
opcode 0xf3
) at ../../3rdparty/libprocess/include/process/deferred.hpp:225
#11 std::_Function_handler() const:: 
[with R = process::Future; P0 = const 
std::tuple&; F = 
mesos::internal::master::Master::Http::getTasks(const mesos::master::Call&, 
const Option&, mesos::ContentType) 
const::&)>]:: >::_M_invoke(const 
std::_Any_data &) (__functor=Unhandled dwarf expression opcode 0xf3
) at /opt/rh/devtoolset-3/root/usr/include/c++/4.9.2/functional:2025
#12 0x7fb82faf73b3 in operator() (__functor=Unhandled dwarf expression 
opcode 0xf3
) at /opt/rh/devtoolset-3/root/usr/include/c++/4.9.2/functional:2439
#13 operator() (__functor=Unhandled dwarf expression opcode 0xf3
) at ../../3rdparty/libprocess/include/process/dispatch.hpp:112
#14 std::_Function_handler::operator()(const 
process::UPID&, F&&) [with F = 
std::function&; R = 
process::http::Response]:: >::_M_invoke(const 
std::_Any_data &, process::ProcessBase *) (__functor=Unhandled dwarf expression 
opcode 0xf3
)
at /opt/rh/devtoolset-3/root/usr/include/c++/4.9.2/functional:2039
{noformat}

So the proximate cause of the crash is that {{evolve}} does a bidirectional 
serialization. For large messages this causes 2 large allocations even if it 
doesn't trigger the {{CHECK}}.


was (Author: jamespeach):
Here's the partial stack trace:

{noformat}
#2  0x7fb830734696 in google::DumpStackTraceAndExit () at 
src/utilities.cc:147
#3  0x7fb83072c08d in google::LogMessage::Fail () at src/logging.cc:1458
#4  0x7fb83072de1d in google::LogMessage::SendToLog (this=Unhandled dwarf 
expression opcode 0xf3
) at src/logging.cc:1412
#5  0x7fb83072bc12 in google::LogMessage::Flush (this=0x7fb8227f3890) at 
src/logging.cc:1281
#6  0x7fb83072e7f9 in google::LogMessageFatal::~LogMessageFatal 
(this=Unhandled dwarf expression opcode 0xf3
) at src/logging.cc:1984
#7  0x7fb82fb35113 in evolve (response=...) at 
../../src/internal/evolve.cpp:63
#8  mesos::internal::evolve (response=...) at ../../src/internal/evolve.cpp:218
#9  0x7fb82fba8dd6 in mesos::internal::master::Master::Http::&)>::operator()(const 
std::tuple &) const (__closure=0x7fb720c7a940, 
approvers=Unhandled dwarf expression opcode 0xf3
)
at ../../src/master/http.cpp:3772
#10 0x7fb82fba9068 in operator() (__functor=Unhandled dwarf expression 
opcode 0xf3
) at ../../3rdparty/libprocess/include/process/deferred.hpp:225
#11 std::_Function_handler() const:: 
[with R = process::Future; P0 = const 
std::tuple&; F = 
mesos::internal::master::Master::Http::getTasks(const mesos::master::Call&, 
const Option&, mesos::ContentType) 
const::&)>]:: >::_M_invoke(const 
std::_Any_data &) (__functor=Unhandled dwarf expression opcode 0xf3
) at /opt/rh/devtoolset-3/root/usr/include/c++/4.9.2/functional:2025
#12 0x7fb82faf73b3 in operator() (__functor=Unhandled dwarf expression 
opcode 0xf3
) at /opt/rh/devtoolset-3/root/usr/include/c++/4.9.2/functional:2439
#13 operator() 

[jira] [Commented] (MESOS-7017) HTTP API responses can crash the master.

2017-01-27 Thread James Peach (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15843632#comment-15843632
 ] 

James Peach commented on MESOS-7017:


Here's the partial stack trace:

{noformat}
#2  0x7fb830734696 in google::DumpStackTraceAndExit () at 
src/utilities.cc:147
#3  0x7fb83072c08d in google::LogMessage::Fail () at src/logging.cc:1458
#4  0x7fb83072de1d in google::LogMessage::SendToLog (this=Unhandled dwarf 
expression opcode 0xf3
) at src/logging.cc:1412
#5  0x7fb83072bc12 in google::LogMessage::Flush (this=0x7fb8227f3890) at 
src/logging.cc:1281
#6  0x7fb83072e7f9 in google::LogMessageFatal::~LogMessageFatal 
(this=Unhandled dwarf expression opcode 0xf3
) at src/logging.cc:1984
#7  0x7fb82fb35113 in evolve (response=...) at 
../../src/internal/evolve.cpp:63
#8  mesos::internal::evolve (response=...) at ../../src/internal/evolve.cpp:218
#9  0x7fb82fba8dd6 in mesos::internal::master::Master::Http::&)>::operator()(const 
std::tuple &) const (__closure=0x7fb720c7a940, 
approvers=Unhandled dwarf expression opcode 0xf3
)
at ../../src/master/http.cpp:3772
#10 0x7fb82fba9068 in operator() (__functor=Unhandled dwarf expression 
opcode 0xf3
) at ../../3rdparty/libprocess/include/process/deferred.hpp:225
#11 std::_Function_handler() const:: 
[with R = process::Future; P0 = const 
std::tuple&; F = 
mesos::internal::master::Master::Http::getTasks(const mesos::master::Call&, 
const Option&, mesos::ContentType) 
const::&)>]:: >::_M_invoke(const 
std::_Any_data &) (__functor=Unhandled dwarf expression opcode 0xf3
) at /opt/rh/devtoolset-3/root/usr/include/c++/4.9.2/functional:2025
#12 0x7fb82faf73b3 in operator() (__functor=Unhandled dwarf expression 
opcode 0xf3
) at /opt/rh/devtoolset-3/root/usr/include/c++/4.9.2/functional:2439
#13 operator() (__functor=Unhandled dwarf expression opcode 0xf3
) at ../../3rdparty/libprocess/include/process/dispatch.hpp:112
#14 std::_Function_handler::operator()(const 
process::UPID&, F&&) [with F = 
std::function&; R = 
process::http::Response]:: >::_M_invoke(const 
std::_Any_data &, process::ProcessBase *) (__functor=Unhandled dwarf expression 
opcode 0xf3
)
at /opt/rh/devtoolset-3/root/usr/include/c++/4.9.2/functional:2039
{noformat}

So the proximate cause of the crash is that {{evolve}} does an unnecessary 
bidirectional serialization. For large messages this causes 2 unnecessary large 
allocations even if it doesn't trigger the {{CHECK}}.
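
To make the cost of the round trip concrete, here is a toy model of it. JSON stands in for protobuf, and this {{evolve}} is a hypothetical Python function, not the one in {{src/internal/evolve.cpp}}; only the 67108864-byte limit is taken from the error message in the ticket.

```python
import json

# 64 MiB, the limit quoted in the libprotobuf error message.
LIMIT_BYTES = 64 * 1024 * 1024


def evolve(message: dict, limit: int = LIMIT_BYTES) -> dict:
    """Convert a message by serializing it and parsing it back."""
    # First large allocation: a serialized copy of the entire message.
    data = json.dumps(message).encode()

    # The parser rejects oversized input. In the master the parse result
    # is wrapped in a CHECK, so a rejection aborts the process.
    if len(data) > limit:
        raise ValueError(
            "message too big: %d bytes (limit %d)" % (len(data), limit))

    # Second large allocation: a new message built from the bytes.
    return json.loads(data)


small = evolve({"tasks": [1, 2, 3]})
```

The response size is driven by cluster state rather than by the request, which is how a small API call can end up exceeding the limit.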

> HTTP API responses can crash the master.
> 
>
> Key: MESOS-7017
> URL: https://issues.apache.org/jira/browse/MESOS-7017
> Project: Mesos
>  Issue Type: Bug
>  Components: HTTP API
>Reporter: James Peach
>Priority: Critical
>
> The master can crash when generating large responses to small API requests. 
> One manifestation of this is querying the tasks.
> {noformat}
> [libprotobuf ERROR google/protobuf/io/coded_stream.cc:180] A protocol message 
> was rejected because it was too big (more than 67108864 bytes).  To increase 
> the limit (or to disable these warnings), see 
> CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
> F0126 18:34:18.790386 26230 evolve.cpp:63] Check failed: 
> t.ParsePartialFromString(data) Failed to parse mesos.v1.master.Response while 
> evolving from mesos.master.Response
> {noformat}





[jira] [Updated] (MESOS-6935) Operator API to get only current frameworks and tasks.

2017-01-27 Thread James Peach (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Peach updated MESOS-6935:
---
Summary: Operator API to get only current frameworks and tasks.  (was: 
Operator API to get current frameworks only.)

> Operator API to get only current frameworks and tasks.
> --
>
> Key: MESOS-6935
> URL: https://issues.apache.org/jira/browse/MESOS-6935
> Project: Mesos
>  Issue Type: Improvement
>  Components: HTTP API
>Reporter: James Peach
>
> The master {{GET_FRAMEWORKS}} operator API always returns the current 
> frameworks and the {{completed_frameworks}}. Since the set of 
> {{completed_frameworks}} can be very large and is often not wanted, it would 
> be helpful if there was a way to exclude those.





[jira] [Commented] (MESOS-1807) Disallow executors with cpu only or memory only resources

2017-01-27 Thread James Peach (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15843110#comment-15843110
 ] 

James Peach commented on MESOS-1807:


FWIW, if you try to run a task with 0 disk and actually enforce that, the task 
will fail. See MESOS-5393.

> Disallow executors with cpu only or memory only resources
> -
>
> Key: MESOS-1807
> URL: https://issues.apache.org/jira/browse/MESOS-1807
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Vinod Kone
> Attachments: Screenshot 2015-07-28 14.40.35.png
>
>
> Currently master allows executors to be launched with either only cpus or 
> only memory but we shouldn't allow that.
> This is because executor is an actual unix process that is launched by the 
> slave. If an executor doesn't specify cpus, what should the cpu limits be for 
> that executor when there are no tasks running on it? If no cpu limits are set 
> then it might starve other executors/tasks on the slave violating isolation 
> guarantees. Same goes with memory. Moreover, the current 
> containerizer/isolator code will throw failures when using such an executor, 
> e.g., when the last task on the executor finishes and Containerizer::update() 
> is called with 0 cpus or 0 mem.
> According to a source code [TODO | 
> https://github.com/apache/mesos/blob/0226620747e1769434a1a83da547bfc3470a9549/src/master/validation.cpp#L400]
>  this should also include checking whether requested resources are greater 
> than  MIN_CPUS/MIN_BYTES.





[jira] [Created] (MESOS-7021) Consistent symlink behavior for os::stat accessors.

2017-01-27 Thread James Peach (JIRA)
James Peach created MESOS-7021:
--

 Summary: Consistent symlink behavior for os::stat accessors.
 Key: MESOS-7021
 URL: https://issues.apache.org/jira/browse/MESOS-7021
 Project: Mesos
  Issue Type: Improvement
Reporter: James Peach
Priority: Trivial


The various stat(2) accessors in the {{os::stat}} namespace are not 
consistent about letting the caller specify whether they follow symlinks. 
Update them so they consistently take a {{FollowSymlink}} option.
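
The intended shape can be sketched as follows, with every accessor that inspects a path taking the same option and defaulting to following symlinks. This is Python built on {{os.stat}}, and the enum values and accessor names are hypothetical, not the {{os::stat}} signatures.

```python
import os
import stat
import tempfile
from enum import Enum


class FollowSymlink(Enum):
    """Hypothetical option mirroring the proposed FollowSymlink parameter."""
    DO_NOT_FOLLOW = 0
    FOLLOW = 1


def _stat(path: str, follow: FollowSymlink) -> os.stat_result:
    # One shared helper decides between stat(2) and lstat(2) behavior.
    return os.stat(path, follow_symlinks=(follow is FollowSymlink.FOLLOW))


def is_dir(path: str, follow: FollowSymlink = FollowSymlink.FOLLOW) -> bool:
    return stat.S_ISDIR(_stat(path, follow).st_mode)


def size(path: str, follow: FollowSymlink = FollowSymlink.FOLLOW) -> int:
    return _stat(path, follow).st_size


def is_link(path: str) -> bool:
    # Only meaningful without following, so it takes no option.
    return stat.S_ISLNK(_stat(path, FollowSymlink.DO_NOT_FOLLOW).st_mode)


# Demonstration: a symlink that points at a directory.
with tempfile.TemporaryDirectory() as root:
    target = os.path.join(root, "dir")
    link = os.path.join(root, "link")
    os.mkdir(target)
    os.symlink(target, link)
    followed = is_dir(link)
    not_followed = is_dir(link, FollowSymlink.DO_NOT_FOLLOW)
    link_detected = is_link(link)
```

Routing every accessor through one helper keeps the default in a single place, so callers cannot get different symlink behavior from two accessors by accident.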





[jira] [Assigned] (MESOS-7021) Consistent symlink behavior for os::stat accessors.

2017-01-27 Thread James Peach (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Peach reassigned MESOS-7021:
--

Assignee: James Peach

> Consistent symlink behavior for os::stat accessors.
> ---
>
> Key: MESOS-7021
> URL: https://issues.apache.org/jira/browse/MESOS-7021
> Project: Mesos
>  Issue Type: Improvement
>Reporter: James Peach
>Assignee: James Peach
>Priority: Trivial
>
> The various stat(2) accessors in the {{os::stat}} namespace are not 
> consistent in whether they let the caller specify that symlinks are followed. 
> Update them so they consistently take a {{FollowSymlink}} option.





[jira] [Created] (MESOS-7017) HTTP API responses can crash the master

2017-01-26 Thread James Peach (JIRA)
James Peach created MESOS-7017:
--

 Summary: HTTP API responses can crash the master
 Key: MESOS-7017
 URL: https://issues.apache.org/jira/browse/MESOS-7017
 Project: Mesos
  Issue Type: Bug
Reporter: James Peach


The master can crash when generating large responses to small API requests. One 
manifestation of this is querying the tasks.

{noformat}
[libprotobuf ERROR google/protobuf/io/coded_stream.cc:180] A protocol message 
was rejected because it was too big (more than 67108864 bytes).  To increase 
the limit (or to disable these warnings), see 
CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
F0126 18:34:18.790386 26230 evolve.cpp:63] Check failed: 
t.ParsePartialFromString(data) Failed to parse mesos.v1.master.Response while 
evolving from mesos.master.Response
{noformat}





[jira] [Created] (MESOS-7018) Move historical data out of memory.

2017-01-26 Thread James Peach (JIRA)
James Peach created MESOS-7018:
--

 Summary: Move historical data out of memory.
 Key: MESOS-7018
 URL: https://issues.apache.org/jira/browse/MESOS-7018
 Project: Mesos
  Issue Type: Bug
Reporter: James Peach


There is a bunch of history (e.g., completed tasks, completed frameworks) that 
is kept in memory. It is information that is not commonly needed, so keeping it 
in memory is a waste. Keeping it in memory also limits the amount of history 
you can keep.

If we spool this history to disk we can keep much longer history at lower cost.





[jira] [Commented] (MESOS-6935) Operator API to get current frameworks only.

2017-01-26 Thread James Peach (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15840548#comment-15840548
 ] 

James Peach commented on MESOS-6935:


Same problem for tasks. Mostly, the set of completed tasks is large and 
uninteresting.

> Operator API to get current frameworks only.
> 
>
> Key: MESOS-6935
> URL: https://issues.apache.org/jira/browse/MESOS-6935
> Project: Mesos
>  Issue Type: Improvement
>  Components: HTTP API
>Reporter: James Peach
>
> The master {{GET_FRAMEWORKS}} operator API always returns the current 
> frameworks and the {{completed_frameworks}}. Since the set of 
> {{completed_frameworks}} can be very large and is often not wanted, it would 
> be helpful if there were a way to exclude those.





[jira] [Updated] (MESOS-7017) HTTP API responses can crash the master

2017-01-26 Thread James Peach (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Peach updated MESOS-7017:
---
Component/s: HTTP API

> HTTP API responses can crash the master
> ---
>
> Key: MESOS-7017
> URL: https://issues.apache.org/jira/browse/MESOS-7017
> Project: Mesos
>  Issue Type: Bug
>  Components: HTTP API
>Reporter: James Peach
>
> The master can crash when generating large responses to small API requests. 
> One manifestation of this is querying the tasks.
> {noformat}
> [libprotobuf ERROR google/protobuf/io/coded_stream.cc:180] A protocol message 
> was rejected because it was too big (more than 67108864 bytes).  To increase 
> the limit (or to disable these warnings), see 
> CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
> F0126 18:34:18.790386 26230 evolve.cpp:63] Check failed: 
> t.ParsePartialFromString(data) Failed to parse mesos.v1.master.Response while 
> evolving from mesos.master.Response
> {noformat}





[jira] [Commented] (MESOS-7017) HTTP API responses can crash the master.

2017-01-26 Thread James Peach (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15840546#comment-15840546
 ] 

James Peach commented on MESOS-7017:


Need to verify, but experimentally it looks like the whole response is buffered 
in memory before sending anything. We ought to stream the response.
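
To illustrate the difference between buffering and streaming, here is a generic sketch that hands a JSON array to a sink in bounded chunks. It does not use the libprocess or Mesos HTTP APIs; {{streamJsonArray}} and the {{sink}} callback are hypothetical names, and the records are assumed to be already-encoded JSON values.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Emits `records` as a JSON array, passing the output to `sink` in pieces.
// The buffer is flushed as soon as it reaches `chunkSize` bytes, so peak
// memory is bounded by `chunkSize` plus one record rather than by the size
// of the whole response. Returns the number of chunks handed to `sink`.
inline size_t streamJsonArray(
    const std::vector<std::string>& records,
    size_t chunkSize,
    const std::function<void(const std::string&)>& sink)
{
  assert(chunkSize > 0);

  std::string buffer = "[";
  size_t chunks = 0;

  for (size_t i = 0; i < records.size(); ++i) {
    if (i > 0) {
      buffer += ',';
    }
    buffer += records[i];

    if (buffer.size() >= chunkSize) {
      sink(buffer);
      buffer.clear();
      ++chunks;
    }
  }

  buffer += ']';
  sink(buffer);
  return chunks + 1;
}
```

With this shape the sender never holds the complete response, so a very large task list costs only one chunk of memory at a time.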

> HTTP API responses can crash the master.
> 
>
> Key: MESOS-7017
> URL: https://issues.apache.org/jira/browse/MESOS-7017
> Project: Mesos
>  Issue Type: Bug
>  Components: HTTP API
>Reporter: James Peach
>
> The master can crash when generating large responses to small API requests. 
> One manifestation of this is querying the tasks.
> {noformat}
> [libprotobuf ERROR google/protobuf/io/coded_stream.cc:180] A protocol message 
> was rejected because it was too big (more than 67108864 bytes).  To increase 
> the limit (or to disable these warnings), see 
> CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
> F0126 18:34:18.790386 26230 evolve.cpp:63] Check failed: 
> t.ParsePartialFromString(data) Failed to parse mesos.v1.master.Response while 
> evolving from mesos.master.Response
> {noformat}





[jira] [Updated] (MESOS-7017) HTTP API responses can crash the master.

2017-01-26 Thread James Peach (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Peach updated MESOS-7017:
---
Summary: HTTP API responses can crash the master.  (was: HTTP API responses 
can crash the master)

> HTTP API responses can crash the master.
> 
>
> Key: MESOS-7017
> URL: https://issues.apache.org/jira/browse/MESOS-7017
> Project: Mesos
>  Issue Type: Bug
>  Components: HTTP API
>Reporter: James Peach
>
> The master can crash when generating large responses to small API requests. 
> One manifestation of this is querying the tasks.
> {noformat}
> [libprotobuf ERROR google/protobuf/io/coded_stream.cc:180] A protocol message 
> was rejected because it was too big (more than 67108864 bytes).  To increase 
> the limit (or to disable these warnings), see 
> CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
> F0126 18:34:18.790386 26230 evolve.cpp:63] Check failed: 
> t.ParsePartialFromString(data) Failed to parse mesos.v1.master.Response while 
> evolving from mesos.master.Response
> {noformat}





[jira] [Created] (MESOS-7019) SCRAM authentication.

2017-01-26 Thread James Peach (JIRA)
James Peach created MESOS-7019:
--

 Summary: SCRAM authentication.
 Key: MESOS-7019
 URL: https://issues.apache.org/jira/browse/MESOS-7019
 Project: Mesos
  Issue Type: Improvement
Reporter: James Peach


Add support for the SCRAM authentication method. [RFC 5802 | 
https://tools.ietf.org/html/rfc5802] defines the SASL mechanism and [RFC 7804 | 
https://tools.ietf.org/html/rfc7804] defines the equivalent HTTP authentication 
mechanism.

SCRAM is a very simple digest-style authentication mechanism that has both a 
strong digest scheme and mutual authentication. The server is not required to 
have the cleartext passwords. It is suitable for use with both the agent 
authentication API and the HTTP authentication API.
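
For reference, the key derivation from RFC 5802 (section 3) shows why the server does not need the cleartext password. Here H is the hash function, HMAC the keyed hash, Hi the iterated salted hash (PBKDF2), and AuthMessage the concatenation of the exchanged messages as defined in the RFC:

```latex
\begin{align*}
\mathit{SaltedPassword}  &= \mathrm{Hi}(\mathrm{Normalize}(\mathit{password}),\ \mathit{salt},\ i) \\
\mathit{ClientKey}       &= \mathrm{HMAC}(\mathit{SaltedPassword},\ \texttt{"Client Key"}) \\
\mathit{StoredKey}       &= \mathrm{H}(\mathit{ClientKey}) \\
\mathit{ClientSignature} &= \mathrm{HMAC}(\mathit{StoredKey},\ \mathit{AuthMessage}) \\
\mathit{ClientProof}     &= \mathit{ClientKey} \oplus \mathit{ClientSignature} \\
\mathit{ServerKey}       &= \mathrm{HMAC}(\mathit{SaltedPassword},\ \texttt{"Server Key"}) \\
\mathit{ServerSignature} &= \mathrm{HMAC}(\mathit{ServerKey},\ \mathit{AuthMessage})
\end{align*}
```

The server only has to store the salt, the iteration count, StoredKey and ServerKey. It verifies the client by recomputing ClientSignature, recovering ClientKey as ClientProof XOR ClientSignature, and checking that H(ClientKey) equals StoredKey. The client in turn checks ServerSignature, which gives the mutual authentication.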





[jira] [Commented] (MESOS-7122) Process reaper should have a dedicated thread to avoid deadlock.

2017-02-17 Thread James Peach (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15872216#comment-15872216
 ] 

James Peach commented on MESOS-7122:


While I agree that blocking should be avoided, the point of this bug is that it 
is possible for the reaper to not reap. The reaper has to be able to reliably 
reap so that forward progress can be made in the unfortunate event of code 
blocking on subprocesses.

Running a separate thread for each {{waitpid}} seems expensive but would work. 
You could probably also implement this by having an event loop in {{kevent}} to 
monitor the PIDs directly, or by using {{signalfd}} on Linux to intercept 
{{SIGCHLD}} and reap any registered PIDs.
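
As a rough, Linux-only sketch of the {{signalfd}} variant: block {{SIGCHLD}}, read notifications from the descriptor, and reap with a non-blocking {{waitpid}}. The function names are hypothetical and this is not libprocess code; a real reaper would register the descriptor with the event loop instead of blocking in {{read}}.

```cpp
#include <cassert>
#include <cerrno>

#include <signal.h>
#include <sys/signalfd.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Blocks SIGCHLD for the calling thread and returns a signalfd that becomes
// readable whenever a child changes state. Returns -1 on error.
inline int openChildSignalFd()
{
  sigset_t mask;
  sigemptyset(&mask);
  sigaddset(&mask, SIGCHLD);

  // The signal has to be blocked so that it stays pending and is delivered
  // through the descriptor instead of the default disposition.
  if (sigprocmask(SIG_BLOCK, &mask, nullptr) == -1) {
    return -1;
  }

  return signalfd(-1, &mask, SFD_CLOEXEC);
}

// Waits for SIGCHLD notifications on `fd` until `pid` can be reaped.
// Returns the raw wait status, or -1 on error.
inline int reapOnSignal(int fd, pid_t pid)
{
  for (;;) {
    struct signalfd_siginfo info;
    const ssize_t n = ::read(fd, &info, sizeof(info));
    if (n == -1 && errno == EINTR) {
      continue;
    }
    if (n != static_cast<ssize_t>(sizeof(info))) {
      return -1;
    }

    int status = 0;
    const pid_t result = ::waitpid(pid, &status, WNOHANG);
    if (result == pid) {
      return status;
    }
    if (result == -1) {
      return -1;
    }
    // result == 0: the notification was for another child, keep waiting.
  }
}
```

Because reaping is driven by the signal rather than by a libprocess worker, it keeps making progress even when every worker thread is blocked waiting on a subprocess.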

> Process reaper should have a dedicated thread to avoid deadlock.
> 
>
> Key: MESOS-7122
> URL: https://issues.apache.org/jira/browse/MESOS-7122
> Project: Mesos
>  Issue Type: Bug
>  Components: libprocess
>Reporter: James Peach
>
> In a test environment, we saw that libprocess can deadlock when the process 
> reaper is unable to run. 
> This happens in the Mesos HDFS client, which synchronously runs a {{hadoop}} 
> subprocess. If this happens too many times, the {{ReaperProcess}} is never 
> scheduled to reap the subprocess statuses. Since the HDFS {{Future}} never 
> completes, we deadlock with all the threads in the call stack below. If there 
> was a dedicated thread for the {{ReaperProcess}} to run on, or some other way 
> to ensure that it is scheduled, we could avoid the deadlock.
> {noformat}
> #0  0x7f67b6ffc68c in pthread_cond_wait@@GLIBC_2.3.2 () from 
> /lib64/libpthread.so.0
> #1  0x7f67b6da12fc in 
> std::condition_variable::wait(std::unique_lock&) () from 
> /usr/lib64/libstdc++.so.6
> #2  0x7f67b8b864f6 in process::ProcessManager::wait(process::UPID const&) 
> () from /usr/lib64/libmesos-1.2.0.so
> #3  0x7f67b8b8d347 in process::wait(process::UPID const&, Duration 
> const&) () from /usr/lib64/libmesos-1.2.0.so
> #4  0x7f67b8b51a85 in process::Latch::await(Duration const&) () from 
> /usr/lib64/libmesos-1.2.0.so
> #5  0x7f67b834fc9f in process::Future::await(Duration const&) 
> const () from /usr/lib64/libmesos-1.2.0.so
> #6  0x7f67b833d700 in 
> mesos::internal::slave::fetchSize(std::basic_string std::char_traits, std::allocator > const&, 
> Option 
> > const&) () from /usr/lib64/libmesos-1.2.0.so
> #7  0x7f67b833df5e in 
> std::result_of  const&, mesos::CommandInfo const&, std::basic_string std::char_traits, std::allocator > const&, 
> Option 
> > const&, mesos::SlaveID const&, mesos::internal::slave::Flags 
> const&)::{lambda()#2} ()()>::type 
> process::AsyncExecutorProcess::execute  const&, mesos::CommandInfo const&, std::basic_string std::char_traits, std::allocator > const&, 
> Option 
> > const&, mesos::SlaveID const&, mesos::internal::slave::Flags 
> const&)::{lambda()#2}>(std::result_of const&, 
> boost::disable_if const&::is_void  const&, mesos::CommandInfo const&, std::basic_string std::char_traits, std::allocator > const&, 
> Option 
> > const&, mesos::SlaveID const&, mesos::internal::slave::Flags 
> const&)::{lambda()#2} ()()> >, void>::type*) () from 
> /usr/lib64/libmesos-1.2.0.so
> #8  0x7f67b833a3d5 in std::_Function_handler ()(process::ProcessBase*), process::Future > 
> process::dispatch, process::AsyncExecutorProcess, 
> mesos::internal::slave::FetcherProcess::fetch(mesos::ContainerID const&, 
> mesos::CommandInfo const&, std::basic_string std::allocator > const&, Option std::char_traits, std::allocator > > const&, mesos::SlaveID 
> const&, mesos::internal::slave::Flags const&)::{lambda()#2} const&, void*, 
> {lambda()#2}, 
> mesos::internal::slave::FetcherProcess::fetch(mesos::ContainerID const&, 
> mesos::CommandInfo const&, std::basic_string std::allocator > const&, Option std::char_traits, std::allocator > > const&, mesos::SlaveID 
> const&, mesos::internal::slave::Flags const&)::{lambda()#2} 
> const&>(process::PID const&, process::Future 
> (process::PID::*)(mesos::internal::slave::FetcherProcess::fetch(mesos::ContainerID
>  const&, mesos::CommandInfo const&, std::basic_string std::char_traits, 

[jira] [Commented] (MESOS-7122) Process reaper should have a dedicated thread to avoid deadlock.

2017-02-17 Thread James Peach (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15872285#comment-15872285
 ] 

James Peach commented on MESOS-7122:


Ah. I'd agree that the reaper is a core part of libprocess and is special :)

> Process reaper should have a dedicated thread to avoid deadlock.
> 
>
> Key: MESOS-7122
> URL: https://issues.apache.org/jira/browse/MESOS-7122
> Project: Mesos
>  Issue Type: Bug
>  Components: libprocess
>Reporter: James Peach
>
> In a test environment, we saw that libprocess can deadlock when the process 
> reaper is unable to run. 
> This happens in the Mesos HDFS client, which synchronously runs a {{hadoop}} 
> subprocess. If this happens too many times, the {{ReaperProcess}} is never 
> scheduled to reap the subprocess statuses. Since the HDFS {{Future}} never 
> completes, we deadlock with all the threads in the call stack below. If there 
> was a dedicated thread for the {{ReaperProcess}} to run on, or some other way 
> to ensure that it is scheduled, we could avoid the deadlock.
> {noformat}
> #0  0x7f67b6ffc68c in pthread_cond_wait@@GLIBC_2.3.2 () from 
> /lib64/libpthread.so.0
> #1  0x7f67b6da12fc in 
> std::condition_variable::wait(std::unique_lock&) () from 
> /usr/lib64/libstdc++.so.6
> #2  0x7f67b8b864f6 in process::ProcessManager::wait(process::UPID const&) 
> () from /usr/lib64/libmesos-1.2.0.so
> #3  0x7f67b8b8d347 in process::wait(process::UPID const&, Duration 
> const&) () from /usr/lib64/libmesos-1.2.0.so
> #4  0x7f67b8b51a85 in process::Latch::await(Duration const&) () from 
> /usr/lib64/libmesos-1.2.0.so
> #5  0x7f67b834fc9f in process::Future::await(Duration const&) 
> const () from /usr/lib64/libmesos-1.2.0.so
> #6  0x7f67b833d700 in 
> mesos::internal::slave::fetchSize(std::basic_string std::char_traits, std::allocator > const&, 
> Option 
> > const&) () from /usr/lib64/libmesos-1.2.0.so
> #7  0x7f67b833df5e in 
> std::result_of  const&, mesos::CommandInfo const&, std::basic_string std::char_traits, std::allocator > const&, 
> Option 
> > const&, mesos::SlaveID const&, mesos::internal::slave::Flags 
> const&)::{lambda()#2} ()()>::type 
> process::AsyncExecutorProcess::execute  const&, mesos::CommandInfo const&, std::basic_string std::char_traits, std::allocator > const&, 
> Option 
> > const&, mesos::SlaveID const&, mesos::internal::slave::Flags 
> const&)::{lambda()#2}>(std::result_of const&, 
> boost::disable_if const&::is_void  const&, mesos::CommandInfo const&, std::basic_string std::char_traits, std::allocator > const&, 
> Option 
> > const&, mesos::SlaveID const&, mesos::internal::slave::Flags 
> const&)::{lambda()#2} ()()> >, void>::type*) () from 
> /usr/lib64/libmesos-1.2.0.so
> #8  0x7f67b833a3d5 in std::_Function_handler ()(process::ProcessBase*), process::Future > 
> process::dispatch, process::AsyncExecutorProcess, 
> mesos::internal::slave::FetcherProcess::fetch(mesos::ContainerID const&, 
> mesos::CommandInfo const&, std::basic_string std::allocator > const&, Option std::char_traits, std::allocator > > const&, mesos::SlaveID 
> const&, mesos::internal::slave::Flags const&)::{lambda()#2} const&, void*, 
> {lambda()#2}, 
> mesos::internal::slave::FetcherProcess::fetch(mesos::ContainerID const&, 
> mesos::CommandInfo const&, std::basic_string std::allocator > const&, Option std::char_traits, std::allocator > > const&, mesos::SlaveID 
> const&, mesos::internal::slave::Flags const&)::{lambda()#2} 
> const&>(process::PID const&, process::Future 
> (process::PID::*)(mesos::internal::slave::FetcherProcess::fetch(mesos::ContainerID
>  const&, mesos::CommandInfo const&, std::basic_string std::char_traits, std::allocator > const&, 
> Option 
> > const&, mesos::SlaveID const&, mesos::internal::slave::Flags 
> const&)::{lambda()#2} const&, void*), {lambda()#2}, 
> mesos::internal::slave::FetcherProcess::fetch(mesos::ContainerID const&, 
> mesos::CommandInfo const&, std::basic_string std::allocator > const&, Option std::char_traits, std::allocator > > const&, 

[jira] [Commented] (MESOS-7160) Parsing of perf version segfaults

2017-02-22 Thread James Peach (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15879278#comment-15879278
 ] 

James Peach commented on MESOS-7160:


perf aborted for some reason.

> Parsing of perf version segfaults
> -
>
> Key: MESOS-7160
> URL: https://issues.apache.org/jira/browse/MESOS-7160
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Reporter: Benjamin Bannier
>
> Parsing the perf version [fails with a segfault in ASF 
> CI|https://builds.apache.org/job/Mesos-Buildbot/BUILDTOOL=autotools,COMPILER=gcc,CONFIGURATION=--verbose%20--enable-libevent%20--enable-ssl,ENVIRONMENT=GLOG_v=1%20MESOS_VERBOSE=1,OS=ubuntu:14.04,label_exp=(docker%7C%7CHadoop)&&(!ubuntu-us1)&&(!ubuntu-eu2)/3294/],
> {noformat}
> E0222 20:54:03.033464   805 perf.cpp:237] Failed to get perf version: Failed 
> to execute perf: terminated with signal Aborted (core dumped)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (MESOS-7143) ABORT checks its preconditions incorrectly and incompletely

2017-02-19 Thread James Peach (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15873919#comment-15873919
 ] 

James Peach commented on MESOS-7143:


Similarly the {{ABORT}} macro is optimistic. It lets you pass multiple 
arguments but {{_Abort}} only takes 1 message:

{code}
#define ABORT(...) _Abort(_ABORT_PREFIX, __VA_ARGS__)
{code}

For {{_Abort}} itself, consider annotating the arguments to be non-null and 
removing the checks. It's not cool to abort without an actionable message.
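
A sketch of what the annotated version might look like, assuming GCC or Clang (the {{nonnull}} and {{noreturn}} attributes are compiler extensions). The names {{writeAll}} and {{abortWithMessage}} are hypothetical and this is not the stout implementation.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdlib>
#include <cstring>

#include <unistd.h>

// Writes all of `s` to `fd`, retrying on EINTR and on short writes.
// The nonnull attribute lets the compiler diagnose null arguments at the
// call site, so no runtime check is needed here.
__attribute__((nonnull))
inline bool writeAll(int fd, const char* s)
{
  size_t remaining = ::strlen(s);
  while (remaining > 0) {
    const ssize_t n = ::write(fd, s, remaining);
    if (n == -1) {
      if (errno == EINTR) {
        continue;
      }
      return false;
    }
    s += n;
    remaining -= static_cast<size_t>(n);
  }
  return true;
}

// Takes exactly one prefix and one message, both required to be non-null.
__attribute__((nonnull, noreturn))
inline void abortWithMessage(const char* prefix, const char* message)
{
  writeAll(STDERR_FILENO, prefix);
  writeAll(STDERR_FILENO, message);
  writeAll(STDERR_FILENO, "\n");
  ::abort();
}
```

A macro wrapping this would need to accept a single message argument rather than {{__VA_ARGS__}} so that the extra arguments are rejected at compile time.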

> ABORT checks its preconditions incorrectly and incompletely
> ---
>
> Key: MESOS-7143
> URL: https://issues.apache.org/jira/browse/MESOS-7143
> Project: Mesos
>  Issue Type: Bug
>  Components: stout
>Affects Versions: 0.23.0
>Reporter: Benjamin Bannier
>Priority: Minor
>  Labels: coverity, tech-debt
>
> Currently, stout's {{ABORT}} (which is mapped to {{_Abort}}) checks its 
> preconditions incompletely and incorrectly.
> Its current control flow is roughly
> {code}
> void _Abort(const char* prefix, const char* message)
> {
>   size_t prefix_len = strlen(prefix);
>   size_t message_len = strlen(message);
>   
>   // Async-safe write.
>while(::write(2, prefix, prefix_len) == -1 && errno == EINTR);
>while(message != nullptr &&
>  ::write(2, message, message_len) == -1 && errno == EINTR);
> }
> {code}
> Here we check the precondition {{message != nullptr}} after we have already 
> called {{strlen(message)}}; calling {{strlen}} on a {{nullptr}} already 
> triggers undefined behavior.
> Similarly, we never guard against a {{prefix}} which is {{nullptr}}, but 
> unconditionally call {{strlen}} on it.
> It seems it should be possible to assert that neither {{prefix}} nor 
> {{message}} is {{nullptr}} before any use.
> This was diagnosed by coverity as CID-1400833, and has been present in all 
> releases since 0.23.0.





[jira] [Created] (MESOS-7122) Process reaper should have a dedicated thread to avoid deadlock.

2017-02-13 Thread James Peach (JIRA)
James Peach created MESOS-7122:
--

 Summary: Process reaper should have a dedicated thread to avoid 
deadlock.
 Key: MESOS-7122
 URL: https://issues.apache.org/jira/browse/MESOS-7122
 Project: Mesos
  Issue Type: Bug
  Components: libprocess
Reporter: James Peach


In a test environment, we saw that libprocess can deadlock when the process 
reaper is unable to run. 

This happens in the Mesos HDFS client, which synchronously runs a {{hadoop}} 
subprocess. If this happens too many times, the {{ReaperProcess}} is never 
scheduled to reap the subprocess statuses. Since the HDFS {{Future}} never 
completes, we deadlock with all the threads in the call stack below. If there 
was a dedicated thread for the {{ReaperProcess}} to run on, or some other way 
to ensure that it is scheduled, we could avoid the deadlock.

{noformat}
#0  0x7f67b6ffc68c in pthread_cond_wait@@GLIBC_2.3.2 () from 
/lib64/libpthread.so.0
#1  0x7f67b6da12fc in 
std::condition_variable::wait(std::unique_lock&) () from 
/usr/lib64/libstdc++.so.6
#2  0x7f67b8b864f6 in process::ProcessManager::wait(process::UPID const&) 
() from /usr/lib64/libmesos-1.2.0.so
#3  0x7f67b8b8d347 in process::wait(process::UPID const&, Duration const&) 
() from /usr/lib64/libmesos-1.2.0.so
#4  0x7f67b8b51a85 in process::Latch::await(Duration const&) () from 
/usr/lib64/libmesos-1.2.0.so
#5  0x7f67b834fc9f in process::Future::await(Duration const&) const 
() from /usr/lib64/libmesos-1.2.0.so
#6  0x7f67b833d700 in 
mesos::internal::slave::fetchSize(std::basic_string const&, 
Option > 
const&) () from /usr/lib64/libmesos-1.2.0.so
#7  0x7f67b833df5e in 
std::result_of const&, 
Option > 
const&, mesos::SlaveID const&, mesos::internal::slave::Flags 
const&)::{lambda()#2} ()()>::type 
process::AsyncExecutorProcess::execute const&, 
Option > 
const&, mesos::SlaveID const&, mesos::internal::slave::Flags 
const&)::{lambda()#2}>(std::result_of const&, boost::disable_if, std::allocator > const&, 
Option > 
const&, mesos::SlaveID const&, mesos::internal::slave::Flags 
const&)::{lambda()#2} ()()> >, void>::type*) () from 
/usr/lib64/libmesos-1.2.0.so
#8  0x7f67b833a3d5 in std::_Function_handler > 
process::dispatch, process::AsyncExecutorProcess, 
mesos::internal::slave::FetcherProcess::fetch(mesos::ContainerID const&, 
mesos::CommandInfo const&, std::basic_string const&, Option > const&, mesos::SlaveID const&, 
mesos::internal::slave::Flags const&)::{lambda()#2} const&, void*, 
{lambda()#2}, mesos::internal::slave::FetcherProcess::fetch(mesos::ContainerID 
const&, mesos::CommandInfo const&, std::basic_string const&, 
Option > 
const&, mesos::SlaveID const&, mesos::internal::slave::Flags 
const&)::{lambda()#2} const&>(process::PID 
const&, process::Future 
(process::PID::*)(mesos::internal::slave::FetcherProcess::fetch(mesos::ContainerID
 const&, mesos::CommandInfo const&, std::basic_string const&, 
Option > 
const&, mesos::SlaveID const&, mesos::internal::slave::Flags 
const&)::{lambda()#2} const&, void*), {lambda()#2}, 
mesos::internal::slave::FetcherProcess::fetch(mesos::ContainerID const&, 
mesos::CommandInfo const&, std::basic_string const&, Option > const&, mesos::SlaveID const&, 
mesos::internal::slave::Flags const&)::{lambda()#2} 
const&)::{lambda(process::ProcessBase*)#1}>::_M_invoke(std::_Any_data const&, 
process::ProcessBase*) () from /usr/lib64/libmesos-1.2.0.so
#9  0x7f67b8b85ede in 
process::ProcessManager::resume(process::ProcessBase*) () from 
/usr/lib64/libmesos-1.2.0.so
#10 0x7f67b8b8fc8f in 
std::thread::_Impl >::_M_run() () from /usr/lib64/libmesos-1.2.0.so
#11 0x7f67b6da1470 in ?? () from /usr/lib64/libstdc++.so.6
#12 0x7f67b6ff8aa1 in start_thread () from /lib64/libpthread.so.0
#13 0x7f67b6a3faad in 

[jira] [Created] (MESOS-7115) Agent should prefer LOG(FATAL) over EXIT().

2017-02-10 Thread James Peach (JIRA)
James Peach created MESOS-7115:
--

 Summary: Agent should prefer LOG(FATAL) over EXIT().
 Key: MESOS-7115
 URL: https://issues.apache.org/jira/browse/MESOS-7115
 Project: Mesos
  Issue Type: Bug
  Components: agent
Reporter: James Peach
Priority: Minor


I saw the agent exit with an auth failure:
{noformat}
I0210 14:16:49.731459  9503 authenticatee.cpp:259] Received SASL authentication 
step
Master master@17.174.144.199:5050 refused authentication
{noformat}

Note the lack of log metadata on the exit message. This message (from 
{{slave.cpp}}) and a number of others in the same file should all use 
{{LOG(FATAL)}} so that log aggregation can pick up the timestamp, error 
severity, etc.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (MESOS-7115) Agent should prefer LOG(FATAL) over EXIT().

2017-02-14 Thread James Peach (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Peach reassigned MESOS-7115:
--

Assignee: James Peach

> Agent should prefer LOG(FATAL) over EXIT().
> ---
>
> Key: MESOS-7115
> URL: https://issues.apache.org/jira/browse/MESOS-7115
> Project: Mesos
>  Issue Type: Bug
>  Components: agent
>Reporter: James Peach
>Assignee: James Peach
>Priority: Minor
>
> I saw the agent exit with an auth failure:
> {noformat}
> I0210 14:16:49.731459  9503 authenticatee.cpp:259] Received SASL 
> authentication step
> Master master@17.174.144.199:5050 refused authentication
> {noformat}
> Note the lack of log metadata on the exit message. This message (from 
> {{slave.cpp}}) and a number of others in the same file should all use 
> {{LOG(FATAL)}} so that log aggregation can pick up the timestamp, error 
> severity, etc.





[jira] [Commented] (MESOS-6433) cgroups PERF related test cases failed

2017-02-13 Thread James Peach (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15863955#comment-15863955
 ] 

James Peach commented on MESOS-6433:


Probably MESOS-7049

> cgroups PERF related test cases failed
> --
>
> Key: MESOS-6433
> URL: https://issues.apache.org/jira/browse/MESOS-6433
> Project: Mesos
>  Issue Type: Bug
>Reporter: haosdent
>Assignee: haosdent
>  Labels: cgroups
>
> Test with
> {code}
> sudo GLOG_v=1 ./bin/mesos-tests.sh --gtest_filter="*CgroupsIsolatorTest.*" 
> --verbose
> [==] 10 tests from 1 test case ran. (8071 ms total)
> [  PASSED  ] 6 tests.
> [  FAILED  ] 4 tests, listed below:
> [  FAILED  ] CgroupsIsolatorTest.ROOT_CGROUPS_PERF_Sample
> [  FAILED  ] CgroupsIsolatorTest.ROOT_CGROUPS_PERF_PerfForward
> [  FAILED  ] CgroupsIsolatorTest.ROOT_CGROUPS_MemoryForward
> [  FAILED  ] CgroupsIsolatorTest.ROOT_CGROUPS_MemoryBackward
> {code}





[jira] [Assigned] (MESOS-6982) PerfTest.Version fails on recent Arch Linux

2017-02-13 Thread James Peach (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Peach reassigned MESOS-6982:
--

Assignee: James Peach

> PerfTest.Version fails on recent Arch Linux
> ---
>
> Key: MESOS-6982
> URL: https://issues.apache.org/jira/browse/MESOS-6982
> Project: Mesos
>  Issue Type: Bug
>Reporter: Neil Conway
>Assignee: James Peach
>  Labels: mesosphere, newbie
>
> {noformat}
> [ RUN  ] PerfTest.Version
> ../../mesos/src/tests/containerizer/perf_tests.cpp:134: Failure
> (perf::version()).failure(): Invalid version component 'g69973b': Failed to 
> convert 'g69973b' to number
> [  FAILED  ] PerfTest.Version (50 ms)
> {noformat}
> {noformat}
> $ perf --version
> perf version 4.9.g69973b
> {noformat}
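
One possible way to tolerate a suffix such as {{g69973b}} is to keep only the leading numeric components. The sketch below is an illustration under that assumption, with a hypothetical function name; it is not necessarily how {{perf::version()}} was changed.

```cpp
#include <cassert>
#include <cctype>
#include <cstdlib>
#include <sstream>
#include <string>
#include <vector>

// Extracts the leading numeric components from `perf --version` output,
// stopping at the first component that is not purely decimal digits.
inline std::vector<int> parseLeadingVersion(std::string output)
{
  const std::string prefix = "perf version ";
  if (output.compare(0, prefix.size(), prefix) == 0) {
    output.erase(0, prefix.size());
  }

  // Trim trailing whitespace such as the newline printed by the command.
  while (!output.empty() &&
         std::isspace(static_cast<unsigned char>(output.back()))) {
    output.pop_back();
  }

  std::vector<int> components;
  std::istringstream stream(output);
  std::string token;

  while (std::getline(stream, token, '.')) {
    // Reject empty tokens and tokens too long to fit in an int.
    bool numeric = !token.empty() && token.size() <= 9;
    for (size_t i = 0; numeric && i < token.size(); ++i) {
      numeric = std::isdigit(static_cast<unsigned char>(token[i])) != 0;
    }
    if (!numeric) {
      break;
    }
    components.push_back(std::atoi(token.c_str()));
  }

  return components;
}
```

For the output quoted above this yields the components 4 and 9 and ignores the git hash suffix.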





[jira] [Created] (MESOS-6935) Operator API to get current frameworks only.

2017-01-17 Thread James Peach (JIRA)
James Peach created MESOS-6935:
--

 Summary: Operator API to get current frameworks only.
 Key: MESOS-6935
 URL: https://issues.apache.org/jira/browse/MESOS-6935
 Project: Mesos
  Issue Type: Improvement
  Components: HTTP API
Reporter: James Peach


The master {{GET_FRAMEWORKS}} operator API always returns the current frameworks 
and the {{completed_frameworks}}. Since the set of {{completed_frameworks}} can 
be very large and is often not wanted, it would be helpful if there were a way 
to exclude those.





[jira] [Created] (MESOS-6946) Make wait status checks consistent.

2017-01-18 Thread James Peach (JIRA)
James Peach created MESOS-6946:
--

 Summary: Make wait status checks consistent.
 Key: MESOS-6946
 URL: https://issues.apache.org/jira/browse/MESOS-6946
 Project: Mesos
  Issue Type: Bug
Reporter: James Peach
Priority: Trivial


There are various places that test the {{wait(2)}} exit status in different 
ways. Clean this up to be consistent and use {{WSTRINGIFY}} to format error 
messages where appropriate.
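
For illustration, a consistent check could go through the standard {{wait(2)}} macros in one helper like the one below. The function name and the message wording are assumptions made for this sketch; it does not reproduce what {{WSTRINGIFY}} actually prints.

```cpp
#include <cassert>
#include <string>

#include <string.h>
#include <sys/wait.h>

// Formats a raw wait(2) status for use in error messages.
inline std::string describeWaitStatus(int status)
{
  if (WIFEXITED(status)) {
    return "exited with status " + std::to_string(WEXITSTATUS(status));
  }

  if (WIFSIGNALED(status)) {
    const char* name = ::strsignal(WTERMSIG(status));
    return std::string("terminated with signal ") +
      (name != nullptr ? name : "unknown");
  }

  // Stopped or continued children, or a value that is not a wait status.
  return "wait status " + std::to_string(status);
}
```

Callers would then test {{WIFEXITED}} and {{WEXITSTATUS}} themselves only when they need to branch on the result, and use the helper for every message.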





[jira] [Assigned] (MESOS-6946) Make wait status checks consistent.

2017-01-18 Thread James Peach (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Peach reassigned MESOS-6946:
--

Assignee: James Peach

> Make wait status checks consistent.
> ---
>
> Key: MESOS-6946
> URL: https://issues.apache.org/jira/browse/MESOS-6946
> Project: Mesos
>  Issue Type: Bug
>Reporter: James Peach
>Assignee: James Peach
>Priority: Trivial
>
> There are various places that test the {{wait(2)}} exit status in different 
> ways. Clean this up to be consistent and use {{WSTRINGIFY}} to format error 
> messages where appropriate.





[jira] [Assigned] (MESOS-6858) network/cni isolator generates incomplete resolv.conf

2017-01-19 Thread James Peach (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Peach reassigned MESOS-6858:
--

Assignee: James Peach

> network/cni isolator generates incomplete resolv.conf
> -
>
> Key: MESOS-6858
> URL: https://issues.apache.org/jira/browse/MESOS-6858
> Project: Mesos
>  Issue Type: Bug
>  Components: isolation, network
>Reporter: James Peach
>Assignee: James Peach
>
> The CNI [network 
> configuration|https://github.com/containernetworking/cni/blob/master/SPEC.md#network-configuration]
>  dictionary contains entries for the {{/etc/resolv.conf}} {{nameservers}}, 
> {{domain}}, {{search}} and {{options}} fields.
> In {{NetworkCniIsolatorProcess::_isolate()}}, the {{network/cni}} isolator 
> only emits the {{nameservers}} and ignores the remaining fields.
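
A sketch of rendering all four fields into {{resolv.conf}} syntax. {{DnsConfig}} and {{renderResolvConf}} are hypothetical names that mirror the CNI dictionary; this is not the isolator code.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical mirror of the DNS fields in the CNI network configuration.
struct DnsConfig
{
  std::vector<std::string> nameservers;
  std::string domain; // Empty if unset.
  std::vector<std::string> search;
  std::vector<std::string> options;
};

// Joins words with single spaces, as resolv.conf expects for list values.
inline std::string joinWords(const std::vector<std::string>& words)
{
  std::string result;
  for (size_t i = 0; i < words.size(); ++i) {
    if (i > 0) {
      result += ' ';
    }
    result += words[i];
  }
  return result;
}

// Renders the DNS settings as the contents of an /etc/resolv.conf file.
inline std::string renderResolvConf(const DnsConfig& dns)
{
  std::ostringstream out;

  if (!dns.domain.empty()) {
    out << "domain " << dns.domain << "\n";
  }
  if (!dns.search.empty()) {
    out << "search " << joinWords(dns.search) << "\n";
  }
  if (!dns.options.empty()) {
    out << "options " << joinWords(dns.options) << "\n";
  }

  // Each nameserver goes on its own line.
  for (size_t i = 0; i < dns.nameservers.size(); ++i) {
    out << "nameserver " << dns.nameservers[i] << "\n";
  }

  return out.str();
}
```

Unset fields are simply omitted, so a configuration with only {{nameservers}} produces the same output as today.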





[jira] [Commented] (MESOS-6562) Use JSON content type in mesos-execute.

2016-11-08 Thread James Peach (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15648230#comment-15648230
 ] 

James Peach commented on MESOS-6562:


Ping [~jieyu] [~anandmazumdar] [~vinodkone]. I can post the trivial patch if 
anyone volunteers to shepherd.

> Use JSON content type in mesos-execute.
> ---
>
> Key: MESOS-6562
> URL: https://issues.apache.org/jira/browse/MESOS-6562
> Project: Mesos
>  Issue Type: Improvement
>Reporter: James Peach
>Assignee: James Peach
>Priority: Minor
>
> Use the {{mesos::ContentType::JSON}} in {{mesos-execute}} so that we can 
> easily packet trace the scheduler interactions. This makes debugging a lot 
> easier.





[jira] [Created] (MESOS-6562) Use JSON content type in mesos-execute.

2016-11-08 Thread James Peach (JIRA)
James Peach created MESOS-6562:
--

 Summary: Use JSON content type in mesos-execute.
 Key: MESOS-6562
 URL: https://issues.apache.org/jira/browse/MESOS-6562
 Project: Mesos
  Issue Type: Improvement
Reporter: James Peach
Assignee: James Peach
Priority: Minor


Use the {{mesos::ContentType::JSON}} in {{mesos-execute}} so that we can easily 
packet trace the scheduler interactions. This makes debugging a lot easier.





[jira] [Commented] (MESOS-6556) UTS namespace isolator

2016-11-09 Thread James Peach (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15652210#comment-15652210
 ] 

James Peach commented on MESOS-6556:


| Add net::setDomainname() helper API. | https://reviews.apache.org/r/53626/ |
| Implement a namespaces/uts isolator. | https://reviews.apache.org/r/53627/ |
|  Document the namespaces/uts isolator. | https://reviews.apache.org/r/53628/ |

> UTS namespace isolator
> --
>
> Key: MESOS-6556
> URL: https://issues.apache.org/jira/browse/MESOS-6556
> Project: Mesos
>  Issue Type: Improvement
>  Components: containerization
>Reporter: James Peach
>Assignee: James Peach
>Priority: Minor
>
> Add a {{namespace/uts}} isolator for doing UTS namespace isolation without 
> using the CNI isolator.





[jira] [Commented] (MESOS-6562) Use JSON content type in mesos-execute.

2016-11-09 Thread James Peach (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15652174#comment-15652174
 ] 

James Peach commented on MESOS-6562:


| Use JSON content type in mesos-execute. | https://reviews.apache.org/r/53624/ |

> Use JSON content type in mesos-execute.
> ---
>
> Key: MESOS-6562
> URL: https://issues.apache.org/jira/browse/MESOS-6562
> Project: Mesos
>  Issue Type: Improvement
>Reporter: James Peach
>Assignee: James Peach
>Priority: Minor
>
> Use the {{mesos::ContentType::JSON}} in {{mesos-execute}} so that we can 
> easily packet trace the scheduler interactions. This makes debugging a lot 
> easier.





[jira] [Updated] (MESOS-6520) Make errno an explicit argument for ErrnoError.

2016-11-07 Thread James Peach (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Peach updated MESOS-6520:
---
Shepherd: Michael Park

> Make errno an explicit argument for ErrnoError.
> ---
>
> Key: MESOS-6520
> URL: https://issues.apache.org/jira/browse/MESOS-6520
> Project: Mesos
>  Issue Type: Bug
>  Components: technical debt
>Reporter: James Peach
>Assignee: James Peach
>Priority: Minor
>
> Make {{errno}} an explicit argument to {{ErrnoError}}. Right now, the 
> constructor to {{ErrnoError}} references {{errno}} directly, which makes it 
> awkward to pass a custom {{errno}} value (you have to set {{errno}} globally).





[jira] [Created] (MESOS-6556) UTS namespace isolator

2016-11-07 Thread James Peach (JIRA)
James Peach created MESOS-6556:
--

 Summary: UTS namespace isolator
 Key: MESOS-6556
 URL: https://issues.apache.org/jira/browse/MESOS-6556
 Project: Mesos
  Issue Type: Improvement
  Components: containerization
Reporter: James Peach
Assignee: James Peach
Priority: Minor


Add a {{namespace/uts}} isolator for doing UTS namespace isolation without 
using the CNI isolator.





[jira] [Created] (MESOS-6557) IPC namespace isolator

2016-11-07 Thread James Peach (JIRA)
James Peach created MESOS-6557:
--

 Summary: IPC namespace isolator
 Key: MESOS-6557
 URL: https://issues.apache.org/jira/browse/MESOS-6557
 Project: Mesos
  Issue Type: Improvement
  Components: containerization
Reporter: James Peach
Assignee: James Peach


Add a {{namespace/ipc}} isolator for creating an IPC namespace.





[jira] [Updated] (MESOS-6576) DefaultExecutorTest.KillTaskGroupOnTaskFailure sometimes fails in CI

2016-11-10 Thread James Peach (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Peach updated MESOS-6576:
---
Attachment: KillTaskGroupOnTaskFailure.success.log
KillTaskGroupOnTaskFailure.failure.log

> DefaultExecutorTest.KillTaskGroupOnTaskFailure sometimes fails in CI
> 
>
> Key: MESOS-6576
> URL: https://issues.apache.org/jira/browse/MESOS-6576
> Project: Mesos
>  Issue Type: Bug
>  Components: tests
>Reporter: James Peach
> Attachments: KillTaskGroupOnTaskFailure.failure.log, 
> KillTaskGroupOnTaskFailure.success.log
>
>
> {{DefaultExecutorTest.KillTaskGroupOnTaskFailure}} sometimes fails in the ASF 
> CI.
> Interesting  pieces of the failing test run:
> {noformat}
> ...
> I1110 20:38:54.775871 29740 status_update_manager.cpp:323] Received status 
> update TASK_KILLED (UUID: a4746389-8155-44e0-ada4-00b8d3e997c1) for task 
> df99cc50-9b0f-4692-afc9-d587c3515a67 of framework 
> 2df0125f-4865-4aba-b13d-02f338815729-
> I1110 20:38:54.776181 29730 slave.cpp:4075] Status update manager 
> successfully handled status update TASK_KILLED (UUID: 
> a4746389-8155-44e0-ada4-00b8d3e997c1) for task 
> df99cc50-9b0f-4692-afc9-d587c3515a67 of framework 
> 2df0125f-4865-4aba-b13d-02f338815729-
> I1110 20:38:55.456354 29738 hierarchical.cpp:1880] Filtered offer with 
> cpus(*):1.7; mem(*):928; disk(*):928; ports(*):[31000-32000] on agent 
> 2df0125f-4865-4aba-b13d-02f338815729-S0 for framework 
> 2df0125f-4865-4aba-b13d-02f338815729-
> I1110 20:38:55.456434 29738 hierarchical.cpp:1694] No allocations performed
> I1110 20:38:55.456468 29738 hierarchical.cpp:1789] No inverse offers to send 
> out!
> I1110 20:38:55.456545 29738 hierarchical.cpp:1286] Performed allocation for 1 
> agents in 745185ns
> I1110 20:38:55.875964 29731 containerizer.cpp:2336] Container 
> a56ac08b-8f97-4ae4-a2e8-5ef5d55fbe98 has exited
> I1110 20:38:55.876022 29731 containerizer.cpp:1973] Destroying container 
> a56ac08b-8f97-4ae4-a2e8-5ef5d55fbe98 in RUNNING state
> I1110 20:38:55.876387 29731 launcher.cpp:143] Asked to destroy container 
> a56ac08b-8f97-4ae4-a2e8-5ef5d55fbe98
> I1110 20:38:55.881464 29728 provisioner.cpp:324] Ignoring destroy request for 
> unknown container a56ac08b-8f97-4ae4-a2e8-5ef5d55fbe98
> I1110 20:38:55.882894 29730 slave.cpp:4672] Executor 'default' of framework 
> 2df0125f-4865-4aba-b13d-02f338815729- exited with status 0
> I1110 20:38:55.883446 29741 master.cpp:5884] Executor 'default' of framework 
> 2df0125f-4865-4aba-b13d-02f338815729- on agent 
> 2df0125f-4865-4aba-b13d-02f338815729-S0 at slave(18)@172.17.0.2:36164 
> (ade222407ffe): exited with status 0
> I1110 20:38:55.883545 29741 master.cpp:7840] Removing executor 'default' with 
> resources cpus(*):0.1; mem(*):32; disk(*):32 of framework 
> 2df0125f-4865-4aba-b13d-02f338815729- on agent 
> 2df0125f-4865-4aba-b13d-02f338815729-S0 at slave(18)@172.17.0.2:36164 
> (ade222407ffe)
> I1110 20:38:55.884820 29729 hierarchical.cpp:1018] Recovered cpus(*):0.1; 
> mem(*):32; disk(*):32 (total: cpus(*):2; mem(*):1024; disk(*):1024; 
> ports(*):[31000-32000], allocated: cpus(*):0.2; mem(*):64; disk(*):64) on 
> agent 2df0125f-4865-4aba-b13d-02f338815729-S0 from framework 
> 2df0125f-4865-4aba-b13d-02f338815729-
> I1110 20:38:55.885892 29737 scheduler.cpp:675] Enqueuing event FAILURE 
> received from http://172.17.0.2:36164/master/api/v1/scheduler
> GMOCK WARNING:
> Uninteresting mock function call - returning directly.
> Function call: failure(0x7ffdc4df11f0, @0x2b639800b6b0 48-byte object 
> 90-82 AC-51 63-2B 00-00 00-00 00-00 00-00 00-00 07-00 00-00 00-00 00-00 
> 70-0A 01-98 63-2B 00-00 20-C7 00-98 63-2B 00-00 00-00 00-00 63-2B 00-00)
> ...
> I1110 20:39:04.566794 29732 master.cpp:7715] Updating the state of task 
> e72d5139-0a11-48af-9d43-d4163c1404ee of framework 
> 2df0125f-4865-4aba-b13d-02f338815729- (latest state: TASK_FAILED, status 
> update state: TASK_RUNNING)
> ...
> I1110 20:39:04.569413 29736 scheduler.cpp:675] Enqueuing event UPDATE 
> received from http://172.17.0.2:36164/master/api/v1/scheduler
> ../../src/tests/default_executor_tests.cpp:583: Failure
> Value of: taskStates
>   Actual: { (df99cc50-9b0f-4692-afc9-d587c3515a67, TASK_KILLED), 
> (e72d5139-0a11-48af-9d43-d4163c1404ee, TASK_FAILED) }
> Expected: expectedTaskStates
> Which is: { (df99cc50-9b0f-4692-afc9-d587c3515a67, TASK_RUNNING), 
> (e72d5139-0a11-48af-9d43-d4163c1404ee, TASK_RUNNING) }
> ...
> {noformat}





[jira] [Created] (MESOS-6576) DefaultExecutorTest.KillTaskGroupOnTaskFailure sometimes fails in CI

2016-11-10 Thread James Peach (JIRA)
James Peach created MESOS-6576:
--

 Summary: DefaultExecutorTest.KillTaskGroupOnTaskFailure sometimes 
fails in CI
 Key: MESOS-6576
 URL: https://issues.apache.org/jira/browse/MESOS-6576
 Project: Mesos
  Issue Type: Bug
  Components: tests
Reporter: James Peach


{{DefaultExecutorTest.KillTaskGroupOnTaskFailure}} sometimes fails in the ASF 
CI.

Interesting  pieces of the failing test run:
{noformat}
...
I1110 20:38:54.775871 29740 status_update_manager.cpp:323] Received status 
update TASK_KILLED (UUID: a4746389-8155-44e0-ada4-00b8d3e997c1) for task 
df99cc50-9b0f-4692-afc9-d587c3515a67 of framework 
2df0125f-4865-4aba-b13d-02f338815729-
I1110 20:38:54.776181 29730 slave.cpp:4075] Status update manager successfully 
handled status update TASK_KILLED (UUID: a4746389-8155-44e0-ada4-00b8d3e997c1) 
for task df99cc50-9b0f-4692-afc9-d587c3515a67 of framework 
2df0125f-4865-4aba-b13d-02f338815729-
I1110 20:38:55.456354 29738 hierarchical.cpp:1880] Filtered offer with 
cpus(*):1.7; mem(*):928; disk(*):928; ports(*):[31000-32000] on agent 
2df0125f-4865-4aba-b13d-02f338815729-S0 for framework 
2df0125f-4865-4aba-b13d-02f338815729-
I1110 20:38:55.456434 29738 hierarchical.cpp:1694] No allocations performed
I1110 20:38:55.456468 29738 hierarchical.cpp:1789] No inverse offers to send 
out!
I1110 20:38:55.456545 29738 hierarchical.cpp:1286] Performed allocation for 1 
agents in 745185ns
I1110 20:38:55.875964 29731 containerizer.cpp:2336] Container 
a56ac08b-8f97-4ae4-a2e8-5ef5d55fbe98 has exited
I1110 20:38:55.876022 29731 containerizer.cpp:1973] Destroying container 
a56ac08b-8f97-4ae4-a2e8-5ef5d55fbe98 in RUNNING state
I1110 20:38:55.876387 29731 launcher.cpp:143] Asked to destroy container 
a56ac08b-8f97-4ae4-a2e8-5ef5d55fbe98
I1110 20:38:55.881464 29728 provisioner.cpp:324] Ignoring destroy request for 
unknown container a56ac08b-8f97-4ae4-a2e8-5ef5d55fbe98
I1110 20:38:55.882894 29730 slave.cpp:4672] Executor 'default' of framework 
2df0125f-4865-4aba-b13d-02f338815729- exited with status 0
I1110 20:38:55.883446 29741 master.cpp:5884] Executor 'default' of framework 
2df0125f-4865-4aba-b13d-02f338815729- on agent 
2df0125f-4865-4aba-b13d-02f338815729-S0 at slave(18)@172.17.0.2:36164 
(ade222407ffe): exited with status 0
I1110 20:38:55.883545 29741 master.cpp:7840] Removing executor 'default' with 
resources cpus(*):0.1; mem(*):32; disk(*):32 of framework 
2df0125f-4865-4aba-b13d-02f338815729- on agent 
2df0125f-4865-4aba-b13d-02f338815729-S0 at slave(18)@172.17.0.2:36164 
(ade222407ffe)
I1110 20:38:55.884820 29729 hierarchical.cpp:1018] Recovered cpus(*):0.1; 
mem(*):32; disk(*):32 (total: cpus(*):2; mem(*):1024; disk(*):1024; 
ports(*):[31000-32000], allocated: cpus(*):0.2; mem(*):64; disk(*):64) on agent 
2df0125f-4865-4aba-b13d-02f338815729-S0 from framework 
2df0125f-4865-4aba-b13d-02f338815729-
I1110 20:38:55.885892 29737 scheduler.cpp:675] Enqueuing event FAILURE received 
from http://172.17.0.2:36164/master/api/v1/scheduler

GMOCK WARNING:
Uninteresting mock function call - returning directly.
Function call: failure(0x7ffdc4df11f0, @0x2b639800b6b0 48-byte object 
90-82 AC-51 63-2B 00-00 00-00 00-00 00-00 00-00 07-00 00-00 00-00 00-00 
70-0A 01-98 63-2B 00-00 20-C7 00-98 63-2B 00-00 00-00 00-00 63-2B 00-00)
...
I1110 20:39:04.566794 29732 master.cpp:7715] Updating the state of task 
e72d5139-0a11-48af-9d43-d4163c1404ee of framework 
2df0125f-4865-4aba-b13d-02f338815729- (latest state: TASK_FAILED, status 
update state: TASK_RUNNING)
...
I1110 20:39:04.569413 29736 scheduler.cpp:675] Enqueuing event UPDATE received 
from http://172.17.0.2:36164/master/api/v1/scheduler
../../src/tests/default_executor_tests.cpp:583: Failure
Value of: taskStates
  Actual: { (df99cc50-9b0f-4692-afc9-d587c3515a67, TASK_KILLED), 
(e72d5139-0a11-48af-9d43-d4163c1404ee, TASK_FAILED) }
Expected: expectedTaskStates
Which is: { (df99cc50-9b0f-4692-afc9-d587c3515a67, TASK_RUNNING), 
(e72d5139-0a11-48af-9d43-d4163c1404ee, TASK_RUNNING) }
...
{noformat}





[jira] [Created] (MESOS-6412) Improve socket connect error message.

2016-10-18 Thread James Peach (JIRA)
James Peach created MESOS-6412:
--

 Summary: Improve socket connect error message.
 Key: MESOS-6412
 URL: https://issues.apache.org/jira/browse/MESOS-6412
 Project: Mesos
  Issue Type: Bug
Reporter: James Peach
Assignee: James Peach


The error from {{Socket::connect}} just says it failed. Improve this to report 
the error (from {{errno}}) and the address we are trying to connect to. This 
gives the operator a fighting chance at debugging.
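
A minimal sketch of the kind of message this asks for, assuming a hypothetical helper named {{formatConnectError}} (not an existing libprocess function) that is handed the {{errno}} value and the peer address:

```cpp
#include <cerrno>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical helper, for illustration only: builds a connect failure
// message that names both the peer address and the errno-derived reason.
std::string formatConnectError(
    int code,
    const std::string& ip,
    std::uint16_t port)
{
  return "Failed to connect to " + ip + ":" + std::to_string(port) +
         ": " + std::strerror(code);
}
```

For example, {{formatConnectError(ECONNREFUSED, "172.17.0.2", 36164)}} yields a message that contains both {{172.17.0.2:36164}} and the system's description of {{ECONNREFUSED}}; the exact wording of the latter depends on the platform's {{strerror}}.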





[jira] [Commented] (MESOS-4621) --disable-optimize triggers optimized builds.

2016-10-21 Thread James Peach (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15595610#comment-15595610
 ] 

James Peach commented on MESOS-4621:


[~bbannier] pinged me on this ... if you like I can resurrect the patch set and 
break it into smaller reviews. ETA would be middle of next week.

> --disable-optimize triggers optimized builds.
> -
>
> Key: MESOS-4621
> URL: https://issues.apache.org/jira/browse/MESOS-4621
> Project: Mesos
>  Issue Type: Bug
>Reporter: Till Toenshoff
>Assignee: Yong Tang
>Priority: Minor
>
> The toggle-logic of the build configuration argument {{optimize}} appears to 
> be implemented incorrectly. When using the perfectly legal invocation:
> {noformat}
> ../configure --disable-optimize
> {noformat}
> What you get here is optimization enabled at {{-O2}}.
> {noformat}
> ccache g++ -Qunused-arguments -fcolor-diagnostics 
> -DPACKAGE_NAME=\"libprocess\" -DPACKAGE_TARNAME=\"libprocess\" 
> -DPACKAGE_VERSION=\"0.0.1\" -DPACKAGE_STRING=\"libprocess\ 0.0.1\" 
> -DPACKAGE_BUGREPORT=\"\" -DPACKAGE_URL=\"\" -DPACKAGE=\"libprocess\" 
> -DVERSION=\"0.0.1\" -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 
> -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 
> -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 -DHAVE_DLFCN_H=1 
> -DLT_OBJDIR=\".libs/\" -DHAVE_APR_POOLS_H=1 -DHAVE_LIBAPR_1=1 
> -DHAVE_SVN_VERSION_H=1 -DHAVE_LIBSVN_SUBR_1=1 -DHAVE_SVN_DELTA_H=1 
> -DHAVE_LIBSVN_DELTA_1=1 -DHAVE_LIBCURL=1 -DHAVE_PTHREAD_PRIO_INHERIT=1 
> -DHAVE_PTHREAD=1 -DHAVE_LIBZ=1 -DHAVE_LIBDL=1 -I. 
> -I../../../../3rdparty/libprocess/3rdparty  
> -I../../../../3rdparty/libprocess/3rdparty/stout/include -Iprotobuf-2.5.0/src 
>  -Igmock-1.7.0/gtest/include -Igmock-1.7.0/include -isystem boost-1.53.0 
> -Ipicojson-1.3.0 -DPICOJSON_USE_INT64 -D__STDC_FORMAT_MACROS -Iglog-0.3.3/src 
> -I/usr/local/opt/openssl/include -I/usr/local/opt/libevent/include 
> -I/usr/local/opt/subversion/include/subversion-1 -I/usr/include/apr-1 
> -I/usr/include/apr-1.0   -O2 -Wno-unused-local-typedef -std=c++11 
> -stdlib=libc++ -DGTEST_USE_OWN_TR1_TUPLE=1 -DGTEST_LANG_CXX11 -MT 
> stout_tests-flags_tests.o -MD -MP -MF .deps/stout_tests-flags_tests.Tpo -c -o 
> stout_tests-flags_tests.o `test -f 'stout/tests/flags_tests.cpp' || echo 
> '../../../../3rdparty/libprocess/3rdparty/'`stout/tests/flags_tests.cpp
> {noformat}
> It seems more straightforward to actually disable optimizing for the above 
> argument.





[jira] [Commented] (MESOS-2537) AC_ARG_ENABLED checks are broken

2016-10-21 Thread James Peach (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15595977#comment-15595977
 ] 

James Peach commented on MESOS-2537:


Would you prefer 1 larger patch, or 1 patch per option (will be ~30 patches)?

> AC_ARG_ENABLED checks are broken
> 
>
> Key: MESOS-2537
> URL: https://issues.apache.org/jira/browse/MESOS-2537
> Project: Mesos
>  Issue Type: Bug
>  Components: build
>Reporter: James Peach
>Assignee: James Peach
>Priority: Minor
>
> In a number of places, the Mesos configure script passes "$foo=yes" as the 
> {{action-if-given}} (3rd) argument of {{AC_ARG_ENABLE}}. However, that 
> argument is invoked when the option is provided in any form, not just when 
> the {{\--enable-foo}} form is used. One result of this is that 
> {{\--disable-optimize}} doesn't work.





[jira] [Commented] (MESOS-2537) AC_ARG_ENABLED checks are broken

2016-10-24 Thread James Peach (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15602294#comment-15602294
 ] 

James Peach commented on MESOS-2537:


|Review: | https://reviews.apache.org/r/53136 |
|Review: | https://reviews.apache.org/r/53137 |
|Review: | https://reviews.apache.org/r/53138 |

> AC_ARG_ENABLED checks are broken
> 
>
> Key: MESOS-2537
> URL: https://issues.apache.org/jira/browse/MESOS-2537
> Project: Mesos
>  Issue Type: Bug
>  Components: build
>Reporter: James Peach
>Assignee: James Peach
>Priority: Minor
>
> In a number of places, the Mesos configure script passes "$foo=yes" as the 
> {{action-if-given}} (3rd) argument of {{AC_ARG_ENABLE}}. However, that 
> argument is invoked when the option is provided in any form, not just when 
> the {{\--enable-foo}} form is used. One result of this is that 
> {{\--disable-optimize}} doesn't work.





[jira] [Commented] (MESOS-6412) Improve socket connect error message.

2016-10-18 Thread James Peach (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15586737#comment-15586737
 ] 

James Peach commented on MESOS-6412:


https://reviews.apache.org/r/52997/

> Improve socket connect error message.
> -
>
> Key: MESOS-6412
> URL: https://issues.apache.org/jira/browse/MESOS-6412
> Project: Mesos
>  Issue Type: Bug
>Reporter: James Peach
>Assignee: James Peach
>
> The error from {{Socket::connect}} just says it failed. Improve this to 
> report the error (from {{errno}}) and the address we are trying to connect 
> to. This gives the operator a fighting chance at debugging.





[jira] [Comment Edited] (MESOS-6588) LinuxRootfs misses required files

2016-11-15 Thread James Peach (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15667754#comment-15667754
 ] 

James Peach edited comment on MESOS-6588 at 11/15/16 5:37 PM:
--

|Move containerizer Rootfs support to a cpp file. |[https://reviews.apache.org/r/53790|https://reviews.apache.org/r/53790] |
|Use the stout ELF parser to collect Linux rootfs files. |[https://reviews.apache.org/r/53791|https://reviews.apache.org/r/53791] |


was (Author: jamespeach):
|Move containerizer Rootfs support to a cpp file. |https://reviews.apache.org/r/53790 |
|Use the stout ELF parser to collect Linux rootfs files. |https://reviews.apache.org/r/53791 |

> LinuxRootfs misses required files
> -
>
> Key: MESOS-6588
> URL: https://issues.apache.org/jira/browse/MESOS-6588
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, tests
>Reporter: James Peach
>Assignee: James Peach
>
> The hard-coded list of required files in 
> {{src/tests/containerizer/rootfs.hpp}} is out of date for Fedora 24. F24 now 
> requires {{libtinfo.so.6}} and {{/lib64/libcrypto.so.10}}.





[jira] [Commented] (MESOS-6011) Tests relying on LinuxRootfs::create fail with Fedora 24 (and probably others)

2016-11-18 Thread James Peach (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15677559#comment-15677559
 ] 

James Peach commented on MESOS-6011:


Duplicating to MESOS-6588, where I have a review request along the lines of 
[~bbannier]'s suggestion.

> Tests relying on LinuxRootfs::create fail with Fedora 24 (and probably others)
> --
>
> Key: MESOS-6011
> URL: https://issues.apache.org/jira/browse/MESOS-6011
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, tests
>Reporter: Jan Schlicht
>Assignee: Gilbert Song
>  Labels: test
>
> Tests like {{AppcProvisionerIntegrationTest.ROOT_SimpleLinuxImageTest}} and 
> {{ProvisionerDockerPullerTest.ROOT_LocalPullerSimpleCommand}} will fail with 
> Fedora 24 because {{LinuxRootfs::create}}, used in these tests, assumes that 
> the binaries provided by the rootfs link to certain versions of shared 
> libraries. Because Fedora 24 has newer versions of some of these libraries, 
> tests using the binaries will fail. E.g.
> {noformat}
> $ ldd /bin/sh
>   linux-vdso.so.1 (0x7ffc98bfb000)
>   libtinfo.so.6 => /lib64/libtinfo.so.6 (0x7fcd59df6000)
>   libdl.so.2 => /lib64/libdl.so.2 (0x7fcd59bf2000)
>   libc.so.6 => /lib64/libc.so.6 (0x7fcd5982f000)
>   /lib64/ld-linux-x86-64.so.2 (0x55fb8e6ea000)
> {noformat}
> but {{LinuxRootfs::create}} will try to copy {{/lib64/libtinfo.so.5}} into 
> the rootfs.
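
One way to avoid this kind of mismatch is to derive the library list from the binary instead of hard-coding it. As a rough sketch only (the review attached to MESOS-6588 uses the stout ELF parser, not this approach), a hypothetical helper {{lddPath}} can pull the resolved path out of each {{ldd}} output line:

```cpp
#include <string>

// Hypothetical helper, for illustration only. Extracts the resolved file
// path from one line of `ldd` output, or returns an empty string when the
// line names no file on disk (e.g. linux-vdso).
std::string lddPath(const std::string& line)
{
  std::string rest = line;

  // "name => /path (addr)": keep what follows the arrow.
  const std::string arrow = " => ";
  std::string::size_type pos = rest.find(arrow);
  if (pos != std::string::npos) {
    rest = rest.substr(pos + arrow.size());
  }

  // Drop the trailing "(0x...)" load address.
  pos = rest.find(" (");
  if (pos != std::string::npos) {
    rest = rest.substr(0, pos);
  }

  // Trim leading whitespace.
  pos = rest.find_first_not_of(" \t");
  if (pos == std::string::npos) {
    return std::string();
  }
  rest = rest.substr(pos);

  // Only absolute paths refer to files that could be copied into a rootfs.
  return rest[0] == '/' ? rest : std::string();
}
```

Fed the {{ldd /bin/sh}} output quoted above, this returns {{/lib64/libtinfo.so.6}} for the {{libtinfo.so.6}} line and an empty string for {{linux-vdso.so.1}}, which has no file on disk.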





[jira] [Commented] (MESOS-543) proper auto-tools dependency checking and packaging

2016-11-17 Thread James Peach (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15675812#comment-15675812
 ] 

James Peach commented on MESOS-543:
---

[~opoplawski] I tried to tackle this problem in 
[r35084|https://reviews.apache.org/r/35084/]. What do you think about that 
approach? At the time, I was quite unsuccessful in getting a build against the 
Fedora versions of gtest/gmock/glog.

> proper auto-tools dependency checking and packaging
> ---
>
> Key: MESOS-543
> URL: https://issues.apache.org/jira/browse/MESOS-543
> Project: Mesos
>  Issue Type: Improvement
>  Components: build
>Affects Versions: 0.12.0
>Reporter: Timothy St. Clair
>  Labels: build
>
> Currently mesos repo includes direct version dependencies in tarballs and 
> does not do proper m4 system dependency checking by default && optional 
> --with(ver).  That is the standard practice for downstream channel adoption.  
> For more details see: 
> https://fedoraproject.org/wiki/Packaging:Guidelines?rd=Packaging/Guidelines
> Separate fedora related packaging is being driven through: 
> https://fedoraproject.org/wiki/Big_data_SIG (terrible catch all name) 
> ---
> stout breakout: (enable stout to build and rev as an independent package)
> repo: https://github.com/besser82/stout
> rpm: https://github.com/ignatenkobrain/stout-rpm
> pull request: https://github.com/3rdparty/stout/pull/4
> packaging BZ: https://bugzilla.redhat.com/show_bug.cgi?id=988545
> ---
> libprocess breakout: (enable libprocess to build and rev as an independent 
> package)
> repo: https://github.com/ignatenkobrain/libprocess 
> rpm: https://github.com/ignatenkobrain/libprocess-rpm
> pull request: https://github.com/3rdparty/libprocess/pull/4
> packaging BZ: https://bugzilla.redhat.com/show_bug.cgi?id=994152 
> ---
> mesos build cleaning: (remove all 3rd party cruft and do proper checks for 
> libprocess & stout) 
> repo: https://github.com/timothysc/mesos 
> rpm: https://github.com/timothysc/mesos-rpm 
> packaging BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1010512
> Please see: https://fedoraproject.org/wiki/Mesos_packaging for details.





[jira] [Assigned] (MESOS-5393) XFS disk isolator should disallow sandbox writes when no 'disk' is used in executor/task

2016-11-15 Thread James Peach (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Peach reassigned MESOS-5393:
--

Assignee: James Peach

> XFS disk isolator should disallow sandbox writes when no 'disk' is used in 
> executor/task
> 
>
> Key: MESOS-5393
> URL: https://issues.apache.org/jira/browse/MESOS-5393
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Affects Versions: 1.0.0
>Reporter: Yan Xu
>Assignee: James Peach
>
> This is similar to MESOS-5081 and was left as a TODO in the first patch for 
> the XFS isolator.
> {noformat}
> // TODO(jpeach) If there's no disk resource attached, we should set the
> // minimum quota (1 block), since a zero quota would be unconstrained.
> {noformat}
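
The TODO can be read as a clamp on the block limit. A minimal sketch, assuming the limit is expressed in 512-byte basic blocks and that a missing disk resource is represented as 0 bytes; {{quotaBlocks}} and {{BASIC_BLOCK_SIZE}} are hypothetical names, not the isolator's:

```cpp
#include <cstdint>

// Assumption for this sketch: quota limits are counted in 512-byte
// "basic blocks".
const std::uint64_t BASIC_BLOCK_SIZE = 512;

// Hypothetical helper, for illustration only. Converts a disk resource in
// bytes into a quota limit in basic blocks, rounding down. A limit of 0
// would mean "unconstrained", so the result is never less than 1 block.
std::uint64_t quotaBlocks(std::uint64_t diskBytes)
{
  const std::uint64_t blocks = diskBytes / BASIC_BLOCK_SIZE;
  return blocks == 0 ? 1 : blocks;
}
```

With this clamp, a container with no disk resource gets a 1-block limit rather than an unlimited sandbox, while a 1 MB resource maps to 2048 blocks.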





[jira] [Commented] (MESOS-6575) Change `disk/xfs` isolator to terminate executor when it exceeds quota

2016-11-15 Thread James Peach (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15668934#comment-15668934
 ] 

James Peach commented on MESOS-6575:


A significant benefit of the {{disk/xfs}} isolator is that it doesn't kill the 
task, so I'm not very supportive of this. I suppose that it could be 
implemented as an additional feature flag, but I'm not sure why you would want 
this. IMHO the behavior of the {{disk/du}} isolator is pretty undesirable.

> Change `disk/xfs` isolator to terminate executor when it exceeds quota
> --
>
> Key: MESOS-6575
> URL: https://issues.apache.org/jira/browse/MESOS-6575
> Project: Mesos
>  Issue Type: Task
>  Components: isolation, slave
>Reporter: Santhosh Kumar Shanmugham
>
> Unlike the {{disk/du}} isolator, which sends a {{ContainerLimitation}} 
> protobuf when the executor exceeds its quota, the {{disk/xfs}} isolator 
> relies on XFS's internal quota enforcement: the {{write}} operation that 
> would exceed the quota fails silently, and the quota breach is never 
> surfaced.
> This task is to change the {{disk/xfs}} isolator so that a 
> {{ContainerLimitation}} message is triggered when the quota is exceeded. 
> This feature will rely on the underlying filesystem being mounted with 
> {{pqnoenforce}} (accounting-only mode), so that XFS does not fail writes 
> that exceed the quota with {{EDQUOT}}. The isolator can then track disk 
> usage via {{xfs_quota}}, much as {{disk/du}} does using {{du}}, every 
> {{container_disk_watch_interval}} and surface the quota-exceeded event via 
> a {{ContainerLimitation}} protobuf, causing the executor to be terminated. 
> This feature can then be turned on/off via the existing 
> {{enforce_container_disk_quota}} option.





[jira] [Assigned] (MESOS-5116) Investigate supporting accounting only mode in XFS isolator

2016-11-15 Thread James Peach (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Peach reassigned MESOS-5116:
--

Assignee: James Peach

> Investigate supporting accounting only mode in XFS isolator
> ---
>
> Key: MESOS-5116
> URL: https://issues.apache.org/jira/browse/MESOS-5116
> Project: Mesos
>  Issue Type: Improvement
>  Components: containerization
>Reporter: Yan Xu
>Assignee: James Peach
>
> The initial implementation of XFS isolator always enforces the disk quota 
> limit. In contrast, Posix disk isolator supports optionally monitoring the 
> disk usage without enforcement. This eases the transition into disk quota 
> enforcement mode.
> Mesos agent provides a {{flags.enforce_container_disk_quota}} flag to turn on 
> enforcement when the Posix isolator is added. With XFS either we support it 
> as well or we need to change the flag so it's Posix disk isolator specific.





[jira] [Assigned] (MESOS-5158) Provide XFS quota support for persistent volumes.

2016-11-15 Thread James Peach (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Peach reassigned MESOS-5158:
--

Assignee: James Peach

> Provide XFS quota support for persistent volumes.
> -
>
> Key: MESOS-5158
> URL: https://issues.apache.org/jira/browse/MESOS-5158
> Project: Mesos
>  Issue Type: Improvement
>  Components: containerization
>Reporter: Yan Xu
>Assignee: James Peach
>
> Given that the lifecycle of persistent volumes is managed outside of the 
> isolator, we may need to further abstract out the quota management 
> functionality to do it outside the XFS isolator.





[jira] [Commented] (MESOS-6196) Make the disk/xfs isolator nesting aware.

2016-11-15 Thread James Peach (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15668947#comment-15668947
 ] 

James Peach commented on MESOS-6196:


FWIW I don't think that there's anything to do here as long as

* the XFS quota is applied to the enclosing task group directory
* all sub-containers have their scratch space within that directory
* the sub-container disk resource is a subset of the task group disk resource

One reason for adding nesting support would be to restrict the disk resource of 
individual sub-containers. Not sure whether that is expected to be part of the 
nesting semantics.

> Make the disk/xfs isolator nesting aware.
> -
>
> Key: MESOS-6196
> URL: https://issues.apache.org/jira/browse/MESOS-6196
> Project: Mesos
>  Issue Type: Task
>Reporter: Jie Yu
>






[jira] [Comment Edited] (MESOS-6557) IPC namespace isolator

2016-11-14 Thread James Peach (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15664897#comment-15664897
 ] 

James Peach edited comment on MESOS-6557 at 11/14/16 8:19 PM:
--

|Implement a namespace/ipc isolator. |[https://reviews.apache.org/r/53688/|https://reviews.apache.org/r/53688/] |
|Use a common fixture for the PID namespace test. |[https://reviews.apache.org/r/53689/|https://reviews.apache.org/r/53689/] |
|Add namespaces/ipc documentation. |[https://reviews.apache.org/r/53690/|https://reviews.apache.org/r/53690/] |


was (Author: jamespeach):
|Implement a namespace/ipc isolator. |https://reviews.apache.org/r/53688/ |
|Add namespaces/ipc documentation. |https://reviews.apache.org/r/53690/ |

> IPC namespace isolator
> --
>
> Key: MESOS-6557
> URL: https://issues.apache.org/jira/browse/MESOS-6557
> Project: Mesos
>  Issue Type: Improvement
>  Components: containerization
>Reporter: James Peach
>Assignee: James Peach
>
> Add a {{namespace/ipc}} isolator for creating an IPC namespace.





[jira] [Created] (MESOS-6588) LinuxRootfs misses required files

2016-11-14 Thread James Peach (JIRA)
James Peach created MESOS-6588:
--

 Summary: LinuxRootfs misses required files
 Key: MESOS-6588
 URL: https://issues.apache.org/jira/browse/MESOS-6588
 Project: Mesos
  Issue Type: Bug
  Components: containerization, tests
Reporter: James Peach


The hard-coded list of required files in {{src/tests/containerizer/rootfs.hpp}} 
is out of date for Fedora 24. F24 now requires {{libtinfo.so.6}} and 
{{/lib64/libcrypto.so.10}}.







[jira] [Updated] (MESOS-6588) LinuxRootfs misses required files

2016-11-14 Thread James Peach (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Peach updated MESOS-6588:
---
Shepherd: Yan Xu

> LinuxRootfs misses required files
> -
>
> Key: MESOS-6588
> URL: https://issues.apache.org/jira/browse/MESOS-6588
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, tests
>Reporter: James Peach
>Assignee: James Peach
>
> The hard-coded list of required files in 
> {{src/tests/containerizer/rootfs.hpp}} is out of date for Fedora 24. F24 now 
> requires {{libtinfo.so.6}} and {{/lib64/libcrypto.so.10}}.





[jira] [Created] (MESOS-6520) Make errno an explicit argument for ErrnoError.

2016-10-31 Thread James Peach (JIRA)
James Peach created MESOS-6520:
--

 Summary: Make errno an explicit argument for ErrnoError.
 Key: MESOS-6520
 URL: https://issues.apache.org/jira/browse/MESOS-6520
 Project: Mesos
  Issue Type: Bug
  Components: technical debt
Reporter: James Peach
Priority: Minor


Make {{errno}} an explicit argument to {{ErrnoError}}. Right now, the 
constructor to {{ErrnoError}} references {{errno}} directly, which makes it 
awkward to pass a custom {{errno}} value (you have to set {{errno}} globally).
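
A minimal sketch of the shape this could take, assuming a hypothetical stand-in class named {{ErrnoErrorSketch}} (this is not the actual stout {{ErrnoError}}, and the constructor signatures are illustrative only). The default constructor still captures the global {{errno}}, while the explicit overloads let the caller pass any error code without touching global state.

```cpp
#include <cerrno>
#include <cstring>
#include <string>

// Hypothetical stand-in for an errno-carrying error type. The name and
// constructors are assumptions for illustration, not the stout API.
class ErrnoErrorSketch
{
public:
  // Convenience form: capture the calling thread's current errno.
  ErrnoErrorSketch() : ErrnoErrorSketch(errno) {}

  // Explicit form: the caller supplies the error code directly.
  explicit ErrnoErrorSketch(int _code)
    : code(_code), message(std::strerror(_code)) {}

  // Explicit form with a message prefix, e.g. the failing operation.
  ErrnoErrorSketch(int _code, const std::string& prefix)
    : code(_code), message(prefix + ": " + std::strerror(_code)) {}

  const int code;
  const std::string message;
};
```

With this shape, a caller that already holds an error code (for example one returned by a library call) can write {{ErrnoErrorSketch(ENOENT, "open")}} instead of assigning to {{errno}} first.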





[jira] [Assigned] (MESOS-6520) Make errno an explicit argument for ErrnoError.

2016-11-03 Thread James Peach (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Peach reassigned MESOS-6520:
--

Assignee: James Peach

> Make errno an explicit argument for ErrnoError.
> ---
>
> Key: MESOS-6520
> URL: https://issues.apache.org/jira/browse/MESOS-6520
> Project: Mesos
>  Issue Type: Bug
>  Components: technical debt
>Reporter: James Peach
>Assignee: James Peach
>Priority: Minor
>
> Make {{errno}} an explicit argument to {{ErrnoError}}. Right now, the 
> constructor to {{ErrnoError}} references {{errno}} directly, which makes it 
> awkward to pass a custom {{errno}} value (you have to set {{errno}} globally).





[jira] [Commented] (MESOS-6520) Make errno an explicit argument for ErrnoError.

2016-11-04 Thread James Peach (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15636698#comment-15636698
 ] 

James Peach commented on MESOS-6520:


| Support explicit error codes in ErrnoError and SocketError. | https://reviews.apache.org/r/53474/ |
| Use explicit error codes in ErrnoError and SocketError. | https://reviews.apache.org/r/53475/ |

> Make errno an explicit argument for ErrnoError.
> ---
>
> Key: MESOS-6520
> URL: https://issues.apache.org/jira/browse/MESOS-6520
> Project: Mesos
>  Issue Type: Bug
>  Components: technical debt
>Reporter: James Peach
>Assignee: James Peach
>Priority: Minor
>
> Make {{errno}} an explicit argument to {{ErrnoError}}. Right now, the 
> constructor to {{ErrnoError}} references {{errno}} directly, which makes it 
> awkward to pass a custom {{errno}} value (you have to set {{errno}} globally).





[jira] [Updated] (MESOS-6385) Document how containerization works in terms of entering namespaces / setting up cgroups, etc.

2016-10-26 Thread James Peach (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Peach updated MESOS-6385:
---
Summary: Document how containerization works in terms of entering 
namespaces / setting up cgroups, etc.  (was: Document how containierization 
works in terms of entering namespaces / setting up cgroups, etc.)

> Document how containerization works in terms of entering namespaces / setting 
> up cgroups, etc.
> --
>
> Key: MESOS-6385
> URL: https://issues.apache.org/jira/browse/MESOS-6385
> Project: Mesos
>  Issue Type: Task
>  Components: containerization, documentation
>Reporter: Kevin Klues
>Priority: Minor
>
> There is currently a lot of tribal knowledge in what it means to actually 
> set up a container and launch a process inside of it.  It would be nice to see 
> some documentation produced which outlines the exact process of launching a 
> new container, as well as the process involved in executing a new task inside 
> that container (or as a nested container, which shares some portion of the 
> container, but not all of it).





[jira] [Commented] (MESOS-6772) Stop building mesos-slave.

2016-12-09 Thread James Peach (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15736652#comment-15736652
 ] 

James Peach commented on MESOS-6772:


Rather than building binaries for both {{mesos-agent}} and {{mesos-slave}}, 
just install a symlink from the latter to the former.

> Stop building mesos-slave.
> --
>
> Key: MESOS-6772
> URL: https://issues.apache.org/jira/browse/MESOS-6772
> Project: Mesos
>  Issue Type: Bug
>Reporter: James Peach
>Assignee: James Peach
>






[jira] [Created] (MESOS-6772) Stop building mesos-slave.

2016-12-09 Thread James Peach (JIRA)
James Peach created MESOS-6772:
--

 Summary: Stop building mesos-slave.
 Key: MESOS-6772
 URL: https://issues.apache.org/jira/browse/MESOS-6772
 Project: Mesos
  Issue Type: Bug
Reporter: James Peach
Assignee: James Peach








[jira] [Created] (MESOS-6732) XFS disk isolator should check whether quotas are enabled

2016-12-06 Thread James Peach (JIRA)
James Peach created MESOS-6732:
--

 Summary: XFS disk isolator should check whether quotas are enabled
 Key: MESOS-6732
 URL: https://issues.apache.org/jira/browse/MESOS-6732
 Project: Mesos
  Issue Type: Bug
  Components: containerization
Reporter: James Peach
Assignee: James Peach


If quotas are not enabled, the XFS disk isolator doesn't notice until it fails 
to set quotas for a task. It would be better to fail at startup if we know that 
quotas are not enabled.





[jira] [Commented] (MESOS-6920) Validate the UUID in Master::statusUpdate.

2017-01-13 Thread James Peach (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15822023#comment-15822023
 ] 

James Peach commented on MESOS-6920:


| Validate the StatusUpdate UUID in Master::statusUpdate. | https://reviews.apache.org/r/55509/ |

> Validate the UUID in Master::statusUpdate.
> --
>
> Key: MESOS-6920
> URL: https://issues.apache.org/jira/browse/MESOS-6920
> Project: Mesos
>  Issue Type: Bug
>Reporter: James Peach
>Assignee: James Peach
>
> Validate the UUID in Master::statusUpdate() to avoid the possibility of 
> triggering a CHECK when logging the {{StatusUpdate}}.





[jira] [Updated] (MESOS-6652) Perf version not correctly parsed on Fedora 24 (and probably others)

2017-01-13 Thread James Peach (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Peach updated MESOS-6652:
---
Shepherd: Yan Xu

> Perf version not correctly parsed on Fedora 24 (and probably others)
> 
>
> Key: MESOS-6652
> URL: https://issues.apache.org/jira/browse/MESOS-6652
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Affects Versions: 1.1.0
> Environment: Fedora 24
>Reporter: Jan Schlicht
>Assignee: James Peach
>Priority: Minor
>
> Happened on a current Fedora 24 machine, when trying to run tests.
> {noformat}
> $ perf --version
> perf version 4.8.10.200.fc24.x86_64.gc23c
> {noformat}
> doesn't seem to be parsed correctly by {{perf::supported()}}, because when 
> running {{./bin/mesos-tests.sh}} it reads
> {noformat}
> -
> Could not find the 'perf' command or its version lower that 2.6.39 so tests 
> using it to sample the 'cpu-cycles' hardware event will not be run.
> -
> -
> require 'perf' version >= 2.6.39 so no 'perf' tests will be run
> -
> {noformat}





[jira] [Assigned] (MESOS-6652) Perf version not correctly parsed on Fedora 24 (and probably others)

2017-01-13 Thread James Peach (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Peach reassigned MESOS-6652:
--

Assignee: James Peach

> Perf version not correctly parsed on Fedora 24 (and probably others)
> 
>
> Key: MESOS-6652
> URL: https://issues.apache.org/jira/browse/MESOS-6652
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Affects Versions: 1.1.0
> Environment: Fedora 24
>Reporter: Jan Schlicht
>Assignee: James Peach
>Priority: Minor
>
> Happened on a current Fedora 24 machine, when trying to run tests.
> {noformat}
> $ perf --version
> perf version 4.8.10.200.fc24.x86_64.gc23c
> {noformat}
> doesn't seem to be parsed correctly by {{perf::supported()}}, because when 
> running {{./bin/mesos-tests.sh}} it reads
> {noformat}
> -
> Could not find the 'perf' command or its version lower that 2.6.39 so tests 
> using it to sample the 'cpu-cycles' hardware event will not be run.
> -
> -
> require 'perf' version >= 2.6.39 so no 'perf' tests will be run
> -
> {noformat}





[jira] [Commented] (MESOS-6652) Perf version not correctly parsed on Fedora 24 (and probably others)

2017-01-13 Thread James Peach (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15822315#comment-15822315
 ] 

James Peach commented on MESOS-6652:


| Handle perf versions with more than 3 components. | https://reviews.apache.org/r/55521/ |

> Perf version not correctly parsed on Fedora 24 (and probably others)
> 
>
> Key: MESOS-6652
> URL: https://issues.apache.org/jira/browse/MESOS-6652
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Affects Versions: 1.1.0
> Environment: Fedora 24
>Reporter: Jan Schlicht
>Priority: Minor
>
> Happened on a current Fedora 24 machine, when trying to run tests.
> {noformat}
> $ perf --version
> perf version 4.8.10.200.fc24.x86_64.gc23c
> {noformat}
> doesn't seem to be parsed correctly by {{perf::supported()}}, because when 
> running {{./bin/mesos-tests.sh}} it reads
> {noformat}
> -
> Could not find the 'perf' command or its version lower that 2.6.39 so tests 
> using it to sample the 'cpu-cycles' hardware event will not be run.
> -
> -
> require 'perf' version >= 2.6.39 so no 'perf' tests will be run
> -
> {noformat}





[jira] [Updated] (MESOS-6920) Validate the UUID in Master::statusUpdate.

2017-01-13 Thread James Peach (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Peach updated MESOS-6920:
---
Shepherd: Yan Xu

> Validate the UUID in Master::statusUpdate.
> --
>
> Key: MESOS-6920
> URL: https://issues.apache.org/jira/browse/MESOS-6920
> Project: Mesos
>  Issue Type: Bug
>Reporter: James Peach
>Assignee: James Peach
>
> Validate the UUID in Master::statusUpdate() to avoid the possibility of 
> triggering a CHECK when logging the {{StatusUpdate}}.





[jira] [Assigned] (MESOS-6920) Validate the UUID in Master::statusUpdate.

2017-01-13 Thread James Peach (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Peach reassigned MESOS-6920:
--

Assignee: James Peach

> Validate the UUID in Master::statusUpdate.
> --
>
> Key: MESOS-6920
> URL: https://issues.apache.org/jira/browse/MESOS-6920
> Project: Mesos
>  Issue Type: Bug
>Reporter: James Peach
>Assignee: James Peach
>
> Validate the UUID in Master::statusUpdate() to avoid the possibility of 
> triggering a CHECK when logging the {{StatusUpdate}}.





[jira] [Created] (MESOS-6920) Validate the UUID in Master::statusUpdate.

2017-01-13 Thread James Peach (JIRA)
James Peach created MESOS-6920:
--

 Summary: Validate the UUID in Master::statusUpdate.
 Key: MESOS-6920
 URL: https://issues.apache.org/jira/browse/MESOS-6920
 Project: Mesos
  Issue Type: Bug
Reporter: James Peach


Validate the UUID in Master::statusUpdate() to avoid the possibility of 
triggering a CHECK when logging the {{StatusUpdate}}.
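
A minimal sketch of the kind of check meant here, assuming the UUID travels as a raw 16-byte string inside the update. {{StatusUpdateSketch}} and {{acceptUpdate}} are hypothetical names for illustration and are not the Mesos protobuf or master API.

```cpp
#include <cstddef>
#include <string>

// A binary UUID is 16 bytes long.
constexpr std::size_t kUuidSize = 16;

// Hypothetical stand-in for the message carrying the UUID bytes.
struct StatusUpdateSketch
{
  std::string uuid;
};

// Returns true if the update carries a well-formed UUID and is safe to
// pass on to code (such as logging) that assumes a valid UUID; returns
// false if the update should be dropped instead.
bool acceptUpdate(const StatusUpdateSketch& update)
{
  return update.uuid.size() == kUuidSize;
}
```

The point of the design is to reject malformed input at the boundary, so that later code can keep its assertions without a remote sender being able to trigger them.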





[jira] [Created] (MESOS-6918) Prometheus exporter endpoints for metrics

2017-01-13 Thread James Peach (JIRA)
James Peach created MESOS-6918:
--

 Summary: Prometheus exporter endpoints for metrics
 Key: MESOS-6918
 URL: https://issues.apache.org/jira/browse/MESOS-6918
 Project: Mesos
  Issue Type: Bug
  Components: statistics
Reporter: James Peach


There are a couple of [Prometheus|https://prometheus.io] metrics exporters for 
Mesos, of varying quality. Since the Mesos stats system actually knows about 
statistics data types and semantics, and Mesos has reasonable HTTP support, we 
could add Prometheus metrics endpoints to directly expose statistics in 
[Prometheus wire 
format|https://prometheus.io/docs/instrumenting/exposition_formats/], removing 
the need for operators to run separate exporter processes.
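
For illustration, a minimal sketch of rendering one gauge in the Prometheus text exposition format. The helper names {{sanitizeName}} and {{renderGauge}} are hypothetical, and the sketch assumes that metric names containing characters such as {{/}} would need to be mapped onto the Prometheus name alphabet ({{[a-zA-Z_:][a-zA-Z0-9_:]*}}).

```cpp
#include <cctype>
#include <sstream>
#include <string>

// Map an arbitrary metric name onto the Prometheus metric name alphabet
// by replacing every other character with '_'.
std::string sanitizeName(const std::string& name)
{
  std::string out;
  for (char c : name) {
    const bool ok =
      std::isalnum(static_cast<unsigned char>(c)) || c == '_' || c == ':';
    out.push_back(ok ? c : '_');
  }

  // A metric name must not be empty or start with a digit.
  if (out.empty() || std::isdigit(static_cast<unsigned char>(out[0]))) {
    out.insert(out.begin(), '_');
  }

  return out;
}

// Render a single gauge sample: a "# TYPE" line followed by the sample.
std::string renderGauge(const std::string& name, double value)
{
  const std::string metric = sanitizeName(name);

  std::ostringstream out;
  out << "# TYPE " << metric << " gauge\n"
      << metric << " " << value << "\n";
  return out.str();
}
```

For example, {{renderGauge("master/uptime_secs", 42)}} yields a {{# TYPE master_uptime_secs gauge}} line followed by {{master_uptime_secs 42}}.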





[jira] [Updated] (MESOS-6862) Replace os::system usages to reduce the risk of command injection.

2017-01-05 Thread James Peach (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Peach updated MESOS-6862:
---
Description: 
There are a number of places where {{os::system}} is used for convenience. To 
reduce the risk of command injection, we should replace most of these with 
{{subprocess}} or {{os::spawn}} and not execute them with the shell.

| posix/chown.hpp | {{os::chown}} | Replace with fts(3). |
| launcher/fetcher.cpp | {{extract()}} | Replace with {{subprocess}}. |
| launcher/fetcher.cpp | {{copyFile}} | Replace with {{subprocess}}. |
| linux/perf.cpp | {{valid()}} | Replace with {{subprocess}}. |
| cni/cni.cpp | {{NetworkCniIsolatorSetup::execute()}} | Not a problem, but should use {{subprocess}} for consistency. |
| -port_mapper/port_mapper.cpp- | -{{PortMapper::addPortMapping()}}- | -Replace with {{subprocess}}.- |
| -port_mapper/port_mapper.cpp- | -{{PortMapper::delPortMapping()}}- | -Replace with {{subprocess}}.- |

In the above table, read "replacement" as replacement with {{os::spawn}} or 
{{subprocess}} as appropriate.

  was:
There are a number of places where {{os::system}} is used for convenience. To 
reduce the risk of command injection, we should replace most of these with 
{{subprocess}} or {{os::spawn}} and not execute them with the shell.

| posix/chown.hpp | {{os::chown}} | Replace with fts(3). |
|launcher/fetcher.cpp | {{extract()}} | Replace with {{subprocess}}. |
| launcher/fetcher.cpp | {{copyFile}} | Replace with {{subprocess}}. |
| linux/perf.cpp | {{valid()}} | Replace with {{subprocess}}. |
| cni/cni.cpp | {{NetworkCniIsolatorSetup::execute()}} | Not a problem, but 
should use {{subprocess}} for consistency. |
| port_mapper/port_mapper.cpp | {{PortMapper::addPortMapping()}} | Replace with 
{{subprocess}}. |
| port_mapper/port_mapper.cpp | {{PortMapper::delPortMapping()}} | Replace with 
{{subprocess}}. |

In the above table, read "replacement" as replacement with {{os::spawn}} or 
{{subprocess}} as appropriate.


> Replace os::system usages to reduce the risk of command injection.
> --
>
> Key: MESOS-6862
> URL: https://issues.apache.org/jira/browse/MESOS-6862
> Project: Mesos
>  Issue Type: Bug
>Reporter: James Peach
>Assignee: James Peach
>
> There are a number of places where {{os::system}} is used for convenience. To 
> reduce the risk of command injection, we should replace most of these with 
> {{subprocess}} or {{os::spawn}} and not execute them with the shell.
> | posix/chown.hpp | {{os::chown}} | Replace with fts(3). |
> |launcher/fetcher.cpp | {{extract()}} | Replace with {{subprocess}}. |
> | launcher/fetcher.cpp | {{copyFile}} | Replace with {{subprocess}}. |
> | linux/perf.cpp | {{valid()}} | Replace with {{subprocess}}. |
> | cni/cni.cpp | {{NetworkCniIsolatorSetup::execute()}} | Not a problem, but 
> should use {{subprocess}} for consistency. |
> | -port_mapper/port_mapper.cpp- | -{{PortMapper::addPortMapping()}}- | 
> -Replace with {{subprocess}}.- |
> | -port_mapper/port_mapper.cpp- | -{{PortMapper::delPortMapping()}}- | 
> -Replace with {{subprocess}}.- |
> In the above table, read "replacement" as replacement with {{os::spawn}} or 
> {{subprocess}} as appropriate.





[jira] [Assigned] (MESOS-6862) Replace os::system usages to reduce the risk of command injection.

2017-01-05 Thread James Peach (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Peach reassigned MESOS-6862:
--

Assignee: James Peach

> Replace os::system usages to reduce the risk of command injection.
> --
>
> Key: MESOS-6862
> URL: https://issues.apache.org/jira/browse/MESOS-6862
> Project: Mesos
>  Issue Type: Bug
>Reporter: James Peach
>Assignee: James Peach
>
> There are a number of places where {{os::system}} is used for convenience. To 
> reduce the risk of command injection, we should replace most of these with 
> {{subprocess}} or {{os::spawn}} and not execute them with the shell.
> | posix/chown.hpp | {{os::chown}} | Replace with fts(3). |
> |launcher/fetcher.cpp | {{extract()}} | Replace with {{subprocess}}. |
> | launcher/fetcher.cpp | {{copyFile}} | Replace with {{subprocess}}. |
> | linux/perf.cpp | {{valid()}} | Replace with {{subprocess}}. |
> | cni/cni.cpp | {{NetworkCniIsolatorSetup::execute()}} | Not a problem, but 
> should use {{subprocess}} for consistency. |
> | port_mapper/port_mapper.cpp | {{PortMapper::addPortMapping()}} | Replace 
> with {{subprocess}}. |
> | port_mapper/port_mapper.cpp | {{PortMapper::delPortMapping()}} | Replace 
> with {{subprocess}}. |
> In the above table, read "replacement" as replacement with {{os::spawn}} or 
> {{subprocess}} as appropriate.





[jira] [Updated] (MESOS-6556) Hostname support for the network/cni isolator.

2017-01-05 Thread James Peach (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Peach updated MESOS-6556:
---
Description: 
-Add a {{namespace/uts}} isolator for doing UTS namespace isolation without 
using the CNI isolator.-

Update the {{network/cni}} isolator to set the hostname specified by the task 
info.

  was:Add a {{namespace/uts}} isolator for doing UTS namespace isolation 
without using the CNI isolator.


> Hostname support for the network/cni isolator.
> --
>
> Key: MESOS-6556
> URL: https://issues.apache.org/jira/browse/MESOS-6556
> Project: Mesos
>  Issue Type: Improvement
>  Components: containerization
>Reporter: James Peach
>Assignee: James Peach
>Priority: Minor
>
> -Add a {{namespace/uts}} isolator for doing UTS namespace isolation without 
> using the CNI isolator.-
> Update the {{network/cni}} isolator to set the hostname specified by the task 
> info.





[jira] [Updated] (MESOS-6556) Hostname support for the network/cni isolator.

2017-01-05 Thread James Peach (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Peach updated MESOS-6556:
---
Summary: Hostname support for the network/cni isolator.  (was: UTS 
namespace isolator)

> Hostname support for the network/cni isolator.
> --
>
> Key: MESOS-6556
> URL: https://issues.apache.org/jira/browse/MESOS-6556
> Project: Mesos
>  Issue Type: Improvement
>  Components: containerization
>Reporter: James Peach
>Assignee: James Peach
>Priority: Minor
>
> Add a {{namespace/uts}} isolator for doing UTS namespace isolation without 
> using the CNI isolator.





[jira] [Created] (MESOS-6858) network/cni isolator generates incomplete resolv.conf

2017-01-05 Thread James Peach (JIRA)
James Peach created MESOS-6858:
--

 Summary: network/cni isolator generates incomplete resolv.conf
 Key: MESOS-6858
 URL: https://issues.apache.org/jira/browse/MESOS-6858
 Project: Mesos
  Issue Type: Bug
  Components: isolation, network
Reporter: James Peach


The CNI [network 
configuration|https://github.com/containernetworking/cni/blob/master/SPEC.md#network-configuration]
 dictionary contains entries for the {{/etc/resolv.conf}} fields 
{{nameservers}}, {{domain}}, {{search}} and {{options}}.

In {{NetworkCniIsolatorProcess::_isolate()}}, the {{network/cni}} isolator only 
emits the {{nameservers}} and ignores the remaining fields.
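
A minimal sketch of what emitting all four fields could look like. {{DnsSketch}} and {{renderResolvConf}} are hypothetical names for illustration, not the isolator's actual types; the sketch assumes the DNS entries have already been parsed out of the network configuration.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical holder for the DNS fields of a CNI network configuration.
struct DnsSketch
{
  std::vector<std::string> nameservers;
  std::string domain;
  std::vector<std::string> search;
  std::vector<std::string> options;
};

// Render the fields as resolv.conf lines, skipping any that are unset.
std::string renderResolvConf(const DnsSketch& dns)
{
  std::ostringstream out;

  if (!dns.domain.empty()) {
    out << "domain " << dns.domain << "\n";
  }

  if (!dns.search.empty()) {
    out << "search";
    for (const std::string& entry : dns.search) {
      out << " " << entry;
    }
    out << "\n";
  }

  if (!dns.options.empty()) {
    out << "options";
    for (const std::string& option : dns.options) {
      out << " " << option;
    }
    out << "\n";
  }

  // One "nameserver" line per server.
  for (const std::string& nameserver : dns.nameservers) {
    out << "nameserver " << nameserver << "\n";
  }

  return out.str();
}
```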





[jira] [Updated] (MESOS-6862) Replace os::system usages to reduce the risk of command injection.

2017-01-05 Thread James Peach (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Peach updated MESOS-6862:
---
Description: 
There are a number of places where {{os::system}} is used for convenience. To 
reduce the risk of command injection, we should replace most of these with 
{{subprocess}} or {{os::spawn}} and not execute them with the shell.

| posix/chown.hpp | {{os::chown}} | Replace with fts(3). |
| launcher/fetcher.cpp | {{extract()}} | Replace with {{subprocess}}. |
| launcher/fetcher.cpp | {{copyFile}} | Replace with {{subprocess}}. |
| linux/perf.cpp | {{valid()}} | Replace with {{subprocess}}. |
| cni/cni.cpp | {{NetworkCniIsolatorSetup::execute()}} | Not a problem, but should use {{subprocess}} for consistency. |
| port_mapper/port_mapper.cpp | {{PortMapper::addPortMapping()}} | Replace with {{subprocess}}. |
| port_mapper/port_mapper.cpp | {{PortMapper::delPortMapping()}} | Replace with {{subprocess}}. |

In the above table, read "replacement" as replacement with {{os::spawn}} or 
{{subprocess}} as appropriate.

  was:
There are a number of places where {{os::system}} is used for convenience. To 
reduce the risk of command injection, we should replace most of these with 
{{subprocess}} or {{os::spawn}} and not execute them with the shell.

| posix/chown.hpp | {{os::chown}} | Replace with fts(3). |
|launcher/fetcher.cpp | {{extract()}} | Replace with {{subprocess}}. |
| launcher/fetcher.cpp | {{copyFile}} | Replace with {{subprocess}}. |
| linux/perf.cpp | {{valid()}} | Replace with {{subprocess}}. |
| cni/cni.cpp | {{NetworkCniIsolatorSetup::execute()}} | Not a problem, but 
should use {{subprocess}} for consistency. |
| port_mapper/port_mapper.cpp | {{PortMapper::addPortMapping()}} | Replace with 
{{subprocess}}. |
| port_mapper/port_mapper.cpp | {{PortMapper::delPortMapping()}} | Replace with 
{{subprocess}}. |

In the above table, read "replacement" as replacement with {{os::spawn}} or 
{{subprocess} as appropriate.


> Replace os::system usages to reduce the risk of command injection.
> --
>
> Key: MESOS-6862
> URL: https://issues.apache.org/jira/browse/MESOS-6862
> Project: Mesos
>  Issue Type: Bug
>Reporter: James Peach
>Assignee: James Peach
>
> There are a number of places where {{os::system}} is used for convenience. To 
> reduce the risk of command injection, we should replace most of these with 
> {{subprocess}} or {{os::spawn}} and not execute them with the shell.
> | posix/chown.hpp | {{os::chown}} | Replace with fts(3). |
> |launcher/fetcher.cpp | {{extract()}} | Replace with {{subprocess}}. |
> | launcher/fetcher.cpp | {{copyFile}} | Replace with {{subprocess}}. |
> | linux/perf.cpp | {{valid()}} | Replace with {{subprocess}}. |
> | cni/cni.cpp | {{NetworkCniIsolatorSetup::execute()}} | Not a problem, but 
> should use {{subprocess}} for consistency. |
> | port_mapper/port_mapper.cpp | {{PortMapper::addPortMapping()}} | Replace 
> with {{subprocess}}. |
> | port_mapper/port_mapper.cpp | {{PortMapper::delPortMapping()}} | Replace 
> with {{subprocess}}. |
> In the above table, read "replacement" as replacement with {{os::spawn}} or 
> {{subprocess}} as appropriate.





[jira] [Created] (MESOS-6862) Replace os::system usages to reduce the risk of command injection.

2017-01-05 Thread James Peach (JIRA)
James Peach created MESOS-6862:
--

 Summary: Replace os::system usages to reduce the risk of command 
injection.
 Key: MESOS-6862
 URL: https://issues.apache.org/jira/browse/MESOS-6862
 Project: Mesos
  Issue Type: Bug
Reporter: James Peach


There are a number of places where {{os::system}} is used for convenience. To 
reduce the risk of command injection, we should replace most of these with 
{{subprocess}} or {{os::spawn}} and not execute them with the shell.

| posix/chown.hpp | {{os::chown}} | Replace with fts(3). |
| launcher/fetcher.cpp | {{extract()}} | Replace with {{subprocess}}. |
| launcher/fetcher.cpp | {{copyFile}} | Replace with {{subprocess}}. |
| linux/perf.cpp | {{valid()}} | Replace with {{subprocess}}. |
| cni/cni.cpp | {{NetworkCniIsolatorSetup::execute()}} | Not a problem, but should use {{subprocess}} for consistency. |
| port_mapper/port_mapper.cpp | {{PortMapper::addPortMapping()}} | Replace with {{subprocess}}. |
| port_mapper/port_mapper.cpp | {{PortMapper::delPortMapping()}} | Replace with {{subprocess}}. |

In the above table, read "replacement" as replacement with {{os::spawn}} or 
{{subprocess}} as appropriate.
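
To illustrate why avoiding the shell helps, here is a minimal sketch that runs a command from an argument vector using POSIX {{posix_spawnp}}. The helper name {{spawnNoShell}} is hypothetical and this is not the stout {{os::spawn}} implementation. Because each argument is passed to the program as-is, shell metacharacters in an argument are never interpreted.

```cpp
#include <cerrno>
#include <spawn.h>
#include <string>
#include <sys/wait.h>
#include <vector>

extern char** environ;

// Run args[0] (looked up via PATH) with the given arguments and wait for
// it to finish. No shell is involved. Returns the exit status of the
// child, or -1 if it could not be spawned or did not exit normally.
int spawnNoShell(const std::vector<std::string>& args)
{
  if (args.empty()) {
    return -1;
  }

  // Build a NULL-terminated argv that points into the caller's strings.
  std::vector<char*> argv;
  for (const std::string& arg : args) {
    argv.push_back(const_cast<char*>(arg.c_str()));
  }
  argv.push_back(nullptr);

  pid_t pid;
  if (posix_spawnp(&pid, argv[0], nullptr, nullptr, argv.data(), environ) != 0) {
    return -1;
  }

  // Reap the child, retrying if interrupted by a signal.
  int status = 0;
  while (waitpid(pid, &status, 0) == -1) {
    if (errno != EINTR) {
      return -1;
    }
  }

  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A call such as {{spawnNoShell({"tar", "-xf", path})}} hands {{path}} to {{tar}} as a single argument even if it contains spaces or characters like {{;}}, whereas building a command line for a shell would require careful quoting.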





[jira] [Created] (MESOS-6867) perf version check is broken

2017-01-05 Thread James Peach (JIRA)
James Peach created MESOS-6867:
--

 Summary: perf version check is broken
 Key: MESOS-6867
 URL: https://issues.apache.org/jira/browse/MESOS-6867
 Project: Mesos
  Issue Type: Bug
Reporter: James Peach


On Fedora 25, you can't use the {{perf}} features since the version check is 
broken.

{noformat}
[jpeach@jpeach mesos]$ perf --version
perf version 4.8.15.300.fc25.x86_64.gd7d45
...
Thread 1 "mesos-tests" hit Breakpoint 4, perf::supported () at 
../../src/linux/perf.cpp:225
(gdb) n
227   LOG(ERROR) << "Failed to get perf version: " << version.failure();
(gdb) p version.failure()
$2 = "Version string has 7 components; maximum 3 components allowed"
{noformat}

Looks like the {{Version}} parser erroneously constrains the version string to 
3 components.
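
A minimal sketch of a more tolerant parse, assuming it is acceptable to keep only the leading numeric components (at most three) and ignore the distribution suffix. {{parseLeadingVersion}} is a hypothetical helper, not the stout {{Version}} parser, and it expects the text that follows the {{perf version }} prefix.

```cpp
#include <array>
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>

// Parse up to three leading numeric, dot-separated components of `text`
// into `out`; missing components are left as 0. Parsing stops at the
// first component that is not purely numeric. Returns false if not even
// the first component is numeric.
bool parseLeadingVersion(const std::string& text, std::array<int, 3>& out)
{
  out.fill(0);

  std::istringstream in(text);
  std::string component;
  std::size_t count = 0;

  while (count < out.size() && std::getline(in, component, '.')) {
    // Reject empty or overly long components (keeps std::stoi in range).
    if (component.empty() || component.size() > 9) {
      break;
    }

    bool numeric = true;
    for (char c : component) {
      if (!std::isdigit(static_cast<unsigned char>(c))) {
        numeric = false;
      }
    }

    if (!numeric) {
      break;
    }

    out[count++] = std::stoi(component);
  }

  return count > 0;
}
```

Given the string from the report above, {{4.8.15.300.fc25.x86_64.gd7d45}}, this keeps 4, 8 and 15 and discards the rest, which is enough for a comparison against a minimum version such as 2.6.39.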





[jira] [Updated] (MESOS-6862) Replace os::system usages to reduce the risk of command injection.

2017-01-05 Thread James Peach (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Peach updated MESOS-6862:
---
Shepherd: Yan Xu

> Replace os::system usages to reduce the risk of command injection.
> --
>
> Key: MESOS-6862
> URL: https://issues.apache.org/jira/browse/MESOS-6862
> Project: Mesos
>  Issue Type: Bug
>Reporter: James Peach
>Assignee: James Peach
>
> There are a number of places where {{os::system}} is used for convenience. To 
> reduce the risk of command injection, we should replace most of these with 
> {{subprocess}} or {{os::spawn}} and not execute them with the shell.
> | posix/chown.hpp | {{os::chown}} | Replace with fts(3). |
> |launcher/fetcher.cpp | {{extract()}} | Replace with {{subprocess}}. |
> | launcher/fetcher.cpp | {{copyFile}} | Replace with {{subprocess}}. |
> | linux/perf.cpp | {{valid()}} | Replace with {{subprocess}}. |
> | cni/cni.cpp | {{NetworkCniIsolatorSetup::execute()}} | Not a problem, but 
> should use {{subprocess}} for consistency. |
> | -port_mapper/port_mapper.cpp- | -{{PortMapper::addPortMapping()}}- | 
> -Replace with {{subprocess}}.- |
> | -port_mapper/port_mapper.cpp- | -{{PortMapper::delPortMapping()}}- | 
> -Replace with {{subprocess}}.- |
> In the above table, read "replacement" as replacement with {{os::spawn}} or 
> {{subprocess}} as appropriate.





[jira] [Commented] (MESOS-6862) Replace os::system usages to reduce the risk of command injection.

2017-01-05 Thread James Peach (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15803114#comment-15803114
 ] 

James Peach commented on MESOS-6862:


| Use os::spawn in the CNI isolator. | https://reviews.apache.org/r/55238/ |
| Stop using os::system to validate perf event names. | https://reviews.apache.org/r/55241/ |
| Stop using os::system to extract fetcher archives. | https://reviews.apache.org/r/55239/ |
| Stop using os::system to copy local files. | https://reviews.apache.org/r/55240/ |
| Stop using os::system to chown a directory hierarchy. | https://reviews.apache.org/r/55242/ |

> Replace os::system usages to reduce the risk of command injection.
> --
>
> Key: MESOS-6862
> URL: https://issues.apache.org/jira/browse/MESOS-6862
> Project: Mesos
>  Issue Type: Bug
>Reporter: James Peach
>Assignee: James Peach
>
> There are a number of places where {{os::system}} is used for convenience. To 
> reduce the risk of command injection, we should replace most of these with 
> {{subprocess}} or {{os::spawn}} and not execute them with the shell.
> | posix/chown.hpp | {{os::chown}} | Replace with fts(3). |
> |launcher/fetcher.cpp | {{extract()}} | Replace with {{subprocess}}. |
> | launcher/fetcher.cpp | {{copyFile}} | Replace with {{subprocess}}. |
> | linux/perf.cpp | {{valid()}} | Replace with {{subprocess}}. |
> | cni/cni.cpp | {{NetworkCniIsolatorSetup::execute()}} | Not a problem, but 
> should use {{subprocess}} for consistency. |
> | -port_mapper/port_mapper.cpp- | -{{PortMapper::addPortMapping()}}- | 
> -Replace with {{subprocess}}.- |
> | -port_mapper/port_mapper.cpp- | -{{PortMapper::delPortMapping()}}- | 
> -Replace with {{subprocess}}.- |
> In the above table, read "replacement" as replacement with {{os::spawn}} or 
> {{subprocess}} as appropriate.





[jira] [Commented] (MESOS-6723) Mesos fails to link using gold linker

2017-01-03 Thread James Peach (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15795716#comment-15795716
 ] 

James Peach commented on MESOS-6723:


FWIW I'm using {{-fuse-ld=gold}} successfully on Fedora 25 without any 
{{Makefile}} patches.

> Mesos fails to link using gold linker
> -
>
> Key: MESOS-6723
> URL: https://issues.apache.org/jira/browse/MESOS-6723
> Project: Mesos
>  Issue Type: Bug
>  Components: build
> Environment: Arch Linux, amd64, GNU gold (GNU Binutils 2.27) 1.12
>Reporter: Neil Conway
>Priority: Minor
>  Labels: mesosphere
>
> Configure flags:
> {noformat}
> ../mesos/configure --disable-java --disable-python CC="ccache gcc" 
> CXX="ccache g++" CXXFLAGS=-fuse-ld=gold CFLAGS=-fuse-ld=gold
> {noformat}
> Compile output:
> {noformat}
> /bin/sh ../libtool  --tag=CXX   --mode=link ccache g++ -pthread -fuse-ld=gold 
> -Wno-unused-local-typedefs -std=c++11 -Wl,--as-needed  -o mesos-local 
> local/mesos_local-main.o libmesos.la -lz -lsvn_delta-1 -lsvn_subr-1 
> -lsasl2 -lcurl -lapr-1 -lz  -lrt -lunwind
> libtool: link: ccache g++ -pthread -fuse-ld=gold -Wno-unused-local-typedefs 
> -std=c++11 -Wl,--as-needed -o .libs/mesos-local local/mesos_local-main.o  
> ./.libs/libmesos.so -lpthread -lsvn_delta-1 -lsvn_subr-1 -lsasl2 -lcurl 
> -lapr-1 -lz -lrt -lunwind -pthread -Wl,-rpath -Wl,/usr/local/lib
> ./.libs/libmesos.so: error: undefined reference to 'dlerror'
> ./.libs/libmesos.so: error: undefined reference to 'dlclose'
> ./.libs/libmesos.so: error: undefined reference to 'dlopen'
> ./.libs/libmesos.so: error: undefined reference to 'dlsym'
> collect2: error: ld returned 1 exit status
> make[2]: *** [Makefile:5139: mesos-local] Error 1
> {noformat}





[jira] [Commented] (MESOS-6556) UTS namespace isolator

2017-01-04 Thread James Peach (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15799841#comment-15799841
 ] 

James Peach commented on MESOS-6556:


Abandoned the reviews above.

> UTS namespace isolator
> --
>
> Key: MESOS-6556
> URL: https://issues.apache.org/jira/browse/MESOS-6556
> Project: Mesos
>  Issue Type: Improvement
>  Components: containerization
>Reporter: James Peach
>Assignee: James Peach
>Priority: Minor
>
> Add a {{namespace/uts}} isolator for doing UTS namespace isolation without 
> using the CNI isolator.





[jira] [Commented] (MESOS-6556) UTS namespace isolator

2017-01-04 Thread James Peach (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15799862#comment-15799862
 ] 

James Peach commented on MESOS-6556:


| Add hostname support to the network/cni isolator. | https://reviews.apache.org/r/55191/ |

> UTS namespace isolator
> --
>
> Key: MESOS-6556
> URL: https://issues.apache.org/jira/browse/MESOS-6556
> Project: Mesos
>  Issue Type: Improvement
>  Components: containerization
>Reporter: James Peach
>Assignee: James Peach
>Priority: Minor
>
> Add a {{namespace/uts}} isolator for doing UTS namespace isolation without 
> using the CNI isolator.





[jira] [Created] (MESOS-6851) make install fails the second time

2017-01-04 Thread James Peach (JIRA)
James Peach created MESOS-6851:
--

 Summary: make install fails the second time
 Key: MESOS-6851
 URL: https://issues.apache.org/jira/browse/MESOS-6851
 Project: Mesos
  Issue Type: Bug
  Components: build
Reporter: James Peach


Run {{make install}} twice and the second time will fail when it tries to 
overwrite symlinks:

{code}
make[4]: Entering directory '/home/jpeach/upstream/mesos/build/src'
cd //opt/mesos/etc/mesos && \
  ln -s mesos-agent-env.sh.template mesos-slave-env.sh.template
ln: failed to create symbolic link 'mesos-slave-env.sh.template': File exists
Makefile:12952: recipe for target 'copy-template-and-create-symlink' failed
make[4]: *** [copy-template-and-create-symlink] Error 1
make[4]: Leaving directory '/home/jpeach/upstream/mesos/build/src'
Makefile:12487: recipe for target 'install-data-am' failed
make[3]: *** [install-data-am] Error 2
make[3]: Leaving directory '/home/jpeach/upstream/mesos/build/src'
Makefile:12197: recipe for target 'install-am' failed
make[2]: *** [install-am] Error 2
make[2]: Leaving directory '/home/jpeach/upstream/mesos/build/src'
Makefile:12191: recipe for target 'install' failed
make[1]: *** [install] Error 2
make[1]: Leaving directory '/home/jpeach/upstream/mesos/build/src'
Makefile:764: recipe for target 'install-recursive' failed
make: *** [install-recursive] Error 1
{code}
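
The failure above comes from {{ln -s}} refusing to replace a link that the first install already created. A minimal sketch of one way to make that step idempotent, assuming an {{ln}} that supports {{-f}}. The helper name {{make_template_link}} and the scratch directory are hypothetical and only for illustration; this is not the actual Mesos Makefile recipe, nor necessarily the fix that should be applied.

```shell
#!/bin/sh
# Hypothetical helper (not the Mesos recipe): create the compatibility
# symlink inside directory $1 in a way that can safely be repeated.
make_template_link() {
  dir=$1
  # -f replaces an existing link instead of failing with "File exists".
  (cd "$dir" && ln -sf mesos-agent-env.sh.template mesos-slave-env.sh.template)
}

# Demonstration in a scratch directory (assumed layout, illustration only).
scratch=$(mktemp -d)
touch "$scratch/mesos-agent-env.sh.template"

make_template_link "$scratch"   # first "install"
make_template_link "$scratch"   # second "install" succeeds as well
```

With plain {{ln -s}} the second call would fail exactly as in the log; with {{-f}} the existing link is replaced and the exit status is zero.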





[jira] [Commented] (MESOS-6851) make install fails the second time

2017-01-04 Thread James Peach (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15798713#comment-15798713
 ] 

James Peach commented on MESOS-6851:


/cc [~bbannier]

> make install fails the second time
> --
>
> Key: MESOS-6851
> URL: https://issues.apache.org/jira/browse/MESOS-6851
> Project: Mesos
>  Issue Type: Bug
>  Components: build
>Reporter: James Peach
>
> Run {{make install}} twice and the second time will fail when it tries to 
> overwrite symlinks:
> {code}
> make[4]: Entering directory '/home/jpeach/upstream/mesos/build/src'
> cd //opt/mesos/etc/mesos && \
>   ln -s mesos-agent-env.sh.template mesos-slave-env.sh.template
> ln: failed to create symbolic link 'mesos-slave-env.sh.template': File exists
> Makefile:12952: recipe for target 'copy-template-and-create-symlink' failed
> make[4]: *** [copy-template-and-create-symlink] Error 1
> make[4]: Leaving directory '/home/jpeach/upstream/mesos/build/src'
> Makefile:12487: recipe for target 'install-data-am' failed
> make[3]: *** [install-data-am] Error 2
> make[3]: Leaving directory '/home/jpeach/upstream/mesos/build/src'
> Makefile:12197: recipe for target 'install-am' failed
> make[2]: *** [install-am] Error 2
> make[2]: Leaving directory '/home/jpeach/upstream/mesos/build/src'
> Makefile:12191: recipe for target 'install' failed
> make[1]: *** [install] Error 2
> make[1]: Leaving directory '/home/jpeach/upstream/mesos/build/src'
> Makefile:764: recipe for target 'install-recursive' failed
> make: *** [install-recursive] Error 1
> {code}





[jira] [Created] (MESOS-7340) Log HTTP accesses to the /files endpoint

2017-04-04 Thread James Peach (JIRA)
James Peach created MESOS-7340:
--

 Summary: Log HTTP accesses to the /files endpoint
 Key: MESOS-7340
 URL: https://issues.apache.org/jira/browse/MESOS-7340
 Project: Mesos
  Issue Type: Bug
  Components: HTTP API
Reporter: James Peach
Assignee: James Peach
Priority: Minor


The Mesos master and agent log HTTP accesses, but the {{Files}} process does 
not. We should log accesses to the {{/files}} endpoint in the same way.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (MESOS-7340) Log HTTP accesses to the /files endpoint

2017-04-04 Thread James Peach (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Peach updated MESOS-7340:
---
Shepherd: Anand Mazumdar

> Log HTTP accesses to the /files endpoint
> 
>
> Key: MESOS-7340
> URL: https://issues.apache.org/jira/browse/MESOS-7340
> Project: Mesos
>  Issue Type: Bug
>  Components: HTTP API
>Reporter: James Peach
>Assignee: James Peach
>Priority: Minor
>
> The Mesos master and agent log HTTP accesses, but the {{Files}} process does 
> not. We should log accesses to the {{/files}} endpoint in the same way.




