[jira] [Commented] (MESOS-1352) Uninitialized scalar field in usage/main.cpp

2014-09-10 Thread JIRA

[ 
https://issues.apache.org/jira/browse/MESOS-1352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14129633#comment-14129633
 ] 

Kamil Domański commented on MESOS-1352:
---

False positive.
The bool is initialized from a command-line flag with a default value of 
false.
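For context, the flag pattern the comment describes can be sketched in Python 
(hypothetical names; the actual Mesos code is C++ using a flags helper): 
registering a flag writes its default into the member during construction, 
which is why the member is in fact initialized and the UNINIT_CTOR finding is 
a false positive.

```python
class Flags:
    """Minimal sketch of a flags registry (hypothetical; the real Mesos
    code is C++). add() assigns the default to the member at registration
    time, so the field is set before the constructor finishes."""

    def __init__(self):
        self._help = {}
        # Mirrors the 'recordio' flag from src/usage/main.cpp: default False.
        self.add("recordio",
                 "Whether or not to output ResourceStatistics protobuf "
                 "using the \"recordio\" format.",
                 False)

    def add(self, name, help_text, default):
        self._help[name] = help_text
        # The default is written here, during __init__, never left unset.
        setattr(self, name, default)

    def parse(self, argv):
        # A real parser would handle many flags; sketch for one bool flag.
        if "--recordio" in argv:
            self.recordio = True


flags = Flags()
# flags.recordio already holds its default before any parsing happens.
```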

> Uninitialized scalar field in usage/main.cpp
> 
>
> Key: MESOS-1352
> URL: https://issues.apache.org/jira/browse/MESOS-1352
> Project: Mesos
>  Issue Type: Bug
>Reporter: Niklas Quarfot Nielsen
>  Labels: coverity
>
> 
> *** CID 1213899:  Uninitialized scalar field  (UNINIT_CTOR)
> /src/usage/main.cpp: 56 in Flags::Flags()()
> 50 "Whether or not to output ResourceStatistics protobuf\n"
> 51 "using the \"recordio\" format, i.e., the size as a \n"
> 52 "4 byte unsigned integer followed by the serialized\n"
> 53 "protobuf itself. By default the ResourceStatistics\n"
> 54 "will be output as JSON",
> 55 false);
> >>> CID 1213899:  Uninitialized scalar field  (UNINIT_CTOR)
> >>> Non-static class member "recordio" is not initialized in this 
> >>> constructor nor in any functions that it calls.
> 56   }
> 57
> 58   Option pid;
> 59   bool recordio;
> 60 };
> 61



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-1786) FaultToleranceTest.ReconcilePendingTasks is flaky.

2014-09-10 Thread Benjamin Mahler (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Mahler updated MESOS-1786:
---
Sprint: Mesos Q3 Sprint 5

> FaultToleranceTest.ReconcilePendingTasks is flaky.
> --
>
> Key: MESOS-1786
> URL: https://issues.apache.org/jira/browse/MESOS-1786
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Reporter: Benjamin Mahler
>Assignee: Benjamin Mahler
>
> {noformat}
> [ RUN  ] FaultToleranceTest.ReconcilePendingTasks
> Using temporary directory 
> '/tmp/FaultToleranceTest_ReconcilePendingTasks_TwmFlm'
> I0910 20:18:02.308562 21634 leveldb.cpp:176] Opened db in 28.520372ms
> I0910 20:18:02.315268 21634 leveldb.cpp:183] Compacted db in 6.37495ms
> I0910 20:18:02.315588 21634 leveldb.cpp:198] Created db iterator in 6338ns
> I0910 20:18:02.315745 21634 leveldb.cpp:204] Seeked to beginning of db in 
> 1781ns
> I0910 20:18:02.315901 21634 leveldb.cpp:273] Iterated through 0 keys in the 
> db in 537ns
> I0910 20:18:02.316076 21634 replica.cpp:741] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I0910 20:18:02.316524 21654 recover.cpp:425] Starting replica recovery
> I0910 20:18:02.316800 21654 recover.cpp:451] Replica is in EMPTY status
> I0910 20:18:02.317245 21654 replica.cpp:638] Replica in EMPTY status received 
> a broadcasted recover request
> I0910 20:18:02.317445 21654 recover.cpp:188] Received a recover response from 
> a replica in EMPTY status
> I0910 20:18:02.317672 21654 recover.cpp:542] Updating replica status to 
> STARTING
> I0910 20:18:02.321723 21652 master.cpp:286] Master 
> 20140910-201802-16842879-60361-21634 (precise) started on 127.0.1.1:60361
> I0910 20:18:02.322041 21652 master.cpp:332] Master only allowing 
> authenticated frameworks to register
> I0910 20:18:02.322320 21652 master.cpp:337] Master only allowing 
> authenticated slaves to register
> I0910 20:18:02.322568 21652 credentials.hpp:36] Loading credentials for 
> authentication from 
> '/tmp/FaultToleranceTest_ReconcilePendingTasks_TwmFlm/credentials'
> I0910 20:18:02.323031 21652 master.cpp:366] Authorization enabled
> I0910 20:18:02.323663 21654 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 5.781277ms
> I0910 20:18:02.324074 21654 replica.cpp:320] Persisted replica status to 
> STARTING
> I0910 20:18:02.324443 21654 recover.cpp:451] Replica is in STARTING status
> I0910 20:18:02.325106 21654 replica.cpp:638] Replica in STARTING status 
> received a broadcasted recover request
> I0910 20:18:02.325454 21654 recover.cpp:188] Received a recover response from 
> a replica in STARTING status
> I0910 20:18:02.326408 21654 recover.cpp:542] Updating replica status to VOTING
> I0910 20:18:02.323892 21649 hierarchical_allocator_process.hpp:299] 
> Initializing hierarchical allocator process with master : 
> master@127.0.1.1:60361
> I0910 20:18:02.326120 21652 master.cpp:1212] The newly elected leader is 
> master@127.0.1.1:60361 with id 20140910-201802-16842879-60361-21634
> I0910 20:18:02.323938 21651 master.cpp:120] No whitelist given. Advertising 
> offers for all slaves
> I0910 20:18:04.209081 21655 hierarchical_allocator_process.hpp:697] No 
> resources available to allocate!
> I0910 20:18:04.209183 21655 hierarchical_allocator_process.hpp:659] Performed 
> allocation for 0 slaves in 118308ns
> I0910 20:18:04.209230 21652 master.cpp:1225] Elected as the leading master!
> I0910 20:18:04.209246 21652 master.cpp:1043] Recovering from registrar
> I0910 20:18:04.209360 21650 registrar.cpp:313] Recovering registrar
> I0910 20:18:04.214040 21654 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 1.887284299secs
> I0910 20:18:04.214094 21654 replica.cpp:320] Persisted replica status to 
> VOTING
> I0910 20:18:04.214190 21654 recover.cpp:556] Successfully joined the Paxos 
> group
> I0910 20:18:04.214258 21654 recover.cpp:440] Recover process terminated
> I0910 20:18:04.214437 21654 log.cpp:656] Attempting to start the writer
> I0910 20:18:04.214756 21654 replica.cpp:474] Replica received implicit 
> promise request with proposal 1
> I0910 20:18:04.223865 21654 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 9.044596ms
> I0910 20:18:04.223944 21654 replica.cpp:342] Persisted promised to 1
> I0910 20:18:04.229053 21652 coordinator.cpp:230] Coordinator attemping to 
> fill missing position
> I0910 20:18:04.229552 21652 replica.cpp:375] Replica received explicit 
> promise request for position 0 with proposal 2
> I0910 20:18:04.248437 2

[jira] [Created] (MESOS-1786) FaultToleranceTest.ReconcilePendingTasks is flaky.

2014-09-10 Thread Benjamin Mahler (JIRA)
Benjamin Mahler created MESOS-1786:
--

 Summary: FaultToleranceTest.ReconcilePendingTasks is flaky.
 Key: MESOS-1786
 URL: https://issues.apache.org/jira/browse/MESOS-1786
 Project: Mesos
  Issue Type: Bug
  Components: test
Reporter: Benjamin Mahler
Assignee: Benjamin Mahler


{noformat}
[ RUN  ] FaultToleranceTest.ReconcilePendingTasks
Using temporary directory '/tmp/FaultToleranceTest_ReconcilePendingTasks_TwmFlm'
I0910 20:18:02.308562 21634 leveldb.cpp:176] Opened db in 28.520372ms
I0910 20:18:02.315268 21634 leveldb.cpp:183] Compacted db in 6.37495ms
I0910 20:18:02.315588 21634 leveldb.cpp:198] Created db iterator in 6338ns
I0910 20:18:02.315745 21634 leveldb.cpp:204] Seeked to beginning of db in 1781ns
I0910 20:18:02.315901 21634 leveldb.cpp:273] Iterated through 0 keys in the db 
in 537ns
I0910 20:18:02.316076 21634 replica.cpp:741] Replica recovered with log 
positions 0 -> 0 with 1 holes and 0 unlearned
I0910 20:18:02.316524 21654 recover.cpp:425] Starting replica recovery
I0910 20:18:02.316800 21654 recover.cpp:451] Replica is in EMPTY status
I0910 20:18:02.317245 21654 replica.cpp:638] Replica in EMPTY status received a 
broadcasted recover request
I0910 20:18:02.317445 21654 recover.cpp:188] Received a recover response from a 
replica in EMPTY status
I0910 20:18:02.317672 21654 recover.cpp:542] Updating replica status to STARTING
I0910 20:18:02.321723 21652 master.cpp:286] Master 
20140910-201802-16842879-60361-21634 (precise) started on 127.0.1.1:60361
I0910 20:18:02.322041 21652 master.cpp:332] Master only allowing authenticated 
frameworks to register
I0910 20:18:02.322320 21652 master.cpp:337] Master only allowing authenticated 
slaves to register
I0910 20:18:02.322568 21652 credentials.hpp:36] Loading credentials for 
authentication from 
'/tmp/FaultToleranceTest_ReconcilePendingTasks_TwmFlm/credentials'
I0910 20:18:02.323031 21652 master.cpp:366] Authorization enabled
I0910 20:18:02.323663 21654 leveldb.cpp:306] Persisting metadata (8 bytes) to 
leveldb took 5.781277ms
I0910 20:18:02.324074 21654 replica.cpp:320] Persisted replica status to 
STARTING
I0910 20:18:02.324443 21654 recover.cpp:451] Replica is in STARTING status
I0910 20:18:02.325106 21654 replica.cpp:638] Replica in STARTING status 
received a broadcasted recover request
I0910 20:18:02.325454 21654 recover.cpp:188] Received a recover response from a 
replica in STARTING status
I0910 20:18:02.326408 21654 recover.cpp:542] Updating replica status to VOTING
I0910 20:18:02.323892 21649 hierarchical_allocator_process.hpp:299] 
Initializing hierarchical allocator process with master : master@127.0.1.1:60361
I0910 20:18:02.326120 21652 master.cpp:1212] The newly elected leader is 
master@127.0.1.1:60361 with id 20140910-201802-16842879-60361-21634
I0910 20:18:02.323938 21651 master.cpp:120] No whitelist given. Advertising 
offers for all slaves
I0910 20:18:04.209081 21655 hierarchical_allocator_process.hpp:697] No 
resources available to allocate!
I0910 20:18:04.209183 21655 hierarchical_allocator_process.hpp:659] Performed 
allocation for 0 slaves in 118308ns
I0910 20:18:04.209230 21652 master.cpp:1225] Elected as the leading master!
I0910 20:18:04.209246 21652 master.cpp:1043] Recovering from registrar
I0910 20:18:04.209360 21650 registrar.cpp:313] Recovering registrar
I0910 20:18:04.214040 21654 leveldb.cpp:306] Persisting metadata (8 bytes) to 
leveldb took 1.887284299secs
I0910 20:18:04.214094 21654 replica.cpp:320] Persisted replica status to VOTING
I0910 20:18:04.214190 21654 recover.cpp:556] Successfully joined the Paxos group
I0910 20:18:04.214258 21654 recover.cpp:440] Recover process terminated
I0910 20:18:04.214437 21654 log.cpp:656] Attempting to start the writer
I0910 20:18:04.214756 21654 replica.cpp:474] Replica received implicit promise 
request with proposal 1
I0910 20:18:04.223865 21654 leveldb.cpp:306] Persisting metadata (8 bytes) to 
leveldb took 9.044596ms
I0910 20:18:04.223944 21654 replica.cpp:342] Persisted promised to 1
I0910 20:18:04.229053 21652 coordinator.cpp:230] Coordinator attemping to fill 
missing position
I0910 20:18:04.229552 21652 replica.cpp:375] Replica received explicit promise 
request for position 0 with proposal 2
I0910 20:18:04.248437 21652 leveldb.cpp:343] Persisting action (8 bytes) to 
leveldb took 18.839475ms
I0910 20:18:04.248525 21652 replica.cpp:676] Persisted action at 0
I0910 20:18:04.251194 21650 replica.cpp:508] Replica received write request for 
position 0
I0910 20:18:04.251260 21650 leveldb.cpp:438] Reading position from leveldb took 
43213ns
I0910 20:18:04.262251 21650 leveldb.cpp:343] Persisting action (14 bytes) to 
leveldb took 10.949353ms
I0910 20:18:04.262346 21650 replica.cpp:676] Persisted action at 0
I0910 20:18:04.262717 21650 replica.cpp:655] Replica received learned notice 
for position 0
I091

[jira] [Resolved] (MESOS-1779) Mesos style checker should catch trailing white space

2014-09-10 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone resolved MESOS-1779.
---
Resolution: Fixed
  Assignee: Kamil Domański

commit 60dd10cfb8aef1ae661d17bd3d28f0596b731ff3
Author: Kamil Domanski 
Date:   Wed Sep 10 21:20:03 2014 -0700

Added support for catching trailing white spaces in the style checker.

Review: https://reviews.apache.org/r/25526


> Mesos style checker should catch trailing white space
> -
>
> Key: MESOS-1779
> URL: https://issues.apache.org/jira/browse/MESOS-1779
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Vinod Kone
>Assignee: Kamil Domański
>  Labels: newbie
>
> Trailing white space errors are currently not caught by the style checker. 
> They should be!
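The rule added by the commit above can be sketched in Python (the Mesos style 
checker is a Python script, but the function and message names here are 
illustrative, not the actual patch):

```python
import re

# Matches one or more spaces or tabs at the end of a line.
TRAILING_WHITESPACE = re.compile(r'[ \t]+$')


def check_trailing_whitespace(path, lines):
    """Return a style error for every line ending in whitespace.

    Illustrative sketch only: the real checker reports errors through the
    style checker's machinery rather than returning a list.
    """
    errors = []
    for lineno, line in enumerate(lines, start=1):
        if TRAILING_WHITESPACE.search(line.rstrip('\n')):
            errors.append('%s:%d: Line ends in whitespace.' % (path, lineno))
    return errors


errors = check_trailing_whitespace('main.cpp', ['int x;  ', 'int y;'])
```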





[jira] [Updated] (MESOS-1779) Mesos style checker should catch trailing white space

2014-09-10 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-1779:
--
Fix Version/s: 0.21.0

> Mesos style checker should catch trailing white space
> -
>
> Key: MESOS-1779
> URL: https://issues.apache.org/jira/browse/MESOS-1779
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Vinod Kone
>Assignee: Kamil Domański
>  Labels: newbie
> Fix For: 0.21.0
>
>
> Trailing white space errors are currently not caught by the style checker. 
> They should be!





[jira] [Commented] (MESOS-1779) Mesos style checker should catch trailing white space

2014-09-10 Thread JIRA

[ 
https://issues.apache.org/jira/browse/MESOS-1779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14129579#comment-14129579
 ] 

Kamil Domański commented on MESOS-1779:
---

https://reviews.apache.org/r/25526/

> Mesos style checker should catch trailing white space
> -
>
> Key: MESOS-1779
> URL: https://issues.apache.org/jira/browse/MESOS-1779
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Vinod Kone
>  Labels: newbie
>
> Trailing white space errors are currently not caught by the style checker. 
> They should be!





[jira] [Commented] (MESOS-1739) Allow slave reconfiguration on restart

2014-09-10 Thread Cody Maloney (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14129546#comment-14129546
 ] 

Cody Maloney commented on MESOS-1739:
-

[~vinodkone]: New review request (https://reviews.apache.org/r/25525/). Updated 
the bug title.

Tests now pass. All functionality is there. All comments are incorporated, 
except that the patch still allows both resources and attributes to be set to 
supersets of what they currently are.

If someone's setup relies on a critical negative attribute check, they should 
be aware of that and simply never add that attribute at runtime (they can 
always fully kill the slave and then restart it). Changing the recovery 
behavior in this case doesn't break their setups, and there are a number of 
cases where we would like to allow growing attribute sets. The check lives in 
one place, and would be one line to change, remove, or make require identical 
sets if that is a hard requirement.
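The superset rule being discussed can be sketched as follows (hypothetical 
names; the actual patch is C++ and compares Attributes/Resources objects 
rather than plain dictionaries):

```python
def is_superset(new_attrs, old_attrs):
    """True iff every previously advertised attribute is still present
    with the same value; purely additional attributes are allowed.
    Hypothetical sketch of the restart-time compatibility check."""
    return all(new_attrs.get(k) == v for k, v in old_attrs.items())


def can_reregister(old_attrs, new_attrs):
    # The check lives in one place: accept restarts that only grow the
    # attribute set, reject ones that drop or change existing attributes.
    return is_superset(new_attrs, old_attrs)


old = {'rack': 'r1'}
grown = {'rack': 'r1', 'ssd': 'true'}   # added an attribute: accepted
changed = {'rack': 'r2'}                # changed a value: rejected
```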

> Allow slave reconfiguration on restart
> --
>
> Key: MESOS-1739
> URL: https://issues.apache.org/jira/browse/MESOS-1739
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Patrick Reilly
>Assignee: Cody Maloney
>
> Make it so that, either via a slave restart or an out-of-process "reconfigure" 
> ping, the attributes and resources of a slave can be updated to a superset 
> of what they used to be.





[jira] [Commented] (MESOS-1739) Allow slave reconfiguration on restart

2014-09-10 Thread Patrick Reilly (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14129485#comment-14129485
 ] 

Patrick Reilly commented on MESOS-1739:
---

[~vinodkone] I've gone ahead and closed https://reviews.apache.org/r/25111/. 
I'll have [~cmaloney] submit a new review board request shortly.

> Allow slave reconfiguration on restart
> --
>
> Key: MESOS-1739
> URL: https://issues.apache.org/jira/browse/MESOS-1739
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Patrick Reilly
>Assignee: Cody Maloney
>
> Make it so that, either via a slave restart or an out-of-process "reconfigure" 
> ping, the attributes and resources of a slave can be updated to a superset 
> of what they used to be.





[jira] [Updated] (MESOS-1739) Allow slave reconfiguration on restart

2014-09-10 Thread Cody Maloney (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cody Maloney updated MESOS-1739:

Summary: Allow slave reconfiguration on restart  (was: Add Dynamic Slave 
Attributes)

> Allow slave reconfiguration on restart
> --
>
> Key: MESOS-1739
> URL: https://issues.apache.org/jira/browse/MESOS-1739
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Patrick Reilly
>Assignee: Cody Maloney
>
> Make it so that, either via a slave restart or an out-of-process "reconfigure" 
> ping, the attributes and resources of a slave can be updated to a superset 
> of what they used to be.





[jira] [Assigned] (MESOS-1739) Add Dynamic Slave Attributes

2014-09-10 Thread Cody Maloney (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cody Maloney reassigned MESOS-1739:
---

Assignee: Cody Maloney  (was: Patrick Reilly)

> Add Dynamic Slave Attributes
> 
>
> Key: MESOS-1739
> URL: https://issues.apache.org/jira/browse/MESOS-1739
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Patrick Reilly
>Assignee: Cody Maloney
>
> Make it so that, either via a slave restart or an out-of-process "reconfigure" 
> ping, the attributes and resources of a slave can be updated to a superset 
> of what they used to be.





[jira] [Created] (MESOS-1785) ExampleTest.LowLevelSchedulerLibprocess is flaky

2014-09-10 Thread Yan Xu (JIRA)
Yan Xu created MESOS-1785:
-

 Summary: ExampleTest.LowLevelSchedulerLibprocess is flaky
 Key: MESOS-1785
 URL: https://issues.apache.org/jira/browse/MESOS-1785
 Project: Mesos
  Issue Type: Bug
  Components: test
Affects Versions: 0.20.0
Reporter: Yan Xu


The test hung forever because task 1 never completed.

{noformat:title=log}
[ RUN  ] ExamplesTest.LowLevelSchedulerLibprocess
Using temporary directory '/tmp/ExamplesTest_LowLevelSchedulerLibprocess_s2iS1n'
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0910 05:57:28.191807 17625 process.cpp:1771] libprocess is initialized on 
127.0.1.1:47878 for 8 cpus
Enabling authentication for the scheduler
I0910 05:57:28.193083 17625 logging.cpp:177] Logging to STDERR
I0910 05:57:28.193274 17625 scheduler.cpp:145] Version: 0.21.0
I0910 05:57:28.224969 17625 leveldb.cpp:176] Opened db in 29.007559ms
I0910 05:57:28.234238 17625 leveldb.cpp:183] Compacted db in 9.042296ms
I0910 05:57:28.234468 17625 leveldb.cpp:198] Created db iterator in 32144ns
I0910 05:57:28.234742 17625 leveldb.cpp:204] Seeked to beginning of db in 1548ns
I0910 05:57:28.234879 17625 leveldb.cpp:273] Iterated through 0 keys in the db 
in 5502ns
I0910 05:57:28.235086 17625 replica.cpp:741] Replica recovered with log 
positions 0 -> 0 with 1 holes and 0 unlearned
I0910 05:57:28.237017 17654 master.cpp:286] Master 
20140910-055728-16842879-47878-17625 (trusty) started on 127.0.1.1:47878
I0910 05:57:28.237479 17654 master.cpp:332] Master only allowing authenticated 
frameworks to register
I0910 05:57:28.237592 17654 master.cpp:339] Master allowing unauthenticated 
slaves to register
I0910 05:57:28.237733 17654 credentials.hpp:36] Loading credentials for 
authentication from 
'/tmp/ExamplesTest_LowLevelSchedulerLibprocess_s2iS1n/credentials'
W0910 05:57:28.237985 17654 credentials.hpp:51] Permissions on credentials file 
'/tmp/ExamplesTest_LowLevelSchedulerLibprocess_s2iS1n/credentials' are too 
open. It is recommended that your credentials file is NOT accessible by others.
I0910 05:57:28.238147 17654 master.cpp:366] Authorization enabled
I0910 05:57:28.237361 17652 recover.cpp:425] Starting replica recovery
I0910 05:57:28.238878 17651 recover.cpp:451] Replica is in EMPTY status
I0910 05:57:28.239704 17651 replica.cpp:638] Replica in EMPTY status received a 
broadcasted recover request
I0910 05:57:28.240255 17651 recover.cpp:188] Received a recover response from a 
replica in EMPTY status
I0910 05:57:28.240582 17651 recover.cpp:542] Updating replica status to STARTING
I0910 05:57:28.240134 17650 master.cpp:120] No whitelist given. Advertising 
offers for all slaves
I0910 05:57:28.241950 17652 hierarchical_allocator_process.hpp:299] 
Initializing hierarchical allocator process with master : master@127.0.1.1:47878
I0910 05:57:28.243247 17651 master.cpp:1212] The newly elected leader is 
master@127.0.1.1:47878 with id 20140910-055728-16842879-47878-17625
I0910 05:57:28.243402 17651 master.cpp:1225] Elected as the leading master!
I0910 05:57:28.243620 17651 master.cpp:1043] Recovering from registrar
I0910 05:57:28.243831 17651 registrar.cpp:313] Recovering registrar
I0910 05:57:28.244851 17625 containerizer.cpp:89] Using isolation: 
posix/cpu,posix/mem
I0910 05:57:28.246381 17652 slave.cpp:167] Slave started on 1)@127.0.1.1:47878
I0910 05:57:28.246824 17652 slave.cpp:287] Slave resources: cpus(*):1; 
mem(*):1986; disk(*):24988; ports(*):[31000-32000]
I0910 05:57:28.247046 17652 slave.cpp:315] Slave hostname: trusty
I0910 05:57:28.247133 17652 slave.cpp:316] Slave checkpoint: true
I0910 05:57:28.247632 17652 state.cpp:33] Recovering state from 
'/tmp/mesos-SUO0qf/0/meta'
I0910 05:57:28.247834 17650 status_update_manager.cpp:193] Recovering status 
update manager
I0910 05:57:28.247987 17650 containerizer.cpp:252] Recovering containerizer
I0910 05:57:28.248399 17652 slave.cpp:3202] Finished recovery
I0910 05:57:28.248921 17652 slave.cpp:598] New master detected at 
master@127.0.1.1:47878
I0910 05:57:28.249073 17656 status_update_manager.cpp:167] New master detected 
at master@127.0.1.1:47878
I0910 05:57:28.249191 17652 slave.cpp:634] No credentials provided. Attempting 
to register without authentication
I0910 05:57:28.249337 17652 slave.cpp:645] Detecting new master
I0910 05:57:28.249740 17625 containerizer.cpp:89] Using isolation: 
posix/cpu,posix/mem
I0910 05:57:28.251518 17650 slave.cpp:167] Slave started on 2)@127.0.1.1:47878
I0910 05:57:28.251782 17650 slave.cpp:287] Slave resources: cpus(*):1; 
mem(*):1986; disk(*):24988; ports(*):[31000-32000]
I0910 05:57:28.251960 17650 slave.cpp:315] Slave hostname: trusty
I0910 05:57:28.252079 17650 slave.cpp:316] Slave checkpoint: true
I0910 05:57:28.252481 17650 state.cpp:33] Recovering state from 
'/tmp/mesos-SUO0qf/1/meta'
I0910 05:57:28.252691 17650 status_update_man

[jira] [Commented] (MESOS-1783) MasterTest.LaunchDuplicateOfferTest is flaky

2014-09-10 Thread Niklas Quarfot Nielsen (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14129269#comment-14129269
 ] 

Niklas Quarfot Nielsen commented on MESOS-1783:
---

Will do - sorry for the tardy reply

> MasterTest.LaunchDuplicateOfferTest is flaky
> 
>
> Key: MESOS-1783
> URL: https://issues.apache.org/jira/browse/MESOS-1783
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.20.0
> Environment: ubuntu-14.04-gcc Jenkins VM
>Reporter: Yan Xu
>
> {noformat:title=}
> [ RUN  ] MasterTest.LaunchDuplicateOfferTest
> Using temporary directory '/tmp/MasterTest_LaunchDuplicateOfferTest_3ifzmg'
> I0909 22:46:59.212977 21883 leveldb.cpp:176] Opened db in 20.307533ms
> I0909 22:46:59.219717 21883 leveldb.cpp:183] Compacted db in 6.470397ms
> I0909 22:46:59.219925 21883 leveldb.cpp:198] Created db iterator in 5571ns
> I0909 22:46:59.220100 21883 leveldb.cpp:204] Seeked to beginning of db in 
> 1365ns
> I0909 22:46:59.220268 21883 leveldb.cpp:273] Iterated through 0 keys in the 
> db in 658ns
> I0909 22:46:59.220448 21883 replica.cpp:741] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I0909 22:46:59.220855 21903 recover.cpp:425] Starting replica recovery
> I0909 22:46:59.221103 21903 recover.cpp:451] Replica is in EMPTY status
> I0909 22:46:59.221626 21903 replica.cpp:638] Replica in EMPTY status received 
> a broadcasted recover request
> I0909 22:46:59.221914 21903 recover.cpp:188] Received a recover response from 
> a replica in EMPTY status
> I0909 22:46:59.04 21903 recover.cpp:542] Updating replica status to 
> STARTING
> I0909 22:46:59.232590 21900 master.cpp:286] Master 
> 20140909-224659-16842879-44263-21883 (trusty) started on 127.0.1.1:44263
> I0909 22:46:59.233278 21900 master.cpp:332] Master only allowing 
> authenticated frameworks to register
> I0909 22:46:59.233543 21900 master.cpp:337] Master only allowing 
> authenticated slaves to register
> I0909 22:46:59.233934 21900 credentials.hpp:36] Loading credentials for 
> authentication from 
> '/tmp/MasterTest_LaunchDuplicateOfferTest_3ifzmg/credentials'
> I0909 22:46:59.236431 21900 master.cpp:366] Authorization enabled
> I0909 22:46:59.237522 21898 hierarchical_allocator_process.hpp:299] 
> Initializing hierarchical allocator process with master : 
> master@127.0.1.1:44263
> I0909 22:46:59.237877 21904 master.cpp:120] No whitelist given. Advertising 
> offers for all slaves
> I0909 22:46:59.238723 21903 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 16.245391ms
> I0909 22:46:59.238916 21903 replica.cpp:320] Persisted replica status to 
> STARTING
> I0909 22:46:59.239203 21903 recover.cpp:451] Replica is in STARTING status
> I0909 22:46:59.239724 21903 replica.cpp:638] Replica in STARTING status 
> received a broadcasted recover request
> I0909 22:46:59.239967 21903 recover.cpp:188] Received a recover response from 
> a replica in STARTING status
> I0909 22:46:59.240304 21903 recover.cpp:542] Updating replica status to VOTING
> I0909 22:46:59.240684 21900 master.cpp:1212] The newly elected leader is 
> master@127.0.1.1:44263 with id 20140909-224659-16842879-44263-21883
> I0909 22:46:59.240846 21900 master.cpp:1225] Elected as the leading master!
> I0909 22:46:59.241149 21900 master.cpp:1043] Recovering from registrar
> I0909 22:46:59.241509 21898 registrar.cpp:313] Recovering registrar
> I0909 22:46:59.248440 21903 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 7.864221ms
> I0909 22:46:59.248644 21903 replica.cpp:320] Persisted replica status to 
> VOTING
> I0909 22:46:59.248846 21903 recover.cpp:556] Successfully joined the Paxos 
> group
> I0909 22:46:59.249330 21897 log.cpp:656] Attempting to start the writer
> I0909 22:46:59.249809 21897 replica.cpp:474] Replica received implicit 
> promise request with proposal 1
> I0909 22:46:59.250075 21903 recover.cpp:440] Recover process terminated
> I0909 22:46:59.258286 21897 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 8.292514ms
> I0909 22:46:59.258489 21897 replica.cpp:342] Persisted promised to 1
> I0909 22:46:59.258848 21897 coordinator.cpp:230] Coordinator attemping to 
> fill missing position
> I0909 22:46:59.259454 21897 replica.cpp:375] Replica received explicit 
> promise request for position 0 with proposal 2
> I0909 22:46:59.267755 21897 leveldb.cpp:343] Persisting action (8 bytes) to 
> leveldb took 8.109338ms
> I0909 22:46:59.267916 21897 replica.cpp:676] Persisted action at 0
> I0909 22:46:59.270128 21902 replica.cpp:508] Replica received write request 
> for position 0
> I0909 22:46:59.270294 21902 leveldb.cpp:438] Reading position from leveldb 
> took 27443ns
> I0909 22:46:59.277220 21902 leveldb.cpp:343] Persisting action (14 bytes) to 
> leve

[jira] [Commented] (MESOS-1739) Add Dynamic Slave Attributes

2014-09-10 Thread Vinod Kone (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14129212#comment-14129212
 ] 

Vinod Kone commented on MESOS-1739:
---

[~preillyme] Do you want to rename the title/description of the ticket and the 
review per the revised semantics we discussed? Also, let me know if/when it's 
ready for review.

Also, you asked earlier about the design doc for the framework update. You can 
find it here: MESOS-1784.

> Add Dynamic Slave Attributes
> 
>
> Key: MESOS-1739
> URL: https://issues.apache.org/jira/browse/MESOS-1739
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Patrick Reilly
>Assignee: Patrick Reilly
>
> Make it so that, either via a slave restart or an out-of-process "reconfigure" 
> ping, the attributes and resources of a slave can be updated to a superset 
> of what they used to be.





[jira] [Resolved] (MESOS-1728) Libprocess: report bind parameters on failure

2014-09-10 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone resolved MESOS-1728.
---
  Resolution: Fixed
   Fix Version/s: 0.21.0
Target Version/s: 0.20.1

commit 70784a9f234b2902d6fee11298365d9b08756313
Author: Nikita Vetoshkin 
Date:   Thu Aug 21 11:40:55 2014 -0700

Report bind parameters on failure.

Review: https://reviews.apache.org/r/24939


> Libprocess: report bind parameters on failure
> -
>
> Key: MESOS-1728
> URL: https://issues.apache.org/jira/browse/MESOS-1728
> Project: Mesos
>  Issue Type: Improvement
>  Components: libprocess
>Reporter: Nikita Vetoshkin
>Assignee: Nikita Vetoshkin
>Priority: Trivial
> Fix For: 0.21.0
>
>
> When you attempt to start a slave or master and there's another one already 
> running there, it is nice to report the actual parameters of the {{bind}} 
> call that failed.





[jira] [Commented] (MESOS-1784) Design the semantics for updating FrameworkInfo

2014-09-10 Thread Vinod Kone (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14129179#comment-14129179
 ] 

Vinod Kone commented on MESOS-1784:
---

https://docs.google.com/document/d/1vEBuFN9mm3HkrNCmkAuwX-4kNYv0MwjUecLE3Jp_BqE/edit?usp=sharing

> Design the semantics for updating FrameworkInfo
> ---
>
> Key: MESOS-1784
> URL: https://issues.apache.org/jira/browse/MESOS-1784
> Project: Mesos
>  Issue Type: Bug
>Reporter: Vinod Kone
>Assignee: Vinod Kone
>
> Currently, there is no easy way for frameworks to update their 
> FrameworkInfo, resulting in issues like MESOS-703 and MESOS-1218.
> This ticket captures the design for updating FrameworkInfo without having 
> to roll masters/slaves/tasks/executors.





[jira] [Commented] (MESOS-1709) ExamplesTest.NoExecutorFramework is flaky

2014-09-10 Thread Yan Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14129015#comment-14129015
 ] 

Yan Xu commented on MESOS-1709:
---

This can be prevented if ExecutorDriver::join() waits for all status update 
acks. See MESOS-243.
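The suggested behavior can be sketched with a driver that counts outstanding 
acknowledgements and blocks join() until they drain (hypothetical names; the 
real ExecutorDriver is C++, and this is only an illustration of the idea):

```python
import threading


class ExecutorDriverSketch:
    """Hypothetical sketch of the MESOS-243 suggestion: join() blocks
    until every status update sent has been acknowledged, so the process
    cannot exit with updates still in flight."""

    def __init__(self):
        self._cond = threading.Condition()
        self._pending_acks = 0

    def send_status_update(self, update):
        with self._cond:
            self._pending_acks += 1
        # ... transmit the update to the slave here ...

    def acknowledge(self, update):
        # Called when the slave acknowledges a previously sent update.
        with self._cond:
            self._pending_acks -= 1
            if self._pending_acks == 0:
                self._cond.notify_all()

    def join(self):
        # Block until all outstanding status updates are acknowledged.
        with self._cond:
            while self._pending_acks > 0:
                self._cond.wait()
```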

> ExamplesTest.NoExecutorFramework is flaky
> -
>
> Key: MESOS-1709
> URL: https://issues.apache.org/jira/browse/MESOS-1709
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Reporter: Vinod Kone
>
> Seen this happen a couple of times on Twitter CI machines. Looks like the 
> slave sends TASK_FAILED for one of the executors because it got an 
> executorTerminated() signal before it got a TASK_FINISHED signal.
> {code}
> [ RUN  ] ExamplesTest.NoExecutorFramework
> Using temporary directory '/tmp/ExamplesTest_NoExecutorFramework_dZZzd6'
> Enabling authentication for the framework
> WARNING: Logging before InitGoogleLogging() is written to STDERR
> I0815 18:39:25.623885  5879 process.cpp:1770] libprocess is initialized on 
> 192.168.122.164:53897 for 8 cpus
> I0815 18:39:25.624589  5879 logging.cpp:172] Logging to STDERR
> I0815 18:39:25.627943  5879 leveldb.cpp:176] Opened db in 897745ns
> I0815 18:39:25.628557  5879 leveldb.cpp:183] Compacted db in 467234ns
> I0815 18:39:25.628706  5879 leveldb.cpp:198] Created db iterator in 11396ns
> I0815 18:39:25.628939  5879 leveldb.cpp:204] Seeked to beginning of db in 
> 1391ns
> I0815 18:39:25.629060  5879 leveldb.cpp:273] Iterated through 0 keys in the 
> db in 678ns
> I0815 18:39:25.629261  5879 replica.cpp:741] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I0815 18:39:25.630502  5909 recover.cpp:425] Starting replica recovery
> I0815 18:39:25.630935  5909 recover.cpp:451] Replica is in EMPTY status
> I0815 18:39:25.631501  5909 replica.cpp:638] Replica in EMPTY status received 
> a broadcasted recover request
> I0815 18:39:25.631804  5909 recover.cpp:188] Received a recover response from 
> a replica in EMPTY status
> I0815 18:39:25.632524  5909 recover.cpp:542] Updating replica status to 
> STARTING
> I0815 18:39:25.632935  5909 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 48908ns
> I0815 18:39:25.633219  5909 replica.cpp:320] Persisted replica status to 
> STARTING
> I0815 18:39:25.633545  5909 recover.cpp:451] Replica is in STARTING status
> I0815 18:39:25.634224  5905 replica.cpp:638] Replica in STARTING status 
> received a broadcasted recover request
> I0815 18:39:25.634405  5905 recover.cpp:188] Received a recover response from 
> a replica in STARTING status
> I0815 18:39:25.634724  5909 recover.cpp:542] Updating replica status to VOTING
> I0815 18:39:25.634948  5908 master.cpp:286] Master 
> 20140815-183925-2759502016-53897-5879 (fedora-20) started on 
> 192.168.122.164:53897
> I0815 18:39:25.635123  5908 master.cpp:323] Master only allowing 
> authenticated frameworks to register
> I0815 18:39:25.635311  5908 master.cpp:330] Master allowing unauthenticated 
> slaves to register
> I0815 18:39:25.635455  5908 credentials.hpp:36] Loading credentials for 
> authentication from '/tmp/ExamplesTest_NoExecutorFramework_dZZzd6/credentials'
> W0815 18:39:25.635658  5908 credentials.hpp:51] Permissions on credentials 
> file '/tmp/ExamplesTest_NoExecutorFramework_dZZzd6/credentials' are too open. 
> It is recommended that your credentials file is NOT accessible by others.
> I0815 18:39:25.635861  5908 master.cpp:357] Authorization enabled
> I0815 18:39:25.636286  5910 hierarchical_allocator_process.hpp:299] 
> Initializing hierarchical allocator process with master : 
> master@192.168.122.164:53897
> I0815 18:39:25.636443  5907 master.cpp:120] No whitelist given. Advertising 
> offers for all slaves
> I0815 18:39:25.637657  5908 master.cpp:1196] The newly elected leader is 
> master@192.168.122.164:53897 with id 20140815-183925-2759502016-53897-5879
> I0815 18:39:25.638296  5908 master.cpp:1209] Elected as the leading master!
> I0815 18:39:25.638254  5906 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 42875ns
> I0815 18:39:25.638552  5906 replica.cpp:320] Persisted replica status to 
> VOTING
> I0815 18:39:25.638737  5906 recover.cpp:556] Successfully joined the Paxos 
> group
> I0815 18:39:25.638926  5906 recover.cpp:440] Recover process terminated
> I0815 18:39:25.639225  5908 master.cpp:1027] Recovering from registrar
> I0815 18:39:25.639457  5907 registrar.cpp:313] Recovering registrar
> I0815 18:39:25.639850  5907 log.cpp:656] Attempting to start the writer
> I0815 18:39:25.640336  5907 replica.cpp:474] Replica received implicit 
> promise request with proposal 1
> I0815 18:39:25.640530  5907 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 37820ns
> I0815 18:39:25.640714  5907 replica.cpp:342] Persisted promised to 1
> I08

[jira] [Commented] (MESOS-243) driver stop() should block until outstanding requests have been persisted

2014-09-10 Thread Yan Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14129014#comment-14129014
 ] 

Yan Xu commented on MESOS-243:
--

Making sure messages are flushed still isn't sufficient to ensure they are 
received. We can have ExecutorDriver.join() wait for all status updates to be 
acked.

> driver stop() should block until outstanding requests have been persisted
> -
>
> Key: MESOS-243
> URL: https://issues.apache.org/jira/browse/MESOS-243
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.9.0, 0.10.0, 0.11.0, 0.12.0, 0.13.0, 0.14.0, 0.14.1, 
> 0.14.2, 0.15.0
>Reporter: brian wickman
>
> in our executor, we send a terminal status update message and immediately 
> call driver.stop().  it turns out that the status update is dispatched 
> asynchronously and races with driver shutdown, causing tasks to instead 
> periodically go into LOST state.
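The fire-and-forget race described above can be sketched in a few lines of shell; this is not Mesos code, just an illustration of why a blocking stop()/join() that waits for the outstanding send fixes the problem:

```shell
# Not Mesos code: a minimal sketch of the fire-and-forget race.
# The "driver" dispatches its final status update asynchronously; if it
# exits (stop()) without waiting, the update may never be delivered.
msgfile=$(mktemp)

( sleep 0.2; echo "TASK_FINISHED" > "$msgfile" ) &  # async dispatch
sender=$!

# Exiting here would race the background send, so the update could be
# lost. The safe pattern (what a blocking stop()/join() provides) is to
# wait for the outstanding send before returning:
wait "$sender"

cat "$msgfile"                                      # prints TASK_FINISHED
```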





[jira] [Commented] (MESOS-1776) --without-PACKAGE will set incorrect dependency prefix

2014-09-10 Thread JIRA

[ 
https://issues.apache.org/jira/browse/MESOS-1776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14129008#comment-14129008
 ] 

Kamil Domański commented on MESOS-1776:
---

[~tstclair], I'd like some feedback on whether the ability to provide a prefix 
for unbundled dependencies is necessary. The current method for providing the 
prefix is unusual and conflicts with the canonical usage of *\-\-with-X* and 
*\-\-without-X* flags.

A workaround is possible by checking whether the variables in question equal 
"*no*" and changing their values to "*/usr*" in such cases. However, I see 
removing the prefixing altogether as the preferable solution.

Either way, I'd like to take care of this as soon as I'm pointed in the right 
direction.
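The workaround mentioned above can be sketched as follows; the variable name is hypothetical and not necessarily the one used in the actual configure scripts:

```shell
# Illustrative sketch of the workaround: treat the "no" left behind by
# --without-protobuf as "use the default system prefix" rather than as
# a literal path, so protoc is searched under /usr/bin instead of no/bin.
# The variable name here is hypothetical.
with_protobuf="no"   # what --without-protobuf leaves behind

if test "x$with_protobuf" = "xno"; then
  with_protobuf="/usr"
fi

echo "searching for protoc under: ${with_protobuf}/bin"
```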

> --without-PACKAGE will set incorrect dependency prefix
> --
>
> Key: MESOS-1776
> URL: https://issues.apache.org/jira/browse/MESOS-1776
> Project: Mesos
>  Issue Type: Bug
>  Components: build
>Affects Versions: 0.20.0
>Reporter: Kamil Domański
>  Labels: build
>
> When disabling a particular bundled dependency with *--without-PACKAGE*, the 
> build scripts of both Mesos and libprocess will set a corresponding variable 
> to "no". This is later treated as prefix under which to search for the 
> package.
> For example, with *--without-protobuf*, the script will search for *protoc* 
> under *no/bin* and obviously fail. I would propose to get rid of these 
> prefixes entirely and instead search in default locations.





[jira] [Commented] (MESOS-1766) MasterAuthorizationTest.DuplicateRegistration test is flaky

2014-09-10 Thread Vinod Kone (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14129003#comment-14129003
 ] 

Vinod Kone commented on MESOS-1766:
---

https://reviews.apache.org/r/25516/

> MasterAuthorizationTest.DuplicateRegistration test is flaky
> ---
>
> Key: MESOS-1766
> URL: https://issues.apache.org/jira/browse/MESOS-1766
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Reporter: Vinod Kone
>Assignee: Vinod Kone
>
> {code}
> [ RUN  ] MasterAuthorizationTest.DuplicateRegistration
> Using temporary directory 
> '/tmp/MasterAuthorizationTest_DuplicateRegistration_pVJg7m'
> I0905 15:53:16.398993 25769 leveldb.cpp:176] Opened db in 2.601036ms
> I0905 15:53:16.399566 25769 leveldb.cpp:183] Compacted db in 546216ns
> I0905 15:53:16.399590 25769 leveldb.cpp:198] Created db iterator in 2787ns
> I0905 15:53:16.399605 25769 leveldb.cpp:204] Seeked to beginning of db in 
> 500ns
> I0905 15:53:16.399617 25769 leveldb.cpp:273] Iterated through 0 keys in the 
> db in 185ns
> I0905 15:53:16.399633 25769 replica.cpp:741] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I0905 15:53:16.399817 25786 recover.cpp:425] Starting replica recovery
> I0905 15:53:16.399952 25793 recover.cpp:451] Replica is in EMPTY status
> I0905 15:53:16.400683 25795 replica.cpp:638] Replica in EMPTY status received 
> a broadcasted recover request
> I0905 15:53:16.400795 25787 recover.cpp:188] Received a recover response from 
> a replica in EMPTY status
> I0905 15:53:16.401005 25783 recover.cpp:542] Updating replica status to 
> STARTING
> I0905 15:53:16.401470 25786 master.cpp:286] Master 
> 20140905-155316-3125920579-49188-25769 (penates.apache.org) started on 
> 67.195.81.186:49188
> I0905 15:53:16.401521 25786 master.cpp:332] Master only allowing 
> authenticated frameworks to register
> I0905 15:53:16.401533 25786 master.cpp:337] Master only allowing 
> authenticated slaves to register
> I0905 15:53:16.401543 25786 credentials.hpp:36] Loading credentials for 
> authentication from 
> '/tmp/MasterAuthorizationTest_DuplicateRegistration_pVJg7m/credentials'
> I0905 15:53:16.401558 25793 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 474683ns
> I0905 15:53:16.401582 25793 replica.cpp:320] Persisted replica status to 
> STARTING
> I0905 15:53:16.401667 25793 recover.cpp:451] Replica is in STARTING status
> I0905 15:53:16.401669 25786 master.cpp:366] Authorization enabled
> I0905 15:53:16.401898 25795 master.cpp:120] No whitelist given. Advertising 
> offers for all slaves
> I0905 15:53:16.401936 25796 hierarchical_allocator_process.hpp:299] 
> Initializing hierarchical allocator process with master : 
> master@67.195.81.186:49188
> I0905 15:53:16.402160 25784 replica.cpp:638] Replica in STARTING status 
> received a broadcasted recover request
> I0905 15:53:16.402333 25790 master.cpp:1205] The newly elected leader is 
> master@67.195.81.186:49188 with id 20140905-155316-3125920579-49188-25769
> I0905 15:53:16.402359 25790 master.cpp:1218] Elected as the leading master!
> I0905 15:53:16.402371 25790 master.cpp:1036] Recovering from registrar
> I0905 15:53:16.402472 25798 registrar.cpp:313] Recovering registrar
> I0905 15:53:16.402529 25791 recover.cpp:188] Received a recover response from 
> a replica in STARTING status
> I0905 15:53:16.402782 25788 recover.cpp:542] Updating replica status to VOTING
> I0905 15:53:16.403002 25795 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 116403ns
> I0905 15:53:16.403020 25795 replica.cpp:320] Persisted replica status to 
> VOTING
> I0905 15:53:16.403081 25791 recover.cpp:556] Successfully joined the Paxos 
> group
> I0905 15:53:16.403197 25791 recover.cpp:440] Recover process terminated
> I0905 15:53:16.403388 25796 log.cpp:656] Attempting to start the writer
> I0905 15:53:16.403993 25784 replica.cpp:474] Replica received implicit 
> promise request with proposal 1
> I0905 15:53:16.404147 25784 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 132156ns
> I0905 15:53:16.404167 25784 replica.cpp:342] Persisted promised to 1
> I0905 15:53:16.404542 25795 coordinator.cpp:230] Coordinator attemping to 
> fill missing position
> I0905 15:53:16.405498 25787 replica.cpp:375] Replica received explicit 
> promise request for position 0 with proposal 2
> I0905 15:53:16.405868 25787 leveldb.cpp:343] Persisting action (8 bytes) to 
> leveldb took 347231ns
> I0905 15:53:16.405886 25787 replica.cpp:676] Persisted action at 0
> I0905 15:53:16.406553 25788 replica.cpp:508] Replica received write request 
> for position 0
> I0905 15:53:16.406582 25788 leveldb.cpp:438] Reading position from leveldb 
> took 11402ns
> I0905 15:53:16.529067 25788 leveldb.cpp:343] Persisting action (14 bytes) to 
> l

[jira] [Commented] (MESOS-1760) MasterAuthorizationTest.FrameworkRemovedBeforeReregistration is flaky

2014-09-10 Thread Vinod Kone (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14129004#comment-14129004
 ] 

Vinod Kone commented on MESOS-1760:
---

https://reviews.apache.org/r/25516/

> MasterAuthorizationTest.FrameworkRemovedBeforeReregistration is flaky
> -
>
> Key: MESOS-1760
> URL: https://issues.apache.org/jira/browse/MESOS-1760
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Reporter: Vinod Kone
>Assignee: Vinod Kone
>
> Observed this on Apache CI: 
> https://builds.apache.org/job/Mesos-Trunk-Ubuntu-Build-Out-Of-Src-Disable-Java-Disable-Python-Disable-Webui/2355/changes
> {code}
> [ RUN] MasterAuthorizationTest.FrameworkRemovedBeforeReregistration
> Using temporary directory 
> '/tmp/MasterAuthorizationTest_FrameworkRemovedBeforeReregistration_0tw16Z'
> I0903 22:04:33.520237 25565 leveldb.cpp:176] Opened db in 49.073821ms
> I0903 22:04:33.538331 25565 leveldb.cpp:183] Compacted db in 18.065051ms
> I0903 22:04:33.538363 25565 leveldb.cpp:198] Created db iterator in 4826ns
> I0903 22:04:33.538377 25565 leveldb.cpp:204] Seeked to beginning of db in 
> 682ns
> I0903 22:04:33.538385 25565 leveldb.cpp:273] Iterated through 0 keys in the 
> db in 312ns
> I0903 22:04:33.538399 25565 replica.cpp:741] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I0903 22:04:33.538624 25593 recover.cpp:425] Starting replica recovery
> I0903 22:04:33.538707 25598 recover.cpp:451] Replica is in EMPTY status
> I0903 22:04:33.540909 25590 master.cpp:286] Master 
> 20140903-220433-453759884-44122-25565 (hemera.apache.org) started on 
> 140.211.11.27:44122
> I0903 22:04:33.540932 25590 master.cpp:332] Master only allowing 
> authenticated frameworks to register
> I0903 22:04:33.540936 25590 master.cpp:337] Master only allowing 
> authenticated slaves to register
> I0903 22:04:33.540941 25590 credentials.hpp:36] Loading credentials for 
> authentication from 
> '/tmp/MasterAuthorizationTest_FrameworkRemovedBeforeReregistration_0tw16Z/credentials'
> I0903 22:04:33.541337 25590 master.cpp:366] Authorization enabled
> I0903 22:04:33.541508 25597 replica.cpp:638] Replica in EMPTY status received 
> a broadcasted recover request
> I0903 22:04:33.542343 25582 hierarchical_allocator_process.hpp:299] 
> Initializing hierarchical allocator process with master : 
> master@140.211.11.27:44122
> I0903 22:04:33.542445 25592 master.cpp:120] No whitelist given. Advertising 
> offers for all slaves
> I0903 22:04:33.543175 25602 recover.cpp:188] Received a recover response from 
> a replica in EMPTY status
> I0903 22:04:33.543637 25587 recover.cpp:542] Updating replica status to 
> STARTING
> I0903 22:04:33.544256 25579 master.cpp:1205] The newly elected leader is 
> master@140.211.11.27:44122 with id 20140903-220433-453759884-44122-25565
> I0903 22:04:33.544275 25579 master.cpp:1218] Elected as the leading master!
> I0903 22:04:33.544282 25579 master.cpp:1036] Recovering from registrar
> I0903 22:04:33.544401 25579 registrar.cpp:313] Recovering registrar
> I0903 22:04:33.558487 25593 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 14.678563ms
> I0903 22:04:33.558531 25593 replica.cpp:320] Persisted replica status to 
> STARTING
> I0903 22:04:33.558653 25593 recover.cpp:451] Replica is in STARTING status
> I0903 22:04:33.559867 25588 replica.cpp:638] Replica in STARTING status 
> received a broadcasted recover request
> I0903 22:04:33.560057 25602 recover.cpp:188] Received a recover response from 
> a replica in STARTING status
> I0903 22:04:33.561280 25584 recover.cpp:542] Updating replica status to VOTING
> I0903 22:04:33.576900 25581 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 14.712427ms
> I0903 22:04:33.576942 25581 replica.cpp:320] Persisted replica status to 
> VOTING
> I0903 22:04:33.577018 25581 recover.cpp:556] Successfully joined the Paxos 
> group
> I0903 22:04:33.577108 25581 recover.cpp:440] Recover process terminated
> I0903 22:04:33.577401 25581 log.cpp:656] Attempting to start the writer
> I0903 22:04:33.578559 25589 replica.cpp:474] Replica received implicit 
> promise request with proposal 1
> I0903 22:04:33.594611 25589 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 16.029152ms
> I0903 22:04:33.594640 25589 replica.cpp:342] Persisted promised to 1
> I0903 22:04:33.595391 25584 coordinator.cpp:230] Coordinator attemping to 
> fill missing position
> I0903 22:04:33.597512 25588 replica.cpp:375] Replica received explicit 
> promise request for position 0 with proposal 2
> I0903 22:04:33.613037 25588 leveldb.cpp:343] Persisting action (8 bytes) to 
> leveldb took 15.502568ms
> I0903 22:04:33.613065 25588 replica.cpp:676] Persisted action at 0
> I0903 22:04:33.615435 25585 replica.cpp:508]

[jira] [Updated] (MESOS-703) master fails to respect updated FrameworkInfo when the framework scheduler restarts

2014-09-10 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-703:
-
Shepherd: Vinod Kone

> master fails to respect updated FrameworkInfo when the framework scheduler 
> restarts
> ---
>
> Key: MESOS-703
> URL: https://issues.apache.org/jira/browse/MESOS-703
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.14.0
> Environment: ubuntu 13.04, mesos 0.14.0-rc3
>Reporter: Jordan Curzon
>
> When I first ran marathon it was running as a personal user and registered 
> with mesos-master as such due to putting an empty string in the user field. 
> When I restarted marathon as "nobody", tasks were still being run as the 
> personal user which didn't exist on the slaves. I know marathon was trying to 
> send a FrameworkInfo with nobody listed as the user because I hard coded it 
> in. The tasks wouldn't run as "nobody" until I restarted the mesos-master. 
> Each time I restarted the marathon framework, it reregistered with 
> mesos-master and mesos-master wrote to the logs that it detected a failover 
> because the scheduler went away and then came back.
> I understand the scheduler failover, but shouldn't mesos-master respect an 
> updated FrameworkInfo when the scheduler re-registers?





[jira] [Updated] (MESOS-703) master fails to respect updated FrameworkInfo when the framework scheduler restarts

2014-09-10 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-703:
-
Assignee: (was: Vinod Kone)

> master fails to respect updated FrameworkInfo when the framework scheduler 
> restarts
> ---
>
> Key: MESOS-703
> URL: https://issues.apache.org/jira/browse/MESOS-703
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.14.0
> Environment: ubuntu 13.04, mesos 0.14.0-rc3
>Reporter: Jordan Curzon
>
> When I first ran marathon it was running as a personal user and registered 
> with mesos-master as such due to putting an empty string in the user field. 
> When I restarted marathon as "nobody", tasks were still being run as the 
> personal user which didn't exist on the slaves. I know marathon was trying to 
> send a FrameworkInfo with nobody listed as the user because I hard coded it 
> in. The tasks wouldn't run as "nobody" until I restarted the mesos-master. 
> Each time I restarted the marathon framework, it reregistered with 
> mesos-master and mesos-master wrote to the logs that it detected a failover 
> because the scheduler went away and then came back.
> I understand the scheduler failover, but shouldn't mesos-master respect an 
> updated FrameworkInfo when the scheduler re-registers?





[jira] [Commented] (MESOS-703) master fails to respect updated FrameworkInfo when the framework scheduler restarts

2014-09-10 Thread Vinod Kone (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14128992#comment-14128992
 ] 

Vinod Kone commented on MESOS-703:
--

Linked the ticket for the design doc to do this properly.

> master fails to respect updated FrameworkInfo when the framework scheduler 
> restarts
> ---
>
> Key: MESOS-703
> URL: https://issues.apache.org/jira/browse/MESOS-703
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.14.0
> Environment: ubuntu 13.04, mesos 0.14.0-rc3
>Reporter: Jordan Curzon
>Assignee: Vinod Kone
>
> When I first ran marathon it was running as a personal user and registered 
> with mesos-master as such due to putting an empty string in the user field. 
> When I restarted marathon as "nobody", tasks were still being run as the 
> personal user which didn't exist on the slaves. I know marathon was trying to 
> send a FrameworkInfo with nobody listed as the user because I hard coded it 
> in. The tasks wouldn't run as "nobody" until I restarted the mesos-master. 
> Each time I restarted the marathon framework, it reregistered with 
> mesos-master and mesos-master wrote to the logs that it detected a failover 
> because the scheduler went away and then came back.
> I understand the scheduler failover, but shouldn't mesos-master respect an 
> updated FrameworkInfo when the scheduler re-registers?





[jira] [Updated] (MESOS-703) master fails to respect updated FrameworkInfo when the framework scheduler restarts

2014-09-10 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-703:
-
Sprint:   (was: Mesos Q3 Sprint 5)

> master fails to respect updated FrameworkInfo when the framework scheduler 
> restarts
> ---
>
> Key: MESOS-703
> URL: https://issues.apache.org/jira/browse/MESOS-703
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.14.0
> Environment: ubuntu 13.04, mesos 0.14.0-rc3
>Reporter: Jordan Curzon
>Assignee: Vinod Kone
>
> When I first ran marathon it was running as a personal user and registered 
> with mesos-master as such due to putting an empty string in the user field. 
> When I restarted marathon as "nobody", tasks were still being run as the 
> personal user which didn't exist on the slaves. I know marathon was trying to 
> send a FrameworkInfo with nobody listed as the user because I hard coded it 
> in. The tasks wouldn't run as "nobody" until I restarted the mesos-master. 
> Each time I restarted the marathon framework, it reregistered with 
> mesos-master and mesos-master wrote to the logs that it detected a failover 
> because the scheduler went away and then came back.
> I understand the scheduler failover, but shouldn't mesos-master respect an 
> updated FrameworkInfo when the scheduler re-registers?





[jira] [Created] (MESOS-1784) Design the semantics for updating FrameworkInfo

2014-09-10 Thread Vinod Kone (JIRA)
Vinod Kone created MESOS-1784:
-

 Summary: Design the semantics for updating FrameworkInfo
 Key: MESOS-1784
 URL: https://issues.apache.org/jira/browse/MESOS-1784
 Project: Mesos
  Issue Type: Bug
Reporter: Vinod Kone
Assignee: Vinod Kone


Currently, there is no easy way for frameworks to update their FrameworkInfo, 
resulting in issues like MESOS-703 and MESOS-1218.

This ticket captures the design for doing FrameworkInfo update without having 
to roll masters/slaves/tasks/executors.





[jira] [Commented] (MESOS-1766) MasterAuthorizationTest.DuplicateRegistration test is flaky

2014-09-10 Thread Vinod Kone (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14128986#comment-14128986
 ] 

Vinod Kone commented on MESOS-1766:
---

The bug here is that the authorizer might receive more registration 
authorization requests than expected, because the scheduler driver retries 
registration.

The fix is simple: the authorizer should allow all subsequent authorization 
requests.

> MasterAuthorizationTest.DuplicateRegistration test is flaky
> ---
>
> Key: MESOS-1766
> URL: https://issues.apache.org/jira/browse/MESOS-1766
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Reporter: Vinod Kone
>Assignee: Vinod Kone
>
> {code}
> [ RUN  ] MasterAuthorizationTest.DuplicateRegistration
> Using temporary directory 
> '/tmp/MasterAuthorizationTest_DuplicateRegistration_pVJg7m'
> I0905 15:53:16.398993 25769 leveldb.cpp:176] Opened db in 2.601036ms
> I0905 15:53:16.399566 25769 leveldb.cpp:183] Compacted db in 546216ns
> I0905 15:53:16.399590 25769 leveldb.cpp:198] Created db iterator in 2787ns
> I0905 15:53:16.399605 25769 leveldb.cpp:204] Seeked to beginning of db in 
> 500ns
> I0905 15:53:16.399617 25769 leveldb.cpp:273] Iterated through 0 keys in the 
> db in 185ns
> I0905 15:53:16.399633 25769 replica.cpp:741] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I0905 15:53:16.399817 25786 recover.cpp:425] Starting replica recovery
> I0905 15:53:16.399952 25793 recover.cpp:451] Replica is in EMPTY status
> I0905 15:53:16.400683 25795 replica.cpp:638] Replica in EMPTY status received 
> a broadcasted recover request
> I0905 15:53:16.400795 25787 recover.cpp:188] Received a recover response from 
> a replica in EMPTY status
> I0905 15:53:16.401005 25783 recover.cpp:542] Updating replica status to 
> STARTING
> I0905 15:53:16.401470 25786 master.cpp:286] Master 
> 20140905-155316-3125920579-49188-25769 (penates.apache.org) started on 
> 67.195.81.186:49188
> I0905 15:53:16.401521 25786 master.cpp:332] Master only allowing 
> authenticated frameworks to register
> I0905 15:53:16.401533 25786 master.cpp:337] Master only allowing 
> authenticated slaves to register
> I0905 15:53:16.401543 25786 credentials.hpp:36] Loading credentials for 
> authentication from 
> '/tmp/MasterAuthorizationTest_DuplicateRegistration_pVJg7m/credentials'
> I0905 15:53:16.401558 25793 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 474683ns
> I0905 15:53:16.401582 25793 replica.cpp:320] Persisted replica status to 
> STARTING
> I0905 15:53:16.401667 25793 recover.cpp:451] Replica is in STARTING status
> I0905 15:53:16.401669 25786 master.cpp:366] Authorization enabled
> I0905 15:53:16.401898 25795 master.cpp:120] No whitelist given. Advertising 
> offers for all slaves
> I0905 15:53:16.401936 25796 hierarchical_allocator_process.hpp:299] 
> Initializing hierarchical allocator process with master : 
> master@67.195.81.186:49188
> I0905 15:53:16.402160 25784 replica.cpp:638] Replica in STARTING status 
> received a broadcasted recover request
> I0905 15:53:16.402333 25790 master.cpp:1205] The newly elected leader is 
> master@67.195.81.186:49188 with id 20140905-155316-3125920579-49188-25769
> I0905 15:53:16.402359 25790 master.cpp:1218] Elected as the leading master!
> I0905 15:53:16.402371 25790 master.cpp:1036] Recovering from registrar
> I0905 15:53:16.402472 25798 registrar.cpp:313] Recovering registrar
> I0905 15:53:16.402529 25791 recover.cpp:188] Received a recover response from 
> a replica in STARTING status
> I0905 15:53:16.402782 25788 recover.cpp:542] Updating replica status to VOTING
> I0905 15:53:16.403002 25795 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 116403ns
> I0905 15:53:16.403020 25795 replica.cpp:320] Persisted replica status to 
> VOTING
> I0905 15:53:16.403081 25791 recover.cpp:556] Successfully joined the Paxos 
> group
> I0905 15:53:16.403197 25791 recover.cpp:440] Recover process terminated
> I0905 15:53:16.403388 25796 log.cpp:656] Attempting to start the writer
> I0905 15:53:16.403993 25784 replica.cpp:474] Replica received implicit 
> promise request with proposal 1
> I0905 15:53:16.404147 25784 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 132156ns
> I0905 15:53:16.404167 25784 replica.cpp:342] Persisted promised to 1
> I0905 15:53:16.404542 25795 coordinator.cpp:230] Coordinator attemping to 
> fill missing position
> I0905 15:53:16.405498 25787 replica.cpp:375] Replica received explicit 
> promise request for position 0 with proposal 2
> I0905 15:53:16.405868 25787 leveldb.cpp:343] Persisting action (8 bytes) to 
> leveldb took 347231ns
> I0905 15:53:16.405886 25787 replica.cpp:676] Persisted action at 0
> I0905 15:53:16.406553 25788 replica.cpp:508] Replica recei

[jira] [Updated] (MESOS-1760) MasterAuthorizationTest.FrameworkRemovedBeforeReregistration is flaky

2014-09-10 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-1760:
--
  Sprint: Mesos Q3 Sprint 5
Assignee: Vinod Kone
Story Points: 1

This is due to the same issue seen in MESOS-1766: duplicate registration 
retries.

> MasterAuthorizationTest.FrameworkRemovedBeforeReregistration is flaky
> -
>
> Key: MESOS-1760
> URL: https://issues.apache.org/jira/browse/MESOS-1760
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Reporter: Vinod Kone
>Assignee: Vinod Kone
>
> Observed this on Apache CI: 
> https://builds.apache.org/job/Mesos-Trunk-Ubuntu-Build-Out-Of-Src-Disable-Java-Disable-Python-Disable-Webui/2355/changes
> {code}
> [ RUN] MasterAuthorizationTest.FrameworkRemovedBeforeReregistration
> Using temporary directory 
> '/tmp/MasterAuthorizationTest_FrameworkRemovedBeforeReregistration_0tw16Z'
> I0903 22:04:33.520237 25565 leveldb.cpp:176] Opened db in 49.073821ms
> I0903 22:04:33.538331 25565 leveldb.cpp:183] Compacted db in 18.065051ms
> I0903 22:04:33.538363 25565 leveldb.cpp:198] Created db iterator in 4826ns
> I0903 22:04:33.538377 25565 leveldb.cpp:204] Seeked to beginning of db in 
> 682ns
> I0903 22:04:33.538385 25565 leveldb.cpp:273] Iterated through 0 keys in the 
> db in 312ns
> I0903 22:04:33.538399 25565 replica.cpp:741] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I0903 22:04:33.538624 25593 recover.cpp:425] Starting replica recovery
> I0903 22:04:33.538707 25598 recover.cpp:451] Replica is in EMPTY status
> I0903 22:04:33.540909 25590 master.cpp:286] Master 
> 20140903-220433-453759884-44122-25565 (hemera.apache.org) started on 
> 140.211.11.27:44122
> I0903 22:04:33.540932 25590 master.cpp:332] Master only allowing 
> authenticated frameworks to register
> I0903 22:04:33.540936 25590 master.cpp:337] Master only allowing 
> authenticated slaves to register
> I0903 22:04:33.540941 25590 credentials.hpp:36] Loading credentials for 
> authentication from 
> '/tmp/MasterAuthorizationTest_FrameworkRemovedBeforeReregistration_0tw16Z/credentials'
> I0903 22:04:33.541337 25590 master.cpp:366] Authorization enabled
> I0903 22:04:33.541508 25597 replica.cpp:638] Replica in EMPTY status received 
> a broadcasted recover request
> I0903 22:04:33.542343 25582 hierarchical_allocator_process.hpp:299] 
> Initializing hierarchical allocator process with master : 
> master@140.211.11.27:44122
> I0903 22:04:33.542445 25592 master.cpp:120] No whitelist given. Advertising 
> offers for all slaves
> I0903 22:04:33.543175 25602 recover.cpp:188] Received a recover response from 
> a replica in EMPTY status
> I0903 22:04:33.543637 25587 recover.cpp:542] Updating replica status to 
> STARTING
> I0903 22:04:33.544256 25579 master.cpp:1205] The newly elected leader is 
> master@140.211.11.27:44122 with id 20140903-220433-453759884-44122-25565
> I0903 22:04:33.544275 25579 master.cpp:1218] Elected as the leading master!
> I0903 22:04:33.544282 25579 master.cpp:1036] Recovering from registrar
> I0903 22:04:33.544401 25579 registrar.cpp:313] Recovering registrar
> I0903 22:04:33.558487 25593 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 14.678563ms
> I0903 22:04:33.558531 25593 replica.cpp:320] Persisted replica status to 
> STARTING
> I0903 22:04:33.558653 25593 recover.cpp:451] Replica is in STARTING status
> I0903 22:04:33.559867 25588 replica.cpp:638] Replica in STARTING status 
> received a broadcasted recover request
> I0903 22:04:33.560057 25602 recover.cpp:188] Received a recover response from 
> a replica in STARTING status
> I0903 22:04:33.561280 25584 recover.cpp:542] Updating replica status to VOTING
> I0903 22:04:33.576900 25581 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 14.712427ms
> I0903 22:04:33.576942 25581 replica.cpp:320] Persisted replica status to 
> VOTING
> I0903 22:04:33.577018 25581 recover.cpp:556] Successfully joined the Paxos 
> group
> I0903 22:04:33.577108 25581 recover.cpp:440] Recover process terminated
> I0903 22:04:33.577401 25581 log.cpp:656] Attempting to start the writer
> I0903 22:04:33.578559 25589 replica.cpp:474] Replica received implicit 
> promise request with proposal 1
> I0903 22:04:33.594611 25589 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 16.029152ms
> I0903 22:04:33.594640 25589 replica.cpp:342] Persisted promised to 1
> I0903 22:04:33.595391 25584 coordinator.cpp:230] Coordinator attemping to 
> fill missing position
> I0903 22:04:33.597512 25588 replica.cpp:375] Replica received explicit 
> promise request for position 0 with proposal 2
> I0903 22:04:33.613037 25588 leveldb.cpp:343] Persisting action (8 bytes) to 
> leveldb took 15.502568ms
> I0903 22:04:33.613065 25588 replica

[jira] [Commented] (MESOS-1764) Build Fixes from 0.20 release

2014-09-10 Thread Timothy St. Clair (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14128940#comment-14128940
 ] 

Timothy St. Clair commented on MESOS-1764:
--

Fix {{git clean -xdf}} on the leveldb folder
-reviews.apache.org/r/25508-

> Build Fixes from 0.20 release
> -
>
> Key: MESOS-1764
> URL: https://issues.apache.org/jira/browse/MESOS-1764
> Project: Mesos
>  Issue Type: Bug
>  Components: build
>Affects Versions: 0.20.0
>Reporter: Timothy St. Clair
>Assignee: Timothy St. Clair
>
> This ticket is a catch all for minor issues caught during a rebase and 
> testing.
> + Add package configuration file to deployment
> + Updates deploy_dir from localstatedir to sysconfdir
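A package configuration file for Mesos would look roughly like the sketch below (illustrative only; the variable values and the library name are assumptions, not the actual file added by this ticket):

```text
# Hypothetical mesos.pc sketch; paths are placeholders.
prefix=/usr/local
exec_prefix=${prefix}
libdir=${exec_prefix}/lib
includedir=${prefix}/include

Name: mesos
Description: Apache Mesos cluster manager library
Version: 0.20.0
Libs: -L${libdir} -lmesos
Cflags: -I${includedir}
```

With such a file installed under `${libdir}/pkgconfig`, downstream builds can resolve flags via `pkg-config --cflags --libs mesos` instead of hard-coding paths.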



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-1766) MasterAuthorizationTest.DuplicateRegistration test is flaky

2014-09-10 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-1766:
--
  Sprint: Mesos Q3 Sprint 5
Assignee: Vinod Kone
Story Points: 2

> MasterAuthorizationTest.DuplicateRegistration test is flaky
> ---
>
> Key: MESOS-1766
> URL: https://issues.apache.org/jira/browse/MESOS-1766
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Reporter: Vinod Kone
>Assignee: Vinod Kone
>
> {code}
> [ RUN  ] MasterAuthorizationTest.DuplicateRegistration
> Using temporary directory 
> '/tmp/MasterAuthorizationTest_DuplicateRegistration_pVJg7m'
> I0905 15:53:16.398993 25769 leveldb.cpp:176] Opened db in 2.601036ms
> I0905 15:53:16.399566 25769 leveldb.cpp:183] Compacted db in 546216ns
> I0905 15:53:16.399590 25769 leveldb.cpp:198] Created db iterator in 2787ns
> I0905 15:53:16.399605 25769 leveldb.cpp:204] Seeked to beginning of db in 
> 500ns
> I0905 15:53:16.399617 25769 leveldb.cpp:273] Iterated through 0 keys in the 
> db in 185ns
> I0905 15:53:16.399633 25769 replica.cpp:741] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I0905 15:53:16.399817 25786 recover.cpp:425] Starting replica recovery
> I0905 15:53:16.399952 25793 recover.cpp:451] Replica is in EMPTY status
> I0905 15:53:16.400683 25795 replica.cpp:638] Replica in EMPTY status received 
> a broadcasted recover request
> I0905 15:53:16.400795 25787 recover.cpp:188] Received a recover response from 
> a replica in EMPTY status
> I0905 15:53:16.401005 25783 recover.cpp:542] Updating replica status to 
> STARTING
> I0905 15:53:16.401470 25786 master.cpp:286] Master 
> 20140905-155316-3125920579-49188-25769 (penates.apache.org) started on 
> 67.195.81.186:49188
> I0905 15:53:16.401521 25786 master.cpp:332] Master only allowing 
> authenticated frameworks to register
> I0905 15:53:16.401533 25786 master.cpp:337] Master only allowing 
> authenticated slaves to register
> I0905 15:53:16.401543 25786 credentials.hpp:36] Loading credentials for 
> authentication from 
> '/tmp/MasterAuthorizationTest_DuplicateRegistration_pVJg7m/credentials'
> I0905 15:53:16.401558 25793 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 474683ns
> I0905 15:53:16.401582 25793 replica.cpp:320] Persisted replica status to 
> STARTING
> I0905 15:53:16.401667 25793 recover.cpp:451] Replica is in STARTING status
> I0905 15:53:16.401669 25786 master.cpp:366] Authorization enabled
> I0905 15:53:16.401898 25795 master.cpp:120] No whitelist given. Advertising 
> offers for all slaves
> I0905 15:53:16.401936 25796 hierarchical_allocator_process.hpp:299] 
> Initializing hierarchical allocator process with master : 
> master@67.195.81.186:49188
> I0905 15:53:16.402160 25784 replica.cpp:638] Replica in STARTING status 
> received a broadcasted recover request
> I0905 15:53:16.402333 25790 master.cpp:1205] The newly elected leader is 
> master@67.195.81.186:49188 with id 20140905-155316-3125920579-49188-25769
> I0905 15:53:16.402359 25790 master.cpp:1218] Elected as the leading master!
> I0905 15:53:16.402371 25790 master.cpp:1036] Recovering from registrar
> I0905 15:53:16.402472 25798 registrar.cpp:313] Recovering registrar
> I0905 15:53:16.402529 25791 recover.cpp:188] Received a recover response from 
> a replica in STARTING status
> I0905 15:53:16.402782 25788 recover.cpp:542] Updating replica status to VOTING
> I0905 15:53:16.403002 25795 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 116403ns
> I0905 15:53:16.403020 25795 replica.cpp:320] Persisted replica status to 
> VOTING
> I0905 15:53:16.403081 25791 recover.cpp:556] Successfully joined the Paxos 
> group
> I0905 15:53:16.403197 25791 recover.cpp:440] Recover process terminated
> I0905 15:53:16.403388 25796 log.cpp:656] Attempting to start the writer
> I0905 15:53:16.403993 25784 replica.cpp:474] Replica received implicit 
> promise request with proposal 1
> I0905 15:53:16.404147 25784 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 132156ns
> I0905 15:53:16.404167 25784 replica.cpp:342] Persisted promised to 1
> I0905 15:53:16.404542 25795 coordinator.cpp:230] Coordinator attemping to 
> fill missing position
> I0905 15:53:16.405498 25787 replica.cpp:375] Replica received explicit 
> promise request for position 0 with proposal 2
> I0905 15:53:16.405868 25787 leveldb.cpp:343] Persisting action (8 bytes) to 
> leveldb took 347231ns
> I0905 15:53:16.405886 25787 replica.cpp:676] Persisted action at 0
> I0905 15:53:16.406553 25788 replica.cpp:508] Replica received write request 
> for position 0
> I0905 15:53:16.406582 25788 leveldb.cpp:438] Reading position from leveldb 
> took 11402ns
> I0905 15:53:16.529067 25788 leveldb.cpp:343] Persisting action (14 bytes) to 
> leveldb to

[jira] [Updated] (MESOS-1676) ZooKeeperMasterContenderDetectorTest.MasterDetectorTimedoutSession is flaky

2014-09-10 Thread Yan Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Xu updated MESOS-1676:
--
  Sprint: Mesos Q3 Sprint 5
Story Points: 1

> ZooKeeperMasterContenderDetectorTest.MasterDetectorTimedoutSession is flaky
> ---
>
> Key: MESOS-1676
> URL: https://issues.apache.org/jira/browse/MESOS-1676
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.20.0
>Reporter: Yan Xu
>Assignee: Yan Xu
>
> {noformat:title=}
> [ RUN  ] 
> ZooKeeperMasterContenderDetectorTest.MasterDetectorTimedoutSession
> I0806 01:18:37.648684 17458 zookeeper_test_server.cpp:158] Started 
> ZooKeeperTestServer on port 42069
> 2014-08-06 01:18:37,650:17458(0x2b4679ca5700):ZOO_INFO@log_env@712: Client 
> environment:zookeeper.version=zookeeper C client 3.4.5
> 2014-08-06 01:18:37,650:17458(0x2b4679ca5700):ZOO_INFO@log_env@716: Client 
> environment:host.name=lucid
> 2014-08-06 01:18:37,650:17458(0x2b4679ca5700):ZOO_INFO@log_env@723: Client 
> environment:os.name=Linux
> 2014-08-06 01:18:37,650:17458(0x2b4679ca5700):ZOO_INFO@log_env@724: Client 
> environment:os.arch=2.6.32-64-generic
> 2014-08-06 01:18:37,650:17458(0x2b4679ca5700):ZOO_INFO@log_env@725: Client 
> environment:os.version=#128-Ubuntu SMP Tue Jul 15 08:32:40 UTC 2014
> 2014-08-06 01:18:37,651:17458(0x2b4679ca5700):ZOO_INFO@log_env@733: Client 
> environment:user.name=(null)
> 2014-08-06 01:18:37,651:17458(0x2b4679ca5700):ZOO_INFO@log_env@741: Client 
> environment:user.home=/home/jenkins
> 2014-08-06 01:18:37,651:17458(0x2b4679ca5700):ZOO_INFO@log_env@753: Client 
> environment:user.dir=/var/jenkins/workspace/mesos-ubuntu-10.04-gcc/src
> 2014-08-06 01:18:37,651:17458(0x2b4679ca5700):ZOO_INFO@zookeeper_init@786: 
> Initiating client connection, host=127.0.0.1:42069 sessionTimeout=5000 
> watcher=0x2b467450bc00 sessionId=0 sessionPasswd= context=0x1682db0 
> flags=0
> 2014-08-06 01:18:37,656:17458(0x2b468638b700):ZOO_INFO@check_events@1703: 
> initiated connection to server [127.0.0.1:42069]
> 2014-08-06 01:18:37,669:17458(0x2b468638b700):ZOO_INFO@check_events@1750: 
> session establishment complete on server [127.0.0.1:42069], 
> sessionId=0x147aa6601cf, negotiated timeout=6000
> I0806 01:18:37.671725 17486 group.cpp:313] Group process 
> (group(37)@127.0.1.1:55561) connected to ZooKeeper
> I0806 01:18:37.671758 17486 group.cpp:787] Syncing group operations: queue 
> size (joins, cancels, datas) = (0, 0, 0)
> I0806 01:18:37.671771 17486 group.cpp:385] Trying to create path '/mesos' in 
> ZooKeeper
> 2014-08-06 
> 01:18:39,101:17458(0x2b4687394700):ZOO_ERROR@handle_socket_error_msg@1697: 
> Socket [127.0.0.1:36197] zk retcode=-4, errno=111(Connection refused): server 
> refused to accept the client
> 2014-08-06 
> 01:18:42,441:17458(0x2b4687394700):ZOO_ERROR@handle_socket_error_msg@1697: 
> Socket [127.0.0.1:36197] zk retcode=-4, errno=111(Connection refused): server 
> refused to accept the client
> I0806 01:18:42.656673 17481 contender.cpp:131] Joining the ZK group
> I0806 01:18:42.662484 17484 contender.cpp:247] New candidate (id='0') has 
> entered the contest for leadership
> I0806 01:18:42.663754 17481 detector.cpp:138] Detected a new leader: (id='0')
> I0806 01:18:42.663884 17481 group.cpp:658] Trying to get 
> '/mesos/info_00' in ZooKeeper
> I0806 01:18:42.664788 17483 detector.cpp:426] A new leading master 
> (UPID=@128.150.152.0:1) is detected
> 2014-08-06 01:18:42,666:17458(0x2b4679ea6700):ZOO_INFO@log_env@712: Client 
> environment:zookeeper.version=zookeeper C client 3.4.5
> 2014-08-06 01:18:42,666:17458(0x2b4679ea6700):ZOO_INFO@log_env@716: Client 
> environment:host.name=lucid
> 2014-08-06 01:18:42,666:17458(0x2b4679ea6700):ZOO_INFO@log_env@723: Client 
> environment:os.name=Linux
> 2014-08-06 01:18:42,666:17458(0x2b4679ea6700):ZOO_INFO@log_env@724: Client 
> environment:os.arch=2.6.32-64-generic
> 2014-08-06 01:18:42,666:17458(0x2b4679ea6700):ZOO_INFO@log_env@725: Client 
> environment:os.version=#128-Ubuntu SMP Tue Jul 15 08:32:40 UTC 2014
> 2014-08-06 01:18:42,666:17458(0x2b4679ea6700):ZOO_INFO@log_env@733: Client 
> environment:user.name=(null)
> 2014-08-06 01:18:42,666:17458(0x2b4679ea6700):ZOO_INFO@log_env@741: Client 
> environment:user.home=/home/jenkins
> 2014-08-06 01:18:42,666:17458(0x2b4679ea6700):ZOO_INFO@log_env@753: Client 
> environment:user.dir=/var/jenkins/workspace/mesos-ubuntu-10.04-gcc/src
> 2014-08-06 01:18:42,666:17458(0x2b4679ea6700):ZOO_INFO@zookeeper_init@786: 
> Initiating client connection, host=127.0.0.1:42069 sessionTimeout=5000 
> watcher=0x2b467450bc00 sessionId=0 sessionPasswd= context=0x15c00f0 
> flags=0
> 2014-08-06 01:18:42,668:17458(0x2b4686d91700):ZOO_INFO@check_events@1703: 
> initiated connection to server [127.0.0.1:42069]

[jira] [Commented] (MESOS-1676) ZooKeeperMasterContenderDetectorTest.MasterDetectorTimedoutSession is flaky

2014-09-10 Thread Yan Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14128831#comment-14128831
 ] 

Yan Xu commented on MESOS-1676:
---

https://reviews.apache.org/r/25487/
https://reviews.apache.org/r/25511/

> ZooKeeperMasterContenderDetectorTest.MasterDetectorTimedoutSession is flaky
> ---
>
> Key: MESOS-1676
> URL: https://issues.apache.org/jira/browse/MESOS-1676
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.20.0
>Reporter: Yan Xu
>Assignee: Yan Xu
>
> {noformat:title=}
> [ RUN  ] 
> ZooKeeperMasterContenderDetectorTest.MasterDetectorTimedoutSession
> I0806 01:18:37.648684 17458 zookeeper_test_server.cpp:158] Started 
> ZooKeeperTestServer on port 42069
> 2014-08-06 01:18:37,650:17458(0x2b4679ca5700):ZOO_INFO@log_env@712: Client 
> environment:zookeeper.version=zookeeper C client 3.4.5
> 2014-08-06 01:18:37,650:17458(0x2b4679ca5700):ZOO_INFO@log_env@716: Client 
> environment:host.name=lucid
> 2014-08-06 01:18:37,650:17458(0x2b4679ca5700):ZOO_INFO@log_env@723: Client 
> environment:os.name=Linux
> 2014-08-06 01:18:37,650:17458(0x2b4679ca5700):ZOO_INFO@log_env@724: Client 
> environment:os.arch=2.6.32-64-generic
> 2014-08-06 01:18:37,650:17458(0x2b4679ca5700):ZOO_INFO@log_env@725: Client 
> environment:os.version=#128-Ubuntu SMP Tue Jul 15 08:32:40 UTC 2014
> 2014-08-06 01:18:37,651:17458(0x2b4679ca5700):ZOO_INFO@log_env@733: Client 
> environment:user.name=(null)
> 2014-08-06 01:18:37,651:17458(0x2b4679ca5700):ZOO_INFO@log_env@741: Client 
> environment:user.home=/home/jenkins
> 2014-08-06 01:18:37,651:17458(0x2b4679ca5700):ZOO_INFO@log_env@753: Client 
> environment:user.dir=/var/jenkins/workspace/mesos-ubuntu-10.04-gcc/src
> 2014-08-06 01:18:37,651:17458(0x2b4679ca5700):ZOO_INFO@zookeeper_init@786: 
> Initiating client connection, host=127.0.0.1:42069 sessionTimeout=5000 
> watcher=0x2b467450bc00 sessionId=0 sessionPasswd= context=0x1682db0 
> flags=0
> 2014-08-06 01:18:37,656:17458(0x2b468638b700):ZOO_INFO@check_events@1703: 
> initiated connection to server [127.0.0.1:42069]
> 2014-08-06 01:18:37,669:17458(0x2b468638b700):ZOO_INFO@check_events@1750: 
> session establishment complete on server [127.0.0.1:42069], 
> sessionId=0x147aa6601cf, negotiated timeout=6000
> I0806 01:18:37.671725 17486 group.cpp:313] Group process 
> (group(37)@127.0.1.1:55561) connected to ZooKeeper
> I0806 01:18:37.671758 17486 group.cpp:787] Syncing group operations: queue 
> size (joins, cancels, datas) = (0, 0, 0)
> I0806 01:18:37.671771 17486 group.cpp:385] Trying to create path '/mesos' in 
> ZooKeeper
> 2014-08-06 
> 01:18:39,101:17458(0x2b4687394700):ZOO_ERROR@handle_socket_error_msg@1697: 
> Socket [127.0.0.1:36197] zk retcode=-4, errno=111(Connection refused): server 
> refused to accept the client
> 2014-08-06 
> 01:18:42,441:17458(0x2b4687394700):ZOO_ERROR@handle_socket_error_msg@1697: 
> Socket [127.0.0.1:36197] zk retcode=-4, errno=111(Connection refused): server 
> refused to accept the client
> I0806 01:18:42.656673 17481 contender.cpp:131] Joining the ZK group
> I0806 01:18:42.662484 17484 contender.cpp:247] New candidate (id='0') has 
> entered the contest for leadership
> I0806 01:18:42.663754 17481 detector.cpp:138] Detected a new leader: (id='0')
> I0806 01:18:42.663884 17481 group.cpp:658] Trying to get 
> '/mesos/info_00' in ZooKeeper
> I0806 01:18:42.664788 17483 detector.cpp:426] A new leading master 
> (UPID=@128.150.152.0:1) is detected
> 2014-08-06 01:18:42,666:17458(0x2b4679ea6700):ZOO_INFO@log_env@712: Client 
> environment:zookeeper.version=zookeeper C client 3.4.5
> 2014-08-06 01:18:42,666:17458(0x2b4679ea6700):ZOO_INFO@log_env@716: Client 
> environment:host.name=lucid
> 2014-08-06 01:18:42,666:17458(0x2b4679ea6700):ZOO_INFO@log_env@723: Client 
> environment:os.name=Linux
> 2014-08-06 01:18:42,666:17458(0x2b4679ea6700):ZOO_INFO@log_env@724: Client 
> environment:os.arch=2.6.32-64-generic
> 2014-08-06 01:18:42,666:17458(0x2b4679ea6700):ZOO_INFO@log_env@725: Client 
> environment:os.version=#128-Ubuntu SMP Tue Jul 15 08:32:40 UTC 2014
> 2014-08-06 01:18:42,666:17458(0x2b4679ea6700):ZOO_INFO@log_env@733: Client 
> environment:user.name=(null)
> 2014-08-06 01:18:42,666:17458(0x2b4679ea6700):ZOO_INFO@log_env@741: Client 
> environment:user.home=/home/jenkins
> 2014-08-06 01:18:42,666:17458(0x2b4679ea6700):ZOO_INFO@log_env@753: Client 
> environment:user.dir=/var/jenkins/workspace/mesos-ubuntu-10.04-gcc/src
> 2014-08-06 01:18:42,666:17458(0x2b4679ea6700):ZOO_INFO@zookeeper_init@786: 
> Initiating client connection, host=127.0.0.1:42069 sessionTimeout=5000 
> watcher=0x2b467450bc00 sessionId=0 sessionPasswd= context=0x15c00f0 
> flags=0
> 2014-08-06 01:18:42,668:17458(0x2b4686d91700):ZOO_IN

[jira] [Resolved] (MESOS-1774) Fix protobuf detection on systems with Python 3 as default

2014-09-10 Thread Timothy St. Clair (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy St. Clair resolved MESOS-1774.
--
  Resolution: Fixed
   Fix Version/s: 0.21.0
Target Version/s: 0.21.0  (was: 1.0.0, 0.20.1)

> Fix protobuf detection on systems with Python 3 as default
> --
>
> Key: MESOS-1774
> URL: https://issues.apache.org/jira/browse/MESOS-1774
> Project: Mesos
>  Issue Type: Bug
>  Components: build
>Affects Versions: 0.20.0
> Environment: Gentoo Linux
> ./configure --disable-bundled
>Reporter: Kamil Domański
>Assignee: Timothy St. Clair
>  Labels: build
> Fix For: 0.21.0
>
>
> When configuring without bundled dependencies, use of the *python* symbolic 
> link in *m4/ac_python_module.m4* causes detection of the *google.protobuf* 
> module to fail on systems with Python 3 set as the default.
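The failure mode can be sketched in Python (illustrative only; the real check lives in m4/ac_python_module.m4 and shells out to whatever binary the `python` symlink points at):

```python
# Sketch (hypothetical, not the actual m4 macro): configure-time module
# detection amounts to running `python -c "import <module>"` and checking
# the exit status. When the bare `python` symlink resolves to Python 3
# but only the Python-2 bindings for google.protobuf are installed, the
# import fails and configure wrongly reports the module as missing.

def module_available(name):
    """Return True if this interpreter can import `name`."""
    try:
        __import__(name)
        return True
    except ImportError:
        return False

# A module installed for this interpreter is detected:
print(module_available("json"))
# One that is not importable under this interpreter is not:
print(module_available("no_such_module_xyz"))
```

The fix is to probe the interpreter the build actually targets (e.g. an explicit `$PYTHON`) rather than the ambiguous `python` symlink.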





[jira] [Commented] (MESOS-1774) Fix protobuf detection on systems with Python 3 as default

2014-09-10 Thread Timothy St. Clair (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14128794#comment-14128794
 ] 

Timothy St. Clair commented on MESOS-1774:
--

commit ab1cf84e7beaa979cabbc2876623c7b30ad5e48b
Author: Kamil Domanski 
Date:   Wed Sep 10 12:43:58 2014 -0500

Fix protobuf detection on systems with Python 3 as default (part2)

MESOS-1774

Review: https://reviews.apache.org/r/25439


> Fix protobuf detection on systems with Python 3 as default
> --
>
> Key: MESOS-1774
> URL: https://issues.apache.org/jira/browse/MESOS-1774
> Project: Mesos
>  Issue Type: Bug
>  Components: build
>Affects Versions: 0.20.0
> Environment: Gentoo Linux
> ./configure --disable-bundled
>Reporter: Kamil Domański
>Assignee: Timothy St. Clair
>  Labels: build
>
> When configuring without bundled dependencies, use of the *python* symbolic 
> link in *m4/ac_python_module.m4* causes detection of the *google.protobuf* 
> module to fail on systems with Python 3 set as the default.


