[jira] [Assigned] (MESOS-1733) Change the stout path utility to declare a single, variadic 'join' function instead of several separate declarations of various discrete arities

2015-06-09 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar reassigned MESOS-1733:
-

Assignee: Anand Mazumdar  (was: Cody Maloney)

> Change the stout path utility to declare a single, variadic 'join' function 
> instead of several separate declarations of various discrete arities
> 
>
> Key: MESOS-1733
> URL: https://issues.apache.org/jira/browse/MESOS-1733
> Project: Mesos
>  Issue Type: Improvement
>  Components: build, stout
>Reporter: Patrick Reilly
>Assignee: Anand Mazumdar
>
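
As an illustration of what this ticket asks for, a single variadic template can stand in for a family of fixed-arity overloads. The sketch below is not stout's actual code: the namespace `path_sketch`, the helper `joinTwo`, and the slash-trimming rules are assumptions made for the example.

```cpp
#include <cstddef>
#include <string>
#include <utility>

namespace path_sketch {

// Joins two components with exactly one '/' at the seam.
// An empty component is skipped so that it does not add a separator.
inline std::string joinTwo(const std::string& left, const std::string& right)
{
  if (left.empty()) return right;
  if (right.empty()) return left;

  const std::size_t end = left.find_last_not_of('/');
  const std::size_t begin = right.find_first_not_of('/');

  // A component made only of slashes trims down to the empty string.
  const std::string head =
    (end == std::string::npos) ? "" : left.substr(0, end + 1);
  const std::string tail =
    (begin == std::string::npos) ? "" : right.substr(begin);

  return head + "/" + tail;
}

// Base case: a single component is returned unchanged.
inline std::string join(const std::string& path) { return path; }

// Variadic case: folds the first two components together, then recurses
// on the remaining ones. One declaration covers every arity >= 2.
template <typename... Paths>
std::string join(
    const std::string& path1,
    const std::string& path2,
    Paths&&... paths)
{
  return join(joinTwo(path1, path2), std::forward<Paths>(paths)...);
}

} // namespace path_sketch
```

With this shape, `path_sketch::join("a", "b")` and `path_sketch::join("a", "b", "c", "d")` resolve to the same template, so no per-arity declarations are needed.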




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-1733) Change the stout path utility to declare a single, variadic 'join' function instead of several separate declarations of various discrete arities

2015-06-09 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-1733:
--
Shepherd: Adam B  (was: Benjamin Hindman)

> Change the stout path utility to declare a single, variadic 'join' function 
> instead of several separate declarations of various discrete arities
> 
>
> Key: MESOS-1733
> URL: https://issues.apache.org/jira/browse/MESOS-1733
> Project: Mesos
>  Issue Type: Improvement
>  Components: build, stout
>Reporter: Patrick Reilly
>Assignee: Cody Maloney
>






[jira] [Commented] (MESOS-1988) Scheduler driver should not generate TASK_LOST when disconnected from master

2015-06-16 Thread Anand Mazumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14588666#comment-14588666
 ] 

Anand Mazumdar commented on MESOS-1988:
---

It seems that the library (src/scheduler/scheduler.cpp) already does the right 
thing by dropping calls silently. I would go ahead and nuke the second 
overload(...) that took a TaskInfo as an argument, as it's no longer being used 
in the code.

> Scheduler driver should not generate TASK_LOST when disconnected from master
> 
>
> Key: MESOS-1988
> URL: https://issues.apache.org/jira/browse/MESOS-1988
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Vinod Kone
>Assignee: Anand Mazumdar
>  Labels: twitter
>
> Currently, the driver replies to launchTasks() with TASK_LOST if it detects 
> that it is disconnected from the master. After MESOS-1972 lands, this will be 
> the only place where driver generates TASK_LOST. See MESOS-1972 for more 
> context.
> This fix is targeted for 0.22.0 to give frameworks time to implement 
> reconciliation.





[jira] [Commented] (MESOS-1988) Scheduler driver should not generate TASK_LOST when disconnected from master

2015-06-16 Thread Anand Mazumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14588993#comment-14588993
 ] 

Anand Mazumdar commented on MESOS-1988:
---

Deleted the overload; the change is up for review here: 
https://reviews.apache.org/r/35538

Left:
- Send an email to the dev mailing list to apprise them of the change in the 
driver (in 0.24?).
- Delete the relevant fragment of code that returns TASK_LOST from 
sched/sched.cpp.

> Scheduler driver should not generate TASK_LOST when disconnected from master
> 
>
> Key: MESOS-1988
> URL: https://issues.apache.org/jira/browse/MESOS-1988
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Vinod Kone
>Assignee: Anand Mazumdar
>  Labels: twitter
>
> Currently, the driver replies to launchTasks() with TASK_LOST if it detects 
> that it is disconnected from the master. After MESOS-1972 lands, this will be 
> the only place where driver generates TASK_LOST. See MESOS-1972 for more 
> context.
> This fix is targeted for 0.22.0 to give frameworks time to implement 
> reconciliation.





[jira] [Commented] (MESOS-2295) Implement the Call endpoint on Slave

2015-06-18 Thread Anand Mazumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14592131#comment-14592131
 ] 

Anand Mazumdar commented on MESOS-2295:
---

Related shelved review from Alexander: https://reviews.apache.org/r/33824

> Implement the Call endpoint on Slave
> 
>
> Key: MESOS-2295
> URL: https://issues.apache.org/jira/browse/MESOS-2295
> Project: Mesos
>  Issue Type: Task
>Reporter: Vinod Kone
>Assignee: haosdent
>






[jira] [Assigned] (MESOS-2295) Implement the Call endpoint on Slave

2015-06-18 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar reassigned MESOS-2295:
-

Assignee: Anand Mazumdar  (was: haosdent)

> Implement the Call endpoint on Slave
> 
>
> Key: MESOS-2295
> URL: https://issues.apache.org/jira/browse/MESOS-2295
> Project: Mesos
>  Issue Type: Task
>Reporter: Vinod Kone
>Assignee: Anand Mazumdar
>






[jira] [Assigned] (MESOS-2296) Implement the Events endpoint on slave

2015-06-18 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar reassigned MESOS-2296:
-

Assignee: Anand Mazumdar  (was: haosdent)

> Implement the Events endpoint on slave
> --
>
> Key: MESOS-2296
> URL: https://issues.apache.org/jira/browse/MESOS-2296
> Project: Mesos
>  Issue Type: Task
>Reporter: Vinod Kone
>Assignee: Anand Mazumdar
>






[jira] [Created] (MESOS-2906) Slave : Synchronous Validation for Calls

2015-06-22 Thread Anand Mazumdar (JIRA)
Anand Mazumdar created MESOS-2906:
-

 Summary: Slave : Synchronous Validation for Calls
 Key: MESOS-2906
 URL: https://issues.apache.org/jira/browse/MESOS-2906
 Project: Mesos
  Issue Type: Task
Reporter: Anand Mazumdar
Assignee: Anand Mazumdar


The /call endpoint on the slave will return a 202 Accepted code, but it has to 
do some basic validation first. If validation fails, it will return a 4xx code.

- We need to create the required infrastructure to validate the request and 
then process it.





[jira] [Created] (MESOS-2907) Slave : Create Basic Functionality to handle /call endpoint

2015-06-22 Thread Anand Mazumdar (JIRA)
Anand Mazumdar created MESOS-2907:
-

 Summary: Slave : Create Basic Functionality to handle /call 
endpoint
 Key: MESOS-2907
 URL: https://issues.apache.org/jira/browse/MESOS-2907
 Project: Mesos
  Issue Type: Task
Reporter: Anand Mazumdar
Assignee: Anand Mazumdar


This is the first basic step in ensuring the basic /call functionality: 
processing a
POST /call
and returning:
202 if all goes well;
401 if not authorized; and
403 if the request is malformed.

Also, we might need to store some identifier which enables us to reject calls 
to /call if the executor has not issued a SUBSCRIBE/RESUBSCRIBE request.
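
The status mapping described in this ticket could be sketched roughly as follows. This is a hedged illustration, not Mesos code: `CallRequest`, `CallEndpointSketch`, and the boolean flags are hypothetical stand-ins for real HTTP parsing and authentication, and the status returned for an unsubscribed client is an assumption, since the ticket does not name one.

```cpp
#include <set>
#include <string>

// Hypothetical, simplified request; a real handler would inspect the
// HTTP headers and body instead of precomputed flags.
struct CallRequest
{
  std::string clientId;
  bool authorized;
  bool wellFormed;
};

class CallEndpointSketch
{
public:
  // Records that a client has issued SUBSCRIBE/RESUBSCRIBE.
  void subscribe(const std::string& clientId) { subscribed.insert(clientId); }

  // Returns the HTTP status code for a POST /call request.
  int handle(const CallRequest& request) const
  {
    if (!request.authorized) {
      return 401;
    }
    if (!request.wellFormed) {
      return 403; // The code the ticket names for malformed requests.
    }
    if (subscribed.count(request.clientId) == 0) {
      return 403; // Assumption: the ticket names no code for this case.
    }
    return 202;
  }

private:
  // The stored identifiers that let the endpoint reject unsubscribed callers.
  std::set<std::string> subscribed;
};
```

Keeping the subscribed identifiers in a set makes the rejection check a single lookup before the call is accepted.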





[jira] [Commented] (MESOS-2295) Implement the Call endpoint on Slave

2015-06-22 Thread Anand Mazumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596398#comment-14596398
 ] 

Anand Mazumdar commented on MESOS-2295:
---

[~marco-mesos] Just created 2 smaller tasks (mirror images of the ones for the 
master endpoint).

> Implement the Call endpoint on Slave
> 
>
> Key: MESOS-2295
> URL: https://issues.apache.org/jira/browse/MESOS-2295
> Project: Mesos
>  Issue Type: Task
>Reporter: Vinod Kone
>Assignee: Anand Mazumdar
>  Labels: mesosphere
>






[jira] [Updated] (MESOS-2907) Slave : Create Basic Functionality to handle /call endpoint

2015-06-22 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-2907:
--
Description: 
This is the first basic step in ensuring the basic /call functionality: 
processing a
POST /call
and returning:
202 if all goes well;
401 if not authorized; and
403 if the request is malformed.

Also, we might need to store some identifier which enables us to reject calls 
to /call if the client has not issued a SUBSCRIBE/RESUBSCRIBE request.

  was:
This is the first basic step in ensuring the basic /call functionality: 
processing a
POST /call
and returning:
202 if all goes well;
401 if not authorized; and
403 if the request is malformed.

Also , we might need to store some identifier which enables us to reject calls 
to /call if the executor has not issues a SUBSCRIBE/RESUBSCRIBE Request.


> Slave : Create Basic Functionality to handle /call endpoint
> ---
>
> Key: MESOS-2907
> URL: https://issues.apache.org/jira/browse/MESOS-2907
> Project: Mesos
>  Issue Type: Task
>Reporter: Anand Mazumdar
>Assignee: Anand Mazumdar
>  Labels: HTTP, mesosphere
>
> This is the first basic step in ensuring the basic /call functionality: 
> processing a
> POST /call
> and returning:
> 202 if all goes well;
> 401 if not authorized; and
> 403 if the request is malformed.
> Also, we might need to store some identifier which enables us to reject 
> calls to /call if the client has not issued a SUBSCRIBE/RESUBSCRIBE Request.





[jira] [Commented] (MESOS-2907) Slave : Create Basic Functionality to handle /call endpoint

2015-06-22 Thread Anand Mazumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596416#comment-14596416
 ] 

Anand Mazumdar commented on MESOS-2907:
---

This ticket only tracks the work done for the slave. The one for the master is 
here: https://issues.apache.org/jira/browse/MESOS-2860. Most of the work, 
however, should be applicable to both.

> Slave : Create Basic Functionality to handle /call endpoint
> ---
>
> Key: MESOS-2907
> URL: https://issues.apache.org/jira/browse/MESOS-2907
> Project: Mesos
>  Issue Type: Task
>Reporter: Anand Mazumdar
>Assignee: Anand Mazumdar
>  Labels: HTTP, mesosphere
>
> This is the first basic step in ensuring the basic /call functionality: 
> processing a
> POST /call
> and returning:
> 202 if all goes well;
> 401 if not authorized; and
> 403 if the request is malformed.
> Also, we might need to store some identifier which enables us to reject 
> calls to /call if the client has not issued a SUBSCRIBE/RESUBSCRIBE Request.





[jira] [Commented] (MESOS-2708) Design doc for the Executor HTTP API

2015-06-22 Thread Anand Mazumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596511#comment-14596511
 ] 

Anand Mazumdar commented on MESOS-2708:
---

I would take it up. Waiting on [~arojas] to give me "editing" permissions on 
the document.

> Design doc for the Executor HTTP API
> 
>
> Key: MESOS-2708
> URL: https://issues.apache.org/jira/browse/MESOS-2708
> Project: Mesos
>  Issue Type: Bug
>Reporter: Alexander Rojas
>Assignee: Alexander Rojas
>  Labels: mesosphere
>
> This tracks the design of the Executor HTTP API.





[jira] [Updated] (MESOS-2296) Implement the Events stream on slave for Call endpoint

2015-06-22 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-2296:
--
Summary: Implement the Events stream on slave for Call endpoint  (was: 
Implement the Events endpoint on slave)

> Implement the Events stream on slave for Call endpoint
> --
>
> Key: MESOS-2296
> URL: https://issues.apache.org/jira/browse/MESOS-2296
> Project: Mesos
>  Issue Type: Task
>Reporter: Vinod Kone
>Assignee: Anand Mazumdar
>  Labels: mesosphere
>






[jira] [Updated] (MESOS-2296) Implement the Events stream on slave for Call endpoint

2015-06-22 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-2296:
--
Assignee: (was: Anand Mazumdar)

> Implement the Events stream on slave for Call endpoint
> --
>
> Key: MESOS-2296
> URL: https://issues.apache.org/jira/browse/MESOS-2296
> Project: Mesos
>  Issue Type: Task
>Reporter: Vinod Kone
>  Labels: mesosphere
>






[jira] [Assigned] (MESOS-2294) Implement the Events stream on master for Call endpoint

2015-06-22 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar reassigned MESOS-2294:
-

Assignee: Anand Mazumdar

> Implement the Events stream on master for Call endpoint
> ---
>
> Key: MESOS-2294
> URL: https://issues.apache.org/jira/browse/MESOS-2294
> Project: Mesos
>  Issue Type: Task
>Reporter: Vinod Kone
>Assignee: Anand Mazumdar
>  Labels: twitter
>






[jira] [Updated] (MESOS-1988) Scheduler driver should not generate TASK_LOST when disconnected from master

2015-06-22 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-1988:
--
Fix Version/s: 0.24.0

> Scheduler driver should not generate TASK_LOST when disconnected from master
> 
>
> Key: MESOS-1988
> URL: https://issues.apache.org/jira/browse/MESOS-1988
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Vinod Kone
>Assignee: Anand Mazumdar
>  Labels: mesosphere, twitter
> Fix For: 0.24.0
>
>
> Currently, the driver replies to launchTasks() with TASK_LOST if it detects 
> that it is disconnected from the master. After MESOS-1972 lands, this will be 
> the only place where driver generates TASK_LOST. See MESOS-1972 for more 
> context.
> This fix is targeted for 0.22.0 to give frameworks time to implement 
> reconciliation.





[jira] [Commented] (MESOS-2552) C++ Scheduler library should send HTTP Calls to master

2015-06-29 Thread Anand Mazumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14606163#comment-14606163
 ] 

Anand Mazumdar commented on MESOS-2552:
---

This was needed for testing the call HTTP endpoint on master. Submitted a diff 
for review: https://reviews.apache.org/r/36013 (still a work in progress).

> C++ Scheduler library should send HTTP Calls to master
> --
>
> Key: MESOS-2552
> URL: https://issues.apache.org/jira/browse/MESOS-2552
> Project: Mesos
>  Issue Type: Bug
>Reporter: Vinod Kone
>
> Once the scheduler library sends Call messages, we should update it to send 
> Calls as HTTP requests to "/call" endpoint on master.





[jira] [Assigned] (MESOS-2552) C++ Scheduler library should send HTTP Calls to master

2015-06-29 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar reassigned MESOS-2552:
-

Assignee: Anand Mazumdar

> C++ Scheduler library should send HTTP Calls to master
> --
>
> Key: MESOS-2552
> URL: https://issues.apache.org/jira/browse/MESOS-2552
> Project: Mesos
>  Issue Type: Bug
>Reporter: Vinod Kone
>Assignee: Anand Mazumdar
>
> Once the scheduler library sends Call messages, we should update it to send 
> Calls as HTTP requests to "/call" endpoint on master.





[jira] [Updated] (MESOS-2552) C++ Scheduler library should send HTTP Calls to master

2015-06-29 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-2552:
--
Sprint: Mesosphere Sprint 13
Labels: mesosphere  (was: )

> C++ Scheduler library should send HTTP Calls to master
> --
>
> Key: MESOS-2552
> URL: https://issues.apache.org/jira/browse/MESOS-2552
> Project: Mesos
>  Issue Type: Bug
>Reporter: Vinod Kone
>Assignee: Anand Mazumdar
>  Labels: mesosphere
>
> Once the scheduler library sends Call messages, we should update it to send 
> Calls as HTTP requests to "/call" endpoint on master.





[jira] [Updated] (MESOS-2294) Implement the Events stream on master for Call endpoint

2015-06-29 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-2294:
--
Labels: mesosphere twitter  (was: twitter)

> Implement the Events stream on master for Call endpoint
> ---
>
> Key: MESOS-2294
> URL: https://issues.apache.org/jira/browse/MESOS-2294
> Project: Mesos
>  Issue Type: Task
>Reporter: Vinod Kone
>Assignee: Anand Mazumdar
>  Labels: mesosphere, twitter
>






[jira] [Commented] (MESOS-2294) Implement the Events stream on master for Call endpoint

2015-07-06 Thread Anand Mazumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14615371#comment-14615371
 ] 

Anand Mazumdar commented on MESOS-2294:
---

Working on the basic loop, client->subscribe->subscribed(response) back on the 
event stream, and will send in a patch for that.

> Implement the Events stream on master for Call endpoint
> ---
>
> Key: MESOS-2294
> URL: https://issues.apache.org/jira/browse/MESOS-2294
> Project: Mesos
>  Issue Type: Task
>Reporter: Vinod Kone
>Assignee: Anand Mazumdar
>  Labels: mesosphere, twitter
>






[jira] [Updated] (MESOS-2294) Implement the Events stream on master for Call endpoint

2015-07-06 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-2294:
--
Sprint: Mesosphere Sprint 14

> Implement the Events stream on master for Call endpoint
> ---
>
> Key: MESOS-2294
> URL: https://issues.apache.org/jira/browse/MESOS-2294
> Project: Mesos
>  Issue Type: Task
>Reporter: Vinod Kone
>Assignee: Anand Mazumdar
>  Labels: mesosphere, twitter
>






[jira] [Commented] (MESOS-2294) Implement the Events stream on master for Call endpoint

2015-07-08 Thread Anand Mazumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619280#comment-14619280
 ] 

Anand Mazumdar commented on MESOS-2294:
---

Initial review for just getting back a subscribed event on the stream: 
https://reviews.apache.org/r/36318/

> Implement the Events stream on master for Call endpoint
> ---
>
> Key: MESOS-2294
> URL: https://issues.apache.org/jira/browse/MESOS-2294
> Project: Mesos
>  Issue Type: Task
>Reporter: Vinod Kone
>Assignee: Anand Mazumdar
>  Labels: mesosphere, twitter
>






[jira] [Updated] (MESOS-2294) Implement the Events stream on master for Call endpoint

2015-07-14 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-2294:
--
Shepherd: Benjamin Mahler

> Implement the Events stream on master for Call endpoint
> ---
>
> Key: MESOS-2294
> URL: https://issues.apache.org/jira/browse/MESOS-2294
> Project: Mesos
>  Issue Type: Task
>Reporter: Vinod Kone
>Assignee: Anand Mazumdar
>  Labels: mesosphere
>






[jira] [Updated] (MESOS-2552) C++ Scheduler library should send HTTP Calls to master

2015-07-14 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-2552:
--
Shepherd: Benjamin Mahler

> C++ Scheduler library should send HTTP Calls to master
> --
>
> Key: MESOS-2552
> URL: https://issues.apache.org/jira/browse/MESOS-2552
> Project: Mesos
>  Issue Type: Bug
>Reporter: Vinod Kone
>Assignee: Anand Mazumdar
>  Labels: mesosphere
>
> Once the scheduler library sends Call messages, we should update it to send 
> Calls as HTTP requests to "/call" endpoint on master.





[jira] [Created] (MESOS-3067) Implement a streaming response decoder for events stream

2015-07-16 Thread Anand Mazumdar (JIRA)
Anand Mazumdar created MESOS-3067:
-

 Summary: Implement a streaming response decoder for events stream
 Key: MESOS-3067
 URL: https://issues.apache.org/jira/browse/MESOS-3067
 Project: Mesos
  Issue Type: Task
Reporter: Anand Mazumdar
Assignee: Benjamin Mahler


We need a streaming response decoder to de-serialize chunks sent from the 
master on the events stream.

From the HTTP API design doc:
Master encodes each Event in RecordIO format, i.e. a string representation of 
length of the event in bytes followed by JSON or binary Protobuf  (possibly 
compressed) encoded event.

For now, to get the basic features right, this is being done in the 
test cases:

{code}
  auto reader = response.get().reader;
  ASSERT_SOME(reader);

  Future<std::string> eventFuture = reader.get().read();
  AWAIT_READY(eventFuture);

  Event event;
  event.ParseFromString(eventFuture.get());
{code}

Two things need to happen:
- We need the master to emit events in RecordIO format, i.e. the event size 
followed by the serialized event, instead of just the serialized events as is 
the case now.
- The decoder class should then abstract away the logic of reading the response 
and de-serializing events from the stream.

Ideally, the decoder should work with both "json" and "protobuf" responses.
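
To make the decoding step concrete, here is a rough sketch of an incremental decoder for length-prefixed records. It is not the decoder that was later written for Mesos: the class name `RecordDecoder` is hypothetical, and the exact framing (a decimal length, a newline, then the payload) is an assumption, since the description above only says that the length precedes the event. Turning each payload into an Event (JSON or protobuf) is left out.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Incrementally decodes records framed as "<length>\n<payload>".
// Chunks may split a record anywhere, including inside the length header.
class RecordDecoder
{
public:
  // Feeds one chunk and returns every record completed by it.
  std::vector<std::string> decode(const std::string& chunk)
  {
    buffer += chunk;
    std::vector<std::string> records;

    while (true) {
      const std::size_t newline = buffer.find('\n');
      if (newline == std::string::npos) {
        break; // The length header is still incomplete.
      }

      // Assumes a well-formed decimal header; a production decoder would
      // have to reject malformed lengths instead of trusting them.
      const std::size_t length =
        std::strtoul(buffer.substr(0, newline).c_str(), nullptr, 10);

      const std::size_t start = newline + 1;
      if (buffer.size() - start < length) {
        break; // The payload is still incomplete.
      }

      records.push_back(buffer.substr(start, length));
      buffer.erase(0, start + length);
    }

    return records;
  }

private:
  std::string buffer; // Bytes received but not yet emitted as a record.
};
```

Because unconsumed bytes stay in the buffer between calls, the caller can pass each chunk read from the response straight to `decode` without worrying about where the record boundaries fall.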






[jira] [Assigned] (MESOS-2708) Design doc for the Executor HTTP API

2015-07-20 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar reassigned MESOS-2708:
-

Assignee: Anand Mazumdar  (was: Alexander Rojas)

> Design doc for the Executor HTTP API
> 
>
> Key: MESOS-2708
> URL: https://issues.apache.org/jira/browse/MESOS-2708
> Project: Mesos
>  Issue Type: Bug
>Reporter: Alexander Rojas
>Assignee: Anand Mazumdar
>  Labels: mesosphere
>
> This tracks the design of the Executor HTTP API.





[jira] [Commented] (MESOS-2708) Design doc for the Executor HTTP API

2015-07-20 Thread Anand Mazumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14633869#comment-14633869
 ] 

Anand Mazumdar commented on MESOS-2708:
---

My bad, I had missed this comment completely. I would take this up when we 
start to focus on the JIRA items for Executor HTTP API again.

> Design doc for the Executor HTTP API
> 
>
> Key: MESOS-2708
> URL: https://issues.apache.org/jira/browse/MESOS-2708
> Project: Mesos
>  Issue Type: Bug
>Reporter: Alexander Rojas
>Assignee: Alexander Rojas
>  Labels: mesosphere
>
> This tracks the design of the Executor HTTP API.





[jira] [Assigned] (MESOS-2911) Add an Event message handler to scheduler library

2015-07-20 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar reassigned MESOS-2911:
-

Assignee: Anand Mazumdar  (was: Benjamin Mahler)

> Add an Event message handler to scheduler library
> -
>
> Key: MESOS-2911
> URL: https://issues.apache.org/jira/browse/MESOS-2911
> Project: Mesos
>  Issue Type: Task
>Reporter: Vinod Kone
>Assignee: Anand Mazumdar
>
> Adding this handler lets master send Event messages to the library.
> See MESOS-2909 for additional context.
> This ticket only tracks the installation of the handler and maybe handling of 
> a single event for testing. Additional events handling will be captured in a 
> different ticket(s).





[jira] [Commented] (MESOS-2294) Implement the Events stream on master for Call endpoint

2015-07-20 Thread Anand Mazumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14634057#comment-14634057
 ] 

Anand Mazumdar commented on MESOS-2294:
---

The part in review is a very small part of the JIRA, i.e. it just handles 
subscribe->subscribed calls. I will send in more patches for other events once 
we agree on the design/code semantics of the review I sent out.

> Implement the Events stream on master for Call endpoint
> ---
>
> Key: MESOS-2294
> URL: https://issues.apache.org/jira/browse/MESOS-2294
> Project: Mesos
>  Issue Type: Task
>Reporter: Vinod Kone
>Assignee: Anand Mazumdar
>  Labels: mesosphere
>






[jira] [Commented] (MESOS-2552) C++ Scheduler library should send HTTP Calls to master

2015-07-20 Thread Anand Mazumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14634061#comment-14634061
 ] 

Anand Mazumdar commented on MESOS-2552:
---

I am pausing the progress on this one. It would be a better idea to resume work 
on this one once the master can understand calls/send events on the event 
stream.

> C++ Scheduler library should send HTTP Calls to master
> --
>
> Key: MESOS-2552
> URL: https://issues.apache.org/jira/browse/MESOS-2552
> Project: Mesos
>  Issue Type: Bug
>Reporter: Vinod Kone
>Assignee: Anand Mazumdar
>  Labels: mesosphere
>
> Once the scheduler library sends Call messages, we should update it to send 
> Calls as HTTP requests to "/call" endpoint on master.





[jira] [Updated] (MESOS-2552) C++ Scheduler library should send HTTP Calls to master

2015-07-21 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-2552:
--
Sprint:   (was: Mesosphere Sprint 15)

> C++ Scheduler library should send HTTP Calls to master
> --
>
> Key: MESOS-2552
> URL: https://issues.apache.org/jira/browse/MESOS-2552
> Project: Mesos
>  Issue Type: Bug
>Reporter: Vinod Kone
>Assignee: Anand Mazumdar
>  Labels: mesosphere
>
> Once the scheduler library sends Call messages, we should update it to send 
> Calls as HTTP requests to "/call" endpoint on master.





[jira] [Commented] (MESOS-3119) Remove pthread specific code from Libprocess

2015-07-23 Thread Anand Mazumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14639537#comment-14639537
 ] 

Anand Mazumdar commented on MESOS-3119:
---

Can we add some context/background here? It's very hard to reason about the 
motivation of the changes from reviews linking to these.

> Remove pthread specific code from Libprocess
> 
>
> Key: MESOS-3119
> URL: https://issues.apache.org/jira/browse/MESOS-3119
> Project: Mesos
>  Issue Type: Improvement
>  Components: libprocess
>Reporter: Joris Van Remoortere
>Assignee: Joris Van Remoortere
>  Labels: libprocess, mesosphere, windows
>






[jira] [Commented] (MESOS-3169) FrameworkInfo should only be updated if the re-registration is valid

2015-07-29 Thread Anand Mazumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646489#comment-14646489
 ] 

Anand Mazumdar commented on MESOS-3169:
---

The `FrameworkErrorMessage` generated from `failoverFramework` is sent to the 
old scheduler. So, we should go ahead with updating the framework info 
correctly for that case, as we are doing now.

Hence, the only point of contention is this:
{code }else if (from != framework->pid); {code}

I guess the easiest fix would be to update the info at the end of the function 
rather than at the beginning, unless I am missing something?
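
The reordering suggested here can be sketched as follows. This is only an illustration of the "validate first, update last" idea, not the master's real code: the `Framework` and `FrameworkInfo` structs, the `reregister` function, and its `failover` flag are simplified, hypothetical stand-ins.

```cpp
#include <string>

// Hypothetical, heavily reduced versions of the master's data.
struct FrameworkInfo
{
  std::string name;
};

struct Framework
{
  std::string pid;
  FrameworkInfo info;
};

// Returns true if the re-registration was accepted. The stored info is
// only overwritten on the paths that accept the request.
bool reregister(
    Framework& framework,
    const FrameworkInfo& updated,
    const std::string& from,
    bool failover)
{
  if (failover) {
    // The error goes to the old scheduler; the new one takes over, so
    // this path is still a valid re-registration.
    framework.pid = from;
  } else if (from != framework.pid) {
    // Invalid re-registration: return before touching the stored info.
    return false;
  }

  // Update at the end, once every rejecting branch has been passed.
  framework.info = updated;
  return true;
}
```

Moving the assignment to the end means an early return on any invalid branch leaves the previously stored info intact.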

> FrameworkInfo should only be updated if the re-registration is valid
> 
>
> Key: MESOS-3169
> URL: https://issues.apache.org/jira/browse/MESOS-3169
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.23.0
>Reporter: Joris Van Remoortere
>  Labels: framework, master, mesosphere
>
> See Ben Mahler's comment in https://reviews.apache.org/r/32961/
> FrameworkInfo should not be updated if the re-registration is invalid. This 
> can happen in a few cases under the branching logic, so this requires some 
> refactoring.
> Notice that a {code}FrameworkErrorMessage{code} can be generated  both inside 
> {code}else if (from != framework->pid){code} as well as from inside 
> {code}failoverFramework(framework, from);{code}





[jira] [Comment Edited] (MESOS-3169) FrameworkInfo should only be updated if the re-registration is valid

2015-07-29 Thread Anand Mazumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646489#comment-14646489
 ] 

Anand Mazumdar edited comment on MESOS-3169 at 7/29/15 5:48 PM:


The `FrameworkErrorMessage` generated from `failoverFramework` is sent to the 
old scheduler. So, we should go ahead with updating the framework info 
correctly for that case, as we are doing now.

Hence, the only point of contention is this:
{code}else if (from != framework->pid); {code}

I guess the easiest fix would be to update the info at the end of the function 
rather than at the beginning, unless I am missing something?


was (Author: anandmazumdar):
The `FrameworkErrorMessage` generated from `failoverFramework` is sent to the 
old scheduler. So , we should go ahead with updating the framework info 
correctly for that case as we are doing now. 

The only point of contention hence is this:
{code }else if (from != framework->pid); {code}

I guess the easiest fix would be to update the info at the end of the function 
rather then at the beginning unless I am missing something ?

> FrameworkInfo should only be updated if the re-registration is valid
> 
>
> Key: MESOS-3169
> URL: https://issues.apache.org/jira/browse/MESOS-3169
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.23.0
>Reporter: Joris Van Remoortere
>  Labels: framework, master, mesosphere
>
> See Ben Mahler's comment in https://reviews.apache.org/r/32961/
> FrameworkInfo should not be updated if the re-registration is invalid. This 
> can happen in a few cases under the branching logic, so this requires some 
> refactoring.
> Notice that a {code}FrameworkErrorMessage{code} can be generated  both inside 
> {code}else if (from != framework->pid){code} as well as from inside 
> {code}failoverFramework(framework, from);{code}





[jira] [Commented] (MESOS-2294) Implement the Events stream on master for Call endpoint

2015-08-02 Thread Anand Mazumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14651324#comment-14651324
 ] 

Anand Mazumdar commented on MESOS-2294:
---

{code}
commit 90b107a249169c6fc8b8d398b675ab9bd2df633b
Author: Anand Mazumdar 
Date:   Tue Jul 28 11:53:45 2015 -0700

Updated Framework struct in master for the http api.

This change refactors the Framework struct in master to introduce
support for http frameworks:
  * 'pid' becomes a optional field.
  * Added optional 'http' field.

Review: https://reviews.apache.org/r/36318
{code}

> Implement the Events stream on master for Call endpoint
> ---
>
> Key: MESOS-2294
> URL: https://issues.apache.org/jira/browse/MESOS-2294
> Project: Mesos
>  Issue Type: Task
>Reporter: Vinod Kone
>Assignee: Anand Mazumdar
>  Labels: mesosphere
>






[jira] [Commented] (MESOS-2294) Implement the Events stream on master for Call endpoint

2015-08-02 Thread Anand Mazumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14651326#comment-14651326
 ] 

Anand Mazumdar commented on MESOS-2294:
---

Patch for subscribe->subscribed workflow: https://reviews.apache.org/r/36720

Working on re-registration equivalent for the subscribe call now.

> Implement the Events stream on master for Call endpoint
> ---
>
> Key: MESOS-2294
> URL: https://issues.apache.org/jira/browse/MESOS-2294
> Project: Mesos
>  Issue Type: Task
>Reporter: Vinod Kone
>Assignee: Anand Mazumdar
>  Labels: mesosphere
>






[jira] [Commented] (MESOS-2294) Implement the Events stream on master for Call endpoint

2015-08-04 Thread Anand Mazumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14654020#comment-14654020
 ] 

Anand Mazumdar commented on MESOS-2294:
---

Tests for subscribe/failover workflow: https://reviews.apache.org/r/37082

> Implement the Events stream on master for Call endpoint
> ---
>
> Key: MESOS-2294
> URL: https://issues.apache.org/jira/browse/MESOS-2294
> Project: Mesos
>  Issue Type: Task
>Reporter: Vinod Kone
>Assignee: Anand Mazumdar
>  Labels: mesosphere
>






[jira] [Commented] (MESOS-2860) Create the basic infrastructure to handle /call endpoint

2015-08-04 Thread Anand Mazumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14654619#comment-14654619
 ] 

Anand Mazumdar commented on MESOS-2860:
---

{code}
commit 8e4d1c6e4fd1be2fe05db045f034b84bf19e04af
Author: Anand Mazumdar 
Date:   Tue Aug 4 13:16:58 2015 -0700

Added /call parsing and validation.

Review: https://reviews.apache.org/r/36720
{code}

> Create the basic infrastructure to handle /call endpoint
> 
>
> Key: MESOS-2860
> URL: https://issues.apache.org/jira/browse/MESOS-2860
> Project: Mesos
>  Issue Type: Story
>  Components: master
>Reporter: Marco Massenzio
>Assignee: Isabel Jimenez
>  Labels: mesosphere
>
> This is the first basic step in ensuring the basic {{/call}} functionality: 
> processing a 
> {noformat}
> POST /call
> {noformat}
> and returning:
> - {{202}} if all goes well;
> - {{401}} if not authorized; and
> - {{403}} if the request is malformed.
> We'll get more sophisticated as the work progresses (e.g., supporting {{415}} 
> if the content-type is not of the right kind).
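
The response codes listed in the description could be selected along these lines. This is only a sketch under stated assumptions: the `Request` struct, the `statusFor` function, the order of the checks, and the two accepted content types are hypothetical and not taken from the ticket. The sketch follows the ticket in using 403 for malformed requests, although HTTP usually reserves 400 (Bad Request) for that case.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified view of an incoming request.
struct Request {
  std::string contentType;
  bool authorized;
  bool wellFormed;
};

// Picks a status code for `POST /call`, using the codes listed in the
// ticket. The order of the checks and the accepted content types are
// assumptions made for this sketch.
int statusFor(const Request& request)
{
  if (!request.authorized) {
    return 401;
  }

  if (request.contentType != "application/json" &&
      request.contentType != "application/x-protobuf") {
    return 415;
  }

  if (!request.wellFormed) {
    return 403;  // As listed in the ticket.
  }

  return 202;
}

// Small usage example.
void example()
{
  assert(statusFor(Request{"application/json", true, true}) == 202);
  assert(statusFor(Request{"text/plain", true, true}) == 415);
}
```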





[jira] [Updated] (MESOS-2552) C++ Scheduler library should send HTTP Calls to master

2015-08-11 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-2552:
--
Sprint: Mesosphere Sprint 16

> C++ Scheduler library should send HTTP Calls to master
> --
>
> Key: MESOS-2552
> URL: https://issues.apache.org/jira/browse/MESOS-2552
> Project: Mesos
>  Issue Type: Bug
>Reporter: Vinod Kone
>Assignee: Anand Mazumdar
>  Labels: mesosphere
>
> Once the scheduler library sends Call messages, we should update it to send 
> Calls as HTTP requests to "/call" endpoint on master.





[jira] [Commented] (MESOS-2552) C++ Scheduler library should send HTTP Calls to master

2015-08-11 Thread Anand Mazumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14682122#comment-14682122
 ] 

Anand Mazumdar commented on MESOS-2552:
---

Review Chain:
https://reviews.apache.org/r/37298
https://reviews.apache.org/r/37300
https://reviews.apache.org/r/37301

Pending:
https://reviews.apache.org/r/37302
https://reviews.apache.org/r/37303
https://reviews.apache.org/r/37304
https://reviews.apache.org/r/37328/
https://reviews.apache.org/r/37335/



> C++ Scheduler library should send HTTP Calls to master
> --
>
> Key: MESOS-2552
> URL: https://issues.apache.org/jira/browse/MESOS-2552
> Project: Mesos
>  Issue Type: Bug
>Reporter: Vinod Kone
>Assignee: Anand Mazumdar
>  Labels: mesosphere
>
> Once the scheduler library sends Call messages, we should update it to send 
> Calls as HTTP requests to "/call" endpoint on master.





[jira] [Issue Comment Deleted] (MESOS-2552) C++ Scheduler library should send HTTP Calls to master

2015-08-11 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-2552:
--
Comment: was deleted

(was: This was needed for testing the call HTTP endpoint on master. Submitted a 
diff for review: https://reviews.apache.org/r/36013 (still a work in 
progress))

> C++ Scheduler library should send HTTP Calls to master
> --
>
> Key: MESOS-2552
> URL: https://issues.apache.org/jira/browse/MESOS-2552
> Project: Mesos
>  Issue Type: Bug
>Reporter: Vinod Kone
>Assignee: Anand Mazumdar
>  Labels: mesosphere
>
> Once the scheduler library sends Call messages, we should update it to send 
> Calls as HTTP requests to "/call" endpoint on master.





[jira] [Created] (MESOS-3263) SchedulerTask.KillTest fails for JSON Requests

2015-08-14 Thread Anand Mazumdar (JIRA)
Anand Mazumdar created MESOS-3263:
-

 Summary: SchedulerTask.KillTest fails for JSON Requests
 Key: MESOS-3263
 URL: https://issues.apache.org/jira/browse/MESOS-3263
 Project: Mesos
  Issue Type: Bug
Reporter: Anand Mazumdar
 Fix For: 0.24.0


Currently, SchedulerTests.KillTask fails when the ContentType specified in the 
request is JSON.

The crash happens in the master when it tries to process the Acknowledge call 
from the client. The master is unable to correctly parse the escaped UUID 
string that the client sends in JSON, which leads to the crash.





[jira] [Assigned] (MESOS-3311) SlaveTest.HTTPSchedulerSlaveRestart

2015-08-25 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar reassigned MESOS-3311:
-

Assignee: Anand Mazumdar

> SlaveTest.HTTPSchedulerSlaveRestart
> ---
>
> Key: MESOS-3311
> URL: https://issues.apache.org/jira/browse/MESOS-3311
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.24.0
> Environment: 
> https://builds.apache.org/job/Mesos/COMPILER=clang,CONFIGURATION=--verbose%20--enable-libevent%20--enable-ssl,OS=ubuntu:14.04,label_exp=docker%7C%7CHadoop/729/consoleFull
>Reporter: Vinod Kone
>Assignee: Anand Mazumdar
>  Labels: flaky-test
>
> Observed on ASF CI
> {code}
> [ RUN  ] SlaveTest.HTTPSchedulerSlaveRestart
> Using temporary directory '/tmp/SlaveTest_HTTPSchedulerSlaveRestart_CXyDrA'
> I0825 22:07:36.809872 27610 leveldb.cpp:176] Opened db in 3.751801ms
> I0825 22:07:36.85 27610 leveldb.cpp:183] Compacted db in 1.2194ms
> I0825 22:07:36.811175 27610 leveldb.cpp:198] Created db iterator in 30669ns
> I0825 22:07:36.811197 27610 leveldb.cpp:204] Seeked to beginning of db in 
> 7829ns
> I0825 22:07:36.811208 27610 leveldb.cpp:273] Iterated through 0 keys in the 
> db in 6017ns
> I0825 22:07:36.811245 27610 replica.cpp:744] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I0825 22:07:36.811722 27638 recover.cpp:449] Starting replica recovery
> I0825 22:07:36.811980 27638 recover.cpp:475] Replica is in EMPTY status
> I0825 22:07:36.813033 27641 replica.cpp:641] Replica in EMPTY status received 
> a broadcasted recover request
> I0825 22:07:36.813355 27635 recover.cpp:195] Received a recover response from 
> a replica in EMPTY status
> I0825 22:07:36.813756 27628 recover.cpp:566] Updating replica status to 
> STARTING
> I0825 22:07:36.814434 27636 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 570160ns
> I0825 22:07:36.814471 27636 replica.cpp:323] Persisted replica status to 
> STARTING
> I0825 22:07:36.814743 27642 recover.cpp:475] Replica is in STARTING status
> I0825 22:07:36.814965 27638 master.cpp:378] Master 
> 20150825-220736-234885548-51219-27610 (09c6504e3a31) started on 
> 172.17.0.14:51219
> I0825 22:07:36.814999 27638 master.cpp:380] Flags at startup: --acls="" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate="true" --authenticate_slaves="true" --authenticators="crammd5" 
> --authorizers="local" 
> --credentials="/tmp/SlaveTest_HTTPSchedulerSlaveRestart_CXyDrA/credentials" 
> --framework_sorter="drf" --help="false" --initialize_driver_logging="true" 
> --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" 
> --max_slave_ping_timeouts="5" --quiet="false" 
> --recovery_slave_removal_limit="100%" --registry="replicated_log" 
> --registry_fetch_timeout="1mins" --registry_store_timeout="25secs" 
> --registry_strict="true" --root_submissions="true" 
> --slave_ping_timeout="15secs" --slave_reregister_timeout="10mins" 
> --user_sorter="drf" --version="false" 
> --webui_dir="/mesos/mesos-0.25.0/_inst/share/mesos/webui" 
> --work_dir="/tmp/SlaveTest_HTTPSchedulerSlaveRestart_CXyDrA/master" 
> --zk_session_timeout="10secs"
> I0825 22:07:36.815347 27638 master.cpp:425] Master only allowing 
> authenticated frameworks to register
> I0825 22:07:36.815371 27638 master.cpp:430] Master only allowing 
> authenticated slaves to register
> I0825 22:07:36.815402 27638 credentials.hpp:37] Loading credentials for 
> authentication from 
> '/tmp/SlaveTest_HTTPSchedulerSlaveRestart_CXyDrA/credentials'
> I0825 22:07:36.815634 27632 replica.cpp:641] Replica in STARTING status 
> received a broadcasted recover request
> I0825 22:07:36.815752 27638 master.cpp:469] Using default 'crammd5' 
> authenticator
> I0825 22:07:36.815904 27638 master.cpp:506] Authorization enabled
> I0825 22:07:36.815979 27643 recover.cpp:195] Received a recover response from 
> a replica in STARTING status
> I0825 22:07:36.816185 27637 whitelist_watcher.cpp:79] No whitelist given
> I0825 22:07:36.816186 27641 hierarchical.hpp:346] Initialized hierarchical 
> allocator process
> I0825 22:07:36.816519 27630 recover.cpp:566] Updating replica status to VOTING
> I0825 22:07:36.817258 27639 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 475231ns
> I0825 22:07:36.817296 27639 replica.cpp:323] Persisted replica status to 
> VOTING
> I0825 22:07:36.817420 27637 master.cpp:1525] The newly elected leader is 
> master@172.17.0.14:51219 with id 20150825-220736-234885548-51219-27610
> I0825 22:07:36.817467 27637 master.cpp:1538] Elected as the leading master!
> I0825 22:07:36.817483 27637 master.cpp:1308] Recovering from registrar
> I0825 22:07:36.817509 27635 recover.cpp:580] Successfully joined the Paxos 
> group
> I0825 22:07:36.817708 27633 registrar

[jira] [Commented] (MESOS-3311) SlaveTest.HTTPSchedulerSlaveRestart

2015-08-25 Thread Anand Mazumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14712113#comment-14712113
 ] 

Anand Mazumdar commented on MESOS-3311:
---

Taking a look. This looks related to the re-routing of 
ExecutorToFrameworkMessages through the master that we introduced for HTTP 
frameworks (ac70a59, 9172a5f).

> SlaveTest.HTTPSchedulerSlaveRestart
> ---
>
> Key: MESOS-3311
> URL: https://issues.apache.org/jira/browse/MESOS-3311
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.24.0
> Environment: 
> https://builds.apache.org/job/Mesos/COMPILER=clang,CONFIGURATION=--verbose%20--enable-libevent%20--enable-ssl,OS=ubuntu:14.04,label_exp=docker%7C%7CHadoop/729/consoleFull
>Reporter: Vinod Kone
>Assignee: Anand Mazumdar
>  Labels: flaky-test
>
> Observed on ASF CI

[jira] [Commented] (MESOS-3311) SlaveTest.HTTPSchedulerSlaveRestart

2015-08-25 Thread Anand Mazumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14712292#comment-14712292
 ] 

Anand Mazumdar commented on MESOS-3311:
---

From the logs, the slave sends in a "retried" re-registration request that 
triggers a FrameworkUpdateMessage again, thereby re-writing the pid from 
0.0.0.0 back to the original scheduler pid.

Just pausing the clock to disable retries should be able to fix this.
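
To illustrate why a paused clock suppresses retries, here is a small self-contained sketch. `ManualClock`, `delay`, and `advance` are hypothetical names for this example and are not the libprocess API; the idea is only that timers scheduled against a clock that never moves on its own cannot fire unless the test advances time explicitly.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <utility>

// Hypothetical manual clock: time only moves when `advance` is called,
// which is how a paused test clock behaves.
class ManualClock {
public:
  // Schedules `callback` to run once `ticks` units of time have elapsed.
  void delay(int ticks, std::function<void()> callback)
  {
    assert(ticks >= 0);
    timers.emplace(now + ticks, std::move(callback));
  }

  // Moves time forward and runs every timer that has become due.
  void advance(int ticks)
  {
    assert(ticks >= 0);
    now += ticks;
    while (!timers.empty() && timers.begin()->first <= now) {
      std::function<void()> callback = std::move(timers.begin()->second);
      timers.erase(timers.begin());
      callback();
    }
  }

private:
  int now = 0;
  std::multimap<int, std::function<void()>> timers;
};
```

A test that schedules a retry with `delay` and never calls `advance` will not see the retry, so the retried re-registration described above would not be sent.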

> SlaveTest.HTTPSchedulerSlaveRestart
> ---
>
> Key: MESOS-3311
> URL: https://issues.apache.org/jira/browse/MESOS-3311
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.24.0
> Environment: 
> https://builds.apache.org/job/Mesos/COMPILER=clang,CONFIGURATION=--verbose%20--enable-libevent%20--enable-ssl,OS=ubuntu:14.04,label_exp=docker%7C%7CHadoop/729/consoleFull
>Reporter: Vinod Kone
>Assignee: Anand Mazumdar
>  Labels: flaky-test
>
> Observed on ASF CI

[jira] [Updated] (MESOS-2708) Design doc for the Executor HTTP API

2015-08-26 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-2708:
--
Sprint: Mesosphere Sprint 17

> Design doc for the Executor HTTP API
> 
>
> Key: MESOS-2708
> URL: https://issues.apache.org/jira/browse/MESOS-2708
> Project: Mesos
>  Issue Type: Bug
>Reporter: Alexander Rojas
>Assignee: Anand Mazumdar
>  Labels: mesosphere
>
> This tracks the design of the Executor HTTP API.





[jira] [Created] (MESOS-3332) Support HTTP Pipelining in libprocess (http::post)

2015-08-28 Thread Anand Mazumdar (JIRA)
Anand Mazumdar created MESOS-3332:
-

 Summary: Support HTTP Pipelining in libprocess (http::post)
 Key: MESOS-3332
 URL: https://issues.apache.org/jira/browse/MESOS-3332
 Project: Mesos
  Issue Type: Task
  Components: libprocess
Reporter: Anand Mazumdar


Currently, {{ http::post }} in libprocess does not support HTTP pipelining. 
Each call as of now sends the {{ Connection: close }} header, thereby 
signaling to the server to close the TCP socket after the response.

We either need to create a new interface for supporting HTTP pipelining, or 
modify the existing {{http::post}} to do so.

This is needed for the Scheduler/Executor library implementations to make sure 
"Calls" are sent in order to the master. Currently, in order to do so, we send 
the next request only after we have received a response for the earlier call, 
which results in degraded performance.
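
As a rough illustration of the difference on the wire, the sketch below serializes several calls so that they can be written back to back on one connection, with only the last request carrying `Connection: close`. The functions `encodePost` and `pipeline`, the host value, and the choice of `keep-alive` for the intermediate requests are assumptions for this example; it does not reflect how libprocess builds requests.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Serializes a single HTTP/1.1 POST request. `keepAlive` selects the
// value of the Connection header.
std::string encodePost(
    const std::string& host,
    const std::string& path,
    const std::string& body,
    bool keepAlive)
{
  assert(!path.empty() && path[0] == '/');

  std::string request;
  request += "POST " + path + " HTTP/1.1\r\n";
  request += "Host: " + host + "\r\n";
  request += std::string("Connection: ") +
             (keepAlive ? "keep-alive" : "close") + "\r\n";
  request += "Content-Length: " + std::to_string(body.size()) + "\r\n";
  request += "\r\n";
  request += body;
  return request;
}

// Concatenates all calls so they can be written to one socket back to
// back. Only the last request asks the server to close the connection.
std::string pipeline(
    const std::string& host,
    const std::string& path,
    const std::vector<std::string>& bodies)
{
  std::string wire;
  for (std::size_t i = 0; i < bodies.size(); ++i) {
    const bool last = (i + 1 == bodies.size());
    wire += encodePost(host, path, bodies[i], !last);
  }
  return wire;
}
```

Because the requests share one connection, the server sees them in the order they were written, and the client can match responses to requests in that same order without waiting for each response before sending the next call.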







[jira] [Updated] (MESOS-3332) Support HTTP Pipelining in libprocess (http::post)

2015-08-28 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-3332:
--
Description: 
Currently, {{http::post}} in libprocess does not support HTTP pipelining. 
Each call as of now sends the {{Connection: close}} header, thereby 
signaling to the server to close the TCP socket after the response.

We either need to create a new interface for supporting HTTP pipelining, or 
modify the existing {{http::post}} to do so.

This is needed for the Scheduler/Executor library implementations to make sure 
"Calls" are sent in order to the master. Currently, in order to do so, we send 
the next request only after we have received a response for the earlier call, 
which results in degraded performance.



  was:
Currently, {{ http::post }} in libprocess does not support HTTP pipelining. 
Each call as of now sends the {{ Connection: close }} header, thereby 
signaling to the server to close the TCP socket after the response.

We either need to create a new interface for supporting HTTP pipelining, or 
modify the existing {{http::post}} to do so.

This is needed for the Scheduler/Executor library implementations to make sure 
"Calls" are sent in order to the master. Currently, in order to do so, we send 
the next request only after we have received a response for the earlier call, 
which results in degraded performance.




> Support HTTP Pipelining in libprocess (http::post)
> --
>
> Key: MESOS-3332
> URL: https://issues.apache.org/jira/browse/MESOS-3332
> Project: Mesos
>  Issue Type: Task
>  Components: libprocess
>Reporter: Anand Mazumdar
>  Labels: mesosphere, twitter
>
> Currently, {{http::post}} in libprocess does not support HTTP pipelining. 
> Each call as of now sends the {{Connection: close}} header, thereby 
> signaling to the server to close the TCP socket after the response.
> We either need to create a new interface for supporting HTTP pipelining, or 
> modify the existing {{http::post}} to do so.
> This is needed for the Scheduler/Executor library implementations to make 
> sure "Calls" are sent in order to the master. Currently, in order to do so, 
> we send the next request only after we have received a response for the 
> earlier call, which results in degraded performance.





[jira] [Created] (MESOS-3339) Implement filtering mechanism for (Scheduler API Events) Testing

2015-08-31 Thread Anand Mazumdar (JIRA)
Anand Mazumdar created MESOS-3339:
-

 Summary: Implement filtering mechanism for (Scheduler API Events) 
Testing
 Key: MESOS-3339
 URL: https://issues.apache.org/jira/browse/MESOS-3339
 Project: Mesos
  Issue Type: Task
  Components: test
Reporter: Anand Mazumdar


Currently, our testing infrastructure does not have a mechanism for 
filtering/dropping HTTP events of a particular type from the Scheduler API 
response stream. We need a {DROP_HTTP_CALLS} abstraction that can help us 
filter a particular event type.

{code}
// Enqueues all received events into a libprocess queue.
ACTION_P(Enqueue, queue)
{
  std::queue<Event> events = arg0;
  while (!events.empty()) {
    // Note that we currently drop HEARTBEATs because most of these tests
    // are not designed to deal with heartbeats.
    // TODO(vinod): Implement DROP_HTTP_CALLS that can filter heartbeats.
    if (events.front().type() == Event::HEARTBEAT) {
      VLOG(1) << "Ignoring HEARTBEAT event";
    } else {
      queue->put(events.front());
    }
    events.pop();
  }
}
{code}

This helper code is currently duplicated in at least two places: the Scheduler 
Library and Maintenance Primitives tests. The solution can be as trivial as 
moving this helper function to a common test header, or implementing a 
decorator reader class over {RecordIOReader} that filters events. There 
might be other alternative approaches too.
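
One way the shared helper could be generalized is to take the event type to drop as a parameter instead of hard-coding HEARTBEAT. The following is a self-contained sketch only: the `Event` struct here is a simplified, hypothetical stand-in for the real protobuf type, and `enqueueExcept` is an invented name, not an existing test utility.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>

// Hypothetical, simplified event type.
struct Event {
  enum Type { SUBSCRIBED, OFFERS, HEARTBEAT };
  Type type;
};

// Copies every event from `events` into `out`, skipping events of type
// `dropped`. Returns how many events were skipped.
std::size_t enqueueExcept(
    std::queue<Event> events,
    Event::Type dropped,
    std::queue<Event>& out)
{
  std::size_t skipped = 0;
  while (!events.empty()) {
    if (events.front().type == dropped) {
      ++skipped;
    } else {
      out.push(events.front());
    }
    events.pop();
  }
  return skipped;
}

// Small usage example.
void example()
{
  std::queue<Event> in;
  in.push(Event{Event::HEARTBEAT});
  in.push(Event{Event::OFFERS});

  std::queue<Event> out;
  assert(enqueueExcept(in, Event::HEARTBEAT, out) == 1);
  assert(out.size() == 1);
}
```

Keeping the filter in a single function like this would let both test suites share it from a common header, which is the simplest of the options mentioned above.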





[jira] [Updated] (MESOS-3339) Implement filtering mechanism for (Scheduler API Events) Testing

2015-08-31 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-3339:
--
Description: 
Currently, our testing infrastructure does not have a mechanism for 
filtering/dropping HTTP events of a particular type from the Scheduler API 
response stream. We need a {{DROP_HTTP_CALLS}} abstraction that can help us 
filter a particular event type.

{code}
// Enqueues all received events into a libprocess queue.
ACTION_P(Enqueue, queue)
{
  std::queue<Event> events = arg0;
  while (!events.empty()) {
    // Note that we currently drop HEARTBEATs because most of these tests
    // are not designed to deal with heartbeats.
    // TODO(vinod): Implement DROP_HTTP_CALLS that can filter heartbeats.
    if (events.front().type() == Event::HEARTBEAT) {
      VLOG(1) << "Ignoring HEARTBEAT event";
    } else {
      queue->put(events.front());
    }
    events.pop();
  }
}
{code}

This helper code is currently duplicated in at least two places: the Scheduler 
Library and Maintenance Primitives tests. The solution can be as trivial as 
moving this helper function to a common test header, or implementing a 
decorator reader class over {{RecordIOReader}} that filters events. 
There might be other alternative approaches too.

  was:
Currently, our testing infrastructure does not have a mechanism for 
filtering/dropping HTTP events of a particular type from the Scheduler API 
response stream. We need a {DROP_HTTP_CALLS} abstraction that can help us 
filter a particular event type.

{code}
// Enqueues all received events into a libprocess queue.
ACTION_P(Enqueue, queue)
{
  std::queue<Event> events = arg0;
  while (!events.empty()) {
    // Note that we currently drop HEARTBEATs because most of these tests
    // are not designed to deal with heartbeats.
    // TODO(vinod): Implement DROP_HTTP_CALLS that can filter heartbeats.
    if (events.front().type() == Event::HEARTBEAT) {
      VLOG(1) << "Ignoring HEARTBEAT event";
    } else {
      queue->put(events.front());
    }
    events.pop();
  }
}
{code}

This helper code is currently duplicated in at least two places: the Scheduler 
Library and Maintenance Primitives tests. The solution can be as trivial as 
moving this helper function to a common test header, or implementing a 
decorator reader class over {RecordIOReader} that filters events. There 
might be other alternative approaches too.


> Implement filtering mechanism for (Scheduler API Events) Testing
> 
>
> Key: MESOS-3339
> URL: https://issues.apache.org/jira/browse/MESOS-3339
> Project: Mesos
>  Issue Type: Task
>  Components: test
>Reporter: Anand Mazumdar
>
> Currently, our testing infrastructure does not have a mechanism of 
> filtering/dropping HTTP events of a particular type from the Scheduler API 
> response stream.  We need a {{DROP_HTTP_CALLS}} abstraction that can help us 
> to filter a particular event type.
> {code}
> // Enqueues all received events into a libprocess queue.
> ACTION_P(Enqueue, queue)
> {
>   std::queue<Event> events = arg0;
>   while (!events.empty()) {
>     // Note that we currently drop HEARTBEATs because most of these tests
>     // are not designed to deal with heartbeats.
>     // TODO(vinod): Implement DROP_HTTP_CALLS that can filter heartbeats.
>     if (events.front().type() == Event::HEARTBEAT) {
>       VLOG(1) << "Ignoring HEARTBEAT event";
>     } else {
>       queue->put(events.front());
>     }
>     events.pop();
>   }
> }
> {code}
> This helper code is currently duplicated in at least two places: the 
> Scheduler Library and Maintenance Primitives tests. The solution can be as 
> trivial as moving this helper function to a common test header, or 
> implementing a decorator reader class over {{RecordIOReader}} that filters 
> events. There might be other approaches too.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3339) Implement filtering mechanism for (Scheduler API Events) Testing

2015-08-31 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-3339:
--
Description: 
Currently, our testing infrastructure does not have a mechanism of 
filtering/dropping HTTP events of a particular type from the Scheduler API 
response stream.  We need a {{DROP_HTTP_CALLS}} abstraction that can help us to 
filter a particular event type.

{code}
// Enqueues all received events into a libprocess queue.
ACTION_P(Enqueue, queue)
{
  std::queue<Event> events = arg0;
  while (!events.empty()) {
    // Note that we currently drop HEARTBEATs because most of these tests
    // are not designed to deal with heartbeats.
    // TODO(vinod): Implement DROP_HTTP_CALLS that can filter heartbeats.
    if (events.front().type() == Event::HEARTBEAT) {
      VLOG(1) << "Ignoring HEARTBEAT event";
    } else {
      queue->put(events.front());
    }
    events.pop();
  }
}
{code}

This helper code is currently duplicated in at least two places: the Scheduler 
Library and Maintenance Primitives tests. Possible approaches:
- Move this helper function to a common test header.
- Implement a decorator reader class over {{RecordIOReader}} that filters 
events.
- Implement a {{DROP_HTTP_CALLS}} similar to what we do for other protobufs via 
{{DROP_CALLS}}.

  was:
Currently, our testing infrastructure does not have a mechanism of 
filtering/dropping HTTP events of a particular type from the Scheduler API 
response stream.  We need a {{DROP_HTTP_CALLS}} abstraction that can help us to 
filter a particular event type.

{code}
// Enqueues all received events into a libprocess queue.
ACTION_P(Enqueue, queue)
{
  std::queue<Event> events = arg0;
  while (!events.empty()) {
    // Note that we currently drop HEARTBEATs because most of these tests
    // are not designed to deal with heartbeats.
    // TODO(vinod): Implement DROP_HTTP_CALLS that can filter heartbeats.
    if (events.front().type() == Event::HEARTBEAT) {
      VLOG(1) << "Ignoring HEARTBEAT event";
    } else {
      queue->put(events.front());
    }
    events.pop();
  }
}
{code}

This helper code is currently duplicated in at least two places: the Scheduler 
Library and Maintenance Primitives tests. The solution can be as trivial as 
moving this helper function to a common test header, or implementing a decorator 
reader class over {{RecordIOReader}} that filters events. There might be other 
approaches too.


> Implement filtering mechanism for (Scheduler API Events) Testing
> 
>
> Key: MESOS-3339
> URL: https://issues.apache.org/jira/browse/MESOS-3339
> Project: Mesos
>  Issue Type: Task
>  Components: test
>Reporter: Anand Mazumdar
>
> Currently, our testing infrastructure does not have a mechanism of 
> filtering/dropping HTTP events of a particular type from the Scheduler API 
> response stream.  We need a {{DROP_HTTP_CALLS}} abstraction that can help us 
> to filter a particular event type.
> {code}
> // Enqueues all received events into a libprocess queue.
> ACTION_P(Enqueue, queue)
> {
>   std::queue<Event> events = arg0;
>   while (!events.empty()) {
>     // Note that we currently drop HEARTBEATs because most of these tests
>     // are not designed to deal with heartbeats.
>     // TODO(vinod): Implement DROP_HTTP_CALLS that can filter heartbeats.
>     if (events.front().type() == Event::HEARTBEAT) {
>       VLOG(1) << "Ignoring HEARTBEAT event";
>     } else {
>       queue->put(events.front());
>     }
>     events.pop();
>   }
> }
> {code}
> This helper code is currently duplicated in at least two places: the 
> Scheduler Library and Maintenance Primitives tests. Possible approaches:
> - Move this helper function to a common test header.
> - Implement a decorator reader class over {{RecordIOReader}} that filters 
> events.
> - Implement a {{DROP_HTTP_CALLS}} similar to what we do for other protobufs 
> via {{DROP_CALLS}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3339) Implement filtering mechanism for (Scheduler API Events) Testing

2015-08-31 Thread Anand Mazumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14723789#comment-14723789
 ] 

Anand Mazumdar commented on MESOS-3339:
---

Modified description to list it as a possible approach. I wasn't immediately 
sure if this could be done similar to the DROP_CALLS abstraction and hence had 
not specified it before.
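
For comparison, the decorator-reader option from the description could look roughly like the sketch below. This is an assumption-laden illustration, not the real {{RecordIOReader}} interface: {{QueueReader}} and its {{read(Event*)}} method are hypothetical stand-ins, and {{FilteringReader}} is a made-up name for the decorator.

```cpp
#include <queue>
#include <set>
#include <utility>

// Stand-in for the real scheduler Event protobuf (assumption for this sketch).
struct Event
{
  enum Type { SUBSCRIBED, OFFERS, HEARTBEAT };
  Type type;
};

// Hypothetical reader that yields already-decoded events. The real
// RecordIOReader interface differs; this only models "give me the next event".
class QueueReader
{
public:
  explicit QueueReader(std::queue<Event> events) : events_(std::move(events)) {}

  // Writes the next event into `event`; returns false at end of stream.
  bool read(Event* event)
  {
    if (events_.empty()) {
      return false;
    }
    *event = events_.front();
    events_.pop();
    return true;
  }

private:
  std::queue<Event> events_;
};

// Decorator that forwards reads to the wrapped reader and silently skips
// events whose type is in `dropped`.
template <typename Reader>
class FilteringReader
{
public:
  FilteringReader(Reader reader, std::set<Event::Type> dropped)
    : reader_(std::move(reader)), dropped_(std::move(dropped)) {}

  bool read(Event* event)
  {
    while (reader_.read(event)) {
      if (dropped_.count(event->type) == 0) {
        return true;
      }
    }
    return false;
  }

private:
  Reader reader_;
  std::set<Event::Type> dropped_;
};
```

Compared with a shared helper function, the decorator keeps the filtering out of the test body entirely, at the cost of one more type to maintain.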

> Implement filtering mechanism for (Scheduler API Events) Testing
> 
>
> Key: MESOS-3339
> URL: https://issues.apache.org/jira/browse/MESOS-3339
> Project: Mesos
>  Issue Type: Task
>  Components: test
>Reporter: Anand Mazumdar
>
> Currently, our testing infrastructure does not have a mechanism of 
> filtering/dropping HTTP events of a particular type from the Scheduler API 
> response stream.  We need a {{DROP_HTTP_CALLS}} abstraction that can help us 
> to filter a particular event type.
> {code}
> // Enqueues all received events into a libprocess queue.
> ACTION_P(Enqueue, queue)
> {
>   std::queue<Event> events = arg0;
>   while (!events.empty()) {
>     // Note that we currently drop HEARTBEATs because most of these tests
>     // are not designed to deal with heartbeats.
>     // TODO(vinod): Implement DROP_HTTP_CALLS that can filter heartbeats.
>     if (events.front().type() == Event::HEARTBEAT) {
>       VLOG(1) << "Ignoring HEARTBEAT event";
>     } else {
>       queue->put(events.front());
>     }
>     events.pop();
>   }
> }
> {code}
> This helper code is currently duplicated in at least two places: the 
> Scheduler Library and Maintenance Primitives tests. Possible approaches:
> - Move this helper function to a common test header.
> - Implement a decorator reader class over {{RecordIOReader}} that filters 
> events.
> - Implement a {{DROP_HTTP_CALLS}} similar to what we do for other protobufs 
> via {{DROP_CALLS}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (MESOS-3339) Implement filtering mechanism for (Scheduler API Events) Testing

2015-08-31 Thread Anand Mazumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14723789#comment-14723789
 ] 

Anand Mazumdar edited comment on MESOS-3339 at 8/31/15 6:06 PM:


Modified description to list it as a possible approach. It wasn't immediately 
obvious to me how this could be done similarly to the DROP_CALLS abstraction, 
and hence I had not specified it before.


was (Author: anandmazumdar):
Modified description to list it as a possible approach. I wasn't immediately 
sure if this could be done similar to the DROP_CALLS abstraction and hence had 
not specified it before.

> Implement filtering mechanism for (Scheduler API Events) Testing
> 
>
> Key: MESOS-3339
> URL: https://issues.apache.org/jira/browse/MESOS-3339
> Project: Mesos
>  Issue Type: Task
>  Components: test
>Reporter: Anand Mazumdar
>
> Currently, our testing infrastructure does not have a mechanism of 
> filtering/dropping HTTP events of a particular type from the Scheduler API 
> response stream.  We need a {{DROP_HTTP_CALLS}} abstraction that can help us 
> to filter a particular event type.
> {code}
> // Enqueues all received events into a libprocess queue.
> ACTION_P(Enqueue, queue)
> {
>   std::queue<Event> events = arg0;
>   while (!events.empty()) {
>     // Note that we currently drop HEARTBEATs because most of these tests
>     // are not designed to deal with heartbeats.
>     // TODO(vinod): Implement DROP_HTTP_CALLS that can filter heartbeats.
>     if (events.front().type() == Event::HEARTBEAT) {
>       VLOG(1) << "Ignoring HEARTBEAT event";
>     } else {
>       queue->put(events.front());
>     }
>     events.pop();
>   }
> }
> {code}
> This helper code is currently duplicated in at least two places: the 
> Scheduler Library and Maintenance Primitives tests. Possible approaches:
> - Move this helper function to a common test header.
> - Implement a decorator reader class over {{RecordIOReader}} that filters 
> events.
> - Implement a {{DROP_HTTP_CALLS}} similar to what we do for other protobufs 
> via {{DROP_CALLS}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-3343) Rate Limiting functionality for HTTP Frameworks

2015-08-31 Thread Anand Mazumdar (JIRA)
Anand Mazumdar created MESOS-3343:
-

 Summary: Rate Limiting functionality for HTTP Frameworks
 Key: MESOS-3343
 URL: https://issues.apache.org/jira/browse/MESOS-3343
 Project: Mesos
  Issue Type: Task
Reporter: Anand Mazumdar


We need to build rate limiting functionality for frameworks connecting via the 
Scheduler HTTP API, similar to what exists for PID-based frameworks.

Link to the rate-limiting section from design doc:
https://docs.google.com/document/d/1pnIY_HckimKNvpqhKRhbc9eSItWNFT-priXh_urR-T0/edit#heading=h.kzgdk4d5fmba

- This ticket deals with refactoring the existing PID-based framework 
functionality and extending it to HTTP frameworks.
- The second part, notifying the framework when rate limiting is active (i.e. 
returning a status of 429), can be undertaken as part of MESOS-1664.
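
To make the idea concrete, below is a minimal token-bucket sketch. It is not the existing Mesos rate limiter and not the design from the linked document; {{TokenBucket}} and {{statusFor}} are hypothetical names, time is passed in explicitly so the behaviour is deterministic, and the 202/429 mapping assumes that accepted calls are answered with 202 while throttled ones would get 429.

```cpp
#include <algorithm>

// Token-bucket limiter driven by an explicit clock (seconds), which keeps the
// sketch deterministic and easy to test.
class TokenBucket
{
public:
  TokenBucket(double permitsPerSecond, double capacity)
    : rate_(permitsPerSecond),
      capacity_(capacity),
      tokens_(capacity),
      last_(0.0) {}

  // Returns true if a call arriving at time `now` may proceed.
  bool tryAcquire(double now)
  {
    // Refill according to elapsed time, never exceeding the capacity.
    tokens_ = std::min(capacity_, tokens_ + (now - last_) * rate_);
    last_ = now;
    if (tokens_ < 1.0) {
      return false;
    }
    tokens_ -= 1.0;
    return true;
  }

private:
  double rate_;
  double capacity_;
  double tokens_;
  double last_;
};

// Maps the limiter decision to an HTTP status code:
// 202 (Accepted) when the call is admitted, 429 (Too Many Requests) otherwise.
inline int statusFor(TokenBucket* limiter, double now)
{
  return limiter->tryAcquire(now) ? 202 : 429;
}
```

Whether throttled HTTP calls should be rejected with 429 or queued (as this sketch does not do) is exactly the kind of question the refactoring would need to settle.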



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-3355) Testing the HTTP V1 API

2015-09-01 Thread Anand Mazumdar (JIRA)
Anand Mazumdar created MESOS-3355:
-

 Summary: Testing the HTTP V1 API
 Key: MESOS-3355
 URL: https://issues.apache.org/jira/browse/MESOS-3355
 Project: Mesos
  Issue Type: Task
Reporter: Anand Mazumdar


Currently, we don't have extensive Fault-Tolerance/Partition/Reconciliation 
tests for the HTTP V1 API. The ones in {{tests/fault_tolerance_tests.cpp, 
tests/partition_tests.cpp, tests/reconciliation_tests.cpp}} only test the old 
PID-based workflow.

We need to build the functionality for testing the new V1 API. Several 
approaches could help us achieve this:
- Implement a new driver speaking HTTP, called {{MesosHTTPSchedulerDriver}}, 
living in the {{tests}} folder. We won't expose it to framework developers; it 
would be used only for testing.
- Add an environment variable/constructor argument to the existing 
{{MesosSchedulerDriver}} to make it speak HTTP when the relevant flag is set.
- Modify the tests to use the Scheduler library. This can be a bit cumbersome, 
as the tests have mock expectations etc. defined on the old driver callbacks.

All 3 approaches would still keep the old PID-based workflow intact, and we can 
implement 3 new test files like {{tests/fault_tolerance_http_tests.cpp}} etc.
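
As a rough illustration of the second approach (a flag that switches the existing driver to HTTP), the selection logic could be as small as the sketch below. Everything here is hypothetical: the {{Transport}} enum, the function names and the {{MESOS_TEST_HTTP_DRIVER}} environment variable do not exist in Mesos and are only placeholders.

```cpp
#include <cstdlib>
#include <string>

// Which wire protocol the driver should speak.
enum class Transport { PID, HTTP };

// Decides the transport from a flag value; `value` is null when the flag is
// unset, in which case the old PID-based workflow stays the default.
inline Transport transportFromFlag(const char* value)
{
  if (value == nullptr) {
    return Transport::PID;
  }
  const std::string flag(value);
  return (flag == "1" || flag == "true") ? Transport::HTTP : Transport::PID;
}

// Reads the (hypothetical) MESOS_TEST_HTTP_DRIVER environment variable.
inline Transport transportFromEnvironment()
{
  return transportFromFlag(std::getenv("MESOS_TEST_HTTP_DRIVER"));
}
```

Keeping the parsing separate from the environment lookup means the same function can back a constructor argument as well.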



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3311) SlaveTest.HTTPSchedulerSlaveRestart

2015-09-02 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-3311:
--
  Sprint: Mesosphere Sprint 18
Story Points: 2
  Labels: flaky-test mesosphere  (was: flaky-test)

> SlaveTest.HTTPSchedulerSlaveRestart
> ---
>
> Key: MESOS-3311
> URL: https://issues.apache.org/jira/browse/MESOS-3311
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.24.0
> Environment: 
> https://builds.apache.org/job/Mesos/COMPILER=clang,CONFIGURATION=--verbose%20--enable-libevent%20--enable-ssl,OS=ubuntu:14.04,label_exp=docker%7C%7CHadoop/729/consoleFull
>Reporter: Vinod Kone
>Assignee: Anand Mazumdar
>  Labels: flaky-test, mesosphere
> Fix For: 0.24.0
>
>
> Observed on ASF CI
> {code}
> [ RUN  ] SlaveTest.HTTPSchedulerSlaveRestart
> Using temporary directory '/tmp/SlaveTest_HTTPSchedulerSlaveRestart_CXyDrA'
> I0825 22:07:36.809872 27610 leveldb.cpp:176] Opened db in 3.751801ms
> I0825 22:07:36.85 27610 leveldb.cpp:183] Compacted db in 1.2194ms
> I0825 22:07:36.811175 27610 leveldb.cpp:198] Created db iterator in 30669ns
> I0825 22:07:36.811197 27610 leveldb.cpp:204] Seeked to beginning of db in 
> 7829ns
> I0825 22:07:36.811208 27610 leveldb.cpp:273] Iterated through 0 keys in the 
> db in 6017ns
> I0825 22:07:36.811245 27610 replica.cpp:744] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I0825 22:07:36.811722 27638 recover.cpp:449] Starting replica recovery
> I0825 22:07:36.811980 27638 recover.cpp:475] Replica is in EMPTY status
> I0825 22:07:36.813033 27641 replica.cpp:641] Replica in EMPTY status received 
> a broadcasted recover request
> I0825 22:07:36.813355 27635 recover.cpp:195] Received a recover response from 
> a replica in EMPTY status
> I0825 22:07:36.813756 27628 recover.cpp:566] Updating replica status to 
> STARTING
> I0825 22:07:36.814434 27636 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 570160ns
> I0825 22:07:36.814471 27636 replica.cpp:323] Persisted replica status to 
> STARTING
> I0825 22:07:36.814743 27642 recover.cpp:475] Replica is in STARTING status
> I0825 22:07:36.814965 27638 master.cpp:378] Master 
> 20150825-220736-234885548-51219-27610 (09c6504e3a31) started on 
> 172.17.0.14:51219
> I0825 22:07:36.814999 27638 master.cpp:380] Flags at startup: --acls="" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate="true" --authenticate_slaves="true" --authenticators="crammd5" 
> --authorizers="local" 
> --credentials="/tmp/SlaveTest_HTTPSchedulerSlaveRestart_CXyDrA/credentials" 
> --framework_sorter="drf" --help="false" --initialize_driver_logging="true" 
> --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" 
> --max_slave_ping_timeouts="5" --quiet="false" 
> --recovery_slave_removal_limit="100%" --registry="replicated_log" 
> --registry_fetch_timeout="1mins" --registry_store_timeout="25secs" 
> --registry_strict="true" --root_submissions="true" 
> --slave_ping_timeout="15secs" --slave_reregister_timeout="10mins" 
> --user_sorter="drf" --version="false" 
> --webui_dir="/mesos/mesos-0.25.0/_inst/share/mesos/webui" 
> --work_dir="/tmp/SlaveTest_HTTPSchedulerSlaveRestart_CXyDrA/master" 
> --zk_session_timeout="10secs"
> I0825 22:07:36.815347 27638 master.cpp:425] Master only allowing 
> authenticated frameworks to register
> I0825 22:07:36.815371 27638 master.cpp:430] Master only allowing 
> authenticated slaves to register
> I0825 22:07:36.815402 27638 credentials.hpp:37] Loading credentials for 
> authentication from 
> '/tmp/SlaveTest_HTTPSchedulerSlaveRestart_CXyDrA/credentials'
> I0825 22:07:36.815634 27632 replica.cpp:641] Replica in STARTING status 
> received a broadcasted recover request
> I0825 22:07:36.815752 27638 master.cpp:469] Using default 'crammd5' 
> authenticator
> I0825 22:07:36.815904 27638 master.cpp:506] Authorization enabled
> I0825 22:07:36.815979 27643 recover.cpp:195] Received a recover response from 
> a replica in STARTING status
> I0825 22:07:36.816185 27637 whitelist_watcher.cpp:79] No whitelist given
> I0825 22:07:36.816186 27641 hierarchical.hpp:346] Initialized hierarchical 
> allocator process
> I0825 22:07:36.816519 27630 recover.cpp:566] Updating replica status to VOTING
> I0825 22:07:36.817258 27639 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 475231ns
> I0825 22:07:36.817296 27639 replica.cpp:323] Persisted replica status to 
> VOTING
> I0825 22:07:36.817420 27637 master.cpp:1525] The newly elected leader is 
> master@172.17.0.14:51219 with id 20150825-220736-234885548-51219-27610
> I0825 22:07:36.817467 27637 master.cpp:1538] Elected as the leading master!
> I0825 22:07:36.817483 27637 master.cpp:1308] Recovering from registrar

[jira] [Commented] (MESOS-3273) EventCall Test Framework is flaky

2015-09-02 Thread Anand Mazumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14728577#comment-14728577
 ] 

Anand Mazumdar commented on MESOS-3273:
---

I was able to reproduce this once on my machine. However, going by the logs, the 
issue is not the same as the one seen on the ASF build, so we still need to dig 
into what went wrong on the ASF test run.

In my test run, there is a race between the master successfully recovering its 
state from the registry and the scheduler sending a call. In this case, we just 
log the error and leave it up to the framework to retry the call.

{code}
I0902 23:29:28.815498 113774592 leveldb.cpp:438] Reading position from leveldb 
took 32us
I0902 23:29:28.826355 136355840 registrar.cpp:344] Successfully fetched the 
registry (0B) in 16.811008ms
I0902 23:29:28.826472 136355840 registrar.cpp:443] Applied 1 operations in 
35us; attempting to update the 'registry'
I0902 23:29:28.826869 135819264 http.cpp:333] HTTP POST for 
/master/api/v1/scheduler from 192.168.29.132:56913
W0902 23:29:28.831881 135282688 scheduler.cpp:381] Received '503 Service 
Unavailable' () for SUBSCRIBE
{code}

[~ijimenez] [~vinodkone]
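
Since the retry is left to the framework, a framework-side loop could look something like the following sketch. It is an illustration under assumptions, not scheduler library code: {{sendWithRetry}} is a hypothetical name, {{send}} stands for whatever issues the SUBSCRIBE call and returns the HTTP status, and {{wait}} is injected so that the backoff can be observed without actually sleeping.

```cpp
#include <functional>

// Calls `send` until it returns something other than 503 or until
// `maxAttempts` calls have been made. Before each retry, `wait` is invoked
// with the current backoff in milliseconds, which doubles every time.
// Returns the last status seen (503 if `maxAttempts` is not positive).
inline int sendWithRetry(
    const std::function<int()>& send,
    const std::function<void(int)>& wait,
    int maxAttempts,
    int initialBackoffMs)
{
  int backoffMs = initialBackoffMs;
  int status = 503;
  for (int attempt = 0; attempt < maxAttempts; ++attempt) {
    status = send();
    if (status != 503) {
      return status;
    }
    // Only back off if another attempt will follow.
    if (attempt + 1 < maxAttempts) {
      wait(backoffMs);
      backoffMs *= 2;
    }
  }
  return status;
}
```

In a real framework {{wait}} would sleep or schedule a delayed retry; in a test it can simply record the requested delays.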

> EventCall Test Framework is flaky
> -
>
> Key: MESOS-3273
> URL: https://issues.apache.org/jira/browse/MESOS-3273
> Project: Mesos
>  Issue Type: Bug
>  Components: HTTP API
>Affects Versions: 0.24.0
> Environment: 
> https://builds.apache.org/job/Mesos/705/COMPILER=clang,CONFIGURATION=--verbose,OS=ubuntu:14.04,label_exp=docker%7C%7CHadoop/consoleFull
>Reporter: Vinod Kone
>  Labels: flaky-test, tech-debt, twitter
>
> Observed this on ASF CI. h/t [~haosd...@gmail.com]
> Looks like the HTTP scheduler never sent a SUBSCRIBE request to the master.
> {code}
> [ RUN  ] ExamplesTest.EventCallFramework
> Using temporary directory '/tmp/ExamplesTest_EventCallFramework_k4vXkx'
> I0813 19:55:15.643579 26085 exec.cpp:443] Ignoring exited event because the 
> driver is aborted!
> Shutting down
> Sending SIGTERM to process tree at pid 26061
> Killing the following process trees:
> [ 
> ]
> Shutting down
> Sending SIGTERM to process tree at pid 26062
> Shutting down
> Killing the following process trees:
> [ 
> ]
> Sending SIGTERM to process tree at pid 26063
> Killing the following process trees:
> [ 
> ]
> Shutting down
> Sending SIGTERM to process tree at pid 26098
> Killing the following process trees:
> [ 
> ]
> Shutting down
> Sending SIGTERM to process tree at pid 26099
> Killing the following process trees:
> [ 
> ]
> WARNING: Logging before InitGoogleLogging() is written to STDERR
> I0813 19:55:17.161726 26100 process.cpp:1012] libprocess is initialized on 
> 172.17.2.10:60249 for 16 cpus
> I0813 19:55:17.161888 26100 logging.cpp:177] Logging to STDERR
> I0813 19:55:17.163625 26100 scheduler.cpp:157] Version: 0.24.0
> I0813 19:55:17.175302 26100 leveldb.cpp:176] Opened db in 3.167446ms
> I0813 19:55:17.176393 26100 leveldb.cpp:183] Compacted db in 1.047996ms
> I0813 19:55:17.176496 26100 leveldb.cpp:198] Created db iterator in 77155ns
> I0813 19:55:17.176518 26100 leveldb.cpp:204] Seeked to beginning of db in 
> 8429ns
> I0813 19:55:17.176527 26100 leveldb.cpp:273] Iterated through 0 keys in the 
> db in 4219ns
> I0813 19:55:17.176708 26100 replica.cpp:744] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I0813 19:55:17.178951 26136 recover.cpp:449] Starting replica recovery
> I0813 19:55:17.179934 26136 recover.cpp:475] Replica is in EMPTY status
> I0813 19:55:17.181970 26126 master.cpp:378] Master 
> 20150813-195517-167907756-60249-26100 (297daca2d01a) started on 
> 172.17.2.10:60249
> I0813 19:55:17.182317 26126 master.cpp:380] Flags at startup: 
> --acls="permissive: false
> register_frameworks {
>   principals {
> type: SOME
> values: "test-principal"
>   }
>   roles {
> type: SOME
> values: "*"
>   }
> }
> run_tasks {
>   principals {
> type: SOME
> values: "test-principal"
>   }
>   users {
> type: SOME
> values: "mesos"
>   }
> }
> " --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate="false" --authenticate_slaves="false" 
> --authenticators="crammd5" 
> --credentials="/tmp/ExamplesTest_EventCallFramework_k4vXkx/credentials" 
> --framework_sorter="drf" --help="false" --initialize_driver_logging="true" 
> --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" 
> --max_slave_ping_timeouts="5" --quiet="false" 
> --recovery_slave_removal_limit="100%" --registry="replicated_log" 
> --registry_fetch_timeout="1mins" --registry_store_timeout="5secs" 
> --registry_strict="false" --root_submissions="true" 
> --slave_ping_timeout="15secs" --slave_reregister_timeout="10mins" 
> --user_sorter="drf" --version="false" 
> --webui_dir="/mesos/mesos-0.24.0/src/webu

[jira] [Comment Edited] (MESOS-3273) EventCall Test Framework is flaky

2015-09-03 Thread Anand Mazumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14728577#comment-14728577
 ] 

Anand Mazumdar edited comment on MESOS-3273 at 9/3/15 6:59 AM:
---

I was able to reproduce this once on my machine. However, going by the logs, the 
issue is not the same as the one seen on the ASF build, so we still need to dig 
into what went wrong on the ASF test run.

In my test run, there is a race between the master successfully recovering its 
state from the registry and the scheduler sending a call. In this case, we just 
log the error and leave it up to the framework to retry the call. This happened 
only because I was running the test in a loop under a debugger, and there would 
have been state left in the master across test invocations, adding to the time 
spent in registry recovery. So I won't worry too much about this occurrence.

{code}
I0902 23:29:28.815498 113774592 leveldb.cpp:438] Reading position from leveldb 
took 32us
I0902 23:29:28.826355 136355840 registrar.cpp:344] Successfully fetched the 
registry (0B) in 16.811008ms
I0902 23:29:28.826472 136355840 registrar.cpp:443] Applied 1 operations in 
35us; attempting to update the 'registry'
I0902 23:29:28.826869 135819264 http.cpp:333] HTTP POST for 
/master/api/v1/scheduler from 192.168.29.132:56913
W0902 23:29:28.831881 135282688 scheduler.cpp:381] Received '503 Service 
Unavailable' () for SUBSCRIBE
{code}

[~ijimenez] [~vinodkone]


was (Author: anandmazumdar):
I was able to reproduce this once on my machine. However, going by the logs, the 
issue is not the same as the one seen on the ASF build, so we still need to dig 
into what went wrong on the ASF test run.

In my test run, there is a race between the master successfully recovering its 
state from the registry and the scheduler sending a call. In this case, we just 
log the error and leave it up to the framework to retry the call.

{code}
I0902 23:29:28.815498 113774592 leveldb.cpp:438] Reading position from leveldb 
took 32us
I0902 23:29:28.826355 136355840 registrar.cpp:344] Successfully fetched the 
registry (0B) in 16.811008ms
I0902 23:29:28.826472 136355840 registrar.cpp:443] Applied 1 operations in 
35us; attempting to update the 'registry'
I0902 23:29:28.826869 135819264 http.cpp:333] HTTP POST for 
/master/api/v1/scheduler from 192.168.29.132:56913
W0902 23:29:28.831881 135282688 scheduler.cpp:381] Received '503 Service 
Unavailable' () for SUBSCRIBE
{code}

[~ijimenez] [~vinodkone]

> EventCall Test Framework is flaky
> -
>
> Key: MESOS-3273
> URL: https://issues.apache.org/jira/browse/MESOS-3273
> Project: Mesos
>  Issue Type: Bug
>  Components: HTTP API
>Affects Versions: 0.24.0
> Environment: 
> https://builds.apache.org/job/Mesos/705/COMPILER=clang,CONFIGURATION=--verbose,OS=ubuntu:14.04,label_exp=docker%7C%7CHadoop/consoleFull
>Reporter: Vinod Kone
>  Labels: flaky-test, tech-debt, twitter
>
> Observed this on ASF CI. h/t [~haosd...@gmail.com]
> Looks like the HTTP scheduler never sent a SUBSCRIBE request to the master.
> {code}
> [ RUN  ] ExamplesTest.EventCallFramework
> Using temporary directory '/tmp/ExamplesTest_EventCallFramework_k4vXkx'
> I0813 19:55:15.643579 26085 exec.cpp:443] Ignoring exited event because the 
> driver is aborted!
> Shutting down
> Sending SIGTERM to process tree at pid 26061
> Killing the following process trees:
> [ 
> ]
> Shutting down
> Sending SIGTERM to process tree at pid 26062
> Shutting down
> Killing the following process trees:
> [ 
> ]
> Sending SIGTERM to process tree at pid 26063
> Killing the following process trees:
> [ 
> ]
> Shutting down
> Sending SIGTERM to process tree at pid 26098
> Killing the following process trees:
> [ 
> ]
> Shutting down
> Sending SIGTERM to process tree at pid 26099
> Killing the following process trees:
> [ 
> ]
> WARNING: Logging before InitGoogleLogging() is written to STDERR
> I0813 19:55:17.161726 26100 process.cpp:1012] libprocess is initialized on 
> 172.17.2.10:60249 for 16 cpus
> I0813 19:55:17.161888 26100 logging.cpp:177] Logging to STDERR
> I0813 19:55:17.163625 26100 scheduler.cpp:157] Version: 0.24.0
> I0813 19:55:17.175302 26100 leveldb.cpp:176] Opened db in 3.167446ms
> I0813 19:55:17.176393 26100 leveldb.cpp:183] Compacted db in 1.047996ms
> I0813 19:55:17.176496 26100 leveldb.cpp:198] Created db iterator in 77155ns
> I0813 19:55:17.176518 26100 leveldb.cpp:204] Seeked to beginning of db in 
> 8429ns
> I0813 19:55:17.176527 26100 leveldb.cpp:273] Iterated through 0 keys in the 
> db in 4219ns
> I0813 19:55:17.176708 26100 replica.cpp:744] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I0813 19:55:17.178951 26136 recover.cpp:449] Starting replica recovery
> I0813 19:55:17.179934 26136 recover.cpp:475] Replica is in EMPTY

[jira] [Commented] (MESOS-3410) MesosContainerizerLaunchTest.ROOT_ChangeRootfs is broken/flaky

2015-09-10 Thread Anand Mazumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739739#comment-14739739
 ] 

Anand Mazumdar commented on MESOS-3410:
---

"ldd" just lists the shared libraries that "/usr/bin/stat" depends on and where 
the dynamic linker resolves them. Does "/usr/lib64/libpcre.so.1" actually exist 
on your machine? I am sure it does, but just re-confirming the obvious.

> MesosContainerizerLaunchTest.ROOT_ChangeRootfs is broken/flaky
> --
>
> Key: MESOS-3410
> URL: https://issues.apache.org/jira/browse/MESOS-3410
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, test
>Reporter: Kapil Arya
>
> `sudo make check` failed with the following error on my OpenSUSE Tumbleweed 
> box (Linux 4.1, gcc 5.1):
> {code}
> [--] 1 test from MesosContainerizerLaunchTest
> [ RUN  ] MesosContainerizerLaunchTest.ROOT_ChangeRootfs
> Changing root to 
> /tmp/MesosContainerizerLaunchTest_ROOT_ChangeRootfs_4hGV7G/rootfs
> /usr/bin/stat: error while loading shared libraries: libpcre.so.1: cannot 
> open shared object file: No such file or directory
> ../../src/tests/containerizer/launch_tests.cpp:143: Failure
> Value of: *(int *) &(status))) & 0xff00) >> 8)
>   Actual: 127
> Expected: 0
> [  FAILED  ] MesosContainerizerLaunchTest.ROOT_ChangeRootfs (1171 ms)
> {code}
> Here is the output of ldd:
> {code}
> $> ldd /usr/bin/stat
> linux-vdso.so.1 (0x7fffd83fc000)
> libselinux.so.1 => /lib64/libselinux.so.1 (0x7f80748b3000)
> libc.so.6 => /lib64/libc.so.6 (0x7f807450e000)
> libpcre.so.1 => /usr/lib64/libpcre.so.1 (0x7f807429f000)
> libdl.so.2 => /lib64/libdl.so.2 (0x7f807409b000)
> /lib64/ld-linux-x86-64.so.2 (0x5602eb941000)
> libpthread.so.0 => /lib64/libpthread.so.0 (0x7f8073e7d000)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2907) Slave : Create Basic Functionality to handle /call endpoint

2015-09-15 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-2907:
--
Description: 
This is the first basic step in ensuring the basic /call functionality: 

- Set up the route on the slave for "api/v1/executor" endpoint.
- The endpoint should perform basic header/protobuf validation and return {501 
NotImplemented} for now.
- Introduce initial tests in executor_api_tests.cpp that just verify the status 
code.
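
A minimal sketch of what such a stub endpoint might do is shown below, purely for illustration. The {{Request}} struct and {{handleExecutorCall}} are hypothetical and not the libprocess HTTP types; the accepted content types and the 405/400/415 codes for failed validation are assumptions, and the non-empty body check is only a placeholder for real protobuf parsing.

```cpp
#include <map>
#include <string>

// Simplified stand-in for an HTTP request (not the libprocess type).
struct Request
{
  std::string method;
  std::map<std::string, std::string> headers;
  std::string body;
};

// Performs basic validation and returns the HTTP status code to send back.
// Valid requests get 501 because the endpoint does nothing yet.
inline int handleExecutorCall(const Request& request)
{
  if (request.method != "POST") {
    return 405; // Method Not Allowed.
  }

  auto contentType = request.headers.find("Content-Type");
  if (contentType == request.headers.end()) {
    return 400; // Bad Request: the header is required.
  }

  if (contentType->second != "application/json" &&
      contentType->second != "application/x-protobuf") {
    return 415; // Unsupported Media Type.
  }

  // Placeholder for deserializing the body into a Call protobuf.
  if (request.body.empty()) {
    return 400; // Bad Request.
  }

  return 501; // Not Implemented.
}
```

Initial tests in the spirit of the last bullet would then only need to assert on the returned status code for each kind of request.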


  was:
This is the first basic step in ensuring the basic /call functionality: 
processing a
POST /call
and returning:
202 if all goes well;
401 if not authorized; and
403 if the request is malformed.

Also , we might need to store some identifier which enables us to reject calls 
to /call if the client has not issued a SUBSCRIBE/RESUBSCRIBE Request.


> Slave : Create Basic Functionality to handle /call endpoint
> ---
>
> Key: MESOS-2907
> URL: https://issues.apache.org/jira/browse/MESOS-2907
> Project: Mesos
>  Issue Type: Task
>Reporter: Anand Mazumdar
>Assignee: Anand Mazumdar
>  Labels: HTTP, mesosphere
>
> This is the first basic step in ensuring the basic /call functionality: 
> - Set up the route on the slave for "api/v1/executor" endpoint.
> - The endpoint should perform basic header/protobuf validation and return 
> {501 NotImplemented} for now.
> - Introduce initial tests in executor_api_tests.cpp that just verify the 
> status code.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2907) Slave : Create Basic Functionality to handle /call endpoint

2015-09-15 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-2907:
--
Description: 
This is the first basic step in ensuring the basic /call functionality: 

- Set up the route on the slave for "api/v1/executor" endpoint.
- The endpoint should perform basic header/protobuf validation and return {{501 
NotImplemented}} for now.
- Introduce initial tests in executor_api_tests.cpp that just verify the status 
code.


  was:
This is the first basic step in ensuring the basic /call functionality: 

- Set up the route on the slave for "api/v1/executor" endpoint.
- The endpoint should perform basic header/protobuf validation and return {501 
NotImplemented} for now.
- Introduce initial tests in executor_api_tests.cpp that just verify the status 
code.



> Slave : Create Basic Functionality to handle /call endpoint
> ---
>
> Key: MESOS-2907
> URL: https://issues.apache.org/jira/browse/MESOS-2907
> Project: Mesos
>  Issue Type: Task
>Reporter: Anand Mazumdar
>Assignee: Anand Mazumdar
>  Labels: HTTP, mesosphere
>
> This is the first basic step in ensuring the basic /call functionality: 
> - Set up the route on the slave for "api/v1/executor" endpoint.
> - The endpoint should perform basic header/protobuf validation and return 
> {{501 NotImplemented}} for now.
> - Introduce initial tests in executor_api_tests.cpp that just verify the 
> status code.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2907) Agent : Create Basic Functionality to handle /call endpoint

2015-09-15 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-2907:
--
Summary: Agent : Create Basic Functionality to handle /call endpoint  (was: 
Slave : Create Basic Functionality to handle /call endpoint)

> Agent : Create Basic Functionality to handle /call endpoint
> ---
>
> Key: MESOS-2907
> URL: https://issues.apache.org/jira/browse/MESOS-2907
> Project: Mesos
>  Issue Type: Task
>Reporter: Anand Mazumdar
>Assignee: Anand Mazumdar
>  Labels: HTTP, mesosphere
>
> This is the first basic step in ensuring the basic /call functionality: 
> - Set up the route on the slave for "api/v1/executor" endpoint.
> - The endpoint should perform basic header/protobuf validation and return 
> {{501 NotImplemented}} for now.
> - Introduce initial tests in executor_api_tests.cpp that just verify the 
> status code.





[jira] [Updated] (MESOS-2907) Slave : Create Basic Functionality to handle /call endpoint

2015-09-15 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-2907:
--
Sprint: Mesosphere Sprint 19

> Slave : Create Basic Functionality to handle /call endpoint
> ---
>
> Key: MESOS-2907
> URL: https://issues.apache.org/jira/browse/MESOS-2907
> Project: Mesos
>  Issue Type: Task
>Reporter: Anand Mazumdar
>Assignee: Anand Mazumdar
>  Labels: HTTP, mesosphere
>
> This is the first basic step in ensuring the basic /call functionality: 
> - Set up the route on the slave for "api/v1/executor" endpoint.
> - The endpoint should perform basic header/protobuf validation and return 
> {{501 NotImplemented}} for now.
> - Introduce initial tests in executor_api_tests.cpp that just verify the 
> status code.





[jira] [Updated] (MESOS-2907) Agent : Create Basic Functionality to handle /call endpoint

2015-09-15 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-2907:
--
Description: 
This is the first basic step in ensuring the basic /call functionality: 

- Set up the route on the agent for "api/v1/executor" endpoint.
- The endpoint should perform basic header/protobuf validation and return {{501 
NotImplemented}} for now.
- Introduce initial tests in executor_api_tests.cpp that just verify the status 
code.


  was:
This is the first basic step in ensuring the basic /call functionality: 

- Set up the route on the slave for "api/v1/executor" endpoint.
- The endpoint should perform basic header/protobuf validation and return {{501 
NotImplemented}} for now.
- Introduce initial tests in executor_api_tests.cpp that just verify the status 
code.



> Agent : Create Basic Functionality to handle /call endpoint
> ---
>
> Key: MESOS-2907
> URL: https://issues.apache.org/jira/browse/MESOS-2907
> Project: Mesos
>  Issue Type: Task
>Reporter: Anand Mazumdar
>Assignee: Anand Mazumdar
>  Labels: HTTP, mesosphere
>
> This is the first basic step in ensuring the basic /call functionality: 
> - Set up the route on the agent for "api/v1/executor" endpoint.
> - The endpoint should perform basic header/protobuf validation and return 
> {{501 NotImplemented}} for now.
> - Introduce initial tests in executor_api_tests.cpp that just verify the 
> status code.





[jira] [Updated] (MESOS-2708) Design doc for the Executor HTTP API

2015-09-16 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-2708:
--
Target Version/s: 0.26.0  (was: 1.0.0)

> Design doc for the Executor HTTP API
> 
>
> Key: MESOS-2708
> URL: https://issues.apache.org/jira/browse/MESOS-2708
> Project: Mesos
>  Issue Type: Bug
>Reporter: Alexander Rojas
>Assignee: Isabel Jimenez
>  Labels: mesosphere
>
> This tracks the design of the Executor HTTP API.





[jira] [Updated] (MESOS-2295) Implement the Call endpoint on Slave

2015-09-16 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-2295:
--
Target Version/s: 0.26.0  (was: 1.0.0)

> Implement the Call endpoint on Slave
> 
>
> Key: MESOS-2295
> URL: https://issues.apache.org/jira/browse/MESOS-2295
> Project: Mesos
>  Issue Type: Task
>Reporter: Vinod Kone
>Assignee: Anand Mazumdar
>  Labels: mesosphere
>






[jira] [Updated] (MESOS-2906) Slave : Synchronous Validation for Calls

2015-09-16 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-2906:
--
Target Version/s: 0.26.0  (was: 1.0.0)

> Slave : Synchronous Validation for Calls
> 
>
> Key: MESOS-2906
> URL: https://issues.apache.org/jira/browse/MESOS-2906
> Project: Mesos
>  Issue Type: Task
>Reporter: Anand Mazumdar
>Assignee: Anand Mazumdar
>  Labels: HTTP, mesosphere
>
> The /call endpoint on the slave will return a 202 Accepted code, but it has 
> to do some basic validation first. If validation fails, it will return a 4xx 
> code.  
> - We need to create the required infrastructure to validate the request and 
> then process it.





[jira] [Updated] (MESOS-2907) Agent : Create Basic Functionality to handle /call endpoint

2015-09-16 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-2907:
--
Target Version/s: 0.26.0  (was: 1.0.0)

> Agent : Create Basic Functionality to handle /call endpoint
> ---
>
> Key: MESOS-2907
> URL: https://issues.apache.org/jira/browse/MESOS-2907
> Project: Mesos
>  Issue Type: Task
>Reporter: Anand Mazumdar
>Assignee: Anand Mazumdar
>  Labels: HTTP, mesosphere
>
> This is the first basic step in ensuring the basic /call functionality: 
> - Set up the route on the agent for "api/v1/executor" endpoint.
> - The endpoint should perform basic header/protobuf validation and return 
> {{501 NotImplemented}} for now.
> - Introduce initial tests in executor_api_tests.cpp that just verify the 
> status code.





[jira] [Updated] (MESOS-2906) Slave : Synchronous Validation for Calls

2015-09-17 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-2906:
--
Description: 
The /call endpoint on the slave will return a 202 Accepted code, but it has to 
do some basic validation first. If validation fails, it will return a 
{{BadRequest}} to the client.

- We need to create the required infrastructure to validate the request and 
then process it, similar to {{src/master/validation.cpp}} in the {{namespace 
scheduler}}, i.e. check that the protobuf is properly initialized, has the 
required attributes set for the call message, etc.

  was:
The /call endpoint on the slave will return a 202 Accepted code, but it has to 
do some basic validation first. If validation fails, it will return a 4xx code.  

- We need to create the required infrastructure to validate the request and 
then process it.


> Slave : Synchronous Validation for Calls
> 
>
> Key: MESOS-2906
> URL: https://issues.apache.org/jira/browse/MESOS-2906
> Project: Mesos
>  Issue Type: Task
>Reporter: Anand Mazumdar
>Assignee: Anand Mazumdar
>  Labels: HTTP, mesosphere
>
> The /call endpoint on the slave will return a 202 Accepted code, but it has 
> to do some basic validation first. If validation fails, it will return a 
> {{BadRequest}} to the client.
> - We need to create the required infrastructure to validate the request and 
> then process it, similar to {{src/master/validation.cpp}} in the {{namespace 
> scheduler}}, i.e. check that the protobuf is properly initialized, has the 
> required attributes set for the call message, etc.
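
The validate-then-accept flow described above could be sketched roughly as follows. {{Call}} here is a hypothetical, heavily simplified struct standing in for the protobuf message, and the specific required fields are assumptions chosen for illustration; only the 202 and {{BadRequest}} (400) outcomes come from the ticket.

```cpp
#include <string>

// Hypothetical, simplified Call; the real message is a protobuf.
struct Call {
  enum Type { UNKNOWN, SUBSCRIBE, UPDATE };
  Type type;
  bool hasExecutorId;
  bool hasFrameworkId;
  bool hasUpdate;  // Payload required when type == UPDATE.
};

// Returns an empty string when the call is valid, else an error message.
std::string validate(const Call& call) {
  if (call.type == Call::UNKNOWN) {
    return "Expecting 'type' to be present";
  }
  if (!call.hasExecutorId) {
    return "Expecting 'executor_id' to be present";
  }
  if (!call.hasFrameworkId) {
    return "Expecting 'framework_id' to be present";
  }
  if (call.type == Call::UPDATE && !call.hasUpdate) {
    return "Expecting 'update' to be present";
  }
  return "";
}

// Validate synchronously, then accept: 400 on failure, 202 otherwise.
int handleCall(const Call& call) {
  if (!validate(call).empty()) {
    return 400;  // BadRequest
  }
  // Actual processing would continue asynchronously from here.
  return 202;  // Accepted
}
```

Keeping {{validate}} separate from the handler mirrors the split the ticket points to in {{src/master/validation.cpp}}, so the checks can be unit tested without an HTTP request.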





[jira] [Assigned] (MESOS-2296) Implement the Events stream on slave for Call endpoint

2015-09-18 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar reassigned MESOS-2296:
-

Assignee: Anand Mazumdar  (was: Klaus Ma)

> Implement the Events stream on slave for Call endpoint
> --
>
> Key: MESOS-2296
> URL: https://issues.apache.org/jira/browse/MESOS-2296
> Project: Mesos
>  Issue Type: Task
>Reporter: Vinod Kone
>Assignee: Anand Mazumdar
>  Labels: mesosphere
>






[jira] [Commented] (MESOS-2708) Design doc for the Executor HTTP API

2015-09-18 Thread Anand Mazumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14876681#comment-14876681
 ] 

Anand Mazumdar commented on MESOS-2708:
---

[~vinodkone] I edited the document further based on your feedback. Modified the 
Agent Recovery section. Added a Backoff Strategies section as we had discussed. 
Also, modified the names of a few environment variables so that they can be 
re-used elsewhere in the future. Let me know if you have more comments.

> Design doc for the Executor HTTP API
> 
>
> Key: MESOS-2708
> URL: https://issues.apache.org/jira/browse/MESOS-2708
> Project: Mesos
>  Issue Type: Bug
>Reporter: Alexander Rojas
>Assignee: Isabel Jimenez
>  Labels: mesosphere
>
> This tracks the design of the Executor HTTP API.





[jira] [Created] (MESOS-3476) Refactor Status Update method on Slave to handle HTTP Executors

2015-09-20 Thread Anand Mazumdar (JIRA)
Anand Mazumdar created MESOS-3476:
-

 Summary: Refactor Status Update method on Slave to handle HTTP 
Executors
 Key: MESOS-3476
 URL: https://issues.apache.org/jira/browse/MESOS-3476
 Project: Mesos
  Issue Type: Bug
Reporter: Anand Mazumdar


Currently, status updates that the slave sends to itself (e.g. from 
{{runTask}} and {{killTask}}) and status updates from executors are handled by 
the {{Slave::statusUpdate}} method on Slave. The signature of the method is 
{{void Slave::statusUpdate(StatusUpdate update, const UPID& pid)}}. 

We need to add another overload that can also handle HTTP-based executors, 
and which the existing PID-based function can call into. The signature of the 
new function could be:

{{void Slave::statusUpdate(StatusUpdate update, Executor* executor)}}

The HTTP Executor would also call into this new function via 
{{src/slave/http.cpp}}
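
A minimal sketch of how the two overloads might fit together, assuming the PID-based function only resolves the sender and then delegates. {{UPID}}, {{StatusUpdate}}, {{Executor}} and {{Slave}} below are hypothetical stubs that share names with the real types but none of their behaviour; the lookup map and the {{unmatched}} list exist only to make the sketch observable.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-ins for the Mesos types named in the ticket.
struct UPID { std::string id; };
struct StatusUpdate { std::string taskId; std::string state; };
struct Executor { std::vector<StatusUpdate> received; };

class Slave {
public:
  // PID-based executors, keyed by PID string for this sketch.
  std::map<std::string, Executor*> executorsByPid;

  // Updates whose sender could not be matched to an executor.
  std::vector<StatusUpdate> unmatched;

  // Existing entry point: resolve the sender, then delegate.
  void statusUpdate(StatusUpdate update, const UPID& pid) {
    auto it = executorsByPid.find(pid.id);
    Executor* executor =
      (it != executorsByPid.end()) ? it->second : nullptr;
    statusUpdate(update, executor);
  }

  // Proposed overload: shared by the PID path and the HTTP path.
  void statusUpdate(StatusUpdate update, Executor* executor) {
    if (executor == nullptr) {
      unmatched.push_back(update);
      return;
    }
    executor->received.push_back(update);
  }
};
```

With this shape, the HTTP handler can call the {{Executor*}} overload directly, while driver-based executors keep using the PID overload unchanged.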





[jira] [Assigned] (MESOS-3476) Refactor Status Update method on Slave to handle HTTP Executors

2015-09-20 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar reassigned MESOS-3476:
-

Assignee: Anand Mazumdar

> Refactor Status Update method on Slave to handle HTTP Executors
> ---
>
> Key: MESOS-3476
> URL: https://issues.apache.org/jira/browse/MESOS-3476
> Project: Mesos
>  Issue Type: Bug
>Reporter: Anand Mazumdar
>Assignee: Anand Mazumdar
>  Labels: mesosphere
>
> Currently, status updates that the slave sends to itself (e.g. from 
> {{runTask}} and {{killTask}}) and status updates from executors are handled 
> by the {{Slave::statusUpdate}} method on Slave. The signature of the method is 
> {{void Slave::statusUpdate(StatusUpdate update, const UPID& pid)}}. 
> We need to add another overload that can also handle HTTP-based executors, 
> and which the existing PID-based function can call into. The signature of the 
> new function could be:
> {{void Slave::statusUpdate(StatusUpdate update, Executor* executor)}}
> The HTTP Executor would also call into this new function via 
> {{src/slave/http.cpp}}





[jira] [Updated] (MESOS-3476) Refactor Status Update method on Slave to handle HTTP based Executors

2015-09-20 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-3476:
--
Summary: Refactor Status Update method on Slave to handle HTTP based 
Executors  (was: Refactor Status Update method on Slave to handle HTTP 
Executors)

> Refactor Status Update method on Slave to handle HTTP based Executors
> -
>
> Key: MESOS-3476
> URL: https://issues.apache.org/jira/browse/MESOS-3476
> Project: Mesos
>  Issue Type: Bug
>Reporter: Anand Mazumdar
>Assignee: Anand Mazumdar
>  Labels: mesosphere
>
> Currently, status updates that the slave sends to itself (e.g. from 
> {{runTask}} and {{killTask}}) and status updates from executors are handled 
> by the {{Slave::statusUpdate}} method on Slave. The signature of the method is 
> {{void Slave::statusUpdate(StatusUpdate update, const UPID& pid)}}. 
> We need to add another overload that can also handle HTTP-based executors, 
> and which the existing PID-based function can call into. The signature of the 
> new function could be:
> {{void Slave::statusUpdate(StatusUpdate update, Executor* executor)}}
> The HTTP Executor would also call into this new function via 
> {{src/slave/http.cpp}}





[jira] [Created] (MESOS-3480) Refactor Executor struct in Slave to handle HTTP based executors

2015-09-21 Thread Anand Mazumdar (JIRA)
Anand Mazumdar created MESOS-3480:
-

 Summary: Refactor Executor struct in Slave to handle HTTP based 
executors
 Key: MESOS-3480
 URL: https://issues.apache.org/jira/browse/MESOS-3480
 Project: Mesos
  Issue Type: Task
Reporter: Anand Mazumdar
Assignee: Anand Mazumdar
 Fix For: 0.26.0


Currently, the {{struct Executor}} in slave only supports executors connected 
via message passing (driver). We should refactor it to add support for HTTP 
based Executors similar to what was done for the Scheduler API {{struct 
Framework}} in {{src/master/master.hpp}}
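
One possible shape for such a refactoring, sketched under the assumption that an executor holds either a PID or an HTTP connection and routes messages accordingly. All types below are hypothetical stubs; in particular {{HttpConnection}} and the {{posted}} list only record what would be sent, and {{std::unique_ptr}} stands in for whatever optional type the real code would use.

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-ins; the real types live in libprocess and Mesos.
struct UPID { std::string id; };

struct HttpConnection {
  std::vector<std::string> written;  // Events streamed so far.
  void send(const std::string& event) { written.push_back(event); }
};

struct Executor {
  // At most one of these is expected to be set once the executor connects.
  std::unique_ptr<UPID> pid;             // Driver-based executor.
  std::unique_ptr<HttpConnection> http;  // HTTP-based executor.

  // Messages sent to the PID, recorded here instead of being posted.
  std::vector<std::string> posted;

  // Route a message over whichever transport the executor uses.
  // Returns false if the executor is not connected.
  bool send(const std::string& message) {
    if (http) {
      http->send(message);
      return true;
    }
    if (pid) {
      posted.push_back(message);
      return true;
    }
    return false;
  }
};
```

Callers then use {{send}} without caring how the executor is connected, which keeps the transport decision in one place.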





[jira] [Updated] (MESOS-3476) Refactor Status Update method on Slave to handle HTTP based Executors

2015-09-21 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-3476:
--
Issue Type: Task  (was: Bug)

> Refactor Status Update method on Slave to handle HTTP based Executors
> -
>
> Key: MESOS-3476
> URL: https://issues.apache.org/jira/browse/MESOS-3476
> Project: Mesos
>  Issue Type: Task
>Reporter: Anand Mazumdar
>Assignee: Anand Mazumdar
>  Labels: mesosphere
>
> Currently, status updates that the slave sends to itself (e.g. from 
> {{runTask}} and {{killTask}}) and status updates from executors are handled 
> by the {{Slave::statusUpdate}} method on Slave. The signature of the method is 
> {{void Slave::statusUpdate(StatusUpdate update, const UPID& pid)}}. 
> We need to add another overload that can also handle HTTP-based executors, 
> and which the existing PID-based function can call into. The signature of the 
> new function could be:
> {{void Slave::statusUpdate(StatusUpdate update, Executor* executor)}}
> The HTTP Executor would also call into this new function via 
> {{src/slave/http.cpp}}





[jira] [Updated] (MESOS-3490) Mesos UI fails to represent JSON entities

2015-09-22 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-3490:
--
Description: 
The Mesos UI is broken, it seems to fail to represent JSON from /state.
This may have been introduced with https://reviews.apache.org/r/38028 

  was:
The Mesos UI is broken, it seems to fail to represent JSON from /state.
This may got introduced with https://reviews.apache.org/r/38028/.


> Mesos UI fails to represent JSON entities
> -
>
> Key: MESOS-3490
> URL: https://issues.apache.org/jira/browse/MESOS-3490
> Project: Mesos
>  Issue Type: Bug
>  Components: stout
>Reporter: Isabel Jimenez
>Assignee: Joseph Wu
>  Labels: mesosphere
>
> The Mesos UI is broken, it seems to fail to represent JSON from /state.
> This may have been introduced with https://reviews.apache.org/r/38028 





[jira] [Updated] (MESOS-3490) Mesos UI fails to represent JSON entities

2015-09-22 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-3490:
--
Sprint: Mesosphere Sprint 19

> Mesos UI fails to represent JSON entities
> -
>
> Key: MESOS-3490
> URL: https://issues.apache.org/jira/browse/MESOS-3490
> Project: Mesos
>  Issue Type: Bug
>  Components: stout
>Reporter: Isabel Jimenez
>Assignee: Joseph Wu
>  Labels: mesosphere
>
> The Mesos UI is broken, it seems to fail to represent JSON from /state.
> This may have been introduced with https://reviews.apache.org/r/38028 





[jira] [Updated] (MESOS-3480) Refactor Executor struct in Slave to handle HTTP based executors

2015-09-22 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-3480:
--
Fix Version/s: (was: 0.26.0)

> Refactor Executor struct in Slave to handle HTTP based executors
> 
>
> Key: MESOS-3480
> URL: https://issues.apache.org/jira/browse/MESOS-3480
> Project: Mesos
>  Issue Type: Task
>Reporter: Anand Mazumdar
>Assignee: Anand Mazumdar
>  Labels: mesosphere
>
> Currently, the {{struct Executor}} in slave only supports executors connected 
> via message passing (driver). We should refactor it to add support for HTTP 
> based Executors similar to what was done for the Scheduler API {{struct 
> Framework}} in {{src/master/master.hpp}}





[jira] [Commented] (MESOS-3051) performance issues with port ranges comparison

2015-09-23 Thread Anand Mazumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14904928#comment-14904928
 ] 

Anand Mazumdar commented on MESOS-3051:
---

[~js84], can we file a follow-up JIRA based on the comments by [~jieyu] to 
consider using {{IntervalSet}} in the near future? Ignore my comment if one 
already exists.

> performance issues with port ranges comparison
> --
>
> Key: MESOS-3051
> URL: https://issues.apache.org/jira/browse/MESOS-3051
> Project: Mesos
>  Issue Type: Bug
>  Components: allocation
>Affects Versions: 0.22.1
>Reporter: James Peach
>Assignee: Joerg Schad
>  Labels: mesosphere
> Fix For: 0.25.0
>
>
> Testing in an environment with lots of frameworks (>200), where the 
> frameworks permanently decline resources they don't need. The allocator ends 
> up spending a lot of time figuring out whether offers are refused (the code 
> path through {{HierarchicalAllocatorProcess::isFiltered()}}).
> In profiling a synthetic benchmark, it turns out that comparing port ranges 
> is very expensive, involving many temporary allocations. 61% of 
> Resources::contains() run time is in operator -= (Resource). 35% of 
> Resources::contains() run time is in Resources::_contains().
> The heaviest call chain through {{Resources::_contains}} is:
> {code}
> Running Time      Self (ms)  Symbol Name
> 7237.0ms  35.5%     4.0      mesos::Resources::_contains(mesos::Resource const&) const
> 7200.0ms  35.3%     1.0      mesos::contains(mesos::Resource const&, mesos::Resource const&)
> 7133.0ms  35.0%   121.0      mesos::operator<=(mesos::Value_Ranges const&, mesos::Value_Ranges const&)
> 6319.0ms  31.0%     7.0      mesos::coalesce(mesos::Value_Ranges*, mesos::Value_Ranges const&)
> 6240.0ms  30.6%   161.0      mesos::coalesce(mesos::Value_Ranges*, mesos::Value_Range const&)
> 1867.0ms   9.1%    25.0      mesos::Value_Ranges::add_range()
> 1694.0ms   8.3%     4.0      mesos::Value_Ranges::~Value_Ranges()
> 1495.0ms   7.3%    16.0      mesos::Value_Ranges::operator=(mesos::Value_Ranges const&)
>  445.0ms   2.1%    94.0      mesos::Value_Range::MergeFrom(mesos::Value_Range const&)
>  154.0ms   0.7%    24.0      mesos::Value_Ranges::range(int) const
>  103.0ms   0.5%    24.0      mesos::Value_Ranges::range_size() const
>   95.0ms   0.4%     2.0      mesos::Value_Range::Value_Range(mesos::Value_Range const&)
>   59.0ms   0.2%     4.0      mesos::Value_Ranges::Value_Ranges()
>   50.0ms   0.2%    50.0      mesos::Value_Range::begin() const
>   28.0ms   0.1%    28.0      mesos::Value_Range::end() const
>   26.0ms   0.1%     0.0      mesos::Value_Range::~Value_Range()
> {code}
> mesos::coalesce(Value_Ranges) gets done a lot and ends up being really 
> expensive. The heaviest parts of the inverted call chain are:
> {code}
> Running Time      Self (ms)  Symbol Name
> 3209.0ms  15.7%  3209.0      mesos::Value_Range::~Value_Range()
> 3209.0ms  15.7%     0.0      google::protobuf::internal::GenericTypeHandler::Delete(mesos::Value_Range*)
> 3209.0ms  15.7%     0.0      void google::protobuf::internal::RepeatedPtrFieldBase::Destroy::TypeHandler>()
> 3209.0ms  15.7%     0.0      google::protobuf::RepeatedPtrField::~RepeatedPtrField()
> 3209.0ms  15.7%     0.0      google::protobuf::RepeatedPtrField::~RepeatedPtrField()
> 3209.0ms  15.7%     0.0      mesos::Value_Ranges::~Value_Ranges()
> 3209.0ms  15.7%     0.0      mesos::Value_Ranges::~Value_Ranges()
> 2441.0ms  11.9%     0.0      mesos::coalesce(mesos::Value_Ranges*, mesos::Value_Range const&)
>  452.0ms   2.2%     0.0      mesos::remove(mesos::Value_Ranges*, mesos::Value_Range const&)
>  169.0ms   0.8%     0.0      mesos::operator<=(mesos::Value_Ranges const&, mesos::Value_Ranges const&)
>   82.0ms   0.4%     0.0      mesos::operator-=(mesos::Value_Ranges&, mesos::Value_Ranges const&)
>   65.0ms   0.3%     0.0      mesos::Value_Ranges::~Value_Ranges()
> 2541.0ms  12.4%  2541.0      google::protobuf::internal::GenericTypeHandler::New()
> 2541.0ms  12.4%     0.0      google::protobuf::RepeatedPtrField::TypeHandler::Type* google::protobuf::internal::RepeatedPtrFieldBase::Add::TypeHandler>()
> 2305.0ms  11.3%     0.0      google::protobuf::RepeatedPtrField::Add()
> 2305.0ms   1

[jira] [Created] (MESOS-3515) Support Subscribe Call for HTTP based Executors

2015-09-24 Thread Anand Mazumdar (JIRA)
Anand Mazumdar created MESOS-3515:
-

 Summary: Support Subscribe Call for HTTP based Executors
 Key: MESOS-3515
 URL: https://issues.apache.org/jira/browse/MESOS-3515
 Project: Mesos
  Issue Type: Task
Reporter: Anand Mazumdar
Assignee: Anand Mazumdar


We need to add a {{subscribe(...)}} method in {{src/slave/slave.cpp}} to 
introduce the ability for HTTP based executors to subscribe and then receive 
events on the persistent HTTP connection. Most of the functionality needed 
would be similar to {{Master::subscribe}} in {{src/master/master.cpp}}.





[jira] [Updated] (MESOS-3520) Add an abstraction to manage the life cycle of file descriptors

2015-09-25 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-3520:
--
Summary: Add an abstraction to manage the life cycle of file descriptors  
(was: Added an abstraction to manage the life cycle of file descriptors)

> Add an abstraction to manage the life cycle of file descriptors
> ---
>
> Key: MESOS-3520
> URL: https://issues.apache.org/jira/browse/MESOS-3520
> Project: Mesos
>  Issue Type: Bug
>Reporter: Chi Zhang
>






[jira] [Updated] (MESOS-3515) Support Subscribe Call for HTTP based Executors

2015-09-27 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-3515:
--
  Sprint: Mesosphere Sprint 19
Story Points: 3

> Support Subscribe Call for HTTP based Executors
> ---
>
> Key: MESOS-3515
> URL: https://issues.apache.org/jira/browse/MESOS-3515
> Project: Mesos
>  Issue Type: Task
>Reporter: Anand Mazumdar
>Assignee: Anand Mazumdar
>  Labels: mesosphere
>
> We need to add a {{subscribe(...)}} method in {{src/slave/slave.cpp}} to 
> introduce the ability for HTTP based executors to subscribe and then receive 
> events on the persistent HTTP connection. Most of the functionality needed 
> would be similar to {{Master::subscribe}} in {{src/master/master.cpp}}.





[jira] [Created] (MESOS-3550) Create a Executor Library based on the new Executor HTTP API

2015-09-29 Thread Anand Mazumdar (JIRA)
Anand Mazumdar created MESOS-3550:
-

 Summary: Create a Executor Library based on the new Executor HTTP 
API
 Key: MESOS-3550
 URL: https://issues.apache.org/jira/browse/MESOS-3550
 Project: Mesos
  Issue Type: Task
Reporter: Anand Mazumdar


Similar to the Scheduler Library {{src/scheduler/scheduler.cpp}}, we would 
need an Executor Library that speaks the new Executor HTTP API. 





[jira] [Updated] (MESOS-3515) Support Subscribe Call for HTTP based Executors

2015-09-30 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-3515:
--
Shepherd: Vinod Kone

> Support Subscribe Call for HTTP based Executors
> ---
>
> Key: MESOS-3515
> URL: https://issues.apache.org/jira/browse/MESOS-3515
> Project: Mesos
>  Issue Type: Task
>Reporter: Anand Mazumdar
>Assignee: Anand Mazumdar
>  Labels: mesosphere
>
> We need to add a {{subscribe(...)}} method in {{src/slave/slave.cpp}} to 
> introduce the ability for HTTP based executors to subscribe and then receive 
> events on the persistent HTTP connection. Most of the functionality needed 
> would be similar to {{Master::subscribe}} in {{src/master/master.cpp}}.





[jira] [Created] (MESOS-3558) Make the CommandExecutor use the Executor Library speaking HTTP

2015-09-30 Thread Anand Mazumdar (JIRA)
Anand Mazumdar created MESOS-3558:
-

 Summary: Make the CommandExecutor use the Executor Library 
speaking HTTP
 Key: MESOS-3558
 URL: https://issues.apache.org/jira/browse/MESOS-3558
 Project: Mesos
  Issue Type: Task
Reporter: Anand Mazumdar


Instead of using the {{MesosExecutorDriver}}, we should make the 
{{CommandExecutor}} in {{src/launcher/executor.cpp}} use the new Executor HTTP 
Library that we create in {{MESOS-3550}}. 

This would act as a good validation of the {{HTTP API}} implementation.





[jira] [Created] (MESOS-3559) Make the Command Scheduler use the HTTP Scheduler Library

2015-09-30 Thread Anand Mazumdar (JIRA)
Anand Mazumdar created MESOS-3559:
-

 Summary: Make the Command Scheduler use the HTTP Scheduler Library
 Key: MESOS-3559
 URL: https://issues.apache.org/jira/browse/MESOS-3559
 Project: Mesos
  Issue Type: Task
Reporter: Anand Mazumdar


We should make the Command Scheduler in {{src/cli/execute.cpp}} use the 
Scheduler Library {{src/scheduler/scheduler.cpp}} instead of the Scheduler 
Driver.





[jira] [Commented] (MESOS-3562) Anomalous bytes in stream from HTTPI Api

2015-09-30 Thread Anand Mazumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14939194#comment-14939194
 ] 

Anand Mazumdar commented on MESOS-3562:
---

[~BenWhitehead] There seems to be some confusion here. Comments inline.

> This isn't really standard chunks though, there are chunks within chunks and 
> the configuration of the client would have to know that.
Can you elaborate a bit more on what you mean by chunks within chunks here? 
We strictly adhere to the standard chunked encoding format defined in RFC 2616. 
The only difference here is that the {{data}} in the chunks is itself encoded 
in {{RecordIO}} format.
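
For illustration, here is a minimal sketch of the de-chunking step that an HTTP client library would normally perform. {{dechunk}} is a hypothetical helper, not part of Mesos or libprocess; it assumes the complete chunked body is already in memory and ignores chunk extensions and trailers. Read this way, the {{6c}} in the reported stream is the hexadecimal chunk size 108: the 4 bytes of {{104\n}} plus the 104 bytes of the JSON record.

```cpp
#include <cstddef>
#include <string>

// Decodes an HTTP/1.1 chunked body (RFC 2616, section 3.6.1) into `out`.
// Chunk extensions and trailers are ignored. Returns false on malformed
// or incomplete input.
bool dechunk(const std::string& body, std::string* out) {
  std::size_t pos = 0;
  while (true) {
    std::size_t eol = body.find("\r\n", pos);
    if (eol == std::string::npos) {
      return false;
    }

    // The chunk size is hexadecimal; stop at ';' to skip any extension.
    std::size_t size = 0;
    std::size_t digits = 0;
    for (std::size_t i = pos; i < eol && body[i] != ';'; ++i, ++digits) {
      char c = body[i];
      int value;
      if (c >= '0' && c <= '9') {
        value = c - '0';
      } else if (c >= 'a' && c <= 'f') {
        value = c - 'a' + 10;
      } else if (c >= 'A' && c <= 'F') {
        value = c - 'A' + 10;
      } else {
        return false;
      }
      size = size * 16 + static_cast<std::size_t>(value);
    }
    if (digits == 0) {
      return false;
    }

    pos = eol + 2;
    if (size == 0) {
      return true;  // Last chunk; anything after it is ignored.
    }

    // Each chunk is `size` bytes of data followed by CRLF.
    if (body.size() < pos + size + 2) {
      return false;
    }
    out->append(body, pos, size);
    pos += size;
    if (body.compare(pos, 2, "\r\n") != 0) {
      return false;
    }
    pos += 2;
  }
}
```

Note that chunk boundaries need not line up with record boundaries, so the de-chunked bytes should be treated as one continuous stream before any {{RecordIO}} decoding.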

> What is the motivation behind using recordio format ?
We wanted a way to delimit two events for JSON/Protobuf responses and RecordIO 
format allowed us to do that. We could have done away with RecordIO for JSON 
by just delimiting on {{\n}}, but that would have made its behavior 
inconsistent with Protobuf responses.

>  If standard encoding were used then every HTTP client would already have the 
> necessary understanding to know how to deal with the chunks.
We use standard chunked encoding as defined in the RFC. What do you mean here?

> Where is the specification for what recordio format is? I have not been able 
> to find anything online.
We should add more information on this to our docs. Until we do that, 
here is a brief description of what the format looks like:
{code}
5\n
hello
6\n
world!
{code}

Ideally, whatever client you are using should do the de-chunking for you. You 
should get this back from the client i.e. just the {{RecordIO}} encoded data.
{code}
104\n{"subscribed":{"framework_id":{"value":"20150930-103028-16777343-5050-11742-0028"}},"type":"SUBSCRIBED"}
{code}

cc'ing [~bmahler] in case I missed anything.

> Anomalous bytes in stream from HTTPI Api
> 
>
> Key: MESOS-3562
> URL: https://issues.apache.org/jira/browse/MESOS-3562
> Project: Mesos
>  Issue Type: Bug
>  Components: HTTP API
>Affects Versions: 0.24.0
> Environment: Linux 3.16.7-24-desktop #1 SMP PREEMPT Mon Aug 3 
> 14:37:06 UTC 2015 (ec183cc) x86_64 x86_64 GNU/Linux
> Mesos 0.24.0
> gcc (SUSE Linux) 4.8.3 20140627 [gcc-4_8-branch revision 212064]
>Reporter: Ben Whitehead
>Priority: Blocker
>  Labels: http, wireprotocol
> Attachments: app.log, tcpdump.log
>
>
> When connecting to the new HTTP API and attempting to {{SUBSCRIBE}}, there are 
> some anomalous bytes in the chunked stream that appear to be causing problems 
> when I attempt to integrate.
> Attached are two log files. app.log represents my application trying to 
> connect to Mesos using RxNetty. Netty has been configured to log all data it 
> sends/receives over the wire; this can be seen in the byte blocks in the log. 
> The client is constructing a protobuf in Java for the subscribe call:
> {code:java}
> final Call subscribeCall = Call.newBuilder()
>     .setType(Call.Type.SUBSCRIBE)
>     .setSubscribe(
>         Call.Subscribe.newBuilder()
>             .setFrameworkInfo(
>                 Protos.FrameworkInfo.newBuilder()
>                     .setUser("bill")
>                     .setName("testing_this_shit_out")
>                     .build()
>             )
>     )
>     .build();
> {code}
>  
> The client sends the protobuf to Mesos with the following request headers:
> {code}
> POST /api/v1/scheduler HTTP/1.1
> Content-Type: application/x-protobuf
> Accept: application/json
> Content-Length: 35
> Host: localhost:5050
> User-Agent: RxNetty Client
> {code}
> The body is then serialized via protobuf and sent.
> The response from the mesos master has the following headers:
> {code}
> HTTP/1.1 200 OK
> Transfer-Encoding: chunked
> Date: Wed, 30 Sep 2015 21:07:16 GMT
> Content-Type: application/json
> {code}
> followed by 
> {code}
> \r\n\r\n6c\r\n104\n{"subscribed":{"framework_id":{"value":"20150930-103028-16777343-5050-11742-0028"}},"type":"SUBSCRIBED"}
> {code}
> The {{\r\n\r\n}} is expected for standard HTTP bodies; however, {{6c\r\n}} 
> doesn't appear to be attached to anything. {{104}} is the correct length of 
> the Subscribe event's JSON.
> What is this extra number and why is it there?
> This is not the first time confusion has come up related to the wire format 
> for the event stream from the new HTTP API; see 
> [this|http://mail-archives.apache.org/mod_mbox/mesos-user/201508.mbox/%3c94d2c9e8-2fe8-4c11-b0d3-859dac654...@me.com%3E]
>  message from the mailing list.
> In the [Design 
> Doc|https://docs.google.com/document/d/1pnIY_HckimKNvpqhKRhbc9eSItWNFT-priXh_urR-T0/edit#]
>  there is a statement that said 
> {quote}
> All subsequent events that are relevant to

[jira] [Created] (MESOS-3566) Add a section to the Scheduler HTTP API docs around RecordIO specification

2015-09-30 Thread Anand Mazumdar (JIRA)
Anand Mazumdar created MESOS-3566:
-

 Summary: Add a section to the Scheduler HTTP API docs around 
RecordIO specification
 Key: MESOS-3566
 URL: https://issues.apache.org/jira/browse/MESOS-3566
 Project: Mesos
  Issue Type: Improvement
Reporter: Anand Mazumdar


Since the {{RecordIO}} format is not widely used, searching for it online 
does not offer much help. 
- It would be good to add a small section on its specification to the docs 
for framework developers. 
- Bonus points if we can include a simple code snippet in C++/Java for reading a 
{{RecordIO}} response to help developers.
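
A minimal sketch of the kind of snippet requested above, assuming the framing seen in the Scheduler HTTP API event stream: each record is preceded by its length in bytes, written as ASCII decimal digits and terminated by a single newline. The class name `RecordIoReader` and its methods are hypothetical and not part of Mesos. The reader expects the response body after the HTTP client has already decoded any chunked transfer encoding.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: decodes "<length>\n<record bytes>" frames from a stream.
final class RecordIoReader {
    private final InputStream in;

    RecordIoReader(InputStream in) {
        this.in = in;
    }

    // Returns the next record, or null if the stream ends cleanly between records.
    byte[] next() throws IOException {
        long length = 0;
        int digits = 0;
        int b;
        // Read the decimal length prefix up to the terminating newline.
        while ((b = in.read()) != '\n') {
            if (b == -1) {
                if (digits == 0) {
                    return null; // clean end of stream
                }
                throw new EOFException("stream ended inside a length prefix");
            }
            if (b < '0' || b > '9') {
                throw new IOException("unexpected byte in length prefix: " + b);
            }
            length = length * 10 + (b - '0');
            digits++;
            if (length > Integer.MAX_VALUE) {
                throw new IOException("record too large");
            }
        }
        if (digits == 0) {
            throw new IOException("empty length prefix");
        }
        // Read exactly 'length' bytes; a single read() may return fewer.
        byte[] record = new byte[(int) length];
        int offset = 0;
        while (offset < record.length) {
            int n = in.read(record, offset, record.length - offset);
            if (n == -1) {
                throw new EOFException("stream ended inside a record");
            }
            offset += n;
        }
        return record;
    }

    // Convenience for tests: decode every record in a buffer as UTF-8 text.
    static List<String> readAll(byte[] body) throws IOException {
        RecordIoReader reader = new RecordIoReader(new ByteArrayInputStream(body));
        List<String> records = new ArrayList<>();
        byte[] record;
        while ((record = reader.next()) != null) {
            records.add(new String(record, StandardCharsets.UTF_8));
        }
        return records;
    }
}
```

With a long-lived event stream, a caller would wrap the response body's `InputStream` and call `next()` in a loop, handing each record to a JSON or protobuf parser depending on the negotiated content type.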



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-3570) Make Scheduler Library use HTTP Pipelining Abstraction in Libprocess

2015-10-01 Thread Anand Mazumdar (JIRA)
Anand Mazumdar created MESOS-3570:
-

 Summary: Make Scheduler Library use HTTP Pipelining Abstraction in 
Libprocess
 Key: MESOS-3570
 URL: https://issues.apache.org/jira/browse/MESOS-3570
 Project: Mesos
  Issue Type: Bug
Reporter: Anand Mazumdar


Currently, the scheduler library sends calls in order by chaining them, 
sending each call only after it has received a response for the earlier one. This 
was done because there was no HTTP Pipelining abstraction in Libprocess 
{{process::post}}.

However, once {{MESOS-3332}} is resolved, we should be able to use the new 
abstraction.
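
The current behavior described above, sending each call only after the previous one has been answered, can be sketched roughly as follows. This is an illustration in Java using `CompletableFuture`, not the libprocess implementation; `SerializedSender` and its `transport` parameter are hypothetical names.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

// Hypothetical illustration: each call is handed to the transport only
// after the response to the previous call has arrived.
final class SerializedSender<C, R> {
    private final Function<C, CompletableFuture<R>> transport;

    // Completes when the most recently queued call has been answered.
    private CompletableFuture<?> tail = CompletableFuture.completedFuture(null);

    SerializedSender(Function<C, CompletableFuture<R>> transport) {
        this.transport = transport;
    }

    synchronized CompletableFuture<R> send(C call) {
        // Wait for the previous call to finish, whether it succeeded or failed.
        CompletableFuture<Void> previous = tail.handle((ignored, error) -> (Void) null);
        // Only then pass this call to the transport.
        CompletableFuture<R> response = previous.thenCompose(ignored -> transport.apply(call));
        tail = response;
        return response;
    }
}
```

A pipelined connection would instead write each request as soon as it is queued and match responses to requests by arrival order, which removes one round trip of latency per queued call.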





[jira] [Updated] (MESOS-3575) V1 API java protos are not generated

2015-10-02 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-3575:
--
Summary: V1 API java protos are not generated  (was: HTTP V1 API java 
protos are not generated)

> V1 API java protos are not generated
> 
>
> Key: MESOS-3575
> URL: https://issues.apache.org/jira/browse/MESOS-3575
> Project: Mesos
>  Issue Type: Bug
>  Components: HTTP API
>Reporter: Joris Van Remoortere
>Priority: Blocker
>
> The java protos for the HTTP C1 api should be generated according to the 
> Makefile; however, they do not show up in the generated build directory.





[jira] [Updated] (MESOS-3575) V1 API java protos are not generated

2015-10-02 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-3575:
--
Description: The java protos for the V1 api should be generated according 
to the Makefile; however, they do not show up in the generated build directory. 
 (was: The java protos for the HTTP C1 api should be generated according to the 
Makefile; however, they do not show up in the generated build directory.)

> V1 API java protos are not generated
> 
>
> Key: MESOS-3575
> URL: https://issues.apache.org/jira/browse/MESOS-3575
> Project: Mesos
>  Issue Type: Bug
>  Components: HTTP API
>Reporter: Joris Van Remoortere
>Priority: Blocker
>
> The java protos for the V1 api should be generated according to the Makefile; 
> however, they do not show up in the generated build directory.





[jira] [Assigned] (MESOS-3575) V1 API java protos are not generated

2015-10-02 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar reassigned MESOS-3575:
-

Assignee: Anand Mazumdar

> V1 API java protos are not generated
> 
>
> Key: MESOS-3575
> URL: https://issues.apache.org/jira/browse/MESOS-3575
> Project: Mesos
>  Issue Type: Bug
>  Components: HTTP API
>Reporter: Joris Van Remoortere
>Assignee: Anand Mazumdar
>Priority: Blocker
>
> The java protos for the V1 api should be generated according to the Makefile; 
> however, they do not show up in the generated build directory.





[jira] [Commented] (MESOS-3575) V1 API java protos are not generated

2015-10-02 Thread Anand Mazumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14941689#comment-14941689
 ] 

Anand Mazumdar commented on MESOS-3575:
---

The scope of this JIRA should be just to ensure that they get generated in the 
build folder. Hence, it should not be a blocker for 0.25.0 since we don't ship 
those artifacts. MESOS-3524 talks about how we would like to distribute the 
protobufs eventually via Maven/other channels.

[~jvanremoortere] What do you think?

> V1 API java protos are not generated
> 
>
> Key: MESOS-3575
> URL: https://issues.apache.org/jira/browse/MESOS-3575
> Project: Mesos
>  Issue Type: Bug
>  Components: HTTP API
>Reporter: Joris Van Remoortere
>Assignee: Anand Mazumdar
>Priority: Blocker
>
> The java protos for the V1 api should be generated according to the Makefile; 
> however, they do not show up in the generated build directory.





[jira] [Commented] (MESOS-3575) V1 API java/python protos are not generated

2015-10-02 Thread Anand Mazumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14941895#comment-14941895
 ] 

Anand Mazumdar commented on MESOS-3575:
---

I submitted a patch, https://reviews.apache.org/r/38967, that stuffs the V1 
Protobufs into the existing Mesos JAR for the time being, so that anyone can 
start playing with the new API.

For python, the change would be much more involved. Our python infrastructure 
in the Makefile needs some serious rework; it's a mess, to say the least. I 
will file another JIRA for the issues around it that I noticed while trying to 
make this change.

> V1 API java/python protos are not generated
> ---
>
> Key: MESOS-3575
> URL: https://issues.apache.org/jira/browse/MESOS-3575
> Project: Mesos
>  Issue Type: Bug
>  Components: HTTP API
>Reporter: Joris Van Remoortere
>Assignee: Anand Mazumdar
>Priority: Blocker
>
> The java/python protos for the V1 api should be generated according to the 
> Makefile; however, they do not show up in the generated build directory.





[jira] [Created] (MESOS-3577) OversubscriptionTest.UpdateAllocatorOnSchedulerFailover is flaky

2015-10-03 Thread Anand Mazumdar (JIRA)
Anand Mazumdar created MESOS-3577:
-

 Summary: OversubscriptionTest.UpdateAllocatorOnSchedulerFailover 
is flaky
 Key: MESOS-3577
 URL: https://issues.apache.org/jira/browse/MESOS-3577
 Project: Mesos
  Issue Type: Bug
Reporter: Anand Mazumdar


Showed up on ASF CI:

https://builds.apache.org/job/Mesos/890/COMPILER=gcc,CONFIGURATION=--verbose%20--enable-libevent%20--enable-ssl,OS=centos:7,label_exp=docker%7C%7CHadoop/consoleFull

{code}
[ RUN  ] OversubscriptionTest.UpdateAllocatorOnSchedulerFailover
Using temporary directory 
'/tmp/OversubscriptionTest_UpdateAllocatorOnSchedulerFailover_y5LK6v'
I1003 20:29:03.367100 31549 leveldb.cpp:176] Opened db in 2.322276ms
I1003 20:29:03.368028 31549 leveldb.cpp:183] Compacted db in 888247ns
I1003 20:29:03.368093 31549 leveldb.cpp:198] Created db iterator in 22626ns
I1003 20:29:03.368108 31549 leveldb.cpp:204] Seeked to beginning of db in 1842ns
I1003 20:29:03.368115 31549 leveldb.cpp:273] Iterated through 0 keys in the db 
in 395ns
I1003 20:29:03.368165 31549 replica.cpp:744] Replica recovered with log 
positions 0 -> 0 with 1 holes and 0 unlearned
I1003 20:29:03.368722 31575 recover.cpp:449] Starting replica recovery
I1003 20:29:03.369118 31575 recover.cpp:475] Replica is in EMPTY status
I1003 20:29:03.370707 31572 replica.cpp:641] Replica in EMPTY status received a 
broadcasted recover request
I1003 20:29:03.371100 31572 master.cpp:376] Master 
d4ff5e08-2202-4f3b-8fb2-5515adf9a97e (9efc27440ed0) started on 172.17.5.73:38504
I1003 20:29:03.371124 31572 master.cpp:378] Flags at startup: --acls="" 
--allocation_interval="1secs" --allocator="HierarchicalDRF" 
--authenticate="true" --authenticate_slaves="true" --authenticators="crammd5" 
--authorizers="local" 
--credentials="/tmp/OversubscriptionTest_UpdateAllocatorOnSchedulerFailover_y5LK6v/credentials"
 --framework_sorter="drf" --help="false" --hostname_lookup="true" 
--initialize_driver_logging="true" --log_auto_initialize="true" 
--logbufsecs="0" --logging_level="INFO" --max_slave_ping_timeouts="5" 
--quiet="false" --recovery_slave_removal_limit="100%" 
--registry="replicated_log" --registry_fetch_timeout="1mins" 
--registry_store_timeout="25secs" --registry_strict="true" 
--root_submissions="true" --slave_ping_timeout="15secs" 
--slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" 
--webui_dir="/mesos/mesos-0.26.0/_inst/share/mesos/webui" 
--work_dir="/tmp/OversubscriptionTest_UpdateAllocatorOnSchedulerFailover_y5LK6v/master"
 --zk_session_timeout="10secs"
I1003 20:29:03.371477 31572 master.cpp:423] Master only allowing authenticated 
frameworks to register
I1003 20:29:03.371496 31572 master.cpp:428] Master only allowing authenticated 
slaves to register
I1003 20:29:03.371510 31572 credentials.hpp:37] Loading credentials for 
authentication from 
'/tmp/OversubscriptionTest_UpdateAllocatorOnSchedulerFailover_y5LK6v/credentials'
I1003 20:29:03.371841 31572 master.cpp:467] Using default 'crammd5' 
authenticator
I1003 20:29:03.371989 31572 master.cpp:504] Authorization enabled
I1003 20:29:03.372009 31580 recover.cpp:195] Received a recover response from a 
replica in EMPTY status
I1003 20:29:03.372231 31568 hierarchical.hpp:468] Initialized hierarchical 
allocator process
I1003 20:29:03.372349 31579 whitelist_watcher.cpp:79] No whitelist given
I1003 20:29:03.373409 31572 recover.cpp:566] Updating replica status to STARTING
I1003 20:29:03.373558 31576 master.cpp:1603] The newly elected leader is 
master@172.17.5.73:38504 with id d4ff5e08-2202-4f3b-8fb2-5515adf9a97e
I1003 20:29:03.373670 31576 master.cpp:1616] Elected as the leading master!
I1003 20:29:03.373775 31576 master.cpp:1376] Recovering from registrar
I1003 20:29:03.374174 31579 leveldb.cpp:306] Persisting metadata (8 bytes) to 
leveldb took 611973ns
I1003 20:29:03.374233 31581 registrar.cpp:309] Recovering registrar
I1003 20:29:03.374248 31579 replica.cpp:323] Persisted replica status to 
STARTING
I1003 20:29:03.374455 31579 recover.cpp:475] Replica is in STARTING status
I1003 20:29:03.375416 31576 replica.cpp:641] Replica in STARTING status 
received a broadcasted recover request
I1003 20:29:03.375880 31575 recover.cpp:195] Received a recover response from a 
replica in STARTING status
I1003 20:29:03.376230 31576 recover.cpp:566] Updating replica status to VOTING
I1003 20:29:03.376729 31580 leveldb.cpp:306] Persisting metadata (8 bytes) to 
leveldb took 370830ns
I1003 20:29:03.376752 31580 replica.cpp:323] Persisted replica status to VOTING
I1003 20:29:03.376893 31580 recover.cpp:580] Successfully joined the Paxos group
I1003 20:29:03.377115 31580 recover.cpp:464] Recover process terminated
I1003 20:29:03.377531 31569 log.cpp:661] Attempting to start the writer
I1003 20:29:03.378665 31583 replica.cpp:477] Replica received implicit promise 
request with proposal 1
I1003 20:29:03.379005 31583 leveldb.cpp:306] Persisting metadata 

[jira] [Created] (MESOS-3578) ProvisionerDockerLocalStoreTest.MetadataManagerInitialization is flaky

2015-10-03 Thread Anand Mazumdar (JIRA)
Anand Mazumdar created MESOS-3578:
-

 Summary: 
ProvisionerDockerLocalStoreTest.MetadataManagerInitialization is flaky
 Key: MESOS-3578
 URL: https://issues.apache.org/jira/browse/MESOS-3578
 Project: Mesos
  Issue Type: Bug
  Components: containerization
Reporter: Anand Mazumdar


Showed up on ASF CI:
https://builds.apache.org/job/Mesos/881/COMPILER=gcc,CONFIGURATION=--verbose%20--enable-libevent%20--enable-ssl,OS=ubuntu:14.04,label_exp=docker%7C%7CHadoop/consoleFull

{code}
[ RUN  ] ProvisionerDockerLocalStoreTest.MetadataManagerInitialization
Using temporary directory 
'/tmp/ProvisionerDockerLocalStoreTest_MetadataManagerInitialization_9ynmgE'
I0929 02:36:44.066397 30457 local_puller.cpp:127] Untarring image from 
'/tmp/ProvisionerDockerLocalStoreTest_MetadataManagerInitialization_9ynmgE/store/staging/aZND7C'
 to 
'/tmp/ProvisionerDockerLocalStoreTest_MetadataManagerInitialization_9ynmgE/images/abc:latest.tar'
../../src/tests/containerizer/provisioner_docker_tests.cpp:843: Failure
(layers).failure(): Collect failed: Untar failed with exit code: exited with 
status 2
[  FAILED  ] ProvisionerDockerLocalStoreTest.MetadataManagerInitialization (181 
ms)
{code}





[jira] [Created] (MESOS-3579) FetcherCacheTest.LocalUncachedExtract is flaky

2015-10-03 Thread Anand Mazumdar (JIRA)
Anand Mazumdar created MESOS-3579:
-

 Summary: FetcherCacheTest.LocalUncachedExtract is flaky
 Key: MESOS-3579
 URL: https://issues.apache.org/jira/browse/MESOS-3579
 Project: Mesos
  Issue Type: Bug
  Components: fetcher, test
Reporter: Anand Mazumdar


From ASF CI:
https://builds.apache.org/job/Mesos/866/COMPILER=clang,CONFIGURATION=--verbose,OS=ubuntu:14.04,label_exp=docker%7C%7CHadoop/console

{code}
[ RUN  ] FetcherCacheTest.LocalUncachedExtract
Using temporary directory '/tmp/FetcherCacheTest_LocalUncachedExtract_jHBfeA'
I0925 19:15:39.541198 27410 leveldb.cpp:176] Opened db in 3.43934ms
I0925 19:15:39.542362 27410 leveldb.cpp:183] Compacted db in 1.136184ms
I0925 19:15:39.542428 27410 leveldb.cpp:198] Created db iterator in 35866ns
I0925 19:15:39.542448 27410 leveldb.cpp:204] Seeked to beginning of db in 8807ns
I0925 19:15:39.542459 27410 leveldb.cpp:273] Iterated through 0 keys in the db 
in 6325ns
I0925 19:15:39.542505 27410 replica.cpp:744] Replica recovered with log 
positions 0 -> 0 with 1 holes and 0 unlearned
I0925 19:15:39.543143 27438 recover.cpp:449] Starting replica recovery
I0925 19:15:39.543393 27438 recover.cpp:475] Replica is in EMPTY status
I0925 19:15:39.544373 27436 replica.cpp:641] Replica in EMPTY status received a 
broadcasted recover request
I0925 19:15:39.544791 27433 recover.cpp:195] Received a recover response from a 
replica in EMPTY status
I0925 19:15:39.545284 27433 recover.cpp:566] Updating replica status to STARTING
I0925 19:15:39.546155 27436 master.cpp:376] Master 
c8bf1c95-50f4-4832-a570-c560f0b466ae (f57fd4291168) started on 
172.17.1.195:41781
I0925 19:15:39.546257 27433 leveldb.cpp:306] Persisting metadata (8 bytes) to 
leveldb took 747249ns
I0925 19:15:39.546288 27433 replica.cpp:323] Persisted replica status to 
STARTING
I0925 19:15:39.546483 27434 recover.cpp:475] Replica is in STARTING status
I0925 19:15:39.546187 27436 master.cpp:378] Flags at startup: --acls="" 
--allocation_interval="1secs" --allocator="HierarchicalDRF" 
--authenticate="true" --authenticate_slaves="true" --authenticators="crammd5" 
--authorizers="local" 
--credentials="/tmp/FetcherCacheTest_LocalUncachedExtract_jHBfeA/credentials" 
--framework_sorter="drf" --help="false" --hostname_lookup="true" 
--initialize_driver_logging="true" --log_auto_initialize="true" 
--logbufsecs="0" --logging_level="INFO" --max_slave_ping_timeouts="5" 
--quiet="false" --recovery_slave_removal_limit="100%" 
--registry="replicated_log" --registry_fetch_timeout="1mins" 
--registry_store_timeout="25secs" --registry_strict="true" 
--root_submissions="true" --slave_ping_timeout="15secs" 
--slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" 
--webui_dir="/mesos/mesos-0.26.0/_inst/share/mesos/webui" 
--work_dir="/tmp/FetcherCacheTest_LocalUncachedExtract_jHBfeA/master" 
--zk_session_timeout="10secs"
I0925 19:15:39.546567 27436 master.cpp:423] Master only allowing authenticated 
frameworks to register
I0925 19:15:39.546617 27436 master.cpp:428] Master only allowing authenticated 
slaves to register
I0925 19:15:39.546632 27436 credentials.hpp:37] Loading credentials for 
authentication from 
'/tmp/FetcherCacheTest_LocalUncachedExtract_jHBfeA/credentials'
I0925 19:15:39.546931 27436 master.cpp:467] Using default 'crammd5' 
authenticator
I0925 19:15:39.547044 27436 master.cpp:504] Authorization enabled
I0925 19:15:39.547276 27441 whitelist_watcher.cpp:79] No whitelist given
I0925 19:15:39.547320 27434 hierarchical.hpp:468] Initialized hierarchical 
allocator process
I0925 19:15:39.547471 27438 replica.cpp:641] Replica in STARTING status 
received a broadcasted recover request
I0925 19:15:39.548318 27443 recover.cpp:195] Received a recover response from a 
replica in STARTING status
I0925 19:15:39.549067 27435 recover.cpp:566] Updating replica status to VOTING
I0925 19:15:39.549115 27440 master.cpp:1603] The newly elected leader is 
master@172.17.1.195:41781 with id c8bf1c95-50f4-4832-a570-c560f0b466ae
I0925 19:15:39.549162 27440 master.cpp:1616] Elected as the leading master!
I0925 19:15:39.549190 27440 master.cpp:1376] Recovering from registrar
I0925 19:15:39.549342 27434 registrar.cpp:309] Recovering registrar
I0925 19:15:39.549666 27430 leveldb.cpp:306] Persisting metadata (8 bytes) to 
leveldb took 418187ns
I0925 19:15:39.549753 27430 replica.cpp:323] Persisted replica status to VOTING
I0925 19:15:39.550089 27442 recover.cpp:580] Successfully joined the Paxos group
I0925 19:15:39.550320 27442 recover.cpp:464] Recover process terminated
I0925 19:15:39.550904 27430 log.cpp:661] Attempting to start the writer
I0925 19:15:39.551955 27434 replica.cpp:477] Replica received implicit promise 
request with proposal 1
I0925 19:15:39.552351 27434 leveldb.cpp:306] Persisting metadata (8 bytes) to 
leveldb took 380746ns
I0925 19:15:39.552372 27434 replica.cpp:345] Persisted promised to 1
I
