Re: Fetcher refactor proposal

2015-11-10 Thread Tom Arnfeld
This looks like a great change, btw!

I have a quick question, how does this change affect things like the 
executable/extract bits that are available in the existing fetcher? Would that 
logic move outside of the fetcher itself, or would it live on the URI?

I’m not sure if I’ve missed something in the design doc about this, but it came 
to mind…

Tom.

> On 10 Nov 2015, at 23:45, Jie Yu  wrote:
> 
> Hi,
> 
> Fetcher was originally designed to fetch CommandInfo::URIs (e.g., executor
> binary) for executors/tasks. A recent refactor (MESOS-336) added caching
> support to the fetcher. The recent work on filesystem isolation and the
> unified containerizer (MESOS-2840) requires Mesos to fetch filesystem
> images (e.g., APPC/DOCKER images) as well. The
> natural question is: can we leverage the fetcher to fetch those filesystem
> images (and cache them accordingly)? Unfortunately, the existing fetcher
> interface is tightly coupled with CommandInfo::URIs for executors/tasks,
> making it very hard to re-use for fetching and caching filesystem images.
> 
> Another motivation for the refactor is that we want to extend the fetcher
> to support more types of schemes. For instance, we want to support magnet
> URI to enable p2p fetching. This is in fact quite important for operating a
> large cluster (MESOS-3596).
> The goal here is to allow fetcher to be extended (e.g., using modules) so
> that operators can add custom fetching support.
> 
> I proposed a solution in this doc.
> The main idea is to decouple artifacts fetching from artifacts cache
> management. We can make artifacts fetching extensible (e.g. to support p2p
> fetching), and solve the cache management part later.
> 
> Let me know your thoughts! Thanks!
> 
> - Jie
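
For concreteness, here is a minimal sketch of the kind of scheme-extensible
fetching interface the proposal argues for. The names (FetcherPlugin,
Fetcher::fetch) are illustrative assumptions, not the API from Jie's doc; the
point is that fetching is dispatched by URI scheme (http, hdfs, magnet, ...)
while cache management stays a separate concern:

    #include <map>
    #include <memory>
    #include <string>

    #include <process/future.hpp>

    // Illustrative only: one plugin per URI scheme.
    class FetcherPlugin
    {
    public:
      virtual ~FetcherPlugin() {}

      // Fetches 'uri' into 'directory', returning the path of the artifact.
      virtual process::Future<std::string> fetch(
          const std::string& uri,
          const std::string& directory) = 0;
    };

    class Fetcher
    {
    public:
      void registerPlugin(
          const std::string& scheme,
          std::shared_ptr<FetcherPlugin> plugin)
      {
        plugins[scheme] = plugin;
      }

      process::Future<std::string> fetch(
          const std::string& uri,
          const std::string& directory)
      {
        const std::string scheme = uri.substr(0, uri.find("://"));
        if (plugins.count(scheme) == 0) {
          return process::Failure("No plugin for scheme '" + scheme + "'");
        }
        return plugins[scheme]->fetch(uri, directory);
      }

    private:
      std::map<std::string, std::shared_ptr<FetcherPlugin>> plugins;
    };

Under such a split, an operator module could register a magnet plugin for p2p
fetching without touching cache management, which is the decoupling the doc
proposes.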



Re: CPU soft lock up on mesos-slave

2015-08-31 Thread Tom Arnfeld
Hi Chris,




Perhaps you've run into 
https://community.nitrous.io/posts/stability-and-a-linux-oom-killer-bug. We ran 
into similar symptoms to the ones you've described, and treating the above as 
the cause solved all of our issues.




Hope this helps!



--


Tom Arnfeld

Developer // DueDil





(+44) 7525940046

25 Christopher Street, London, EC2A 2BS

On Mon, Aug 31, 2015 at 11:55 PM, Christopher Ketchum 
wrote:

> Hi all,
> I was running a Mesos cluster on EC2 with c4.8xlarge instance types when
> one of the status checks failed. We are running Mesos 0.22.1 on ubuntu
> 14.04, with kernel version 3.13.0-55-generic. EC2 gave us this console
> output[1]. I did some searching and found similar issues reported here[2]
> on lkml, though those logs indicated a specific task and an older kernel,
> while these logs just show mesos-slave as the causative process.
> Unfortunately, the instance was terminated so I'm not sure how much useful
> debugging can be done. Is this a known issue? We are also using our own
> Python executor; could an error there have caused this?
> [1] http://pastebin.com/NgHi8MnS
> [2] https://lkml.org/lkml/2014/9/30/498
> Thanks,
> Chris

Re: RFC: Framework <-> Executor Message Passing Optimization Removal

2015-06-30 Thread Tom Arnfeld
We're using it for streaming realtime logs to the framework. In our short-lived 
framework for building Docker images, the executor streams back stdout/stderr 
logs from the build to the client for ease of use/debugging and the 
executor->framework best-effort messaging stuff made this effortless.


--


Tom Arnfeld

Developer // DueDil

On Mon, Jun 29, 2015 at 10:48 PM, Benjamin Mahler
 wrote:

> FYI Some folks reached out off thread that they are using this optimization
> for distributed health checking of tasks. This is on the order of O(10,000)
> framework messages per second for them, which may not be possible through
> the master.
> On Tue, Jun 23, 2015 at 6:08 PM, Benjamin Mahler 
> wrote:
>> The existing Mesos API provides unreliable message passing for framework
>> <-> executor communication:
>>
>> --> Schedulers can call 'sendFrameworkMessage(executor, slave, data)' on
>> the driver [1], this sends a message to the executor. This has a
>> best-effort optimization to bypass the master, and send the message to the
>> slave directly.
>>
>> --> Executors can call 'sendFrameworkMessage(data)' on the driver [2],
>> which sends a message to the scheduler. This has a best-effort optimization
>> to bypass the master, and send the message to the scheduler driver directly
>> (via the slave).
>>
>> As part of the HTTP API [3], schedulers can only make Calls against the
>> master, and all Events must be streamed back on the scheduler-initiated
>> connection to the master. This means that we can no longer easily support
>> bypassing the master as an optimization.
>>
>> The plan is also to remove this optimization in the existing driver, in
>> order to conform to the upcoming Event/Call messages [4] used in the HTTP
>> API, so:
>>
>>
>> *** If anyone is relying on this best-effort optimization, please chime
>> in! ***
>>
>>
>> [1]
>> https://github.com/apache/mesos/blob/0.22.1/include/mesos/scheduler.hpp#L289
>> [2]
>> https://github.com/apache/mesos/blob/0.22.1/include/mesos/executor.hpp#L185
>> [3]
>> https://docs.google.com/document/d/1pnIY_HckimKNvpqhKRhbc9eSItWNFT-priXh_urR-T0/edit
>> [4]
>> https://github.com/apache/mesos/blob/0.22.1/include/mesos/scheduler/scheduler.proto
>>
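
For reference, a minimal sketch (not from the thread) of the two driver calls
under discussion; both are best-effort, so messages may be silently dropped:

    #include <string>

    #include <mesos/executor.hpp>
    #include <mesos/scheduler.hpp>

    // Executor side: stream a chunk of build output back to the scheduler.
    void streamLog(mesos::ExecutorDriver* driver, const std::string& chunk)
    {
      // May bypass the master and go via the slave directly.
      driver->sendFrameworkMessage(chunk);
    }

    // Scheduler side: message a specific executor on a specific slave.
    void ping(mesos::SchedulerDriver* driver,
              const mesos::ExecutorID& executorId,
              const mesos::SlaveID& slaveId)
    {
      driver->sendFrameworkMessage(executorId, slaveId, "ping");
    }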

Re: Mesos framework example in c++

2015-05-29 Thread Tom Arnfeld
It might be worth looking at the code for mesos-execute, which has some support 
for basic Docker containers.



--


Tom Arnfeld

Developer // DueDil





(+44) 7525940046

25 Christopher Street, London, EC2A 2BS

On Fri, May 29, 2015 at 8:29 AM, Adam Bordelon  wrote:

> I don't know of a C++ framework that uses docker containers yet, but for a
> simple example C++ framework, check out the RENDLER:
> https://github.com/mesosphere/RENDLER/tree/master/cpp
> For a docker-enabled framework written in Go, try Swarm:
> https://github.com/docker/swarm/blob/master/cluster/mesos/task.go#L37
> For a Scala example with docker, see Marathon:
> https://github.com/mesosphere/marathon/blob/master/src/main/scala/mesosphere/mesos/TaskBuilder.scala
> https://github.com/mesosphere/marathon/blob/master/src/main/scala/mesosphere/marathon/state/Container.scala
> On Thu, May 28, 2015 at 4:39 AM, baotiao  wrote:
>>
>> Hi all
>>
>> I ran the example docker_no_executor_framework, but it returned "No container
>> info found, skipping launch". I think this is because the task in the code
>> doesn't set the container. Can you give me an example of how to set up Docker
>> with an executor in C++? Or is there a framework written in C++ that I can
>> learn from?
>>
>>
>>
>> 
>> 陈宗志
>>
>> Blog:baotiao.github.io
>>
>>
>>
>>
>>
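
A hedged sketch of the missing piece baotiao is asking about: unless the
TaskInfo carries a ContainerInfo of type DOCKER, the Docker containerizer
declines the launch ("No container info found, skipping launch"). The image
name and IDs below are placeholders:

    #include <mesos/mesos.hpp>

    mesos::TaskInfo makeDockerTask(const mesos::Offer& offer)
    {
      mesos::TaskInfo task;
      task.set_name("docker-task");
      task.mutable_task_id()->set_value("1");
      task.mutable_slave_id()->CopyFrom(offer.slave_id());
      task.mutable_command()->set_value("echo hello");

      // Without this block the Docker containerizer skips the launch.
      mesos::ContainerInfo* container = task.mutable_container();
      container->set_type(mesos::ContainerInfo::DOCKER);
      container->mutable_docker()->set_image("ubuntu:14.04");

      return task;
    }

Resources from the offer still need to be copied onto the task before
launching; they are omitted here to keep the sketch short.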

Re: Release process for hadoop-mesos

2015-05-09 Thread Tom Arnfeld
Myriad is indeed a completely unrelated and separate project; it's for running 
YARN on Mesos, as opposed to the MRv1 stack with JobTrackers and TaskTrackers.



--


Tom Arnfeld

Developer // DueDil






On Saturday, May 9, 2015 at 10:29 am, Brian Topping , 
wrote:



Re: Design doc for Mesos HTTP API

2015-05-01 Thread Tom Arnfeld
Thanks for sharing this Vinod, very clear and useful document!




Q: Could you explain in a little more detail why the decision was made to use a 
single HTTP endpoint, rather than something like /event (for the stream) and 
/call (for making calls)? It seems a little strange/contrived to me that the 
difference between sending data to the master and receiving a stream of events 
would be based on the order of the calls, via the same endpoint. For example, 
would there not be a failure case here where the initial HTTP connection 
(SUBSCRIBE) fails (perhaps due to an application error) and the driver 
continues making subsequent POST requests to send messages? In this situation, 
what would happen? Would the next HTTP request that sent a message start 
getting streamed events in the response?




Perhaps I've misread another section of the document that explains this, but 
it'd be great if you could help me understand.



--


Tom Arnfeld

Developer // DueDil






On Thursday, Apr 30, 2015 at 10:26 pm, Vinod Kone , wrote:





Mesos Hadoop Framework 0.1.0

2015-03-28 Thread Tom Arnfeld
Hey everyone,


I thought it best to send an email to the list before merging and tagging a 
0.1.0 release for the Hadoop on Mesos framework. This release is for a new 
feature we've been working on for quite some time, which allows Hadoop 
TaskTrackers to be semi-terminated when they are idle, without destroying any 
map output they may need to retain for running reduce tasks.


Essentially this means that over the lifetime of a job (one with more 
map/reduce tasks than the cluster can run at once) the ratio of map and reduce 
slots can change, resulting in significantly better resource utilization, 
because map slots can be freed up once they have finished their work.


If anyone is running Hadoop on Mesos or would be kind enough to contribute to 
reviewing the code in the diff, or giving the branch a go on their cluster, 
that would be very much appreciated! We've been running the patch in production 
for several months and have seen some quite significant performance gains with 
our type of workload.


The pull request is here https://github.com/mesos/hadoop/pull/33.


Feel free to get in touch if you have any questions! Thanks!


--

Tom Arnfeld
Developer // DueDil

Re: Call for mentorship proposals, GSoC and Outreachy

2015-02-28 Thread Tom Arnfeld
Hey Dave,




This sounds quite interesting. Could you tell me a little more about what is 
involved? Sounds like something I might like to put my name down for :-)




Tom.



--


Tom Arnfeld

Developer // DueDil





(+44) 7525940046

25 Christopher Street, London, EC2A 2BS

On Fri, Feb 27, 2015 at 9:38 PM, Dave Lester  wrote:

> Intern season is only a few months out, and once again Twitter is
> sponsoring Google Summer of Code and Outreachy (previously known as
> the Outreach Program for Women). There's a mentorship opportunity to
> remotely work with a student intern on Apache Mesos, and I wanted to
> make sure we widely distributed a call to the community. This is
> available to all committers and solid contributors to the open
> source project.
> Anyone interested in being a GSoC mentor, and have an idea for a
> contribution a student could work on? (Note: although Twitter is helping
> sponsor a potential intern on Mesos, you don't need to be a Twitter
> employee to mentor)
> Here's a page with several example projects:
> https://github.com/twitter/twitter.github.com/wiki/Google-Summer-of-Code-2015
> If you're interested in mentoring, please let me know in the next few
> days so we can add you to our list of projects.
> And if you're a student interested in either of these intern programs,
> we hope to have the list for Mesos updated early next week. In the
> meantime, we encourage prospective interns to work through the project
> tutorials and give Mesos a spin.
> Best,

Re: Meeting notes about reservations

2015-02-21 Thread Tom Arnfeld
That all sounds great, thanks for clarifying!



--


Tom Arnfeld

Developer // DueDil





(+44) 7525940046

25 Christopher Street, London, EC2A 2BS

On Sat, Feb 21, 2015 at 7:35 AM, Michael Park  wrote:

> Hi Tom,
> It doesn't *preclude* that capability but it also doesn't *include* it.
> Essentially, the endpoints provide an API through which an operator can
> dynamically change operator-level reservations (currently the reservations
> are set on the individual slaves and need to go through slave reboot to
> change the reservations). Having said that, we also discussed "quota"s
> which is the feature you're describing. It's coming, stay tuned!
> MPark.
> On 20 February 2015 at 22:11, Tom Arnfeld  wrote:
>> Hi all,
>>
>>
>>
>>
>> Exciting to hear discussions about this. You mention that you need to
>> provide the slave ID while making a reservation. Does this preclude the use
>> case where "I want to reserve X CPU + RAM across the entire cluster for R"?
>>
>>
>>
>>
>> Maybe I've misunderstood. I think the ticket for this use case is
>> MESOS-1791.
>>
>>
>>
>> --
>>
>>
>> Tom Arnfeld
>>
>> Developer // DueDil
>>
>>
>>
>>
>>
>> (+44) 7525940046
>>
>> 25 Christopher Street, London, EC2A 2BS
>>
>> On Sat, Feb 21, 2015 at 1:38 AM, Jie Yu  wrote:
>>
>> > Hi,
>> > BenM, BenH, MPark and I had a sync today discussing reservations
>> > (dynamic+static) in Mesos. It was a very productive meeting and here are
>> > some notes I took from the meeting. I think all of us are on the same page
>> > now and are happy with the current design.
>> > - Jie
>> > 
>> > 1) we still keep the static reservation, but we don't introduce a new
>> > reservation type for that. If role = R and reservation is none, then it's
>> > statically reserved (cannot be released by anyone).
>> > 2) /reserve and /unreserve endpoints.
>> > Specify the role, slaveid (or hostname, or both, and we validate),
>> > resources when reserving resources. It only takes resources from static *
>> > resources. The operator can reserve both OP and FW reservations.
>> > When the reserve endpoint is hit, we use the offered resources and invoke
>> > the offer rescind mechanism.
>> > Question: should we release dynamic reservations when a framework is
>> > shutting down? Multiple frameworks might share the same role!
>> > 3) reservation id
>> > We might want to have a reservation id (the id can be the framework id as
>> > well). The goal is to distinguish who reserved the resources if two
>> > frameworks are under the same role.
>> > 4) introduce principal for reservation
>> > The idea is that we need to track who reserved a resource so that we can
>> > decide who can unreserve it. The principal and ACLs can help us set up
>> > those rules for deciding who can unreserve what.
>> > 5) protobuf
>> > message Resources {
>> >   ...
>> >   message Reservation {
>> >  optional string principal;
>> >  optional string reservation_id;  // introduce it later maybe
>> >   }
>> >   ...
>> >   optional string role [ Default = "*" ];
>> >   optional Reservation reservation;
>> > }
>>
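
To make point 5 concrete, a hedged C++ sketch of building a dynamically
reserved resource under that protobuf shape (field names mirror the
ReservationInfo that later landed in Mesos; treat them as assumptions, not
the final API from these notes):

    #include <mesos/mesos.hpp>

    // 4 CPUs reserved for role "analytics", tagged with the reserving
    // principal so unreserving can be access-controlled (point 4 above).
    mesos::Resource reservedCpus()
    {
      mesos::Resource resource;
      resource.set_name("cpus");
      resource.set_type(mesos::Value::SCALAR);
      resource.mutable_scalar()->set_value(4);
      resource.set_role("analytics"); // reserved role, not "*"
      resource.mutable_reservation()->set_principal("ops");
      return resource;
    }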

Re: Meeting notes about reservations

2015-02-20 Thread Tom Arnfeld
Hi all,




Exciting to hear discussions about this. You mention that you need to provide 
the slave ID while making a reservation. Does this preclude the use case where 
"I want to reserve X CPU + RAM across the entire cluster for R"?




Maybe I've misunderstood. I think the ticket for this use case is MESOS-1791.



--


Tom Arnfeld

Developer // DueDil





(+44) 7525940046

25 Christopher Street, London, EC2A 2BS

On Sat, Feb 21, 2015 at 1:38 AM, Jie Yu  wrote:

> Hi,
> BenM, BenH, MPark and I had a sync today discussing reservations
> (dynamic+static) in Mesos. It was a very productive meeting and here are
> some notes I took from the meeting. I think all of us are on the same page
> now and are happy with the current design.
> - Jie
> 
> 1) we still keep the static reservation, but we don't introduce a new
> reservation type for that. If role = R and reservation is none, then it's
> statically reserved (cannot be released by anyone).
> 2) /reserve and /unreserve endpoints.
> Specify the role, slaveid (or hostname, or both, and we validate),
> resources when reserving resources. It only takes resources from static *
> resources. The operator can reserve both OP and FW reservations.
> When the reserve endpoint is hit, we use the offered resources and invoke the
> offer rescind mechanism.
> Question: should we release dynamic reservations when a framework is
> shutting down? Multiple frameworks might share the same role!
> 3) reservation id
> We might want to have a reservation id (the id can be the framework id as
> well). The goal is to distinguish who reserved the resources if two
> frameworks are under the same role.
> 4) introduce principal for reservation
> The idea is that we need to track who reserved a resource so that we can
> decide who can unreserve it. The principal and ACLs can help us set up those
> rules for deciding who can unreserve what.
> 5) protobuf
> message Resources {
>   ...
>   message Reservation {
>  optional string principal;
>  optional string reservation_id;  // introduce it later maybe
>   }
>   ...
>   optional string role [ Default = "*" ];
>   optional Reservation reservation;
> }

Re: Scaling Proposal: MAINTAINERS Files

2015-02-08 Thread Tom Arnfeld
This sounds really interesting Ben - I'm definitely +1 to the idea.




The only question that comes up in my mind is, are files/areas of the code base 
segmented enough at the moment for this to be useful?



--


Tom Arnfeld

Developer // DueDil





(+44) 7525940046

25 Christopher Street, London, EC2A 2BS

On Sun, Feb 8, 2015 at 10:52 AM, Benjamin Mahler
 wrote:

> Hi all,
> I have been chatting with a few committers and we'd like to consider adding
> the concept of MAINTAINERS files to coincide with our "shepherds" concept,
> introduced here:
> http://mail-archives.apache.org/mod_mbox/mesos-dev/201404.mbox/%3ccafeoqnwjibkayurkf0mfxve2usd5d91xpoe8u+pktiyvszv...@mail.gmail.com%3E
> Please take a moment to read that thread and its responses here in which
> maintainers are alluded to:
> http://mail-archives.apache.org/mod_mbox/mesos-dev/201404.mbox/%3cca+a2mtvc61-3idxtm-ghgcxekqxwz063ouhpbrgbpvsa9zs...@mail.gmail.com%3E
> http://mail-archives.apache.org/mod_mbox/mesos-dev/201404.mbox/%3CCAAkWvAxegdg8+QQ4-sqZ-SKi9J=2WJDCVg_Sc9aaHttS4=6...@mail.gmail.com%3E
> *Motivation:*
> To re-iterate from that thread, many companies rely on Mesos as the
> foundational layer of their software infrastructure stack. Much of the
> success of Mesos can be attributed to our focus on quality (code that is
> simple / easy to read and understand, high attention to detail, thorough
> reviewing, good testing practices, managing technical debt, learning from
> each other, etc).
> As the community of contributors has grown, it's become increasingly
> difficult to ensure that people are able to find reviewers with experience
> in specific areas of the project. Good contributions often fall through the
> cracks as a result of the lack of clarity around this.
> We would like to ensure that reviewers with context and a long-term outlook
> on the particular area of the code are involved in providing feedback. It
> can be difficult for a contributor to consider the implications of their
> change, when they are looking to get a bug fixed or a feature implemented
> before the next release or the end of a sprint.
> We'd like to be able to add more and more committers as the community
> grows, and incentivize them to become responsible maintainers of components
> as they become more involved in the project.
> *MAINTAINERS file system:*
> In order to ensure we can maintain the quality of the code as we grow, we'd
> like to propose adding an MAINTAINERS file system to the source tree.
> From the chromium mailing list (s/OWNERS/MAINTAINERS/):
> *"A MAINTAINERS file lives in a directory and describes (in simple list
> form) whose review is required to commit changes to it. MAINTAINERShip
> inherits, in that someone listed at a higher level in the tree is capable
> of reviewing changes to lower level files.*
> *MAINTAINERS files provide a means for people to find engineers experienced
> in developing specific areas for code reviews. They are designed to help
> ensure changes don't fall through the cracks and get appropriate scrutiny.
> MAINTAINERShip is a responsibility and people designated as MAINTAINERS in
> a given area are responsible for the long term improvement of that area,
> and reviewing code in that area."*
> This would be enforced via our review tooling (post-reviews.py / reviewbot,
> apply-review.py), and a git commit hook if possible.
> There would be a process for becoming a maintainer, the details of which we
> will clarify in a follow up. I’m thinking it will require an existing
> maintainer proposing a candidate to become a maintainer based on merit.
> Merit is not about quantity of work, it means doing things the community
> values in a way that the community values.
> As part of this, we would be documenting qualities we look for in
> committers and maintainers.
> *Feedback:*
> The goal with this is to be even more inclusive than we are today while
> maintaining the quality of our code and design decisions.
> I'm a +1 for this approach, and I would like to hear from others. What do
> you like about this? What are potential concerns? Much of this was thought
> about in terms of how to further the following of the Apache Way for Mesos,
> any concerns there? Take your time to mull this over, your feedback would
> be much appreciated.
> If this does sound good to everyone at a high level, I will follow up with
> further discussion to formalize this, and I’ll work to document and
> implement it.
> Ben

Re: GPU computing resource add into Mesos

2015-01-26 Thread Tom Arnfeld
Chester, you can specify arbitrary resources using the --resources flag to the 
slave and Mesos will share out the resources to frameworks, and then your 
framework can do as it pleases.


I'm not sure any changes are required in Mesos itself to support this, unless 
I'm missing something.


--


Tom Arnfeld

Developer // DueDil





(+44) 7525940046

25 Christopher Street, London, EC2A 2BS

On Mon, Jan 26, 2015 at 6:15 AM, Chester Kuo 
wrote:

> Hi All
> I'd like to extend Mesos with a new feature to support GPU
> resource allocation, so we can put OpenCL applications/frameworks on top
> of Mesos and make them write once, run across the cluster.
> Why choose OpenCL? Because it is widely supported by Intel, Nvidia,
> AMD, and Qualcomm GPGPUs, so other frameworks (e.g.,
> Spark) could also be extended to utilize GPGPU computing resources.
> Any Comments?
> Chester
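
To illustrate Tom's --resources route (a sketch, not from the thread): a slave
started with --resources="cpus:8;mem:16384;gpus:2" advertises gpus as an
ordinary scalar resource in its offers, and the same syntax parses in C++ via
the Resources API:

    #include <iostream>

    #include <mesos/resources.hpp>
    #include <stout/try.hpp>

    int main()
    {
      // The same string an operator might pass to the slave's --resources flag.
      Try<mesos::Resources> resources =
        mesos::Resources::parse("cpus:8;mem:16384;gpus:2");

      if (resources.isError()) {
        std::cerr << "Parse error: " << resources.error() << std::endl;
        return 1;
      }

      // Prints something like: cpus(*):8; mem(*):16384; gpus(*):2
      std::cout << resources.get() << std::endl;
      return 0;
    }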

Re: mesos-dns

2015-01-25 Thread Tom Arnfeld
Thanks for sharing! Does this project by any chance utilise the new (not sure 
if merged or released) service discovery protobufs in Mesos?

--


Tom Arnfeld

Developer // DueDil





(+44) 7525940046

25 Christopher Street, London, EC2A 2BS

On Sat, Jan 24, 2015 at 5:20 PM, Christos Kozyrakis
 wrote:

> A few days ago, we open sourced a simple DNS server for Mesos clusters that
> automatically draws information from the Mesos master(s). It should be
> helpful for service discovery in multi-framework clusters.
> The code: https://github.com/mesosphere/mesos-dns
> Docs: https://mesosphere.github.com/mesos-dns
> Looking forward to your feedback
> The Mesosphere team
> http://www.mesosphere.com

Re: Mesos Community Meetings

2015-01-05 Thread Tom Arnfeld
+1 also! Very interesting to hear what’s being discussed. +1 on the Google 
Hangouts if these meetings are happening in person, so we can listen along.



--


Tom Arnfeld

Developer // DueDil






On Monday, Dec 29, 2014 at 4:12 pm, Chris Aniszczyk , 
wrote:

+1 to opening up meetings! How about creating a Google Calendar with the 
meetings, agenda and info? 


Also someone should take meeting minutes and publish them to the list after 
each meeting for those who can't attend (on top of making information more 
discoverable via search).



Another approach is to use IRC meetings, for which there's a bot to record them, 
but that lacks the visual aspect of GH (e.g., see IRC meeting notes from 
Aurora: 
http://mail-archives.apache.org/mod_mbox/incubator-aurora-dev/201412.mbox/%3C20141201192131.4888419FD5%40urd.zones.apache.org%3E)




Anyways, glad to see this finally happening.





On Mon, Dec 29, 2014 at 7:46 AM, Niklas Nielsen  wrote:

Hi everyone,


Mesosphere and Twitter have been meeting up regularly to brief and discuss

current joint efforts in the Mesos project.

While this has worked great for the engineering teams, it should be a

community-wide meeting, as we discuss our agendas, timelines, etc., which is

useful for a broader audience.

Unfortunately, we cannot host people on-site, but we can open Google

hangouts for all upcoming meetings.


Any thoughts or suggestions?


Best regards,

Niklas






-- 
Cheers,

Chris Aniszczyk
http://aniszczyk.org
+1 512 961 6719

Re: Review Request 29437: Bug fix: Start the executor registration timer, only when the container has launched successfully

2015-01-05 Thread Tom Arnfeld


> On Jan. 2, 2015, 10:23 p.m., Ben Mahler wrote:
> > What happened previously if a launch takes forever?
> > 
> > What happens now if a launch takes forever?
> 
> Timothy Chen wrote:
> Ben that's a good point: previously the registration timeout prevented 
> the launch from taking forever; however, it was too coarse a timeout, as it 
> included both the time to launch and the time it takes for the executor to 
> register after launch.
> 
> Now it will only time out in the latter case. I think we do need a 
> separate timeout that checks for the launch itself. Ben, what do you think?
> 
> Nishant Suneja wrote:
> I concur with Timothy. I think having a separate timeout to track the 
> container launch itself makes sense.
> 
> Timothy Chen wrote:
> Hi Nishant, can you also add another timer for gating the launch? Once we 
> have both PRs then I think we are in a better state to merge both.

Just caught wind of this change and it's good to see! Going to be very useful 
over here. Can I request that there's also a command line flag for the slave to 
change the launch timeout? :-)


- Tom


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/29437/#review66557
---


On Dec. 31, 2014, 11:57 p.m., Nishant Suneja wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/29437/
> ---
> 
> (Updated Dec. 31, 2014, 11:57 p.m.)
> 
> 
> Review request for mesos and Timothy Chen.
> 
> 
> Bugs: MESOS-999
> https://issues.apache.org/jira/browse/MESOS-999
> 
> 
> Repository: mesos-git
> 
> 
> Description
> ---
> 
> As part of this bug fix, I have triggered the executor registration timeout 
> timer after the container's future object is set, instead of starting the 
> timer when the container launch is still pending
> 
> 
> Diffs
> -
> 
>   src/slave/slave.cpp 50b57819b55bdcdb9f49f20648199badc4d3f37b 
>   src/tests/composing_containerizer_tests.cpp 
> 5ab5a36cadb7f8622bad0c5814e9a5fb338753ad 
>   src/tests/containerizer.hpp 24b014f44d9eec56840e18cf39fbf9100f2c0711 
>   src/tests/slave_tests.cpp f2896a1fc4521452e29fd261a6f117372345dcfc 
> 
> Diff: https://reviews.apache.org/r/29437/diff/
> 
> 
> Testing
> ---
> 
> Added the unit test : SlaveTest::ExecutorRegistrationTimeoutTrigger
> make check succeeds.
> 
> 
> Thanks,
> 
> Nishant Suneja
> 
>



Re: [VOTE] Release Apache Mesos 0.21.1 (rc2)

2014-12-30 Thread Tom Arnfeld
+1

--


Tom Arnfeld

Developer // DueDil





(+44) 7525940046

25 Christopher Street, London, EC2A 2BS

On Wed, Dec 31, 2014 at 12:16 AM, Ankur Chauhan 
wrote:

> +1
> Sent from my iPhone
>> On Dec 30, 2014, at 16:01, Tim Chen  wrote:
>> 
>> Hi all,
>> 
>> Just a reminder the vote is up for another 2 hours, let me know if any of 
>> you have any objections.
>> 
>> Thanks,
>> 
>> Tim
>> 
>>> On Mon, Dec 29, 2014 at 5:32 AM, Niklas Nielsen  
>>> wrote:
>>> +1, Compiled and tested on Ubuntu Trusty, CentOS Linux 7 and Mac OS X
>>> 
>>> Thanks guys!
>>> Niklas
>>> 
>>> 
>>>> On 19 December 2014 at 22:02, Tim Chen  wrote:
>>>> Hi Ankur,
>>>> 
>>>> Since MESOS-1711 is just a minor improvement I'm inclined to include it 
>>>> for the next major release which shouldn't be too far away from this 
>>>> release.
>>>> 
>>>> If anyone else thinks otherwise please let me know.
>>>> 
>>>> Tim
>>>> 
>>>>> On Fri, Dec 19, 2014 at 12:44 PM, Ankur Chauhan  
>>>>> wrote:
>>>>> Sorry for the late join-in; can we get 
>>>>> https://issues.apache.org/jira/plugins/servlet/mobile#issue/MESOS-1711 in 
>>>>> too, or is it too late?
>>>>> -- ankur 
>>>>> Sent from my iPhone
>>>>> 
>>>>>> On Dec 19, 2014, at 12:23, Tim Chen  wrote:
>>>>>> 
>>>>>> Hi all,
>>>>>> 
>>>>>> Please vote on releasing the following candidate as Apache Mesos 0.21.1.
>>>>>> 
>>>>>> 
>>>>>> 0.21.1 includes the following:
>>>>>> 
>>>>>> * This is a bug fix release.
>>>>>> 
>>>>>> ** Bug
>>>>>>   * [MESOS-2047] Isolator cleanup failures shouldn't cause TASK_LOST.
>>>>>>   * [MESOS-2071] Libprocess generates invalid HTTP
>>>>>>   * [MESOS-2147] Large number of connections slows statistics.json 
>>>>>> responses.
>>>>>>   * [MESOS-2182] Performance issue in libprocess SocketManager.
>>>>>> 
>>>>>> ** Improvement
>>>>>>   * [MESOS-1925] Docker kill does not allow containers to exit gracefully
>>>>>>   * [MESOS-2113] Improve configure to find apr and svn libraries/headers 
>>>>>> in OSX
>>>>>> 
>>>>>> The CHANGELOG for the release is available at:
>>>>>> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob_plain;f=CHANGELOG;hb=0.21.1-rc2
>>>>>> 
>>>>>> 
>>>>>> The candidate for Mesos 0.21.1 release is available at:
>>>>>> https://dist.apache.org/repos/dist/dev/mesos/0.21.1-rc2/mesos-0.21.1.tar.gz
>>>>>> 
>>>>>> The tag to be voted on is 0.21.1-rc2:
>>>>>> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=commit;h=0.21.1-rc2
>>>>>> 
>>>>>> The MD5 checksum of the tarball can be found at:
>>>>>> https://dist.apache.org/repos/dist/dev/mesos/0.21.1-rc2/mesos-0.21.1.tar.gz.md5
>>>>>> 
>>>>>> The signature of the tarball can be found at:
>>>>>> https://dist.apache.org/repos/dist/dev/mesos/0.21.1-rc2/mesos-0.21.1.tar.gz.asc
>>>>>> 
>>>>>> The PGP key used to sign the release is here:
>>>>>> https://dist.apache.org/repos/dist/release/mesos/KEYS
>>>>>> 
>>>>>> The JAR is up in Maven in a staging repository here:
>>>>>> https://repository.apache.org/content/repositories/orgapachemesos-1046
>>>>>> 
>>>>>> Please vote on releasing this package as Apache Mesos 0.21.1!
>>>>>> 
>>>>>> The vote is open until Tue Dec 23 18:00:00 PST 2014 and passes if a 
>>>>>> majority of at least 3 +1 PMC votes are cast.
>>>>>> 
>>>>>> [ ] +1 Release this package as Apache Mesos 0.21.1
>>>>>> [ ] -1 Do not release this package because ...
>>>>>> 
>>>>>> Thanks,
>>>>>> 
>>>>>> Tim & Till
>> 

Re: Review Request 28339: Do not return error if removing bind mount fails.

2014-12-06 Thread Tom Arnfeld

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28339/#review64167
---

Ship it!


Ship It!

- Tom Arnfeld


On Nov. 21, 2014, 6:46 p.m., Jie Yu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/28339/
> ---
> 
> (Updated Nov. 21, 2014, 6:46 p.m.)
> 
> 
> Review request for mesos, Ian Downes and Vinod Kone.
> 
> 
> Bugs: MESOS-2047
> https://issues.apache.org/jira/browse/MESOS-2047
> 
> 
> Repository: mesos-git
> 
> 
> Description
> ---
> 
> See ticket for details.
> 
> 
> Diffs
> -
> 
>   src/slave/containerizer/isolators/network/port_mapping.cpp 
> 3755413b566726a11d584c5149b55c20ab9619da 
> 
> Diff: https://reviews.apache.org/r/28339/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Jie Yu
> 
>



Bug fix release 0.21.1

2014-12-06 Thread Tom Arnfeld
After having a quick catch up with @tillt and @tnachen the following two 
tickets have been suggested to go into 0.21.1.


- https://issues.apache.org/jira/browse/MESOS-1925 (Docker kill does not allow 
containers to exit gracefully)
- https://issues.apache.org/jira/browse/MESOS-2113 (Improve configure to find 
apr and svn libraries/headers in OSX)


After flicking through the currently resolved (and earmarked for 0.22.0) 
tickets I'd also like to suggest the fix relating to libprocess HTTP headers, as 
IMO anything that helps speed up pure driver implementations is worth including.


- https://issues.apache.org/jira/browse/MESOS-2071 (Libprocess generates 
invalid HTTP)


Does anyone have any issue with these three tickets, or is there anything else 
that's being worked on / fixed that could go into this release?


Tom.

--

Tom Arnfeld
Developer // DueDil

Re: Rocket

2014-12-01 Thread Tom Arnfeld
+1 Sounds exciting!


--


Tom Arnfeld

Developer // DueDil

On Mon, Dec 1, 2014 at 8:03 PM, Jie Yu  wrote:

> Sounds great Tim!
> Do you know if they have published an API for the rocket toolset? Are we
> gonna rely on the command line interface?
> - Jie
> On Mon, Dec 1, 2014 at 11:10 AM, Tim Chen  wrote:
>> Hi all,
>>
>> Per the announcement from CoreOS about Rocket (
>> https://coreos.com/blog/rocket/) , it seems to be an exciting
>> containerizer runtime that has composable isolation/components, better
>> security and image specification/distribution.
>>
>> All of these design goals also fits very well into Mesos, where in Mesos
>> we also have a pluggable isolators model and have been experiencing some
>> pain points with our existing containerizers around image distribution and
>> security as well.
>>
>> I'd like to propose to integrate Rocket into Mesos with a new Rocket
>> containerizer, where I can see we can potentially integrate our existing
>> isolators into Rocket runtime.
>>
>> Like to learn what you all think,
>>
>> Thanks!
>>

Re: MESOS-2150: Service discovery info for tasks and executors

2014-11-22 Thread Tom Arnfeld
Hi Christos,


This is absolutely fantastic to see. The design you've proposed here is 
something a few members of our team have been discussing, and we were very 
close to putting together a design doc to share with the community; it's great 
you've done this!




Regarding the design, for us you've hit the nail on the head. If you mix this 
with the ability to plug in custom modules that respond to hooks, Mesos can 
seamlessly integrate with any service discovery system as desired. Our original 
idea was to implement some protobufs like you've done here, but use our 
external containerizer to achieve the announcements. A little hacky though.





One thing I would say though, restricting the "levels" or "namespaces" for the 
services would be far from preferable. By this I mean specifying named 
parameters for "environment" and "location". Different infrastructures have 
multiple levels of grouping, for example...




- /ec2/{eu-west}/{1}/{my-vpc}/{my-az/my-subnet}/{my-cluster}/{my-service}

- /{london}/{my-subnet}/{my-cluster}/{my-service}




Perhaps an ordered list of environment or location labels might allow users to 
be more flexible?





Also, it strikes me that a lot of this information is actually quite localised 
to the slave itself, and therefore the cluster. For example, you could probably 
tag the slave with a good chunk of the default discovery properties 
("ec2/{eu-west}/1/{my-vpc}/{my-az/my-subnet}/{my-cluster}") without the 
framework having to provide them. This would be especially useful when running 
slaves across multiple geographic locations or sites, where you want that 
information to be exposed in the discovery system but not to the framework.





I think it'd be great for frameworks to be provided with as little information 
as possible as that will help reduce the barrier for adding new frameworks to a 
cluster.




Tom.





--


Tom Arnfeld

Developer // DueDil





(+44) 7525940046

25 Christopher Street, London, EC2A 2BS

On Sat, Nov 22, 2014 at 2:25 AM, Christos Kozyrakis
 wrote:

> Hi everybody,
> I have created MESOS-2150
> <https://issues.apache.org/jira/browse/MESOS-2150> about
> adding service discovery info to tasks and executors at Mesos. A proposed
> design doc is available here
> <https://docs.google.com/document/d/1tpnjfHsa5Joka23CqgGppqnK0jODcElBvTFUBBO-A38/edit?usp=sharing>.
> The motivation is the following:
> --
> Mesos enables flexible deployment of tasks in a shared cluster. A task may
> run on any slave and even move between slaves based on resource
> availability, framework shares, slave failures, and other constraints. To
> make the most of this flexibility, we need an automatic way to discover
> where tasks are and how to connect to the services they provide. To address
> this need, a number of service discovery systems have been proposed, based
> on proxies, DNS, or consistent stores such as Zookeeper and etcd.
> Any service discovery system needs to draw information about currently
> running tasks, their location, and their configuration parameters (IP
> address, ports, etc). In a Mesos cluster with multiple frameworks, the only
> authoritative source of information about running tasks is the master
> itself. In order to automatically manage the service discovery system, the
> task information available in the master should include service discovery
> preferences and parameters.
> If service discovery information is not available in the Mesos master,
> Mesos users will likely have to build auxiliary systems that gather and
> serve this information. Keeping the information in such auxiliary systems
> consistent with the task information in the master will only cause
> complications in the long term. It is best to store service discovery
> information along with all the other task information in the Mesos master.
> --
> Looking forward to your feedback,
> Christos
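
For readers, a hedged sketch of what attaching such discovery info to a task
could look like, modeled on the DiscoveryInfo message that eventually landed
(field names are assumptions drawn from the design doc, not guaranteed by this
thread; values are placeholders):

    #include <mesos/mesos.hpp>

    void addDiscovery(mesos::TaskInfo* task)
    {
      mesos::DiscoveryInfo* discovery = task->mutable_discovery();
      discovery->set_visibility(mesos::DiscoveryInfo::EXTERNAL);
      discovery->set_name("my-service");
      discovery->set_environment("prod");   // one of the fixed "levels"
      discovery->set_location("eu-west-1"); // Tom suggests generalizing
      discovery->set_version("1.0");

      mesos::Port* port = discovery->mutable_ports()->add_ports();
      port->set_number(8080);
      port->set_name("http");
      port->set_protocol("tcp");
    }

Tom's suggestion above would replace the fixed environment/location fields
with an ordered list of labels, so each site can define its own hierarchy.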

Troubles compiling Mesos 0.21.0 with G++-4.7

2014-11-21 Thread Tom Arnfeld
Hey,


I've just tried to get Mesos 0.21.0 compiled and up and running for a master 
node and am running into compiler errors around g++ 4.7.


The configure phase is not passing with the following output - 
https://gist.github.com/tarnfeld/de177c0948e82cc2b6bf.


I've been compiling mesos 0.19.1 fine with this environment, so I was wondering 
if anyone would be able to detail the changes (if any) that have been made, and 
the new requirements. I'm hesitant to upgrade/downgrade GCC/G++ if that's not 
the issue.


It might also be useful to document these environment compatibility changes in 
the release notes or on the blog to make life simpler for mesos users.


Cheers,


Tom.

--

Tom Arnfeld
Developer // DueDil


(+44) 7525940046
25 Christopher Street, London, EC2A 2BS

Re: [proposal] Module extension: hooks

2014-11-21 Thread Tom Arnfeld
This all sounds really great, and opens up some interesting opportunities for 
automated service discovery (well, the announcement side) for a cluster, which 
is what we've been looking into for a while.




Correct me if I'm wrong, but would it be possible to make use of the master log 
to achieve an event stream? I'm not entirely sure what's stored in the shared 
master transaction log, but I assume some state about tasks etc.? If there were 
to be a stream of events, it'd be great to support rewinding and replaying for 
some period of time, to better allow for HA stream consumers.




Either way, hooks would be a welcomed feature for us!


--


Tom Arnfeld

Developer // DueDil





(+44) 7525940046

25 Christopher Street, London, EC2A 2BS

On Fri, Nov 21, 2014 at 6:44 AM, Vinod Kone  wrote:

> Good points Ben.
> Also, I've been recently thinking about an events endpoint (not to be confused
> with the Event/Call API) that could stream all kinds of events happening in
> the cluster (master events, allocator events, gc events, slave events,
> containerizer events etc). In fact this could probably be exposed by
> libprocess very easily. I was mainly thinking about this in terms of
> auditing. Having such an endpoint would allow external tooling to "hook"
> into that endpoint and consume the event stream. The tooling could then
> perform arbitrary actions *without interfering* with mesos control flow. I
> think such an architecture would be powerful because it is generic and
> non-invasive. Have you considered that approach?
> On Thu, Nov 20, 2014 at 10:24 PM, Benjamin Mahler > wrote:
>> Thanks for sending this Nik!
>>
>> The general idea of hooks sounds good. I think the question for hooks is
>> about which extensibility points make sense, and I think we'll have to
>> assess that with the introduction of each hook.
>>
>> (1) Is the idea behind hooks about actions, as you initially mentioned? Or
>> is it about data transformation, which is what is shown in the API example?
>> Or both?
>>
>> (2) Is external tooling meant to describe hooks? Or is it meant to describe
>> external tools that can leverage the hooks? This part is a bit fuzzy to me.
>>
>> (3) Is instrumentation meant to allow us to gain visibility into things
>> like performance? If so, hooks might not be the most maintainable approach
>> for that. Ideally we could add instrumentation into libprocess. Are there
>> other forms of instrumentation in mind?
>>
>> Let's take the hook example you showed:
>>
>>  // Performs an action and/or transforms the TaskInfo.
>>  virtual TaskInfo preMasterLaunchTask(const TaskInfo& task) = 0;
>>  virtual TaskInfo postMasterLaunchTask(const TaskInfo& task) = 0;
>>  virtual TaskInfo preSlaveLaunchTask(const TaskInfo& task) = 0;
>>  virtual TaskInfo postSlaveLaunchTask(const TaskInfo& task) = 0;
>>
>> Comment mine. This interface suggests synchronous transformation of
>> TaskInfo objects:
>>
>> (A) A transformation of TaskInfo seems a bit surprising to me, how can one
>> do this generically? Is the idea that this would be customized per
>> framework within the hook? How would one differentiate the frameworks? Via
>> role? This part seems fuzzy to me.
>>
>> (B) I assume this also means that there is a side-effect inducing "action"
>> that is performed, in addition to the transformation. I wouldn't be able to
>> do any expensive or asynchronous work through these, unless we made them
>> return Futures. At which point, we would need some additional semantics
>> (e.g. ordering), and we'd be adding complexity to the Master.
>>
>> (C) What differentiates pre and post in this case? Sending the message?
>> Let's consider that these are responsible for performing "actions". Then
>> differentiating pre and post seems a bit arbitrary, since the sending of a
>> message is asynchronous. This means that the "action" occurs after the
>> message has been handed to libprocess, but not before it is sent to the
>> socket, not before it is sent over the wire, not before it is received by
>> the slave, etc. Seems like an odd distinction, no?
>>
>> Looking forward to hearing more, thanks Nik!
>>
>> FYI I'm about to go on vacation, so I'm going to be slow at email. :)
>>
>> On Thu, Nov 20, 2014 at 10:07 AM, Dominic Hamon 
>> wrote:
>>
>> > Do you have specific use cases in mind? Ie, specific actions that might
>> > take place pre and post launch?
>> >
>> > On Thu, Nov 20, 2014 at 9:37 AM, Niklas Nielsen 
>> > wrote:
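
For a sense of what a hook could look like in practice, here is a hedged
sketch of a label-decorating hook, loosely modeled on the Hook interface that
later landed in mesos/hook.hpp (the proposal's final signatures may differ):

    #include <mesos/hook.hpp>
    #include <mesos/mesos.hpp>

    #include <stout/result.hpp>

    // Hypothetical module: stamps every launched task with an audit label.
    class AuditHook : public mesos::Hook
    {
    public:
      virtual Result<mesos::Labels> masterLaunchTaskLabelDecorator(
          const mesos::TaskInfo& taskInfo,
          const mesos::FrameworkInfo& frameworkInfo,
          const mesos::SlaveInfo& slaveInfo)
      {
        mesos::Labels labels = taskInfo.labels();
        mesos::Label* label = labels.add_labels();
        label->set_key("audit.framework");
        label->set_value(frameworkInfo.name());
        return labels;  // replaces the task's labels before launch
      }
    };

A decorator like this is synchronous and data-only, which sidesteps BenM's
questions (B) and (C) about side effects and pre/post ordering.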

Re: [VOTE] Release Apache Mesos 0.21.0 (rc1)

2014-11-06 Thread Tom Arnfeld
+1




`make check` passed on Ubuntu 12.04 LTS (kernel 3.2.0-67)


--


Tom Arnfeld

Developer // DueDil





(+44) 7525940046

25 Christopher Street, London, EC2A 2BS

On Thu, Nov 6, 2014 at 8:43 PM, Ian Downes 
wrote:

> Apologies: I used support/tag.sh but had a local branch *and* local tag and
> it pushed the branch only.
> $ git ls-remote --tags origin-wip | grep 0.21.0
> a7733493dc9e6f2447f825671d8a745602c9bf7a refs/tags/0.21.0-rc1
> On Thu, Nov 6, 2014 at 8:11 AM, Tim St Clair  wrote:
>> $ git tag -l | grep 21
>>
>> $ git branch -r
>>   origin/0.21.0-rc1
>>
>> It looks like you created a branch vs. tag ...?
>>
>> Cheers,
>> Tim
>>
>> - Original Message -
>> > From: "Ian Downes" 
>> > To: dev@mesos.apache.org, u...@mesos.apache.org
>> > Sent: Wednesday, November 5, 2014 5:12:52 PM
>> > Subject: [VOTE] Release Apache Mesos 0.21.0 (rc1)
>> >
>> > Hi all,
>> >
>> > Please vote on releasing the following candidate as Apache Mesos 0.21.0.
>> >
>> >
>> > 0.21.0 includes the following:
>> >
>> 
>> > State reconciliation for frameworks
>> > Support for Mesos modules
>> > Task status now includes source and reason
>> > A shared filesystem isolator
>> > A pid namespace isolator
>> >
>> > The CHANGELOG for the release is available at:
>> >
>> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob_plain;f=CHANGELOG;hb=0.21.0-rc1
>> >
>> 
>> >
>> > The candidate for Mesos 0.21.0 release is available at:
>> >
>> https://dist.apache.org/repos/dist/dev/mesos/0.21.0-rc1/mesos-0.21.0.tar.gz
>> >
>> > The tag to be voted on is 0.21.0-rc1:
>> >
>> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=commit;h=0.21.0-rc1
>> >
>> > The MD5 checksum of the tarball can be found at:
>> >
>> https://dist.apache.org/repos/dist/dev/mesos/0.21.0-rc1/mesos-0.21.0.tar.gz.md5
>> >
>> > The signature of the tarball can be found at:
>> >
>> https://dist.apache.org/repos/dist/dev/mesos/0.21.0-rc1/mesos-0.21.0.tar.gz.asc
>> >
>> > The PGP key used to sign the release is here:
>> > https://dist.apache.org/repos/dist/release/mesos/KEYS
>> >
>> > The JAR is up in Maven in a staging repository here:
>> > https://repository.apache.org/content/repositories/orgapachemesos-1038
>> >
>> > Please vote on releasing this package as Apache Mesos 0.21.0!
>> >
>> > The vote is open until Sat Nov  8 15:09:48 PST 2014 and passes if a
>> > majority of at least 3 +1 PMC votes are cast.
>> >
>> > [ ] +1 Release this package as Apache Mesos 0.21.0
>> > [ ] -1 Do not release this package because ...
>> >
>> > Thanks,
>> >
>> > Ian Downes
>> >
>>
>> --
>> Cheers,
>> Timothy St. Clair
>> Red Hat Inc.
>>

Re: Review Request 22169: Added External Containerizer documentation.

2014-11-05 Thread Tom Arnfeld

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22169/#review59951
---

Ship it!


LGTM! Yeah, relative URLs would be a good idea.

- Tom Arnfeld


On Sept. 5, 2014, 8:48 a.m., Till Toenshoff wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/22169/
> ---
> 
> (Updated Sept. 5, 2014, 8:48 a.m.)
> 
> 
> Review request for mesos and Tom Arnfeld.
> 
> 
> Repository: mesos-git
> 
> 
> Description
> ---
> 
> Adds a markdown document describing the ExternalContainerizer.
> 
> 
> Diffs
> -
> 
>   docs/external-containerizer.md PRE-CREATION 
>   docs/home.md 179a164 
>   docs/images/ec_kill_seqdiag.png PRE-CREATION 
>   docs/images/ec_launch_seqdiag.png PRE-CREATION 
>   docs/images/ec_lifecycle_seqdiag.png PRE-CREATION 
>   docs/images/ec_orphan_seqdiag.png PRE-CREATION 
>   docs/images/ec_recover_seqdiag.png PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/22169/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Till Toenshoff
> 
>



Re: Review Request 27516: Rebased and re-edited patch for MESOS-1316: "Abstracted out invoking 'mesos-fetcher'".

2014-11-05 Thread Tom Arnfeld

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27516/#review59950
---

Ship it!


- Tom Arnfeld


On Nov. 3, 2014, 4:36 p.m., Bernd Mathiske wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/27516/
> ---
> 
> (Updated Nov. 3, 2014, 4:36 p.m.)
> 
> 
> Review request for mesos and Benjamin Hindman.
> 
> 
> Bugs: MESOS-1316
> https://issues.apache.org/jira/browse/MESOS-1316
> 
> 
> Repository: mesos-git
> 
> 
> Description
> ---
> 
> Manually rebasing and re-editing https://reviews.apache.org/r/21233/, which 
> is supposed to be replaced now by this patch. 
> 
> Original description: "To test the mesos-fetcher (and the setting of the 
> environment) more cleanly I did some refactoring into a 'fetcher' namespace."
> 
> Also moved fetcher environment tests to fetcher test file. Added two fetcher 
> tests.
> 
> 
> Diffs
> -
> 
>   src/Makefile.am e6a07150c10b9fa040143e394b2f913a18eeebc1 
>   src/launcher/fetcher.cpp 9323c28237010fa065ef34d74435c151ded530a8 
>   src/slave/containerizer/fetcher.hpp PRE-CREATION 
>   src/slave/containerizer/fetcher.cpp PRE-CREATION 
>   src/slave/containerizer/mesos/containerizer.cpp 
> d4b08f54d6feb453f3a9d27ca54c867176e62102 
>   src/tests/containerizer_tests.cpp 2c90d2fc18a3268c55b6dfe98699bfb36d093983 
>   src/tests/fetcher_tests.cpp d7754009a59fedb43e3422c56b3a786ce80164aa 
> 
> Diff: https://reviews.apache.org/r/27516/diff/
> 
> 
> Testing
> ---
> 
> make check on Mac OS 10.10 and Ubuntu 14.4.
> 
> In total, 3 tests fail: ExamplesTest.NoExecutorFramework, 
> ExamplesTest.JavaFramework
> , ExamplesTest.PythonFramework. It is strongly suspected that those are 
> unrelated to this code change and just generally flaky.
> 
> 
> Thanks,
> 
> Bernd Mathiske
> 
>



Re: Review Request 21277: Passed CommandInfo to mesos-fetcher as JSON.

2014-11-05 Thread Tom Arnfeld

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/21277/#review59949
---



src/launcher/fetcher.cpp
<https://reviews.apache.org/r/21277/#comment101260>

Do you think it would be OK if we could keep support for the 
MESOS_EXECUTOR_URIS environment variable for now, at least for the next release 
or two?

Mainly because the mesos-fetcher tool is actually being used by other 
things outside of the mesos slave itself, e.g. external containerizers.

Any thoughts?


- Tom Arnfeld


On May 9, 2014, 7:05 p.m., Benjamin Hindman wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/21277/
> ---
> 
> (Updated May 9, 2014, 7:05 p.m.)
> 
> 
> Review request for mesos, Ben Mahler, Dominic Hamon, and Tom Arnfeld.
> 
> 
> Bugs: MESOS-1248
> https://issues.apache.org/jira/browse/MESOS-1248
> 
> 
> Repository: mesos-git
> 
> 
> Description
> ---
> 
> See summary (and bug).
> 
> 
> Diffs
> -
> 
>   src/launcher/fetcher.cpp 8c9e20da8f39eb5e90403a5093cbea7fb2680468 
>   src/slave/fetcher.hpp PRE-CREATION 
>   src/tests/fetcher_tests.cpp PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/21277/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Benjamin Hindman
> 
>
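
To illustrate the compatibility concern (a sketch using plain std::getenv, not
Mesos internals): external tools that shell out to mesos-fetcher currently
read the URI list from the environment, so dropping the variable outright
would break them until they migrate to the JSON hand-off:

    #include <cstdlib>
    #include <iostream>

    int main()
    {
      // Legacy hand-off set by the slave for mesos-fetcher.
      const char* uris = std::getenv("MESOS_EXECUTOR_URIS");

      if (uris == nullptr) {
        std::cerr << "MESOS_EXECUTOR_URIS not set; "
                  << "expecting the new JSON CommandInfo instead" << std::endl;
        return 1;
      }

      std::cout << "Fetching: " << uris << std::endl;
      return 0;
    }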



Re: Cutting 0.21.0.

2014-11-01 Thread Tom Arnfeld
Has there been any further discussion on getting a release candidate for
0.21.0 cut?

On 22 October 2014 22:12, Ian Downes  wrote:

> Please note that we're targeting to cut this release next Wednesday, 29
> October.
>
> Ian
>
> On Wed, Oct 22, 2014 at 1:33 PM, Vinod Kone  wrote:
>
> > Can everyone who has ticket(s) that they absolutely want to get in 0.21.0
> > please mark them with "target version" as 0.21.0? That will make it easy
> > to track the blockers.
> >
> > On Wed, Oct 22, 2014 at 11:44 AM, Adam Bordelon 
> > wrote:
> >
> > > I'd also like to see more of the modules work land in 0.21, especially
> > > the Authenticator module (MESOS-1889).
> > > I expect it to land in less than a week, but I don't know what your
> > > timeframe is for 0.21.
> > >
> > > On Wed, Oct 22, 2014 at 11:22 AM, Ian Downes
> > > wrote:
> > >
> > > > Can someone please volunteer to shepherd this work and comment on the
> > > > state of the review?
> > > >
> > > > On Tue, Oct 21, 2014 at 9:01 PM, R.B. Boyer 
> > > > wrote:
> > > >
> > > > > Can someone see if MESOS-1873 is suitable for
> > > > > 0.21.0?
> > > > >
> > > > > The patch is super simple and fixes a
> > > > > showstopper bug in the command executor.
> > > > >
> > > > > On Tue, Oct 21, 2014 at 10:52 PM, Benjamin Hindman <
> > > > b...@eecs.berkeley.edu
> > > > > >
> > > > > wrote:
> > > > >
> > > > > > Awesome, thanks Ben/Ian!
> > > > > >
> > > > > We've got some Docker updates that we want to land in 0.21.0. My
> > > > > estimate is it will land sometime this week, or early next week.
> > > > > >
> > > > > > On Tue, Oct 21, 2014 at 6:51 PM, Benjamin Mahler <
> > > > > > benjamin.mah...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi all,
> > > > > > >
> > > > > > We would like to cut 0.21.0 very soon to release the task
> > > > > > reconciliation work that has been completed recently. I spoke with
> > > > > > Ian Downes who was willing to be the release manager.
> > > > > > >
> > > > > > I will let him reply here with a ticket and with other features
> > > > > > that have made it in the 0.21.0 release.
> > > > > > >
> > > > > > Please reply to this thread if you have anything that you think
> > > > > > needs to land in 0.21.0!
> > > > > > >
> > > > > > > Ben
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>


Re: scheduler.killExecutor()

2014-10-01 Thread Tom Arnfeld
That's true: if the scheduler waits until the control task is RUNNING before 
doing anything else, this problem goes away. There's also then no need to rely 
on the order in which tasks are launched on the executor.



Thanks everyone!

On Tue, Sep 30, 2014 at 5:51 PM, Benjamin Mahler
 wrote:

> Why can't the executor just commit suicide if all running tasks are killed?
> If you're simultaneously launching two tasks for each executor, you'll only 
> see this race if you kill very quickly after launching. Your scheduler is 
> informed when both tasks are running as well, so that could gate the flexing 
> down.
> Anything I'm missing?
> Sent from my iPhone
>> On Sep 30, 2014, at 12:42 AM, Tom Arnfeld  wrote:
>> 
>> Thanks Vinod. I missed that issue when searching!
>> 
>> 
>> I did consider sending a shutdown task, though my worry was that there may 
>> be cases where the task might not launch. Perhaps due to resource starvation 
>> and/or no offers being received. Presumably it would not be correct to store 
>> the original OfferId and launch a new task from that offer, as it *could* be 
>> days old.
>> 
>>> On Tue, Sep 30, 2014 at 2:10 AM, Vinod Kone  wrote:
>>> 
>>> Adding a shutdownExecutor() driver call has been discussed before.
>>> https://issues.apache.org/jira/browse/MESOS-330
>>> As a work around, have you considered sending a special "kill" task as a
>>> signal to the executor to commit suicide?
>>>> On Mon, Sep 29, 2014 at 5:27 PM, Tom Arnfeld  wrote:
>>>> Hi,
>>>> 
>>>> I've been making some modifications to the Hadoop framework recently and
>>>> have come up against a brick wall. I'm wondering if the concept of killing
>>>> an executor from a framework has been discussed before?
>>>> 
>>>> Currently we are launching two tasks for each Hadoop TaskTracker, one that
>>>> has a bit of CPU and all the memory, and then another with the rest of the
>>>> CPU. In total this equals the amount of resources we want to give each
>>>> TaskTracker. This is *kind of* how spark works, ish.
>>>> 
>>>> The reason we do this is to be able to free up CPU resources and remove
>>>> slots from a TaskTracker (killing it half dead) but keeping the executor
>>>> alive. At some undefined point in the future we then want to kill the
>>>> executor, this happens by killing the other "control" task.
>>>> 
>>>> This approach doesn't work very well in practice as a result of
>>>> https://issues.apache.org/jira/browse/MESOS-1812 which means tasks are not
>>>> launched in order on the slave, so there is no way to guarantee the control
>>>> task comes up first, which leads to all sorts of interesting races.
>>>> 
>>>> Is this a bad road to go down? I can't use framework messages as I don't
>>>> believe those are a reliable way of sending signals, so not sure where else
>>>> to turn.
>>>> 
>>>> Cheers,
>>>> 
>>>> Tom.
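
A minimal sketch of BenM's suicide suggestion (liveTasks is a hypothetical set
the executor's launchTask callback would maintain; this is illustrative
bookkeeping, not code from the thread):

    #include <set>
    #include <string>

    #include <mesos/executor.hpp>

    // Called from the executor's killTask() callback: acknowledge the kill,
    // then stop the driver once no tasks remain so the executor exits.
    void onKillTask(mesos::ExecutorDriver* driver,
                    const mesos::TaskID& taskId,
                    std::set<std::string>& liveTasks)
    {
      mesos::TaskStatus status;
      status.mutable_task_id()->CopyFrom(taskId);
      status.set_state(mesos::TASK_KILLED);
      driver->sendStatusUpdate(status);

      liveTasks.erase(taskId.value());

      if (liveTasks.empty()) {
        driver->stop();  // all tasks gone: shut the executor down cleanly
      }
    }

Combined with the scheduler gating on the control task reaching RUNNING, this
removes the dependence on task launch ordering.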

Re: scheduler.killExecutor()

2014-09-30 Thread Tom Arnfeld
Thanks Vinod. I missed that issue when searching!


I did consider sending a shutdown task, though my worry was that there may be 
cases where the task might not launch. Perhaps due to resource starvation 
and/or no offers being received. Presumably it would not be correct to store 
the original OfferId and launch a new task from that offer, as it *could* be 
days old.

On Tue, Sep 30, 2014 at 2:10 AM, Vinod Kone  wrote:

> Adding a shutdownExecutor() driver call has been discussed before.
> https://issues.apache.org/jira/browse/MESOS-330
> As a work around, have you considered sending a special "kill" task as a
> signal to the executor to commit suicide?
> On Mon, Sep 29, 2014 at 5:27 PM, Tom Arnfeld  wrote:
>> Hi,
>>
>> I've been making some modifications to the Hadoop framework recently and
>> have come up against a brick wall. I'm wondering if the concept of killing
>> an executor from a framework has been discussed before?
>>
>> Currently we are launching two tasks for each Hadoop TaskTracker, one that
>> has a bit of CPU and all the memory, and then another with the rest of the
>> CPU. In total this equals the amount of resources we want to give each
>> TaskTracker. This is *kind of* how spark works, ish.
>>
>> The reason we do this is to be able to free up CPU resources and remove
>> slots from a TaskTracker (killing it half dead) but keeping the executor
>> alive. At some undefined point in the future we then want to kill the
>> executor, this happens by killing the other "control" task.
>>
>> This approach doesn't work very well in practice as a result of
>> https://issues.apache.org/jira/browse/MESOS-1812 which means tasks are not
>> launched in order on the slave, so there is no way to guarantee the control
>> task comes up first, which leads to all sorts of interesting races.
>>
>> Is this a bad road to go down? I can't use framework messages as I don't
>> believe those are a reliable way of sending signals, so not sure where else
>> to turn.
>>
>> Cheers,
>>
>> Tom.
>>

scheduler.killExecutor()

2014-09-29 Thread Tom Arnfeld
Hi,

I've been making some modifications to the Hadoop framework recently and
have come up against a brick wall. I'm wondering if the concept of killing
an executor from a framework has been discussed before?

Currently we are launching two tasks for each Hadoop TaskTracker, one that
has a bit of CPU and all the memory, and then another with the rest of the
CPU. In total this equals the amount of resources we want to give each
TaskTracker. This is *kind of* how spark works, ish.

The reason we do this is to be able to free up CPU resources and remove
slots from a TaskTracker (killing it half dead) but keeping the executor
alive. At some undefined point in the future we then want to kill the
executor, this happens by killing the other "control" task.

This approach doesn't work very well in practice as a result of
https://issues.apache.org/jira/browse/MESOS-1812 which means tasks are not
launched in order on the slave, so there is no way to guarantee the control
task comes up first, which leads to all sorts of interesting races.

Is this a bad road to go down? I can't use framework messages as I don't
believe those are a reliable way of sending signals, so not sure where else
to turn.

Cheers,

Tom.


Re: [VOTE] Release Apache Mesos 0.20.1 (rc3)

2014-09-19 Thread Tom Arnfeld
+1 (non-binding)

Make check on Ubuntu 12.04 with gcc 4.6.3

On 19 September 2014 17:37, Tim Chen  wrote:

> +1 (non-binding)
>
> Make check on Centos 5.5, docker tests all passed too.
>
> Tim
>
> On Fri, Sep 19, 2014 at 9:17 AM, Jie Yu  wrote:
>
>> +1 (binding)
>>
>> Make check on centos5 and centos6 (gcc48)
>>
>> On Thu, Sep 18, 2014 at 4:05 PM, Adam Bordelon 
>> wrote:
>>
>>> Hi all,
>>>
>>> Please vote on releasing the following candidate as Apache Mesos 0.20.1.
>>>
>>>
>>> 0.20.1 includes the following:
>>>
>>> 
>>> Minor bug fixes for docker integration, network isolation, build, etc.
>>>
>>> The CHANGELOG for the release is available at:
>>>
>>> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob_plain;f=CHANGELOG;hb=0.20.1-rc3
>>>
>>> 
>>>
>>> The candidate for Mesos 0.20.1 release is available at:
>>>
>>> https://dist.apache.org/repos/dist/dev/mesos/0.20.1-rc3/mesos-0.20.1.tar.gz
>>>
>>> The tag to be voted on is 0.20.1-rc3:
>>> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=commit;h=0.20.1-rc3
>>>
>>> The MD5 checksum of the tarball can be found at:
>>>
>>> https://dist.apache.org/repos/dist/dev/mesos/0.20.1-rc3/mesos-0.20.1.tar.gz.md5
>>>
>>> The signature of the tarball can be found at:
>>>
>>> https://dist.apache.org/repos/dist/dev/mesos/0.20.1-rc3/mesos-0.20.1.tar.gz.asc
>>>
>>> The PGP key used to sign the release is here:
>>> https://dist.apache.org/repos/dist/release/mesos/KEYS
>>>
>>> The JAR is up in Maven in a staging repository here:
>>> https://repository.apache.org/content/repositories/orgapachemesos-1036
>>>
>>> Please vote on releasing this package as Apache Mesos 0.20.1!
>>>
>>> The vote is open until Mon Sep 22 17:00:00 PDT 2014 and passes if a
>>> majority of at least 3 +1 PMC votes are cast.
>>>
>>> [ ] +1 Release this package as Apache Mesos 0.20.1
>>> [ ] -1 Do not release this package because ...
>>>
>>> Thanks,
>>> Adam and Bhuvan
>>>
>>
>>
>


Hadoop on Mesos

2014-09-16 Thread Tom Arnfeld
Hey everyone,

I've been working on a potential extension to Hadoop on Mesos which allows
the framework to potentially release allocated (but idle) TaskTracker slots.
This frees resources Hadoop has been allocated but is not using, increasing
overall cluster utilisation when multiple frameworks are involved.

This scenario most commonly appears when you have a large job with an
expensive reduce phase. While the reducers are running the map slots are
completely idle, and therefore are unable to be offered to other frameworks
that could make use of the resources.

However, there are various intricacies of doing this, and it's quite hacky,
largely because it requires we change the number of available slots on a
TaskTracker while it's running.

I'd be interested to hear anyones thoughts on the idea... Especially those
that worked on the hadoop framework early on!

Pull request is here https://github.com/mesos/hadoop/pull/33 and another
issue related to this solution can be found here
https://github.com/mesos/hadoop/issues/32.

Cheers,

Tom.


Re: Dynamic Resource Roles

2014-09-13 Thread Tom Arnfeld
Awesome! That's great to hear. Let me know if there's anything I can help
with.

I can't seem to find a JIRA issue (came across this
https://issues.apache.org/jira/browse/MESOS-505 but seems very old) for it.
So I've made https://issues.apache.org/jira/browse/MESOS-1791.

On 10 September 2014 18:03, Adam Bordelon  wrote:

> BenH has been calling these "master reservations" (globally control
> reservations across all slaves through the master) and "offer reservations"
> (I don't care which nodes it's on, as long as I get X cpu and Y RAM, or Z
> sets of {X,Y}), and they're definitely on the roadmap.
>
> On Wed, Sep 10, 2014 at 9:05 AM, Tom Arnfeld  wrote:
>
> > That's very cool, thanks.
> >
> > On Wed, Sep 10, 2014 at 4:59 PM, Timothy Chen  wrote:
> >
> > > Hi Tom,
> > > Reservations is definitely something we've discussed and will be
> > addressed in the near future.
> > > Tim
> > >> On Sep 10, 2014, at 7:49 AM, Tom Arnfeld  wrote:
> > >>
> > >> Hey everyone,
> > >>
> > >> Just a quick question. Has there ever been any discussion around dynamic
> > >> roles?
> > >>
> > >> What I mean by this – currently if I want to guarantee 1 core and 10
> GB
> > of
> > >> ram to a specific type of framework (or "role") I need to do this at a
> > >> slave level. This means if I only want to guarantee a small number of
> > >> resources, I could do this on one slave. If that slave dies, that
> > resource
> > >> is no longer available.
> > >>
> > >> It would be interesting to see the master (DRF scheduler) capable of
> > >> reserving a minimum amount of resources for offering only to frameworks
> > of a
> > >> certain role, such that I can guarantee R amount of resources on N
> > slaves
> > >> across the cluster as a whole.
> > >>
> > >> Tom.
> >
>


Re: Dynamic Resource Roles

2014-09-10 Thread Tom Arnfeld
That's very cool, thanks.

On Wed, Sep 10, 2014 at 4:59 PM, Timothy Chen  wrote:

> Hi Tom,
> Reservations is definitely something we've discussed and will be addressed in 
> the near future.
> Tim
>> On Sep 10, 2014, at 7:49 AM, Tom Arnfeld  wrote:
>> 
>> Hey everyone,
>> 
>> Just a quick question. Has there ever been any discussion around dynamic
>> roles?
>> 
>> What I mean by this – currently if I want to guarantee 1 core and 10 GB of
>> ram to a specific type of framework (or "role") I need to do this at a
>> slave level. This means if I only want to guarantee a small number of
>> resources, I could do this on one slave. If that slave dies, that resource
>> is no longer available.
>> 
>> It would be interesting to see the master (DRF scheduler) capable of
>> reserving a minimum amount of resources for offering only to frameworks of a
>> certain role, such that I can guarantee R amount of resources on N slaves
>> across the cluster as a whole.
>> 
>> Tom.

Dynamic Resource Roles

2014-09-10 Thread Tom Arnfeld
Hey everyone,

Just a quick question. Has there ever been any discussion around dynamic
roles?

What I mean by this – currently if I want to guarantee 1 core and 10 GB of
ram to a specific type of framework (or "role") I need to do this at a
slave level. This means if I only want to guarantee a small number of
resources, I could do this on one slave. If that slave dies, that resource
is no longer available.
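
A sketch of that slave-level mechanism, assuming the role-qualified
`--resources` syntax for static reservations (values are illustrative):

    mesos-slave --master=zk://... \
                --resources="cpus(hadoop):1;mem(hadoop):10240"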

It would be interesting to see the master (DRF scheduler) capable of
reserving a minimum amount of resources for offering only to frameworks of a
certain role, such that I can guarantee R amount of resources on N slaves
across the cluster as a whole.

Tom.


Re: Review Request 25369: Add Dockerfile for building Mesos from source

2014-09-06 Thread Tom Arnfeld

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25369/#review52551
---



Dockerfile
<https://reviews.apache.org/r/25369/#comment91365>

You can avoid these `cd /opt` commands by defining a `WORKDIR /opt` 
instruction before this line; every subsequent container will pick that up, so 
you can reduce a bit of code duplication :)
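
For illustration (a hypothetical excerpt, not the exact lines under review):

    # Before:
    RUN cd /opt && git clone https://github.com/apache/mesos.git

    # After: WORKDIR applies to every subsequent instruction
    WORKDIR /opt
    RUN git clone https://github.com/apache/mesos.git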


- Tom Arnfeld


On Sept. 5, 2014, 12:50 a.m., Gabriel Monroy wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/25369/
> ---
> 
> (Updated Sept. 5, 2014, 12:50 a.m.)
> 
> 
> Review request for mesos and Timothy Chen.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Add Dockerfile for building Mesos from source
> 
> 
> Diffs
> -
> 
>   Dockerfile PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/25369/diff/
> 
> 
> Testing
> ---
> 
> Create snapshot builds with: 
> docker build -t mesos/mesos:git-`git rev-parse --short HEAD` .
> 
> Run master/slave with:
> docker run mesos/mesos:git-`git rev-parse --short HEAD` mesos-master [options]
> docker run mesos/mesos:git-`git rev-parse --short HEAD` mesos-slave [options]
> 
> Some Docker layer gymnastics are performed during the final layer to remove 
> build dependencies and reduce image size from ~5GB to < 2GB.
> 
> 
> Thanks,
> 
> Gabriel Monroy
> 
>



Re: Docker Containerization

2014-09-05 Thread Tom Arnfeld
Hey,

Thank you both for taking the time to respond. I wanted to be very
expressive and give context to the points I was making, hopefully it wasn't
too ranty. Ben, I've made some inline comments below.

I've had some chats with Ian Downes about doing something
> like this for the MesosContainerizer too (but using a tarball/zip or path
> that gets read-only bind-mounted into the container instead of a Docker
> image). If we don't already have a JIRA issue for this specific feature, we
> should add one so folks can follow along.


This is very interesting, and I can definitely see how that would be useful.

We'd really like to
> consolidate our own internal containerizers/isolation code with
> libcontainer. That consolidation will likely be a long process, but it's
> well worth it in the long run IMHO.


I completely agree, though it's probably quite painful to change. I can
imagine Mesos making use of the same constructs as docker under the hood,
and as you mentioned, completely sidestepping the need to interact with the
docker daemon in some cases.

As you mention below,
> this is a huge asset to Mesos and something I'd personally like to make
> sure continues to work well, as would Till Toenshoff I'm sure. If you have
> other suggestions for how to more generically support workflows we'd love
> to hear them!


Personally, from the discussions we've had here around what we plan to
implement in the future and what we've been working on already, the external
containerizer API does indeed fit very well with zero adjustments, plus it
lets us swap in our own logic where needed.

Can you elaborate on what you mean when you say "unity configuration"? If
> you're referring to command line flags, I'd love to hear your suggestions.
> If you're referring to broader support for sharing components between
> containerizers, I agree wholeheartedly. In fact, while the
> containerizer/isolator part of the code base has seen quite a bit of
> volatility in the last year I think it's due in for even more going
> forward. Ultimately I'd like to see us reusing our existing isolators with
> something like the Docker containerizer (and other containerizers) so that
> folks can benefit from the isolation work that's being done regardless of
> how the container gets initially created!


My mistake, that was meant to read "unify configuration". I think the
containerizers could go a long way to sharing more concepts, and as you
suggest, interact with each other regardless of how the container is
originally created (e.g. container usage stats and limits, perhaps).

I also feel the Docker Containerizer and the EC have diverged somewhat,
especially around the ContainerInfo protobufs. I feel the EC could benefit
from the more declarative options (i.e Volume's), though I do think it
is necessary to support an arbitrary set of configuration options
(key=value) that each specific containerizer could make use of, even if
just until the option made its way top level. It would also be powerful to
mix containerizers purely based on the ContainerInfo.TYPE field,
automatically. If each containerizer were able to accept or reject
responsibility based on a TaskInfo (kind of like how it works now, but more
explicit), you could create more heterogeneous slaves. I am aware Mesos
does some of these things, but I think there's some space to refine.

We're very much looking forward to subsequent iterations of this area of
Mesos, there's lots going on! :-)

Tom.


On 5 September 2014 19:49, Timothy Chen  wrote:

> Hi Tom,
>
> As Ben mentioned it's definitely doable to introduce a default image
> for docker. I was hesitant to use the existing default image flag in
> the earlier point of the docker development as we didn't have
> DockerInfo and was just reusing the ContainerInfo from the
> CommandInfo, which makes it hard to distinguish between different
> containerizers what it was meant for.
>
> Now with the new API it's definitely worth revisiting this and make
> sure we have a good solution that is maintainable and also works well
> with mulitple containerizers.
>
> I just created this, let's discuss more on the JIRA ticket:
> https://issues.apache.org/jira/browse/MESOS-1768
>
> Tim
>
> On Fri, Sep 5, 2014 at 9:50 AM, Benjamin Hindman 
> wrote:
> > I appreciate the thoughtful email Tom! There's a bunch there so I've just
> > made some inline comments addressing your final questions. ;-)
> >
> > - Can the docker containerizer support more friendly defaults? If I only
> >> want my mesos cluster to containerizer things with Docker, but don't
> wish
> >> to require every user specify an image for their tasks.
> >>
> >
> > Requiring a user to specify an image for their task/executor was indeed
> one
> > of the biggest simplifying design decision we made for this first
> iteration
> > of Docker in Mesos. There was some discussion on JIRA about this too.
> That
> > all being said, I don't see any major blockers to introducing the idea
> of a
> > "default

Docker Containerization

2014-09-05 Thread Tom Arnfeld
Hey everyone,

First off: this is a long email... so brace yourself. I appreciate your
time and patience.

I wanted to open up discussion around this, since I've spoken to several
people in isolation and feel it would be great to come to some kind of
resolution. My aim here is to try and improve the new Docker Containerizer
and help to support new use cases that are currently not catered for.

I get the impression Docker + Mesos use cases fall into two groups. There's
the group of users that wish to say "I want to run a docker container on S
slave with R resources", which in almost every case I have come across is
for long running services. The other group is more along the lines of, "I
want to use docker instead of cgroup isolation directly".

There are currently two mechanisms to achieve these things, the External
Containerizer (0.19.0) and the Docker Containerizer (0.20.0), both of which
allow you to run docker containers in one way or another.

- The external containerizer implementation inside mesos knows nothing of
Docker and any Docker specific responsibilities are handed out to another
subprocess (Deimos or alike).
- The docker containerizer is a native implementation of Docker built in to
mesos, and requires zero additional dependencies or configuration (except
for Docker itself).

Each containerizer allows a framework to send additional details with a
`TaskInfo` protobuf to configure the underlying Docker container that's
going to be launched.

- The external containerizer allows the user to specify an image to use
(`ContainerInfo.image`) if they choose
- The docker containerizer requires the user to also specify an image to
use (`ContainerInfo.DockerInfo.image`)

Now here's the difference...

- The external containerizer allows the user to specify a list of "args"
which currently equate to CLI arguments to the `docker run` invocation.
- The external containerizer will be responsible for managing isolation for
all executors.
- The external containerizer is still given the opportunity to act even if
no image or arguments for the container are provided.

- The docker containerizer has a set of explicit, structured options that
define the container, e.g. `ContainerInfo.volumes` and `Volume` protobufs,
sketched below (though there is a limited set of supported options, these
are growing over time)
- The docker containerizer will *only* isolate tasks that include a
`ContainerInfo.type == ContainerInfo.Type.DOCKER` and pass on anything else
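
For concreteness, a rough sketch of those structured options, assuming the
0.20.0 `mesos.interface` protobufs (the image name and paths are illustrative):

    from mesos.interface import mesos_pb2

    container = mesos_pb2.ContainerInfo()
    container.type = mesos_pb2.ContainerInfo.DOCKER
    container.docker.image = "redis:2.8"

    volume = container.volumes.add()
    volume.host_path = "/var/data/redis"
    volume.container_path = "/data"
    volume.mode = mesos_pb2.Volume.RW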

@tnachen/@benh (and others involved) have taken the approach to abstract
the container options to a subset of structured values, to separate
frameworks from the docker command line tool and the arguments it accepts.
Maybe in the future Mesos may use the Docker Remote API instead of a
subprocess to `docker run`, and this allows such a change to be
transparent. This is really great imo.

Back to the use cases...

> "I want to run a docker container on S slave with R resources"

This is covered really well with the docker containerizer. A framework can
choose which docker image is used, specify options in an abstract manner,
and Mesos is going to handle the rest. The user must specify an image to
run, but that's OK because the user knows they want to use a specific
image, and that they want to use Docker.

> "I want to user docker instead of cgroup isolation directly"

This use case isn't supported at all by the docker containerizer. From a
system administrators perspective, Docker can be a great tool to completely
isolate processes from one another and more importantly from the host. It
allows different dependencies to be installed and used for different
processes, and means host machines (mesos slaves) can be very pure. If the
person deploying Mesos is separated enough from those using it, the
administrator might not want to enforce that every user using any framework
*must* supply `ContainerInfo` details, especially if they're launching
tasks that require very common things (e.g. python27). These common tools
may or may not be installed on the host.

My motivation here is to reduce the amount users or framework developers
need to understand about how the Mesos cluster is put together (and the
specifics of what is going on behind the scenes) and let them focus on
trying out frameworks or building their own.

Essentially what this means is the ability to launch Mesos tasks and
executors inside docker containers transparently to the user. We've taken
this approach with the External Containerizer and our docker containerizer
implementation, and it has proved to work very well for just "getting started"
on our cluster. The system administrators can sleep peacefully knowing that
although anyone can run pretty much what they like on the cluster, slaves
aren't going to snowflake or become damaged. The
`--default_container_image` command line option supported by the
`mesos-slave` process was very useful here, but I can see how that explicit
approach can cause problems once you introduce the conce

Re: Review Request 25334: Fixed python egg proto imports.

2014-09-05 Thread Tom Arnfeld

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25334/#review52426
---

Ship it!


Ship It!

- Tom Arnfeld


On Sept. 5, 2014, 7:35 a.m., Till Toenshoff wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/25334/
> ---
> 
> (Updated Sept. 5, 2014, 7:35 a.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, Tom Arnfeld, Thomas Rampelberg, 
> and Vinod Kone.
> 
> 
> Bugs: MESOS-1750
> https://issues.apache.org/jira/browse/MESOS-1750
> 
> 
> Repository: mesos-git
> 
> 
> Description
> ---
> 
> > Fixes defective import introduced by the flat folder hierarchy of the 
> mesos.interface Python egg generation. 
> 
> 
> Diffs
> -
> 
>   src/Makefile.am 5526189 
> 
> Diff: https://reviews.apache.org/r/25334/diff/
> 
> 
> Testing
> ---
> 
> make check (OSX and linux)
> functional test by running build/src/examples/python/test-containerizer
> 
> 
> Thanks,
> 
> Till Toenshoff
> 
>



Re: Review Request 25270: Enable bridge network in Mesos

2014-09-04 Thread Tom Arnfeld


> On Sept. 3, 2014, 10:59 a.m., Tom Arnfeld wrote:
> > What's the reason for not also supporting the `port` resource type? For 
> > example, the Hadoop framework uses this 
> > https://github.com/mesos/hadoop/blob/master/src/main/java/org/apache/hadoop/mapred/ResourcePolicy.java#L458-L472.
> >  I noticed there's a TODO here 
> > https://github.com/apache/mesos/blob/master/src/docker/docker.cpp#L256 to 
> > support the ports resource.
> > 
> > I think it'd be nice to support both this kind of PortMapping behaviour for 
> > explicit mappings, as you've done in the DockerInfo protobuf, but if I just 
> > need any old port that's available to bind to a resource type would be 
> > useful.
> 
> Timothy Chen wrote:
> What do you think that will look like with docker run? I'm trying to 
> think how a allocated port will then be translated into a mapping to the 
> container/host.

So if you're using the port resource, you're basically securing a single port 
for use by your container and your container only, from the host. I imagine 
this being identical to the current external containerizer implementations, 
using `-p 1234:1234` to map the allocated port on the host to the same port 
inside the container.
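
For example, hypothetically, with an allocated port resource of 31002:

    docker run -p 31002:31002 ...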


- Tom


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25270/#review52158
---


On Sept. 4, 2014, 10:04 p.m., Timothy Chen wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/25270/
> ---
> 
> (Updated Sept. 4, 2014, 10:04 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, Jie Yu, and Timothy St. Clair.
> 
> 
> Bugs: MESOS-1621
> https://issues.apache.org/jira/browse/MESOS-1621
> 
> 
> Repository: mesos-git
> 
> 
> Description
> ---
> 
> Review: https://reviews.apache.org/r/25270
> 
> 
> Diffs
> -
> 
>   include/mesos/mesos.proto dea51f94d130c131421c43e7fd774ceb8941f501 
>   src/docker/docker.cpp af51ac9058382aede61b09e06e312ad2ce6de03e 
>   src/slave/slave.cpp bd31831022c97e68b0293d66e1eb5f28ac508525 
>   src/tests/docker_containerizer_tests.cpp 
> 8654f9c787bd207f6a7b821651e0c083bea9dc8a 
> 
> Diff: https://reviews.apache.org/r/25270/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Timothy Chen
> 
>



The mesos.interface python egg

2014-09-04 Thread Tom Arnfeld
Hey guys,

Has someone been changing things relating to the new *mesos.interface* python
egg on pypi? I don't seem to be able to install it anymore... which is
strange as it's listed as available on the pypi site.

Here's some output: https://gist.github.com/tarnfeld/dcf936eb247c7bd5d2d1

Cheers,

Tom.


Re: Review Request 25270: Enable bridge networking and port mapping for Docker

2014-09-03 Thread Tom Arnfeld

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25270/#review52158
---


What's the reason for not also supporting the `port` resource type? For 
example, the Hadoop framework uses this 
https://github.com/mesos/hadoop/blob/master/src/main/java/org/apache/hadoop/mapred/ResourcePolicy.java#L458-L472.
 I noticed there's a TODO here 
https://github.com/apache/mesos/blob/master/src/docker/docker.cpp#L256 to 
support the ports resource.

I think it'd be nice to support both this kind of PortMapping behaviour for 
explicit mappings, as you've done in the DockerInfo protobuf, but if I just 
need any old port that's available to bind to a resource type would be useful.

- Tom Arnfeld


On Sept. 2, 2014, 8:47 p.m., Timothy Chen wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/25270/
> ---
> 
> (Updated Sept. 2, 2014, 8:47 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Jie Yu.
> 
> 
> Repository: mesos-git
> 
> 
> Description
> ---
> 
> Enable bridge networking and port mapping for Docker
> 
> 
> Diffs
> -
> 
>   include/mesos/mesos.proto dea51f9 
>   src/docker/docker.cpp af51ac9 
>   src/slave/slave.cpp 5c76dd1 
>   src/tests/docker_containerizer_tests.cpp 8654f9c 
> 
> Diff: https://reviews.apache.org/r/25270/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Timothy Chen
> 
>



Re: Mesos 0.20.0 blog post

2014-09-03 Thread Tom Arnfeld
+1

On Wednesday, 3 September 2014, Benjamin Hindman 
wrote:

> LGTM!
>
> On Tuesday, September 2, 2014, Jie Yu >
> wrote:
>
> > Hi,
> >
> > I've drafted the blog post for the 0.20.0 release (the link below).
> Please
> > let me know if you have any suggestion or comments. The plan is to
> publish
> > it tomorrow.
> >
> >
> >
> https://docs.google.com/document/d/1fz9M96KX7BoA2hWUdUmFRCiJnSxdyYPn3Qnwhh1JBZw/edit?usp=sharing
> >
> > - Jie
> >
>


Re: Review Request 25237: Avoid Docker pull on each run

2014-09-01 Thread Tom Arnfeld


> On Sept. 1, 2014, 9:57 p.m., Tom Arnfeld wrote:
> > Couple of points, please correct me if I've misunderstood anything :)
> > 
> > Can you not just do a `docker run .. {image} ..` and let docker take care 
> > of pulling the image if needed? By default, docker will pull the image if 
> > one with the same registry/repo/tag combo doesn't exist.
> > 
> > The assumption here is that an image (comprised of {registry + repository + 
> > tag}) is never going to change. For example, the default tag used by docker 
> > is `latest`, which suggests to me that you can push new versions of your 
> > image to a registry, and update the `latest` tag to point to the new image. 
> > After this change in mesos, I would need to log in to every mesos slave 
> > that had ever downloaded this image, and run a `docker pull`.
> > 
> > The alternative is of course to use new tags for every new image (e.g. git 
> > hashes). Though this means I need to update every framework that has been 
> > configured with docker image names and change them to the new tag. I can 
> > see the appeal of this approach when thinking solely about service 
> > schedulers, because it could be problematic to control a rolling release if 
> > any new task will automatically run the new image (as it takes the latest 
> > image from the registry).
> > 
> > I've actually raised this issue several times with various people in the 
> > docker community and never managed to get a concrete answer other than just 
> > run `docker pull` every time (which is what we've been doing outside of 
> > mesos). I think the difference between these use cases needs to be given 
> > some serious thought, as it's caused us pain in various ways, hence why we 
> > ended up running `docker pull` before every task to avoid the problem.
> > 
> > A working example would be the redis repository 
> > (https://registry.hub.docker.com/v1/repositories/redis/tags), you'll see 
> > that the `latest` tag is pointing at version 2.8. This tag is updated every 
> > time a new image is published, and if I were to use the `latest` tag (or 
> > not specify a tag, since it's the default) I would need to either explicitly 
> > change my deployment of redis to use a strict version, or manually `docker 
> > pull` on all slaves and restart all the tasks using this container image.
> > 
> > It's also important to take into consideration long running frameworks like 
> > Hadoop on Mesos, if this change were to be merged, and to avoid logging 
> > into every slave and running `docker pull` we would need to restart the 
> > JobTracker and change the image to a newer (never previously used) tag. As 
> > opposed to new TaskTrackers automatically being launched inside the new 
> > image.
> > 
> > I guess a fair amount of this depends on what you're expecting to get from 
> > using Docker. Software deployment or just dependency management and 
> > isolation?
> > 
> > I'm not against running `docker inspect && docker pull` on every slave in 
> > the cluster, but I'd like the requirement to do that to be chosen. Perhaps 
> > you guys have already had this discussion... I'm very interested to see 
> > what others have been doing to solve this problem.
> 
> Timothy Chen wrote:
> Hi Tom, there are definitely lots of trade off questions and honestly I 
> don't think there are obvious choices.
> We could allow docker pull on each run which we originally did, but hits 
> several problems like relying on registry server to be up at all times which 
> proves to be not the case. It also has limitations of the scalability of 
> registry server, as well as no longer allowing anyone to run local images.
> 
> However, without a pull you don't necessarily get the very latest tag if 
> you simply specify no tag.
> 
> Currently Docker run's semantics as you mentioned, doesn't auto pull if 
> it already exists locally and I'm simply matching that for now. If users 
> really want to guarantee what image you're running, I think specifying the 
> exact tag for your image is the best way to go, and not relying on latest as 
> that's not reliable since even Docker run doesn't do it.
> 
> It's sure can be optional, but so far from all the use cases I've heard 
> no one has required a docker pull on each run and most people are suprised on 
> why we pull each time. I'm trying not to expose too much knobs that are not 
> necessary. 
> 
> And answering your docker run {image} point, we intentionally

Re: Review Request 25237: Avoid Docker pull on each run

2014-09-01 Thread Tom Arnfeld

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25237/#review52004
---


Couple of points, please correct me if I've misunderstood anything :)

Can you not just do a `docker run .. {image} ..` and let docker take care of 
pulling the image if needed? By default, docker will pull the image if one with 
the same registry/repo/tag combo doesn't exist.

The assumption here is that an image (comprised of {registry + repository + 
tag}) is never going to change. For example, the default tag used by docker is 
`latest`, which suggests to me that you can push new versions of your image to 
a registry, and update the `latest` tag to point to the new image. After this 
change in mesos, I would need to log in to every mesos slave that had ever 
downloaded this image, and run a `docker pull`.

The alternative is of course to use new tags for every new image (e.g. git 
hashes). Though this means I need to update every framework that has been 
configured with docker image names and change them to the new tag. I can see 
the appeal of this approach when thinking solely about service schedulers, 
because it could be problematic to control a rolling release if any new task 
will automatically run the new image (as it takes the latest image from the 
registry).

I've actually raised this issue several times with various people in the docker 
community and never managed to get a concrete answer other than just run 
`docker pull` every time (which is what we've been doing outside of mesos). I 
think the difference between these use cases needs to be given some serious 
thought, as it's caused us pain in various ways, hence why we ended up running 
`docker pull` before every task to avoid the problem.

A working example would be the redis repository 
(https://registry.hub.docker.com/v1/repositories/redis/tags), you'll see that 
the `latest` tag is pointing at version 2.8. This tag is updated every time a 
new image is published, and if I were to use the `latest` tag (or not specify a 
tag, since it's the default) I would need to either explicitly change my 
deployment of redis to use a strict version, or manually `docker pull` on all 
slaves and restart all the tasks using this container image.

It's also important to take into consideration long running frameworks like 
Hadoop on Mesos, if this change were to be merged, and to avoid logging into 
every slave and running `docker pull` we would need to restart the JobTracker 
and change the image to a newer (never previously used) tag. As opposed to new 
TaskTrackers automatically being launched inside the new image.

I guess a fair amount of this depends on what you're expecting to get from 
using Docker. Software deployment or just dependency management and isolation?

I'm not against running `docker inspect && docker pull` on every slave in the 
cluster, but I'd like the requirement to do that to be chosen. Perhaps you guys 
have already had this discussion... I'm very interested to see what others have 
been doing to solve this problem.
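
For illustration, roughly the shell equivalent of the inspect-then-pull
behaviour under discussion ($IMAGE is a placeholder):

    # Pull only if the image isn't already present locally:
    docker inspect "$IMAGE" > /dev/null 2>&1 || docker pull "$IMAGE"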

- Tom Arnfeld


On Sept. 1, 2014, 7:16 p.m., Timothy Chen wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/25237/
> ---
> 
> (Updated Sept. 1, 2014, 7:16 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Jie Yu.
> 
> 
> Repository: mesos-git
> 
> 
> Description
> ---
> 
> Avoid Docker pull on each run.
> 
> Currently each Docker run will run a docker pull which calls the docker 
> registry each time.
> To avoid this this patch adds a docker inspect  and skip calling pull 
> if it already exists.
> 
> 
> Diffs
> -
> 
>   src/slave/containerizer/docker.cpp 0febbac5df4126f6c8d9a06dd0ba1668d041b34a 
> 
> Diff: https://reviews.apache.org/r/25237/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Timothy Chen
> 
>



#MesosCon

2014-08-23 Thread Tom Arnfeld
Hey everyone,

We're in Chicago for #MesosCon (so awesome, thanks Dave & the team!) and
wondered if anyone else still in town wanted to meet for some drinks or
food and talk mesos stuff? We're here tonight and tomorrow night.

Cheers!

T


Re: Review Request 24939: Report bind parameters on failure

2014-08-23 Thread Tom Arnfeld

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24939/#review51342
---

Ship it!


Ship It!

- Tom Arnfeld


On Aug. 21, 2014, 6:22 p.m., Nikita Vetoshkin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/24939/
> ---
> 
> (Updated Aug. 21, 2014, 6:22 p.m.)
> 
> 
> Review request for mesos.
> 
> 
> Bugs: MESOS-1728
> https://issues.apache.org/jira/browse/MESOS-1728
> 
> 
> Repository: mesos-git
> 
> 
> Description
> ---
> 
> Report bind parameters on failure
> 
> 
> Diffs
> -
> 
>   3rdparty/libprocess/src/process.cpp 
> ddcedb703cea67587c1c87993681686261107f47 
> 
> Diff: https://reviews.apache.org/r/24939/diff/
> 
> 
> Testing
> ---
> 
> Built master and slave and check new error message by issuing the same 
> commands in two separate consoles.
> 
> 
> Thanks,
> 
> Nikita Vetoshkin
> 
>



Re: Updates to the CLI tools

2014-08-07 Thread Tom Arnfeld
This is really quite awesome, I too have found myself needing something
like this...!


On 8 August 2014 00:26, Vinod Kone  wrote:

> This is really awesome. I love it!
>
> Can't wait to use it in production.
>
>
> On Thu, Aug 7, 2014 at 1:57 PM, Thomas Rampelberg 
> wrote:
>
> > I've gone through and done a ton of updates to the CLI tools. If you'd
> > like to give them a try, there's a review posted and you can check out
> > the readme here:
> >
> > https://reviews.apache.org/r/24469/diff/#11
> >
> > (tldr. `pip install mesos.cli`)
> >
> > Please take the time to read the README if you are at all interested.
> > It goes over the design goals, installation steps, features and
> > implemented commands.
> >
>


Re: Python bindings are changing!

2014-08-01 Thread Tom Arnfeld
Woah, this is really awesome Thomas! Especially the pip install ;-)

Looking forward to bringing pesos up to speed with this.
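
For reference, the new-style imports (per the mapping quoted below) end up
looking like this:

    from mesos.interface import Scheduler, mesos_pb2
    from mesos.native import MesosSchedulerDriver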


On 1 August 2014 21:30, Jie Yu  wrote:

> Thomas,
>
> Thank you for the heads-up. One question: what if mesos and python binding
> have different versions? For example, is it ok to use a 0.19.0 python
> binding and having a 0.20.0 mesos? Same question for the reverse.
>
> - Jie
>
>
> On Fri, Aug 1, 2014 at 9:37 AM, Thomas Rampelberg 
> wrote:
>
>> - What problem are we trying to solve?
>>
>> Currently, the python bindings group protobufs, stub implementations
>> and compiled code into a single python package that cannot be
>> distributed easily. This forces python projects using mesos to copy
>> protobufs around and forces a onerous dependency on anyone who would
>> like to do a pure python binding.
>>
>> - How was this problem solved?
>>
>> The current python package has been split into two separate packages:
>>
>> - mesos.interface (stub implementations and protobufs)
>> - mesos.native (old _mesos module)
>>
>> These are python meta-packages and can be installed as separate
>> pieces. The `mesos.interface` package will be hosted on pypi and can
>> be installed via. easy_install and pip.
>>
>> See https://issues.apache.org/jira/browse/MESOS-857 and
>> https://reviews.apache.org/r/23224/.
>>
>> - Why should I care?
>>
>> These changes are not backwards compatible. With 0.20.0 you will need
>> to change how you use the python bindings. Here's a quick overview:
>>
>> mesos.Scheduler -> mesos.interface.Scheduler
>> mesos.mesos_pb2 -> mesos.interface.mesos_pb2
>> mesos.MesosSchedulerDriver -> mesos.native.MesosSchedulerDriver
>>
>> For more details, you can take a look at the examples in
>> `src/examples/python".
>>
>
>


Re: Mesos/Libprocess API

2014-07-23 Thread Tom Arnfeld
Fair enough. I've updated my two forks with an extra option in the
setup.cfg file that uses a different build directory to solve that problem.
I have no issue with using pants or not (so long as vanilla setuptools
works), though I was having issues getting pants to use these forks (via
the git support built into setuptools) in an external project, so gave up
and just used pip.


On 23 July 2014 17:42, Brian Wickman  wrote:

> They can be installed via setuptools -- the repositories have setup.py
> files included at the root so that source distributions can be made.
>  Nothing's been published to pypi yet though, since I don't feel they're
> even ready for an 0.1 release.  I pantsified the repositories to make it
> easier to run tests, but am happy to toxify them instead -- whatever people
> are most comfortable with.  Re: BUILD vs build, you should be able to
> specify an alternate build path for setuptools so that it doesn't conflict
> when building distributions.
>
>
> On Wed, Jul 23, 2014 at 12:01 AM, Tom Arnfeld  wrote:
>
> > R.E Python 3.3;
> >
> > It doesn't actually require it, but it's importing asyncio directly
> > instead of using the standard try/catch around the import and falling
> back
> > to trollius (the Python2 port). We can probably import via tornado, which
> > does this try/catch for us.
> >
> > Right now both compactor and pesos can't be installed with setuptools due
> > to the BUILD files (setuptools tries to create build folders and the name
> > conflicts). I'll be fixing this up today by removing pants, at least just
> > to get it working... not sure how Brian feels about that yet ;-)
> >
> > I can ping the mailing list once both compactor and pesos are up and
> > running in an easy-to-use way (next few hours). I've been using
> > https://github.com/tarnfeld/mesos-python-framework as a test framework
> > skeleton, but that's also not easy to set up just yet.
> >
> > Note: Regardless of the above, you need to have installed our fork of
> > tornado for anything to work beyond the first round of messages.
> >
> > Cheers,
> >
> > Tom.
> >
> > On 23 Jul 2014, at 07:42, Vetoshkin Nikita 
> > wrote:
> >
> > > Hi, Tom!
> > > I would gladly help you to debug if you could provide some information
> > > about your setup. Is it localhost only communication? Any code snippet
> to
> > > reproduce the problem?
> > >
> > > P.S. I'm trying to setup pesos and it seems like python3.3 is a
> > requirement
> > > but it isn't mentioned anywhere.
> > >
> > >
> > > On Tue, Jul 22, 2014 at 1:45 AM, Tom Arnfeld  wrote:
> > >
> > >> Hey,
> > >>
> > >> I've started to try and finish off the work @wickman started around
> > >> pesos[1] and compactor[2] - pure language bindings for mesos and
> > libprocess
> > >> in Python. It's currently far from finished, but have run into a brick
> > wall
> > >> around libprocess. If anyone could shed any light that'd be great.
> > >>
> > >> To start with, I saw the framework register but disconnect
> immediately.
> > >> From a quick chat on IRC someone mentioned this could be related to
> not
> > >> keeping the inbound message connection open, the one that sends
> > >> mesos.internal.FrameworkRegisteredMessage. In doing this, the
> framework
> > >> does register with the master and shows up in the UI – however no
> > further
> > >> messages are received at all. I'm keeping the outbound connection open
> > >> already.
> > >>
> > >> Mesos seems to think it's sending offers (they show up in the logs and
> > are
> > >> showing in the Offers page) but the master never gets past there, and
> > the
> > >> framework never receives the HTTP connections. No doubt this is a bug
> in
> > >> the socket logic on my end. Struggling to find my way through the
> > >> libprocess source to figure it out.
> > >>
> > >> Any pointers would be much appreciated.
> > >>
> > >> Thanks!
> > >>
> > >> Tom.
> > >>
> > >> [1] https://github.com/wickman/pesos
> > >> [2] https://github.com/wickman/compactor
> >
> >
>


Re: Mesos/Libprocess API

2014-07-23 Thread Tom Arnfeld
R.E Python 3.3;

It doesn't actually require it, but it's importing asyncio directly instead of 
using the standard try/catch around the import and falling back to trollius 
(the Python2 port). We can probably import via tornado, which does this 
try/catch for us.
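
That is, the usual fallback pattern:

    try:
        import asyncio              # Python 3.3+
    except ImportError:
        import trollius as asyncio  # the Python 2 backport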

Right now both compactor and pesos can't be installed with setuptools due to 
the BUILD files (setuptools tries to create build folders and the name 
conflicts). I'll be fixing this up today by removing pants, at least just to 
get it working... not sure how Brian feels about that yet ;-)

I can ping the mailing list once both compactor and pesos are up and running in 
an easy-to-use way (next few hours). I've been using 
https://github.com/tarnfeld/mesos-python-framework as a test framework 
skeleton, but that's also not easy to set up just yet.

Note: Regardless of the above, you need to have installed our fork of tornado 
for anything to work beyond the first round of messages.

Cheers,

Tom.

On 23 Jul 2014, at 07:42, Vetoshkin Nikita  wrote:

> Hi, Tom!
> I would gladly help you to debug if you could provide some information
> about your setup. Is it localhost only communication? Any code snippet to
> reproduce the problem?
> 
> P.S. I'm trying to setup pesos and it seems like python3.3 is a requirement
> but it isn't mentioned anywhere.
> 
> 
> On Tue, Jul 22, 2014 at 1:45 AM, Tom Arnfeld  wrote:
> 
>> Hey,
>> 
>> I've started to try and finish off the work @wickman started around
>> pesos[1] and compactor[2] - pure language bindings for mesos and libprocess
>> in Python. It's currently far from finished, but have run into a brick wall
>> around libprocess. If anyone could shed any light that'd be great.
>> 
>> To start with, I saw the framework register but disconnect immediately.
>> From a quick chat on IRC someone mentioned this could be related to not
>> keeping the inbound message connection open, the one that sends
>> mesos.internal.FrameworkRegisteredMessage. In doing this, the framework
>> does register with the master and shows up in the UI – however no further
>> messages are received at all. I'm keeping the outbound connection open
>> already.
>> 
>> Mesos seems to think it's sending offers (they show up in the logs and are
>> showing in the Offers page) but the master never gets past there, and the
>> framework never receives the HTTP connections. No doubt this is a bug in
>> the socket logic on my end. Struggling to find my way through the
>> libprocess source to figure it out.
>> 
>> Any pointers would be much appreciated.
>> 
>> Thanks!
>> 
>> Tom.
>> 
>> [1] https://github.com/wickman/pesos
>> [2] https://github.com/wickman/compactor



Re: Mesos/Libprocess API

2014-07-22 Thread Tom Arnfeld
Hey!

Thanks for the reply. After a painful few days I managed to narrow it down to 
an error (that was being swallowed, grr!!) as a result of an implementation bug 
in Tornado (the python library we're using for the libprocess http service). 
I've since submitted a patch here 
https://github.com/tornadoweb/tornado/pull/1124 - feel free to chime in!

This highlights an issue with Mesos which I have also opened here 
https://issues.apache.org/jira/browse/MESOS-1625. I'd love to get any input on 
the problem, as I'm a little scared about the consequences if this were to be 
fixed.

In other news, pesos is now working and I'm finishing off the rest of the 
actions/callback implementations today.

Tom.

On 23 Jul 2014, at 07:42, Vetoshkin Nikita  wrote:

> Hi, Tom!
> I would gladly help you to debug if you could provide some information
> about your setup. Is it localhost only communication? Any code snippet to
> reproduce the problem?
> 
> P.S. I'm trying to setup pesos and it seems like python3.3 is a requirement
> but it isn't mentioned anywhere.
> 
> 
> On Tue, Jul 22, 2014 at 1:45 AM, Tom Arnfeld  wrote:
> 
>> Hey,
>> 
>> I've started to try and finish off the work @wickman started around
>> pesos[1] and compactor[2] - pure language bindings for mesos and libprocess
>> in Python. It's currently far from finished, but have run into a brick wall
>> around libprocess. If anyone could shed any light that'd be great.
>> 
>> To start with, I saw the framework register but disconnect immediately.
>> From a quick chat on IRC someone mentioned this could be related to not
>> keeping the inbound message connection open, the one that sends
>> mesos.internal.FrameworkRegisteredMessage. In doing this, the framework
>> does register with the master and shows up in the UI – however no further
>> messages are received at all. I'm keeping the outbound connection open
>> already.
>> 
>> Mesos seems to think it's sending offers (they show up in the logs and are
>> showing in the Offers page) but the master never gets past there, and the
>> framework never receives the HTTP connections. No doubt this is a bug in
>> the socket logic on my end. Struggling to find my way through the
>> libprocess source to figure it out.
>> 
>> Any pointers would be much appreciated.
>> 
>> Thanks!
>> 
>> Tom.
>> 
>> [1] https://github.com/wickman/pesos
>> [2] https://github.com/wickman/compactor



Mesos/Libprocess API

2014-07-21 Thread Tom Arnfeld
Hey,

I've started to try and finish off the work @wickman started around pesos[1] 
and compactor[2] - pure language bindings for mesos and libprocess in Python. 
It's currently far from finished, but have fun into a brick wall around 
libprocess. If anyone could shed any light that'd be great.

To start with, I saw the framework register but disconnect immediately. From a 
quick chat on IRC someone mentioned this could be related to not keeping the 
inbound message connection open, the one that sends 
mesos.internal.FrameworkRegisteredMessage. In doing this, the framework does 
register with the master and shows up in the UI – however no further messages 
are received at all. I'm keeping the outbound connection open already. 

Mesos seems to think it's sending offers (they show up in the logs and are 
showing in the Offers page) but the master never gets past there, and the 
framework never receives the HTTP connections. No doubt this is a bug in the 
socket logic on my end. Struggling to find my way through the libprocess source 
to figure it out.

Any pointers would be much appreciated.

Thanks!

Tom.

[1] https://github.com/wickman/pesos
[2] https://github.com/wickman/compactor
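
For context: at the wire level a libprocess message is an HTTP POST to
/<process>/<message-name>, carrying a serialized protobuf body and a
`Libprocess-From` UPID header, so registration looks roughly like this
(address and UPID are illustrative):

    POST /master/mesos.internal.RegisterFrameworkMessage HTTP/1.1
    Libprocess-From: scheduler(1)@192.168.0.2:57645
    Content-Length: ...

    <serialized RegisterFrameworkMessage protobuf>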

Re: Review Request 23461: Added 'bool' return value to Containerizer::launch.

2014-07-16 Thread Tom Arnfeld

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23461/#review47869
---

Ship it!


- Tom Arnfeld


On July 14, 2014, 10:31 p.m., Benjamin Hindman wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/23461/
> ---
> 
> (Updated July 14, 2014, 10:31 p.m.)
> 
> 
> Review request for mesos, Ian Downes and Jie Yu.
> 
> 
> Bugs: MESOS-1527
> https://issues.apache.org/jira/browse/MESOS-1527
> 
> 
> Repository: mesos-git
> 
> 
> Description
> ---
> 
> See summary and JIRA issue. tl;dr; We need a way of determining whether or 
> not a containerizer could try and launch a container for a task/executor. The 
> most simple API change here was swapping Nothing => bool, see comments on 
> 'Containerizer::launch'.
> 
> 
> Diffs
> -
> 
>   src/slave/containerizer/containerizer.hpp 
> a9f89fc8f9034e80010ba21f35dad2fa098b270e 
>   src/slave/containerizer/external_containerizer.hpp 
> 94dffbb75a3df7dbc9aaebbc5fd121967353750d 
>   src/slave/containerizer/external_containerizer.cpp 
> 3f28d85972a1666c942d1d689c8f861dbf15f6aa 
>   src/slave/containerizer/mesos/containerizer.hpp 
> 8746968c649d9a90b5a6af4326c8b1c454446983 
>   src/slave/containerizer/mesos/containerizer.cpp 
> 2c394e2c8702166266f5d20ff005abb218da8a6c 
>   src/slave/slave.hpp a896bb66db5d8cd27ef02b6498c9db93cb0d525f 
>   src/slave/slave.cpp e81abb2e6371d052151253172a4abde7169cb72f 
>   src/tests/containerizer.hpp 9325864691fb45d98f006f410c25f8fbd27b76bb 
>   src/tests/containerizer.cpp 3f11d352522e7a1e9c8ca74d033c25c4ec7a6d23 
>   src/tests/containerizer_tests.cpp 70e12455c0774c46f098cbaa2ec770305f6d6d11 
>   src/tests/external_containerizer_test.cpp 
> c26f3c262e60733849fbe8fbdc70a49bb55f5fff 
>   src/tests/slave_recovery_tests.cpp 582f52d73eba0e3ab089ec573d9a6c43bff0339e 
>   src/tests/slave_tests.cpp 371a5b8eb3d15343418d83d8cf08591649ac807c 
> 
> Diff: https://reviews.apache.org/r/23461/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Benjamin Hindman
> 
>



Re: [VOTE] Release Apache Mesos 0.19.1 (rc1)

2014-07-16 Thread Tom Arnfeld
+1 (non binding)

- Tested on Mac OSX mavericks
- Tested on Ubuntu 12.04 LTS machines (spark and Hadoop run fine also)

On 15 Jul 2014, at 19:48, Niklas Nielsen  wrote:

> +1 (binding)
> 
> Tested on:
> - OSX Mavericks w/ clang-503.0.40 & LLVM 3.4
> - Ubuntu 13.10 w/ gcc-4.8.1 (LogZooKeeperTest.WriteRead is still flaky on 
> that VM)
> 
> Thanks Ben!
> 
> 
> On 14 July 2014 21:39, Benjamin Hindman  wrote:
> +1, thanks Ben!
> 
> 
> On Mon, Jul 14, 2014 at 6:20 PM, Vinod Kone  wrote:
> +1 (binding)
> 
> Tested on OSX Mavericks w/ gcc-4.8
> 
> 
> On Mon, Jul 14, 2014 at 2:35 PM, Timothy Chen  wrote:
> +1 (non-binding).
> 
> Tim
> 
> On Mon, Jul 14, 2014 at 2:32 PM, Benjamin Mahler
>  wrote:
> > Hi all,
> >
> > Please vote on releasing the following candidate as Apache Mesos 0.19.1.
> >
> >
> > 0.19.1 includes the following:
> > 
> > Fixes a long standing critical bug in the JNI bindings that can lead to
> > framework unregistration.
> > Allows the mesos fetcher to handle 30X redirects.
> > Fixes a CHECK failure during container destruction.
> > Fixes a regression that prevented local runs from working correctly.
> >
> > The CHANGELOG for the release is available at:
> > https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob_plain;f=CHANGELOG;hb=0.19.1-rc1
> > 
> >
> > The candidate for Mesos 0.19.1 release is available at:
> > https://dist.apache.org/repos/dist/dev/mesos/0.19.1-rc1/mesos-0.19.1.tar.gz
> >
> > The tag to be voted on is 0.19.1-rc1:
> > https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=commit;h=0.19.1-rc1
> >
> > The MD5 checksum of the tarball can be found at:
> > https://dist.apache.org/repos/dist/dev/mesos/0.19.1-rc1/mesos-0.19.1.tar.gz.md5
> >
> > The signature of the tarball can be found at:
> > https://dist.apache.org/repos/dist/dev/mesos/0.19.1-rc1/mesos-0.19.1.tar.gz.asc
> >
> > The PGP key used to sign the release is here:
> > https://dist.apache.org/repos/dist/release/mesos/KEYS
> >
> > The JAR is up in Maven in a staging repository here:
> > https://repository.apache.org/content/repositories/orgapachemesos-1025
> >
> > Please vote on releasing this package as Apache Mesos 0.19.1!
> >
> > The vote is open until Thu Jul 17 14:28:59 PDT 2014 and passes if a
> > majority of at least 3 +1 PMC votes are cast.
> >
> > [ ] +1 Release this package as Apache Mesos 0.19.1
> > [ ] -1 Do not release this package because ...
> >
> > Thanks,
> > Ben
> 
> 
> 



Re: Mesos language bindings in the wild

2014-07-11 Thread Tom Arnfeld
Very exciting. I'd vote +1 for splitting them out. Especially if you
look at the common way of using Go imports, just stick the project on
GitHub and import it directly using "github.com/mesos/mesos-go" or
similar.

I guess one argument is that you have more fragmentation of the code
(e.g. every library has its own copy of the protos) but I'm not sure
that's a bad thing.

Just my two cents. Looking forward to this!

> On 11 Jul 2014, at 16:59, Thomas Rampelberg  wrote:
>
> I've started preparing the python bindings to hopefully take this
> route ( https://reviews.apache.org/r/23224/ would love some reviews!
> ). In fact, there is already a native python implementation of both
> libprocess and the framework apis! (https://github.com/wickman/pesos/
> , https://github.com/wickman/compactor ).
>
> What are the benefits of bindings being part of the project source
> itself instead of having blessed implementations like mesos-python
> where the source and versioning becomes separate? I've been running
> into difficulties making automake and python's build tools play nicely
> together. It seems like there'd be more flexibility in general by
> splitting them out.
>
>
>> On Thu, Jul 10, 2014 at 3:57 PM, Niklas Nielsen  wrote:
>> I just wanted to clarify - native, meaning _no_ dependency to libmesos and
>> native to its language (only Go, only Python and so on) i.e. use the
>> low-level API.
>>
>> Sorry for the confusion,
>> Niklas
>>
>>
>>> On 10 July 2014 15:55, Dominic Hamon  wrote:
>>>
>>> In my dream world, we wouldn't need any native bindings. I can imagine
>>> having example frameworks or starter frameworks that use the low-level API
>>> (the wire protocol with protocol buffers for message passing), but nothing
>>> like we have that needs C or JNI, etc.
>>>
>>>
>>>
>>>
>>> On Thu, Jul 10, 2014 at 3:26 PM, Niklas Nielsen 
>>> wrote:
>>>
 Hi all,

 I wanted to start a discussion around the language bindings in the wild
 (Go, Haskell, native Python, Go, Java and so on) and possibly get to a
 strategy where we start bringing those into Mesos proper. As most things
 points towards, it will probably make sense to focus on the native
 "bindings" leveraging the low-level API. To name one candidate to start
 with, we are especially interested in getting Go native support in Mesos
 proper (and in a solid state). So Vladimir, we'd be super thrilled to
>>> start
 collaborating with you on your current work.

 We are interested to hear what thoughts you all might have on this.

 Thanks,
 Niklas
>>>


Re: 0.19.1

2014-07-04 Thread Tom Arnfeld
Happy to. It surprised me that this wasn't supported, especially considering 
the fetcher is supposed to be able to download resources from any http(s) URL. 
This is most useful (and in my opinion quite an important issue) for 
downloading executors from S3 in situations where a redirect is incurred, and 
more specifically, for GitHub tar archives, which almost always go through a 301.

Don't mind if it goes into the next non-bugfix release if you don't agree it's 
that important.

On 4 Jul 2014, at 20:48, Dominic Hamon  wrote:

> Hi
> 
> Can you give some background as to why this is a critical fix? We try to
> minimise what we include in bug fix releases to avoid feature creep.
> 
> Thanks
> On Jul 4, 2014 12:31 PM, "Tom Arnfeld"  wrote:
> 
>> Any chance we can get https://issues.apache.org/jira/browse/MESOS-1448
>> too?
>> 
>> On 3 Jul 2014, at 21:40, Vinod Kone  wrote:
>> 
>> Hi,
>> 
>> We are planning to release 0.19.1 (likely next week) which will be a bug
>> fix release. Specifically, these are the fixes that we are planning to
>> cherry pick.
>> 
>> 
>> https://issues.apache.org/jira/issues/?filter=12326191&jql=project%20%3D%20MESOS%20AND%20%22Target%20Version%2Fs%22%20%3D%200.19.1
>> 
>> If there are other critical fixes that need to be backported to 0.19.1
>> please reply here as soon as possible.
>> 
>> Thanks,
>> 
>> 
>> 



Re: 0.19.1

2014-07-04 Thread Tom Arnfeld
Any chance we can get https://issues.apache.org/jira/browse/MESOS-1448  too?

On 3 Jul 2014, at 21:40, Vinod Kone  wrote:

> Hi,
> 
> We are planning to release 0.19.1 (likely next week) which will be a bug fix 
> release. Specifically, these are the fixes that we are planning to cherry 
> pick.
> 
> https://issues.apache.org/jira/issues/?filter=12326191&jql=project%20%3D%20MESOS%20AND%20%22Target%20Version%2Fs%22%20%3D%200.19.1
> 
> If there are other critical fixes that need to be backported to 0.19.1 please 
> reply here as soon as possible.
> 
> Thanks,



[jira] [Commented] (MESOS-1524) Implement Docker support in Mesos

2014-06-22 Thread Tom Arnfeld (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14040106#comment-14040106
 ] 

Tom Arnfeld commented on MESOS-1524:


Thanks for the clarifications [~bhuvan]! Hopefully {{--no-cache}} will make it 
into {{docker run}} soon.

> Implement Docker support in Mesos
> -
>
> Key: MESOS-1524
> URL: https://issues.apache.org/jira/browse/MESOS-1524
> Project: Mesos
>  Issue Type: Epic
>Reporter: Tobi Knaup
>Assignee: Benjamin Hindman
>
> There have been two projects to add Docker support to Mesos, first via an 
> executor, and more recently via an external containerizer written in Python - 
> Deimos: https://github.com/mesosphere/deimos
> We've got a lot of feedback from folks who use Docker and Mesos, and the main 
> wish was to make Docker a first class citizen in Mesos instead of a plugin 
> that needs to be installed separately. Mesos has been using Linux containers 
> for a long time, first via LXC, then via cgroups, and now also via the 
> external containerizer. For a long time it wasn't clear what the winning 
> technology would be, but with Docker becoming the de-facto standard for 
> handling containers I think Mesos should make it a first class citizen and 
> part of core.
> Let's use this JIRA to track wishes/feedback on the implementation.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MESOS-1524) Implement Docker support in Mesos

2014-06-21 Thread Tom Arnfeld (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14039899#comment-14039899
 ] 

Tom Arnfeld commented on MESOS-1524:


I agree a first step of using the docker CLI would be great, and as 
you/[~tknaup] pointed out, a lot of the thinking and work has been done to get 
that up and running. If anyone is interested, these are probably the most 
prominent issues that I ran into (and I'm sure [~solidsnack] did too).

- Mounting the sandbox directory can be problematic if the path contains a 
colon, due to docker's CLI parser.
- The container has to share the host's network, even though port forwarding 
exists. This is because (not sure if libprocess would do the same) the executor 
will pick up the IP of the host->container bridge, which isn't accessible 
externally.
- Docker itself doesn't support reading any metrics from its API (so you have 
to go behind docker straight to the cgroup fs).
- Docker itself doesn't support any way to modify the limits or ports of a 
running (or stopped) container. The former can be done via the cgroup fs (much 
the same as the existing cgroup containerizer), but I'm not familiar with a 
method for the latter.
- Docker doesn't actually pull an image if it exists in its local cache. This 
is problematic because if you update a tag on a docker registry (e.g. a private 
one), a slave that has previously launched an executor with that image won't 
download the new one... so you can get various inconsistencies. The only way 
around this currently is to {{pull}} first (see the sketch below).

(Hopefully that might come in handy for whoever implements the C++ 
containerizer).
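
To make the last point concrete: a containerizer wrapping the CLI probably has 
to do something like this on every launch (just a sketch; the image name is 
made up):

{code}
# Force a registry check so a re-tagged image (e.g. "latest") gets refreshed;
# `docker run` on its own will happily use a stale local copy.
docker pull registry.example.com/myorg/executor:latest

# Only then launch the container.
docker run registry.example.com/myorg/executor:latest ./run-executor
{code}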

The docker guys seem to be pretty good at keeping the docker CLI API consistent 
and backwards compatible, and I'm sure that'll be even better now that they've 
hit 1.0. Users will just be installing the latest version, though, so it'd be 
great to try and keep the latest version of mesos in working order with the 
latest version of docker.

[~benjaminhindman] – I'm a little concerned about that approach, for a couple 
of reasons off the top of my head...

- I've always been a huge fan of how the External Containerizer was implemented 
and I feel like we'd be going backwards to try and squeeze in some kind of "run 
in docker" option
- A first iteration that only supported a concept that's parallel to the 
executor wouldn't work too well
- First of all, the "task"/"executor" split is something everyone I've 
spoken to about Mesos has been impressed with, and it opens up some very 
intriguing patterns for distribution.
- Most frameworks seem to use a custom executor, even if it wouldn't be 
*strictly* required because they're not launching Mesos Tasks. Namely _Hadoop 
on Mesos_, _Jenkins on Mesos_ and _Spark on Mesos_ (in coarse mode).

I'm assuming from your comment that this is something that sits parallel to 
the executor, meaning you can't run executors?

Thanks for creating a new issues, [~jaybuff]!

> Implement Docker support in Mesos
> -
>
> Key: MESOS-1524
> URL: https://issues.apache.org/jira/browse/MESOS-1524
> Project: Mesos
>  Issue Type: Epic
>Reporter: Tobi Knaup
>Assignee: Benjamin Hindman
>
> There have been two projects to add Docker support to Mesos, first via an 
> executor, and more recently via an external containerizer written in Python - 
> Deimos: https://github.com/mesosphere/deimos
> We've got a lot of feedback from folks who use Docker and Mesos, and the main 
> wish was to make Docker a first class citizen in Mesos instead of a plugin 
> that needs to be installed separately. Mesos has been using Linux containers 
> for a long time, first via LXC, then via cgroups, and now also via the 
> external containerizer. For a long time it wasn't clear what the winning 
> technology would be, but with Docker becoming the de-facto standard for 
> handling containers I think Mesos should make it a first class citizen and 
> part of core.
> Let's use this JIRA to track wishes/feedback on the implementation.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Comment Edited] (MESOS-1524) Implement Docker support in Mesos

2014-06-21 Thread Tom Arnfeld (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14039739#comment-14039739
 ] 

Tom Arnfeld edited comment on MESOS-1524 at 6/21/14 9:13 AM:
-

This sounds really great. I think implementing this would be easier now since 
the refactor of the Containerizer classes...?

I'd vote for #2 since I'd guess it's easier (than #3) to support launching 
containers on a remote host (the {{-H}} argument in #1). This is really quite 
useful for development, when running Mesos on a Mac and having docker running 
inside a VM, for example.

For me though, this does raise a question. From what I gathered the main idea 
behind the external containerizer was to allow not just pluggability (the c++ 
classes _could_ be made "pluggable") but also ease of use for developers and 
users to get involved. How do you envisage this happening with other isolation 
mechanisms? Will the external containerizer act as a way of proving the 
concept, and eventually support for that containerizer would be implemented 
into Mesos itself? Especially given the design of the current external 
containerizer is aimed at supporting multiple types of containers in one slave 
(with the {{docker:///}} scheme).


was (Author: tarnfeld):
This sounds really great. I think implementing this would be easier now since 
the refactor of the Containerizer classes...?

I'd vote for #2 since I'd guess it's easier (than #3) to support launching 
containers on a remote host (the {{-H}} argument in #1). This is really quite 
useful for development, when running Mesos on a Mac and having docker running 
inside a VM, for example.

For me though, this does raise a question. From what I gathered the main idea 
behind the external containerizer was to allow not just pluggability (the c++ 
classes _could_ be made "pluggable") but also ease of use for developers and 
users to get involved. How do you envisage this happening with other isolation 
mechanisms? Will the external containerizer act as a way of proving the 
concept, and eventually support for that containerizer would be implemented 
into Mesos itself?

> Implement Docker support in Mesos
> -
>
> Key: MESOS-1524
> URL: https://issues.apache.org/jira/browse/MESOS-1524
> Project: Mesos
>  Issue Type: Epic
>Reporter: Tobi Knaup
>Assignee: Benjamin Hindman
>
> There have been two projects to add Docker support to Mesos, first via an 
> executor, and more recently via an external containerizer written in Python - 
> Deimos: https://github.com/mesosphere/deimos
> We've got a lot of feedback from folks who use Docker and Mesos, and the main 
> wish was to make Docker a first class citizen in Mesos instead of a plugin 
> that needs to be installed separately. Mesos has been using Linux containers 
> for a long time, first via LXC, then via cgroups, and now also via the 
> external containerizer. For a long time it wasn't clear what the winning 
> technology would be, but with Docker becoming the de-facto standard for 
> handling containers I think Mesos should make it a first class citizen and 
> part of core.
> Let's use this JIRA to track wishes/feedback on the implementation.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Comment Edited] (MESOS-1524) Implement Docker support in Mesos

2014-06-21 Thread Tom Arnfeld (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14039739#comment-14039739
 ] 

Tom Arnfeld edited comment on MESOS-1524 at 6/21/14 9:13 AM:
-

This sounds really great. I think implementing this would be easier now since 
the refactor of the Containerizer classes...?

I'd vote for #2 since I'd guess it's easier (than #3) to support launching 
containers on a remote host (the {{-H}} argument in #1). This is really quite 
useful for development, when running Mesos on a Mac and having docker running 
inside a VM, for example.

For me though, this does raise a question. From what I gathered the main idea 
behind the external containerizer was to allow not just pluggability (the c++ 
classes _could_ be made "pluggable") but also ease of use for developers and 
users to get involved. How do you envisage this happening with other isolation 
mechanisms? Will the external containerizer act as a way of proving the 
concept, and eventually support for that containerizer would be implemented 
into Mesos itself? Especially given the design of the current external 
containerizer is aimed at supporting multiple types of containers in one slave 
(with the {{docker:///}} scheme), in the future.


was (Author: tarnfeld):
This sounds really great. I think implementing this would be easier now since 
the refactor of the Containerizer classes...?

I'd vote for #2 since I'd guess it's easier (than #3) to support launching 
containers on a remote host (the {{-H}} argument in #1). This is really quite 
useful for development, when running Mesos on a Mac and having docker running 
inside a VM, for example.

For me though, this does raise a question. From what I gathered the main idea 
behind the external containerizer was to allow not just pluggability (the c++ 
classes _could_ be made "pluggable") but also ease of use for developers and 
users to get involved. How do you envisage this happening with other isolation 
mechanisms? Will the external containerizer act as a way of proving the 
concept, and eventually support for that containerizer would be implemented 
into Mesos itself? Especially given the design of the current external 
containerizer is aimed at supporting multiple types of containers in one slave 
(with the {{docker:///}} scheme).

> Implement Docker support in Mesos
> -
>
> Key: MESOS-1524
> URL: https://issues.apache.org/jira/browse/MESOS-1524
> Project: Mesos
>  Issue Type: Epic
>Reporter: Tobi Knaup
>Assignee: Benjamin Hindman
>
> There have been two projects to add Docker support to Mesos, first via an 
> executor, and more recently via an external containerizer written in Python - 
> Deimos: https://github.com/mesosphere/deimos
> We've got a lot of feedback from folks who use Docker and Mesos, and the main 
> wish was to make Docker a first class citizen in Mesos instead of a plugin 
> that needs to be installed separately. Mesos has been using Linux containers 
> for a long time, first via LXC, then via cgroups, and now also via the 
> external containerizer. For a long time it wasn't clear what the winning 
> technology would be, but with Docker becoming the de-facto standard for 
> handling containers I think Mesos should make it a first class citizen and 
> part of core.
> Let's use this JIRA to track wishes/feedback on the implementation.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Docker support in Mesos core

2014-06-21 Thread Tom Arnfeld
Hey Everyone,

Excited to see discussions of this. Something I started playing around with 
just as the external containerizer was coming to life!

Diptanu – A few responses to your notes...

> a. Mesos understanding docker metrics, which should be straightforward because 
> docker writes all its metrics in the following fashion for cpu, blkio, memory 
> etc - /sys/fs/cgroup/cpu/docker/

Are these paths not operating system dependent? I'm not too familiar with 
cross-platform cgroups, so I am no doubt wrong here. These cgroup metrics are 
also the ones Mesos currently uses (both for usage statistics and for 
memory/cpu limits) so they can be pulled out much the same.
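
For what it's worth, once you have the full (untruncated) container ID, pulling 
these numbers out is just a file read. Something like the below, though the 
cgroup mount point and hierarchy layout vary by distro, so treat it as 
illustrative only:

$ CID=<full-container-id>  # e.g. from `docker ps --no-trunc`
$ cat /sys/fs/cgroup/memory/docker/$CID/memory.usage_in_bytes
$ cat /sys/fs/cgroup/cpu/docker/$CID/cpu.shares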

> b. Easier way to map tasks to docker containers, probably the external 
> containerizer takes care of it to a large extent. It would be helpful if 
> there was a blog about its API and internals by the core committers 
> explaining the design. Even a simple example in the mesos codebase using the 
> external containerizer would help.

That's an interesting one; a blog post would be awesome. The containerizers 
currently use the "ContainerId" string provided to them from Mesos (I believe 
this is just the TaskID, but I'm not certain of that). This helps ensure 
consistency in how containerizers are implemented, and makes them much simpler.

> c. stdout and stderr of docker containers in the Mesos task stdout and stderr 
> logs. Does the external containerizer already take care of it? I had to 
> write a service which runs on every slave for exposing the container logs to 
> a user.

The external containerizer itself doesn't help you with this. The logs from the 
containerizer calls are dumped into the sandbox; however, it's up to the 
containerizer (e.g. Deimos) to redirect the logs from the container it launches. 
Deimos does take care of this already (as seen here 
https://github.com/mesosphere/deimos/blob/master/deimos/containerizer/docker.py#L132).
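
In shell terms the trick is simply to run docker in the foreground and point 
its output at the sandbox files; roughly this (illustrative, with $SANDBOX 
standing in for the task's sandbox directory):

$ docker run busybox echo "hello" 1>>"$SANDBOX/stdout" 2>>"$SANDBOX/stderr"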

> e. Translate all task constraints to docker run flags. This is probably the 
> easiest and I know it's super easy to implement with the external 
> containerizer.

The current Docker containerizer implementations both do this; they support 
CPU, memory and ports. Docker currently doesn't support changing these limits 
on a running container, so you have to go behind docker and write to the 
cgroup limits yourself. There's also no way to change the port mappings of a 
container that I know of.
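
For reference, the translation is roughly of this shape (values made up; 
assuming the usual 1 CPU = 1024 cpu shares convention that Mesos itself uses):

$ # offer: cpus:0.5, mem:512, host port 31000 mapped to the container's 8080
$ docker run -c 512 -m 512m -p 31000:8080 registry.example.com/myorg/web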

Hope that answers some of your questions!

Tom.

On 21 Jun 2014, at 00:20, Diptanu Choudhury  wrote:

> Great timing for this thread!
> 
> I have been working on this for the past few months and here is what I am 
> doing and would be nice if docker was supported straight way in Mesos. So 
> here goes the features that I would personally love to see in Mesos Core from 
> the perspective of an user which I had to implement on my own -
> 
> a. Mesos understanding docker metrics which should be straightforward because 
> docker writes all its metrics in the following fashion for cpu, blkio, memory 
> etc - /sys/fs/cgroup/cpu/docker/
> I am sending all these metrics right now as a framework message back to my 
> framework/scheduler but it would be cool if Mesos took care of them.
> 
> b. Easier way to map tasks to docker containers, probably the external 
> containerizer takes care of it to a large extent. It would be helpful if 
> there was a blog about its API and internals by the core committers 
> explaining the design. Even a simple example in the mesos codebase using the 
> external containerizer would help.
> 
> c. stdout and stderr of docker containers in the Mesos task stdout and stderr 
> logs. Does the external containerizer already take care of it? I had to 
> write a service which runs on every slave for exposing the container logs to 
> a user.
> 
> d. Mesos GC of tasks taking care of cleaning up docker containers which have 
> terminated. Right now the way I implemented this is that the service which 
> exposes the logs of a container also listens to docker events and when a 
> container exits, it knows that this has to be cleaned up and so removes it 
> after a fixed amount of time [configurable through a REST API/config file].
> 
> e. Translate all task constraints to docker run flags. This is probably the 
> easiest and I know it's super easy to implement with the external 
> containerizer.
> 
> 
> On Fri, Jun 20, 2014 at 3:40 PM, Tobias Knaup  wrote:
> Hi all,
> 
> We've got a lot of feedback from folks who use Mesos to run Dockers at scale 
> via Deimos, and the main wish was to make Docker a first class citizen in 
> Mesos, instead of a plugin that needs to be installed separately. Mesosphere 
> wants to contribute this and I already chatted with Ben H about what an 
> implementation could look like.
> 
> I'd love for folks on here that are working with Docker to chime in!
> I created a JIRA here: https://issues.apache.org/jira/browse/MESOS-1524
> 
> Cheers,
> 
> Tobi
> 
> 
>

[jira] [Commented] (MESOS-1524) Implement Docker support in Mesos

2014-06-21 Thread Tom Arnfeld (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14039739#comment-14039739
 ] 

Tom Arnfeld commented on MESOS-1524:


This sounds really great. I think implementing this would be easier now since 
the refactor of the Containerizer classes...?

I'd vote for #2 since I'd guess it's easier (than #3) to support launching 
containers on a remote host (the {{-H}} argument in #1). This is really quite 
useful for development, when running Mesos on a Mac and having docker running 
inside a VM, for example.

For me though, this does raise a question. From what I gathered the main idea 
behind the external containerizer was to allow not just pluggability (the c++ 
classes _could_ be made "pluggable") but also ease of use for developers and 
users to get involved. How do you envisage this happening with other isolation 
mechanisms? Will the external containerizer act as a way of proving the 
concept, and eventually support for that containerizer would be implemented 
into Mesos itself?

> Implement Docker support in Mesos
> -
>
> Key: MESOS-1524
> URL: https://issues.apache.org/jira/browse/MESOS-1524
> Project: Mesos
>  Issue Type: Epic
>Reporter: Tobi Knaup
>Assignee: Benjamin Hindman
>
> There have been two projects to add Docker support to Mesos, first via an 
> executor, and more recently via an external containerizer written in Python - 
> Deimos: https://github.com/mesosphere/deimos
> We've got a lot of feedback from folks who use Docker and Mesos, and the main 
> wish was to make Docker a first class citizen in Mesos instead of a plugin 
> that needs to be installed separately. Mesos has been using Linux containers 
> for a long time, first via LXC, then via cgroups, and now also via the 
> external containerizer. For a long time it wasn't clear what the winning 
> technology would be, but with Docker becoming the de-facto standard for 
> handling containers I think Mesos should make it a first class citizen and 
> part of core.
> Let's use this JIRA to track wishes/feedback on the implementation.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MESOS-1448) Mesos Fetcher doesn't support URLs that have 30X redirects

2014-06-17 Thread Tom Arnfeld (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14034151#comment-14034151
 ] 

Tom Arnfeld commented on MESOS-1448:


Thanks for the feedback, [~benjaminhindman].

What I mean by breaking the code out:

Since arguably we only need a regression test (I can't see any tests for 
this code as it stands), and we only need to ensure the correct option is being 
set, we could break the creation of the curl request out from its execution and 
test the creation separately (using {{curl_easy_getopt}}). Not really the best 
kind of test implementation, but it's better than no tests and would be much 
easier to implement.

I'm happy to go ahead and do either, just depends which you'd rather...

> Mesos Fetcher doesn't support URLs that have 30X redirects
> ---
>
> Key: MESOS-1448
> URL: https://issues.apache.org/jira/browse/MESOS-1448
> Project: Mesos
>  Issue Type: Bug
>      Components: slave
>        Reporter: Tom Arnfeld
>Assignee: Tom Arnfeld
>
> The mesos-fetcher program doesn't follow 30X redirects for http:// URLs.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MESOS-1448) Mesos Fetcher doesn't support URLs that have 30X redirects

2014-06-17 Thread Tom Arnfeld (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14033596#comment-14033596
 ] 

Tom Arnfeld commented on MESOS-1448:


I've submitted a patch to the review board that simply sets the 
{{CURLOPT_FOLLOWLOCATION}} option in stout.

I'm not sure I should just be making changes to stout in the mesos code base, 
since it's a 3rdparty library. Should this change go somewhere else, too?

https://reviews.apache.org/r/22675/

[~ijimenez] should I assign myself to this, or have you started work as well?

On the subject of unit tests, I'd love to add some test coverage for this; 
however, I'm in two minds:

 - Should we rely on an external resource that sends a Location header?
 - Should we spawn a little local HTTP server (overkill?) that behaves like we 
want?
 - Break the code out so we can verify the option was set without actually 
making an HTTP request.

Any other suggestions?
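
For anyone wanting to reproduce the bug quickly from the shell: the fetcher 
today behaves like curl without {{-L}}, and the patch effectively gives it 
{{-L}} ({{CURLOPT_FOLLOWLOCATION}}) semantics. URL below is hypothetical:

{code}
# Without -L we save the 301/302 response itself, not the real file.
curl -sS -o executor.tar.gz http://example.com/redirecting/archive.tar.gz

# With -L curl follows the redirect chain to the final resource.
curl -sSL -o executor.tar.gz http://example.com/redirecting/archive.tar.gz
{code}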

> Mesos Fetcher doesn't support URLs that have 30X redirects
> ---
>
> Key: MESOS-1448
> URL: https://issues.apache.org/jira/browse/MESOS-1448
> Project: Mesos
>  Issue Type: Bug
>  Components: slave
>Reporter: Tom Arnfeld
>Assignee: Isabel Jimenez
>
> The mesos-fetcher program doesn't follow 30X redirects for http:// URLs.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MESOS-1447) Include Mesos Version in SlaveInfo protobuf

2014-06-16 Thread Tom Arnfeld (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14033197#comment-14033197
 ] 

Tom Arnfeld commented on MESOS-1447:


Hmm, that's an interesting one. I think you're right – it's probably best to go 
ahead and just do MESOS-986 and implement it in the UI off the back of that.

> Include Mesos Version in SlaveInfo protobuf
> ---
>
> Key: MESOS-1447
> URL: https://issues.apache.org/jira/browse/MESOS-1447
> Project: Mesos
>  Issue Type: Improvement
>  Components: slave
>Affects Versions: 0.19.0
>    Reporter: Tom Arnfeld
>Assignee: Tom Arnfeld
>Priority: Minor
>
> When rolling out a new deployment of mesos across a large cluster, it'd be 
> very useful to see in the Web UI (the Slaves tab) which slave is running 
> which version of Mesos.
> I think this would be fairly easy to achieve, simply adding a new field into 
> the {{SlaveInfo}} protobuf, and setting the value to some kind of version 
> constant (if anyone knows of one that exists, that'd be great).
> Plus a new column in the slaves table on the Web UI.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: [review] draft 0.19.0 blog post

2014-06-12 Thread Tom Arnfeld
Nice one! #shipit

> On 12 Jun 2014, at 19:59, Chris Aniszczyk  wrote:
>
> #shipit
>
>
>> On Thu, Jun 12, 2014 at 1:53 PM, Vinod Kone  wrote:
>>
>> #shipit
>>
>>
>> On Thu, Jun 12, 2014 at 9:10 AM, Niklas Nielsen 
>> wrote:
>>
>>> #shipit
>>>
>>>
 On 12 June 2014 08:23, Dave Lester  wrote:

 Trying one more time, attaching the blog post (formatting still in
 markdown) as a txt file.


> On Thu, Jun 12, 2014 at 8:17 AM, Dave Lester 
 wrote:

> Hi All,
>
> Attached is a draft blog post announcing the 0.19.0 release that Ben
> Mahler wrote up as release manager.
>
> I'd love to get this on the Mesos website before 1pm today -- if folks
> could help review it, provide any comments/suggestions, and give a
>>> #shipit
> that'd be great.
>
> Thanks!
>
> Dave
>
> ps: the file is in markdown format
>
>
>
> --
> Cheers,
>
> Chris Aniszczyk | Open Source | Twitter, Inc.
> @cra | +1 512 961 6719


Re: [VOTE] Release Apache Mesos 0.19.0 (rc3)

2014-06-07 Thread Tom Arnfeld
+1 from me. Tested on OSX Mavericks with gcc 4.8 and compiled+deployed on 
Debian 7.3 with gcc 4.7.2.

I’ve deployed RC3 to our 17 node cluster and it’s been working very well (aside 
from https://issues.apache.org/jira/browse/MESOS-1462, but that’s for 0.19.1)! 
Not really the right email thread (sorry), but if anyone fancies taking a look 
at my patch for the Hadoop framework (https://github.com/mesos/hadoop/pull/20), 
I’ve been using it for the past few days and it’s working nicely; I haven’t 
noticed any other bugs in the external containerizer either.

Big thanks to everyone involved in bringing this through to an alpha release… 
specifically the EC! :-)

Tom.

On 7 Jun 2014, at 18:52, Niklas Nielsen  wrote:

> +1, tested on Mac OS X (clang-503.0.40) and Ubuntu 13.10 (gcc 4.8.1)
> 
> Thanks for the hard work everyone - this is going to be a great release!
> 
> Niklas
> 
> 
> On Thu, Jun 5, 2014 at 11:43 PM, Benjamin Mahler 
> wrote:
> 
>> Hi all,
>> 
>> Please vote on releasing the following candidate as Apache Mesos 0.19.0.
>> Note that I accidentally released the 0.19.0 jar in maven when running the
>> vote script so we'll have to discuss what to do if we require any jar
>> changes. Apologies!
>> 
>> 0.19.0 includes the following:
>> 
>> 
>> The initial release of the registrar, which adds replicated state in the
>> master.
>> Added support for slave authentication.
>> Overhauled metrics reporting.
>> Initial support for external containerization strategies.
>> Numerous bug fixes.
>> 
>> The CHANGELOG for the release is available at:
>> 
>> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob_plain;f=CHANGELOG;hb=0.19.0-rc3
>> 
>> 
>> 
>> The candidate for Mesos 0.19.0 release is available at:
>> https://dist.apache.org/repos/dist/dev/mesos/0.19.0-rc3/mesos-0.19.0.tar.gz
>> 
>> The tag to be voted on is 0.19.0-rc3:
>> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=commit;h=0.19.0-rc3
>> 
>> The MD5 checksum of the tarball can be found at:
>> 
>> https://dist.apache.org/repos/dist/dev/mesos/0.19.0-rc3/mesos-0.19.0.tar.gz.md5
>> 
>> The signature of the tarball can be found at:
>> 
>> https://dist.apache.org/repos/dist/dev/mesos/0.19.0-rc3/mesos-0.19.0.tar.gz.asc
>> 
>> The PGP key used to sign the release is here:
>> https://dist.apache.org/repos/dist/release/mesos/KEYS
>> 
>> The JAR is up in Maven in a staging repository here:
>> https://repository.apache.org/content/repositories/orgapachemesos-1023/
>> 
>> Please vote on releasing this package as Apache Mesos 0.19.0!
>> 
>> The vote is open until Monday June 9 10:00:00 PDT 2014 and passes if a
>> majority of at least 3 +1 PMC votes are cast.
>> 
>> [ ] +1 Release this package as Apache Mesos 0.19.0
>> [ ] -1 Do not release this package because ...
>> 
>> Thanks,
>> Ben
>> 
> 
> 
> 
> -- 
> Niklas



[jira] [Commented] (MESOS-1462) External Containerizer can leave a task indefinitely in STAGING if the `launch` fails

2014-06-06 Thread Tom Arnfeld (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14020555#comment-14020555
 ] 

Tom Arnfeld commented on MESOS-1462:


[~bmahler] Fair enough. Assigned to Till.

> External Containerizer can leave a task indefinitely in STAGING if the 
> `launch` fails
> -
>
> Key: MESOS-1462
> URL: https://issues.apache.org/jira/browse/MESOS-1462
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.19.0
>Reporter: Tom Arnfeld
>Assignee: Till Toenshoff
>Priority: Blocker
>
> Not sure where else to create issues regarding RC software, but I guess here 
> is as good as anywhere to record it...
> I mentioned it to [~tillt] before so I'm not sure if he thought of a fix (or 
> has one unpushed). Essentially, when you launch a task through an external 
> containerizer, if the {{launch}} command of the external process fails for 
> whatever reason (e.g. the fetcher throws an error) the task will sit in the 
> STAGING state and never be terminated.
> At this point, I think it's acceptable to assume the executor hasn't 
> registered yet, though that's not guaranteed. I'm seeing this behaviour on 
> 0.19.0-rc3.
> Ping [~bmahler] - I think this is worth holding up the vote to investigate.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (MESOS-1462) External Containerizer can leave a task indefinitely in STAGING if the `launch` fails

2014-06-06 Thread Tom Arnfeld (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tom Arnfeld updated MESOS-1462:
---

Assignee: Till Toenshoff

> External Containerizer can leave a task indefinitely in STAGING if the 
> `launch` fails
> -
>
> Key: MESOS-1462
> URL: https://issues.apache.org/jira/browse/MESOS-1462
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.19.0
>    Reporter: Tom Arnfeld
>Assignee: Till Toenshoff
>Priority: Blocker
>
> Not sure where else to create issues regarding RC software, but I guess here 
> is as good as anywhere to record it...
> I mentioned it to [~tillt] before so I'm not sure if he thought of a fix (or 
> has one unpushed). Essentially, when you launch a task through an external 
> containerizer, if the {{launch}} command of the external process fails for 
> whatever reason (e.g. the fetcher throws an error) the task will sit in the 
> STAGING state and never be terminated.
> At this point, I think it's acceptable to assume the executor hasn't 
> registered yet, though that's not guaranteed. I'm seeing this behaviour on 
> 0.19.0-rc3.
> Ping [~bmahler] - I think this is worth holding up the vote to investigate.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (MESOS-1462) External Containerizer can leave a task indefinitely in STAGING if the `launch` fails

2014-06-06 Thread Tom Arnfeld (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tom Arnfeld updated MESOS-1462:
---

Description: 
Not sure where else to create issues regarding RC software, but I guess here is 
as good as anywhere to record it...

I mentioned it to [~tillt] before so I'm not sure if he thought of a fix (or 
has one unpushed). Essentially, when you launch a task through an external 
containerizer, if the {{launch}} command of the external process fails for 
whatever reason (e.g. the fetcher throws an error) the task will sit in the 
STAGING state and never be terminated.

At this point, I think it's acceptable to assume the executor hasn't registered 
yet, though that's not guaranteed. I'm seeing this behaviour on 0.19.0-rc3.

Ping [~bmahler] - I think this is worth holding up the vote to investigate.

  was:
Not sure where else to create issues regarding RC software, but I guess here is 
as good as anywhere to record it...

I mentioned it to [~tillt] before so I'm not sure if he thought of a fix (or 
has one unpushed). Essentially, when you launch a task through an external 
containerizer, if the {{launch}} command of the external process fails for 
whatever reason (e.g. the fetcher throws an error) the task will sit in the 
STAGING state and never be terminated.

At this point, I think it's acceptable to assume the executor hasn't registered 
yet, though that's not guaranteed.

Ping [~bmahler] - I think this is worth holding up the vote to investigate.


> External Containerizer can leave a task indefinitely in STAGING if the 
> `launch` fails
> -
>
> Key: MESOS-1462
> URL: https://issues.apache.org/jira/browse/MESOS-1462
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.19.0
>Reporter: Tom Arnfeld
>Priority: Blocker
>
> Not sure where else to create issues regarding RC software, but I guess here 
> is as good as anywhere to record it...
> I mentioned it to [~tillt] before so I'm not sure if he thought of a fix (or 
> has one unpushed). Essentially, when you launch a task through an external 
> containerizer, if the {{launch}} command of the external process fails for 
> whatever reason (e.g. the fetcher throws an error) the task will sit in the 
> STAGING state and never be terminated.
> At this point, I think it's acceptable to assume the executor hasn't 
> registered yet, though that's not guaranteed. I'm seeing this behaviour on 
> 0.19.0-rc3.
> Ping [~bmahler] - I think this is worth holding up the vote to investigate.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (MESOS-1462) External Containerizer can leave a task indefinitely in STAGING if the `launch` fails

2014-06-06 Thread Tom Arnfeld (JIRA)
Tom Arnfeld created MESOS-1462:
--

 Summary: External Containerizer can leave a task indefinitely in 
STAGING if the `launch` fails
 Key: MESOS-1462
 URL: https://issues.apache.org/jira/browse/MESOS-1462
 Project: Mesos
  Issue Type: Bug
Affects Versions: 0.19.0
Reporter: Tom Arnfeld
Priority: Blocker


Not sure where else to create issues regarding RC software, but I guess here is 
as good as anywhere to record it...

I mentioned it to [~tillt] before so I'm not sure if he thought of a fix (or 
has one unpushed). Essentially, when you launch a task through an external 
containerizer, if the {{launch}} command of the external process fails for 
whatever reason (e.g. the fetcher throws an error) the task will sit in the 
STAGING state and never be terminated.

At this point, I think it's acceptable to assume the executor hasn't registered 
yet, though that's not guaranteed.

Ping [~bmahler] - I think this is worth holding up the vote to investigate.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: External Containerizer / 0.19.0 / Mesos+Docker

2014-06-04 Thread Tom Arnfeld
Hi Jay,

> Could you go into details on this?  Are you saying deimos doesn't work
with the latest mesos in the github.com tree?

I've been speaking to Jason over the past couple of days, and Deimos has now
been brought in line with the RC External Containerizer API.


On 4 June 2014 19:21, Jay Buffington  wrote:

> + solidsnack, the deimos guy.
>
> On Sun, Jun 1, 2014 at 10:14 AM, Tom Arnfeld  wrote:
> >
> > 2) Specifically relating to Mesos and Docker (Deimos + @Mesosphere)
> >
> > Correct me if i’m wrong (and if there’s something un-pushed to Github)
> but
> > it seems the Deimos (Docker Containerizer) has fallen behind the
> > architecture of the EC quite significantly. I guess you’re all working on
> > the big release you have coming up, and it’d be useful to know if any
> time
> > will be dedicated to the project in the short term.
> >
>
> Could you go into details on this?  Are you saying deimos doesn't work with
> the latest mesos in the github.com tree?
>
> Also, where is the fork of marathon that puts the ContainerInfo into
> CommandInfo so that it uses deimos?  I could only find the .deb
>
> Jay
>


Re: 0.19.0 Testing

2014-06-02 Thread Tom Arnfeld
That's exciting! I'd also love it if anyone could try out the Docker
Containerizer I've been working on...
https://github.com/duedil-ltd/mesos-docker-containerizer (The deimos
project from mesosphere isn't compatible with the EC release
candidate, or at least not any version I could find.)

I'm in the process of deploying it to a pre-production cluster here at
DueDil (pretty small scale though, 156 CPUs) to test it out.

Looking forward to the release!

Tom.

> On 2 Jun 2014, at 19:55, Benjamin Mahler  wrote:
>
> The target of the end of this week is nice because we can have the new
> release out by the start of dockercon. I will make sure this is tested in a
> production environment at least at Twitter before calling the vote. Would
> appreciate others pitching in as well.
>
> If there is anyone that would like to have more time to vet this release
> please chime in! For External Containerizer related bugs, I would prefer to
> follow up with 0.19.x bug fix releases through Till or Niklas.


[jira] [Created] (MESOS-1448) Mesos Fetcher doesn't support URLs that have 30X redirects

2014-06-02 Thread Tom Arnfeld (JIRA)
Tom Arnfeld created MESOS-1448:
--

 Summary: Mesos Fetcher doesn't support URLs that have 30X 
redirects
 Key: MESOS-1448
 URL: https://issues.apache.org/jira/browse/MESOS-1448
 Project: Mesos
  Issue Type: Bug
  Components: slave
Reporter: Tom Arnfeld


The mesos-fetcher program doesn't follow 30X redirects for http:// URLs.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


External Containerizer / 0.19.0 / Mesos+Docker

2014-06-01 Thread Tom Arnfeld
Hey Everyone,

I’m in the process of attempting to productionize the external containerizer 
(well, we’re still in the mesos proof-of-concept phase but treating it as we 
would production) and have a couple of things I wanted to bring up. It’d be 
great to get some feedback from anyone working on the EC, and their thoughts 
on the below…

Note: I’m aware this is all considered alpha software, but it’s quite crucial 
for our deployment so I’m very keen to push it forward. :-)

1) Documentation for the EC API

This doesn’t seem to exist from what I can find, though I may be missing 
something. Specifically, details on the different external containerizer 
methods, why they exist and what function they are expected to perform. From my 
view, because quite a lot of state is shared between the slave and the external 
process (and some recovery/reconciliation happens), various examples of what 
should happen and when would be very useful for a user. I guess the unit tests 
only go so far here.
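
For anyone else digging through it, the shape I've pieced together from the 
slave code and the unit tests is a single executable invoked once per call, 
with the call name as the first argument and protobuf messages exchanged over 
stdin/stdout. A minimal (and almost certainly incomplete) skeleton, just to 
show the moving parts; treat the method list as my reading rather than an 
official reference:

#!/bin/sh
# Hypothetical external containerizer skeleton.
case "$1" in
  launch)  cat > /dev/null ;;  # reads a Launch protobuf on stdin, starts the container
  update)  cat > /dev/null ;;  # reads an Update protobuf; resource limits changed
  usage)   : ;;                # should write a ResourceStatistics protobuf to stdout
  wait)    : ;;                # should block, then write a Termination protobuf to stdout
  destroy) : ;;                # tears the container down
  recover) : ;;                # slave restarted; reconcile internal state
  *) echo "unknown call: $1" >&2; exit 1 ;;
esac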

2) Specifically relating to Mesos and Docker (Deimos + @Mesosphere)

Correct me if I’m wrong (and if there’s something un-pushed to GitHub) but it 
seems the Deimos (Docker Containerizer) has fallen behind the architecture of 
the EC quite significantly. I guess you’re all working on the big release you 
have coming up, and it’d be useful to know if any time will be dedicated to the 
project in the short term.

IMO Docker is realistically going to be the first mainstream use of the EC and 
though there are various other efforts to marry Mesos with Docker, the external 
containerizer is the most transparent and is designed to serve a specific 
purpose that can’t be achieved elsewhere. It’d be very exciting to have 
something in a workable state that could demonstrate the abilities of the 
containerizer for when 0.19.0 is released.

3) Has anyone in the community (or at Twitter) been testing out the external 
containerizer at any scale, and with any significant workloads? I presume not, 
since the EC requires some kind of properly built tooling on the other end 
(e.g. Deimos) to do the containerization, and I’m not aware of anything else 
in the community.

Looking forward to thoughts,

Tom.

--
Tom Arnfeld
Developer // DueDil

t...@duedil.com
(+44) 7525940046




25 Christopher Street, London, EC2A 2BS
Company Number: 06999618

What is DueDil? |  Product features  |  Try it for free



[jira] [Created] (MESOS-1447) Include Mesos Version in SlaveInfo protobuf

2014-06-01 Thread Tom Arnfeld (JIRA)
Tom Arnfeld created MESOS-1447:
--

 Summary: Include Mesos Version in SlaveInfo protobuf
 Key: MESOS-1447
 URL: https://issues.apache.org/jira/browse/MESOS-1447
 Project: Mesos
  Issue Type: Improvement
  Components: slave
Affects Versions: 0.19.0
Reporter: Tom Arnfeld
Assignee: Tom Arnfeld
Priority: Minor


When rolling out a new deployment of mesos across a large cluster, it'd be very 
useful to see in the Web UI (the Slaves tab) which slave is running which 
version of Mesos.

I think this would be fairly easy to achieve, simply adding a new field into 
the {{SlaveInfo}} protobuf, and setting the value to some kind of version 
constant (if anyone knows of one that exists, that'd be great).

Plus a new column in the slaves table on the Web UI.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: 0.19.0 Testing

2014-05-30 Thread Tom Arnfeld
Hey Ben,

When’s the vote going out? I’m in the process of deploying 0.19.0-rc1 (and 
testing with Deimos) to our pre-production cluster to see how it goes… will 
report back if I find anything odd.

Tom.

On 30 May 2014, at 17:48, Benjamin Mahler  wrote:

> Hi All,
> 
> I've pushed a tag for the first release candidate for 0.19.0: 0.19.0-rc1
> 
> I would like to ask others to assist in testing this candidate so that we can 
> catch anything obvious before the vote!
> 
> Thanks,
> Ben



Re: Website updated to reflect 0.18.2 release, blog post still not online

2014-05-29 Thread Tom Arnfeld
+1

On 29 May 2014, at 22:55, Dave Lester  wrote:

> Hi Niklas,
> 
> Hope you're OK with me moving this back to the dev list for further 
> discussion.
> 
> I've made some minor changes to the blog post, specifically merging the 
> 0.18.2 and 0.18.1 into a single post so the gap between 0.18.0 and 0.18.2 is 
> clear. My diff is attached and named with my last name in the filename.
> 
> Would like your approval and that of another committer, and then let's 
> #shipit.
> 
> Dave
> 
> -- Forwarded message --
> From: Niklas Nielsen 
> Date: Wed, May 28, 2014 at 5:20 PM
> Subject: Re: Website updated to reflect 0.18.2 release, blog post still not 
> online
> To: Dave Lester 
> 
> 
> Hi Dave,
> 
> Again, I apologize for the tardiness. I have it ready now - would you mind 
> taking a look before I commit to the public svn?
> 
> Best,
> Niklas
> 
> 
> On Sat, May 24, 2014 at 12:22 PM, Dave Lester  wrote:
> Hi all,
> 
> This morning I updated the Mesos website to reflect the 0.18.2 release.
> 
> Niklas, as release manager are you still planning to write a blog post 
> explaining 0.18.2, and 0.18.1?
> 
> It'd be great to continue our practice of having release managers 
> blog with each new release, since it drives a lot of traffic to the Apache 
> website and also promotes transparency around our releases for folks that may 
> not be following the day-to-day activity of the dev list.
> 
> Best,
> Dave
> 
> 
> 
> -- 
> Niklas
> 
> 



Re: Review Request 21316: Preview of MESOS-336

2014-05-29 Thread Tom Arnfeld

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/21316/#review44301
---



include/mesos/mesos.proto
<https://reviews.apache.org/r/21316/#comment78653>

What's the reason you're only caching on a per-framework basis? Surely if 
the URI is identical the content would be too... If I have multiple Hadoop or 
Spark clients running but using the same executor, it's a shame they wouldn't 
share a cache entry.



include/mesos/mesos.proto
<https://reviews.apache.org/r/21316/#comment78654>

Why not cache by default?



src/launcher/fetcher.cpp
<https://reviews.apache.org/r/21316/#comment78655>

Is this really needed?



src/slave/flags.hpp
<https://reviews.apache.org/r/21316/#comment78656>

The naming of this is a little confusing; it's actually a cache rather than 
the directory URIs are "fetched" to... the final outcome of "fetching" is 
actually the executor sandbox, no?


- Tom Arnfeld


On May 27, 2014, 11:02 a.m., Bernd Mathiske wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/21316/
> ---
> 
> (Updated May 27, 2014, 11:02 a.m.)
> 
> 
> Review request for mesos and Benjamin Hindman.
> 
> 
> Bugs: MESOS-336
> https://issues.apache.org/jira/browse/MESOS-336
> 
> 
> Repository: mesos-git
> 
> 
> Description
> ---
> 
> Preview of the first cut at fetcher caching. See MESOS-336 JIRA for 
> explanation for this approach: keep the cache info in the 
> MesosContainerizerProcess in the save, leverage actor single-threadedness to 
> deal with concurrency issues without head ache. 
> 
> Features so far:
> - If URI flag "fetched_externally" (default: false) is set,  the fetcher does 
> what it did in Mesos 0.18 and before.
> - If URI flag "cached" (default: false) is not set, the fetcher also fetches 
> every time as in Mesos 0.18 and before.
> - If URI flag "cached" is set, the UIR is only fetched once and all 
> subsequent fetch attempts copy from the cache file.
> - URIs are cached separately per framework (ID).
> - Recovery is implemented by simply wiping the entire cache.
> - GC for cache files. Global flag sets lifetime after last use. Default is 1 
> hour.
> 
> Potential future features:
> - symlinks instead of copying
> - extraction directly from URI, without cache file
> - combine that with symlinks
> - Refreshing, explicit cache invalidation
> - ...
> 
> 
> Diffs
> -
> 
>   include/mesos/mesos.proto ce780ca 
>   src/Makefile.am ae576c5 
>   src/launcher/fetcher.cpp c4425eb 
>   src/local/local.cpp 5d26aff 
>   src/slave/constants.hpp ace4590 
>   src/slave/constants.cpp 51f65bb 
>   src/slave/containerizer/mesos_containerizer.hpp 1f5908a 
>   src/slave/containerizer/mesos_containerizer.cpp d01d443 
>   src/slave/containerizer/mesos_fetcher.hpp PRE-CREATION 
>   src/slave/containerizer/mesos_fetcher.cpp PRE-CREATION 
>   src/slave/flags.hpp 15e5b64 
>   src/slave/slave.hpp 769bd00 
>   src/slave/slave.cpp a4b9570 
>   src/tests/containerizer_tests.cpp 2f4888d 
>   src/tests/fetcher_tests.cpp PRE-CREATION 
>   src/tests/slave_tests.cpp 85ca5c4 
> 
> Diff: https://reviews.apache.org/r/21316/diff/
> 
> 
> Testing
> ---
> 
> Tests need to be written, i.e. this is not commit-ready, just a preview that 
> shows everybody how it works. Tested with Mesosaurus, with and without 
> caching switched on. Seems to work for the easy cases, have not tested all 
> corner cases yet (e.g. extraction on/off, intermittent gc, ...).
> 
> 
> Thanks,
> 
> Bernd Mathiske
> 
>



Re: Review Request 22028: Cut 0.19.0 off of master.

2014-05-29 Thread Tom Arnfeld

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22028/#review44281
---

Ship it!


Ship It!

- Tom Arnfeld


On May 29, 2014, 6:13 p.m., Ben Mahler wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/22028/
> ---
> 
> (Updated May 29, 2014, 6:13 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, Till Toenshoff, and Vinod Kone.
> 
> 
> Bugs: MESOS-1311
> https://issues.apache.org/jira/browse/MESOS-1311
> 
> 
> Repository: mesos-git
> 
> 
> Description
> ---
> 
> All the blockers for the release are now fixed and committed.
> 
> 
> Diffs
> -
> 
>   configure.ac 1ebd19623c837dec205136a8aa0b1497b8dee6ce 
> 
> Diff: https://reviews.apache.org/r/22028/diff/
> 
> 
> Testing
> ---
> 
> N/A
> 
> 
> Thanks,
> 
> Ben Mahler
> 
>



[jira] [Commented] (MESOS-1405) Mesos fetcher does not support S3(n)

2014-05-23 Thread Tom Arnfeld (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14007023#comment-14007023
 ] 

Tom Arnfeld commented on MESOS-1405:


{quote}
Which implementation do you mean by "this" in "This implementation also isn't 
really very scalable"?
{quote}

Fair point, I'll make my comment a little clearer.

{quote}
Besides, we should be able to make adding custom URI schemes pluggable, even 
with MesosContainerizer, but I suggest to make that another ticket once the 
fetcher code has settled.
{quote}

I completely agree, it's definitely out of the scope of this fix. I still think 
this is a valid issue and fix on its own, though. It's nice for users to be 
able to use S3 directly for their mesos executors (currently you can store 
things in S3 and use ACLs to make the keys available over HTTP to mesos).


> Mesos fetcher does not support S3(n)
> 
>
> Key: MESOS-1405
> URL: https://issues.apache.org/jira/browse/MESOS-1405
> Project: Mesos
>  Issue Type: Improvement
>Affects Versions: 0.18.2
>Reporter: Tom Arnfeld
>Assignee: Tom Arnfeld
>Priority: Minor
>
> The HDFS client is able to support both S3 and S3N. Details for the 
> difference between the two can be found here: 
> http://wiki.apache.org/hadoop/AmazonS3.
> Examples:
> s3://bucket/path.tar.gz <- S3 Block Store
> s3n://bucket/path.tar.gz <- S3 K/V Store
> Either we can simply pass these URIs through to the HDFS client (hdfs.cpp) 
> and let hadoop do the work, or we can integrate with S3 directly. The latter 
> then requires we have a way of managing S3 credentials, whereas using the 
> HDFS client will just pull credentials from HADOOP_HOME.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Comment Edited] (MESOS-1405) Mesos fetcher does not support S3(n)

2014-05-23 Thread Tom Arnfeld (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14007010#comment-14007010
 ] 

Tom Arnfeld edited comment on MESOS-1405 at 5/23/14 9:55 AM:
-

Review request: https://reviews.apache.org/r/21852/

*Before*

{code}
$ MESOS_WORK_DIRECTORY="/tmp/" 
MESOS_EXECUTOR_URIS="hdfs:///user/tom/test-fetch+0N" src/mesos-fetcher 
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0523 10:38:44.003656 1933525776 fetcher.cpp:73] Fetching URI 
'hdfs:///user/tom/test-fetch'
I0523 10:38:44.004147 1933525776 fetcher.cpp:99] Downloading resource from 
'hdfs:///user/tom/test-fetch' to '/tmp/test-fetch'
I0523 10:38:47.983763 1933525776 fetcher.cpp:236] Skipped extracting path 
'/tmp/test-fetch'
{code}

{code}
$ MESOS_WORK_DIRECTORY="/tmp/" 
MESOS_EXECUTOR_URIS="s3n://bucket-test/tom/test-fetch+0N" src/mesos-fetcher 
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0523 10:39:52.034631 1933525776 fetcher.cpp:73] Fetching URI 
's3n://bucket-test/tom/test-fetch'
E0523 10:39:52.035181 1933525776 fetcher.cpp:142] A relative path was passed 
for the resource but the environment variable MESOS_FRAMEWORKS_HOME is not set. 
Please either specify this config option or avoid using a relative path
Failed to fetch: s3n://bucket-test/tom/test-fetch
{code}

Here we can see the fetcher classifies the URI as a relative path (and since 
I've not set all the environment variables, it throws an error trying to 
resolve the path on the local filesystem).

*After*

{code}
$ MESOS_WORK_DIRECTORY="/tmp/" 
MESOS_EXECUTOR_URIS="s3n://bucket-test/tom/test-fetch+0N" src/mesos-fetcher 
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0523 10:52:28.486734 1933525776 fetcher.cpp:73] Fetching URI 
's3n://bucket-test/tom/test-fetch'
I0523 10:52:28.487210 1933525776 fetcher.cpp:102] Downloading resource from 
's3n://bucket-test/tom/test-fetch' to '/tmp/test-fetch'
I0523 10:52:33.173795 1933525776 fetcher.cpp:239] Skipped extracting path 
'/tmp/test-fetch'
{code}

I'm not sure if we should just incorporate this change into your work 
[~bernd-mesos] – or if it's something you've already done? This implementation 
also isn't really very scalable: if we want to maintain good compatibility with 
the Hadoop Filesystem implementations, users shouldn't have to re-compile mesos 
to pass their custom URIs through to hadoop. An example here is if a user were 
using GlusterFS instead of HDFS.


was (Author: tarnfeld):
Review request: https://reviews.apache.org/r/21852/

*Before*

{code}
$ MESOS_WORK_DIRECTORY="/tmp/" 
MESOS_EXECUTOR_URIS="hdfs:///user/tom/test-fetch+0N" src/mesos-fetcher 
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0523 10:38:44.003656 1933525776 fetcher.cpp:73] Fetching URI 
'hdfs:///user/tom/test-fetch'
I0523 10:38:44.004147 1933525776 fetcher.cpp:99] Downloading resource from 
'hdfs:///user/tom/test-fetch' to '/tmp/test-fetch'
I0523 10:38:47.983763 1933525776 fetcher.cpp:236] Skipped extracting path 
'/tmp/test-fetch'
{code}

{code}
$ MESOS_WORK_DIRECTORY="/tmp/" 
MESOS_EXECUTOR_URIS="s3n://bucket-test/tom/test-fetch+0N" src/mesos-fetcher 
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0523 10:39:52.034631 1933525776 fetcher.cpp:73] Fetching URI 
's3n://bucket-test/tom/test-fetch'
E0523 10:39:52.035181 1933525776 fetcher.cpp:142] A relative path was passed 
for the resource but the environment variable MESOS_FRAMEWORKS_HOME is not set. 
Please either specify this config option or avoid using a relative path
Failed to fetch: s3n://home.duedil.com/tom/test-fetch
{code}

Here we can see the fetcher classifies the URI as a relative path (and since 
I've not set all the environment variables, it throws an error trying to 
resolve the path on the local filesystem).

*After*

{code}
$ MESOS_WORK_DIRECTORY="/tmp/" 
MESOS_EXECUTOR_URIS="s3n://bucket-test/tom/test-fetch+0N" src/mesos-fetcher 
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0523 10:52:28.486734 1933525776 fetcher.cpp:73] Fetching URI 
's3n://bucket-test/tom/test-fetch'
I0523 10:52:28.487210 1933525776 fetcher.cpp:102] Downloading resource from 
's3n://bucket-test/tom/test-fetch' to '/tmp/test-fetch'
I0523 10:52:33.173795 1933525776 fetcher.cpp:239] Skipped extracting path 
'/tmp/test-fetch'
{code}

I'm not sure if we should just incorporate this change into your work 
[~bernd-mesos] – or if it's something you've already done? This implementation 
also isn't really very scalable: if we want to maintain good compatibility with 
the Hadoop Filesystem implementations, users shouldn't have to re-compile mesos 
to pass their custom URIs through to hadoop. An example here is if a user were 
using GlusterFS instead of HDFS.

[jira] [Commented] (MESOS-1405) Mesos fetcher does not support S3(n)

2014-05-23 Thread Tom Arnfeld (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14007010#comment-14007010
 ] 

Tom Arnfeld commented on MESOS-1405:


Review request:

*Before*

{code}
$ MESOS_WORK_DIRECTORY="/tmp/" MESOS_EXECUTOR_URIS="hdfs:///user/tom/test-fetch+0N" src/mesos-fetcher
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0523 10:38:44.003656 1933525776 fetcher.cpp:73] Fetching URI 'hdfs:///user/tom/test-fetch'
I0523 10:38:44.004147 1933525776 fetcher.cpp:99] Downloading resource from 'hdfs:///user/tom/test-fetch' to '/tmp/test-fetch'
I0523 10:38:47.983763 1933525776 fetcher.cpp:236] Skipped extracting path '/tmp/test-fetch'
{code}

{code}
$ MESOS_WORK_DIRECTORY="/tmp/" MESOS_EXECUTOR_URIS="s3n://bucket-test/tom/test-fetch+0N" src/mesos-fetcher
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0523 10:39:52.034631 1933525776 fetcher.cpp:73] Fetching URI 's3n://bucket-test/tom/test-fetch'
E0523 10:39:52.035181 1933525776 fetcher.cpp:142] A relative path was passed for the resource but the environment variable MESOS_FRAMEWORKS_HOME is not set. Please either specify this config option or avoid using a relative path
Failed to fetch: s3n://home.duedil.com/tom/test-fetch
{code}

Here we can see the fetcher classifies the URI as a relative path (and since 
I've not set all the environment variables, it throws an error trying to 
resolve the path on the local filesystem).

*After*

{code}
$ MESOS_WORK_DIRECTORY="/tmp/" MESOS_EXECUTOR_URIS="s3n://bucket-test/tom/test-fetch+0N" src/mesos-fetcher
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0523 10:52:28.486734 1933525776 fetcher.cpp:73] Fetching URI 's3n://bucket-test/tom/test-fetch'
I0523 10:52:28.487210 1933525776 fetcher.cpp:102] Downloading resource from 's3n://bucket-test/tom/test-fetch' to '/tmp/test-fetch'
I0523 10:52:33.173795 1933525776 fetcher.cpp:239] Skipped extracting path '/tmp/test-fetch'
{code}

I'm not sure if we should just incorporate this change into your work 
[~bernd-mesos], or if it's something you've already done? This implementation 
also isn't very scalable: if we want to maintain good compatibility with the 
Hadoop FileSystem implementations, users shouldn't have to re-compile Mesos to 
pass their custom URIs through to Hadoop. An example here is if a user were 
using GlusterFS instead of HDFS.

> Mesos fetcher does not support S3(n)
> 
>
> Key: MESOS-1405
> URL: https://issues.apache.org/jira/browse/MESOS-1405
> Project: Mesos
>  Issue Type: Improvement
>Affects Versions: 0.18.2
>Reporter: Tom Arnfeld
>Assignee: Tom Arnfeld
>Priority: Minor
>
> The HDFS client is able to support both S3 and S3N. Details for the 
> difference between the two can be found here: 
> http://wiki.apache.org/hadoop/AmazonS3.
> Examples:
> s3://bucket/path.tar.gz <- S3 Block Store
> s3n://bucket/path.tar.gz <- S3 K/V Store
> Either we can simply pass these URIs through to the HDFS client (hdfs.cpp) 
> and let hadoop do the work, or we can integrate with S3 directly. The latter 
> then requires we have a way of managing S3 credentials, whereas using the 
> HDFS client will just pull credentials from HADOOP_HOME.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Comment Edited] (MESOS-1405) Mesos fetcher does not support S3(n)

2014-05-23 Thread Tom Arnfeld (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14007010#comment-14007010
 ] 

Tom Arnfeld edited comment on MESOS-1405 at 5/23/14 9:53 AM:
-

Review request: https://reviews.apache.org/r/21852/

*Before*

{code}
$ MESOS_WORK_DIRECTORY="/tmp/" MESOS_EXECUTOR_URIS="hdfs:///user/tom/test-fetch+0N" src/mesos-fetcher
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0523 10:38:44.003656 1933525776 fetcher.cpp:73] Fetching URI 'hdfs:///user/tom/test-fetch'
I0523 10:38:44.004147 1933525776 fetcher.cpp:99] Downloading resource from 'hdfs:///user/tom/test-fetch' to '/tmp/test-fetch'
I0523 10:38:47.983763 1933525776 fetcher.cpp:236] Skipped extracting path '/tmp/test-fetch'
{code}

{code}
$ MESOS_WORK_DIRECTORY="/tmp/" MESOS_EXECUTOR_URIS="s3n://bucket-test/tom/test-fetch+0N" src/mesos-fetcher
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0523 10:39:52.034631 1933525776 fetcher.cpp:73] Fetching URI 's3n://bucket-test/tom/test-fetch'
E0523 10:39:52.035181 1933525776 fetcher.cpp:142] A relative path was passed for the resource but the environment variable MESOS_FRAMEWORKS_HOME is not set. Please either specify this config option or avoid using a relative path
Failed to fetch: s3n://home.duedil.com/tom/test-fetch
{code}

Here we can see the fetcher classifies the URI as a relative path (and since 
I've not set all the environment variables, it throws an error trying to 
resolve the path on the local filesystem).

*After*

{code}
$ MESOS_WORK_DIRECTORY="/tmp/" MESOS_EXECUTOR_URIS="s3n://bucket-test/tom/test-fetch+0N" src/mesos-fetcher
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0523 10:52:28.486734 1933525776 fetcher.cpp:73] Fetching URI 's3n://bucket-test/tom/test-fetch'
I0523 10:52:28.487210 1933525776 fetcher.cpp:102] Downloading resource from 's3n://bucket-test/tom/test-fetch' to '/tmp/test-fetch'
I0523 10:52:33.173795 1933525776 fetcher.cpp:239] Skipped extracting path '/tmp/test-fetch'
{code}

I'm not sure if we should just incorporate this change into your work 
[~bernd-mesos], or if it's something you've already done? This implementation 
also isn't very scalable: if we want to maintain good compatibility with the 
Hadoop FileSystem implementations, users shouldn't have to re-compile Mesos to 
pass their custom URIs through to Hadoop. An example here is if a user were 
using GlusterFS instead of HDFS.


was (Author: tarnfeld):
Review request:

*Before*

{code}
$ MESOS_WORK_DIRECTORY="/tmp/" MESOS_EXECUTOR_URIS="hdfs:///user/tom/test-fetch+0N" src/mesos-fetcher
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0523 10:38:44.003656 1933525776 fetcher.cpp:73] Fetching URI 'hdfs:///user/tom/test-fetch'
I0523 10:38:44.004147 1933525776 fetcher.cpp:99] Downloading resource from 'hdfs:///user/tom/test-fetch' to '/tmp/test-fetch'
I0523 10:38:47.983763 1933525776 fetcher.cpp:236] Skipped extracting path '/tmp/test-fetch'
{code}

{code}
$ MESOS_WORK_DIRECTORY="/tmp/" MESOS_EXECUTOR_URIS="s3n://bucket-test/tom/test-fetch+0N" src/mesos-fetcher
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0523 10:39:52.034631 1933525776 fetcher.cpp:73] Fetching URI 's3n://bucket-test/tom/test-fetch'
E0523 10:39:52.035181 1933525776 fetcher.cpp:142] A relative path was passed for the resource but the environment variable MESOS_FRAMEWORKS_HOME is not set. Please either specify this config option or avoid using a relative path
Failed to fetch: s3n://home.duedil.com/tom/test-fetch
{code}

Here we can see the fetcher classifies the URI as a relative path (and since 
I've not set all the environment variables, it throws an error trying to 
resolve the path on the local filesystem).

*After*

{code}
$ MESOS_WORK_DIRECTORY="/tmp/" MESOS_EXECUTOR_URIS="s3n://bucket-test/tom/test-fetch+0N" src/mesos-fetcher
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0523 10:52:28.486734 1933525776 fetcher.cpp:73] Fetching URI 's3n://bucket-test/tom/test-fetch'
I0523 10:52:28.487210 1933525776 fetcher.cpp:102] Downloading resource from 's3n://bucket-test/tom/test-fetch' to '/tmp/test-fetch'
I0523 10:52:33.173795 1933525776 fetcher.cpp:239] Skipped extracting path '/tmp/test-fetch'
{code}

I'm not sure if we should just incorporate this change into your work 
[~bernd-mesos], or if it's something you've already done? This implementation 
also isn't very scalable: if we want to maintain good compatibility with the 
Hadoop FileSystem implementations, users shouldn't have to re-compile Mesos to 
pass their custom URIs through to Hadoop. An example here is if a user were 
using GlusterFS instead of HDFS.

[jira] [Commented] (MESOS-1415) Web UI master redirect message doesn't show up

2014-05-23 Thread Tom Arnfeld (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14007001#comment-14007001
 ] 

Tom Arnfeld commented on MESOS-1415:


https://reviews.apache.org/r/21850/

> Web UI master redirect message doesn't show up
> --
>
> Key: MESOS-1415
> URL: https://issues.apache.org/jira/browse/MESOS-1415
> Project: Mesos
>  Issue Type: Bug
>Reporter: Tom Arnfeld
>Assignee: Tom Arnfeld
>Priority: Trivial
>
> When you go to one of the masters that isn't the leader, the little message 
> telling you you're about to be redirected doesn't show up. Looks to me like 
> a tiny CSS fix, and also a class change to match the naming convention of 
> the latest Twitter Bootstrap ({{alert-error}} -> {{alert-danger}}).
> Patch incoming...



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (MESOS-1415) Web UI master redirect message doesn't show up

2014-05-23 Thread Tom Arnfeld (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tom Arnfeld reassigned MESOS-1415:
--

Assignee: Tom Arnfeld

> Web UI master redirect message doesn't show up
> --
>
> Key: MESOS-1415
> URL: https://issues.apache.org/jira/browse/MESOS-1415
> Project: Mesos
>  Issue Type: Bug
>Reporter: Tom Arnfeld
>    Assignee: Tom Arnfeld
>Priority: Trivial
>
> When you go to one of the masters that isn't the leader, the little message 
> telling you you're about to be redirected doesn't show up. Looks to me like 
> a tiny CSS fix, and also a class change to match the naming convention of 
> the latest Twitter Bootstrap ({{alert-error}} -> {{alert-danger}}).
> Patch incoming...



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (MESOS-1415) Web UI master redirect message doesn't show up

2014-05-23 Thread Tom Arnfeld (JIRA)
Tom Arnfeld created MESOS-1415:
--

 Summary: Web UI master redirect message doesn't show up
 Key: MESOS-1415
 URL: https://issues.apache.org/jira/browse/MESOS-1415
 Project: Mesos
  Issue Type: Bug
Reporter: Tom Arnfeld
Priority: Trivial


When you go to one of the masters that isn't the leader, the little message 
telling you you're about to be redirected doesn't show up. Looks to me like a 
tiny CSS fix, and also a class change to match the naming convention of the 
latest Twitter Bootstrap ({{alert-error}} -> {{alert-danger}}).

Patch incoming...



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MESOS-1405) Mesos fetcher does not support S3(n)

2014-05-22 Thread Tom Arnfeld (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006053#comment-14006053
 ] 

Tom Arnfeld commented on MESOS-1405:


> I like the hdfs client solution. That seems like a trivial addition to the 
> fetcher.

Yeah, it should simply be an addition to the if statement to check whether it's 
hdfs://. I'll take this one, then.
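
A minimal sketch of that idea (illustrative only, not the eventual patch; the 
helper name is made up, and this assumes stout's {{strings::startsWith}}):

{code}
#include <string>

#include <stout/strings.hpp>

// Treat any scheme the Hadoop client understands as a Hadoop fetch,
// instead of only hdfs://; everything else falls through to the
// existing HTTP/local-path handling in launcher/fetcher.cpp.
static bool fetchWithHadoop(const std::string& uri)
{
  return strings::startsWith(uri, "hdfs://") ||
         strings::startsWith(uri, "s3://") ||
         strings::startsWith(uri, "s3n://");
}
{code}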

> Mesos fetcher does not support S3(n)
> 
>
> Key: MESOS-1405
> URL: https://issues.apache.org/jira/browse/MESOS-1405
> Project: Mesos
>  Issue Type: Improvement
>Affects Versions: 0.18.2
>Reporter: Tom Arnfeld
>Priority: Minor
>
> The HDFS client is able to support both S3 and S3N. Details for the 
> difference between the two can be found here: 
> http://wiki.apache.org/hadoop/AmazonS3.
> Examples:
> s3://bucket/path.tar.gz <- S3 Block Store
> s3n://bucket/path.tar.gz <- S3 K/V Store
> Either we can simply pass these URIs through to the HDFS client (hdfs.cpp) 
> and let hadoop do the work, or we can integrate with S3 directly. The latter 
> then requires we have a way of managing S3 credentials, whereas using the 
> HDFS client will just pull credentials from HADOOP_HOME.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (MESOS-1405) Mesos fetcher does not support S3(n)

2014-05-22 Thread Tom Arnfeld (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tom Arnfeld reassigned MESOS-1405:
--

Assignee: Tom Arnfeld

> Mesos fetcher does not support S3(n)
> 
>
> Key: MESOS-1405
> URL: https://issues.apache.org/jira/browse/MESOS-1405
> Project: Mesos
>  Issue Type: Improvement
>Affects Versions: 0.18.2
>    Reporter: Tom Arnfeld
>    Assignee: Tom Arnfeld
>Priority: Minor
>
> The HDFS client is able to support both S3 and S3N. Details for the 
> difference between the two can be found here: 
> http://wiki.apache.org/hadoop/AmazonS3.
> Examples:
> s3://bucket/path.tar.gz <- S3 Block Store
> s3n://bucket/path.tar.gz <- S3 K/V Store
> Either we can simply pass these URIs through to the HDFS client (hdfs.cpp) 
> and let hadoop do the work, or we can integrate with S3 directly. The latter 
> then requires we have a way of managing S3 credentials, whereas using the 
> HDFS client will just pull credentials from HADOOP_HOME.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (MESOS-1405) Mesos fetcher does not support S3(n)

2014-05-22 Thread Tom Arnfeld (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tom Arnfeld updated MESOS-1405:
---

Description: 
The HDFS client is able to support both S3 and S3N. Details for the difference 
between the two can be found here: http://wiki.apache.org/hadoop/AmazonS3.

Examples:

s3://bucket/path.tar.gz <- S3 Block Store
s3n://bucket/path.tar.gz <- S3 K/V Store

Either we can simply pass these URIs through to the HDFS client (hdfs.cpp) and 
let hadoop do the work, or we can integrate with S3 directly. The latter then 
requires we have a way of managing S3 credentials, whereas using the HDFS 
client will just pull credentials from HADOOP_HOME.

  was:
The HDFS client is able to support both S3 and S3N. Details for the difference 
between the two can be found here: http://wiki.apache.org/hadoop/AmazonS3.

Either we can simply pass these URIs through to the HDFS client (hdfs.cpp) and 
let hadoop do the work, or we can integrate with S3 directly. The latter then 
requires we have a way of managing S3 credentials, whereas using the HDFS 
client will just pull credentials from HADOOP_HOME.


> Mesos fetcher does not support S3(n)
> 
>
> Key: MESOS-1405
> URL: https://issues.apache.org/jira/browse/MESOS-1405
> Project: Mesos
>  Issue Type: Improvement
>Affects Versions: 0.18.2
>Reporter: Tom Arnfeld
>Priority: Minor
>
> The HDFS client is able to support both S3 and S3N. Details for the 
> difference between the two can be found here: 
> http://wiki.apache.org/hadoop/AmazonS3.
> Examples:
> s3://bucket/path.tar.gz <- S3 Block Store
> s3n://bucket/path.tar.gz <- S3 K/V Store
> Either we can simply pass these URIs through to the HDFS client (hdfs.cpp) 
> and let hadoop do the work, or we can integrate with S3 directly. The latter 
> then requires we have a way of managing S3 credentials, whereas using the 
> HDFS client will just pull credentials from HADOOP_HOME.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (MESOS-1405) Mesos fetcher does not support S3(n)

2014-05-22 Thread Tom Arnfeld (JIRA)
Tom Arnfeld created MESOS-1405:
--

 Summary: Mesos fetcher does not support S3(n)
 Key: MESOS-1405
 URL: https://issues.apache.org/jira/browse/MESOS-1405
 Project: Mesos
  Issue Type: Improvement
Affects Versions: 0.18.2
Reporter: Tom Arnfeld
Priority: Minor


The HDFS client is able to support both S3 and S3N. Details for the difference 
between the two can be found here: http://wiki.apache.org/hadoop/AmazonS3.

Either we can simply pass these URIs through to the HDFS client (hdfs.cpp) and 
let hadoop do the work, or we can integrate with S3 directly. The latter then 
requires we have a way of managing S3 credentials, whereas using the HDFS 
client will just pull credentials from HADOOP_HOME.
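
As a rough sketch of the pass-through option (the wrapper and {{copyToLocal}} 
signature below are assumptions modelled on src/hdfs/hdfs.hpp, not a verified 
API):

{code}
#include <string>

#include <stout/nothing.hpp>
#include <stout/try.hpp>

#include "hdfs/hdfs.hpp"

// The wrapper shells out to `hadoop fs -copyToLocal`, so any s3:// or
// s3n:// credentials configured under HADOOP_HOME apply automatically.
Try<Nothing> fetchWithHadoopClient(
    const std::string& uri,
    const std::string& localPath)
{
  HDFS hdfs; // assumed wrapper around the `hadoop` command-line client
  return hdfs.copyToLocal(uri, localPath);
}
{code}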



--
This message was sent by Atlassian JIRA
(v6.2#6252)


MESOS-695 / Automated self-healing and coordinated repair to Mesos

2014-05-16 Thread Tom Arnfeld
Hi all,

Wasn’t sure if it was right to start this thread on the JIRA issue.. I just 
came across MESOS-695 (and what seems to be something almost finished!) about 
implementing some kind of self-healing mechanism in mesos, and also picked up 
on mentions of monit. From what I could tell based on the comments a while 
back, Twitter uses monit for health checking the slaves and monit will take 
over and restart the slave process if something funky is going on.

I’m a big fan of monit, so this peaks my interest...

1) I’d be interested in knowing what monit rules are defined for a “failing” or 
misbehaving slave, if this can be shared, or a correction on how monit is being 
used with mesos at Twitter.
2) This may already exist outside the community, but has there been discussion 
of writing a monit plugin to achieve this? This way you not only have a way of 
telling monit the slave needs a restart, but alerting comes free with it.

I’m not too familiar with the implementation of this self-healing mechanism, 
but I assume one benefit of it being implemented in/around the master process 
is that it can gain a much wider view of what “misbehaving” means, in relation 
to all nodes in the cluster. The monitoring being outside-in rather than 
inside-out, a little similar to Hadoop’s blacklisting feature…?

Thanks!

Tom.



Re: [VOTE] Release Apache Mesos 0.18.2 (rc1)

2014-05-16 Thread Tom Arnfeld
+1 make check on OSX 10.9.1 (gcc-4.8)

> On 15 May 2014, at 15:55, Niklas Nielsen  wrote:
>
> Hi all,
>
> Please vote on releasing the following candidate as Apache Mesos 0.18.2.
>
>
> 0.18.2 includes the following:
> 
> [MESOS-1313] - The executor bit is now essentially ignored with the 0.18.1
> fetcher implementation
>
>
> The CHANGELOG for the release is available at:
> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob_plain;f=CHANGELOG;hb=0.18.2-rc1
> 
>
> The candidate for Mesos 0.18.2 release is available at:
> https://dist.apache.org/repos/dist/dev/mesos/0.18.2-rc1/mesos-0.18.2.tar.gz
>
> The tag to be voted on is 0.18.2-rc1:
> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=commit;h=0.18.2-rc1
>
> The MD5 checksum of the tarball can be found at:
> https://dist.apache.org/repos/dist/dev/mesos/0.18.2-rc1/mesos-0.18.2.tar.gz.md5
>
> The signature of the tarball can be found at:
> https://dist.apache.org/repos/dist/dev/mesos/0.18.2-rc1/mesos-0.18.2.tar.gz.asc
>
> The PGP key used to sign the release is here:
> https://dist.apache.org/repos/dist/release/mesos/KEYS
>
> The JAR is up in Maven in a staging repository here:
> https://repository.apache.org/content/repositories/orgapachemesos-1020
>
> Please vote on releasing this package as Apache Mesos 0.18.2!
>
> The vote is open until Sat May 17 12:04:39 PDT 2014 and passes if a
> majority of at least 3 +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Mesos 0.18.2
> [ ] -1 Do not release this package because ...
>
> Thanks,
> Niklas


Re: Where did 0.18.1 go? Suggesting 0.18.2

2014-05-15 Thread Tom Arnfeld
Definitely +1.

On 13 May 2014, at 18:54, Benjamin Hindman  wrote:

> +1!
> 
> 
> On Tue, May 13, 2014 at 9:51 AM, Niklas Nielsen  wrote:
> Hey everyone,
> 
> First and foremost, I apologize for the radio silence on my part with regards 
> to the 0.18.1 release. We didn't announce it or make it public on the website.
> The reason is that a bug in the mesos-fetcher made its way in and would 
> render 0.18.1 unusable for production settings 
> (https://issues.apache.org/jira/browse/MESOS-1313)
> 
> I suggest yet another bug-fix release, 0.18.2, which cherry-picks 
> https://reviews.apache.org/r/21127/; we can expedite it and have it ready by EOW.
> 
> I'd love some (quick) input before starting this release.
> 
> Thanks,
> Niklas
> 



[jira] [Commented] (MESOS-1316) Implement decent unit test coverage for the mesos-fetcher tool

2014-05-11 Thread Tom Arnfeld (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13994540#comment-13994540
 ] 

Tom Arnfeld commented on MESOS-1316:


Nice one, tests are looking great! Sorry I didn't get a chance to start this – 
you should probably unassign me from the task. 

> Implement decent unit test coverage for the mesos-fetcher tool
> --
>
> Key: MESOS-1316
> URL: https://issues.apache.org/jira/browse/MESOS-1316
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Tom Arnfeld
>Assignee: Tom Arnfeld
>
> There are currently no tests that cover the {{mesos-fetcher}} tool itself, and 
> hence bugs like MESOS-1313 have accidentally slipped through.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Review Request 21277: Passed CommandInfo to mesos-fetcher as JSON.

2014-05-11 Thread Tom Arnfeld

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/21277/#review42644
---

Ship it!


Nice to see this! What's the reason we don't just transmit the raw protobuf 
string? Is this to make it easier to implement a custom fetcher (without the 
need to have mesos.proto)?
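
Something along these lines, presumably? A rough sketch (the environment 
variable name is illustrative, and this assumes stout's {{JSON::Protobuf}} 
conversion):

{code}
#include <map>
#include <string>

#include <mesos/mesos.pb.h>

#include <stout/json.hpp>
#include <stout/protobuf.hpp>
#include <stout/stringify.hpp>

// Serialize CommandInfo to JSON and hand it to mesos-fetcher through the
// environment, so a custom fetcher can parse plain JSON rather than link
// against mesos.proto.
std::map<std::string, std::string> fetcherEnvironment(
    const mesos::CommandInfo& commandInfo)
{
  JSON::Object object = JSON::Protobuf(commandInfo);

  std::map<std::string, std::string> environment;
  environment["MESOS_COMMAND_INFO"] = stringify(object); // illustrative name
  return environment;
}
{code}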

- Tom Arnfeld


On May 9, 2014, 7:05 p.m., Benjamin Hindman wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/21277/
> ---
> 
> (Updated May 9, 2014, 7:05 p.m.)
> 
> 
> Review request for mesos, Ben Mahler, Dominic Hamon, and Tom Arnfeld.
> 
> 
> Bugs: MESOS-1248
> https://issues.apache.org/jira/browse/MESOS-1248
> 
> 
> Repository: mesos-git
> 
> 
> Description
> ---
> 
> See summary (and bug).
> 
> 
> Diffs
> -
> 
>   src/launcher/fetcher.cpp 8c9e20da8f39eb5e90403a5093cbea7fb2680468 
>   src/slave/fetcher.hpp PRE-CREATION 
>   src/tests/fetcher_tests.cpp PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/21277/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Benjamin Hindman
> 
>





[jira] [Created] (MESOS-1316) Implement decent unit test coverage for the mesos-fetcher tool

2014-05-06 Thread Tom Arnfeld (JIRA)
Tom Arnfeld created MESOS-1316:
--

 Summary: Implement decent unit test coverage for the mesos-fetcher 
tool
 Key: MESOS-1316
 URL: https://issues.apache.org/jira/browse/MESOS-1316
 Project: Mesos
  Issue Type: Improvement
Reporter: Tom Arnfeld


There are currently no tests that cover the {{mesos-fetcher}} tool itself, and 
hence bugs like MESOS-1313 have accidentally slipped through.
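
For illustration, a self-contained flavor of such a test (the URI classifier 
here is hypothetical, standing in for the fetcher's real dispatch logic; 
proper coverage should also drive the mesos-fetcher binary itself):

{code}
#include <string>

#include <gtest/gtest.h>

// Hypothetical helper standing in for the fetcher's URI classification.
static bool isRelativePath(const std::string& uri)
{
  return uri.find("://") == std::string::npos &&
         !uri.empty() && uri[0] != '/';
}

TEST(FetcherTest, ClassifiesURIs)
{
  EXPECT_FALSE(isRelativePath("hdfs:///user/tom/test-fetch"));
  EXPECT_FALSE(isRelativePath("s3n://bucket-test/tom/test-fetch"));
  EXPECT_FALSE(isRelativePath("/tmp/test-fetch"));
  EXPECT_TRUE(isRelativePath("frameworks/test-fetch"));
}
{code}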



--
This message was sent by Atlassian JIRA
(v6.2#6252)

