Re: Time Zone information in TimeInfo

2017-03-06 Thread Zameer Manji
The TODO made me think that the time information here could be timezone
dependent in some cases.

If it's intended to always represent the time since the Unix epoch then TZ
info is not useful.

I think that comment should be removed for clarity.
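
As a rough illustration of why no TZ field is needed (a minimal sketch, not Mesos code; the helper name `timeinfo_to_utc` is made up for this example), a framework can turn the `nanoseconds` value into a UTC instant and only pick a zone when displaying it:

```python
from datetime import datetime, timedelta, timezone

# The Unix epoch as a timezone-aware UTC datetime.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def timeinfo_to_utc(nanoseconds: int) -> datetime:
    """Convert nanoseconds since the Unix epoch to an aware UTC datetime.

    datetime only has microsecond resolution, so the sub-microsecond
    part of the value is dropped.
    """
    return EPOCH + timedelta(microseconds=nanoseconds // 1_000)


# Hypothetical TimeInfo.nanoseconds value, chosen for illustration.
instant = timeinfo_to_utc(1_483_228_800_000_000_000)

# Rendering in another zone changes the display, not the instant.
pacific = instant.astimezone(timezone(timedelta(hours=-8)))
```

The same instant compares equal no matter which zone it is rendered in, so neither the master's nor the framework's system TZ matters.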

On Mon, Mar 6, 2017 at 8:38 PM, Neil Conway  wrote:

> I always found that TODO confusing. If a `TimeInfo` is intended to
> represent the amount of time that has elapsed since the (Unix) epoch,
> I would expect it to be timezone independent. Can you clarify why
> having TZ info would be useful?
>
> Neil
>
> On Mon, Mar 6, 2017 at 7:51 PM, Zameer Manji  wrote:
> > Hey,
> >
> > I noticed there is a TODO on the TimeInfo for adding Time Zone information.
> > ```
> > /**
> >  * Represents time since the epoch, in nanoseconds.
> >  */
> > message TimeInfo {
> >   required int64 nanoseconds = 1;
> >
> >   // TODO(josephw): Add time zone information, if necessary.
> > }
> > ```
> >
> > Since there is no TZ information attached to the timestamp, should frameworks
> > assume that the Mesos Master system TZ is the same as the framework TZ?
> > That is what I'm thinking of doing, but I'm not sure what was the intention
> > of the authors of the API.
> >
> > Also, would it be possible to attach TZ information? It would make
> > understanding the TimeInfo much easier when it is received by the framework.
> >
> > --
> > Zameer Manji
>
> --
> Zameer Manji
>


Time Zone information in TimeInfo

2017-03-06 Thread Zameer Manji
Hey,

I noticed there is a TODO on the TimeInfo for adding Time Zone information.
```
/**
 * Represents time since the epoch, in nanoseconds.
 */
message TimeInfo {
  required int64 nanoseconds = 1;

  // TODO(josephw): Add time zone information, if necessary.
}
```

Since there is no TZ information attached to the timestamp, should frameworks
assume that the Mesos Master system TZ is the same as the framework TZ?
That is what I'm thinking of doing, but I'm not sure what was the intention
of the authors of the API.

Also, would it be possible to attach TZ information? It would make
understanding the TimeInfo much easier when it is received by the framework.

-- 
Zameer Manji


Re: Disallowing pre-1.0 Mesos agents

2017-01-20 Thread Zameer Manji
+1



On Fri, Jan 20, 2017 at 10:58 AM, Neil Conway  wrote:

> I'd like to propose that the Mesos 1.3.0 should not allow pre-1.0
> Mesos agents to register.
>
> Motivation:
>
> (1) We can simplify the master code in a few places. For example, we
> can assume that we always have a FrameworkInfo for any task running on
> a registered agent. Needing to handle running tasks without a
> FrameworkInfo makes the code unreadable and has been a source of bugs.
>
> (2) The master only needs to report "orphan tasks" and "unregistered
> frameworks" if the cluster contains pre-1.0 agents. If we disallow
> such agents, we can remove the code for computing these fields in the
> HTTP endpoints and elsewhere. (We'll probably still need to keep the
> actual fields in the JSON/protobuf output for backward compatibility,
> but they will always be empty.) We can also remove "orphan tasks" from
> the web UI.
>
> In addition to declaring that Mesos 1.3.0 masters will not support
> pre-1.0 Mesos agents in the CHANGELOG, it seems safer to me to
> disallow such agents from registering.
>
> Comments welcome.
>
> Thanks,
> Neil
>
> --
> Zameer Manji
>


Re: Metrics collection affected when libprocess queue builds up

2016-12-19 Thread Zameer Manji
I believe Zhitao is referring to `/metrics/snapshot` returning a result
after 10-30 seconds.

I think in a typical environment, this will cause most metrics collection
tooling to time out. That leaves the operator without any visibility
into the system, making debugging or fighting the problem very hard.
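
For reference, this is roughly what a collector does (a sketch only; the host name is a placeholder and the 5 second timeout is an assumed, typical value, not something taken from any particular tool):

```python
import json
import urllib.request


def snapshot_url(master: str) -> str:
    """Build the URL of the master's metrics snapshot endpoint."""
    return f"http://{master}/metrics/snapshot"


def fetch_snapshot(master: str, timeout_secs: float = 5.0) -> dict:
    """Fetch the metrics snapshot, giving up after timeout_secs."""
    with urllib.request.urlopen(snapshot_url(master), timeout=timeout_secs) as response:
        return json.load(response)


# Placeholder address; 5050 is the default master port.
url = snapshot_url("master.example.org:5050")
```

If the endpoint takes 10-30 seconds to answer, the call in `fetch_snapshot` fails with a timeout and the collector records nothing for that interval.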

On Mon, Dec 19, 2016 at 9:23 PM, haosdent  wrote:

> Hi, @zhitao
>
> > the `/metrics/snapshot` could take 10-30 seconds to respond.
>
> Do you mean it `/metrics/snapshot` return result after 10~30 seconds?
> Or `/metrics/snapshot` takes 10~30 seconds to reflect the change of `
> allocator/mesos/event_queue_dispatches gauge`?
>
> On Mon, Dec 19, 2016 at 1:11 PM, Zhitao Li  wrote:
>
> > Hi all,
> >
> > While I was debugging an allocator message queue build up issue on master
> > (which I plan to share another thread), I noticed that
> `/metrics/snapshot`
> > is also badly affected.
> >
> > For example, when the allocator queue has ~3k dispatches in it (revealed
> by
> > the allocator/mesos/event_queue_dispatches gauge), the
> `/metrics/snapshot`
> > could take 10-30 seconds to respond.
> >
> > During an active debugging or outage fighting, this is pretty undesired.
> >
> > My guess is that many stats collection code relies on *deferring* to
> > another libprocess and collect the result.
> >
> > Should we explore a more reliable way to track metrics independently from
> > libprocess's queue?
> >
> > --
> > Cheers,
> >
> > Zhitao Li
> >
>
>
>
> --
> Best Regards,
> Haosdent Huang
>
> --
> Zameer Manji
>


Re: thread_local supported on Apple

2016-12-19 Thread Zameer Manji
I believe this thread_local support is in Xcode 8.2. From the link you
shared:

> Xcode 8.2 requires a Mac running macOS 10.11.5 or later

This means that users can upgrade the compiler on El Capitan just fine
without a system upgrade.

Users on Yosemite, however, need to do a system upgrade to pick up the new
compiler.

On Mon, Dec 19, 2016 at 12:33 PM, Joris Van Remoortere 
wrote:

> Is my understanding incorrect regarding the ability to upgrade the compiler
> version on Yosemite and El Capitan without requiring a full system upgrade?
>
> @Mpark are you making a case for not updating to `thread_local` just yet?
>
> —
> *Joris Van Remoortere*
> Mesosphere
>
> On Fri, Dec 16, 2016 at 11:11 AM, Michael Park  wrote:
>
> > Brief survey from the #dev channel:
> > https://mesos.slack.com/archives/dev/p1481760285000430
> >
> > Yosemite 10.10: Fail. Compilation error. (by @hausdorff
> > https://mesos.slack.com/archives/dev/p1481760552000435)
> > El Capitan 10.11: Fail. Compilation error. (by @zhitao
> > https://mesos.slack.com/files/zhitao/F3F7WUCNM/-.diff)
> > Sierra 10.12: Success (by @mpark)
> >
> > On Wed, Dec 14, 2016 at 3:27 PM, Joris Van Remoortere <
> jo...@mesosphere.io
> > >
> > wrote:
> >
> > > > The time has come and we finally have `thread_local` support in the Apple
> > > > toolchain:
> > > > https://developer.apple.com/library/content/releasenotes/DeveloperTools/RN-Xcode/Introduction.html
> > > >
> > > > In our code base we have a special exception for Apple that defines our
> > > > thread local to be `__thread` rather than the C++11 standard
> > > > `thread_local`.
> > > > https://github.com/apache/mesos/blob/812e5e3d4e4d9e044a1cfe6cc7eaab10efb499b6/3rdparty/stout/include/stout/thread_local.hpp
> > >
> > > A consequence of using `__thread` on Apple is that initializers for
> > thread
> > > locals are required to be constant expressions. This is not the case
> for
> > > the c++11 standard `thread_local`.
> > >
> > > I would like to propose that we remove this exception on the Apple
> > platform
> > > now that the Apple toolchain supports the c++11 standard.
> > >
> > > As I am not a common user of the Apple development experience I would
> > like
> > > to ask for some input from the community as to whether requiring this
> > > toolchain update is acceptable, and if we need a deprecation period or
> if
> > > we can just make this change now.
> > >
> > > I am leaning towards no deprecation period as I am not aware of
> > production
> > > environments running on systems that define `__APPLE__`.
> > > —
> > > *Joris Van Remoortere*
> > > Mesosphere
> > >
> >
>
> --
> Zameer Manji
>


Re: Mesos V1 Operator HTTP API - Java Proto Classes

2016-11-16 Thread Zameer Manji
I think this is a bug; the jar should include all v1 protobuf files.

Vijay, I encourage you to file a ticket.

On Tue, Nov 15, 2016 at 8:04 PM, Vijay Srinivasaraghavan <
vijikar...@yahoo.com.invalid> wrote:

> I believe the HTTP API will use the same underlying message format (proto
> def) and hence the request/response value objects (java) needs to be
> auto-generated from the proto files for it to be used in Jersey based java
> rest client?
>
> On Tuesday, November 15, 2016 12:37 PM, Tomek Janiszewski <
> jani...@gmail.com> wrote:
>
>
>  I suspect jar is deprecated and includes only old API used by mesoslib.
> The
> goal is to create HTTP API and stop supporting native libs (jars, so, etc).
> I think you shouldn't use that jar in your project.
>
> wt., 15.11.2016, 20:38 użytkownik Vijay Srinivasaraghavan <
> vijikar...@yahoo.com> napisał:
>
> > Hello,
> >
> > I am writing a rest client for "operator APIs" and found that some of the
> > protobuf java classes (like "include/mesos/v1/quota/quota.proto",
> > "include/mesos/v1/master/master.proto") are not included in the mesos
> jar
> > file. While investigating, I have found that the "Make" file does not
> > include these proto definition files.
> >
> > I have updated the Make file and added the protos that I am interested in
> > and built a new jar file. Is there any reason why these proto definitions
> > are not included in the original build apart from the reason that the
> APIs
> > are still evolving?
> >
> > Regards
> > Vijay
> >
>
> --
> Zameer Manji
>


Re: Allowing both CommandInfo and ExecutorInfo on TaskInfo

2016-11-04 Thread Zameer Manji
It isn't an issue if it is a broadening of possibilities.

I would like to point out that if we do want a better contract later
(i.e. a set of CommandInfos), we are setting ourselves up for an
API change in the future.

If we clearly document that the CommandInfo is just passed down to the
executor by the master and agent, I don't see any harm.


On Fri, Nov 4, 2016 at 6:53 AM, Joris Van Remoortere 
wrote:

> @zameer
>
> I think your example makes a lot of sense. I didn't interpret the proposal
> as one that would prevent your case.
>
> The way I read the proposal is that we want to allow setting both, not
> require it. If I misunderstood then please ignore my comments.
>
> When I first heard about this proposal it seemed like a nice way for
> frameworks and executors to start defining a more structured contract _if
> they wanted to_, while still allowing others to keep passing their (from
> Mesos's view) unstructured data as long as both sides agree on the
> serialization / de-serialization.
>
> Does this still seem like an issue if it's not a requirement but just a
> broadening of possibilities?
>
> —
> *Joris Van Remoortere*
> Mesosphere
>
> On Wed, Nov 2, 2016 at 1:49 PM, Zameer Manji  wrote:
>
> > Joris,
> >
> > You make a good point. However, I'm not convinced that `CommandInfo`
> should
> > be the well defined construct that people use. Can you please describe
> > different custom executors, and the overlap between them and how
> > CommandInfo will reduce that overlap? I'm having a hard time seeing where
> > CommandInfo will solve all of their cases.
> >
> > Consider the case of Thermos (Aurora's executor): it could never use a
> > `CommandInfo` struct because it executes a process graph instead of a
> > single command.
> >
> > If the project wants to go down this path, I think generalizing
> > `CommandInfo` so that it can capture more cases (i.e. multiple commands or a
> > graph of commands) would be a better first step.
> >
> > What do you think?
> >
> > On Wed, Oct 26, 2016 at 10:38 AM, Joris Van Remoortere <
> > jo...@mesosphere.io>
> > wrote:
> >
> > > I do think it would be valuable to have a more well defined contract
> > > between frameworks and custom executors.
> > >
> > > As Zameer pointed out a specific framework and accompanying custom
> > executor
> > > can decide to do that in the data bytes; however, if we started
> building
> > > out a few different flavors of executors then it would be great for
> there
> > > to be standard way to pass command information to them.
> > >
> > > The current model works well in a 1-1 mapping between framework and
> > > executor binaries. In a world where that is 1-N it means all N
> executors
> > > have to use the same method of passing the command.
> > >
> > > —
> > > *Joris Van Remoortere*
> > > Mesosphere
> > >
> > > On Mon, Oct 17, 2016 at 4:25 PM, Zameer Manji 
> wrote:
> > >
> > > > I'm not convinced this is a valid use case.
> > > >
> > > > Mesos is supposed to be a generic kernel for launching "tasks",
> > whatever
> > > > they might be.
> > > >
> > > > In some cases it is useful to launch an executable, in other cases it
> > > might
> > > > be useful to launch a series of executables, and in some other cases
> it
> > > > might be useful to spawn a thread to do some work. Whatever that
> might
> > > be,
> > > > it doesn't matter to Mesos and the executor and framework are free to
> > > > establish a contract in `ExecutorInfo.data`, completely independent
> of
> > > the
> > > > Mesos API.
> > > >
> > > > I think formalizing this contract between executors and frameworks via
> > > > CommandInfo is going to introduce more problems than it solves. If
> > > > the CommandInfo struct is useful, frameworks and executors can just stuff
> > > > that into ExecutorInfo.data; however, it's not something that they need to
> > > > adhere to.
> > > >
> > > > What's the underlying motivation for this?
> > > >
> > > >
> > > >
> > > > On Thu, Oct 13, 2016 at 10:40 AM, haosdent 
> wrote:
> > > >
> > > > > For command task, if its `ExecutorInfo` would set with
> > > `CommandExecutor`
> > > > as
> 

Re: Allowing both CommandInfo and ExecutorInfo on TaskInfo

2016-11-02 Thread Zameer Manji
Joris,

You make a good point. However, I'm not convinced that `CommandInfo` should
be the well defined construct that people use. Can you please describe
different custom executors, and the overlap between them and how
CommandInfo will reduce that overlap? I'm having a hard time seeing where
CommandInfo will solve all of their cases.

Consider the case of Thermos (Aurora's executor): it could never use a
`CommandInfo` struct because it executes a process graph instead of a
single command.

If the project wants to go down this path, I think generalizing
`CommandInfo` so that it can capture more cases (i.e. multiple commands or a
graph of commands) would be a better first step.

What do you think?

On Wed, Oct 26, 2016 at 10:38 AM, Joris Van Remoortere 
wrote:

> I do think it would be valuable to have a more well defined contract
> between frameworks and custom executors.
>
> As Zameer pointed out a specific framework and accompanying custom executor
> can decide to do that in the data bytes; however, if we started building
> out a few different flavors of executors then it would be great for there
> to be standard way to pass command information to them.
>
> The current model works well in a 1-1 mapping between framework and
> executor binaries. In a world where that is 1-N it means all N executors
> have to use the same method of passing the command.
>
> —
> *Joris Van Remoortere*
> Mesosphere
>
> On Mon, Oct 17, 2016 at 4:25 PM, Zameer Manji  wrote:
>
> > I'm not convinced this is a valid use case.
> >
> > Mesos is supposed to be a generic kernel for launching "tasks", whatever
> > they might be.
> >
> > In some cases it is useful to launch an executable, in other cases it
> might
> > be useful to launch a series of executables, and in some other cases it
> > might be useful to spawn a thread to do some work. Whatever that might
> be,
> > it doesn't matter to Mesos and the executor and framework are free to
> > establish a contract in `ExecutorInfo.data`, completely independent of
> the
> > Mesos API.
> >
> > I think formalizing this contract between executors and frameworks via
> > CommandInfo is going to introduce more problems than it solves. If
> > the CommandInfo struct is useful, frameworks and executors can just stuff
> > that into ExecutorInfo.data; however, it's not something that they need to
> > adhere to.
> >
> > What's the underlying motivation for this?
> >
> >
> >
> > On Thu, Oct 13, 2016 at 10:40 AM, haosdent  wrote:
> >
> > > For command task, if its `ExecutorInfo` would set with
> `CommandExecutor`
> > as
> > > well?
> > >
> > > Some tickets may relate to this.
> > >
> > > [1]: https://issues.apache.org/jira/browse/MESOS-2330
> > > [2]: https://issues.apache.org/jira/browse/MESOS-527
> > > [3]: https://issues.apache.org/jira/browse/MESOS-5198
> > >
> > > On Fri, Oct 14, 2016 at 1:00 AM, Vinod Kone 
> > wrote:
> > >
> > > > Hi,
> > > >
> > > > We are contemplating whether to allow both CommandInfo and
> ExecutorInfo
> > > on
> > > > TaskInfo (MESOS-6294 <https://issues.apache.org/
> jira/browse/MESOS-6294
> > > >).
> > > > Currently we only allow one or the other. The motivation is to allow
> > > custom
> > > > executors a more structured way to pass information (e.g, command)
> > about
> > > > Task. Right now custom executors have to get this data via
> > > `TaskInfo.bytes`
> > > > which is not ideal.
> > > >
> > > > Are there any custom executors out there that crash if they get Tasks
> > > with
> > > > CommandInfo set?
> > > >
> > > > Thoughts?
> > > >
> > > > Vinod
> > > >
> > >
> > >
> > >
> > > --
> > > Best Regards,
> > > Haosdent Huang
> > >
> > > --
> > > Zameer Manji
> > >
> >
>
> --
> Zameer Manji
>


Re: [VOTE] Release Apache Mesos 1.1.0 (rc1)

2016-10-24 Thread Zameer Manji
 - **Experimental** A new default executor is introduced which
>> frameworks can use to launch task groups as nested containers. All the
>> nested containers share resources like cpu, memory, network and
>> volumes.
>>   * [MESOS-6014] - **Experimental** A new port-mapper CNI plugin, the
>> `mesos-cni-port-mapper` has been introduced. For Mesos containers, with the
>> CNI port-mapper plugin, users can now expose container ports through host
>> ports using DNAT. This is especially useful when Mesos containers are
>> attached to isolated CNI networks such as private bridge networks, and the
>> services running in the container need to be exposed outside these
>> isolated networks.
>>
>>
>> The CHANGELOG for the release is available at:
>> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob_plain;f=CHANGELOG;hb=1.1.0-rc1
>>
>> The candidate for Mesos 1.1.0 release is available at:
>> https://dist.apache.org/repos/dist/dev/mesos/1.1.0-rc1/mesos-1.1.0.tar.gz
>>
>> The tag to be voted on is 1.1.0-rc1:
>> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=commit;h=1.1.0-rc1
>>
>> The MD5 checksum of the tarball can be found at:
>> https://dist.apache.org/repos/dist/dev/mesos/1.1.0-rc1/mesos-1.1.0.tar.gz.md5
>>
>> The signature of the tarball can be found at:
>> https://dist.apache.org/repos/dist/dev/mesos/1.1.0-rc1/mesos-1.1.0.tar.gz.asc
>>
>> The PGP key used to sign the release is here:
>> https://dist.apache.org/repos/dist/release/mesos/KEYS
>>
>> The JAR is up in Maven in a staging repository here:
>> https://repository.apache.org/content/repositories/orgapachemesos-1158
>>
>> Please vote on releasing this package as Apache Mesos 1.1.0!
>>
>> The vote is open until Fri Oct 21 21:57:02 CEST 2016 and passes if a
>> majority of at least 3 +1 PMC votes are cast.
>>
>> [ ] +1 Release this package as Apache Mesos 1.1.0
>> [ ] -1 Do not release this package because ...
>>
>> Thanks,
>> Alex & Till
>>
>>
>
>
> --
> David Robinson
> SRE - Mesos
> @daverobinson
>
> --
> Zameer Manji
>


Re: Allowing both CommandInfo and ExecutorInfo on TaskInfo

2016-10-17 Thread Zameer Manji
I'm not convinced this is a valid use case.

Mesos is supposed to be a generic kernel for launching "tasks", whatever
they might be.

In some cases it is useful to launch an executable, in other cases it might
be useful to launch a series of executables, and in some other cases it
might be useful to spawn a thread to do some work. Whatever that might be,
it doesn't matter to Mesos and the executor and framework are free to
establish a contract in `ExecutorInfo.data`, completely independent of the
Mesos API.

I think formalizing this contract between executors and frameworks via
CommandInfo is going to introduce more problems than it solves. If
the CommandInfo struct is useful, frameworks and executors can just stuff
that into ExecutorInfo.data; however, it's not something that they need to
adhere to.

What's the underlying motivation for this?
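
To make the alternative concrete, here is a minimal sketch of a private framework/executor contract carried in the opaque data bytes. JSON and the field names (`processes`, `dependencies`) are assumptions picked for illustration; this is not how Thermos or any real executor encodes its tasks.

```python
import json


def encode_task_payload(processes: dict, dependencies: dict) -> bytes:
    """Framework side: serialize a process graph into opaque bytes.

    processes maps a process name to its command line; dependencies maps
    a process name to the names that must finish before it starts.
    """
    payload = {"processes": processes, "dependencies": dependencies}
    return json.dumps(payload, sort_keys=True).encode("utf-8")


def decode_task_payload(data: bytes) -> dict:
    """Executor side: recover the process graph from the opaque bytes."""
    return json.loads(data.decode("utf-8"))


# Hypothetical two-step task: "serve" may only start after "fetch".
data = encode_task_payload(
    processes={"fetch": "curl -O http://example.org/app.tar", "serve": "./run.sh"},
    dependencies={"serve": ["fetch"]},
)
graph = decode_task_payload(data)
```

Mesos never inspects these bytes, so the framework and executor can change the format without touching the Mesos API, as long as both sides agree on it.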



On Thu, Oct 13, 2016 at 10:40 AM, haosdent  wrote:

> For command task, if its `ExecutorInfo` would set with `CommandExecutor` as
> well?
>
> Some tickets may relate to this.
>
> [1]: https://issues.apache.org/jira/browse/MESOS-2330
> [2]: https://issues.apache.org/jira/browse/MESOS-527
> [3]: https://issues.apache.org/jira/browse/MESOS-5198
>
> On Fri, Oct 14, 2016 at 1:00 AM, Vinod Kone  wrote:
>
> > Hi,
> >
> > We are contemplating whether to allow both CommandInfo and ExecutorInfo on
> > TaskInfo (MESOS-6294 <https://issues.apache.org/jira/browse/MESOS-6294>).
> > Currently we only allow one or the other. The motivation is to allow custom
> > executors a more structured way to pass information (e.g., command) about
> > Task. Right now custom executors have to get this data via `TaskInfo.bytes`
> > which is not ideal.
> >
> > Are there any custom executors out there that crash if they get Tasks
> with
> > CommandInfo set?
> >
> > Thoughts?
> >
> > Vinod
> >
>
>
>
> --
> Best Regards,
> Haosdent Huang
>
> --
> Zameer Manji
>


Re: Non-checkpointing frameworks

2016-10-17 Thread Zameer Manji
Qian,

Turns out the --checkpoint flag was made default and removed in Mesos 0.22.

On Sun, Oct 16, 2016 at 4:38 PM, Qian Zhang  wrote:

> and requires operators to enable checkpointing on the slaves.
>
>
> Just curious why operator needs to enable checkpointing on the slaves (I
> do not see an agent flag for that), I think checkpointing should be enabled
> in framework level rather than slave.
>
>
> Thanks,
> Qian Zhang
>
> On Sun, Oct 16, 2016 at 10:18 AM, Zameer Manji  wrote:
>
>> +1 to A and B
>>
>> Aurora has enabled checkpointing for years and requires operators to
>> enable
>> checkpointing on the slaves.
>>
>> On Sat, Oct 15, 2016 at 11:57 AM, Joris Van Remoortere <
>> jo...@mesosphere.io>
>> wrote:
>>
>> > I'm in favor of A & B. I find it provides a better "first experience" to
>> > users.
>> > From my experience you usually have to have an explicit reason to not
>> want
>> > to checkpoint. Most people assume the semantics provided by the
>> checkpoint
>> > behavior is default and it can be a frustrating experience for them to
>> find
>> > out that is not the case.
>> >
>> > —
>> > *Joris Van Remoortere*
>>
>> > Mesosphere
>> >
>> > On Fri, Oct 14, 2016 at 3:11 PM, Neil Conway 
>> > wrote:
>> >
>> >> Hi folks,
>> >>
>> >> I'd like input from individuals who currently use frameworks but do
>> >> not enable checkpointing.
>> >>
>> >> Background: "checkpointing" is a parameter that can be enabled in
>> >> FrameworkInfo; if enabled, the agent will write the framework pid,
>> >> executor PIDs, and status updates to disk for any tasks started by
>> >> that framework. This checkpointed information means that these tasks
>> >> can survive an agent crash: if the agent exits (whether due to
>> >> crashing or as part of an upgrade procedure), a restarted agent can
>> >> use this information to reconnect to executors started by the previous
>> >> instance of the agent. The downside is that checkpointing requires
>> >> some additional disk I/O at the agent.
>> >>
>> >> Checkpointing is not currently the default, but in my experience it is
>> >> often enabled for production frameworks. As part of the work on
>> >> supporting partition-aware Mesos frameworks (see MESOS-4049), we are
>> >> considering:
>> >>
>> >> (a) requiring that partition-aware frameworks must also enable
>> >> checkpointing, and/or
>> >> (b) enabling checkpointing by default
>> >>
>> >> If you have intentionally decided to disable checkpointing for your
>> >> Mesos framework, I'd be curious to hear more about your use-case and
>> >> why you haven't enabled it.
>> >>
>> >> Thanks!
>> >>
>> >> Neil
>> >>
>> >> --
>> >> Zameer Manji
>> >>
>> >
>>
>> --
>> Zameer Manji
>>
>


Re: Non-checkpointing frameworks

2016-10-15 Thread Zameer Manji
+1 to A and B

Aurora has enabled checkpointing for years and requires operators to enable
checkpointing on the slaves.

On Sat, Oct 15, 2016 at 11:57 AM, Joris Van Remoortere 
wrote:

> I'm in favor of A & B. I find it provides a better "first experience" to
> users.
> From my experience you usually have to have an explicit reason to not want
> to checkpoint. Most people assume the semantics provided by the checkpoint
> behavior is default and it can be a frustrating experience for them to find
> out that is not the case.
>
> —
> *Joris Van Remoortere*
> Mesosphere
>
> On Fri, Oct 14, 2016 at 3:11 PM, Neil Conway 
> wrote:
>
>> Hi folks,
>>
>> I'd like input from individuals who currently use frameworks but do
>> not enable checkpointing.
>>
>> Background: "checkpointing" is a parameter that can be enabled in
>> FrameworkInfo; if enabled, the agent will write the framework pid,
>> executor PIDs, and status updates to disk for any tasks started by
>> that framework. This checkpointed information means that these tasks
>> can survive an agent crash: if the agent exits (whether due to
>> crashing or as part of an upgrade procedure), a restarted agent can
>> use this information to reconnect to executors started by the previous
>> instance of the agent. The downside is that checkpointing requires
>> some additional disk I/O at the agent.
>>
>> Checkpointing is not currently the default, but in my experience it is
>> often enabled for production frameworks. As part of the work on
>> supporting partition-aware Mesos frameworks (see MESOS-4049), we are
>> considering:
>>
>> (a) requiring that partition-aware frameworks must also enable
>> checkpointing, and/or
>> (b) enabling checkpointing by default
>>
>> If you have intentionally decided to disable checkpointing for your
>> Mesos framework, I'd be curious to hear more about your use-case and
>> why you haven't enabled it.
>>
>> Thanks!
>>
>> Neil
>>
>> --
>> Zameer Manji
>>
>
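
For anyone unsure where the setting lives: a minimal sketch of a v1 scheduler API SUBSCRIBE body with checkpointing turned on. The user and framework name are placeholders, the helper name is made up, and only the `checkpoint` field of FrameworkInfo matters here.

```python
import json


def subscribe_call(user: str, name: str, checkpoint: bool = True) -> str:
    """Build a JSON SUBSCRIBE call whose FrameworkInfo requests checkpointing."""
    call = {
        "type": "SUBSCRIBE",
        "subscribe": {
            "framework_info": {
                "user": user,
                "name": name,
                # Ask agents to checkpoint this framework's tasks to disk.
                "checkpoint": checkpoint,
            }
        },
    }
    return json.dumps(call)


# Placeholder user and framework name.
body = subscribe_call("nobody", "example-framework")
```

Defaulting the argument to `True` mirrors option (b) in the proposal; a framework that really wants the old behaviour would pass `checkpoint=False` explicitly.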


Re: Deprecate MESOS_DIRECTORY executor environment variable

2016-10-07 Thread Zameer Manji
Thanks for clearing that up.

I've CCed Joshua Cohen (who currently maintains Thermos) to weigh in
here.

I suspect that deprecating this blindly might break something for tasks
under the Docker Containerizer.

On Fri, Oct 7, 2016 at 5:33 PM, Jie Yu  wrote:

> https://github.com/apache/mesos/blob/master/docs/versioning.md
>
> "The deprecation period for any given feature will be 6 months. Having a
> set period allows Mesos developers to not indefinitely accrue technical
> debt and allows users time to plan for upgrades."
>
> - Jie
>
> On Fri, Oct 7, 2016 at 5:28 PM, Zameer Manji  wrote:
>
> > Jie,
> >
> > Without commenting on this deprecation, how is this going to work now
> that
> > Mesos is 1.0?
> >
> > What is the definition of "deprecate" being used here? Is it something
> that
> > will be removed in Mesos 2.0?
> >
> > On Fri, Oct 7, 2016 at 4:49 PM, Jie Yu  wrote:
> >
> > > Hi,
> > >
> > > Want to initiate a discussion here. Before Mesos containerizer has
> > > container image support (all containers share the same host file
> system),
> > > $MESOS_DIRECTORY env variable is used to let executor know their
> sandbox
> > > location.
> > >
> > > Later, we introduced container image support to Mesos containerizer so
> > that
> > > each container can has its own root filesystem. Due to some historical
> > > reason (thermos), we decided to keep $MESOS_DIRECTORY to be the path to
> > the
> > > sandbox on the host filesystem (e.g., `/var/lib/mesos/slaves/...`) even
> > if
> > > the container has its own root filesystem. And introduced a new
> > > $MESOS_SANDBOX to point to the sandbox in the container's root
> filesystem
> > > (e.g., `/mnt/mesos/sandbox`). If the container does not have a root
> > > filesystem, $MESOS_DIRECTORY == $MESOS_SANDBOX.
> > >
> > > Now, we plan to deprecate $MESOS_DIRECTORY because it'll be really
> > > confusing to executor writers, and it'll be an error if they try to
> > access
> > > $MESOS_DIRECTORY if their container has a root filesystem defined.
> > >
> > > - Jie
> > >
> > > --
> > > Zameer Manji
> > >
> >
>
> --
> Zameer Manji
>


Re: Deprecate MESOS_DIRECTORY executor environment variable

2016-10-07 Thread Zameer Manji
Jie,

Without commenting on this deprecation, how is this going to work now that
Mesos is 1.0?

What is the definition of "deprecate" being used here? Is it something that
will be removed in Mesos 2.0?

On Fri, Oct 7, 2016 at 4:49 PM, Jie Yu  wrote:

> Hi,
>
> Want to initiate a discussion here. Before Mesos containerizer has
> container image support (all containers share the same host file system),
> $MESOS_DIRECTORY env variable is used to let executor know their sandbox
> location.
>
> Later, we introduced container image support to Mesos containerizer so that
> each container can has its own root filesystem. Due to some historical
> reason (thermos), we decided to keep $MESOS_DIRECTORY to be the path to the
> sandbox on the host filesystem (e.g., `/var/lib/mesos/slaves/...`) even if
> the container has its own root filesystem. And introduced a new
> $MESOS_SANDBOX to point to the sandbox in the container's root filesystem
> (e.g., `/mnt/mesos/sandbox`). If the container does not have a root
> filesystem, $MESOS_DIRECTORY == $MESOS_SANDBOX.
>
> Now, we plan to deprecate $MESOS_DIRECTORY because it'll be really
> confusing to executor writers, and it'll be an error if they try to access
> $MESOS_DIRECTORY if their container has a root filesystem defined.
>
> - Jie
>
> --
> Zameer Manji
>
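
A minimal sketch of how an executor could pick its sandbox path under the scheme described above, preferring $MESOS_SANDBOX and falling back to $MESOS_DIRECTORY. The helper name is hypothetical, and the fallback order is my assumption, not documented Mesos behaviour:

```python
import os
from typing import Mapping, Optional


def sandbox_path(environ: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the sandbox path as seen from inside the container.

    MESOS_SANDBOX is valid inside the container's root filesystem, so it
    wins. MESOS_DIRECTORY is a host path and is only used as a fallback
    for agents that do not set MESOS_SANDBOX.
    """
    env = os.environ if environ is None else environ
    return env.get("MESOS_SANDBOX") or env.get("MESOS_DIRECTORY")


# Container with its own root filesystem: the two variables differ.
with_rootfs = sandbox_path({
    "MESOS_DIRECTORY": "/var/lib/mesos/slaves/...",
    "MESOS_SANDBOX": "/mnt/mesos/sandbox",
})
```

An executor written this way keeps working if $MESOS_DIRECTORY is eventually removed.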


Re: Mapped diagnostics context - Adding internal Mesos IDs as context to the logs

2016-10-04 Thread Zameer Manji
I don't know if this is feasible or not, but I would be a strong +1 to this.

This would make tracing failures much easier.

On Tue, Oct 4, 2016 at 6:12 AM, Frank Scholten 
wrote:

> Hi,
>
> On JIRA I found several issues about making logs less spammy as well
> as making them easier to understand for day to day operations.
>
> https://issues.apache.org/jira/browse/MESOS-4432 Condense (redundant)
> log messages related to task launch/status/finish
> https://issues.apache.org/jira/browse/MESOS-5467 offer DECLINE /
> ACCEPT + Recovered resource messages are spammy
> https://issues.apache.org/jira/browse/MESOS-4430 Identify and change
> logging level for message that don't contain specific
> task/framework/slave info
>
> Besides reducing the logs would it be possible to add more context? In
> the Java world the technique of 'Mapped diagnostic context' is used
> with Logback where each log line contains a few fields with context.
> See http://logback.qos.ch/manual/mdc.html
>
> To translate this to Mesos how about adding the internal IDs such as
> agent, framework, and task IDs at the beginning of the log, so this
> information is separated from the textual, human-readable log message.
> At the moment this information is tangled which makes it hard to
> interpret, especially when there are so many logs for each task.
>
> For example, change this
>
> I1004 12:24:12.118803  3780 status_update_manager.cpp:320] Received
> status update TASK_FAILED (UUID: a1d03948-30bf-46b3-9599-cfcfc7cbc27b)
> for task weave-demo_database_catalogue-db.6d415c17-8a2d-11e6-90a7-
> 0242458f9469
> of framework f1546295-ab46-496a-8cf9-91756fece4ed-
>
> to
>
> I1004 12:24:12.118803  3780 status_update_manager.cpp:320
> A:agent12318032910 F:f1546295-ab46-496a-8cf9-91756fece4ed-
> T:xyz-a1d03948-30bf-46b3-9599-cfcfc7cbc27b] Received status update
> TASK_FAILED for task 'xyz'
>
> In this case the header contains A:$AGENT_ID, F:$FRAMEWORK_ID, and
> T:$TASK_ID as the context, so lines with similar context can be
> more easily correlated visually.
>
> Is this feasible?
>
> Cheers,
>
> Frank
>
> --
> Zameer Manji
>
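
A minimal sketch of the header proposed above, assuming the context fields are simply appended to the existing glog-style prefix before the closing bracket. The function names `format_with_context` and `parse_context` are hypothetical and are not part of Mesos or glog; the sketch only illustrates how separating the IDs from the message makes lines easy to correlate.

```python
import re

# Matches context fields such as "A:agent1", "F:fw1", "T:task1" in a header.
_CONTEXT_RE = re.compile(r"(?:^|\s)([AFT]):(\S+)")


def format_with_context(prefix, message, agent_id=None, framework_id=None, task_id=None):
    """Build a log line whose header carries agent/framework/task IDs.

    `prefix` is the usual header content without the closing bracket, e.g.
    "I1004 12:24:12.118803  3780 status_update_manager.cpp:320".
    """
    fields = []
    if agent_id:
        fields.append("A:" + agent_id)
    if framework_id:
        fields.append("F:" + framework_id)
    if task_id:
        fields.append("T:" + task_id)
    header = " ".join([prefix] + fields)
    return header + "] " + message


def parse_context(line):
    """Return the context fields found in the header of a log line."""
    header, _, _ = line.partition("] ")
    return dict(_CONTEXT_RE.findall(header))
```

With such a header, a log processor could group lines by `parse_context(line).get("T")` instead of pattern-matching the free-form message text.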


Re: 1.0 Release Candidate

2016-05-25 Thread Zameer Manji
I might be in the minority here, but I think cutting an RC for 1.0 right
now is very aggressive. Does there exist even a single framework that uses
the Scheduler HTTP API or the Executor HTTP API? Does anyone even use these
APIs in production? Is there a single entity that uses the Operator API to
manage agents?

I think cutting an RC right now is 100% premature until the community can
provide clear answers to these questions.

I think the Mesos project has been historically successful because its
features were developed in a slow, methodical manner and then battle tested
by at least one user before the feature was declared 'stable' and ready for
use by everyone. I think not following those steps here for the HTTP APIs is
a huge error.

On Wed, May 25, 2016 at 12:51 PM, Vinod Kone  wrote:

> Post 1.0. Jie might be able to shed more light regarding the plans for
> Docker Containerizer.
>
> On Wed, May 25, 2016 at 12:10 PM, Jeff Schroeder <
> jeffschroe...@computer.org> wrote:
>
>> Does this mean the work to deprecate the docker containerizer will be
>> post-1.0, or have those plans changed?
>>
>>
>> On Wednesday, May 25, 2016, Vinod Kone  wrote:
>>
>>> Hi folks,
>>>
>>> As discussed in the previous community sync, we plan to cut a release
>>> candidate for our next release (1.0) early next week.
>>>
>>> 1.0 is mainly centered around new APIs for Mesos. Please take a look at
>>> MESOS-338 for blocking issues. We got some great design and testing
>>> feedback for the v1 scheduler and executor APIs. Please do the same for the
>>> in-progress v1 operator API.
>>>
>>> Since this is a 1.0, we would like to do the release a little
>>> differently.
>>>
>>> First, the voting period for vetting the release candidate would be a
>>> few weeks (2-3 weeks) instead of the typical 3 days.
>>>
>>> Second, we are willing to make major changes (scalability fixes, API
>>> fixes) if there are any issues reported by the community.
>>>
>>> We are doing these because we really want the community to thoroughly
>>> test the 1.0 release and give feedback.
>>>
>>> Thanks,
>>>
>>
>>
>> --
>> Text by Jeff, typos by iPhone
>>
>
>


Re: Usage of protobuf 'enum' fields

2016-03-22 Thread Zameer Manji
+1

I have run into this issue before and it was very confusing.

On Tue, Mar 22, 2016 at 1:37 AM, tommy xiao  wrote:

> Yes. Following the Apache upgrade guide, the master is upgraded first and
> then the slaves; upgrading the slaves first is not supported. That is the
> rule in our ops procedure.
>
> 2016-03-22 10:29 GMT+08:00 Benjamin Mahler :
>
> > Hi folks,
> >
> > I wanted to surface the following ticket to our attention:
> > https://issues.apache.org/jira/browse/MESOS-4997
> >
> > The issue is that when enum fields are deserialized, unknown enum values
> > are _stripped_. This means that if an enum field is 'required' and a new
> > value is added, old clients cannot deserialize messages with the new enum
> > value set: the message is considered to have a missing required field and
> > is dropped.
> >
> > The suggested approach to ensure new enum values can be safely added is
> > the following:
> >
> > - Enum fields should be optional.
> > - The first entry in an enum list should be UNKNOWN (and/or we set
> >   [default = UNKNOWN]).
> >
> > Having them as optional ensures that the protobuf deserialization
> > considers messages with stripped enum fields to be initialized. Also, if
> > our code calls the getter unconditionally it is safer to get UNKNOWN
> > rather than an arbitrary enum value (whatever happens to be the first in
> > the list).
> >
> > I will follow up and ensure we fix this for FrameworkInfo::Capability,
> > where we added a TASK_KILLING_STATE capability in 0.28. Frameworks that
> > try to set this new capability but talk to a 0.27 (or earlier) master will
> > not be able to register because the message will be dropped.
> >
> > Ben
> >
>
>
>
> --
> Deshi Xiao
> Twitter: xds2000
> E-mail: xiaods(AT)gmail.com
>
> --
> Zameer Manji
>
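
The failure mode described in the quoted message can be modelled in a few lines. This is a pure-Python sketch of the behaviour as Ben describes it, not the protobuf library itself: `decode_enum_field`, `MissingRequiredField`, and the enum table are hypothetical names used only for illustration.

```python
class MissingRequiredField(Exception):
    """Raised when a required field is absent after decoding."""


# The enum values an *old* client knows about. A newer sender may use 3.
OLD_CLIENT_ENUM = {0: "UNKNOWN", 1: "REVOCABLE_RESOURCES", 2: "TASK_KILLING_STATE"}


def decode_enum_field(wire_value, known, required, default="UNKNOWN"):
    """Model how an old client decodes an enum field.

    Unknown enum numbers are stripped, so the field looks unset. If the
    field is required, the whole message is rejected; if it is optional,
    the getter falls back to the default value.
    """
    if wire_value in known:
        return known[wire_value]
    if required:
        raise MissingRequiredField("enum field stripped: unknown value %d" % wire_value)
    return default
```

Under this model an optional field with an UNKNOWN default degrades gracefully, while a required field turns every newly added enum value into a dropped message on older clients.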


Re: Backport r/44230 to 0.27 branch

2016-03-19 Thread Zameer Manji
Cong brings up a good point here. Currently Mesos has a very aggressive
release cadence. This results in several questions as a cluster operator
and framework author.

   - What is the support from the community/committers for each release?
   - Do cluster operators and framework authors need to move at the same
   pace as the community?
   - Will bugfixes be automatically backported?

The lack of clarity here can result in several issues because it is easy
for the Mesos PMC to cut releases quickly, but it isn't easy for people
with existing clusters to upgrade at that pace. An aggressive release
policy without clear support for older releases can leave several users in
a bad position where they might need to upgrade Mesos through one (or
more!) releases just to get a critical bugfix.



On Wed, Mar 16, 2016 at 11:44 AM, Cong Wang  wrote:

> On Tue, Mar 15, 2016 at 2:39 PM, Jie Yu  wrote:
> > Mesos currently has no notion of long term stable releases (i.e., LTS). I
> > think the consensus in the last community sync was to introduce LTS after
> > 1.0.
>
>
> You don't need LTS as kernel, even talking about short term stable releases
> like 0.27.2 (?), they look horrible too, I don't see any git tags or
> branches for these releases, just a tar ball?! Huh...
>
>
> >
> > 0.27.2 has already been released. Looks like we need 0.27.3 if we want to
> > backport it.
>
>
> What determines which patches need to backport for Mesos community?
> It doesn't look like every bug fix is evaluated and considered after they
> are merged into master branch.
>
> >
> > I am OK with back porting it. Then the question is whether we want to
> > backport it to other releases as well.
> >
>
> It should be backported to whichever releases it applies to and you
> support. I don't see that the Mesos community has such a procedure.
>
> --
> Zameer Manji
>
>


Re: Mesos 0.27.2

2016-02-24 Thread Zameer Manji
+1

Thanks for catching this issue and managing the release.

On Wed, Feb 24, 2016 at 2:15 AM, Michael Park  wrote:

> Today we ran into another backwards compatibility issue in the master's
> `/state` endpoint in 0.27.1. Since 0.27.1 has already been shipped, I would
> like to propose to cut a 0.27.2.
>
> The JIRA ticket is MESOS-4754
> <https://issues.apache.org/jira/browse/MESOS-4754>, and r43937
> <https://reviews.apache.org/r/43937/> is the only required cherry-pick.
>
> I'll volunteer to be the release manager for it.
> Please let me know if you find further issues with 0.27.1.
>
> Thanks,
>
> MPark
>
> --
> Zameer Manji
>
>


Re: Enable compiler optimization by default?

2016-02-17 Thread Zameer Manji
+1

Can't this problem also be solved by distributing packages that have
optimized binaries?

On Wed, Feb 17, 2016 at 4:56 PM, Alexander Rojas 
wrote:

> +1
>
> Since the old days users have been used to running
>
> ```
> configure
> make
> sudo make install
> ```
>
> and things just work. With the model we have, we are just encouraging
> users to run their data centers with unoptimized versions of Mesos, which
> just hurts their performance.
>
>
> > On 17 Feb 2016, at 16:24, Neil Conway  wrote:
> >
> > Hi folks,
> >
> > At present, Mesos defaults to compiling with "-O0"; to enable compiler
> > optimizations, the user needs to specify "--enable-optimize".
> >
> > I'd like to propose we change the default, for a few reasons:
> >
> > (1) The autoconf default for CFLAGS/CXXFLAGS is "-O2 -g". Anecdotally,
> > I think most software packages compile with a reasonable level of
> > optimizations enabled by default.
> >
> > (2) I think we should make the default configure flags appropriate for
> > end-users (rather than Mesos developers): developers will be familiar
> > enough with Mesos to tune the configure flags according to their own
> > preferences.
> >
> > (3) The performance consequences of not enabling compiler
> > optimizations can be pretty severe: 5x in a benchmark I just ran, and
> > we've seen between 2x and 30x (!) performance differences for some
> > real-world workloads.
> >
> > Neil
>
> --
> Zameer Manji
>
>


Re: Framework disconnect kills running tasks

2016-02-10 Thread Zameer Manji
Setting `failover_timeout` is key. The Apache Aurora framework defaults
this value to 21 days to ensure there is no accidental destruction of tasks
in a production environment. FWIW, I think the Mesos default of 0 is terrible
and not desirable. I really think frameworks should opt in to this behaviour
rather than opt out. A minor ZK or network blip can cause destruction of tasks
by default.

On Wed, Feb 10, 2016 at 5:05 PM, Shuai Lin  wrote:

> Hi suppandi,
>
> To make sure your tasks survive framework restarts, you need to:
>
> 1. When registering your framework, set the `failover_timeout` attribute of
> the FrameworkInfo PB. This is how long the master will wait for your
> framework to reconnect. By default it's 0, which is why your tasks are killed
> immediately when the framework exits.
>
> 2. When you reregister your framework, you need to use the same framework
> id as the previous run, so that the master can identify it as the framework
> reconnecting.
>
> Regards,
> Shuai
>
>
> On Thu, Feb 11, 2016 at 6:37 AM, suppandi  wrote:
>
> > Hi,
> >
> > I am trying to write my first framework and i wanted to test task
> > reconciliation. But whenever i kill my framework (with a kill -9), mesos
> > seems to cleanup the tasks by updating its state to TASK_KILLED.
> >
> > Is there a parameter when creating the framework or the task that makes
> > this happen? I want my task to remain alive when the framework is
> > disconnected/dead.
> >
> > Here is how i create my framework
> > https://gist.github.com/anonymous/3357783ce938c4293947
> >
> > and here is how i create my task
> > https://gist.github.com/anonymous/d35f917ade791127f4c5
> >
> > Thanks
> > suppandi
> >
>
> --
> Zameer Manji
>
>
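
The two rules in this thread can be illustrated with a small model. This is a sketch of the behaviour as described by Shuai and Zameer, not Mesos code: `tasks_survive` and `is_same_framework` are hypothetical helpers, and the only facts taken from the thread are the default of 0 and Aurora's 21-day setting.

```python
# Aurora's failover timeout as mentioned above: 21 days, in seconds.
AURORA_FAILOVER_TIMEOUT_SECS = 21 * 24 * 60 * 60


def tasks_survive(disconnected_secs, failover_timeout_secs=0.0):
    """Model: tasks are kept only while the framework has been
    disconnected for less than its failover timeout.

    With the default of 0 the condition is never true, so tasks are
    killed as soon as the framework goes away.
    """
    return disconnected_secs < failover_timeout_secs


def is_same_framework(previous_framework_id, reregistering_framework_id):
    """Model: the master only treats a re-registration as a failover
    when the framework presents the ID from its previous run."""
    return (
        bool(previous_framework_id)
        and previous_framework_id == reregistering_framework_id
    )
```

In this model a one-hour network blip is harmless with Aurora's setting but fatal with the default, which is the opt-in versus opt-out concern raised above.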


Re: Release / deprecation policy

2015-11-25 Thread Zameer Manji
I think the six month deprecation period is much better than what Mesos
provides currently. Apache Aurora has recently struggled with how to handle
the current Mesos deprecation policy; adopting a new policy of six months
will make it easier for us to adopt new Mesos releases without backwards
compatibility concerns.

On Wed, Nov 25, 2015 at 12:02 PM, Neil Conway  wrote:

> Hi Marco,
>
> Thanks for your comments! I agree that extending "mixed version"
> compatibility to N-6 versions is not warranted, at least right now.
>
> Going by lazy consensus: if anyone does NOT like the idea of a six
> release deprecation period (~six months), please speak up soon.
> Otherwise, I'll write up a docs page that has our release/deprecation
> policy (MESOS-3995).
>
> Neil
>
> On Thu, Nov 19, 2015 at 6:32 PM, Marco Massenzio 
> wrote:
> > Thanks, Neil, for getting the ball rolling on the matter.
> >
> > Absolutely in favor of extending the deprecation cycle of features to
> > make framework developers' and operators' lives easier.
> >
> > However,
> > -1 for extending compatibility up to N - 6
> >
> > 1) this prevents us from innovating and introducing functionality as
> > quickly as we still believe is necessary at this stage of development;
> >
> > 2) it makes release testing explode combinatorially (it's already bad
> > enough as it is right now).
> > as you correctly noted.
> >
> > Please note those are two different problems that we'd be addressing
> > here, and I don't really think that #2 has been really an issue so far
> > (but, yes, of course, it might in the future).
> >
> > Hence,
> > +1
> > in providing tooling to make cluster upgrades easier to automate.
> >
> > Thanks!
> >
> > --
> > *Marco Massenzio*
> > Distributed Systems Engineer
> > http://codetrips.com
> >
> > On Mon, Nov 16, 2015 at 9:24 PM, Neil Conway 
> wrote:
> >
> >> Folks,
> >>
> >> In the last community sync, we briefly discussed Mesos release policy.
> >> In particular, we talked about the current cadence of ~monthly
> >> releases and how that relates to (a) deprecation periods (b) support
> >> for running a "mixed version" cluster.
> >>
> >> As I understand it, the current policy is as follows:
> >>
> >> * To remove functionality, it should first be deprecated in one
> >> release and can then be removed in the next.
> >> * Mixed cluster versions are supported going back one release: e.g.,
> >> 0.25 masters and slaves must support communicating with 0.24 masters
> >> and slaves.
> >>
> >> Given that Mesos 1.0 is on the not-too-distant horizon (at which point
> >> we'll have different guarantees about API stability), I think we can
> >> revisit adopting a formal release policy at that point. In the
> >> interim, are there any pressing problems we need to address?
> >>
> >> Deprecation:
> >> ==
> >>
> >> Removing deprecated functionality after one release makes sense when
> >> releases were made relatively infrequently, but with a monthly release
> >> cycle, this seems like an unreasonable rate of change to expect from
> >> authors of frameworks and tools.
> >>
> >> Proposal: After marking functionality as deprecated (e.g., in the
> >> documentation and "upgrade guide"), we should wait for at least 6
> >> monthly releases before removing it. So functionality that has been
> >> deprecated in 0.26 can be safely removed in 0.32.
> >>
> >> Mixed Cluster Versions:
> >> ======
> >>
> >> We could adopt the same rule as above (if any two releases are made
> >> within six months of one another, they must be compatible), or else we
> >> could keep the same compatibility policy we have now (single release).
> >> I'm not sure the right answer here: keeping the current policy will
> >> make upgrading from, say, 0.26 to 0.32 somewhat painful, but (a) that
> >> can be ameliorated with deployment tooling (b) if we change to a 6-12
> >> month compatibility period, it will make testing the full
> >> compatibility matrix pretty difficult.
> >>
> >> Comments welcome!
> >>
> >> Neil
> >>
>
> --
> Zameer Manji
>
>
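
The release-based proposal quoted above ("deprecated in 0.26 can be safely removed in 0.32") is simple arithmetic on the minor version. A minimal sketch, assuming versions of the form `major.minor` and one minor release per month; `earliest_removal` is a hypothetical helper, not part of any Mesos tooling.

```python
def earliest_removal(deprecated_in, wait_releases=6):
    """Return the first release in which a deprecated feature may be removed.

    Assumes "major.minor" version strings where each release bumps the
    minor number by one.
    """
    major, minor = (int(part) for part in deprecated_in.split("."))
    return "%d.%d" % (major, minor + wait_releases)
```

Passing `wait_releases=1` reproduces the policy described as current in the thread, where a feature deprecated in one release can be removed in the next.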


Re: Problems with deprecation cycles for critical/hard to adapt dependencies

2015-10-01 Thread Zameer Manji
+1 to Timothy's proposal.

Here is a concrete example that can guide the policy. Aurora 0.9.0 was
released in July 2015 and was built against Mesos 0.22. At the time, I
don't think anyone was aware that 0.24 would come out in September 2015 and
break compatibility with 0.22 w.r.t. JSON/Protobuf. This means folks
who are using Aurora 0.9.0 in production can only upgrade to Mesos 0.23 at
the latest. Now the Aurora project is faced with a problem. Aurora is much
smaller than the Mesos project and cannot keep up with the Mesos release
cadence. However if we don't do something right now we will continue to
prevent our users from upgrading their Mesos installations which may
contain upgrades that they need. Note that if we do release 0.9.1 with an
updated Mesos dependency, we might only be able to release against 0.23 so
we don't break users who are running 0.22 in production.

If anyone has ideas on what we should do here please comment on AURORA-1503
<https://issues.apache.org/jira/browse/AURORA-1503>.

On Wed, Sep 30, 2015 at 6:35 PM, Timothy Chen  wrote:

> I think besides changing to time based, we should provide a lot more
> visibility of the features that we are starting to deprecate. In each
> release we can also highlight the remaining lifetime (releases/time) of
> each deprecated feature, so users are reminded with every release of the
> full list they should be aware of.
>
> Tim
>
> > On Sep 30, 2015, at 5:17 PM, Niklas Nielsen 
> wrote:
> >
> > @vinod, ben, jie - Any thoughts on this?
> >
> > I am in favor of the time based deprecation as well and can come up with
> > a proposal, provided there are no objections.
> >
> > Niklas
> >
> > On 28 September 2015 at 21:09, James DeFelice 
> > wrote:
> >
> >> +1 for time-based deprecation cycle of O(months)
> >>
> >>> On Mon, Sep 28, 2015 at 6:16 PM, Zameer Manji 
> wrote:
> >>>
> >>> Niklas,
> >>>
> >>> Thanks for starting this thread. I think Mesos can best move forward if
> >>> it switches from a release based deprecation cycle to a time based
> >>> deprecation cycle. This means that deprecated APIs would be removed after
> >>> a fixed time period (e.g. 4 months) instead of at a specific release.
> >>> This is the model that Google's Guava library uses and I think it works
> >>> really well. It ensures that the ecosystem and community has sufficient
> >>> time to react to deprecations while still allowing them to move forward
> >>> at a reasonable pace.
> >>>
> >>> On Mon, Sep 28, 2015 at 2:19 PM, Niklas Nielsen 
> >>> wrote:
> >>>
> >>>> Hi everyone,
> >>>>
> >>>> With a (targeted) release cadence of *one month*, we should revisit our
> >>>> deprecation cycles of 3 releases (e.g. in version N, we warn. In version
> >>>> N+1, support both old and new API. In Version N+2, we break
> >>>> compatibility). Sometimes we cannot do the first step, and we deprecate
> >>>> in version N+1 and thus in 2 releases. With the new cadence, that is no
> >>>> longer around two quarters but two months which is too short for 3rd
> >>>> party tooling to adapt.
> >>>>
> >>>> Even though our release cycles have been longer than one month in the
> >>>> past, we are running into issues with deprecation due to lack of
> >>>> outreach (i.e. our communication to framework and 3rd party tooling
> >>>> communities) or because we are simply unaware of the internal
> >>>> dependencies they have on Mesos.
> >>>>
> >>>> We/I became aware of this, when we saw a planned deprecation of
> >>>> /state.json in 0.26.0 (0.25.0 supports both). I suspect that _a lot_ of
> >>>> tools will break because of this. This, on top of the problems we have
> >>>> run into recently with the Zookeeper master info change from binary
> >>>> protobuf to json.
> >>>>
> >>>> Even though we document this in our upgrade.md, the visibility/knowledge
> >>>> of this document seems too low and we probably need to do more.
> >>>>
> >>>> Do you guys have thoughts/ideas on how we can address this?
> >>>>
> >>>> Cheers,
> >>>> Niklas
> >>>>
> >>>> --
> >>>> Zameer Manji
> >>
> >>
> >>
> >> --
> >> James DeFelice
> >> 585.241.9488 (voice)
> >> 650.649.6071 (fax)
> >>
>
> --
> Zameer Manji
>
>


Re: Problems with deprecation cycles for critical/hard to adapt dependencies

2015-09-28 Thread Zameer Manji
Niklas,

Thanks for starting this thread. I think Mesos can best move forward if it
switches from a release based deprecation cycle to a time based deprecation
cycle. This means that deprecated APIs would be removed after a fixed time
period (e.g. 4 months) instead of at a specific release. This is the model
that Google's Guava library uses and I think it works really well. It ensures
that the ecosystem and community has sufficient time to react to deprecations
while still allowing them to move forward at a reasonable pace.

On Mon, Sep 28, 2015 at 2:19 PM, Niklas Nielsen 
wrote:

> Hi everyone,
>
> With a (targeted) release cadence of *one month*, we should revisit our
> deprecation cycles of 3 releases (e.g. in version N, we warn. In version
> N+1, support both old and new API. In Version N+2, we break compatibility).
> Sometimes we cannot do the first step, and we deprecate in version N+1 and
> thus in 2 releases. With the new cadence, that is no longer around two
> quarters but two months which is too short for 3rd party tooling to adapt.
>
> Even though our release cycles have been longer than one month in the past,
> we are running into issues with deprecation due to lack of outreach (i.e.
> our communication to framework and 3rd party tooling communities) or
> because we are simply unaware of the internal dependencies they have on
> Mesos.
>
> We/I became aware of this, when we saw a planned deprecation of /state.json
> in 0.26.0 (0.25.0 supports both). I suspect that _a lot_ of tools will
> break because of this. This, on top of the problems we have run into
> recently with the Zookeeper master info change from binary protobuf to
> json.
>
> Even though we document this in our upgrade.md, the visibility/knowledge
> of this document seems too low and we probably need to do more.
>
> Do you guys have thoughts/ideas on how we can address this?
>
> Cheers,
> Niklas
>
> --
> Zameer Manji
>
>
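
A time based deprecation cycle, as suggested in this message, ties removal to the calendar rather than to a release number. A minimal sketch, assuming a four month window as in the example above; `add_months`, `removable_on`, and `can_remove` are hypothetical helpers.

```python
import calendar
from datetime import date


def add_months(start, months):
    """Return `start` shifted forward by whole months, clamping the day
    to the length of the target month (e.g. Oct 31 + 4 months -> Feb 29)."""
    index = start.month - 1 + months
    year = start.year + index // 12
    month = index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def removable_on(deprecated_on, window_months=4):
    """First date on which a feature deprecated on `deprecated_on` may go."""
    return add_months(deprecated_on, window_months)


def can_remove(deprecated_on, today, window_months=4):
    """True once the deprecation window has fully elapsed."""
    return today >= removable_on(deprecated_on, window_months)
```

Because the check depends only on dates, it gives the same answer whether the project ships one release or four during the window.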


Re: Review Request 31905: Fixed protobuf comparisons by accounting for new fields.

2015-03-10 Thread Zameer Manji


> On March 10, 2015, 11:40 a.m., Zameer Manji wrote:
> > src/common/type_utils.cpp, line 56
> > <https://reviews.apache.org/r/31905/diff/1/?file=890459#file890459line56>
> >
> > Would it be possible to add some sort of test or tooling to prevent 
> > regressions?
> 
> Vinod Kone wrote:
> Not sure, what are the right set of tests for operators of this kind. A 
> bunch of tests, where we compare two messages with them differing in just one 
> field, for all possible fields? That seems like a lot of code!
> 
> I think the thing we are missing is a way to automatically check that a 
> newly added field is accounted for in the comparison? Not sure if that's what 
> you mean by regression? I don't yet know how to do this.
> 
> With regards to ExecutorInfo, a valuable set of tests to have is where we 
> launch a new task on an old running executor after master/slave failover. 
> I'll send a review out for those tests in a subsequent review.

Existing code could use the reflection API to ensure all fields are accounted 
for in equality: 
https://developers.google.com/protocol-buffers/docs/reference/cpp/google.protobuf.message#Reflection
Alternatively, the == operator implementations could be generated by a 
protobuf plugin: 
https://developers.google.com/protocol-buffers/docs/reference/cpp/google.protobuf.compiler.plugin

I think either of those solutions could prevent regressions.
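
The reflection idea can be sketched as a test that walks every declared field and fails when the hand-written comparison ignores one. The example below is an analogy in Python using `dataclasses.fields` as the reflection mechanism; it is not the protobuf C++ Reflection API, and `ExecutorInfoModel`, `COMPARED_FIELDS`, and `unaccounted_fields` are hypothetical names.

```python
from dataclasses import dataclass, fields


@dataclass(eq=False)
class ExecutorInfoModel:
    """Stand-in for a message type with a hand-written equality operator."""
    executor_id: str = ""
    command: str = ""
    name: str = ""  # imagine this field was added after the operator was written


# The fields the hand-written comparison actually looks at.
COMPARED_FIELDS = {"executor_id", "command"}


def executor_equal(a, b):
    """Hand-written equality that only checks COMPARED_FIELDS."""
    return all(getattr(a, f) == getattr(b, f) for f in COMPARED_FIELDS)


def unaccounted_fields(message_type, compared):
    """Use reflection to list declared fields the comparison ignores."""
    return sorted(f.name for f in fields(message_type) if f.name not in compared)
```

A regression test would assert that `unaccounted_fields(...)` is empty, so adding a field without updating the comparison fails the build instead of silently changing equality semantics.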


- Zameer


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31905/#review75925
---


On March 10, 2015, 5:37 p.m., Vinod Kone wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/31905/
> ---
> 
> (Updated March 10, 2015, 5:37 p.m.)
> 
> 
> Review request for mesos, Ben Mahler, Jie Yu, Joerg Schad, and Timothy Chen.
> 
> 
> Bugs: MESOS-2309
> https://issues.apache.org/jira/browse/MESOS-2309
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> When new fields were added to protobufs these operators were not updated. 
> Fixed now.
> 
> 
> Diffs
> -
> 
>   src/common/type_utils.cpp a1704c67d04d19f65d94dbe56a61bb28561e5bf3 
> 
> Diff: https://reviews.apache.org/r/31905/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>



Re: Review Request 31905: Fixed protobuf comparisons by accounting for new fields.

2015-03-10 Thread Zameer Manji

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31905/#review75925
---



src/common/type_utils.cpp
<https://reviews.apache.org/r/31905/#comment123254>

Would it be possible to add some sort of test or tooling to prevent 
regressions?


- Zameer Manji


On March 10, 2015, 11:27 a.m., Vinod Kone wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/31905/
> ---
> 
> (Updated March 10, 2015, 11:27 a.m.)
> 
> 
> Review request for mesos, Ben Mahler, Jie Yu, Joerg Schad, and Timothy Chen.
> 
> 
> Bugs: MESOS-2309
> https://issues.apache.org/jira/browse/MESOS-2309
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> When new fields were added to protobufs these operators were not updated. 
> Fixed now.
> 
> 
> Diffs
> -
> 
>   src/common/type_utils.cpp a1704c67d04d19f65d94dbe56a61bb28561e5bf3 
> 
> Diff: https://reviews.apache.org/r/31905/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>



Re: Review Request 31784: type_utils: Relaxened the equality check of CommandInfo to allow 'unset' environment == 'empty' environment.

2015-03-06 Thread Zameer Manji


> On March 6, 2015, 10:18 a.m., Vinod Kone wrote:
> > src/common/type_utils.cpp, line 59
> > <https://reviews.apache.org/r/31784/diff/2/?file=887439#file887439line59>
> >
> > I don't think doing this change piece-wise for one variable (e.g., 
> > 'environment') at a time is intuitive to readers of this code.
> > 
> > We need to decide if we want to change comparing 'optional' messages 
> > everywhere. FWIW, with proto3 this will no longer be an issue, because 
> > there are neither optionals nor defaults.
> 
> Alexander Rukletsov wrote:
> Agreed with Vinod, this deserves a more general approach. The difference 
> between this case and `shell` case here: https://reviews.apache.org/r/31011/ 
> is that the latter has a default. What you try to enforce here is that 
> absence is equivalent to empty instance. This seems reasonable for this case 
> (and many other) but it's not universally true. A quick question: why do you 
> explicitly set empty env for some tasks and leave it unset for other?

In addition, a general approach would make understanding the API much easier. 
Framework writers would not need to consult the equality code in the Mesos 
source to determine whether wire-level changes will cause rejections.
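
For illustration, the relaxed comparison under review amounts to treating an absent value and an empty one as the same thing. A minimal sketch, assuming the environment is modelled as an optional dict; `environments_equal` is a hypothetical helper and, as Alexander notes above, this equivalence is reasonable here but not universally true for optional fields.

```python
def environments_equal(a, b):
    """Compare two optional environments, treating None (unset) and an
    empty mapping as equivalent."""
    return (a or {}) == (b or {})
```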


- Zameer


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31784/#review75502
---


On March 6, 2015, 10:04 a.m., Chi Zhang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/31784/
> ---
> 
> (Updated March 6, 2015, 10:04 a.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov, Joerg Schad, Till Toenshoff, 
> Vinod Kone, and Zameer Manji.
> 
> 
> Bugs: mesos-2309
> https://issues.apache.org/jira/browse/mesos-2309
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> type_utils: Relaxened the equality check of CommandInfo to allow 'unset' 
> environment == 'empty' environment.
> 
> 
> Diffs
> -
> 
>   src/Makefile.am d299f07d865080676ca8a550cf6005c6ab32839f 
>   src/common/type_utils.cpp a1704c67d04d19f65d94dbe56a61bb28561e5bf3 
>   src/tests/type_utils_tests.cpp PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/31784/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Chi Zhang
> 
>



Re: Review Request 31011: Changed comparison for CommandInfo to consider shell default value.

2015-02-17 Thread Zameer Manji

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31011/#review72753
---


Would it be possible to handle all default values in all protobufs in a generic 
manner? For example the 'role' field of FrameworkInfo has the same problem: 
https://github.com/apache/mesos/blob/1efdf1d69373cca7903bc06847f1c44a91383032/include/mesos/mesos.proto#L128
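
One way to picture a generic treatment of defaults is to fill in declared defaults on both sides before comparing. The sketch below models messages as plain dicts; `with_defaults` and `equal_modulo_defaults` are hypothetical helpers, and the default table (role `"*"`) is an assumption about the linked mesos.proto rather than something stated in this thread.

```python
# Assumed default for illustration; check mesos.proto for the real value.
FRAMEWORK_INFO_DEFAULTS = {"role": "*"}


def with_defaults(message, defaults):
    """Return a copy of `message` with unset fields filled from `defaults`."""
    merged = dict(defaults)
    merged.update(message)
    return merged


def equal_modulo_defaults(a, b, defaults):
    """Compare two messages so that an unset field equals its default."""
    return with_defaults(a, defaults) == with_defaults(b, defaults)
```

With this approach a framework that omits a field and one that sends the default explicitly compare equal, without a special case per field.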

- Zameer Manji


On Feb. 17, 2015, 4:58 a.m., Joerg Schad wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/31011/
> ---
> 
> (Updated Feb. 17, 2015, 4:58 a.m.)
> 
> 
> Review request for mesos and Till Toenshoff.
> 
> 
> Bugs: MESOS-2309
> https://issues.apache.org/jira/browse/MESOS-2309
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Changed comparison for CommandInfo to consider shell default value.
> 
> 
> Diffs
> -
> 
>   src/common/type_utils.cpp 12a36bbd7d7773b25dedf2d0d951c79e0b5141d6 
> 
> Diff: https://reviews.apache.org/r/31011/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Joerg Schad
> 
>



Fetching and Caching Binaries from HDFS

2014-06-23 Thread Zameer Manji
Hey,

I noticed in MESOS-336 that there was some discussion on how to cache the
Mesos executor so it does not need to be repeatedly fetched from HDFS. This
parallels a problem faced by users of Aurora which is how to fetch binaries
needed for tasks. Twitter mitigated this problem by caching fetched binaries
from HDFS on the slave file system and having the first process of each task
fetch binaries from the cache if possible. If it is not possible to fetch it
from the cache, the process places the binary in the cache for subsequent
task starts on the same slave.

The code that does this and a brief explanation on how it works can be found
in this gist: https://gist.github.com/zmanji/f41df77510ef9d00265a. I hope it
serves as a good example on how this problem can be mitigated.
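
The caching scheme described above can be sketched as a fetch-through cache on the local file system. This is not the code from the gist: `fetch_with_cache` is a hypothetical helper, the `fetch` callable stands in for whatever actually downloads from HDFS, and keying the cache by file name alone is a simplifying assumption.

```python
import os
import shutil
import tempfile


def fetch_with_cache(uri, cache_dir, dest_dir, fetch):
    """Copy the binary for `uri` into `dest_dir`, downloading at most once.

    `fetch(uri, path)` is a caller-supplied function that downloads `uri`
    to the local `path`. The cache is keyed by the URI's base name, which
    assumes distinct binaries never share a file name.
    """
    name = os.path.basename(uri)
    cached = os.path.join(cache_dir, name)
    if not os.path.exists(cached):
        os.makedirs(cache_dir, exist_ok=True)
        # Download to a temporary file and rename it into place, so a
        # concurrently starting task never sees a partial binary.
        fd, tmp = tempfile.mkstemp(dir=cache_dir)
        os.close(fd)
        try:
            fetch(uri, tmp)
            os.replace(tmp, cached)
        except Exception:
            if os.path.exists(tmp):
                os.remove(tmp)
            raise
    os.makedirs(dest_dir, exist_ok=True)
    dest = os.path.join(dest_dir, name)
    shutil.copy2(cached, dest)
    return dest
```

The first task on a slave pays the download cost and populates the cache; later tasks on the same slave only perform a local copy.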

-- 
Zameer Manji


[jira] [Created] (MESOS-847) Update Mesos webui favicon to use Mesos Logo

2013-11-26 Thread Zameer Manji (JIRA)
Zameer Manji created MESOS-847:
--

 Summary: Update Mesos webui favicon to use Mesos Logo
 Key: MESOS-847
 URL: https://issues.apache.org/jira/browse/MESOS-847
 Project: Mesos
  Issue Type: Improvement
  Components: webui
Reporter: Zameer Manji
Priority: Trivial


The current mesos favicon in the webui is the letter M. It should be updated to 
use the mesos logo.



--
This message was sent by Atlassian JIRA
(v6.1#6144)