Re: Follow up to discussion regarding use : in paths on Windows (MESOS-9109)

2018-09-14 Thread Chun-Hung Hsiao
It seems we have the following issues w.r.t. path generation:

1. Path separators are disallowed:
   This applies to all systems, so we'll need a platform-independent
   check. But since no one is doing this right now, we can put it in the
   backlog.
2. Other invalid characters on different platforms:
   For now, let's just focus on Windows, since Un*x doesn't have any
   restriction other than `/` (and NUL). But since we're already working on
   this issue, how about resolving all of 0x00-0x1F, 0x7F, `"`, `*`, `/`,
   `:`, `<`, `>`, `?`, and `|` at once? This can be Windows-specific for
   now, as proposed by Andy.
3. Other path constraints, e.g., invalid sequences of valid characters
   (such as `.` and `..` on Un*x, or reserved names like `CON` or `AUX` on
   Windows):
   This is platform-dependent, but the problem exists on both Un*x and
   Windows. We can resolve this along with 1 later.
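The character classes in items 1 and 2 could be expressed as per-platform predicates. The following is a minimal C++ sketch, assuming a compile-time switch on `_WIN32` in the spirit of the platform-dependent `PATH_SEPARATOR` mentioned later in the thread; every function name here (`isWindowsReserved`, `isPosixReserved`, `isReservedOnThisPlatform`, `containsReserved`) is hypothetical and not part of the Mesos or stout APIs.

```cpp
#include <string>

// Hypothetical helpers for illustration only; not Mesos/stout APIs.

// Item 2: characters Windows disallows in file names, as listed above:
// 0x00-0x1F, 0x7F, and " * / : < > ? |
// (Backslash, the Windows path separator, belongs to item 1 and would be
// checked together with '/'.)
inline bool isWindowsReserved(char c) {
  const unsigned char u = static_cast<unsigned char>(c);
  if (u <= 0x1F || u == 0x7F) {
    return true;
  }
  switch (c) {
    case '"': case '*': case '/': case ':':
    case '<': case '>': case '?': case '|':
      return true;
    default:
      return false;
  }
}

// Un*x only forbids '/' (the path separator from item 1) and NUL.
inline bool isPosixReserved(char c) {
  return c == '/' || c == '\0';
}

// Picks the reserved set for the platform being compiled for.
inline bool isReservedOnThisPlatform(char c) {
#ifdef _WIN32
  return isWindowsReserved(c);
#else
  return isPosixReserved(c);
#endif
}

// True if any character of `id` is reserved under the given predicate.
inline bool containsReserved(const std::string& id, bool (*reserved)(char)) {
  for (char c : id) {
    if (reserved(c)) {
      return true;
    }
  }
  return false;
}
```

With this, `containsReserved("Hello:World", isWindowsReserved)` is true while the same ID passes `isPosixReserved`, which is why the change can be confined to Windows.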

As long as the way we resolve 2 (i.e., the encoding/decoding process) won't
introduce any compatibility problems in the future, I'm fine with fixing
only 2 for now and following up with a cleanup later. To be conservative:
if we're sure that no existing framework uses `%` in its IDs, does it make
sense to add a check now to enforce that?
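The encode-only-when-needed scheme discussed in this thread, together with the `%` check proposed above, might look roughly like the following C++ sketch. It assumes, for illustration, that `:` is the only reserved character (the MESOS-9109 case) and applies that rule unconditionally so it can be exercised on any platform; a real change would restrict it to Windows. All names (`isReserved`, `isValidId`, `encodeId`, `decodeId`, `hexValue`) are hypothetical.

```cpp
#include <optional>
#include <string>

// Hypothetical sketch; names are illustrative, not Mesos APIs.

// Assumption: ':' is the only reserved character (the MESOS-9109 case).
// A real implementation would enable this only on Windows.
inline bool isReserved(char c) {
  return c == ':';
}

// Rejects IDs containing '%', so an existing ID can never look like an
// encoded one (e.g. 'Hello%3AWorld' vs. the encoding of 'Hello:World').
inline bool isValidId(const std::string& id) {
  return id.find('%') == std::string::npos;
}

// Percent-encodes reserved characters only; any other ID is returned
// unchanged, so existing checkpoint paths keep their current names.
inline std::string encodeId(const std::string& id) {
  static const char kHex[] = "0123456789ABCDEF";
  std::string out;
  for (char c : id) {
    if (isReserved(c)) {
      const unsigned char u = static_cast<unsigned char>(c);
      out += '%';
      out += kHex[u >> 4];
      out += kHex[u & 0x0F];
    } else {
      out += c;
    }
  }
  return out;
}

// Value of a single hex digit, or -1 if `c` is not one.
inline int hexValue(char c) {
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  return -1;
}

// Stop-gap decoder: turns each "%XX" back into the byte 0xXX.
// Returns std::nullopt if an escape is truncated or not valid hex.
inline std::optional<std::string> decodeId(const std::string& path) {
  std::string out;
  for (std::string::size_type i = 0; i < path.size(); ++i) {
    if (path[i] != '%') {
      out += path[i];
      continue;
    }
    if (i + 2 >= path.size()) {
      return std::nullopt;
    }
    const int hi = hexValue(path[i + 1]);
    const int lo = hexValue(path[i + 2]);
    if (hi < 0 || lo < 0) {
      return std::nullopt;
    }
    out += static_cast<char>(hi * 16 + lo);
    i += 2;
  }
  return out;
}
```

Because IDs containing `%` are rejected up front, decoding is unambiguous, and IDs without reserved characters map to the same path component as before, so no checkpoint migration is needed for them.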

On Tue, Sep 4, 2018 at 2:12 PM Andrew Schwartzmeyer <and...@schwartzmeyer.com> wrote:

> I think your approach, changing the logic to read the IDs from the info
> file instead of the paths, would be fairly sound. But I also think we can
> punt on this for now (as I do not think a task ID like 'Hello%3AWorld' is
> plausibly in use right now) and implement a fix for colons now that would
> remain compatible.
>
> If we add encode/decode logic for colons on Windows, we do not introduce
> backward compatibility issues on other platforms (as we'd constrain the
> change to Windows), and in the future we can safely replace the decode
> logic with your approach. That is to say, we implement the encoding as
> sparingly as possible, but implement it now because it's needed to fix
> colons on Windows, and we implement the decoding only as a stop-gap until
> we replace this logic with reading from the info file. If we later find
> another character in use that also needs to be encoded, we can then
> generalize the single encoding into a per-platform encoding set.
>
> Does this seem reasonable?
>
> Thanks,
>
> Andy
>
> P.S. Sorry it took me a while to get back to this; I was out last week.
>
> On 08/23/2018 3:34 pm, Chun-Hung Hsiao wrote:
> > I'm a bit concerned about the recovery logic and backward
> > compatibility: the changes we're making shouldn't affect existing
> > users, and we should try hard to avoid any future backward
> > compatibility problems.
> >
> > Say there is already some custom framework using the task ID
> > 'Hello%3AWorld'. If we blindly decode the task path during recovery, we
> > will get the wrong ID 'Hello:World'. On the other hand, if we don't
> > decode the task path during recovery, then later on, when checkpointing
> > the same task, we shouldn't blindly encode the task ID, because that
> > might create a different path, and we'd need to introduce migration code
> > to avoid such duplication.
> >
> > Fortunately, we do checkpoint the executor IDs and task IDs in the info
> > files under the meta dir. So I'm considering the following design to
> > minimize the backward compatibility issues we might have:
> > During recovery, we don't decode the recovered task path; instead, we
> > get the executor/task ID from the info file rather than parsing it from
> > the executor/task path. When checkpointing, we only encode executor/task
> > IDs if they contain reserved characters. The set of reserved characters
> > could be defined as a platform-dependent variable, similar to what we
> > have done for `PATH_SEPARATOR`.
> >
> > The above design looks a bit more complicated than just blindly
> > applying percent encoding when constructing checkpoint paths, but it
> > doesn't require extra checkpoint migration logic, and it keeps exactly
> > the same behavior we have now for "normal" executor/task IDs.
> >
> > What do you guys think? Please feel free to raise any concerns :)
> > And we don't need to implement the whole thing now. For example, we
> > could start by dealing with just colons and extend the implementation
> > later on, as long as the partial solution we have right now doesn't
> > create future tech debt!
> >
> > Best,
> > Chun-Hung
> >
> > On Thu, Aug 23, 2018 at 1:42 PM Greg Mann  wrote:
> >
> >> Thanks Andy! Responses inlined below.
> >>
> >>> No: As the only character we've run into a problem with is `:`
> >>> (MESOS-9109), it might not be worth it to generalize this to solve a
> >>> bunch of problems that we haven't encountered.
> >>>
> >> It's true that I'm not aware of other scenarios where
> >> filesystem-disallowed characters in task/executor IDs have caused
> >> issues for users, and this issue has existed for a long time. However,
> >> when feasible I would like to fix issues that we're aware

Re: [VOTE] Release Apache Mesos 1.7.0 (rc3)

2018-09-14 Thread Kapil Arya
+1 (binding).

Internal CI succeeded. The binary deb/rpm packages for this RC can be found
here:
http://open.mesosphere.com/downloads/mesos-rc/#apache-mesos-1.7.0-rc3

On Fri, Sep 14, 2018 at 4:18 AM Alex Rukletsov  wrote:

> +1 (binding)
>
> Ran Mesosphere's internal CI with the aforementioned tag. Observed 4 flaky
> tests, 3 of which are known:
> https://issues.apache.org/jira/browse/MESOS-5048
> https://issues.apache.org/jira/browse/MESOS-8260
> https://issues.apache.org/jira/browse/MESOS-8951
>
> One was introduced as part of adding GC to nested containers
> (MESOS-7947), which is disabled in the release:
> https://issues.apache.org/jira/browse/MESOS-9217
>
>
> On Tue, Sep 11, 2018 at 8:09 PM, Gastón Kleiman wrote:
>
>> Hi all,
>>
>> Please vote on releasing the following candidate as Apache Mesos 1.7.0.
>>
>>
>> 1.7.0 includes the following:
>>
>> 
>> * Performance Improvements:
>>   * Master `/state` endpoint: ~130% throughput improvement through RapidJSON
>>   * Allocator: Improved allocator cycle significantly
>>   * Agent `/containers` endpoint: Fixed a performance issue
>>   * Agent container launch / destroy throughput is significantly improved
>> * Containerization:
>>   * **Experimental** Supported docker image tarball fetching from HDFS
>>   * Added new `cgroups/all` and `linux/devices` isolators
>>   * Added metrics for `network/cni` isolator and docker pull latency
>> * Windows:
>>   * Added support to libprocess for the Windows Thread Pool API
>> * Multi-Framework Workloads:
>>   * **Experimental** Added per-framework metrics to the master
>>   * A new weighted random sorter was added as an alternative to the DRF sorter
>>
>> The CHANGELOG for the release is available at:
>>
>> https://gitbox.apache.org/repos/asf?p=mesos.git;a=blob_plain;f=CHANGELOG;hb=1.7.0-rc3
>>
>> 
>>
>> The candidate for Mesos 1.7.0 release is available at:
>> https://dist.apache.org/repos/dist/dev/mesos/1.7.0-rc3/mesos-1.7.0.tar.gz
>>
>> The tag to be voted on is 1.7.0-rc3:
>> https://gitbox.apache.org/repos/asf?p=mesos.git;a=commit;h=1.7.0-rc3
>>
>> The SHA512 checksum of the tarball can be found at:
>>
>> https://dist.apache.org/repos/dist/dev/mesos/1.7.0-rc3/mesos-1.7.0.tar.gz.sha512
>>
>> The signature of the tarball can be found at:
>>
>> https://dist.apache.org/repos/dist/dev/mesos/1.7.0-rc3/mesos-1.7.0.tar.gz.asc
>>
>> The PGP key used to sign the release is here:
>> https://dist.apache.org/repos/dist/release/mesos/KEYS
>>
>> The JAR is in a staging repository here:
>> https://repository.apache.org/content/repositories/orgapachemesos-1234
>>
>> Please vote on releasing this package as Apache Mesos 1.7.0!
>>
>> The vote is open until Fri Sep 14 11:06:30 PDT 2018 and passes if a
>> majority of at least 3 +1 PMC votes are cast.
>>
>> [ ] +1 Release this package as Apache Mesos 1.7.0
>> [ ] -1 Do not release this package because ...
>>
>> Thanks,
>>
>> Chun-Hung & Gastón
>>
>
>


Re: [VOTE] Release Apache Mesos 1.7.0 (rc3)

2018-09-14 Thread Alex Rukletsov
+1 (binding)

Ran Mesosphere's internal CI with the aforementioned tag. Observed 4 flaky
tests, 3 of which are known:
https://issues.apache.org/jira/browse/MESOS-5048
https://issues.apache.org/jira/browse/MESOS-8260
https://issues.apache.org/jira/browse/MESOS-8951

One was introduced as part of adding GC to nested containers
(MESOS-7947), which is disabled in the release:
https://issues.apache.org/jira/browse/MESOS-9217

