Re: Follow up to discussion regarding use : in paths on Windows (MESOS-9109)

2018-09-04 Thread Andrew Schwartzmeyer
I think your approach would be fairly sound. That is, to change the 
logic to read the IDs from the info file instead of the paths. But I 
also think we can punt this for now (as I do not think a task ID like 
'Hello%3AWorld' is plausibly in use right now), and implement a fix for 
colons now that would remain compatible.


If we add encode/decode logic for colons on Windows, we do not introduce 
backward compatibility issues on other platforms (as we'd constrain the 
change to Windows), and in the future, we can safely replace the decode 
logic with your approach. That is to say, we implement the encoding as 
sparingly as possible, but implement it now, because it's kind of 
required, and we implement the decoding only as a stop-gap until we 
replace this logic with reading from the info file instead. If we later 
find another character in use that also needs to be encoded, we can then 
abstract the single encoding into a per-platform encoding set.


Does this seem reasonable?

Thanks,

Andy

P.S. Sorry this took a while to get back to, I was out last week.

On 08/23/2018 3:34 pm, Chun-Hung Hsiao wrote:
I'm a bit concerned about the recovery logic and backward 
compatibility:

The changes we're making shouldn't affect existing users,
and we should try hard to avoid any future backward compatibility 
problem.


Say there is already some custom framework using task ID
'Hello%3AWorld'. If we blindly decode the task path during recovery,
we will get the wrong ID 'Hello:World'.
On the other hand, if we don't decode the task path during recovery,
then later on, during checkpointing for the same task, we shouldn't
blindly encode the task ID, because it might create a different path,
and we'd need to introduce some migration code to avoid such
duplication.
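
To make the ambiguity concrete, here is a minimal sketch in Python (not
Mesos code). The helper names `encode_blindly` and `decode_blindly` are
hypothetical, and the standard library's percent-encoding stands in for
whatever encoding the agent would actually use:

```python
from urllib.parse import quote, unquote


def encode_blindly(task_id: str) -> str:
    # Hypothetical helper: percent-encode everything outside the
    # unreserved set, including '%' itself.
    return quote(task_id, safe="")


def decode_blindly(path_component: str) -> str:
    # Hypothetical helper: decode every percent escape found in a path.
    return unquote(path_component)


# A pre-existing task whose ID literally contains "%3A" was
# checkpointed under a path equal to its ID.
legacy_id = "Hello%3AWorld"
legacy_path = legacy_id

# Blind decoding during recovery yields an ID that never existed.
print(decode_blindly(legacy_path))  # Hello:World

# Blind encoding at the next checkpoint yields a different path.
print(encode_blindly(legacy_id))  # Hello%253AWorld
```

Either direction breaks the existing task: decoding changes its ID, and
encoding moves it to a second directory.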


Fortunately, we do checkpoint the executor IDs and task IDs in the info
files under the meta dir.
So I'm considering the following design to minimize the backward
compatibility issue we might have:
During recovery, we don't decode the recovered task path,
but get the executor/task ID from the info file instead of relying on
parsing the executor/task path.
When checkpointing, we only encode executor/task IDs if they contain
reserved characters.
The set of reserved characters could be defined as a platform-dependent
variable,
similar to what we have done for `PATH_SEPARATOR`.

The above design would look a bit more complicated than just blindly
applying percent encoding when constructing checkpoint paths, but it
doesn't require extra checkpoint migration logic, and would keep the
exact same behavior we have now for "normal" executor/task IDs.
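
A rough sketch of this design in Python, purely for illustration (not
Mesos code). The `RESERVED` table, the function names, the `task.info`
file name, and the JSON format are all assumptions; only the Windows
colon comes from this thread:

```python
import json
import tempfile
from pathlib import Path

# Assumed per-platform reserved sets, analogous in spirit to
# PATH_SEPARATOR. Only the Windows colon is taken from the discussion.
RESERVED = {"windows": {":"}, "posix": set()}


def encode_id(task_id: str, platform: str) -> str:
    # Encode only reserved characters; every other ID maps to itself.
    reserved = RESERVED[platform]
    return "".join(
        "%{:02X}".format(ord(c)) if c in reserved else c for c in task_id
    )


def checkpoint(meta_dir: Path, task_id: str, platform: str) -> Path:
    # The directory name is encoded, but the info file keeps the raw ID.
    task_dir = meta_dir / encode_id(task_id, platform)
    task_dir.mkdir(parents=True, exist_ok=True)
    (task_dir / "task.info").write_text(json.dumps({"task_id": task_id}))
    return task_dir


def recover(meta_dir: Path) -> list:
    # Never decode directory names; read the ID from the info file.
    return sorted(
        json.loads((d / "task.info").read_text())["task_id"]
        for d in meta_dir.iterdir()
        if d.is_dir()
    )


with tempfile.TemporaryDirectory() as tmp:
    meta = Path(tmp)
    checkpoint(meta, "Hello:World", "windows")
    checkpoint(meta, "plain-task", "windows")
    print(sorted(p.name for p in meta.iterdir()))
    print(recover(meta))
```

In this sketch, 'Hello:World' and a literal 'Hello%3AWorld' would still
share one directory on Windows, because '%' itself is left unencoded to
keep existing paths unchanged.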

What do you guys think? Please feel free to raise any concerns :)
And we don't need to implement the whole thing for now.
For example, we could start with just dealing with colons,
and extend the implementation later on,
as long as the partial solution we're going to have right now doesn't
create future tech debt!

Best,
Chun-Hung

On Thu, Aug 23, 2018 at 1:42 PM Greg Mann wrote:


Thanks Andy! Responses inlined below.




> No: As the only character we've run into a problem with is `:`
> (MESOS-9109), it might not be worth it to generalize this to solve a
> bunch of problems that we haven't encountered.



It's true that I'm not aware of other scenarios where
filesystem-disallowed characters in task/executor IDs have caused
issues for users, and this issue has existed for a long time. However,
when feasible I would like to fix issues that we're aware of before
they cause problems for users, rather than after. I would suggest that
since we have one compelling case that we need to address now, it's
worth formulating an approach for the general case, so that we can be
sure any current work doesn't get in our way later on.


> I'm somewhat comfortable doing so only for Windows, as we don't
> really need to worry about the recovery scenario; but very
> uncomfortable about doing so for Linux etc., for precisely that
> reason.
>
> So expanding this is definitely up for debate; but we must fix the
> bug with `:`.


Indeed, addressing the general case may prove to be much more complex -
I can certainly identify with this situation, where a fix for a smaller
issue turns into a big project :)
It may turn out to be possible to implement a scoped-down solution for
the colon case now, and extend it later on. I think it would be good if
we could at least get an idea of how we want to handle the general case
now, so that any short-term solutions can be a constructive step toward
the long-term.

Cheers,
G



Re: Feature request: Include state of Agent (UP/DRAINING) in Get_Agents Response

2018-09-04 Thread Sachin Sharma
Hi Benjamin,

Thanks for your reply. I am aware of GET_MAINTENANCE_STATUS call. I agree
it is a light-weight, low overhead call.

*GET_AGENTS*
> This call retrieves information about all the agents known to the master.


But what I am trying to say is that it would be good to have all
agent-related information in the GET_AGENTS response. The GET_AGENTS
response has a lot of fields describing the agent state. Since
"maintenance state" also contributes to an agent's state, why not make
it a part of the GET_AGENTS response?

Thank you,
Sachin

On Sun, Sep 2, 2018 at 2:03 PM, Benjamin Bannier <
benjamin.bann...@mesosphere.io> wrote:

> Hi Sachin,
>
> > I would like to make a feature request to include maintenance state of
> the agent as a part of response of Operator HTTP API Get_Agents call.
>
>
> we currently already have a number of maintenance related calls, e.g., the
> information you are after should be available through
> `GET_MAINTENANCE_STATUS`. This not only allows users to query the
> information they are after in a low-overhead way, but also mirrors the v0
> HTTP API. Does that work for you?
>
>
> Cheers,
>
> Benjamin
>


Re: make check failed, but mesos-tests.sh --gtest_filter="SVNTest.DiffPatch" tests passed

2018-09-04 Thread James Peach
This might be caused by inconsistent linking in Homebrew. Try forcing
Homebrew to build svn from source, something like this:

 brew install --force --build-from-source subversion


> On Sep 4, 2018, at 2:29 AM, Chang Shawn wrote:
> 
> After running 'make' successfully on macOS 10.13.6, I ran 'make check',
> but it failed on test case "SVNTest.DiffPatch". The error output is:
> 
> [--] 2 tests from SVNTest
> 
> [ RUN  ] SVNTest.DiffPatch
> 
> *** Aborted at 1536051660 (unix time) try "date -d @1536051660" if you are 
> using GNU date ***
> 
> PC: @0x1094239d6 apr_pool_create_ex
> 
> *** SIGSEGV (@0x30) received by PID 84174 (TID 0x7fff8a2b6380) stack trace: 
> ***
> 
> @ 0x7fff51ab0f5a _sigtramp
> 
> @0x0 (unknown)
> 
> @0x10922380e svn_pool_create_ex
> 
> @0x107e13f4e svn::diff()
> 
> @0x107e133eb SVNTest_DiffPatch_Test::TestBody()
> 
> @0x107fbbebe 
> testing::internal::HandleSehExceptionsInMethodIfSupported<>()
> 
> @0x107f5c01b 
> testing::internal::HandleExceptionsInMethodIfSupported<>()
> 
> @0x107f5bf46 testing::Test::Run()
> 
> @0x107f5dd5d testing::TestInfo::Run()
> 
> @0x107f5f38c testing::TestCase::Run()
> 
> @0x107f6fbac testing::internal::UnitTestImpl::RunAllTests()
> 
> @0x107fbf14e 
> testing::internal::HandleSehExceptionsInMethodIfSupported<>()
> 
> @0x107f6f5db 
> testing::internal::HandleExceptionsInMethodIfSupported<>()
> 
> @0x107f6f49c testing::UnitTest::Run()
> 
> @0x107b51ab1 RUN_ALL_TESTS()
> 
> @0x107b51825 main
> 
> @ 0x7fff517a2015 start
> 
> make[6]: *** [check-local] Segmentation fault: 11
> 
> make[5]: *** [check-am] Error 2
> 
> make[4]: *** [check-recursive] Error 1
> 
> make[3]: *** [check] Error 2
> 
> make[2]: *** [check-recursive] Error 1
> 
> make[1]: *** [check] Error 2
> 
> make: *** [check-recursive] Error 1
> 
> So I ran ./bin/mesos-tests.sh --gtest_filter="SVNTest.DiffPatch" to try
> to get more information, but it seems that the tests passed:

The SVN tests are part of stout (but are run during make check):

 ./3rdparty/stout/stout-tests --gtest_list_tests --gtest_filter="SVNTest.DiffPatch"
SVNTest.
  DiffPatch

J

[API WG] Meeting today

2018-09-04 Thread Greg Mann
Hi all,
We're having an API working group meeting this morning at 11am PST. I'll be
facilitating a discussion about the future of metrics in Mesos. If you have
any other topics for discussion, feel free to add them to the agenda:
https://docs.google.com/document/d/1JrF7pA6gcBZ6iyeP5YgDG62ifn0cZIBWw1f_Ler6fLM/edit

Cheers,
Greg


make check failed, but mesos-tests.sh --gtest_filter="SVNTest.DiffPatch" tests passed

2018-09-04 Thread Chang Shawn
After running 'make' successfully on macOS 10.13.6, I ran 'make check',
but it failed on test case "SVNTest.DiffPatch". The error output is:

[--] 2 tests from SVNTest
[ RUN  ] SVNTest.DiffPatch
*** Aborted at 1536051660 (unix time) try "date -d @1536051660" if you are using GNU date ***
PC: @ 0x1094239d6 apr_pool_create_ex
*** SIGSEGV (@0x30) received by PID 84174 (TID 0x7fff8a2b6380) stack trace: ***
@ 0x7fff51ab0f5a _sigtramp
@ 0x0 (unknown)
@ 0x10922380e svn_pool_create_ex
@ 0x107e13f4e svn::diff()
@ 0x107e133eb SVNTest_DiffPatch_Test::TestBody()
@ 0x107fbbebe testing::internal::HandleSehExceptionsInMethodIfSupported<>()
@ 0x107f5c01b testing::internal::HandleExceptionsInMethodIfSupported<>()
@ 0x107f5bf46 testing::Test::Run()
@ 0x107f5dd5d testing::TestInfo::Run()
@ 0x107f5f38c testing::TestCase::Run()
@ 0x107f6fbac testing::internal::UnitTestImpl::RunAllTests()
@ 0x107fbf14e testing::internal::HandleSehExceptionsInMethodIfSupported<>()
@ 0x107f6f5db testing::internal::HandleExceptionsInMethodIfSupported<>()
@ 0x107f6f49c testing::UnitTest::Run()
@ 0x107b51ab1 RUN_ALL_TESTS()
@ 0x107b51825 main
@ 0x7fff517a2015 start
make[6]: *** [check-local] Segmentation fault: 11
make[5]: *** [check-am] Error 2
make[4]: *** [check-recursive] Error 1
make[3]: *** [check] Error 2
make[2]: *** [check-recursive] Error 1
make[1]: *** [check] Error 2
make: *** [check-recursive] Error 1

So I ran ./bin/mesos-tests.sh --gtest_filter="SVNTest.DiffPatch" to try
to get more information, but it seems that the tests passed:

-
We cannot run any aufs tests because:
aufs tests not supported on non-Linux systems
-
-
We cannot run any Docker tests because:
Docker tests are not supported on this platform
-
PING google.com (172.217.161.142): 56 data bytes
--- google.com ping statistics ---
1 packets transmitted, 0 packets received, 100.0% packet loss
-
We cannot run any INTERNET tests because no internet access
-
-
No 'logrotate' command found so no 'logrotate' tests
will be run
Cannot enable net_cls cgroup subsystem associated test cases
since this platform does not support cgroups.
No 'nvidia-smi' command found so no Nvidia GPU tests will run
-
-
We cannot run any overlayfs tests because:
overlayfs tests not supported on non-Linux systems
-
-
Tests using 'perf' cannot be run on non-Linux systems
-
-
No usable unprivileged user found from the 'SUDO_USER'
environment variable. So tests that rely on an unprivileged
user will not run
-
-
We cannot run any xfs tests because:
xfs tests not supported on non-Linux systems
-
[==] Running 0 tests from 0 test cases.
[==] 0 tests from 0 test cases ran. (4 ms total)
[  PASSED  ] 0 tests.