[jira] [Created] (MESOS-8686) Mesos build failed with /permissive- + MSVC on windows

2018-03-19 Thread PhoebeHui (JIRA)
PhoebeHui created MESOS-8686:


 Summary: Mesos build failed with /permissive- + MSVC on windows 
 Key: MESOS-8686
 URL: https://issues.apache.org/jira/browse/MESOS-8686
 Project: Mesos
  Issue Type: Bug
  Components: build
 Environment: VS2017 15.5.7 + windows server 2016
Reporter: PhoebeHui


Mesos (master branch) fails with error C2276 when built with MSVC using 
/permissive-. This looks like a source issue: the code is trying to use a member 
function of a dependent base class. 

Note that this issue has only been found when compiling with an unreleased 
vctoolset; the next release of MSVC will have this behavior.

 

On lines #528 and #529 of 
"D:\Mesos\src\3rdparty\libprocess\src\tests\benchmarks.cpp"

    dispatch(self(), &Self::_handler).then(

        defer(self(), &Self::handler, data));

 

Should be

    dispatch(this->self(), &Self::_handler).then(

        defer(this->self(), &Self::handler, data));

 

The failure looks like:

D:\Mesos\src\3rdparty\libprocess\src\tests\benchmarks.cpp(566): error C2276: 
'&': illegal operation on bound member function expression
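
The lookup rule behind this can be shown in isolation. The sketch below is not the code from benchmarks.cpp: {{Base}}, {{Derived}}, {{call}} and {{demo}} are made-up names, and {{Base}} only stands in for a libprocess-style base class. Inside a class template, unqualified lookup does not search a base class that depends on a template parameter, so a conforming compiler rejects a bare {{self()}}; writing {{this->self()}} makes the name dependent and defers lookup until instantiation.

```cpp
#include <cassert>

// Hypothetical stand-in for a libprocess-style base class; not the real API.
template <typename T>
struct Base
{
  int self() const { return 42; }
};

template <typename T>
struct Derived : Base<T>
{
  int call() const
  {
    // return self();     // Ill-formed: `self` is not found by unqualified
    //                    // lookup because Base<T> is a dependent base.
    return this->self();  // OK: `this->` makes the name dependent, so lookup
                          // is deferred until Derived<T> is instantiated.
  }
};

// Small helper so the sketch can be exercised without any test framework.
inline void demo()
{
  Derived<int> d;
  assert(d.call() == 42);
}
```

Qualifying the call as {{Base<T>::self()}} would also work; {{this->}} is simply the smaller change.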

 

*Environment:*

VS2017 15.5.7 + windows server 2016

 

*Repro steps:*
 # git clone -c core.autocrlf=true https://github.com/apache/mesos D:\mesos\src
 # cd d:\mesos\src
 # .\bootstrap.bat
 # cd..
 # set _CL_=/D_SILENCE_TR1_NAMESPACE_DEPRECATION_WARNING /permissive-
 # mkdir build_x64 && pushd build_x64
 # cmake ..\src -G "Visual Studio 15 2017 Win64" 
-DCMAKE_SYSTEM_VERSION=10.0.16299.0 -DENABLE_LIBEVENT=1 -DHAS_AUTHENTICATION=0 
-DPATCHEXE_PATH="C:\gnuwin32\bin" -T host=x64
 # msbuild Mesos.sln /p:Configuration=Debug /p:Platform=x64 /maxcpucount:4 
/t:Rebuild



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (MESOS-8635) Flaky tests on ARM

2018-03-19 Thread Tomasz Janiszewski (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tomasz Janiszewski reassigned MESOS-8635:
-

Resolution: Fixed
  Assignee: Tomasz Janiszewski

Fixed by disabling libtool wrapper in ARM CI by [~jpe...@apache.org] 
https://lists.apache.org/thread.html/f97ba197a5afa6e595c4eb16177471d7e271cb5273b61b4f61a54d4f@%3Cdev.mesos.apache.org%3E

> Flaky tests on ARM
> --
>
> Key: MESOS-8635
> URL: https://issues.apache.org/jira/browse/MESOS-8635
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Reporter: Tomasz Janiszewski
>Assignee: Tomasz Janiszewski
>Priority: Major
>  Labels: arm
>
> Some tests fail on ARM when run together. When run separately, they 
> pass.
> Problematic tests are listed below:
> https://builds.apache.org/job/Mesos-Buildbot-ARM/BUILDTOOL=autotools,COMPILER=gcc,CONFIGURATION=--verbose%20--disable-java%20--disable-python,ENVIRONMENT=GLOG_v=1%20MESOS_VERBOSE=1,label_exp=arm/403/display/redirect?page=changes
> {code:java}
> [  FAILED  ] CommandExecutorCheckTest.CommandCheckDeliveredAndReconciled
> [  FAILED  ] CommandExecutorCheckTest.HTTPCheckDelivered
> [  FAILED  ] DefaultExecutorCheckTest.CommandCheckDeliveredAndReconciled
> [  FAILED  ] DefaultExecutorCheckTest.CommandCheckStatusChange
> [  FAILED  ] DefaultExecutorCheckTest.CommandCheckSeesParentsEnv
> [  FAILED  ] DefaultExecutorCheckTest.CommandCheckSharesWorkDirWithTask
> [  FAILED  ] DefaultExecutorCheckTest.CommandCheckTimeout
> [  FAILED  ] DefaultExecutorCheckTest.MultipleTasksWithChecks
> [  FAILED  ] DefaultExecutorCheckTest.HTTPCheckDelivered
> [  FAILED  ] DefaultExecutorCheckTest.TCPCheckDelivered
> [  FAILED  ] HealthCheckTest.DefaultExecutorCommandHealthCheck
> [  FAILED  ] HealthCheckTest.DefaultExecutorWithDockerImageCommandHealthCheck
> {code}





[jira] [Created] (MESOS-8687) Check failure in `ProcessBase::_consume()`.

2018-03-19 Thread Alexander Rukletsov (JIRA)
Alexander Rukletsov created MESOS-8687:
--

 Summary: Check failure in `ProcessBase::_consume()`.
 Key: MESOS-8687
 URL: https://issues.apache.org/jira/browse/MESOS-8687
 Project: Mesos
  Issue Type: Bug
  Components: libprocess
Affects Versions: 1.6.0
 Environment: ec2 CentOS 7 with SSL
Reporter: Alexander Rukletsov
 Attachments: MasterFailover-badrun.txt

Observed a segfault in the {{MasterAPITest.MasterFailover}} test:
{noformat}
10:59:04 I0319 10:59:04.312197  3274 master.cpp:649] Authorization enabled
10:59:04 F0319 10:59:04.312772  3274 owned.hpp:110] Check failed: 'get()' Must 
be non NULL
10:59:04 *** Check failure stack trace: ***
10:59:04 I0319 10:59:04.313470  3279 hierarchical.cpp:175] Initialized 
hierarchical allocator process
10:59:04 I0319 10:59:04.313500  3279 whitelist_watcher.cpp:77] No whitelist 
given
10:59:04 @ 0x7fe82d44e0cd  google::LogMessage::Fail()
10:59:04 @ 0x7fe82d44ff1d  google::LogMessage::SendToLog()
10:59:04 @ 0x7fe82d44dcb3  google::LogMessage::Flush()
10:59:04 @ 0x7fe82d450919  google::LogMessageFatal::~LogMessageFatal()
10:59:04 @ 0x7fe82d3cee16  google::CheckNotNull<>()
10:59:04 @ 0x7fe82d3b4253  process::ProcessBase::_consume()
10:59:04 @ 0x7fe82d3b4a66  
_ZNO6lambda12CallableOnceIFN7process6FutureINS1_4http8ResponseEEEvEE10CallableFnINS_8internal7PartialIZNS1_11ProcessBase7consumeEONS1_9HttpEventEEUlRKNS1_5OwnedINS3_7Request_JSG_clEv
10:59:04 @ 0x7fe82c39c3ca  
_ZNO6lambda12CallableOnceIFvPN7process11ProcessBaseEEE10CallableFnINS_8internal7PartialIZNS1_8internal8DispatchINS1_6FutureINS1_4http8ResponseclINS0_IFSE_vESE_RKNS1_4UPIDEOT_EUlSt10unique_ptrINS1_7PromiseISD_EESt14default_deleteISQ_EEOSI_S3_E_JST_SI_St12_PlaceholderILi1EEclEOS3_
10:59:04 @ 0x7fe82d39f2c1  process::ProcessBase::consume()
10:59:04 @ 0x7fe82d3b84da  process::ProcessManager::resume()
10:59:04 @ 0x7fe82d3bbf56  
_ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUlvE_vEEE6_M_runEv
10:59:04 @ 0x7fe82d577870  execute_native_thread_routine
10:59:04 @ 0x7fe82a761e25  start_thread
10:59:04 @ 0x7fe82986334d  __clone
{noformat}
Full test log is attached.





[jira] [Commented] (MESOS-8545) AgentAPIStreamingTest.AttachInputToNestedContainerSession is flaky.

2018-03-19 Thread Andrei Budnik (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16405054#comment-16405054
 ] 

Andrei Budnik commented on MESOS-8545:
--

Steps to reproduce:
 1. Add `::sleep(1)` before [sending http 
response|https://github.com/apache/mesos/blob/95bbe784da51b3a7eaeb9127e2541ea0b2af07b5/3rdparty/libprocess/src/http.cpp#L1741]
 to a socket.
 2. Recompile and run: `make check 
GTEST_FILTER=ContentType/AgentAPIStreamingTest.AttachInputToNestedContainerSession/0`

> AgentAPIStreamingTest.AttachInputToNestedContainerSession is flaky.
> ---
>
> Key: MESOS-8545
> URL: https://issues.apache.org/jira/browse/MESOS-8545
> Project: Mesos
>  Issue Type: Bug
>  Components: agent
>Affects Versions: 1.5.0
>Reporter: Andrei Budnik
>Assignee: Andrei Budnik
>Priority: Major
>  Labels: Mesosphere, flaky-test
> Attachments: 
> AgentAPIStreamingTest.AttachInputToNestedContainerSession-badrun.txt, 
> AgentAPIStreamingTest.AttachInputToNestedContainerSession-badrun2.txt
>
>
> {code:java}
> I0205 17:11:01.091872 4898 http_proxy.cpp:132] Returning '500 Internal Server 
> Error' for '/slave(974)/api/v1' (Disconnected)
> /home/centos/workspace/mesos/Mesos_CI-build/FLAG/CMake/label/mesos-ec2-centos-7/mesos/src/tests/api_tests.cpp:6596:
>  Failure
> Value of: (response).get().status
> Actual: "500 Internal Server Error"
> Expected: http::OK().status
> Which is: "200 OK"
> Body: "Disconnected"
> {code}





[jira] [Created] (MESOS-8688) Persistent volumes under taskgroup may not be writable due to executor user under root.

2018-03-19 Thread Gilbert Song (JIRA)
Gilbert Song created MESOS-8688:
---

 Summary: Persistent volumes under taskgroup may not be writable 
due to executor user under root.
 Key: MESOS-8688
 URL: https://issues.apache.org/jira/browse/MESOS-8688
 Project: Mesos
  Issue Type: Bug
  Components: executor
Reporter: Gilbert Song


If the executor and the task run as different users, the persistent volume 
may not be writable. For example, when the default executor consumes 
persistent volumes, if the executor runs as root (from the FrameworkInfo) and 
the task runs as a non-root user (from the CommandInfo), the persistent volume 
is owned by root and is not writable by the task.

This happens because persistent volume support for nested containers with the 
default executor is a workaround (it relies on the default executor specifying a 
sandbox_path volume). We should figure out a correct way to support the 
persistent volume primitive for nested containers.
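
To make the ownership problem concrete, here is a simplified sketch of an owner/group/other write check. It is not Mesos code: {{Ownership}}, {{canWrite}} and {{demo}} are hypothetical names, the mode 0755 and uid 1000 are assumed values, and the check ignores supplementary groups, ACLs and capabilities.

```cpp
#include <cassert>

// Simplified description of who owns a volume directory.
// Illustrative only; real checks also involve supplementary groups and ACLs.
struct Ownership
{
  unsigned uid;
  unsigned gid;
  unsigned mode; // Permission bits, e.g. 0755.
};

// Returns true if a process running as (uid, gid) may write to `volume`.
inline bool canWrite(unsigned uid, unsigned gid, const Ownership& volume)
{
  if (uid == 0) return true;  // Root typically bypasses the permission bits.
  if (uid == volume.uid) return (volume.mode & 0200) != 0;
  if (gid == volume.gid) return (volume.mode & 0020) != 0;
  return (volume.mode & 0002) != 0;
}

inline void demo()
{
  // Assumed values: volume created by a root executor with mode 0755,
  // task running as uid/gid 1000.
  const Ownership volume{0, 0, 0755};
  assert(canWrite(0, 0, volume));        // The root executor can write.
  assert(!canWrite(1000, 1000, volume)); // The non-root task cannot.
}
```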





[jira] [Created] (MESOS-8689) Add -Werror semantics to the CMake build

2018-03-19 Thread Andrew Schwartzmeyer (JIRA)
Andrew Schwartzmeyer created MESOS-8689:
---

 Summary: Add -Werror semantics to the CMake build
 Key: MESOS-8689
 URL: https://issues.apache.org/jira/browse/MESOS-8689
 Project: Mesos
  Issue Type: Improvement
Reporter: Andrew Schwartzmeyer


While MESOS-8658 got us to warning parity with Autotools, we still don't have 
-Werror turned on because we need to figure out precisely which targets it 
should apply to. Autotools adds it to {{MESOS_CPPFLAGS}}, which are then 
included in almost all targets under {{src}}, but when we tried the equivalent 
via {{add_compile_options}} we had issues due to warnings from protobufs.

Ideally we would turn it on using {{target_compile_options(foo PRIVATE)}} so 
that it's neither directory-wide nor inheritable, but that means figuring out 
which targets to turn it on for. We should probably turn it on for stout-tests, 
libprocess, and libprocess-tests too, which the Autotools build does not yet do.





[jira] [Created] (MESOS-8690) CMake mkdir commands always run

2018-03-19 Thread Andrew Schwartzmeyer (JIRA)
Andrew Schwartzmeyer created MESOS-8690:
---

 Summary: CMake mkdir commands always run
 Key: MESOS-8690
 URL: https://issues.apache.org/jira/browse/MESOS-8690
 Project: Mesos
  Issue Type: Bug
Reporter: Andrew Schwartzmeyer


{noformat}
> ninja -v tests
[1/3] cd /home/andschwa/src/mesos/build/warnings && /usr/bin/cmake -E 
make_directory /home/andschwa/src/mesos/build/warnings/src
[2/3] cd /home/andschwa/src/mesos/build/warnings && /usr/bin/cmake -E 
make_directory /home/andschwa/src/mesos/build/warnings/include
[3/3] cd /home/andschwa/src/mesos/build/warnings/src && /usr/bin/cmake -E 
make_directory /home/andschwa/src/mesos/build/warnings/include/csi
{noformat}

This happens on every rebuild. Our custom targets to create the src, include, 
and include/csi directories run every time instead of just once.





[jira] [Assigned] (MESOS-8691) Forward CXX_FLAGS to C++ projects and C_FLAGS to C projects in CMake

2018-03-19 Thread Andrew Schwartzmeyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Schwartzmeyer reassigned MESOS-8691:
---

Assignee: Andrew Schwartzmeyer

> Forward CXX_FLAGS to C++ projects and C_FLAGS to C projects in CMake
> 
>
> Key: MESOS-8691
> URL: https://issues.apache.org/jira/browse/MESOS-8691
> Project: Mesos
>  Issue Type: Bug
>Reporter: Andrew Schwartzmeyer
>Assignee: Andrew Schwartzmeyer
>Priority: Minor
>
> Right now we forward CXX and C flags together to all 3rdparty dependencies. 
> This results in warnings from CMake for the projects that don't use these 
> flags. Specifically, dependencies like libevent which are C libraries ignore 
> the CXX flags. 
> We should instead pass the C flags only (and generic CMake flags) to these 
> dependencies so they don't complain.





[jira] [Created] (MESOS-8691) Forward CXX_FLAGS to C++ projects and C_FLAGS to C projects in CMake

2018-03-19 Thread Andrew Schwartzmeyer (JIRA)
Andrew Schwartzmeyer created MESOS-8691:
---

 Summary: Forward CXX_FLAGS to C++ projects and C_FLAGS to C 
projects in CMake
 Key: MESOS-8691
 URL: https://issues.apache.org/jira/browse/MESOS-8691
 Project: Mesos
  Issue Type: Bug
Reporter: Andrew Schwartzmeyer


Right now we forward CXX and C flags together to all 3rdparty dependencies. This 
results in warnings from CMake for the projects that don't use these flags. 
Specifically, dependencies like libevent which are C libraries ignore the CXX 
flags. 

We should instead pass the C flags only (and generic CMake flags) to these 
dependencies so they don't complain.





[jira] [Commented] (MESOS-8258) Mesos.DockerContainerizerTest.ROOT_DOCKER_SlaveRecoveryTaskContainer is flaky.

2018-03-19 Thread Alexander Rukletsov (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16405402#comment-16405402
 ] 

Alexander Rukletsov commented on MESOS-8258:


Disabled this test for now.

> Mesos.DockerContainerizerTest.ROOT_DOCKER_SlaveRecoveryTaskContainer is flaky.
> --
>
> Key: MESOS-8258
> URL: https://issues.apache.org/jira/browse/MESOS-8258
> Project: Mesos
>  Issue Type: Bug
>  Components: test
> Environment: Ubuntu 16.04
> Ubuntu 17.04
> Debian 9
>Reporter: Alexander Rukletsov
>Priority: Major
>  Labels: flaky-test
> Attachments: ROOT_DOCKER_SlaveRecoveryTaskContainer-badrun.txt, 
> ROOT_DOCKER_SlaveRecoveryTaskContainer-badrun2.txt
>
>
> {noformat}
> /home/ubuntu/workspace/mesos/Mesos_CI-build/FLAG/CMake/label/mesos-ec2-ubuntu-17.04/mesos/src/tests/containerizer/docker_containerizer_tests.cpp:2772
>   Expected: 1
> To be equal to: reregister.updates_size()
>   Which is: 2
> {noformat}
> Full log attached.





[jira] [Assigned] (MESOS-8033) Use more idiomatic CMake for compiler features

2018-03-19 Thread Andrew Schwartzmeyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Schwartzmeyer reassigned MESOS-8033:
---

   Resolution: Fixed
 Assignee: Andrew Schwartzmeyer
Fix Version/s: 1.6.0

commit 0c18ca176
Author: Andrew Schwartzmeyer 
Date:   Thu Mar 8 10:03:48 2018 -0800

CMake: Set C++11 as standard automatically.

Instead of setting the compiler option manually, we use the
`CMAKE_CXX_STANDARD` variable to set the default for all targets. This
automatically appends the correct flag for each compiler.

Review: https://reviews.apache.org/r/66007

commit 5cb101761
Author: Andrew Schwartzmeyer 
Date:   Thu Mar 8 14:56:22 2018 -0800

CMake: Enabled compiler warnings.

We had previously been using the default sets of warnings, but now we
use the same warnings as on Autotools. This meant disabling two common
possible-loss-of-data warnings on Windows that are not part of the
GNU/Clang default warnings.

This also replaces the use of `string(APPEND CMAKE_CXX_FLAGS)` with
the canonical command `add_compile_options`. Although generally the
use of `target_compile_options` is preferred, it would currently
result in a lot more churn, and the build already supports setting
these flags globally.

Review: https://reviews.apache.org/r/66008

> Use more idiomatic CMake for compiler features
> --
>
> Key: MESOS-8033
> URL: https://issues.apache.org/jira/browse/MESOS-8033
> Project: Mesos
>  Issue Type: Improvement
>  Components: cmake
>Reporter: Andrew Schwartzmeyer
>Assignee: Andrew Schwartzmeyer
>Priority: Minor
>  Labels: cmake
> Fix For: 1.6.0
>
>
> Specifically, we should replace
> {noformat}
>   string(APPEND CMAKE_CXX_FLAGS " -std=c++11")
> {noformat}
> With {{CMAKE_CXX_STANDARD}}, and use [compile feature 
> requirements|https://cmake.org/cmake/help/latest/manual/cmake-compile-features.7.html#compile-feature-requirements].
> And replace
> {noformat}
>   string(APPEND CMAKE_CXX_FLAGS " -Wformat-security")
> {noformat}
> With compile options instead of appending to {{CMAKE_CXX_FLAGS}}.





[jira] [Commented] (MESOS-3160) CgroupsAnyHierarchyMemoryPressureTest.ROOT_IncreaseRSS Flaky

2018-03-19 Thread Alexander Rukletsov (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16405417#comment-16405417
 ] 

Alexander Rukletsov commented on MESOS-3160:


Disabled this test for now.

> CgroupsAnyHierarchyMemoryPressureTest.ROOT_IncreaseRSS Flaky
> 
>
> Key: MESOS-3160
> URL: https://issues.apache.org/jira/browse/MESOS-3160
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.24.0, 0.26.0
> Environment: Ubuntu 14.04
> CentOS 7
>Reporter: Paul Brett
>Assignee: Greg Mann
>Priority: Major
>  Labels: cgroups, flaky-test, mesosphere
>
> Test will occasionally fail with:
> [ RUN  ] CgroupsAnyHierarchyMemoryPressureTest.ROOT_IncreaseUnlockedRSS
> ../../src/tests/containerizer/cgroups_tests.cpp:1103: Failure
> helper.increaseRSS(getpagesize()): Failed to sync with the subprocess
> ../../src/tests/containerizer/cgroups_tests.cpp:1103: Failure
> helper.increaseRSS(getpagesize()): The subprocess has not been spawned yet
> [  FAILED  ] CgroupsAnyHierarchyMemoryPressureTest.ROOT_IncreaseUnlockedRSS 
> (223 ms)





[jira] [Created] (MESOS-8692) Replace _chsize_s with SetEndOfFile on Windows

2018-03-19 Thread Andrew Schwartzmeyer (JIRA)
Andrew Schwartzmeyer created MESOS-8692:
---

 Summary: Replace _chsize_s with SetEndOfFile on Windows
 Key: MESOS-8692
 URL: https://issues.apache.org/jira/browse/MESOS-8692
 Project: Mesos
  Issue Type: Task
Reporter: Andrew Schwartzmeyer
Assignee: Andrew Schwartzmeyer


The function {{os::ftruncate}} on Windows uses the CRT API {{_chsize_s}}, which 
requires an integer file descriptor. We can replace this with the semantically 
similar, if not equivalent, logic of {{SetFilePointer}} followed by 
{{SetEndOfFile}}. The major difference is that it doesn't write null bytes when 
extending the file; it leaves the data uninitialized.
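
A minimal sketch of the proposed replacement, assuming a path-based helper. The name {{resizeFile}} is made up here and is not the stout API. The sketch uses {{SetFilePointerEx}}, the 64-bit variant of {{SetFilePointer}}, followed by {{SetEndOfFile}}; the non-Windows branch exists only so the sketch builds on other platforms.

```cpp
#include <cstdint>

#ifdef _WIN32
#include <windows.h>
#else
#include <fcntl.h>
#include <unistd.h>
#endif

// Hypothetical helper, not the stout `os::ftruncate` implementation.
// Resizes the existing file at `path` to `length` bytes.
// Returns true on success, false otherwise.
inline bool resizeFile(const char* path, std::int64_t length)
{
#ifdef _WIN32
  HANDLE handle = ::CreateFileA(
      path,
      GENERIC_WRITE,
      0,
      nullptr,
      OPEN_EXISTING,
      FILE_ATTRIBUTE_NORMAL,
      nullptr);

  if (handle == INVALID_HANDLE_VALUE) {
    return false;
  }

  LARGE_INTEGER offset;
  offset.QuadPart = length;

  // Move the file pointer to the target size, then mark that position as the
  // end of the file. No CRT file descriptor is involved.
  const bool ok =
    ::SetFilePointerEx(handle, offset, nullptr, FILE_BEGIN) != 0 &&
    ::SetEndOfFile(handle) != 0;

  ::CloseHandle(handle);
  return ok;
#else
  // POSIX fallback so the sketch can be compiled and tried elsewhere.
  const int fd = ::open(path, O_WRONLY);
  if (fd < 0) {
    return false;
  }

  const bool ok = ::ftruncate(fd, static_cast<off_t>(length)) == 0;
  ::close(fd);
  return ok;
#endif
}
```

A real change would operate on the handle that {{os::ftruncate}} already receives instead of reopening the file by path.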





[jira] [Created] (MESOS-8693) agent: update_resource_provider w/ identical RP info should not always force-restart plugin

2018-03-19 Thread James DeFelice (JIRA)
James DeFelice created MESOS-8693:
-

 Summary: agent: update_resource_provider w/ identical RP info 
should not always force-restart plugin
 Key: MESOS-8693
 URL: https://issues.apache.org/jira/browse/MESOS-8693
 Project: Mesos
  Issue Type: Task
Affects Versions: 1.5.0
Reporter: James DeFelice


Currently, when the UPDATE_RESOURCE_PROVIDER call is sent to an agent and the 
RP info of the request is identical to that of the running configuration, the 
agent force-restarts the related CSI plugin. This is surprising on two counts:

First, because it increases the complexity of the client that wants to ensure 
the latest RP configuration is pushed to the agent. A CSI plugin may take a 
long time to become ready after being reconfigured. It's likely that a caller 
will experience a timeout while waiting for the RP to come into a healthy state 
w/ the desired configuration. Upon retrying the update, a client DOES NOT 
always wish to restart an ongoing reconfiguration effort, especially for 
long-running reconfiguration operations. Mesos should NOT restart the related 
CSI plugin by default if the new RP info matches the existing one, and instead 
should either return 409 or some other, more appropriate error code (409 would 
be nice/consistent, see below).

Second, because it differs from the idempotent nature of the 
ADD_RESOURCE_PROVIDER call, which does NOT change the state of the plugin in 
case of a duplicate request. The ADD_RESOURCE_PROVIDER call returns a 409 
response, which allows callers to simply re-issue redundant requests without 
concern for interrupting the state of a running plugin.

In the event that a caller DOES want to force the restart of an underlying CSI 
plugin, I suggest that we extend the UPDATE_RESOURCE_PROVIDER call w/ a 
"force_restart" field (sibling to the "info" field). "force_restart == true" 
would only have meaning for updates that involve unchanged RP info, otherwise 
it would go unused.
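
The proposed behavior can be sketched as a small decision function. Everything here is hypothetical: {{ProviderInfo}} is a simplified stand-in for the RP info message, and {{UpdateOutcome}} and {{decide}} are names invented for illustration, not agent code.

```cpp
#include <string>

// Hypothetical, simplified stand-in for the resource provider info message.
struct ProviderInfo
{
  std::string type;
  std::string name;
  std::string config;

  bool operator==(const ProviderInfo& that) const
  {
    return type == that.type && name == that.name && config == that.config;
  }
};

enum class UpdateOutcome
{
  RECONFIGURE, // Info changed: apply the new configuration.
  CONFLICT,    // Info unchanged, no force: answer 409, leave the plugin alone.
  RESTART      // Info unchanged, force_restart set: restart the plugin.
};

// Decides how an UPDATE_RESOURCE_PROVIDER request would be handled under the
// proposal, given the running info, the requested info, and `force_restart`.
inline UpdateOutcome decide(
    const ProviderInfo& running,
    const ProviderInfo& requested,
    bool forceRestart)
{
  if (!(running == requested)) {
    // `force_restart` goes unused when the info actually changed.
    return UpdateOutcome::RECONFIGURE;
  }

  return forceRestart ? UpdateOutcome::RESTART : UpdateOutcome::CONFLICT;
}
```

With this shape, a client that retries an identical update after a timeout gets the conflict outcome and does not interrupt a reconfiguration that is still in progress.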

/cc [~jieyu] [~chhsia0]





[jira] [Commented] (MESOS-8578) `UpgradeTest.UpgradeAgentIntoHierarchicalRoleForNonHierarchicalRole` is flaky.

2018-03-19 Thread Alexander Rukletsov (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16405476#comment-16405476
 ] 

Alexander Rukletsov commented on MESOS-8578:


Observed the same failure on the same machine.

> `UpgradeTest.UpgradeAgentIntoHierarchicalRoleForNonHierarchicalRole` is flaky.
> --
>
> Key: MESOS-8578
> URL: https://issues.apache.org/jira/browse/MESOS-8578
> Project: Mesos
>  Issue Type: Bug
> Environment: Debian 9 SSL GRPC
>Reporter: Andrei Budnik
>Priority: Major
>  Labels: flaky-test, mesosphere
> Attachments: 
> UpgradeTest.UpgradeAgentIntoHierarchicalRoleForNonHierarchicalRole-badrun.txt
>
>
> {code:java}
> ../../src/tests/upgrade_tests.cpp:664
> Failed to wait 15secs for offers
> {code}
> See logs in attachments.





[jira] [Commented] (MESOS-7357) Command checks via agent implicitly set up IOSwitchboard but do not use it.

2018-03-19 Thread Gilbert Song (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16405512#comment-16405512
 ] 

Gilbert Song commented on MESOS-7357:
-

Compared to defining isolations for debug containers exclusively, probably they 
should be inclusive in isolations?

> Command checks via agent implicitly set up IOSwitchboard but do not use it.
> ---
>
> Key: MESOS-7357
> URL: https://issues.apache.org/jira/browse/MESOS-7357
> Project: Mesos
>  Issue Type: Improvement
>  Components: agent
>Reporter: Alexander Rukletsov
>Priority: Major
>  Labels: check, health-check, mesosphere
>
> Command checks via agent leverage launching debug containers via agent API to 
> start check commands. This means IOSwitchboard is also set up despite not 
> being used. To improve performance, we should bypass IOSwitchboard altogether 
> or at least fast track its cleanup.





[jira] [Created] (MESOS-8694) Possibly unkillable task for customer docker executor if the docker daemon fails to capture the container exit code.

2018-03-19 Thread Gilbert Song (JIRA)
Gilbert Song created MESOS-8694:
---

 Summary: Possibly unkillable task for customer docker executor if 
the docker daemon fails to capture the container exit code.
 Key: MESOS-8694
 URL: https://issues.apache.org/jira/browse/MESOS-8694
 Project: Mesos
  Issue Type: Bug
  Components: docker
Reporter: Gilbert Song


Possibly unkillable task for custom docker executor if the docker daemon 
fails to capture the container exit code.

/cc [~qianzhang]





[jira] [Commented] (MESOS-8694) Possibly unkillable task for customer docker executor if the docker daemon fails to capture the container exit code.

2018-03-19 Thread Gilbert Song (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16405550#comment-16405550
 ] 

Gilbert Song commented on MESOS-8694:
-

/cc [~zhitao]

> Possibly unkillable task for customer docker executor if the docker daemon 
> fails to capture the container exit code.
> 
>
> Key: MESOS-8694
> URL: https://issues.apache.org/jira/browse/MESOS-8694
> Project: Mesos
>  Issue Type: Bug
>  Components: docker
>Reporter: Gilbert Song
>Priority: Major
>  Labels: containerizer, docker
>
> Possibly unkillable task for customer docker executor if the docker daemon 
> fails to capture the container exit code.
> /cc [~qianzhang]





[jira] [Created] (MESOS-8695) Consider doing a corresponding workaround for docker hanging inspect for the custom docker executor.

2018-03-19 Thread Gilbert Song (JIRA)
Gilbert Song created MESOS-8695:
---

 Summary: Consider doing a corresponding workaround for docker 
hanging inspect for the custom docker executor.
 Key: MESOS-8695
 URL: https://issues.apache.org/jira/browse/MESOS-8695
 Project: Mesos
  Issue Type: Improvement
  Components: containerization, docker
Reporter: Gilbert Song


Consider doing a corresponding workaround for docker hanging inspect for the 
custom docker executor.

/cc [~abudnik] [~zhitao]





[jira] [Created] (MESOS-8696) Add 'user' field to ContainerInfo and deprecate the 'user' in CommandInfo.

2018-03-19 Thread Gilbert Song (JIRA)
Gilbert Song created MESOS-8696:
---

 Summary: Add 'user' field to ContainerInfo and deprecate the 
'user' in CommandInfo.
 Key: MESOS-8696
 URL: https://issues.apache.org/jira/browse/MESOS-8696
 Project: Mesos
  Issue Type: Improvement
  Components: containerization, security
Reporter: Gilbert Song








[jira] [Created] (MESOS-8697) Make gRPC-related tests compatible to Windows.

2018-03-19 Thread Chun-Hung Hsiao (JIRA)
Chun-Hung Hsiao created MESOS-8697:
--

 Summary: Make gRPC-related tests compatible to Windows.
 Key: MESOS-8697
 URL: https://issues.apache.org/jira/browse/MESOS-8697
 Project: Mesos
  Issue Type: Task
Reporter: Chun-Hung Hsiao
Assignee: Chun-Hung Hsiao


Since gRPC supports in-process channels, we should use them in the related unit 
tests to make the tests cross-platform.





[jira] [Created] (MESOS-8698) Enable gRPC-based CSI support in CMake.

2018-03-19 Thread Chun-Hung Hsiao (JIRA)
Chun-Hung Hsiao created MESOS-8698:
--

 Summary: Enable gRPC-based CSI support in CMake.
 Key: MESOS-8698
 URL: https://issues.apache.org/jira/browse/MESOS-8698
 Project: Mesos
  Issue Type: Task
Reporter: Chun-Hung Hsiao
Assignee: Chun-Hung Hsiao


We should be able to generate gRPC files for CSI and build related code with 
CMake.





[jira] [Commented] (MESOS-8463) Test MasterAllocatorTest/1.SingleFramework is flaky

2018-03-19 Thread Till Toenshoff (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16405626#comment-16405626
 ] 

Till Toenshoff commented on MESOS-8463:
---

The solution above was too invasive and not matching the problem - needs to be 
updated. See MESOS-8613

In fact, options (1) and (2) shall be ignored for now while reducing this fix 
towards (3) implemented by
{noformat}
EXPECT_CALL(allocator, addSlave(_, _, _, _, _, _))
  .WillOnce(InvokeAddSlave(&allocator))
  .WillRepeatedly(Return());
{noformat}

One problematic area is the multi-slave tests: we cannot simply use the 
above pattern for those, and thus they remain flaky in their design.

> Test MasterAllocatorTest/1.SingleFramework is flaky
> ---
>
> Key: MESOS-8463
> URL: https://issues.apache.org/jira/browse/MESOS-8463
> Project: Mesos
>  Issue Type: Bug
>  Components: allocation, test
>Affects Versions: 1.5.0
>Reporter: Benjamin Bannier
>Assignee: Till Toenshoff
>Priority: Major
>  Labels: flaky-test
> Fix For: 1.6.0
>
> Attachments: consoleText.txt
>
>
> Observed in our internal CI on a ubuntu-16 setup in a plain autotools build,
> {noformat}
> ../../src/tests/master_allocator_tests.cpp:175
> Mock function called more times than expected - taking default action 
> specified at:
> ../../src/tests/allocator.hpp:273:
> Function call: addSlave(@0x7fe8dc03d0e8 
> 1eb6ab2c-293d-4b99-b76b-87bd939a1a19-S1, @0x7fe8dc03d108 hostname: 
> "ip-172-16-10-65.ec2.internal"
> resources {
>   name: "cpus"
>   type: SCALAR
>   scalar {
> value: 2
>   }
> }
> resources {
>   name: "mem"
>   type: SCALAR
>   scalar {
> value: 1024
>   }
> }
> resources {
>   name: "ports"
>   type: RANGES
>   ranges {
> range {
>   begin: 31000
>   end: 32000
> }
>   }
> }
> id {
>   value: "1eb6ab2c-293d-4b99-b76b-87bd939a1a19-S1"
> }
> checkpoint: true
> port: 40262
> , @0x7fe8ffa276c0 { 32-byte object <48-94 7D-0E E9-7F 00-00 00-00 00-00 00-00 
> 00-00 01-00 00-00 00-00 00-00 01-00 00-00 00-00 00-00>, 32-byte object <48-94 
> 7D-0E E9-7F 00-00 00-00 00-00 00-00 00-00 01-00 00-00 00-00 00-00 02-00 00-00 
> 00-00 00-00>, 32-byte object <48-94 7D-0E E9-7F 00-00 00-00 00-00 00-00 00-00 
> 01-00 00-00 00-00 00-00 03-00 00-00 73-79 73-74> }, @0x7fe8ffa27720 48-byte 
> object <01-00 00-00 E8-7F 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 
> 00-00 08-7A A2-FF E8-7F 00-00 A0-32 24-7D 62-55 00-00 DE-3C 11-0A E9-7F 
> 00-00>, @0x7fe8dc03d4c8 { cpus:2, mem:1024, ports:[31000-32000] }, 
> @0x7fe8dc03d460 {})
>  Expected: to be called once
>Actual: called twice - over-saturated and active
> Stacktrace
> ../../src/tests/master_allocator_tests.cpp:175
> Mock function called more times than expected - taking default action 
> specified at:
> ../../src/tests/allocator.hpp:273:
> Function call: addSlave(@0x7fe8dc03d0e8 
> 1eb6ab2c-293d-4b99-b76b-87bd939a1a19-S1, @0x7fe8dc03d108 hostname: 
> "ip-172-16-10-65.ec2.internal"
> resources {
>   name: "cpus"
>   type: SCALAR
>   scalar {
> value: 2
>   }
> }
> resources {
>   name: "mem"
>   type: SCALAR
>   scalar {
> value: 1024
>   }
> }
> resources {
>   name: "ports"
>   type: RANGES
>   ranges {
> range {
>   begin: 31000
>   end: 32000
> }
>   }
> }
> id {
>   value: "1eb6ab2c-293d-4b99-b76b-87bd939a1a19-S1"
> }
> checkpoint: true
> port: 40262
> , @0x7fe8ffa276c0 { 32-byte object <48-94 7D-0E E9-7F 00-00 00-00 00-00 00-00 
> 00-00 01-00 00-00 00-00 00-00 01-00 00-00 00-00 00-00>, 32-byte object <48-94 
> 7D-0E E9-7F 00-00 00-00 00-00 00-00 00-00 01-00 00-00 00-00 00-00 02-00 00-00 
> 00-00 00-00>, 32-byte object <48-94 7D-0E E9-7F 00-00 00-00 00-00 00-00 00-00 
> 01-00 00-00 00-00 00-00 03-00 00-00 73-79 73-74> }, @0x7fe8ffa27720 48-byte 
> object <01-00 00-00 E8-7F 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 
> 00-00 08-7A A2-FF E8-7F 00-00 A0-32 24-7D 62-55 00-00 DE-3C 11-0A E9-7F 
> 00-00>, @0x7fe8dc03d4c8 { cpus:2, mem:1024, ports:[31000-32000] }, 
> @0x7fe8dc03d460 {})
>  Expected: to be called once
>Actual: called twice - over-saturated and active
> {noformat}





[jira] [Commented] (MESOS-8578) UpgradeTest.UpgradeAgentIntoHierarchicalRoleForNonHierarchicalRole is flaky.

2018-03-19 Thread Michael Park (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16405665#comment-16405665
 ] 

Michael Park commented on MESOS-8578:
-

Looks to me like we should invoke {{EXPECT_CALL}} before {{detector.appoint}}:
https://github.com/apache/mesos/blob/50de784d589d2fc476f7869f39822c77c50745f3/src/tests/upgrade_tests.cpp#L650-L653
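
Why the order matters can be shown with a deterministic analogy. This is not gmock and not the test's code: {{Detector}} here is a hypothetical event source that delivers an event only to a handler already installed when the event fires, which is roughly the situation an expectation installed too late ends up in.

```cpp
#include <functional>
#include <string>
#include <utility>

// Hypothetical event source, unrelated to the real Mesos detector classes.
// An event is delivered only if a handler is installed when it fires.
class Detector
{
public:
  void onAppoint(std::function<void(const std::string&)> handler)
  {
    handler_ = std::move(handler);
  }

  void appoint(const std::string& master)
  {
    if (handler_) {
      handler_(master);
    }
  }

private:
  std::function<void(const std::string&)> handler_;
};

// Handler installed first: the appointment is observed.
inline bool observedWhenInstalledFirst()
{
  Detector detector;
  bool seen = false;
  detector.onAppoint([&seen](const std::string&) { seen = true; });
  detector.appoint("master@127.0.0.1:5050");
  return seen;
}

// Handler installed after the event fired: the appointment is missed.
inline bool observedWhenInstalledLate()
{
  Detector detector;
  bool seen = false;
  detector.appoint("master@127.0.0.1:5050");
  detector.onAppoint([&seen](const std::string&) { seen = true; });
  return seen;
}
```

In the real test the event is asynchronous, so the late case only happens some of the time, which would match the flakiness.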

> UpgradeTest.UpgradeAgentIntoHierarchicalRoleForNonHierarchicalRole is flaky.
> 
>
> Key: MESOS-8578
> URL: https://issues.apache.org/jira/browse/MESOS-8578
> Project: Mesos
>  Issue Type: Bug
> Environment: Debian 9 SSL GRPC
>Reporter: Andrei Budnik
>Priority: Major
>  Labels: flaky-test, mesosphere
> Attachments: 
> UpgradeTest.UpgradeAgentIntoHierarchicalRoleForNonHierarchicalRole-badrun.txt
>
>
> {code:java}
> ../../src/tests/upgrade_tests.cpp:664
> Failed to wait 15secs for offers
> {code}
> See logs in attachments.





[jira] [Comment Edited] (MESOS-8578) UpgradeTest.UpgradeAgentIntoHierarchicalRoleForNonHierarchicalRole is flaky.

2018-03-19 Thread Michael Park (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16405665#comment-16405665
 ] 

Michael Park edited comment on MESOS-8578 at 3/20/18 1:00 AM:
--

Looks to me like we should install the {{EXPECT_CALL}} before 
{{detector.appoint(...)}}:
https://github.com/apache/mesos/blob/50de784d589d2fc476f7869f39822c77c50745f3/src/tests/upgrade_tests.cpp#L650-L653


was (Author: mcypark):
Looks to me like we should invoke {{EXPECT_CALL}} before {{detector.appoint}}:
https://github.com/apache/mesos/blob/50de784d589d2fc476f7869f39822c77c50745f3/src/tests/upgrade_tests.cpp#L650-L653






[jira] [Comment Edited] (MESOS-8578) UpgradeTest.UpgradeAgentIntoHierarchicalRoleForNonHierarchicalRole is flaky.

2018-03-19 Thread Michael Park (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16405665#comment-16405665
 ] 

Michael Park edited comment on MESOS-8578 at 3/20/18 1:01 AM:
--

Looks to me like we should install the {{EXPECT_CALL}} before 
{{detector.appoint(...)}} 
[here|https://github.com/apache/mesos/blob/50de784d589d2fc476f7869f39822c77c50745f3/src/tests/upgrade_tests.cpp#L650-L653]


was (Author: mcypark):
Looks to me like we should install the {{EXPECT_CALL}} before 
{{detector.appoint(...)}}:
https://github.com/apache/mesos/blob/50de784d589d2fc476f7869f39822c77c50745f3/src/tests/upgrade_tests.cpp#L650-L653






[jira] [Created] (MESOS-8699) mesos agent id is empty in agent HTTP endpoint /state or /state.json

2018-03-19 Thread pwzgorilla (JIRA)
pwzgorilla created MESOS-8699:
-

 Summary: mesos agent id is empty in agent HTTP endpoint /state or 
/state.json
 Key: MESOS-8699
 URL: https://issues.apache.org/jira/browse/MESOS-8699
 Project: Mesos
  Issue Type: Bug
  Components: agent
Reporter: pwzgorilla


When fetching data from the /state or /state.json HTTP endpoint on the agent, 
the agent id is empty:

    "frameworks": [],
    "git_sha": "f7e3872b0359c6095f8eeaefe408cb7dcef5bb83",
    "git_tag": "1.5.0",
    "hostname": "192.168.99.105",
    "id": "",
    "log_dir": "/var/log/mesos",
    "pid": "slave(1)@127.0.0.1:5051",
    "reserved_resources": {},
    "reserved_resources_allocated": {},
    "reserved_resources_full": {},
    "resources": {
        "cpus": 1.0,
        "disk": 12274.0,
        "gpus": 0.0,
        "mem": 496.0,
        "ports": "[31000-32000]"
    },


