[jira] [Issue Comment Deleted] (MESOS-5425) Consider using IntervalSet for Port range resource math
[ https://issues.apache.org/jira/browse/MESOS-5425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yanyan Hu updated MESOS-5425:
-
Comment: was deleted

(was: Hi Joseph,

I ran a quick test using the "IntervalSet" data type: I first converted two "Ranges" values to "IntervalSet" values and subtracted one from the other, then converted the resulting "IntervalSet" back to a "Ranges" value. The results show that the performance is much better when there are 1600 sub-ranges in res2. The test results are as follows:

res2 range_size   execution time (seconds)
1                 0.010
100               0.028
200               0.030
400               0.035
800               0.044
1600              0.061

So, as you suggested, using IntervalSet in Port range resource math can resolve this issue effectively.)

> Consider using IntervalSet for Port range resource math
> ---
>
> Key: MESOS-5425
> URL: https://issues.apache.org/jira/browse/MESOS-5425
> Project: Mesos
> Issue Type: Improvement
> Components: allocation
> Reporter: Joseph Wu
> Labels: mesosphere
>
> Follow-up JIRA for comments raised in MESOS-3051 (see comments there).
> We should consider utilizing [{{IntervalSet}}|https://github.com/apache/mesos/blob/a0b798d2fac39445ce0545cfaf05a682cd393abe/3rdparty/stout/include/stout/interval.hpp] in [Port range resource math|https://github.com/apache/mesos/blob/a0b798d2fac39445ce0545cfaf05a682cd393abe/src/common/values.cpp#L143].

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-5425) Consider using IntervalSet for Port range resource math
[ https://issues.apache.org/jira/browse/MESOS-5425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15296038#comment-15296038 ]

Yanyan Hu commented on MESOS-5425:
--
Hi Joseph,

I ran a quick test using the "IntervalSet" data type: I first converted two "Ranges" values to "IntervalSet" values and subtracted one from the other, then converted the resulting "IntervalSet" back to a "Ranges" value. The results show that the performance is much better when there are 1600 sub-ranges in res2. The test results are as follows:

res2 range_size   execution time (seconds)
1                 0.010
100               0.028
200               0.030
400               0.035
800               0.044
1600              0.061

So, as you suggested, using IntervalSet in Port range resource math can resolve this issue effectively.

> Consider using IntervalSet for Port range resource math
> ---
>
> Key: MESOS-5425
> URL: https://issues.apache.org/jira/browse/MESOS-5425
> Project: Mesos
> Issue Type: Improvement
> Components: allocation
> Reporter: Joseph Wu
> Labels: mesosphere
>
> Follow-up JIRA for comments raised in MESOS-3051 (see comments there).
> We should consider utilizing [{{IntervalSet}}|https://github.com/apache/mesos/blob/a0b798d2fac39445ce0545cfaf05a682cd393abe/3rdparty/stout/include/stout/interval.hpp] in [Port range resource math|https://github.com/apache/mesos/blob/a0b798d2fac39445ce0545cfaf05a682cd393abe/src/common/values.cpp#L143].

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-5425) Consider using IntervalSet for Port range resource math
[ https://issues.apache.org/jira/browse/MESOS-5425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15296040#comment-15296040 ]

Yanyan Hu commented on MESOS-5425:
--
Hi Joseph,

I ran a quick test using the "IntervalSet" data type: I first converted two "Ranges" values to "IntervalSet" values and subtracted one from the other, then converted the resulting "IntervalSet" back to a "Ranges" value. The results show that the performance is much better when there are 1600 sub-ranges in res2. The test results are as follows:

res2 range_size   execution time (seconds)
1                 0.010
100               0.028
200               0.030
400               0.035
800               0.044
1600              0.061

So, as you suggested, using IntervalSet in Port range resource math should be able to resolve this issue effectively.

> Consider using IntervalSet for Port range resource math
> ---
>
> Key: MESOS-5425
> URL: https://issues.apache.org/jira/browse/MESOS-5425
> Project: Mesos
> Issue Type: Improvement
> Components: allocation
> Reporter: Joseph Wu
> Labels: mesosphere
>
> Follow-up JIRA for comments raised in MESOS-3051 (see comments there).
> We should consider utilizing [{{IntervalSet}}|https://github.com/apache/mesos/blob/a0b798d2fac39445ce0545cfaf05a682cd393abe/3rdparty/stout/include/stout/interval.hpp] in [Port range resource math|https://github.com/apache/mesos/blob/a0b798d2fac39445ce0545cfaf05a682cd393abe/src/common/values.cpp#L143].

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
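The subtraction described in this comment can be illustrated with a small Python sketch. This is not stout's `IntervalSet` and not the code that was benchmarked; `normalize` and `subtract` are hypothetical helpers, and the only assumption is that port ranges are inclusive `[begin, end]` pairs. Once both operands are sorted and merged, one forward sweep is enough.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # inclusive [begin, end], like a port range


def normalize(ranges: List[Range]) -> List[Range]:
    """Sort ranges and merge any that overlap or are adjacent."""
    merged: List[Range] = []
    for begin, end in sorted(ranges):
        if merged and begin <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((begin, end))
    return merged


def subtract(left: List[Range], right: List[Range]) -> List[Range]:
    """Return the values covered by `left` but not by `right`."""
    left, right = normalize(left), normalize(right)
    result: List[Range] = []
    j = 0
    for begin, end in left:
        cursor = begin
        # Right-hand ranges that end before this range starts are no longer needed.
        while j < len(right) and right[j][1] < cursor:
            j += 1
        k = j
        # Cut every right-hand range that starts inside [cursor, end].
        while k < len(right) and right[k][0] <= end:
            if right[k][0] > cursor:
                result.append((cursor, right[k][0] - 1))
            cursor = max(cursor, right[k][1] + 1)
            k += 1
        if cursor <= end:
            result.append((cursor, end))
    return result
```

For example, `subtract([(31000, 32000)], [(31005, 31010), (31500, 31500)])` leaves three ranges: 31000-31004, 31011-31499 and 31501-32000.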
[jira] [Commented] (MESOS-5425) Consider using IntervalSet for Port range resource math
[ https://issues.apache.org/jira/browse/MESOS-5425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15296045#comment-15296045 ]

Yanyan Hu commented on MESOS-5425:
--
I will run more tests to see whether this optimization makes the Mesos allocator work more efficiently. Thanks.

> Consider using IntervalSet for Port range resource math
> ---
>
> Key: MESOS-5425
> URL: https://issues.apache.org/jira/browse/MESOS-5425
> Project: Mesos
> Issue Type: Improvement
> Components: allocation
> Reporter: Joseph Wu
> Labels: mesosphere
>
> Follow-up JIRA for comments raised in MESOS-3051 (see comments there).
> We should consider utilizing [{{IntervalSet}}|https://github.com/apache/mesos/blob/a0b798d2fac39445ce0545cfaf05a682cd393abe/3rdparty/stout/include/stout/interval.hpp] in [Port range resource math|https://github.com/apache/mesos/blob/a0b798d2fac39445ce0545cfaf05a682cd393abe/src/common/values.cpp#L143].

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4279) Docker executor truncates task's output when the task is killed.
[ https://issues.apache.org/jira/browse/MESOS-4279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15296080#comment-15296080 ]

Martin Bydzovsky commented on MESOS-4279:
-
Btw, see https://github.com/mesosphere/marathon/issues/2707. This bug (which we see as well) is EXACTLY the same problem. You are killing the parent process ({code}mesos/build/src/.libs/lt-mesos-docker-executor{code}) too early, so docker stop doesn't get called at all and the \-\-rm flag doesn't get respected. This uses up our slaves' disk space about once a week, because the containers simply don't get removed.

Hmm, now I'm kind of curious why this flag doesn't take effect at all:
{code}
--docker_remove_delay=VALUE The amount of time to wait before removing docker containers (e.g., 3days, 2weeks, etc). (default: 6hrs)
{code}
But I don't want to test it until this is fixed.

> Docker executor truncates task's output when the task is killed.
> 
>
> Key: MESOS-4279
> URL: https://issues.apache.org/jira/browse/MESOS-4279
> Project: Mesos
> Issue Type: Bug
> Components: containerization, docker
> Affects Versions: 0.25.0, 0.26.0, 0.27.2, 0.28.1
> Reporter: Martin Bydzovsky
> Assignee: Martin Bydzovsky
> Priority: Blocker
> Labels: docker, mesosphere
> Fix For: 0.29.0
>
> I'm implementing graceful restarts of our mesos-marathon-docker setup and I
> ran into the following issue:
> (it was already discussed on https://github.com/mesosphere/marathon/issues/2876
> and the guys from mesosphere got to the point that it's probably a docker
> containerizer problem...)
> To sum it up:
> When I deploy a simple python script to all mesos-slaves:
> {code}
> #!/usr/bin/python
> from time import sleep
> import signal
> import sys
> import datetime
>
> def sigterm_handler(_signo, _stack_frame):
>     print "got %i" % _signo
>     print datetime.datetime.now().time()
>     sys.stdout.flush()
>     sleep(2)
>     print datetime.datetime.now().time()
>     print "ending"
>     sys.stdout.flush()
>     sys.exit(0)
>
> signal.signal(signal.SIGTERM, sigterm_handler)
> signal.signal(signal.SIGINT, sigterm_handler)
>
> try:
>     print "Hello"
>     i = 0
>     while True:
>         i += 1
>         print datetime.datetime.now().time()
>         print "Iteration #%i" % i
>         sys.stdout.flush()
>         sleep(1)
> finally:
>     print "Goodbye"
> {code}
> and I run it through Marathon like
> {code:javascript}
> data = {
>   args: ["/tmp/script.py"],
>   instances: 1,
>   cpus: 0.1,
>   mem: 256,
>   id: "marathon-test-api"
> }
> {code}
> During the app restart I get the expected result: the task receives SIGTERM and
> dies peacefully (within the 2-second period specified in my script).
> But when I wrap this python script in a docker image:
> {code}
> FROM node:4.2
> RUN mkdir /app
> ADD . /app
> WORKDIR /app
> ENTRYPOINT []
> {code}
> and run the corresponding application through Marathon:
> {code:javascript}
> data = {
>   args: ["./script.py"],
>   container: {
>     type: "DOCKER",
>     docker: {
>       image: "bydga/marathon-test-api"
>     },
>     forcePullImage: yes
>   },
>   cpus: 0.1,
>   mem: 256,
>   instances: 1,
>   id: "marathon-test-api"
> }
> {code}
> The task dies immediately during a restart (issued from Marathon), without
> having a chance to do any cleanup.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
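The behavior the reporter expects from a restart (SIGTERM first, some time for cleanup, a forced kill only as a last resort) can be sketched in Python 3 as follows. This is an illustration only, not the Mesos docker executor's code; `stop_gracefully` and `grace_seconds` are hypothetical names, and the sketch assumes a POSIX system.

```python
import signal
import subprocess


def stop_gracefully(proc: subprocess.Popen, grace_seconds: float) -> int:
    """Send SIGTERM, wait up to `grace_seconds`, then fall back to SIGKILL.

    Returns the child's exit status (negative if it was killed by a signal).
    """
    proc.send_signal(signal.SIGTERM)
    try:
        # The child's SIGTERM handler gets this long to flush output and exit.
        return proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        # The grace period ran out; force termination and reap the child.
        proc.kill()
        return proc.wait()
```

With the script from the description, a grace period longer than its 2-second `sleep` would let the "ending" line reach stdout before the process exits.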
[jira] [Commented] (MESOS-5436) GPU resource broke framework data table
[ https://issues.apache.org/jira/browse/MESOS-5436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15296094#comment-15296094 ] Kevin Klues commented on MESOS-5436: I think it's OK to just put 0 or "N/A" in cases where we don't yet have the proper statistics. We can fill them in once we have them. > GPU resource broke framework data table > --- > > Key: MESOS-5436 > URL: https://issues.apache.org/jira/browse/MESOS-5436 > Project: Mesos > Issue Type: Bug >Reporter: haosdent >Assignee: haosdent >Priority: Minor > Labels: gpu > Attachments: incorrect_agent_framework_page.png, > incorrect_agent_page.png > > > In agent_framework.html and master/static/agent.html, we add {{GPUs (Used / > Allocated)}} in table header. But we didn't add the corresponding column to > the table body as well. > On the other hand, we didn't provide statistics for gpus on monitor endpoints. > To provide those data in webui, it requires we implement gpus statistics in > monitor endpoints firstly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-5255) Add GPUs to container resource consumption metrics.
[ https://issues.apache.org/jira/browse/MESOS-5255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15296096#comment-15296096 ] Kevin Klues commented on MESOS-5255: It does. Can you point me at the lines of code that are causing the errors you see in the other bug? Where did I miss adding a column, etc. > Add GPUs to container resource consumption metrics. > --- > > Key: MESOS-5255 > URL: https://issues.apache.org/jira/browse/MESOS-5255 > Project: Mesos > Issue Type: Task >Reporter: Kevin Klues > Labels: gpu > > Currently the usage callback in the Nvidia GPU isolator is unimplemented: > {noformat} > src/slave/containerizer/mesos/isolators/cgroups/devices/gpus/nvidia.cpp > {noformat} > It should use functionality from NVML to gather the current GPU usage and add > it to a ResourceStatistics object. It is still an open question as to exactly > what information we want to expose here (power, memory consumption, current > load, etc.). Whatever we decide on should be standard across different GPU > types, different GPU vendors, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-5439) Got registration problem
kimjoohwan created MESOS-5439:
-
Summary: Got registration problem
Key: MESOS-5439
URL: https://issues.apache.org/jira/browse/MESOS-5439
Project: Mesos
Issue Type: Bug
Components: c++ api, slave
Affects Versions: 0.27.0
Reporter: kimjoohwan

We are currently using Mesos 0.27.0. The master is built with an Intel(R) Core(TM) i5-3470 CPU @ 3.20GHz and 4GB of RAM. The slave (Banana PI) is built with a dual-core Cortex-A7 CPU and 1GB of RAM.

Using the Mesos API, we developed a Python-based framework and ran it to completion. However, we found that too much time (about 5 seconds) passes between the 'Forked child with pid' and 'Got registration for executor' messages in the slave log.

If you know how to deal with this problem, please let us know.

I0523 17:38:16.264289 1787 slave.cpp:5208] Launching executor default of framework 3fb86eea-96c4-4b07-aaa2-caf071275bdf-0010 with resources in work directory '/tmp/mesos/slaves/3fb86eea-96c4-4b07-aaa2-caf071275bdf-S2/frameworks/3fb86eea-96c4-4b07-aaa2-caf071275bdf-0010/executors/default/runs/1c830c9a-4120-4ef0-af80-49a52d307539'
I0523 17:38:16.290601 1789 containerizer.cpp:616] Starting container '1c830c9a-4120-4ef0-af80-49a52d307539' for executor 'default' of framework '3fb86eea-96c4-4b07-aaa2-caf071275bdf-0010'
I0523 17:38:16.293285 1787 slave.cpp:1626] Queuing task '0' for executor 'default' of framework 3fb86eea-96c4-4b07-aaa2-caf071275bdf-0010
I0523 17:38:16.297369 1787 slave.cpp:4233] Current disk usage 2.14%. Max allowed age: 6.150293798159722days
I0523 17:38:16.504043 1789 launcher.cpp:132] Forked child with pid '1837' for container '1c830c9a-4120-4ef0-af80-49a52d307539'
I0523 17:38:21.510535 1785 slave.cpp:2573] Got registration for executor 'default' of framework 3fb86eea-96c4-4b07-aaa2-caf071275bdf-0010 from executor(1)@192.168.0.8:56508
I0523 17:38:21.554608 1785 slave.cpp:1791] Sending queued task '0' to executor 'default' of framework 3fb86eea-96c4-4b07-aaa2-caf071275bdf-0010 at executor(1)@192.168.0.8:56508
I0523 17:38:21.594511 1789 slave.cpp:2932] Handling status update TASK_RUNNING (UUID: cd04ec2a-0e68-460a-ad2e-e4f504f3b032) for task 0 of framework 3fb86eea-96c4-4b07-aaa2-caf071275bdf-0010 from executor(1)@192.168.0.8:56508
I0523 17:38:21.600050 1789 slave.cpp:2932] Handling status update TASK_FINISHED (UUID: 46e110c8-4078-4f98-ae30-30b3a1376034) for task 0 of framework 3fb86eea-96c4-4b07-aaa2-caf071275bdf-0010 from executor(1)@192.168.0.8:56508

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
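The delay reported above can be read straight off the log timestamps. Below is a minimal Python sketch that does so; `glog_time` and `gap` are hypothetical helper names, and the sketch assumes each log line starts with the usual glog prefix (severity letter, month and day, then a time with microseconds).

```python
import re
from datetime import datetime, timedelta

# glog prefix: severity letter, MMDD, then HH:MM:SS.microseconds.
GLOG_PREFIX = re.compile(r"^[IWEF](\d{4})\s+(\d{2}:\d{2}:\d{2}\.\d{6})\s")


def glog_time(line: str) -> datetime:
    """Parse the timestamp at the start of a glog line."""
    match = GLOG_PREFIX.match(line)
    if match is None:
        raise ValueError("not a glog line: %r" % line)
    # glog does not log the year; 2000 is an arbitrary placeholder (a leap
    # year, so that a Feb 29 entry still parses).
    stamp = "2000" + match.group(1) + " " + match.group(2)
    return datetime.strptime(stamp, "%Y%m%d %H:%M:%S.%f")


def gap(lines: list, first_marker: str, second_marker: str) -> timedelta:
    """Time between the first line containing each marker."""
    start = next(glog_time(line) for line in lines if first_marker in line)
    end = next(glog_time(line) for line in lines if second_marker in line)
    return end - start
```

Applied to the log above with the markers `Forked child with pid` and `Got registration for executor`, the timestamps 17:38:16.504043 and 17:38:21.510535 give a gap of about 5.006 seconds.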
[jira] [Commented] (MESOS-5255) Add GPUs to container resource consumption metrics.
[ https://issues.apache.org/jira/browse/MESOS-5255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15296107#comment-15296107 ]

haosdent commented on MESOS-5255:
-
Got it, thank you for checking. See https://reviews.apache.org/r/47719/

> Add GPUs to container resource consumption metrics.
> ---
>
> Key: MESOS-5255
> URL: https://issues.apache.org/jira/browse/MESOS-5255
> Project: Mesos
> Issue Type: Task
> Reporter: Kevin Klues
> Labels: gpu
>
> Currently the usage callback in the Nvidia GPU isolator is unimplemented:
> {noformat}
> src/slave/containerizer/mesos/isolators/cgroups/devices/gpus/nvidia.cpp
> {noformat}
> It should use functionality from NVML to gather the current GPU usage and add
> it to a ResourceStatistics object. It is still an open question as to exactly
> what information we want to expose here (power, memory consumption, current
> load, etc.). Whatever we decide on should be standard across different GPU
> types, different GPU vendors, etc.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-5439) registerExecutor problem
[ https://issues.apache.org/jira/browse/MESOS-5439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kimjoohwan updated MESOS-5439: -- Summary: registerExecutor problem (was: Got registration problem) > registerExecutor problem > > > Key: MESOS-5439 > URL: https://issues.apache.org/jira/browse/MESOS-5439 > Project: Mesos > Issue Type: Bug > Components: c++ api, slave >Affects Versions: 0.27.0 >Reporter: kimjoohwan > > Currently, we are using Mesos 0.27.0. The master is build up with a Intel(R) > Core(TM) i5-3470 CPU @ 3.20GHz CPU and a 4GB RAM. The slave (Banana PI) is > build up with a Cortex -A7 Dual-Core CPU and a 1GB RAM. > By using the Mesos API, we have developed and completed the execution of the > framework which is based on python. > but, we found that it takes too much time between the messages, 'Forked child > with pid' and 'Got registration for executor' from the slave log. (5sec) > If you know how to deal with this problem, please let us know. > I0523 17:38:16.264289 1787 slave.cpp:5208] Launching executor default of > framework 3fb86eea-96c4-4b07-aaa2-caf071275bdf-0010 with resources in work > directory > '/tmp/mesos/slaves/3fb86eea-96c4-4b07-aaa2-caf071275bdf-S2/frameworks/3fb86eea-96c4-4b07-aaa2-caf071275bdf-0010/executors/default/runs/1c830c9a-4120-4ef0-af80-49a52d307539' > I0523 17:38:16.290601 1789 containerizer.cpp:616] Starting container > '1c830c9a-4120-4ef0-af80-49a52d307539' for executor 'default' of framework > '3fb86eea-96c4-4b07-aaa2-caf071275bdf-0010' > I0523 17:38:16.293285 1787 slave.cpp:1626] Queuing task '0' for executor > 'default' of framework 3fb86eea-96c4-4b07-aaa2-caf071275bdf-0010 > I0523 17:38:16.297369 1787 slave.cpp:4233] Current disk usage 2.14%. 
Max > allowed age: 6.150293798159722days > I0523 17:38:16.504043 1789 launcher.cpp:132] Forked child with pid '1837' > for container '1c830c9a-4120-4ef0-af80-49a52d307539' > I0523 17:38:21.510535 1785 slave.cpp:2573] Got registration for executor > 'default' of framework 3fb86eea-96c4-4b07-aaa2-caf071275bdf-0010 from > executor(1)@192.168.0.8:56508 > I0523 17:38:21.554608 1785 slave.cpp:1791] Sending queued task '0' to > executor 'default' of framework 3fb86eea-96c4-4b07-aaa2-caf071275bdf-0010 at > executor(1)@192.168.0.8:56508 > I0523 17:38:21.594511 1789 slave.cpp:2932] Handling status update > TASK_RUNNING (UUID: cd04ec2a-0e68-460a-ad2e-e4f504f3b032) for task 0 of > framework 3fb86eea-96c4-4b07-aaa2-caf071275bdf-0010 from > executor(1)@192.168.0.8:56508 > I0523 17:38:21.600050 1789 slave.cpp:2932] Handling status update > TASK_FINISHED (UUID: 46e110c8-4078-4f98-ae30-30b3a1376034) for task 0 of > framework 3fb86eea-96c4-4b07-aaa2-caf071275bdf-0010 from > executor(1)@192.168.0.8:56508 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-5440) There is a misspelling in some markdown files
GyeongWon, Do created MESOS-5440:

Summary: There is a misspelling in some markdown files
Key: MESOS-5440
URL: https://issues.apache.org/jira/browse/MESOS-5440
Project: Mesos
Issue Type: Documentation
Reporter: GyeongWon, Do
Priority: Trivial

"This endpoint requires authentication {color:red}iff{color} HTTP authentication is enabled."

I think "iff" is a misspelling of "if"; is that right? This statement occurs in many markdown files.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3139) Incorporate CMake into standard documentation
[ https://issues.apache.org/jira/browse/MESOS-3139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15296255#comment-15296255 ]

Frank Scholten commented on MESOS-3139:
---
I am trying to post a review, but it fails:
{code}
frank@franktop:~/src/mesos$ ./support/post-reviews.py --server=https://reviews.apache.org --tracking-branch=origin/master --target-groups=mesos --open
Running 'rbt post' across all of ...
0949e6be6a4260933172ea93acc4bc0592c1e2f1 - (HEAD -> MESOS-3139) Added first draft CMake build docs. (4 minutes ago)
Creating diff of:
0949e6be6a4260933172ea93acc4bc0592c1e2f1 - (HEAD -> MESOS-3139) Added first draft CMake build docs.
Press enter to continue or 'Ctrl-C' to skip.
Review request #47723 posted.
https://reviews.apache.org/r/47723/
https://reviews.apache.org/r/47723/diff/
[10746:10777:0523/133438:ERROR:nss_util.cc(839)] After loading Root Certs, loaded==false: NSS error code: -8018
Created new window in existing browser session.
Failed to execute: 'git commit --amend -m Added first draft CMake build docs.
Review: [10746:10777:0523/133438:ERROR:nss_util.cc(839)] After loading Root Certs, loaded==false: NSS error code: -8018
': Usage: ./mesos-split.py ...
Error: No line in the commit message summary may exceed 72 characters.
{code}

> Incorporate CMake into standard documentation
> -
>
> Key: MESOS-3139
> URL: https://issues.apache.org/jira/browse/MESOS-3139
> Project: Mesos
> Issue Type: Task
> Components: cmake
> Reporter: Alex Clemmer
> Assignee: Alex Clemmer
> Labels: build, cmake, mesosphere
>
> Right now it's anyone's guess how to build with CMake. If we want people to
> use it, we should put up documentation. The central challenge is that the
> CMake instructions will be slightly different for different platforms.
> For example, on Linux, the gist of the build is basically the same as
> autotools; you pull down the system dependencies (like APR, _etc_.), and then:
> ```
> ./bootstrap
> mkdir build-cmake && cd build-cmake
> cmake ..
> make
> ```
> But, on Windows, it will be somewhat more complicated. There is no bootstrap
> step, for example, because Windows doesn't have bash natively. And even when
> we put that in, you'll still have to build the glog stuff out-of-band because
> CMake has no way of booting up Visual Studio and calling "build."
> So practically, we need to:
> * Figure out what our build story is for different platforms
> * Write specific instructions for our "core" target platforms.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-5441) Tests fail to use mounted cgroups on Ubuntu 16.04
Jan Schlicht created MESOS-5441: --- Summary: Tests fail to use mounted cgroups on Ubuntu 16.04 Key: MESOS-5441 URL: https://issues.apache.org/jira/browse/MESOS-5441 Project: Mesos Issue Type: Bug Components: cgroups, tests Environment: Ubuntu 16.04 Reporter: Jan Schlicht Test fixtures inheriting from {{mesos::internal::tests::ContainerizerTest}} fail if {{sudo ./bin/mesos-tests.sh}} is run. Here's an example from our internal CI: {noformat} [23:49:18] : [Step 10/10] [ RUN ] SlaveRecoveryTest/0.RecoverSlaveState [23:49:18] : [Step 10/10] ../../src/tests/mesos.cpp:864: Failure [23:49:18] : [Step 10/10] cgroups::mount(hierarchy, subsystem): 'cpu' is already attached to another hierarchy [23:49:18] : [Step 10/10] - [23:49:18] : [Step 10/10] We cannot run any cgroups tests that require [23:49:18] : [Step 10/10] a hierarchy with subsystem 'cpu' [23:49:18] : [Step 10/10] because we failed to find an existing hierarchy [23:49:18] : [Step 10/10] or create a new one (tried '/run/lxcfs/controllers/cpu'). [23:49:18] : [Step 10/10] You can either remove all existing [23:49:18] : [Step 10/10] hierarchies, or disable this test case [23:49:18] : [Step 10/10] (i.e., --gtest_filter=-SlaveRecoveryTest/0.*). 
[23:49:18] : [Step 10/10] - [23:49:18] : [Step 10/10] ../../src/tests/mesos.cpp:918: Failure [23:49:18] : [Step 10/10] cgroups: '/run/lxcfs/controllers/cpu' is not a valid hierarchy [23:49:18] : [Step 10/10] [ FAILED ] SlaveRecoveryTest/0.RecoverSlaveState, where TypeParam = mesos::internal::slave::MesosContainerizer (11 ms) {noformat} It seems that {{lxcfs}} of Ubuntu 16.04 might be causing this, {{/proc/mounts}} looks like this: {noformat} sysfs /sys sysfs rw,nosuid,nodev,noexec,relatime 0 0 proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0 udev /dev devtmpfs rw,nosuid,relatime,size=3809788k,nr_inodes=952447,mode=755 0 0 devpts /dev/pts devpts rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000 0 0 tmpfs /run tmpfs rw,nosuid,noexec,relatime,size=765776k,nr_inodes=957217,mode=755 0 0 /dev/xvda1 / ext4 rw,relatime,discard,data=ordered 0 0 securityfs /sys/kernel/security securityfs rw,nosuid,nodev,noexec,relatime 0 0 tmpfs /dev/shm tmpfs rw,nosuid,nodev,size=3828868k,nr_inodes=957217 0 0 tmpfs /run/lock tmpfs rw,nosuid,nodev,noexec,relatime,size=5120k,nr_inodes=957217 0 0 tmpfs /sys/fs/cgroup tmpfs ro,nosuid,nodev,noexec,size=3828868k,nr_inodes=957217,mode=755 0 0 cgroup /sys/fs/cgroup/systemd cgroup rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/lib/systemd/systemd-cgroups-agent,name=systemd,nsroot=/ 0 0 pstore /sys/fs/pstore pstore rw,nosuid,nodev,noexec,relatime 0 0 cgroup /sys/fs/cgroup/blkio cgroup rw,nosuid,nodev,noexec,relatime,blkio,nsroot=/ 0 0 cgroup /sys/fs/cgroup/cpu,cpuacct cgroup rw,nosuid,nodev,noexec,relatime,cpu,cpuacct,nsroot=/ 0 0 cgroup /sys/fs/cgroup/net_cls,net_prio cgroup rw,nosuid,nodev,noexec,relatime,net_cls,net_prio,nsroot=/ 0 0 cgroup /sys/fs/cgroup/freezer cgroup rw,nosuid,nodev,noexec,relatime,freezer,nsroot=/ 0 0 cgroup /sys/fs/cgroup/cpuset cgroup rw,nosuid,nodev,noexec,relatime,cpuset,nsroot=/ 0 0 cgroup /sys/fs/cgroup/memory cgroup rw,nosuid,nodev,noexec,relatime,memory,nsroot=/ 0 0 cgroup /sys/fs/cgroup/hugetlb cgroup 
rw,nosuid,nodev,noexec,relatime,hugetlb,nsroot=/ 0 0 cgroup /sys/fs/cgroup/perf_event cgroup rw,nosuid,nodev,noexec,relatime,perf_event,nsroot=/ 0 0 cgroup /sys/fs/cgroup/pids cgroup rw,nosuid,nodev,noexec,relatime,pids,nsroot=/ 0 0 cgroup /sys/fs/cgroup/devices cgroup rw,nosuid,nodev,noexec,relatime,devices,nsroot=/ 0 0 systemd-1 /proc/sys/fs/binfmt_misc autofs rw,relatime,fd=25,pgrp=1,timeout=0,minproto=5,maxproto=5,direct 0 0 mqueue /dev/mqueue mqueue rw,relatime 0 0 debugfs /sys/kernel/debug debugfs rw,relatime 0 0 hugetlbfs /dev/hugepages hugetlbfs rw,relatime 0 0 fusectl /sys/fs/fuse/connections fusectl rw,relatime 0 0 /dev/xvdb /mnt ext3 rw,relatime,data=ordered 0 0 tmpfs /run/user/1000 tmpfs rw,nosuid,nodev,relatime,size=765780k,mode=700,uid=1000,gid=1000 0 0 tmpfs /run/lxcfs/controllers tmpfs rw,relatime,size=100k,mode=700 0 0 devices /run/lxcfs/controllers/devices cgroup rw,relatime,devices,nsroot=/ 0 0 pids /run/lxcfs/controllers/pids cgroup rw,relatime,pids,nsroot=/ 0 0 perf_event /run/lxcfs/controllers/perf_event cgroup rw,relatime,perf_event,nsroot=/ 0 0 hugetlb /run/lxcfs/controllers/hugetlb cgroup rw,relatime,hugetlb,nsroot=/ 0 0 memory /run/lxcfs/controllers/memory cgroup rw,relatime,memory,nsroot=/ 0 0 cpuset /run/lxcfs/controllers/cpuset cgroup rw,relatime,cpuset,nsroot=/ 0 0 freezer /run/lxcfs/controllers/freezer cgroup rw,relatime,freezer,nsroot=/ 0 0 net_cls,net_prio /run/lxcfs/controllers/net_cls
[jira] [Created] (MESOS-5442) Stuck when extracting two archive contains overlapped file structure
Timon Wong created MESOS-5442:
-
Summary: Stuck when extracting two archive contains overlapped file structure
Key: MESOS-5442
URL: https://issues.apache.org/jira/browse/MESOS-5442
Project: Mesos
Issue Type: Bug
Components: fetcher
Affects Versions: 0.28.1
Reporter: Timon Wong
Priority: Minor

Suppose we have two zip files:
{code}
aaa.zip:
- conf/aaa.conf # Overlapped file structure
- aaa
aaa-patch.zip:
- conf/aaa.conf # Overlapped file structure
{code}
Then we create a Marathon task that fetches both:
{code:javascript}
{
  // ...
  "uris": [
    "http://X/aaa.zip",
    "http://X/aaa-patch.zip"
  ]
}
{code}
After `aaa.zip` is extracted, the fetcher gets stuck while trying to extract `aaa-patch.zip`. The log ends like this:
{code}
I0522 01:23:05.618922 25041 fetcher.cpp:134] Downloading resource from 'http://X/-patch.zip' to '/var/lib//-patch.zip'
I0522 01:23:05.624514 25041 fetcher.cpp:84] Extracting with command: unzip -d '/var/lib/' '/var/lib//aaa-patch.zip'
replace /var/lib//conf/aaa.conf? [y]es, [n]o, [A]ll, [N]one, [r]ename:
{code}

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
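The hang comes from `unzip` waiting at an interactive prompt that nobody answers. As an illustration of non-interactive extraction where later archives overwrite earlier ones, here is a minimal Python sketch. It is not what the Mesos fetcher does and is not a proposed fix; `extract_in_order` is a hypothetical helper built on the standard `zipfile` module.

```python
import zipfile
from pathlib import Path
from typing import Iterable


def extract_in_order(archives: Iterable[Path], destination: Path) -> None:
    """Extract each archive into `destination`; later archives win on conflicts."""
    destination.mkdir(parents=True, exist_ok=True)
    for archive in archives:
        with zipfile.ZipFile(archive) as zf:
            # extractall() replaces files that already exist and never prompts.
            zf.extractall(destination)
```

With the two archives from the description, `conf/aaa.conf` would end up with the contents from `aaa-patch.zip`, and `aaa` would be left as extracted from `aaa.zip`.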
[jira] [Updated] (MESOS-5437) AppC appc_simple_discovery_uri_prefix is lost in configuration.md
[ https://issues.apache.org/jira/browse/MESOS-5437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jie Yu updated MESOS-5437: -- Shepherd: Jie Yu Story Points: 1 > AppC appc_simple_discovery_uri_prefix is lost in configuration.md > -- > > Key: MESOS-5437 > URL: https://issues.apache.org/jira/browse/MESOS-5437 > Project: Mesos > Issue Type: Bug > Components: documentation >Affects Versions: 0.29.0 >Reporter: Guangya Liu >Assignee: Guangya Liu > Fix For: 0.29.0 > > > AppC appc_simple_discovery_uri_prefix is lost in configuration.md -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-5436) GPU resource broke framework data table
[ https://issues.apache.org/jira/browse/MESOS-5436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] haosdent updated MESOS-5436: Attachment: after_agent_page.png after_agent_framework_page.png > GPU resource broke framework data table > --- > > Key: MESOS-5436 > URL: https://issues.apache.org/jira/browse/MESOS-5436 > Project: Mesos > Issue Type: Bug >Reporter: haosdent >Assignee: haosdent >Priority: Minor > Labels: gpu > Attachments: after_agent_framework_page.png, after_agent_page.png, > incorrect_agent_framework_page.png, incorrect_agent_page.png > > > In agent_framework.html and master/static/agent.html, we add {{GPUs (Used / > Allocated)}} in table header. But we didn't add the corresponding column to > the table body as well. > On the other hand, we didn't provide statistics for gpus on monitor endpoints. > To provide those data in webui, it requires we implement gpus statistics in > monitor endpoints firstly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-5413) `network/cni` isolator should skip the bind mounting of the CNI network information root directory if possible
[ https://issues.apache.org/jira/browse/MESOS-5413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jie Yu updated MESOS-5413: -- Sprint: Mesosphere Sprint 35 > `network/cni` isolator should skip the bind mounting of the CNI network > information root directory if possible > -- > > Key: MESOS-5413 > URL: https://issues.apache.org/jira/browse/MESOS-5413 > Project: Mesos > Issue Type: Bug >Reporter: Qian Zhang >Assignee: Qian Zhang > Fix For: 0.29.0 > > > Currently in the create() method `network/cni` isolator, for the CNI network > information root directory (i.e., {{/var/run/mesos/isolators/network/cni}}), > we do a self bind mount and make sure it is a shared mount of its own peer > group. However, we should not do a self bind mount if the mount containing > the CNI network information root directory is already a shared mount in its > own share peer group, just like what we did for `filesystem/linux` isolator > in [MESOS-5239 | https://issues.apache.org/jira/browse/MESOS-5239]. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-5413) `network/cni` isolator should skip the bind mounting of the CNI network information root directory if possible
[ https://issues.apache.org/jira/browse/MESOS-5413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jie Yu updated MESOS-5413: -- Story Points: 3 > `network/cni` isolator should skip the bind mounting of the CNI network > information root directory if possible > -- > > Key: MESOS-5413 > URL: https://issues.apache.org/jira/browse/MESOS-5413 > Project: Mesos > Issue Type: Bug >Reporter: Qian Zhang >Assignee: Qian Zhang > Fix For: 0.29.0 > > > Currently in the create() method `network/cni` isolator, for the CNI network > information root directory (i.e., {{/var/run/mesos/isolators/network/cni}}), > we do a self bind mount and make sure it is a shared mount of its own peer > group. However, we should not do a self bind mount if the mount containing > the CNI network information root directory is already a shared mount in its > own share peer group, just like what we did for `filesystem/linux` isolator > in [MESOS-5239 | https://issues.apache.org/jira/browse/MESOS-5239]. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
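Whether the mount containing a directory is already shared can be read from `/proc/self/mountinfo`, where a shared mount carries a `shared:N` tag among its optional fields. The Python sketch below shows that check under stated assumptions: it is not the isolator's C++ code, the helper names (`parse_mountinfo`, `containing_mount`, `is_shared`) are hypothetical, and it does not examine whether the mount sits in its own peer group, which the issue also requires.

```python
from typing import List, NamedTuple, Optional


class Mount(NamedTuple):
    mount_point: str
    optional_fields: List[str]  # e.g. ["shared:5"] or ["master:1"]


def parse_mountinfo(text: str) -> List[Mount]:
    """Parse the contents of /proc/<pid>/mountinfo."""
    mounts = []
    for line in text.splitlines():
        fields = line.split()
        if "-" not in fields[6:]:
            continue  # malformed or empty line
        # Optional fields sit between the mount options (index 5) and "-".
        separator = fields.index("-", 6)
        mounts.append(Mount(fields[4], fields[6:separator]))
    return mounts


def containing_mount(mounts: List[Mount], path: str) -> Optional[Mount]:
    """Return the mount whose mount point is the longest prefix of `path`."""
    target = path.rstrip("/") + "/"
    best = None
    for mount in mounts:
        prefix = mount.mount_point.rstrip("/") + "/"
        if target.startswith(prefix):
            # On ties, the later entry is mounted on top of the earlier one.
            if best is None or len(mount.mount_point) >= len(best.mount_point):
                best = mount
    return best


def is_shared(mount: Mount) -> bool:
    """True if the mount propagates events to a peer group."""
    return any(field.startswith("shared:") for field in mount.optional_fields)
```

On a Linux host, the input would be the text read from `/proc/self/mountinfo`, and the path would be the CNI network information root directory named in the description.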
[jira] [Updated] (MESOS-5436) GPU resource broke framework data table
[ https://issues.apache.org/jira/browse/MESOS-5436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] haosdent updated MESOS-5436: Attachment: (was: after_agent_page.png) > GPU resource broke framework data table > --- > > Key: MESOS-5436 > URL: https://issues.apache.org/jira/browse/MESOS-5436 > Project: Mesos > Issue Type: Bug >Reporter: haosdent >Assignee: haosdent >Priority: Minor > Labels: gpu > Attachments: incorrect_agent_page.png > > > In agent_framework.html and master/static/agent.html, we add {{GPUs (Used / > Allocated)}} in table header. But we didn't add the corresponding column to > the table body as well. > On the other hand, we didn't provide statistics for gpus on monitor endpoints. > To provide those data in webui, it requires we implement gpus statistics in > monitor endpoints firstly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-5436) GPU resource broke framework data table
[ https://issues.apache.org/jira/browse/MESOS-5436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] haosdent updated MESOS-5436: Attachment: (was: incorrect_agent_framework_page.png) > GPU resource broke framework data table > --- > > Key: MESOS-5436 > URL: https://issues.apache.org/jira/browse/MESOS-5436 > Project: Mesos > Issue Type: Bug >Reporter: haosdent >Assignee: haosdent >Priority: Minor > Labels: gpu > Attachments: incorrect_agent_page.png > > > In agent_framework.html and master/static/agent.html, we add {{GPUs (Used / > Allocated)}} in table header. But we didn't add the corresponding column to > the table body as well. > On the other hand, we didn't provide statistics for gpus on monitor endpoints. > To provide those data in webui, it requires we implement gpus statistics in > monitor endpoints firstly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-5436) GPU resource broke framework data table
[ https://issues.apache.org/jira/browse/MESOS-5436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] haosdent updated MESOS-5436: Attachment: (was: after_agent_framework_page.png) > GPU resource broke framework data table > --- > > Key: MESOS-5436 > URL: https://issues.apache.org/jira/browse/MESOS-5436 > Project: Mesos > Issue Type: Bug >Reporter: haosdent >Assignee: haosdent >Priority: Minor > Labels: gpu > Attachments: incorrect_agent_page.png > > > In agent_framework.html and master/static/agent.html, we add {{GPUs (Used / > Allocated)}} in table header. But we didn't add the corresponding column to > the table body as well. > On the other hand, we didn't provide statistics for gpus on monitor endpoints. > To provide those data in webui, it requires we implement gpus statistics in > monitor endpoints firstly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-5436) GPU resource broke framework data table
[ https://issues.apache.org/jira/browse/MESOS-5436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] haosdent updated MESOS-5436: Attachment: incorrect_agent_page.png incorrect_agent_framework_page.png after_agent_page.png after_agent_framework_page.png > GPU resource broke framework data table > --- > > Key: MESOS-5436 > URL: https://issues.apache.org/jira/browse/MESOS-5436 > Project: Mesos > Issue Type: Bug >Reporter: haosdent >Assignee: haosdent >Priority: Minor > Labels: gpu > Attachments: after_agent_framework_page.png, after_agent_page.png, > incorrect_agent_framework_page.png, incorrect_agent_page.png > > > In agent_framework.html and master/static/agent.html, we add {{GPUs (Used / > Allocated)}} in table header. But we didn't add the corresponding column to > the table body as well. > On the other hand, we didn't provide statistics for gpus on monitor endpoints. > To provide those data in webui, it requires we implement gpus statistics in > monitor endpoints firstly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-5436) GPU resource broke framework data table
[ https://issues.apache.org/jira/browse/MESOS-5436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] haosdent updated MESOS-5436: Attachment: (was: incorrect_agent_page.png) > GPU resource broke framework data table > --- > > Key: MESOS-5436 > URL: https://issues.apache.org/jira/browse/MESOS-5436 > Project: Mesos > Issue Type: Bug >Reporter: haosdent >Assignee: haosdent >Priority: Minor > Labels: gpu > Attachments: after_agent_framework_page.png, after_agent_page.png, > incorrect_agent_framework_page.png, incorrect_agent_page.png > > > In agent_framework.html and master/static/agent.html, we add {{GPUs (Used / > Allocated)}} in table header. But we didn't add the corresponding column to > the table body as well. > On the other hand, we didn't provide statistics for gpus on monitor endpoints. > To provide those data in webui, it requires we implement gpus statistics in > monitor endpoints firstly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-5442) Stuck when extracting two archive contains overlapped file structure
[ https://issues.apache.org/jira/browse/MESOS-5442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15296651#comment-15296651 ] Jie Yu commented on MESOS-5442: --- I just backported the patch to the 0.28.x branch. > Stuck when extracting two archive contains overlapped file structure > > > Key: MESOS-5442 > URL: https://issues.apache.org/jira/browse/MESOS-5442 > Project: Mesos > Issue Type: Bug > Components: fetcher >Affects Versions: 0.28.1 >Reporter: Timon Wong >Priority: Minor > > Provided we have two zip files: > {code} > aaa.zip: > - conf/aaa.conf # Overlapped file structure > - aaa > aaa-patch.zip: > - conf/aaa.conf # Overlapped file structure > {code} > Then we create a marathon task for it: > {code:javascript} > { > // ... > "uris": [ > "http://X/aaa.zip", > "http://X/aaa-patch.zip" > ] > } > {code} > Then, after `aaa.zip` has been extracted, it gets stuck when trying to > extract `aaa-patch.zip`; the log will finally look like: > {code} > I0522 01:23:05.618922 25041 fetcher.cpp:134] Downloading resource from > 'http://X/-patch.zip' to '/var/lib//-patch.zip' > I0522 01:23:05.624514 25041 fetcher.cpp:84] Extracting with command: unzip -d > '/var/lib/' '/var/lib//aaa-patch.zip' > replace /var/lib//conf/aaa.conf? [y]es, [n]o, [A]ll, [N]one, > [r]ename: > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4885) Unzip should force overwrite
[ https://issues.apache.org/jira/browse/MESOS-4885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jie Yu updated MESOS-4885: -- Fix Version/s: 0.28.2 > Unzip should force overwrite > > > Key: MESOS-4885 > URL: https://issues.apache.org/jira/browse/MESOS-4885 > Project: Mesos > Issue Type: Bug > Components: fetcher >Reporter: Tomasz Janiszewski >Assignee: Tomasz Janiszewski >Priority: Trivial > Fix For: 0.29.0, 0.28.2 > > > Consider the situation where a zip file is malformed and contains duplicated files. > When the fetcher downloads a malformed zip file that contains duplicated files > (e.g., dist zips generated by gradle could have duplicated files in libs dir) > and tries to uncompress it, the deployment hangs in the staged phase because unzip > prompts whether the file should be replaced. unzip should overwrite this file or abort > with an error. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
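As a rough illustration of the "force overwrite" behaviour requested above: Info-ZIP `unzip` accepts `-o` to overwrite existing files without prompting. The sketch below only builds the argument list and does not run anything. The helper name and the paths are hypothetical, and this is not the actual fetcher patch.

```python
def build_unzip_command(archive, destination, overwrite=True):
    """Build an argv list for Info-ZIP unzip (hypothetical helper)."""
    command = ["unzip"]
    if overwrite:
        # -o: overwrite existing files without prompting, so extraction
        # cannot block on the interactive "replace ...?" question.
        command.append("-o")
    command += ["-d", destination, archive]
    return command


# Hypothetical paths, for illustration only.
print(build_unzip_command("/tmp/sandbox/aaa-patch.zip", "/tmp/sandbox"))
```

Passing the resulting list to a process launcher without a shell also avoids having to quote the paths by hand.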
[jira] [Updated] (MESOS-5431) Update the website generation and development workflows with docker.
[ https://issues.apache.org/jira/browse/MESOS-5431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] haosdent updated MESOS-5431: Attachment: website.gif > Update the website generation and development workflows with docker. > > > Key: MESOS-5431 > URL: https://issues.apache.org/jira/browse/MESOS-5431 > Project: Mesos > Issue Type: Improvement > Components: project website >Reporter: haosdent >Assignee: haosdent >Priority: Minor > Attachments: website.gif > > > As the discussion thread in [Readme update | > http://search-hadoop.com/m/0Vlr6JIzkd2QAk85&subj=Re+WEBSITE+Readme+update] > From [~vinodkone] and [~klueska]'s comments, > {quote} > On Fri, May 20, 2016 at 9:00 AM, haosdent <[EMAIL PROTECTED]> wrote: > yes. maybe update the rake target ":default" target to also do doxygen and > javadoc tasks? > {quote} > {quote} > While we are fixing the dockerfile for the website, can I also request that > we update the docker file to not muck up the mesos source directory that > gets mounted in? Right now, the end result of 'docker run' is a "publish" > directory and a "documentation" directory inside "mesos" directory, which > means I need to clean those up manually later. > For "publish", I would like us to mount a source directory (mesos) and a > publish directory, like so: > sudo docker run -it --rm -p 4567:4567 -v :/mesos -v > :/publish mesos/website > Then as a committer, I would set to the publish folder of > my svn clone of the site (e.g., ~/workspace/site/publish). > For "documentation", I would like the Dockerfile to delete it during exit. > {quote} > We need to implement above things to make it more convenience to > develop and generate the website. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-5427) Mesos master locks up after slave fails to authenticate
[ https://issues.apache.org/jira/browse/MESOS-5427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15296744#comment-15296744 ] analogue commented on MESOS-5427: - Yes, this is running on Ubuntu Lucid 10.04 LTS :( Planning to upgrade, but just wanted to leave some breadcrumbs regarding this failure if others happen to run into something similar. > Mesos master locks up after slave fails to authenticate > --- > > Key: MESOS-5427 > URL: https://issues.apache.org/jira/browse/MESOS-5427 > Project: Mesos > Issue Type: Bug > Components: master >Affects Versions: 0.20.1 > Environment: Linux XX-X 3.13.0-49-generic #81-Ubuntu SMP > Tue Mar 24 19:29:48 UTC 2015 x86_64 GNU/Linux > Ubuntu 10.04.1 LTS > AWS/8cores/16GB >Reporter: analogue >Priority: Minor > > In a mesos master cluster with one leader and two backups, a single slave > attempting to authenticate with the leader locked up the master and resulted > in 2 CPU cores pegged at 100% CPU usage until restarted. 
> master > {noformat} > I0516 02:55:39.945566 32126 master.cpp:3612] Authenticating > slave(1)@10.85.20.76:5051 > I0516 02:55:39.945757 32126 master.cpp:3598] Queuing up authentication > request from slave(1)@10.85.20.76:5051 because authentication is still in > progress > I0516 02:55:39.945802 32123 authenticator.hpp:156] Creating new server SASL > connection > I0516 02:55:39.945991 32126 master.cpp:3598] Queuing up authentication > request from slave(1)@10.85.20.76:5051 because authentication is still in > progress > I0516 02:55:39.946030 32126 master.cpp:3598] Queuing up authentication > request from slave(1)@10.85.20.76:5051 because authentication is still in > progress > I0516 02:55:39.946063 32126 master.cpp:3598] Queuing up authentication > request from slave(1)@10.85.20.76:5051 because authentication is still in > progress > I0516 02:55:39.946095 32126 master.cpp:3598] Queuing up authentication > request from slave(1)@10.85.20.76:5051 because authentication is still in > progress > I0516 02:55:39.946126 32126 master.cpp:3598] Queuing up authentication > request from slave(1)@10.85.20.76:5051 because authentication is still in > progress > I0516 02:55:39.946158 32126 master.cpp:3598] Queuing up authentication > request from slave(1)@10.85.20.76:5051 because authentication is still in > progress > I0516 02:55:39.946189 32126 master.cpp:3598] Queuing up authentication > request from slave(1)@10.85.20.76:5051 because authentication is still in > progress > I0516 02:55:39.946221 32126 master.cpp:3598] Queuing up authentication > request from slave(1)@10.85.20.76:5051 because authentication is still in > progress > I0516 02:55:39.946252 32126 master.cpp:3598] Queuing up authentication > request from slave(1)@10.85.20.76:5051 because authentication is still in > progress > I0516 02:55:39.946285 32126 master.cpp:3598] Queuing up authentication > request from slave(1)@10.85.20.76:5051 because authentication is still in > progress > I0516 02:55:39.946316 32126 
master.cpp:3598] Queuing up authentication > request from slave(1)@10.85.20.76:5051 because authentication is still in > progress > I0516 02:55:39.946347 32126 master.cpp:3598] Queuing up authentication > request from slave(1)@10.85.20.76:5051 because authentication is still in > progress > I0516 02:55:39.946379 32126 master.cpp:3598] Queuing up authentication > request from slave(1)@10.85.20.76:5051 because authentication is still in > progress > ... > W0516 02:55:44.945811 32124 master.cpp:3670] Authentication timed out > I0516 02:55:49.290623 32121 master.cpp:3598] Queuing up authentication > request from slave(1)@10.85.20.76:5051 because authentication is still in > progress > (last long line repeats until mesos-master restarted) > {noformat} > slave > {noformat} > Log file created at: 2016/05/16 02:37:52 > Running on machine: 10-85-20-76-uswest2btestopia > Log line format: [IWEF]mmdd hh:mm:ss.uu threadid file:line] msg > I0516 02:37:52.112509 10198 logging.cpp:142] INFO level logging started! > I0516 02:37:52.112761 10198 main.cpp:126] Build: 2014-12-12 00:52:32 by > I0516 02:37:52.112772 10198 main.cpp:128] Version: 0.20.1 > I0516 02:37:52.112778 10198 main.cpp:131] Git tag: 0.20.1 > I0516 02:37:52.112783 10198 main.cpp:135] Git SHA: > fe0a39112f3304283f970f1b08b322b1e970829d > I0516 02:37:52.112793 10198 containerizer.cpp:89] Using isolation: > cgroups/cpu,cgroups/mem > I0516 02:37:52.125773 10198 linux_launcher.cpp:78] Using > /sys/fs/cgroup/freezer as the freezer hierarchy for the Linux launcher > I0516 02:37:52.126652 10198 main.cpp:149] Starting Mesos slave > I0516 02:37:52.128687 10246 slave.cpp:167] Slave started on > 1)@10.85.20.76:5051 > I0516 02:37:52.128708 10246 credentials.hpp:84] Loading credential
[jira] [Commented] (MESOS-5439) registerExecutor problem
[ https://issues.apache.org/jira/browse/MESOS-5439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15296759#comment-15296759 ] Joseph Wu commented on MESOS-5439: -- A couple of questions: * How many tasks are you launching at once? (i.e. from a single offer) And how many over a given time? * Are you using the default command executor? Or are you launching a custom executor? * What flags are you using to launch the agent? * What do the executor's stdout/stderr files (in the sandbox) say? There should be glog logs in there too. > registerExecutor problem > > > Key: MESOS-5439 > URL: https://issues.apache.org/jira/browse/MESOS-5439 > Project: Mesos > Issue Type: Bug > Components: c++ api, slave >Affects Versions: 0.27.0 >Reporter: kimjoohwan > > Currently, we are using Mesos 0.27.0. The master is built with an Intel(R) > Core(TM) i5-3470 CPU @ 3.20GHz CPU and 4GB of RAM. The slave (Banana PI) is > built with a Cortex-A7 Dual-Core CPU and 1GB of RAM. > Using the Mesos API, we have developed a Python-based framework and run it > to completion. > However, we found that too much time passes between the messages 'Forked child > with pid' and 'Got registration for executor' in the slave log (about 5 seconds). > If you know how to deal with this problem, please let us know. 
> I0523 17:38:16.264289 1787 slave.cpp:5208] Launching executor default of > framework 3fb86eea-96c4-4b07-aaa2-caf071275bdf-0010 with resources in work > directory > '/tmp/mesos/slaves/3fb86eea-96c4-4b07-aaa2-caf071275bdf-S2/frameworks/3fb86eea-96c4-4b07-aaa2-caf071275bdf-0010/executors/default/runs/1c830c9a-4120-4ef0-af80-49a52d307539' > I0523 17:38:16.290601 1789 containerizer.cpp:616] Starting container > '1c830c9a-4120-4ef0-af80-49a52d307539' for executor 'default' of framework > '3fb86eea-96c4-4b07-aaa2-caf071275bdf-0010' > I0523 17:38:16.293285 1787 slave.cpp:1626] Queuing task '0' for executor > 'default' of framework 3fb86eea-96c4-4b07-aaa2-caf071275bdf-0010 > I0523 17:38:16.297369 1787 slave.cpp:4233] Current disk usage 2.14%. Max > allowed age: 6.150293798159722days > I0523 17:38:16.504043 1789 launcher.cpp:132] Forked child with pid '1837' > for container '1c830c9a-4120-4ef0-af80-49a52d307539' > I0523 17:38:21.510535 1785 slave.cpp:2573] Got registration for executor > 'default' of framework 3fb86eea-96c4-4b07-aaa2-caf071275bdf-0010 from > executor(1)@192.168.0.8:56508 > I0523 17:38:21.554608 1785 slave.cpp:1791] Sending queued task '0' to > executor 'default' of framework 3fb86eea-96c4-4b07-aaa2-caf071275bdf-0010 at > executor(1)@192.168.0.8:56508 > I0523 17:38:21.594511 1789 slave.cpp:2932] Handling status update > TASK_RUNNING (UUID: cd04ec2a-0e68-460a-ad2e-e4f504f3b032) for task 0 of > framework 3fb86eea-96c4-4b07-aaa2-caf071275bdf-0010 from > executor(1)@192.168.0.8:56508 > I0523 17:38:21.600050 1789 slave.cpp:2932] Handling status update > TASK_FINISHED (UUID: 46e110c8-4078-4f98-ae30-30b3a1376034) for task 0 of > framework 3fb86eea-96c4-4b07-aaa2-caf071275bdf-0010 from > executor(1)@192.168.0.8:56508 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
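The delay reported above can be read directly off the two log lines. The following sketch parses the time field of each glog line (shortened copies of the lines quoted in the issue) and subtracts them; the helper name is hypothetical, and the calculation assumes both lines were logged on the same day.

```python
from datetime import datetime


def glog_time(line):
    """Parse the 'hh:mm:ss.uuuuuu' field of a glog line (hypothetical helper)."""
    return datetime.strptime(line.split()[1], "%H:%M:%S.%f")


# Shortened copies of the two slave log lines from the report.
forked = "I0523 17:38:16.504043 1789 launcher.cpp:132] Forked child with pid '1837'"
registered = "I0523 17:38:21.510535 1785 slave.cpp:2573] Got registration for executor 'default'"

gap = (glog_time(registered) - glog_time(forked)).total_seconds()
print(f"{gap:.6f} seconds between fork and executor registration")
```

The difference is 21.510535 - 16.504043 = 5.006492 seconds, matching the roughly 5 second gap described by the reporter.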
[jira] [Commented] (MESOS-5430) Design the improvement of the home page of mesos.apache.org
[ https://issues.apache.org/jira/browse/MESOS-5430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15296781#comment-15296781 ] haosdent commented on MESOS-5430: - [~jmanalus] Thank you very much for your wonderful design! For the current [website | http://mesos.apache.org/], we have a news part. This has been removed in the new design, right? > Design the improvement of the home page of mesos.apache.org > --- > > Key: MESOS-5430 > URL: https://issues.apache.org/jira/browse/MESOS-5430 > Project: Mesos > Issue Type: Improvement > Components: project website >Reporter: Vinod Kone >Assignee: Jonathan Manalus > > The idea is to come up with a minimal improvement for the design of the home > page of mesos.apache.org. > Proposed Redesign: https://invis.io/CV7DZF1JW#/159898819_Mesos-apache-org -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-5436) GPU resource broke framework data table
[ https://issues.apache.org/jira/browse/MESOS-5436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kevin Klues updated MESOS-5436: --- Shepherd: Benjamin Mahler Sprint: Mesosphere Sprint 35 Story Points: 1 > GPU resource broke framework data table > --- > > Key: MESOS-5436 > URL: https://issues.apache.org/jira/browse/MESOS-5436 > Project: Mesos > Issue Type: Bug >Reporter: haosdent >Assignee: haosdent >Priority: Minor > Labels: gpu > Attachments: after_agent_framework_page.png, after_agent_page.png, > incorrect_agent_framework_page.png, incorrect_agent_page.png > > > In agent_framework.html and master/static/agent.html, we add {{GPUs (Used / > Allocated)}} in table header. But we didn't add the corresponding column to > the table body as well. > On the other hand, we didn't provide statistics for gpus on monitor endpoints. > To provide those data in webui, it requires we implement gpus statistics in > monitor endpoints firstly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-5430) Design the improvement of the home page of mesos.apache.org
[ https://issues.apache.org/jira/browse/MESOS-5430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15296812#comment-15296812 ] Vinod Kone commented on MESOS-5430: --- +1 love the design. [~haosd...@gmail.com] I think the idea is to move news to the "Blog" section. > Design the improvement of the home page of mesos.apache.org > --- > > Key: MESOS-5430 > URL: https://issues.apache.org/jira/browse/MESOS-5430 > Project: Mesos > Issue Type: Improvement > Components: project website >Reporter: Vinod Kone >Assignee: Jonathan Manalus > > The idea is to come up with a minimal improvement for the design of the home > page of mesos.apache.org. > Proposed Redesign: https://invis.io/CV7DZF1JW#/159898819_Mesos-apache-org -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-5430) Design the improvement of the home page of mesos.apache.org
[ https://issues.apache.org/jira/browse/MESOS-5430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15296905#comment-15296905 ] Jonathan Manalus commented on MESOS-5430: - [~haosd...@gmail.com] - [~vinod] is exactly right on moving everything to the blog section that now lives in the navbar. Currently the news posts are mostly changelogs from the last year. You will be able to see the most recent changelog post listed under the download section. But to answer your question - Yes, it has been removed from the homepage. > Design the improvement of the home page of mesos.apache.org > --- > > Key: MESOS-5430 > URL: https://issues.apache.org/jira/browse/MESOS-5430 > Project: Mesos > Issue Type: Improvement > Components: project website >Reporter: Vinod Kone >Assignee: Jonathan Manalus > > The idea is to come up with a minimal improvement for the design of the home > page of mesos.apache.org. > Proposed Redesign: https://invis.io/CV7DZF1JW#/159898819_Mesos-apache-org -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-5428) Update the mechanism to define flags in FlagsBase derived classes
[ https://issues.apache.org/jira/browse/MESOS-5428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Pravat updated MESOS-5428: - Description: If a program exposes flags, the recommendation from Mesos was to use a derived class from FlagsBase, add the new flags in constructor. As benefit the new `Flags` class `inherits` all the flags from the derived classes. Each derived class calls the method `add` implemented in `FlagsBase` which uses `dynamic_cast` to set the default value and other things. To use the use `FlagsBase` in Visual Studio we should disable construction displacements using `/vd2` compile option. More info: https://msdn.microsoft.com/en-us/library/7sf3txa8.aspx was: If a program exposes flags, the recommendation from Mesos was to use a derived class from FlagsBase, add the new flags in constructor. As benefit the new `Flags` class `inherits` all the flags from the derived classes. Each derived class calls the method `add` implemented in `FlagsBase` which uses `dynamic_cast` to set the default value and other things. Since the constructor is not completed class is not completed (in Visual Studio the vtable is not correct at that time) the code does not work on Windows. We should have to call a separate method in Windows. > Update the mechanism to define flags in FlagsBase derived classes > > > Key: MESOS-5428 > URL: https://issues.apache.org/jira/browse/MESOS-5428 > Project: Mesos > Issue Type: Bug >Reporter: Daniel Pravat > > If a program exposes flags, the recommendation from Mesos was to use a > derived class from FlagsBase, add the new flags in constructor. > As benefit the new `Flags` class `inherits` all the flags from the derived > classes. > Each derived class calls the method `add` implemented in `FlagsBase` which > uses `dynamic_cast` to set the default value and other things. > To use the use `FlagsBase` in Visual Studio we should disable construction > displacements using `/vd2` compile option. 
> More info: https://msdn.microsoft.com/en-us/library/7sf3txa8.aspx -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-5428) Update the mechanism to define flags in FlagsBase derived classes
[ https://issues.apache.org/jira/browse/MESOS-5428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Pravat updated MESOS-5428: - Description: If a program exposes flags, the recommendation from Mesos was to use a derived class from FlagsBase, add the new flags in constructor. As benefit the new `Flags` class `inherits` all the flags from the derived classes. Each derived class calls the method `add` implemented in `FlagsBase` which uses `dynamic_cast` to set the default value and other things. To use `FlagsBase` derived classes in Visual Studio we should disable construction displacements using `/vd2` compile option. More info: https://msdn.microsoft.com/en-us/library/7sf3txa8.aspx was: If a program exposes flags, the recommendation from Mesos was to use a derived class from FlagsBase, add the new flags in constructor. As benefit the new `Flags` class `inherits` all the flags from the derived classes. Each derived class calls the method `add` implemented in `FlagsBase` which uses `dynamic_cast` to set the default value and other things. To use the use `FlagsBase` in Visual Studio we should disable construction displacements using `/vd2` compile option. More info: https://msdn.microsoft.com/en-us/library/7sf3txa8.aspx > Update the mechanism to define flags in FlagsBase derived classes > > > Key: MESOS-5428 > URL: https://issues.apache.org/jira/browse/MESOS-5428 > Project: Mesos > Issue Type: Bug >Reporter: Daniel Pravat > > If a program exposes flags, the recommendation from Mesos was to use a > derived class from FlagsBase, add the new flags in constructor. > As benefit the new `Flags` class `inherits` all the flags from the derived > classes. > Each derived class calls the method `add` implemented in `FlagsBase` which > uses `dynamic_cast` to set the default value and other things. > To use `FlagsBase` derived classes in Visual Studio we should disable > construction displacements using `/vd2` compile option. 
> More info: https://msdn.microsoft.com/en-us/library/7sf3txa8.aspx -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-5420) Implement os::exists for processes
[ https://issues.apache.org/jira/browse/MESOS-5420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Pravat updated MESOS-5420: - Description: os::exists returns true if the process identified by the parameter is still running or was running and we are able to get information about it, such as the exit code. In Windows, after obtaining a handle to the process, it is possible to perform those operations. (was: os::exists returns true if the process identified by the parameter is still running or was running. In Windows, subprocess class keeps an open handle to the process, allowing ReaperProcess::reap to get the exit code even if the process is terminated.) > Implement os::exists for processes > -- > > Key: MESOS-5420 > URL: https://issues.apache.org/jira/browse/MESOS-5420 > Project: Mesos > Issue Type: Improvement > Environment: Windows >Reporter: Daniel Pravat >Assignee: Daniel Pravat > > os::exists returns true if the process identified by the parameter is still > running or was running and we are able to get information about it, such as > the exit code. In Windows, after obtaining a handle to the process, it is > possible to perform those operations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-5362) Add authentication to example frameworks
[ https://issues.apache.org/jira/browse/MESOS-5362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Mann updated MESOS-5362: - Sprint: (was: Mesosphere Sprint 35) > Add authentication to example frameworks > > > Key: MESOS-5362 > URL: https://issues.apache.org/jira/browse/MESOS-5362 > Project: Mesos > Issue Type: Improvement > Components: security >Reporter: Greg Mann >Assignee: Greg Mann > Labels: authentication, mesosphere, security > > Some example frameworks do not have the ability to authenticate with the > master. Adding authentication to the example frameworks that don't already > have it implemented would allow us to use these frameworks for testing in > authenticated/authorized scenarios. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2013) Slave read endpoint doesn't encode non-ascii characters correctly
[ https://issues.apache.org/jira/browse/MESOS-2013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15297085#comment-15297085 ] Whitney Sorenson commented on MESOS-2013: - We have just deployed 0.28.1 and I can report, that yes, this does help our case. Thank you. We have to do some processing ourselves to handle utf-8 characters which have been chopped, but it is something we can work around. As a note, earlier, when I asked if we should find a competent C++ developer I meant to imply someone besides myself - not that anyone else was incompetent ;) > Slave read endpoint doesn't encode non-ascii characters correctly > - > > Key: MESOS-2013 > URL: https://issues.apache.org/jira/browse/MESOS-2013 > Project: Mesos > Issue Type: Bug > Components: json api >Reporter: Whitney Sorenson >Assignee: Anand Mazumdar > > Create a file in a sandbox with a non-ascii character, like this one: > http://www.fileformat.info/info/unicode/char/2018/index.htm > Hit the read endpoint for that file. > The response will have something like: > data: "\u00E2\u0080\u0098" > It should actually be: > data: "\u2018" > If you put either into JSON.parse() in the browser you will see the first > does not render correctly but the second does. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
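The difference between the two outputs in the report can be reproduced in a few lines. This sketch is not the endpoint's code; it only assumes that the broken response came from escaping each UTF-8 byte as if it were its own code point, which is what the `\u00E2\u0080\u0098` sequence suggests. Python's `json.dumps` emits lowercase hex digits, so the case differs from the report.

```python
import json

char = "\u2018"                    # LEFT SINGLE QUOTATION MARK
utf8_bytes = char.encode("utf-8")  # the three bytes E2 80 98

# Assumption: treating every byte as a separate code point reproduces
# the broken response described in the issue.
per_byte = utf8_bytes.decode("latin-1")
broken = json.dumps({"data": per_byte})
correct = json.dumps({"data": char})

print(broken)
print(correct)

# A client-side repair: turn the per-byte characters back into bytes
# and decode them as UTF-8.
repaired = json.loads(broken)["data"].encode("latin-1").decode("utf-8")
print(repaired == char)
```

The repair step only works when the byte sequence is complete; a multi-byte character chopped at a read boundary, as mentioned in the comment above, would still need extra handling.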
[jira] [Created] (MESOS-5443) Remove "const" for some primitive types in function parameters
Guangya Liu created MESOS-5443: -- Summary: Remove "const" for some primitive types in function parameters Key: MESOS-5443 URL: https://issues.apache.org/jira/browse/MESOS-5443 Project: Mesos Issue Type: Bug Reporter: Guangya Liu Priority: Minor It is not recommended to use `const` for a primitive type when it is used as a function parameter. There are some cases of this, such as the `bool` here; we should scan for and remove such uses. https://github.com/apache/mesos/blob/master/src/slave/containerizer/mesos/isolators/cgroups/mem.cpp#L75 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4248) mesos slave can't start in CentOS-7 docker container
[ https://issues.apache.org/jira/browse/MESOS-4248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15297391#comment-15297391 ] Shane da Silva commented on MESOS-4248: --- FWIW, we can't reproduce this issue on Mesos 0.26.0, but we do hit it on 0.27.2 and 0.28.1. It would be great for someone to review Yubao's patch and consider merging it, as it's convenient to be able to run Mesos in a container for integration testing. For example, in the Chef ecosystem using test-kitchen with kitchen-docker to quickly spin up pseudo-"VMs" is common practice. [~liuyb]: did you by chance ever find a workaround for this issue? > mesos slave can't start in CentOS-7 docker container > > > Key: MESOS-4248 > URL: https://issues.apache.org/jira/browse/MESOS-4248 > Project: Mesos > Issue Type: Bug > Components: slave >Affects Versions: 0.26.0 > Environment: My host OS is Debian Jessie, the container OS is CentOS > 7.2. > {code} > # cat /etc/system-release > CentOS Linux release 7.2.1511 (Core) > # rpm -qa |grep mesos > mesosphere-zookeeper-3.4.6-0.1.20141204175332.centos7.x86_64 > mesosphere-el-repo-7-1.noarch > mesos-0.26.0-0.2.145.centos701406.x86_64 > $ docker version > Client: > Version: 1.9.1 > API version: 1.21 > Go version: go1.4.2 > Git commit: a34a1d5 > Built:Fri Nov 20 12:59:02 UTC 2015 > OS/Arch: linux/amd64 > Server: > Version: 1.9.1 > API version: 1.21 > Go version: go1.4.2 > Git commit: a34a1d5 > Built:Fri Nov 20 12:59:02 UTC 2015 > OS/Arch: linux/amd64 > {code} >Reporter: Yubao Liu > > // Check the "Environment" label above for kinds of software versions. > "systemctl start mesos-slave" can't start mesos-slave: > {code} > # journalctl -u mesos-slave > > Dec 24 10:35:25 mesos-slave1 systemd[1]: Started Mesos Slave. > Dec 24 10:35:25 mesos-slave1 systemd[1]: Starting Mesos Slave... > Dec 24 10:35:25 mesos-slave1 mesos-slave[12845]: I1224 10:35:25.210180 12838 > logging.cpp:172] INFO level logging started! 
> Dec 24 10:35:25 mesos-slave1 mesos-slave[12845]: I1224 10:35:25.210603 12838 > main.cpp:190] Build: 2015-12-16 23:06:16 by root > Dec 24 10:35:25 mesos-slave1 mesos-slave[12845]: I1224 10:35:25.210625 12838 > main.cpp:192] Version: 0.26.0 > Dec 24 10:35:25 mesos-slave1 mesos-slave[12845]: I1224 10:35:25.210634 12838 > main.cpp:195] Git tag: 0.26.0 > Dec 24 10:35:25 mesos-slave1 mesos-slave[12845]: I1224 10:35:25.210644 12838 > main.cpp:199] Git SHA: d3717e5c4d1bf4fca5c41cd7ea54fae489028faa > Dec 24 10:35:25 mesos-slave1 mesos-slave[12845]: I1224 10:35:25.210765 12838 > containerizer.cpp:142] Using isolation: posix/cpu,posix/mem,filesystem/posix > Dec 24 10:35:25 mesos-slave1 mesos-slave[12845]: I1224 10:35:25.215638 12838 > linux_launcher.cpp:103] Using /sys/fs/cgroup/freezer as the freezer hierarchy > for the Linux launcher > Dec 24 10:35:25 mesos-slave1 mesos-slave[12845]: I1224 10:35:25.220279 12838 > systemd.cpp:128] systemd version `219` detected > Dec 24 10:35:25 mesos-slave1 mesos-slave[12845]: I1224 10:35:25.227017 12838 > systemd.cpp:210] Started systemd slice `mesos_executors.slice` > Dec 24 10:35:25 mesos-slave1 mesos-slave[12845]: Failed to create a > containerizer: Could not create MesosContainerizer: Failed to create > launcher: Failed to locate systemd cgroups hierarchy: does not exist > Dec 24 10:35:25 mesos-slave1 systemd[1]: mesos-slave.service: main process > exited, code=exited, status=1/FAILURE > Dec 24 10:35:25 mesos-slave1 systemd[1]: Unit mesos-slave.service entered > failed state. > Dec 24 10:35:25 mesos-slave1 systemd[1]: mesos-slave.service failed. 
> {code} > I used strace to debug it, mesos-slave tried to access > "/sys/fs/cgroup/systemd/mesos_executors.slice", but it's actually at > "/sys/fs/cgroup/systemd/system.slice/docker-45875efce9019375cd0c5b29bb1a12275fb6033293f9bf3d97d774a1e5d4de52.scope/mesos_executors.slice/", >mesos-slave should check "/proc/self/cgroup" to find those intermediate > directories: > {code} > # cat /proc/self/cgroup > 8:perf_event:/system.slice/docker-45875efce9019375cd0c5b29bb1a12275fb6033293f9bf3d97d774a1e5d4de52.scope > 7:blkio:/system.slice/docker-45875efce9019375cd0c5b29bb1a12275fb6033293f9bf3d97d774a1e5d4de52.scope > 6:net_cls,net_prio:/system.slice/docker-45875efce9019375cd0c5b29bb1a12275fb6033293f9bf3d97d774a1e5d4de52.scope > 5:freezer:/system.slice/docker-45875efce9019375cd0c5b29bb1a12275fb6033293f9bf3d97d774a1e5d4de52.scope > 4:devices:/system.slice/docker-45875efce9019375cd0c5b29bb1a12275fb6033293f9bf3d97d774a1e5d4de52.scope > 3:cpu,cpuacct:/system.slice/docker-45875efce9019375cd0c5b29bb1a12275fb6033293f9bf3d97d774a1e5d4de52.scope > 2:cpuset:/system.slice/docker-45875efce9019375cd0c5b29bb1a12275fb6033293f9bf3d97d774a1e5d4de52.scope >
[jira] [Commented] (MESOS-2346) Docker tasks exiting normally, but returning TASK_FAILED
[ https://issues.apache.org/jira/browse/MESOS-2346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15297474#comment-15297474 ] Eran Withana commented on MESOS-2346: - I also experienced the same issue over the weekend. ``` I0522 18:02:54.606420 2848 slave.cpp:3002] Handling status update TASK_FINISHED (UUID: fe58f9f9-830f-42b9-b0d1-4a1c14fb5997) for task ct:146394000:0:Victimized Job: of framework 20150526-223237-3758129930-5050-6543-0001 from executor(1)@x.x.x.x:52669 I0522 18:02:54.606534 2848 slave.cpp:3528] executor(1)@x.x.x.x:52669 exited I0522 18:02:54.606561 2848 slave.cpp:3886] Executor 'ct:146394000:0:Victimized Job:' of framework 20150526-223237-3758129930-5050-6543-0001 exited with status 0 I0522 18:02:54.606608 2848 slave.cpp:3002] Handling status update TASK_FAILED (UUID: 4f5f97d7-0134-426a-a77a-ba1042dfa0cc) for task ct:146394000:0:Victimized Job: of framework 20150526-223237-3758129930-5050-6543-0001 from @0.0.0.0:0 ``` OS: Ubuntu 14.04 Mesos: 0.28.0-2.0.16.ubuntu1404 Docker: 1.8.3-0~trusty We didn't have this issue before but started happening all of a sudden over the weekend. The issue seems to be gone for now but want to know what caused this issue. > Docker tasks exiting normally, but returning TASK_FAILED > > > Key: MESOS-2346 > URL: https://issues.apache.org/jira/browse/MESOS-2346 > Project: Mesos > Issue Type: Bug > Components: docker >Affects Versions: 0.22.0 >Reporter: Brenden Matthews >Priority: Critical > > Docker tasks which exit normally will return TASK_FAILED, as opposed to > TASK_FINISHED. This problem seems to occur only after `mesos-slave` has been > running for some time. If the slave is restarted, it will begin returning > TASK_FINISHED correctly. 
> Sample slave log: > {noformat} > Feb 11 23:22:13 ip-10-102-188-213.ec2.internal mesos-slave[793]: I0211 > 23:22:13.483464 798 slave.cpp:1138] Got assigned task > ct:1423696932164:2:canary: for framework > 20150211-045421-1401302794-5050-714-0001 > Feb 11 23:22:13 ip-10-102-188-213.ec2.internal mesos-slave[793]: I0211 > 23:22:13.483667 798 slave.cpp:3854] Checkpointing FrameworkInfo to > '/tmp/mesos/meta/slaves/20150211-045421-1401302794-5050-714-S0/frameworks/20150211-045421-1401302794-5050-714-0001/framework.info' > Feb 11 23:22:13 ip-10-102-188-213.ec2.internal mesos-slave[793]: I0211 > 23:22:13.483894 798 slave.cpp:3861] Checkpointing framework pid > 'scheduler-f4679749-d7ad-4d8c-b610-f7043332d243@10.102.188.213:56385' to > '/tmp/mesos/meta/slaves/20150211-045421-1401302794-5050-714-S0/frameworks/20150211-045421-1401302794-5050-714-0001/framework.pid' > Feb 11 23:22:13 ip-10-102-188-213.ec2.internal mesos-slave[793]: I0211 > 23:22:13.484426 798 gc.cpp:84] Unscheduling > '/tmp/mesos/slaves/20150211-045421-1401302794-5050-714-S0/frameworks/20150211-045421-1401302794-5050-714-0001' > from gc > Feb 11 23:22:13 ip-10-102-188-213.ec2.internal mesos-slave[793]: I0211 > 23:22:13.484648 797 gc.cpp:84] Unscheduling > '/tmp/mesos/meta/slaves/20150211-045421-1401302794-5050-714-S0/frameworks/20150211-045421-1401302794-5050-714-0001' > from gc > Feb 11 23:22:13 ip-10-102-188-213.ec2.internal mesos-slave[793]: I0211 > 23:22:13.484748 797 slave.cpp:1253] Launching task > ct:1423696932164:2:canary: for framework > 20150211-045421-1401302794-5050-714-0001 > Feb 11 23:22:13 ip-10-102-188-213.ec2.internal mesos-slave[793]: I0211 > 23:22:13.485697 797 slave.cpp:4297] Checkpointing ExecutorInfo to > '/tmp/mesos/meta/slaves/20150211-045421-1401302794-5050-714-S0/frameworks/20150211-045421-1401302794-5050-714-0001/executors/ct:1423696932164:2:canary:/executor.info' > Feb 11 23:22:13 ip-10-102-188-213.ec2.internal mesos-slave[793]: I0211 > 23:22:13.485999 797 slave.cpp:3929] 
Launching executor > ct:1423696932164:2:canary: of framework > 20150211-045421-1401302794-5050-714-0001 in work directory > '/tmp/mesos/slaves/20150211-045421-1401302794-5050-714-S0/frameworks/20150211-045421-1401302794-5050-714-0001/executors/ct:1423696932164:2:canary:/runs/5395b133-d10d-4204-999e-4a38c03c55f5' > Feb 11 23:22:13 ip-10-102-188-213.ec2.internal mesos-slave[793]: I0211 > 23:22:13.486212 797 slave.cpp:4320] Checkpointing TaskInfo to > '/tmp/mesos/meta/slaves/20150211-045421-1401302794-5050-714-S0/frameworks/20150211-045421-1401302794-5050-714-0001/executors/ct:1423696932164:2:canary:/runs/5395b133-d10d-4204-999e-4a38c03c55f5/tasks/ct:1423696932164:2:canary:/task.info' > Feb 11 23:22:13 ip-10-102-188-213.ec2.internal mesos-slave[793]: I0211 > 23:22:13.509457 797 slave.cpp:1376] Queuing task > 'ct:1423696932164:2:canary:' for executor ct:1423696932164:2:canary: of > framework '20150211-045421-1401302794-5050-714-0001 > Feb 11 23:22:13 ip-10-102-188-213.ec2.
[jira] [Created] (MESOS-5444) agent state endpoint misses framework principal field
haosdent created MESOS-5444: --- Summary: agent state endpoint misses framework principal field Key: MESOS-5444 URL: https://issues.apache.org/jira/browse/MESOS-5444 Project: Mesos Issue Type: Bug Reporter: haosdent Assignee: haosdent Found by [~deshna] in https://reviews.apache.org/r/47702/ When a framework is launched with a principal, the agent's state endpoint does not show the framework's principal. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-5440) There is a misspelling in some markdown files
[ https://issues.apache.org/jira/browse/MESOS-5440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15297559#comment-15297559 ] GyeongWon, Do commented on MESOS-5440: -- Thank you! > There is a misspelling in some markdown files > - > > Key: MESOS-5440 > URL: https://issues.apache.org/jira/browse/MESOS-5440 > Project: Mesos > Issue Type: Documentation >Reporter: GyeongWon, Do >Priority: Trivial > > "This endpoint requires authentication {color:red}iff{color} HTTP > authentication is enabled." > I think "iff" is a misspelling of "if"; is that right? > This statement occurs in many markdown files. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-5445) Allow libprocess/stout to build without first doing `make` in 3rdparty.
Kapil Arya created MESOS-5445: - Summary: Allow libprocess/stout to build without first doing `make` in 3rdparty. Key: MESOS-5445 URL: https://issues.apache.org/jira/browse/MESOS-5445 Project: Mesos Issue Type: Bug Components: build Reporter: Kapil Arya Assignee: Kapil Arya Fix For: 0.29.0 After the 3rdparty reorg, libprocess/stout are unable to build their dependencies, so one has to run `make` in 3rdparty/ before building libprocess/stout. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-5359) The scheduler library should have a delay before initiating a connection with master.
[ https://issues.apache.org/jira/browse/MESOS-5359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15297581#comment-15297581 ] José Guilherme Vanz commented on MESOS-5359: Cool! Thanks [~anandmazumdar] for the code pointer. Should this delay be configurable by some flag? > The scheduler library should have a delay before initiating a connection with > master. > - > > Key: MESOS-5359 > URL: https://issues.apache.org/jira/browse/MESOS-5359 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.29.0 >Reporter: Anand Mazumdar >Assignee: José Guilherme Vanz > Labels: mesosphere > > Currently, the scheduler library {{src/scheduler/scheduler.cpp}} does not have an > artificially induced delay when trying to initially establish a connection > with the master. In the event of a master failover or ZK disconnect, a large > number of frameworks can get disconnected and thereby overwhelm the > master with TCP SYN requests. > On a large cluster with many agents, the master is already overwhelmed with > handling connection requests from the agents. This compounds the issue > further on the master. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
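The delay discussed in MESOS-5359 could be sketched roughly as follows. This is only an illustration under stated assumptions: the helper name `initialConnectDelay` and the `maxDelay` parameter are hypothetical and are not taken from `src/scheduler/scheduler.cpp`, and a uniformly random delay is just one possible policy for spreading out reconnects.

```cpp
#include <chrono>
#include <random>

// Hypothetical helper (not part of Mesos): picks a random delay in
// [0, maxDelay] so that many schedulers reconnecting after a master
// failover do not all open their connections at the same instant.
std::chrono::milliseconds initialConnectDelay(
    std::chrono::milliseconds maxDelay,
    std::mt19937& generator)
{
  // A non-positive upper bound means "no delay".
  if (maxDelay.count() <= 0) {
    return std::chrono::milliseconds(0);
  }

  // Both bounds of uniform_int_distribution are inclusive.
  std::uniform_int_distribution<std::chrono::milliseconds::rep> distribution(
      0, maxDelay.count());

  return std::chrono::milliseconds(distribution(generator));
}
```

If the upper bound were exposed as a flag, as the comment above asks, the flag's value would simply be passed in as `maxDelay`; a value of zero would disable the delay.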
[jira] [Commented] (MESOS-4565) slave recovers and attempt to destroy executor's child containers, then begins rejecting task status updates
[ https://issues.apache.org/jira/browse/MESOS-4565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15297666#comment-15297666 ] Chanh Le commented on MESOS-4565: - Any update on that? I still get the issues. > slave recovers and attempt to destroy executor's child containers, then > begins rejecting task status updates > > > Key: MESOS-4565 > URL: https://issues.apache.org/jira/browse/MESOS-4565 > Project: Mesos > Issue Type: Bug > Components: docker >Affects Versions: 0.26.0 >Reporter: James DeFelice > Labels: mesosphere > > AFAICT the slave is doing this: > 1) recovering from some kind of failure > 2) checking the containers that it pulled from its state store > 3) complaining about cgroup children hanging off of executor containers > 4) rejecting task status updates related to the executor container, the first > of which in the logs is: > {code} > E0130 02:22:21.979852 12683 slave.cpp:2963] Failed to update resources for > container 1d965a20-849c-40d8-9446-27cb723220a9 of executor > 'd701ab48a0c0f13_k8sm-executor' running task > pod.f2dc2c43-c6f7-11e5-ad28-0ad18c5e6c7f on status update for terminal task, > destroying container: Container '1d965a20-849c-40d8-9446-27cb723220a9' not > found > {code} > To be fair, I don't believe that my custom executor is re-registering > properly with the slave prior to attempting to send these (failing) status > updates. But the slave doesn't complain about that .. it complains that it > can't find the **container**. > slave log here: > https://gist.github.com/jdef/265663461156b7a7ed4e -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-5410) Support cgroup namespace in unified container
[ https://issues.apache.org/jira/browse/MESOS-5410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15297704#comment-15297704 ] Nirav commented on MESOS-5410: -- Currently, because the "cgroup" namespace is not supported, the following two test cases are failing: 1. NsTest.ROOT_setns 2. NsTest.ROOT_getns The error observed is: "nstype: Unknown namespace 'cgroup'" This is because the contents of the directory "/proc/self/ns" changed in kernel version 4.6 (a "cgroup" entry was added). > Support cgroup namespace in unified container > - > > Key: MESOS-5410 > URL: https://issues.apache.org/jira/browse/MESOS-5410 > Project: Mesos > Issue Type: Improvement >Reporter: Qian Zhang >Assignee: Qian Zhang > > In the Linux 4.6 kernel, a new namespace (cgroup namespace) was introduced so > that a process can be created in its own cgroup namespace and the global > cgroup hierarchy is not leaked to the process. See the following link > for more details about this namespace: > http://man7.org/linux/man-pages/man7/cgroup_namespaces.7.html > We need to support this namespace in unified container to provide better > isolation for the containers created by Mesos. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
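The failure mode described in this comment can be illustrated with a small sketch. It assumes that the tests compare the entries found in "/proc/self/ns" against a table of known namespace names; the function name `unknownNamespaces` and both parameters are hypothetical and are not the Mesos implementation.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: returns the entries (as they would be listed in
// /proc/self/ns) that have no corresponding clone flag in `nstypes`.
// Both arguments are supplied by the caller, so this sketch does not
// read the filesystem itself.
std::vector<std::string> unknownNamespaces(
    const std::set<std::string>& entries,
    const std::map<std::string, int>& nstypes)
{
  std::vector<std::string> unknown;

  for (const std::string& entry : entries) {
    if (nstypes.count(entry) == 0) {
      unknown.push_back(entry);
    }
  }

  return unknown;
}
```

On a 4.6 kernel the directory listing would contain "cgroup", and a table that predates that kernel would report it as unknown, which is consistent with the "Unknown namespace 'cgroup'" error quoted above.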
[jira] [Commented] (MESOS-5430) Design the improvement of the home page of mesos.apache.org
[ https://issues.apache.org/jira/browse/MESOS-5430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15297735#comment-15297735 ] haosdent commented on MESOS-5430: - [~jmanalus][~vinodkone] Thanks a lot for your replies. I posted a quick demo at http://blog.haosdent.me/mesos-site-demo/source/ There are some minor mismatches between the demo page above and [~jmanalus]'s design. [~jmanalus], if you use Sketch or Photoshop, could you send the file to my email (haosd...@gmail.com) or upload it to JIRA, so that I can adjust my demo to match your design more exactly? > Design the improvement of the home page of mesos.apache.org > --- > > Key: MESOS-5430 > URL: https://issues.apache.org/jira/browse/MESOS-5430 > Project: Mesos > Issue Type: Improvement > Components: project website >Reporter: Vinod Kone >Assignee: Jonathan Manalus > > The idea is to come up with a minimal improvement for the design of the home > page of mesos.apache.org. > Proposed Redesign: https://invis.io/CV7DZF1JW#/159898819_Mesos-apache-org -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-5410) Support cgroup namespace in unified container
[ https://issues.apache.org/jira/browse/MESOS-5410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15297740#comment-15297740 ] haosdent commented on MESOS-5410: - I think we could add {code} namespaces.erase("cgroup"); {code} as a workaround. Let me file a JIRA for this. > Support cgroup namespace in unified container > - > > Key: MESOS-5410 > URL: https://issues.apache.org/jira/browse/MESOS-5410 > Project: Mesos > Issue Type: Improvement >Reporter: Qian Zhang >Assignee: Qian Zhang > > In the Linux 4.6 kernel, a new namespace (cgroup namespace) was introduced so > that a process can be created in its own cgroup namespace and the global > cgroup hierarchy is not leaked to the process. See the following link > for more details about this namespace: > http://man7.org/linux/man-pages/man7/cgroup_namespaces.7.html > We need to support this namespace in unified container to provide better > isolation for the containers created by Mesos. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
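A rough sketch of how the suggested workaround might look in isolation. It assumes `namespaces` is a `std::set<std::string>` of names discovered at runtime; the wrapper function `supportedNamespaces` is hypothetical and only exists to make the snippet self-contained.

```cpp
#include <set>
#include <string>

// Hypothetical wrapper around the one-line workaround: removes the
// namespace that the existing code cannot map to a clone flag yet.
std::set<std::string> supportedNamespaces(std::set<std::string> namespaces)
{
  // std::set::erase by key is a no-op when the key is absent, so the
  // call is harmless on kernels older than 4.6 that have no "cgroup"
  // entry.
  namespaces.erase("cgroup");

  return namespaces;
}
```

The trade-off is that the tests then skip the cgroup namespace entirely instead of exercising it, which is why it is described as a workaround rather than as support for the new namespace.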
[jira] [Created] (MESOS-5446) NsTest.ROOT_setns and NsTest.ROOT_getns failed in Linux 4.6
haosdent created MESOS-5446: --- Summary: NsTest.ROOT_setns and NsTest.ROOT_getns failed in Linux 4.6 Key: MESOS-5446 URL: https://issues.apache.org/jira/browse/MESOS-5446 Project: Mesos Issue Type: Bug Reporter: haosdent Priority: Minor From [~nthakkar%40us.ibm.com] {quote} Currently, because the "cgroup" namespace is not supported, the following two test cases are failing: 1. NsTest.ROOT_setns 2. NsTest.ROOT_getns The error observed is: "nstype: Unknown namespace 'cgroup'" This is because the contents of the directory "/proc/self/ns" changed in kernel version 4.6 (a "cgroup" entry was added). {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (MESOS-5446) NsTest.ROOT_setns and NsTest.ROOT_getns failed in Linux 4.6
[ https://issues.apache.org/jira/browse/MESOS-5446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] haosdent reassigned MESOS-5446: --- Assignee: haosdent > NsTest.ROOT_setns and NsTest.ROOT_getns failed in Linux 4.6 > --- > > Key: MESOS-5446 > URL: https://issues.apache.org/jira/browse/MESOS-5446 > Project: Mesos > Issue Type: Bug >Reporter: haosdent >Assignee: haosdent >Priority: Minor > > From [~nthakkar%40us.ibm.com] > {quote} > Currently, because the "cgroup" namespace is not supported, the following two > test cases are failing: > 1. NsTest.ROOT_setns > 2. NsTest.ROOT_getns > The error observed is: "nstype: Unknown namespace 'cgroup'" > This is because the contents of the directory "/proc/self/ns" > changed in kernel version 4.6 (a "cgroup" entry was added). > {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4565) slave recovers and attempt to destroy executor's child containers, then begins rejecting task status updates
[ https://issues.apache.org/jira/browse/MESOS-4565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15297752#comment-15297752 ] haosdent commented on MESOS-4565: - [~giaosuddau] Do you encounter the following error? {code} E0130 02:22:21.009094 12686 containerizer.cpp:553] Failed to clean up an isolator when destroying orphan container kube-proxy: Failed to remove cgroup '/sys/fs/cgroup/memory/mesos/1d965a20-849c-40d8-9446-27cb723220a9/kube-proxy': Device or resource busy {code} A quick workaround is to unmount it manually so that the agent can recover successfully. > slave recovers and attempt to destroy executor's child containers, then > begins rejecting task status updates > > > Key: MESOS-4565 > URL: https://issues.apache.org/jira/browse/MESOS-4565 > Project: Mesos > Issue Type: Bug > Components: docker >Affects Versions: 0.26.0 >Reporter: James DeFelice > Labels: mesosphere > > AFAICT the slave is doing this: > 1) recovering from some kind of failure > 2) checking the containers that it pulled from its state store > 3) complaining about cgroup children hanging off of executor containers > 4) rejecting task status updates related to the executor container, the first > of which in the logs is: > {code} > E0130 02:22:21.979852 12683 slave.cpp:2963] Failed to update resources for > container 1d965a20-849c-40d8-9446-27cb723220a9 of executor > 'd701ab48a0c0f13_k8sm-executor' running task > pod.f2dc2c43-c6f7-11e5-ad28-0ad18c5e6c7f on status update for terminal task, > destroying container: Container '1d965a20-849c-40d8-9446-27cb723220a9' not > found > {code} > To be fair, I don't believe that my custom executor is re-registering > properly with the slave prior to attempting to send these (failing) status > updates. But the slave doesn't complain about that .. it complains that it > can't find the **container**. > slave log here: > https://gist.github.com/jdef/265663461156b7a7ed4e -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-5410) Support cgroup namespace in unified container
[ https://issues.apache.org/jira/browse/MESOS-5410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15297764#comment-15297764 ] Nirav commented on MESOS-5410: -- Hi, Alternatively, we can add a macro in the file. I tried adding it and it worked well, and it would also help in the future. #ifndef CLONE_NEWCGROUP #define CLONE_NEWCGROUP 0x02000000 #endif and nstypes["cgroup"] = CLONE_NEWCGROUP; I can submit the required patch. > Support cgroup namespace in unified container > - > > Key: MESOS-5410 > URL: https://issues.apache.org/jira/browse/MESOS-5410 > Project: Mesos > Issue Type: Improvement >Reporter: Qian Zhang >Assignee: Qian Zhang > > In the Linux 4.6 kernel, a new namespace (cgroup namespace) was introduced so > that a process can be created in its own cgroup namespace and the global > cgroup hierarchy is not leaked to the process. See the following link > for more details about this namespace: > http://man7.org/linux/man-pages/man7/cgroup_namespaces.7.html > We need to support this namespace in unified container to provide better > isolation for the containers created by Mesos. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
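Laid out on separate lines, the macro-based approach from this comment might look like the sketch below. The value 0x02000000 is the one the Linux kernel headers define for CLONE_NEWCGROUP; the function `makeNstypes` is a hypothetical stand-in for wherever the real name-to-flag table is populated, and the snippet deliberately omits any system header so that the fallback definition is the one in effect.

```cpp
#include <map>
#include <string>

// Fallback for build environments whose headers predate Linux 4.6.
// 0x02000000 matches the CLONE_NEWCGROUP definition in the kernel's
// <linux/sched.h>.
#ifndef CLONE_NEWCGROUP
#define CLONE_NEWCGROUP 0x02000000
#endif

// Hypothetical stand-in for the table that maps namespace names to
// clone flags; only the new entry is shown here.
std::map<std::string, int> makeNstypes()
{
  std::map<std::string, int> nstypes;
  nstypes["cgroup"] = CLONE_NEWCGROUP;
  return nstypes;
}
```

Compared with erasing "cgroup" from the discovered set, this keeps the namespace visible to the tests, and the `#ifndef` guard means the definition is skipped once the system headers provide the flag themselves.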
[jira] [Commented] (MESOS-5410) Support cgroup namespace in unified container
[ https://issues.apache.org/jira/browse/MESOS-5410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15297786#comment-15297786 ] haosdent commented on MESOS-5410: - Cool! Could you send an email to the dev mailing list to become a contributor in JIRA, so that I can change the assignee of MESOS-5446 to you? > Support cgroup namespace in unified container > - > > Key: MESOS-5410 > URL: https://issues.apache.org/jira/browse/MESOS-5410 > Project: Mesos > Issue Type: Improvement >Reporter: Qian Zhang >Assignee: Qian Zhang > > In the Linux 4.6 kernel, a new namespace (cgroup namespace) was introduced so > that a process can be created in its own cgroup namespace and the global > cgroup hierarchy is not leaked to the process. See the following link > for more details about this namespace: > http://man7.org/linux/man-pages/man7/cgroup_namespaces.7.html > We need to support this namespace in unified container to provide better > isolation for the containers created by Mesos. -- This message was sent by Atlassian JIRA (v6.3.4#6332)