[jira] [Commented] (MESOS-6577) Failed to run docker inspect
[ https://issues.apache.org/jira/browse/MESOS-6577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15657292#comment-15657292 ] Marc Villacorta commented on MESOS-6577: I might be reaching the {{DOCKER_INSPECT_TIMEOUT}}: https://github.com/apache/mesos/blob/bf7e9ce836d0fe9924adc2e94054469c4a1906a0/src/docker/executor.cpp#L70-L71 > Failed to run docker inspect > > > Key: MESOS-6577 > URL: https://issues.apache.org/jira/browse/MESOS-6577 > Project: Mesos > Issue Type: Bug > Components: containerization, docker >Affects Versions: 1.0.1 > Environment: {code:none} > core@kato-2 ~ $ cat /etc/systemd/system/mesos-agent.service > [Unit] > Description=Mesos agent > After=go-dnsmasq.service > [Service] > Slice=machine.slice > Restart=always > RestartSec=10 > TimeoutStartSec=0 > KillMode=mixed > EnvironmentFile=/etc/kato.env > ExecStartPre=/usr/bin/sh -c "[ -d /var/lib/mesos/agent ] || mkdir -p > /var/lib/mesos/agent" > ExecStartPre=/usr/bin/sh -c "[ -d /etc/certs ] || mkdir -p /etc/certs" > ExecStartPre=/usr/bin/sh -c "[ -d /etc/cni ] || mkdir -p /etc/cni" > ExecStartPre=/opt/bin/zk-alive ${KATO_QUORUM_COUNT} > ExecStartPre=/usr/bin/rkt fetch quay.io/kato/mesos:v1.0.1-${DOCKER_VERSION}-2 > ExecStartPre=/usr/bin/docker pull > quay.io/kato/mesos:v1.0.1-${DOCKER_VERSION}-2 > ExecStart=/usr/bin/rkt run \ > --net=host \ > --dns=host \ > --hosts-entry=host \ > --volume cni,kind=host,source=/etc/cni \ > --mount volume=cni,target=/etc/cni \ > --volume certs,kind=host,source=/etc/certs \ > --mount volume=certs,target=/etc/certs \ > --volume docker,kind=host,source=/var/run/docker.sock \ > --mount volume=docker,target=/var/run/docker.sock \ > --volume data,kind=host,source=/var/lib/mesos \ > --mount volume=data,target=/var/lib/mesos \ > --stage1-name=coreos.com/rkt/stage1-fly \ > quay.io/kato/mesos:v1.0.1-${DOCKER_VERSION}-2 --exec /usr/sbin/mesos-agent > -- \ > --no-systemd_enable_support \ > --docker_mesos_image=quay.io/kato/mesos:v1.0.1-${DOCKER_VERSION}-2 \ > 
--hostname=worker-${KATO_HOST_ID}.${KATO_DOMAIN} \ > --ip=${KATO_HOST_IP} \ > --containerizers=docker \ > --executor_registration_timeout=2mins \ > --master=zk://${KATO_ZK}/mesos \ > --work_dir=/var/lib/mesos/agent \ > --log_dir=/var/log/mesos/agent \ > --network_cni_config_dir=/etc/cni \ > --network_cni_plugins_dir=/var/lib/mesos/cni-plugins > [Install] > WantedBy=kato.target > {code} > {code:none} > core@kato-2 ~ $ docker version > Client: > Version: 1.12.3 > API version: 1.24 > Go version: go1.6.3 > Git commit: 34a2ead > Built: > OS/Arch: linux/amd64 > Server: > Version: 1.12.3 > API version: 1.24 > Go version: go1.6.3 > Git commit: 34a2ead > Built: > OS/Arch: linux/amd64 > {code} >Reporter: Marc Villacorta > > I am running a _rocketized_ mesos agent. > I am using the docker containerizer. > My executors are _dockerized_. > The very first time I deploy a sample platform I get some errors like the one > below: > {code:none} > Failed to launch container: Failed to run 'docker -H > unix:///var/run/docker.sock inspect > mesos-84a9df2b-be0e-459e-afc9-b95d4e8ced57-S0.0116a0a2-ccaf-4f1a-846c-361ec4e4a179': > exited with status 1; stderr='Error: No such image, container or task: > mesos-84a9df2b-be0e-459e-afc9-b95d4e8ced57-S0.0116a0a2-ccaf-4f1a-846c-361ec4e4a179 > ' > {code} > But when I check with {{docker ps}} I can see the supposedly missing > container and I can even successfully run {{docker inspect}} on it. Then > Marathon reschedules and I get a duplicate. Neither Mesos nor Marathon lists > any duplicate (only Docker does). > Restarting the mesos-agent wipes out the reported missing container, leaving > the other ones alive. > When all my nodes have the docker image layers cached I can deploy the sample > platform smoothly and I don't get the previous errors. > If a container needs a remote volume attached (EBS via REX-Ray), the error > happens every time, whether or not the image layers are cached.
> Reading the code I suspect it is related to the _retryInterval_ of > _Docker::inspect_ > https://github.com/apache/mesos/blob/2e013890e47c30053b7b83cd205b432376589216/src/docker/docker.cpp#L950-L952 > but there is no option to modify this setting. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
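The comment and the issue text point at two knobs: the executor's {{DOCKER_INSPECT_TIMEOUT}} and the _retryInterval_ of _Docker::inspect_. The Python sketch below is only a model of how a retry interval bounded by an overall timeout behaves; it is not Mesos's C++ code, and {{inspect_with_retry}}, {{InspectTimeout}}, {{FakeClock}} and the 0.5 s / 2 s / 5 s values are hypothetical. It shows why a container that becomes inspectable late (for example after an uncached image pull or an EBS attach through REX-Ray) can be reported as missing even though {{docker ps}} lists it shortly afterwards.

```python
import time
from typing import Callable, Optional


class InspectTimeout(Exception):
    """Raised when the container never becomes inspectable before the deadline."""


def inspect_with_retry(
    inspect: Callable[[str], Optional[dict]],
    name: str,
    retry_interval: float,
    timeout: float,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call inspect(name) every retry_interval seconds until it returns a
    result or the next attempt would start after the overall timeout.

    `inspect` stands in for running `docker inspect <name>`; it returns None
    while Docker still answers "No such image, container or task".
    """
    deadline = clock() + timeout
    attempts = 0
    while True:
        attempts += 1
        result = inspect(name)
        if result is not None:
            return result
        # Give up if the next attempt would start after the deadline.
        if clock() + retry_interval > deadline:
            raise InspectTimeout(f"{name!r} not inspectable after {attempts} attempts")
        sleep(retry_interval)


class FakeClock:
    """Deterministic clock so the timing can be checked without Docker."""

    def __init__(self) -> None:
        self.now = 0.0

    def time(self) -> float:
        return self.now

    def sleep(self, seconds: float) -> None:
        self.now += seconds


if __name__ == "__main__":
    clock = FakeClock()

    # Pretend the container only becomes visible to `docker inspect` 3 s after launch.
    def slow_inspect(name: str) -> Optional[dict]:
        return {"Name": "/" + name} if clock.now >= 3.0 else None

    print(inspect_with_retry(slow_inspect, "mesos-demo", 0.5, 5.0,
                             clock=clock.time, sleep=clock.sleep))

    clock.now = 0.0
    try:
        inspect_with_retry(slow_inspect, "mesos-demo", 0.5, 2.0,
                           clock=clock.time, sleep=clock.sleep)
    except InspectTimeout as err:
        print("timed out:", err)
```

With the same slow container, a 5 s budget succeeds and a 2 s budget fails, which matches the reporter's observation that cached images avoid the error; in 1.0.1 neither value is exposed as an agent flag.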
[jira] [Updated] (MESOS-6577) Failed to run docker inspect
[ https://issues.apache.org/jira/browse/MESOS-6577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marc Villacorta updated MESOS-6577: --- Environment: {code:none} core@kato-2 ~ $ cat /etc/kato.env KATO_CLUSTER_ID=cell-1-dub KATO_QUORUM_COUNT=3 KATO_ROLES='quorum master worker ' KATO_HOST_NAME=kato KATO_HOST_ID=2 KATO_ZK=quorum-1:2181,quorum-2:2181,quorum-3:2181 KATO_ALERT_MANAGERS=http://master-1:9093,http://master-2:9093,http://master-3:9093 KATO_DOMAIN=cell-1.dub.xnood.com KATO_MESOS_DOMAIN=cell-1.dub.mesos KATO_HOST_IP=10.136.64.12 KATO_QUORUM=2 DOCKER_VERSION=1.12.3 {code} {code:none} core@kato-2 ~ $ cat /etc/systemd/system/mesos-agent.service [Unit] Description=Mesos agent After=go-dnsmasq.service [Service] Slice=machine.slice Restart=always RestartSec=10 TimeoutStartSec=0 KillMode=mixed EnvironmentFile=/etc/kato.env ExecStartPre=/usr/bin/sh -c "[ -d /var/lib/mesos/agent ] || mkdir -p /var/lib/mesos/agent" ExecStartPre=/usr/bin/sh -c "[ -d /etc/certs ] || mkdir -p /etc/certs" ExecStartPre=/usr/bin/sh -c "[ -d /etc/cni ] || mkdir -p /etc/cni" ExecStartPre=/opt/bin/zk-alive ${KATO_QUORUM_COUNT} ExecStartPre=/usr/bin/rkt fetch quay.io/kato/mesos:v1.0.1-${DOCKER_VERSION}-2 ExecStartPre=/usr/bin/docker pull quay.io/kato/mesos:v1.0.1-${DOCKER_VERSION}-2 ExecStart=/usr/bin/rkt run \ --net=host \ --dns=host \ --hosts-entry=host \ --volume cni,kind=host,source=/etc/cni \ --mount volume=cni,target=/etc/cni \ --volume certs,kind=host,source=/etc/certs \ --mount volume=certs,target=/etc/certs \ --volume docker,kind=host,source=/var/run/docker.sock \ --mount volume=docker,target=/var/run/docker.sock \ --volume data,kind=host,source=/var/lib/mesos \ --mount volume=data,target=/var/lib/mesos \ --stage1-name=coreos.com/rkt/stage1-fly \ quay.io/kato/mesos:v1.0.1-${DOCKER_VERSION}-2 --exec /usr/sbin/mesos-agent -- \ --no-systemd_enable_support \ --docker_mesos_image=quay.io/kato/mesos:v1.0.1-${DOCKER_VERSION}-2 \ 
--hostname=worker-${KATO_HOST_ID}.${KATO_DOMAIN} \ --ip=${KATO_HOST_IP} \ --containerizers=docker \ --executor_registration_timeout=2mins \ --master=zk://${KATO_ZK}/mesos \ --work_dir=/var/lib/mesos/agent \ --log_dir=/var/log/mesos/agent \ --network_cni_config_dir=/etc/cni \ --network_cni_plugins_dir=/var/lib/mesos/cni-plugins [Install] WantedBy=kato.target {code} {code:none} core@kato-2 ~ $ docker version Client: Version: 1.12.3 API version: 1.24 Go version: go1.6.3 Git commit: 34a2ead Built: OS/Arch: linux/amd64 Server: Version: 1.12.3 API version: 1.24 Go version: go1.6.3 Git commit: 34a2ead Built: OS/Arch: linux/amd64 {code} was: {code:none} core@kato-2 ~ $ cat /etc/systemd/system/mesos-agent.service [Unit] Description=Mesos agent After=go-dnsmasq.service [Service] Slice=machine.slice Restart=always RestartSec=10 TimeoutStartSec=0 KillMode=mixed EnvironmentFile=/etc/kato.env ExecStartPre=/usr/bin/sh -c "[ -d /var/lib/mesos/agent ] || mkdir -p /var/lib/mesos/agent" ExecStartPre=/usr/bin/sh -c "[ -d /etc/certs ] || mkdir -p /etc/certs" ExecStartPre=/usr/bin/sh -c "[ -d /etc/cni ] || mkdir -p /etc/cni" ExecStartPre=/opt/bin/zk-alive ${KATO_QUORUM_COUNT} ExecStartPre=/usr/bin/rkt fetch quay.io/kato/mesos:v1.0.1-${DOCKER_VERSION}-2 ExecStartPre=/usr/bin/docker pull quay.io/kato/mesos:v1.0.1-${DOCKER_VERSION}-2 ExecStart=/usr/bin/rkt run \ --net=host \ --dns=host \ --hosts-entry=host \ --volume cni,kind=host,source=/etc/cni \ --mount volume=cni,target=/etc/cni \ --volume certs,kind=host,source=/etc/certs \ --mount volume=certs,target=/etc/certs \ --volume docker,kind=host,source=/var/run/docker.sock \ --mount volume=docker,target=/var/run/docker.sock \ --volume data,kind=host,source=/var/lib/mesos \ --mount volume=data,target=/var/lib/mesos \ --stage1-name=coreos.com/rkt/stage1-fly \ quay.io/kato/mesos:v1.0.1-${DOCKER_VERSION}-2 --exec /usr/sbin/mesos-agent -- \ --no-systemd_enable_support \ --docker_mesos_image=quay.io/kato/mesos:v1.0.1-${DOCKER_VERSION}-2 \ 
--hostname=worker-${KATO_HOST_ID}.${KATO_DOMAIN} \ --ip=${KATO_HOST_IP} \ --containerizers=docker \ --executor_registration_timeout=2mins \ --master=zk://${KATO_ZK}/mesos \ --work_dir=/var/lib/mesos/agent \ --log_dir=/var/log/mesos/agent \ --network_cni_config_dir=/etc/cni \ --network_cni_plugins_dir=/var/lib/mesos/cni-plugins [Install] WantedBy=kato.target {code} {code:none} core@kato-2 ~ $ docker version Client: Version: 1.12.3 API version: 1.24 Go version: go1.6.3 Git commit: 34a2ead Built: OS/Arch: linux/amd64 Server: Version: 1.12.3 API version: 1.24 Go version: go1.6.3 Git commit: 34a2ead Built: OS/Arch: linux/amd64 {code}
[jira] [Updated] (MESOS-6577) Failed to run docker inspect
[ https://issues.apache.org/jira/browse/MESOS-6577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marc Villacorta updated MESOS-6577: --- Description: I am running a _rocketized_ mesos agent. I am using the docker containerizer. My executors are _dockerized_. The very first time I deploy a sample platform I get some errors like the one below: {code:none} Failed to launch container: Failed to run 'docker -H unix:///var/run/docker.sock inspect mesos-84a9df2b-be0e-459e-afc9-b95d4e8ced57-S0.0116a0a2-ccaf-4f1a-846c-361ec4e4a179': exited with status 1; stderr='Error: No such image, container or task: mesos-84a9df2b-be0e-459e-afc9-b95d4e8ced57-S0.0116a0a2-ccaf-4f1a-846c-361ec4e4a179 ' {code} But when I check with {{docker ps}} I can see the supposedly missing container and I can even successfully run {{docker inspect}} on it. Then Marathon reschedules and I get a duplicate. Neither Mesos nor Marathon lists any duplicate (only Docker does). Restarting the mesos-agent wipes out the reported missing container, leaving the other ones alive. When all my nodes have the docker image layers cached I can deploy the sample platform smoothly and I don't get the previous errors. If a container needs a remote volume attached (EBS via REX-Ray), the error happens every time, whether or not the image layers are cached. Reading the code I suspect it is related to the _retryInterval_ of _Docker::inspect_ https://github.com/apache/mesos/blob/2e013890e47c30053b7b83cd205b432376589216/src/docker/docker.cpp#L950-L952 but there is no option to modify this setting. was: I am running a _rocketized_ mesos agent. I am using the docker containerizer. My executors are _dockerized_.
The very first time I deploy a sample platform I get some errors like the one below: {code:none} Failed to launch container: Failed to run 'docker -H unix:///var/run/docker.sock inspect mesos-84a9df2b-be0e-459e-afc9-b95d4e8c ed57-S0.0116a0a2-ccaf-4f1a-846c-361ec4e4a179': exited with status 1; stderr='Error: No such image, container or task: mesos-84a 9df2b-be0e-459e-afc9-b95d4e8ced57-S0.0116a0a2-ccaf-4f1a-846c-361ec4e4a179 ' {code} But when I check with {{docker ps}} I can see the supposedly missing container and I can even successfully run {{docker inspect}} on it. Then marathon reschedules and I get a duplicate. Nor mesos neither marathon list any duplicate (only docker does). Restarting the mesos-agent wipes out the reported missing container leaving the other ones alive. When all my nodes have the docker image layers cached I can deploy the sample platform smoothly and I don't get the previous errors. If a container needs a remote volume attached (EBS via REX-Ray) the error happens all the time. No matter if cached or not. Reading the code I suspect it is related to the _retryInterval_ of _Docker::inspect_ https://github.com/apache/mesos/blob/2e013890e47c30053b7b83cd205b432376589216/src/docker/docker.cpp#L950-L952 but there is no option to modify this setting. 
[jira] [Created] (MESOS-6577) Failed to run docker inspect
Marc Villacorta created MESOS-6577: -- Summary: Failed to run docker inspect Key: MESOS-6577 URL: https://issues.apache.org/jira/browse/MESOS-6577 Project: Mesos Issue Type: Bug Components: containerization, docker Affects Versions: 1.0.1 Environment: {code:none} core@kato-2 ~ $ cat /etc/systemd/system/mesos-agent.service [Unit] Description=Mesos agent After=go-dnsmasq.service [Service] Slice=machine.slice Restart=always RestartSec=10 TimeoutStartSec=0 KillMode=mixed EnvironmentFile=/etc/kato.env ExecStartPre=/usr/bin/sh -c "[ -d /var/lib/mesos/agent ] || mkdir -p /var/lib/mesos/agent" ExecStartPre=/usr/bin/sh -c "[ -d /etc/certs ] || mkdir -p /etc/certs" ExecStartPre=/usr/bin/sh -c "[ -d /etc/cni ] || mkdir -p /etc/cni" ExecStartPre=/opt/bin/zk-alive ${KATO_QUORUM_COUNT} ExecStartPre=/usr/bin/rkt fetch quay.io/kato/mesos:v1.0.1-${DOCKER_VERSION}-2 ExecStartPre=/usr/bin/docker pull quay.io/kato/mesos:v1.0.1-${DOCKER_VERSION}-2 ExecStart=/usr/bin/rkt run \ --net=host \ --dns=host \ --hosts-entry=host \ --volume cni,kind=host,source=/etc/cni \ --mount volume=cni,target=/etc/cni \ --volume certs,kind=host,source=/etc/certs \ --mount volume=certs,target=/etc/certs \ --volume docker,kind=host,source=/var/run/docker.sock \ --mount volume=docker,target=/var/run/docker.sock \ --volume data,kind=host,source=/var/lib/mesos \ --mount volume=data,target=/var/lib/mesos \ --stage1-name=coreos.com/rkt/stage1-fly \ quay.io/kato/mesos:v1.0.1-${DOCKER_VERSION}-2 --exec /usr/sbin/mesos-agent -- \ --no-systemd_enable_support \ --docker_mesos_image=quay.io/kato/mesos:v1.0.1-${DOCKER_VERSION}-2 \ --hostname=worker-${KATO_HOST_ID}.${KATO_DOMAIN} \ --ip=${KATO_HOST_IP} \ --containerizers=docker \ --executor_registration_timeout=2mins \ --master=zk://${KATO_ZK}/mesos \ --work_dir=/var/lib/mesos/agent \ --log_dir=/var/log/mesos/agent \ --network_cni_config_dir=/etc/cni \ --network_cni_plugins_dir=/var/lib/mesos/cni-plugins [Install] WantedBy=kato.target {code} {code:none} core@kato-2 ~ $ 
docker version Client: Version: 1.12.3 API version: 1.24 Go version: go1.6.3 Git commit: 34a2ead Built: OS/Arch: linux/amd64 Server: Version: 1.12.3 API version: 1.24 Go version: go1.6.3 Git commit: 34a2ead Built: OS/Arch: linux/amd64 {code} Reporter: Marc Villacorta I am running a _rocketized_ mesos agent. I am using the docker containerizer. My executors are _dockerized_. The very first time I deploy a sample platform I get some errors like the one below: {code:none} Failed to launch container: Failed to run 'docker -H unix:///var/run/docker.sock inspect mesos-84a9df2b-be0e-459e-afc9-b95d4e8c ed57-S0.0116a0a2-ccaf-4f1a-846c-361ec4e4a179': exited with status 1; stderr='Error: No such image, container or task: mesos-84a 9df2b-be0e-459e-afc9-b95d4e8ced57-S0.0116a0a2-ccaf-4f1a-846c-361ec4e4a179 ' {code} But when I check with {{docker ps}} I can see the supposedly missing container and I can even successfully run {{docker inspect}} on it. Then Marathon reschedules and I get a duplicate. Neither Mesos nor Marathon lists any duplicate (only Docker does). Restarting the mesos-agent wipes out the reported missing container, leaving the other ones alive. When all my nodes have the docker image layers cached I can deploy the sample platform smoothly and I don't get the previous errors. If a container needs a remote volume attached (EBS via REX-Ray), the error happens every time, whether or not the image layers are cached. Reading the code I suspect it is related to the _retryInterval_ of _Docker::inspect_ https://github.com/apache/mesos/blob/2e013890e47c30053b7b83cd205b432376589216/src/docker/docker.cpp#L950-L952 but there is no option to modify this setting.
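The duplicates that only Docker reports can be spotted by comparing the names listed by {{docker ps}} with the container IDs the agent believes it is running. The error message suggests the docker containerizer names containers mesos-<agent ID>.<container ID> (here agent ID 84a9df2b-be0e-459e-afc9-b95d4e8ced57-S0 and container ID 0116a0a2-ccaf-4f1a-846c-361ec4e4a179). Assuming that convention, the Python sketch below flags this agent's containers whose container ID is not in a known set. Where that set comes from (for example the agent's state) is left abstract, and {{parse_name}}, {{find_untracked}} and {{hypothetical-id}} are illustrative names, not Mesos APIs.

```python
from typing import Iterable, List, NamedTuple, Optional, Set

# Assumption: Mesos-launched containers are named "mesos-<agentId>.<containerId>",
# as in the error message above; any further dot-separated suffix is ignored.
PREFIX = "mesos-"


class MesosName(NamedTuple):
    agent_id: str
    container_id: str


def parse_name(name: str) -> Optional[MesosName]:
    """Split a Docker container name into (agent ID, container ID), or None."""
    name = name.lstrip("/")  # `docker inspect` reports names with a leading "/"
    if not name.startswith(PREFIX):
        return None
    parts = name[len(PREFIX):].split(".")
    if len(parts) < 2 or not parts[0] or not parts[1]:
        return None
    return MesosName(parts[0], parts[1])


def find_untracked(
    docker_names: Iterable[str], agent_id: str, known_ids: Set[str]
) -> List[str]:
    """Return this agent's Docker containers whose container ID is unknown."""
    untracked = []
    for raw in docker_names:
        parsed = parse_name(raw)
        if parsed is None or parsed.agent_id != agent_id:
            continue  # not a Mesos container, or owned by another agent
        if parsed.container_id not in known_ids:
            untracked.append(raw)
    return sorted(untracked)


if __name__ == "__main__":
    # Against a real host the names could come from:
    #   subprocess.run(["docker", "ps", "--format", "{{.Names}}"],
    #                  capture_output=True, text=True, check=True).stdout.split()
    names = [
        "mesos-84a9df2b-be0e-459e-afc9-b95d4e8ced57-S0."
        "0116a0a2-ccaf-4f1a-846c-361ec4e4a179",
        "mesos-84a9df2b-be0e-459e-afc9-b95d4e8ced57-S0.hypothetical-id",
        "redis",
    ]
    print(find_untracked(names, "84a9df2b-be0e-459e-afc9-b95d4e8ced57-S0",
                         {"hypothetical-id"}))
```

Filtering on the agent ID keeps containers launched by other agents sharing the same Docker daemon out of the result.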
[jira] [Comment Edited] (MESOS-2115) Improve recovering Docker containers when slave is contained
[ https://issues.apache.org/jira/browse/MESOS-2115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15633471#comment-15633471 ] Marc Villacorta edited comment on MESOS-2115 at 11/3/16 5:07 PM: - [~SailC] The docker image you specify in {{--docker_mesos_image}} must have a docker client embedded (not bind-mounted); this image will be used to run the mesos executor. I personally use the same image for the mesos-agent and for the executor. In this [commit|https://github.com/katosys/kato/commit/50b7a82d8c63373b53072be33943cb6ff56a20b5] I switch from docker to rocket, and it might be of interest to you because it shows how this can be achieved with both container runtimes. was (Author: h0tbird): [~SailC] The docker image you specify in {{--docker_mesos_image}} must have a docker client embedded (not bind-mounted) this image will be used to run the mesos executor. I personally use the same image for the mesos-agent and for the executor. In this [commit|https://github.com/katosys/kato/commit/50b7a82d8c63373b53072be33943cb6ff56a20b5] I switch from docker to rocker and it might be of interest to you because it shows how this can be achieved with both container runtimes. > Improve recovering Docker containers when slave is contained > > > Key: MESOS-2115 > URL: https://issues.apache.org/jira/browse/MESOS-2115 > Project: Mesos > Issue Type: Epic > Components: docker >Reporter: Timothy Chen >Assignee: Timothy Chen > Labels: docker > Fix For: 0.23.0 > > > Currently, when the docker containerizer is recovering, it checks the checkpointed > executor pids to recover the containers that are still running, and removes the > rest of the containers in docker ps that it doesn't recognize. > This is problematic when the slave itself was in a docker container, because when > the slave container dies all the forked processes are removed as well, so the > checkpointed executor pids are no longer valid.
> We have to assume the docker containers might be still running even though > the checkpointed executor pids are not.
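The epic's argument can be read as a per-container decision. The Python sketch below is a hypothetical model, not the containerizer change that landed for 0.23.0: a live checkpointed executor pid means the container can be recovered; a container with no checkpoint is treated as unrecognized; and a dead pid only justifies removal when the agent is not itself containerized, since a containerized agent's forked executors die with it. {{Action}}, {{recovery_action}} and {{agent_is_containerized}} are illustrative names; {{pid_alive}} uses the standard signal-0 probe.

```python
import os
from enum import Enum
from typing import Callable, Dict


class Action(Enum):
    RECOVER = "recover"  # executor pid is alive: reattach to the container
    KEEP = "keep"        # pid is stale, but the container may still be running
    REMOVE = "remove"    # unrecognized or orphaned container: clean it up


def pid_alive(pid: int) -> bool:
    """Probe a pid with signal 0 (checks existence; no signal is delivered)."""
    if pid <= 0:
        return False  # 0 and negative values address process groups, not one pid
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def recovery_action(
    container_id: str,
    checkpointed_pids: Dict[str, int],
    agent_is_containerized: bool,
    is_alive: Callable[[int], bool] = pid_alive,
) -> Action:
    pid = checkpointed_pids.get(container_id)
    if pid is None:
        return Action.REMOVE  # no checkpoint: the agent never tracked this container
    if is_alive(pid):
        return Action.RECOVER
    if agent_is_containerized:
        # The executors forked by the agent died with the agent's own container,
        # so a dead pid says nothing about whether the Docker container is gone.
        return Action.KEEP
    return Action.REMOVE


if __name__ == "__main__":
    pids = {"c1": os.getpid()}
    print(recovery_action("c1", pids, agent_is_containerized=True))
```

Passing {{is_alive}} as a parameter keeps the decision testable without real executor processes.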
[jira] [Issue Comment Deleted] (MESOS-6486) Mesos on Alpine Linux: JVM Segmentation fault
[ https://issues.apache.org/jira/browse/MESOS-6486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marc Villacorta updated MESOS-6486: --- Comment: was deleted (was: What do you think? Is this a problem with _libjvm.so_ or perhaps a JNI problem in _libmesos-1.0.1.so_?) > Mesos on Alpine Linux: JVM Segmentation fault > - > > Key: MESOS-6486 > URL: https://issues.apache.org/jira/browse/MESOS-6486 > Project: Mesos > Issue Type: Wish >Affects Versions: 1.0.1 > Environment: *Docker* > {code:none} > ➜ ~ docker version > Client: > Version: 1.12.1 > API version: 1.24 > Go version: go1.7.1 > Git commit: 6f9534c > Built: Thu Sep 8 10:31:18 2016 > OS/Arch: darwin/amd64 > Server: > Version: 1.12.1 > API version: 1.24 > Go version: go1.6.3 > Git commit: 23cf638 > Built: Thu Aug 18 17:52:38 2016 > OS/Arch: linux/amd64 > {code} > *Alpine* > {code:none} > --- S Y S T E M --- > OS:NAME="Alpine Linux" > ID=alpine > VERSION_ID=3.4.4 > PRETTY_NAME="Alpine Linux v3.4" > HOME_URL="http://alpinelinux.org" > BUG_REPORT_URL="http://bugs.alpinelinux.org" > uname:Linux 4.4.20-moby #1 SMP Thu Sep 15 12:10:20 UTC 2016 x86_64 > libc:glibc 2.9 NPTL > rlimit: STACK 8192k, CORE infinity, NPROC infinity, NOFILE 1048576, AS > infinity > load average:0.01 0.39 0.89 > {code} > *Java* > {code:none} > # JRE version: OpenJDK Runtime Environment (8.0_101-b13) (build 1.8.0_101-b13) > # Java VM: OpenJDK 64-Bit Server VM (25.101-b13 mixed mode linux-amd64 > compressed oops) > # Derivative: IcedTea 3.1.0 > # Distribution: Custom build (Tue Aug 30 20:38:19 GMT 2016) > {code} >Reporter: Marc Villacorta >Priority: Minor > Attachments: hs_err_pid1677.log > > > I have compiled Mesos 1.0.1 inside a Docker container using Alpine Linux > (Dockerfile below): > {code:none} > # Set the base image for subsequent instructions: > FROM alpine:3.4 > MAINTAINER Marc Villacorta Morera> # Environment variables: > ENV TAG="1.0.1" \ > PREFIX="/usr/local" \ > JAVA_HOME="/usr/lib/jvm/default-jvm" \ > >
JAVA_JVM_LIBRARY="/usr/lib/jvm/default-jvm/jre/lib/amd64/server/libjvm.so" \ > LD_LIBRARY_PATH="/usr/lib/jvm/default-jvm/jre/lib/amd64/server" \ > EDGE_REPO="http://nl.alpinelinux.org/alpine/edge" > # Install mesos: > RUN apk add -U --no-cache -t dev git autoconf automake libtool g++ \ > zlib-dev fts-dev apr-dev curl-dev file cyrus-sasl-dev cyrus-sasl-crammd5 \ > subversion-dev make patch linux-headers binutils && apk add -U --no-cache > \ > -t dev openjdk8 maven --repository ${EDGE_REPO}/community && apk add -U \ > --no-cache libstdc++ libgcc subversion-libs libcurl fts zlib coreutils \ > && git clone https://git-wip-us.apache.org/repos/asf/mesos.git && cd > mesos \ > && { [ "${TAG}" != "master" ] && git checkout tags/${TAG} -b ${TAG}; }; \ > ./bootstrap && mkdir build && cd build && ../configure --prefix=${PREFIX} > \ > --disable-dependency-tracking --disable-maintainer-mode --disable-python \ > --enable-optimize --enable-silent-rules \ > && CORES=$(cat /proc/cpuinfo | grep processor | wc -l) \ > && make -j${CORES} && make install && cd && rm -rf /mesos > ${PREFIX}/include \ > && find ${PREFIX} -type f -perm /u=x,g=x,o=x | xargs strip -s > 2>/dev/null; \ > apk del --purge dev && rm -rf /var/cache/apk/* > # Command: > CMD ["/bin/sh"] > {code} > Some tests are failing and my biggest concern is with this one: > {code:none} > make check GTEST_FILTER="ExamplesTest.JavaFramework" > {code} > {code:none} > [==========] Running 1 test from 1 test case. > [----------] Global test environment set-up. > [----------] 1 test from ExamplesTest > [ RUN ] ExamplesTest.JavaFramework > ../../src/tests/script.cpp:80: Failure > Failed > java_framework_test.sh terminated with signal Segmentation fault > [ FAILED ] ExamplesTest.JavaFramework (5655 ms) > [----------] 1 test from ExamplesTest (5656 ms total) > [----------] Global test environment tear-down > [==========] 1 test from 1 test case ran. (5689 ms total) > [ PASSED ] 0 tests.
> [ FAILED ] 1 test, listed below: > [ FAILED ] ExamplesTest.JavaFramework > {code} > An ugly SIGSEGV is dispatched by the kernel. It looks like _libjvm.so_ is the > offending library but I am not sure at all: > {code:none} > I1026 15:19:54.843340 1706 replica.cpp:712] Persisted action at 7 > I1026 15:19:54.843683 1706 replica.cpp:691] Replica received learned notice > for position 7 from @0.0.0.0:0 > I1026 15:19:54.864063 1706 leveldb.cpp:341] Persisting action (690 bytes) to > leveldb took 20.333769ms > I1026 15:19:54.864123 1706 replica.cpp:712] Persisted action at 7 > I1026 15:19:54.864131 1706 replica.cpp:697] Replica learned APPEND action at
[jira] [Commented] (MESOS-6486) Mesos on Alpine Linux: JVM Segmentation fault
[ https://issues.apache.org/jira/browse/MESOS-6486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15608833#comment-15608833 ] Marc Villacorta commented on MESOS-6486: What do you think? Is this a problem with _libjvm.so_ or perhaps a JNI problem in _libmesos-1.0.1.so_?
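One starting point for the libjvm.so versus JNI question is the "Problematic frame" line that HotSpot writes into hs_err files such as the attached hs_err_pid1677.log. The first letter gives the frame type (V for VM code, C for native code, j/J for interpreted/compiled Java, v for a VM-generated stub), and the bracketed part names the library. The Python sketch below extracts that line; the attachment's contents are not reproduced in this thread, so the sample text is made up, and {{problematic_frame}} is a hypothetical helper. Treat the result as a hint only: native code can corrupt memory that the VM trips over later, so a crash inside libjvm.so does not by itself rule out libmesos-1.0.1.so.

```python
import re
from typing import NamedTuple, Optional

# Frame-type letters used in HotSpot's hs_err "Problematic frame" line.
FRAME_KINDS = {
    "j": "interpreted Java",
    "J": "compiled Java",
    "V": "VM code",
    "C": "native code",
    "v": "VM generated stub",
}

# Matches e.g. "# V  [libjvm.so+0x6a1b2c]  SomeFunction+0x1c";
# Java frames such as "# j  java.lang.Thread.run()V+11" have no [library].
_FRAME_RE = re.compile(r"^#\s+([jJVCv])\s+(?:\[([^\]+]+)(?:\+0x[0-9a-fA-F]+)?\])?")


class ProblematicFrame(NamedTuple):
    kind: str
    library: Optional[str]  # None when the frame names no native library
    line: str


def problematic_frame(log_text: str) -> Optional[ProblematicFrame]:
    """Return the frame reported right after '# Problematic frame:', if any."""
    lines = log_text.splitlines()
    for i, line in enumerate(lines[:-1]):  # the frame is on the following line
        if line.strip() == "# Problematic frame:":
            match = _FRAME_RE.match(lines[i + 1])
            if match:
                return ProblematicFrame(
                    FRAME_KINDS[match.group(1)], match.group(2), lines[i + 1].strip()
                )
    return None


if __name__ == "__main__":
    # Made-up sample; pass the text of the real hs_err file instead.
    sample = "#\n# Problematic frame:\n# V  [libjvm.so+0x6a1b2c]  SomeFunction+0x1c\n#\n"
    print(problematic_frame(sample))
```

A C frame inside libmesos-1.0.1.so would point toward the JNI side; a V frame inside libjvm.so points at the VM, with the caveat above.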
[jira] [Updated] (MESOS-6486) Mesos on Alpine Linux: JVM Segmentation fault
[ https://issues.apache.org/jira/browse/MESOS-6486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marc Villacorta updated MESOS-6486: --- Summary: Mesos on Alpine Linux: JVM Segmentation fault (was: Mesos on Alpine Linux)
[jira] [Updated] (MESOS-6486) Mesos on Alpine Linux
[ https://issues.apache.org/jira/browse/MESOS-6486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marc Villacorta updated MESOS-6486: --- Attachment: hs_err_pid1677.log > Mesos on Alpine Linux > - > > Key: MESOS-6486 > URL: https://issues.apache.org/jira/browse/MESOS-6486 > Project: Mesos > Issue Type: Wish >Affects Versions: 1.0.1 > Environment: *Docker* > {code:none} > ➜ ~ docker version > Client: > Version: 1.12.1 > API version: 1.24 > Go version: go1.7.1 > Git commit: 6f9534c > Built:Thu Sep 8 10:31:18 2016 > OS/Arch: darwin/amd64 > Server: > Version: 1.12.1 > API version: 1.24 > Go version: go1.6.3 > Git commit: 23cf638 > Built:Thu Aug 18 17:52:38 2016 > OS/Arch: linux/amd64 > {code} > *Alpine* > {code:none} > --- S Y S T E M --- > OS:NAME="Alpine Linux" > ID=alpine > VERSION_ID=3.4.4 > PRETTY_NAME="Alpine Linux v3.4" > HOME_URL="http://alpinelinux.org" > BUG_REPORT_URL="http://bugs.alpinelinux.org" > uname:Linux 4.4.20-moby #1 SMP Thu Sep 15 12:10:20 UTC 2016 x86_64 > libc:glibc 2.9 NPTL > rlimit: STACK 8192k, CORE infinity, NPROC infinity, NOFILE 1048576, AS > infinity > load average:0.01 0.39 0.89 > {code} > *Java* > {code:none} > # JRE version: OpenJDK Runtime Environment (8.0_101-b13) (build 1.8.0_101-b13) > # Java VM: OpenJDK 64-Bit Server VM (25.101-b13 mixed mode linux-amd64 > compressed oops) > # Derivative: IcedTea 3.1.0 > # Distribution: Custom build (Tue Aug 30 20:38:19 GMT 2016) > {code} >Reporter: Marc Villacorta >Priority: Minor > Attachments: hs_err_pid1677.log > > > I have compiled Mesos 1.0.1 inside a Docker container using Alpine Linux > (Dockerfile below): > {code:none} > # Set the base image for subsequent instructions: > FROM alpine:3.4 > MAINTAINER Marc Villacorta Morera> # Environment variables: > ENV TAG="1.0.1" \ > PREFIX="/usr/local" \ > JAVA_HOME="/usr/lib/jvm/default-jvm" \ > > JAVA_JVM_LIBRARY="/usr/lib/jvm/default-jvm/jre/lib/amd64/server/libjvm.so" \ >
LD_LIBRARY_PATH="/usr/lib/jvm/default-jvm/jre/lib/amd64/server" \ > EDGE_REPO="http://nl.alpinelinux.org/alpine/edge" > # Install mesos: > RUN apk add -U --no-cache -t dev git autoconf automake libtool g++ \ > zlib-dev fts-dev apr-dev curl-dev file cyrus-sasl-dev cyrus-sasl-crammd5 \ > subversion-dev make patch linux-headers binutils && apk add -U --no-cache > \ > -t dev openjdk8 maven --repository ${EDGE_REPO}/community && apk add -U \ > --no-cache libstdc++ libgcc subversion-libs libcurl fts zlib coreutils \ > && git clone https://git-wip-us.apache.org/repos/asf/mesos.git && cd > mesos \ > && { [ "${TAG}" != "master" ] && git checkout tags/${TAG} -b ${TAG}; }; \ > ./bootstrap && mkdir build && cd build && ../configure --prefix=${PREFIX} > \ > --disable-dependency-tracking --disable-maintainer-mode --disable-python \ > --enable-optimize --enable-silent-rules \ > && CORES=$(cat /proc/cpuinfo | grep processor | wc -l) \ > && make -j${CORES} && make install && cd && rm -rf /mesos > ${PREFIX}/include \ > && find ${PREFIX} -type f -perm /u=x,g=x,o=x | xargs strip -s > 2>/dev/null; \ > apk del --purge dev && rm -rf /var/cache/apk/* > # Command: > CMD ["/bin/sh"] > {code} > Some tests are failing and my biggest concern is with this one: > {code:none} > make check GTEST_FILTER="ExamplesTest.JavaFramework" > {code} > {code:none} > [==========] Running 1 test from 1 test case. > [----------] Global test environment set-up. > [----------] 1 test from ExamplesTest > [ RUN ] ExamplesTest.JavaFramework > ../../src/tests/script.cpp:80: Failure > Failed > java_framework_test.sh terminated with signal Segmentation fault > [ FAILED ] ExamplesTest.JavaFramework (5655 ms) > [----------] 1 test from ExamplesTest (5656 ms total) > [----------] Global test environment tear-down > [==========] 1 test from 1 test case ran. (5689 ms total) > [ PASSED ] 0 tests. > [ FAILED ] 1 test, listed below: > [ FAILED ] ExamplesTest.JavaFramework > {code} > An ugly SIGSEGV is dispatched by the kernel.
It looks like _libjvm.so_ is the > offending library but I am not sure at all: > {code:none} > I1026 15:19:54.843340 1706 replica.cpp:712] Persisted action at 7 > I1026 15:19:54.843683 1706 replica.cpp:691] Replica received learned notice > for position 7 from @0.0.0.0:0 > I1026 15:19:54.864063 1706 leveldb.cpp:341] Persisting action (690 bytes) to > leveldb took 20.333769ms > I1026 15:19:54.864123 1706 replica.cpp:712] Persisted action at 7 > I1026 15:19:54.864131 1706 replica.cpp:697] Replica learned APPEND action at > position 7 > I1026 15:19:54.864936 1705 registrar.cpp:509] Successfully updated the > 'registry' in 31.458048ms > I1026 15:19:54.864989 1700
[jira] [Commented] (MESOS-6310) Remove or define non-POSIX function
[ https://issues.apache.org/jira/browse/MESOS-6310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15576711#comment-15576711 ] Marc Villacorta commented on MESOS-6310: It builds successfully after I applied that last patch. > Remove or define non-POSIX function > --- > > Key: MESOS-6310 > URL: https://issues.apache.org/jira/browse/MESOS-6310 > Project: Mesos > Issue Type: Improvement > Components: containerization >Affects Versions: 1.0.2 >Reporter: Marc Villacorta >Assignee: Kevin Klues >Priority: Minor > Fix For: 1.1.0 > > > I was trying to compile Mesos using _musl_ inside Alpine Linux 3.4. > But this [commit| > https://github.com/apache/mesos/commit/498d14e934233e4501597b43da3924bfe8b2de20] > introduced the {{W_EXITCODE()}} macro which is not defined in _musl_ and > seems to be non-POSIX. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-6310) Remove or define non-POSIX function
[ https://issues.apache.org/jira/browse/MESOS-6310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15576290#comment-15576290 ] Marc Villacorta commented on MESOS-6310: I think {{stout/os/wait.hpp}} should be included in {{src/slave/containerizer/mesos/launch.cpp}} too. {code:none} CXX slave/containerizer/mesos/libmesos_no_3rdparty_la-launch.lo ../../src/slave/containerizer/mesos/launch.cpp: In function 'void mesos::internal::slave::exitWithSignal(int)': ../../src/slave/containerizer/mesos/launch.cpp:224:44: error: 'W_EXITCODE' was not declared in this scope signalSafeWriteStatus(W_EXITCODE(0, sig)); ^ ../../src/slave/containerizer/mesos/launch.cpp: In function 'void mesos::internal::slave::exitWithStatus(int)': ../../src/slave/containerizer/mesos/launch.cpp:236:47: error: 'W_EXITCODE' was not declared in this scope signalSafeWriteStatus(W_EXITCODE(status, 0)); ^ {code} > Remove or define non-POSIX function > --- > > Key: MESOS-6310 > URL: https://issues.apache.org/jira/browse/MESOS-6310 > Project: Mesos > Issue Type: Improvement > Components: containerization >Affects Versions: 1.0.2 >Reporter: Marc Villacorta >Assignee: Kevin Klues >Priority: Minor > Fix For: 1.1.0 > > > I was trying to compile Mesos using _musl_ inside Alpine Linux 3.4. > But this [commit| > https://github.com/apache/mesos/commit/498d14e934233e4501597b43da3924bfe8b2de20] > introduced the {{W_EXITCODE()}} macro which is not defined in _musl_ and > seems to be non-POSIX. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-6314) It looks like getgrouplist returns duplicated results
Marc Villacorta created MESOS-6314: -- Summary: It looks like getgrouplist returns duplicated results Key: MESOS-6314 URL: https://issues.apache.org/jira/browse/MESOS-6314 Project: Mesos Issue Type: Bug Components: tests Affects Versions: 1.0.2 Environment: Inside Docker container {{alpine:3.4}} Reporter: Marc Villacorta On my Alpine 3.4 system, OsTest.User fails: {code:none} /mesos/build # id -G 0 1 2 3 4 6 10 11 20 26 27 {code} {code:none} [ RUN ] OsTest.User ../../../3rdparty/stout/tests/os_tests.cpp:696: Failure Value of: expected_gids Actual: { "0", "0", "1", "10", "11", "2", "20", "26", "27", "3", "4", "6" } Expected: tokens.get() Which is: { "0", "1", "10", "11", "2", "20", "26", "27", "3", "4", "6" } [ FAILED ] OsTest.User (6 ms) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-6314) OsTest.User: It looks like getgrouplist returns duplicated results
[ https://issues.apache.org/jira/browse/MESOS-6314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marc Villacorta updated MESOS-6314: --- Summary: OsTest.User: It looks like getgrouplist returns duplicated results (was: It looks like getgrouplist returns duplicated results) > OsTest.User: It looks like getgrouplist returns duplicated results > -- > > Key: MESOS-6314 > URL: https://issues.apache.org/jira/browse/MESOS-6314 > Project: Mesos > Issue Type: Bug > Components: tests >Affects Versions: 1.0.2 > Environment: Inside Docker container {{alpine:3.4}} >Reporter: Marc Villacorta > > On my Alpine 3.4 system, OsTest.User fails: > {code:none} > /mesos/build # id -G > 0 1 2 3 4 6 10 11 20 26 27 > {code} > {code:none} > [ RUN ] OsTest.User > ../../../3rdparty/stout/tests/os_tests.cpp:696: Failure > Value of: expected_gids > Actual: { "0", "0", "1", "10", "11", "2", "20", "26", "27", "3", "4", "6" } > Expected: tokens.get() > Which is: { "0", "1", "10", "11", "2", "20", "26", "27", "3", "4", "6" } > [ FAILED ] OsTest.User (6 ms) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-5909) Stout "OsTest.User" test can fail on some systems
[ https://issues.apache.org/jira/browse/MESOS-5909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15548353#comment-15548353 ] Marc Villacorta commented on MESOS-5909: On my Alpine 3.4 system this test still fails: {code:none} /mesos/build # id -G 0 1 2 3 4 6 10 11 20 26 27 {code} {code:none} [ RUN ] OsTest.User ../../../3rdparty/stout/tests/os_tests.cpp:696: Failure Value of: expected_gids Actual: { "0", "0", "1", "10", "11", "2", "20", "26", "27", "3", "4", "6" } Expected: tokens.get() Which is: { "0", "1", "10", "11", "2", "20", "26", "27", "3", "4", "6" } [ FAILED ] OsTest.User (6 ms) {code} Should I open a new Jira? > Stout "OsTest.User" test can fail on some systems > - > > Key: MESOS-5909 > URL: https://issues.apache.org/jira/browse/MESOS-5909 > Project: Mesos > Issue Type: Bug > Components: stout >Reporter: Kapil Arya >Assignee: Mao Geng > Labels: mesosphere > Fix For: 1.1.0 > > Attachments: MESOS-5909-fix.diff > > > Libc call {{getgrouplist}} doesn't return the {{gid}} list in a sorted manner > (in my case, it's returning "471 100") ... whereas {{id -G}} returns a sorted > list ("100 471" in my case), causing the validation inside the loop to fail. > We should sort both lists before comparing the values. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
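Both failures above come from comparing the output of {{getgrouplist}} with {{id -G}} element by element: the call makes no ordering promise, and on this Alpine/musl system it also reports gid 0 twice. A minimal sketch of normalizing the list before comparing, in C++ like the rest of Mesos; {{normalizeGids}} and {{userGids}} are hypothetical helpers, not stout functions, and the retry loop assumes the Linux getgrouplist(3) behaviour of returning -1 and storing the required count in {{ngroups}} when the buffer is too small.

```cpp
#include <algorithm>
#include <cstddef>
#include <grp.h>
#include <sys/types.h>
#include <vector>

// Sorts a gid list numerically and drops duplicates, so two lists that
// contain the same groups compare equal regardless of order or repeats.
std::vector<gid_t> normalizeGids(std::vector<gid_t> gids) {
  std::sort(gids.begin(), gids.end());
  gids.erase(std::unique(gids.begin(), gids.end()), gids.end());
  return gids;
}

// Hypothetical helper: returns the normalized groups of `user`, whose
// primary group is `primary`. Assumes the Linux getgrouplist() contract:
// -1 plus the required count in `ngroups` when the buffer is too small.
std::vector<gid_t> userGids(const char* user, gid_t primary) {
  std::vector<gid_t> gids(16);
  int ngroups = static_cast<int>(gids.size());
  while (getgrouplist(user, primary, gids.data(), &ngroups) == -1) {
    // Grow to the reported size, and at least double, so the loop progresses.
    gids.resize(std::max(static_cast<std::size_t>(ngroups), gids.size() * 2));
    ngroups = static_cast<int>(gids.size());
  }
  gids.resize(static_cast<std::size_t>(ngroups));  // keep only filled entries
  return normalizeGids(gids);
}
```

Fed the "Actual" list from the failure above, {{normalizeGids}} yields 0 1 2 3 4 6 10 11 20 26 27, the same set {{id -G}} prints. The sketch sorts numerically whereas the failing test compares strings; what matters is that both sides are normalized the same way.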
[jira] [Updated] (MESOS-6310) Remove or define non-POSIX function
[ https://issues.apache.org/jira/browse/MESOS-6310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marc Villacorta updated MESOS-6310: --- Summary: Remove or define non-POSIX function (was: Remove or define non-posix function) > Remove or define non-POSIX function > --- > > Key: MESOS-6310 > URL: https://issues.apache.org/jira/browse/MESOS-6310 > Project: Mesos > Issue Type: Improvement > Components: containerization >Affects Versions: 1.0.2 >Reporter: Marc Villacorta >Priority: Minor > > I was trying to compile Mesos using _musl_ inside Alpine Linux 3.4. > But this [commit| > https://github.com/apache/mesos/commit/498d14e934233e4501597b43da3924bfe8b2de20] > introduced the {{W_EXITCODE()}} macro which is not defined in _musl_ and > seems to be non-POSIX. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-6310) Remove or define non-posix function
Marc Villacorta created MESOS-6310: -- Summary: Remove or define non-posix function Key: MESOS-6310 URL: https://issues.apache.org/jira/browse/MESOS-6310 Project: Mesos Issue Type: Improvement Components: containerization Affects Versions: 1.0.2 Reporter: Marc Villacorta Priority: Minor I was trying to compile Mesos using _musl_ inside Alpine Linux 3.4. But this [commit| https://github.com/apache/mesos/commit/498d14e934233e4501597b43da3924bfe8b2de20] introduced the {{W_EXITCODE()}} macro which is not defined in _musl_ and seems to be non-POSIX. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
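One way to keep {{launch.cpp}} building against _musl_ is a guarded fallback definition. The sketch below assumes the traditional Linux wait-status layout (exit code in bits 8-15, terminating signal in the low bits), which is what glibc's own {{W_EXITCODE}} expands to; it is an illustration, not the patch that was eventually applied.

```cpp
#include <csignal>
#include <sys/wait.h>

// musl's <sys/wait.h> does not define the non-POSIX W_EXITCODE() macro.
// Fall back to the traditional encoding only when it is missing, so glibc
// builds keep using the system definition.
#ifndef W_EXITCODE
#define W_EXITCODE(ret, sig) (((ret) << 8) | (sig))
#endif

// With this encoding an exit code lands in the second byte...
static_assert(W_EXITCODE(1, 0) == 256, "exit code is stored in bits 8-15");
// ...and a terminating signal in the low bits.
static_assert(W_EXITCODE(0, SIGKILL) == SIGKILL, "signal is stored in the low bits");
```

The two call sites in the build error above, {{W_EXITCODE(0, sig)}} in {{exitWithSignal()}} and {{W_EXITCODE(status, 0)}} in {{exitWithStatus()}}, then produce statuses that the standard {{WIFSIGNALED}}/{{WTERMSIG}} and {{WIFEXITED}}/{{WEXITSTATUS}} macros decode as expected.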
[jira] [Commented] (MESOS-6202) Docker containerizer kills containers whose name starts with 'mesos-'
[ https://issues.apache.org/jira/browse/MESOS-6202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15505959#comment-15505959 ] Marc Villacorta commented on MESOS-6202: Sure, here you have it: MESOS-6212 > Docker containerizer kills containers whose name starts with 'mesos-' > - > > Key: MESOS-6202 > URL: https://issues.apache.org/jira/browse/MESOS-6202 > Project: Mesos > Issue Type: Bug > Components: containerization, docker >Affects Versions: 1.0.1 > Environment: Dockerized > {{mesosphere/mesos-slave:1.0.1-2.0.93.ubuntu1404}} >Reporter: Marc Villacorta > > I run 3 docker containers in my CoreOS system whose names start with > _'mesos-'_ those are: _'mesos-master'_, _'mesos-dns'_ and _'mesos-agent'_. > I can start the first two without any problem but when I start the third one > _('mesos-agent')_ all three containers are killed by the docker daemon. > If I rename the containers to _'m3s0s-master'_, _'m3s0s-dns'_ and > _'m3s0s-agent'_ everything works. > I tracked down the problem to > [this|https://github.com/apache/mesos/blob/16a563aca1f226b021b8f8815c4d115a3212f02b/src/slave/containerizer/docker.cpp#L116-L120] > code which is marked to be removed after deprecation cycle. > I was previously running Mesos 0.28.2 without this problem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-6212) Validate the name format of mesos-managed docker containers
Marc Villacorta created MESOS-6212: -- Summary: Validate the name format of mesos-managed docker containers Key: MESOS-6212 URL: https://issues.apache.org/jira/browse/MESOS-6212 Project: Mesos Issue Type: Improvement Components: containerization Affects Versions: 1.0.1 Reporter: Marc Villacorta Priority: Minor Validate the name format of mesos-managed docker containers in order to avoid false positives when looking for orphaned mesos tasks. Currently names such as _'mesos-master'_, _'mesos-agent'_ and _'mesos-dns'_ are wrongly terminated when {{--docker_kill_orphans}} is set to true (default). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
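A minimal sketch of the kind of check proposed here, combined with the UUID validation suggested in the MESOS-6202 comment. It assumes Mesos-managed containers are named {{mesos-<agentID>.<containerID>}}, or {{mesos-<containerID>}} in an older form, and that container IDs are canonical UUIDs; both naming forms are assumptions for illustration, and {{isUuid}}/{{isMesosManagedName}} are hypothetical helpers rather than Mesos functions.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Returns true if `s` has the canonical textual UUID shape:
// 8-4-4-4-12 hexadecimal digits separated by dashes.
bool isUuid(const std::string& s) {
  if (s.size() != 36) {
    return false;
  }
  for (std::size_t i = 0; i < s.size(); ++i) {
    if (i == 8 || i == 13 || i == 18 || i == 23) {
      if (s[i] != '-') {
        return false;
      }
    } else if (!std::isxdigit(static_cast<unsigned char>(s[i]))) {
      return false;
    }
  }
  return true;
}

// Hypothetical check: the name must start with "mesos-" and the part after
// the last '.' (or the whole remainder, if there is no '.') must be a UUID.
bool isMesosManagedName(std::string name) {
  // Docker reports container names with a leading '/'.
  if (!name.empty() && name[0] == '/') {
    name.erase(0, 1);
  }
  const std::string prefix = "mesos-";
  if (name.compare(0, prefix.size(), prefix) != 0) {
    return false;
  }
  const std::string rest = name.substr(prefix.size());
  const std::size_t dot = rest.rfind('.');
  const std::string containerId =
      dot == std::string::npos ? rest : rest.substr(dot + 1);
  return isUuid(containerId);
}
```

Under these assumptions _'mesos-master'_, _'mesos-dns'_ and _'mesos-agent'_ are no longer treated as orphans, because "master", "dns" and "agent" are not UUIDs.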
[jira] [Commented] (MESOS-6202) Docker containerizer kills containers whose name starts with 'mesos-'
[ https://issues.apache.org/jira/browse/MESOS-6202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15502699#comment-15502699 ] Marc Villacorta commented on MESOS-6202: Would you consider adding a validation to make sure {{id}} is a valid Docker UUID? > Docker containerizer kills containers whose name starts with 'mesos-' > - > > Key: MESOS-6202 > URL: https://issues.apache.org/jira/browse/MESOS-6202 > Project: Mesos > Issue Type: Bug > Components: containerization, docker >Affects Versions: 1.0.1 > Environment: Dockerized > {{mesosphere/mesos-slave:1.0.1-2.0.93.ubuntu1404}} >Reporter: Marc Villacorta > > I run 3 docker containers in my CoreOS system whose names start with > _'mesos-'_ those are: _'mesos-master'_, _'mesos-dns'_ and _'mesos-agent'_. > I can start the first two without any problem but when I start the third one > _('mesos-agent')_ all three containers are killed by the docker daemon. > If I rename the containers to _'m3s0s-master'_, _'m3s0s-dns'_ and > _'m3s0s-agent'_ everything works. > I tracked down the problem to > [this|https://github.com/apache/mesos/blob/16a563aca1f226b021b8f8815c4d115a3212f02b/src/slave/containerizer/docker.cpp#L116-L120] > code which is marked to be removed after deprecation cycle. > I was previously running Mesos 0.28.2 without this problem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-6202) Docker containerizer kills containers whose name starts with 'mesos-'
[ https://issues.apache.org/jira/browse/MESOS-6202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marc Villacorta updated MESOS-6202: --- Description: I run 3 docker containers in my CoreOS system whose names start with _'mesos-'_ those are: _'mesos-master'_, _'mesos-dns'_ and _'mesos-agent'_. I can start the first two without any problem but when I start the third one _('mesos-agent')_ all three containers are killed by the docker daemon. If I rename the containers to _'m3s0s-master'_, _'m3s0s-dns'_ and _'m3s0s-agent'_ everything works. I tracked down the problem to [this|https://github.com/apache/mesos/blob/16a563aca1f226b021b8f8815c4d115a3212f02b/src/slave/containerizer/docker.cpp#L116-L120] code which is marked to be removed after deprecation cycle. I was previously running Mesos 0.28.2 without this problem. was: I run 3 docker containers in my CoreOS system whose names start with _'mesos-'_ those are: _'mesos-master'_, _'mesos-dns'_ and _'mesos-agent'_. I can start the first two without any problem but when I start the third one _('mesos-agent')_ all three containers are killed by the docker daemon. If I rename the containers to _'m3s0s-master'_, _'m3s0s-dns'_ and _'m3s0s-agent'_ everithing works. I tracked down the problem to [this|https://github.com/apache/mesos/blob/16a563aca1f226b021b8f8815c4d115a3212f02b/src/slave/containerizer/docker.cpp#L116-L120] code which is marked to be removed after deprecation cycle. I was previously running Mesos 0.28.2 without this problem. 
> Docker containerizer kills containers whose name starts with 'mesos-' > - > > Key: MESOS-6202 > URL: https://issues.apache.org/jira/browse/MESOS-6202 > Project: Mesos > Issue Type: Bug > Components: containerization, docker >Affects Versions: 1.0.1 > Environment: Dockerized > {{mesosphere/mesos-slave:1.0.1-2.0.93.ubuntu1404}} >Reporter: Marc Villacorta > > I run 3 docker containers in my CoreOS system whose names start with > _'mesos-'_ those are: _'mesos-master'_, _'mesos-dns'_ and _'mesos-agent'_. > I can start the first two without any problem but when I start the third one > _('mesos-agent')_ all three containers are killed by the docker daemon. > If I rename the containers to _'m3s0s-master'_, _'m3s0s-dns'_ and > _'m3s0s-agent'_ everything works. > I tracked down the problem to > [this|https://github.com/apache/mesos/blob/16a563aca1f226b021b8f8815c4d115a3212f02b/src/slave/containerizer/docker.cpp#L116-L120] > code which is marked to be removed after deprecation cycle. > I was previously running Mesos 0.28.2 without this problem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-6202) Docker containerizer kills containers whose name starts with 'mesos-'
[ https://issues.apache.org/jira/browse/MESOS-6202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marc Villacorta updated MESOS-6202: --- Description: I run 3 docker containers in my CoreOS system whose names start with _'mesos-'_ those are: _'mesos-master'_, _'mesos-dns'_ and _'mesos-agent'_. I can start the first two without any problem but when I start the third one _('mesos-agent')_ all three containers are killed by the docker daemon. If I rename the containers to _'m3s0s-master'_, _'m3s0s-dns'_ and _'m3s0s-agent'_ everithing works. I tracked down the problem to [this|https://github.com/apache/mesos/blob/16a563aca1f226b021b8f8815c4d115a3212f02b/src/slave/containerizer/docker.cpp#L116-L120] code which is marked to be removed after deprecation cycle. I was previously running Mesos 0.28.2 without this problem. was: I run 3 docker containers in my CoreOS system whose names start with _'mesos-'_ those are: _'mesos-master'_, _'mesos-dns'_ and _'mesos-agent'_. I can start the first two without any problem but when I start the third one _('mesos-agent')_ all three containers are killed by the docker daemon. If I rename the containers to 'm3s0s-master'_, _'m3s0s-dns'_ and _'m3s0s-agent'_ everithing works. I tracked down the problem to [this|https://github.com/apache/mesos/blob/16a563aca1f226b021b8f8815c4d115a3212f02b/src/slave/containerizer/docker.cpp#L116-L120] code which is marked to be removed after deprecation cycle. I was previously running Mesos 0.28.2 without this problem. 
> Docker containerizer kills containers whose name starts with 'mesos-' > - > > Key: MESOS-6202 > URL: https://issues.apache.org/jira/browse/MESOS-6202 > Project: Mesos > Issue Type: Bug > Components: containerization, docker >Affects Versions: 1.0.1 > Environment: Dockerized > {{mesosphere/mesos-slave:1.0.1-2.0.93.ubuntu1404}} >Reporter: Marc Villacorta > > I run 3 docker containers in my CoreOS system whose names start with > _'mesos-'_ those are: _'mesos-master'_, _'mesos-dns'_ and _'mesos-agent'_. > I can start the first two without any problem but when I start the third one > _('mesos-agent')_ all three containers are killed by the docker daemon. > If I rename the containers to _'m3s0s-master'_, _'m3s0s-dns'_ and > _'m3s0s-agent'_ everithing works. > I tracked down the problem to > [this|https://github.com/apache/mesos/blob/16a563aca1f226b021b8f8815c4d115a3212f02b/src/slave/containerizer/docker.cpp#L116-L120] > code which is marked to be removed after deprecation cycle. > I was previously running Mesos 0.28.2 without this problem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-6202) Docker containerizer kills containers whose name starts with 'mesos-'
Marc Villacorta created MESOS-6202: -- Summary: Docker containerizer kills containers whose name starts with 'mesos-' Key: MESOS-6202 URL: https://issues.apache.org/jira/browse/MESOS-6202 Project: Mesos Issue Type: Bug Components: containerization, docker Affects Versions: 1.0.1 Environment: Dockerized {{mesosphere/mesos-slave:1.0.1-2.0.93.ubuntu1404}} Reporter: Marc Villacorta I run 3 docker containers in my CoreOS system whose names start with _'mesos-'_ those are: _'mesos-master'_, _'mesos-dns'_ and _'mesos-agent'_. I can start the first two without any problem but when I start the third one _('mesos-agent')_ all three containers are killed by the docker daemon. If I rename the containers to 'm3s0s-master'_, _'m3s0s-dns'_ and _'m3s0s-agent'_ everithing works. I tracked down the problem to [this|https://github.com/apache/mesos/blob/16a563aca1f226b021b8f8815c4d115a3212f02b/src/slave/containerizer/docker.cpp#L116-L120] code which is marked to be removed after deprecation cycle. I was previously running Mesos 0.28.2 without this problem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-5472) Hadoop-free S3 fetcher
Marc Villacorta created MESOS-5472: -- Summary: Hadoop-free S3 fetcher Key: MESOS-5472 URL: https://issues.apache.org/jira/browse/MESOS-5472 Project: Mesos Issue Type: Wish Components: fetcher Reporter: Marc Villacorta Priority: Minor My Mesos agents run on systems without Hadoop. I would like to fetch _S3_ URIs into my sandboxes. How about using the _'awscli'_? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2115) Improve recovering Docker containers when slave is contained
[ https://issues.apache.org/jira/browse/MESOS-2115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072769#comment-15072769 ] Marc Villacorta commented on MESOS-2115: Is there a way to define other volumes (or Docker parameters in general) to bind-mount into the container where the executor is running (the one set by _'docker_mesos_image'_)? I am trying to use: {code:none} --docker_mesos_image=mesosphere/mesos-slave:0.26.0-0.2.145.ubuntu1404 {code} ... but in CoreOS I must set some extra bind-mounts such as: {code:none} --volume /usr/bin/docker:/usr/bin/docker:ro --volume /lib64/libdevmapper.so.1.02:/lib/libdevmapper.so.1.02:ro --volume /lib64/libsystemd.so.0:/lib/libsystemd.so.0:ro --volume /lib64/libgcrypt.so.20:/lib/libgcrypt.so.20:ro {code} Which Docker image do you set in _'--docker_mesos_image'_? > Improve recovering Docker containers when slave is contained > > > Key: MESOS-2115 > URL: https://issues.apache.org/jira/browse/MESOS-2115 > Project: Mesos > Issue Type: Epic > Components: docker >Reporter: Timothy Chen >Assignee: Timothy Chen > Labels: docker > Fix For: 0.23.0 > > > Currently, when the docker containerizer is recovering, it checks the checkpointed > executor pids to determine which containers are still running, and removes the > containers listed by docker ps that it doesn't recognize. > This is problematic when the slave itself runs in a docker container, because when > the slave container dies, all the forked processes are removed as well, so the > checkpointed executor pids are no longer valid. > We have to assume the docker containers might still be running even though > the checkpointed executor pids are not. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
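The difference the MESOS-2115 description points at can be modelled in a few lines. The sketch below is a simplified illustration, not the Docker containerizer's recovery code: container IDs are plain strings, {{liveExecutors}} stands for the containers whose checkpointed executor pid is still alive, and {{RecoveryPlan}}, {{recoverByExecutorPid}} and {{recoverByContainer}} are hypothetical names.

```cpp
#include <set>
#include <string>

// Outcome of recovery: running containers to reattach to, and ones to kill.
struct RecoveryPlan {
  std::set<std::string> recover;
  std::set<std::string> kill;
};

// Pid-based rule: a running container is recovered only if it was
// checkpointed AND its checkpointed executor pid is still alive; anything
// else reported by `docker ps` is killed.
RecoveryPlan recoverByExecutorPid(const std::set<std::string>& checkpointed,
                                  const std::set<std::string>& liveExecutors,
                                  const std::set<std::string>& running) {
  RecoveryPlan plan;
  for (const std::string& id : running) {
    if (checkpointed.count(id) > 0 && liveExecutors.count(id) > 0) {
      plan.recover.insert(id);
    } else {
      plan.kill.insert(id);
    }
  }
  return plan;
}

// Container-based rule: when the agent itself runs in a container, a dead
// executor pid says nothing about the Docker container, so recover every
// checkpointed container that is still running.
RecoveryPlan recoverByContainer(const std::set<std::string>& checkpointed,
                                const std::set<std::string>& running) {
  RecoveryPlan plan;
  for (const std::string& id : running) {
    if (checkpointed.count(id) > 0) {
      plan.recover.insert(id);
    } else {
      plan.kill.insert(id);
    }
  }
  return plan;
}
```

With the agent container restarted, no executor pid survives, so the pid-based rule kills the still-running checkpointed containers "a" and "b" along with the unknown "c"; the container-based rule keeps "a" and "b" and removes only "c".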