[ https://issues.apache.org/jira/browse/MESOS-9749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16839408#comment-16839408 ]
Joseph Wu commented on MESOS-9749:
----------------------------------

The default behavior of Mesos's logging is to write to stdout/stderr. When launching via systemd, this means you are writing to journald. And if journald is restarted, the pipe between the agent and journald would be broken. These sorts of broken pipes usually terminate the agent, but it seems to be different in systemd's case.

See also: [https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=771122]

There are a variety of ways to get around this, basically involving writing logs to some other location:

---
h2. Built-in solutions

Mesos lets you write stdout/stderr to disk instead. If you specify the {{--log_dir}} flag, Mesos will leverage glog's log writing behavior, which has some form of log rotation built in. But unfortunately, this does not seem to bound the size of logs on disk, so you'd end up writing a script or such to clean up logs.

Besides that, you may modify your service file to write to something besides journald, such as syslog, or a file.
https://www.freedesktop.org/software/systemd/man/systemd.exec.html#Logging%20and%20Standard%20Input/Output

h2. Other solutions

By the looks of your agent configuration, you are not averse to deploying modules ({{--modules='file:///etc/mesos-chef/slave-modules.json'}}). In this case, you have some other options.

DC/OS uses a {{LogSink}} module (which is a Mesos Anonymous module implementing a glog module) to pipe logs to file, which are then rotated by another timer.
https://github.com/dcos/dcos-mesos-modules/tree/master/logsink

If the goal is to get logs into journald, across journald restarts, this is also possible with a {{LogSink}}. This would entail using the journald C API, like {{sd_journal_send}}. I believe this is capable of reconnecting after journald restarts.
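
To illustrate why going through the journald API rather than an inherited stdout/stderr pipe can survive a journald restart, here is a minimal sketch in Python. It is not the {{LogSink}} module and does not call {{sd_journal_send}}; it imitates what that call is understood to do, namely send each entry as a datagram to journald's native-protocol socket. The socket path below is the usual default and is an assumption about the host. The helper names ({{encode_entry}}, {{send_entry}}) and the {{mesos-agent-sketch}} identifier are made up for this example. Entries too large for a single datagram are not handled.

```python
import socket
import struct

# Usual default path of journald's native-protocol datagram socket (assumed here).
JOURNAL_SOCKET = "/run/systemd/journal/socket"


def encode_entry(fields):
    """Serialize a dict of journal fields in the native journal wire format."""
    out = bytearray()
    for key, value in fields.items():
        name = key.encode("ascii")
        data = value if isinstance(value, bytes) else str(value).encode("utf-8")
        if b"\n" in data:
            # Values containing newlines use the length-prefixed form:
            # NAME, newline, 64-bit little-endian size, data, newline.
            out += name + b"\n" + struct.pack("<Q", len(data)) + data + b"\n"
        else:
            # Simple values are written as NAME=value followed by a newline.
            out += name + b"=" + data + b"\n"
    return bytes(out)


def send_entry(fields, path=JOURNAL_SOCKET):
    """Send one entry; return False instead of raising if the socket is unreachable."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    try:
        # Every datagram is addressed on its own, so there is no long-lived
        # pipe for a journald restart to break.
        sock.sendto(encode_entry(fields), path)
        return True
    except OSError:
        return False
    finally:
        sock.close()


if __name__ == "__main__":
    send_entry({
        "MESSAGE": "agent log line",
        "PRIORITY": 6,  # syslog "info" level
        "SYSLOG_IDENTIFIER": "mesos-agent-sketch",  # hypothetical identifier
    })
```

A real {{LogSink}} would do the equivalent from C++ inside its glog sink callback. The point of the sketch is only that a failed send affects a single message and the next one can succeed once journald is back.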
https://www.freedesktop.org/software/systemd/man/sd_journal_print.html

> mesos agent logging hangs upon systemd-journald restart
> -------------------------------------------------------
>
>                 Key: MESOS-9749
>                 URL: https://issues.apache.org/jira/browse/MESOS-9749
>             Project: Mesos
>          Issue Type: Bug
>    Affects Versions: 1.7.2
>         Environment: Running on centos 7.4.1708, systemd 219 (probably heavily patched by centos)
> mesos-agent command:
> {code}
> /usr/sbin/mesos-slave \
>  --attributes='canary:canary-false;maintenance_group:group-6;network:10g;platform:centos;platform_major_version:7;rack_name:22.05;type:base;version:v2018-q-1' \
>  --cgroups_enable_cfs \
>  --cgroups_hierarchy='/sys/fs/cgroup' \
>  --cgroups_net_cls_primary_handle='0xC370' \
>  --container_logger='org_apache_mesos_LogrotateContainerLogger' \
>  --containerizers='mesos' \
>  --credential='file:///etc/mesos-chef/slave-credential' \
>  --default_container_info='\{"type":"MESOS","volumes":[{"host_path":"tmp","container_path":"/tmp","mode":"RW"},\{"host_path":"var_tmp","container_path":"/var/tmp","mode":"RW"},\{"host_path":".","container_path":"/mnt/mesos/sandbox","mode":"RW"},\{"host_path":"/usr/share/mesos/geoip","container_path":"/mnt/mesos/geoip","mode":"RO"}]}' \
>  --docker_registry='https://filer-docker-registry.prod.crto.in/' \
>  --docker_store_dir='/var/opt/mesos/store/docker' \
>  --enforce_container_disk_quota \
>  --executor_environment_variables='\{"PATH":"/bin:/usr/bin","CRITEO_DC":"par","CRITEO_ENV":"prod","CRITEO_GEOIP_PATH":"/mnt/mesos/geoip"}' \
>  --executor_registration_timeout='5mins' \
>  --fetcher_cache_dir='/var/opt/mesos/cache' \
>  --fetcher_cache_size='2GB' \
>  --hooks='com_criteo_mesos_CommandHook' \
>  --image_providers='docker' \
>  --image_provisioner_backend='copy' \
>  --isolation='linux/capabilities,cgroups/cpu,cgroups/mem,cgroups/net_cls,namespaces/pid,filesystem/linux,docker/runtime,network/cni,disk/xfs,com_criteo_mesos_CommandIsolator' \
>  --logging_level='INFO' \
>  --master='zk://mesos:xx...@mesos-master01-par.central.criteo.prod:2181,mesos-master02-par.central.criteo.prod:2181,mesos-master03-par.central.criteo.prod:2181/mesos' \
>  --modules='file:///etc/mesos-chef/slave-modules.json' \
>  --port=5051 \
>  --recover='reconnect' \
>  --resources='file:///etc/mesos-chef/custom_resources.json' \
>  --strict \
>  --work_dir='/var/opt/mesos' \
>  --xfs_kill_containers \
>  --xfs_project_range='[5000-500000]'
> {code}
>            Reporter: Gregoire Seux
>            Priority: Minor
>              Labels: foundations
>
> When mesos agent is launched through systemd, a restart of the systemd-journald
> service makes mesos agent logging hang (no more output). The process itself
> seems to work fine (we can query state via http, for instance).
> A restart of mesos-agent corrects the issue.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)