[ 
https://issues.apache.org/jira/browse/MESOS-9749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16839408#comment-16839408
 ] 

Joseph Wu commented on MESOS-9749:
----------------------------------

By default, Mesos writes its logs to stdout/stderr. When the agent is launched 
via systemd, that output goes to journald. If journald is restarted, the pipe 
between the agent and journald is broken. A broken pipe like this would 
usually terminate the agent, but the behavior seems to be different in 
systemd's case.
 See also: [https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=771122]
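
To illustrate the generic broken-pipe behavior described above (not the systemd-specific case), here is a minimal Python sketch, assuming a POSIX system. The function name is made up for illustration. With SIGPIPE ignored, a write to a pipe with no reader fails with EPIPE; with the default SIGPIPE disposition, the same write would terminate the process.

```python
import errno
import os
import signal


def write_to_broken_pipe():
    """Write to a pipe whose reader is gone; return the resulting errno."""
    # Ignore SIGPIPE so the failed write surfaces as an error instead of
    # killing this process (the default disposition would terminate it).
    previous = signal.signal(signal.SIGPIPE, signal.SIG_IGN)
    read_fd, write_fd = os.pipe()
    os.close(read_fd)  # stands in for the log reader going away
    try:
        os.write(write_fd, b"log line\n")
        return None
    except OSError as exc:
        return exc.errno
    finally:
        os.close(write_fd)
        # Restore the earlier disposition if Python knows what it was.
        if previous is not None:
            signal.signal(signal.SIGPIPE, previous)
```

A caller can compare the return value against {{errno.EPIPE}} to confirm that the write failed because the other end of the pipe was closed.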

There are several ways to work around this, all of which involve writing 
logs to some other location:

---
 
h2. Built-in solutions

Mesos can write its logs to disk instead.  If you specify the 
{{--log_dir}} flag, Mesos uses glog's file logging, which has some form of 
log rotation built in.  Unfortunately, this does not seem to bound the total 
size of logs on disk, so you would still need a script or similar to clean up 
old logs.
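
As a rough sketch of the kind of cleanup script mentioned above, the following Python function deletes the oldest files in a log directory once their combined size exceeds a budget. The function name, the {{mesos-}} file name prefix, and the size budget are assumptions made for illustration, not something Mesos or glog provides; check which file names actually appear in your {{--log_dir}} before using anything like this.

```python
from pathlib import Path


def prune_logs(log_dir, max_total_bytes, prefix="mesos-"):
    """Delete the oldest log files until the directory fits the budget.

    Returns the list of paths that were removed.
    """
    # Skip symlinks so that any links pointing at the current log file
    # are neither counted nor deleted.
    files = [
        p for p in Path(log_dir).iterdir()
        if p.is_file() and not p.is_symlink() and p.name.startswith(prefix)
    ]
    # Newest first, so the most recently written files are kept.
    files.sort(key=lambda p: p.stat().st_mtime, reverse=True)

    removed = []
    total = 0
    for index, path in enumerate(files):
        total += path.stat().st_size
        # Never delete the newest file; it is probably still being written.
        if index > 0 and total > max_total_bytes:
            path.unlink()
            removed.append(path)
    return removed
```

Something like this could be run periodically (for example from cron) against the directory passed to {{--log_dir}}, with a budget chosen to fit the disk.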

Alternatively, you can modify your systemd service file to send stdout/stderr 
somewhere other than journald, such as syslog or a file.
https://www.freedesktop.org/software/systemd/man/systemd.exec.html#Logging%20and%20Standard%20Input/Output

h2. Other solutions

By the looks of your agent configuration, you are not averse to deploying 
modules ({{--modules='file:///etc/mesos-chef/slave-modules.json'}}).  In this 
case, you have some other options.

DC/OS uses a {{LogSink}} module (a Mesos Anonymous module that implements a 
glog {{LogSink}}) to pipe logs to a file, which is then rotated by a separate 
timer.
https://github.com/dcos/dcos-mesos-modules/tree/master/logsink

If the goal is to keep getting logs into journald across journald restarts, 
this is also possible with a {{LogSink}}.  The sink would need to call the 
journald C API, e.g. {{sd_journal_send}}.  I believe that API is capable of 
reconnecting after journald restarts.
https://www.freedesktop.org/software/systemd/man/sd_journal_print.html

> mesos agent logging hangs upon systemd-journald restart
> -------------------------------------------------------
>
>                 Key: MESOS-9749
>                 URL: https://issues.apache.org/jira/browse/MESOS-9749
>             Project: Mesos
>          Issue Type: Bug
>    Affects Versions: 1.7.2
>         Environment: Running on centos 7.4.1708, systemd  219 (probably 
> heavily patched by centos)
> mesos-agent command:
> {code}
> /usr/sbin/mesos-slave \
>  --attributes='canary:canary-false;maintenance_group:group-6;network:10g;platform:centos;platform_major_version:7;rack_name:22.05;type:base;version:v2018-q-1' \
>  --cgroups_enable_cfs \
>  --cgroups_hierarchy='/sys/fs/cgroup' \
>  --cgroups_net_cls_primary_handle='0xC370' \
>  --container_logger='org_apache_mesos_LogrotateContainerLogger' \
>  --containerizers='mesos' \
>  --credential='file:///etc/mesos-chef/slave-credential' \
>  --default_container_info='{"type":"MESOS","volumes":[{"host_path":"tmp","container_path":"/tmp","mode":"RW"},{"host_path":"var_tmp","container_path":"/var/tmp","mode":"RW"},{"host_path":".","container_path":"/mnt/mesos/sandbox","mode":"RW"},{"host_path":"/usr/share/mesos/geoip","container_path":"/mnt/mesos/geoip","mode":"RO"}]}' \
>  --docker_registry='https://filer-docker-registry.prod.crto.in/' \
>  --docker_store_dir='/var/opt/mesos/store/docker' \
>  --enforce_container_disk_quota \
>  --executor_environment_variables='{"PATH":"/bin:/usr/bin","CRITEO_DC":"par","CRITEO_ENV":"prod","CRITEO_GEOIP_PATH":"/mnt/mesos/geoip"}' \
>  --executor_registration_timeout='5mins' \
>  --fetcher_cache_dir='/var/opt/mesos/cache' \
>  --fetcher_cache_size='2GB' \
>  --hooks='com_criteo_mesos_CommandHook' \
>  --image_providers='docker' \
>  --image_provisioner_backend='copy' \
>  --isolation='linux/capabilities,cgroups/cpu,cgroups/mem,cgroups/net_cls,namespaces/pid,filesystem/linux,docker/runtime,network/cni,disk/xfs,com_criteo_mesos_CommandIsolator' \
>  --logging_level='INFO' \
>  --master='zk://mesos:xx...@mesos-master01-par.central.criteo.prod:2181,mesos-master02-par.central.criteo.prod:2181,mesos-master03-par.central.criteo.prod:2181/mesos' \
>  --modules='file:///etc/mesos-chef/slave-modules.json' \
>  --port=5051 \
>  --recover='reconnect' \
>  --resources='file:///etc/mesos-chef/custom_resources.json' \
>  --strict \
>  --work_dir='/var/opt/mesos' \
>  --xfs_kill_containers \
>  --xfs_project_range='[5000-500000]'
> {code}
>            Reporter: Gregoire Seux
>            Priority: Minor
>              Labels: foundations
>
> When mesos agent is launched through systemd, a restart of systemd-journald 
> service makes mesos agent logging hang (no more output). The process itself 
> seems to work fine (we can query state via http for instance).
> A restart of mesos-agent corrects the issue.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
