[ https://issues.apache.org/jira/browse/SAMZA-310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081745#comment-14081745 ]
Yan Fang commented on SAMZA-310:
--------------------------------
Thanks for pointing out the MDC.
{quote}
Ideally, I'd like to have this work without depending on log4j in samza-core.
{quote}
slf4j does have an [MDC|http://www.slf4j.org/api/org/slf4j/MDC.html], but
grizzled.slf4j does not expose it.
{quote}
The only injection point that I can think of right now to manage things like
setting the MDC to update the taskName when we process a message is via the
TaskLifecycleListener.
{quote}
Can we just set up the MDC when the container starts, instead of via
TaskLifecycleListener? The goal of assigning the AM/container ID information
is to have a key for the logs, and that information is already available when
the container starts, so it can be set once at that point.
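A minimal sketch of setting the logging context once at container start, assuming the container processes messages on the same thread that set it. In real code the calls would be `org.slf4j.MDC.put(key, value)` plus a layout pattern such as `%X{containerName}`; here a tiny `ThreadLocal` map called `SimpleMdc` stands in so the sketch runs without slf4j on the classpath. `SimpleMdc`, `ContainerLogging`, and the `containerName`/`amHost` keys are hypothetical names, not Samza APIs.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for org.slf4j.MDC so the sketch runs without slf4j.
// Like the real MDC, values are stored per thread.
final class SimpleMdc {
    private static final ThreadLocal<Map<String, String>> CONTEXT =
        ThreadLocal.withInitial(HashMap::new);

    static void put(String key, String value) { CONTEXT.get().put(key, value); }

    static String get(String key) { return CONTEXT.get().get(key); }
}

final class ContainerLogging {
    // Called once when the container starts. The container name and AM host
    // (hypothetical keys) do not change for the container's lifetime, so no
    // per-message hook is needed to set them.
    static void initContainerContext(String containerName, String amHost) {
        SimpleMdc.put("containerName", containerName);
        SimpleMdc.put("amHost", amHost);
    }

    // Mimics a layout pattern like "[%X{containerName}@%X{amHost}] %m":
    // every line logged on this thread carries the keys set at startup.
    static String format(String message) {
        return "[" + SimpleMdc.get("containerName") + "@"
            + SimpleMdc.get("amHost") + "] " + message;
    }
}
```

A per-message key such as the task name would still need updating as each message is processed. Also, because MDC values are per thread, threads the container spawns only see these keys if the logging backend propagates the MDC to child threads, which varies by backend.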
> Publish container logs to a SystemStream
> ----------------------------------------
>
> Key: SAMZA-310
> URL: https://issues.apache.org/jira/browse/SAMZA-310
> Project: Samza
> Issue Type: New Feature
> Components: container
> Affects Versions: 0.7.0
> Reporter: Martin Kleppmann
>
> At the moment, it's a bit awkward to get to a Samza job's logs: assuming
> you're running on YARN, you have to navigate around the YARN web interface,
> and you can only see one container's logs at a time.
> Given that Samza is all about streams, it would make sense for the logs
> generated by Samza jobs to also be sent to a stream. There, they could be
> indexed with [Kibana|http://www.elasticsearch.org/overview/kibana/], consumed
> by an exception-tracking system, etc.
> Notes:
> - The serde for encoding logs into a suitable wire format should be
> pluggable. There can be a default implementation that uses JSON, analogous to
> MetricsSnapshotSerdeFactory for metrics, but organisations that already have
> a standardised in-house encoding for logs should be able to use it.
> - Should this be at the level of Slf4j or Log4j? Currently the log
> configuration for YARN jobs uses Log4j, which has the advantage that any
> frameworks/libraries that use Log4j but not Slf4j appear in the logs.
> However, Samza itself currently only depends on Slf4j. If we tie this feature
> to Log4j, it would somewhat defeat the purpose of using Slf4j.
> - Do we need to consider partitioning? Perhaps we can use the container name
> as partitioning key, so that the ordering of logs from each container is
> preserved.
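The pluggable-serde note could be sketched as a small serializer interface with a JSON default, assuming a log event carries a timestamp, level, container name, logger name, and message. `LogEvent`, `LogEventSerializer`, and `JsonLogEventSerializer` are hypothetical names; the `toBytes` method is only loosely modeled on the serializer half of Samza's `Serde`, and the JSON is built by hand solely to keep the sketch dependency-free.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical log event; field names are illustrative, not a Samza API.
final class LogEvent {
    final long timestampMs;
    final String level;
    final String containerName;
    final String loggerName;
    final String message;

    LogEvent(long timestampMs, String level, String containerName,
             String loggerName, String message) {
        this.timestampMs = timestampMs;
        this.level = level;
        this.containerName = containerName;
        this.loggerName = loggerName;
        this.message = message;
    }
}

// Hypothetical pluggable encoding: an organisation with an in-house log
// format would supply its own implementation of this interface.
interface LogEventSerializer {
    byte[] toBytes(LogEvent event);
}

// Default JSON encoding, written by hand to avoid a JSON library dependency.
final class JsonLogEventSerializer implements LogEventSerializer {
    @Override
    public byte[] toBytes(LogEvent e) {
        String json = "{\"timestamp\":" + e.timestampMs
            + ",\"level\":" + quote(e.level)
            + ",\"container\":" + quote(e.containerName)
            + ",\"logger\":" + quote(e.loggerName)
            + ",\"message\":" + quote(e.message) + "}";
        return json.getBytes(StandardCharsets.UTF_8);
    }

    // Encodes a string as a JSON string literal, escaping quotes,
    // backslashes, and control characters; null becomes JSON null.
    static String quote(String s) {
        if (s == null) return "null";
        StringBuilder sb = new StringBuilder("\"");
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.append('"').toString();
    }
}
```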
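For the partitioning question, a sketch of keying log lines by container name, assuming a fixed number of partitions. `LogPartitioner` is hypothetical, and hashing with `String.hashCode()` is only an illustration: the real mapping would be decided by the partitioner of whichever system hosts the log stream.

```java
import java.util.ArrayList;
import java.util.List;

final class LogPartitioner {
    // Maps a container name to a partition. String.hashCode() is specified in
    // the Java API docs, so the mapping is stable across JVMs; floorMod keeps
    // the result non-negative even when the hash is negative.
    static int partitionFor(String containerName, int numPartitions) {
        return Math.floorMod(containerName.hashCode(), numPartitions);
    }

    // Simulates appending log lines to per-partition logs. Every line from a
    // given container lands in the same partition, and partitions are
    // append-only, so each container's lines keep their relative order.
    static List<List<String>> route(List<String[]> lines, int numPartitions) {
        List<List<String>> partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) partitions.add(new ArrayList<>());
        for (String[] line : lines) { // line[0] = container name, line[1] = message
            partitions.get(partitionFor(line[0], numPartitions))
                .add(line[0] + ": " + line[1]);
        }
        return partitions;
    }
}
```

Ordering is only preserved per container: lines from different containers that share a partition interleave, and lines in different partitions have no relative order.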