Martin Kleppmann created SAMZA-310:
--------------------------------------

             Summary: Publish container logs to a SystemStream
                 Key: SAMZA-310
                 URL: https://issues.apache.org/jira/browse/SAMZA-310
             Project: Samza
          Issue Type: New Feature
          Components: container
    Affects Versions: 0.7.0
            Reporter: Martin Kleppmann


At the moment, it's a bit awkward to get to a Samza job's logs: assuming you're 
running on YARN, you have to navigate around the YARN web interface, and you 
can only see one container's logs at a time.

Given that Samza is all about streams, it would make sense for the logs 
generated by Samza jobs to also be sent to a stream. There, they could be 
indexed with [Kibana|http://www.elasticsearch.org/overview/kibana/], consumed 
by an exception-tracking system, etc.

Notes:

- The serde for encoding logs into a suitable wire format should be pluggable. 
There can be a default implementation that uses JSON, analogous to 
MetricsSnapshotSerdeFactory for metrics, but organisations that already have a 
standardised in-house encoding for logs should be able to use it.
- Should this be at the level of Slf4j or Log4j? Currently the log 
configuration for YARN jobs uses Log4j, which has the advantage that output 
from frameworks/libraries that use Log4j but not Slf4j also appears in the 
logs. However, Samza itself currently depends only on Slf4j. If we tie this 
feature to Log4j, it would somewhat defeat the purpose of using Slf4j.
- Do we need to consider partitioning? Perhaps we can use the container name as 
the partitioning key, so that the ordering of logs from each container is 
preserved.
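
To make the first and last notes more concrete, here is a rough sketch in plain Java. It is only an illustration under assumptions: LogEvent, LogEventSerde, JsonLogEventSerde and partitionFor are hypothetical names, not existing Samza classes, and the JSON field names are invented. It shows a pluggable encoder with a JSON default, plus a partition choice derived from the container name so that all logs from one container land in the same partition.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: none of these types exist in Samza today.
final class ContainerLogSketch {

    /** One log line emitted by a container (assumed shape). */
    static final class LogEvent {
        final String containerName;
        final long timestampMs;
        final String level;
        final String message;

        LogEvent(String containerName, long timestampMs, String level, String message) {
            this.containerName = containerName;
            this.timestampMs = timestampMs;
            this.level = level;
            this.message = message;
        }
    }

    /** Pluggable encoder: an organisation could swap in its own wire format. */
    interface LogEventSerde {
        byte[] toBytes(LogEvent event);
    }

    /** Default encoder that writes each event as a flat JSON object. */
    static final class JsonLogEventSerde implements LogEventSerde {
        @Override
        public byte[] toBytes(LogEvent e) {
            String json = "{\"container\":" + quote(e.containerName)
                    + ",\"timestamp\":" + e.timestampMs
                    + ",\"level\":" + quote(e.level)
                    + ",\"message\":" + quote(e.message) + "}";
            return json.getBytes(StandardCharsets.UTF_8);
        }

        /** Wraps a string in double quotes, escaping characters JSON requires. */
        static String quote(String s) {
            StringBuilder sb = new StringBuilder("\"");
            for (int i = 0; i < s.length(); i++) {
                char c = s.charAt(i);
                switch (c) {
                    case '"':  sb.append("\\\""); break;
                    case '\\': sb.append("\\\\"); break;
                    case '\n': sb.append("\\n"); break;
                    case '\r': sb.append("\\r"); break;
                    case '\t': sb.append("\\t"); break;
                    default:
                        if (c < 0x20) {
                            // Remaining control characters need a numeric escape.
                            sb.append(String.format("\\u%04x", (int) c));
                        } else {
                            sb.append(c);
                        }
                }
            }
            return sb.append('"').toString();
        }
    }

    /**
     * Picks a partition from the container name. The same name always maps to
     * the same partition, so per-container ordering is kept within it.
     */
    static int partitionFor(String containerName, int numPartitions) {
        return Math.floorMod(containerName.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        LogEventSerde serde = new JsonLogEventSerde();
        LogEvent event = new LogEvent("container-0", 1400000000000L, "ERROR", "Task failed");
        System.out.println(new String(serde.toBytes(event), StandardCharsets.UTF_8));
        System.out.println("partition = " + partitionFor(event.containerName, 8));
    }
}
```

Hashing the container name is just one possible choice. Several containers can share a partition, which still preserves the order of each container's own messages.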



--
This message was sent by Atlassian JIRA
(v6.2#6252)
