[ 
https://issues.apache.org/jira/browse/SAMZA-310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14079416#comment-14079416
 ] 

Yan Fang commented on SAMZA-310:
--------------------------------

I spent some time looking at KafkaLog4jAppender.scala. I think 
KafkaLog4jAppender.scala works fine for our use case if we do not consider GC 
logs. A few changes we need to make: 
1) add partition information when sending messages to broker in 
KafkaLog4jAppender.scala
2) create the topic beforehand with an appropriate number of partitions
3) tag the log with container information (for partitioning purposes)
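
A minimal sketch of what 1) and 3) could look like, assuming the container name is used as the partitioning key (as suggested in the issue description). The class ContainerLogPartitioner and its methods are hypothetical names for illustration only; they are not part of Samza or Kafka, and the actual change in KafkaLog4jAppender.scala may look quite different. Topic creation (2) is not shown.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/**
 * Hypothetical helper (not part of Samza or Kafka): picks a partition for a
 * log message based on the container name, and tags the message with it.
 */
final class ContainerLogPartitioner {
    private ContainerLogPartitioner() {}

    /**
     * Maps a container name to a partition in [0, numPartitions).
     * All logs from one container land in the same partition, so their
     * relative order is preserved within that partition.
     */
    static int partitionFor(String containerName, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        // Assumption: a simple hash of the UTF-8 bytes; any stable hash would do.
        int hash = Arrays.hashCode(containerName.getBytes(StandardCharsets.UTF_8));
        // floorMod keeps the result non-negative even for negative hashes.
        return Math.floorMod(hash, numPartitions);
    }

    /** Prefixes a log line with the container name so consumers can filter on it. */
    static String tag(String containerName, String logLine) {
        return "[" + containerName + "] " + logLine;
    }
}
```

With something like this, the appender would call partitionFor(containerName, n) when building each message, where n has to match the partition count the topic was created with.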

We could make it optional. It would be enabled by changing log4j.xml and the 
properties file. One thing that bugs me is that the log4j topic name is defined 
in log4j.xml, so Samza does not seem to have that information beforehand and 
cannot create the topic in, say, the AM, unless it reads log4j.xml.

One benefit is that it could easily fit into ELK with 
[logstash-kafka|https://github.com/joekiller/logstash-kafka]

> Publish container logs to a SystemStream
> ----------------------------------------
>
>                 Key: SAMZA-310
>                 URL: https://issues.apache.org/jira/browse/SAMZA-310
>             Project: Samza
>          Issue Type: New Feature
>          Components: container
>    Affects Versions: 0.7.0
>            Reporter: Martin Kleppmann
>
> At the moment, it's a bit awkward to get to a Samza job's logs: assuming 
> you're running on YARN, you have to navigate around the YARN web interface, 
> and you can only see one container's logs at a time.
> Given that Samza is all about streams, it would make sense for the logs 
> generated by Samza jobs to also be sent to a stream. There, they could be 
> indexed with [Kibana|http://www.elasticsearch.org/overview/kibana/], consumed 
> by an exception-tracking system, etc.
> Notes:
> - The serde for encoding logs into a suitable wire format should be 
> pluggable. There can be a default implementation that uses JSON, analogous to 
> MetricsSnapshotSerdeFactory for metrics, but organisations that already have 
> a standardised in-house encoding for logs should be able to use it.
> - Should this be at the level of Slf4j or Log4j? Currently the log 
> configuration for YARN jobs uses Log4j, which has the advantage that any 
> frameworks/libraries that use Log4j but not Slf4j appear in the logs. 
> However, Samza itself currently only depends on Slf4j. If we tie this feature 
> to Log4j, it would somewhat defeat the purpose of using Slf4j.
> - Do we need to consider partitioning? Perhaps we can use the container name 
> as partitioning key, so that the ordering of logs from each container is 
> preserved.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
