[https://issues.apache.org/jira/browse/SAMZA-310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180000#comment-14180000]
Chris Riccomini commented on SAMZA-310:
---------------------------------------
bq. Fine with me, as long as we don't mind breaking compatibility in future
when we release a revamped log appender.
Yeah, I was thinking about this more last night. I think the compatibility
issue is going to be a problem for us. I wouldn't feel comfortable rolling this
out knowing that we'll have to modify everyone's log4j files in a few months to
update their appender. It seems better to "do the right thing" here, even
though that means changing the appender pretty significantly.
bq. for the duration of the call to SystemProducer.send, set a thread-local
variable which allows you to detect if it's being called recursively.
This is a great solution. I'm slightly worried about introducing a thread-local
variable in the logging path; we've been bitten badly by ThreadLocal
performance in Avro in the past. That concern isn't enough for me to say that
we shouldn't do it, though. The solution is quite elegant.
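A minimal sketch of the thread-local recursion guard, assuming an appender that forwards each event to a producer. `GuardedAppender` is a hypothetical name, and the `Consumer<String>` is a stand-in for `SystemProducer.send`, not the real Samza API. The per-event cost is one `ThreadLocal.get()` plus two `set()` calls, which is the overhead the performance concern above is about.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

public class RecursionGuardDemo {

    /** Hypothetical appender that forwards log events to a producer. */
    static final class GuardedAppender {
        // One flag per thread: true while that thread is inside the producer's send.
        private static final ThreadLocal<Boolean> IN_SEND =
                ThreadLocal.withInitial(() -> Boolean.FALSE);

        private final Consumer<String> producer; // stand-in for SystemProducer.send
        private final AtomicInteger dropped = new AtomicInteger();

        GuardedAppender(Consumer<String> producer) {
            this.producer = producer;
        }

        void append(String message) {
            if (IN_SEND.get()) {
                // Re-entered from inside the producer (e.g. the producer logged
                // something): drop the event instead of recursing forever.
                dropped.incrementAndGet();
                return;
            }
            IN_SEND.set(Boolean.TRUE);
            try {
                producer.accept(message);
            } finally {
                IN_SEND.set(Boolean.FALSE); // always clear, even if send throws
            }
        }

        int droppedCount() {
            return dropped.get();
        }
    }

    public static void main(String[] args) {
        List<String> sent = new ArrayList<>();
        GuardedAppender[] holder = new GuardedAppender[1];
        // A producer that logs on every send; without the guard this would loop.
        holder[0] = new GuardedAppender(msg -> {
            holder[0].append("producer debug: sending " + msg);
            sent.add(msg);
        });
        holder[0].append("hello");
        holder[0].append("world");
        System.out.println(sent);                     // [hello, world]
        System.out.println(holder[0].droppedCount()); // 2
    }
}
```

Dropping the re-entrant event is one possible policy; an implementation could instead write it to a local fallback appender so the producer's own log lines aren't lost.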
So it sounds like we'd need to:
# Move the appender to samza-log4j.
# Read the config from the environment variable.
# Convert the appender to use SystemProducer instead of Kafka's Producer.
# Add a thread-local check in append() to prevent infinite recursion.
# Write some tests.
# Update the docs.
[~closeuris], I'm sorry about all the churn here. :( If you're burnt out on
this ticket, just say the word.
> Publish container logs to a SystemStream
> ----------------------------------------
>
> Key: SAMZA-310
> URL: https://issues.apache.org/jira/browse/SAMZA-310
> Project: Samza
> Issue Type: New Feature
> Components: container
> Affects Versions: 0.7.0
> Reporter: Martin Kleppmann
> Assignee: Yan Fang
> Fix For: 0.8.0
>
> Attachments: SAMZA-310.patch
>
>
> At the moment, it's a bit awkward to get to a Samza job's logs: assuming
> you're running on YARN, you have to navigate around the YARN web interface,
> and you can only see one container's logs at a time.
> Given that Samza is all about streams, it would make sense for the logs
> generated by Samza jobs to also be sent to a stream. There, they could be
> indexed with [Kibana|http://www.elasticsearch.org/overview/kibana/], consumed
> by an exception-tracking system, etc.
> Notes:
> - The serde for encoding logs into a suitable wire format should be
> pluggable. There can be a default implementation that uses JSON, analogous to
> MetricsSnapshotSerdeFactory for metrics, but organisations that already have
> a standardised in-house encoding for logs should be able to use it.
> - Should this be at the level of Slf4j or Log4j? Currently the log
> configuration for YARN jobs uses Log4j, which has the advantage that any
> frameworks/libraries that use Log4j but not Slf4j appear in the logs.
> However, Samza itself currently only depends on Slf4j. If we tie this feature
> to Log4j, it would somewhat defeat the purpose of using Slf4j.
> - Do we need to consider partitioning? Perhaps we can use the container name
> as the partitioning key, so that the ordering of logs from each container is
> preserved.
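A rough sketch of two of the notes above: a pluggable encoder with a JSON default, and partitioning by container name. `LogEventEncoder`, `JsonLogEventEncoder` and `partitionFor` are hypothetical names, not Samza APIs. Hashing with `String.hashCode` is a simple stand-in; a real system would use whatever partitioner its producer applies to the key.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class LogStreamSketch {

    /** Hypothetical pluggable encoder; an in-house format would implement this. */
    interface LogEventEncoder {
        byte[] encode(Map<String, String> event);
    }

    /** Minimal JSON default: a flat object of string fields with basic escaping. */
    static final class JsonLogEventEncoder implements LogEventEncoder {
        @Override
        public byte[] encode(Map<String, String> event) {
            StringBuilder sb = new StringBuilder("{");
            boolean first = true;
            for (Map.Entry<String, String> e : event.entrySet()) {
                if (!first) sb.append(',');
                first = false;
                sb.append('"').append(escape(e.getKey())).append("\":\"")
                  .append(escape(e.getValue())).append('"');
            }
            return sb.append('}').toString().getBytes(StandardCharsets.UTF_8);
        }

        private static String escape(String s) {
            StringBuilder out = new StringBuilder();
            for (char c : s.toCharArray()) {
                switch (c) {
                    case '"':  out.append("\\\""); break;
                    case '\\': out.append("\\\\"); break;
                    case '\n': out.append("\\n"); break;
                    case '\r': out.append("\\r"); break;
                    case '\t': out.append("\\t"); break;
                    default:
                        if (c < 0x20) out.append(String.format("\\u%04x", (int) c));
                        else out.append(c);
                }
            }
            return out.toString();
        }
    }

    /** Same container name always maps to the same partition, preserving its log order. */
    static int partitionFor(String containerName, int numPartitions) {
        return Math.floorMod(containerName.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        Map<String, String> event = new LinkedHashMap<>();
        event.put("container", "samza-container-0"); // example value, assumed
        event.put("level", "INFO");
        event.put("message", "Started \"task\"");
        LogEventEncoder encoder = new JsonLogEventEncoder();
        System.out.println(new String(encoder.encode(event), StandardCharsets.UTF_8));
        System.out.println(partitionFor("samza-container-0", 8));
    }
}
```

Keying on the container name gives per-container ordering, but it can skew load if one container logs far more than the others.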