[ https://issues.apache.org/jira/browse/KAFKA-7510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16657994#comment-16657994 ]
John Roesler commented on KAFKA-7510:
-------------------------------------

[~MrKafka], that's a fair justification. I was thinking of more general "keys" like db primary keys. My personal opinion is that it would be a design error to use PII (such as email) as a db key. But it occurs to me now that because of the way that stream transformations work, arbitrary fields may need to become a kafka record key during the computation.

I would be in favor of banning keys and values (and headers as well) from our logs by default.

I agree with [~mjsax] and [~ewencp]: consistency is key here. It seems like this request should be re-scoped to cover the entire project if [~MrKafka] is to be able to trust we won't leak PII into the logs. I'd be in favor of the implementer writing a KIP to this effect so that the community can discuss the issue holistically.

I'm not sure I agree with banning data fields in the logs but allowing them at DEBUG or TRACE level. It's a personal opinion, but it seems too hard to verify proper handling with exceptions like this, especially over time. It also seems like it would be hard for operators to consider any logs "clean", knowing that some logs can contain PII. But these are exactly the kinds of concerns that can be hashed out in a KIP.

Just a final opinionated note: I would personally never consider logs of any kind as "clean" of protected information. History is littered with examples of apps accidentally leaking protected information to logs. I'm not opposed to making a solid effort so Kafka isn't responsible for such a leak, but my advice to any operator would be to access-control their logs the way they access-control their data.
> KStreams RecordCollectorImpl leaks data to logs on error
> --------------------------------------------------------
>
>                 Key: KAFKA-7510
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7510
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>            Reporter: Mr Kafka
>            Priority: Major
>              Labels: user-experience
>
> org.apache.kafka.streams.processor.internals.RecordCollectorImpl leaks data
> on error as it dumps the *value* / message payload to the logs.
> This is problematic as it may write personally identifiable information
> (PII) or other secret information to plain-text log files, which can then be
> propagated to other log systems, e.g. Splunk.
> I suggest the *key* and *value* fields be moved to debug level, as they are
> useful to some people, while error level contains the *errorMessage,
> timestamp, topic* and *stackTrace*.
> {code:java}
> private <K, V> void recordSendError(
>     final K key,
>     final V value,
>     final Long timestamp,
>     final String topic,
>     final Exception exception
> ) {
>     String errorLogMessage = LOG_MESSAGE;
>     String errorMessage = EXCEPTION_MESSAGE;
>     if (exception instanceof RetriableException) {
>         errorLogMessage += PARAMETER_HINT;
>         errorMessage += PARAMETER_HINT;
>     }
>     log.error(errorLogMessage, key, value, timestamp, topic, exception.toString());
>     sendException = new StreamsException(
>         String.format(
>             errorMessage,
>             logPrefix,
>             "an error caught",
>             key,
>             value,
>             timestamp,
>             topic,
>             exception.toString()
>         ),
>         exception);
> }
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
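For illustration, here is a minimal, hypothetical sketch (not the actual Kafka Streams code; the class and method names are made up) of the split the reporter proposes: the ERROR-level line carries only non-sensitive fields (topic, timestamp, exception), while the potentially sensitive key/value payload is formatted separately for a DEBUG-level line.

{code:java}
// Hypothetical sketch of the proposed logging split; names are invented.
public class RecordSendErrorSketch {

    // ERROR-level message: safe to ship to aggregators like Splunk,
    // since it omits the record key and value.
    static String errorLine(String topic, long timestamp, Exception exception) {
        return String.format(
            "Error sending record to topic %s (timestamp %d): %s",
            topic, timestamp, exception);
    }

    // DEBUG-level message: carries the potentially sensitive payload,
    // emitted only when debug logging is explicitly enabled.
    static <K, V> String debugLine(K key, V value) {
        return String.format("Failed record payload: key=%s value=%s", key, value);
    }

    public static void main(String[] args) {
        Exception e = new RuntimeException("broker unavailable");
        String err = errorLine("orders", 1539000000000L, e);
        String dbg = debugLine("user@example.com", "secret-payload");
        System.out.println(err);
        System.out.println(dbg);
    }
}
{code}

This keeps the operational signal (which topic failed, when, and why) at ERROR level while confining PII exposure to a level operators can keep disabled, which matches the reporter's suggestion but not the comment's stricter "ban keys/values entirely" position.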