[ https://issues.apache.org/jira/browse/KAFKA-7510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16657994#comment-16657994 ]

John Roesler commented on KAFKA-7510:
-------------------------------------

[~MrKafka], That's a fair justification.

I was thinking of more general "keys", like database primary keys. My personal 
opinion is that it would be a design error to use PII (such as an email address) 
as a database key.

But it occurs to me now that because of the way that stream transformations 
work, arbitrary fields may need to become a Kafka record key during the 
computation.

I would be in favor of banning keys and values (and headers as well) from our 
logs by default.

 

I agree with [~mjsax] and [~ewencp]: consistency is key here. It seems like 
this request should be re-scoped to cover the entire project if [~MrKafka] is 
to be able to trust we won't leak PII into the logs. I'd be in favor of the 
implementer writing a KIP to this effect so that the community can discuss the 
issue holistically.

I'm not sure I agree with the variant that bans data fields from the logs by 
default but still allows them at DEBUG or TRACE level. It's a personal opinion, 
but it seems too hard to verify proper handling with exceptions like this, 
especially over time. It also seems like it would be hard for operators to 
consider any logs "clean", knowing that some logs can contain PII.
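For concreteness, the split being debated (metadata at ERROR, record contents only at DEBUG) could look roughly like the sketch below. This is a minimal illustration using `java.util.logging` rather than Kafka's actual SLF4J setup, and the class and method names are hypothetical, not Kafka's real API:

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch: keep ERROR-level logs free of record contents, and
// emit key/value only when DEBUG-equivalent logging is explicitly enabled.
public class SafeRecordLogging {
    private static final Logger log =
            Logger.getLogger(SafeRecordLogging.class.getName());

    // ERROR-level message: topic, timestamp, and exception only -- no key/value.
    static String errorSummary(final Long timestamp, final String topic,
                               final Exception exception) {
        return String.format("Error sending record to topic %s (timestamp %d): %s",
                topic, timestamp, exception.toString());
    }

    // DEBUG-level message: the potentially sensitive record contents, opt-in.
    static <K, V> String debugDetail(final K key, final V value) {
        return String.format("Failed record contents: key=%s value=%s", key, value);
    }

    static <K, V> void recordSendError(final K key, final V value,
                                       final Long timestamp, final String topic,
                                       final Exception exception) {
        log.severe(errorSummary(timestamp, topic, exception));
        if (log.isLoggable(Level.FINE)) { // FINE roughly corresponds to DEBUG
            log.fine(debugDetail(key, value));
        }
    }
}
```

The point of the split is exactly the concern above: an operator who never enables the DEBUG-equivalent level never gets PII in their logs, but verifying that no code path violates the convention over time is the hard part.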

But these are exactly the kinds of concerns that can be hashed out in a KIP.

 

Just a final opinionated note: I would personally never consider logs of any 
kind as "clean" of protected information. History is littered with examples of 
apps accidentally leaking protected information to logs. I'm not opposed to 
making a solid effort so Kafka isn't responsible for such a leak, but my advice 
to any operator would be to access-control their logs the way they 
access-control their data. 

> KStreams RecordCollectorImpl leaks data to logs on error
> --------------------------------------------------------
>
>                 Key: KAFKA-7510
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7510
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>            Reporter: Mr Kafka
>            Priority: Major
>              Labels: user-experience
>
> org.apache.kafka.streams.processor.internals.RecordCollectorImpl leaks data 
> on error, as it dumps the *value* / message payload to the logs.
> This is problematic as it may write personally identifiable information 
> (PII) or other secrets to plain-text log files, which can then be 
> propagated to other log systems, e.g. Splunk.
> I suggest the *key* and *value* fields be moved to debug level, as they are 
> useful for some people, while error level contains the *errorMessage, 
> timestamp, topic* and *stackTrace*.
> {code:java}
> private <K, V> void recordSendError(
>     final K key,
>     final V value,
>     final Long timestamp,
>     final String topic,
>     final Exception exception
> ) {
>     String errorLogMessage = LOG_MESSAGE;
>     String errorMessage = EXCEPTION_MESSAGE;
>     if (exception instanceof RetriableException) {
>         errorLogMessage += PARAMETER_HINT;
>         errorMessage += PARAMETER_HINT;
>     }
>     log.error(errorLogMessage, key, value, timestamp, topic, exception.toString());
>     sendException = new StreamsException(
>         String.format(
>             errorMessage,
>             logPrefix,
>             "an error caught",
>             key,
>             value,
>             timestamp,
>             topic,
>             exception.toString()
>         ),
>         exception);
> }{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
