[jira] [Commented] (KUDU-1812) Redact user data that gets logged

2016-12-20 Thread Adar Dembo (JIRA)

[ 
https://issues.apache.org/jira/browse/KUDU-1812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15765841#comment-15765841
 ] 

Adar Dembo commented on KUDU-1812:
--

Given the nature of this problem, it's important for there to be a clear 
description of Kudu's redaction policy, both for users, so they know what to 
expect, and for developers, who are meant to adhere to it. I'll try to provide 
one below, based on a discussion I had with JD and Dan today.

As of today, Kudu takes a somewhat narrow definition of "sensitive user data": 
row data values. In the future, this definition could be broadened to include 
things like table and column names, but for the time being we're only 
considering row data values and data derived from row data values.

Row data values are obviously passed between server and client in client-facing 
operations (such as writes and scans), but are also passed between servers in 
server-to-server operations (such as log replication); it's important that we 
don't leak them in either case.

Besides direct row data (such as a batch of INSERT, UPDATE, UPSERT, or DELETE 
operations), we must also consider data that may imply the existence of row 
data, such as scan predicates.

Server-side, Kudu will adhere to the following policy:
# There will be a new gflag to control whether sensitive user data will be 
redacted. This flag will exist for both masters and tservers, and will default 
to 'true'.
# When a Kudu server logs a message containing sensitive user data, the gflag's 
value must be consulted. If true, the sensitive data must be replaced with a 
"<redacted>" string. The rest of the message can remain the same.
# The same applies to errors returned by Kudu servers, should they embed 
sensitive user data.
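
As a rough illustration of the server-side rules above, here is a minimal sketch. It stands in for the proposed gflag with a plain global bool, and the names g_redact_user_data, kRedactionMessage, Redact, and MakeDuplicateKeyError are hypothetical, not taken from the Kudu code base. The placeholder text is likewise an assumption.

```cpp
#include <string>

// Hypothetical stand-in for the proposed gflag; a real server would define it
// through its flags library. Defaults to true, as the policy requires.
bool g_redact_user_data = true;

// Assumed placeholder text substituted for sensitive values.
const char* const kRedactionMessage = "<redacted>";

// Returns the placeholder when redaction is on, otherwise the value itself.
std::string Redact(const std::string& sensitive) {
  return g_redact_user_data ? std::string(kRedactionMessage) : sensitive;
}

// Builds a log or error message. Only the row value goes through Redact();
// the rest of the message (including the table name) is left unchanged.
std::string MakeDuplicateKeyError(const std::string& table,
                                  const std::string& row_value) {
  return "duplicate key in table " + table + ": " + Redact(row_value);
}
```

Because the same helper would feed both log statements and errors returned to clients, rules 2 and 3 could share a single code path under this approach.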

Client-side, Kudu will adhere to the following policy:
# Sensitive user data may be returned in toString() (Java) or ToString() (C++) 
calls.
# All sensitive user data must be explicitly stripped from all LOG(), VLOG(), 
and slf4j log statements.
# All sensitive user data must be explicitly stripped from all thrown 
exceptions (Java) or Status messages (C++).
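
The client-side rules can be sketched in the same spirit. Everything here is hypothetical: SimpleRow and SimpleStatus are illustrative stand-ins, not the Kudu client's row or Status classes, and WriteFailed is an invented helper.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical client-side row: column name/value pairs.
struct SimpleRow {
  std::vector<std::pair<std::string, std::string>> cells;

  // Rule 1: an explicit ToString() call may return the row data.
  std::string ToString() const {
    std::string out;
    for (const auto& cell : cells) {
      if (!out.empty()) out += ", ";
      out += cell.first + "=" + cell.second;
    }
    return out;
  }
};

// Hypothetical stand-in for a C++ Status: just an error message.
struct SimpleStatus {
  std::string message;
};

// Rules 2 and 3: text bound for logs or Status messages describes the failure
// without embedding row values (only the table name and the cell count).
SimpleStatus WriteFailed(const std::string& table, const SimpleRow& row) {
  return SimpleStatus{"write to table " + table + " failed for a row with " +
                      std::to_string(row.cells.size()) + " cells"};
}
```

The split is deliberate in this sketch: the application decides whether to call ToString() and what to do with the result, while anything the client library emits on its own stays free of row values.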

Taken together, these policies should ensure that none of the following leaks 
sensitive user data:
* A Kudu client implementation logging all errors returned by a server 
(assuming the gflag's value was 'true').
* An application logging all errors returned by a Kudu client.
* A log collection service collecting all Kudu server logs.

> Redact user data that gets logged
> -
>
> Key: KUDU-1812
> URL: https://issues.apache.org/jira/browse/KUDU-1812
> Project: Kudu
> Issue Type: Improvement
> Reporter: Jean-Daniel Cryans
> Priority: Critical
>
> There are many instances in the code base where we log user data, and there is 
> a class of users that do not want this behavior. As an example, we might be 
> debugging an issue on the mailing list and the user has to scrub the logs 
> they share by hand because they don't want that data to leak.
> On the server-side, we should replace all those instances with some string 
> like "redacted" and add a process flag to enable the logging of user data.
> On the client-side, it gets a bit more tricky. We can't use such flags so we 
> need to strike a balance between removing unnecessary logging of user 
> information and still keeping the software usable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KUDU-1812) Redact user data that gets logged

2016-12-19 Thread Dan Burkert (JIRA)

[ 
https://issues.apache.org/jira/browse/KUDU-1812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15763214#comment-15763214
 ] 

Dan Burkert commented on KUDU-1812:
---

Also working on predicates.  Another avenue to search is calls to DebugString 
and ShortDebugString on protobuf messages.
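
One way such an audit could be made easier is to route every message dump through a single wrapper, so that any remaining direct DebugString call stands out in a search. The sketch below assumes nothing about Kudu's actual helpers: g_allow_row_logging, FakeWriteRequest, and RedactedDebugString are hypothetical names, and FakeWriteRequest only imitates a generated protobuf class.

```cpp
#include <string>

// Hypothetical opt-in switch; false keeps row contents out of logs.
bool g_allow_row_logging = false;

// Hypothetical message type standing in for a generated protobuf class, which
// would expose DebugString() and ShortDebugString().
struct FakeWriteRequest {
  std::string encoded_rows;
  std::string DebugString() const { return "rows: \"" + encoded_rows + "\""; }
};

// Single choke point for dumping messages that may carry row data. Works with
// any type that provides a DebugString() method.
template <typename Message>
std::string RedactedDebugString(const Message& msg) {
  return g_allow_row_logging ? msg.DebugString() : std::string("<redacted>");
}
```

This version redacts the whole message, which is coarse; a finer-grained variant would need to know which fields hold row data.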



[jira] [Commented] (KUDU-1812) Redact user data that gets logged

2016-12-19 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/KUDU-1812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15762920#comment-15762920
 ] 

Jean-Daniel Cryans commented on KUDU-1812:
--

Java client: https://gerrit.cloudera.org/#/c/5549/



[jira] [Commented] (KUDU-1812) Redact user data that gets logged

2016-12-19 Thread Dan Burkert (JIRA)

[ 
https://issues.apache.org/jira/browse/KUDU-1812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15762741#comment-15762741
 ] 

Dan Burkert commented on KUDU-1812:
---

I've identified a few major places where we log row data, or data derived from 
row data:

* partition keys of individual rows
* partial rows
* encoded buffers / protobuf messages
* predicates

I'm working on adding a server-side {{log_row_contents}} flag right now and 
auditing/removing all instances where we might be logging row partition keys in 
the C++ codebase.
