[
https://issues.apache.org/jira/browse/FLUME-2954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15424747#comment-15424747
]
Attila Simon edited comment on FLUME-2954 at 8/17/16 3:48 PM:
--------------------------------------------------------------
Changes made in the spirit of what was discussed:
{noformat}
--------------------------------------------------------------------------------
flume-ng-channel ---
flume-jdbc-channel ---
JdbcChannelProviderImpl#98 <- fail properties <REMOVED>
JdbcChannelProviderImpl#261 #431 <- fail properties: JDBC URL might include password <KEPT><FOLLOWUP IN JIRA>
flume-kafka-channel ---
KafkaChannel#230 #253 <- fail properties <REMOVED>
--------------------------------------------------------------------------------
flume-ng-configuration ---
FlumeConfiguration#315 #372 <- fail properties <DRIVE BY PROPERTY>
--------------------------------------------------------------------------------
flume-ng-core ---
SyslogAvroEventSerializer#150 <- fail data: SyslogEvent.message gets logged <DRIVE BY PROPERTY>
GangliaServer#224 #245 <- safe data: only flume component metrics data <KEPT>
LoggerSink#95 <- fail data: on purpose <KEPT>
AvroSource#347 <- fail data: log whole message <DRIVE BY PROPERTY>
MultiportSyslogTCPSource#360 <- fail data: log whole message <DRIVE BY PROPERTY>
BLOBHandler#70 <- fail data: logs http request headers <DRIVE BY PROPERTY>
--------------------------------------------------------------------------------
flume-ng-embedded-agent ---
EmbeddedAgent#155 <- fail properties: printing all config <DRIVE BY PROPERTY>
--------------------------------------------------------------------------------
flume-ng-sinks ---
flume-hive-sink ---
HiveEndPoint has a URI field <- fail properties <KEPT><FOLLOWUP IN JIRA>
It may expose private data (the URI string may contain a password), and it is logged extensively within this module.
Appears in HiveSink#298 #342 #400 #403 #428, HiveWriter#210 #319 #330 #337 #353 #365 #368 #407...
HiveEndPoint is also attached to exception logs.
flume-ng-hbase-sink ---
AsyncHBaseSink#641 <- safe data: error details get logged in case of failure <KEPT>
flume-ng-kafka-sink ---
KafkaSink#179 <- fail data: log whole message <REMOVED>
KafkaSink#304 <- fail properties <REMOVED>
flume-ng-morphline-solr-sink ---
BlobHandler#98 #113 <- fail data: log http request headers <DRIVE BY PROPERTY>
MorphlineSink#139 <- fail data: logs event <DRIVE BY PROPERTY>
--------------------------------------------------------------------------------
flume-ng-sources ---
flume-kafka-source ---
KafkaSource#247 <- fail data: log whole message <DRIVE BY PROPERTY>
flume-twitter-source ---
TwitterSource#110-113 <- fail properties <REMOVED>
--------------------------------------------------------------------------------
{noformat}
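To illustrate what a <DRIVE BY PROPERTY> entry could look like in code, here is a minimal sketch, not the code from the attached patches. It assumes raw data is logged only when the logger is at debug level and a dedicated system property is set as well, so that lowering the root logger level alone never exposes payloads. The class name, the method names, and the property names (example.log.rawdata, example.log.printconfig) are hypothetical and chosen only for this example.

```java
import java.nio.charset.StandardCharsets;

public class RawDataLogGate {
    // Hypothetical property names, used only in this sketch.
    static final String RAW_DATA_PROP = "example.log.rawdata";
    static final String PRINT_CONFIG_PROP = "example.log.printconfig";

    /** True only if the raw-data property is explicitly set to "true". */
    static boolean allowRawData() {
        return Boolean.getBoolean(RAW_DATA_PROP);
    }

    /** True only if the print-config property is explicitly set to "true". */
    static boolean allowPrintConfig() {
        return Boolean.getBoolean(PRINT_CONFIG_PROP);
    }

    /**
     * Builds the debug message for an event body.
     * Returns null when debug logging is off. The payload itself is
     * included only when the explicit property is set in addition.
     */
    static String describeEvent(byte[] body, boolean debugEnabled) {
        if (!debugEnabled) {
            return null;
        }
        if (allowRawData()) {
            return "Event body: " + new String(body, StandardCharsets.UTF_8);
        }
        return "Event received (" + body.length + " bytes, body withheld)";
    }

    public static void main(String[] args) {
        byte[] body = "secret payload".getBytes(StandardCharsets.UTF_8);
        // Debug level alone: only metadata is reported.
        System.out.println(describeEvent(body, true));
        // Debug level plus the explicit opt-in: the payload is reported.
        System.setProperty(RAW_DATA_PROP, "true");
        System.out.println(describeEvent(body, true));
    }
}
```

With this shape, an operator would have to opt in deliberately, for example by starting the JVM with -Dexample.log.rawdata=true, before any event payload reaches the log.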
> make raw data appearing in log messages explicit
> ------------------------------------------------
>
> Key: FLUME-2954
> URL: https://issues.apache.org/jira/browse/FLUME-2954
> Project: Flume
> Issue Type: Improvement
> Components: Channel, Configuration, Sinks+Sources
> Affects Versions: v1.6.0
> Reporter: Attila Simon
> Assignee: Attila Simon
> Priority: Critical
> Fix For: v1.7.0
>
> Attachments: FLUME-2954-1.patch, FLUME-2954-2.patch, FLUME-2954.patch
>
>
> Flume has built-in functionality to log the data flowing through it,
> mainly for debugging purposes. This functionality appears in several
> places in the codebase. I think it raises security
> concerns in production environments where sensitive information might
> be ingested, so it is crucial that enabling it be
> as explicit as possible (avoiding implicit side effects of setup).
> E.g., setting the level of the root logger to debug/trace causes every
> other logger to start logging at debug/trace, including the ones
> logging raw data.
> In this JIRA I would like to provide a patch capturing how I imagine solving
> this issue. It can be refined iteratively or used as a basis for a broader
> discussion.