----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/51182/#review146895 -----------------------------------------------------------
Thanks for the patch. Good stuff. I have mostly minor feedback. conf/flume-env.ps1.template (line 21) <https://reviews.apache.org/r/51182/#comment213724> Consider using this slightly different wording, here and in the flume-env.sh.template file: Let Flume write raw event data and configuration information to its log files for debugging purposes. Enabling these flags is not recommended in production, as it may result in logging sensitive user information or encryption secrets. flume-ng-channels/flume-jdbc-channel/src/main/java/org/apache/flume/channel/jdbc/impl/JdbcChannelProviderImpl.java (line 591) <https://reviews.apache.org/r/51182/#comment213723> how about: @param defaultValue default value, null if no default flume-ng-configuration/src/main/java/org/apache/flume/conf/FlumeConfiguration.java (line 316) <https://reviews.apache.org/r/51182/#comment213722> nit: s/Initial-configuration/Initial configuration/ flume-ng-configuration/src/main/java/org/apache/flume/conf/FlumeConfiguration.java (line 373) <https://reviews.apache.org/r/51182/#comment213683> I don't think this log line is useful by itself. This part should also go inside the LogRawDataUtil statement below. flume-ng-doc/sphinx/FlumeUserGuide.rst (line 240) <https://reviews.apache.org/r/51182/#comment213686> Consider the following wording instead: Logging the raw stream of data flowing through the ingest pipeline is not desired behaviour in many production environments because this may result in leaking sensitive data or security related configurations, such as secret keys, to Flume log files. By default, Flume will not log such information. On the other hand, if the data pipeline is broken, Flume will attempt to provide clues for debugging the problem. One way to debug problems with event pipelines is to set up an additional `Memory Channel` connected to a `Logger Sink`, which will output all event data to the Flume logs. In some situations, however, this approach is insufficient. flume-ng-doc/sphinx/FlumeUserGuide.rst (line 249) <https://reviews.apache.org/r/51182/#comment213695> Consider the following wording: In order to enable logging of event- and configuration-related data, some Java system properties must be set in addition to log4j properties. To enable configuration-related logging, set the Java system property ``-Dorg.apache.flume.log.printconfig=true``. This can either be passed on the command line or by setting this in the JAVA_OPTS variable in *flume-env.sh*. To enable data logging, set the Java system property ``-Dorg.apache.flume.log.rawdata=true`` in the same way described above. For most components, the log4j logging level must also be set to DEBUG or TRACE to make event-specific logging appear in the Flume logs. flume-ng-doc/sphinx/FlumeUserGuide.rst (line 256) <https://reviews.apache.org/r/51182/#comment213696> Consider the following wording: Here is an example of enabling both configuration logging and raw data logging while also setting the Log4j loglevel to DEBUG for console output: flume-ng-sdk/src/main/java/org/apache/flume/util/LogRawDataUtil.java (line 25) <https://reviews.apache.org/r/51182/#comment213710> Consider this wording: Utility class to help any Flume component determine whether logging potentially sensitive information is allowed or not. flume-ng-sdk/src/main/java/org/apache/flume/util/LogRawDataUtil.java (line 28) <https://reviews.apache.org/r/51182/#comment213704> Please add InterfaceAudience and stability annotations for Public / Evolving flume-ng-sdk/src/main/java/org/apache/flume/util/LogRawDataUtil.java (line 32) <https://reviews.apache.org/r/51182/#comment213703> Because this is a public API available to all plugin writers, please expose these as static methods instead of static constants. flume-ng-sdk/src/main/java/org/apache/flume/util/LogRawDataUtil.java (line 45) <https://reviews.apache.org/r/51182/#comment213706> Why put this in a static block? It might be useful to allow people to set these system properties via remote debugging if needed. Otherwise if they are using MemoryChannel and something is stuck there may be no way to debug it. flume-ng-sdk/src/main/java/org/apache/flume/util/LogRawDataUtil.java (line 51) <https://reviews.apache.org/r/51182/#comment213701> s/in log/in the log/ flume-ng-sdk/src/main/java/org/apache/flume/util/LogRawDataUtil.java (line 53) <https://reviews.apache.org/r/51182/#comment213699> nit: to the log file. flume-ng-sinks/flume-ng-kafka-sink/src/main/java/org/apache/flume/sink/kafka/KafkaSink.java (line 177) <https://reviews.apache.org/r/51182/#comment213711> here we are checking if trace is enabled but then logging at debug level. we should also log at trace flume-ng-sinks/flume-ng-morphline-solr-sink/src/main/java/org/apache/flume/sink/solr/morphline/BlobHandler.java (line 30) <https://reviews.apache.org/r/51182/#comment213714> Please avoid rearranging imports here and elsewhere unless necessary, or unless they are out of alphabetical order within a single grouping. Often, peoples' IDEs disagree on the order of the imports (IntelliJ, Netbeans, and Eclipse each have their own favorite grouping order) so I've found rearranging these things to be very noisy and pointless. flume-ng-sinks/flume-ng-morphline-solr-sink/src/main/java/org/apache/flume/sink/solr/morphline/BlobHandler.java (line 39) <https://reviews.apache.org/r/51182/#comment213715> Please avoid reformatting these comments unless you need to change them. Same with the changes elsewhere. flume-ng-sinks/flume-ng-morphline-solr-sink/src/main/java/org/apache/flume/sink/solr/morphline/BlobHandler.java (line 47) <https://reviews.apache.org/r/51182/#comment213717> Personally I think it hurts readability wrapping this. Let's just keep the changes to the required minimum for the scope of this patch. flume-ng-sinks/flume-ng-morphline-solr-sink/src/main/java/org/apache/flume/sink/solr/morphline/BlobHandler.java (line 114) <https://reviews.apache.org/r/51182/#comment213720> +1 to removing this trailing whitespace flume-ng-sources/flume-kafka-source/src/main/java/org/apache/flume/source/kafka/KafkaSource.java (line 251) <https://reviews.apache.org/r/51182/#comment213718> why trace here and debug above? flume-ng-sources/flume-twitter-source/src/main/java/org/apache/flume/source/twitter/TwitterSource.java <https://reviews.apache.org/r/51182/#comment213721> wrap this in a configuration logging check? - Mike Percy On Aug. 24, 2016, 4:58 a.m., Attila Simon wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/51182/ > ----------------------------------------------------------- > > (Updated Aug. 24, 2016, 4:58 a.m.) > > > Review request for Flume. > > > Bugs: FLUME-2954 > https://issues.apache.org/jira/browse/FLUME-2954 > > > Repository: flume-git > > > Description > ------- > > -------------------------------------------------------------------------------- > flume-ng-channel --- > flume-jdbc-channel --- > JdbcChannelProviderImpl#98 <- fail properties <REMOVED> > JdbcChannelProviderImpl#261 #431 <- fail properties: jdbc url > might include password <KEPT><FOLLOWUP IN JIRA> > flume-kafka-channel --- > KafkaChannel#230 #253 <- fail properties <REMOVED> > -------------------------------------------------------------------------------- > flume-ng-configuration --- > FlumeConfiguration#315 #372 <- fail properties <DRIVE BY > PROPERTY> > -------------------------------------------------------------------------------- > flume-ng-core --- > SyslogAvroEventSerializer#150 <- fail data: > SyslogEvent.message gets logged <DRIVE BY PROPERTY> > GangliaServer#224 #245 <- safe data: only flume > component metrics data <KEPT> > LoggerSink#95 <- fail data: on purpose <KEPT> > AvroSource#347 <- fail data: log whole message > <DRIVE BY PROPERTY> > MultiportSyslogTCPSource#360 <- fail data: log whole message > <DRIVE BY PROPERTY> > BLOBHandler#70 <- fail data: logs http request > headers <DRIVE BY PROPERTY> > -------------------------------------------------------------------q------------- > flume-ng-embedded-agent --- > EmbeddedAgent#155 <- fail properties: printing > all config <DRIVE BY PROPERTY> > -------------------------------------------------------------------------------- > flume-ng-sinks --- > flume-hive-sink --- > HiveEndPoint has an URI field. <- fail properties > <KEPT><FOLLOWUP IN JIRA> > It may contain private data > (URI string may contain password) as it is > excessively logged within this module. > Appears in HiveSink#298 #342 #400 #403 #428, > HiveWriter#210 #319 #330 #337 #353 #365 #368 #407...) > HiveEndPoint is also attached to exception logs as well > flume-ng-hbase-sink --- > AsyncHBaseSink#641 <- safe data: error details > gets logged in case of failure <KEPT> > flume-ng-kafka-sink --- > KafkaSink#179 <- fail data: log whole message > <REMOVED> > KafkaSink#304 <- fail properties <REMOVED> > flume-ng-morphline-solr-sink --- > BlobHandler#98 #113 <- fail data: log http request > headers <DRIVE BY PROPERTY> > MorphlineSink#139 <- fail data: logs event <DRIVE > BY PROPERTY> > -------------------------------------------------------------------------------- > flume-ng-sources --- > flume-kafka-source --- > KafkaSource#247 <- fail data: log whole <DRIVE > BY PROPERTY> > flume-twitter-source --- > TwitterSource#110-113 <- fail properties <REMOVED> > -------------------------------------------------------------------------------- > > > Diffs > ----- > > conf/flume-env.ps1.template 8bf535a > conf/flume-env.sh.template c8b660f > > flume-ng-channels/flume-jdbc-channel/src/main/java/org/apache/flume/channel/jdbc/impl/JdbcChannelProviderImpl.java > 845b794 > > flume-ng-channels/flume-kafka-channel/src/main/java/org/apache/flume/channel/kafka/KafkaChannel.java > 90e3288 > > flume-ng-configuration/src/main/java/org/apache/flume/conf/FlumeConfiguration.java > 9b3a434 > flume-ng-core/src/main/java/org/apache/flume/source/AvroSource.java 8b9b956 > > flume-ng-core/src/main/java/org/apache/flume/source/MultiportSyslogTCPSource.java > b9f2438 > flume-ng-core/src/main/java/org/apache/flume/source/http/BLOBHandler.java > e24d4c6 > > flume-ng-core/src/test/java/org/apache/flume/serialization/SyslogAvroEventSerializer.java > 05af3b1 > flume-ng-doc/sphinx/FlumeUserGuide.rst 5e677c6 > > flume-ng-embedded-agent/src/main/java/org/apache/flume/agent/embedded/EmbeddedAgent.java > ad3e138 > flume-ng-sdk/src/main/java/org/apache/flume/util/LogRawDataUtil.java > PRE-CREATION > > flume-ng-sinks/flume-ng-kafka-sink/src/main/java/org/apache/flume/sink/kafka/KafkaSink.java > 9453546 > > flume-ng-sinks/flume-ng-morphline-solr-sink/src/main/java/org/apache/flume/sink/solr/morphline/BlobHandler.java > ca7614a > > flume-ng-sinks/flume-ng-morphline-solr-sink/src/main/java/org/apache/flume/sink/solr/morphline/MorphlineSink.java > f7a73f3 > > flume-ng-sources/flume-kafka-source/src/main/java/org/apache/flume/source/kafka/KafkaSource.java > 90e4715 > > flume-ng-sources/flume-twitter-source/src/main/java/org/apache/flume/source/twitter/TwitterSource.java > f5c8328 > > Diff: https://reviews.apache.org/r/51182/diff/ > > > Testing > ------- > > compiles, site builds, all unit test passes, distribution target handles the > system properties as expected: > bin/flume-ng agent --conf conf --conf-file > ../../../../../flume-conf/flume-log.conf --name a1 > -Dflume.root.logger=DEBUG,console -Dorg.apache.flume.log.printconfig=true > -Dorg.apache.flume.log.rawdata=true (with and without the extra properties) > > > Thanks, > > Attila Simon > >
