GitHub user bbende commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/843#discussion_r74689871

    --- Diff: nifi-nar-bundles/nifi-hadoop-bundle/nifi-hdfs-processors/src/main/java/org/apache/nifi/processors/hadoop/AbstractHadoopProcessor.java ---
    @@ -286,8 +310,10 @@ HdfsResources resetHDFSResources(String configResources, String dir, ProcessCont
                 }
             }

    +        final Path workingDir = fs.getWorkingDirectory();
             getLogger().info("Initialized a new HDFS File System with working dir: {} default block size: {} default replication: {} config: {}",
    -                new Object[] { fs.getWorkingDirectory(), fs.getDefaultBlockSize(new Path(dir)), fs.getDefaultReplication(new Path(dir)), config.toString() });
    +                new Object[]{workingDir, fs.getDefaultBlockSize(workingDir), fs.getDefaultReplication(workingDir), config.toString()});
    --- End diff --

    The main reason I wanted to go with the working directory is that it's not always possible to know what the value of "Directory" is going to be during an OnScheduled method. The main example is PutHDFS, which will often have "Directory" set to an expression like ${hadoop.dir} that was set as a flow file attribute by an upstream processor, so every flow file could actually be written to a different directory. ListHDFS and GetHDFS aren't as much of a problem because they are source processors, but since this code is in the abstract base class, it has to account for all of them.

    So overall I figured we can use the working directory, or, if we believe that could lead to a problem, I would say we just don't need to log the block size and replication, which are the only reason we need a Path instance here.
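    To make the point about "Directory" concrete, here is a minimal sketch of why the property has no single value at scheduling time. It does not use NiFi's Expression Language API; the `DirectoryResolutionSketch` class, its `resolve` helper, the `/data/a` and `/data/b` paths, and the choice to substitute an empty string for unknown attributes are all assumptions made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for attribute substitution; this is NOT NiFi's Expression Language API.
final class DirectoryResolutionSketch {

    // Matches simple references of the form ${attribute.name}.
    private static final Pattern REFERENCE = Pattern.compile("\\$\\{([^}]+)\\}");

    // Replaces each ${name} with the matching attribute value.
    // Assumption for this sketch: unknown names become an empty string.
    static String resolve(String directoryProperty, Map<String, String> attributes) {
        Matcher matcher = REFERENCE.matcher(directoryProperty);
        StringBuffer resolved = new StringBuffer();
        while (matcher.find()) {
            String value = attributes.getOrDefault(matcher.group(1), "");
            matcher.appendReplacement(resolved, Matcher.quoteReplacement(value));
        }
        matcher.appendTail(resolved);
        return resolved.toString();
    }

    public static void main(String[] args) {
        String directoryProperty = "${hadoop.dir}";

        // At scheduling time there is no flow file, so there are no attributes to resolve against.
        System.out.println("scheduled: '" + resolve(directoryProperty, new HashMap<>()) + "'");

        // Per flow file, the same property can resolve to different directories.
        Map<String, String> first = new HashMap<>();
        first.put("hadoop.dir", "/data/a");
        Map<String, String> second = new HashMap<>();
        second.put("hadoop.dir", "/data/b");
        System.out.println("flow file 1: " + resolve(directoryProperty, first));
        System.out.println("flow file 2: " + resolve(directoryProperty, second));
    }
}
```

    With no attributes available, the sketch has nothing meaningful to build a Path from, while each flow file yields its own directory. That is the situation the working directory sidesteps, since `fs.getWorkingDirectory()` does not depend on any flow file.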