[ https://issues.apache.org/jira/browse/NIFI-2553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15418816#comment-15418816 ]

Yolanda M. Davis commented on NIFI-2553:
----------------------------------------

[~bbende] agreed on the use of the working directory. Interestingly enough, here's 
what happens behind the scenes in that getDefaultBlockSize method call (looking 
in the hadoop-common jar):

{noformat}
    @Deprecated
    public long getDefaultBlockSize() {
        return this.getConf().getLong("fs.local.block.size", 33554432L);
    }

    public long getDefaultBlockSize(Path f) {
        return this.getDefaultBlockSize();
    }
{noformat}

Not quite sure why the path argument is even used? Perhaps they are moving toward 
variable block sizes on directories; I know it's supported on individual files. If 
that is on the more immediate radar, then I think the working directory would 
perhaps need to be a backup measure (used if the given directory fails).
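
For illustration, here's a minimal, Hadoop-free sketch of that backup measure. URI.create mirrors what Hadoop's Path constructor does internally (Path rethrows URISyntaxException as IllegalArgumentException); the class and method names here are hypothetical, not the committed fix:

{code}
import java.net.URI;

public class BlockSizeFallbackSketch {
    // Hypothetical helper: prefer the configured directory, but fall back to
    // the filesystem's working directory when the string is not a valid URI.
    // URI.create throws IllegalArgumentException for an unparseable string,
    // which is also what Hadoop's Path constructor ends up throwing.
    public static String resolveLoggableDir(String configuredDir, String workingDir) {
        try {
            URI.create(configuredDir);
            return configuredDir;
        } catch (IllegalArgumentException e) {
            return workingDir;
        }
    }

    public static void main(String[] args) {
        // The directory string from the issue description is not a valid URI,
        // so the sketch falls back to the working directory.
        System.out.println(resolveLoggableDir(
                "data_${literal('testing'):substring(0,4)%7D", "/user/nifi"));
    }
}
{code}

The same pattern would apply to the getDefaultBlockSize/getDefaultReplication calls in the logging statement: resolve the Path once inside a try/catch and log against the working directory when resolution fails.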

> HDFS processors throwing exception from OnSchedule when directory is an 
> invalid URI
> -----------------------------------------------------------------------------------
>
>                 Key: NIFI-2553
>                 URL: https://issues.apache.org/jira/browse/NIFI-2553
>             Project: Apache NiFi
>          Issue Type: Bug
>    Affects Versions: 1.0.0, 0.7.0
>            Reporter: Bryan Bende
>            Assignee: Bryan Bende
>            Priority: Minor
>             Fix For: 1.0.0
>
>
> If you enter a directory string that results in an invalid URI, the HDFS 
> processors will throw an unexpected exception from OnScheduled because of a 
> logging statement in AbstractHadoopProcessor:
> {code}
> getLogger().info("Initialized a new HDFS File System with working dir: {} default block size: {} default replication: {} config: {}",
>         new Object[] { fs.getWorkingDirectory(), fs.getDefaultBlockSize(new Path(dir)), fs.getDefaultReplication(new Path(dir)), config.toString() });
> {code}
> An example input for the directory that can produce this problem:
> data_${literal('testing'):substring(0,4)%7D
> In addition to this, FetchHDFS, ListHDFS, GetHDFS, and PutHDFS all create new 
> Path instances in their onTrigger methods from the same directory, outside of 
> a try/catch which would result in throwing a ProcessException (if it got past 
> the logging issue above).
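
A hedged sketch of the guard described above, again using plain JDK URI parsing in place of Hadoop's Path (which wraps URISyntaxException in IllegalArgumentException). The ProcessException class here is a hypothetical stand-in, not NiFi's, so the sketch stays dependency-free:

{code}
import java.net.URI;
import java.net.URISyntaxException;

public class OnTriggerGuardSketch {
    // Hypothetical stand-in for NiFi's ProcessException, used only to keep
    // this sketch free of NiFi dependencies.
    public static class ProcessException extends RuntimeException {
        public ProcessException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    // Sketch: build the URI (where the processors build a Path) inside a
    // try/catch so an invalid directory surfaces as a ProcessException from
    // onTrigger rather than an unexpected IllegalArgumentException.
    public static URI toUri(String dir) {
        try {
            return new URI(dir);
        } catch (URISyntaxException e) {
            throw new ProcessException("Directory is not a valid URI: " + dir, e);
        }
    }
}
{code}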



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
