[ https://issues.apache.org/jira/browse/NIFI-2705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sivaprasanna Sethuraman resolved NIFI-2705.
-------------------------------------------
       Resolution: Fixed
    Fix Version/s: 1.1.0

Fixed in the 1.1.0 release. See the related issue NIFI-2831.

> ListHDFS Cannot Be Re-run
> -------------------------
>
>                 Key: NIFI-2705
>                 URL: https://issues.apache.org/jira/browse/NIFI-2705
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Core Framework, Documentation & Website
>    Affects Versions: 1.0.0
>            Reporter: Alan Jackoway
>            Priority: Major
>             Fix For: 1.1.0
>
>
> I have a use case where every day I want to go through a directory in HDFS 
> and do something to the files more than a month old.
> I was trying to do this with a flow like ListHDFS -> RouteOnAttribute 
> (hdfs.lastModified) -> FetchHDFS -> Processing.
> However, after I ran it once, old files were not pulled any more. I turned on 
> debug logging and got this:
> {noformat}
> 2016-08-30 06:15:17,473 DEBUG [Timer-Driven Process Thread-9] o.apache.nifi.processors.hadoop.ListHDFS ListHDFS[id=d80a1ceb-0156-1000-595d-978dcf53ecb6] Found a total of 3 files in HDFS
> 2016-08-30 06:15:17,473 DEBUG [Timer-Driven Process Thread-9] o.apache.nifi.processors.hadoop.ListHDFS ListHDFS[id=d80a1ceb-0156-1000-595d-978dcf53ecb6] Of the 3 files found in HDFS, 0 are listable
> 2016-08-30 06:15:17,473 DEBUG [Timer-Driven Process Thread-9] o.apache.nifi.processors.hadoop.ListHDFS ListHDFS[id=d80a1ceb-0156-1000-595d-978dcf53ecb6] There is no data to list. Yielding.
> {noformat}
> It turns out that ListHDFS maintains state called {{latestTimestampListed}} 
> that prevents it from re-listing files unless you change the directory being 
> listed. At a minimum, this should be mentioned in the ListHDFS documentation. 
> Better still would be to make it configurable, more like GetHDFS.
> In my case I think I can switch to GetHDFS without causing trouble, but 
> the behavior of ListHDFS surprised me and, as far as I can tell, is 
> not documented anywhere.
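The following is a simplified Java sketch of the behavior described in the report. It is not the actual ListHDFS source, and the real processor's bookkeeping differs in detail. Only the name latestTimestampListed comes from the report. The class ListingStateSketch, the FileEntry type, the list and countRouted methods, and the 30-day cutoff are hypothetical. The sketch shows why the flow misses files: a file that is too young on the first run is listed but then dropped by the age filter. Once it becomes old enough, it is never listed again, because its modification time is not newer than the stored timestamp.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

public class ListingStateSketch {

    // Hypothetical stand-in for an HDFS file status: a path plus its
    // modification time in epoch milliseconds (what hdfs.lastModified carries).
    static final class FileEntry {
        final String path;
        final long lastModified;

        FileEntry(String path, long lastModified) {
            this.path = path;
            this.lastModified = lastModified;
        }
    }

    // Simplified version of the state named in the report: the newest
    // modification time seen by any previous listing.
    private long latestTimestampListed = Long.MIN_VALUE;

    // Lists only files strictly newer than the stored timestamp, then advances it.
    // Anything not newer, including every file already listed, is skipped.
    List<FileEntry> list(List<FileEntry> filesInDirectory) {
        List<FileEntry> listable = new ArrayList<>();
        long newest = latestTimestampListed;
        for (FileEntry f : filesInDirectory) {
            if (f.lastModified > latestTimestampListed) {
                listable.add(f);
                newest = Math.max(newest, f.lastModified);
            }
        }
        latestTimestampListed = newest;
        return listable;
    }

    // Downstream RouteOnAttribute-style check (hypothetical): older than 30 days at nowMillis.
    static boolean olderThanAMonth(FileEntry f, long nowMillis) {
        return f.lastModified < nowMillis - TimeUnit.DAYS.toMillis(30);
    }

    // Counts how many of the listed files pass the age check at nowMillis.
    static int countRouted(List<FileEntry> listed, long nowMillis) {
        int n = 0;
        for (FileEntry f : listed) {
            if (olderThanAMonth(f, nowMillis)) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        long day = TimeUnit.DAYS.toMillis(1);
        long now = System.currentTimeMillis();
        List<FileEntry> dir = List.of(
                new FileEntry("/data/a", now - 60 * day),
                new FileEntry("/data/b", now - 45 * day),
                new FileEntry("/data/c", now - 1 * day));

        ListingStateSketch lister = new ListingStateSketch();

        // Day 0: all three files are listed, but only a and b are old enough.
        List<FileEntry> first = lister.list(dir);
        System.out.println("day 0: listed=" + first.size()
                + ", routed=" + countRouted(first, now));

        // Day 30: c is now old enough, but it is not newer than
        // latestTimestampListed, so it is never listed (and never processed).
        List<FileEntry> second = lister.list(dir);
        System.out.println("day 30: listed=" + second.size()
                + ", routed=" + countRouted(second, now + 30 * day));
    }
}
```

Running main prints "day 0: listed=3, routed=2" followed by "day 30: listed=0, routed=0". File c is therefore never processed, which matches the "0 are listable" debug output in the report.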



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
