[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15647455#comment-15647455
 ] 

Tushar Gosavi commented on APEXMALHAR-2274:
-------------------------------------------

We could derive a common interface that both operators use for scanning 
files, since they are conceptually doing the same thing. The interface could 
have different implementations: one that returns just the paths, and another 
that returns each path together with its status.
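
A rough sketch of what such an interface might look like is below. All names 
(FileScanner, PathScanner, StatusScanner, PathWithStatus, maxEntries) are 
hypothetical and not taken from Malhar, and java.nio.file stands in for 
Hadoop's FileSystem only so that the example runs without a Hadoop 
dependency.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;

// Hypothetical common interface: scan a directory and return at most
// maxEntries results of type T (regular files only).
interface FileScanner<T> {
  List<T> scan(Path dir, int maxEntries) throws IOException;
}

// Implementation that returns only the paths.
class PathScanner implements FileScanner<Path> {
  @Override
  public List<Path> scan(Path dir, int maxEntries) throws IOException {
    List<Path> result = new ArrayList<>();
    try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
      for (Path p : stream) {
        if (result.size() >= maxEntries) {
          break;
        }
        if (Files.isRegularFile(p)) {
          result.add(p);
        }
      }
    }
    return result;
  }
}

// Hypothetical value type pairing a path with its status (attributes).
class PathWithStatus {
  final Path path;
  final BasicFileAttributes status;

  PathWithStatus(Path path, BasicFileAttributes status) {
    this.path = path;
    this.status = status;
  }
}

// Implementation that returns each path together with its status.
class StatusScanner implements FileScanner<PathWithStatus> {
  @Override
  public List<PathWithStatus> scan(Path dir, int maxEntries) throws IOException {
    List<PathWithStatus> result = new ArrayList<>();
    try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
      for (Path p : stream) {
        if (result.size() >= maxEntries) {
          break;
        }
        BasicFileAttributes attrs = Files.readAttributes(p, BasicFileAttributes.class);
        if (attrs.isRegularFile()) {
          result.add(new PathWithStatus(p, attrs));
        }
      }
    }
    return result;
  }
}
```

Each operator would then depend only on FileScanner<T> and pick the 
implementation that returns what it needs.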

> AbstractFileInputOperator gets killed when there are a large number of files.
> -----------------------------------------------------------------------------
>
>                 Key: APEXMALHAR-2274
>                 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2274
>             Project: Apache Apex Malhar
>          Issue Type: Bug
>            Reporter: Munagala V. Ramanath
>            Assignee: Matt Zhang
>
> When there are a large number of files in the monitored directory, the call 
> to DirectoryScanner.scan() can take a long time since it calls 
> FileSystem.listStatus() which returns the entire list. Meanwhile, the 
> AppMaster deems this operator hung and restarts it which again results in the 
> same problem.
> It should use FileSystem.listStatusIterator() [in Hadoop 2.7.X] or 
> FileSystem.listFiles() [in 2.6.X] or other similar calls that return
> a remote iterator to limit the number of files processed in a single call.
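
The batching idea in the description above could be sketched roughly as 
follows. BatchedLister and batchSize are hypothetical names, and 
java.util.Iterator stands in for Hadoop's RemoteIterator (whose hasNext() 
and next() throw IOException) so that the example runs without Hadoop. In 
the operator, the iterator would presumably come from 
FileSystem.listStatusIterator() or FileSystem.listFiles().

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper: pulls at most batchSize entries per call from a
// listing iterator and keeps its position between calls.
class BatchedLister<T> {
  private final Iterator<T> listing;
  private final int batchSize;

  BatchedLister(Iterator<T> listing, int batchSize) {
    if (batchSize <= 0) {
      throw new IllegalArgumentException("batchSize must be positive");
    }
    this.listing = listing;
    this.batchSize = batchSize;
  }

  // Returns the next batch; an empty list means the listing is exhausted.
  List<T> nextBatch() {
    List<T> batch = new ArrayList<>(batchSize);
    while (batch.size() < batchSize && listing.hasNext()) {
      batch.add(listing.next());
    }
    return batch;
  }
}
```

Calling nextBatch() once per scan would bound the work done in a single 
call, so a very large directory is consumed over several calls instead of 
one long one.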



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
