[ https://issues.apache.org/jira/browse/APEXMALHAR-2274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16188391#comment-16188391 ]
ASF GitHub Bot commented on APEXMALHAR-2274:
--------------------------------------------

vrozov closed pull request #597: APEXMALHAR-2274 merge #490
URL: https://github.com/apache/apex-malhar/pull/597

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

> AbstractFileInputOperator gets killed when there are a large number of files.
> -----------------------------------------------------------------------------
>
>                 Key: APEXMALHAR-2274
>                 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2274
>             Project: Apache Apex Malhar
>          Issue Type: Bug
>            Reporter: Munagala V. Ramanath
>            Assignee: Matt Zhang
>
> When there are a large number of files in the monitored directory, the call
> to DirectoryScanner.scan() can take a long time since it calls
> FileSystem.listStatus(), which returns the entire list. Meanwhile, the
> AppMaster deems this operator hung and restarts it, which again results in
> the same problem.
> It should use FileSystem.listStatusIterator() [in Hadoop 2.7.X] or
> FileSystem.listFiles() [in 2.6.X] or other similar calls that return
> a remote iterator to limit the number of files processed in a single call.
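
The suggestion in the description (pull a bounded number of entries from a remote iterator on each scan, rather than listing everything at once) could look roughly like the sketch below. It is an illustration only, not the change merged for this issue. BoundedScan, RemoteIter, nextBatch, wrap and maxPerScan are hypothetical names. RemoteIter is a stand-in for Hadoop's RemoteIterator, the type returned by FileSystem.listStatusIterator(Path) and FileSystem.listFiles(Path, boolean), so that the sketch compiles without Hadoop on the classpath.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/**
 * Sketch of a bounded directory scan: each call drains at most maxPerScan
 * entries from a lazy iterator instead of materializing the whole listing.
 * All names here are hypothetical and are not part of Malhar or Hadoop.
 */
final class BoundedScan {

    /** Stand-in for Hadoop's RemoteIterator<E>, whose methods may throw IOException. */
    interface RemoteIter<E> {
        boolean hasNext() throws IOException;
        E next() throws IOException;
    }

    /**
     * Returns up to maxPerScan entries. The iterator keeps its position,
     * so the next call continues where this one stopped.
     */
    static <E> List<E> nextBatch(RemoteIter<E> it, int maxPerScan) {
        List<E> batch = new ArrayList<>();
        try {
            while (batch.size() < maxPerScan && it.hasNext()) {
                batch.add(it.next());
            }
        } catch (IOException e) {
            // Surface listing failures to the caller as an unchecked exception.
            throw new UncheckedIOException(e);
        }
        return batch;
    }

    /** Adapts a plain java.util.Iterator so the sketch can run without a cluster. */
    static <E> RemoteIter<E> wrap(Iterator<E> src) {
        return new RemoteIter<E>() {
            @Override public boolean hasNext() { return src.hasNext(); }
            @Override public E next() { return src.next(); }
        };
    }
}
```

In an operator, the iterator would presumably be obtained once per directory pass and kept between scans, with maxPerScan chosen small enough that one scan returns well before the AppMaster would consider the operator hung.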