Purshotam Shah created OOZIE-3640:
-------------------------------------

             Summary: Add support for recursive directories fs:dirSize
                 Key: OOZIE-3640
                 URL: https://issues.apache.org/jira/browse/OOZIE-3640
             Project: Oozie
          Issue Type: Improvement
            Reporter: Purshotam Shah

There are three ways to do that.

# Use getContentSummary: This can be dangerous for the NameNode. getContentSummary on a large HDFS directory can take minutes. During that time, an Oozie thread is blocked, waiting for the RPC response. A 5-minute workflow doing a content summary on a directory that takes more than 5 minutes may lead to Oozie thread exhaustion. Or, if Oozie times out and retries, the NN has no support for aborting a call that is already being processed, so there will be multiple concurrent content summaries, which may also exhaust the NN's handler threads.
# Use getQuotaUsage: If quota is not enabled, it falls back on getContentSummary, so this is as bad as getContentSummary.
# Use recursive listing to compute the size: Enforce a system-level limit on directory size (or file count) and on recursion level; if the max-level or max-size is reached, throw an exception.

Considering all three options, option 3 is the best. A system admin can configure max-level based on system load and user use cases.
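
A rough sketch of what option 3 might look like, for discussion only. This is not Oozie code: the class BoundedDirSize, its exception type, and the maxDepth/maxFiles parameters are hypothetical names, and it walks the local filesystem through java.nio.file as a stand-in for listing HDFS through Hadoop's FileSystem. It caps the file count rather than the byte size, which is one reading of "dir size (or file count)".

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;

// Hypothetical helper, not part of Oozie: bounded recursive listing (option 3).
// Assumption: the local filesystem stands in for HDFS in this sketch.
final class BoundedDirSize {

    // Thrown when a configured limit is hit, so the caller fails fast
    // instead of continuing to list.
    static final class LimitExceededException extends IOException {
        LimitExceededException(String message) {
            super(message);
        }
    }

    private final int maxDepth;   // assumed admin setting: deepest level to descend into
    private final long maxFiles;  // assumed admin setting: most files to count
    private long filesSeen = 0;

    private BoundedDirSize(int maxDepth, long maxFiles) {
        this.maxDepth = maxDepth;
        this.maxFiles = maxFiles;
    }

    // Returns the total size in bytes of all regular files under root.
    // The root itself is level 0; its immediate subdirectories are level 1.
    static long dirSize(Path root, int maxDepth, long maxFiles) throws IOException {
        return new BoundedDirSize(maxDepth, maxFiles).walk(root, 0);
    }

    private long walk(Path dir, int depth) throws IOException {
        if (depth > maxDepth) {
            throw new LimitExceededException(
                    "max level " + maxDepth + " exceeded at " + dir);
        }
        long total = 0;
        // One listing call per directory; each call is small and bounded,
        // unlike a single long-running content summary.
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path entry : entries) {
                if (Files.isDirectory(entry, LinkOption.NOFOLLOW_LINKS)) {
                    total += walk(entry, depth + 1);
                } else if (Files.isRegularFile(entry, LinkOption.NOFOLLOW_LINKS)) {
                    if (++filesSeen > maxFiles) {
                        throw new LimitExceededException(
                                "max file count " + maxFiles + " exceeded under " + dir);
                    }
                    total += Files.size(entry);
                }
            }
        }
        return total;
    }
}
```

With this shape, a call such as BoundedDirSize.dirSize(path, 3, 100000) either returns the byte total or throws as soon as a limit is crossed, so the cost of any single request stays bounded by the configured limits.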