Purshotam Shah created OOZIE-3640:
-------------------------------------

             Summary: Add support for recursive directories fs:dirSize
                 Key: OOZIE-3640
                 URL: https://issues.apache.org/jira/browse/OOZIE-3640
             Project: Oozie
          Issue Type: Improvement
            Reporter: Purshotam Shah


There are three ways to support recursive directories in fs:dirSize.
 # Use getContentSummary: This can be dangerous for NameNodes. 
getContentSummary on a large HDFS directory can take minutes. During that time, 
an Oozie thread is blocked, waiting for the RPC response. A 5-minute workflow 
that runs a content summary on a directory where the call takes more than 5 
minutes may lead to Oozie thread exhaustion. Or, if Oozie times out and 
retries, the NN has no support for aborting a call that is already being 
processed, so there will be multiple concurrent content summaries, which may 
also exhaust the NN's handler threads.
 # Use getQuotaUsage: If quota is not enabled, it falls back on 
getContentSummary, so this is as bad as getContentSummary.
 # Use a recursive listing to compute the size. Enforce a system-level limit on 
directory size (or file count) and on recursion level; if the max-level or 
max-size is reached, throw an exception.

Considering all three options, option 3 is the best. A system admin can 
configure max-level based on system load and user use cases.
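
A minimal sketch of the bounded recursive listing described in option 3 could 
look like the following. It is only an illustration, not the proposed patch: it 
walks a local directory with java.nio.file as a stand-in for HDFS listing calls, 
and the names BoundedDirSize, LimitExceededException, maxLevel and maxEntries 
are hypothetical. The top-level directory is treated as level 0, and the file 
count limit stands in for the "max-size (or file count)" check.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;

public class BoundedDirSize {

    /** Hypothetical exception raised when a configured limit is exceeded. */
    public static class LimitExceededException extends RuntimeException {
        public LimitExceededException(String message) {
            super(message);
        }
    }

    private final int maxLevel;     // deepest directory level that may be listed (root = 0)
    private final long maxEntries;  // maximum number of files and directories visited
    private long entriesSeen;

    public BoundedDirSize(int maxLevel, long maxEntries) {
        this.maxLevel = maxLevel;
        this.maxEntries = maxEntries;
    }

    /** Returns the total size in bytes of all regular files under dir. */
    public long dirSize(Path dir) throws IOException {
        entriesSeen = 0;
        return walk(dir, 0);
    }

    private long walk(Path dir, int level) throws IOException {
        if (level > maxLevel) {
            throw new LimitExceededException(
                    "Recursion level " + level + " exceeds max-level " + maxLevel);
        }
        long total = 0;
        // List one directory at a time instead of asking for a full summary.
        try (DirectoryStream<Path> children = Files.newDirectoryStream(dir)) {
            for (Path child : children) {
                if (++entriesSeen > maxEntries) {
                    throw new LimitExceededException(
                            "Entry count exceeds max-entries " + maxEntries);
                }
                if (Files.isDirectory(child, LinkOption.NOFOLLOW_LINKS)) {
                    total += walk(child, level + 1);
                } else if (Files.isRegularFile(child, LinkOption.NOFOLLOW_LINKS)) {
                    total += Files.size(child);
                }
            }
        }
        return total;
    }
}
```

Because each step is a short listing of a single directory, the work stops as 
soon as a limit is hit, rather than leaving one long-running call in flight.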



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
