Purshotam Shah created OOZIE-3640:
-------------------------------------
Summary: Add support for recursive directories fs:dirSize
Key: OOZIE-3640
URL: https://issues.apache.org/jira/browse/OOZIE-3640
Project: Oozie
Issue Type: Improvement
Reporter: Purshotam Shah
There are three ways to compute the size of a directory tree recursively.
# Use getContentSummary: This can be dangerous for NameNodes.
getContentSummary on a large HDFS directory can take minutes. During that time,
an Oozie thread is blocked, waiting for the RPC response. A 5-minute workflow
requesting a content summary on a directory that takes more than 5 minutes may
lead to Oozie thread exhaustion. Alternatively, if Oozie times out and retries,
the NN has no support for aborting a call that is already being processed, so
multiple content summaries will run concurrently, which may also exhaust the
NN's handler threads.
# Use getQuotaUsage: If quota is not enabled, it falls back to
getContentSummary, so this is as bad as getContentSummary.
# Use recursive listing to compute the size. Enforce a system-level limit on
directory size (or file count) and on recursion level; if the listing reaches
max-level or max-size, throw an exception.
Considering all three options, option 3 is the best. A system admin can
configure max-level based on system load and user use cases.
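A minimal sketch of the bounded recursive listing in option 3 might look like
the following. It is not the Oozie implementation: it uses java.nio.file on a
local file system as a stand-in for HDFS listing, and the class name
BoundedDirSize, the exception LimitExceededException, and the parameters
maxLevel and maxFiles are all hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;

public class BoundedDirSize {

    /** Hypothetical exception raised when a configured limit is exceeded. */
    public static class LimitExceededException extends RuntimeException {
        public LimitExceededException(String message) {
            super(message);
        }
    }

    /**
     * Sums the sizes of regular files under root.
     * maxLevel is the deepest directory level that may be listed (root is level 0).
     * maxFiles is the largest number of files that may be counted.
     * Both limits are assumptions standing in for admin-configured settings.
     */
    public static long dirSize(Path root, int maxLevel, long maxFiles) throws IOException {
        long[] filesSeen = {0L}; // running file count shared across the recursion
        return walk(root, 0, maxLevel, maxFiles, filesSeen);
    }

    private static long walk(Path dir, int level, int maxLevel, long maxFiles,
                             long[] filesSeen) throws IOException {
        // Fail fast instead of descending further than the admin allows.
        if (level > maxLevel) {
            throw new LimitExceededException(
                    "max-level " + maxLevel + " exceeded at " + dir);
        }
        long total = 0L;
        // One listing call per directory, so each request stays small.
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path entry : entries) {
                if (Files.isDirectory(entry, LinkOption.NOFOLLOW_LINKS)) {
                    total += walk(entry, level + 1, maxLevel, maxFiles, filesSeen);
                } else if (Files.isRegularFile(entry, LinkOption.NOFOLLOW_LINKS)) {
                    if (++filesSeen[0] > maxFiles) {
                        throw new LimitExceededException(
                                "max file count " + maxFiles + " exceeded under " + dir);
                    }
                    total += Files.size(entry);
                }
                // Symbolic links and other special entries are skipped.
            }
        }
        return total;
    }
}
```

With this shape, each directory costs one short listing call rather than one
long-running call for the whole tree, and the traversal stops as soon as a
limit is hit, so the caller gets an exception instead of a blocked thread.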