HDFS supports quotas, so you can control it that way, but note that quotas are set per directory, so they will affect every HDFS user writing to those directories, not just Pig scripts.
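As a sketch, a space quota on the output directory would do roughly what you describe. The directory path below is a placeholder; substitute the directory your Pig script stores to. Note that the space quota counts replicated bytes, so with the default replication factor of 3, a 50g quota allows roughly 16 GB of raw data.

```shell
# Cap the total space (replicated bytes) the directory tree may consume.
# /user/pig/output is a hypothetical path -- use your script's output dir.
hadoop dfsadmin -setSpaceQuota 50g /user/pig/output

# A write that would exceed the quota is rejected, so the Pig job's
# store fails rather than filling the cluster.

# Inspect the current quota and usage for the directory:
hadoop fs -count -q /user/pig/output

# Remove the quota when it is no longer needed:
hadoop dfsadmin -clrSpaceQuota /user/pig/output
```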
Alan.
On Aug 25, 2010, at 3:52 PM, jiang licht wrote:
Is there a way to tell Pig to restrict the size of map/reduce output
that can be saved to DFS? E.g., if a job creates data over the limit,
it won't be allowed to save the result to DFS and the job will fail.
This would help prevent unexpectedly large output from being written
to DFS by the mappers/reducers a Pig script creates. The idea is that
we estimate in advance how much data a Pig script will generate; then,
with that quota in place, any over-sized result is not saved and the
job fails.
Thanks,
Michael