Dave Beech created CRUNCH-132:
---------------------------------
Summary: Repeated runs result in duplicated output data
Key: CRUNCH-132
URL: https://issues.apache.org/jira/browse/CRUNCH-132
Project: Crunch
Issue Type: Bug
Affects Versions: 0.4.0
Reporter: Dave Beech
Usually, when you run a MapReduce job and the output directory already exists,
the job fails and won't start. A Crunch job does run, but the output data ends
up duplicated in the output directory, with the new files numbered on from the
previous run.
Example:
Run 1 (single reducer): /output -> /output/part-r-00000
Run 2 (single reducer): /output -> /output/part-r-00000, /output/part-r-00001
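To make the difference concrete, here is a minimal local-filesystem sketch of
the two behaviours. It assumes a plain directory of part files stands in for
HDFS. The function names and FileAlreadyExistsError are hypothetical and are
not Hadoop or Crunch APIs. In Hadoop itself the check is made before the job
starts, by FileOutputFormat.checkOutputSpecs throwing FileAlreadyExistsException.

```python
import os
import re
import tempfile


class FileAlreadyExistsError(Exception):
    """Hypothetical stand-in for Hadoop's FileAlreadyExistsException."""


PART_PATTERN = re.compile(r"part-r-(\d{5})$")


def _write_part(path, records):
    with open(path, "w") as f:
        f.writelines(r + "\n" for r in records)
    return path


def write_fail_if_exists(output_dir, records):
    # Standard MapReduce behaviour: refuse to run if the output dir exists.
    if os.path.exists(output_dir):
        raise FileAlreadyExistsError(f"Output directory {output_dir} already exists")
    os.makedirs(output_dir)
    return _write_part(os.path.join(output_dir, "part-r-00000"), records)


def write_append_numbered(output_dir, records):
    # Behaviour reported in this issue: keep the existing part files and
    # write the new output under the next unused part number.
    os.makedirs(output_dir, exist_ok=True)
    used = [int(m.group(1)) for name in os.listdir(output_dir)
            if (m := PART_PATTERN.match(name))]
    next_index = max(used) + 1 if used else 0
    return _write_part(os.path.join(output_dir, f"part-r-{next_index:05d}"), records)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        out = os.path.join(tmp, "output")
        write_append_numbered(out, ["a", "b"])  # run 1
        write_append_numbered(out, ["a", "b"])  # run 2: same data again
        print(sorted(os.listdir(out)))  # ['part-r-00000', 'part-r-00001']
```

Running the append variant twice with a single "reducer" reproduces the listing
in the example above, whereas the fail-if-exists variant rejects the second run.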
I didn't realise I'd run my job twice, so when I looked in the directory it
seemed there had been two reducers and that the output had somehow been
generated twice, which was confusing.
I realise this may be by design, but it feels wrong to me. I'd prefer the
behaviour of a standard MapReduce job to be preserved.
--
This message is automatically generated by JIRA.