Dave Beech created CRUNCH-132:
---------------------------------

             Summary: Repeated runs result in duplicated output data
                 Key: CRUNCH-132
                 URL: https://issues.apache.org/jira/browse/CRUNCH-132
             Project: Crunch
          Issue Type: Bug
    Affects Versions: 0.4.0
            Reporter: Dave Beech


Usually when you run a MapReduce job and the output directory already exists,
the job fails before it starts. A Crunch job does run, but the output data ends
up duplicated: the new run adds part files to the same directory, numbered on
from those of the previous run.

Example
Run 1, single reducer /output -> /output/part-r-00000
Run 2, single reducer /output -> /output/part-r-00000, /output/part-r-00001

I didn't realise I'd run my job twice, so when I looked in the directory it 
seemed that there had been 2 reducers and somehow the output had been generated 
twice, which was confusing. 

I realise this may be by design, but it feels wrong to me. I'd prefer it if the
behaviour of a standard MapReduce job were preserved.
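
As a rough sketch of the check I have in mind (in plain Hadoop, as far as I
know, this is what FileOutputFormat.checkOutputSpecs does at job submission),
something along these lines would refuse a second run. This is only an
illustration: OutputGuard and requireAbsent are hypothetical names, not part of
Crunch or Hadoop, and a local java.nio path stands in for the HDFS output
directory.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for illustration only; not part of Crunch or Hadoop.
final class OutputGuard {
    private OutputGuard() {}

    // Throws if the output location already exists, so a repeated run fails
    // up front instead of adding more part files next to the old ones.
    static void requireAbsent(Path output) throws FileAlreadyExistsException {
        if (Files.exists(output)) {
            throw new FileAlreadyExistsException(
                    output.toString(), null, "Output directory already exists");
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: a local temp directory stands in for /output on HDFS.
        Path output = Files.createTempDirectory("crunch-demo").resolve("output");

        requireAbsent(output);          // run 1: passes, nothing is there yet
        Files.createDirectory(output);  // stand-in for the job writing part-r-00000

        try {
            requireAbsent(output);      // run 2: rejected before any work is done
        } catch (FileAlreadyExistsException e) {
            System.out.println("Refusing to run: " + e.getMessage());
        }
    }
}
```

Calling a guard like this before the pipeline runs would make the second run in
the example above fail instead of producing /output/part-r-00001.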
