[jira] [Created] (CRUNCH-132) Repeated runs result in duplicated output data

Dave Beech (JIRA) Thu, 13 Dec 2012 07:24:16 -0800

Dave Beech created CRUNCH-132:
---------------------------------

             Summary: Repeated runs result in duplicated output data
                 Key: CRUNCH-132
                 URL: https://issues.apache.org/jira/browse/CRUNCH-132
             Project: Crunch
          Issue Type: Bug
    Affects Versions: 0.4.0
            Reporter: Dave Beech



Usually, when you run a MapReduce job and the output directory already exists,
the job fails to start. A Crunch job does run, but the output data ends up
duplicated in the output directory, with new part files whose numbers continue
on from the previous run.

Example
Run 1, single reducer /output -> /output/part-r-00000
Run 2, single reducer /output -> /output/part-r-00000, /output/part-r-00001
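The example above can be reproduced with a small local simulation. This is a sketch only: `PartFileNumbering`, `nextPartName` and `simulateRun` are hypothetical names, it uses the local filesystem via `java.nio.file` rather than HDFS, and the numbering rule (continue from the highest existing `part-r-NNNNN` file) is inferred from the observed file names, not taken from Crunch's source.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class PartFileNumbering {
    // Matches single-reducer output files such as part-r-00000.
    private static final Pattern PART = Pattern.compile("part-r-(\\d{5})");

    // Hypothetical helper: returns the file name a new run would use if
    // numbering continues from the highest existing part-r-NNNNN file.
    static String nextPartName(Path outputDir) throws IOException {
        int next = 0;
        if (Files.isDirectory(outputDir)) {
            List<Path> entries;
            try (Stream<Path> s = Files.list(outputDir)) {
                entries = s.collect(Collectors.toList());
            }
            for (Path p : entries) {
                Matcher m = PART.matcher(p.getFileName().toString());
                if (m.matches()) {
                    next = Math.max(next, Integer.parseInt(m.group(1)) + 1);
                }
            }
        }
        return String.format("part-r-%05d", next);
    }

    // Hypothetical helper: simulates one single-reducer run that writes into
    // outputDir without checking whether the directory already exists.
    static Path simulateRun(Path outputDir) throws IOException {
        Files.createDirectories(outputDir);
        return Files.createFile(outputDir.resolve(nextPartName(outputDir)));
    }

    public static void main(String[] args) throws IOException {
        Path out = Files.createTempDirectory("crunch132").resolve("output");
        simulateRun(out); // Run 1 creates part-r-00000
        simulateRun(out); // Run 2 adds part-r-00001 next to the old file
        try (Stream<Path> s = Files.list(out)) {
            s.map(p -> p.getFileName().toString()).sorted().forEach(System.out::println);
        }
    }
}
```

After two runs the directory holds two part files, which is what makes a single-reducer job look as if it had two reducers.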

I didn't realise I'd run my job twice, so when I looked in the directory it
seemed that there had been two reducers and that the output had somehow been
generated twice, which was confusing.

I realise this may be by design, but it feels wrong to me. I'd prefer the
behaviour of a standard MapReduce job to be preserved.
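For comparison, the fail-fast behaviour requested here amounts to checking the output directory before any data is written. In Hadoop this check is done by the output format (`FileOutputFormat.checkOutputSpecs`) at job submission; the sketch below only mimics it on the local filesystem. `OutputGuard` and `requireFreshOutputDir` are hypothetical names, and reusing `java.nio.file.FileAlreadyExistsException` is an assumption for illustration, not Hadoop's or Crunch's exception type.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

public class OutputGuard {
    // Hypothetical guard: refuses to proceed if the output directory already
    // exists, so a repeated run fails instead of adding more part files.
    static void requireFreshOutputDir(Path outputDir) throws FileAlreadyExistsException {
        if (Files.exists(outputDir)) {
            throw new FileAlreadyExistsException(
                outputDir.toString(), null, "Output directory already exists");
        }
    }

    public static void main(String[] args) throws IOException {
        Path out = Files.createTempDirectory("crunch132").resolve("output");
        requireFreshOutputDir(out);   // first run: directory absent, check passes
        Files.createDirectories(out); // the first run then creates its output
        try {
            requireFreshOutputDir(out); // second run: rejected before writing
            System.out.println("unexpected: second run allowed");
        } catch (FileAlreadyExistsException e) {
            System.out.println("second run rejected: " + e.getMessage());
        }
    }
}
```

Checking up front means a repeated run leaves the existing output untouched rather than silently mixing two runs' results in one directory.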
