Zhenxiao Luo created HIVE-6341:
----------------------------------

             Summary: ORCFileWriter does not overwrite preempted task's final file
                 Key: HIVE-6341
                 URL: https://issues.apache.org/jira/browse/HIVE-6341
             Project: Hive
          Issue Type: Bug
          Components: File Formats
            Reporter: Zhenxiao Luo



An insert overwrite into a partition fails with a "File already exists" exception. The problem is:

attempt0 of the insert overwrite creates and writes the final file, then gets preempted without cleaning up its output file. When attempt1 comes along and tries to write the same final file, it finds that the file already exists and throws the exception.
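
A minimal sketch of the collision using the plain Hadoop FileSystem API (the path below is hypothetical; the second create(), with overwrite == false, is what the ORC writer effectively does on the retry):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PreemptionCollision {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // stand-in for the partition's final file
    Path finalFile = new Path("/tmp/hive-demo/final_file");
    FileSystem fs = finalFile.getFileSystem(conf);

    // "attempt0": creates the final file, then is preempted without cleanup
    fs.create(finalFile, /* overwrite = */ false).close();

    // "attempt1": tries to create the same final file with overwrite == false
    // and fails with an IOException ("File already exists: ...")
    fs.create(finalFile, /* overwrite = */ false).close();
  }
}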

Here is the stack trace:
Caused by: java.io.IOException: File already exists:s3n://netflix-dataoven-prod-users/hive/warehouse/miket.db/un_gps_group_info_v2/dateint=20140120/device_category=PS3/c49b3223-34f2-420d-a1d9-fa8164dcb7bb_000449
    at org.apache.hadoop.fs.s3native.NativeS3FileSystem.create_aroundBody0(NativeS3FileSystem.java:647)
    at org.apache.hadoop.fs.s3native.NativeS3FileSystem$AjcClosure1.run(NativeS3FileSystem.java:1)
    at org.aspectj.runtime.reflect.JoinPointImpl.proceed(JoinPointImpl.java:149)
    at com.netflix.bdp.s3mper.listing.ConsistentListingAspect.metastoreUpdate(ConsistentListingAspect.java:197)
    at org.apache.hadoop.fs.s3native.NativeS3FileSystem.create(NativeS3FileSystem.java:646)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:557)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:538)
    at org.apache.hadoop.hive.ql.io.orc.WriterImpl.ensureWriter(WriterImpl.java:1320)
    at org.apache.hadoop.hive.ql.io.orc.WriterImpl.flushStripe(WriterImpl.java:1337)
    at org.apache.hadoop.hive.ql.io.orc.WriterImpl.checkMemory(WriterImpl.java:173)
    at org.apache.hadoop.hive.ql.io.orc.MemoryManager.notifyWriters(MemoryManager.java:162)
    at org.apache.hadoop.hive.ql.io.orc.MemoryManager.addedRow(MemoryManager.java:151)
    at org.apache.hadoop.hive.ql.io.orc.WriterImpl.addRow(WriterImpl.java:1475)
    at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.write(OrcOutputFormat.java:88)
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:688)
    ... 12 more

It is the fact that ORCFileWriter does not overwrite that causes the problem. Compare how the writers call fs.create:

1. org.apache.hadoop.hive.ql.io.orc.WriterImpl.ensureWriter
      rawWriter = fs.create(path, false, HDFS_BUFFER_SIZE,

2. org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat.getHiveRecordWriter
    final OutputStream outStream = Utilities.createCompressedStream(jc, fs
        .create(outPath), isCompressed);

3. org.apache.hadoop.io.SequenceFile.BlockCompressWriter.BlockCompressWriter
                 fs.create(name, true, bufferSize, replication, blockSize, progress),

In the latter two cases, fs.create is effectively called with overwrite == true: case 3 passes it explicitly, and in case 2 the single-argument fs.create(outPath) defaults overwrite to true.

We should set overwrite = true in org.apache.hadoop.hive.ql.io.orc.WriterImpl.ensureWriter to fix this.
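
A hedged sketch of the proposed change, modeled on the ensureWriter snippet in point 1 above. The field names (fs, path, rawWriter) and HDFS_BUFFER_SIZE mirror WriterImpl; the trailing create() arguments are illustrative stand-ins, since the snippet above elides them:

import java.io.IOException;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

class OrcWriterFixSketch {
  // buffer size constant mirroring WriterImpl; the value here is a stand-in
  private static final int HDFS_BUFFER_SIZE = 256 * 1024;
  private final FileSystem fs;
  private final Path path;
  private FSDataOutputStream rawWriter;

  OrcWriterFixSketch(FileSystem fs, Path path) {
    this.fs = fs;
    this.path = path;
  }

  void ensureWriter() throws IOException {
    if (rawWriter == null) {
      // overwrite == true (was false): a stale final file left behind by a
      // preempted attempt is replaced instead of triggering
      // "File already exists"
      rawWriter = fs.create(path, true, HDFS_BUFFER_SIZE,
          fs.getDefaultReplication(path), fs.getDefaultBlockSize(path));
    }
  }
}

With the flag flipped, the retry scenario in the first sketch succeeds: attempt1's create() simply replaces attempt0's leftover file.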


