Zhenxiao Luo created HIVE-6341:
----------------------------------

             Summary: ORCFileWriter does not overwrite preempted task's final file
                 Key: HIVE-6341
                 URL: https://issues.apache.org/jira/browse/HIVE-6341
             Project: Hive
          Issue Type: Bug
          Components: File Formats
            Reporter: Zhenxiao Luo
Insert overwrite partition gets a "File already exists" exception. The problem: insert overwrite attempt0 creates and writes the final file, then gets preempted (attempt0 does not clean up its output final file). Then attempt1 comes along, tries to write the final file, finds that the file already exists, and throws the exception. Here is the stacktrace:

Caused by: java.io.IOException: File already exists:s3n://netflix-dataoven-prod-users/hive/warehouse/miket.db/un_gps_group_info_v2/dateint=20140120/device_category=PS3/c49b3223-34f2-420d-a1d9-fa8164dcb7bb_000449
	at org.apache.hadoop.fs.s3native.NativeS3FileSystem.create_aroundBody0(NativeS3FileSystem.java:647)
	at org.apache.hadoop.fs.s3native.NativeS3FileSystem$AjcClosure1.run(NativeS3FileSystem.java:1)
	at org.aspectj.runtime.reflect.JoinPointImpl.proceed(JoinPointImpl.java:149)
	at com.netflix.bdp.s3mper.listing.ConsistentListingAspect.metastoreUpdate(ConsistentListingAspect.java:197)
	at org.apache.hadoop.fs.s3native.NativeS3FileSystem.create(NativeS3FileSystem.java:646)
	at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:557)
	at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:538)
	at org.apache.hadoop.hive.ql.io.orc.WriterImpl.ensureWriter(WriterImpl.java:1320)
	at org.apache.hadoop.hive.ql.io.orc.WriterImpl.flushStripe(WriterImpl.java:1337)
	at org.apache.hadoop.hive.ql.io.orc.WriterImpl.checkMemory(WriterImpl.java:173)
	at org.apache.hadoop.hive.ql.io.orc.MemoryManager.notifyWriters(MemoryManager.java:162)
	at org.apache.hadoop.hive.ql.io.orc.MemoryManager.addedRow(MemoryManager.java:151)
	at org.apache.hadoop.hive.ql.io.orc.WriterImpl.addRow(WriterImpl.java:1475)
	at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.write(OrcOutputFormat.java:88)
	at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:688)
	... 12 more

It is the ORC file writer's refusal to overwrite that causes the problem. Compare the three writers:

1. org.apache.hadoop.hive.ql.io.orc.WriterImpl.ensureWriter
   rawWriter = fs.create(path, false, HDFS_BUFFER_SIZE,

2. org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat.getHiveRecordWriter
   final OutputStream outStream = Utilities.createCompressedStream(jc, fs.create(outPath), isCompressed);

3. org.apache.hadoop.io.SequenceFile.BlockCompressWriter.BlockCompressWriter
   fs.create(name, true, bufferSize, replication, blockSize, progress),

In the latter two cases, fs.create is called with overwrite == true (the single-argument overload defaults overwrite to true); only the ORC writer passes overwrite == false. We should set overwrite = true in the ORC WriterImpl to fix this.

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
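The proposed fix is the one-line change of the second argument in WriterImpl.ensureWriter from false to true. As a minimal, self-contained sketch of why the overwrite flag matters, here is a pure-Java analog of the two create semantics using java.nio instead of the Hadoop FileSystem API (the file names and "attempt" strings are illustrative, not from the Hive code):

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import static java.nio.file.StandardOpenOption.CREATE;
import static java.nio.file.StandardOpenOption.CREATE_NEW;
import static java.nio.file.StandardOpenOption.TRUNCATE_EXISTING;

public class OverwriteDemo {
    public static void main(String[] args) throws IOException {
        Path out = Files.createTempDirectory("orc-demo").resolve("part-000449");

        // attempt0 writes its final file, then is preempted without cleanup
        Files.write(out, "attempt0".getBytes());

        // overwrite == false (current ORC WriterImpl behavior): the create
        // fails because attempt0's stale file is still there
        boolean failed = false;
        try {
            Files.newOutputStream(out, CREATE_NEW).close();
        } catch (FileAlreadyExistsException e) {
            failed = true; // the "File already exists" failure from the stacktrace
        }
        System.out.println("overwrite=false failed: " + failed);

        // overwrite == true (proposed fix): attempt1 silently replaces the
        // stale file left behind by the preempted attempt0
        Files.write(out, "attempt1".getBytes(), CREATE, TRUNCATE_EXISTING);
        System.out.println("final contents: " + new String(Files.readAllBytes(out)));
    }
}
```

CREATE_NEW corresponds to fs.create(path, false, ...), while CREATE plus TRUNCATE_EXISTING corresponds to the overwrite-by-default fs.create(outPath) used by the text and sequence-file writers above.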