Zhenxiao Luo created HIVE-6341:
----------------------------------
Summary: ORCFileWriter does not overwrite preempted task's final file
Key: HIVE-6341
URL: https://issues.apache.org/jira/browse/HIVE-6341
Project: Hive
Issue Type: Bug
Components: File Formats
Reporter: Zhenxiao Luo
INSERT OVERWRITE into a partition fails with a "File already exists" exception.
The problem: attempt0 creates and writes the final output file, then gets
preempted without cleaning up that file. attempt1 then tries to write the final
file, finds it already exists, and throws the exception.
Here is the stacktrace:
Caused by: java.io.IOException: File already exists: s3n://netflix-dataoven-prod-users/hive/warehouse/miket.db/un_gps_group_info_v2/dateint=20140120/device_category=PS3/c49b3223-34f2-420d-a1d9-fa8164dcb7bb_000449
    at org.apache.hadoop.fs.s3native.NativeS3FileSystem.create_aroundBody0(NativeS3FileSystem.java:647)
    at org.apache.hadoop.fs.s3native.NativeS3FileSystem$AjcClosure1.run(NativeS3FileSystem.java:1)
    at org.aspectj.runtime.reflect.JoinPointImpl.proceed(JoinPointImpl.java:149)
    at com.netflix.bdp.s3mper.listing.ConsistentListingAspect.metastoreUpdate(ConsistentListingAspect.java:197)
    at org.apache.hadoop.fs.s3native.NativeS3FileSystem.create(NativeS3FileSystem.java:646)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:557)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:538)
    at org.apache.hadoop.hive.ql.io.orc.WriterImpl.ensureWriter(WriterImpl.java:1320)
    at org.apache.hadoop.hive.ql.io.orc.WriterImpl.flushStripe(WriterImpl.java:1337)
    at org.apache.hadoop.hive.ql.io.orc.WriterImpl.checkMemory(WriterImpl.java:173)
    at org.apache.hadoop.hive.ql.io.orc.MemoryManager.notifyWriters(MemoryManager.java:162)
    at org.apache.hadoop.hive.ql.io.orc.MemoryManager.addedRow(MemoryManager.java:151)
    at org.apache.hadoop.hive.ql.io.orc.WriterImpl.addRow(WriterImpl.java:1475)
    at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.write(OrcOutputFormat.java:88)
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:688)
    ... 12 more
The root cause is that ORCFileWriter does not overwrite the existing file.
Compare the three writers:
1. org.apache.hadoop.hive.ql.io.orc.WriterImpl.ensureWriter
       rawWriter = fs.create(path, false, HDFS_BUFFER_SIZE,
2. org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat.getHiveRecordWriter
       final OutputStream outStream = Utilities.createCompressedStream(jc,
           fs.create(outPath), isCompressed);
3. org.apache.hadoop.io.SequenceFile.BlockCompressWriter.BlockCompressWriter
       fs.create(name, true, bufferSize, replication, blockSize,
           progress),
In the latter two cases, fs.create is effectively called with overwrite == true
(passed explicitly in case 3, and defaulting to true in case 2). Only the ORC
writer passes overwrite == false. We should pass overwrite = true in
ORCWriterImpl as well to fix this.
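To illustrate the difference the overwrite flag makes, here is a minimal,
self-contained sketch. It uses java.nio rather than Hadoop's FileSystem API so
it runs standalone (the class name and temp file are hypothetical, not part of
Hive): CREATE_NEW models fs.create(path, false, ...) and fails on an existing
file, while CREATE + TRUNCATE_EXISTING models fs.create(path, true, ...) and
silently replaces the leftover file from a preempted attempt.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class OverwriteDemo {
    public static void main(String[] args) throws IOException {
        // Simulate the leftover final file written by preempted attempt0.
        Path p = Files.createTempFile("orc-demo", ".tmp");

        // overwrite == false (like WriterImpl today): attempt1 fails.
        boolean threw = false;
        try {
            Files.newOutputStream(p, StandardOpenOption.CREATE_NEW).close();
        } catch (FileAlreadyExistsException e) {
            threw = true;
        }

        // overwrite == true (the proposed fix): attempt1 truncates and succeeds.
        Files.newOutputStream(p, StandardOpenOption.CREATE,
                StandardOpenOption.TRUNCATE_EXISTING).close();

        System.out.println(threw
                ? "CREATE_NEW threw; overwrite succeeded"
                : "unexpected");
        Files.deleteIfExists(p);
    }
}
```

With the overwrite flag set, a retried attempt simply replaces the stale file
instead of aborting the whole query.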
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)