[ https://issues.apache.org/jira/browse/HIVE-15199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15687154#comment-15687154 ]
Sergio Peña commented on HIVE-15199:
------------------------------------

I moved the listFiles() call to the beginning of the method so that it is not called for each new rename. However, we still have 2 GET calls on each rename (exists() and rename()). I added a check so that both calls are made on S3 only, and only the rename() call is made on HDFS.

As you mentioned, once S3Guard is released it should help us a lot with consistency, and we could remove the exists() call. I think the listFiles() call is still beneficial so that Hive can figure out the next filename to use when renaming the file. The code change is pretty easy, so I can create a Jira to remove it in the future.

{noformat}
+    boolean isBlobStoragePath = BlobStorageUtils.isBlobStoragePath(conf, destDirPath);
+    while ((isBlobStoragePath && destFs.exists(destFilePath)) || !destFs.rename(sourcePath, destFilePath)) {
+      destFilePath = createCopyFilePath(destDirPath, name, type, ++counter);
+    }
{noformat}

Is S3Guard going to be released in Hadoop 2.8?

> INSERT INTO data on S3 is replacing the old rows with the new ones
> ------------------------------------------------------------------
>
>                 Key: HIVE-15199
>                 URL: https://issues.apache.org/jira/browse/HIVE-15199
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>            Reporter: Sergio Peña
>            Assignee: Sergio Peña
>            Priority: Critical
>         Attachments: HIVE-15199.1.patch, HIVE-15199.2.patch,
> HIVE-15199.3.patch, HIVE-15199.4.patch, HIVE-15199.5.patch,
> HIVE-15199.6.patch, HIVE-15199.7.patch, HIVE-15199.8.patch
>
>
> Any INSERT INTO statement run on an S3 table while the scratch directory is
> also saved on S3 deletes the old rows of the table.
> {noformat}
> hive> set hive.blobstore.use.blobstore.as.scratchdir=true;
> hive> create table t1 (id int, name string) location 's3a://spena-bucket/t1';
> hive> insert into table t1 values (1,'name1');
> hive> select * from t1;
> 1 name1
> hive> insert into table t1 values (2,'name2');
> hive> select * from t1;
> 2 name2
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
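
For readers following the rename loop quoted in the comment, here is a minimal, self-contained sketch of the same control flow. It does not use the Hadoop or Hive APIs: SimpleFs, OverwritingFs, copyFilePath, and moveWithoutOverwrite are hypothetical names introduced only for illustration, and the "_copy_N" naming is an assumption rather than the exact output of createCopyFilePath. The in-memory store is assumed to let rename() replace an existing destination, which models the behavior that makes the extra exists() check necessary on a blob store.

```java
import java.util.HashSet;
import java.util.Set;

class RenameLoopSketch {
    /** Hypothetical stand-in for the two file system calls used in the patch. */
    public interface SimpleFs {
        boolean exists(String path);
        boolean rename(String src, String dst);
    }

    /**
     * In-memory store whose rename() succeeds even when the destination
     * already exists (an assumption made to model the blob store case).
     */
    public static class OverwritingFs implements SimpleFs {
        public final Set<String> files = new HashSet<>();

        public boolean exists(String path) {
            return files.contains(path);
        }

        public boolean rename(String src, String dst) {
            if (!files.remove(src)) {
                return false; // source is missing, nothing to move
            }
            files.add(dst); // silently replaces any existing destination
            return true;
        }
    }

    /** Hypothetical naming scheme for the next candidate destination. */
    public static String copyFilePath(String dir, String name, int counter) {
        return dir + "/" + name + "_copy_" + counter;
    }

    /**
     * Same loop shape as the patch: on a blob store, check exists() before
     * rename(); elsewhere rely on rename() returning false on a collision.
     * The caller must pass a source that exists, otherwise rename() keeps
     * failing and the loop does not terminate.
     */
    public static String moveWithoutOverwrite(SimpleFs fs, boolean isBlobStore,
                                              String src, String destDir, String name) {
        int counter = 0;
        String dest = destDir + "/" + name;
        while ((isBlobStore && fs.exists(dest)) || !fs.rename(src, dest)) {
            dest = copyFilePath(destDir, name, ++counter);
        }
        return dest;
    }

    public static void main(String[] args) {
        OverwritingFs fs = new OverwritingFs();
        fs.files.add("/t1/000000_0");      // row file from the first INSERT
        fs.files.add("/scratch/000000_0"); // new file from the second INSERT
        String dest = moveWithoutOverwrite(fs, true, "/scratch/000000_0", "/t1", "000000_0");
        System.out.println(dest);
    }
}
```

With isBlobStore set to true, the existing file is detected by exists() and the new file lands under a fresh name, so both files remain in the table directory. With it set to false against this overwriting store, the first rename() would succeed and replace the old file, which matches the symptom in the issue description.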