[jira] [Commented] (HIVE-15199) INSERT INTO data on S3 is replacing the old rows with the new ones

JIRA Tue, 22 Nov 2016 08:19:28 -0800

    [ 
https://issues.apache.org/jira/browse/HIVE-15199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15687154#comment-15687154
 ]


Sergio Peña commented on HIVE-15199:
------------------------------------

I moved the listFiles() call to the beginning of the method so that it is not 
repeated for each rename. However, we still make 2 GET calls on each rename 
(exists && rename). I added a check so that both calls are made on S3 only; 
on HDFS, only the rename call is made.

As you mentioned, once S3Guard is released it should help us a lot with 
consistency, and we could then remove the exists() call. I think listFiles() 
is still beneficial, since it lets Hive figure out the next filename to use 
when renaming the file.

The code change is pretty simple, so I can create a Jira to remove the 
exists() call in the future.
{noformat}
+      boolean isBlobStoragePath = BlobStorageUtils.isBlobStoragePath(conf, destDirPath);
+      while ((isBlobStoragePath && destFs.exists(destFilePath)) || !destFs.rename(sourcePath, destFilePath)) {
+        destFilePath = createCopyFilePath(destDirPath, name, type, ++counter);
+      }
{noformat}
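
To make the behaviour of that loop easier to follow, here is a minimal, self-contained sketch. It is not the Hive code: SimpleFs, OverwritingStore, copyFilePath and moveFile are hypothetical stand-ins, plain strings replace Path objects, and the "_copy_N" naming and the assumption that the store's rename silently overwrites an existing destination are for illustration only.

```java
import java.util.HashSet;
import java.util.Set;

public class RenameWithoutOverwrite {

    /** Hypothetical stand-in for the two file system calls the loop uses. */
    interface SimpleFs {
        boolean exists(String path);
        boolean rename(String src, String dst);
    }

    /** In-memory store whose rename succeeds even if the destination exists (assumption). */
    static class OverwritingStore implements SimpleFs {
        final Set<String> files = new HashSet<>();

        public boolean exists(String path) {
            return files.contains(path);
        }

        public boolean rename(String src, String dst) {
            if (!files.remove(src)) {
                return false;      // source is missing, nothing to move
            }
            files.add(dst);        // silently replaces dst if it was already there
            return true;
        }
    }

    /** Hypothetical naming scheme for the next candidate file name. */
    static String copyFilePath(String dir, String name, String type, int counter) {
        return dir + "/" + name + "_copy_" + counter + (type.isEmpty() ? "" : "." + type);
    }

    /**
     * Same shape as the loop in the patch: on a blob store, check exists()
     * first; elsewhere rely on rename() failing when the destination exists.
     * Like the original loop, this does not terminate if the source is missing.
     */
    static String moveFile(SimpleFs fs, boolean isBlobStore,
                           String src, String dir, String name, String type) {
        String dest = dir + "/" + name + (type.isEmpty() ? "" : "." + type);
        int counter = 0;
        while ((isBlobStore && fs.exists(dest)) || !fs.rename(src, dest)) {
            dest = copyFilePath(dir, name, type, ++counter);
        }
        return dest;
    }

    public static void main(String[] args) {
        OverwritingStore store = new OverwritingStore();
        store.files.add("t1/000000_0");        // file written by the first INSERT
        store.files.add("scratch/000000_0");   // file written by the second INSERT
        String dest = moveFile(store, true, "scratch/000000_0", "t1", "000000_0", "");
        System.out.println(dest);              // prints t1/000000_0_copy_1
    }
}
```

With the blob store flag set to false, the same call against this store would replace t1/000000_0, which is the symptom described in this issue.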

Is S3Guard going to be released in Hadoop 2.8?

> INSERT INTO data on S3 is replacing the old rows with the new ones
> ------------------------------------------------------------------
>
>                 Key: HIVE-15199
>                 URL: https://issues.apache.org/jira/browse/HIVE-15199
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>            Reporter: Sergio Peña
>            Assignee: Sergio Peña
>            Priority: Critical
>         Attachments: HIVE-15199.1.patch, HIVE-15199.2.patch, 
> HIVE-15199.3.patch, HIVE-15199.4.patch, HIVE-15199.5.patch, 
> HIVE-15199.6.patch, HIVE-15199.7.patch, HIVE-15199.8.patch
>
>
> Any INSERT INTO statement run on an S3 table while the scratch directory is 
> also stored on S3 deletes the existing rows of the table.
> {noformat}
> hive> set hive.blobstore.use.blobstore.as.scratchdir=true;
> hive> create table t1 (id int, name string) location 's3a://spena-bucket/t1';
> hive> insert into table t1 values (1,'name1');
> hive> select * from t1;
> 1       name1
> hive> insert into table t1 values (2,'name2');
> hive> select * from t1;
> 2       name2
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
