[ 
https://issues.apache.org/jira/browse/HIVE-15199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15679190#comment-15679190
 ] 

Steve Loughran commented on HIVE-15199:
---------------------------------------

If you call listFiles(path, recursive=true) you don't get back a FileStatus 
array, you get an iterator; on s3a in branch 2.8+ this pages through the 
results of the list, triggering new listing requests on demand: 
https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Listing.java#L171

To make effective use of this feature you do have to iterate through the 
results, otherwise the listing operation is never performed. You may as well 
build up the set from that iteration:

{code}

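// listFiles(path, true) is the recursive call that returns an iterator
RemoteIterator<LocatedFileStatus> it = fs.listFiles(path, true);
while (it.hasNext()) {
  // LocatedFileStatus is a subclass of FileStatus
  FileStatus s = it.next();
  if (!fileSet.contains(s)) {
    fileSet.add(s);
  }
}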

{code}
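
To show why the iteration itself drives the listing, here is a minimal, 
self-contained sketch of an on-demand paged iterator. It is not the S3A code: 
LazyListingSketch, PagedIterator, pageSize and requestsIssued are hypothetical 
names, and an in-memory list of keys stands in for the remote store. The only 
point is that no "request" is issued until the caller starts iterating, and a 
new one is issued each time a page is drained.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical stand-in for a paged object-store listing; not the S3A implementation. */
public class LazyListingSketch {

    /** Iterator that fetches one page of keys at a time, only when the caller needs it. */
    static class PagedIterator implements Iterator<String> {
        private final List<String> allKeys;   // pretend these live in the remote store
        private final int pageSize;           // assumed maximum entries per list request
        private int nextPageStart = 0;
        private Iterator<String> currentPage = new ArrayList<String>().iterator();
        int requestsIssued = 0;               // counts simulated list requests

        PagedIterator(List<String> allKeys, int pageSize) {
            this.allKeys = allKeys;
            this.pageSize = pageSize;
        }

        @Override
        public boolean hasNext() {
            // Issue the next simulated request only once the current page is drained.
            while (!currentPage.hasNext() && nextPageStart < allKeys.size()) {
                int end = Math.min(nextPageStart + pageSize, allKeys.size());
                currentPage = new ArrayList<>(allKeys.subList(nextPageStart, end)).iterator();
                nextPageStart = end;
                requestsIssued++;
            }
            return currentPage.hasNext();
        }

        @Override
        public String next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            return currentPage.next();
        }
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("t1/a", "t1/b", "t1/c", "t1/d", "t1/e");
        PagedIterator it = new PagedIterator(keys, 2);
        System.out.println("requests before iterating: " + it.requestsIssued);
        int seen = 0;
        while (it.hasNext()) {
            it.next();
            seen++;
        }
        System.out.println("entries: " + seen + ", requests: " + it.requestsIssued);
    }
}
```

With five keys and a page size of two, the sketch issues no request until the 
first hasNext() call and three requests in total by the end of the loop.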

On other filesystems the recursive listing is a treewalk, no more or less 
expensive than doing it in your own code.
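
For comparison, a hand-written treewalk looks roughly like the sketch below. 
It uses java.io.File on the local disk purely as a stand-in for the Hadoop 
FileSystem API, and TreeWalkSketch and collectFiles are hypothetical names. 
Each directory costs one listing call, which is what makes the walk expensive 
on a store where every listing is a remote request.

```java
import java.io.File;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical local-disk illustration of a recursive treewalk. */
public class TreeWalkSketch {

    /** Adds the path of every file under dir to out, descending into subdirectories. */
    static void collectFiles(File dir, Set<String> out) {
        File[] children = dir.listFiles();   // one listing call per directory
        if (children == null) {
            return;                          // not a directory, or an I/O error
        }
        for (File child : children) {
            if (child.isDirectory()) {
                collectFiles(child, out);    // recurse into the subdirectory
            } else {
                out.add(child.getPath());
            }
        }
    }

    public static void main(String[] args) {
        File root = new File(args.length > 0 ? args[0] : ".");
        Set<String> fileSet = new HashSet<>();
        collectFiles(root, fileSet);
        System.out.println(fileSet.size() + " files under " + root.getPath());
    }
}
```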


> INSERT INTO data on S3 is replacing the old rows with the new ones
> ------------------------------------------------------------------
>
>                 Key: HIVE-15199
>                 URL: https://issues.apache.org/jira/browse/HIVE-15199
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>            Reporter: Sergio Peña
>            Assignee: Sergio Peña
>            Priority: Critical
>         Attachments: HIVE-15199.1.patch, HIVE-15199.2.patch, 
> HIVE-15199.3.patch, HIVE-15199.4.patch, HIVE-15199.5.patch
>
>
> Any INSERT INTO statement run on an S3 table while the scratch directory is 
> also stored on S3 deletes the old rows of the table.
> {noformat}
> hive> set hive.blobstore.use.blobstore.as.scratchdir=true;
> hive> create table t1 (id int, name string) location 's3a://spena-bucket/t1';
> hive> insert into table t1 values (1,'name1');
> hive> select * from t1;
> 1       name1
> hive> insert into table t1 values (2,'name2');
> hive> select * from t1;
> 2       name2
> {noformat}



