[ 
https://issues.apache.org/jira/browse/HIVE-13778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15294572#comment-15294572
 ] 

Sailesh Mukil commented on HIVE-13778:
--------------------------------------

[~rajesh.balamohan]
I discovered a few more details while trying to reproduce this issue. I 
hadn't realized before that this is how it happens. To reproduce, do the 
following:

 - In Hive, "create table purge_test_s3 (x int) location 
's3a://[bucket]/purge_test_s3';"
 - Use the AWS CLI or the AWS web interface to copy files to the 
above-mentioned location.
 - In Hive, "drop table purge_test_s3 purge;"
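
For anyone who wants to script the steps above, here is a rough sketch. It 
assumes the `hive` and `aws` CLIs are installed and configured. The bucket 
name `my-bucket`, the local file `data.txt`, and the helper names `run` and 
`repro` are placeholders for illustration, not part of the original setup. By 
default it only prints the commands (DRY_RUN=1) rather than running them.

```shell
#!/bin/sh
# Sketch of the reproduction steps. BUCKET and LOCAL_FILE are placeholders
# (assumptions); substitute a real bucket and file before running for real.
BUCKET="${BUCKET:-my-bucket}"
LOCAL_FILE="${LOCAL_FILE:-data.txt}"
DRY_RUN="${DRY_RUN:-1}"
TABLE="purge_test_s3"

# In dry-run mode, print the command (arguments joined by spaces, so shell
# quoting is not shown); otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

repro() {
    # 1. Create the table on S3A through Hive.
    run hive -e "create table $TABLE (x int) location 's3a://$BUCKET/$TABLE';"
    # 2. Copy a file in with the AWS CLI, bypassing Hadoop entirely.
    run aws s3 cp "$LOCAL_FILE" "s3://$BUCKET/$TABLE/"
    # 3. Drop the table with PURGE.
    run hive -e "drop table $TABLE purge;"
    # 4. List what is left under the table location.
    run aws s3 ls "s3://$BUCKET/$TABLE/"
}
```

Calling `repro` prints the four commands. With DRY_RUN=0 it runs them, and in 
the failing case the last listing should still show the copied file.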

The Metastore logs say:
2016-05-20 17:01:41,259 INFO  hive.metastore.hivemetastoressimpl: 
[pool-4-thread-103]: Not moving s3a://[bucket]/purge_test_s3 to trash
2016-05-20 17:01:41,364 INFO  hive.metastore.hivemetastoressimpl: 
[pool-4-thread-103]: Deleted the diretory s3a://[bucket]/purge_test_s3

However, the files are still there. The weird part is that the Hadoop S3A 
connector reads the files correctly but is not able to delete them.

If we use the Hadoop CLI to copy the files instead of the AWS CLI or the AWS 
web interface, "drop table ... purge" works just fine. If we insert the files 
using Hive, it works fine as well.

This might be an issue of the HDFS NameNode not getting updated, and might be 
more of a problem for the HDFS folks.

> DROP TABLE PURGE on S3A table with too many files does not delete the files
> ---------------------------------------------------------------------------
>
>                 Key: HIVE-13778
>                 URL: https://issues.apache.org/jira/browse/HIVE-13778
>             Project: Hive
>          Issue Type: Bug
>          Components: Metastore
>            Reporter: Sailesh Mukil
>            Priority: Critical
>              Labels: metastore, s3
>
> I've noticed that when we do a DROP TABLE tablename PURGE on a table on S3A 
> that has many files, the files never get deleted. However, the Hive metastore 
> logs do say that the path was deleted:
> "Not moving [path] to trash"
> "Deleted the diretory [path]"
> I initially thought that this was due to the eventually consistent nature of 
> S3 for deletes; however, a week later, the files still exist.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
