[ https://issues.apache.org/jira/browse/IMPALA-9293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sahil Takiar resolved IMPALA-9293.
----------------------------------
Fix Version/s: Impala 3.4.0
Resolution: Fixed
> Impala Doc: Revise explanation of HDFS trashcan usage on S3
> -----------------------------------------------------------
>
> Key: IMPALA-9293
> URL: https://issues.apache.org/jira/browse/IMPALA-9293
> Project: IMPALA
> Issue Type: Task
> Components: Docs
> Reporter: Sahil Takiar
> Assignee: Sahil Takiar
> Priority: Major
> Fix For: Impala 3.4.0
>
>
> The Impala docs state:
> {quote}
> By default, when you drop an internal (managed) table, the data files are
> moved to the HDFS trashcan. This operation is expensive for tables that
> reside on the Amazon S3 filesystem. Therefore, for S3 tables, prefer to use
> DROP TABLE table_name PURGE rather than the default DROP TABLE statement. The
> PURGE clause makes Impala delete the data files immediately, skipping the
> HDFS trashcan.
> {quote}
> and
> {quote}
> The default DROP TABLE/PARTITION is slow because Impala copies the files to
> the HDFS trash folder, and Impala waits until all the data is moved. DROP
> TABLE/PARTITION .. PURGE is a fast delete operation, and the Impala statement
> finishes quickly even though the change might not have propagated fully
> throughout S3.
> {quote}
> The confusing part is "Impala copies the files to the HDFS trash folder".
> Users might think that when a managed Impala table on S3 is dropped, Impala
> actually copies the data from S3 to a trashcan folder *stored on HDFS*. This
> isn't true. The term "HDFS trashcan" is used to refer to a feature of HDFS
> where all deleted data is moved to a trash folder rather than being deleted
> immediately. See
> https://hadoop.apache.org/docs/r1.2.1/hdfs_design.html#File+Deletes+and+Undeletes
> for details.
> What actually happens is that there is a trashcan folder on S3 itself, and
> when an S3 managed table is dropped, the data is copied from the managed
> table folder to the trashcan folder *stored on S3*.
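The mechanism described above can be illustrated with a toy model. This is a hypothetical Python simulation, not Impala or Hadoop code. The object store is a plain dict, the function names `drop_to_trash` and `drop_purge` are invented for illustration, and the trash path `/user/<user>/.Trash/Current` is an assumption modeled on Hadoop's default trash layout. The model shows why the default drop is expensive on S3: an object store has no cheap rename, so "moving" a file into the trash means copying its bytes within S3 and then deleting the original, whereas PURGE only deletes.

```python
"""Toy model of dropping a managed table whose files live on S3.

Hypothetical sketch: the store is a plain dict and the trash layout
(/user/<user>/.Trash/Current/...) is assumed to mirror Hadoop's default.
None of these functions are Impala or Hadoop APIs.
"""
from typing import Dict

ObjectStore = Dict[str, bytes]  # object key -> object contents


def drop_to_trash(store: ObjectStore, table_dir: str, user: str) -> int:
    """Model of the default DROP TABLE: move each data file to the trash.

    With no cheap rename, each move is a full copy followed by a delete.
    Returns the number of bytes copied.
    """
    trash_root = f"/user/{user}/.Trash/Current"
    copied = 0
    # Snapshot the matching keys first so the dict is not mutated mid-iteration.
    for key in [k for k in store if k.startswith(table_dir + "/")]:
        data = store[key]
        store[trash_root + key] = data  # copy: the trash folder is on S3, not HDFS
        del store[key]                  # then delete the original object
        copied += len(data)
    return copied


def drop_purge(store: ObjectStore, table_dir: str) -> int:
    """Model of DROP TABLE ... PURGE: delete data files, skipping the trash."""
    for key in [k for k in store if k.startswith(table_dir + "/")]:
        del store[key]
    return 0  # nothing is copied


if __name__ == "__main__":
    s3 = {
        "/warehouse/t1/part-0.parq": b"x" * 1024,
        "/warehouse/t1/part-1.parq": b"y" * 2048,
    }
    print(drop_to_trash(dict(s3), "/warehouse/t1", "impala"))  # 3072
    print(drop_purge(dict(s3), "/warehouse/t1"))               # 0
```

Under this model the cost of the default drop grows with the size of the table, while PURGE copies nothing; in both cases the data never leaves S3.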
--
This message was sent by Atlassian Jira
(v8.3.4#803005)