[ https://issues.apache.org/jira/browse/IMPALA-9293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sahil Takiar resolved IMPALA-9293.
----------------------------------
    Fix Version/s: Impala 3.4.0
       Resolution: Fixed

> Impala Doc: Revise explanation of HDFS trashcan usage on S3
> -----------------------------------------------------------
>
>                 Key: IMPALA-9293
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9293
>             Project: IMPALA
>          Issue Type: Task
>          Components: Docs
>            Reporter: Sahil Takiar
>            Assignee: Sahil Takiar
>            Priority: Major
>             Fix For: Impala 3.4.0
>
>
> The Impala docs state:
> {quote}
> By default, when you drop an internal (managed) table, the data files are
> moved to the HDFS trashcan. This operation is expensive for tables that
> reside on the Amazon S3 filesystem. Therefore, for S3 tables, prefer to use
> DROP TABLE table_name PURGE rather than the default DROP TABLE statement. The
> PURGE clause makes Impala delete the data files immediately, skipping the
> HDFS trashcan.
> {quote}
> and
> {quote}
> The default DROP TABLE/PARTITION is slow because Impala copies the files to
> the HDFS trash folder, and Impala waits until all the data is moved. DROP
> TABLE/PARTITION .. PURGE is a fast delete operation, and the Impala statement
> finishes quickly even though the change might not have propagated fully
> throughout S3.
> {quote}
> The confusing part is "Impala copies the files to the HDFS trash folder".
> Users might think that when a managed Impala table on S3 is dropped, Impala
> actually copies the data from S3 to a trashcan folder *stored on HDFS*. This
> isn't true. The term "HDFS trashcan" refers to a feature of HDFS in which
> deleted data is moved to a trash folder rather than being deleted
> immediately. See
> https://hadoop.apache.org/docs/r1.2.1/hdfs_design.html#File+Deletes+and+Undeletes
> for details.
> What actually happens is that there is a trashcan folder on S3 itself, and
> when an S3 managed table is dropped, the data is copied from the managed
> table folder to the trashcan folder *stored on S3*.
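
To illustrate the point made in the description, here is a minimal Python sketch of where a dropped table's files would land. It is not Impala or Hadoop code. The function name trash_location, the bucket name, and the user name are hypothetical, and the /user/<user>/.Trash/Current layout is assumed from the usual Hadoop trash convention; a given deployment may be configured differently.

```python
from urllib.parse import urlsplit, urlunsplit


def trash_location(table_uri: str, user: str) -> str:
    """Return an illustrative trash path for table_uri.

    Assumption: the trash root is /user/<user>/.Trash/Current on the
    SAME filesystem (same scheme and authority) as the table itself.
    """
    parts = urlsplit(table_uri)
    # Keep scheme and authority unchanged; only the path is rewritten.
    trash_path = f"/user/{user}/.Trash/Current{parts.path}"
    return urlunsplit((parts.scheme, parts.netloc, trash_path, "", ""))


# Hypothetical S3 table: the trash folder stays in the same bucket.
print(trash_location("s3a://example-bucket/warehouse/t1", "impala"))

# Hypothetical HDFS table: the trash folder stays on the same namenode.
print(trash_location("hdfs://namenode:8020/warehouse/t1", "impala"))
```

Under these assumptions the scheme and authority never change, which is the point of the clarification: a plain DROP TABLE on an S3 table copies data within S3, while DROP TABLE table_name PURGE skips the trash step entirely.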
--
This message was sent by Atlassian Jira
(v8.3.4#803005)