Quanlong Huang has uploaded this change for review. ( http://gerrit.cloudera.org:8080/18801 )


Change subject: IMPALA-11464: Skip listing staging dirs to avoid failures on 
them
......................................................................

IMPALA-11464: Skip listing staging dirs to avoid failures on them

Hive and other systems create staging/tmp dirs under table/partition
folders while loading or inserting data, and remove them when the
operation is done. File metadata loading in catalogd can fail if it is
listing the files of such a dir when the dir is removed. This was found
on HDFS, where file listing is done in batches. Each batch contains a
partial list of up to 1000 items by default (configured by
"dfs.ls.limit"). If the dir is removed between batches, the next
listing, e.g. the next hasNext() call on the RemoteIterator, fails with
FileNotFoundException. Such errors on staging/tmp dirs should not fail
the metadata loading. However, if the error happens on a partition dir,
the metadata loading should fail to avoid stale metadata.
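
The failure mode can be illustrated with a toy in-memory model. This is
only a sketch and not the HDFS client API: ToyFs, BatchedListing and
list_batch are hypothetical names. It assumes, as described above, that
the first batch is fetched when the iterator is created and that each
later batch is fetched lazily.

```python
from typing import Dict, List


class ToyFs:
    """Hypothetical in-memory file system: maps a dir path to its file names."""

    def __init__(self) -> None:
        self.dirs: Dict[str, List[str]] = {}

    def list_batch(self, path: str, start: int, limit: int) -> List[str]:
        # A listing request on a removed dir fails, like FileNotFoundException.
        if path not in self.dirs:
            raise FileNotFoundError(path)
        return self.dirs[path][start:start + limit]


class BatchedListing:
    """Toy analogue of a remote iterator that lists a dir in batches."""

    def __init__(self, fs: ToyFs, path: str, limit: int = 1000) -> None:
        self._fs, self._path, self._limit = fs, path, limit
        self._offset = 0
        self._pos = 0
        # The first partial listing happens at creation time.
        self._batch = fs.list_batch(path, 0, limit)

    def __iter__(self) -> "BatchedListing":
        return self

    def __next__(self) -> str:
        if self._pos == len(self._batch):
            # A short batch means the listing is complete.
            if len(self._batch) < self._limit:
                raise StopIteration
            # Otherwise fetch the next batch; this is where a concurrent
            # removal of the dir surfaces as an error.
            self._offset += len(self._batch)
            self._batch = self._fs.list_batch(
                self._path, self._offset, self._limit)
            self._pos = 0
            if not self._batch:
                raise StopIteration
        item = self._batch[self._pos]
        self._pos += 1
        return item
```

With 1024 files in a dir, the first 1000 names come from the batch
fetched at creation. If the dir is removed at that point, the next
call to next() raises FileNotFoundError in this model.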

This patch adds a check before listing a dir. If it is a staging/tmp
dir, catalogd skips it. The patch also adds a debug action,
catalogd_pause_after_hdfs_remote_iterator_creation, to inject a sleep
after the first partial listing (which happens when the RemoteIterator
is created), so that the FileNotFoundException can be reproduced
reliably.
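
The intended behavior can be sketched roughly as follows. This is an
illustration in Python, not the Java code in FileSystemUtil.java. The
name prefixes in ASSUMED_STAGING_PREFIXES, the helper names, and the
pause_hook parameter (a stand-in for the injected sleep) are all
assumptions made for the example.

```python
import posixpath
from typing import Callable, Dict, List, Optional, Tuple

# Assumed markers of staging/tmp dir names; the real check may differ.
ASSUMED_STAGING_PREFIXES = (".hive-staging", "_tmp.", ".tmp")

# Toy directory tree: dir path -> list of (child name, is_dir).
Tree = Dict[str, List[Tuple[str, bool]]]


def is_staging_dir(path: str) -> bool:
    """Return True if the last path component looks like a staging/tmp dir."""
    name = posixpath.basename(path.rstrip("/"))
    return name.startswith(ASSUMED_STAGING_PREFIXES)


def list_files(tree: Tree, root: str,
               pause_hook: Optional[Callable[[str], None]] = None) -> List[str]:
    """Recursively list files under root, skipping staging/tmp dirs."""
    if root not in tree:
        # A missing non-staging dir must fail the whole load.
        raise FileNotFoundError(root)
    files: List[str] = []
    for name, is_dir in list(tree[root]):
        child = posixpath.join(root, name)
        if not is_dir:
            files.append(child)
            continue
        if is_staging_dir(child):
            # Checked before listing, so a concurrent removal cannot fail us.
            continue
        if pause_hook is not None:
            # Window in which another process may remove directories.
            pause_hook(child)
        files.extend(list_files(tree, child, pause_hook))
    return files
```

In this sketch, removing a staging dir inside pause_hook does not
affect the result, while removing a partition dir there makes
list_files raise FileNotFoundError.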

Tests:
 - Add a test that removes a large staging dir (containing 1024 files)
   during REFRESH. Metadata loading fails consistently before this fix.
 - Add a test that removes a large partition dir (containing 1024 files)
   during REFRESH. Verify that metadata loading fails as expected.

Change-Id: Ic848e6c8563a1e0bf294cd50167dfc40f66a56cb
---
M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
M fe/src/main/java/org/apache/impala/util/DebugUtils.java
M tests/metadata/test_recursive_listing.py
M tests/util/filesystem_base.py
M tests/util/hdfs_util.py
5 files changed, 194 insertions(+), 5 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/01/18801/1
--
To view, visit http://gerrit.cloudera.org:8080/18801
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ic848e6c8563a1e0bf294cd50167dfc40f66a56cb
Gerrit-Change-Number: 18801
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang <huangquanl...@gmail.com>
