[ 
https://issues.apache.org/jira/browse/IMPALA-13303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17876194#comment-17876194
 ] 

ASF subversion and git services commented on IMPALA-13303:
----------------------------------------------------------

Commit d91d99c08e469b4cd40d81ce1aeb8c2bec596880 in impala's branch 
refs/heads/master from stiga-huang
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=d91d99c08 ]

IMPALA-13303: FileSystemUtil.listFiles() should handle non-recursive case

FileSystemUtil.listFiles() is used in FileMetadataLoader#loadInternal()
to list the files with block locations. When table property
"impala.disable.recursive.listing" is set to true, it's supposed to skip
files in the sub dirs. However, for FileSystems that don't support
recursive listFiles(), we always create a RecursingIterator and don't
respect the 'recursive' argument.

This patch fixes the issue by adding a check for the 'recursive'
argument and using the non-recursive iterator when it's false.
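
To illustrate the idea of the fix, here is a minimal sketch in Python. It
is not the Java code in FileSystemUtil; it uses the local file system and
hypothetical function names (list_non_recursive, list_recursing,
list_files) only to show how the iterator choice depends on the
'recursive' argument.

```python
import os
from typing import Iterator


def list_non_recursive(root: str) -> Iterator[str]:
    """Yield only the files that sit directly under root."""
    with os.scandir(root) as entries:
        for entry in entries:
            if entry.is_file():
                yield entry.path


def list_recursing(root: str) -> Iterator[str]:
    """Yield every file under root, descending into sub dirs."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            yield os.path.join(dirpath, name)


def list_files(root: str, recursive: bool) -> Iterator[str]:
    """Pick the iterator based on the 'recursive' argument.

    The bug described above corresponds to returning
    list_recursing(root) here unconditionally.
    """
    if recursive:
        return list_recursing(root)
    return list_non_recursive(root)
```

With a layout of data.txt, dir1/data.txt and dir2/data.txt under root,
list_files(root, recursive=False) should yield only data.txt, while
list_files(root, recursive=True) should yield all three files.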

Tests
 - Add test in test_recursive_listing.py to reveal the issue

Change-Id: Ia930e6071963d53561ce79896bff9d19720468a4
Reviewed-on: http://gerrit.cloudera.org:8080/21680
Reviewed-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>


> File listing could still be recursive even if impala.disable.recursive.listing is true
> --------------------------------------------------------------------------------------
>
>                 Key: IMPALA-13303
>                 URL: https://issues.apache.org/jira/browse/IMPALA-13303
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Catalog
>            Reporter: Quanlong Huang
>            Assignee: Quanlong Huang
>            Priority: Major
>
> During the development of IMPALA-13117, I found that the table property
> "impala.disable.recursive.listing" is not respected during the initial
> metadata loading, i.e. a load that is not a reload triggered by REFRESH or
> HMS events.
> To reproduce the issue, rewrite this test statement from REFRESH to 
> INVALIDATE METADATA:
> https://github.com/apache/impala/blob/0a45cb5ae6d1345a7d531c22d174c99ea7cedea0/tests/metadata/test_recursive_listing.py#L126
> The test should still pass but it actually fails.
> A simpler way to reproduce the issue is:
> {code:sql}
> create table my_tbl (i int) stored as textfile 
> tblproperties('impala.disable.recursive.listing'='true');
> -- Get the table location, e.g. hdfs://localhost:20500/test-warehouse/my_tbl
> describe formatted my_tbl;
> {code}
> Upload 3 files to that table location: dir1/data.txt, dir2/data.txt, data.txt.
> {code}
> echo 1 > data.txt
> hdfs dfs -mkdir hdfs://localhost:20500/test-warehouse/my_tbl/dir1
> hdfs dfs -mkdir hdfs://localhost:20500/test-warehouse/my_tbl/dir2
> hdfs dfs -put data.txt hdfs://localhost:20500/test-warehouse/my_tbl/
> hdfs dfs -put data.txt hdfs://localhost:20500/test-warehouse/my_tbl/dir1
> hdfs dfs -put data.txt hdfs://localhost:20500/test-warehouse/my_tbl/dir2
> {code}
> Then refresh the table and show the files:
> {code:sql}
> refresh my_tbl;
> show files in my_tbl;
> +------------------------------------------------------------+------+-----------+-----------+
> | Path                                                       | Size | Partition | EC Policy |
> +------------------------------------------------------------+------+-----------+-----------+
> | hdfs://localhost:20500/test-warehouse/my_tbl/data.txt      | 2B   |           | NONE      |
> | hdfs://localhost:20500/test-warehouse/my_tbl/dir1/data.txt | 2B   |           | NONE      |
> | hdfs://localhost:20500/test-warehouse/my_tbl/dir2/data.txt | 2B   |           | NONE      |
> +------------------------------------------------------------+------+-----------+-----------+
> {code}
> Only the first file, which sits directly under the table folder, should be
> shown in the results. The other two files are in sub dirs and should be
> ignored since recursive listing is disabled.
> This feature was added in IMPALA-8454. Though it is rarely used in
> production, it'd be nice to fix it.
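
The expected result of the reproduction above can be expressed as a small
check. This is only a sketch in Python, not code from
test_recursive_listing.py; the helper names is_direct_child and
expected_files are hypothetical. Given the table location and the paths
returned by "show files", it keeps the files whose parent directory is the
table location itself.

```python
import posixpath
from urllib.parse import urlparse


def is_direct_child(table_location: str, file_path: str) -> bool:
    """Return True if file_path sits directly under table_location."""
    table = urlparse(table_location)
    file = urlparse(file_path)
    # Paths on a different file system or namenode never match.
    if (table.scheme, table.netloc) != (file.scheme, file.netloc):
        return False
    table_dir = table.path.rstrip("/")
    return posixpath.dirname(file.path) == table_dir


def expected_files(table_location: str, listed_paths: list) -> list:
    """Filter a listing down to what non-recursive listing should show."""
    return [p for p in listed_paths if is_direct_child(table_location, p)]


location = "hdfs://localhost:20500/test-warehouse/my_tbl"
listed = [
    location + "/data.txt",
    location + "/dir1/data.txt",
    location + "/dir2/data.txt",
]
print(expected_files(location, listed))
```

For the three paths in the table above, only
hdfs://localhost:20500/test-warehouse/my_tbl/data.txt passes the check.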



