Hello Joe McDonnell, Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14311

to look at the new patch set (#7).

Change subject: IMPALA-8950: Add -d, -f options to hdfs copyFromLocal, put, cp
......................................................................

IMPALA-8950: Add -d, -f options to hdfs copyFromLocal, put, cp

Add the -d option and -f option to the following commands:

`hdfs dfs -copyFromLocal <localsrc> URI`
`hdfs dfs -put [ - | <localsrc1> .. ]. <dst>`
`hdfs dfs -cp URI [URI ...] <dest>`

The -d option "Skip[s] creation of temporary file with the suffix
._COPYING_." which improves performance of these commands on S3 since S3
does not support metadata only renames.

The -f option "Overwrites the destination if it already exists" combined
with HADOOP-13884 this improves issues seen with S3 consistency issues by
avoiding a HEAD request to check if the destination file exists or not.

Added the method 'copy_from_local' to the BaseFilesystem class.
Re-factored most usages of the aforementioned HDFS commands to use
the filesystem_client. Some usages were not appropriate / worth
refactoring, so occasionally this patch just adds the '-d' and '-f'
options explicitly. All calls to '-put' were replaced with
'copyFromLocal' because they both copy files from the local fs to a HDFS
compatible target fs.

Since WebHDFS does not have good support for copying files, this patch
removes the copy functionality from the PyWebHdfsClientWithChmod.
Re-factored the hdfs_client so that it uses a DelegatingHdfsClient
that delegates to either the HadoopFsCommandLineClient or
PyWebHdfsClientWithChmod.

Testing:
* Ran core tests on HDFS and S3

Change-Id: I0d45db1c00554e6fb6bcc0b552596d86d4e30144
---
M 
testdata/workloads/functional-query/queries/QueryTest/parquet-resolution-by-name.test
M tests/common/impala_test_suite.py
M tests/custom_cluster/test_coordinators.py
M tests/custom_cluster/test_hive_parquet_timestamp_conversion.py
M tests/custom_cluster/test_insert_behaviour.py
M tests/custom_cluster/test_parquet_max_page_header.py
M tests/custom_cluster/test_udf_concurrency.py
M tests/metadata/test_hidden_files.py
M tests/metadata/test_refresh_partition.py
M tests/metadata/test_stale_metadata.py
M tests/query_test/test_compressed_formats.py
M tests/query_test/test_hdfs_file_mods.py
M tests/query_test/test_insert_parquet.py
M tests/query_test/test_multiple_filesystems.py
M tests/query_test/test_nested_types.py
M tests/query_test/test_scanners.py
M tests/query_test/test_scanners_fuzz.py
M tests/query_test/test_udfs.py
M tests/util/adls_util.py
M tests/util/filesystem_base.py
M tests/util/hdfs_util.py
21 files changed, 206 insertions(+), 111 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/14311/7
--
To view, visit http://gerrit.cloudera.org:8080/14311
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0d45db1c00554e6fb6bcc0b552596d86d4e30144
Gerrit-Change-Number: 14311
Gerrit-PatchSet: 7
Gerrit-Owner: Sahil Takiar <stak...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <joemcdonn...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <stak...@cloudera.com>

Reply via email to