[ 
https://issues.apache.org/jira/browse/IMPALA-8950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16944936#comment-16944936
 ] 

ASF subversion and git services commented on IMPALA-8950:
---------------------------------------------------------

Commit ac87278b169422091af1c03fcd2101516372defb in impala's branch 
refs/heads/master from Sahil Takiar
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=ac87278 ]

IMPALA-8950: Add -d, -f options to hdfs copyFromLocal, put, cp

Add the -d option and -f option to the following commands:

`hdfs dfs -copyFromLocal <localsrc> URI`
`hdfs dfs -put [ - | <localsrc1> .. ] <dst>`
`hdfs dfs -cp URI [URI ...] <dest>`

The -d option "Skip[s] creation of temporary file with the suffix
._COPYING_.", which improves performance of these commands on S3 because
S3 does not support metadata-only renames.

The -f option "Overwrites the destination if it already exists".
Combined with HADOOP-13884, this mitigates S3 consistency issues by
avoiding a HEAD request to check whether the destination file exists.
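
As a rough illustration of the two flags described above, the following
Python sketch assembles the argument list for any of the three commands
with -d and -f added. The helper name build_copy_command is hypothetical
and is not part of this patch; it only builds the argv list and does not
invoke hdfs.

```python
def build_copy_command(subcommand, sources, dest):
    """Return the argv list for an 'hdfs dfs' copy with -d and -f set.

    Hypothetical helper for illustration; not taken from the patch.
    """
    allowed = ("copyFromLocal", "put", "cp")
    if subcommand not in allowed:
        raise ValueError("unsupported subcommand: %s" % subcommand)
    if not sources:
        raise ValueError("at least one source is required")
    # -d skips the ._COPYING_ temporary file; -f overwrites the destination.
    return ["hdfs", "dfs", "-" + subcommand, "-d", "-f"] + list(sources) + [dest]


if __name__ == "__main__":
    argv = build_copy_command("copyFromLocal", ["/tmp/data.csv"], "/test-warehouse/tmp/")
    print(" ".join(argv))
```

The resulting list can be passed to subprocess.check_call on a host where
the hdfs binary is on the PATH.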

Added the method 'copy_from_local' to the BaseFilesystem class.
Re-factored most usages of the aforementioned HDFS commands to use
the filesystem_client. Some usages were not appropriate / worth
refactoring, so occasionally this patch just adds the '-d' and '-f'
options explicitly. All calls to '-put' were replaced with
'copyFromLocal' because they both copy files from the local fs to an
HDFS-compatible target fs.
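
A minimal sketch of what such a method could look like is shown below.
Only the names BaseFilesystem and copy_from_local come from this commit
message; the signature, the CommandLineFilesystem class and the injectable
run parameter are assumptions made for illustration and may differ from
the actual code.

```python
import subprocess


class BaseFilesystem(object):
    """Abstract filesystem interface (sketch; only one method shown)."""

    def copy_from_local(self, src, dst):
        raise NotImplementedError()


class CommandLineFilesystem(BaseFilesystem):
    """Hypothetical implementation that shells out to 'hdfs dfs'."""

    def __init__(self, run=subprocess.check_call):
        # 'run' is injectable so the class can be exercised without a cluster.
        self._run = run

    def copy_from_local(self, src, dst):
        # -d: no ._COPYING_ temp file; -f: overwrite an existing destination.
        self._run(["hdfs", "dfs", "-copyFromLocal", "-d", "-f", src, dst])
```

Callers then go through the filesystem object rather than building the
command themselves, so the flags are set in one place.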

Since WebHDFS does not have good support for copying files, this patch
removes the copy functionality from the PyWebHdfsClientWithChmod.
Re-factored the hdfs_client so that it uses a DelegatingHdfsClient
that delegates to either the HadoopFsCommandLineClient or
PyWebHdfsClientWithChmod.
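
The delegation described above might be structured roughly as follows.
This is a sketch with stub classes standing in for the real clients: the
class names, the make_dir method and the choice of which operation goes to
which client are assumptions, apart from copies being routed away from
WebHDFS as stated in this message.

```python
class StubCommandLineClient(object):
    """Stand-in for a client that shells out to 'hdfs dfs' (assumed API)."""

    def copy_from_local(self, src, dst):
        return "cli copy %s -> %s" % (src, dst)


class StubWebHdfsClient(object):
    """Stand-in for a WebHDFS client with no copy support (assumed API)."""

    def make_dir(self, path):
        return "webhdfs mkdir %s" % path


class DelegatingClientSketch(object):
    """Routes each operation to the client assumed to handle it."""

    def __init__(self, webhdfs_client, cli_client):
        self._webhdfs = webhdfs_client
        self._cli = cli_client

    def copy_from_local(self, src, dst):
        # Copies go to the command-line client, since WebHDFS copy was removed.
        return self._cli.copy_from_local(src, dst)

    def make_dir(self, path):
        # Assumption: non-copy operations stay on the WebHDFS client.
        return self._webhdfs.make_dir(path)
```

With this shape, test code holds a single client object and does not need
to know which backend serves a given call.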

Testing:
* Ran core tests on HDFS and S3

Change-Id: I0d45db1c00554e6fb6bcc0b552596d86d4e30144
Reviewed-on: http://gerrit.cloudera.org:8080/14311
Reviewed-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>


> Add -d and -f option to copyFromLocal and re-enable disabled S3 tests
> ---------------------------------------------------------------------
>
>                 Key: IMPALA-8950
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8950
>             Project: IMPALA
>          Issue Type: Test
>            Reporter: Sahil Takiar
>            Assignee: Sahil Takiar
>            Priority: Major
>
> The {{-d}} option for {{hdfs dfs -copyFromLocal}} "Skip[s] creation of 
> temporary file with the suffix ._COPYING_". The {{-f}} option "Overwrites the 
> destination if it already exists".
> By using the {{-d}} option, copies to S3 avoid the additional overhead of 
> copying data to a tmp file and then renaming the file. The {{-f}} option 
> overwrites the file if it exists, which should be safe since tests should be 
> writing to unique directories anyway. With HADOOP-16490, 
> {{create(overwrite=true)}} avoids issuing a HEAD request on the path, which 
> prevents any cached 404s on the S3 key.
> After these changes, the tests disabled by IMPALA-8189 can be re-enabled.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org
