[ https://issues.apache.org/jira/browse/IMPALA-8950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16944936#comment-16944936 ]
ASF subversion and git services commented on IMPALA-8950:
---------------------------------------------------------

Commit ac87278b169422091af1c03fcd2101516372defb in impala's branch refs/heads/master from Sahil Takiar
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=ac87278 ]

IMPALA-8950: Add -d, -f options to hdfs copyFromLocal, put, cp

Add the -d and -f options to the following commands:

`hdfs dfs -copyFromLocal <localsrc> URI`
`hdfs dfs -put [ - | <localsrc1> ... ] <dst>`
`hdfs dfs -cp URI [URI ...] <dest>`

The -d option "Skip[s] creation of temporary file with the suffix ._COPYING_.", which improves the performance of these commands on S3 because S3 does not support metadata-only renames.

The -f option "Overwrites the destination if it already exists". Combined with HADOOP-13884, this mitigates the S3 consistency issues seen previously by avoiding a HEAD request to check whether the destination file exists.

Added the method 'copy_from_local' to the BaseFilesystem class.

Refactored most usages of the aforementioned HDFS commands to use the filesystem_client. Some usages were not appropriate for, or not worth, refactoring, so in those places this patch just adds the '-d' and '-f' options explicitly. All calls to '-put' were replaced with 'copyFromLocal' because both commands copy files from the local filesystem to an HDFS-compatible target filesystem.

Since WebHDFS does not have good support for copying files, this patch removes the copy functionality from the PyWebHdfsClientWithChmod. Refactored the hdfs_client so that it uses a DelegatingHdfsClient that delegates to either the HadoopFsCommandLineClient or the PyWebHdfsClientWithChmod.
Testing:
* Ran core tests on HDFS and S3

Change-Id: I0d45db1c00554e6fb6bcc0b552596d86d4e30144
Reviewed-on: http://gerrit.cloudera.org:8080/14311
Reviewed-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>


> Add -d and -f option to copyFromLocal and re-enable disabled S3 tests
> ---------------------------------------------------------------------
>
>                 Key: IMPALA-8950
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8950
>             Project: IMPALA
>          Issue Type: Test
>            Reporter: Sahil Takiar
>            Assignee: Sahil Takiar
>            Priority: Major
>
> The {{-d}} option for {{hdfs dfs -copyFromLocal}} "Skip[s] creation of
> temporary file with the suffix ._COPYING_". The {{-f}} option "Overwrites the
> destination if it already exists".
> By using the {{-d}} option, copies to S3 avoid the additional overhead of
> copying data to a tmp file and then renaming the file. The {{-f}} option
> overwrites the file if it exists, which should be safe since tests should be
> writing to unique directories anyway. With HADOOP-16490,
> {{create(overwrite=true)}} avoids issuing a HEAD request on the path, which
> prevents any cached 404s on the S3 key.
> After these changes, the tests disabled by IMPALA-8189 can be re-enabled.
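To make the flag handling and the delegation described in the commit concrete, here is a minimal Python sketch. It is not the actual Impala test code. It shows a command-line wrapper that always passes -d and -f, and a delegating client that routes copy calls to that wrapper while forwarding every other call to a WebHDFS-style client. The class names CommandLineFsClientSketch and DelegatingFsClientSketch, the injectable `runner` parameter, the `copy` method, the choice of which methods go to which client, and the `s3a://example-bucket/...` path are all hypothetical. Only the `hdfs dfs` subcommands and the -d/-f flags come from the text above.

```python
import subprocess
from typing import Any, Callable, List, Sequence


class CommandLineFsClientSketch:
    """Hypothetical stand-in for a HadoopFsCommandLineClient-style wrapper.

    Every copy passes -d (skip the intermediate '<dst>._COPYING_' file and the
    rename that follows it) and -f (overwrite the destination if it exists).
    """

    def __init__(self, runner: Callable[[List[str]], Any] = subprocess.check_call):
        # The runner is injectable so the argv can be inspected without a cluster.
        self._run = runner

    @staticmethod
    def copy_from_local_argv(src: str, dst: str) -> List[str]:
        return ["hdfs", "dfs", "-copyFromLocal", "-d", "-f", src, dst]

    @staticmethod
    def cp_argv(srcs: Sequence[str], dst: str) -> List[str]:
        return ["hdfs", "dfs", "-cp", "-d", "-f", *srcs, dst]

    def copy_from_local(self, src: str, dst: str) -> None:
        self._run(self.copy_from_local_argv(src, dst))

    def copy(self, srcs: Sequence[str], dst: str) -> None:
        self._run(self.cp_argv(srcs, dst))


class DelegatingFsClientSketch:
    """Hypothetical delegating client: copy calls go to the command-line
    client; every other attribute lookup falls through to the WebHDFS client."""

    _CLI_METHODS = frozenset({"copy_from_local", "copy"})

    def __init__(self, webhdfs_client: Any, cli_client: CommandLineFsClientSketch):
        self._webhdfs = webhdfs_client
        self._cli = cli_client

    def __getattr__(self, name: str) -> Any:
        # Only called for names not found on the instance or class, so the
        # fields set in __init__ never reach this method.
        if name in self._CLI_METHODS:
            return getattr(self._cli, name)
        return getattr(self._webhdfs, name)


if __name__ == "__main__":
    # Record argv lists instead of executing them, so no cluster is needed.
    recorded: List[List[str]] = []
    client = DelegatingFsClientSketch(
        webhdfs_client=object(),
        cli_client=CommandLineFsClientSketch(runner=recorded.append),
    )
    client.copy_from_local("/tmp/data.txt", "s3a://example-bucket/test-dir/data.txt")
    # prints: hdfs dfs -copyFromLocal -d -f /tmp/data.txt s3a://example-bucket/test-dir/data.txt
    print(" ".join(recorded[0]))
```

Routing through `__getattr__` keeps the sketch short. A real client might instead define each method explicitly so that a call neither backend supports fails with a clear error rather than an `AttributeError` from the fallback client.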