[ https://issues.apache.org/jira/browse/HDFS-2316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141727#comment-13141727 ]

Tsz Wo (Nicholas), SZE commented on HDFS-2316:
----------------------------------------------

@Nathan

> "<namenode>:<port>" and "http://<host>:<port>" seem to be used 
> interchangeably. We should be consistent where possible.

You are right.  I should use <host>:<port> only.

> Why doesn't "curl -i -L "http://<host>:<port>/webhdfs/<path>" just work? Do 
> we really need to specify op=OPEN for this very simple, common case?

The op parameter does not have a default value.  I think having a default may 
be confusing: if we forgot to add the op parameter, the request would silently 
become a totally different operation.
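To illustrate the point, here is a small sketch (a hypothetical helper, not part of the patch; the host name is made up and 50070 is used as the namenode HTTP port) of a client building a read URL with the op parameter always spelled out:

```python
from urllib.parse import urlencode

def webhdfs_url(host, port, path, op, **params):
    """Build a WebHDFS request URL; op is always explicit, never defaulted."""
    query = urlencode({"op": op, **params})
    return f"http://{host}:{port}/webhdfs{path}?{query}"

# Even the simple, common read case names its operation explicitly:
print(webhdfs_url("nn.example.com", 50070, "/user/foo/bar.txt", "OPEN"))
# -> http://nn.example.com:50070/webhdfs/user/foo/bar.txt?op=OPEN
```

If op were omitted and silently defaulted, a typo in the query string could turn one operation into another, which is the confusion described above.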

> I believe "http://<datanode>:<path>" should be "http://<datanode>:<port>" in 
> append.

Good catch!

> Need format of responses spelled out.
> It would be nice if we could document the possible error responses as well.

Will post an updated doc with the JSON responses and error responses soon.

> Since a single datanode will be performing the write of a potentially large 
> file, does that mean that file will have an entire copy on that node (due to 
> block placement strategies)? That doesn't seem desirable..

That is probably the case.  We may change the block placement strategy as an 
improvement later on.

> Is a SHORT sufficient for buffersize?

It should be INT.
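A quick check of why a short does not suffice (the 128 KB figure is just an illustrative buffer size, not one mandated by the API):

```python
SHORT_MAX = 2**15 - 1   # 32767, maximum value of a signed 16-bit short
INT_MAX = 2**31 - 1     # maximum value of a signed 32-bit int

# A buffer size such as 128 KB already overflows a short,
# while fitting comfortably in an int:
buffersize = 128 * 1024
print(buffersize > SHORT_MAX)   # True
print(buffersize <= INT_MAX)    # True
```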

> Do we need a renewlease? How will very slow writers be handled?

A slow writer sends data to one of the datanodes using HTTP.  That datanode 
uses a DFSClient to write the data, and the DFSClient renews the lease for the 
writer.

> Once I have file block locations, can I go directly to those datanodes to 
> retrieve rather than using content_range and always following a redirect?

Yes.  Clients could get the block locations, construct the URLs themselves and 
then talk to the datanodes directly.  We should have an API to support this; 
e.g. it would be better for GETFILEBLOCKLOCATIONS to return a list of URLs 
directly.

GETFILEBLOCKLOCATIONS returns a LocatedBlocks structure which is not easy to 
use.  I am changing GETFILEBLOCKLOCATIONS to GET_BLOCK_LOCATIONS, a private API.
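As a sketch of the direct-read idea (the helper, host names, and the use of 50075 as the datanode HTTP port are assumptions for illustration, not part of the proposed API): given per-block datanode locations, a client could build read URLs itself and skip the redirect.

```python
from urllib.parse import urlencode

def block_read_urls(blocks, path, port=50075):
    """Given (datanode_host, offset, length) tuples for a file's blocks,
    build direct WebHDFS read URLs, one per block."""
    urls = []
    for host, offset, length in blocks:
        query = urlencode({"op": "OPEN", "offset": offset, "length": length})
        urls.append(f"http://{host}:{port}/webhdfs{path}?{query}")
    return urls

# Example: a two-block file with a 64 MB block size (hosts are made up):
blocks = [("dn1.example.com", 0, 67108864),
          ("dn2.example.com", 67108864, 67108864)]
for u in block_read_urls(blocks, "/user/foo/big.bin"):
    print(u)
```

An API returning such URLs directly would spare clients from parsing a LocatedBlocks structure.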

> Do we need flush/sync?

Since the client is using HTTP, there is no way for them to call hflush.  Let's 
leave this as a future improvement.

                
> webhdfs: a complete FileSystem implementation for accessing HDFS over HTTP
> --------------------------------------------------------------------------
>
>                 Key: HDFS-2316
>                 URL: https://issues.apache.org/jira/browse/HDFS-2316
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Tsz Wo (Nicholas), SZE
>            Assignee: Tsz Wo (Nicholas), SZE
>         Attachments: WebHdfsAPI20111020.pdf
>
>
> We currently have hftp for accessing HDFS over HTTP.  However, hftp is a 
> read-only FileSystem and does not provide "write" access.
> In HDFS-2284, we propose webhdfs: a complete FileSystem implementation for 
> accessing HDFS over HTTP.  This is the umbrella JIRA for the tasks.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
