[ https://issues.apache.org/jira/browse/HDFS-2316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141727#comment-13141727 ]
Tsz Wo (Nicholas), SZE commented on HDFS-2316:
----------------------------------------------

@Nathan

> "<namenode>:<port>" and "http://<host>:<port>" seem to be used
> interchangeably. We should be consistent where possible.

You are right. I should use <host>:<port> only.

> Why doesn't "curl -i -L "http://<host>:<port>/webhdfs/<path>" just work? Do
> we really need to specify op=OPEN for this very simple, common case?

The op parameter does not have a default value. I think it may be confusing if we had a default: if someone forgot to add the op parameter, the request would become a totally different operation.

> I believe "http://<datanode>:<path>" should be "http://<datanode>:<port>" in
> append.

Good catch!

> Need format of responses spelled out.
> It would be nice if we could document the possible error responses as well.

Will post an updated doc with JSON responses and error responses soon.

> Since a single datanode will be performing the write of a potentially large
> file, does that mean that file will have an entire copy on that node (due to
> block placement strategies)? That doesn't seem desirable..

It is probably the case. We may change the block placement strategy as an improvement later on.

> Is a SHORT sufficient for buffersize?

It should be INT.

> Do we need a renewlease? How will very slow writers be handled?

A slow writer sends data to one of the datanodes using HTTP. That datanode uses a DFSClient to write the data, and the DFSClient renews the lease on behalf of the writer.

> Once I have file block locations, can I go directly to those datanodes to
> retrieve rather than using content_range and always following a redirect?

Yes. Clients could get the block locations, construct the URLs themselves, and then talk to the datanodes directly. We should have an API to support this. GETFILEBLOCKLOCATIONS currently returns a LocatedBlocks structure, which is not easy to use; it would be better for it to return a list of URLs directly.
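The direct-read idea above can be sketched as follows. This is only an illustration, not the final API: the `direct_read_urls` helper, the datanode host/port values, and the exact query layout (`op=OPEN` with `offset`/`length` parameters against a `/webhdfs` path) are assumptions for the sake of the example.

```python
def direct_read_urls(path, blocks):
    """Turn (host, port, offset, length) block locations into one
    direct datanode read URL per block, skipping the namenode redirect.
    Layout is a sketch of what a URL-returning API might hand back."""
    return [
        f"http://{host}:{port}/webhdfs{path}?op=OPEN&offset={offset}&length={length}"
        for host, port, offset, length in blocks
    ]

# Hypothetical file split into two blocks on two datanodes.
blocks = [
    ("dn1.example.com", 50075, 0, 67108864),
    ("dn2.example.com", 50075, 67108864, 1048576),
]
for url in direct_read_urls("/user/foo/data.txt", blocks):
    print(url)
```

A client holding such a list could then fetch each block range directly from its datanode, instead of always going through the namenode and following a redirect.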
I am changing GETFILEBLOCKLOCATIONS to GET_BLOCK_LOCATIONS, a private API.

> Do we need flush/sync?

Since the client is using HTTP, there is no way for it to call hflush. Let's leave this as a future improvement.

> webhdfs: a complete FileSystem implementation for accessing HDFS over HTTP
> --------------------------------------------------------------------------
>
>                 Key: HDFS-2316
>                 URL: https://issues.apache.org/jira/browse/HDFS-2316
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Tsz Wo (Nicholas), SZE
>            Assignee: Tsz Wo (Nicholas), SZE
>         Attachments: WebHdfsAPI20111020.pdf
>
> We currently have hftp for accessing HDFS over HTTP. However, hftp is a
> read-only FileSystem and does not provide "write" access.
> In HDFS-2284, we propose to have webhdfs for providing a complete FileSystem
> implementation for accessing HDFS over HTTP. This is the umbrella JIRA for
> the tasks.