[jira] [Commented] (HDFS-4750) Support NFSv3 interface to HDFS

2016-06-11 Thread amoners (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15325782#comment-15325782
 ] 

amoners commented on HDFS-4750:
---

"
By default, the DFSClient inside NFS gateway doesn’t flush data after each 
write, and thus the user may not
be able to read back the written data immediately. The solution is to read the 
data from the buffer in NFS
gateway instead of DataNode.
"

Is there any improvement in this filed? If so, could you give me a link?

Thanks,
Amoners

> Support NFSv3 interface to HDFS
> ---
>
> Key: HDFS-4750
> URL: https://issues.apache.org/jira/browse/HDFS-4750
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: nfs
>Affects Versions: 3.0.0-alpha1
>Reporter: Brandon Li
>Assignee: Brandon Li
> Attachments: HADOOP-NFS-Proposal.pdf, HDFS-4750.patch, nfs-trunk.patch
>
>
> Access HDFS is usually done through HDFS Client or webHDFS. Lack of seamless 
> integration with client’s file system makes it difficult for users and 
> impossible for some applications to access HDFS. NFS interface support is one 
> way for HDFS to have such easy integration.
> This JIRA is to track the NFS protocol support for accessing HDFS. With HDFS 
> client, webHDFS and the NFS interface, HDFS will be easier to access and be 
> able support more applications and use cases. 
> We will upload the design document and the initial implementation. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-4750) Support NFSv3 interface to HDFS

2015-08-25 Thread Brandon Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14712494#comment-14712494
 ] 

Brandon Li commented on HDFS-4750:
--

Overlapped writes happen usually for the first block.
For example, a file has 100 bytes. User write this file to NFS mount point. 
After a while (the 100 bytes still in OS buffer cache), the user appends 
another 100 bytes. In this case, the NFS client will rewrite the first block 
(0-199th byte), and thus we see an overlapped write. These kind of case should 
be well handled by current code (but, still might have bugs there :-(   )

Based on your description, the problem seems not the above case. There is one 
case, we currently have not way to control it on the server side:

1. client sends write  0-99, we cache this write and do the write asynchronously
2. client sends another write  180-299, we cache it too, can't write since 
there is a hole
3. client sends another write 100-199, we will do the write since it's a 
sequential write
4. after we finish writing (0-99) and (100-199), we see an overlapped write 
(180-299) in the cache. This is where you see the error message since we are 
expecting another sequential write (200-xxx)

This kind of overlapped write happens very rarely. In case it happens, we have 
multiple copies of the same range (180-198 in the above example). The data 
could be different. When they are different, it could be hard to know which one 
is really expected by the client. 

> Support NFSv3 interface to HDFS
> ---
>
> Key: HDFS-4750
> URL: https://issues.apache.org/jira/browse/HDFS-4750
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: nfs
>Affects Versions: 3.0.0
>Reporter: Brandon Li
>Assignee: Brandon Li
> Attachments: HADOOP-NFS-Proposal.pdf, HDFS-4750.patch, nfs-trunk.patch
>
>
> Access HDFS is usually done through HDFS Client or webHDFS. Lack of seamless 
> integration with client’s file system makes it difficult for users and 
> impossible for some applications to access HDFS. NFS interface support is one 
> way for HDFS to have such easy integration.
> This JIRA is to track the NFS protocol support for accessing HDFS. With HDFS 
> client, webHDFS and the NFS interface, HDFS will be easier to access and be 
> able support more applications and use cases. 
> We will upload the design document and the initial implementation. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-4750) Support NFSv3 interface to HDFS

2015-08-24 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710621#comment-14710621
 ] 

Yongjun Zhang commented on HDFS-4750:
-

Hi [~brandonli]

If your answer to "Is it by design that receivedNewWrite should not have 
overlapped requests?" is no, then we should probably issue warning when 
overlapping write request is added there; If your answer is yes, then the 
client might have a bug due to which overlapping requests are sent. So it'd be 
nice for clarify.

Thanks.





> Support NFSv3 interface to HDFS
> ---
>
> Key: HDFS-4750
> URL: https://issues.apache.org/jira/browse/HDFS-4750
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: nfs
>Affects Versions: 3.0.0
>Reporter: Brandon Li
>Assignee: Brandon Li
> Attachments: HADOOP-NFS-Proposal.pdf, HDFS-4750.patch, nfs-trunk.patch
>
>
> Access HDFS is usually done through HDFS Client or webHDFS. Lack of seamless 
> integration with client’s file system makes it difficult for users and 
> impossible for some applications to access HDFS. NFS interface support is one 
> way for HDFS to have such easy integration.
> This JIRA is to track the NFS protocol support for accessing HDFS. With HDFS 
> client, webHDFS and the NFS interface, HDFS will be easier to access and be 
> able support more applications and use cases. 
> We will upload the design document and the initial implementation. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-4750) Support NFSv3 interface to HDFS

2015-08-24 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710603#comment-14710603
 ] 

Yongjun Zhang commented on HDFS-4750:
-

Hi [~brandonli],

Thanks for your earlier work here. 

Need some help. I'm seeing a problem which might be a bug, would like to check 
with you. The symptom is that the following Warning is issue in the case I'm 
looking at. The following code commented that it should not happen for 
overlapped concurent writers.
{code}
} else if (range.getMin() < offset && range.getMax() > offset) {
// shouldn't happen since we do sync for overlapped concurrent writers
LOG.warn("Got an overlapping write (" + range.getMin() + ", "
+ range.getMax() + "), nextOffset=" + offset
+ ". Silently drop it now");
pendingWrites.remove(range);
processCommits(nextOffset.get()); // handle race
  } else {
{code}
However, I saw {{receivedNewWrite}} calls the following method to check if a 
write is repeated write, but not checking overlapping write:
{code}
  private WriteCtx checkRepeatedWriteRequest(WRITE3Request request,
  Channel channel, int xid) {
OffsetRange range = new OffsetRange(request.getOffset(),
request.getOffset() + request.getCount());
WriteCtx writeCtx = pendingWrites.get(range);
if (writeCtx== null) {
  return null;
} else {
  if (xid != writeCtx.getXid()) {
LOG.warn("Got a repeated request, same range, with a different xid: "
+ xid + " xid in old request: " + writeCtx.getXid());
//TODO: better handling.
  }
  return writeCtx;  
}
  }
{code}
thus overlapping write can get into {{pendingWrites}}, which seems to be why 
I'm seeing the Warn described in the beginning of this comment.

Would you please comment on why we only check on repeatedwrites, but not 
overlappingWrites in {{receivedNewWrite}}? Is it by design that 
{{receivedNewWrite}} should not have overlapped requests?

Thanks a lot.



 

> Support NFSv3 interface to HDFS
> ---
>
> Key: HDFS-4750
> URL: https://issues.apache.org/jira/browse/HDFS-4750
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: nfs
>Affects Versions: 3.0.0
>Reporter: Brandon Li
>Assignee: Brandon Li
> Attachments: HADOOP-NFS-Proposal.pdf, HDFS-4750.patch, nfs-trunk.patch
>
>
> Access HDFS is usually done through HDFS Client or webHDFS. Lack of seamless 
> integration with client’s file system makes it difficult for users and 
> impossible for some applications to access HDFS. NFS interface support is one 
> way for HDFS to have such easy integration.
> This JIRA is to track the NFS protocol support for accessing HDFS. With HDFS 
> client, webHDFS and the NFS interface, HDFS will be easier to access and be 
> able support more applications and use cases. 
> We will upload the design document and the initial implementation. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-4750) Support NFSv3 interface to HDFS

2013-11-15 Thread Brandon Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13824293#comment-13824293
 ] 

Brandon Li commented on HDFS-4750:
--

NFS gateway has been released with 2.2.0.
This JIRA was continually used for additional bugfixes and code refactoring. 
There are still a few minor features to be added in the future.

I am wondering if we should close this JIRA and open new ones for these new 
features, such as sub-directory mount support, kerbores authentication. 
  

> Support NFSv3 interface to HDFS
> ---
>
> Key: HDFS-4750
> URL: https://issues.apache.org/jira/browse/HDFS-4750
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: nfs
>Affects Versions: 3.0.0
>Reporter: Brandon Li
>Assignee: Brandon Li
> Attachments: HADOOP-NFS-Proposal.pdf, HDFS-4750.patch, nfs-trunk.patch
>
>
> Access HDFS is usually done through HDFS Client or webHDFS. Lack of seamless 
> integration with client’s file system makes it difficult for users and 
> impossible for some applications to access HDFS. NFS interface support is one 
> way for HDFS to have such easy integration.
> This JIRA is to track the NFS protocol support for accessing HDFS. With HDFS 
> client, webHDFS and the NFS interface, HDFS will be easier to access and be 
> able support more applications and use cases. 
> We will upload the design document and the initial implementation. 



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-4750) Support NFSv3 interface to HDFS

2013-07-02 Thread Brandon Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13698561#comment-13698561
 ] 

Brandon Li commented on HDFS-4750:
--

I've merge the HADOOP-9009,HADOOP-9515,HDFS-4762,HDFS-4948 into branch-2 and 
branch-2.1

> Support NFSv3 interface to HDFS
> ---
>
> Key: HDFS-4750
> URL: https://issues.apache.org/jira/browse/HDFS-4750
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: nfs
>Affects Versions: 3.0.0
>Reporter: Brandon Li
>Assignee: Brandon Li
> Attachments: HADOOP-NFS-Proposal.pdf, HDFS-4750.patch, nfs-trunk.patch
>
>
> Access HDFS is usually done through HDFS Client or webHDFS. Lack of seamless 
> integration with client’s file system makes it difficult for users and 
> impossible for some applications to access HDFS. NFS interface support is one 
> way for HDFS to have such easy integration.
> This JIRA is to track the NFS protocol support for accessing HDFS. With HDFS 
> client, webHDFS and the NFS interface, HDFS will be easier to access and be 
> able support more applications and use cases. 
> We will upload the design document and the initial implementation. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4750) Support NFSv3 interface to HDFS

2013-07-02 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13698301#comment-13698301
 ] 

Suresh Srinivas commented on HDFS-4750:
---

[~brandonli] If this work is completed, given all the jiras going in, can you 
please merge this to branch 2.1?

> Support NFSv3 interface to HDFS
> ---
>
> Key: HDFS-4750
> URL: https://issues.apache.org/jira/browse/HDFS-4750
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 3.0.0
>Reporter: Brandon Li
>Assignee: Brandon Li
> Attachments: HADOOP-NFS-Proposal.pdf, HDFS-4750.patch, nfs-trunk.patch
>
>
> Access HDFS is usually done through HDFS Client or webHDFS. Lack of seamless 
> integration with client’s file system makes it difficult for users and 
> impossible for some applications to access HDFS. NFS interface support is one 
> way for HDFS to have such easy integration.
> This JIRA is to track the NFS protocol support for accessing HDFS. With HDFS 
> client, webHDFS and the NFS interface, HDFS will be easier to access and be 
> able support more applications and use cases. 
> We will upload the design document and the initial implementation. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4750) Support NFSv3 interface to HDFS

2013-05-05 Thread Brandon Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649369#comment-13649369
 ] 

Brandon Li commented on HDFS-4750:
--

Hi Brock,
Thanks for the review!
{quote} I don't see any concept of controlling export by hostname or ip range. 
FWWIW, that code can probably be taken directly from the NFS4 proxy.{quote}
Cool, thanks! :-)
{quote}Can you speak to the reason you chose to use DFSClient directly as 
opposed to using FileSystem?{quote}
DFSClient provides finer control of HDFS RPC parameters. Also it could be 
easier to add new interfaces to DFSClient than FileSystem in case we need some 
special support from HDFS for NFS. The drawback is that, we have to provide the 
utilities in FileSystem but not DFSClient, e.g., statistics and cache. 


> Support NFSv3 interface to HDFS
> ---
>
> Key: HDFS-4750
> URL: https://issues.apache.org/jira/browse/HDFS-4750
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 3.0.0
>Reporter: Brandon Li
>Assignee: Brandon Li
> Attachments: HADOOP-NFS-Proposal.pdf, HDFS-4750.patch, nfs-trunk.patch
>
>
> Access HDFS is usually done through HDFS Client or webHDFS. Lack of seamless 
> integration with client’s file system makes it difficult for users and 
> impossible for some applications to access HDFS. NFS interface support is one 
> way for HDFS to have such easy integration.
> This JIRA is to track the NFS protocol support for accessing HDFS. With HDFS 
> client, webHDFS and the NFS interface, HDFS will be easier to access and be 
> able support more applications and use cases. 
> We will upload the design document and the initial implementation. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4750) Support NFSv3 interface to HDFS

2013-05-03 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13648478#comment-13648478
 ] 

Brock Noland commented on HDFS-4750:


Brandon,

I took a quick look.

1) I see some e.printStackTrace(); directly around LOG.level().
2) I see a good number of LOG.level(msg + e); which eats the exception.
3) I don't see any concept of controlling export by hostname or ip range. 
FWWIW, that code can probably be taken directly from the NFS4 proxy.

bq. the NFS gateway is tightly coupled with HDFS protocols

Can you speak to the reason you chose to use DFSClient directly as opposed to 
using FileSystem?

Brock

> Support NFSv3 interface to HDFS
> ---
>
> Key: HDFS-4750
> URL: https://issues.apache.org/jira/browse/HDFS-4750
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 3.0.0
>Reporter: Brandon Li
>Assignee: Brandon Li
> Attachments: HADOOP-NFS-Proposal.pdf, HDFS-4750.patch, nfs-trunk.patch
>
>
> Access HDFS is usually done through HDFS Client or webHDFS. Lack of seamless 
> integration with client’s file system makes it difficult for users and 
> impossible for some applications to access HDFS. NFS interface support is one 
> way for HDFS to have such easy integration.
> This JIRA is to track the NFS protocol support for accessing HDFS. With HDFS 
> client, webHDFS and the NFS interface, HDFS will be easier to access and be 
> able support more applications and use cases. 
> We will upload the design document and the initial implementation. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4750) Support NFSv3 interface to HDFS

2013-05-01 Thread Brandon Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13646994#comment-13646994
 ] 

Brandon Li commented on HDFS-4750:
--

No problem, Andrew. 
We will use the time to fix more bugs and get some review done first :-)

> Support NFSv3 interface to HDFS
> ---
>
> Key: HDFS-4750
> URL: https://issues.apache.org/jira/browse/HDFS-4750
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 3.0.0
>Reporter: Brandon Li
>Assignee: Brandon Li
> Attachments: HADOOP-NFS-Proposal.pdf, nfs-trunk.patch
>
>
> Access HDFS is usually done through HDFS Client or webHDFS. Lack of seamless 
> integration with client’s file system makes it difficult for users and 
> impossible for some applications to access HDFS. NFS interface support is one 
> way for HDFS to have such easy integration.
> This JIRA is to track the NFS protocol support for accessing HDFS. With HDFS 
> client, webHDFS and the NFS interface, HDFS will be easier to access and be 
> able support more applications and use cases. 
> We will upload the design document and the initial implementation. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4750) Support NFSv3 interface to HDFS

2013-05-01 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13646821#comment-13646821
 ] 

Andrew Purtell commented on HDFS-4750:
--

[~brandonli]
{quote}
bq. Would be happy to help with that when you think the code is ready.

Thanks! I only did cthon04 basic test(no symlink) with the uploaded patch on 
centos. Please feel free to give it a try.
{quote}

We had reserved some time this week over here to give this a spin, but 
unfortunately something has come up. Maybe there will still be an opportunity 
to be helpful down the road. Apologies.

> Support NFSv3 interface to HDFS
> ---
>
> Key: HDFS-4750
> URL: https://issues.apache.org/jira/browse/HDFS-4750
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 3.0.0
>Reporter: Brandon Li
>Assignee: Brandon Li
> Attachments: HADOOP-NFS-Proposal.pdf, nfs-trunk.patch
>
>
> Access HDFS is usually done through HDFS Client or webHDFS. Lack of seamless 
> integration with client’s file system makes it difficult for users and 
> impossible for some applications to access HDFS. NFS interface support is one 
> way for HDFS to have such easy integration.
> This JIRA is to track the NFS protocol support for accessing HDFS. With HDFS 
> client, webHDFS and the NFS interface, HDFS will be easier to access and be 
> able support more applications and use cases. 
> We will upload the design document and the initial implementation. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4750) Support NFSv3 interface to HDFS

2013-04-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13645326#comment-13645326
 ] 

Hadoop QA commented on HDFS-4750:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12581119/nfs-trunk.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

  {color:red}-1 javac{color}.  The applied patch generated 1371 javac 
compiler warnings (more than the trunk's current 1366 warnings).

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 19 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/4343//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/4343//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
Javac warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/4343//artifact/trunk/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/4343//console

This message is automatically generated.

> Support NFSv3 interface to HDFS
> ---
>
> Key: HDFS-4750
> URL: https://issues.apache.org/jira/browse/HDFS-4750
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 3.0.0
>Reporter: Brandon Li
>Assignee: Brandon Li
> Attachments: HADOOP-NFS-Proposal.pdf, nfs-trunk.patch
>
>
> Access HDFS is usually done through HDFS Client or webHDFS. Lack of seamless 
> integration with client’s file system makes it difficult for users and 
> impossible for some applications to access HDFS. NFS interface support is one 
> way for HDFS to have such easy integration.
> This JIRA is to track the NFS protocol support for accessing HDFS. With HDFS 
> client, webHDFS and the NFS interface, HDFS will be easier to access and be 
> able support more applications and use cases. 
> We will upload the design document and the initial implementation. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4750) Support NFSv3 interface to HDFS

2013-04-29 Thread Brandon Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13645322#comment-13645322
 ] 

Brandon Li commented on HDFS-4750:
--

@Han{quote}..not having understood the description in the pdf that "Hadoop+FUSE 
could be used to provide an NFS interface for HDFS. However it has many known 
problems and limitations."{quote}
For example, FUSE is not inode based like NFS. FUSE usually uses path to 
generate the NFS file handle. Its path-handle mapping can make the host run out 
of memory. Even it can work around the memory problem, it could have 
correctness issue. FUSE may not be aware that a file's path has been changed by 
other means(e.g., hadoop CLI). If FUSE is used on the client side, each NFS 
client has to install a client component which runs only on Linux so far. ...

> Support NFSv3 interface to HDFS
> ---
>
> Key: HDFS-4750
> URL: https://issues.apache.org/jira/browse/HDFS-4750
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 3.0.0
>Reporter: Brandon Li
>Assignee: Brandon Li
> Attachments: HADOOP-NFS-Proposal.pdf, nfs-trunk.patch
>
>
> Access HDFS is usually done through HDFS Client or webHDFS. Lack of seamless 
> integration with client’s file system makes it difficult for users and 
> impossible for some applications to access HDFS. NFS interface support is one 
> way for HDFS to have such easy integration.
> This JIRA is to track the NFS protocol support for accessing HDFS. With HDFS 
> client, webHDFS and the NFS interface, HDFS will be easier to access and be 
> able support more applications and use cases. 
> We will upload the design document and the initial implementation. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4750) Support NFSv3 interface to HDFS

2013-04-29 Thread Brandon Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13645289#comment-13645289
 ] 

Brandon Li commented on HDFS-4750:
--

@Hari{quote}An alternative to consider to support NFS writes is to require 
clients do NFS mounts with directio enabled. Directio will bypass client cache 
and might alleviate some of the funky behavior.{quote}
Yes, directIO could help reduce the kernel reordered writes. Solaris supports 
it using the forcedirectio for mount. Linux seems not have a corresponding 
mount option. 

> Support NFSv3 interface to HDFS
> ---
>
> Key: HDFS-4750
> URL: https://issues.apache.org/jira/browse/HDFS-4750
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 3.0.0
>Reporter: Brandon Li
>Assignee: Brandon Li
> Attachments: HADOOP-NFS-Proposal.pdf, nfs-trunk.patch
>
>
> Access HDFS is usually done through HDFS Client or webHDFS. Lack of seamless 
> integration with client’s file system makes it difficult for users and 
> impossible for some applications to access HDFS. NFS interface support is one 
> way for HDFS to have such easy integration.
> This JIRA is to track the NFS protocol support for accessing HDFS. With HDFS 
> client, webHDFS and the NFS interface, HDFS will be easier to access and be 
> able support more applications and use cases. 
> We will upload the design document and the initial implementation. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4750) Support NFSv3 interface to HDFS

2013-04-29 Thread Brandon Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13645287#comment-13645287
 ] 

Brandon Li commented on HDFS-4750:
--

@Todd{quote}What is the purpose of putting it in Hadoop proper rather than 
proposing it as a separate project (eg in the incubator)? {quote}
What we were thinking was that, as mentioned above, the NFS gateway is tightly 
coupled with HDFS protocols. Current code is still controlled at a small size. 
Also, some code is so general (e.g., oncrpc implementation) that can be used by 
other possible projects. 

> Support NFSv3 interface to HDFS
> ---
>
> Key: HDFS-4750
> URL: https://issues.apache.org/jira/browse/HDFS-4750
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 3.0.0
>Reporter: Brandon Li
>Assignee: Brandon Li
> Attachments: HADOOP-NFS-Proposal.pdf, nfs-trunk.patch
>
>
> Access HDFS is usually done through HDFS Client or webHDFS. Lack of seamless 
> integration with client’s file system makes it difficult for users and 
> impossible for some applications to access HDFS. NFS interface support is one 
> way for HDFS to have such easy integration.
> This JIRA is to track the NFS protocol support for accessing HDFS. With HDFS 
> client, webHDFS and the NFS interface, HDFS will be easier to access and be 
> able support more applications and use cases. 
> We will upload the design document and the initial implementation. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4750) Support NFSv3 interface to HDFS

2013-04-29 Thread Brandon Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13645272#comment-13645272
 ] 

Brandon Li commented on HDFS-4750:
--

@Allen{quote}Any clients besides Linux and Mac OS X? (FWIW: OS X's NFS client 
has always been a bit flaky...) Have we thought about YANFS support?{quote}
Weeks ago, did some manual tests with Window NFSv3 client before we changed the 
RPC authentication support form AUTH_NULL to AUTH_SYS. We didn't try it again 
after the change. Mapping Windows users to Unix users may be needed to test it 
again.

We looked a few other NFS implementations. Eventually we decide to implement 
one. The major reason is that, the NFS should work around a few HDFS 
limitations, and also tightly coupled with HDFS protocols. 

> Support NFSv3 interface to HDFS
> ---
>
> Key: HDFS-4750
> URL: https://issues.apache.org/jira/browse/HDFS-4750
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 3.0.0
>Reporter: Brandon Li
>Assignee: Brandon Li
> Attachments: HADOOP-NFS-Proposal.pdf, nfs-trunk.patch
>
>
> Access HDFS is usually done through HDFS Client or webHDFS. Lack of seamless 
> integration with client’s file system makes it difficult for users and 
> impossible for some applications to access HDFS. NFS interface support is one 
> way for HDFS to have such easy integration.
> This JIRA is to track the NFS protocol support for accessing HDFS. With HDFS 
> client, webHDFS and the NFS interface, HDFS will be easier to access and be 
> able support more applications and use cases. 
> We will upload the design document and the initial implementation. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4750) Support NFSv3 interface to HDFS

2013-04-29 Thread Brandon Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13645247#comment-13645247
 ] 

Brandon Li commented on HDFS-4750:
--

@Andrew {quote}Would be happy to help with that when you think the code is 
ready. {quote}
Thanks! I only did cthon04 basic test(no symlink) with the uploaded patch on 
centos. Please feel free to give it a try.

> Support NFSv3 interface to HDFS
> ---
>
> Key: HDFS-4750
> URL: https://issues.apache.org/jira/browse/HDFS-4750
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 3.0.0
>Reporter: Brandon Li
>Assignee: Brandon Li
> Attachments: HADOOP-NFS-Proposal.pdf, nfs-trunk.patch
>
>
> Access HDFS is usually done through HDFS Client or webHDFS. Lack of seamless 
> integration with client’s file system makes it difficult for users and 
> impossible for some applications to access HDFS. NFS interface support is one 
> way for HDFS to have such easy integration.
> This JIRA is to track the NFS protocol support for accessing HDFS. With HDFS 
> client, webHDFS and the NFS interface, HDFS will be easier to access and be 
> able support more applications and use cases. 
> We will upload the design document and the initial implementation. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4750) Support NFSv3 interface to HDFS

2013-04-28 Thread Han Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13643948#comment-13643948
 ] 

Han Xiao commented on HDFS-4750:


This feature is intersting. 
However, I'am sorry for not having understood the desription in the pdf that 
"Hadoop+FUSE could be used to provide an NFS interface for HDFS. However it has 
many known problems and limitations."
Could it be explained more detailed or a link would also be appreciated. We 
want to use this fuction alike to do backuping tasks with normal backup 
software. So the information is very important to us. 
Thank you.

> Support NFSv3 interface to HDFS
> ---
>
> Key: HDFS-4750
> URL: https://issues.apache.org/jira/browse/HDFS-4750
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 3.0.0
>Reporter: Brandon Li
>Assignee: Brandon Li
> Attachments: HADOOP-NFS-Proposal.pdf
>
>
> Access HDFS is usually done through HDFS Client or webHDFS. Lack of seamless 
> integration with client’s file system makes it difficult for users and 
> impossible for some applications to access HDFS. NFS interface support is one 
> way for HDFS to have such easy integration.
> This JIRA is to track the NFS protocol support for accessing HDFS. With HDFS 
> client, webHDFS and the NFS interface, HDFS will be easier to access and be 
> able support more applications and use cases. 
> We will upload the design document and the initial implementation. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4750) Support NFSv3 interface to HDFS

2013-04-27 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13643816#comment-13643816
 ] 

Andrew Purtell commented on HDFS-4750:
--

bq. We will do some performance tests once the code is relatively stable.

Would be happy to help with that when you think the code is ready. 

> Support NFSv3 interface to HDFS
> ---
>
> Key: HDFS-4750
> URL: https://issues.apache.org/jira/browse/HDFS-4750
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 3.0.0
>Reporter: Brandon Li
>Assignee: Brandon Li
> Attachments: HADOOP-NFS-Proposal.pdf
>
>
> Access HDFS is usually done through HDFS Client or webHDFS. Lack of seamless 
> integration with client’s file system makes it difficult for users and 
> impossible for some applications to access HDFS. NFS interface support is one 
> way for HDFS to have such easy integration.
> This JIRA is to track the NFS protocol support for accessing HDFS. With HDFS 
> client, webHDFS and the NFS interface, HDFS will be easier to access and be 
> able support more applications and use cases. 
> We will upload the design document and the initial implementation. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4750) Support NFSv3 interface to HDFS

2013-04-27 Thread Brandon Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13643772#comment-13643772
 ] 

Brandon Li commented on HDFS-4750:
--

@Allen,Andrew
{quote}This is a great question. Or fio, or another of the usual.
{quote}
The code is in its very early stage. We have done little performance test. We 
did some tests with Cthon04 and NFStest(from NetApp). We will do some 
performance tests once the code is relatively stable.

> Support NFSv3 interface to HDFS
> ---
>
> Key: HDFS-4750
> URL: https://issues.apache.org/jira/browse/HDFS-4750
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 3.0.0
>Reporter: Brandon Li
>Assignee: Brandon Li
> Attachments: HADOOP-NFS-Proposal.pdf
>
>
> Access HDFS is usually done through HDFS Client or webHDFS. Lack of seamless 
> integration with client’s file system makes it difficult for users and 
> impossible for some applications to access HDFS. NFS interface support is one 
> way for HDFS to have such easy integration.
> This JIRA is to track the NFS protocol support for accessing HDFS. With HDFS 
> client, webHDFS and the NFS interface, HDFS will be easier to access and be 
> able support more applications and use cases. 
> We will upload the design document and the initial implementation. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4750) Support NFSv3 interface to HDFS

2013-04-27 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13643758#comment-13643758
 ] 

Andrew Purtell commented on HDFS-4750:
--

bq. What does iozone do with this?

This is a great question. Or fio, or another of the usual. 

> Support NFSv3 interface to HDFS
> ---
>
> Key: HDFS-4750
> URL: https://issues.apache.org/jira/browse/HDFS-4750
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 3.0.0
>Reporter: Brandon Li
>Assignee: Brandon Li
> Attachments: HADOOP-NFS-Proposal.pdf
>
>
> Access HDFS is usually done through HDFS Client or webHDFS. Lack of seamless 
> integration with client’s file system makes it difficult for users and 
> impossible for some applications to access HDFS. NFS interface support is one 
> way for HDFS to have such easy integration.
> This JIRA is to track the NFS protocol support for accessing HDFS. With HDFS 
> client, webHDFS and the NFS interface, HDFS will be easier to access and be 
> able support more applications and use cases. 
> We will upload the design document and the initial implementation. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4750) Support NFSv3 interface to HDFS

2013-04-27 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13643757#comment-13643757
 ] 

Allen Wittenauer commented on HDFS-4750:


Have we run any of these against SPEC SFS?  What does iozone do with this?  Any 
clients besides Linux and Mac OS X? (FWIW: OS X's NFS client has always been a 
bit flaky...)  Have we thought about YANFS support?

> Support NFSv3 interface to HDFS
> ---
>
> Key: HDFS-4750
> URL: https://issues.apache.org/jira/browse/HDFS-4750
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 3.0.0
>Reporter: Brandon Li
>Assignee: Brandon Li
> Attachments: HADOOP-NFS-Proposal.pdf
>
>
> Access HDFS is usually done through HDFS Client or webHDFS. Lack of seamless 
> integration with client’s file system makes it difficult for users and 
> impossible for some applications to access HDFS. NFS interface support is one 
> way for HDFS to have such easy integration.
> This JIRA is to track the NFS protocol support for accessing HDFS. With HDFS 
> client, webHDFS and the NFS interface, HDFS will be easier to access and be 
> able support more applications and use cases. 
> We will upload the design document and the initial implementation. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4750) Support NFSv3 interface to HDFS

2013-04-27 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13643739#comment-13643739
 ] 

Brock Noland commented on HDFS-4750:


Hari,

Yes, this a major concern. Any clients will have to be well behaved and 
explicitly not perform random writes on the client side. With NFS4, as long as 
the client application is not performing random writes, the linux NFS4 client 
does not appear attempt any random writes. As I mentioned earlier I thought I 
saw the linux NFS3 client perform random writes for a sequentially writing 
client. Hopefully I was mistaken.

In regard to handling the normal write-reordering which occurs under both NFS3 
and 4, the approach I took in the NFS4 proxy was to buffer any non-sequential 
writes until I could write them sequentially. As you said previously this can 
lead unbounded memory consumption. Therefore in the NFS4 proxy, if a write 
request is greater than 1MB from the current file offset, it's written out to a 
log file and the in-memory object simply stores the file name, offset, and 
length. I've found this method to work quite well.

Brock

> Support NFSv3 interface to HDFS
> ---
>
> Key: HDFS-4750
> URL: https://issues.apache.org/jira/browse/HDFS-4750
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 3.0.0
>Reporter: Brandon Li
>Assignee: Brandon Li
> Attachments: HADOOP-NFS-Proposal.pdf
>
>
> Access HDFS is usually done through HDFS Client or webHDFS. Lack of seamless 
> integration with client’s file system makes it difficult for users and 
> impossible for some applications to access HDFS. NFS interface support is one 
> way for HDFS to have such easy integration.
> This JIRA is to track the NFS protocol support for accessing HDFS. With HDFS 
> client, webHDFS and the NFS interface, HDFS will be easier to access and be 
> able support more applications and use cases. 
> We will upload the design document and the initial implementation. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4750) Support NFSv3 interface to HDFS

2013-04-27 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13643737#comment-13643737
 ] 

Todd Lipcon commented on HDFS-4750:
---

Looking at some of the patches that have been posted, it appears that this 
project is entirely new/separate code from the rest of Hadoop. What is the 
purpose of putting it in Hadoop proper rather than proposing it as a separate 
project (eg in the incubator)? Bundling it with Hadoop has the downside that it 
makes our releases even bigger, whereas the general feeling of late has been 
that we should try to keep things out of 'core' (eg we removed a bunch of 
former contrib projects)

> Support NFSv3 interface to HDFS
> ---
>
> Key: HDFS-4750
> URL: https://issues.apache.org/jira/browse/HDFS-4750
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 3.0.0
>Reporter: Brandon Li
>Assignee: Brandon Li
> Attachments: HADOOP-NFS-Proposal.pdf
>
>
> Access HDFS is usually done through HDFS Client or webHDFS. Lack of seamless 
> integration with client’s file system makes it difficult for users and 
> impossible for some applications to access HDFS. NFS interface support is one 
> way for HDFS to have such easy integration.
> This JIRA is to track the NFS protocol support for accessing HDFS. With HDFS 
> client, webHDFS and the NFS interface, HDFS will be easier to access and be 
> able support more applications and use cases. 
> We will upload the design document and the initial implementation. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4750) Support NFSv3 interface to HDFS

2013-04-27 Thread Hari Mankude (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13643734#comment-13643734
 ] 

Hari Mankude commented on HDFS-4750:


I would recommend thinking through NFS write operations. The client does 
caching and page cache can result in lots of weirdness. For example, as long as 
the data is cached in client's page cache, client can do random writes and 
overwrites. When page cache is flushed to hdfs data store, some writes would 
fail (translate to overwrites in hdfs) while others might succeed (offsets 
happen to be append). 

An alternative to consider to support NFS writes is to require clients do NFS 
mounts with directio enabled. Directio will bypass client cache and might 
alleviate some of the funky behavior.



> Support NFSv3 interface to HDFS
> ---
>
> Key: HDFS-4750
> URL: https://issues.apache.org/jira/browse/HDFS-4750
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 3.0.0
>Reporter: Brandon Li
>Assignee: Brandon Li
> Attachments: HADOOP-NFS-Proposal.pdf
>
>
> Access HDFS is usually done through HDFS Client or webHDFS. Lack of seamless 
> integration with client’s file system makes it difficult for users and 
> impossible for some applications to access HDFS. NFS interface support is one 
> way for HDFS to have such easy integration.
> This JIRA is to track the NFS protocol support for accessing HDFS. With HDFS 
> client, webHDFS and the NFS interface, HDFS will be easier to access and be 
> able support more applications and use cases. 
> We will upload the design document and the initial implementation. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4750) Support NFSv3 interface to HDFS

2013-04-27 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13643690#comment-13643690
 ] 

Suresh Srinivas commented on HDFS-4750:
---

bq. But, if it's preferred to do the change in a different branch, please let 
me know.
Since these do changes do not lend trunk unstable, I am okay not having a 
branch for this development. If I do not hear a differing opinion, I will start 
reviewing and merging this patch from next week.

> Support NFSv3 interface to HDFS
> ---
>
> Key: HDFS-4750
> URL: https://issues.apache.org/jira/browse/HDFS-4750
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 3.0.0
>Reporter: Brandon Li
>Assignee: Brandon Li
> Attachments: HADOOP-NFS-Proposal.pdf
>
>
> Access HDFS is usually done through HDFS Client or webHDFS. Lack of seamless 
> integration with client’s file system makes it difficult for users and 
> impossible for some applications to access HDFS. NFS interface support is one 
> way for HDFS to have such easy integration.
> This JIRA is to track the NFS protocol support for accessing HDFS. With HDFS 
> client, webHDFS and the NFS interface, HDFS will be easier to access and be 
> able support more applications and use cases. 
> We will upload the design document and the initial implementation. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4750) Support NFSv3 interface to HDFS

2013-04-26 Thread Brandon Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13643043#comment-13643043
 ] 

Brandon Li commented on HDFS-4750:
--

Hi folks,

I plan to split and upload the initial implementation to 4 JIRAs 
(HADOOP-9509,HADOOP-9515,HDFS-4762,HDFS-4763). These changes are independent 
with current Hadoop code base. But, if it's preferred to do the change in a 
different branch, please let me know.

Thanks,
Brandon


> Support NFSv3 interface to HDFS
> ---
>
> Key: HDFS-4750
> URL: https://issues.apache.org/jira/browse/HDFS-4750
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 3.0.0
>Reporter: Brandon Li
>Assignee: Brandon Li
> Attachments: HADOOP-NFS-Proposal.pdf
>
>
> Access HDFS is usually done through HDFS Client or webHDFS. Lack of seamless 
> integration with client’s file system makes it difficult for users and 
> impossible for some applications to access HDFS. NFS interface support is one 
> way for HDFS to have such easy integration.
> This JIRA is to track the NFS protocol support for accessing HDFS. With HDFS 
> client, webHDFS and the NFS interface, HDFS will be easier to access and be 
> able support more applications and use cases. 
> We will upload the design document and the initial implementation. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4750) Support NFSv3 interface to HDFS

2013-04-25 Thread Brandon Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642145#comment-13642145
 ] 

Brandon Li commented on HDFS-4750:
--

{quote}...but NFS does require that close() makes it visible to other clients 
(so-called "close-to-open consistency"){quote}
The protocol provides no facility to guarantee the cached data is consistent 
with that on server. But "close-to-open consistency" is recommended for 
implementation.


> Support NFSv3 interface to HDFS
> ---
>
> Key: HDFS-4750
> URL: https://issues.apache.org/jira/browse/HDFS-4750
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 3.0.0
>Reporter: Brandon Li
>Assignee: Brandon Li
> Attachments: HADOOP-NFS-Proposal.pdf
>
>
> Access HDFS is usually done through HDFS Client or webHDFS. Lack of seamless 
> integration with client’s file system makes it difficult for users and 
> impossible for some applications to access HDFS. NFS interface support is one 
> way for HDFS to have such easy integration.
> This JIRA is to track the NFS protocol support for accessing HDFS. With HDFS 
> client, webHDFS and the NFS interface, HDFS will be easier to access and be 
> able support more applications and use cases. 
> We will upload the design document and the initial implementation. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4750) Support NFSv3 interface to HDFS

2013-04-25 Thread Hari Mankude (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642102#comment-13642102
 ] 

Hari Mankude commented on HDFS-4750:


Implementing writes might not be easy. The client implementations in various 
kernels does not guarantee that the writes are issued in sequential order. Page 
flushing algorithms try to find contiguous pages (offsets). However, there are 
other factors in play with page flushing algorithms.  So it does not imply that 
writes from the client has to be sequential as HDFS requires it to be. This is 
true whether the writes are coming in lazily from the client or due to a sync() 
before close(). A possible solution is for nfs gateway on dfs client to cache 
and reorder the writes to be sequential. But, this might still result in 
"holes" which hdfs cannot handle. Also, the cache requirements might not be 
trivial and might require a flush to local disk.

NFS interfaces are very useful for reads. 

> Support NFSv3 interface to HDFS
> ---
>
> Key: HDFS-4750
> URL: https://issues.apache.org/jira/browse/HDFS-4750
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 3.0.0
>Reporter: Brandon Li
>Assignee: Brandon Li
> Attachments: HADOOP-NFS-Proposal.pdf
>
>
> Access HDFS is usually done through HDFS Client or webHDFS. Lack of seamless 
> integration with client’s file system makes it difficult for users and 
> impossible for some applications to access HDFS. NFS interface support is one 
> way for HDFS to have such easy integration.
> This JIRA is to track the NFS protocol support for accessing HDFS. With HDFS 
> client, webHDFS and the NFS interface, HDFS will be easier to access and be 
> able support more applications and use cases. 
> We will upload the design document and the initial implementation. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4750) Support NFSv3 interface to HDFS

2013-04-25 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642068#comment-13642068
 ] 

Todd Lipcon commented on HDFS-4750:
---

{quote}
If the application expects that, after closing file through one NFS gateway, 
the new data is immediately available to all other NFS gateways, the 
application should do a sync call after close.
This is not a limitation only to this NFS implemtation. POSIX close doesn't 
sync data implicitly.
{quote}

I don't think this is right. POSIX doesn't ensure that close() syncs data 
(makes it durable), but NFS _does_ require that close() makes it _visible_ to 
other clients (so-called "close-to-open consistency"):

{quote}
The NFS standard requires clients to maintain close-to-open cache coherency 
when multiple clients access the same files [5, 6, 10]. This means flushing all 
file data and metadata changes when a client closes a file, and immediately and 
unconditionally retrieving a file's attributes when it is opened via the open() 
system call API. In this way, changes made by one client appear as soon as a 
file is opened on any other client.
{quote}
(from http://www.citi.umich.edu/projects/nfs-perf/results/cel/dnlc.html)

> Support NFSv3 interface to HDFS
> ---
>
> Key: HDFS-4750
> URL: https://issues.apache.org/jira/browse/HDFS-4750
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 3.0.0
>Reporter: Brandon Li
>Assignee: Brandon Li
> Attachments: HADOOP-NFS-Proposal.pdf
>
>
> Access HDFS is usually done through HDFS Client or webHDFS. Lack of seamless 
> integration with client’s file system makes it difficult for users and 
> impossible for some applications to access HDFS. NFS interface support is one 
> way for HDFS to have such easy integration.
> This JIRA is to track the NFS protocol support for accessing HDFS. With HDFS 
> client, webHDFS and the NFS interface, HDFS will be easier to access and be 
> able support more applications and use cases. 
> We will upload the design document and the initial implementation. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4750) Support NFSv3 interface to HDFS

2013-04-25 Thread Brandon Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642065#comment-13642065
 ] 

Brandon Li commented on HDFS-4750:
--

@Brock
{quote}
What happens if the 10 seconds expires and the prerequisite write has not been 
received? The biggest issue I had when moving the proxy from basically working 
to handling multiple heavy use clients was memory consumption while waiting for 
pre-requisite writes. I eventually had to write pending writes to a file. This 
is documented in this issue https://github.com/brockn/hdfs-nfs-proxy/issues/7
{quote}
The pending write requests will fail after timeout. Saving pending writes in 
files can help in some cases but also introduces some problems. First, it 
doesn't eliminate the problem. The prerequisite write may never arrive if 10 
seconds is not long enough. Even the prerequisite write arrives finally, the 
accumulated writes in the file may have timed out. Secondly, it makes the 
server stateful (or have more state information). To support HA later, we have 
to move the state information from one NFS gateway to another in order to 
recover. If the state recovery takes too long to finish, it can cause the 
clients new requests to fail. 
More testing and research work is needed here.

{quote}
Interesting, what version of linux have tried? I believe I was using RHEL 5.X. 
{quote}
CentOS6.3 and Mac 10.7.5

{quote}
Additionally, is there anything the user can do to force the writes? i.e. If 
the user has control over the program, could they do a fsync(fd) to force the 
flush?
{quote} fsync could trigger NFS commit, which will sync and persist the data.

{quote} I could imagine many users believing once they have closed a file via 
NFS that exact file will be available via one the other APIs. We'll need to 
make this limitation blatantly obvious to users otherwise it will likely become 
a support headache.{quote}
If the application expects that, after closing file through one NFS gateway, 
the new data is immediately available to all other NFS gateways, the 
application should do a sync call after close.

This is not a limitation only to this NFS implemtation. POSIX close doesn't 
sync data implicitly.


> Support NFSv3 interface to HDFS
> ---
>
> Key: HDFS-4750
> URL: https://issues.apache.org/jira/browse/HDFS-4750
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 3.0.0
>Reporter: Brandon Li
>Assignee: Brandon Li
> Attachments: HADOOP-NFS-Proposal.pdf
>
>
> Access HDFS is usually done through HDFS Client or webHDFS. Lack of seamless 
> integration with client’s file system makes it difficult for users and 
> impossible for some applications to access HDFS. NFS interface support is one 
> way for HDFS to have such easy integration.
> This JIRA is to track the NFS protocol support for accessing HDFS. With HDFS 
> client, webHDFS and the NFS interface, HDFS will be easier to access and be 
> able support more applications and use cases. 
> We will upload the design document and the initial implementation. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4750) Support NFSv3 interface to HDFS

2013-04-25 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642052#comment-13642052
 ] 

Daryn Sharp commented on HDFS-4750:
---

bq. If the file is appended by another client, the first client's new write 
before the file's  becomes random write and would fail with exception. 
What breaks POSIX semantic here is that random write is not support.

Ok, we're in 100% agreement.  The doc is just ambiguous. 

> Support NFSv3 interface to HDFS
> ---
>
> Key: HDFS-4750
> URL: https://issues.apache.org/jira/browse/HDFS-4750
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 3.0.0
>Reporter: Brandon Li
>Assignee: Brandon Li
> Attachments: HADOOP-NFS-Proposal.pdf
>
>
> Access HDFS is usually done through HDFS Client or webHDFS. Lack of seamless 
> integration with client’s file system makes it difficult for users and 
> impossible for some applications to access HDFS. NFS interface support is one 
> way for HDFS to have such easy integration.
> This JIRA is to track the NFS protocol support for accessing HDFS. With HDFS 
> client, webHDFS and the NFS interface, HDFS will be easier to access and be 
> able support more applications and use cases. 
> We will upload the design document and the initial implementation. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4750) Support NFSv3 interface to HDFS

2013-04-25 Thread Brandon Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642000#comment-13642000
 ] 

Brandon Li commented on HDFS-4750:
--

{quote}This is very semantically wrong. If another client appended to the file 
in the interim, the file position should not implicitly move to the end of the 
file. 
{quote}
When the stream is closed, the file size is updated in HDFS. Before it's 
closed, the same client still holds the lease.

{quote}Assuming the proposed approach is otherwise valid: when the client 
attempts to write again via append, it should throw an exception if the file 
size is greater than the client's current position in the stream. Even that 
breaks POSIX semantics, but it's "less wrong" by not causing the potential for 
garbled data.{quote}
If the file is appended by another client, the first client's new write before 
the file's  becomes random write and would fail with exception. What 
breaks POSIX semantic here is that random write is not support. 



> Support NFSv3 interface to HDFS
> ---
>
> Key: HDFS-4750
> URL: https://issues.apache.org/jira/browse/HDFS-4750
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 3.0.0
>Reporter: Brandon Li
>Assignee: Brandon Li
> Attachments: HADOOP-NFS-Proposal.pdf
>
>
> Access HDFS is usually done through HDFS Client or webHDFS. Lack of seamless 
> integration with client’s file system makes it difficult for users and 
> impossible for some applications to access HDFS. NFS interface support is one 
> way for HDFS to have such easy integration.
> This JIRA is to track the NFS protocol support for accessing HDFS. With HDFS 
> client, webHDFS and the NFS interface, HDFS will be easier to access and be 
> able support more applications and use cases. 
> We will upload the design document and the initial implementation. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4750) Support NFSv3 interface to HDFS

2013-04-25 Thread Brandon Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13641973#comment-13641973
 ] 

Brandon Li commented on HDFS-4750:
--

{quote}What are the plans around RPCSEC and GSSAPI mapping capabilities?{quote}
The initial implementation doesn't have it but I agree it would be nice to 
support it sooner than later.
{quote}...This code might be of use.{quote}
Sounds like a plan :-)

> Support NFSv3 interface to HDFS
> ---
>
> Key: HDFS-4750
> URL: https://issues.apache.org/jira/browse/HDFS-4750
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 3.0.0
>Reporter: Brandon Li
>Assignee: Brandon Li
> Attachments: HADOOP-NFS-Proposal.pdf
>
>
> Access HDFS is usually done through HDFS Client or webHDFS. Lack of seamless 
> integration with client’s file system makes it difficult for users and 
> impossible for some applications to access HDFS. NFS interface support is one 
> way for HDFS to have such easy integration.
> This JIRA is to track the NFS protocol support for accessing HDFS. With HDFS 
> client, webHDFS and the NFS interface, HDFS will be easier to access and be 
> able support more applications and use cases. 
> We will upload the design document and the initial implementation. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4750) Support NFSv3 interface to HDFS

2013-04-25 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13641900#comment-13641900
 ] 

Brock Noland commented on HDFS-4750:


I didn't spend too much time looking at NFSv3 security but FWIW the NFS4 proxy 
supports Kerberos in privacy mode. This code might be of use.

> Support NFSv3 interface to HDFS
> ---
>
> Key: HDFS-4750
> URL: https://issues.apache.org/jira/browse/HDFS-4750
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 3.0.0
>Reporter: Brandon Li
>Assignee: Brandon Li
> Attachments: HADOOP-NFS-Proposal.pdf
>
>
> Access HDFS is usually done through HDFS Client or webHDFS. Lack of seamless 
> integration with client’s file system makes it difficult for users and 
> impossible for some applications to access HDFS. NFS interface support is one 
> way for HDFS to have such easy integration.
> This JIRA is to track the NFS protocol support for accessing HDFS. With HDFS 
> client, webHDFS and the NFS interface, HDFS will be easier to access and be 
> able support more applications and use cases. 
> We will upload the design document and the initial implementation. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4750) Support NFSv3 interface to HDFS

2013-04-25 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13641896#comment-13641896
 ] 

Allen Wittenauer commented on HDFS-4750:


What are the plans around RPCSEC and GSSAPI mapping capabilities? While I 
recognize that these are optional to the NFSv3 specs, a lot of folks need them 
in order to use this feature.  It is probably also worth pointing out that 
NFSv4 and higher fix this mistake and require the security pieces to be there 
in order to be RFC compliant.  So if we want to implement pNFS, we have to do 
this work anyway.

> Support NFSv3 interface to HDFS
> ---
>
> Key: HDFS-4750
> URL: https://issues.apache.org/jira/browse/HDFS-4750
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 3.0.0
>Reporter: Brandon Li
>Assignee: Brandon Li
> Attachments: HADOOP-NFS-Proposal.pdf
>
>
> Access HDFS is usually done through HDFS Client or webHDFS. Lack of seamless 
> integration with client’s file system makes it difficult for users and 
> impossible for some applications to access HDFS. NFS interface support is one 
> way for HDFS to have such easy integration.
> This JIRA is to track the NFS protocol support for accessing HDFS. With HDFS 
> client, webHDFS and the NFS interface, HDFS will be easier to access and be 
> able support more applications and use cases. 
> We will upload the design document and the initial implementation. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4750) Support NFSv3 interface to HDFS

2013-04-25 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13641864#comment-13641864
 ] 

Daryn Sharp commented on HDFS-4750:
---

This part seems a bit worrisome:
bq. The solution is to close the stream after it’s idle(no write) for a certain 
period(e.g., 10 seconds). The subsequent write will become append and open the 
stream again.

This is very semantically wrong.  If another client appended to the file in the 
interim, the file position _should not_ implicitly move to the end of the file. 
 Assuming the proposed approach is otherwise valid: when the client attempts to 
write again via append, it should throw an exception if the file size is 
greater than the client's current position in the stream.  Even that breaks 
POSIX semantics, but it's "less wrong" by not causing the potential for garbled 
data.

> Support NFSv3 interface to HDFS
> ---
>
> Key: HDFS-4750
> URL: https://issues.apache.org/jira/browse/HDFS-4750
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 3.0.0
>Reporter: Brandon Li
>Assignee: Brandon Li
> Attachments: HADOOP-NFS-Proposal.pdf
>
>
> Access HDFS is usually done through HDFS Client or webHDFS. Lack of seamless 
> integration with client’s file system makes it difficult for users and 
> impossible for some applications to access HDFS. NFS interface support is one 
> way for HDFS to have such easy integration.
> This JIRA is to track the NFS protocol support for accessing HDFS. With HDFS 
> client, webHDFS and the NFS interface, HDFS will be easier to access and be 
> able support more applications and use cases. 
> We will upload the design document and the initial implementation. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4750) Support NFSv3 interface to HDFS

2013-04-25 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13641855#comment-13641855
 ] 

Brock Noland commented on HDFS-4750:


Hi Brandon,

Thank you for the quick response!

bq. "10 milliseconds" is the time from the reference paper. In the initial 
implementation, we used 10 seconds just to be on the safe side.

What happens if the 10 seconds expires and the prerequisite write has not been 
received? The biggest issue I had when moving the proxy from basically working 
to handling multiple heavy use clients was memory consumption while waiting for 
pre-requisite writes. I eventually had to write pending writes to a file. This 
is documented in this issue https://github.com/brockn/hdfs-nfs-proxy/issues/7

bq. Regarding small file append, it starts from the correct offset in the tests 
I observed. For example, I tried "echo abcd >> /mnt_test/file_with_5bytes", the 
write request starts with offset 5. With the initial file loading tests with 
Linux/Mac clients, so far we haven't encountered the problem you mentioned.

Interesting, what version of linux have tried? I believe I was using RHEL 5.X.

bq. For the second question, as long as the second user uses NFS gateway to 
read the closed file, the second user should be able to get the data buffered 
in NFS gateway. For the opened files, NFS gateway also saves their latest file 
size. When it serves getattr request, it gets file attributes from HDFS and 
then update the file length based on the cached length.

Cool, my question was more around how we are going to make our users aware of 
this limitation. I could imagine many users believing once they have closed a 
file via NFS that exact file will be available via one the other APIs. We'll 
need to make this limitation blatantly obvious to users otherwise it will 
likely become a support headache. 

Additionally, is there anything the user can do to force the writes? i.e. If 
the user has control over the program, could they do a fsync(fd) to force the 
flush?

Cheers,
Brock




> Support NFSv3 interface to HDFS
> ---
>
> Key: HDFS-4750
> URL: https://issues.apache.org/jira/browse/HDFS-4750
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 3.0.0
>Reporter: Brandon Li
>Assignee: Brandon Li
> Attachments: HADOOP-NFS-Proposal.pdf
>
>
> Access HDFS is usually done through HDFS Client or webHDFS. Lack of seamless 
> integration with client’s file system makes it difficult for users and 
> impossible for some applications to access HDFS. NFS interface support is one 
> way for HDFS to have such easy integration.
> This JIRA is to track the NFS protocol support for accessing HDFS. With HDFS 
> client, webHDFS and the NFS interface, HDFS will be easier to access and be 
> able support more applications and use cases. 
> We will upload the design document and the initial implementation. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4750) Support NFSv3 interface to HDFS

2013-04-24 Thread Brandon Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13641470#comment-13641470
 ] 

Brandon Li commented on HDFS-4750:
--


{quote}This precludes having multiple NFS gateways in operation simultaneously 
for increased throughput, right?{quote}
Not necessarily, it depends on the workloads and the application requirement.

Even for a regular NFS server mounted to multiple clients, it could have the 
same issue. One way to synchronize the clienB-read-after-clienA-write is to use 
NFS lock manager(NLM) protocol(along with Network Status Monitor (NSM) 
protocol). In the first phase, it seems a bit overkill for the user cases we 
want to support.

{quote}
Even in a data loading situation, I'd expect a set of several "gateway nodes" 
to be used in round-robin in order to increase ingest throughput beyond what a 
single host can handle. 
{quote}

Here what I want to mention is, as also in the proposal, one benefit of NFS 
support is to make it easier to integrate HDFS into client's file system 
namespace. The performance of NFS gateway is usually slower than using 
DFSClient directly. 

Loading file through NFS gateway can be faster than DFSClient only in a few 
cases, such as unstable writes with no commit after them immediately. 

With that said, its performance can be improved in the future by a few ways, 
such as better caching, pNFS support and etc.

> Support NFSv3 interface to HDFS
> ---
>
> Key: HDFS-4750
> URL: https://issues.apache.org/jira/browse/HDFS-4750
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 3.0.0
>Reporter: Brandon Li
>Assignee: Brandon Li
> Attachments: HADOOP-NFS-Proposal.pdf
>
>
> Access HDFS is usually done through HDFS Client or webHDFS. Lack of seamless 
> integration with client’s file system makes it difficult for users and 
> impossible for some applications to access HDFS. NFS interface support is one 
> way for HDFS to have such easy integration.
> This JIRA is to track the NFS protocol support for accessing HDFS. With HDFS 
> client, webHDFS and the NFS interface, HDFS will be easier to access and be 
> able support more applications and use cases. 
> We will upload the design document and the initial implementation. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4750) Support NFSv3 interface to HDFS

2013-04-24 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13641394#comment-13641394
 ] 

Todd Lipcon commented on HDFS-4750:
---

bq. For the second question, as long as the second user uses NFS gateway to 
read the closed file, the second user should be able to get the data buffered 
in NFS gateway

This precludes having multiple NFS gateways in operation simultaneously for 
increased throughput, right? I'd imagine the typical deployment scenario would 
be to run an NFS gateway on every machine and then have them mount localhost in 
order to avoid a bottleneck scenario. Even in a data loading situation, I'd 
expect a set of several "gateway nodes" to be used in round-robin in order to 
increase ingest throughput beyond what a single host can handle.

> Support NFSv3 interface to HDFS
> ---
>
> Key: HDFS-4750
> URL: https://issues.apache.org/jira/browse/HDFS-4750
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 3.0.0
>Reporter: Brandon Li
>Assignee: Brandon Li
> Attachments: HADOOP-NFS-Proposal.pdf
>
>
> Access HDFS is usually done through HDFS Client or webHDFS. Lack of seamless 
> integration with client’s file system makes it difficult for users and 
> impossible for some applications to access HDFS. NFS interface support is one 
> way for HDFS to have such easy integration.
> This JIRA is to track the NFS protocol support for accessing HDFS. With HDFS 
> client, webHDFS and the NFS interface, HDFS will be easier to access and be 
> able support more applications and use cases. 
> We will upload the design document and the initial implementation. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4750) Support NFSv3 interface to HDFS

2013-04-24 Thread Brandon Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13641379#comment-13641379
 ] 

Brandon Li commented on HDFS-4750:
--

Hi Brock, I was about to send you the link to this JIRA. Glad you already 
noticed it.
I've resolved HDFS-252 as a dup.  
I studied your NFSv4 implementation and learned a lot from it. Thank you! :-)

"10 milliseconds" is the time from the reference paper. In the initial 
implementation, we used 10 seconds just to be on the safe side.

Regarding small file append, it starts from the correct offset in the tests I 
observed. For example, I tried "echo abcd >> /mnt_test/file_with_5bytes", the 
write request starts with offset 5. With the initial file loading tests with 
Linux/Mac clients, so far we haven't encountered the problem you mentioned. 

For the second question, as long as the second user uses NFS gateway to read 
the closed file, the second user should be able to get the data buffered in NFS 
gateway. For the opened files, NFS gateway also saves their latest file size. 
When it serves getattr request, it gets file attributes from HDFS and then 
update the file length based on the cached length.

Thanks!
Brandon



> Support NFSv3 interface to HDFS
> ---
>
> Key: HDFS-4750
> URL: https://issues.apache.org/jira/browse/HDFS-4750
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 3.0.0
>Reporter: Brandon Li
>Assignee: Brandon Li
> Attachments: HADOOP-NFS-Proposal.pdf
>
>
> Access HDFS is usually done through HDFS Client or webHDFS. Lack of seamless 
> integration with client’s file system makes it difficult for users and 
> impossible for some applications to access HDFS. NFS interface support is one 
> way for HDFS to support such easy integration.
> This JIRA is to track the NFS protocol support for accessing HDFS. With HDFS 
> client, webHDFS and the NFS interface, HDFS will be easier to access and be 
> able support more applications and use cases. 
> We will upload the design document and the initial implementation. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4750) Support NFSv3 interface to HDFS

2013-04-24 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13641309#comment-13641309
 ] 

Brock Noland commented on HDFS-4750:


Hi Brandon,

Great to see this proposal! I am fine with using a new JIRA for this, but if we 
do so, should HDFS-252 be closed as a duplicate? As you know I created an 
Apache Licensed NFS4 proxy for HDFS 
(https://github.com/cloudera/hdfs-nfs-proxy). I have a couple 
questions/comments:

The proposal says that waiting "10 milliseconds" should be able to convert most 
writes to sequential writes. I am curious has this been tested under load on 
modern kernels? The reason I ask is that I found that often the NFS4 proxy has 
to wait much longer than 10 milliseconds to receive the pre-requisite writes. 
It's possible that behavior is NFS4 only.

Before implementing the NFS4 proxy I implemented a NFS3 proxy as you propose. 
Unfortunately I deleted the git repo when I became frustrated with the mismatch 
between NFS3 and HDFS semantics. If I remember correctly, one example was that 
when I had a small file, a small append resulted in a write of the entire file. 
I cannot remember exactly how it behaved with larger files. Have you 
encountered this? If so, how will it be handled?

Another problem I ran into was that since NFS3 doesn't have a close, I was 
never sure when to close the HDFS file handle. I see that you plan to handle 
this by idle closing file handles. I thought about this approach as well but my 
concern was it will often be the case that there is data which has not been 
"synced" to HDFS when the native program has closed the file. Therefore there 
are races with other clients being able to see that data. I am not 100% up the 
latest of when a file length is updated in HDFS, but I believe there is a 
similar issue with the length metadata as well. How will this be handled?

Once again, great work on the proposal!

Cheers,
Brock

> Support NFSv3 interface to HDFS
> ---
>
> Key: HDFS-4750
> URL: https://issues.apache.org/jira/browse/HDFS-4750
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 3.0.0
>Reporter: Brandon Li
>Assignee: Brandon Li
> Attachments: HADOOP-NFS-Proposal.pdf
>
>
> Access HDFS is usually done through HDFS Client or webHDFS. Lack of seamless 
> integration with client’s file system makes it difficult for users and 
> impossible for some applications to access HDFS. NFS interface support is one 
> way for HDFS to support such easy integration.
> This JIRA is to track the NFS protocol support for accessing HDFS. With HDFS 
> client, webHDFS and the NFS interface, HDFS will be easier to access and be 
> able support more applications and use cases. 
> We will upload the design document and the initial implementation. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira