[jira] [Commented] (HDFS-2004) Enable replicating and pinning files to a data node
[ https://issues.apache.org/jira/browse/HDFS-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13086824#comment-13086824 ]

Arun C Murthy commented on HDFS-2004:
-------------------------------------

Jason, I'm not aware of anyone working on HDFS-2121 - it's just been a long-term wish of mine for better MR applications. Would you like to take it over and work on it? Thanks.

> Enable replicating and pinning files to a data node
> ---------------------------------------------------
>
>                 Key: HDFS-2004
>                 URL: https://issues.apache.org/jira/browse/HDFS-2004
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: balancer
>    Affects Versions: 0.23.0
>            Reporter: Jason Rutherglen
>
> Some HDFS applications require that a given file is on the local DataNode.
> The functionality created here will allow pinning the file to any DataNode.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2004) Enable replicating and pinning files to a data node
[ https://issues.apache.org/jira/browse/HDFS-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059109#comment-13059109 ]

Jason Rutherglen commented on HDFS-2004:
----------------------------------------

Arun, it looks somewhat similar. Is HDFS-2121 being worked on?
[jira] [Commented] (HDFS-2004) Enable replicating and pinning files to a data node
[ https://issues.apache.org/jira/browse/HDFS-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13058234#comment-13058234 ]

Arun C Murthy commented on HDFS-2004:
-------------------------------------

Todd pointed me to this. Maybe folks here would be interested in HDFS-2121?
[jira] [Commented] (HDFS-2004) Enable replicating and pinning files to a data node
[ https://issues.apache.org/jira/browse/HDFS-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13045513#comment-13045513 ]

Jason Rutherglen commented on HDFS-2004:
----------------------------------------

I think I mentioned HDFS is probably the wrong place for this, as HDFS should and basically does allow one to implement this functionality today. The end consumer project can easily make the request of the name node etc.
[jira] [Commented] (HDFS-2004) Enable replicating and pinning files to a data node
[ https://issues.apache.org/jira/browse/HDFS-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13044943#comment-13044943 ]

Allen Wittenauer commented on HDFS-2004:
----------------------------------------

bq. What if this theoretical facility only allowed clients to "request" that a block move to some DN, with the NN being able to then make the final call?

I'd say I'm not interested and one should probably use a different file system that meets the needs. I'd also point out that HDFS provides an interface where one can discover where the block is located and that the scheduling algorithm from the client should take that information into consideration. One of the big selling points for Hadoop is that the code gets moved to the data. This proposed API is the equivalent of saying "No, the data should actually get moved."

bq. This isn't about "bypassing HDFS" - it's about making the interface to HDFS more capable/performant for a specific type of client.

The cited example definitely is. Go read the HBASE case. mmap() is mentioned several times. If that isn't bypassing HDFS, I don't know what is. The HBASE case will basically lead to broken clients if/when the on-disk block format changes. For example, what happens if someone adds on-disk encryption?

I posit that the *only* reason a client would request to move a block is if it is doing something it shouldn't be doing. Yes, I understand the "for long running clients this should be a perf gain". I'd argue that long running clients should be doing a memory cache or use a different file system rather than hammer HDFS continually for the same blocks.

bq. I just think that we should allow contributors to post a patch which scratches their itch, and then evaluate the implementation, not the idea, of it.

You are certainly entitled to your opinion. I'm also entitled to mine. -1 it is.
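Editorial sketch of the point Allen makes about scheduling from block locations. The interface he refers to is exposed to clients as `FileSystem.getFileBlockLocations`, which returns `BlockLocation` objects whose `getHosts()` lists the DataNodes holding each block. The code below is only an illustration of how a client-side scheduler might use that information to move the work to the data: `LocalityPicker` and `pickHost` are hypothetical names, and plain lists stand in for the Hadoop calls so the example runs without Hadoop on the classpath.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Hadoop: picks the candidate host that
// already stores the most blocks of a file.
final class LocalityPicker {

    // blockHosts holds, for each block of the file, the hosts with a replica.
    // In a real client this would be filled from BlockLocation.getHosts().
    static String pickHost(List<List<String>> blockHosts, List<String> candidates) {
        Map<String, Integer> localBlocks = new HashMap<>();
        for (List<String> hosts : blockHosts) {
            for (String host : hosts) {
                localBlocks.merge(host, 1, Integer::sum);
            }
        }
        String best = null;
        int bestCount = -1;
        for (String candidate : candidates) {
            int count = localBlocks.getOrDefault(candidate, 0);
            if (count > bestCount) { // ties keep the earlier candidate
                best = candidate;
                bestCount = count;
            }
        }
        return best; // null only when there are no candidates
    }
}
```

Under this model the task is placed on whichever worker already holds most of the file, and no block has to move at all.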
[jira] [Commented] (HDFS-2004) Enable replicating and pinning files to a data node
[ https://issues.apache.org/jira/browse/HDFS-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13044700#comment-13044700 ]

Aaron T. Myers commented on HDFS-2004:
--------------------------------------

bq. I'm vetoing the very concept of a client being able to dictate to the NN how it should replicate the data.

"Dictate" is a strong word, and isn't necessarily what's being proposed here. What if this theoretical facility only allowed clients to "request" that a block move to some DN, with the NN being able to then make the final call? I don't think it's reasonable to veto an idea before there's been any proposed design or implementation.

bq. It won't scale up very well without having severe performance consequences to the NN.

That's not necessarily true. It depends upon the implementation, which we haven't seen yet. As Todd said earlier, "The difficulties in implementation are obvious - eg you don't want it to fight against a balancer or other placement policies in action on the cluster. But that's a matter to evaluate after the work is done, if someone is willing to put forth the work."

bq. It should also be pointed out that the HBASE example is essentially bypassing HDFS to talk directly to the underlying file system via mmap(). We should not encourage such bad behavior.

This isn't about "bypassing HDFS" - it's about making the interface to HDFS more capable/performant for a specific type of client. HDFS already makes an effort to ensure that clients that are local to a DN which write a block will have one replica of that block placed on that DN, at least initially. I don't see how adding an interface to *request* (not *require*) the NN move a block replica to a specific DN is meaningfully different than that already-existing HDFS facility. The only distinction is whether the request is done implicitly at file-write time because the client is collocated with the DN, or explicitly at a later time.
To be clear, I'm not volunteering to do this work, and I'm not blocked because of it. I just think that we should allow contributors to post a patch which scratches their itch, and then evaluate the implementation, not the idea, of it. If, after an implementation is proposed/provided, you still have technical objections, then by all means veto away.
[jira] [Commented] (HDFS-2004) Enable replicating and pinning files to a data node
[ https://issues.apache.org/jira/browse/HDFS-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13044688#comment-13044688 ]

Allen Wittenauer commented on HDFS-2004:
----------------------------------------

I'm vetoing the very concept of a client being able to dictate to the NN how it should replicate the data. It won't scale up very well without having severe performance consequences to the NN.

It should also be pointed out that the HBASE example is essentially bypassing HDFS to talk directly to the underlying file system via mmap(). We should not encourage such bad behavior.
[jira] [Commented] (HDFS-2004) Enable replicating and pinning files to a data node
[ https://issues.apache.org/jira/browse/HDFS-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13044613#comment-13044613 ]

Aaron T. Myers commented on HDFS-2004:
--------------------------------------

@Allen, what exactly are you vetoing, and for what reasons? It seems to me that a veto here may be a little premature, given that so far there is no code posted, or even an exact design settled upon.

In particular, several commenters have already said that "pinning" a block to a particular DN is probably not feasible, but that "requesting" the NN move a block to a particular DN may be doable, and would be very handy for HBase. Are you vetoing the former, the latter, or both?
[jira] [Commented] (HDFS-2004) Enable replicating and pinning files to a data node
[ https://issues.apache.org/jira/browse/HDFS-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13040731#comment-13040731 ]

Allen Wittenauer commented on HDFS-2004:
----------------------------------------

At this point, I'm just going to -1 this now rather than keep going down the rabbit hole.
[jira] [Commented] (HDFS-2004) Enable replicating and pinning files to a data node
[ https://issues.apache.org/jira/browse/HDFS-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13040695#comment-13040695 ]

Jason Rutherglen commented on HDFS-2004:
----------------------------------------

{quote}If it indeed cannot work without a local replica, then I agree with Allen/Eli - you should just plan on copying the blocks you need over to a local tmp directory and people will take the capacity hit.{quote}

I think we may be a little out of scope for this issue. The Lucene index isn't static, it'll be updated somewhat frequently. If we have a separate temp directory, there is a lot of room for error, in addition to the natural redundancy. As this is experimental/researchy I think we can try the current system out, and if it doesn't work, examine other possibilities?
[jira] [Commented] (HDFS-2004) Enable replicating and pinning files to a data node
[ https://issues.apache.org/jira/browse/HDFS-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13040678#comment-13040678 ]

Todd Lipcon commented on HDFS-2004:
-----------------------------------

Note that this isn't likely to get committed to the 'append' branch unless it's committed to trunk first. The idea of append was a stopgap so HBase would have something to use until a later release is out. But it shouldn't be seen as a testing ground for new ideas.

bq. Unfortunately Lucene cannot work with non-local replicas

Let's separate "cannot work" from "cannot work as fast as it otherwise could". If it indeed _cannot work_ without a local replica, then I agree with Allen/Eli - you should just plan on copying the blocks you need over to a local tmp directory and people will take the capacity hit.
[jira] [Commented] (HDFS-2004) Enable replicating and pinning files to a data node
[ https://issues.apache.org/jira/browse/HDFS-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13040677#comment-13040677 ]

Jason Rutherglen commented on HDFS-2004:
----------------------------------------

{quote}The idea is for this to be a "LimitedPrivate" interface for use by specific applications, perhaps governed by an ACL. HBase for example can guarantee that only one node will be requesting a local replica at a time{quote}

It's important to note this will likely first be used with the 'append' branch which is basically HBase specific. I'll change the intended target in Jira. It may be possible to simply factor this functionality into HBase so that HDFS does not need to change, or simply needs an abstract extensible API to enable this type of functionality. This probably means moving more of the Balancer functionality into the abstract BalancingPolicy class?
[jira] [Commented] (HDFS-2004) Enable replicating and pinning files to a data node
[ https://issues.apache.org/jira/browse/HDFS-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13040676#comment-13040676 ]

Jason Rutherglen commented on HDFS-2004:
----------------------------------------

bq. To be clear, "pinning" isn't really what we want here. We want to request that the NN make a local replica -- similar to what the balancer does.

Right, however pinning would be great and I think it's achievable using a modified placement policy and Balancer that skips moving files from a location based on a pattern. This would effectively 'pin' blocks to a DataNode.

bq. All client software should work with non-local replicas, but if it knows it's going to need a copy for a while, pulling one over makes some sense.

Unfortunately Lucene cannot work with non-local replicas. For example a single query with 4 clauses would open 4 input streams because each could perform a seek.

bq. Yes, we could have HBase use some local storage to cache blocks, but then we're faced with a potentially large increase in storage requirements.

I think that's OK, eg, it's more of a configuration/operations problem/option. For some HBase clusters, the current behavior of sometimes non-local replicas is optimal; for others it may not be.
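Editorial sketch of the pattern-based skip Jason describes. Nothing in the thread specifies how a modified Balancer would consult such a pattern, so everything here is an assumption made for illustration: `PinFilter` and `mayMove` are hypothetical names and are not an existing Balancer or BalancingPolicy hook. The idea is only that a mover checks a file path against a configured regular expression before relocating its blocks.

```java
import java.util.regex.Pattern;

// Hypothetical filter, not an existing Hadoop Balancer hook: files whose
// full path matches the configured pattern are treated as pinned.
final class PinFilter {
    private final Pattern pinnedPaths;

    PinFilter(String regex) {
        this.pinnedPaths = Pattern.compile(regex);
    }

    // True when a balancer-like mover would be allowed to relocate
    // blocks belonging to this file.
    boolean mayMove(String filePath) {
        return !pinnedPaths.matcher(filePath).matches();
    }
}
```

With a pattern such as `/index/.*`, blocks under `/index/` would stay where they are while everything else remains eligible for balancing. Whether the NN's placement policy would also honor the pattern is a separate question that the sketch does not address.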
[jira] [Commented] (HDFS-2004) Enable replicating and pinning files to a data node
[ https://issues.apache.org/jira/browse/HDFS-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13040484#comment-13040484 ]

Allen Wittenauer commented on HDFS-2004:
----------------------------------------

bq. The idea is for this to be a "LimitedPrivate" interface for use by specific applications

IIRC, the Hadoop interface definitions are based on what we had at Sun. This type of usage would likely have been called Contracted Private because it would basically be an agreement between two non-related parties to support a non-public interface. The terms of the Contract would have dictated renewal agreements, change agreements, and other things.

IMO, that type of agreement really can't exist in open source. It is either open to be used or it isn't. I understand that the ACL thing is so operations people have some control over who abuses the system, but I think its effectiveness will be zero, given how many shops have the development team acting as the operations team.
[jira] [Commented] (HDFS-2004) Enable replicating and pinning files to a data node
[ https://issues.apache.org/jira/browse/HDFS-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13040446#comment-13040446 ]

Todd Lipcon commented on HDFS-2004:
-----------------------------------

The idea is for this to be a "LimitedPrivate" interface for use by specific applications, perhaps governed by an ACL. HBase for example can guarantee that only one node will be requesting a local replica at a time.

The difficulties in implementation are obvious - eg you don't want it to fight against a balancer or other placement policies in action on the cluster. But that's a matter to evaluate after the work is done, if someone is willing to put forth the work.
[jira] [Commented] (HDFS-2004) Enable replicating and pinning files to a data node
[ https://issues.apache.org/jira/browse/HDFS-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13040445#comment-13040445 ]

Allen Wittenauer commented on HDFS-2004:
----------------------------------------

(And yes, that's a real problem. One of the reasons rep limits were put in is that users at Y! would set replication factors of 100 on files so that they were always local.)
[jira] [Commented] (HDFS-2004) Enable replicating and pinning files to a data node
[ https://issues.apache.org/jira/browse/HDFS-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13040443#comment-13040443 ]

Allen Wittenauer commented on HDFS-2004:
----------------------------------------

The space increase is still likely to happen: if a file has a rep of 3, but is requested by 5 hosts, the block still ends up over-replicated. To work around that, the block would need to be auto-rep'd to 5. At some point, HDFS growth is out of control because every idiot is going to want their blocks local.
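Editorial sketch of the arithmetic behind Allen's objection. It rests on a simplified model that is an assumption, not something stated in the thread: every requesting host keeps its own replica, and the NN can only trim replicas that no requester is holding. `ReplicaMath` and its methods are hypothetical names used for illustration.

```java
// Hypothetical model of the space cost of local-replica requests.
final class ReplicaMath {

    // Replicas that must exist if every requesting host keeps a local copy
    // and the configured replication is still honored as a minimum.
    static int effectiveReplication(int configuredReplication, int requestingHosts) {
        return Math.max(configuredReplication, requestingHosts);
    }

    // Bytes stored beyond what the configured replication alone would use.
    static long extraBytes(long fileBytes, int configuredReplication, int requestingHosts) {
        int extraReplicas =
            effectiveReplication(configuredReplication, requestingHosts) - configuredReplication;
        return fileBytes * extraReplicas;
    }
}
```

For the case in the comment, a replication factor of 3 with 5 requesting hosts gives an effective replication of 5, so each such file costs two additional full copies. With a single requester the count stays at 3, which matches Todd's point that the NN can delete an excess replica elsewhere.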
[jira] [Commented] (HDFS-2004) Enable replicating and pinning files to a data node
[ https://issues.apache.org/jira/browse/HDFS-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13040438#comment-13040438 ]

Todd Lipcon commented on HDFS-2004:
-----------------------------------

If the DN makes a copy, the NN will see it as overreplicated, and (if the feature is implemented well), delete an excess replica elsewhere. Making a local copy would end up with an effective replication of 4 instead of 3.

(Note I'm not saying this would be an easy feature, just that in the abstract, it is useful.)
[jira] [Commented] (HDFS-2004) Enable replicating and pinning files to a data node
[ https://issues.apache.org/jira/browse/HDFS-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13040434#comment-13040434 ]

Allen Wittenauer commented on HDFS-2004:
----------------------------------------

What is the difference between the DN making a local copy and the application just copying the file itself to temp space?
[jira] [Commented] (HDFS-2004) Enable replicating and pinning files to a data node
[ https://issues.apache.org/jira/browse/HDFS-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13040419#comment-13040419 ]

Todd Lipcon commented on HDFS-2004:
-----------------------------------

To be clear, "pinning" isn't really what we want here. We want to *request* that the NN make a local replica -- similar to what the balancer does. All client software should *work* with non-local replicas, but if it knows it's going to need a copy for a while, pulling one over makes some sense.

Yes, we could have HBase use some local storage to cache blocks, but then we're faced with a potentially large increase in storage requirements.
[jira] [Commented] (HDFS-2004) Enable replicating and pinning files to a data node
[ https://issues.apache.org/jira/browse/HDFS-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13040411#comment-13040411 ]

Eli Collins commented on HDFS-2004:
-----------------------------------

Seems like the need is more about caching data on a client than pinning a file to a particular DN (which is just one way to implement caching). Eg allowing clients to configure a readahead buffer might be another option.
[jira] [Commented] (HDFS-2004) Enable replicating and pinning files to a data node
[ https://issues.apache.org/jira/browse/HDFS-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13040407#comment-13040407 ]

stack commented on HDFS-2004:
-----------------------------

@Jason Sounds like a good idea to me. When the blocks are non-local in HBase, latency goes up. Would be nice to have an API to pull on to hurry moving blocks over.
[jira] [Commented] (HDFS-2004) Enable replicating and pinning files to a data node
[ https://issues.apache.org/jira/browse/HDFS-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13039889#comment-13039889 ]

Jakob Homan commented on HDFS-2004:
-----------------------------------

+1 to what Allen has said. Even marking this limited or expert, or whatever (as was mentioned in the discussion list where this was proposed), I imagine there will be quite a bit of pushback to adding this change to HDFS without an astonishingly compelling use case.
[jira] [Commented] (HDFS-2004) Enable replicating and pinning files to a data node
[ https://issues.apache.org/jira/browse/HDFS-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13039884#comment-13039884 ]

Allen Wittenauer commented on HDFS-2004:
----------------------------------------

If an application requires a file have all its blocks local, it sounds incredibly flawed or it shouldn't be using HDFS. Having this type of semantic is basically completely against the whole design of HDFS.