[jira] [Commented] (HDFS-2004) Enable replicating and pinning files to a data node

2011-08-17 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13086824#comment-13086824
 ] 

Arun C Murthy commented on HDFS-2004:
-

Jason, I'm not aware of anyone working on HDFS-2121 - it's just been a 
long-term wish of mine for better MR applications. Would you like to take it 
over and work on it? Thanks.

> Enable replicating and pinning files to a data node
> ---
>
> Key: HDFS-2004
> URL: https://issues.apache.org/jira/browse/HDFS-2004
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: balancer
> Affects Versions: 0.23.0
> Reporter: Jason Rutherglen
>
> Some HDFS applications require that a given file is on the local DataNode.  
> The functionality created here will allow pinning the file to any DataNode.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2004) Enable replicating and pinning files to a data node

2011-07-02 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059109#comment-13059109
 ] 

Jason Rutherglen commented on HDFS-2004:


Arun, It looks somewhat similar.  Is HDFS-2121 being worked on?





[jira] [Commented] (HDFS-2004) Enable replicating and pinning files to a data node

2011-07-01 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13058234#comment-13058234
 ] 

Arun C Murthy commented on HDFS-2004:
-

Todd pointed me to this.

Maybe folks here would be interested in HDFS-2121?





[jira] [Commented] (HDFS-2004) Enable replicating and pinning files to a data node

2011-06-07 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13045513#comment-13045513
 ] 

Jason Rutherglen commented on HDFS-2004:


I think I mentioned that HDFS is probably the wrong place for this, since HDFS 
should, and basically does, allow one to implement this functionality today.  
The end consumer project can easily make the request of the name node, etc.



[jira] [Commented] (HDFS-2004) Enable replicating and pinning files to a data node

2011-06-06 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13044943#comment-13044943
 ] 

Allen Wittenauer commented on HDFS-2004:


bq. What if this theoretical facility only allowed clients to "request" that a 
block move to some DN, with the NN being able to then make the final call?

I'd say I'm not interested and one should probably use a different file system 
that meets the needs.  I'd also point out that HDFS provides an interface where 
one can discover where the block is located and that the scheduling algorithm 
from the client should take that information into consideration.
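
To illustrate the point about a client scheduler taking block locations into 
account, here is a minimal Python sketch. It does not call HDFS: the 
block_hosts mapping is a stand-in for whatever the block-location interface 
returns, and pick_host is a hypothetical helper name, not an existing API.

```python
from collections import Counter


def pick_host(block_hosts, candidates):
    """Return the candidate host holding the most replicas of a file's blocks.

    block_hosts: dict mapping block id -> list of hosts holding a replica
                 (assumed stand-in for block-location data, not an HDFS call).
    candidates:  set of hosts where the client is able to run the work.
    """
    local_blocks = Counter()
    for hosts in block_hosts.values():
        for host in set(hosts):
            if host in candidates:
                local_blocks[host] += 1
    if not local_blocks:
        return None  # no candidate holds a replica; the caller reads remotely
    # Highest local-block count wins; ties are broken by host name.
    return min(local_blocks, key=lambda h: (-local_blocks[h], h))


# Made-up example data for three blocks of one file.
block_hosts = {
    "blk_1": ["dn1", "dn2", "dn3"],
    "blk_2": ["dn2", "dn3", "dn4"],
    "blk_3": ["dn2", "dn4", "dn5"],
}
best = pick_host(block_hosts, candidates={"dn1", "dn2", "dn5"})
```

In this sketch the work goes to the host that already holds the data, which is 
the "move the code to the data" approach rather than moving blocks.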

One of the big selling points for Hadoop is that the code gets moved to the 
data.  This proposed API is the equivalent of saying "No, the data should 
actually get moved."

bq. This isn't about "bypassing HDFS" - it's about making the interface to HDFS 
more capable/performant for a specific type of client.

The cited example definitely is.  Go read the HBASE case.  mmap() is mentioned 
several times.  If that isn't bypassing HDFS, I don't know what is.  The HBASE 
case will basically lead to broken clients if/when the on-disk block format 
changes.  For example, what happens if someone adds on-disk encryption?

I posit that the *only* reason a client would request to move a block is if it 
is doing something it shouldn't be doing.  Yes, I understand the "for long 
running clients this should be a perf gain".  I'd argue that long running 
clients should be doing a memory cache or use a different file system rather 
than hammer HDFS continually for the same blocks.
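
As a rough illustration of the memory-cache alternative for long-running 
clients, the sketch below keeps recently read blocks in a small LRU cache. 
BlockCache and fake_fetch are hypothetical names; fake_fetch stands in for a 
remote read and is not an HDFS call.

```python
from collections import OrderedDict


class BlockCache:
    """Tiny LRU cache keyed by (path, block_index)."""

    def __init__(self, capacity, fetch):
        self.capacity = capacity
        self.fetch = fetch            # callable(path, block_index) -> bytes
        self.entries = OrderedDict()  # least recently used entry first
        self.misses = 0

    def read(self, path, block_index):
        key = (path, block_index)
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            return self.entries[key]
        self.misses += 1
        data = self.fetch(path, block_index)
        self.entries[key] = data
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
        return data


def fake_fetch(path, block_index):
    # Assumed placeholder for a remote block read.
    return f"{path}#{block_index}".encode()


cache = BlockCache(capacity=2, fetch=fake_fetch)
cache.read("/index/seg1", 0)  # miss
cache.read("/index/seg1", 0)  # hit, no remote read
cache.read("/index/seg1", 1)  # miss
cache.read("/index/seg1", 2)  # miss, evicts block 0
```

Repeated reads of the same block are served from memory, so the client does 
not go back to the file system for them.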

bq. I just think that we should allow contributors to post a patch which 
scratches their itch, and then evaluate the implementation, not the idea, of it.

You are certainly entitled to your opinion.  I'm also entitled to mine.  -1 it 
is.




[jira] [Commented] (HDFS-2004) Enable replicating and pinning files to a data node

2011-06-05 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13044700#comment-13044700
 ] 

Aaron T. Myers commented on HDFS-2004:
--

bq. I'm vetoing the very concept of a client being able to dictate to the NN 
how it should replicate the data. 

"Dictate" is a strong word, and isn't necessarily what's being proposed here. 
What if this theoretical facility only allowed clients to "request" that a 
block move to some DN, with the NN being able to then make the final call?

I don't think it's reasonable to veto an idea before there's been any proposed 
design or implementation.

bq. It won't scale up very well without having severe performance consequences 
to the NN.

That's not necessarily true. It depends upon the implementation, which we 
haven't seen yet. As Todd said earlier, "The difficulties in implementation are 
obvious - eg you don't want it to fight against a balancer or other placement 
policies in action on the cluster. But that's a matter to evaluate after the 
work is done, if someone is willing to put forth the work."

bq. It should also be pointed out that the HBASE example is essentially 
bypassing HDFS to talk directly to the underlying file system via mmap(). We 
should not encourage such bad behavior.

This isn't about "bypassing HDFS" - it's about making the interface to HDFS 
more capable/performant for a specific type of client. HDFS already makes an 
effort to ensure that clients that are local to a DN which write a block will 
have one replica of that block placed on that DN, at least initially. I don't 
see how adding an interface to *request* (not *require*) the NN move a block 
replica to a specific DN is meaningfully different than that already-existing 
HDFS facility. The only distinction is whether the request is done implicitly 
at file-write time because the client is collocated with the DN, or explicitly 
at a later time.
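
A sketch of what "request, not require" could look like, under assumptions: 
the checks below (liveness, free space) are invented for illustration, and 
handle_move_request is a hypothetical name. No design has been proposed on 
this issue, so this is not how the NN would necessarily decide.

```python
def handle_move_request(replicas, target, live_nodes, free_bytes, block_size):
    """Decide whether to honor a client's request for a replica on `target`.

    The request is advisory: the function may decline it.
    Returns (accepted, reason).
    """
    if target in replicas:
        return True, "already local"
    if target not in live_nodes:
        return False, "target not live"
    if free_bytes.get(target, 0) < block_size:
        return False, "target lacks space"
    return True, "scheduled"


# Made-up cluster state for the example.
replicas = {"dn1", "dn2", "dn3"}
live = {"dn1", "dn2", "dn3", "dn4", "dn5"}
free = {"dn4": 10, "dn5": 500}

granted = handle_move_request(replicas, "dn5", live, free, block_size=128)
declined = handle_move_request(replicas, "dn4", live, free, block_size=128)
```

The client has to cope with a declined request either way, which is the same 
position it is in today when a write-time local replica is not placed.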

To be clear, I'm not volunteering to do this work, and I'm not blocked because 
of it. I just think that we should allow contributors to post a patch which 
scratches their itch, and then evaluate the implementation, not the idea, of 
it. If, after an implementation is proposed/provided, you still have technical 
objections, then by all means veto away.



[jira] [Commented] (HDFS-2004) Enable replicating and pinning files to a data node

2011-06-05 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13044688#comment-13044688
 ] 

Allen Wittenauer commented on HDFS-2004:


I'm vetoing the very concept of a client being able to dictate to the NN how it 
should replicate the data.  It won't scale up very well without having severe 
performance consequences to the NN.

It should also be pointed out that the HBASE example is essentially bypassing 
HDFS to talk directly to the underlying file system via mmap(). We should not 
encourage such bad behavior.



[jira] [Commented] (HDFS-2004) Enable replicating and pinning files to a data node

2011-06-05 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13044613#comment-13044613
 ] 

Aaron T. Myers commented on HDFS-2004:
--

@Allen, what exactly are you vetoing, and for what reasons? It seems to me that 
a veto here may be a little premature, given that so far there is no code 
posted, or even an exact design settled upon.

In particular, several commenters have already said that "pinning" a block to a 
particular DN is probably not feasible, but that "requesting" the NN move a 
block to a particular DN may be doable, and would be very handy for HBase. Are 
you vetoing the former, the latter, or both?



[jira] [Commented] (HDFS-2004) Enable replicating and pinning files to a data node

2011-05-28 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13040731#comment-13040731
 ] 

Allen Wittenauer commented on HDFS-2004:


At this point, I'm just going to -1 this now rather than keep going down the 
rabbit hole.



[jira] [Commented] (HDFS-2004) Enable replicating and pinning files to a data node

2011-05-28 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13040695#comment-13040695
 ] 

Jason Rutherglen commented on HDFS-2004:


{quote}If it indeed cannot work without a local replica, then I agree with 
Allen/Eli - you should just plan on copying the blocks you need over to a local 
tmp directory and people will take the capacity hit.{quote}

I think we may be a little out of scope for this issue.  The Lucene index isn't 
static, it'll be updated somewhat frequently.  If we have a separate temp 
directory, there is a lot of room for error, in addition to the natural 
redundancy.  As this is experimental/researchy I think we can try the current 
system out, and if it doesn't work, examine other possibilities?



[jira] [Commented] (HDFS-2004) Enable replicating and pinning files to a data node

2011-05-28 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13040678#comment-13040678
 ] 

Todd Lipcon commented on HDFS-2004:
---

Note that this isn't likely to get committed to the 'append' branch unless it's 
committed to trunk first. The idea of append was a stopgap so HBase would have 
something to use until a later release is out. But it shouldn't be seen as a 
testing ground for new ideas.

bq. Unfortunately Lucene cannot work with non-local replicas

Let's separate "cannot work" from "cannot work as fast as it otherwise could".

If it indeed _cannot work_ without a local replica, then I agree with Allen/Eli 
- you should just plan on copying the blocks you need over to a local tmp 
directory and people will take the capacity hit.



[jira] [Commented] (HDFS-2004) Enable replicating and pinning files to a data node

2011-05-28 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13040677#comment-13040677
 ] 

Jason Rutherglen commented on HDFS-2004:


{quote}The idea is for this to be a "LimitedPrivate" interface for use by
specific applications, perhaps governed by an ACL. HBase for example can
guarantee that only one node will be requesting a local replica at a time{quote}

It's important to note this will likely first be used with the 'append'
branch which is basically HBase specific. I'll change the intended target
in Jira. It may be possible to simply factor this functionality into HBase
so that HDFS does not need to change, or simply needs an abstract
extensible API to enable this type of functionality. This probably means
moving more of the Balancer functionality into the abstract
BalancingPolicy class?



[jira] [Commented] (HDFS-2004) Enable replicating and pinning files to a data node

2011-05-28 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13040676#comment-13040676
 ] 

Jason Rutherglen commented on HDFS-2004:


bq. To be clear, "pinning" isn't really what we want here. We want to request 
that the NN make a local replica – similar to what the balancer does.

Right, however pinning would be great and I think it's achievable using a 
modified placement policy and Balancer that skips moving files from a location 
based on a pattern.  This would effectively 'pin' blocks to a DataNode.
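
A minimal sketch of the pattern-based skip, assuming the Balancer has a list 
of (path, block id) move candidates. movable_blocks and the glob patterns are 
hypothetical; this is not the actual Balancer or BalancingPolicy code.

```python
from fnmatch import fnmatchcase


def movable_blocks(blocks, pinned_patterns):
    """Return the blocks a balancer may move, skipping pinned paths.

    blocks:          list of (file path, block id) tuples.
    pinned_patterns: glob patterns; a block whose path matches any of them
                     is left where it is.
    """
    return [
        (path, block_id)
        for path, block_id in blocks
        if not any(fnmatchcase(path, pat) for pat in pinned_patterns)
    ]


# Made-up paths for the example.
blocks = [
    ("/hbase/table1/region-a/hfile1", "blk_1"),
    ("/user/alice/logs/part-0", "blk_2"),
    ("/lucene/index/segments_1", "blk_3"),
]
candidates = movable_blocks(blocks, ["/hbase/*", "/lucene/index/*"])
```

Only the block outside the pinned patterns remains a move candidate, so the 
others effectively stay on their current DataNode.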

bq. All client software should work with non-local replicas, but if it knows 
it's going to need a copy for a while, pulling one over makes some sense.

Unfortunately Lucene cannot work with non-local replicas.  For example a single 
query with 4 clauses would open 4 input streams because each could perform a 
seek.

bq. Yes, we could have HBase use some local storage to cache blocks, but then 
we're faced with a potentially large increase in storage requirements.

I think that's OK, eg, it's more of a configuration/operations problem/option.  
For some HBase clusters, the current behavior, where replicas are sometimes 
non-local, is optimal; for others it may not be.




[jira] [Commented] (HDFS-2004) Enable replicating and pinning files to a data node

2011-05-27 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13040484#comment-13040484
 ] 

Allen Wittenauer commented on HDFS-2004:


bq. The idea is for this to be a "LimitedPrivate" interface for use by specific 
applications

IIRC, the Hadoop interface definitions are based on what we had at Sun.  This 
type of usage would likely have been called Contracted Private because it would 
basically be an agreement between two non-related parties to support a 
non-public interface.  The terms of the Contract would have dictated renewal 
agreements, change agreements, and other things.  IMO, that type of agreement 
really can't exist in open source.  It is either open to be used or it isn't.  

I understand that the ACL thing is so operations people have some control over 
who abuses the system, but I think its effectiveness will be zero, given how 
many shops have the development team acting as the operations team.



[jira] [Commented] (HDFS-2004) Enable replicating and pinning files to a data node

2011-05-27 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13040446#comment-13040446
 ] 

Todd Lipcon commented on HDFS-2004:
---

The idea is for this to be a "LimitedPrivate" interface for use by specific 
applications, perhaps governed by an ACL. HBase for example can guarantee that 
only one node will be requesting a local replica at a time.

The difficulties in implementation are obvious - eg you don't want it to fight 
against a balancer or other placement policies in action on the cluster. But 
that's a matter to evaluate after the work is done, if someone is willing to 
put forth the work.



[jira] [Commented] (HDFS-2004) Enable replicating and pinning files to a data node

2011-05-27 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13040445#comment-13040445
 ] 

Allen Wittenauer commented on HDFS-2004:


(and yes, that's a real problem.  One of the reasons why rep limits were put in 
was because users at Y! would set replication factors of 100 on files so that 
they were always local.)



[jira] [Commented] (HDFS-2004) Enable replicating and pinning files to a data node

2011-05-27 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13040443#comment-13040443
 ] 

Allen Wittenauer commented on HDFS-2004:


The space increase is still likely to happen:  if a file has a rep of 3, but is 
requested by 5 hosts, the block still ends up over-replicated. To work around 
that, the block would need to be auto-rep'd to 5.  At some point, HDFS growth 
is out of control because every idiot is going to want their blocks local.
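
The arithmetic behind that example can be written out as a small sketch. It 
assumes existing replicas can be relocated onto requesting hosts, and the 
128 MB block size is an assumed figure for illustration only; 
replicas_needed and extra_bytes are hypothetical helper names.

```python
def replicas_needed(replication, requesting_hosts):
    """Smallest replica count that gives every requesting host a local copy
    while still meeting the configured replication factor."""
    return max(replication, len(set(requesting_hosts)))


def extra_bytes(block_size, replication, requesting_hosts):
    """Additional storage per block beyond the configured replication."""
    extra_replicas = replicas_needed(replication, requesting_hosts) - replication
    return extra_replicas * block_size


hosts = ["h1", "h2", "h3", "h4", "h5"]
needed = replicas_needed(3, hosts)                 # rep of 3, five requesters
growth = extra_bytes(128 * 1024 * 1024, 3, hosts)  # assumed 128 MB block
```

With a rep of 3 and five requesting hosts, every block needs two replicas 
more than configured.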



[jira] [Commented] (HDFS-2004) Enable replicating and pinning files to a data node

2011-05-27 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13040438#comment-13040438
 ] 

Todd Lipcon commented on HDFS-2004:
---

If the DN makes a copy, the NN will see it as overreplicated, and (if the 
feature is implemented well), delete an excess replica elsewhere. Making a 
local copy would end up with an effective replication of 4 instead of 3.

(Note I'm not saying this would be an easy feature, just that in the abstract, 
it is useful)
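
A sketch of the over-replication cleanup described above, under assumptions: 
trim_excess is a hypothetical helper, and it ignores rack placement and the 
other factors a real NN would weigh when choosing which replica to delete.

```python
def trim_excess(replicas, replication, keep):
    """Drop replicas beyond `replication`, never dropping hosts in `keep`.

    replicas:    list of hosts currently holding the block, oldest first.
    replication: configured replication factor.
    keep:        hosts whose replica was explicitly requested.
    """
    result = list(replicas)
    for host in replicas:
        if len(result) <= replication:
            break
        if host not in keep:
            result.remove(host)
    return result


# dn7 asked for a local replica, so the block briefly has four copies.
after = trim_excess(["dn1", "dn2", "dn3", "dn7"], replication=3, keep={"dn7"})
```

After trimming, the block is back at three replicas and the requested local 
copy on dn7 is among them.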



[jira] [Commented] (HDFS-2004) Enable replicating and pinning files to a data node

2011-05-27 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13040434#comment-13040434
 ] 

Allen Wittenauer commented on HDFS-2004:


What is the difference between the DN making a local copy and the application 
just copying the file itself to temp space?



[jira] [Commented] (HDFS-2004) Enable replicating and pinning files to a data node

2011-05-27 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13040419#comment-13040419
 ] 

Todd Lipcon commented on HDFS-2004:
---

To be clear, "pinning" isn't really what we want here. We want to *request* 
that the NN make a local replica -- similar to what the balancer does.

All client software should *work* with non-local replicas, but if it knows it's 
going to need a copy for a while, pulling one over makes some sense.

Yes, we could have HBase use some local storage to cache blocks, but then we're 
faced with a potentially large increase in storage requirements.



[jira] [Commented] (HDFS-2004) Enable replicating and pinning files to a data node

2011-05-27 Thread Eli Collins (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13040411#comment-13040411
 ] 

Eli Collins commented on HDFS-2004:
---

Seems like the need is more about caching data on a client than pinning a file 
to a particular DN (which is just one way to implement caching). Eg allowing 
clients to configure a readahead buffer might be another option.  
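
To make the readahead idea concrete, here is a minimal client-side sketch. 
ReadaheadReader is a hypothetical class and read_at is a stand-in for a 
remote read; neither is part of the HDFS client.

```python
class ReadaheadReader:
    """Serve small reads from a buffer filled by larger underlying reads."""

    def __init__(self, read_at, readahead):
        self.read_at = read_at      # callable(offset, length) -> bytes
        self.readahead = readahead  # minimum bytes fetched per remote read
        self.buf_start = 0
        self.buf = b""
        self.remote_reads = 0

    def read(self, offset, length):
        end = offset + length
        buffered = (self.buf_start <= offset
                    and end <= self.buf_start + len(self.buf))
        if not buffered:
            # Fetch at least `readahead` bytes so nearby reads hit the buffer.
            self.remote_reads += 1
            self.buf = self.read_at(offset, max(length, self.readahead))
            self.buf_start = offset
        start = offset - self.buf_start
        return self.buf[start:start + length]


data = bytes(range(256)) * 4  # 1024 bytes standing in for a remote file
reader = ReadaheadReader(lambda off, n: data[off:off + n], readahead=64)
first = reader.read(0, 8)     # remote read, fills 64 bytes
second = reader.read(8, 8)    # served from the buffer
third = reader.read(100, 8)   # outside the buffer, second remote read
```

Sequential small reads share one larger fetch; a seek outside the buffer 
still costs a remote read.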



[jira] [Commented] (HDFS-2004) Enable replicating and pinning files to a data node

2011-05-27 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13040407#comment-13040407
 ] 

stack commented on HDFS-2004:
-

@Jason Sounds like a good idea to me.  When the blocks are non-local in HBase, 
latency goes up.  Would be nice to have an API to pull on to hurry moving 
blocks over.



[jira] [Commented] (HDFS-2004) Enable replicating and pinning files to a data node

2011-05-26 Thread Jakob Homan (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13039889#comment-13039889
 ] 

Jakob Homan commented on HDFS-2004:
---

+1 to what Allen has said.  Even marking this limited or expert, or whatever 
(as was mentioned in the discussion list where this was proposed), I imagine 
there will be quite a bit of pushback to adding this change to HDFS without an 
astonishingly compelling use case.



[jira] [Commented] (HDFS-2004) Enable replicating and pinning files to a data node

2011-05-26 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13039884#comment-13039884
 ] 

Allen Wittenauer commented on HDFS-2004:


If an application requires a file have all its blocks local, it sounds 
incredibly flawed or it shouldn't be using HDFS. Having this type of semantic 
is basically completely against the whole design of HDFS.
