Re: DFS. How to read from a specific datanode

2008-08-07 Thread Kevin
Yes, I agree that it should be negotiated: the namenode provides an
ordered list and the client chooses from it based on its own
measurements. But I am afraid 0.17.1 does not provide an easy
interface for this.
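
As a rough illustration of the negotiated scheme, the sketch below picks,
from the namenode's ordered list, the node with the lowest latency measured
by the client, and falls back to the namenode's first choice when the client
has no measurements. NegotiatedChoice, the node names, and the latency map
are hypothetical; nothing like this exists in 0.17.1, and how the client
would obtain its measurements is left out.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; this is not part of Hadoop.
class NegotiatedChoice {
    /**
     * Chooses among the nodes offered by the namenode. Nodes the namenode
     * did not offer are never returned, even if the client measured them.
     */
    static String choose(List<String> orderedByNamenode,
                         Map<String, Long> measuredLatencyMs) {
        String best = null;
        long bestLatency = Long.MAX_VALUE;
        for (String node : orderedByNamenode) {
            Long latency = measuredLatencyMs.get(node);
            // Strict comparison: on a tie the namenode's order wins.
            if (latency != null && latency < bestLatency) {
                best = node;
                bestLatency = latency;
            }
        }
        // No usable measurement: fall back to the namenode's first choice.
        if (best == null && !orderedByNamenode.isEmpty()) {
            best = orderedByNamenode.get(0);
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> offered = Arrays.asList("dn1", "dn2", "dn3");
        Map<String, Long> measured = new HashMap<>();
        measured.put("dn2", 12L);
        measured.put("dn3", 4L);
        System.out.println(choose(offered, measured)); // prints dn3
    }
}
```

If the namenode offers only one node, that node is returned whatever the
client measured, which matches "that's the one you get to use".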


-Kevin



On Thu, Aug 7, 2008 at 3:40 AM, Steve Loughran <[EMAIL PROTECTED]> wrote:
> Kevin wrote:
>>
>> Thank you for the suggestion. I looked at DFSClient. It appears that
>> the chooseDataNode method decides which data node to connect to. Currently
>> it chooses the first non-dead data node returned by the namenode, which
>> has sorted the nodes by proximity to the client. However,
>> chooseDataNode is private, so overriding it seems infeasible. Nor are
>> the callers of chooseDataNode public or protected.
>>
>> I need this because I do not want to trust the namenode's ordering. For
>> applications where network congestion is rare, we should let the
>> client decide which data node to load from.
>>
>
> Dangerous. What happens when network congestion arrives and the apps are
> already out there? Maybe it should be negotiated: the namenode provides an
> ordered list and the client can choose some based on its own measurements.
> If the namenode provides only one, that's the one you get to use.
>


Re: DFS. How to read from a specific datanode

2008-08-07 Thread Steve Loughran

Kevin wrote:
> Thank you for the suggestion. I looked at DFSClient. It appears that
> the chooseDataNode method decides which data node to connect to. Currently
> it chooses the first non-dead data node returned by the namenode, which
> has sorted the nodes by proximity to the client. However,
> chooseDataNode is private, so overriding it seems infeasible. Nor are
> the callers of chooseDataNode public or protected.
>
> I need this because I do not want to trust the namenode's ordering. For
> applications where network congestion is rare, we should let the
> client decide which data node to load from.

Dangerous. What happens when network congestion arrives and the apps are
already out there? Maybe it should be negotiated: the namenode provides an
ordered list and the client can choose some based on its own measurements.
If the namenode provides only one, that's the one you get to use.


Re: DFS. How to read from a specific datanode

2008-08-06 Thread Kevin
Thank you for the idea of submitting a request. However, I doubt I can
wait until it is served. In the worst case I will probably hack my
copy of Hadoop and rebuild it.

-Kevin



On Wed, Aug 6, 2008 at 11:31 AM, lohit <[EMAIL PROTECTED]> wrote:
>> I need this because I do not want to trust the namenode's ordering. For
>> applications where network congestion is rare, we should let the
>> client decide which data node to load from.
>
> If this is the case, then providing a method to re-order the datanode list
> shouldn't be hard. Maybe open a JIRA
> (https://issues.apache.org/jira/secure/CreateIssue!default.jspa) as an
> improvement request and continue the discussion there?
>
> -Lohit


Re: DFS. How to read from a specific datanode

2008-08-06 Thread lohit
> I need this because I do not want to trust the namenode's ordering. For
> applications where network congestion is rare, we should let the
> client decide which data node to load from.

If this is the case, then providing a method to re-order the datanode list
shouldn't be hard. Maybe open a JIRA
(https://issues.apache.org/jira/secure/CreateIssue!default.jspa) as an
improvement request and continue the discussion there?
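
To make the idea concrete, here is a minimal sketch of what such a re-order
hook could look like. ReplicaReorderer and ReorderExample are hypothetical
names invented for illustration; no such interface exists in DFSClient, and
plain strings stand in for datanode descriptors.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical hook; not an existing Hadoop interface.
interface ReplicaReorderer {
    /** Returns the replica list in the order the client should try it. */
    List<String> reorder(List<String> orderedByNamenode);
}

class ReorderExample {
    /** Default behaviour: keep the namenode's order unchanged. */
    static ReplicaReorderer keepNamenodeOrder() {
        return ordered -> new ArrayList<>(ordered);
    }

    /** Moves one preferred node to the front if the namenode listed it. */
    static ReplicaReorderer preferring(String preferred) {
        return ordered -> {
            List<String> result = new ArrayList<>(ordered);
            // remove(Object) returns true only if the node was in the list.
            if (result.remove(preferred)) {
                result.add(0, preferred);
            }
            return result;
        };
    }

    public static void main(String[] args) {
        List<String> replicas = Arrays.asList("dn1", "dn2", "dn3");
        System.out.println(preferring("dn3").reorder(replicas)); // prints [dn3, dn1, dn2]
    }
}
```

A node that does not hold a replica is ignored, so the hook can only change
the order of the list, never add to it.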

-Lohit





Re: DFS. How to read from a specific datanode

2008-08-06 Thread Kevin
Yes, the namenode is in charge of deciding proximity, using
DNSToSwitchMapping. On the other hand, I am exploring the possibility
of letting the client decide which data node to connect to, since
sometimes the network hierarchy is so complex or dynamic that it is
better left to the client to find out which datanode is nearest.

-Kevin



On Wed, Aug 6, 2008 at 2:31 AM, Samuel Guo <[EMAIL PROTECTED]> wrote:
> Kevin wrote:
>>
>> Hi,
>>
>> This is about dfs only, leaving mapreduce aside. It may sound like a
>> strange need, but sometimes I want to read a block from a specific
>> data node which holds a replica. Figuring out which datanodes have the
>> block is easy. But is there an easy way to specify which datanode I
>> want to load from?
>>
>> Best,
>> -Kevin
>>
>
> DFSClient will choose a node that contains a replica of the block for you.
> The chosen node will be the nearest node to your client node. This method is
> awesome.
> Please let me know why you want to specify the datanode yourself :)
>


Re: DFS. How to read from a specific datanode

2008-08-06 Thread Kevin
Thank you for the suggestion. I looked at DFSClient. It appears that
the chooseDataNode method decides which data node to connect to. Currently
it chooses the first non-dead data node returned by the namenode, which
has sorted the nodes by proximity to the client. However,
chooseDataNode is private, so overriding it seems infeasible. Nor are
the callers of chooseDataNode public or protected.

I need this because I do not want to trust the namenode's ordering. For
applications where network congestion is rare, we should let the
client decide which data node to load from.
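
A minimal sketch of the selection behaviour described above, assuming the
replica list arrives already sorted by the namenode and the client keeps a
set of nodes it considers dead. FirstLiveReplica and the node names are
hypothetical stand-ins; this is not the actual DFSClient code.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the client-side selection step.
class FirstLiveReplica {
    /**
     * Returns the first node, in the namenode's order, that is not marked
     * dead, or null if every replica holder is marked dead.
     */
    static String choose(List<String> orderedByNamenode, Set<String> deadNodes) {
        for (String node : orderedByNamenode) {
            if (!deadNodes.contains(node)) {
                return node;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> replicas = Arrays.asList("dn1", "dn2", "dn3");
        Set<String> dead = new HashSet<>(Arrays.asList("dn1"));
        System.out.println(choose(replicas, dead)); // prints dn2
    }
}
```

The client has no say in the order here: the only way to end up on a
different node is for the earlier ones to be marked dead.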

-Kevin



On Tue, Aug 5, 2008 at 7:57 PM, lohit <[EMAIL PROTECTED]> wrote:
> I haven't tried it, but see if you can create a DFSClient object and use its
> open() and read() calls to get the job done. Basically you would have to
> force currentNode to be your node of interest in there.
> Just curious, what is the use case for your request?
>
> Thanks,
> Lohit


Re: DFS. How to read from a specific datanode

2008-08-06 Thread Samuel Guo

Kevin wrote:
> Hi,
>
> This is about dfs only, leaving mapreduce aside. It may sound like a
> strange need, but sometimes I want to read a block from a specific
> data node which holds a replica. Figuring out which datanodes have the
> block is easy. But is there an easy way to specify which datanode I
> want to load from?
>
> Best,
> -Kevin

DFSClient will choose a node that contains a replica of the block for
you. The chosen node will be the nearest node to your client node. This
method is awesome.

Please let me know why you want to specify the datanode yourself :)


Re: DFS. How to read from a specific datanode

2008-08-05 Thread lohit
I haven't tried it, but see if you can create a DFSClient object and use its
open() and read() calls to get the job done. Basically you would have to force
currentNode to be your node of interest in there.
Just curious, what is the use case for your request?

Thanks,
Lohit






DFS. How to read from a specific datanode

2008-08-05 Thread Kevin
Hi,

This is about dfs only, leaving mapreduce aside. It may sound like a
strange need, but sometimes I want to read a block from a specific
data node which holds a replica. Figuring out which datanodes have the
block is easy. But is there an easy way to specify which datanode I
want to load from?

Best,
-Kevin