Re: Copying a file to specified nodes
Yes, I've tried the long solution; when I execute ./hadoop dfs -put ... from a datanode, one copy always gets written to that datanode. But I think I would have to use SSH for this. Does anybody know a better way?

Thanks,
Rasit

2009/2/16 Rasit OZDAS :
> ...

--
M. Raşit ÖZDAŞ
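A minimal bash sketch of the SSH-based approach described above, for anyone searching the archives later. The hostnames, paths, and the hadoop location are made-up placeholders; it assumes passwordless SSH to every datanode and that the source files are readable at the same path on each node. With the default placement policy, the first replica of each block should then land on the node that runs the put.

#!/usr/bin/env bash
# Sketch only: distribute one user's files round-robin across datanodes by
# running "hadoop dfs -put" on each target node over SSH, so that the
# default placement policy writes the first replica to that node's local disk.
# All hostnames and paths below are placeholders.

nodes=(node1 node2 node3)            # datanodes to cycle through
hadoop=/opt/hadoop/bin/hadoop        # hadoop launcher on each node (assumed path)

i=0
for f in /shared/userA/*.dat; do     # source files, assumed visible on every node
  node=${nodes[$(( i % ${#nodes[@]} ))]}
  ssh "$node" "$hadoop dfs -put '$f' /user/A/$(basename "$f")"
  i=$(( i + 1 ))
done

Note that the remaining replicas are still placed by HDFS's default policy, so this only pins the first copy.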
Re: Copying a file to specified nodes
Thanks, Jeff. After looking at the JIRA link you've given and doing some investigation:

It seems that this JIRA ticket didn't draw much attention, so it will take a long time to be considered. After some more investigation I found out that when I copy a file to HDFS from a specific DataNode, the first copy is written to that DataNode itself. This solution will take long to implement, I think, but we definitely need this feature, so if we have no other choice, we'll go through it.

Any further info (or comments on my solution) is appreciated.

Cheers,
Rasit

2009/2/10 Jeff Hammerbacher :
> ...

--
M. Raşit ÖZDAŞ
Re: Copying a file to specified nodes
Hey Rasit,

I'm not sure I fully understand your description of the problem, but you might want to check out the JIRA ticket for making the replica placement algorithms in HDFS pluggable (https://issues.apache.org/jira/browse/HADOOP-3799) and add your use case there.

Regards,
Jeff

On Tue, Feb 10, 2009 at 5:05 AM, Rasit OZDAS wrote:
> ...
Copying a file to specified nodes
Hi,

We have thousands of files, each dedicated to a user. (Each user has access to other users' files, but they don't do this very often.) Each user runs map-reduce jobs on the cluster, so we should spread his/her files equally across the cluster, so that every machine can take part in the process (assuming he/she is the only user running jobs). For this we should initially copy files to specified nodes:

User A : first file : Node 1, second file : Node 2, .. etc.
User B : first file : Node 1, second file : Node 2, .. etc.

I know Hadoop also creates replicas, but with our solution at least one copy of each file will be in the right place (or we're willing to control the other replicas too).

Rebalancing is also not a problem, assuming it uses information about how heavily each machine is in use. It even helps for a better organization of files.

How can we copy files to specified nodes? Or do you have a better solution for us?

I couldn't find a solution to this; probably such an option doesn't exist. But I wanted to get an expert's opinion about this.

Thanks in advance.
Rasit
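For reference, one way to check where the replicas of a given file actually ended up is the HDFS fsck tool, which can list each block together with the datanodes holding it. The path below is only an example:

# Show the blocks of a file and the datanodes that hold each replica.
# /user/A/file1.dat is a placeholder path.
hadoop fsck /user/A/file1.dat -files -blocks -locations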