Ok, that was a lame mistake. I had spelled "hdfs" instead of "hftp":

$ hadoop distcp hftp://localhost:50070/tmp hftp://localhost:60070/tmp_copy
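(Worth noting here: HftpFileSystem is read-only, so hftp:// can only appear on
the source side of a distcp; the destination has to be a writable hdfs:// URI.
Also, an hdfs:// URI needs the namenode's RPC port, the one in fs.default.name,
not the dfs.http.address port, which is why the earlier attempt against
hdfs://localhost:60070 hung. A sketch of the usual cross-version copy, run from
the CDH3u3 side; the 8020 port here is an assumption, check fs.default.name in
the CDH3 core-site.xml:

$ hadoop distcp hftp://localhost:50070/tmp hdfs://localhost:8020/tmp_copy
)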
$ hadoop distcp hftp://localhost:50070/docs/index.html hftp://localhost:60070/user/hadoop
12/05/07 16:38:09 INFO tools.DistCp: srcPaths=[hftp://localhost:50070/docs/index.html]
12/05/07 16:38:09 INFO tools.DistCp: destPath=hftp://localhost:60070/user/hadoop
With failures, global counters are inaccurate; consider running with -i
Copy failed: java.io.IOException: Not supported
        at org.apache.hadoop.hdfs.HftpFileSystem.delete(HftpFileSystem.java:457)
        at org.apache.hadoop.tools.DistCp.fullyDelete(DistCp.java:963)
        at org.apache.hadoop.tools.DistCp.copy(DistCp.java:672)
        at org.apache.hadoop.tools.DistCp.run(DistCp.java:881)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at org.apache.hadoop.tools.DistCp.main(DistCp.java:908)

Any idea why this error is coming? I am copying one file from 0.20.205
(/docs/index.html) to cdh3u3 (/user/hadoop).

Thanks & Regards,
Austin

On Mon, May 7, 2012 at 3:57 PM, Austin Chungath <austi...@gmail.com> wrote:

> Thanks,
>
> So I decided to try and move using distcp.
>
> $ hadoop distcp hdfs://localhost:54310/tmp hdfs://localhost:8021/tmp_copy
> 12/05/07 14:57:38 INFO tools.DistCp: srcPaths=[hdfs://localhost:54310/tmp]
> 12/05/07 14:57:38 INFO tools.DistCp: destPath=hdfs://localhost:8021/tmp_copy
> With failures, global counters are inaccurate; consider running with -i
> Copy failed: org.apache.hadoop.ipc.RPC$VersionMismatch: Protocol
> org.apache.hadoop.hdfs.protocol.ClientProtocol version mismatch. (client =
> 63, server = 61)
>
> I found that distcp between hdfs:// URIs like the above works only when
> both clusters run the same hadoop version, so I tried:
>
> $ hadoop distcp hftp://localhost:50070/tmp hdfs://localhost:60070/tmp_copy
> 12/05/07 15:02:44 INFO tools.DistCp: srcPaths=[hftp://localhost:50070/tmp]
> 12/05/07 15:02:44 INFO tools.DistCp: destPath=hdfs://localhost:60070/tmp_copy
>
> But this process seems to hang at this stage. What might I be doing
> wrong?
>
> hftp://<dfs.http.address>/<path>
> hftp://localhost:50070 is the dfs.http.address of 0.20.205
> hdfs://localhost:60070 is the dfs.http.address of cdh3u3
>
> Thanks and regards,
> Austin
>
> On Fri, May 4, 2012 at 4:30 AM, Michel Segel <michael_se...@hotmail.com> wrote:
>
>> Ok... So riddle me this...
>> I currently have a replication factor of 3.
>> I reset it to two.
>>
>> What do you have to do to get the replication factor of 3 down to 2?
>> Do I just try to rebalance the nodes?
>>
>> The point is that you are looking at a very small cluster.
>> You may want to start the new cluster with a replication factor of 2 and
>> then, when the data is moved over, increase it to a factor of 3. Or maybe
>> not.
>>
>> I do a distcp to copy the data, and after each distcp I do an fsck as a
>> sanity check and then remove the files I copied. As I gain more room, I
>> can then slowly drop nodes, do an fsck, rebalance, and repeat.
>>
>> Even though this is a dev cluster, the OP wants to retain the data.
>>
>> There are other options depending on the amount and size of new hardware.
>> I mean, make one machine a RAID 5 machine and copy data to it, clearing
>> off the cluster.
>>
>> If 8 TB was the amount of disk used, that would be about 2.67 TB of
>> actual data at a replication factor of 3. Let's say 3 TB. Going RAID 5,
>> how much disk is that? So you could fit it on one machine, depending on
>> hardware, or maybe 2 machines... Now you can rebuild the initial cluster
>> and then move the data back. Then rebuild those machines. Lots of
>> options... ;-)
>>
>> Sent from a remote device. Please excuse any typos...
>>
>> Mike Segel
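(To make the copy-and-verify loop Mike describes concrete, a minimal sketch
with hypothetical paths; the 8020 destination port, as in the note above, is
an assumption:

$ hadoop distcp hftp://localhost:50070/data/batch1 hdfs://localhost:8020/data/batch1
$ hadoop fsck /data/batch1      # on the new cluster: check the copy is healthy
$ hadoop dfs -rmr /data/batch1  # on the old cluster, only after the fsck is clean
)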
>>
>> On May 3, 2012, at 11:26 AM, Suresh Srinivas <sur...@hortonworks.com>
>> wrote:
>>
>> > This is probably a more relevant question for the CDH mailing lists.
>> > That said, what Edward is suggesting seems reasonable. Reduce the
>> > replication factor, decommission some of the nodes, create a new
>> > cluster with those nodes and do a distcp.
>> >
>> > Could you share with us the reasons you want to migrate from Apache 205?
>> >
>> > Regards,
>> > Suresh
>> >
>> > On Thu, May 3, 2012 at 8:25 AM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
>> >
>> >> Honestly that is a hassle; going from 205 to cdh3u3 is probably more
>> >> of a cross-grade than an upgrade or downgrade. I would just stick it
>> >> out. But yes, like Michael said, two clusters on the same gear and
>> >> distcp. If you are using RF=3 you could also lower your replication to
>> >> RF=2 ('hadoop dfs -setrep -R 2 /') to clear headroom as you are moving
>> >> stuff.
>> >>
>> >> On Thu, May 3, 2012 at 7:25 AM, Michel Segel <michael_se...@hotmail.com>
>> >> wrote:
>> >>> Ok... When you get your new hardware...
>> >>>
>> >>> Set up one server as your new NN, JT, SN.
>> >>> Set up the others as DNs.
>> >>> (Cloudera CDH3u3)
>> >>>
>> >>> On your existing cluster...
>> >>> Remove your old log files and temp files on HDFS, anything you don't
>> >>> need. This should give you some more space.
>> >>> Start copying some of the directories/files to the new cluster.
>> >>> As you gain space, decommission a node, rebalance, add the node to
>> >>> the new cluster...
>> >>>
>> >>> It's a slow process.
>> >>>
>> >>> Should I remind you to make sure you up your bandwidth setting, and
>> >>> to clean up the hdfs directories when you repurpose the nodes?
>> >>>
>> >>> Does this make sense?
>> >>>
>> >>> Sent from a remote device. Please excuse any typos...
>> >>>
>> >>> Mike Segel
>> >>>
>> >>> On May 3, 2012, at 5:46 AM, Austin Chungath <austi...@gmail.com> wrote:
>> >>>
>> >>>> Yeah I know :-)
>> >>>> and this is not a production cluster ;-) and yes there is more
>> >>>> hardware coming :-)
>> >>>>
>> >>>> On Thu, May 3, 2012 at 4:10 PM, Michel Segel <michael_se...@hotmail.com>
>> >>>> wrote:
>> >>>>
>> >>>>> Well, you've kind of painted yourself into a corner...
>> >>>>> Not sure why you didn't get a response from the Cloudera lists, but
>> >>>>> it's a generic question...
>> >>>>>
>> >>>>> 8 out of 10 TB. Are you talking effective storage or actual disks?
>> >>>>> And please tell me you've already ordered more hardware... Right?
>> >>>>>
>> >>>>> And please tell me this isn't your production cluster...
>> >>>>>
>> >>>>> (Strong hint to Strata and Cloudera... You really want to accept my
>> >>>>> upcoming proposal talk... ;-)
>> >>>>>
>> >>>>> Sent from a remote device. Please excuse any typos...
>> >>>>>
>> >>>>> Mike Segel
>> >>>>>
>> >>>>> On May 3, 2012, at 5:25 AM, Austin Chungath <austi...@gmail.com>
>> >>>>> wrote:
>> >>>>>
>> >>>>>> Yes. This was first posted on the cloudera mailing list. There
>> >>>>>> were no responses.
>> >>>>>>
>> >>>>>> But this is not related to cloudera as such.
>> >>>>>>
>> >>>>>> cdh3 is based on apache hadoop 0.20. My data is in apache hadoop
>> >>>>>> 0.20.205.
>> >>>>>>
>> >>>>>> There is an upgrade namenode option when we are migrating to a
>> >>>>>> higher version, say from 0.20 to 0.20.205,
>> >>>>>> but here I am downgrading from 0.20.205 to 0.20 (cdh3).
>> >>>>>> Is this possible?
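(A rough sketch of the decommission-and-rebalance step from Mike's plan above,
run against the old cluster. The hostname and excludes-file path are
hypothetical, and dfs.hosts.exclude in the namenode's hdfs-site.xml must
already point at that file:

$ echo "datanode5.example.com" >> /etc/hadoop/conf/excludes
$ hadoop dfsadmin -refreshNodes   # namenode starts re-replicating that node's blocks
# wait until the node shows as Decommissioned in the namenode web UI, then
$ hadoop balancer                 # even out blocks across the remaining nodes
)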
>> >>>>>>
>> >>>>>> On Thu, May 3, 2012 at 3:25 PM, Prashant Kommireddi <prash1...@gmail.com>
>> >>>>>> wrote:
>> >>>>>>
>> >>>>>>> Seems like a matter of upgrade. I am not a Cloudera user so would
>> >>>>>>> not know much, but you might find some help moving this to the
>> >>>>>>> Cloudera mailing list.
>> >>>>>>>
>> >>>>>>> On Thu, May 3, 2012 at 2:51 AM, Austin Chungath <austi...@gmail.com>
>> >>>>>>> wrote:
>> >>>>>>>
>> >>>>>>>> There is only one cluster. I am not copying between clusters.
>> >>>>>>>>
>> >>>>>>>> Say I have a cluster running apache 0.20.205 with 10 TB storage
>> >>>>>>>> capacity and about 8 TB of data.
>> >>>>>>>> Now how can I migrate the same cluster to use cdh3 and use that
>> >>>>>>>> same 8 TB of data?
>> >>>>>>>>
>> >>>>>>>> I can't copy 8 TB of data using distcp because I have only 2 TB
>> >>>>>>>> of free space.
>> >>>>>>>>
>> >>>>>>>> On Thu, May 3, 2012 at 3:12 PM, Nitin Pawar <nitinpawar...@gmail.com>
>> >>>>>>>> wrote:
>> >>>>>>>>
>> >>>>>>>>> you can actually look at the distcp
>> >>>>>>>>>
>> >>>>>>>>> http://hadoop.apache.org/common/docs/r0.20.0/distcp.html
>> >>>>>>>>>
>> >>>>>>>>> but this means that you have two different sets of clusters
>> >>>>>>>>> available to do the migration
>> >>>>>>>>>
>> >>>>>>>>> On Thu, May 3, 2012 at 12:51 PM, Austin Chungath <austi...@gmail.com>
>> >>>>>>>>> wrote:
>> >>>>>>>>>
>> >>>>>>>>>> Thanks for the suggestions,
>> >>>>>>>>>> My concern is that I can't actually copyToLocal from the dfs
>> >>>>>>>>>> because the data is huge.
>> >>>>>>>>>>
>> >>>>>>>>>> Say if my hadoop was 0.20 and I am upgrading to 0.20.205, I
>> >>>>>>>>>> can do a namenode upgrade. I don't have to copy data out of
>> >>>>>>>>>> dfs.
>> >>>>>>>>>>
>> >>>>>>>>>> But here I have Apache hadoop 0.20.205 and I want to use CDH3
>> >>>>>>>>>> now, which is based on 0.20.
>> >>>>>>>>>> Now it is actually a downgrade, as 0.20.205's namenode info
>> >>>>>>>>>> has to be used by 0.20's namenode.
>> >>>>>>>>>>
>> >>>>>>>>>> Any idea how I can achieve what I am trying to do?
>> >>>>>>>>>>
>> >>>>>>>>>> Thanks.
>> >>>>>>>>>>
>> >>>>>>>>>> On Thu, May 3, 2012 at 12:23 PM, Nitin Pawar <nitinpawar...@gmail.com>
>> >>>>>>>>>> wrote:
>> >>>>>>>>>>
>> >>>>>>>>>>> I can think of the following options:
>> >>>>>>>>>>>
>> >>>>>>>>>>> 1) write simple get and put code which gets the data out of
>> >>>>>>>>>>> DFS and loads it into the new DFS
>> >>>>>>>>>>> 2) see if distcp between the two versions is compatible
>> >>>>>>>>>>> 3) this is what I had done (and my data was hardly a few
>> >>>>>>>>>>> hundred GB) .. did a dfs -copyToLocal and then in the new
>> >>>>>>>>>>> grid did a copyFromLocal
>> >>>>>>>>>>>
>> >>>>>>>>>>> On Thu, May 3, 2012 at 11:41 AM, Austin Chungath <austi...@gmail.com>
>> >>>>>>>>>>> wrote:
>> >>>>>>>>>>>
>> >>>>>>>>>>>> Hi,
>> >>>>>>>>>>>> I am migrating from Apache hadoop 0.20.205 to CDH3u3.
>> >>>>>>>>>>>> I don't want to lose the data that is in the HDFS of Apache
>> >>>>>>>>>>>> hadoop 0.20.205.
>> >>>>>>>>>>>> How do I migrate to CDH3u3 but keep the data that I have on
>> >>>>>>>>>>>> 0.20.205?
>> >>>>>>>>>>>> What are the best practices/techniques to do this?
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Thanks & Regards,
>> >>>>>>>>>>>> Austin
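(Nitin's option 3 spelled out, as a minimal sketch with hypothetical paths;
as Austin notes above, it only works when the data fits on local disk, which
rules it out for 8 TB with 2 TB free here:

$ hadoop fs -copyToLocal /user/hadoop/data /staging/data    # on the old cluster
$ hadoop fs -copyFromLocal /staging/data /user/hadoop/data  # on the new cluster
)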
>> >>>>>>>>>>>
>> >>>>>>>>>>> --
>> >>>>>>>>>>> Nitin Pawar
>> >>>>>>>>>
>> >>>>>>>>> --
>> >>>>>>>>> Nitin Pawar