Ok, that was a lame mistake. I had spelled "hdfs" instead of "hftp":

$ hadoop distcp hftp://localhost:50070/tmp hftp://localhost:60070/tmp_copy
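(Worth noting here: HftpFileSystem is read-only, so hftp:// can only appear on
the source side of a distcp; the destination has to be a writable hdfs:// URI.
Also, an hdfs:// URI needs the namenode's RPC port, the one in fs.default.name,
not the dfs.http.address port, which is why the earlier attempt against
hdfs://localhost:60070 hung. A sketch of the usual cross-version copy, run from
the CDH3u3 side; the 8020 port here is an assumption, check fs.default.name in
the CDH3 core-site.xml:

$ hadoop distcp hftp://localhost:50070/tmp hdfs://localhost:8020/tmp_copy
)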
$ hadoop distcp hftp://localhost:50070/docs/index.html hftp://localhost:60070/user/hadoop
12/05/07 16:38:09 INFO tools.DistCp: srcPaths=[hftp://localhost:50070/docs/index.html]
12/05/07 16:38:09 INFO tools.DistCp: destPath=hftp://localhost:60070/user/hadoop
With failures, global counters are inaccurate; consider running with -i
Copy failed: java.io.IOException: Not supported
        at org.apache.hadoop.hdfs.HftpFileSystem.delete(HftpFileSystem.java:457)
        at org.apache.hadoop.tools.DistCp.fullyDelete(DistCp.java:963)
        at org.apache.hadoop.tools.DistCp.copy(DistCp.java:672)
        at org.apache.hadoop.tools.DistCp.run(DistCp.java:881)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at org.apache.hadoop.tools.DistCp.main(DistCp.java:908)

Any idea why this error is coming? I am copying one file from 0.20.205
(/docs/index.html) to cdh3u3 (/user/hadoop).

Thanks & Regards,
Austin

On Mon, May 7, 2012 at 3:57 PM, Austin Chungath <austi...@gmail.com> wrote:

> Thanks,
>
> So I decided to try and move using distcp.
>
> $ hadoop distcp hdfs://localhost:54310/tmp hdfs://localhost:8021/tmp_copy
> 12/05/07 14:57:38 INFO tools.DistCp: srcPaths=[hdfs://localhost:54310/tmp]
> 12/05/07 14:57:38 INFO tools.DistCp: destPath=hdfs://localhost:8021/tmp_copy
> With failures, global counters are inaccurate; consider running with -i
> Copy failed: org.apache.hadoop.ipc.RPC$VersionMismatch: Protocol
> org.apache.hadoop.hdfs.protocol.ClientProtocol version mismatch. (client =
> 63, server = 61)
>
> I found that distcp between hdfs:// URIs like the above works only when
> both clusters run the same hadoop version, so I tried:
>
> $ hadoop distcp hftp://localhost:50070/tmp hdfs://localhost:60070/tmp_copy
> 12/05/07 15:02:44 INFO tools.DistCp: srcPaths=[hftp://localhost:50070/tmp]
> 12/05/07 15:02:44 INFO tools.DistCp: destPath=hdfs://localhost:60070/tmp_copy
>
> But this process seems to hang at this stage. What might I be doing
> wrong?
>
> hftp://<dfs.http.address>/<path>
> hftp://localhost:50070 is the dfs.http.address of 0.20.205
> hdfs://localhost:60070 is the dfs.http.address of cdh3u3
>
> Thanks and regards,
> Austin
>
> On Fri, May 4, 2012 at 4:30 AM, Michel Segel <michael_se...@hotmail.com> wrote:
>
>> Ok... So riddle me this...
>> I currently have a replication factor of 3.
>> I reset it to two.
>>
>> What do you have to do to get the replication factor of 3 down to 2?
>> Do I just try to rebalance the nodes?
>>
>> The point is that you are looking at a very small cluster.
>> You may want to start the new cluster with a replication factor of 2 and
>> then, when the data is moved over, increase it to a factor of 3. Or maybe
>> not.
>>
>> I do a distcp to copy the data, and after each distcp I do an fsck as a
>> sanity check and then remove the files I copied. As I gain more room, I
>> can then slowly drop nodes, do an fsck, rebalance, and repeat.
>>
>> Even though this is a dev cluster, the OP wants to retain the data.
>>
>> There are other options depending on the amount and size of new hardware.
>> I mean, make one machine a RAID 5 machine and copy data to it, clearing
>> off the cluster.
>>
>> If 8 TB was the amount of disk used, that would be about 2.67 TB of
>> actual data at a replication factor of 3. Let's say 3 TB. Going RAID 5,
>> how much disk is that? So you could fit it on one machine, depending on
>> hardware, or maybe 2 machines... Now you can rebuild the initial cluster
>> and then move the data back. Then rebuild those machines. Lots of
>> options... ;-)
>>
>> Sent from a remote device. Please excuse any typos...
>>
>> Mike Segel
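(To make the copy-and-verify loop Mike describes concrete, a minimal sketch
with hypothetical paths; the 8020 destination port, as in the note above, is
an assumption:

$ hadoop distcp hftp://localhost:50070/data/batch1 hdfs://localhost:8020/data/batch1
$ hadoop fsck /data/batch1      # on the new cluster: check the copy is healthy
$ hadoop dfs -rmr /data/batch1  # on the old cluster, only after the fsck is clean
)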
>>
>> On May 3, 2012, at 11:26 AM, Suresh Srinivas <sur...@hortonworks.com>
>> wrote:
>>
>> > This is probably a more relevant question for the CDH mailing lists.
>> > That said, what Edward is suggesting seems reasonable. Reduce the
>> > replication factor, decommission some of the nodes, create a new
>> > cluster with those nodes and do a distcp.
>> >
>> > Could you share with us the reasons you want to migrate from Apache 205?
>> >
>> > Regards,
>> > Suresh
>> >
>> > On Thu, May 3, 2012 at 8:25 AM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
>> >
>> >> Honestly that is a hassle; going from 205 to cdh3u3 is probably more
>> >> of a cross-grade than an upgrade or downgrade. I would just stick it
>> >> out. But yes, like Michael said, two clusters on the same gear and
>> >> distcp. If you are using RF=3 you could also lower your replication to
>> >> RF=2 ('hadoop dfs -setrep -R 2 /') to clear headroom as you are moving
>> >> stuff.
>> >>
>> >> On Thu, May 3, 2012 at 7:25 AM, Michel Segel <michael_se...@hotmail.com>
>> >> wrote:
>> >>> Ok... When you get your new hardware...
>> >>>
>> >>> Set up one server as your new NN, JT, SN.
>> >>> Set up the others as DNs.
>> >>> (Cloudera CDH3u3)
>> >>>
>> >>> On your existing cluster...
>> >>> Remove your old log files and temp files on HDFS, anything you don't
>> >>> need. This should give you some more space.
>> >>> Start copying some of the directories/files to the new cluster.
>> >>> As you gain space, decommission a node, rebalance, add the node to
>> >>> the new cluster...
>> >>>
>> >>> It's a slow process.
>> >>>
>> >>> Should I remind you to make sure you up your bandwidth setting, and
>> >>> to clean up the hdfs directories when you repurpose the nodes?
>> >>>
>> >>> Does this make sense?
>> >>>
>> >>> Sent from a remote device. Please excuse any typos...
>> >>>
>> >>> Mike Segel
>> >>>
>> >>> On May 3, 2012, at 5:46 AM, Austin Chungath <austi...@gmail.com> wrote:
>> >>>
>> >>>> Yeah I know :-)
>> >>>> and this is not a production cluster ;-) and yes there is more
>> >>>> hardware coming :-)
>> >>>>
>> >>>> On Thu, May 3, 2012 at 4:10 PM, Michel Segel <michael_se...@hotmail.com>
>> >>>> wrote:
>> >>>>
>> >>>>> Well, you've kind of painted yourself into a corner...
>> >>>>> Not sure why you didn't get a response from the Cloudera lists, but
>> >>>>> it's a generic question...
>> >>>>>
>> >>>>> 8 out of 10 TB. Are you talking effective storage or actual disks?
>> >>>>> And please tell me you've already ordered more hardware... Right?
>> >>>>>
>> >>>>> And please tell me this isn't your production cluster...
>> >>>>>
>> >>>>> (Strong hint to Strata and Cloudera... You really want to accept my
>> >>>>> upcoming proposal talk... ;-)
>> >>>>>
>> >>>>> Sent from a remote device. Please excuse any typos...
>> >>>>>
>> >>>>> Mike Segel
>> >>>>>
>> >>>>> On May 3, 2012, at 5:25 AM, Austin Chungath <austi...@gmail.com>
>> >>>>> wrote:
>> >>>>>
>> >>>>>> Yes. This was first posted on the cloudera mailing list. There
>> >>>>>> were no responses.
>> >>>>>>
>> >>>>>> But this is not related to cloudera as such.
>> >>>>>>
>> >>>>>> cdh3 is based on apache hadoop 0.20. My data is in apache hadoop
>> >>>>>> 0.20.205.
>> >>>>>>
>> >>>>>> There is an upgrade namenode option when we are migrating to a
>> >>>>>> higher version, say from 0.20 to 0.20.205,
>> >>>>>> but here I am downgrading from 0.20.205 to 0.20 (cdh3).
>> >>>>>> Is this possible?
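(A rough sketch of the decommission-and-rebalance step from Mike's plan above,
run against the old cluster. The hostname and excludes-file path are
hypothetical, and dfs.hosts.exclude in the namenode's hdfs-site.xml must
already point at that file:

$ echo "datanode5.example.com" >> /etc/hadoop/conf/excludes
$ hadoop dfsadmin -refreshNodes   # namenode starts re-replicating that node's blocks
# wait until the node shows as Decommissioned in the namenode web UI, then
$ hadoop balancer                 # even out blocks across the remaining nodes
)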
>> >>>>>>
>> >>>>>> On Thu, May 3, 2012 at 3:25 PM, Prashant Kommireddi <prash1...@gmail.com>
>> >>>>>> wrote:
>> >>>>>>
>> >>>>>>> Seems like a matter of upgrade. I am not a Cloudera user so would
>> >>>>>>> not know much, but you might find some help moving this to the
>> >>>>>>> Cloudera mailing list.
>> >>>>>>>
>> >>>>>>> On Thu, May 3, 2012 at 2:51 AM, Austin Chungath <austi...@gmail.com>
>> >>>>>>> wrote:
>> >>>>>>>
>> >>>>>>>> There is only one cluster. I am not copying between clusters.
>> >>>>>>>>
>> >>>>>>>> Say I have a cluster running apache 0.20.205 with 10 TB storage
>> >>>>>>>> capacity and about 8 TB of data.
>> >>>>>>>> Now how can I migrate the same cluster to use cdh3 and use that
>> >>>>>>>> same 8 TB of data?
>> >>>>>>>>
>> >>>>>>>> I can't copy 8 TB of data using distcp because I have only 2 TB
>> >>>>>>>> of free space.
>> >>>>>>>>
>> >>>>>>>> On Thu, May 3, 2012 at 3:12 PM, Nitin Pawar <nitinpawar...@gmail.com>
>> >>>>>>>> wrote:
>> >>>>>>>>
>> >>>>>>>>> you can actually look at the distcp
>> >>>>>>>>>
>> >>>>>>>>> http://hadoop.apache.org/common/docs/r0.20.0/distcp.html
>> >>>>>>>>>
>> >>>>>>>>> but this means that you have two different sets of clusters
>> >>>>>>>>> available to do the migration
>> >>>>>>>>>
>> >>>>>>>>> On Thu, May 3, 2012 at 12:51 PM, Austin Chungath <austi...@gmail.com>
>> >>>>>>>>> wrote:
>> >>>>>>>>>
>> >>>>>>>>>> Thanks for the suggestions,
>> >>>>>>>>>> My concern is that I can't actually copyToLocal from the dfs
>> >>>>>>>>>> because the data is huge.
>> >>>>>>>>>>
>> >>>>>>>>>> Say if my hadoop was 0.20 and I am upgrading to 0.20.205, I
>> >>>>>>>>>> can do a namenode upgrade. I don't have to copy data out of
>> >>>>>>>>>> dfs.
>> >>>>>>>>>>
>> >>>>>>>>>> But here I have Apache hadoop 0.20.205 and I want to use CDH3
>> >>>>>>>>>> now, which is based on 0.20.
>> >>>>>>>>>> Now it is actually a downgrade, as 0.20.205's namenode info
>> >>>>>>>>>> has to be used by 0.20's namenode.
>> >>>>>>>>>>
>> >>>>>>>>>> Any idea how I can achieve what I am trying to do?
>> >>>>>>>>>>
>> >>>>>>>>>> Thanks.
>> >>>>>>>>>>
>> >>>>>>>>>> On Thu, May 3, 2012 at 12:23 PM, Nitin Pawar <nitinpawar...@gmail.com>
>> >>>>>>>>>> wrote:
>> >>>>>>>>>>
>> >>>>>>>>>>> I can think of the following options:
>> >>>>>>>>>>>
>> >>>>>>>>>>> 1) write simple get and put code which gets the data out of
>> >>>>>>>>>>> DFS and loads it into the new DFS
>> >>>>>>>>>>> 2) see if distcp between the two versions is compatible
>> >>>>>>>>>>> 3) this is what I had done (and my data was hardly a few
>> >>>>>>>>>>> hundred GB) .. did a dfs -copyToLocal and then in the new
>> >>>>>>>>>>> grid did a copyFromLocal
>> >>>>>>>>>>>
>> >>>>>>>>>>> On Thu, May 3, 2012 at 11:41 AM, Austin Chungath <austi...@gmail.com>
>> >>>>>>>>>>> wrote:
>> >>>>>>>>>>>
>> >>>>>>>>>>>> Hi,
>> >>>>>>>>>>>> I am migrating from Apache hadoop 0.20.205 to CDH3u3.
>> >>>>>>>>>>>> I don't want to lose the data that is in the HDFS of Apache
>> >>>>>>>>>>>> hadoop 0.20.205.
>> >>>>>>>>>>>> How do I migrate to CDH3u3 but keep the data that I have on
>> >>>>>>>>>>>> 0.20.205?
>> >>>>>>>>>>>> What are the best practices/techniques to do this?
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Thanks & Regards,
>> >>>>>>>>>>>> Austin
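(Nitin's option 3 spelled out, as a minimal sketch with hypothetical paths;
as Austin notes above, it only works when the data fits on local disk, which
rules it out for 8 TB with 2 TB free here:

$ hadoop fs -copyToLocal /user/hadoop/data /staging/data    # on the old cluster
$ hadoop fs -copyFromLocal /staging/data /user/hadoop/data  # on the new cluster
)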
>> >>>>>>>>>>>
>> >>>>>>>>>>> --
>> >>>>>>>>>>> Nitin Pawar
>> >>>>>>>>>
>> >>>>>>>>> --
>> >>>>>>>>> Nitin Pawar