Well, you've kind of painted yourself into a corner...
Not sure why you didn't get a response from the Cloudera lists, but it's a 
generic question...

8 out of 10 TB. Are you talking effective storage or actual disks? 
And please tell me you've already ordered more hardware... Right?

And please tell me this isn't your production cluster...

(Strong hint to Strata and Cloudera... You really want to accept my upcoming 
proposal talk... ;-)


Sent from a remote device. Please excuse any typos...

Mike Segel

On May 3, 2012, at 5:25 AM, Austin Chungath <austi...@gmail.com> wrote:

> Yes. This was first posted on the cloudera mailing list. There were no
> responses.
> 
> But this is not related to cloudera as such.
> 
> CDH3 uses Apache Hadoop 0.20 as its base, but my data is in Apache
> Hadoop 0.20.205.
> 
> There is a namenode upgrade option when migrating to a higher
> version, say from 0.20 to 0.20.205,
> but here I am downgrading from 0.20.205 to 0.20 (CDH3).
> Is this possible?
> 
> 
> On Thu, May 3, 2012 at 3:25 PM, Prashant Kommireddi <prash1...@gmail.com>
> wrote:
> 
>> Seems like a matter of upgrade. I am not a Cloudera user so would not know
>> much, but you might find some help by moving this to the Cloudera mailing list.
>> 
>> On Thu, May 3, 2012 at 2:51 AM, Austin Chungath <austi...@gmail.com>
>> wrote:
>> 
>>> There is only one cluster. I am not copying between clusters.
>>> 
>>> Say I have a cluster running apache 0.20.205 with 10 TB storage capacity
>>> and has about 8 TB of data.
>>> Now how can I migrate the same cluster to use CDH3 while keeping that same
>>> 8 TB of data?
>>> 
>>> I can't copy 8 TB of data using distcp because I have only 2 TB of free
>>> space.
>>> 
>>> 
>>> On Thu, May 3, 2012 at 3:12 PM, Nitin Pawar <nitinpawar...@gmail.com>
>>> wrote:
>>> 
>>>> You can actually look at distcp:
>>>> 
>>>> http://hadoop.apache.org/common/docs/r0.20.0/distcp.html
>>>> 
>>>> but this means that you have two different clusters available to
>>>> do the migration.
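For reference, a typical distcp invocation between two clusters looks like the sketch below. The NameNode hostnames and paths are placeholders invented for illustration, not values from this thread:

```shell
# Hypothetical NameNode addresses and paths -- substitute your own.
SRC="hdfs://old-nn.example.com:8020/user/data"
DST="hdfs://new-nn.example.com:8020/user/data"

# distcp runs as a MapReduce job, so it is usually launched from the
# destination (newer) cluster, pulling from the source cluster.
CMD="hadoop distcp $SRC $DST"
echo "$CMD"
```

As Nitin notes, this only helps when a second cluster exists to copy into, which is exactly what Austin lacks here.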
>>>> 
>>>> On Thu, May 3, 2012 at 12:51 PM, Austin Chungath <austi...@gmail.com>
>>>> wrote:
>>>> 
>>>>> Thanks for the suggestions,
>>>>> My concern is that I can't actually copyToLocal from the DFS because
>>>>> the data is huge.
>>>>> 
>>>>> Say if my Hadoop was 0.20 and I am upgrading to 0.20.205, I can do a
>>>>> namenode upgrade. I don't have to copy data out of the DFS.
>>>>> 
>>>>> But here I am on Apache Hadoop 0.20.205 and I want to use CDH3 now,
>>>>> which is based on 0.20.
>>>>> Now it is actually a downgrade, as 0.20.205's namenode info has to be
>>>>> used by 0.20's namenode.
>>>>> 
>>>>> Any idea how I can achieve what I am trying to do?
>>>>> 
>>>>> Thanks.
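The forward-upgrade path Austin refers to (0.20 to 0.20.205) is normally an in-place layout conversion with no data copy; a rough sketch of the usual steps is below. This is only the upgrade direction, not the downgrade this thread is actually asking about:

```shell
# Sketch of the normal HDFS forward-upgrade sequence on 0.20-era Hadoop.
# The NameNode converts its on-disk layout in place; no data is copied.
UPGRADE_CMD="hadoop namenode -upgrade"                # start the new NameNode once with -upgrade
STATUS_CMD="hadoop dfsadmin -upgradeProgress status"  # poll until the upgrade completes
FINALIZE_CMD="hadoop dfsadmin -finalizeUpgrade"       # finalize only after verifying the data

printf '%s\n%s\n%s\n' "$UPGRADE_CMD" "$STATUS_CMD" "$FINALIZE_CMD"
```

Until the upgrade is finalized, the previous on-disk layout is retained, which is what allows a rollback; once finalized there is no supported path back to the older layout.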
>>>>> 
>>>>> On Thu, May 3, 2012 at 12:23 PM, Nitin Pawar <nitinpawar...@gmail.com>
>>>>> wrote:
>>>>> 
>>>>>> I can think of the following options:
>>>>>> 
>>>>>> 1) Write simple get and put code which gets the data from the old DFS
>>>>>> and loads it into the new DFS.
>>>>>> 2) See if distcp between both versions is compatible.
>>>>>> 3) This is what I had done (and my data was hardly a few hundred GB):
>>>>>> did a dfs -copyToLocal and then in the new grid did a copyFromLocal.
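Option 3 above, spelled out as commands; the local staging path and HDFS path are placeholders, and this approach only works when the data fits on local disk, which is precisely the constraint Austin runs into:

```shell
# Hypothetical paths -- adjust to your environment.
LOCAL_DIR="/mnt/staging/backup"
HDFS_PATH="/user/data"

# On the old (0.20.205) cluster: pull the data down to local disk.
echo "hadoop fs -copyToLocal $HDFS_PATH $LOCAL_DIR"

# After reinstalling / on the new (CDH3) cluster: push it back into HDFS.
echo "hadoop fs -copyFromLocal $LOCAL_DIR $HDFS_PATH"
```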
>>>>>> 
>>>>>> On Thu, May 3, 2012 at 11:41 AM, Austin Chungath <austi...@gmail.com>
>>>>>> wrote:
>>>>>> 
>>>>>>> Hi,
>>>>>>> I am migrating from Apache hadoop 0.20.205 to CDH3u3.
>>>>>>> I don't want to lose the data that is in the HDFS of Apache Hadoop
>>>>>>> 0.20.205.
>>>>>>> How do I migrate to CDH3u3 but keep the data that I have on
>>>>>>> 0.20.205?
>>>>>>> What are the best practices/techniques to do this?
>>>>>>> 
>>>>>>> Thanks & Regards,
>>>>>>> Austin
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> --
>>>>>> Nitin Pawar
>>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> 
>>>> --
>>>> Nitin Pawar
>>>> 
>>> 
>> 
