Honestly that is a hassle; going from 205 to cdh3u3 is probably more
of a cross-grade than an upgrade or downgrade. I would just stick it
out. But yes, as Michael said, run two clusters on the same gear and
distcp. If you are using RF=3 you could also lower your replication to
rf=2 ('hadoop dfs -setrep 2') to clear headroom as you are moving
stuff.
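
A minimal sketch of that setrep step (the path is hypothetical; per the
0.20 FsShell docs, -R recurses and -w waits for re-replication to finish):

  hadoop dfs -setrep -w 2 -R /user/bigdir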


On Thu, May 3, 2012 at 7:25 AM, Michel Segel <michael_se...@hotmail.com> wrote:
> Ok... When you get your new hardware...
>
> Set up one server as your new NN, JT, SN.
> Set up the others as DNs.
> (Cloudera CDH3u3)
>
> On your existing cluster...
> Remove your old log files, temp files on HDFS, and anything else you don't need.
> This should give you some more space.
> Start copying some of the directories/files to the new cluster.
> As you gain space, decommission a node, rebalance, and add the node to the new cluster...
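>
> A rough sketch of the decommission step (the excludes path is an
> assumption; dfs.hosts.exclude in hdfs-site.xml must already point at it):
>
>   echo "old-dn-01.example.com" >> /etc/hadoop/conf/excludes
>   hadoop dfsadmin -refreshNodes    # NN starts draining the node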
>
> It's a slow process.
>
> Should I remind you to make sure you up your bandwidth setting, and to
> clean up the HDFS directories when you repurpose the nodes?
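>
> For the bandwidth setting, something like this in hdfs-site.xml (the
> 10 MB/s value is just an example; the 0.20 default is 1 MB/s):
>
>   <property>
>     <name>dfs.balance.bandwidthPerSec</name>
>     <value>10485760</value>
>   </property>
>
> then kick off the rebalance with 'hadoop balancer'.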
>
> Does this make sense?
>
> Sent from a remote device. Please excuse any typos...
>
> Mike Segel
>
> On May 3, 2012, at 5:46 AM, Austin Chungath <austi...@gmail.com> wrote:
>
>> Yeah I know :-)
>> and this is not a production cluster ;-) and yes there is more hardware
>> coming :-)
>>
>> On Thu, May 3, 2012 at 4:10 PM, Michel Segel
>> <michael_se...@hotmail.com> wrote:
>>
>>> Well, you've kind of painted yourself into a corner...
>>> Not sure why you didn't get a response from the Cloudera lists, but it's a
>>> generic question...
>>>
>>> 8 out of 10 TB. Are you talking effective storage or actual disks?
>>> And please tell me you've already ordered more hardware... Right?
>>>
>>> And please tell me this isn't your production cluster...
>>>
>>> (Strong hint to Strata and Cloudera... You really want to accept my
>>> upcoming proposal talk... ;-)
>>>
>>>
>>> Sent from a remote device. Please excuse any typos...
>>>
>>> Mike Segel
>>>
>>> On May 3, 2012, at 5:25 AM, Austin Chungath <austi...@gmail.com> wrote:
>>>
>>>> Yes. This was first posted on the cloudera mailing list. There were no
>>>> responses.
>>>>
>>>> But this is not related to cloudera as such.
>>>>
>>>> cdh3 uses apache hadoop 0.20 as its base. My data is in apache
>>>> hadoop 0.20.205.
>>>>
>>>> There is an upgrade namenode option when we are migrating to a higher
>>>> version, say from 0.20 to 0.20.205, but here I am downgrading from
>>>> 0.20.205 to 0.20 (cdh3). Is this possible?
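>>>>
>>>> For reference, the forward path is a one-time flag on the namenode
>>>> (I'm not aware of an equivalent downgrade flag in 0.20):
>>>>
>>>>   hadoop namenode -upgrade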
>>>>
>>>>
>>>> On Thu, May 3, 2012 at 3:25 PM, Prashant Kommireddi
>>>> <prash1...@gmail.com> wrote:
>>>>
>>>>> Seems like a matter of upgrade. I am not a Cloudera user so would not
>>>>> know much, but you might find some help moving this to the Cloudera
>>>>> mailing list.
>>>>>
>>>>> On Thu, May 3, 2012 at 2:51 AM, Austin Chungath <austi...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> There is only one cluster. I am not copying between clusters.
>>>>>>
>>>>>> Say I have a cluster running apache 0.20.205 with 10 TB storage
>>>>>> capacity and about 8 TB of data.
>>>>>> Now how can I migrate the same cluster to use cdh3 and keep that
>>>>>> same 8 TB of data?
>>>>>>
>>>>>> I can't copy 8 TB of data using distcp because I have only 2 TB of
>>>>>> free space.
>>>>>>
>>>>>>
>>>>>> On Thu, May 3, 2012 at 3:12 PM, Nitin Pawar <nitinpawar...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> you can actually look at distcp
>>>>>>>
>>>>>>> http://hadoop.apache.org/common/docs/r0.20.0/distcp.html
>>>>>>>
>>>>>>> but this means that you have two different clusters available to do
>>>>>>> the migration
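>>>>>>>
>>>>>>> a rough sketch of a cross-version copy, run from the new cluster
>>>>>>> (hostnames and ports here are made up; reading over hftp on the
>>>>>>> source side is the usual trick when the versions differ):
>>>>>>>
>>>>>>>   hadoop distcp hftp://old-nn:50070/user/data hdfs://new-nn:8020/user/data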
>>>>>>>
>>>>>>> On Thu, May 3, 2012 at 12:51 PM, Austin Chungath <austi...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Thanks for the suggestions,
>>>>>>>> My concern is that I can't actually copyToLocal from the dfs
>>>>>>>> because the data is huge.
>>>>>>>>
>>>>>>>> Say if my hadoop was 0.20 and I am upgrading to 0.20.205 I can do a
>>>>>>>> namenode upgrade. I don't have to copy data out of dfs.
>>>>>>>>
>>>>>>>> But here I have Apache hadoop 0.20.205 and I want to use CDH3 now,
>>>>>>>> which is based on 0.20.
>>>>>>>> Now it is actually a downgrade, as 0.20.205's namenode info has to
>>>>>>>> be used by 0.20's namenode.
>>>>>>>>
>>>>>>>> Any idea how I can achieve what I am trying to do?
>>>>>>>>
>>>>>>>> Thanks.
>>>>>>>>
>>>>>>>> On Thu, May 3, 2012 at 12:23 PM, Nitin Pawar
>>>>>>>> <nitinpawar...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> i can think of the following options
>>>>>>>>>
>>>>>>>>> 1) write a simple get and put code which gets the data from the old
>>>>>>>>> DFS and loads it into the new DFS
>>>>>>>>> 2) see if distcp between both versions is compatible
>>>>>>>>> 3) this is what I had done (and my data was hardly a few hundred
>>>>>>>>> GB).. did a dfs -copyToLocal and then in the new grid did a
>>>>>>>>> copyFromLocal (sketch below)
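>>>>>>>>>
>>>>>>>>> e.g. something like this, with made-up paths:
>>>>>>>>>
>>>>>>>>>   hadoop dfs -copyToLocal /user/data /mnt/staging     # on the old grid
>>>>>>>>>   hadoop dfs -copyFromLocal /mnt/staging /user/data   # on the new grid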
>>>>>>>>>
>>>>>>>>> On Thu, May 3, 2012 at 11:41 AM, Austin Chungath
>>>>>>>>> <austi...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>> I am migrating from Apache hadoop 0.20.205 to CDH3u3.
>>>>>>>>>> I don't want to lose the data that is in the HDFS of Apache
>>>>>>>>>> hadoop 0.20.205.
>>>>>>>>>> How do I migrate to CDH3u3 but keep the data that I have on
>>>>>>>>>> 0.20.205?
>>>>>>>>>> What are the best practices/techniques to do this?
>>>>>>>>>>
>>>>>>>>>> Thanks & Regards,
>>>>>>>>>> Austin
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Nitin Pawar
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Nitin Pawar
>>>>>>>
>>>>>>
>>>>>
>>>
