Re: NN Memory Jumps every 1 1/2 hours

2012-12-27 Thread Suresh Srinivas
You did free up a lot of the old generation by reducing the young generation,
right? The extra 5G of RAM for the old generation should have helped.

Based on my calculation, for the current number of objects you have, you
need roughly 12G of total heap with a young generation size of 1G. This
assumes the average file name size is 32 bytes.

In later releases (>= 0.20.204), several memory and startup optimizations
have been made. They should help you as well.
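As a rough sketch of where that sizing could go (hadoop-env.sh on the namenode
host; the 12G figure is just the estimate above, so adjust it to your own object
counts):

    export HADOOP_NAMENODE_OPTS="-Xms12g -Xmx12g -XX:NewSize=1G -XX:MaxNewSize=1G ${HADOOP_NAMENODE_OPTS}"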



On Thu, Dec 27, 2012 at 1:48 PM, Edward Capriolo edlinuxg...@gmail.com wrote:

 So it turns out the issue was just the size of the filesystem.
 2012-12-27 16:37:22,390 WARN
 org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Checkpoint done.
 New Image Size: 4,354,340,042

 Basically if the NN image size hits ~ 5,000,000,000 you get f'ed. So you
 need about 3x ram as your FSImage size. If you do not have enough you die a
 slow death.

 On Sun, Dec 23, 2012 at 9:40 PM, Suresh Srinivas sur...@hortonworks.com
 wrote:

  Do not have access to my computer. Based on reading the previous email, I
  do not see any thing suspicious on the list of objects in the histo live
  dump.
 
  I would like to hear from you about if it continued to grow. One instance
  of this I had seen in the past was related to weak reference related to
  socket objects.  I do not see that happening here though.
 
  Sent from phone
 
  On Dec 23, 2012, at 10:34 AM, Edward Capriolo edlinuxg...@gmail.com
  wrote:
 
   Tried this..
  
   NameNode is still Ruining my Xmas on its slow death march to OOM.
  
   http://imagebin.org/240453
  
  
   On Sat, Dec 22, 2012 at 10:23 PM, Suresh Srinivas 
  sur...@hortonworks.comwrote:
  
   -XX:NewSize=1G -XX:MaxNewSize=1G
 




-- 
http://hortonworks.com/download/


Re: NN Memory Jumps every 1 1/2 hours

2012-12-27 Thread Suresh Srinivas
I do not follow what you mean here.

 Even when I forced a GC it cleared 0% memory.
Is this with the new younggen setting? Because earlier, based on the
calculation I posted, you need ~11G in the old generation. With 6G as the
default younggen size, you had just barely enough memory to fit the
namespace in oldgen. Hence you might not have seen a Full GC free up
much memory.

Have you tried a Full GC with the 1G younggen size? I suspect
you would see a lot more memory freeing up.
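One way to confirm what a full collection actually reclaims with the smaller
younggen (the pid below is a placeholder for the namenode process; note that
jmap -histo:live itself forces a full GC before printing the histogram):

    jstat -gcutil <nn-pid> 5s              # watch the O column (old gen occupancy %) over time
    jmap -histo:live <nn-pid> | head -30   # forces a full GC, then prints the top object counts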

 One would think that since the entire NameNode image is stored in memory
that the heap would not need to grow beyond that
The namenode image that you see during checkpointing is the size of the file
written after serializing the file system namespace held in memory. It is not what
is directly stored in namenode memory. The namenode stores data structures that
correspond to the file system directory tree and the block locations. Of these,
only the file system directory is serialized and written to the fsimage; block
locations are not.




On Thu, Dec 27, 2012 at 2:22 PM, Edward Capriolo edlinuxg...@gmail.com wrote:

 I am not sure GC had a factor. Even when I forced a GC it cleared 0%
 memory. One would think that since the entire NameNode image is stored in
 memory that the heap would not need to grow beyond that, but that sure does
 not seem to be the case. a 5GB image starts off using 10GB of memory and
 after burn in it seems to use about 15GB memory.

 So really we say the name node data has to fit in memory but what we
 really mean is the name node data must fit in memory 3x

 On Thu, Dec 27, 2012 at 5:08 PM, Suresh Srinivas sur...@hortonworks.com
 wrote:

  You did free up lot of old generation with reducing young generation,
  right? The extra 5G of RAM for the old generation should have helped.
 
  Based on my calculation, for the current number of objects you have, you
  need roughly:
  12G of total heap with young generation size of 1G. This assumes the
  average file name size is 32 bytes.
 
  In later releases (= 0.20.204), several memory optimization and startup
  optimizations have been done. It should help you as well.
 
 
 
  On Thu, Dec 27, 2012 at 1:48 PM, Edward Capriolo edlinuxg...@gmail.com
  wrote:
 
   So it turns out the issue was just the size of the filesystem.
   2012-12-27 16:37:22,390 WARN
   org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Checkpoint
  done.
   New Image Size: 4,354,340,042
  
   Basically if the NN image size hits ~ 5,000,000,000 you get f'ed. So
 you
   need about 3x ram as your FSImage size. If you do not have enough you
  die a
   slow death.
  
   On Sun, Dec 23, 2012 at 9:40 PM, Suresh Srinivas 
 sur...@hortonworks.com
   wrote:
  
Do not have access to my computer. Based on reading the previous
  email, I
do not see any thing suspicious on the list of objects in the histo
  live
dump.
   
I would like to hear from you about if it continued to grow. One
  instance
of this I had seen in the past was related to weak reference related
 to
socket objects.  I do not see that happening here though.
   
Sent from phone
   
On Dec 23, 2012, at 10:34 AM, Edward Capriolo edlinuxg...@gmail.com
 
wrote:
   
 Tried this..

 NameNode is still Ruining my Xmas on its slow death march to OOM.

 http://imagebin.org/240453


 On Sat, Dec 22, 2012 at 10:23 PM, Suresh Srinivas 
sur...@hortonworks.comwrote:

 -XX:NewSize=1G -XX:MaxNewSize=1G
   
  
 
 
 
  --
  http://hortonworks.com/download/
 




-- 
http://hortonworks.com/download/


Re: NN Memory Jumps every 1 1/2 hours

2012-12-27 Thread Suresh Srinivas

 I tried your suggested setting and forced GC from Jconsole and once it
 crept up nothing was freeing up.


That is very surprising. If possible, take a live dump when the namenode starts
up (when memory used is low) and again when namenode memory consumption has gone
up considerably, closer to the heap limit.
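Something along these lines for the two dumps (the pid and file paths are
placeholders; a :live dump also forces a full GC, so take it in a quiet window):

    jmap -dump:live,format=b,file=/tmp/nn-after-start.hprof <nn-pid>
    # ... later, when usage is close to the heap limit ...
    jmap -dump:live,format=b,file=/tmp/nn-near-limit.hprof <nn-pid>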

BTW, are you running with that configuration - with the younggen size set to
the smaller value?



 So just food for thought:

 You said average file name size is 32 bytes. Well most of my data sits in

 /user/hive/warehouse/
 Then I have a tables with partitions.

 Does it make sense to just move this to /u/h/w?


In the directory structure in namenode memory, there is one inode each for
user, hive and warehouse, so renaming would save only a couple of bytes. In the
fsimage in older releases, however, /user/hive/warehouse is repeated for every
file; this has been optimized in later releases. But those optimizations
affect only the fsimage, not the memory consumed on the namenode.
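If you want to see how many namespace objects actually live under that prefix
before deciding whether shortening it is worth the trouble, a quick check would
be something like:

    hadoop fs -count /user/hive/warehouse    # prints DIR_COUNT FILE_COUNT CONTENT_SIZE PATH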


 Will I be saving 400,000,000 bytes of memory if I do?
 On Thu, Dec 27, 2012 at 5:41 PM, Suresh Srinivas sur...@hortonworks.com
 wrote:

  I do not follow what you mean here.
 
   Even when I forced a GC it cleared 0% memory.
  Is this with new younggen setting? Because earlier, based on the
  calculation I posted, you need ~11G in old generation. With 6G as the
  default younggen size, you actually had just enough memory to fit the
  namespace in oldgen. Hence you might not have seen Full GC freeing up
  enough memory.
 
  Have you tried Full GC with 1G youngen size have you tried this? I
 supsect
  you would see lot more memory freeing up.
 
   One would think that since the entire NameNode image is stored in
 memory
  that the heap would not need to grow beyond that
  Namenode image that you see during checkpointing is the size of file
  written after serializing file system namespace in memory. This is not
 what
  is directly stored in namenode memory. Namenode stores data structures
 that
  corresponds to file system directory tree and block locations. Out of
 this
  only file system directory is serialized and written to fsimage. Blocks
  locations are not.
 
 
 
 
  On Thu, Dec 27, 2012 at 2:22 PM, Edward Capriolo edlinuxg...@gmail.com
  wrote:
 
   I am not sure GC had a factor. Even when I forced a GC it cleared 0%
   memory. One would think that since the entire NameNode image is stored
 in
   memory that the heap would not need to grow beyond that, but that sure
  does
   not seem to be the case. a 5GB image starts off using 10GB of memory
 and
   after burn in it seems to use about 15GB memory.
  
   So really we say the name node data has to fit in memory but what we
   really mean is the name node data must fit in memory 3x
  
   On Thu, Dec 27, 2012 at 5:08 PM, Suresh Srinivas 
 sur...@hortonworks.com
   wrote:
  
You did free up lot of old generation with reducing young generation,
right? The extra 5G of RAM for the old generation should have helped.
   
Based on my calculation, for the current number of objects you have,
  you
need roughly:
12G of total heap with young generation size of 1G. This assumes the
average file name size is 32 bytes.
   
In later releases (= 0.20.204), several memory optimization and
  startup
optimizations have been done. It should help you as well.
   
   
   
On Thu, Dec 27, 2012 at 1:48 PM, Edward Capriolo 
  edlinuxg...@gmail.com
wrote:
   
 So it turns out the issue was just the size of the filesystem.
 2012-12-27 16:37:22,390 WARN
 org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode:
 Checkpoint
done.
 New Image Size: 4,354,340,042

 Basically if the NN image size hits ~ 5,000,000,000 you get f'ed.
 So
   you
 need about 3x ram as your FSImage size. If you do not have enough
 you
die a
 slow death.

 On Sun, Dec 23, 2012 at 9:40 PM, Suresh Srinivas 
   sur...@hortonworks.com
 wrote:

  Do not have access to my computer. Based on reading the previous
email, I
  do not see any thing suspicious on the list of objects in the
 histo
live
  dump.
 
  I would like to hear from you about if it continued to grow. One
instance
  of this I had seen in the past was related to weak reference
  related
   to
  socket objects.  I do not see that happening here though.
 
  Sent from phone
 
  On Dec 23, 2012, at 10:34 AM, Edward Capriolo 
  edlinuxg...@gmail.com
   
  wrote:
 
   Tried this..
  
   NameNode is still Ruining my Xmas on its slow death march to
 OOM.
  
   http://imagebin.org/240453
  
  
   On Sat, Dec 22, 2012 at 10:23 PM, Suresh Srinivas 
  sur...@hortonworks.comwrote:
  
   -XX:NewSize=1G -XX:MaxNewSize=1G
 

   
   
   
--
http://hortonworks.com/download/
   
  
 
 
 
  --
  http://hortonworks.com/download/
 




-- 
http://hortonworks.com/download/


Re: NN Memory Jumps every 1 1/2 hours

2012-12-23 Thread Suresh Srinivas
I do not have access to my computer. Based on reading the previous email, I do
not see anything suspicious in the list of objects in the histo live dump.

I would like to hear from you about whether it continued to grow. One instance of
this I had seen in the past was related to weak references to socket
objects. I do not see that happening here though.

Sent from phone

On Dec 23, 2012, at 10:34 AM, Edward Capriolo edlinuxg...@gmail.com wrote:

 Tried this..
 
 NameNode is still Ruining my Xmas on its slow death march to OOM.
 
 http://imagebin.org/240453
 
 
 On Sat, Dec 22, 2012 at 10:23 PM, Suresh Srinivas 
 sur...@hortonworks.comwrote:
 
 -XX:NewSize=1G -XX:MaxNewSize=1G


Re: NN Memory Jumps every 1 1/2 hours

2012-12-22 Thread Suresh Srinivas
This looks to me to be caused by the larger default young generation size in
newer Java releases - see
http://docs.oracle.com/javase/6/docs/technotes/guides/vm/cms-6.html#heap_size.
Looking at your GC logs, I can see around 6G of space being used for the young
generation (though I do not see logs related to minor collections). That
means that for the same number of objects you have a smaller old generation,
and hence old generation collection can no longer perform well.

It is unfortunate that such changes are made in Java and cause
previously working applications to fail. My suggestion is to not depend on
the default young generation size any more. At large JVM sizes, the defaults
chosen by the JDK no longer work well. So I suggest protecting yourself
from such changes by explicitly specifying the young generation size. Given my
experience of tuning GC on Yahoo clusters, for the number of objects you
have and the total heap size you are allocating, I suggest setting the young
generation to 1G.

You can do that by adding

-XX:NewSize=1G -XX:MaxNewSize=1G
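These typically go into HADOOP_NAMENODE_OPTS in hadoop-env.sh; a sketch with GC
logging added so the young generation behaviour is visible (the log path is a
placeholder):

    export HADOOP_NAMENODE_OPTS="-XX:NewSize=1G -XX:MaxNewSize=1G \
      -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
      -Xloggc:/var/log/hadoop/nn-gc.log ${HADOOP_NAMENODE_OPTS}"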

Let me know how it goes.

On Sat, Dec 22, 2012 at 5:59 PM, Edward Capriolo edlinuxg...@gmail.com wrote:

 6333.934: [Full GC 10391746K->9722532K(17194656K), 63.9812940 secs]




-- 
http://hortonworks.com/download/


Re: can HADOOP-6546: BloomMapFile can return false negatives get backported to branch-1?

2012-05-08 Thread Suresh Srinivas
This change is merged into branch-1 and will be available in release 1.1.

On Mon, May 7, 2012 at 6:40 PM, Jim Donofrio donofrio...@gmail.com wrote:

 Can someone backport HADOOP-6546: BloomMapFile can return false negatives
 to branch-1 for the next 1+ release?

 Without this fix BloomMapFile is somewhat useless because having no false
 negatives is a core feature of BloomFilters. I am surprised that both
 hadoop 1.0.2 and cdh3u3 do not have this fix from over 2 years ago.



Re: can HADOOP-6546: BloomMapFile can return false negatives get backported to branch-1?

2012-05-07 Thread Suresh Srinivas
I have marked it for 1.1. I will follow up on promoting the patch.

Regards,
Suresh

On May 7, 2012, at 6:40 PM, Jim Donofrio donofrio...@gmail.com wrote:

 Can someone backport HADOOP-6546: BloomMapFile can return false negatives to 
 branch-1 for the next 1+ release?
 
 Without this fix BloomMapFile is somewhat useless because having no false 
 negatives is a core feature of BloomFilters. I am surprised that both hadoop 
 1.0.2 and cdh3u3 do not have this fix from over 2 years ago.


Re: Best practice to migrate HDFS from 0.20.205 to CDH3u3

2012-05-03 Thread Suresh Srinivas
This is probably a more relevant question for the CDH mailing lists. That said,
what Edward is suggesting seems reasonable: reduce the replication factor,
decommission some of the nodes, create a new cluster with those nodes,
and distcp the data across.
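For the copy itself, the usual pattern when the two clusters are on different
versions is to run distcp on the destination cluster and read the source over
hftp (hostnames, ports and paths below are placeholders):

    hadoop distcp hftp://old-nn:50070/user hdfs://new-nn:8020/user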

Could you share with us the reasons you want to migrate from Apache 205?

Regards,
Suresh

On Thu, May 3, 2012 at 8:25 AM, Edward Capriolo edlinuxg...@gmail.com wrote:

 Honestly that is a hassle, going from 205 to cdh3u3 is probably more
 or a cross-grade then an upgrade or downgrade. I would just stick it
 out. But yes like Michael said two clusters on the same gear and
 distcp. If you are using RF=3 you could also lower your replication to
 rf=2 'hadoop dfs -setrepl 2' to clear headroom as you are moving
 stuff.


 On Thu, May 3, 2012 at 7:25 AM, Michel Segel michael_se...@hotmail.com
 wrote:
  Ok... When you get your new hardware...
 
  Set up one server as your new NN, JT, SN.
  Set up the others as a DN.
  (Cloudera CDH3u3)
 
  On your existing cluster...
  Remove your old log files, temp files on HDFS anything you don't need.
  This should give you some more space.
  Start copying some of the directories/files to the new cluster.
  As you gain space, decommission a node, rebalance, add node to new
 cluster...
 
  It's a slow process.
 
  Should I remind you to make sure you up you bandwidth setting, and to
 clean up the hdfs directories when you repurpose the nodes?
 
  Does this make sense?
 
  Sent from a remote device. Please excuse any typos...
 
  Mike Segel
 
  On May 3, 2012, at 5:46 AM, Austin Chungath austi...@gmail.com wrote:
 
  Yeah I know :-)
  and this is not a production cluster ;-) and yes there is more hardware
  coming :-)
 
  On Thu, May 3, 2012 at 4:10 PM, Michel Segel michael_se...@hotmail.com
 wrote:
 
  Well, you've kind of painted yourself in to a corner...
  Not sure why you didn't get a response from the Cloudera lists, but
 it's a
  generic question...
 
  8 out of 10 TB. Are you talking effective storage or actual disks?
  And please tell me you've already ordered more hardware.. Right?
 
  And please tell me this isn't your production cluster...
 
  (Strong hint to Strata and Cloudea... You really want to accept my
  upcoming proposal talk... ;-)
 
 
  Sent from a remote device. Please excuse any typos...
 
  Mike Segel
 
  On May 3, 2012, at 5:25 AM, Austin Chungath austi...@gmail.com
 wrote:
 
  Yes. This was first posted on the cloudera mailing list. There were no
  responses.
 
  But this is not related to cloudera as such.
 
  cdh3 is based on apache hadoop 0.20 as the base. My data is in apache
  hadoop 0.20.205
 
  There is an upgrade namenode option when we are migrating to a higher
  version say from 0.20 to 0.20.205
  but here I am downgrading from 0.20.205 to 0.20 (cdh3)
  Is this possible?
 
 
  On Thu, May 3, 2012 at 3:25 PM, Prashant Kommireddi 
 prash1...@gmail.com
  wrote:
 
  Seems like a matter of upgrade. I am not a Cloudera user so would not
  know
  much, but you might find some help moving this to Cloudera mailing
 list.
 
  On Thu, May 3, 2012 at 2:51 AM, Austin Chungath austi...@gmail.com
  wrote:
 
  There is only one cluster. I am not copying between clusters.
 
  Say I have a cluster running apache 0.20.205 with 10 TB storage
  capacity
  and has about 8 TB of data.
  Now how can I migrate the same cluster to use cdh3 and use that
 same 8
  TB
  of data.
 
  I can't copy 8 TB of data using distcp because I have only 2 TB of
 free
  space
 
 
  On Thu, May 3, 2012 at 3:12 PM, Nitin Pawar 
 nitinpawar...@gmail.com
  wrote:
 
  you can actually look at the distcp
 
  http://hadoop.apache.org/common/docs/r0.20.0/distcp.html
 
  but this means that you have two different set of clusters
 available
  to
  do
  the migration
 
  On Thu, May 3, 2012 at 12:51 PM, Austin Chungath 
 austi...@gmail.com
  wrote:
 
  Thanks for the suggestions,
  My concerns are that I can't actually copyToLocal from the dfs
  because
  the
  data is huge.
 
  Say if my hadoop was 0.20 and I am upgrading to 0.20.205 I can do
 a
  namenode upgrade. I don't have to copy data out of dfs.
 
  But here I am having Apache hadoop 0.20.205 and I want to use CDH3
  now,
  which is based on 0.20
  Now it is actually a downgrade as 0.20.205's namenode info has to
 be
  used
  by 0.20's namenode.
 
  Any idea how I can achieve what I am trying to do?
 
  Thanks.
 
  On Thu, May 3, 2012 at 12:23 PM, Nitin Pawar 
  nitinpawar...@gmail.com
  wrote:
 
  i can think of following options
 
  1) write a simple get and put code which gets the data from DFS
 and
  loads
  it in dfs
  2) see if the distcp  between both versions are compatible
  3) this is what I had done (and my data was hardly few hundred
 GB)
  ..
  did a
  dfs -copyToLocal and then in the new grid did a copyFromLocal
 
  On Thu, May 3, 2012 at 11:41 AM, Austin Chungath 
  austi...@gmail.com
 
  wrote:
 
  Hi,
  I am migrating from Apache hadoop 0.20.205 to CDH3u3.
  I don't want to lose the data 

Re: hadoop permission guideline

2012-03-22 Thread Suresh Srinivas
Can you please take this discussion to the CDH mailing list?

On Mar 22, 2012, at 7:51 AM, Michael Wang michael.w...@meredith.com wrote:

 I have installed Cloudera hadoop (CDH). I used its Cloudera Manager to 
 install all needed packages. When it was installed, the root is used.  I 
 found the installation created some users, such as hdfs, hive, 
 mapred,hue,hbase...
 After the installation, should we change some permission or ownership of some 
 directories/files? For example, to use HIVE. It works fine with root user, 
 since the metatore directory belongs to root. But in order to let other user 
 use HIVE, I have to change metastore ownership to a specific non-root user, 
 then it works. Is it the best practice?
 Another example is the start-all.sh, stop-all.sh they all belong to root. 
 Should I change them to other user? I guess there are more cases...
 
 Thanks,
 
 
 
 This electronic message, including any attachments, may contain proprietary, 
 confidential or privileged information for the sole use of the intended 
 recipient(s). You are hereby notified that any unauthorized disclosure, 
 copying, distribution, or use of this message is prohibited. If you have 
 received this message in error, please immediately notify the sender by reply 
 e-mail and delete it.


Re: Issue when starting services on CDH3

2012-03-15 Thread Suresh Srinivas
Guys, can you please take this up on the CDH-related mailing lists?

On Thu, Mar 15, 2012 at 10:01 AM, Manu S manupk...@gmail.com wrote:

 Because for large clusters we have to run namenode in a single node,
 datanode in another nodes
 So we can start namenode and jobtracker in master node and datanode n
 tasktracker in slave nodes

 For getting more clarity You can check the service status after starting

 Verify these:
 dfs.name.dir hdfs:hadoop drwx--
 dfs.data.dir hdfs:hadoop drwx--

 mapred.local.dir mapred:hadoop drwxr-xr-x

 Please follow each steps in this link
 https://ccp.cloudera.com/display/CDHDOC/CDH3+Deployment+on+a+Cluster
  On Mar 15, 2012 9:52 PM, Manish Bhoge manishbh...@rocketmail.com
 wrote:

  Ys, I understand the order and I formatted namenode before starting
  services. As I suspect there may be ownership and an access issue. Not
 able
  to nail down issue exactly. I also have question why there are 2 routes
 to
  start services. When we have start-all.sh script then why need to go to
  init.d to start services??
 
 
  Thank you,
  Manish
  Sent from my BlackBerry, pls excuse typo
 
  -Original Message-
  From: Manu S manupk...@gmail.com
  Date: Thu, 15 Mar 2012 21:43:26
  To: common-user@hadoop.apache.org; manishbh...@rocketmail.com
  Reply-To: common-user@hadoop.apache.org
  Subject: Re: Issue when starting services on CDH3
 
  Did you check the service status?
  Is it like dead, but pid exist?
 
  Did you check the ownership and permissions for the
  dfs.name.dir,dfs.data.dir,mapped.local.dir etc ?
 
  The order for starting daemons are like this:
  1 namenode
  2 datanode
  3 jobtracker
  4 tasktracker
 
  Did you format the namenode before starting?
  On Mar 15, 2012 9:31 PM, Manu S manupk...@gmail.com wrote:
 
   Dear manish
   Which daemons are not starting?
  
   On Mar 15, 2012 9:21 PM, Manish Bhoge manishbh...@rocketmail.com
   wrote:
   
I have CDH3 installed in standalone mode. I have install all hadoop
   components. Now when I start services (namenode,secondary namenode,job
   tracker,task tracker) I can start gracefully from /usr/lib/hadoop/
   ./bin/start-all.sh. But when start the same servises from
   /etc/init.d/hadoop-0.20-* then I unable to start. Why? Now I want to
  start
   Hue also which is in init.d that also I couldn't start. Here I suspect
   authentication issue. Because all the services in init.d are under root
   user and root group. Please suggest I am stuck here. I tried hive and
 it
   seems it running fine.
Thanks
Manish.
Sent from my BlackBerry, pls excuse typo
   
  
 
 



Re: What is the NEW api?

2012-03-11 Thread Suresh Srinivas
 there are many people talking about the NEW API
This might be related to releases 0.21 or later, where append and related
functionality was re-implemented.

1.0 comes from 0.20.205 and has the same API as 0.20-append.

Sent from phone

On Mar 11, 2012, at 6:27 PM, WangRamon ramon_w...@hotmail.com wrote:

 
 
 
 
 Hi all, I've been with Hadoop-0.20-append for some time and I plan to upgrade
 to the 1.0.0 release, but I find there are many people talking about the NEW API,
 so I'm lost. Can anyone please tell me what the new API is? Is the OLD one
 available in the 1.0.0 release? Thanks. Cheers, Ramon


Re: Backupnode in 1.0.0?

2012-02-23 Thread Suresh Srinivas
On Thu, Feb 23, 2012 at 12:41 AM, Jeremy Hansen jer...@skidrow.la wrote:

 Thanks.  Could you clarify what BackupNode does?

 -jeremy


The Namenode currently keeps the entire file system namespace in memory. It
logs the write operations (create, delete file, etc.) into a journal file
called the editlog. This journal needs to be merged with the file system image
periodically to keep the journal file from growing too large. This is called
checkpointing. Checkpointing also reduces startup time, since the namenode
need not load a large editlog file.

Prior to release 0.21, another node called the SecondaryNamenode was used for
checkpointing. It periodically fetches the file system image and edits, loads
them into memory and writes a checkpoint image. This image is then shipped back
to the Namenode.
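For reference, how often the SecondaryNamenode checkpoints in the 0.20/1.0 line
is controlled by properties along these lines (values shown are the usual
defaults, the directory is a placeholder; verify names against your release):

    <property><name>fs.checkpoint.period</name><value>3600</value></property>
    <property><name>fs.checkpoint.size</name><value>67108864</value></property>
    <property><name>fs.checkpoint.dir</name><value>/data/dfs/namesecondary</value></property>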

In 0.21, the BackupNode was introduced. Unlike the SecondaryNamenode, it gets edits
streamed from the Namenode. It periodically writes the checkpoint image and
ships it back to the Namenode. The goal is for this to become a Standby node,
towards Namenode HA. Konstantin and a few others are pursuing this.
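For anyone who wants to try it on 0.21 or later, the setup is roughly along
these lines (property names, ports and hostnames are from memory of that
release, so double-check them against your hdfs-default.xml):

    <!-- hdfs-site.xml on the backup node -->
    <property><name>dfs.namenode.backup.address</name><value>backup-host:50100</value></property>
    <property><name>dfs.namenode.backup.http-address</name><value>backup-host:50105</value></property>

    # then start it with
    bin/hdfs namenode -backup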

I have not seen any deployments of BackupNode in production. I would love
to hear if any one has deployed it in production and how stable it is.

Regards,
Suresh


Re: Backupnode in 1.0.0?

2012-02-22 Thread Suresh Srinivas
Joey,

Can you please answer the question in the context of Apache releases? I am not
sure CDH4b1 needs to be mentioned in the context of this mailing list.

Regards,
Suresh

On Wed, Feb 22, 2012 at 5:24 PM, Joey Echeverria j...@cloudera.com wrote:

 Check out this branch for the 0.22 version of Bigtop:

 https://svn.apache.org/repos/asf/incubator/bigtop/branches/hadoop-0.22/

 However, I don't think BackupNode is what you want. It sounds like you
 want HA which is coming in (hopefully) 0.23.2 and is also available
 today in CDH4b1.

 -Joey

 On Wed, Feb 22, 2012 at 7:09 PM, Jeremy Hansen jer...@skidrow.la wrote:
  By the way, I don't see anything 0.22 based in the bigtop repos.
 
  Thanks
  -jeremy
 
  On Feb 22, 2012, at 3:58 PM, Jeremy Hansen wrote:
 
  I guess I thought that backupnode would provide some level of namenode
 redundancy.  Perhaps I don't fully understand.
 
  I'll check out Bigtop.  I looked at it a while ago and forgot about it.
 
  Thanks
  -jeremy
 
  On Feb 22, 2012, at 2:43 PM, Joey Echeverria wrote:
 
  Check out the Apache Bigtop project. I believe they have 0.22 RPMs.
 
  Out of curiosity, why are you interested in BackupNode?
 
  -Joey
 
  Sent from my iPhone
 
  On Feb 22, 2012, at 14:56, Jeremy Hansen jer...@skidrow.la wrote:
 
  Any possibility of getting spec files to create packages for 0.22?
 
  Thanks
  -jeremy
 
  On Feb 22, 2012, at 11:50 AM, Suresh Srinivas wrote:
 
   BackupNode is major functionality, requiring changes in RPC protocols,
   configuration, etc. Hence it will not be available in the bug fix release
   1.0.1.

   It is also unlikely to be available in minor releases in the 1.x release
   streams.
 
  Regards,
  Suresh
 
  On Wed, Feb 22, 2012 at 11:40 AM, Jeremy Hansen jer...@skidrow.la
 wrote:
 
 
  It looks as if backupnode isn't supported in 1.0.0?  Any chances
 it's in
  1.0.1?
 
  Thanks
  -jeremy
 
 
 



 --
 Joseph Echeverria
 Cloudera, Inc.
 443.305.9434



Re: datanode failing to start

2012-01-09 Thread Suresh Srinivas
Can you please send your notes on what info is out of date, or better still,
create a jira so that it can be addressed?

On Fri, Jan 6, 2012 at 3:11 PM, Dave Kelsey da...@gamehouse.com wrote:

 gave up and installed version 1.
 it installed correctly and worked, thought the instructions for setup and
 the location of scripts and configs are now out of date.

 D

 On 1/5/2012 10:25 AM, Dave Kelsey wrote:


 java version 1.6.0_29
 hadoop: 0.20.203.0

 I'm attempting to setup the pseudo-distributed config on a mac 10.6.8.
  I followed the steps from the QuickStart
  (http://wiki.apache.org/hadoop/QuickStart) and
 succeeded with Stage 1: Standalone Operation.
 I followed the steps for Stage 2: Pseudo-distributed Configuration.
 I set the JAVA_HOME variable in conf/hadoop-env.sh and I changed
 tools.jar to the location of classes.jar (a mac version of tools.jar)
 I've modified the three .xml files as described in the QuickStart.
 ssh'ing to localhost has been configured and works with passwordless
 authentication.
 I formatted the namenode with bin/hadoop namenode -format as the
 instructions say

 This is what I see when I run bin/start-all.sh

 root# bin/start-all.sh
  starting namenode, logging to /Users/admin/hadoop/hadoop-0.20.203.0/bin/../logs/hadoop-root-namenode-Hoot-2.local.out
  localhost: starting datanode, logging to /Users/admin/hadoop/hadoop-0.20.203.0/bin/../logs/hadoop-root-datanode-Hoot-2.local.out
  localhost: Exception in thread main java.lang.NoClassDefFoundError: server
  localhost: Caused by: java.lang.ClassNotFoundException: server
  localhost: at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
  localhost: at java.security.AccessController.doPrivileged(Native Method)
  localhost: at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
  localhost: at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
  localhost: at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
  localhost: at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
  localhost: starting secondarynamenode, logging to /Users/admin/hadoop/hadoop-0.20.203.0/bin/../logs/hadoop-root-secondarynamenode-Hoot-2.local.out
  starting jobtracker, logging to /Users/admin/hadoop/hadoop-0.20.203.0/bin/../logs/hadoop-root-jobtracker-Hoot-2.local.out
  localhost: starting tasktracker, logging to /Users/admin/hadoop/hadoop-0.20.203.0/bin/../logs/hadoop-root-tasktracker-Hoot-2.local.out

 There are 4 processes running:
 ps -fax | grep hadoop | grep -v grep | wc -l
  4

 They are:
 SecondaryNameNode
 TaskTracker
 NameNode
 JobTracker


 I've searched to see if anyone else has encountered this and not found
 anything

 d

 p.s. I've also posted this to core-u...@hadoop.apache.org which I've yet
 to find how to subscribe to.




Re: HDFS Backup nodes

2011-12-13 Thread Suresh Srinivas
Srivas,

As you may know already, NFS is just being used in the first prototype for
HA.

Two options for the editlog store are:
1. Using BookKeeper. Work towards this has already been completed on trunk. This
will replace the need for NFS to store the editlogs and is highly available.
This solution will also be used for HA.
2. We also have a short-term goal of enabling editlogs to be written to HDFS
itself. That work is in progress.
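Until those land, the usual mitigation is the one discussed below: list more
than one storage directory for the namenode metadata, one of them on an NFS
mount. A sketch for the 0.20/1.x line (paths are placeholders):

    <property>
      <name>dfs.name.dir</name>
      <value>/data/1/dfs/nn,/mnt/nfs/dfs/nn</value>
    </property>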

Regards,
Suresh



 -- Forwarded message --
 From: M. C. Srivas mcsri...@gmail.com
 Date: Sun, Dec 11, 2011 at 10:47 PM
 Subject: Re: HDFS Backup nodes
 To: common-user@hadoop.apache.org


 You are out of luck if you don't want to use NFS, and yet want redundancy
 for the NN.  Even the new NN HA work being done by the community will
 require NFS ... and the NFS itself needs to be HA.

 But if you use a Netapp, then the likelihood of the Netapp crashing is
 lower than the likelihood of a garbage-collection-of-death happening in the
 NN.

 [ disclaimer:  I don't work for Netapp, I work for MapR ]


 On Wed, Dec 7, 2011 at 4:30 PM, randy randy...@comcast.net wrote:

  Thanks Joey. We've had enough problems with nfs (mainly under very high
  load) that we thought it might be riskier to use it for the NN.
 
  randy
 
 
  On 12/07/2011 06:46 PM, Joey Echeverria wrote:
 
  Hey Rand,
 
  It will mark that storage directory as failed and ignore it from then
  on. In order to do this correctly, you need a couple of options
  enabled on the NFS mount to make sure that it doesn't retry
   infinitely. I usually run with the tcp,soft,intr,timeo=10,retrans=10
  options set.
 
  -Joey
 
  On Wed, Dec 7, 2011 at 12:37 PM,randy...@comcast.net  wrote:
 
  What happens then if the nfs server fails or isn't reachable? Does hdfs
  lock up? Does it gracefully ignore the nfs copy?
 
  Thanks,
  randy
 
  - Original Message -
  From: Joey Echeverriaj...@cloudera.com
  To: common-user@hadoop.apache.org
  Sent: Wednesday, December 7, 2011 6:07:58 AM
  Subject: Re: HDFS Backup nodes
 
  You should also configure the Namenode to use an NFS mount for one of
  it's storage directories. That will give the most up-to-date back of
  the metadata in case of total node failure.
 
  -Joey
 
  On Wed, Dec 7, 2011 at 3:17 AM, praveenesh kumarpraveen...@gmail.com
   wrote:
 
  This means still we are relying on Secondary NameNode idealogy for
  Namenode's backup.
  Can OS-mirroring of Namenode is a good alternative keep it alive all
 the
  time ?
 
  Thanks,
  Praveenesh
 
  On Wed, Dec 7, 2011 at 1:35 PM, Uma Maheswara Rao G
  mahesw...@huawei.comwrote:
 
   AFAIK backup node introduced in 0.21 version onwards.
  From: praveenesh kumar [praveen...@gmail.com]
  Sent: Wednesday, December 07, 2011 12:40 PM
  To: common-user@hadoop.apache.org
  Subject: HDFS Backup nodes
 
  Does hadoop 0.20.205 supports configuring HDFS backup nodes ?
 
  Thanks,
  Praveenesh
 
 
 
 
  --
  Joseph Echeverria
  Cloudera, Inc.
  443.305.9434
 
 
 
 
 
 




Re: Difference between DFS Used and Non-DFS Used

2011-07-08 Thread Suresh Srinivas
Non-DFS storage is not something that is required; it is reported as information
only, to show how the storage is being used.

The available storage on the disks is used for both DFS and non-DFS data
(mapreduce shuffle output and any other files that happen to be on the disks).

See if you have unnecessary files or lingering shuffle output on these disks
that are contributing to the 250GB. Delete the unneeded files and
you should be able to reclaim some of that 250GB.
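To see how the numbers break down per datanode, and to hunt down what is
actually occupying the 250GB, something like this is usually enough (the du
paths are placeholders for your dfs.data.dir and mapred.local.dir):

    hadoop dfsadmin -report                          # per-node DFS Used / Non DFS Used / Remaining
    du -sh /data/*/dfs/data /data/*/mapred/local     # compare with what is actually on disk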

On Fri, Jul 8, 2011 at 4:24 AM, Sagar Shukla
sagar_shu...@persistent.co.in wrote:

 Thanks Harsh. My first question still remains unanswered - Why does it
 require non-DFS storage?. If it is cache data then it should get flushed
 from the system after certain interval of time. And if it is useful data
 then it should have been part of used DFS data.

 I have a setup in which DFS used is use approx. 10 MB whereas non-DFS used
 is around 250 GB which is quite ridiculous.

 Thanks,
 Sagar

 -Original Message-
 From: Harsh J [mailto:ha...@cloudera.com]
 Sent: Friday, July 08, 2011 4:42 PM
 To: common-user@hadoop.apache.org
 Subject: Re: Difference between DFS Used and Non-DFS Used

 It is just for information's sake (cause it can be computed with the
 data collected). The space is accounted just to let you know that
 there's something being stored on the DataNodes apart from just the
 HDFS data, in case you are running out of space.

 On Fri, Jul 8, 2011 at 10:18 AM, Sagar Shukla
 sagar_shu...@persistent.co.in wrote:
  Hi Harsh,
  Thanks for your reply.
 
  But why does it require non-DFS storage ? And why that space is accounted
 differently from regular DFS storage ?
 
  Ideally, it should have been part of same storage.
 
  Thanks,
  Sagar
 
  -Original Message-
  From: Harsh J [mailto:ha...@cloudera.com]
  Sent: Thursday, July 07, 2011 6:04 PM
  To: common-user@hadoop.apache.org
  Subject: Re: Difference between DFS Used and Non-DFS Used
 
  DFS used is a count of all the space used by the dfs.data.dirs. The
  non-dfs used space is whatever space is occupied beyond that (which
  the DN does not account for).
 
  On Thu, Jul 7, 2011 at 3:29 PM, Sagar Shukla
  sagar_shu...@persistent.co.in wrote:
  Hi,
What is the difference between DFS Used and Non-DFS used ?
 
  Thanks,
  Sagar
 
  DISCLAIMER
  ==
  This e-mail may contain privileged and confidential information which is
 the property of Persistent Systems Ltd. It is intended only for the use of
 the individual or entity to which it is addressed. If you are not the
 intended recipient, you are not authorized to read, retain, copy, print,
 distribute or use this message. If you have received this communication in
 error, please notify the sender and delete all copies of this message.
 Persistent Systems Ltd. does not accept any liability for virus infected
 mails.
 
 
 
 
 
  --
  Harsh J
 
  DISCLAIMER
  ==
  This e-mail may contain privileged and confidential information which is
 the property of Persistent Systems Ltd. It is intended only for the use of
 the individual or entity to which it is addressed. If you are not the
 intended recipient, you are not authorized to read, retain, copy, print,
 distribute or use this message. If you have received this communication in
 error, please notify the sender and delete all copies of this message.
 Persistent Systems Ltd. does not accept any liability for virus infected
 mails.
 
 



 --
 Harsh J

 DISCLAIMER
 ==
 This e-mail may contain privileged and confidential information which is
 the property of Persistent Systems Ltd. It is intended only for the use of
 the individual or entity to which it is addressed. If you are not the
 intended recipient, you are not authorized to read, retain, copy, print,
 distribute or use this message. If you have received this communication in
 error, please notify the sender and delete all copies of this message.
 Persistent Systems Ltd. does not accept any liability for virus infected
 mails.




-- 
Regards,
Suresh


Re: CDH and Hadoop

2011-03-24 Thread suresh srinivas
On Thu, Mar 24, 2011 at 7:04 PM, Rita rmorgan...@gmail.com wrote:

 Oh! Thats for the heads up on that...

 I guess I will go with the cloudera source then


 On Thu, Mar 24, 2011 at 8:41 PM, David Rosenstrauch dar...@darose.net
 wrote:

  They do, but IIRC, they recently announced that they're going to be
  discontinuing it.
 
  DR


Yahoo! discontinued its distribution in favor of making Apache Hadoop the
most stable and the go-to place for Hadoop releases. So all the advantages
of the Yahoo distribution you now get in the Apache Hadoop release.

Please see the details of announcement here:

http://developer.yahoo.com/blogs/hadoop/posts/2011/01/announcement-yahoo-focusing-on-apache-hadoop-discontinuing-the-yahoo-distribution-of-hadoop/


Re: Data Nodes do not start

2011-02-09 Thread suresh srinivas
On Tue, Feb 8, 2011 at 11:05 PM, rahul patodi patodira...@gmail.com wrote:

 I think you should copy the namespaceID of your master which is in
 name/current/VERSION file to all the slaves


This is a sure recipe for disaster. The VERSION file is a file system metadata
file that is not to be messed around with. At worst, this can cause the loss of
the entire file system's data! Rahul, please update your blog to reflect this.

Some background on the namespace ID:
A namespace ID is created on the namenode when it is formatted. It is
propagated to the datanodes when they register with the namenode for the first
time. From then on, this ID is burnt into the datanodes.

A mismatch between the namespace IDs of a datanode and the namenode means one of:
# The datanode is pointing to the wrong namenode, perhaps in a different cluster
(the datanode's config points to the wrong namenode address).
# The namenode was previously running with one storage directory and was changed
to some other storage directory holding a different file system image.
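A safe way to confirm which of these you are hitting is to read (not edit) the
VERSION files on both sides and compare the IDs (paths are placeholders for your
dfs.name.dir and dfs.data.dir):

    grep namespaceID /data/dfs/name/current/VERSION     # on the namenode
    grep namespaceID /data/dfs/data/current/VERSION     # on each datanode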


Why is editing the namespace ID a bad idea?
Given that either the namenode has loaded the wrong namespace or the datanode is
pointing to the wrong namenode, messing around with the namespaceID on either the
namenode or the datanode only lets the datanode register with the namenode. When
the datanode then sends its block report, the blocks on the datanode do not
belong to the namespace loaded by the namenode, resulting in the deletion of
all the blocks on the datanode.

Please find out whether either of these problems exists in your setup and fix it.