Re: multiple namenode directories

2011-02-10 Thread mike anderson
Whew, glad I asked.

It might be useful for someone to update the wiki:
http://wiki.apache.org/hadoop/FAQ#How_do_I_set_up_a_hadoop_node_to_use_multiple_volumes.3F

-Mike

On Thu, Feb 10, 2011 at 12:43 PM, Harsh J  wrote:

> DO NOT format your NameNode. Formatting a NameNode is equivalent to
> formatting a FS -- you're bound to lose it all.
>
> And while messing with NameNode, after bringing it down safely, ALWAYS
> take a backup of the existing dfs.name.dir contents and preferably the
> SNN checkpoint directory contents too (if you're running it).
>
> The RIGHT way to add new directories to the NameNode's dfs.name.dir is
> by comma-separating them in the same value and NOT by adding two
> properties - that is not how Hadoop's configuration operates. In your
> case, bring NN down and edit conf as:
>
> >  <property>
> >    <name>dfs.name.dir</name>
> >    <value>/mnt/hadoop/name,/public/hadoop/name</value>
> >  </property>
>
> Create the new directory by copying the existing one. Both must have
> the SAME files and structure in them, like mirror copies of one
> another. Ensure that this new location, apart from being symmetric in
> content, is also symmetric in permissions. NameNode will require WRITE
> permissions via its user on all locations configured.
>
> Having configured this properly and ensured that both storage directories
> mirror one another, launch your NameNode back up again (feel a little
> paranoid and do check the NameNode logs for any issues -- in which case
> your backup would be essential for recovery!).
>
> P.s. Hold on for a bit for a possible comment from another user before
> getting into action. I've added extra directories this way, but I do
> not know if this is "the" genuine way to do so - although it feels
> right to me.
>
> On Thu, Feb 10, 2011 at 10:27 PM, mike anderson 
> wrote:
> > This should be a straightforward question, but better safe than sorry.
> >
> > I wanted to add a second name node directory (on an NFS as a backup), so now
> > my hdfs-site.xml contains:
> >
> >  <property>
> >    <name>dfs.name.dir</name>
> >    <value>/mnt/hadoop/name</value>
> >  </property>
> >  <property>
> >    <name>dfs.name.dir</name>
> >    <value>/public/hadoop/name</value>
> >  </property>
> >
> >
> > When I go to start DFS I'm getting the exception:
> >
> > org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory
> > /public/hadoop/name is in an inconsistent state: storage directory does not
> > exist or is not accessible.
> >
> >
> > After googling a bit, it seems like I want to do "bin/hadoop namenode
> > -format"
> >
> > Is this right? As long as I shut down DFS before issuing the command I
> > shouldn't lose any data?
> >
> > Thanks in advance,
> > Mike
> >
>
>
>
> --
> Harsh J
> www.harshj.com
>
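
A minimal sketch of the copy step Harsh describes, assuming HDFS is already
stopped; the backup path, the hadoop:hadoop owner, and bin/start-dfs.sh are
placeholders for whatever your installation actually uses:

$ cp -a /mnt/hadoop/name /tmp/name.backup       # back up the existing dfs.name.dir first
$ mkdir -p /public/hadoop
$ cp -a /mnt/hadoop/name /public/hadoop/        # mirror it into the new location
$ chown -R hadoop:hadoop /public/hadoop/name    # owner/permissions must match the original
$ diff -r /mnt/hadoop/name /public/hadoop/name  # no output means the two copies are identical
$ bin/start-dfs.sh                              # then watch the NameNode log for errors

Only start the NameNode after dfs.name.dir lists both paths, comma-separated,
as in Harsh's snippet above.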


multiple namenode directories

2011-02-10 Thread mike anderson
This should be a straightforward question, but better safe than sorry.

I wanted to add a second name node directory (on an NFS as a backup), so now
my hdfs-site.xml contains:

  <property>
    <name>dfs.name.dir</name>
    <value>/mnt/hadoop/name</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/public/hadoop/name</value>
  </property>


When I go to start DFS I'm getting the exception:

org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory
/public/hadoop/name is in an inconsistent state: storage directory does not
exist or is not accessible.


After googling a bit, it seems like I want to do "bin/hadoop namenode
-format"

Is this right? As long as I shut down DFS before issuing the command I
shouldn't lose any data?

Thanks in advance,
Mike


Fwd: multiple namenode directories

2011-02-10 Thread mike anderson
-- Forwarded message --
From: mike anderson 
Date: Thu, Feb 10, 2011 at 11:57 AM
Subject: multiple namenode directories
To: core-u...@hadoop.apache.org


This should be a straightforward question, but better safe than sorry.

I wanted to add a second name node directory (on an NFS as a backup), so now
my hdfs-site.xml contains:

  <property>
    <name>dfs.name.dir</name>
    <value>/mnt/hadoop/name</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/public/hadoop/name</value>
  </property>


When I go to start DFS I'm getting the exception:

org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory
/public/hadoop/name is in an inconsistent state: storage directory does not
exist or is not accessible.


After googling a bit, it seems like I want to do "bin/hadoop namenode
-format"

Is this right? As long as I shut down DFS before issuing the command I
shouldn't lose any data?

Thanks in advance,
Mike


Re: start anyways with missing blocks

2011-01-21 Thread mike anderson
yess! thanks, that was exactly what i wanted.

-mike

On Fri, Jan 21, 2011 at 3:16 PM, Brian Bockelman wrote:

> Hi Mike,
>
> You want to take things out of safemode before you can make these changes.
>
> hadoop dfsadmin -safemode leave
>
> Then you can do the "hadoop fsck / -delete"
>
> Brian
>
> On Jan 21, 2011, at 2:12 PM, mike anderson wrote:
>
> > Also, here's the output of dfsadmin -report.  What seems weird is that
> it's
> > not reporting any missing blocks. BTW, I tried doing fsck / -delete, but
> it
> > failed, complaining about the missing nodes.
> >
> > $ ../bin/hadoop dfsadmin -report
> > Safe mode is ON
> > Configured Capacity: 3915872829440 (3.56 TB)
> > Present Capacity: 2913577631744 (2.65 TB)
> > DFS Remaining: 1886228164608 (1.72 TB)
> > DFS Used: 1027349467136 (956.79 GB)
> > DFS Used%: 35.26%
> > Under replicated blocks: 0
> > Blocks with corrupt replicas: 0
> > Missing blocks: 0
> >
> > -
> > Datanodes available: 9 (9 total, 0 dead)
> >
> > Name: 10.0.16.91:50010
> > Decommission Status : Normal
> > Configured Capacity: 139438620672 (129.86 GB)
> > DFS Used: 44507017216 (41.45 GB)
> > Non DFS Used: 85782597632 (79.89 GB)
> > DFS Remaining: 9149005824(8.52 GB)
> > DFS Used%: 31.92%
> > DFS Remaining%: 6.56%
> > Last contact: Fri Jan 21 15:10:47 EST 2011
> >
> >
> > Name: 10.0.16.165:50010
> > Decommission Status : Normal
> > Configured Capacity: 472054276096 (439.63 GB)
> > DFS Used: 139728683008 (130.13 GB)
> > Non DFS Used: 90374217728 (84.17 GB)
> > DFS Remaining: 241951375360(225.33 GB)
> > DFS Used%: 29.6%
> > DFS Remaining%: 51.25%
> > Last contact: Fri Jan 21 15:10:47 EST 2011
> >
> >
> > Name: 10.0.16.163:50010
> > Decommission Status : Normal
> > Configured Capacity: 472054276096 (439.63 GB)
> > DFS Used: 174687391744 (162.69 GB)
> > Non DFS Used: 55780028416 (51.95 GB)
> > DFS Remaining: 241586855936(225 GB)
> > DFS Used%: 37.01%
> > DFS Remaining%: 51.18%
> > Last contact: Fri Jan 21 15:10:47 EST 2011
> >
> >
> > Name: 10.0.16.164:50010
> > Decommission Status : Normal
> > Configured Capacity: 472054276096 (439.63 GB)
> > DFS Used: 95075942400 (88.55 GB)
> > Non DFS Used: 182544318464 (170.01 GB)
> > DFS Remaining: 194434015232(181.08 GB)
> > DFS Used%: 20.14%
> > DFS Remaining%: 41.19%
> > Last contact: Fri Jan 21 15:10:47 EST 2011
> >
> >
> > Name: 10.0.16.169:50010
> > Decommission Status : Normal
> > Configured Capacity: 472054276096 (439.63 GB)
> > DFS Used: 24576 (24 KB)
> > Non DFS Used: 51301322752 (47.78 GB)
> > DFS Remaining: 420752928768(391.86 GB)
> > DFS Used%: 0%
> > DFS Remaining%: 89.13%
> > Last contact: Fri Jan 21 15:10:48 EST 2011
> >
> >
> > Name: 10.0.16.160:50010
> > Decommission Status : Normal
> > Configured Capacity: 472054276096 (439.63 GB)
> > DFS Used: 171275218944 (159.51 GB)
> > Non DFS Used: 119652265984 (111.43 GB)
> > DFS Remaining: 181126791168(168.69 GB)
> > DFS Used%: 36.28%
> > DFS Remaining%: 38.37%
> > Last contact: Fri Jan 21 15:10:47 EST 2011
> >
> >
> > Name: 10.0.16.161:50010
> > Decommission Status : Normal
> > Configured Capacity: 472054276096 (439.63 GB)
> > DFS Used: 131355377664 (122.33 GB)
> > Non DFS Used: 174232702976 (162.27 GB)
> > DFS Remaining: 166466195456(155.03 GB)
> > DFS Used%: 27.83%
> > DFS Remaining%: 35.26%
> > Last contact: Fri Jan 21 15:10:47 EST 2011
> >
> >
> > Name: 10.0.16.162:50010
> > Decommission Status : Normal
> > Configured Capacity: 472054276096 (439.63 GB)
> > DFS Used: 139831177216 (130.23 GB)
> > Non DFS Used: 91403055104 (85.13 GB)
> > DFS Remaining: 240820043776(224.28 GB)
> > DFS Used%: 29.62%
> > DFS Remaining%: 51.02%
> > Last contact: Fri Jan 21 15:10:47 EST 2011
> >
> >
> > Name: 10.0.16.167:50010
> > Decommission Status : Normal
> > Configured Capacity: 472054276096 (439.63 GB)
> > DFS Used: 130888634368 (121.9 GB)
> > Non DFS Used: 151224688640 (140.84 GB)
> > DFS Remaining: 189940953088(176.9 GB)
> > DFS Used%: 27.73%
> > DFS Remaining%: 40.24%
> > Last contact: Fri Jan 21 15:10:46 EST 2011
> >
> >
> > On Fri, Jan 21, 2011 at 3:03 PM, mike anderson  >wrote:
> >
> >> After a tragic cluster crash it looks like some blocks are missing.
> >>

Re: start anyways with missing blocks

2011-01-21 Thread mike anderson
Also, here's the output of dfsadmin -report.  What seems weird is that it's
not reporting any missing blocks. BTW, I tried doing fsck / -delete, but it
failed, complaining about the missing nodes.

$ ../bin/hadoop dfsadmin -report
Safe mode is ON
Configured Capacity: 3915872829440 (3.56 TB)
Present Capacity: 2913577631744 (2.65 TB)
DFS Remaining: 1886228164608 (1.72 TB)
DFS Used: 1027349467136 (956.79 GB)
DFS Used%: 35.26%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0

-
Datanodes available: 9 (9 total, 0 dead)

Name: 10.0.16.91:50010
Decommission Status : Normal
Configured Capacity: 139438620672 (129.86 GB)
DFS Used: 44507017216 (41.45 GB)
Non DFS Used: 85782597632 (79.89 GB)
DFS Remaining: 9149005824(8.52 GB)
DFS Used%: 31.92%
DFS Remaining%: 6.56%
Last contact: Fri Jan 21 15:10:47 EST 2011


Name: 10.0.16.165:50010
Decommission Status : Normal
Configured Capacity: 472054276096 (439.63 GB)
DFS Used: 139728683008 (130.13 GB)
Non DFS Used: 90374217728 (84.17 GB)
DFS Remaining: 241951375360(225.33 GB)
DFS Used%: 29.6%
DFS Remaining%: 51.25%
Last contact: Fri Jan 21 15:10:47 EST 2011


Name: 10.0.16.163:50010
Decommission Status : Normal
Configured Capacity: 472054276096 (439.63 GB)
DFS Used: 174687391744 (162.69 GB)
Non DFS Used: 55780028416 (51.95 GB)
DFS Remaining: 241586855936(225 GB)
DFS Used%: 37.01%
DFS Remaining%: 51.18%
Last contact: Fri Jan 21 15:10:47 EST 2011


Name: 10.0.16.164:50010
Decommission Status : Normal
Configured Capacity: 472054276096 (439.63 GB)
DFS Used: 95075942400 (88.55 GB)
Non DFS Used: 182544318464 (170.01 GB)
DFS Remaining: 194434015232(181.08 GB)
DFS Used%: 20.14%
DFS Remaining%: 41.19%
Last contact: Fri Jan 21 15:10:47 EST 2011


Name: 10.0.16.169:50010
Decommission Status : Normal
Configured Capacity: 472054276096 (439.63 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 51301322752 (47.78 GB)
DFS Remaining: 420752928768(391.86 GB)
DFS Used%: 0%
DFS Remaining%: 89.13%
Last contact: Fri Jan 21 15:10:48 EST 2011


Name: 10.0.16.160:50010
Decommission Status : Normal
Configured Capacity: 472054276096 (439.63 GB)
DFS Used: 171275218944 (159.51 GB)
Non DFS Used: 119652265984 (111.43 GB)
DFS Remaining: 181126791168(168.69 GB)
DFS Used%: 36.28%
DFS Remaining%: 38.37%
Last contact: Fri Jan 21 15:10:47 EST 2011


Name: 10.0.16.161:50010
Decommission Status : Normal
Configured Capacity: 472054276096 (439.63 GB)
DFS Used: 131355377664 (122.33 GB)
Non DFS Used: 174232702976 (162.27 GB)
DFS Remaining: 166466195456(155.03 GB)
DFS Used%: 27.83%
DFS Remaining%: 35.26%
Last contact: Fri Jan 21 15:10:47 EST 2011


Name: 10.0.16.162:50010
Decommission Status : Normal
Configured Capacity: 472054276096 (439.63 GB)
DFS Used: 139831177216 (130.23 GB)
Non DFS Used: 91403055104 (85.13 GB)
DFS Remaining: 240820043776(224.28 GB)
DFS Used%: 29.62%
DFS Remaining%: 51.02%
Last contact: Fri Jan 21 15:10:47 EST 2011


Name: 10.0.16.167:50010
Decommission Status : Normal
Configured Capacity: 472054276096 (439.63 GB)
DFS Used: 130888634368 (121.9 GB)
Non DFS Used: 151224688640 (140.84 GB)
DFS Remaining: 189940953088(176.9 GB)
DFS Used%: 27.73%
DFS Remaining%: 40.24%
Last contact: Fri Jan 21 15:10:46 EST 2011


On Fri, Jan 21, 2011 at 3:03 PM, mike anderson wrote:

> After a tragic cluster crash it looks like some blocks are missing.
>
>  Total size: 343918527293 B (Total open files size: 67108864 B)
>  Total dirs: 5897
>  Total files: 5574 (Files currently being written: 19)
>  Total blocks (validated): 9441 (avg. block size 36428188 B) (Total open
> file blocks (not validated): 1)
>   
>   CORRUPT FILES: 319
>   MISSING BLOCKS: 691
>   MISSING SIZE: 32767071153 B
>   CORRUPT BLOCKS: 691
>   
>  Minimally replicated blocks: 8750 (92.68086 %)
>  Over-replicated blocks: 0 (0.0 %)
>  Under-replicated blocks: 0 (0.0 %)
>  Mis-replicated blocks: 0 (0.0 %)
>  Default replication factor: 2
>  Average block replication: 2.731914
>  Corrupt blocks: 691
>  Missing replicas: 0 (0.0 %)
>  Number of data-nodes: 9
>  Number of racks: 1
>
>
> The filesystem under path '/' is CORRUPT
>
>
>
> I don't particularly care if I lose some of the data (it's just a cache
> store), instead of figuring out where the blocks went missing can I just
> forget about them and boot up with the blocks I have?
>
> -Mike
>
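
A sketch of the sequence Brian suggests in the earlier reply, using stock
commands only (the bin/ prefix is whatever your install uses; run as the user
that owns HDFS):

$ bin/hadoop dfsadmin -safemode get     # confirm the NameNode is still in safe mode
$ bin/hadoop dfsadmin -safemode leave   # take it out of safe mode manually
$ bin/hadoop fsck /                     # review which files are corrupt before deleting
$ bin/hadoop fsck / -delete             # drop the metadata of files with missing blocks

The -delete pass is destructive for the affected files, so it is worth saving
the plain fsck report first.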


Re: can't start namenode

2010-03-04 Thread mike anderson
Todd, That did the trick. Thanks to everyone for the quick responses
and effective suggestions.

-Mike


On Thu, Mar 4, 2010 at 2:50 PM, Todd Lipcon  wrote:
> Hi Mike,
>
> Since you removed the edits, you restored to an earlier version of the
> namesystem. Thus, any files that were deleted since the last checkpoint will
> have come back. But, the blocks will have been removed from the datanodes.
> So, the NN is complaining since there are some files that have missing
> blocks. That is to say, some of your files are corrupt (ie unreadable
> because the data is gone but the metadata is still there)
>
> In order to force it out of safemode, you can run hadoop dfsadmin -safemode
> leave
> You should also run "hadoop fsck" in order to determine which files are
> broken, and then probably use the -delete option to remove their metadata.
>
> Thanks
> -Todd
>
> On Thu, Mar 4, 2010 at 11:37 AM, mike anderson wrote:
>
>> Removing edits.new and starting worked, though it didn't seem that
>> happy about it. It started up nonetheless, in safe mode. Saying that
>> "The ratio of reported blocks 0.9948 has not reached the threshold
>> 0.9990. Safe mode will be turned off automatically." Unfortunately
>> this is holding up the restart of hbase.
>>
>> About how long does it take to exit safe mode? is there anything I can
>> do to expedite the process?
>>
>>
>>
>> On Thu, Mar 4, 2010 at 1:54 PM, Todd Lipcon  wrote:
>> >
>> > Sorry, I actually meant ls -l from name.dir/current/
>> >
>> > Having only one dfs.name.dir isn't recommended - after you get your
>> system
>> > back up and running I would strongly suggest running with at least two,
>> > preferably with one on a separate server via NFS.
>> >
>> > Thanks
>> > -Todd
>> >
>> > On Thu, Mar 4, 2010 at 9:05 AM, mike anderson > >wrote:
>> >
>> > > We have a single dfs.name.dir directory, in case it's useful the
>> contents
>> > > are:
>> > >
>> > > [m...@carr name]$ ls -l
>> > > total 8
>> > > drwxrwxr-x 2 mike mike 4096 Mar  4 11:18 current
>> > > drwxrwxr-x 2 mike mike 4096 Oct  8 16:38 image
>> > >
>> > >
>> > >
>> > >
>> > > On Thu, Mar 4, 2010 at 12:00 PM, Todd Lipcon 
>> wrote:
>> > >
>> > > > Hi Mike,
>> > > >
>> > > > Was your namenode configured with multiple dfs.name.dir settings?
>> > > >
>> > > > If so, can you please reply with "ls -l" from each dfs.name.dir?
>> > > >
>> > > > Thanks
>> > > > -Todd
>> > > >
>> > > > On Thu, Mar 4, 2010 at 8:57 AM, mike anderson <
>> saidthero...@gmail.com
>> > > > >wrote:
>> > > >
>> > > > > Our hadoop cluster went down last night when the namenode ran out
>> of
>> > > hard
>> > > > > drive space. Trying to restart fails with this exception (see
>> below).
>> > > > >
>> > > > > Since I don't really care that much about losing a days worth of
>> data
>> > > or
>> > > > so
>> > > > > I'm fine with blowing away the edits file if that's what it takes
>> (we
>> > > > don't
>> > > > > have a secondary namenode to restore from). I tried removing the
>> edits
>> > > > file
>> > > > > from the namenode directory, but then it complained about not
>> finding
>> > > an
>> > > > > edits file. I touched a blank edits file and I got the exact same
>> > > > > exception.
>> > > > >
>> > > > > Any thoughts? I googled around a bit, but to no avail.
>> > > > >
>> > > > > -mike
>> > > > >
>> > > > >
>> > > > > 2010-03-04 10:50:44,768 INFO
>> org.apache.hadoop.ipc.metrics.RpcMetrics:
>> > > > > Initializing RPC Metrics with hostName=NameNode, port=54310
>> > > > > 2010-03-04 10:50:44,772 INFO
>> > > > > org.apache.hadoop.hdfs.server.namenode.NameNode: Namenode up at:
>> > > > > carr.projectlounge.com/10.0.16.91:54310
>> > > > > 2010-03-04 <
>> > > http://carr.projectlounge.com/10.0.16.91:54310%0A2010-03-04
>> >10:50:44,773
>> >

Re: can't start namenode

2010-03-04 Thread mike anderson
Removing edits.new and starting worked, though it didn't seem that
happy about it. It started up nonetheless, in safe mode, saying that
"The ratio of reported blocks 0.9948 has not reached the threshold
0.9990. Safe mode will be turned off automatically." Unfortunately
this is holding up the restart of hbase.

About how long does it take to exit safe mode? Is there anything I can
do to expedite the process?



On Thu, Mar 4, 2010 at 1:54 PM, Todd Lipcon  wrote:
>
> Sorry, I actually meant ls -l from name.dir/current/
>
> Having only one dfs.name.dir isn't recommended - after you get your system
> back up and running I would strongly suggest running with at least two,
> preferably with one on a separate server via NFS.
>
> Thanks
> -Todd
>
> On Thu, Mar 4, 2010 at 9:05 AM, mike anderson wrote:
>
> > We have a single dfs.name.dir directory, in case it's useful the contents
> > are:
> >
> > [m...@carr name]$ ls -l
> > total 8
> > drwxrwxr-x 2 mike mike 4096 Mar  4 11:18 current
> > drwxrwxr-x 2 mike mike 4096 Oct  8 16:38 image
> >
> >
> >
> >
> > On Thu, Mar 4, 2010 at 12:00 PM, Todd Lipcon  wrote:
> >
> > > Hi Mike,
> > >
> > > Was your namenode configured with multiple dfs.name.dir settings?
> > >
> > > If so, can you please reply with "ls -l" from each dfs.name.dir?
> > >
> > > Thanks
> > > -Todd
> > >
> > > On Thu, Mar 4, 2010 at 8:57 AM, mike anderson  > > >wrote:
> > >
> > > > Our hadoop cluster went down last night when the namenode ran out of
> > hard
> > > > drive space. Trying to restart fails with this exception (see below).
> > > >
> > > > Since I don't really care that much about losing a days worth of data
> > or
> > > so
> > > > I'm fine with blowing away the edits file if that's what it takes (we
> > > don't
> > > > have a secondary namenode to restore from). I tried removing the edits
> > > file
> > > > from the namenode directory, but then it complained about not finding
> > an
> > > > edits file. I touched a blank edits file and I got the exact same
> > > > exception.
> > > >
> > > > Any thoughts? I googled around a bit, but to no avail.
> > > >
> > > > -mike
> > > >
> > > >
> > > > 2010-03-04 10:50:44,768 INFO org.apache.hadoop.ipc.metrics.RpcMetrics:
> > > > Initializing RPC Metrics with hostName=NameNode, port=54310
> > > > 2010-03-04 10:50:44,772 INFO
> > > > org.apache.hadoop.hdfs.server.namenode.NameNode: Namenode up at:
> > > > carr.projectlounge.com/10.0.16.91:54310
> > > > 2010-03-04 10:50:44,773
> > > INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
> > > > Initializing JVM Metrics with processName=NameNode, sessionId=null
> > > > 2010-03-04 10:50:44,774 INFO
> > > > org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics:
> > > > Initializing
> > > > NameNodeMeterics using context
> > > > object:org.apache.hadoop.metrics.spi.NullContext
> > > > 2010-03-04 10:50:44,816 INFO
> > > > org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
> > > fsOwner=pubget,pubget
> > > > 2010-03-04 10:50:44,817 INFO
> > > > org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
> > > supergroup=supergroup
> > > > 2010-03-04 10:50:44,817 INFO
> > > > org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
> > > > isPermissionEnabled=true
> > > > 2010-03-04 10:50:44,823 INFO
> > > > org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics:
> > > > Initializing FSNamesystemMetrics using context
> > > > object:org.apache.hadoop.metrics.spi.NullContext
> > > > 2010-03-04 10:50:44,825 INFO
> > > > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered
> > > > FSNamesystemStatusMBean
> > > > 2010-03-04 10:50:44,849 INFO
> > > org.apache.hadoop.hdfs.server.common.Storage:
> > > > Number of files = 2687
> > > > 2010-03-04 10:50:45,092 INFO
> > > org.apache.hadoop.hdfs.server.common.Storage:
> > > > Number of files under construction = 7
> > > > 2010-03-04 10:50:45,095 INFO
> > > org.apache.hadoop.hdfs.server.common.Storage:
> > > > Image file of size 347821 loa

Re: can't start namenode

2010-03-04 Thread mike anderson
We have a single dfs.name.dir directory, in case it's useful the contents
are:

[m...@carr name]$ ls -l
total 8
drwxrwxr-x 2 mike mike 4096 Mar  4 11:18 current
drwxrwxr-x 2 mike mike 4096 Oct  8 16:38 image




On Thu, Mar 4, 2010 at 12:00 PM, Todd Lipcon  wrote:

> Hi Mike,
>
> Was your namenode configured with multiple dfs.name.dir settings?
>
> If so, can you please reply with "ls -l" from each dfs.name.dir?
>
> Thanks
> -Todd
>
> On Thu, Mar 4, 2010 at 8:57 AM, mike anderson  >wrote:
>
> > Our hadoop cluster went down last night when the namenode ran out of hard
> > drive space. Trying to restart fails with this exception (see below).
> >
> > Since I don't really care that much about losing a days worth of data or
> so
> > I'm fine with blowing away the edits file if that's what it takes (we
> don't
> > have a secondary namenode to restore from). I tried removing the edits
> file
> > from the namenode directory, but then it complained about not finding an
> > edits file. I touched a blank edits file and I got the exact same
> > exception.
> >
> > Any thoughts? I googled around a bit, but to no avail.
> >
> > -mike
> >
> >
> > 2010-03-04 10:50:44,768 INFO org.apache.hadoop.ipc.metrics.RpcMetrics:
> > Initializing RPC Metrics with hostName=NameNode, port=54310
> > 2010-03-04 10:50:44,772 INFO
> > org.apache.hadoop.hdfs.server.namenode.NameNode: Namenode up at:
> > carr.projectlounge.com/10.0.16.91:54310
> > 2010-03-04 10:50:44,773
> INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
> > Initializing JVM Metrics with processName=NameNode, sessionId=null
> > 2010-03-04 10:50:44,774 INFO
> > org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics:
> > Initializing
> > NameNodeMeterics using context
> > object:org.apache.hadoop.metrics.spi.NullContext
> > 2010-03-04 10:50:44,816 INFO
> > org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
> fsOwner=pubget,pubget
> > 2010-03-04 10:50:44,817 INFO
> > org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
> supergroup=supergroup
> > 2010-03-04 10:50:44,817 INFO
> > org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
> > isPermissionEnabled=true
> > 2010-03-04 10:50:44,823 INFO
> > org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics:
> > Initializing FSNamesystemMetrics using context
> > object:org.apache.hadoop.metrics.spi.NullContext
> > 2010-03-04 10:50:44,825 INFO
> > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered
> > FSNamesystemStatusMBean
> > 2010-03-04 10:50:44,849 INFO
> org.apache.hadoop.hdfs.server.common.Storage:
> > Number of files = 2687
> > 2010-03-04 10:50:45,092 INFO
> org.apache.hadoop.hdfs.server.common.Storage:
> > Number of files under construction = 7
> > 2010-03-04 10:50:45,095 INFO
> org.apache.hadoop.hdfs.server.common.Storage:
> > Image file of size 347821 loaded in 0 seconds.
> > 2010-03-04 10:50:45,104 INFO
> org.apache.hadoop.hdfs.server.common.Storage:
> > Edits file /mnt/hadoop/name/current/edits of size 4653 edits # 39 loaded
> in
> > 0 seconds.
> > 2010-03-04 10:50:45,114 ERROR
> > org.apache.hadoop.hdfs.server.namenode.NameNode:
> > java.lang.NumberFormatException: For input string: ""
> > at
> >
> >
> java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
> > at java.lang.Long.parseLong(Long.java:424)
> > at java.lang.Long.parseLong(Long.java:461)
> > at
> >
> >
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.readLong(FSEditLog.java:1273)
> > at
> >
> >
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:670)
> > at
> >
> >
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:997)
> > at
> >
> >
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:812)
> > at
> >
> >
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:364)
> > at
> >
> >
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:87)
> > at
> >
> >
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:311)
> > at
> >
> >
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:292)
> > at
> >
> >
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:201)
> > at
> > org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:279)
> > at
> >
> >
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:956)
> > at
> org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:965)
> >
> > 2010-03-04 10:50:45,115 INFO
> > org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
> > /
> > SHUTDOWN_MSG: Shutting down NameNode at
> carr.projectlounge.com/10.0.16.91
> > /
> >
>


can't start namenode

2010-03-04 Thread mike anderson
Our hadoop cluster went down last night when the namenode ran out of hard
drive space. Trying to restart fails with this exception (see below).

Since I don't really care that much about losing a day's worth of data or so,
I'm fine with blowing away the edits file if that's what it takes (we don't
have a secondary namenode to restore from). I tried removing the edits file
from the namenode directory, but then it complained about not finding an
edits file. I touched a blank edits file and I got the exact same exception.

Any thoughts? I googled around a bit, but to no avail.

-mike


2010-03-04 10:50:44,768 INFO org.apache.hadoop.ipc.metrics.RpcMetrics:
Initializing RPC Metrics with hostName=NameNode, port=54310
2010-03-04 10:50:44,772 INFO
org.apache.hadoop.hdfs.server.namenode.NameNode: Namenode up at:
carr.projectlounge.com/10.0.16.91:54310
2010-03-04 10:50:44,773 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
Initializing JVM Metrics with processName=NameNode, sessionId=null
2010-03-04 10:50:44,774 INFO
org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics: Initializing
NameNodeMeterics using context
object:org.apache.hadoop.metrics.spi.NullContext
2010-03-04 10:50:44,816 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner=pubget,pubget
2010-03-04 10:50:44,817 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup
2010-03-04 10:50:44,817 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
isPermissionEnabled=true
2010-03-04 10:50:44,823 INFO
org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics:
Initializing FSNamesystemMetrics using context
object:org.apache.hadoop.metrics.spi.NullContext
2010-03-04 10:50:44,825 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered
FSNamesystemStatusMBean
2010-03-04 10:50:44,849 INFO org.apache.hadoop.hdfs.server.common.Storage:
Number of files = 2687
2010-03-04 10:50:45,092 INFO org.apache.hadoop.hdfs.server.common.Storage:
Number of files under construction = 7
2010-03-04 10:50:45,095 INFO org.apache.hadoop.hdfs.server.common.Storage:
Image file of size 347821 loaded in 0 seconds.
2010-03-04 10:50:45,104 INFO org.apache.hadoop.hdfs.server.common.Storage:
Edits file /mnt/hadoop/name/current/edits of size 4653 edits # 39 loaded in
0 seconds.
2010-03-04 10:50:45,114 ERROR
org.apache.hadoop.hdfs.server.namenode.NameNode:
java.lang.NumberFormatException: For input string: ""
at
java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
at java.lang.Long.parseLong(Long.java:424)
at java.lang.Long.parseLong(Long.java:461)
at
org.apache.hadoop.hdfs.server.namenode.FSEditLog.readLong(FSEditLog.java:1273)
at
org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:670)
at
org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:997)
at
org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:812)
at
org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:364)
at
org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:87)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:311)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:292)
at
org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:201)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:279)
at
org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:956)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:965)

2010-03-04 10:50:45,115 INFO
org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
/
SHUTDOWN_MSG: Shutting down NameNode at carr.projectlounge.com/10.0.16.91
/
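
The recovery that eventually worked in this thread (see the replies above) was
to set aside the partially written edit log rather than format anything. A
rough sketch, assuming dfs.name.dir is /mnt/hadoop/name as in the log and using
a placeholder backup location; this is not a supported procedure, so back
everything up first:

$ cp -a /mnt/hadoop/name /tmp/name.backup      # copy the whole dfs.name.dir before touching it
$ mv /mnt/hadoop/name/current/edits.new /tmp/  # set aside the partial edit log
$ bin/start-dfs.sh                             # the NameNode then comes up in safe mode

From there the steps Todd lists apply: hadoop dfsadmin -safemode leave, then
hadoop fsck (with -delete) for the files whose blocks are gone.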


Re: syslog-ng and hadoop

2009-08-20 Thread mike anderson
I got it working! Fantastic. One thing that hung me up for a while was how
picky the log4j.properties files are about syntax. For future reference to
others, I used this in log4j.properties:
# Define the root logger to the system property "hadoop.root.logger".
log4j.rootLogger=${hadoop.root.logger}, EventCounter, Socket
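
The "Socket" appender named in that rootLogger line needs a matching appender
definition in the same log4j.properties. A minimal sketch, assuming a log4j
SocketServer (the class Edward mentions below) listening at 10.0.20.164:4560;
the host, port, and reconnection delay are placeholders, not values from this
thread:

log4j.appender.Socket=org.apache.log4j.net.SocketAppender
log4j.appender.Socket.RemoteHost=10.0.20.164
log4j.appender.Socket.Port=4560
log4j.appender.Socket.ReconnectionDelay=10000

4560 is SocketAppender's usual default port; the receiving SocketServer that
ships with log4j is configured separately, per the write-up Edward links below.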


On Thu, Aug 20, 2009 at 11:16 AM, Edward Capriolo wrote:

> On Thu, Aug 20, 2009 at 10:49 AM, mike anderson
> wrote:
> > Yeah, that is interesting Edward. I don't need syslog-ng for any
> particular
> > reason, other than that I'm familiar with it. If there were another way
> to
> > get all my logs collated into one log file that would be great.
> > mike
> >
> > On Thu, Aug 20, 2009 at 10:44 AM, Edward Capriolo  >wrote:
> >
> >> On Wed, Aug 19, 2009 at 11:50 PM, Brian Bockelman
> >> wrote:
> >> > Hey Mike,
> >> >
> >> > Yup.  We find the stock log4j needs two things:
> >> >
> >> > 1) Set the rootLogger manually.  The way 0.19.x has the root logger
> set
> >> up
> >> > breaks when adding new appenders.  I.e., do:
> >> >
> >> > log4j.rootLogger=INFO,SYSLOG,console,DRFA,EventCounter
> >> >
> >> > 2) Add the headers; otherwise log4j is not compatible with syslog:
> >> >
> >> > log4j.appender.SYSLOG=org.apache.log4j.net.SyslogAppender
> >> > log4j.appender.SYSLOG.facility=local0
> >> > log4j.appender.SYSLOG.layout=org.apache.log4j.PatternLayout
> >> > log4j.appender.SYSLOG.layout.ConversionPattern=%p %c{2}: %m%n
> >> > log4j.appender.SYSLOG.SyslogHost=red
> >> > log4j.appender.SYSLOG.threshold=ERROR
> >> > log4j.appender.SYSLOG.Header=true
> >> > log4j.appender.SYSLOG.FacilityPrinting=true
> >> >
> >> > Brian
> >> >
> >> > On Aug 19, 2009, at 6:32 PM, Mike Anderson wrote:
> >> >
> >> >> Has anybody had any luck setting up the log4j.properties file to send
> >> logs
> >> >> to a syslog-ng server?
> >> >> My log4j.properties excerpt:
> >> >> log4j.appender.SYSLOG=org.apache.log4j.net.SyslogAppender
> >> >> log4j.appender.SYSLOG.syslogHost=10.0.20.164
> >> >> log4j.appender.SYSLOG.layout=org.apache.log4j.PatternLayout
> >> >> log4j.appender.SYSLOG.layout.ConversionPattern=%d{ISO8601} %p %c:
> %m%n
> >> >> log4j.appender.SYSLOG.Facility=HADOOP
> >> >>
> >> >> and my syslog-ng.conf file running on 10.0.20.164
> >> >>
> >> >> source s_hadoop {
> >> >>   # message generated by Syslog-NG
> >> >>   internal();
> >> >>   # standard Linux log source (this is the default place for the
> >> >> syslog()
> >> >>   # function to send logs to)
> >> >>   unix-stream("/dev/log");
> >> >>   udp();
> >> >> };
> >> >> destination df_hadoop { file("/var/log/hadoop/hadoop.log");};
> >> >> filter f_hadoop {facility(hadoop);};
> >> >> log {
> >> >> source(s_hadoop);
> >> >> filter(f_hadoop);
> >> >> destination(df_hadoop);
> >> >> };
> >> >>
> >> >>
> >> >> Thanks in advance,
> >> >> Mike
> >> >
> >> >
> >>
> >> Mike slightly off topic but you can also run a Log 4J server which
> >> perfectly transports the messages fired off by LOG4j. The
> >> log4J->syslog loses/ changes some information. If anyone is interested
> >> in this let me know and I will write up something about it.
> >>
> >
>
> Mike,
> I just put this up for you.
> http://www.edwardcapriolo.com/wiki/en/Log4j_Server
>
> All of the functionality is in the class
> org.apache.log4j.net.SocketServer which ships as part of Log4j.
>
> I pretty much followed this http://timarcher.com/node/10
>
> I started with the syslog appender but it had some quirks. Mostly the
> syslog appender can only write a syslog so it loses some information.
> The Log4j server transfers the log.error("whatever") as is and can
> handle it on the server end through the server's logging properties.
> Cool stuff.
>


Re: syslog-ng and hadoop

2009-08-20 Thread mike anderson
Yeah, that is interesting Edward. I don't need syslog-ng for any particular
reason, other than that I'm familiar with it. If there were another way to
get all my logs collated into one log file that would be great.
mike

On Thu, Aug 20, 2009 at 10:44 AM, Edward Capriolo wrote:

> On Wed, Aug 19, 2009 at 11:50 PM, Brian Bockelman
> wrote:
> > Hey Mike,
> >
> > Yup.  We find the stock log4j needs two things:
> >
> > 1) Set the rootLogger manually.  The way 0.19.x has the root logger set
> up
> > breaks when adding new appenders.  I.e., do:
> >
> > log4j.rootLogger=INFO,SYSLOG,console,DRFA,EventCounter
> >
> > 2) Add the headers; otherwise log4j is not compatible with syslog:
> >
> > log4j.appender.SYSLOG=org.apache.log4j.net.SyslogAppender
> > log4j.appender.SYSLOG.facility=local0
> > log4j.appender.SYSLOG.layout=org.apache.log4j.PatternLayout
> > log4j.appender.SYSLOG.layout.ConversionPattern=%p %c{2}: %m%n
> > log4j.appender.SYSLOG.SyslogHost=red
> > log4j.appender.SYSLOG.threshold=ERROR
> > log4j.appender.SYSLOG.Header=true
> > log4j.appender.SYSLOG.FacilityPrinting=true
> >
> > Brian
> >
> > On Aug 19, 2009, at 6:32 PM, Mike Anderson wrote:
> >
> >> Has anybody had any luck setting up the log4j.properties file to send
> logs
> >> to a syslog-ng server?
> >> My log4j.properties excerpt:
> >> log4j.appender.SYSLOG=org.apache.log4j.net.SyslogAppender
> >> log4j.appender.SYSLOG.syslogHost=10.0.20.164
> >> log4j.appender.SYSLOG.layout=org.apache.log4j.PatternLayout
> >> log4j.appender.SYSLOG.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
> >> log4j.appender.SYSLOG.Facility=HADOOP
> >>
> >> and my syslog-ng.conf file running on 10.0.20.164
> >>
> >> source s_hadoop {
> >>   # message generated by Syslog-NG
> >>   internal();
> >>   # standard Linux log source (this is the default place for the
> >> syslog()
> >>   # function to send logs to)
> >>   unix-stream("/dev/log");
> >>   udp();
> >> };
> >> destination df_hadoop { file("/var/log/hadoop/hadoop.log");};
> >> filter f_hadoop {facility(hadoop);};
> >> log {
> >> source(s_hadoop);
> >> filter(f_hadoop);
> >> destination(df_hadoop);
> >> };
> >>
> >>
> >> Thanks in advance,
> >> Mike
> >
> >
>
> Mike slightly off topic but you can also run a Log 4J server which
> perfectly transports the messages fired off by LOG4j. The
> log4J->syslog loses/ changes some information. If anyone is interested
> in this let me know and I will write up something about it.
>


syslog-ng and hadoop

2009-08-19 Thread Mike Anderson
Has anybody had any luck setting up the log4j.properties file to send logs
to a syslog-ng server?
My log4j.properties excerpt:
log4j.appender.SYSLOG=org.apache.log4j.net.SyslogAppender
log4j.appender.SYSLOG.syslogHost=10.0.20.164
log4j.appender.SYSLOG.layout=org.apache.log4j.PatternLayout
log4j.appender.SYSLOG.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
log4j.appender.SYSLOG.Facility=HADOOP

and my syslog-ng.conf file running on 10.0.20.164

source s_hadoop {
    # message generated by Syslog-NG
    internal();
    # standard Linux log source (this is the default place for the syslog()
    # function to send logs to)
    unix-stream("/dev/log");
    udp();
};
destination df_hadoop { file("/var/log/hadoop/hadoop.log");};
filter f_hadoop {facility(hadoop);};
log {
    source(s_hadoop);
    filter(f_hadoop);
    destination(df_hadoop);
};


Thanks in advance,
Mike