Re: missing job history and strange MR job output

2012-01-16 Thread Ioan Eugen Stan

On 13.01.2012 06:00, Harsh J wrote:

Perhaps you aren't writing it properly? It's hard to tell what your
problem may be without looking at some code snippets
(sensitive/irrelevant parts may be cut out, or even pseudocode typed
up is fine), etc.



Hello Harsh and others,

It's fixed. After resolving a childish bug on my part (in building the 
Scan object) I still had problems with the setup. It ran everything up 
until waitForCompletion(), where it hung. I checked the logs and they 
barely showed any output from the mini MapReduce cluster. Just a few 
lines announcing the start of the TaskTrackers and JobTrackers, etc.


Removing the local Maven repository finally solved the issue, and now I 
can happily continue coding.


It seems that periodically cleaning the Maven repo is a must these days.

Thanks for the support,

--
Ioan Eugen Stan
http://ieugen.blogspot.com


Re: hadoop filesystem cache

2012-01-16 Thread Rita
Thanks. I believe this is a good feature to have for clients, especially if
you are reading the same large file over and over.


On Sun, Jan 15, 2012 at 7:33 PM, Todd Lipcon  wrote:

> There is some work being done in this area by some folks over at UC
> Berkeley's AMP Lab in coordination with Facebook. I don't believe it
> has been published quite yet, but the title of the project is "PACMan"
> -- I expect it will be published soon.
>
> -Todd
>
> On Sat, Jan 14, 2012 at 5:30 PM, Rita  wrote:
> > After reading this article,
> > http://www.cloudera.com/blog/2012/01/caching-in-hbase-slabcache/ , I was
> > wondering if there was a filesystem cache for HDFS. For example, if a large
> > file (10 gigabytes) keeps getting accessed on the cluster, instead of
> > fetching it from the network every time, why not store the contents of the
> > file locally on the client itself? A use case on the client would be like this:
> >
> >
> >
> > <property>
> >   <name>dfs.client.cachedirectory</name>
> >   <value>/var/cache/hdfs</value>
> > </property>
> >
> >
> > <property>
> >   <name>dfs.client.cachesize</name>
> >   <description>in megabytes</description>
> >   <value>10</value>
> > </property>
> >
> >
> > Any thoughts on a feature like this?
> >
> >
> > --
> > --- Get your facts first, then you can distort them as you please.--
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>



-- 
--- Get your facts first, then you can distort them as you please.--


Re: Username on Hadoop 20.2

2012-01-16 Thread Eli Finkelshteyn

Hi Folks,
I'm still lost on this. Has no one wanted or needed to connect to a 
Hadoop cluster from a client machine under a name other than the 
client's whoami before?


Eli

On 1/13/12 11:00 AM, Eli Finkelshteyn wrote:
I tried this, and it doesn't seem to work. Specifically, the way I 
tested it was adding:

<property>
  <name>user.name</name>
  <value>[my_username]</value>
</property>

to core-site.xml. I then tried a test mkdir, and did an ls which 
showed the new folder had been created by my default whoami client 
username instead of the new one I had set. Do I need to add it 
somewhere else, or add something else to the property name? I'm using 
CDH3 with my Hadoop cluster currently set up with one node in 
pseudo-distributed mode, in case that helps.


Cheers,
Eli

On 1/12/12 5:39 PM, Joey Echeverria wrote:

Set the user.name property in your core-site.xml on your client nodes.

-Joey

On Thu, Jan 12, 2012 at 3:55 PM, Eli 
Finkelshteyn  wrote:

Hi,
If I have one username on a Hadoop cluster and would like to set myself up
to use that same username from every client from which I access the cluster,
how can I go about doing that? I found information about setting
hadoop.job.ugi, and have tried setting this property variously in
hdfs-site.xml, core-site.xml, and mapred-site.xml, but nothing seems to
work. All I want is to be able to look like the same user no matter which of
my machines I connect to the cluster from. How can I do this?

Thanks!
Eli









Re: Username on Hadoop 20.2

2012-01-16 Thread Joey Echeverria
(-common-user, +cdh-user)

I'm moving the discussion since this is a CDH-specific issue. Setting
user.name works for plain 0.20.2, but not for the CDH version, as it has been
modified to support enabling Kerberos security. You'll need to modify your
code to use something like this:

UserGroupInformation.createRemoteUser("cuser").doAs(new
PrivilegedExceptionAction()... {
 void run() {
   // submit my evil job
 }
};

Where cuser is the name of the user you want everything to run as on the
cluster.
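
For reference, a fuller, self-contained sketch of the same pattern (the class
name, job name, and job configuration below are placeholders; the new
org.apache.hadoop.mapreduce API is assumed):

import java.security.PrivilegedExceptionAction;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.security.UserGroupInformation;

public class RunJobAsCuser {
  public static void main(String[] args) throws Exception {
    UserGroupInformation ugi = UserGroupInformation.createRemoteUser("cuser");
    Boolean succeeded = ugi.doAs(new PrivilegedExceptionAction<Boolean>() {
      public Boolean run() throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "my job");   // placeholder job
        // set mapper, reducer, input and output paths here
        return job.waitForCompletion(true);  // runs on the cluster as "cuser"
      }
    });
    System.exit(succeeded ? 0 : 1);
  }
}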

-Joey

On Mon, Jan 16, 2012 at 12:02 PM, Eli Finkelshteyn wrote:

> [Eli's message and the earlier replies in this thread, quoted in full above, snipped]


-- 
Joseph Echeverria
Cloudera, Inc.
443.305.9434


Re: hadoop filesystem cache

2012-01-16 Thread Edward Capriolo
The challenge with this design is that accessing the same data over and
over again is the uncommon use case for Hadoop. Hadoop's bread and butter is
streaming through large datasets that do not fit in memory. Also, your
shuffle-sort-spill is going to play havoc with any filesystem-based cache.
The distributed cache roughly fits this role, except that it does not
persist after a job.

Replicating content to N nodes is also not a hard problem to tackle (you
can hack up a content delivery system with ssh+rsync) and get similar
results. The approach often taken has been to keep data that is accessed
repeatedly and fits in memory in some other system
(HBase/Cassandra/MySQL/whatever).
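
As an aside, a minimal sketch of the per-job distributed cache Edward mentions
might look like this (the HDFS path and class name are made up; the old
org.apache.hadoop.mapred API is assumed):

import java.net.URI;

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.mapred.JobConf;

public class CacheFileJob {
  public static void main(String[] args) {
    JobConf conf = new JobConf(CacheFileJob.class);
    // Ship one HDFS file to every task's local disk; the local copy only
    // lives for the duration of this job.
    DistributedCache.addCacheFile(URI.create("/user/rita/lookup.dat"), conf);
    // ... set mapper/reducer/input/output here and submit the job ...
    // Inside a Mapper's configure(JobConf job), the local copies are
    // available via DistributedCache.getLocalCacheFiles(job).
  }
}

As Edward notes, though, the cached copies disappear once the job finishes, so
this does not help across jobs.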

Edward


On Mon, Jan 16, 2012 at 11:33 AM, Rita  wrote:

> [Rita's and Todd's messages, quoted in full above, snipped]


failed to build trunk, what's wrong?

2012-01-16 Thread smith jack
I ran mvn compile and it failed :(
JDK version is "1.6.0_23"
Maven version is Apache Maven 3.0.3
[ERROR] Failed to execute goal
org.apache.maven.plugins:maven-antrun-plugin:1.6:run (compile-proto) on
project hadoop-common: An Ant BuildException has occured: exec returned:
127 -> [Help 1]
org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute
goal org.apache.maven.plugins:maven-antrun-plugin:1.6:run (compile-proto)
on project hadoop-common: An Ant BuildException has occured: exec returned:
127
at
org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:217)
at
org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153)
at
org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145)
at
org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:84)
at
org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:59)
at
org.apache.maven.lifecycle.internal.LifecycleStarter.singleThreadedBuild(LifecycleStarter.java:183)
at
org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:161)
at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:319)
at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:156)
at org.apache.maven.cli.MavenCli.execute(MavenCli.java:537)
at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:196)
at org.apache.maven.cli.MavenCli.main(MavenCli.java:141)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at
org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:290)
at
org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:230)
at
org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:409)
at
org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:352)
Caused by: org.apache.maven.plugin.MojoExecutionException: An Ant
BuildException has occured: exec returned: 127
at
org.apache.maven.plugin.antrun.AntRunMojo.execute(AntRunMojo.java:283)
at
org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:101)
at
org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:209)
... 19 more
Caused by:
/home/jack/home/download/build/hadoop-common/hadoop-common-project/hadoop-common/target/antrun/build-main.xml:23:
exec returned: 127
at
org.apache.tools.ant.taskdefs.ExecTask.runExecute(ExecTask.java:650)
at org.apache.tools.ant.taskdefs.ExecTask.runExec(ExecTask.java:676)
at org.apache.tools.ant.taskdefs.ExecTask.execute(ExecTask.java:502)
at
org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:291)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at
org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:106)
at org.apache.tools.ant.Task.perform(Task.java:348)
at org.apache.tools.ant.Target.execute(Target.java:390)
at org.apache.tools.ant.Target.performTasks(Target.java:411)
at
org.apache.tools.ant.Project.executeSortedTargets(Project.java:1397)
at org.apache.tools.ant.Project.executeTarget(Project.java:1366)
at
org.apache.maven.plugin.antrun.AntRunMojo.execute(AntRunMojo.java:270)
... 21 more
[ERROR]
[ERROR]
[ERROR] For more information about the errors and possible solutions,
please read the following articles:
[ERROR] [Help 1]
http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the
command
[ERROR]   mvn  -rf :hadoop-common


Re: failed to build trunk, what's wrong?

2012-01-16 Thread Ronald Petty
Hello,

If you type protoc on the command line, is it found? (An exec exit code of 127
usually means the command was not found, and the compile-proto step runs
Protocol Buffers' protoc.)

Kindest regards.

Ron

On Sat, Jan 14, 2012 at 5:52 PM, smith jack  wrote:

> [mvn compile error output, quoted in full above, snipped]


effect on data after topology change

2012-01-16 Thread rk vishu
Hello All,

If I change the rack id for some nodes and restart the namenode, will data be
rearranged accordingly? Do I need to run the rebalancer?

Any information on this would be appreciated.

Thanks and Regards
Ravi


small files problem in hdfs

2012-01-16 Thread rk vishu
Hello All,

Could anyone give me some information on how Flume handles small files? If
Flume agents are set up for text log files, how will Flume ensure that there
are not many small files? I believe waiting for a fixed time before pumping
to HDFS may not guarantee block-sized files.

I am trying to write a client app to collect data into HDFS directly using the
Java APIs. I am sure I will come across this issue. Are there any utilities
or tricks to combine files on HDFS into larger files (without an MR job)?

Any help will be greatly appreciated

-R


Re: small files problem in hdfs

2012-01-16 Thread W.P. McNeill
Write a Hadoop job that uses the default mapper and reducer. Specify the
number of reducers when you run it, and it will produce that many output
files, grouping input files together as necessary.
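
To make that concrete, here is a minimal sketch of such a job (the class name
and argument handling are made up; the new org.apache.hadoop.mapreduce API is
assumed). Note that with the default TextInputFormat/TextOutputFormat each
output line is prefixed with its byte-offset key and a tab, so in practice you
may want a tiny mapper that emits NullWritable keys instead:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MergeSmallFiles {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = new Job(conf, "merge small files");
    job.setJarByClass(MergeSmallFiles.class);
    // No mapper or reducer classes set: the defaults pass records through.
    job.setNumReduceTasks(4);                  // 4 reducers -> 4 output files
    job.setOutputKeyClass(LongWritable.class); // TextInputFormat's key type
    job.setOutputValueClass(Text.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // dir of small files
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // merged output dir
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}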


Can you unset a mapred.input.dir configuration value?

2012-01-16 Thread W.P. McNeill
Is it possible to unset a configuration value? I think the answer is no,
but I want to be sure.

I know that you can set a configuration value to the empty string, but I
have a scenario in which that is not an option. I have a top level Hadoop
Tool that launches a series of other Hadoop jobs in its run() method. The
output of the first sub-job becomes the input of the second one and so on.
The top-level Tool takes a configuration file which specifies parameters
used by all the sub-jobs. It also specifies a mapred.input.dir value which
serves as the input directory to the first sub-job.

TopLevelJob() {
 job1 = createJob1(configuration);
 // Run job 1
 job2 = createJob2();
 FileInputFormat.addInputPath(configuration, job1-output)
 // Run job 2
}

The problem is that addInputPath() appends a value to the end of
mapred.input.dir, erroneously leaving the input directory for Job 1 on the
list for Job 2. If I try to delete Job 1's input dir by setting
mapred.input.dir to the empty string like so:

configuration.set("mapred.input.dir", "")

the addInputPath() method appends the input path, giving the value
",job1-output". The first element of this list is the empty string, which
causes an Exception.

I can work around this by calling configuration.set("mapred.input.dir")
directly when creating Job 2, but this feels like a hack. It seems like the
proper way to set input paths is via a FileInputFormat method instead of by
setting the property directly.


Re: Can you unset a mapred.input.dir configuration value?

2012-01-16 Thread Joey Echeverria
You can use FileInputFormat.setInputPaths(configuration,
job1-output). This will overwrite the old input path(s).
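
In other words, inside the driver from the original message the second job's
input can be set like this (createJob2() and the output path are placeholders
from that pseudocode; the new org.apache.hadoop.mapreduce API is assumed):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

// ... inside the top-level Tool's run() method ...
Job job2 = createJob2();
// setInputPaths() replaces whatever is already in mapred.input.dir instead of
// appending to it, so job1's input directory does not carry over into job2.
FileInputFormat.setInputPaths(job2, new Path("/path/to/job1-output"));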

-Joey

On Mon, Jan 16, 2012 at 7:16 PM, W.P. McNeill  wrote:
> [W.P. McNeill's message, quoted in full above, snipped]




--
Joseph Echeverria
Cloudera, Inc.
443.305.9434


Re: How to find out whether a node is Overloaded from Cpu utilization ?

2012-01-16 Thread Amandeep Khurana
Arun,

I don't think you'll hear a fixed number. Having said that, I have seen CPU
being pegged at 95% during jobs and the cluster working perfectly fine. On
the slaves, if you have nothing else going on, Hadoop only has TaskTrackers
and DataNodes. Those two daemons are relatively lightweight in terms of
CPU for the most part. So, you can afford to let your tasks take up a high
%.

Hope that helps.

-Amandeep

On Tue, Jan 17, 2012 at 2:16 PM, ArunKumar  wrote:

> Hi  Guys !
>
> When we get the CPU utilization value of a node in a Hadoop cluster, what
> percent value can be considered overloaded?
> Say, for example:
>
>    CPU utilization    Node status
>    85%                Overloaded
>    20%                Normal
>
>
> Arun
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/How-to-find-out-whether-a-node-is-Overloaded-from-Cpu-utilization-tp3665289p3665289.html
> Sent from the Hadoop lucene-users mailing list archive at Nabble.com.
>


Best practices to recover from Corrupt Namenode

2012-01-16 Thread praveenesh kumar
Hi guys,

I just faced a weird situation, in which one of my hard disks on a DN went
down.
Because of that, when I restarted the namenode, some of the blocks went missing
and it was saying my namenode was CORRUPT and in safe mode, which doesn't allow
you to add or delete any files on HDFS.

I know we can turn off the safe mode part.
The problem is how to deal with the Corrupt Namenode problem in this case --
best practices.

In my case, I was lucky that all the missing blocks were those of the outputs
of M/R jobs I had run previously.
So I just deleted all the files with missing blocks from HDFS to go from the
CORRUPT state to the HEALTHY state.

But had they been large input data files, deleting those files would not have
been a good solution.

So I wanted to know: what are the best practices for dealing with this kind of
problem, to go from CORRUPT NAMENODE --> HEALTHY NAMENODE?

Thanks,
Praveenesh


Re: Best practices to recover from Corrupt Namenode

2012-01-16 Thread Harsh J
You ran into a corrupt files issue, not a namenode corruption (which generally 
refers to the fsimage or edits getting corrupted).

Did your files not have adequate replication that they could not withstand the 
loss of one DN's disk? What exactly did fsck output? Did all block replicas go 
missing for your files?

On 17-Jan-2012, at 12:08 PM, praveenesh kumar wrote:

> [praveenesh's message, quoted in full above, snipped]

--
Harsh J
Customer Ops. Engineer, Cloudera



Re: Best practices to recover from Corrupt Namenode

2012-01-16 Thread praveenesh kumar
I have a replication factor of 2, because I cannot afford 3 replicas on my
cluster.
The fsck output said that block replicas were missing for some files, which was
what marked the namenode corrupt.
I don't have the output with me, but the issue was that block replicas were
missing. How can we tackle that?

Is there an internal mechanism for creating new replicas if they are found
missing, some kind of refresh command or something?


Thanks,
Praveenesh

On Tue, Jan 17, 2012 at 12:48 PM, Harsh J  wrote:

> [Harsh's reply and the earlier messages in this thread, quoted in full above, snipped]