Re: Secondary namenode fsimage concept

2011-10-05 Thread Kai Voigt
Hi,

The secondary namenode only fetches the two files when a checkpoint is 
needed.

Kai

Am 06.10.2011 um 08:45 schrieb shanmuganathan.r:

> Hi Kai,
> 
>  In the Second part I meant 
> 
> 
> Does the secondary namenode also contain the FSImage file, or are the two 
> files (FSImage and EditLog) transferred from the namenode at 
> checkpoint time?
> 
> 
> Thanks 
> Shanmuganathan
> 
> 
> 
> 
> 
>  On Thu, 06 Oct 2011 11:37:50 +0530 Kai Voigt wrote 
>  
> 
> 
> Hi, 
> 
> you're correct when saying the namenode hosts the fsimage file and the edits 
> log file. 
> 
> The fsimage file contains a snapshot of the HDFS metadata (a filename to 
> blocks list mapping). Whenever there is a change to HDFS, it will be appended 
> to the edits file. Think of it as a database transaction log, where changes 
> will not be applied to the datafile, but appended to a log. 
> 
> To prevent the edits file from growing indefinitely, the secondary namenode 
> periodically pulls these two files, and the namenode starts writing changes 
> to a new edits file. Then, the secondary namenode merges the changes from the 
> edits file with the old snapshot from the fsimage file and creates an updated 
> fsimage file. This updated fsimage file is then copied to the namenode. 
> 
> Then, the entire cycle starts again. To answer your question: The namenode 
> has both files, even if the secondary namenode is running on a different 
> machine. 
> 
> Kai 
> 
> Am 06.10.2011 um 07:57 schrieb shanmuganathan.r: 
> 
> > 
> > Hi All, 
> > 
> > I have a doubt about the hadoop secondary namenode concept. Please correct me if 
> the following statements are wrong. 
> > 
> > 
> > The namenode hosts the fsimage and edit log files. The secondary 
> namenode hosts the fsimage file only. At checkpoint time the edit log 
> file is transferred to the secondary namenode, both files are merged, and 
> then the updated fsimage file is transferred to the namenode. Is that correct? 
> > 
> > 
> > If we run the secondary namenode on a separate machine, then both 
> machines contain the fsimage file and the namenode only contains the edit log file. 
> Is that true? 
> > 
> > 
> > 
> > Thanks R.Shanmuganathan 
> > 
> > 
> > 
> > 
> > 
> > 
> 
> -- 
> Kai Voigt 
> k...@123.org 
> 
> 
> 
> 
> 
> 
> 

-- 
Kai Voigt
k...@123.org






Re: Secondary namenode fsimage concept

2011-10-05 Thread shanmuganathan.r
Hi Kai,

  In the Second part I meant 


Does the secondary namenode also contain the FSImage file, or are the two 
files (FSImage and EditLog) transferred from the namenode at checkpoint 
time?


Thanks 
Shanmuganathan





 On Thu, 06 Oct 2011 11:37:50 +0530 Kai Voigt wrote 
 


Hi, 
 
you're correct when saying the namenode hosts the fsimage file and the edits 
log file. 
 
The fsimage file contains a snapshot of the HDFS metadata (a filename to blocks 
list mapping). Whenever there is a change to HDFS, it will be appended to the 
edits file. Think of it as a database transaction log, where changes will not 
be applied to the datafile, but appended to a log. 
 
To prevent the edits file from growing indefinitely, the secondary namenode 
periodically pulls these two files, and the namenode starts writing changes to 
a new edits file. Then, the secondary namenode merges the changes from the 
edits file with the old snapshot from the fsimage file and creates an updated 
fsimage file. This updated fsimage file is then copied to the namenode. 
 
Then, the entire cycle starts again. To answer your question: The namenode has 
both files, even if the secondary namenode is running on a different machine. 
 
Kai 
 
Am 06.10.2011 um 07:57 schrieb shanmuganathan.r: 
 
> 
> Hi All, 
> 
> I have a doubt about the hadoop secondary namenode concept. Please correct me if 
the following statements are wrong. 
> 
> 
> The namenode hosts the fsimage and edit log files. The secondary namenode 
hosts the fsimage file only. At checkpoint time the edit log file is 
transferred to the secondary namenode, both files are merged, and then the 
updated fsimage file is transferred to the namenode. Is that correct? 
> 
> 
> If we run the secondary namenode on a separate machine, then both machines 
contain the fsimage file and the namenode only contains the edit log file. Is that true? 
> 
> 
> 
> Thanks R.Shanmuganathan 
> 
> 
> 
> 
> 
> 
 
-- 
Kai Voigt 
k...@123.org 
 
 
 
 





Re: Secondary namenode fsimage concept

2011-10-05 Thread Kai Voigt
Hi,

you're correct when saying the namenode hosts the fsimage file and the edits 
log file.

The fsimage file contains a snapshot of the HDFS metadata (a filename to blocks 
list mapping). Whenever there is a change to HDFS, it will be appended to the 
edits file. Think of it as a database transaction log, where changes will not 
be applied to the datafile, but appended to a log.

To prevent the edits file from growing indefinitely, the secondary namenode 
periodically pulls these two files, and the namenode starts writing changes to 
a new edits file. Then, the secondary namenode merges the changes from the 
edits file with the old snapshot from the fsimage file and creates an updated 
fsimage file. This updated fsimage file is then copied to the namenode.

Then, the entire cycle starts again. To answer your question: The namenode has 
both files, even if the secondary namenode is running on a different machine.
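
For readers who want to see where that checkpoint cadence is controlled, the 0.20-era 
configuration exposes it through a few properties. A minimal sketch, assuming the 
property names of that release line (fs.checkpoint.period, fs.checkpoint.size and 
fs.checkpoint.dir); the values are illustrative defaults rather than recommendations:

  <!-- core-site.xml on the secondary namenode (illustrative values) -->
  <property>
    <name>fs.checkpoint.period</name>
    <value>3600</value>   <!-- seconds between checkpoints -->
  </property>
  <property>
    <name>fs.checkpoint.size</name>
    <value>67108864</value>   <!-- edits size (bytes) that forces an early checkpoint -->
  </property>
  <property>
    <name>fs.checkpoint.dir</name>
    <value>/hadoop/dfs/namesecondary</value>   <!-- where the merged fsimage is staged -->
  </property>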

Kai

Am 06.10.2011 um 07:57 schrieb shanmuganathan.r:

> 
> Hi All,
> 
> I have a doubt about the hadoop secondary namenode concept. Please 
> correct me if the following statements are wrong.
> 
> 
> The namenode hosts the fsimage and edit log files. The secondary namenode 
> hosts the fsimage file only. At checkpoint time the edit log file is 
> transferred to the secondary namenode, both files are merged, and then the 
> updated fsimage file is transferred to the namenode. Is that correct?
> 
> 
> If we run the secondary namenode on a separate machine, then both machines 
> contain the fsimage file and the namenode only contains the edit log file. Is that 
> true?
> 
> 
> 
> Thanks R.Shanmuganathan  
> 
> 
> 
> 
> 
> 

-- 
Kai Voigt
k...@123.org






Secondary namenode fsimage concept

2011-10-05 Thread shanmuganathan.r

Hi All,

I have a doubt about the hadoop secondary namenode concept. Please 
correct me if the following statements are wrong.


The namenode hosts the fsimage and edit log files. The secondary namenode hosts 
the fsimage file only. At checkpoint time the edit log file is 
transferred to the secondary namenode, both files are merged, and then the 
updated fsimage file is transferred to the namenode. Is that correct?


If we run the secondary namenode on a separate machine, then both machines 
contain the fsimage file and the namenode only contains the edit log file. Is that true?



Thanks R.Shanmuganathan  








How to evenly split data file

2011-10-05 Thread Thomas Anderson
I don't use MapReduce; I'm just practicing with the Hadoop common API to
manually split a data file in which the data is stored in SequenceFile form
(as read by SequenceFileInputFormat).

The file is split by dividing the file length by the total number of
tasks. The InputSplit that is created is passed to a RecordReader, which reads from
the designated path. The code is as below:

    private void readPartOfDataFile() throws IOException {
      int taskid = getTaskId();
      InputSplit split = getSplit(taskid);
      // read every record in the assigned split and count them
      SequenceFileRecordReader<Text, CustomData> input =
          new SequenceFileRecordReader<Text, CustomData>(conf, (FileSplit) split);
      Text url = input.createKey();
      CustomData d = input.createValue();
      int count = 0;
      while (input.next(url, d)) {
        count++;
      }
    }

    private InputSplit getSplit(final int taskid) throws IOException {
      FileSystem fs = FileSystem.get(conf);
      Path filePath = new Path("path/to/", "file");
      FileStatus[] status = fs.listStatus(filePath);
      int maxTasks = conf.getInt("test.maxtasks", 12);
      for (FileStatus file : status) {
        if (file.isDir()) { // get data file
          Path dataFile = new Path(file.getPath(), "data");
          FileStatus data = fs.getFileStatus(dataFile);
          long dataLength = data.getLen();
          BlockLocation[] locations =
              fs.getFileBlockLocations(data, 0, dataLength);
          if (0 < dataLength) {
            // carve the file into maxTasks byte ranges; the last task absorbs the remainder
            long chunk = dataLength / (long) maxTasks;
            long beg = (taskid * chunk) + (long) 1;
            long end = (taskid + 1) * chunk;
            if (maxTasks == (taskid + 1)) {
              end = dataLength;
            }
            return new FileSplit(dataFile, beg, end,
                locations[locations.length - 1].getHosts());
          } else {
            LOG.info("No data for file: " + file.getPath());
          }
        } // is dir
      } // for
      return null;
    }

However, it seems that the records read from the data file are not equally
distributed. For instance, the data file may contain 1200 records and the data
length is around 74250 bytes. With 12 max tasks, each task should roughly hold
around 6187 bytes (per split). But the counts show that
each task reads a varying number of records (e.g. task 4 reads 526 records,
task 5 reads 632, task 6 reads 600), and the total record count across tasks is
larger than the total number of records stored. I checked
JobClient.writeOldSplits(), and it seems similar to the way JobClient
divides data. What am I missing when splitting data with the hadoop
common api?
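
To make the arithmetic above concrete, here is a small stand-alone sketch; the
74250-byte length and 12 tasks are the figures quoted in this message, and the
formulas mirror getSplit() above:

public class SplitArithmetic {
  public static void main(String[] args) {
    long dataLength = 74250L;                    // data file length quoted above
    int maxTasks = 12;
    long chunk = dataLength / maxTasks;          // 6187 bytes per split
    for (int taskid = 0; taskid < maxTasks; taskid++) {
      long beg = (taskid * chunk) + 1L;          // same start formula as getSplit()
      long end = (taskid + 1 == maxTasks) ? dataLength : (taskid + 1) * chunk;
      System.out.println("task " + taskid + ": bytes " + beg + " - " + end);
    }
  }
}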


Re: problem while running wordcount on lion x

2011-10-05 Thread Jignesh Patel
Thanks for this information. The jar problem is resolved now, but I get a new error:

Exception in thread "main" 
org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does 
not exist: hdfs://localhost:9000/user/hadoop-user/input

I know I don't have an input directory, but before that I need to find out where 
this user/hadoop-user directory got created, because I am not able to find it.
My root is /Users/Hadoop-User.
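
For context, /user/hadoop-user is a path inside the HDFS namespace, not on the local
Mac filesystem, which is why it does not show up under /Users. A hedged sketch of the
fs shell commands that would create and populate the missing input directory (file
names are illustrative):

  bin/hadoop fs -ls /user/hadoop-user                    # inspect the HDFS home directory
  bin/hadoop fs -mkdir /user/hadoop-user/input           # create the input directory in HDFS
  bin/hadoop fs -put sample.txt /user/hadoop-user/input  # copy a local file into it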

-Jignesh

On Oct 5, 2011, at 8:35 PM, Brock Noland wrote:

> Hi,
> 
> Answers, inline.
> 
> On Wed, Oct 5, 2011 at 7:31 PM, Jignesh Patel  wrote:
> 
>> have used eclipse to export the file and then got following error
>> 
>> hadoop-user$ bin/hadoop jar
>> wordcountsmp/wordcount.jarorg.apache.hadoop.examples.WordCount input output
>> 
>> 
>> Exception in thread "main" java.io.IOException: Error opening job jar:
>> wordcountsmp/wordcount.jarorg.apache.hadoop.examples.WordCount
>>   at org.apache.hadoop.util.RunJar.main(RunJar.java:90)
>> Caused by: java.util.zip.ZipException: error in opening zip file
>>   at java.util.zip.ZipFile.open(Native Method)
>>   at java.util.zip.ZipFile.(ZipFile.java:127)
>>   at java.util.jar.JarFile.(JarFile.java:135)
>>   at java.util.jar.JarFile.(JarFile.java:72)
>>   at org.apache.hadoop.util.RunJar.main(RunJar.java:88)
>> 
>> 
> OK, the problem above is that you are missing a space, it should be:
> 
> hadoop-user$ bin/hadoop jar wordcountsmp/wordcount.jar
> org.apache.hadoop.examples.WordCount input output
> 
> with a space between the jar and the class name.
> 
> 
>> I tried following
>> java -jar xf wordcountsmp/wordcount.jar
>> 
> 
> That's not how you extract a jar. It should be:
> 
> jar tf wordcountsmp/wordcount.jar
> 
> to get a listing of the jar and:
> 
> jar xf wordcountsmp/wordcount.jar
> 
> To extract it.
> 
> 
>> and got the error
>> 
>> Unable to access jar file xf
>> 
>> my jar file size is 5kb. I am feeling somehow eclipse export in macOS is
>> not creating appropriate jar.
>> 
>> 
>> 
>> 
>> On Oct 5, 2011, at 8:16 PM, Brock Noland wrote:
>> 
>>> Hi,
>>> 
>>> On Wed, Oct 5, 2011 at 7:13 PM, Jignesh Patel 
>> wrote:
 
 
 I also found another problem if I directly export from eclipse as a jar
 file then while trying javac -jar or hadoop -jar doesn't recognize that
>> jar.
 However same jar works well with windows.
>>> 
>>> 
>>> 
>>> Can you please share the error message? Note, the structure of the hadoop
>>> command is:
>>> 
>>> hadoop jar file.jar class.name
>>> 
>>> Note, no - in front of jar like `java -jar'
>>> 
>>> Brock
>> 
>> 



Re: problem while running wordcount on lion x

2011-10-05 Thread Brock Noland
Hi,

Answers, inline.

On Wed, Oct 5, 2011 at 7:31 PM, Jignesh Patel  wrote:

>  have used eclipse to export the file and then got following error
>
> hadoop-user$ bin/hadoop jar
> wordcountsmp/wordcount.jarorg.apache.hadoop.examples.WordCount input output
>
>
> Exception in thread "main" java.io.IOException: Error opening job jar:
> wordcountsmp/wordcount.jarorg.apache.hadoop.examples.WordCount
>at org.apache.hadoop.util.RunJar.main(RunJar.java:90)
> Caused by: java.util.zip.ZipException: error in opening zip file
>at java.util.zip.ZipFile.open(Native Method)
>at java.util.zip.ZipFile.(ZipFile.java:127)
>at java.util.jar.JarFile.(JarFile.java:135)
>at java.util.jar.JarFile.(JarFile.java:72)
>at org.apache.hadoop.util.RunJar.main(RunJar.java:88)
>
>
OK, the problem above is that you are missing a space, it should be:

hadoop-user$ bin/hadoop jar wordcountsmp/wordcount.jar
org.apache.hadoop.examples.WordCount input output

with a space between the jar and the class name.


> I tried following
> java -jar xf wordcountsmp/wordcount.jar
>

That's not how you extract a jar. It should be:

jar tf wordcountsmp/wordcount.jar

to get a listing of the jar and:

jar xf wordcountsmp/wordcount.jar

To extract it.


> and got the error
>
> Unable to access jar file xf
>
> my jar file size is 5kb. I am feeling somehow eclipse export in macOS is
> not creating appropriate jar.
>
>
>
>
> On Oct 5, 2011, at 8:16 PM, Brock Noland wrote:
>
> > Hi,
> >
> > On Wed, Oct 5, 2011 at 7:13 PM, Jignesh Patel 
> wrote:
> >>
> >>
> >> I also found another problem if I directly export from eclipse as a jar
> >> file then while trying javac -jar or hadoop -jar doesn't recognize that
> jar.
> >> However same jar works well with windows.
> >
> >
> >
> > Can you please share the error message? Note, the structure of the hadoop
> > command is:
> >
> > hadoop jar file.jar class.name
> >
> > Note, no - in front of jar like `java -jar'
> >
> > Brock
>
>


Re: problem while running wordcount on lion x

2011-10-05 Thread Jignesh Patel
 have used eclipse to export the file and then got following error

hadoop-user$ bin/hadoop jar 
wordcountsmp/wordcount.jarorg.apache.hadoop.examples.WordCount input output


Exception in thread "main" java.io.IOException: Error opening job jar: 
wordcountsmp/wordcount.jarorg.apache.hadoop.examples.WordCount
at org.apache.hadoop.util.RunJar.main(RunJar.java:90)
Caused by: java.util.zip.ZipException: error in opening zip file
        at java.util.zip.ZipFile.open(Native Method)
        at java.util.zip.ZipFile.<init>(ZipFile.java:127)
        at java.util.jar.JarFile.<init>(JarFile.java:135)
        at java.util.jar.JarFile.<init>(JarFile.java:72)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:88)


I tried following 
java -jar xf wordcountsmp/wordcount.jar

and got the error

Unable to access jar file xf

My jar file size is 5 KB. I have a feeling that somehow the Eclipse export on macOS is not 
creating a proper jar.




On Oct 5, 2011, at 8:16 PM, Brock Noland wrote:

> Hi,
> 
> On Wed, Oct 5, 2011 at 7:13 PM, Jignesh Patel  wrote:
>> 
>> 
>> I also found another problem if I directly export from eclipse as a jar
>> file then while trying javac -jar or hadoop -jar doesn't recognize that jar.
>> However same jar works well with windows.
> 
> 
> 
> Can you please share the error message? Note, the structure of the hadoop
> command is:
> 
> hadoop jar file.jar class.name
> 
> Note, no - in front of jar like `java -jar'
> 
> Brock



Re: problem while running wordcount on lion x

2011-10-05 Thread Brock Noland
Hi,

On Wed, Oct 5, 2011 at 7:13 PM, Jignesh Patel  wrote:
>
>
> I also found another problem if I directly export from eclipse as a jar
> file then while trying javac -jar or hadoop -jar doesn't recognize that jar.
> However same jar works well with windows.



Can you please share the error message? Note, the structure of the hadoop
command is:

hadoop jar file.jar class.name

Note, no - in front of jar like `java -jar'

Brock


problem while running wordcount on lion x

2011-10-05 Thread Jignesh Patel

On Oct 5, 2011, at 8:10 PM, Jignesh Patel wrote:

> While running WordCount from Eclipse on Lion X, after setting up everything, I 
> got the following error.
> 
> Exception in thread "main" 
> org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does 
> not exist: hdfs://localhost:9000/user/hadoop-user/input
> 
> I have used following command
> 
>  bin/hadoop fs -mkdir /user/hadoop-user
> 
> and it has created directory for me.
> 
> when I checked I found following
> 
> Jignesh-MacBookPro:hadoop hadoop-user$ bin/hadoop fs -ls /
> 2011-10-05 20:05:03.073 java[45432:1903] Unable to load realm info from 
> SCDynamicStore
> Found 2 items
> drwxr-xr-x   - hadoop-user supergroup  0 2011-10-05 19:52 /user
> drwxr-xr-x   - hadoop-user supergroup  0 2011-10-05 00:59 /users
> 
> 
> I am wondering where the following directory got created, as I am unable to locate 
> it because my base directory is /users, not /user:
> /user/hadoop-user 


I also found another problem: if I directly export from Eclipse as a jar file, 
then neither javac -jar nor hadoop -jar recognizes that jar. However, the 
same jar works well on Windows.

Re: pointing mapred.local.dir to a ramdisk

2011-10-05 Thread Raj V
Thanks Joey, Todd,  Vinod , Edward

I have mixed news. The problem of the task tracker not starting was indeed 
permission related. Under /ramdisk there was a lost+found that was owned by 
root, even though /ramdisk was owned by mapred:hadoop. This was the cause of the 
problem. Now I will see if I can fix the error message to something better than 
"Null pointer exception". 

Once I saw all my task trackers were up, I started with my TTT (trusted teragen 
and terasort :-)).

I ran teragen with a data size of 10GB ( 100MB records). I have 500 nodes and I 
wanted 2000 files.  

It takes 19 minutes to complete - awfully slow. There is no swapping, the CPU is 
not pegged, so things look OK. I ran it a couple of times and it takes about 
15-19 minutes. It is not the same nodes either, but that could be a 
problem with something local. We will ignore it for now.

TeraSort never completes.

I get the following errors on the majority of the nodes.

org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid 
local directory for output/spill0.out
        at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:376)
        at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:146)
        at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:127)
        at org.apache.hadoop.mapred.MapOutputFile.getSpillFileForWrite(MapOutputFile.java:121)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1247)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1155)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:392)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
        at org.apache.hadoop.mapred.Child.main

I know this indicates a lack of space, but I have a df monitoring the disk space 
on all the nodes and all nodes have loads of disk space available. The 
ramdisk is never more than 25% full.
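
A hedged sketch of the per-node checks that usually accompany this kind of report
(the /etc/hadoop/conf path is an assumption about the install layout, and /ramdisk
is the mount point taken from the description above):

  # confirm what the TaskTracker actually has configured as its local dirs
  grep -A 1 mapred.local.dir /etc/hadoop/conf/mapred-site.xml
  # check the headroom on the ramdisk itself while the job is running
  df -h /ramdisk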

So once again - any clues?

Raj


>
>From: Raj V 
>To: "common-user@hadoop.apache.org" 
>Sent: Monday, October 3, 2011 12:31 PM
>Subject: Re: pointing mapred.local.dir to a ramdisk
>
>Joey
>
>Thanks. Will try and upgrade to a newer version and check. I will also change 
>the logs to debug and see if more information is available.
>
>Raj
>
>
>
>>
>>From: Joey Echeverria 
>>To: common-user@hadoop.apache.org; Raj V 
>>Sent: Monday, October 3, 2011 11:49 AM
>>Subject: Re: pointing mapred.local.dir to a ramdisk
>>
>>Raj,
>>
>>I just tried this on my CHD3u1 VM, and the ramdisk worked the first
>>time. So, it's possible you've hit a bug in CDH3b3 that was later
>>fixed. Can you enable debug logging in log4j.properties and then
>>repost your task tracker log? I think there might be more details that
>>it will print that will be helpful.
>>
>>-Joey
>>
>>On Mon, Oct 3, 2011 at 2:18 PM, Raj V  wrote:
>>> Edward
>>>
>>> I understand the size limitations - but for my experiment the ramdisk size 
>>> I have created is large enough.
>>> I think there will be substantial benefits by putting the intermediate map 
>>> outputs on a ramdisk - size permitting, ofcourse, but I can't provide any 
>>> numbers to substantiate my claim  given that I can't get it to run.
>>>
>>> -best regards
>>>
>>> Raj
>>>
>>>
>>>

From: Edward Capriolo 
To: common-user@hadoop.apache.org
Cc: Raj V 
Sent: Monday, October 3, 2011 10:36 AM
Subject: Re: pointing mapred.local.dir to a ramdisk

This directory can get very large, in many cases I doubt it would fit on a
ram disk.

Also RAM Disks tend to help most with random read/write, since hadoop is
doing mostly linear IO you may not see a great benefit from the RAM disk.



On Mon, Oct 3, 2011 at 12:07 PM, Vinod Kumar Vavilapalli <
vino...@hortonworks.com> wrote:

> Must be related to some kind of permissions problems.
>
> It will help if you can paste the corresponding source code for
> FileUtil.copy(). Hard to track it with different versions, so.
>
> Thanks,
> +Vinod
>
>
> On Mon, Oct 3, 2011 at 9:28 PM, Raj V  wrote:
>
> > Eric
> >
> > Yes. The owner is hdfs and group is hadoop and the directory is group
> > writable (775).  This is the exact same configuration I have when I use
> real
> > disks. But let me give it a try again to see if I overlooked something.
> > Thanks
> >
> > Raj
> >
> > >
> > >From: Eric Caspole 
> > >To: common-user@hadoop.apache.org
> > >Sent: Monday, October 

Re: cannot find DeprecatedLzoTextInputFormat

2011-10-05 Thread Jessica Owensby
Great.  Thanks!  Will give that a try.
Jessica

On Wed, Oct 5, 2011 at 4:22 PM, Joey Echeverria  wrote:

> It sounds like you're hitting this:
>
> https://issues.apache.org/jira/browse/HIVE-2395
>
> You might need to patch your version of DeprecatedLzoLineRecordReader
> to ignore the .lzo.index files.
>
> -Joey
>
> On Wed, Oct 5, 2011 at 4:13 PM, Jessica Owensby
>  wrote:
> > Alex,
> > The task trackers have been restarted many times across the cluster since
> > this issue was first seen.
> >
> > Hmmm, I hadn't tried to explicitly add the lzo jar to my classpath in the
> > hive shell, but I just tried it and got the same errors.
> >
> > Do you see
> >
> > /usr/lib/hadoop-0.20/lib/hadoop-lzo-20110217.jar in the child classpath
> when
> >
> > the task is executed (use 'ps aux' on the node)?
> >
> >
> > While the job wasn't running, I did this and I got back the tasktracker
> > process:  ps aux | grep java | grep lzo.
> > Do I have to run this while the task is running on that node?
> >
> > Joey,
> > Yes, the lzo files are indexed.  They are indexed using the following
> > command:
> >
> > hadoop jar /usr/lib/hadoop/lib/hadoop-lzo-20110217.jar
> > com.hadoop.compression.lzo.LzoIndexer /user/hive/warehouse/foo/bar.lzo
> >
> > Jessica
> >
> > On Wed, Oct 5, 2011 at 3:52 PM, Joey Echeverria 
> wrote:
> >> Are your LZO files indexed?
> >>
> >> -Joey
> >>
> >> On Wed, Oct 5, 2011 at 3:35 PM, Jessica Owensby
> >>  wrote:
> >>> Hi Joey,
> >>> Thanks. I forgot to say that; yes, the lzocodec class is listed in
> >>> core-site.xml under the io.compression.codecs property:
> >>>
> >>> 
> >>>  io.compression.codecs
> >>>
> >
>  
> org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec,org.apache.hadoop.io.compress.BZip2Codec
> >>> 
> >>>
> >>> I also added the mapred.child.env property to mapred site:
> >>>
> >>>  
> >>>mapred.child.env
> >>>JAVA_LIBRARY_PATH=/usr/lib/hadoop-0.20/lib
> >>>  
> >>>
> >>> per these instructions:
> >>>
> >
> http://www.cloudera.com/blog/2009/11/hadoop-at-twitter-part-1-splittable-lzo-compression/
> >>>
> >>> After making each of these changes I have restarted the cluster --
> >>> just to be sure that the new changes were being picked up.
> >>>
> >>> Jessica
> >>>
> >>
> >>
> >>
> >> --
> >> Joseph Echeverria
> >> Cloudera, Inc.
> >> 443.305.9434
> >>
> >
> >
> > Adding back the email history:
> >
> > Hello Everyone,
> > I've been having an issue in a hadoop environment (running cdh3u1)
> > where any table declared in hive
> > with the "STORED AS INPUTFORMAT
> > "com.hadoop.mapred.DeprecatedLzoTextInputFormat"" directive has the
> > following errors when running any query against it.
> >
> > For instance, running "select count(*) from foo;" gives the following
> error:
> >
> > java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
> >  at
> >
> org.apache.hadoop.hive.shims.Hadoop20SShims$CombineFileRecordReader.initNextRecordReader(Hadoop20SShims.java:306)
> >  at
> >
> org.apache.hadoop.hive.shims.Hadoop20SShims$CombineFileRecordReader.next(Hadoop20SShims.java:209)
> >  at
> >
> org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:208)
> >  at
> >
> org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:193)
> >  at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
> >  at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:391)
> >  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
> >  at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
> >  at java.security.AccessController.doPrivileged(Native Method)
> >  at javax.security.auth.Subject.doAs(Subject.java:396)
> >  at
> >
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
> >  at org.apache.hadoop.mapred.Child.main(Child.java:264)
> > Caused by: java.lang.reflect.InvocationTargetException
> >  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
> > Method)
> >  at
> >
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
> >  at
> >
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
> >  at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
> >  at
> >
> org.apache.hadoop.hive.shims.Hadoop20SShims$CombineFileRecordReader.initNextRecordReader(Hadoop20SShims.java:292)
> >  ... 11 more
> > Caused by: java.io.IOException: No LZO codec found, cannot run.
> >  at
> >
> com.hadoop.mapred.DeprecatedLzoLineRecordReader.(DeprecatedLzoLineRecordReader.java:53)
> >  at
> >
> com.hadoop.mapred.DeprecatedLzoTextInputFormat.getRecordReader(DeprecatedLzoTextInputFormat.java:128)
> >  at
> >
> org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.(CombineHiveRecordReader.java:68)
> >  ... 16 more
> >
> > java.io.IOException: can

Re: cannot find DeprecatedLzoTextInputFormat

2011-10-05 Thread Joey Echeverria
It sounds like you're hitting this:

https://issues.apache.org/jira/browse/HIVE-2395

You might need to patch your version of DeprecatedLzoLineRecordReader
to ignore the .lzo.index files.

-Joey

On Wed, Oct 5, 2011 at 4:13 PM, Jessica Owensby
 wrote:
> Alex,
> The task trackers have been restarted many times across the cluster since
> this issue was first seen.
>
> Hmmm, I hadn't tried to explicitly add the lzo jar to my classpath in the
> hive shell, but I just tried it and got the same errors.
>
> Do you see
>
> /usr/lib/hadoop-0.20/lib/hadoop-lzo-20110217.jar in the child classpath when
>
> the task is executed (use 'ps aux' on the node)?
>
>
> While the job wasn't running, I did this and I got back the tasktracker
> process:  ps aux | grep java | grep lzo.
> Do I have to run this while the task is running on that node?
>
> Joey,
> Yes, the lzo files are indexed.  They are indexed using the following
> command:
>
> hadoop jar /usr/lib/hadoop/lib/hadoop-lzo-20110217.jar
> com.hadoop.compression.lzo.LzoIndexer /user/hive/warehouse/foo/bar.lzo
>
> Jessica
>
> On Wed, Oct 5, 2011 at 3:52 PM, Joey Echeverria  wrote:
>> Are your LZO files indexed?
>>
>> -Joey
>>
>> On Wed, Oct 5, 2011 at 3:35 PM, Jessica Owensby
>>  wrote:
>>> Hi Joey,
>>> Thanks. I forgot to say that; yes, the lzocodec class is listed in
>>> core-site.xml under the io.compression.codecs property:
>>>
>>> 
>>>  io.compression.codecs
>>>
>  org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec,org.apache.hadoop.io.compress.BZip2Codec
>>> 
>>>
>>> I also added the mapred.child.env property to mapred site:
>>>
>>>  
>>>    mapred.child.env
>>>    JAVA_LIBRARY_PATH=/usr/lib/hadoop-0.20/lib
>>>  
>>>
>>> per these instructions:
>>>
> http://www.cloudera.com/blog/2009/11/hadoop-at-twitter-part-1-splittable-lzo-compression/
>>>
>>> After making each of these changes I have restarted the cluster --
>>> just to be sure that the new changes were being picked up.
>>>
>>> Jessica
>>>
>>
>>
>>
>> --
>> Joseph Echeverria
>> Cloudera, Inc.
>> 443.305.9434
>>
>
>
> Adding back the email history:
>
> Hello Everyone,
> I've been having an issue in a hadoop environment (running cdh3u1)
> where any table declared in hive
> with the "STORED AS INPUTFORMAT
> "com.hadoop.mapred.DeprecatedLzoTextInputFormat"" directive has the
> following errors when running any query against it.
>
> For instance, running "select count(*) from foo;" gives the following error:
>
> java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
>      at
> org.apache.hadoop.hive.shims.Hadoop20SShims$CombineFileRecordReader.initNextRecordReader(Hadoop20SShims.java:306)
>      at
> org.apache.hadoop.hive.shims.Hadoop20SShims$CombineFileRecordReader.next(Hadoop20SShims.java:209)
>      at
> org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:208)
>      at
> org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:193)
>      at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
>      at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:391)
>      at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
>      at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
>      at java.security.AccessController.doPrivileged(Native Method)
>      at javax.security.auth.Subject.doAs(Subject.java:396)
>      at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
>      at org.apache.hadoop.mapred.Child.main(Child.java:264)
> Caused by: java.lang.reflect.InvocationTargetException
>      at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
> Method)
>      at
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
>      at
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>      at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>      at
> org.apache.hadoop.hive.shims.Hadoop20SShims$CombineFileRecordReader.initNextRecordReader(Hadoop20SShims.java:292)
>      ... 11 more
> Caused by: java.io.IOException: No LZO codec found, cannot run.
>      at
> com.hadoop.mapred.DeprecatedLzoLineRecordReader.(DeprecatedLzoLineRecordReader.java:53)
>      at
> com.hadoop.mapred.DeprecatedLzoTextInputFormat.getRecordReader(DeprecatedLzoTextInputFormat.java:128)
>      at
> org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.(CombineHiveRecordReader.java:68)
>      ... 16 more
>
> java.io.IOException: cannot find class
> com.hadoop.mapred.DeprecatedLzoTextInputFormat
>      at
> org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:406)
>      at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:371)
>      at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
>      at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
>      at java.secu

Re: cannot find DeprecatedLzoTextInputFormat

2011-10-05 Thread Jessica Owensby
Alex,
The task trackers have been restarted many times across the cluster since
this issue was first seen.

Hmmm, I hadn't tried to explicitly add the lzo jar to my classpath in the
hive shell, but I just tried it and got the same errors.

Do you see

/usr/lib/hadoop-0.20/lib/hadoop-lzo-20110217.jar in the child classpath when

the task is executed (use 'ps aux' on the node)?


While the job wasn't running, I did this and I got back the tasktracker
process:  ps aux | grep java | grep lzo.
Do I have to run this while the task is running on that node?

Joey,
Yes, the lzo files are indexed.  They are indexed using the following
command:

hadoop jar /usr/lib/hadoop/lib/hadoop-lzo-20110217.jar
com.hadoop.compression.lzo.LzoIndexer /user/hive/warehouse/foo/bar.lzo

Jessica

On Wed, Oct 5, 2011 at 3:52 PM, Joey Echeverria  wrote:
> Are your LZO files indexed?
>
> -Joey
>
> On Wed, Oct 5, 2011 at 3:35 PM, Jessica Owensby
>  wrote:
>> Hi Joey,
>> Thanks. I forgot to say that; yes, the lzocodec class is listed in
>> core-site.xml under the io.compression.codecs property:
>>
>> 
>>  io.compression.codecs
>>
 
org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec,org.apache.hadoop.io.compress.BZip2Codec
>> 
>>
>> I also added the mapred.child.env property to mapred site:
>>
>>  
>>mapred.child.env
>>JAVA_LIBRARY_PATH=/usr/lib/hadoop-0.20/lib
>>  
>>
>> per these instructions:
>>
http://www.cloudera.com/blog/2009/11/hadoop-at-twitter-part-1-splittable-lzo-compression/
>>
>> After making each of these changes I have restarted the cluster --
>> just to be sure that the new changes were being picked up.
>>
>> Jessica
>>
>
>
>
> --
> Joseph Echeverria
> Cloudera, Inc.
> 443.305.9434
>


Adding back the email history:

Hello Everyone,
I've been having an issue in a hadoop environment (running cdh3u1)
where any table declared in hive
with the "STORED AS INPUTFORMAT
"com.hadoop.mapred.DeprecatedLzoTextInputFormat"" directive has the
following errors when running any query against it.

For instance, running "select count(*) from foo;" gives the following error:

java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
  at
org.apache.hadoop.hive.shims.Hadoop20SShims$CombineFileRecordReader.initNextRecordReader(Hadoop20SShims.java:306)
  at
org.apache.hadoop.hive.shims.Hadoop20SShims$CombineFileRecordReader.next(Hadoop20SShims.java:209)
  at
org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:208)
  at
org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:193)
  at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
  at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:391)
  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
  at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:396)
  at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
  at org.apache.hadoop.mapred.Child.main(Child.java:264)
Caused by: java.lang.reflect.InvocationTargetException
  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
Method)
  at
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
  at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
  at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
  at
org.apache.hadoop.hive.shims.Hadoop20SShims$CombineFileRecordReader.initNextRecordReader(Hadoop20SShims.java:292)
  ... 11 more
Caused by: java.io.IOException: No LZO codec found, cannot run.
  at
com.hadoop.mapred.DeprecatedLzoLineRecordReader.<init>(DeprecatedLzoLineRecordReader.java:53)
  at
com.hadoop.mapred.DeprecatedLzoTextInputFormat.getRecordReader(DeprecatedLzoTextInputFormat.java:128)
  at
org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.<init>(CombineHiveRecordReader.java:68)
  ... 16 more

java.io.IOException: cannot find class
com.hadoop.mapred.DeprecatedLzoTextInputFormat
  at
org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:406)
  at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:371)
  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
  at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:396)
  at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
  at org.apache.hadoop.mapred.Child.main(Child.java:264)

My thought is that the hadoop-lzo-20110217.jar is not available on the
hadoop classpath.  However, the hadoop classpath command shows that
/usr/lib/hadoop-0.20

Re: cannot find DeprecatedLzoTextInputFormat

2011-10-05 Thread Joey Echeverria
Are your LZO files indexed?

-Joey

On Wed, Oct 5, 2011 at 3:35 PM, Jessica Owensby
 wrote:
> Hi Joey,
> Thanks. I forgot to say that; yes, the lzocodec class is listed in
> core-site.xml under the io.compression.codecs property:
>
> 
>  io.compression.codecs
>  org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec,org.apache.hadoop.io.compress.BZip2Codec
> 
>
> I also added the mapred.child.env property to mapred site:
>
>  
>    mapred.child.env
>    JAVA_LIBRARY_PATH=/usr/lib/hadoop-0.20/lib
>  
>
> per these instructions:
> http://www.cloudera.com/blog/2009/11/hadoop-at-twitter-part-1-splittable-lzo-compression/
>
> After making each of these changes I have restarted the cluster --
> just to be sure that the new changes were being picked up.
>
> Jessica
>



-- 
Joseph Echeverria
Cloudera, Inc.
443.305.9434


Re: cannot find DeprecatedLzoTextInputFormat

2011-10-05 Thread Alex Kozlov
Hi Jessica, I assume the exception is on the remote node?  Was the TT
restarted?  Did you try 'add jar
/usr/lib/hadoop-0.20/lib/hadoop-lzo-20110217.jar' command from the hive
command line to make sure it's a classpath issue? Do you see
/usr/lib/hadoop-0.20/lib/hadoop-lzo-20110217.jar in the child classpath when
the task is executed (use 'ps aux' on the node)? -- Alex K

On Wed, Oct 5, 2011 at 12:35 PM, Jessica Owensby
wrote:

> Hi Joey,
> Thanks. I forgot to say that; yes, the lzocodec class is listed in
> core-site.xml under the io.compression.codecs property:
>
> 
>  io.compression.codecs
>
>  
> org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec,org.apache.hadoop.io.compress.BZip2Codec
> 
>
> I also added the mapred.child.env property to mapred site:
>
>  
>mapred.child.env
>JAVA_LIBRARY_PATH=/usr/lib/hadoop-0.20/lib
>  
>
> per these instructions:
>
> http://www.cloudera.com/blog/2009/11/hadoop-at-twitter-part-1-splittable-lzo-compression/
>
> After making each of these changes I have restarted the cluster --
> just to be sure that the new changes were being picked up.
>
> Jessica
>


Re: cannot find DeprecatedLzoTextInputFormat

2011-10-05 Thread Jessica Owensby
Hi Joey,
Thanks. I forgot to say that; yes, the lzocodec class is listed in
core-site.xml under the io.compression.codecs property:


  io.compression.codecs
  
org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec,org.apache.hadoop.io.compress.BZip2Codec


I also added the mapred.child.env property to mapred site:

 
mapred.child.env
JAVA_LIBRARY_PATH=/usr/lib/hadoop-0.20/lib
  

per these instructions:
http://www.cloudera.com/blog/2009/11/hadoop-at-twitter-part-1-splittable-lzo-compression/

After making each of these changes I have restarted the cluster --
just to be sure that the new changes were being picked up.

Jessica


Re: cannot find DeprecatedLzoTextInputFormat

2011-10-05 Thread Joey Echeverria
Did you add the LZO codec configuration to core-site.xml?

-Joey

On Wed, Oct 5, 2011 at 2:31 PM, Jessica Owensby
 wrote:
> Hello Everyone,
> I've been having an issue in a hadoop environment (running cdh3u1)
> where any table declared in hive
> with the "STORED AS INPUTFORMAT
> "com.hadoop.mapred.DeprecatedLzoTextInputFormat"" directive has the
> following errors when running any query against it.
>
> For instance, running "select count(*) from foo;" gives the following error:
>
> java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
>        at 
> org.apache.hadoop.hive.shims.Hadoop20SShims$CombineFileRecordReader.initNextRecordReader(Hadoop20SShims.java:306)
>        at 
> org.apache.hadoop.hive.shims.Hadoop20SShims$CombineFileRecordReader.next(Hadoop20SShims.java:209)
>        at 
> org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:208)
>        at 
> org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:193)
>        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
>        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:391)
>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
>        at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
>        at java.security.AccessController.doPrivileged(Native Method)
>        at javax.security.auth.Subject.doAs(Subject.java:396)
>        at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
>        at org.apache.hadoop.mapred.Child.main(Child.java:264)
> Caused by: java.lang.reflect.InvocationTargetException
>        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
> Method)
>        at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
>        at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>        at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>        at 
> org.apache.hadoop.hive.shims.Hadoop20SShims$CombineFileRecordReader.initNextRecordReader(Hadoop20SShims.java:292)
>        ... 11 more
> Caused by: java.io.IOException: No LZO codec found, cannot run.
>        at 
> com.hadoop.mapred.DeprecatedLzoLineRecordReader.(DeprecatedLzoLineRecordReader.java:53)
>        at 
> com.hadoop.mapred.DeprecatedLzoTextInputFormat.getRecordReader(DeprecatedLzoTextInputFormat.java:128)
>        at 
> org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.(CombineHiveRecordReader.java:68)
>        ... 16 more
>
> java.io.IOException: cannot find class
> com.hadoop.mapred.DeprecatedLzoTextInputFormat
>        at 
> org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:406)
>        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:371)
>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
>        at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
>        at java.security.AccessController.doPrivileged(Native Method)
>        at javax.security.auth.Subject.doAs(Subject.java:396)
>        at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
>        at org.apache.hadoop.mapred.Child.main(Child.java:264)
>
> My thought is that the hadoop-lzo-20110217.jar is not available on the
> hadoop classpath.  However, the hadoop classpath command shows that
> /usr/lib/hadoop-0.20/lib/hadoop-lzo-20110217.jar is in the classpath.
> Additionally, across the cluster on each machine, the
> hadoop-lzo-20110217.jar is present under /usr/lib/hadoop-0.20/lib/.
>
> The hadoop-core-0.20.2-cdh3u1.jar is also available on my hadoop classpath.
>
> What else can I investigate to confirm that the lzo jar is on my
> classpath?  Or is this error indicative of another issue?
>
> Jessica
>



-- 
Joseph Echeverria
Cloudera, Inc.
443.305.9434


cannot find DeprecatedLzoTextInputFormat

2011-10-05 Thread Jessica Owensby
Hello Everyone,
I've been having an issue in a hadoop environment (running cdh3u1)
where any table declared in hive
with the "STORED AS INPUTFORMAT
"com.hadoop.mapred.DeprecatedLzoTextInputFormat"" directive has the
following errors when running any query against it.

For instance, running "select count(*) from foo;" gives the following error:

java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
at 
org.apache.hadoop.hive.shims.Hadoop20SShims$CombineFileRecordReader.initNextRecordReader(Hadoop20SShims.java:306)
at 
org.apache.hadoop.hive.shims.Hadoop20SShims$CombineFileRecordReader.next(Hadoop20SShims.java:209)
at 
org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:208)
at 
org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:193)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:391)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
at org.apache.hadoop.mapred.Child.main(Child.java:264)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at 
org.apache.hadoop.hive.shims.Hadoop20SShims$CombineFileRecordReader.initNextRecordReader(Hadoop20SShims.java:292)
... 11 more
Caused by: java.io.IOException: No LZO codec found, cannot run.
at 
com.hadoop.mapred.DeprecatedLzoLineRecordReader.<init>(DeprecatedLzoLineRecordReader.java:53)
at 
com.hadoop.mapred.DeprecatedLzoTextInputFormat.getRecordReader(DeprecatedLzoTextInputFormat.java:128)
at 
org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.<init>(CombineHiveRecordReader.java:68)
... 16 more

java.io.IOException: cannot find class
com.hadoop.mapred.DeprecatedLzoTextInputFormat
at 
org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:406)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:371)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
at org.apache.hadoop.mapred.Child.main(Child.java:264)  

My thought is that the hadoop-lzo-20110217.jar is not available on the
hadoop classpath.  However, the hadoop classpath command shows that
/usr/lib/hadoop-0.20/lib/hadoop-lzo-20110217.jar is in the classpath.
Additionally, across the cluster on each machine, the
hadoop-lzo-20110217.jar is present under /usr/lib/hadoop-0.20/lib/.

The hadoop-core-0.20.2-cdh3u1.jar is also available on my hadoop classpath.

What else can I investigate to confirm that the lzo jar is on my
classpath?  Or is this error indicative of another issue?

Jessica
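
Since the question above already leans on the hadoop classpath command, a quick
one-liner that expands its output and filters for the LZO jar on a given node:

  hadoop classpath | tr ':' '\n' | grep -i lzo   # each matching entry prints on its own line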


Re: Hadoop : Linux-Window interface

2011-10-05 Thread Periya.Data
Hi Aditya,
You may want to investigate using Flume... it is designed to
collect unstructured data from disparate sources and store it in HDFS (or
directly into Hive tables). I do not know if Flume provides interoperability
with Windows systems (maybe you can hack it and make it work with Cygwin...).

http://archive.cloudera.com/cdh/3/flume/Cookbook/


-PD.

On Wed, Oct 5, 2011 at 8:14 AM, Bejoy KS  wrote:

> Hi Aditya
> Definitely you can do it. As a very basic solution you can ftp the
> contents to LFS(LOCAL/Linux File System ) and they do a copyFromLocal into
> HDFS. Create a hive table with appropriate regex support and load the data
> in. Hive has classes that effectively support parsing and loading of Apache
> log files into hive tables.
> For the entite data transfer,you just need to write a shell script for the
> same. Log analysis won't be real time right? So you can schedule the job
> with some scheduler libe a cron or to be used  in conjuction with hadoop
> jobs you can use some workflow management within hadoop eco ecosystem.
>
>
> On Wed, Oct 5, 2011 at 3:43 PM, Aditya Singh30
> wrote:
>
> > Hi,
> >
> > We want to use Hadoop and Hive to store and analyze some Web Servers' Log
> > files. The servers are running on windows platform. As mentioned about
> > Hadoop, it is only supported for development on windows. I wanted to know
> is
> > there a way that we can run the Hadoop server(namenode server) and
> cluster
> > nodes on  Linux, and have an interface using which we can send files and
> run
> > analysis queries from the WebServer's windows environment.
> > I would really appreciate if you could point me to a right direction.
> >
> >
> > Regards,
> > Aditya Singh
> > Infosys. India
> >
> >
> >
>


Re: Hadoop : Linux-Window interface

2011-10-05 Thread Bejoy KS
Hi Aditya
 Definitely you can do it. As a very basic solution you can ftp the
contents to the LFS (local/Linux file system) and then do a copyFromLocal into
HDFS. Create a hive table with appropriate regex support and load the data
in. Hive has classes that effectively support parsing and loading of Apache
log files into hive tables.
For the entire data transfer you just need to write a shell script for the
same. Log analysis won't be real time, right? So you can schedule the job
with a scheduler like cron, or to be used in conjunction with hadoop
jobs you can use some workflow management within the hadoop ecosystem.
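
A hedged sketch of the transfer step described above (host names and paths are
illustrative; use whichever of ftp/scp/rsync can reach the Windows servers):

  # stage the Windows web server logs on the local Linux filesystem
  scp webserver:/logs/access_*.log /tmp/weblogs/
  # push them from the local filesystem into HDFS for Hive to pick up
  hadoop fs -copyFromLocal /tmp/weblogs/access_*.log /user/hive/warehouse/weblogs/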


On Wed, Oct 5, 2011 at 3:43 PM, Aditya Singh30
wrote:

> Hi,
>
> We want to use Hadoop and Hive to store and analyze some Web Servers' Log
> files. The servers are running on windows platform. As mentioned about
> Hadoop, it is only supported for development on windows. I wanted to know is
> there a way that we can run the Hadoop server(namenode server) and cluster
> nodes on  Linux, and have an interface using which we can send files and run
> analysis queries from the WebServer's windows environment.
> I would really appreciate if you could point me to a right direction.
>
>
> Regards,
> Aditya Singh
> Infosys. India
>
>
>


Hadoop : Linux-Window interface

2011-10-05 Thread Aditya Singh30
Hi,

We want to use Hadoop and Hive to store and analyze some web servers' log 
files. The servers are running on the Windows platform. As is mentioned about Hadoop, 
Windows is only supported for development. I wanted to know whether there is a 
way that we can run the Hadoop server (namenode) and cluster nodes on 
Linux, and have an interface through which we can send files and run analysis 
queries from the web server's Windows environment.
I would really appreciate if you could point me to a right direction.


Regards,
Aditya Singh
Infosys. India


 CAUTION - Disclaimer *
This e-mail contains PRIVILEGED AND CONFIDENTIAL INFORMATION intended solely 
for the use of the addressee(s). If you are not the intended recipient, please 
notify the sender by e-mail and delete the original message. Further, you are 
not 
to copy, disclose, or distribute this e-mail or its contents to any other 
person and 
any such actions are unlawful. This e-mail may contain viruses. Infosys has 
taken 
every reasonable precaution to minimize this risk, but is not liable for any 
damage 
you may sustain as a result of any virus in this e-mail. You should carry out 
your 
own virus checks before opening the e-mail or attachment. Infosys reserves the 
right to monitor and review the content of all messages sent to or from this 
e-mail 
address. Messages sent to or from this e-mail address may be stored on the 
Infosys e-mail system.
***INFOSYS End of Disclaimer INFOSYS***


Re: Question regarding hdfs synchronously / asynchronously block replication

2011-10-05 Thread Eric Fiala
Ronen,
On file write, HDFS's block replication pipeline is asynchronous - datanode 1
gets a block before passing it on to datanode 2, and so on (limiting network
traffic between the client node and the data nodes - the client only writes to one).

The ACK for a packet is returned only once all datanodes in the pipeline
have copied the block.

However, if a failure occurs in the interim on a datanode in the write
pipeline, AND the minimum replication threshold has been met (normally 1), the
namenode will, in a separate operation, quell the replica deficit.

I don't think that's configurable; however, it would be an interesting use case
for speeding up writes while trading off some reliability.
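
For reference, the knobs closest to this behaviour in the 0.20-era configuration are
the replica counts themselves rather than a sync/async switch. A minimal sketch,
assuming the hdfs-site.xml property names of that release line (values illustrative):

  <property>
    <name>dfs.replication</name>
    <value>3</value>   <!-- target number of replicas per block -->
  </property>
  <property>
    <name>dfs.replication.min</name>
    <value>1</value>   <!-- minimum replicas required for a block to be considered complete -->
  </property>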

EF

On Wed, Oct 5, 2011 at 1:53 AM, Ronen Itkin  wrote:

> Hi all!
>
> My question is regarding hdfs block replication.
> From the perspective of the client, does the application receive an ACK for a
> certain packet after it is written to the first
> hadoop data node in the pipeline, or after the packet is *replicated* to
> all
> assigned *replication* nodes?
>
> More generaly, does Hadoop's HDFS block replication works synchronously or
> asynchronously?
>
> synchronously --> more replicas = a decrease in write performance
> (the client has to wait until every packet has been written to all replication
> nodes before it receives an ACK).
> asynchronously --> more replication has no influence on write performance
> (the client receives an ACK packet after the write to the first datanode
> finishes; hdfs will complete the replication in its own time).
>
> synchronously / asynchronously block replication - is it something
> configurable ? If it is, than how can I do it?
>
> Thanks!
>
> --
> *
> Ronen Itkin*
> Taykey | www.taykey.com
>


Re: How do I diagnose IO bounded errors using the framework counters?

2011-10-05 Thread John Meagher
The counter names are created dynamically in mapred.Task

  /**
   * Counters to measure the usage of the different file systems.
   * Always return the String array with two elements. First one is the name of
   * BYTES_READ counter and second one is of the BYTES_WRITTEN counter.
   */
  protected static String[] getFileSystemCounterNames(String uriScheme) {
    String scheme = uriScheme.toUpperCase();
    return new String[]{scheme+"_BYTES_READ", scheme+"_BYTES_WRITTEN"};
  }
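
A small illustration of what that method yields for the two common schemes (derived
directly from the code above):

  // "file" and "hdfs" are the URI schemes of the local and HDFS filesystems
  String[] localNames = getFileSystemCounterNames("file");  // {"FILE_BYTES_READ", "FILE_BYTES_WRITTEN"}
  String[] hdfsNames  = getFileSystemCounterNames("hdfs");  // {"HDFS_BYTES_READ", "HDFS_BYTES_WRITTEN"}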


On Tue, Oct 4, 2011 at 17:22, W.P. McNeill  wrote:
> Here's an even more basic question. I tried to figure out what
> FILE_BYTES_READ means by searching every file in the hadoop 0.20.203.0
> installation for the string FILE_BYTES_READ by running
>
>      find . -type f | xargs grep FILE_BYTES_READ
>
> I only found this string in source files in vaidya contributor directory and
> the tools/rumen directories. Nothing in the main source base.
>
> Where in the source code are these counters created and updated?
>


RE: Run hadoop Map/Reduce app from another machine

2011-10-05 Thread Devaraj K
Hi Oleksiy,

There is no need to copy the jar file and execute the hadoop command remotely
to submit a job.

Hadoop provides the JobClient API to submit and monitor jobs from remote
systems/applications.

http://hadoop.apache.org/common/docs/r0.20.0/api/org/apache/hadoop/mapred/JobClient.html

Here is an example of how to use JobClient to run a job:


 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.mapred.FileInputFormat;
 import org.apache.hadoop.mapred.FileOutputFormat;
 import org.apache.hadoop.mapred.JobClient;
 import org.apache.hadoop.mapred.JobConf;

 // Create a new JobConf
 JobConf job = new JobConf(new Configuration(), MyJob.class);

 // Specify various job-specific parameters
 job.setJobName("myjob");

 // Input/output paths; the FileInputFormat/FileOutputFormat helpers replace
 // the older JobConf.setInputPath/setOutputPath setters, which are deprecated
 // or gone in 0.20-era releases
 FileInputFormat.setInputPaths(job, new Path("in"));
 FileOutputFormat.setOutputPath(job, new Path("out"));

 job.setMapperClass(MyJob.MyMapper.class);
 job.setReducerClass(MyJob.MyReducer.class);

 // Submit the job, then poll for progress until the job is complete
 JobClient.runJob(job);
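
When the submitting machine is outside the cluster, the client-side
Configuration also has to point at the remote namenode and jobtracker. A rough
sketch, assuming 0.20-era property names (the host names and ports below are
placeholders):

 Configuration conf = new Configuration();
 // point the client at the remote cluster
 conf.set("fs.default.name", "hdfs://namenode-host:54310");
 conf.set("mapred.job.tracker", "jobtracker-host:54311");

 JobConf remoteJob = new JobConf(conf, MyJob.class);
 // ... set the same job parameters as above, then submit ...
 JobClient.runJob(remoteJob);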

I hope this helps to solve the problem.

Devaraj K 

-Original Message-
From: oleksiy [mailto:gayduk.a.s...@mail.ru] 
Sent: Wednesday, October 05, 2011 3:32 PM
To: core-u...@hadoop.apache.org
Subject: Run hadoop Map/Reduce app from another machine


Hello,

I'm trying to find a way to run a hadoop MapReduce app from another
machine.
For instance, I have a *.jar file with a MapReduce app that works OK when I
run it from the command line, for instance using this command: "hadoop jar
/usr/joe/wordcount.jar org.myorg.WordCount /usr/joe/wordcount/input
/usr/joe/wordcount/output"
But I have another server (a simple web app) where a user can upload a jar
file, specify the configuration for the MapReduce app and so on, and this
server should interact with the hadoop server. I mean I somehow need to upload
this jar file to the hadoop server and run it with its arguments.

So, right now I see only one way of doing this: upload the jar file to
the hadoop server and remotely run the command: "hadoop jar
/usr/joe/wordcount.jar org.myorg.WordCount /usr/joe/wordcount/input
/usr/joe/wordcount/output".

So maybe hadoop has a special API for doing this kind of task remotely?
-- 
View this message in context:
http://old.nabble.com/Run-hadoop-Map-Reduce-app-from-another-machine-tp32595
264p32595264.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.



Re: hadoop input buffer size

2011-10-05 Thread Yang Xiaoliang
Hi,

Hadoop neither reads one line at a time nor fetches dfs.block.size worth of
lines into a buffer. For TextInputFormat, it reads io.file.buffer.size bytes
of text into a buffer each time; this can be seen in the hadoop source file
LineReader.java.
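
If you want to see what that looks like in code, here is a rough sketch that
reads a file the way TextInputFormat's record reader does, assuming
org.apache.hadoop.util.LineReader as shipped in 0.20-era releases (the path and
buffer size are placeholders):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.util.LineReader;

    public class LineReaderSketch {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // buffer size used by LineReader for each underlying read
        conf.setInt("io.file.buffer.size", 64 * 1024);

        FileSystem fs = FileSystem.get(conf);
        FSDataInputStream in = fs.open(new Path("/user/mark/input.txt"));
        LineReader reader = new LineReader(in, conf);
        Text line = new Text();
        long lines = 0;
        // readLine returns the number of bytes consumed, 0 at end of stream;
        // the underlying reads happen in io.file.buffer.size chunks
        while (reader.readLine(line) > 0) {
          lines++;
        }
        reader.close();
        System.out.println("read " + lines + " lines");
      }
    }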



2011/10/5 Mark question 

> Hello,
>
>  Correct me if I'm wrong: when a program opens n files at the same time
> to read from, and starts reading from each file one line at a time,
> isn't hadoop actually fetching dfs.block.size worth of lines into a buffer,
> and not just one line?
>
>  If that is correct: I set dfs.block.size = 3MB and each line takes only
> about 650 bytes, so I would assume the performance for reading 1-4000
> lines would be the same, but it isn't! Do you know a way to find the number
> of lines that are read at once?
>
> Thank you,
> Mark
>


Re: SafeModeException: Cannot delete . Name node is in safe mode.

2011-10-05 Thread Abdelrahman Kamel
Thank you very much, Bejoy. It helped.

On Wed, Oct 5, 2011 at 10:36 AM,  wrote:

> Hi Abdelrahman Kamel
> Your Name Node is in safe mode now. Either wait till it
> automatically comes out of safe mode, or you can manually make it exit
> safe mode with the following command:
>
> hadoop dfsadmin -safemode leave
>
> If your cluster was not put in safe mode manually and this happened during
> start up, then it will get out of safe mode automatically once a certain
> percentage of blocks satisfies the minimum replication condition.
>
> Hope it helps!
>
>
> --Original Message--
> From: Abdelrahman Kamel
> To: common-user@hadoop.apache.org
> ReplyTo: common-user@hadoop.apache.org
> Subject: SafeModeException: Cannot delete . Name node is in
> safe mode.
> Sent: Oct 5, 2011 13:47
>
> Hi all,
>
> I got this exception trying to delete a directory from HDFS.
>
> hduser@hdmaster:/usr/local/hadoop$ *bin/hadoop dfs -rmr
> /user/hduser/gutenberg-output*
> rmr: org.apache.hadoop.hdfs.server.namenode.*SafeModeException: Cannot
> delete /user/hduser/gutenberg-output. Name node is in safe mode.*
>
> Could anyone help me get rid of it?
> Thanks.
>
>
> --
> Abdelrahman Kamel
>
>
>
> Regards
> Bejoy K S




-- 
Abdelrahman Kamel


Re: Run hadoop Map/Reduce app from another machine

2011-10-05 Thread Yang Xiaoliang
Install hadoop on your local machine, copy the configuration files from the
remote hadoop cluster to your local machine (including the hosts file), and
then you can just submit a *.jar locally as before.


2011/10/5 oleksiy 

>
> Hello,
>
> I'm trying to find a way to run a hadoop MapReduce app from another
> machine.
> For instance, I have a *.jar file with a MapReduce app that works OK when I
> run it from the command line, for instance using this command: "hadoop jar
> /usr/joe/wordcount.jar org.myorg.WordCount /usr/joe/wordcount/input
> /usr/joe/wordcount/output"
> But I have another server (a simple web app) where a user can upload a jar
> file, specify the configuration for the MapReduce app and so on, and this
> server should interact with the hadoop server. I mean I somehow need to
> upload this jar file to the hadoop server and run it with its arguments.
>
> So, right now I see only one way of doing this: upload the jar file to
> the hadoop server and remotely run the command: "hadoop jar
> /usr/joe/wordcount.jar org.myorg.WordCount /usr/joe/wordcount/input
> /usr/joe/wordcount/output".
>
> So maybe hadoop has a special API for doing this kind of task remotely?
> --
> View this message in context:
> http://old.nabble.com/Run-hadoop-Map-Reduce-app-from-another-machine-tp32595264p32595264.html
> Sent from the Hadoop core-user mailing list archive at Nabble.com.
>
>


Run hadoop Map/Reduce app from another machine

2011-10-05 Thread oleksiy

Hello,

I'm trying to find a way to run a hadoop MapReduce app from another
machine.
For instance, I have a *.jar file with a MapReduce app that works OK when I
run it from the command line, for instance using this command: "hadoop jar
/usr/joe/wordcount.jar org.myorg.WordCount /usr/joe/wordcount/input
/usr/joe/wordcount/output"
But I have another server (a simple web app) where a user can upload a jar
file, specify the configuration for the MapReduce app and so on, and this
server should interact with the hadoop server. I mean I somehow need to upload
this jar file to the hadoop server and run it with its arguments.

So, right now I see only one way of doing this: upload the jar file to
the hadoop server and remotely run the command: "hadoop jar
/usr/joe/wordcount.jar org.myorg.WordCount /usr/joe/wordcount/input
/usr/joe/wordcount/output".

So maybe hadoop has a special API for doing this kind of task remotely?
-- 
View this message in context: 
http://old.nabble.com/Run-hadoop-Map-Reduce-app-from-another-machine-tp32595264p32595264.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.



Re: Error using hadoop distcp

2011-10-05 Thread praveenesh kumar
I tried that as well. When I use the IP address, it says I should use the
hostname.

*hadoop@ub13:~$ hadoop distcp hdfs://162.192.100.53:54310/user/hadoop/weblog
hdfs://162.192.100.16:54310/user/hadoop/weblog*
11/10/05 14:53:50 INFO tools.DistCp:
srcPaths=[hdfs://162.192.100.53:54310/user/hadoop/weblog]
11/10/05 14:53:50 INFO tools.DistCp:
destPath=hdfs://162.192.100.16:54310/user/hadoop/weblog
java.lang.IllegalArgumentException: Wrong FS:
hdfs://162.192.100.53:54310/user/hadoop/weblog, expected: hdfs://ub13:54310
at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:310)
at
org.apache.hadoop.hdfs.DistributedFileSystem.checkPath(DistributedFileSystem.java:99)
at
org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:155)
at
org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:464)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:648)
at org.apache.hadoop.tools.DistCp.checkSrcPath(DistCp.java:621)
at org.apache.hadoop.tools.DistCp.copy(DistCp.java:638)
at org.apache.hadoop.tools.DistCp.run(DistCp.java:857)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at org.apache.hadoop.tools.DistCp.main(DistCp.java:884)

I have the entries of both machines in /etc/hosts...


On Wed, Oct 5, 2011 at 1:55 PM,  wrote:

> Hi praveenesh
> Can you try repeating the distcp using the IP instead of the host name?
> From the error it looks like an RPC exception from not being able to identify
> the host, so I believe it can't be due to not setting up passwordless ssh.
> Just try it out.
> Regards
> Bejoy K S
>
> -Original Message-
> From: trang van anh 
> Date: Wed, 05 Oct 2011 14:06:11
> To: 
> Reply-To: common-user@hadoop.apache.org
> Subject: Re: Error using hadoop distcp
>
> Which host runs the task that throws the exception? Ensure that each
> data node knows the other data nodes in the hadoop cluster -> add a "ub16"
> entry in /etc/hosts on the node where the task runs.
> On 10/5/2011 12:15 PM, praveenesh kumar wrote:
> > I am trying to use distcp to copy a file from one HDFS to another.
> >
> > But while copying I am getting the following exception :
> >
> > hadoop distcp hdfs://ub13:54310/user/hadoop/weblog
> > hdfs://ub16:54310/user/hadoop/weblog
> >
> > 11/10/05 10:41:01 INFO mapred.JobClient: Task Id :
> > attempt_201110031447_0005_m_07_0, Status : FAILED
> > java.net.UnknownHostException: unknown host: ub16
> >  at org.apache.hadoop.ipc.Client$Connection.<init>(Client.java:195)
> >  at org.apache.hadoop.ipc.Client.getConnection(Client.java:850)
> >  at org.apache.hadoop.ipc.Client.call(Client.java:720)
> >  at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
> >  at $Proxy1.getProtocolVersion(Unknown Source)
> >  at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:359)
> >  at
> > org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:113)
> >  at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:215)
> >  at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:177)
> >  at
> >
> org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:82)
> >  at
> > org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1378)
> >  at
> org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
> >  at
> org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390)
> >  at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)
> >  at org.apache.hadoop.fs.Path.getFileSystem(Path.java:175)
> >  at
> >
> org.apache.hadoop.mapred.FileOutputCommitter.setupJob(FileOutputCommitter.java:48)
> >  at
> >
> org.apache.hadoop.mapred.OutputCommitter.setupJob(OutputCommitter.java:124)
> >  at org.apache.hadoop.mapred.Task.runJobSetupTask(Task.java:835)
> >  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:296)
> >  at org.apache.hadoop.mapred.Child.main(Child.java:170)
> >
> > It's saying it's not finding ub16. But the entry is there in the /etc/hosts
> > files.
> > I am able to ssh to both machines. Do I need passwordless ssh between
> > these two NNs?
> > What can be the issue? Anything I am missing before using distcp?
> >
> > Thanks,
> > Praveenesh
> >
>
>


Re: SafeModeException: Cannot delete . Name node is in safe mode.

2011-10-05 Thread bejoy . hadoop
Hi Abdelrahman Kamel
Your Name Node is in safe mode now. Either wait till it automatically
comes out of safe mode, or you can manually make it exit safe mode with the
following command:

hadoop dfsadmin -safemode leave

If your cluster was not put in safe mode manually and this happened during
start up, then it will get out of safe mode automatically once a certain
percentage of blocks satisfies the minimum replication condition.
 
Hope it helps!
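
If you ever need to check or clear safe mode from code rather than from the
shell, here is a rough sketch against the 0.20 HDFS client API (treat the exact
class names as an assumption for your version):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.hdfs.DistributedFileSystem;
    import org.apache.hadoop.hdfs.protocol.FSConstants;

    public class SafeModeCheck {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        DistributedFileSystem dfs = (DistributedFileSystem) FileSystem.get(conf);

        // SAFEMODE_GET only reports the current state
        boolean inSafeMode = dfs.setSafeMode(FSConstants.SafeModeAction.SAFEMODE_GET);
        System.out.println("in safe mode: " + inSafeMode);

        if (inSafeMode) {
          // same effect as "hadoop dfsadmin -safemode leave"
          dfs.setSafeMode(FSConstants.SafeModeAction.SAFEMODE_LEAVE);
        }
      }
    }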


--Original Message--
From: Abdelrahman Kamel
To: common-user@hadoop.apache.org
ReplyTo: common-user@hadoop.apache.org
Subject: SafeModeException: Cannot delete . Name node is in 
safe mode.
Sent: Oct 5, 2011 13:47

Hi all,

I got this exception trying to delete a directory from HDFS.

hduser@hdmaster:/usr/local/hadoop$ *bin/hadoop dfs -rmr
/user/hduser/gutenberg-output*
rmr: org.apache.hadoop.hdfs.server.namenode.*SafeModeException: Cannot
delete /user/hduser/gutenberg-output. Name node is in safe mode.*

Could anyone help me get rid of it?
Thanks.


-- 
Abdelrahman Kamel



Regards
Bejoy K S

Re: Error using hadoop distcp

2011-10-05 Thread bejoy . hadoop
Hi praveenesh
 Can you try repeating the distcp using the IP instead of the host name?
From the error it looks like an RPC exception from not being able to identify
the host, so I believe it can't be due to not setting up passwordless ssh.
Just try it out.
Regards
Bejoy K S

-Original Message-
From: trang van anh 
Date: Wed, 05 Oct 2011 14:06:11 
To: 
Reply-To: common-user@hadoop.apache.org
Subject: Re: Error using hadoop distcp

Which host runs the task that throws the exception? Ensure that each
data node knows the other data nodes in the hadoop cluster -> add a "ub16"
entry in /etc/hosts on the node where the task runs.
On 10/5/2011 12:15 PM, praveenesh kumar wrote:
> I am trying to use distcp to copy a file from one HDFS to another.
>
> But while copying I am getting the following exception :
>
> hadoop distcp hdfs://ub13:54310/user/hadoop/weblog
> hdfs://ub16:54310/user/hadoop/weblog
>
> 11/10/05 10:41:01 INFO mapred.JobClient: Task Id :
> attempt_201110031447_0005_m_07_0, Status : FAILED
> java.net.UnknownHostException: unknown host: ub16
>  at org.apache.hadoop.ipc.Client$Connection.<init>(Client.java:195)
>  at org.apache.hadoop.ipc.Client.getConnection(Client.java:850)
>  at org.apache.hadoop.ipc.Client.call(Client.java:720)
>  at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
>  at $Proxy1.getProtocolVersion(Unknown Source)
>  at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:359)
>  at
> org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:113)
>  at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:215)
>  at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:177)
>  at
> org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:82)
>  at
> org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1378)
>  at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
>  at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390)
>  at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)
>  at org.apache.hadoop.fs.Path.getFileSystem(Path.java:175)
>  at
> org.apache.hadoop.mapred.FileOutputCommitter.setupJob(FileOutputCommitter.java:48)
>  at
> org.apache.hadoop.mapred.OutputCommitter.setupJob(OutputCommitter.java:124)
>  at org.apache.hadoop.mapred.Task.runJobSetupTask(Task.java:835)
>  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:296)
>  at org.apache.hadoop.mapred.Child.main(Child.java:170)
>
> It's saying it's not finding ub16. But the entry is there in the /etc/hosts
> files.
> I am able to ssh to both machines. Do I need passwordless ssh between
> these two NNs?
> What can be the issue? Anything I am missing before using distcp?
>
> Thanks,
> Praveenesh
>



SafeModeException: Cannot delete . Name node is in safe mode.

2011-10-05 Thread Abdelrahman Kamel
Hi all,

I got this exception trying to delete a directory from HDFS.

hduser@hdmaster:/usr/local/hadoop$ *bin/hadoop dfs -rmr
/user/hduser/gutenberg-output*
rmr: org.apache.hadoop.hdfs.server.namenode.*SafeModeException: Cannot
delete /user/hduser/gutenberg-output. Name node is in safe mode.*

Could anyone help me get rid of it?
Thanks.


-- 
Abdelrahman Kamel


Re: Error using hadoop distcp

2011-10-05 Thread trang van anh
Which host runs the task that throws the exception? Ensure that each
data node knows the other data nodes in the hadoop cluster -> add a "ub16"
entry in /etc/hosts on the node where the task runs.
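
A quick way to verify, from the node that ran the failing task, whether the
name actually resolves (plain Java, nothing Hadoop-specific; "ub16" is the host
from the error above):

    import java.net.InetAddress;

    public class CheckHost {
      public static void main(String[] args) throws Exception {
        // prints the address "ub16" resolves to, or throws UnknownHostException
        System.out.println(InetAddress.getByName("ub16").getHostAddress());
      }
    }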

On 10/5/2011 12:15 PM, praveenesh kumar wrote:

I am trying to use distcp to copy a file from one HDFS to another.

But while copying I am getting the following exception :

hadoop distcp hdfs://ub13:54310/user/hadoop/weblog
hdfs://ub16:54310/user/hadoop/weblog

11/10/05 10:41:01 INFO mapred.JobClient: Task Id :
attempt_201110031447_0005_m_07_0, Status : FAILED
java.net.UnknownHostException: unknown host: ub16
 at org.apache.hadoop.ipc.Client$Connection.<init>(Client.java:195)
 at org.apache.hadoop.ipc.Client.getConnection(Client.java:850)
 at org.apache.hadoop.ipc.Client.call(Client.java:720)
 at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
 at $Proxy1.getProtocolVersion(Unknown Source)
 at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:359)
 at
org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:113)
 at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:215)
 at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:177)
 at
org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:82)
 at
org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1378)
 at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
 at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390)
 at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)
 at org.apache.hadoop.fs.Path.getFileSystem(Path.java:175)
 at
org.apache.hadoop.mapred.FileOutputCommitter.setupJob(FileOutputCommitter.java:48)
 at
org.apache.hadoop.mapred.OutputCommitter.setupJob(OutputCommitter.java:124)
 at org.apache.hadoop.mapred.Task.runJobSetupTask(Task.java:835)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:296)
 at org.apache.hadoop.mapred.Child.main(Child.java:170)

It's saying it's not finding ub16. But the entry is there in the /etc/hosts
files.
I am able to ssh to both machines. Do I need passwordless ssh between
these two NNs?
What can be the issue? Anything I am missing before using distcp?

Thanks,
Praveenesh





Question regarding hdfs synchronously / asynchronously block replication

2011-10-05 Thread Ronen Itkin
Hi all!

My question is regarding hdfs block replication.
From the perspective of the client, does the application receive an ACK for a
certain packet after it is written on the first hadoop data node in the
pipeline, or only after the packet is *replicated* to all assigned
*replication* nodes?

More generally, does Hadoop's HDFS block replication work synchronously or
asynchronously?

synchronously --> more replicas = a decrease in write performance
(the client has to wait until every packet has been written to all replication
nodes before it receives an ACK).
asynchronously --> more replication has no influence on write performance
(the client receives an ACK after the first write to the first datanode
finishes; hdfs completes its replication in its own time).

Synchronous / asynchronous block replication - is it something
configurable? If it is, then how can I do it?

Thanks!

-- 
*
Ronen Itkin*
Taykey | www.taykey.com