Q: Find a counter

2008-11-04 Thread Edward J. Yoon
Hi,

I'd like to read the REDUCE_OUTPUT_RECORDS counter after the job is done.
BTW, org.apache.hadoop.mapred.Task.Counter is not visible, and
findCounter(String, int, String) is deprecated.

What is the best way to do this in code?
-- 
Best regards, Edward J. Yoon @ NHN, corp.
[EMAIL PROTECTED]
http://blog.udanax.org
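
A minimal sketch of one way to read this counter with the 0.18-era mapred API,
assuming the job is submitted through JobClient; since Task.Counter is not
visible, the counter is looked up by its group/counter name strings, and the
group name string used below is an assumption:

import org.apache.hadoop.mapred.Counters;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RunningJob;

public class ReduceOutputCounter {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(ReduceOutputCounter.class);
    // ... set input/output paths, mapper, reducer, etc. ...

    // Keep the RunningJob handle so counters can be read after completion.
    RunningJob job = JobClient.runJob(conf);
    Counters counters = job.getCounters();

    // Look the counter up by group and counter name instead of the
    // package-private Task.Counter enum (group name string is an assumption).
    long reduceOutputRecords = counters
        .getGroup("org.apache.hadoop.mapred.Task$Counter")
        .getCounter("REDUCE_OUTPUT_RECORDS");

    System.out.println("REDUCE_OUTPUT_RECORDS = " + reduceOutputRecords);
  }
}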


Re: _temporary directories not deleted

2008-11-04 Thread Amareshwari Sriramadasu


Nathan Marz wrote:

Hello all,

Occasionally when running jobs, Hadoop fails to clean up the
"_temporary" directories it has left behind. This only appears to
happen when a task is killed (e.g., by speculative execution), and the
data that task has outputted so far is not cleaned up. Is this a known
issue in hadoop?

Yes. In some corner cases, _temporary can get created by a speculative
task after the cleanup.

Is the data from that task guaranteed to be duplicate data of what was
outputted by another task? Is it safe to just delete this directory
without worrying about losing data?


Yes. You are right. It is duplicate data created by the speculative 
task. You can go ahead and delete it.

-Amareshwari

Thanks,
Nathan Marz
Rapleaf
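
(As an aside, not from the original exchange: a sketch of removing such a
leftover directory with the FileSystem API, assuming the job's output path is
known; the path below is a placeholder.)

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CleanupTemporary {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    // Placeholder output directory of the finished job; leftovers from
    // killed speculative tasks live under its "_temporary" subdirectory.
    Path temporary = new Path("/user/nathan/job-output/_temporary");

    if (fs.exists(temporary)) {
      // Recursive delete; per the answer above, the data is duplicate
      // output from killed speculative tasks, so it is safe to remove.
      fs.delete(temporary, true);
    }
  }
}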


Re: Seeking Someone to Review Hadoop Article

2008-11-04 Thread Amit k. Saha
On Wed, Nov 5, 2008 at 3:17 AM, Tom Wheeler <[EMAIL PROTECTED]> wrote:
> Done.  I also added a link to the article that Amit Kumar Saha wrote
> just a few weeks ago for linux.com.

Thank you, Tom :-)

-Amit

-- 
Amit Kumar Saha
http://blogs.sun.com/amitsaha/
http://amitsaha.in.googlepages.com/
Skype: amitkumarsaha


Re: Missing blocks from bin/hadoop text but fsck is all right

2008-11-04 Thread Sagar Naik

Hi,

We were hitting file descriptor limits :). We increased the limit and that solved it.

Thanks, Jason

-Sagar


Sagar Naik wrote:

Hi,
We have a strange problem getting some of our files back out

bin/hadoop dfs -text dir/*  gives me missing block exceptions.
08/11/04 10:45:09 [main] INFO dfs.DFSClient: Could not obtain block 
blk_6488385702283300787_1247408 from any node:  java.io.IOException: 
No live nodes contain current block
08/11/04 10:45:12 [main] INFO dfs.DFSClient: Could not obtain block 
blk_6488385702283300787_1247408 from any node:  java.io.IOException: 
No live nodes contain current block
08/11/04 10:45:15 [main] INFO dfs.DFSClient: Could not obtain block 
blk_6488385702283300787_1247408 from any node:  java.io.IOException: 
No live nodes contain current block
08/11/04 10:45:18 [main] WARN dfs.DFSClient: DFS Read: 
java.io.IOException: Could not obtain block: 
blk_6488385702283300787_1247408 file=some_filepath-1
at 
org.apache.hadoop.dfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1462) 

at 
org.apache.hadoop.dfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1312) 

at 
org.apache.hadoop.dfs.DFSClient$DFSInputStream.read(DFSClient.java:1417)
at 
org.apache.hadoop.dfs.DFSClient$DFSInputStream.read(DFSClient.java:1369)

at java.io.DataInputStream.readShort(DataInputStream.java:295)
at org.apache.hadoop.fs.FsShell.forMagic(FsShell.java:396)
at org.apache.hadoop.fs.FsShell.access$1(FsShell.java:394)
at org.apache.hadoop.fs.FsShell$2.process(FsShell.java:419)
at 
org.apache.hadoop.fs.FsShell$DelayedExceptionThrowing.globAndProcess(FsShell.java:1865) 


at org.apache.hadoop.fs.FsShell.text(FsShell.java:421)
at org.apache.hadoop.fs.FsShell.doall(FsShell.java:1532)
at org.apache.hadoop.fs.FsShell.run(FsShell.java:1730)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at org.apache.hadoop.fs.FsShell.main(FsShell.java:1847)/

but when I do a
bin/hadoop dfs -text some_filepath-1, I do get all the data


fsck on the parent of this file revealed no problems.



jstack on FsShell revealed nothing much

/Debugger attached successfully.
Server compiler detected.
JVM version is 10.0-b19
Deadlock Detection:

No deadlocks found.

Thread 3358: (state = BLOCKED)
- java.lang.Thread.sleep(long) @bci=0 (Interpreted frame)
- org.apache.hadoop.dfs.DFSClient$LeaseChecker.run() @bci=124, 
line=792 (Interpreted frame)

- java.lang.Thread.run() @bci=11, line=619 (Interpreted frame)


Thread 3357: (state = BLOCKED)
- java.lang.Object.wait(long) @bci=0 (Interpreted frame)
- org.apache.hadoop.ipc.Client$Connection.waitForWork() @bci=62, 
line=397 (Interpreted frame)
- org.apache.hadoop.ipc.Client$Connection.run() @bci=63, line=440 
(Interpreted frame)



Thread 3342: (state = BLOCKED)


Thread 3341: (state = BLOCKED)
- java.lang.Object.wait(long) @bci=0 (Interpreted frame)
- java.lang.ref.ReferenceQueue.remove(long) @bci=44, line=116 
(Interpreted frame)
- java.lang.ref.ReferenceQueue.remove() @bci=2, line=132 (Interpreted 
frame)
- java.lang.ref.Finalizer$FinalizerThread.run() @bci=3, line=159 
(Interpreted frame)



Thread 3340: (state = BLOCKED)
- java.lang.Object.wait(long) @bci=0 (Interpreted frame)
- java.lang.Object.wait() @bci=2, line=485 (Interpreted frame)
- java.lang.ref.Reference$ReferenceHandler.run() @bci=46, line=116 
(Interpreted frame)



Thread 3330: (state = BLOCKED)
- java.lang.Thread.sleep(long) @bci=0 (Interpreted frame)
- 
org.apache.hadoop.dfs.DFSClient$DFSInputStream.chooseDataNode(org.apache.hadoop.dfs.LocatedBlock) 
@bci=181, line=1470 (Interpreted frame)
- org.apache.hadoop.dfs.DFSClient$DFSInputStream.blockSeekTo(long) 
@bci=133, line=1312 (Interpreted frame)
- org.apache.hadoop.dfs.DFSClient$DFSInputStream.read(byte[], int, 
int) @bci=61, line=1417 (Interpreted frame)
- org.apache.hadoop.dfs.DFSClient$DFSInputStream.read() @bci=7, 
line=1369 (Compiled frame)

- java.io.DataInputStream.readShort() @bci=4, line=295 (Compiled frame)
- org.apache.hadoop.fs.FsShell.forMagic(org.apache.hadoop.fs.Path, 
org.apache.hadoop.fs.FileSystem) @bci=7, line=396 (Interpreted frame)
- org.apache.hadoop.fs.FsShell.access$1(org.apache.hadoop.fs.FsShell, 
org.apache.hadoop.fs.Path, org.apache.hadoop.fs.FileSystem) @bci=3, 
line=394 (Interpreted frame)
- org.apache.hadoop.fs.FsShell$2.process(org.apache.hadoop.fs.Path, 
org.apache.hadoop.fs.FileSystem) @bci=28, line=419 (Interpreted frame)
- 
org.apache.hadoop.fs.FsShell$DelayedExceptionThrowing.globAndProcess(org.apache.hadoop.fs.Path, 
org.apache.hadoop.fs.FileSystem) @bci=40, line=1865 (Interpreted frame)
- org.apache.hadoop.fs.FsShell.text(java.lang.String) @bci=26, 
line=421 (Interpreted frame)
- org.apache.hadoop.fs.FsShell.doall(java.lang.String, 
java.lang.String[], int) @bci=246, line=1532 (Interpreted frame)
- org.apache.hadoop.fs.FsShell.run(java.lang.String[]) @bci=586, 
line=1730 (Interpreted frame)
- 
org.apache.hadoop.util.ToolRunner.run(org.apache.hadoop.conf.Configura

too many open files? Isn't 4K enough???

2008-11-04 Thread Yuri Pradkin
Hi,

I'm running the current snapshot (-r709609), doing a simple word count using Python
over streaming.  I have a relatively moderate setup of 17 nodes.

I'm getting this exception:

java.io.FileNotFoundException: 
/usr/local/hadoop/hadoop-hadoop/mapred/local/taskTracker/jobcache/job_200811041109_0003/attempt_200811041109_0003_m_00_0/output/spill4055.out.index
(Too many open files)
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.<init>(FileInputStream.java:137)
at 
org.apache.hadoop.fs.RawLocalFileSystem$TrackingFileInputStream.<init>(RawLocalFileSystem.java:62)
at 
org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileInputStream.<init>(RawLocalFileSystem.java:98)
at 
org.apache.hadoop.fs.RawLocalFileSystem.open(RawLocalFileSystem.java:168)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:359)
at 
org.apache.hadoop.mapred.IndexRecord.readIndexFile(IndexRecord.java:47)
at 
org.apache.hadoop.mapred.MapTask$MapOutputBuffer.getIndexInformation(MapTask.java:1339)
at 
org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1237)
at 
org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:857)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:333)
at org.apache.hadoop.mapred.Child.main(Child.java:155)

I see that AFTER I've reconfigured the max allowable open files to 4096!

When I monitor the number of open files on a box running hadoop, I see the
number fluctuating around 900 during the map phase.  Then I see it going up
through the roof during the sorting/shuffling phase.  I see a lot of open files named like
"/users/hadoop/hadoop-hadoop/mapred/local/taskTracker/jobcache/job_200811041109_0003/attempt_200811041109_0003_m_00_1/output/spill2188.out"

What is a poor user to do about this?  Reconfigure hadoop to allow 32K open files,
as somebody suggested on an hbase forum that I googled up?  Or some other
ridiculous number?  If yes, what should it be?
Or is it my config problem, and is there a way to control this?

Do I need to file a jira about this, or is this a problem that people are aware
of?  Because right now it looks to me like Hadoop scalability is broken.  There is
no way 4K descriptors should be insufficient.

Any feedback will be appreciated.

Thanks,

  -Yuri

P.S.  BTW, someone on this list has suggested before that a similar-sounding
problem goes away for a while after restarting hadoop.  It did not work for me.
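
(A hedged aside, not a confirmed fix from this thread: the number of spill
files a map task produces is influenced by the in-memory sort buffer, so
tuning it may reduce the number of files opened during the merge. A sketch
with illustrative values only:)

import org.apache.hadoop.mapred.JobConf;

public class SpillTuningSketch {
  public static void main(String[] args) {
    JobConf conf = new JobConf();

    // A larger sort buffer means maps spill to disk less often, so fewer
    // spill*.out / spill*.out.index files exist at merge time.
    // 200 MB is an illustrative value, not a recommendation from the thread.
    conf.setInt("io.sort.mb", 200);

    // io.sort.factor is the number of spill files merged at once; raising it
    // reduces merge passes but increases the files open per pass.
    conf.setInt("io.sort.factor", 10);

    // For a streaming job, similar settings can be passed on the command
    // line with -jobconf, e.g. -jobconf io.sort.mb=200 (0.18-era option).
  }
}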


Re: Seeking Someone to Review Hadoop Article

2008-11-04 Thread Ravion

Great and thank you!!

Best
Ravion
- Original Message - 
From: "Tom Wheeler" <[EMAIL PROTECTED]>

To: 
Sent: Wednesday, November 05, 2008 5:47 AM
Subject: Re: Seeking Someone to Review Hadoop Article



Done.  I also added a link to the article that Amit Kumar Saha wrote
just a few weeks ago for linux.com.

On Tue, Nov 4, 2008 at 4:37 PM, Ravion <[EMAIL PROTECTED]> wrote:

Dear Tom,

Here is one more, written by our core data warehouse team. I would appreciate it if
you could add it to the Hadoop article links, so that the community benefits more:




--
Tom Wheeler
http://www.tomwheeler.com/ 




Re: Seeking Someone to Review Hadoop Article

2008-11-04 Thread Tom Wheeler
Done.  I also added a link to the article that Amit Kumar Saha wrote
just a few weeks ago for linux.com.

On Tue, Nov 4, 2008 at 4:37 PM, Ravion <[EMAIL PROTECTED]> wrote:
> Dear Tom,
>
> Here is one more, written by our core data warehouse team. I would appreciate it if you
> could add it to the Hadoop article links, so that the community benefits more:



-- 
Tom Wheeler
http://www.tomwheeler.com/


Re: Seeking Someone to Review Hadoop Article

2008-11-04 Thread Ravion

Dear Tom,

Here is one more, written by our core data warehouse team. I would appreciate it if you
could add it to the Hadoop article links, so that the community benefits more:


http://www.javaworld.com/javaworld/jw-09-2008/jw-09-hadoop.html

Best,
Ravion

- Original Message - 
From: "Tom Wheeler" <[EMAIL PROTECTED]>

To: 
Sent: Wednesday, November 05, 2008 4:57 AM
Subject: Re: Seeking Someone to Review Hadoop Article


On Tue, Nov 4, 2008 at 3:46 PM, Milind Bhandarkar <[EMAIL PROTECTED]> 
wrote:
Please consider adding it to: 
http://wiki.apache.org/hadoop/HadoopArticles


Great suggestion -- I've just linked it there as you requested.

--
Tom Wheeler
http://www.tomwheeler.com/ 




Re: HDFS Login Security

2008-11-04 Thread Alex Loddengaard
Look at the "hadoop.job.ugi" configuration option.  You can manually set a
user and the groups that user is a part of.
Alex

On Tue, Nov 4, 2008 at 1:42 PM, Wasim Bari <[EMAIL PROTECTED]> wrote:

> Hi,
> Do we have any Java class for logging in to HDFS programmatically,
> like a traditional username/password mechanism? Or can we only have the system
> user or the user who started the NameNode?
>
> Thanks,
>
> Wasim
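
(For illustration only, a sketch of how the option can be used from client
code in pre-security Hadoop; the user and group names are placeholders, and
note that this identity is simply asserted by the client, not authenticated.)

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class UgiExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();

    // hadoop.job.ugi is a comma-separated list: user name, then group names.
    // "wasim" and "analysts" are placeholders.
    conf.set("hadoop.job.ugi", "wasim,analysts");

    // Operations through this FileSystem handle carry that user/group
    // identity; there is no password check in this Hadoop generation.
    FileSystem fs = FileSystem.get(conf);
    System.out.println(fs.exists(new Path("/user/wasim")));
  }
}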


Re: Seeking Someone to Review Hadoop Article

2008-11-04 Thread Tom Wheeler
On Tue, Nov 4, 2008 at 3:46 PM, Milind Bhandarkar <[EMAIL PROTECTED]> wrote:
> Please consider adding it to: http://wiki.apache.org/hadoop/HadoopArticles

Great suggestion -- I've just linked it there as you requested.

-- 
Tom Wheeler
http://www.tomwheeler.com/


Re: Seeking Someone to Review Hadoop Article

2008-11-04 Thread Milind Bhandarkar
Tom,

Please consider adding it to: http://wiki.apache.org/hadoop/HadoopArticles

Thanks,

- milind


On 11/2/08 5:57 PM, "Tom Wheeler" <[EMAIL PROTECTED]> wrote:

> The article I've written about Hadoop has just been published:
> 
>http://www.ociweb.com/jnb/jnbNov2008.html
> 
> I'd like to again thank Mafish Liu and Amit Kumar Saha for reviewing
> my draft and offering suggestions for helping me improve it.  I hope
> the article is compelling, clear and technically accurate.  However,
> if you notice anything in need of correction, please contact me
> offlist and I will address it ASAP.
> 
> Tom Wheeler
> 
> On Thu, Oct 23, 2008 at 5:31 PM, Tom Wheeler <[EMAIL PROTECTED]> wrote:
>> Each month the developers at my company write a short article about a
>> Java technology we find exciting. I've just finished one about Hadoop
>> for November and am seeking a volunteer knowledgeable about Hadoop to
>> look it over to help ensure it's both clear and technically accurate.
>> 
>> If you're interested in helping me, please contact me offlist and I
>> will send you the draft.  Meanwhile, you can get a feel for the length
>> and general style of the articles from our archives:
>> 
>>   http://www.ociweb.com/articles/publications/jnb.html
>> 
>> Thanks in advance,
>> 
>> Tom Wheeler
>> 


-- 
Milind Bhandarkar
Y!IM: GridSolutions
408-349-2136 
([EMAIL PROTECTED])



Re: Hadoop hardware specs

2008-11-04 Thread Brian Bockelman

Hey Roger,

SSH is only needed to start and stop daemons - it's not really needed  
for running Hadoop itself.  Currently, we do this through custom site  
mechanisms, and not through SSH.


Brian

On Nov 4, 2008, at 10:36 AM, Zhang, Roger wrote:


Brian,

You seem to have a pretty large cluster. What do you think about the
overall performance?

Is your implementation on OpenSSH or SSH2?

I'm new to this and trying to set up a 20-node cluster. But our Linux
boxes already enforce F-Secure SSH2, which I found HDFS 0.18 does
not support right now.


Does anyone have an idea for a workaround?


Thanks and best Rgds.
   Roger Zhang

-Original Message-
From: Brian Bockelman [mailto:[EMAIL PROTECTED]
Sent: November 4, 2008 21:36
To: core-user@hadoop.apache.org
Subject: Re: Hadoop hardware specs

Hey Arjit,

We use all internal SATA drives in our cluster, which is about 110TB
today; if we grow it to our planned 350TB, it will be a healthy mix of
worker nodes w/ SATA, large internal chassis (12 - 48TB), SCSI-attached
vaults, and fibre channel vaults.

Brian

On Nov 4, 2008, at 4:16 AM, Arijit Mukherjee wrote:


Hi All

We're thinking of setting up a Hadoop cluster which will be used to
create a prototype system for analyzing telecom data. The wiki page  
on

machine scaling (http://wiki.apache.org/hadoop/MachineScaling) gives
an
overview of the node specs and from the Hadoop primer I found the
following specs -

* 5 x dual core CPUs
* RAM - 4-8GB; ECC preferred, though more expensive
* 2 x 250GB SATA drives (on each of the 5 nodes)
* 1-5 TB external storage

I'm curious to find out what sort of specs people use normally. Is
the external storage essential or will the individual disks on each
node
be sufficient? Why would you need an external storage in a hadoop
cluster? How can I find out what other projects on hadoop are using?
Cheers
Arijit


Dr. Arijit Mukherjee
Principal Member of Technical Staff, Level-II
Connectiva Systems (I) Pvt. Ltd.
J-2, Block GP, Sector V, Salt Lake
Kolkata 700 091, India
Phone: +91 (0)33 23577531/32 x 107
http://www.connectivasystems.com







_temporary directories not deleted

2008-11-04 Thread Nathan Marz

Hello all,

Occasionally when running jobs, Hadoop fails to clean up the
"_temporary" directories it has left behind. This only appears to
happen when a task is killed (e.g., by speculative execution), and the
data that task has outputted so far is not cleaned up. Is this a known
issue in hadoop? Is the data from that task guaranteed to be duplicate
data of what was outputted by another task? Is it safe to just delete
this directory without worrying about losing data?


Thanks,
Nathan Marz
Rapleaf


HDFS Login Security

2008-11-04 Thread Wasim Bari
Hi,
 Do we have any Java class for logging in to HDFS programmatically, like a
traditional username/password mechanism? Or can we only have the system user or
the user who started the NameNode?

Thanks,

Wasim

Missing blocks from bin/hadoop text but fsck is all right

2008-11-04 Thread Sagar Naik

Hi,
We have a strange problem getting some of our files back out

bin/hadoop dfs -text dir/*  gives me missing block exceptions.
08/11/04 10:45:09 [main] INFO dfs.DFSClient: Could not obtain block 
blk_6488385702283300787_1247408 from any node:  java.io.IOException: No 
live nodes contain current block
08/11/04 10:45:12 [main] INFO dfs.DFSClient: Could not obtain block 
blk_6488385702283300787_1247408 from any node:  java.io.IOException: No 
live nodes contain current block
08/11/04 10:45:15 [main] INFO dfs.DFSClient: Could not obtain block 
blk_6488385702283300787_1247408 from any node:  java.io.IOException: No 
live nodes contain current block
08/11/04 10:45:18 [main] WARN dfs.DFSClient: DFS Read: 
java.io.IOException: Could not obtain block: 
blk_6488385702283300787_1247408 file=some_filepath-1
at 
org.apache.hadoop.dfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1462)
at 
org.apache.hadoop.dfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1312)

at org.apache.hadoop.dfs.DFSClient$DFSInputStream.read(DFSClient.java:1417)
at org.apache.hadoop.dfs.DFSClient$DFSInputStream.read(DFSClient.java:1369)
at java.io.DataInputStream.readShort(DataInputStream.java:295)
at org.apache.hadoop.fs.FsShell.forMagic(FsShell.java:396)
at org.apache.hadoop.fs.FsShell.access$1(FsShell.java:394)
at org.apache.hadoop.fs.FsShell$2.process(FsShell.java:419)
at 
org.apache.hadoop.fs.FsShell$DelayedExceptionThrowing.globAndProcess(FsShell.java:1865)

at org.apache.hadoop.fs.FsShell.text(FsShell.java:421)
at org.apache.hadoop.fs.FsShell.doall(FsShell.java:1532)
at org.apache.hadoop.fs.FsShell.run(FsShell.java:1730)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at org.apache.hadoop.fs.FsShell.main(FsShell.java:1847)/

but when I do a
bin/hadoop dfs -text some_filepath-1, I do get all the data


fsck on the parent of this file revealed no problems.



jstack on FsShell revealed nothing much

/Debugger attached successfully.
Server compiler detected.
JVM version is 10.0-b19
Deadlock Detection:

No deadlocks found.

Thread 3358: (state = BLOCKED)
- java.lang.Thread.sleep(long) @bci=0 (Interpreted frame)
- org.apache.hadoop.dfs.DFSClient$LeaseChecker.run() @bci=124, line=792 
(Interpreted frame)

- java.lang.Thread.run() @bci=11, line=619 (Interpreted frame)


Thread 3357: (state = BLOCKED)
- java.lang.Object.wait(long) @bci=0 (Interpreted frame)
- org.apache.hadoop.ipc.Client$Connection.waitForWork() @bci=62, 
line=397 (Interpreted frame)
- org.apache.hadoop.ipc.Client$Connection.run() @bci=63, line=440 
(Interpreted frame)



Thread 3342: (state = BLOCKED)


Thread 3341: (state = BLOCKED)
- java.lang.Object.wait(long) @bci=0 (Interpreted frame)
- java.lang.ref.ReferenceQueue.remove(long) @bci=44, line=116 
(Interpreted frame)
- java.lang.ref.ReferenceQueue.remove() @bci=2, line=132 (Interpreted 
frame)
- java.lang.ref.Finalizer$FinalizerThread.run() @bci=3, line=159 
(Interpreted frame)



Thread 3340: (state = BLOCKED)
- java.lang.Object.wait(long) @bci=0 (Interpreted frame)
- java.lang.Object.wait() @bci=2, line=485 (Interpreted frame)
- java.lang.ref.Reference$ReferenceHandler.run() @bci=46, line=116 
(Interpreted frame)



Thread 3330: (state = BLOCKED)
- java.lang.Thread.sleep(long) @bci=0 (Interpreted frame)
- 
org.apache.hadoop.dfs.DFSClient$DFSInputStream.chooseDataNode(org.apache.hadoop.dfs.LocatedBlock) 
@bci=181, line=1470 (Interpreted frame)
- org.apache.hadoop.dfs.DFSClient$DFSInputStream.blockSeekTo(long) 
@bci=133, line=1312 (Interpreted frame)
- org.apache.hadoop.dfs.DFSClient$DFSInputStream.read(byte[], int, int) 
@bci=61, line=1417 (Interpreted frame)
- org.apache.hadoop.dfs.DFSClient$DFSInputStream.read() @bci=7, 
line=1369 (Compiled frame)

- java.io.DataInputStream.readShort() @bci=4, line=295 (Compiled frame)
- org.apache.hadoop.fs.FsShell.forMagic(org.apache.hadoop.fs.Path, 
org.apache.hadoop.fs.FileSystem) @bci=7, line=396 (Interpreted frame)
- org.apache.hadoop.fs.FsShell.access$1(org.apache.hadoop.fs.FsShell, 
org.apache.hadoop.fs.Path, org.apache.hadoop.fs.FileSystem) @bci=3, 
line=394 (Interpreted frame)
- org.apache.hadoop.fs.FsShell$2.process(org.apache.hadoop.fs.Path, 
org.apache.hadoop.fs.FileSystem) @bci=28, line=419 (Interpreted frame)
- 
org.apache.hadoop.fs.FsShell$DelayedExceptionThrowing.globAndProcess(org.apache.hadoop.fs.Path, 
org.apache.hadoop.fs.FileSystem) @bci=40, line=1865 (Interpreted frame)
- org.apache.hadoop.fs.FsShell.text(java.lang.String) @bci=26, line=421 
(Interpreted frame)
- org.apache.hadoop.fs.FsShell.doall(java.lang.String, 
java.lang.String[], int) @bci=246, line=1532 (Interpreted frame)
- org.apache.hadoop.fs.FsShell.run(java.lang.String[]) @bci=586, 
line=1730 (Interpreted frame)
- 
org.apache.hadoop.util.ToolRunner.run(org.apache.hadoop.conf.Configuration, 
org.apache.hadoop.util.Tool, java.lang.String[]) @bci=38, line=65 
(Interpreted frame)
- org.apache.hadoop.util.ToolRunne

Re: Recovery from Failed Jobs

2008-11-04 Thread Alex Loddengaard
With regard to checkpointing, not yet.  This JIRA is a prerequisite:


I'm a little confused about what you're trying to do with log parsing.  You
should consider Scribe or Chukwa, though Chukwa isn't ready to be used yet.
 Learn more here:

Chukwa:



Scribe:
<http://www.cloudera.com/blog/2008/10/28/installing-scribe-for-log-collection/>
<http://www.cloudera.com/blog/2008/11/02/configuring-and-using-scribe-for-hadoop-log-collection/>

Alex

On Tue, Nov 4, 2008 at 11:51 AM, shahab mehmandoust <[EMAIL PROTECTED]>wrote:

> Hello,
>
> I want to parse lines of an access logs, line by line with map/reduce.  I
> want to know, once my access log is in the HDFS, am I guaranteed that every
> line will be processed and results will be in the output dir?  In other
> words, if a job fails, does hadoop know where it failed? and can hadoop
> recover from that point so no data is lost?
>
> Thanks,
> Shahab
>


Re: Problem while starting Hadoop

2008-11-04 Thread Alex Loddengaard
Does 'ping lca2-s3-pc01' resolve from lca2-s3-pc04 and vice versa?  Are your
'slaves' and 'master' configuration files configured correctly?
You can also try stopping everything, deleting all of your Hadoop data on
each machine (by default in /tmp), reformatting the namenode, and starting
everything again.

Alex

On Tue, Nov 4, 2008 at 11:11 AM, <[EMAIL PROTECTED]> wrote:

> Hi,
>
>   I am trying to use hadoop 0.18.1. After I start hadoop, I am able to
> see the namenode running on the master. But the datanode on the client machine is
> unable to connect to the namenode. I use 2 machines with hostnames
> lca2-s3-pc01 and lca2-s3-pc04 respectively. It shows the following message
> in the client log file.
>
> 2008-11-04 17:19:25,253 INFO org.apache.hadoop.dfs.DataNode: STARTUP_MSG:
> /
> STARTUP_MSG: Starting DataNode
> STARTUP_MSG:   host = lca2-s3-pc04/127.0.1.1
> STARTUP_MSG:   args = []
> STARTUP_MSG:   version = 0.18.1
> STARTUP_MSG:   build =
> http://svn.apache.org/repos/asf/hadoop/core/branches/bran
> ch-0.18 -r 694836; compiled by 'hadoopqa' on Fri Sep 12 23:29:35 UTC 2008
> /
> 2008-11-04 17:19:26,464 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to s
> erver: lca2-s3-pc01/128.178.156.221:9000. Already tried 0 time(s).
> 2008-11-04 17:19:27,468 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to s
> erver: lca2-s3-pc01/128.178.156.221:9000. Already tried 1 time(s).
> 2008-11-04 17:19:28,472 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to s
> erver: lca2-s3-pc01/128.178.156.221:9000. Already tried 2 time(s).
> 2008-11-04 17:19:29,476 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to s
> erver: lca2-s3-pc01/128.178.156.221:9000. Already tried 3 time(s).
> 2008-11-04 17:19:30,479 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to s
> erver: lca2-s3-pc01/128.178.156.221:9000. Already tried 4 time(s).
> 2008-11-04 17:19:31,483 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to s
> erver: lca2-s3-pc01/128.178.156.221:9000. Already tried 5 time(s).
> 2008-11-04 17:19:32,487 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to s
> erver: lca2-s3-pc01/128.178.156.221:9000. Already tried 6 time(s).
> 2008-11-04 17:19:33,491 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to s
> erver: lca2-s3-pc01/128.178.156.221:9000. Already tried 7 time(s).
> 2008-11-04 17:19:34,495 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to s
> erver: lca2-s3-pc01/128.178.156.221:9000. Already tried 8 time(s).
> 2008-11-04 17:19:35,499 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to s
> erver: lca2-s3-pc01/128.178.156.221:9000. Already tried 9 time(s).
> 2008-11-04 17:19:35,502 ERROR org.apache.hadoop.dfs.DataNode:
> java.io.IOExceptio
> n: Call failed on local exception
>at org.apache.hadoop.ipc.Client.call(Client.java:718)
>at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
> 2008-11-at org.apache.hadoop.dfs.$Proxy4.getProtocolVersion(Unknown Source)
> to s
>at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:319)
>at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:306)
>at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:343)
>at org.apache.hadoop.ipc.RPC.waitForProxy(RPC.java:288)
>at org.apache.hadoop.dfs.DataNode.startDataNode(DataNode.java:244)
>at org.apache.hadoop.dfs.DataNode.<init>(DataNode.java:190)
>at org.apache.hadoop.dfs.DataNode.makeInstance(DataNode.java:2987)
>at
> org.apache.hadoop.dfs.DataNode.instantiateDataNode(DataNode.java:2942
> )
>at org.apache.hadoop.dfs.DataNode.createDataNode(DataNode.java:2950)
>at org.apache.hadoop.dfs.DataNode.main(DataNode.java:3072)
> Caused by: java.net.ConnectException: Connection refused
>at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>at
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574
> )
>at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:100)
>at
> org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:30
> 0)
>at
> org.apache.hadoop.ipc.Client$Connection.access$1700(Client.java:177)
>at org.apache.hadoop.ipc.Client.getConnection(Client.java:789)
>at org.apache.hadoop.ipc.Client.call(Client.java:704)
>... 12 more
> erver: lca2-s3-pc01/128.178.156.221:9000. Already tried 5 time(s).
> 2008-11-04 17:19:35,502 INFO org.apache.hadoop.dfs.DataNode: SHUTDOWN_MSG:t
> to s
> /e(s).
> SHUTDOWN_MSG: Shutting down DataNode at lca2-s3-pc04/127.0.1.1
> /haracters
>
> Here is the hadoop-site configuration file data that I use on both the
> master and the client.
>
> <configuration>
>
> <property>
> <name>hadoop.tmp.dir</name>
> <value>/opt/okkam/datastore/hadoop</value>
> </property>
>
> <property>
> <name>fs.default.name</name>
> <value>hdfs://lca2-s3-pc01:9000</value>
> </property>
>
> <property>
> <name>dfs.replication</name>
> <value>2</value>
> </property>

[ANNOUNCE] Hadoop release 0.18.2 available

2008-11-04 Thread Nigel Daley

Release 0.18.2 fixes many critical bugs in 0.18.1.

For Hadoop release details and downloads, visit:
http://hadoop.apache.org/core/releases.html

Hadoop 0.18.2 Release Notes are at
http://hadoop.apache.org/core/docs/r0.18.2/releasenotes.html

Thanks to all who contributed to this release!

Nigel


Recovery from Failed Jobs

2008-11-04 Thread shahab mehmandoust
Hello,

I want to parse lines of an access logs, line by line with map/reduce.  I
want to know, once my access log is in the HDFS, am I guaranteed that every
line will be processed and results will be in the output dir?  In other
words, if a job fails, does hadoop know where it failed? and can hadoop
recover from that point so no data is lost?

Thanks,
Shahab


Problem while starting Hadoop

2008-11-04 Thread srikanth . bondalapati

Hi,

   I am trying to use hadoop 0.18.1. After I start hadoop, I am
able to see the namenode running on the master. But the datanode on the
client machine is unable to connect to the namenode. I use 2 machines
with hostnames lca2-s3-pc01 and lca2-s3-pc04 respectively. It shows
the following message in the client log file.


2008-11-04 17:19:25,253 INFO org.apache.hadoop.dfs.DataNode: STARTUP_MSG:
/
STARTUP_MSG: Starting DataNode
STARTUP_MSG:   host = lca2-s3-pc04/127.0.1.1
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 0.18.1
STARTUP_MSG:   build =  
http://svn.apache.org/repos/asf/hadoop/core/branches/bran

ch-0.18 -r 694836; compiled by 'hadoopqa' on Fri Sep 12 23:29:35 UTC 2008
/
2008-11-04 17:19:26,464 INFO org.apache.hadoop.ipc.Client: Retrying  
connect to s

erver: lca2-s3-pc01/128.178.156.221:9000. Already tried 0 time(s).
2008-11-04 17:19:27,468 INFO org.apache.hadoop.ipc.Client: Retrying  
connect to s

erver: lca2-s3-pc01/128.178.156.221:9000. Already tried 1 time(s).
2008-11-04 17:19:28,472 INFO org.apache.hadoop.ipc.Client: Retrying  
connect to s

erver: lca2-s3-pc01/128.178.156.221:9000. Already tried 2 time(s).
2008-11-04 17:19:29,476 INFO org.apache.hadoop.ipc.Client: Retrying  
connect to s

erver: lca2-s3-pc01/128.178.156.221:9000. Already tried 3 time(s).
2008-11-04 17:19:30,479 INFO org.apache.hadoop.ipc.Client: Retrying  
connect to s

erver: lca2-s3-pc01/128.178.156.221:9000. Already tried 4 time(s).
2008-11-04 17:19:31,483 INFO org.apache.hadoop.ipc.Client: Retrying  
connect to s

erver: lca2-s3-pc01/128.178.156.221:9000. Already tried 5 time(s).
2008-11-04 17:19:32,487 INFO org.apache.hadoop.ipc.Client: Retrying  
connect to s

erver: lca2-s3-pc01/128.178.156.221:9000. Already tried 6 time(s).
2008-11-04 17:19:33,491 INFO org.apache.hadoop.ipc.Client: Retrying  
connect to s

erver: lca2-s3-pc01/128.178.156.221:9000. Already tried 7 time(s).
2008-11-04 17:19:34,495 INFO org.apache.hadoop.ipc.Client: Retrying  
connect to s

erver: lca2-s3-pc01/128.178.156.221:9000. Already tried 8 time(s).
2008-11-04 17:19:35,499 INFO org.apache.hadoop.ipc.Client: Retrying  
connect to s

erver: lca2-s3-pc01/128.178.156.221:9000. Already tried 9 time(s).
2008-11-04 17:19:35,502 ERROR org.apache.hadoop.dfs.DataNode:  
java.io.IOExceptio

n: Call failed on local exception
at org.apache.hadoop.ipc.Client.call(Client.java:718)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
2008-11-at org.apache.hadoop.dfs.$Proxy4.getProtocolVersion(Unknown  
Source) to s

at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:319)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:306)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:343)
at org.apache.hadoop.ipc.RPC.waitForProxy(RPC.java:288)
at org.apache.hadoop.dfs.DataNode.startDataNode(DataNode.java:244)
at org.apache.hadoop.dfs.DataNode.<init>(DataNode.java:190)
at org.apache.hadoop.dfs.DataNode.makeInstance(DataNode.java:2987)
at  
org.apache.hadoop.dfs.DataNode.instantiateDataNode(DataNode.java:2942

)
at org.apache.hadoop.dfs.DataNode.createDataNode(DataNode.java:2950)
at org.apache.hadoop.dfs.DataNode.main(DataNode.java:3072)
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at  
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574

)
at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:100)
at  
org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:30

0)
at  
org.apache.hadoop.ipc.Client$Connection.access$1700(Client.java:177)

at org.apache.hadoop.ipc.Client.getConnection(Client.java:789)
at org.apache.hadoop.ipc.Client.call(Client.java:704)
... 12 more
erver: lca2-s3-pc01/128.178.156.221:9000. Already tried 5 time(s).
2008-11-04 17:19:35,502 INFO org.apache.hadoop.dfs.DataNode:  
SHUTDOWN_MSG:t to s

/e(s).
SHUTDOWN_MSG: Shutting down DataNode at lca2-s3-pc04/127.0.1.1
/haracters

Here is the hadoop-site configuration file data that I use on both the  
master and the client.


<configuration>

<property>
<name>hadoop.tmp.dir</name>
<value>/opt/okkam/datastore/hadoop</value>
</property>

<property>
<name>fs.default.name</name>
<value>hdfs://lca2-s3-pc01:9000</value>
</property>

<property>
<name>dfs.replication</name>
<value>2</value>
</property>

</configuration>

Could you please tell me what mistake I am making?

Thanks a lot in advance,
Srikanth.


RE: Hadoop hardware specs

2008-11-04 Thread Zhang, Roger
Brian,

You seem to have a pretty large cluster. What do you think about the overall
performance?
Is your implementation on OpenSSH or SSH2?

I'm new to this and trying to set up a 20-node cluster. But our Linux boxes
already enforce F-Secure SSH2, which I found HDFS 0.18 does not support right
now.

Does anyone have an idea for a workaround?


Thanks and best Rgds. 
Roger Zhang 

-Original Message-
From: Brian Bockelman [mailto:[EMAIL PROTECTED] 
Sent: November 4, 2008 21:36
To: core-user@hadoop.apache.org
Subject: Re: Hadoop hardware specs

Hey Arjit,

We use all internal SATA drives in our cluster, which is about 110TB  
today; if we grow it to our planned 350TB, it will be a healthy mix of  
worker nodes w/ SATA, large internal chassis (12 - 48TB), SCSI-attached  
vaults, and fibre channel vaults.

Brian

On Nov 4, 2008, at 4:16 AM, Arijit Mukherjee wrote:

> Hi All
>
> We're thinking of setting up a Hadoop cluster which will be used to
> create a prototype system for analyzing telecom data. The wiki page on
> machine scaling (http://wiki.apache.org/hadoop/MachineScaling) gives  
> an
> overview of the node specs and from the Hadoop primer I found the
> following specs -
>
> * 5 x dual core CPUs
> * RAM - 4-8GB; ECC preferred, though more expensive
> * 2 x 250GB SATA drives (on each of the 5 nodes)
> * 1-5 TB external storage
>
> I'm curious to find out what sort of specs people use normally. Is
> the external storage essential or will the individual disks on each  
> node
> be sufficient? Why would you need an external storage in a hadoop
> cluster? How can I find out what other projects on hadoop are using?
> Cheers
> Arijit
>
>
> Dr. Arijit Mukherjee
> Principal Member of Technical Staff, Level-II
> Connectiva Systems (I) Pvt. Ltd.
> J-2, Block GP, Sector V, Salt Lake
> Kolkata 700 091, India
> Phone: +91 (0)33 23577531/32 x 107
> http://www.connectivasystems.com
>



Re: Status FUSE-Support of HDFS

2008-11-04 Thread Brian Bockelman

Hey Robert,

I would chime in saying that our usage of FUSE results in a network  
transfer rate of about 30MB/s, and it does not seem to be a limiting  
factor (right now, we're CPU bound).


In our (limited) tests, we've achieved 80Gbps of reads in our cluster  
overall.  This did not appear to push the limits of FUSE or Hadoop.


Since we've applied the patches (which are in 0.18.2 by default), we  
haven't had any corruption issues.  Our application has rather heavy- 
handed internal file checksums, and the jobs would crash immediately  
if they were reading in garbage.


Brian

On Nov 4, 2008, at 10:07 AM, Robert Krüger wrote:



Thanks! This is good news. So it's fast enough for our purposes if it
turns out to be the same order of magnitude on our systems.

Have you used this with rsync? If so, any known issues with that
(reading or writing)?

Thanks in advance,

Robert


Pete Wyckoff wrote:

Reads are 20-30% slower
Writes are 33% slower before https://issues.apache.org/jira/browse/HADOOP-3805 
 - You need a kernel > 2.6.26-rc* to test 3805, which I don't have :(


These #s are with hadoop 0.17 and the 0.18.2 version of fuse-dfs.

-- pete


On 11/2/08 6:23 AM, "Robert Krüger" <[EMAIL PROTECTED]> wrote:



Hi Pete,

thanks for the info. That helps a lot. We will probably test it for  
our
use cases then. Did you benchmark throughput when reading writing  
files
through fuse-dfs and compared it to command line tool or API  
access? Is

there a notable difference?

Thanks again,

Robert



Pete Wyckoff wrote:
It has come a long way since 0.18 and facebook keeps our (0.17)  
dfs mounted via fuse and uses that for some operations.


There have recently been some problems with fuse-dfs when used in  
a multithreaded environment, but those have been fixed in 0.18.2  
and 0.19. (do not use 0.18 or 0.18.1)


The current (known) issues are:
 1. Wrong semantics when copying over an existing file - namely it  
does a delete and then re-creates the file, so ownership/ 
permissions may end up wrong. There is a patch for this.
 2. When directories have 10s of thousands of files, performance  
can be very poor.
 3. Posix truncate is supported only for truncating it to 0 size  
since hdfs doesn't support truncate.
 4. Appends are not supported - this is a libhdfs problem and  
there is a patch for it.


It is still a pre-1.0 product for sure, but it has been pretty  
stable for us.



-- pete


On 10/31/08 9:08 AM, "Robert Krüger" <[EMAIL PROTECTED]> wrote:



Hi,

could anyone tell me what the current Status of FUSE support for  
HDFS

is? Is this something that can be expected to be usable in a few
weeks/months in a production environment? We have been really
happy/successful with HDFS in our production system. However, some
software we use in our application simply requires an OS-Level file
system which currently requires us to do a lot of copying between  
HDFS
and a regular file system for processes which require that  
software and
FUSE support would really eliminate that one disadvantage we have  
with
HDFS. We wouldn't even require the performance of that to be  
outstanding
because just by eliminating the copy step, we would greatly increase
the throughput of those processes.

Thanks for sharing any thoughts on this.

Regards,

Robert












Re: hadoop 0.18.1 x-trace

2008-11-04 Thread Veiko Schnabel

Hi George, 

thanks, 

We are trying to evaluate hadoop, and my part is monitoring and performance
measurements, to get an impression of how it works and how we could improve
performance.

So I'm sure x-trace is the right tool to get a complete overview while the
application is running.

I'd like to stay tuned, and best wishes for your talk at ApacheCon.

Veiko


> Hi Veiko,
> 
> Right now the patches represent an instrumentation API for the RPC  
> layer (the X-Trace implementation is not currently part of the
> patch-- I'm hoping to submit it as a contrib/ project).  I'll be
> talking at the Hadoop Camp later this week about X-Trace and Hadoop.
> There is much to do in terms of building UIs, analysis tools, and
> trace storage/query interfaces.  So stay tuned (and if you would be
> willing to talk more about your anticipated uses--please let me
> know.  I'd be very interested in talking with you).
> 
> Thanks,
> George
> 
> On Nov 3, 2008, at 11:32 AM, Michael Bieniosek wrote:
> 
> > Try applying the last one only.
> >
> > Let us know if it works!
> >
> > -Michael
> >
> > On 11/3/08 6:23 AM, "Veiko Schnabel" <[EMAIL PROTECTED]>
> > wrote:
> >
> > Dear Hadoop Users and Developers,
> >
> > I have a requirement of monitoring the hadoop-cluster by using x- 
> > trace.
> >
> > i found these patches on
> >
> > http://issues.apache.org/jira/browse/HADOOP-4049
> >
> >
> > but when i try to integrate them with 0.18.1, then i cannot build  
> > hadoop anymore
> >
> > first of all, the patch order is not clear to me,
> > can anyone explain to me which patches i really need and the order  
> > to bring in these patches
> >
> > thanks
> >
> > Veiko
> >
> >
> >



Re: Hadoop hardware specs

2008-11-04 Thread Allen Wittenauer



On 11/4/08 2:16 AM, "Arijit Mukherjee" <[EMAIL PROTECTED]>
wrote:

> * 1-5 TB external storage
> 
> I'm curious to find out what sort of specs do people use normally. Is
> the external storage essential or will the individual disks on each node
> be sufficient? Why would you need an external storage in a hadoop
> cluster? 

The big reason for the external storage is twofold:

A) Provide a shared home directory (especially for the HDFS user, so that it is
easy to use the start scripts that call ssh)

B) An off-machine copy of the fsimage and edits files used by the name
node.  This way, if the name node goes belly up, you'll have an always
up-to-date backup to recover from.

> How can I find out what other projects on hadoop are using?

Slide 12 of the Apachecon presentation I did earlier this year talks
about what Yahoo!'s typical node looks like.  For a small 5 node cluster,
your hardware specs seem fine to me.

An 8GB namenode for 4 data nodes (or maybe even running nn on the same
machine as a data node if memory size of jobs is kept in check) should be
a-ok, even if you double the storage.  You're likely going to run out of
disk space before the name node starts swapping.
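
(To illustrate point B above with a sketch: dfs.name.dir accepts a
comma-separated list of directories, and the name node writes its image and
edits to every one of them. The NFS path is a placeholder, and in practice
the property would be set in hadoop-site.xml on the name node rather than in
code.)

import org.apache.hadoop.conf.Configuration;

public class NameDirSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();

    // One local directory plus one NFS-mounted directory (placeholder paths);
    // the NFS copy is the always up-to-date off-machine backup.
    conf.set("dfs.name.dir", "/local/hadoop/name,/mnt/nfs/hadoop/name");

    System.out.println("dfs.name.dir = " + conf.get("dfs.name.dir"));
  }
}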



Re: Status FUSE-Support of HDFS

2008-11-04 Thread Robert Krüger

Thanks! This is good news. So it's fast enough for our purposes if it
turns out to be the same order of magnitude on our systems.

Have you used this with rsync? If so, any known issues with that
(reading or writing)?

Thanks in advance,

Robert


Pete Wyckoff wrote:
> Reads are 20-30% slower
> Writes are 33% slower before 
> https://issues.apache.org/jira/browse/HADOOP-3805 - You need a kernel > 
> 2.6.26-rc* to test 3805, which I don't have :(
> 
> These #s are with hadoop 0.17 and the 0.18.2 version of fuse-dfs.
> 
> -- pete
> 
> 
> On 11/2/08 6:23 AM, "Robert Krüger" <[EMAIL PROTECTED]> wrote:
> 
> 
> 
> Hi Pete,
> 
> thanks for the info. That helps a lot. We will probably test it for our
> use cases then. Did you benchmark throughput when reading writing files
> through fuse-dfs and compared it to command line tool or API access? Is
> there a notable difference?
> 
> Thanks again,
> 
> Robert
> 
> 
> 
> Pete Wyckoff wrote:
>> It has come a long way since 0.18 and facebook keeps our (0.17) dfs mounted 
>> via fuse and uses that for some operations.
>>
>> There have recently been some problems with fuse-dfs when used in a 
>> multithreaded environment, but those have been fixed in 0.18.2 and 0.19. (do 
>> not use 0.18 or 0.18.1)
>>
>> The current (known) issues are:
>>   1. Wrong semantics when copying over an existing file - namely it does a 
>> delete and then re-creates the file, so ownership/permissions may end up 
>> wrong. There is a patch for this.
>>   2. When directories have 10s of thousands of files, performance can be 
>> very poor.
>>   3. Posix truncate is supported only for truncating it to 0 size since hdfs 
>> doesn't support truncate.
>>   4. Appends are not supported - this is a libhdfs problem and there is a 
>> patch for it.
>>
>> It is still a pre-1.0 product for sure, but it has been pretty stable for us.
>>
>>
>> -- pete
>>
>>
>> On 10/31/08 9:08 AM, "Robert Krüger" <[EMAIL PROTECTED]> wrote:
>>
>>
>>
>> Hi,
>>
>> could anyone tell me what the current Status of FUSE support for HDFS
>> is? Is this something that can be expected to be usable in a few
>> weeks/months in a production environment? We have been really
>> happy/successful with HDFS in our production system. However, some
>> software we use in our application simply requires an OS-Level file
>> system which currently requires us to do a lot of copying between HDFS
>> and a regular file system for processes which require that software and
>> FUSE support would really eliminate that one disadvantage we have with
>> HDFS. We wouldn't even require the performance of that to be outstanding
>> because just by eliminating the copy step, we would greatly increase
>> the throughput of those processes.
>>
>> Thanks for sharing any thoughts on this.
>>
>> Regards,
>>
>> Robert
>>
>>
>>
> 
> 
> 
> 



Re: Hadoop hardware specs

2008-11-04 Thread Brian Bockelman

Hey Arjit,

We use all internal SATA drives in our cluster, which is about 110TB  
today; if we grow it to our planned 350TB, it will be a healthy mix of  
worker nodes w/ SATA, large internal chassis (12 - 48TB), SCSI-attached  
vaults, and fibre channel vaults.


Brian

On Nov 4, 2008, at 4:16 AM, Arijit Mukherjee wrote:


Hi All

We're thinking of setting up a Hadoop cluster which will be used to
create a prototype system for analyzing telecom data. The wiki page on
machine scaling (http://wiki.apache.org/hadoop/MachineScaling) gives  
an

overview of the node specs and from the Hadoop primer I found the
following specs -

* 5 x dual core CPUs
* RAM - 4-8GB; ECC preferred, though more expensive
* 2 x 250GB SATA drives (on each of the 5 nodes)
* 1-5 TB external storage

I'm curious to find out what sort of specs people use normally. Is
the external storage essential or will the individual disks on each  
node

be sufficient? Why would you need an external storage in a hadoop
cluster? How can I find out what other projects on hadoop are using?
Cheers
Arijit


Dr. Arijit Mukherjee
Principal Member of Technical Staff, Level-II
Connectiva Systems (I) Pvt. Ltd.
J-2, Block GP, Sector V, Salt Lake
Kolkata 700 091, India
Phone: +91 (0)33 23577531/32 x 107
http://www.connectivasystems.com





Re: SecondaryNameNode on separate machine

2008-11-04 Thread Tomislav Poljak
Konstantin,

it works, thanks a lot!

Tomislav


On Mon, 2008-11-03 at 11:13 -0800, Konstantin Shvachko wrote:
> You can either do what you just described with dfs.name.dir = dirX
> or you can start name-node with -importCheckpoint option.
> This is an automation for copying image files from secondary to primary.
> 
> See here:
> http://hadoop.apache.org/core/docs/current/commands_manual.html#namenode
> http://hadoop.apache.org/core/docs/current/hdfs_user_guide.html#Secondary+NameNode
> http://issues.apache.org/jira/browse/HADOOP-2585#action_12584755
> 
> --Konstantin
> 
> Tomislav Poljak wrote:
> > Hi,
> > Thank you all for your time and your answers!
> > 
> > Now SecondaryNameNode connects to the NameNode (after I configured
> > dfs.http.address to the NN's http server -> NN hostname on port 50070)
> > and creates(transfers) edits and fsimage from NameNode.
> > 
> > Can you explain to me in a little more detail how NameNode failover should work
> > now?
> > 
> > For example, SecondaryNameNode now stores fsimage and edits to (SNN's)
> > dirX and let's say NameNode goes down (disk becomes unreadable). Now I
> > create/dedicate a new machine for NameNode (also change DNS to point to
> > this new NameNode machine as nameNode host) and take the data dirX from
> > SNN and copy it to new NameNode. How do I configure new NameNode to use
> > data from dirX (do I configure dfs.name.dir to point to dirX and start
> > new NameNode)?
> > 
> > Thanks,
> > Tomislav
> > 
> > 
> > 
> > On Fri, 2008-10-31 at 11:38 -0700, Konstantin Shvachko wrote:
> >> True, dfs.http.address is the NN Web UI address.
> >> This where the NN http server runs. Besides the Web UI there also
> >> a servlet running on that server which is used to transfer image
> >> and edits from NN to the secondary using http get.
> >> So SNN uses both addresses fs.default.name and dfs.http.address.
> >>
> >> When SNN finishes the checkpoint the primary needs to transfer the
> >> resulting image back. This is done via the http server running on SNN.
> >>
> >> Answering Tomislav's question:
> >> The difference between fs.default.name and dfs.http.address is that
> >> fs.default.name is the name-node's PRC address, where clients and
> >> data-nodes connect to, while dfs.http.address is the NN's http server
> >> address where our browsers connect to, but it is also used for
> >> transferring image and edits files.
> >>
> >> --Konstantin
> >>
> >> Otis Gospodnetic wrote:
> >>> Konstantin & Co, please correct me if I'm wrong, but looking at 
> >>> hadoop-default.xml makes me think that dfs.http.address is only the URL 
> >>> for the NN *Web UI*.  In other words, this is where we people go look at 
> >>> the NN.
> >>>
> >>> The secondary NN must then be using only the Primary NN URL specified in 
> >>> fs.default.name.  This URL looks like hdfs://name-node-hostname-here/.  
> >>> Something in Hadoop then knows the exact port for the Primary NN based on 
> >>> the URI schema (e.g. "hdfs://") in this URL.
> >>>
> >>> Is this correct?
> >>>
> >>>
> >>> Thanks,
> >>> Otis
> >>> --
> >>> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> >>>
> >>>
> >>>
> >>> - Original Message 
>  From: Tomislav Poljak <[EMAIL PROTECTED]>
>  To: core-user@hadoop.apache.org
>  Sent: Thursday, October 30, 2008 1:52:18 PM
>  Subject: Re: SecondaryNameNode on separate machine
> 
>  Hi,
>  can you, please, explain the difference between fs.default.name and
>  dfs.http.address (like how and when is SecondaryNameNode using
>  fs.default.name and how/when dfs.http.address). I have set them both to
>  same (namenode's) hostname:port. Is this correct (or dfs.http.address
>  needs some other port)? 
> 
>  Thanks,
> 
>  Tomislav
> 
>  On Wed, 2008-10-29 at 16:10 -0700, Konstantin Shvachko wrote:
> > SecondaryNameNode uses http protocol to transfer the image and the edits
> > from the primary name-node and vise versa.
> > So the secondary does not access local files on the primary directly.
> > The primary NN should know the secondary's http address.
> > And the secondary NN need to know both fs.default.name and 
> > dfs.http.address of 
>  the primary.
> > In general we usually create one configuration file hadoop-site.xml
> > and copy it to all other machines. So you don't need to set up different
> > values for all servers.
> >
> > Regards,
> > --Konstantin
> >
> > Tomislav Poljak wrote:
> >> Hi,
> >> I'm not clear on how does SecondaryNameNode communicates with NameNode
> >> (if deployed on separate machine). Does SecondaryNameNode uses direct
> >> connection (over some port and protocol) or is it enough for
> >> SecondaryNameNode to have access to data which NameNode writes locally
> >> on disk?
> >>
> >> Tomislav
> >>
> >> On Wed, 2008-10-29 at 09:08 -0400, Jean-Daniel Cryans wrote:
> >>> I think a lot of t

RE: Hadoop hardware specs

2008-11-04 Thread Arijit Mukherjee
One correction - the number 5 in the mail below is my estimation of the
number of nodes we might need. Can this be too small a cluster?

Arijit

Dr. Arijit Mukherjee
Principal Member of Technical Staff, Level-II
Connectiva Systems (I) Pvt. Ltd.
J-2, Block GP, Sector V, Salt Lake
Kolkata 700 091, India
Phone: +91 (0)33 23577531/32 x 107
http://www.connectivasystems.com


-Original Message-
From: Arijit Mukherjee [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, November 04, 2008 3:47 PM
To: core-user@hadoop.apache.org
Subject: Hadoop hardware specs


Hi All

We're thinking of setting up a Hadoop cluster which will be used to
create a prototype system for analyzing telecom data. The wiki page on
machine scaling (http://wiki.apache.org/hadoop/MachineScaling) gives an
overview of the node specs and from the Hadoop primer I found the
following specs -

* 5 x dual core CPUs 
* RAM - 4-8GB; ECC preferred, though more expensive 
* 2 x 250GB SATA drives (on each of the 5 nodes) 
* 1-5 TB external storage 

I'm curious to find out what sort of specs people use normally. Is
the external storage essential or will the individual disks on each node
be sufficient? Why would you need an external storage in a hadoop
cluster? How can I find out what other projects on hadoop are using?
Cheers 
Arijit 


Dr. Arijit Mukherjee
Principal Member of Technical Staff, Level-II
Connectiva Systems (I) Pvt. Ltd.
J-2, Block GP, Sector V, Salt Lake
Kolkata 700 091, India
Phone: +91 (0)33 23577531/32 x 107
http://www.connectivasystems.com




Hadoop hardware specs

2008-11-04 Thread Arijit Mukherjee
Hi All

We're thinking of setting up a Hadoop cluster which will be used to
create a prototype system for analyzing telecom data. The wiki page on
machine scaling (http://wiki.apache.org/hadoop/MachineScaling) gives an
overview of the node specs and from the Hadoop primer I found the
following specs -

* 5 x dual core CPUs 
* RAM - 4-8GB; ECC preferred, though more expensive 
* 2 x 250GB SATA drives (on each of the 5 nodes) 
* 1-5 TB external storage 

I'm curious to find out what sort of specs people use normally. Is
the external storage essential or will the individual disks on each node
be sufficient? Why would you need an external storage in a hadoop
cluster? How can I find out what other projects on hadoop are using?
Cheers 
Arijit 


Dr. Arijit Mukherjee
Principal Member of Technical Staff, Level-II
Connectiva Systems (I) Pvt. Ltd.
J-2, Block GP, Sector V, Salt Lake
Kolkata 700 091, India
Phone: +91 (0)33 23577531/32 x 107
http://www.connectivasystems.com