Q: Find a counter

2008-11-04 Thread Edward J. Yoon
Hi, I'd like to find the counter REDUCE_OUTPUT_RECORDS after a job is done. BTW, org.apache.hadoop.mapred.Task.Counter is not visible, and findCounter(String, int, String) is deprecated. What is the best way to do this in code? -- Best regards, Edward J. Yoon @ NHN, corp. [EMAIL PROTECTED] http://blog.udanax.org
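One possible approach, sketched with the old org.apache.hadoop.mapred API: look the counter up by its group and counter names instead of the package-private enum. The group name "org.apache.hadoop.mapred.Task$Counter" is an assumption based on the binary name of the internal Task.Counter enum, and should be verified against the Counters javadoc of your Hadoop version.

```java
// Sketch: reading REDUCE_OUTPUT_RECORDS after a job completes, without
// referencing the package-private Task.Counter enum directly.
import org.apache.hadoop.mapred.Counters;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RunningJob;

public class ReduceOutputCounter {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(ReduceOutputCounter.class);
        // ... configure input/output paths, mapper, and reducer here ...
        RunningJob job = JobClient.runJob(conf);  // blocks until the job is done
        Counters counters = job.getCounters();
        // Group name is an assumption: the enum's binary class name.
        long reduceOut = counters
            .getGroup("org.apache.hadoop.mapred.Task$Counter")
            .getCounter("REDUCE_OUTPUT_RECORDS");
        System.out.println("Reduce output records: " + reduceOut);
    }
}
```

This requires a configured job and a running cluster, so it is shown here only as an outline of the API calls.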

Re: _temporary directories not deleted

2008-11-04 Thread Amareshwari Sriramadasu
Nathan Marz wrote: Hello all, Occasionally when running jobs, Hadoop fails to clean up the "_temporary" directories it has left behind. This only appears to happen when a task is killed (e.g., by speculative execution), and the data that task has output so far is not cleaned up. Is this a kn

Re: Seeking Someone to Review Hadoop Article

2008-11-04 Thread Amit k. Saha
On Wed, Nov 5, 2008 at 3:17 AM, Tom Wheeler <[EMAIL PROTECTED]> wrote: > Done. I also added a link to the article that Amit Kumar Saha wrote > just a few weeks ago for linux.com. Thank you, Tom :-) -Amit -- Amit Kumar Saha http://blogs.sun.com/amitsaha/ http://amitsaha.in.googlepages.com/ Skyp

Re: Missing blocks from bin/hadoop text but fsck is all right

2008-11-04 Thread Sagar Naik
Hi, We were hitting file descriptor limits :). Increased the limit and the problem was solved. Thanks Jason -Sagar Sagar Naik wrote: Hi, We have a strange problem reading some of our files: bin/hadoop dfs -text dir/* gives me missing block exceptions. 08/11/04 10:45:09 [main] INFO dfs.DFSClient: Could

too many open files? Isn't 4K enough???

2008-11-04 Thread Yuri Pradkin
Hi, I'm running the current snapshot (-r709609), doing a simple word count using Python over streaming. I have a relatively modest setup of 17 nodes. I'm getting this exception: java.io.FileNotFoundException: /usr/local/hadoop/hadoop-hadoop/mapred/local/taskTracker/jobcache/job_200811041109_
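A quick way to check and raise the per-process file descriptor limit, which was the fix reported in the thread above. The limits.conf lines are a sketch for PAM-based Linux systems; the "hadoop" user name and the value 16384 are placeholders to adjust per site.

```shell
# Show the current per-process open-file limit for this shell.
ulimit -n

# Raise it for the current session (the hard limit must allow it), e.g.:
#   ulimit -n 16384

# For a persistent change on PAM-based Linux, add lines like these to
# /etc/security/limits.conf ("hadoop" is a placeholder user name):
#   hadoop  soft  nofile  16384
#   hadoop  hard  nofile  16384
```

Remember to restart the TaskTracker/DataNode daemons after raising the limit so they pick it up.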

Re: Seeking Someone to Review Hadoop Article

2008-11-04 Thread Ravion
Great and thank you!! Best Ravion - Original Message - From: "Tom Wheeler" <[EMAIL PROTECTED]> To: Sent: Wednesday, November 05, 2008 5:47 AM Subject: Re: Seeking Someone to Review Hadoop Article Done. I also added a link to the article that Amit Kumar Saha wrote just a few weeks a

Re: Seeking Someone to Review Hadoop Article

2008-11-04 Thread Tom Wheeler
Done. I also added a link to the article that Amit Kumar Saha wrote just a few weeks ago for linux.com. On Tue, Nov 4, 2008 at 4:37 PM, Ravion <[EMAIL PROTECTED]> wrote: > Dear Tom, > > Here is one more written by our core data warehouse team. Appreciate it if you > can add it to the Hadoop article lin

Re: Seeking Someone to Review Hadoop Article

2008-11-04 Thread Ravion
Dear Tom, Here is one more written by our core data warehouse team. Appreciate it if you can add it to the Hadoop article links, so that the community benefits more: http://www.javaworld.com/javaworld/jw-09-2008/jw-09-hadoop.html Best, Ravion - Original Message - From: "Tom Wheeler" <[E

Re: HDFS Login Security

2008-11-04 Thread Alex Loddengaard
Look at the "hadoop.job.ugi" configuration option. You can manually set a user and the groups that user is a part of. Alex On Tue, Nov 4, 2008 at 1:42 PM, Wasim Bari <[EMAIL PROTECTED]> wrote: > Hi, > Do we have any Java class for Login purpose to HDFS programmatically > like traditional Use
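The option Alex mentions can be set per-site or per-job; a sketch of the hadoop-site.xml fragment, where "alice" and the group names are placeholders (the value format is the user name followed by that user's groups, comma-separated):

```xml
<!-- Sketch: "alice", "users", and "engineering" are placeholder names. -->
<property>
  <name>hadoop.job.ugi</name>
  <value>alice,users,engineering</value>
  <description>User name, then the groups that user belongs to.</description>
</property>
```

Note this is identity assertion, not authentication: Hadoop of this era trusts whatever user and groups the client supplies.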

Re: Seeking Someone to Review Hadoop Article

2008-11-04 Thread Tom Wheeler
On Tue, Nov 4, 2008 at 3:46 PM, Milind Bhandarkar <[EMAIL PROTECTED]> wrote: > Please consider adding it to: http://wiki.apache.org/hadoop/HadoopArticles Great suggestion -- I've just linked it there as you request. -- Tom Wheeler http://www.tomwheeler.com/

Re: Seeking Someone to Review Hadoop Article

2008-11-04 Thread Milind Bhandarkar
Tom, Please consider adding it to: http://wiki.apache.org/hadoop/HadoopArticles Thanks, - milind On 11/2/08 5:57 PM, "Tom Wheeler" <[EMAIL PROTECTED]> wrote: > The article I've written about Hadoop has just been published: > >http://www.ociweb.com/jnb/jnbNov2008.html > > I'd like to aga

Re: Hadoop hardware specs

2008-11-04 Thread Brian Bockelman
Hey Roger, SSH is only needed to start and stop daemons - it's not really needed for running Hadoop itself. Currently, we do this through custom site mechanisms, and not through SSH. Brian On Nov 4, 2008, at 10:36 AM, Zhang, Roger wrote: Brian, You seem to have a pretty large cluster.

_temporary directories not deleted

2008-11-04 Thread Nathan Marz
Hello all, Occasionally when running jobs, Hadoop fails to clean up the "_temporary" directories it has left behind. This only appears to happen when a task is killed (e.g., by speculative execution), and the data that task has output so far is not cleaned up. Is this a known issue in had

HDFS Login Security

2008-11-04 Thread Wasim Bari
Hi, Do we have any Java class for logging in to HDFS programmatically, like a traditional username/password mechanism? Or can we only have the system user, or the user who started the NameNode? Thanks, Wasim

Missing blocks from bin/hadoop text but fsck is all right

2008-11-04 Thread Sagar Naik
Hi, We have a strange problem reading some of our files: bin/hadoop dfs -text dir/* gives me missing block exceptions. 08/11/04 10:45:09 [main] INFO dfs.DFSClient: Could not obtain block blk_6488385702283300787_1247408 from any node: java.io.IOException: No live nodes contain current b
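The block-level health check the subject line refers to can be run like this (a sketch; the path is a placeholder):

```shell
# Report files, blocks, and block locations under a given HDFS path.
bin/hadoop fsck /user/sagar/dir -files -blocks -locations
```

A healthy fsck report alongside failing reads is the symptom that pointed this thread toward a client-side resource limit rather than actual missing blocks.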

Re: Recovery from Failed Jobs

2008-11-04 Thread Alex Loddengaard
With regard to checkpointing, not yet. This JIRA is a prerequisite: I'm a little confused about what you're trying to do with log parsing. You should consider Scribe or Chukwa, though Chukwa isn't ready to be used yet. Learn more here: Chukwa:

Re: Problem while starting Hadoop

2008-11-04 Thread Alex Loddengaard
Does 'ping lca2-s3-pc01' resolve from lca2-s3-pc04 and vice versa? Are your 'slaves' and 'master' configuration files configured correctly? You can also try stopping everything, deleting all of your Hadoop data on each machine (by default in /tmp), reformatting the namenode, and starting all again.
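The reset Alex describes can be sketched as the following shell steps. Paths assume the default /tmp data directories and a stock 0.18-style install run from $HADOOP_HOME; adjust for your site, and note this destroys all HDFS data, so it is only suitable for a throwaway test cluster.

```shell
# Run from $HADOOP_HOME on the master.
bin/stop-all.sh

# On every node, remove the default data directories under /tmp.
rm -rf /tmp/hadoop-"$USER"*

# Reformat the namenode and bring the cluster back up.
bin/hadoop namenode -format
bin/start-all.sh
```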

[ANNOUNCE] Hadoop release 0.18.2 available

2008-11-04 Thread Nigel Daley
Release 0.18.2 fixes many critical bugs in 0.18.1. For Hadoop release details and downloads, visit: http://hadoop.apache.org/core/releases.html Hadoop 0.18.2 Release Notes are at http://hadoop.apache.org/core/docs/r0.18.2/releasenotes.html Thanks to all who contributed to this release! Nigel

Recovery from Failed Jobs

2008-11-04 Thread shahab mehmandoust
Hello, I want to parse lines of an access logs, line by line with map/reduce. I want to know, once my access log is in the HDFS, am I guaranteed that every line will be processed and results will be in the output dir? In other words, if a job fails, does hadoop know where it failed? and can hado

Problem while starting Hadoop

2008-11-04 Thread srikanth . bondalapati
Hi, I am trying to use hadoop 0.18.1. After I start the hadoop, I am able to see namenode running on the master. But, datanode on the client machine is unable to connect to the namenode. I use 2 machines with hostnames lca2-s3-pc01 and lca2-s3-pc04 respectively. It shows the following

RE: Hadoop hardware specs

2008-11-04 Thread Zhang, Roger
Brian, You seem to have a pretty large cluster. What do you think about the overall performance? Is your implementation based on OpenSSH or SSH2? I'm new to this and trying to set up a 20-node cluster. But our Linux boxes already enforce F-Secure SSH2, which I found HDFS 0.18 does not support right

Re: Status FUSE-Support of HDFS

2008-11-04 Thread Brian Bockelman
Hey Robert, I would chime in saying that our usage of FUSE results in a network transfer rate of about 30MB/s, and it does not seem to be a limiting factor (right now, we're CPU bound). In our (limited) tests, we've achieved 80Gbps of reads in our cluster overall. This did not appear to

Re: hadoop 0.18.1 x-trace

2008-11-04 Thread Veiko Schnabel
Hi George, thanks. We are trying to evaluate Hadoop, and my part is monitoring and performance measurement, to get an impression of how it works and how we could improve performance. So I'm sure X-Trace is the right tool to get a complete overview while the application is running. I'd like to stay

Re: Hadoop hardware specs

2008-11-04 Thread Allen Wittenauer
On 11/4/08 2:16 AM, "Arijit Mukherjee" <[EMAIL PROTECTED]> wrote: > * 1-5 TB external storage > > I'm curious to find out what sort of specs do people use normally. Is > the external storage essential or will the individual disks on each node > be sufficient? Why would you need an external sto

Re: Status FUSE-Support of HDFS

2008-11-04 Thread Robert Krüger
Thanks! This is good news. So it's fast enough for our purposes if it turns out to be the same order of magnitude on our systems. Have you used this with rsync? If so, any known issues with that (reading or writing)? Thanks in advance, Robert Pete Wyckoff wrote: > Reads are 20-30% slower > Wr

Re: Hadoop hardware specs

2008-11-04 Thread Brian Bockelman
Hey Arijit, We use all internal SATA drives in our cluster, which is about 110TB today; if we grow it to our planned 350TB, it will be a healthy mix of worker nodes w/ SATA, large internal chassis (12 - 48TB), SCSI-attached vaults, and fibre channel vaults. Brian On Nov 4, 2008, at 4:16 AM

Re: SecondaryNameNode on separate machine

2008-11-04 Thread Tomislav Poljak
Konstantin, it works, thanks a lot! Tomislav On Mon, 2008-11-03 at 11:13 -0800, Konstantin Shvachko wrote: > You can either do what you just described with dfs.name.dir = dirX > or you can start name-node with -importCheckpoint option. > This is an automation for copying image files from second
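The -importCheckpoint path Konstantin mentions can be sketched as follows; the configuration keys are the standard dfs.name.dir and fs.checkpoint.dir, and the exact directories are placeholders for your site.

```shell
# On the (new) namenode host, with an empty dfs.name.dir configured and
# fs.checkpoint.dir pointing at the secondary namenode's checkpoint copy:
bin/hadoop namenode -importCheckpoint
```

This loads the image from the checkpoint directory into the fresh name directory instead of requiring a manual copy.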

RE: Hadoop hardware specs

2008-11-04 Thread Arijit Mukherjee
One correction - the number 5 in the mail below is my estimate of the number of nodes we might need. Could this be too small a cluster? Arijit Dr. Arijit Mukherjee Principal Member of Technical Staff, Level-II Connectiva Systems (I) Pvt. Ltd. J-2, Block GP, Sector V, Salt Lake Kolkata 700 091, In

Hadoop hardware specs

2008-11-04 Thread Arijit Mukherjee
Hi All We're thinking of setting up a Hadoop cluster which will be used to create a prototype system for analyzing telecom data. The wiki page on machine scaling (http://wiki.apache.org/hadoop/MachineScaling) gives an overview of the node specs and from the Hadoop primer I found the following spec