Re: How do I create a sequence file on my local harddrive?

2011-04-25 Thread David Rosenstrauch
On 04/22/2011 09:09 PM, W.P. McNeill wrote: I want to create a sequence file on my local harddrive. I want to write something like this: LocalFileSystem fs = new LocalFileSystem(); Configuration configuration = new Configuration(); Try doing this instead: Configuration
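The reply above is cut off, but the usual fix for this problem is to ask the `FileSystem` factory for an initialized local filesystem rather than constructing `LocalFileSystem` directly, which leaves it uninitialized. A minimal sketch under that assumption (the path and key/value types are illustrative, not from the thread):

```java
// Hedged sketch, not the truncated reply's actual code: use
// FileSystem.getLocal(conf) to obtain an initialized local filesystem,
// then write a SequenceFile through it.
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class LocalSequenceFileExample {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.getLocal(conf);   // initialized local FS
        Path path = new Path("/tmp/example.seq");    // hypothetical output path

        SequenceFile.Writer writer =
            SequenceFile.createWriter(fs, conf, path, Text.class, IntWritable.class);
        try {
            writer.append(new Text("key"), new IntWritable(42));
        } finally {
            writer.close();
        }
    }
}
```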

How to change logging level for an individual job

2011-04-13 Thread David Rosenstrauch
Is it possible to change the logging level for an individual job? (As opposed to the cluster as a whole.) E.g., is there some key that I can set on the job's configuration object that would allow me to bump up the logging from info to debug just for that particular job? Thanks, DR
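One hedged possibility, assuming a Hadoop release that includes the per-task log level keys (they come from MAPREDUCE-336 and are not present in all 0.20-era versions, so verify against your distribution):

```java
// Hypothetical sketch: these configuration keys are version-dependent;
// treat them as an assumption to check against your Hadoop release.
import org.apache.hadoop.conf.Configuration;

public class PerJobLogLevel {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Raise task logging from INFO to DEBUG for this job only:
        conf.set("mapred.map.child.log.level", "DEBUG");
        conf.set("mapred.reduce.child.log.level", "DEBUG");
        // ... submit the job with this Configuration as usual ...
    }
}
```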

Re: Developing, Testing, Distributing

2011-04-07 Thread David Rosenstrauch
On 04/07/2011 03:39 AM, Guy Doulberg wrote: Hey, I have been developing Map/Red jars for a while now, and I am still not comfortable with the development environment I have gathered for myself (and the team). I am curious how other Hadoop developers out there are developing their jobs... What IDE y

Re: How to abort a job in a map task

2011-04-07 Thread David Rosenstrauch
On 04/06/2011 08:40 PM, Haruyasu Ueda wrote: Hi all, I'm writing an M/R Java program. I want to abort a job itself in a map task, when the map task finds irregular data. I have two ideas to do so. 1. execute "bin/hadoop -kill jobID" in the map task, from the slave machine. 2. raise an IOException to

Re: What does "Too many fetch-failures" mean? How do I debug it?

2011-03-31 Thread David Rosenstrauch
On 03/31/2011 05:13 PM, W.P. McNeill wrote: I'm running a big job on my cluster and a handful of attempts are failing with a "Too many fetch-failures" error message. They're all on the same node, but that node doesn't appear to be down. Subsequent attempts succeed, so this looks like a transient

Re: CDH and Hadoop

2011-03-24 Thread David Rosenstrauch
They do, but IIRC, they recently announced that they're going to be discontinuing it. DR On Thu, March 24, 2011 8:10 pm, Rita wrote: > Thanks everyone for your replies. > > I knew Cloudera had their release but never knew Y! had one too... > > > > > > On Thu, Mar 24, 2011 at 5:04 PM, Eli Collins

Re: Writable Class with an Array

2011-03-17 Thread David Rosenstrauch
I would try implementing this using an ArrayWritable, which contains an array of IntWritables. HTH, DR On 03/17/2011 05:04 PM, maha wrote: Hello, I'm stuck with this for two days now ...I found a previous post discussing this, but not with arrays. I know how to write Writable class with pri
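A sketch of the `ArrayWritable` pattern the reply suggests (class names are illustrative, not from the thread): a concrete subclass with a no-arg constructor is needed so Hadoop can instantiate it when deserializing.

```java
// Hedged sketch of the suggested approach: an ArrayWritable subclass fixed
// to IntWritable elements. The no-arg constructor tells the framework the
// element type so the array can be read back.
import org.apache.hadoop.io.ArrayWritable;
import org.apache.hadoop.io.IntWritable;

public class IntArrayWritable extends ArrayWritable {
    public IntArrayWritable() {
        super(IntWritable.class);            // element type, for deserialization
    }

    public IntArrayWritable(IntWritable[] values) {
        super(IntWritable.class, values);    // wrap an existing array
    }
}
```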

Re: hadoop fs -rmr /*?

2011-03-16 Thread David Rosenstrauch
On 03/16/2011 01:35 PM, W.P. McNeill wrote: On HDFS, anyone can run hadoop fs -rmr /* and delete everything. Not sure how you have your installation set up, but on ours (we installed Cloudera CDH), only user "hadoop" has full read/write access to HDFS. Since we rarely either log in as user hadoop,

Re: Custom Input format...

2011-02-11 Thread David Rosenstrauch
On 02/11/2011 05:43 AM, Nitin Khandelwal wrote: Hi, I want to give a folder as the input path to Map Red. Each task should read one file out of that folder at once. I was using it before in 0.19, using MultiFileSplit format and my own input format extending it. Can you please tell how to do the same in 0.

Re: why is it invalid to have non-alphabet characters as a result of MultipleOutputs?

2011-02-08 Thread David Rosenstrauch
On 02/08/2011 05:01 AM, Jun Young Kim wrote: Hi, MultipleOutputs supports named outputs as a result of a Hadoop job, but it has inconvenient restrictions. Only alphanumeric characters are valid in a named output: A ~ Z, a ~ z, and 0 ~ 9 are the only characters we can use. I believe if I

Re: Streaming data locality

2011-02-03 Thread David Rosenstrauch
On 02/03/2011 12:16 PM, Keith Wiley wrote: I've seen this asked before, but haven't seen a response yet. If the input to a streaming job is not actual data splits but simply HDFS file names which are then read by the mappers, then how can data locality be achieved? Likewise, is there any easier

Re: MultipleOutputs Performance?

2011-01-13 Thread David Rosenstrauch
On 12/10/2010 02:16 PM, Harsh J wrote: Hi, On Thu, Dec 2, 2010 at 10:40 PM, Matt Tanquary wrote: I am using MultipleOutputs to split a mapper input into about 20 different files. Adding this split has had an extremely adverse effect on performance. Is MultipleOutputs known for performing slowl

Re: monitor the hadoop cluster

2010-11-11 Thread David Rosenstrauch
On 11/11/2010 02:52 PM, Da Zheng wrote: Hello, I wrote a MapReduce program and ran it on a 3-node hadoop cluster, but its running time varies a lot, from 2 minutes to 3 minutes. I want to understand how time is used by the map phase and the reduce phase, and hope to find the place to improve the

Re: jobtracker: Cannot assign requested address

2010-09-21 Thread David Rosenstrauch
On 09/21/2010 03:17 AM, Jing Tie wrote: I am still suffering from the problem... Did anyone encounter it before? Or any suggestions? Many thanks in advance! Jing On Fri, Sep 17, 2010 at 5:19 PM, Jing Tie wrote: Dear all, I am having this exception when starting jobtracker, and I checked by

Re: do you need to call super in Mapper.Context.setup()?

2010-09-17 Thread David Rosenstrauch
On 09/16/2010 11:38 PM, Mark Kerzner wrote: Hi, any need for this, protected void setup(Mapper.Context context) throws IOException, InterruptedException { super.setup(context); // TODO - does this need to be done? this.context = context; } Thank you, Mark "Use the source Lu
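In the 0.20 "new" API, the base `Mapper#setup` is an empty method, so the `super` call in the quoted snippet is harmless but not required. A minimal sketch (mapper and field names are illustrative, not the poster's code):

```java
// Hedged sketch: overriding setup() without calling super is fine, because
// Mapper#setup in the base class is a no-op.
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class ExampleMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
    private Context context;

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        // super.setup(context) would call an empty method; calling it is
        // harmless but unnecessary.
        this.context = context;
    }
}
```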

Re: How to run a job?

2010-09-17 Thread David Rosenstrauch
On 09/17/2010 12:53 AM, Mark Kerzner wrote: Hi, the documentation says I should do this: JobClient.*runJob*(JobConf

Re: migrating from 0.18 to 0.20

2010-09-16 Thread David Rosenstrauch
It certainly is! I wasted a few hours on that a couple of weeks back. DR On 09/16/2010 02:58 AM, Lance Norskog wrote: After this, if you add anything to the conf object, it does not get added to the job. This is a source of confusion. Mark Kerzner wrote: Thanks! Mark On Wed, Sep 15, 2010 at

Re: Multiple DataNodes on a single machine

2010-09-15 Thread David Rosenstrauch
On 09/15/2010 11:50 AM, Arv Mistry wrote: Hi, Is it possible to run multiple data nodes on a single machine? I currently have a machine with multiple disks and enough disk capacity for replication across them. I don't need redundancy at the machine level but would like to be able to handle a sin
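The usual answer here is that a single datanode can spread its blocks across several disks via a comma-separated `dfs.data.dir` list, rather than running multiple datanodes. A hedged hdfs-site.xml sketch (paths are hypothetical); note that HDFS replicates across datanodes, not across one node's directories, so this stripes data over the disks but does not replicate it:

```xml
<!-- Hypothetical sketch: one directory per physical disk. -->
<property>
  <name>dfs.data.dir</name>
  <value>/disk1/dfs/data,/disk2/dfs/data,/disk3/dfs/data</value>
</property>
```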

Re: Writable questions

2010-08-31 Thread David Rosenstrauch
On 08/31/2010 02:09 PM, Mark wrote: On 8/31/10 10:07 AM, David Rosenstrauch wrote: On 08/31/2010 12:58 PM, Mark wrote: I have a question regarding outputting Writable objects. I thought all Writables know how to serialize themselves to output. For example I have an ArrayWritable of strings

Re: Writable questions

2010-08-31 Thread David Rosenstrauch
On 08/31/2010 12:58 PM, Mark wrote: I have a question regarding outputting Writable objects. I thought all Writables know how to serialize themselves to output. For example I have an ArrayWritable of strings (or Texts) but when I output it to a file it shows up as 'org.apache.hadoop.io.arraywrit

Re: Custom partitioner for hadoop

2010-08-26 Thread David Rosenstrauch
On 08/26/2010 05:47 PM, Mithila Nagendra wrote: Thank you so much for a response. I had one last question. What if I don't want a particular pair to be put into a partition? For example, if K=5, then I want the partitioner to skip this Key. How would I do this? I tried to return -1 when I don't
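A partitioner's return value is used as an index into the reduce-task array, so it must lie in [0, numPartitions); returning -1 fails. The usual alternative is to drop unwanted keys on the map side before they ever reach the partitioner. A plain-Java sketch of that idea (not Hadoop API; the skipped key and method names are illustrative):

```java
// Illustrative sketch: filter unwanted keys before partitioning, and keep
// the partition function's output always inside [0, numPartitions).
import java.util.ArrayList;
import java.util.List;

public class SkipKeySketch {
    static final int SKIPPED_KEY = 5;   // hypothetical key to drop

    // Stand-in for map-side filtering: emit only the keys we want partitioned.
    static List<Integer> filterKeys(List<Integer> keys) {
        List<Integer> kept = new ArrayList<Integer>();
        for (int k : keys) {
            if (k != SKIPPED_KEY) {
                kept.add(k);
            }
        }
        return kept;
    }

    // Stand-in for getPartition(): always a valid index, never -1.
    static int partitionFor(int key, int numPartitions) {
        return (key & Integer.MAX_VALUE) % numPartitions;
    }
}
```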

Re: Custom partitioner for hadoop

2010-08-25 Thread David Rosenstrauch
If you define a Hadoop object as implementing Configurable, then its setConf() method will be called once, right after it gets instantiated. So each partitioner that gets instantiated will have its setConf() method called right afterwards. I'm taking advantage of that fact by calling my own (
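A sketch of the pattern described above (class, key, and field names are illustrative, not the poster's code): a partitioner that implements `Configurable` has its `setConf()` called right after instantiation, which makes it a convenient hook for reading job-specific settings such as range boundaries.

```java
// Hedged sketch: Hadoop calls setConf() on Configurable objects immediately
// after creating them via reflection, so the partitioner can pull its range
// boundaries out of the job configuration there.
import org.apache.hadoop.conf.Configurable;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

public class RangePartitioner extends Partitioner<IntWritable, Text>
        implements Configurable {

    private Configuration conf;
    private int[] boundaries = new int[0];

    @Override
    public void setConf(Configuration conf) {
        this.conf = conf;
        // Hypothetical key: boundaries passed in via the job configuration.
        String[] parts = conf.getStrings("example.range.boundaries", new String[0]);
        boundaries = new int[parts.length];
        for (int i = 0; i < parts.length; i++) {
            boundaries[i] = Integer.parseInt(parts[i]);
        }
    }

    @Override
    public Configuration getConf() {
        return conf;
    }

    @Override
    public int getPartition(IntWritable key, Text value, int numPartitions) {
        int p = 0;
        while (p < boundaries.length && key.get() >= boundaries[p]) {
            p++;
        }
        return Math.min(p, numPartitions - 1);   // always in [0, numPartitions)
    }
}
```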

Re: Custom partitioner for hadoop

2010-08-25 Thread David Rosenstrauch
On 08/25/2010 12:40 PM, Mithila Nagendra wrote: In order to avoid this I was thinking of passing the range boundaries to the partitioner. How would I do that? Is there an alternative? Any suggestion would prove useful. We use a custom partitioner, for which we pass in configuration data that g

Viewing counters in history job

2010-08-23 Thread David Rosenstrauch
I had a job that I ran a few days ago that rolled over to the job tracker history. Now when I go to view it in the history viewer, although I can see basic stats such as total # records in/out, I can no longer see all the counter values (i.e., most notably my own custom counter values). Is there a

Re: DiskChecker$DiskErrorException: Could not find any valid local directory -- lots of free disk space

2010-08-12 Thread David Rosenstrauch
On 08/12/2010 01:42 PM, Rares Vernica wrote: I forgot to mention that in my cluster the HDFS replication is set to 1. I know this is not recommended, but I only have 5 nodes in the cluster and there are no failures. There will be! :-) DR

Re: Partitioner in Hadoop 0.20

2010-08-04 Thread David Rosenstrauch
On 08/04/2010 01:55 PM, Wilkes, Chris wrote: On Aug 4, 2010, at 10:50 AM, David Rosenstrauch wrote: On 08/04/2010 12:30 PM, Owen O'Malley wrote: On Aug 4, 2010, at 8:38 AM, David Rosenstrauch wrote: Anyone know if there's any particular reason why the new Partitioner cla

Re: Partitioner in Hadoop 0.20

2010-08-04 Thread David Rosenstrauch
On 08/04/2010 12:30 PM, Owen O'Malley wrote: On Aug 4, 2010, at 8:38 AM, David Rosenstrauch wrote: Anyone know if there's any particular reason why the new Partitioner class doesn't implement JobConfigurable? (And, if not, whether there's any plans to fix this omission?

Fwd: Partitioner in Hadoop 0.20

2010-08-04 Thread David Rosenstrauch
Someone sent this email to the commons-user list a while back, but it seems like it slipped through the cracks. We're starting to dig into some hard-core Hadoop development and just came upon this same issue, though. Anyone know if there's any particular reason why the new Partitioner class

Re: Hadoop in fully-distributed mode

2010-07-14 Thread David Rosenstrauch
On 07/14/2010 06:58 AM, abc xyz wrote: Hi everyone, When Hadoop is running in fully-distributed mode and I am not the cluster administrator, but can just execute my programs on the cluster, how can I get access to the log files of the programs that I run on the cluster? I want to see the o

Re: Text files vs. SequenceFiles

2010-07-06 Thread David Rosenstrauch
It is worth the little bit of engineering effort to save space. /* Joe Stein http://www.linkedin.com/in/charmalloc Twitter: @allthingshadoop */ On Fri, Jul 2, 2010 at 6:14 PM, Alex Loddengaard wrote: Hi David, On Fri, Jul 2, 2010 at 2:54 PM, David Rosenstrauch wrote: * We should use a SequenceFile (binary

Text files vs. SequenceFiles

2010-07-02 Thread David Rosenstrauch
Our team is still new to Hadoop, and a colleague and I are trying to make a decision on file formats. The arguments are: * We should use a SequenceFile (binary) format as it's faster for the machine to read than parsing text, and the files are smaller. * We should use a text file format as i

Re: Does hadoop need to have ZooKeeper to work?

2010-06-28 Thread David Rosenstrauch
On 06/28/2010 10:09 AM, legolas wrote: Hi, I am wondering whether Hadoop has some dependencies on ZooKeeper or not. I mean, when I download http://apache.thelorne.com/hadoop/core/hadoop-0.20.2/hadoop-0.20.2.tar.gz does it have ZooKeeper with it, or should I download ZooKeeper separately? Finally

Re: which node processed my job

2010-05-06 Thread David Rosenstrauch
On 05/06/2010 11:09 AM, Alan Miller wrote: Not sure if this is the right list for this question, but. Is it possible to determine which host actually processed my MR job? Regards, Alan I'm curious: why would you need to know? DR

Host name problem in Hadoop GUI

2010-04-23 Thread David Rosenstrauch
Having an issue with host names on my new Hadoop cluster. The cluster is currently 1 name node and 2 data nodes, running in a cloud vendor data center. All is well with general operations of the cluster - i.e., name node and data nodes can talk just fine, I can read/write to/from the HDFS, ya

Re: hadoop from apache or cloudera?

2010-03-03 Thread David Rosenstrauch
On 03/03/2010 11:41 AM, Fitrah Elly Firdaus wrote: Dear all, I'm a newcomer to Hadoop. I want to ask about Hadoop: which one should I install, Hadoop from Apache or Hadoop from Cloudera? And what is the difference between Hadoop from Apache and from Cloudera? Regards See: http://www.cloudera.com/h