Re: ***LIST MANAGER***

2012-08-09 Thread John Armstrong
On 08/09/2012 01:24 AM, Mike Lyon wrote: I'm taking a wild guess, but I think they are legitimately trying to unsubscribe but are clueless and don't RTFM. The scariest part is that these are supposedly people with enough technical chops to set up and program a Hadoop cluster. I suddenly do

Re: Trigger job from Java application causes ClassNotFound

2012-07-27 Thread John Armstrong
On 07/26/2012 09:20 PM, Steve Armstrong wrote: Do you mean I need to deploy the mahout jars to the lib directory of the master node? Or all the data nodes? Or is there a way to simply tell the hadoop job launcher to upload the jars itself? Every node that runs a Task (mapper or reducer) needs a
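
One way to have the job client ship extra jars for you, rather than copying them into every node's lib directory, is the distributed cache. A minimal sketch, assuming the new mapreduce API and a Mahout jar already uploaded to HDFS (the path and class names below are hypothetical):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;

    public class SubmitWithExtraJars {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // The jar must already sit on HDFS; every node that runs a task pulls
        // it down and prepends it to that task's classpath.
        DistributedCache.addFileToClassPath(new Path("/lib/mahout-core.jar"), conf);

        Job job = new Job(conf, "uses-mahout");
        job.setJarByClass(SubmitWithExtraJars.class);
        // ... set mapper, reducer, input and output paths as usual ...
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }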

Re: Do I have to sort?

2012-06-18 Thread John Armstrong
On 06/18/2012 10:40 AM, Mark Kerzner wrote: that sounds very interesting, and I may implement such a workflow, but can I write back to HDFS in the mapper? In the reducer it is a standard context.write(), but it is a different context. Both Mapper.Context and Reducer.Context descend from TaskIn
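
A minimal sketch of the map-only case, with a hypothetical pass-through mapper: when the driver calls job.setNumReduceTasks(0) there is no shuffle at all, and whatever the mapper hands to context.write() lands directly in the job's output directory on HDFS.

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class PassThroughMapper
        extends Mapper<LongWritable, Text, NullWritable, Text> {
      @Override
      protected void map(LongWritable key, Text value, Context context)
          throws IOException, InterruptedException {
        // The same context.write() call a reducer would make; with zero
        // reducers this record goes straight into part-m-* files on HDFS.
        context.write(NullWritable.get(), value);
      }
    }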

Re: Do I have to sort?

2012-06-18 Thread John Armstrong
On 06/18/2012 10:19 AM, Mark Kerzner wrote: If only reducers could be told to start their work on the first maps that they see, my processing would begin to show results much earlier, before all the mappers are done. The sort/shuffle phase isn't just about ordering the keys, it's about collect

Re: Using a combiner

2012-03-15 Thread John Armstrong
Another important note: the combiner runs can "stack". Let's say Prashant is right that the default spill number that triggers the combiner is 3, and that we have a mapper that generates 9 spills. These spills will generate 3 combiner runs, which meets the threshold again, and so we get *anoth
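
For reference, the combiner is wired in at the driver level and is just another Reducer; because the framework may run it zero, one, or (as above) several stacked times per map task, it only makes sense for operations like summing that can be applied repeatedly without changing the result. A minimal self-contained sketch:

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Reducer;

    public class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
      @Override
      protected void reduce(Text key, Iterable<IntWritable> values, Context context)
          throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) {
          sum += v.get();
        }
        context.write(key, new IntWritable(sum));
      }

      // Driver-side wiring: the same class serves as combiner and reducer.
      public static void configureAggregation(Job job) {
        job.setCombinerClass(SumReducer.class);
        job.setReducerClass(SumReducer.class);
      }
    }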

Re: Setting up Hadoop single node setup on Mac OS X

2012-03-05 Thread John Armstrong
On 02/27/2012 11:53 AM, W.P. McNeill wrote: You don't need any virtualization. Mac OS X is Linux and runs Hadoop as is. Nitpick: OS X is a NeXTSTEP descendant built on Mach, which is a different POSIX-compliant system from Linux.

Re: How do I synchronize Hadoop jobs?

2012-02-15 Thread John Armstrong
Actually, I think this is what Oozie is for. It seems to leap out as a great example of a forked workflow. hth On 02/15/2012 02:23 PM, W.P. McNeill wrote: Say I have two Hadoop jobs, A and B, that can be run in parallel. I have another job, C, that takes the output of both A and B as input.
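
A structural sketch of such a workflow in Oozie's workflow XML, assuming map-reduce actions; the action names, ${...} parameters and paths are hypothetical, and the action bodies are pared down to the bare minimum:

    <workflow-app name="fork-join-example" xmlns="uri:oozie:workflow:0.2">
      <start to="fork-ab"/>

      <fork name="fork-ab">
        <path start="job-a"/>
        <path start="job-b"/>
      </fork>

      <action name="job-a">
        <map-reduce>
          <job-tracker>${jobTracker}</job-tracker>
          <name-node>${nameNode}</name-node>
          <configuration>
            <property><name>mapred.input.dir</name><value>${inputA}</value></property>
            <property><name>mapred.output.dir</name><value>${outputA}</value></property>
          </configuration>
        </map-reduce>
        <ok to="join-ab"/>
        <error to="fail"/>
      </action>

      <action name="job-b">
        <map-reduce>
          <job-tracker>${jobTracker}</job-tracker>
          <name-node>${nameNode}</name-node>
          <configuration>
            <property><name>mapred.input.dir</name><value>${inputB}</value></property>
            <property><name>mapred.output.dir</name><value>${outputB}</value></property>
          </configuration>
        </map-reduce>
        <ok to="join-ab"/>
        <error to="fail"/>
      </action>

      <!-- C starts only after both A and B have succeeded. -->
      <join name="join-ab" to="job-c"/>

      <action name="job-c">
        <map-reduce>
          <job-tracker>${jobTracker}</job-tracker>
          <name-node>${nameNode}</name-node>
          <configuration>
            <property><name>mapred.input.dir</name><value>${outputA},${outputB}</value></property>
            <property><name>mapred.output.dir</name><value>${outputC}</value></property>
          </configuration>
        </map-reduce>
        <ok to="end"/>
        <error to="fail"/>
      </action>

      <kill name="fail">
        <message>Workflow failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
      </kill>
      <end name="end"/>
    </workflow-app>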

Re: Wrong version of jackson parser while running a hadoop job

2012-01-23 Thread John Armstrong
On Fri, 13 Jan 2012 13:59:12 -0800 (PST), vvkbtnkr wrote: > I am running a hadoop jar and keep getting this error - > java.lang.NoSuchMethodError: > org.codehaus.jackson.JsonParser.getValueAsLong() Nobody seems to have answered this while I was on vacation, so... Okay, here's what I know, havin
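
A quick way to see which jar a class is really being loaded from, since a NoSuchMethodError like this usually means an older jackson jar (for instance one bundled under the Hadoop lib directory) is shadowing the one you compiled against. A minimal sketch you could run on the cluster or drop into a mapper's setup():

    import org.codehaus.jackson.JsonParser;

    public class WhichJar {
      public static void main(String[] args) {
        // Prints the jar (or directory) JsonParser was actually loaded from.
        System.out.println(
            JsonParser.class.getProtectionDomain().getCodeSource().getLocation());
      }
    }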

Re: Debugging mapper

2011-09-15 Thread John Armstrong
On Thu, 15 Sep 2011 12:51:48 -0700, Frank Astier wrote: > I’m using IntelliJ and the WordCount example in Hadoop (which uses > MiniMRCluster). Is it possible to set an IntelliJ debugger breakpoint > straight into the map function of the mapper? - I’ve tried, but so far, the > debugger does not sto
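
One approach that sidesteps the question entirely is to run the job in local mode while debugging, so the mapper executes inside the same JVM as the test and the IDE's breakpoints can reach it. A minimal sketch using the property names of that era (this is an alternative, not necessarily how MiniMRCluster itself behaves):

    import org.apache.hadoop.conf.Configuration;

    public class LocalDebugConf {
      // Build the Job from this Configuration and submit it from the test;
      // everything then runs in-process, so a breakpoint inside map() is hit.
      public static Configuration create() {
        Configuration conf = new Configuration();
        conf.set("mapred.job.tracker", "local");   // LocalJobRunner, no cluster
        conf.set("fs.default.name", "file:///");   // local filesystem, no HDFS
        return conf;
      }
    }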

Re: Jobs failing on submit

2011-08-26 Thread John Armstrong
On Fri, 26 Aug 2011 12:20:47 -0700, Ramya Sunil wrote: > Can you also post the configuration of the scheduler you are using? You > might also want to check the jobtracker logs. It would help in further > debugging. Where would I find the scheduler configuration? I haven't changed it, so I assume

Re: Jobs failing on submit

2011-08-26 Thread John Armstrong
On Fri, 26 Aug 2011 11:46:42 -0700, Ramya Sunil wrote: > How many tasktrackers do you have? Can you check if your tasktrackers are > running and the total available map and reduce capacity in your cluster? In pseudo-distributed there's one tasktracker, which is running, and the total map and redu

Jobs failing on submit

2011-08-26 Thread John Armstrong
One of my colleagues has noticed this problem for a while, and now it's biting me. Jobs seem to be failing before ever really starting. It seems to be limited (so far) to running in pseudo-distributed mode, since that's where he saw the problem and where I'm now seeing it; it hasn't come up on o

Re: Making sure I understand HADOOP_CLASSPATH

2011-08-22 Thread John Armstrong
On Mon, 22 Aug 2011 11:01:23 -0700, "W.P. McNeill" wrote: > If it is, what is the proper way to make MyJar.jar available to both the > Job > Client and the Task Trackers? Do you mean the task trackers, or the tasks themselves? What process do you want to be able to run the code in MyJar.jar?

Re: one question in the book of "hadoop:definitive guide 2 edition"

2011-08-05 Thread John Armstrong
On Fri, 5 Aug 2011 08:50:02 +0800 (CST), "Daniel,Wu" wrote: > The book also > mentioned the value is mutable, I think the key might also be mutable, > meaning as we loop each value in the iterable, the content of the > key object is reset. The "mutability" of the value is one of the weirdnesses of Hado
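
The practical consequence is that the framework hands you the same key and value objects on every iteration and merely refills their contents, so any reference you keep past the current iteration must be a copy. A minimal sketch, assuming Text keys and values:

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    public class CollectingReducer extends Reducer<Text, Text, Text, Text> {
      @Override
      protected void reduce(Text key, Iterable<Text> values, Context context)
          throws IOException, InterruptedException {
        List<Text> kept = new ArrayList<Text>();
        for (Text value : values) {
          // kept.add(value) would fill the list with references to a single
          // object whose contents keep changing; copy it instead.
          kept.add(new Text(value));
        }
        for (Text v : kept) {
          context.write(key, v);
        }
      }
    }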

Re: Re: Re: Re: one question in the book of "hadoop:definitive guide 2 edition"

2011-08-04 Thread John Armstrong
On Thu, 4 Aug 2011 14:07:12 +0800 (CST), "Daniel,Wu" wrote: > I am using the new API (the release is from Cloudera). We can see from the > output, for each call of the reduce function, 100 records were processed, but > as the reduce is defined as > reduce(IntPair key, Iterable values, Context context),

Re: Re: Re: one question in the book of "hadoop:definitive guide 2 edition"

2011-08-03 Thread John Armstrong
On Wed, 3 Aug 2011 10:35:51 +0800 (CST), "Daniel,Wu" wrote: > So the key of a group is determined by the first coming record in the > group, if we have 3 records in a group > 1: (1900,35) > 2:(1900,34) > 3:(1900,33) > > if (1900,35) comes in as the first row, then the result key will be > (1900,

Re: Re: one question in the book of "hadoop:definitive guide 2 edition"

2011-08-02 Thread John Armstrong
On Tue, 2 Aug 2011 21:49:22 +0800 (CST), "Daniel,Wu" wrote: > we usually use something like values.next() to loop over every row in a > specific group, but I didn't see any code to loop over the list; at least it > needs to get the first row in the list, which is something like > values.get(). > or will

Re: one question in the book of "hadoop:definitive guide 2 edition"

2011-08-02 Thread John Armstrong
On Tue, 2 Aug 2011 21:25:47 +0800 (CST), "Daniel,Wu" wrote: > at page 243: > Per my understanding, the reducer is supposed to output the first value > (the maximum) for each year. But I just don't know how it works. > > suppose we have the data > 1901 200 > 1901 300 > 1901 400 > > Since grou
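
The trick in the book's example is in the driver wiring: the composite (year, temperature) key is sorted with temperature descending, but partitioning and grouping look at the year alone, so the first key the reducer sees in each group already carries that year's maximum. A driver fragment for the pattern, with the partitioner and comparator class names standing in for whatever the book calls them:

    // Driver fragment for the secondary-sort pattern (class names hypothetical).
    job.setPartitionerClass(FirstPartitioner.class);          // partition by year only
    job.setSortComparatorClass(FullKeyComparator.class);       // year asc, temperature desc
    job.setGroupingComparatorClass(YearGroupComparator.class); // one reduce() call per year
    // Inside reduce(), the key initially holds the group's first record, i.e.
    // the maximum temperature, so emitting the key once per call is enough.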

Re: Using -libjar option

2011-08-01 Thread John Armstrong
On Mon, 1 Aug 2011 15:30:49 -0400, "Aquil H. Abdullah" wrote: > Don't I feel sheepish... Happens to the best, or so they tell me. > OK, so I've hacked this sample code below, from the ConfigurationPrinter > example in Hadoop: The Definitive Guide. If -libjars had been added to the > configuratio

Re: Using -libjar option

2011-08-01 Thread John Armstrong
On Mon, 1 Aug 2011 13:21:27 -0400, "Aquil H. Abdullah" wrote: > [AA] I am currently invoking my application as follows: > > hadoop jar /home/test/hadoop/test.option.demo.jar > test.option.demo.OptionDemo -libjar /home/test/hadoop/lib/mytestlib.jar I believe the problem might be that it's looking
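
For what it's worth, the flag is -libjars (plural), and it is only recognized when the generic options are parsed before your own arguments, which is exactly what ToolRunner does. A minimal driver sketch, with OptionDemo standing in for the real class:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    public class OptionDemo extends Configured implements Tool {
      @Override
      public int run(String[] args) throws Exception {
        Configuration conf = getConf();  // -libjars, -D, -files already applied
        // ... build and submit the Job from conf ...
        return 0;
      }

      public static void main(String[] args) throws Exception {
        // ToolRunner strips the generic options before handing the remaining
        // args to run(). Invoke as, e.g.:
        //   hadoop jar /home/test/hadoop/test.option.demo.jar test.option.demo.OptionDemo \
        //       -libjars /home/test/hadoop/lib/mytestlib.jar <app args>
        System.exit(ToolRunner.run(new Configuration(), new OptionDemo(), args));
      }
    }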

Re: Using -libjar option

2011-08-01 Thread John Armstrong
On Mon, 1 Aug 2011 12:11:27 -0400, "Aquil H. Abdullah" wrote: > but it still isn't clear to me how the -libjars option is parsed, whether > or > not I need to explicitly add it to the classpath inside my run method, or > where it needs to be placed in the command-line? IIRC it's parsed as a comma

Re: Class loading problem

2011-07-28 Thread John Armstrong
On Thu, 28 Jul 2011 10:05:57 -0400, "Kumar, Ranjan" wrote: > I have a class to define data I am reading from a MySQL database. > According to online tutorials I created a class called MyRecord and > extended it from Writable, DBWritable. While running it with hadoop I get a > NoSuchMethodException
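
One frequent cause of a NoSuchMethodException with a custom Writable (whether or not it is the cause here) is a missing no-argument constructor: Hadoop creates these record objects reflectively, so if you add a parameterized constructor you must keep the nullary one. A minimal sketch with hypothetical fields:

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import org.apache.hadoop.io.Writable;
    import org.apache.hadoop.mapreduce.lib.db.DBWritable;

    public class MyRecord implements Writable, DBWritable {
      private long id;
      private String name;

      // Required by ReflectionUtils.newInstance(); its absence (for example
      // when only a parameterized constructor is defined) shows up as a
      // NoSuchMethodException at runtime.
      public MyRecord() {}

      public void write(DataOutput out) throws IOException {
        out.writeLong(id);
        out.writeUTF(name);
      }

      public void readFields(DataInput in) throws IOException {
        id = in.readLong();
        name = in.readUTF();
      }

      public void write(PreparedStatement stmt) throws SQLException {
        stmt.setLong(1, id);
        stmt.setString(2, name);
      }

      public void readFields(ResultSet rs) throws SQLException {
        id = rs.getLong(1);
        name = rs.getString(2);
      }
    }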

Re: localhost permission denied

2011-07-19 Thread John Armstrong
On Tue, 19 Jul 2011 20:47:31 +0100, Kobina Kwarko wrote: > Hello, > > Please any assistance?? I am using Hadoop for a school project and managed > to install it on two computers testing with the wordcount example. However, > after stopping Hadoop and restarting the computers (Ubuntu Server 10.10)

Re: Writing out a single file

2011-07-05 Thread John Armstrong
On Tue, 05 Jul 2011 08:09:16 -0700, Mark wrote: > Is there any way I can write out the results of my mapreduce job into 1 > local file... i.e. the opposite of getmerge? I don't know about writing directly to the local filesystem, but if you specify a single reducer you should get only one output fi
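
For reference, the driver-side knob is a one-liner; with a single reduce task there is exactly one part-r-00000 on HDFS, which hadoop fs -getmerge or a plain copyToLocal can then pull down:

    // Funnel all map output through one reducer -> one output file on HDFS.
    // Fine for small results; it does serialize the whole reduce phase.
    job.setNumReduceTasks(1);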

Requiring configuration parameters in Tools

2011-06-30 Thread John Armstrong
I've got my map/reduce programs implementing Tool, so I can pass various parameters as configuration properties. I'm currently using an ad-hoc method of verifying that a program has been passed all the configuration properties it requires, but I'm wondering if there's a more "Hadoopy" way this sho
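
In case it helps, the ad-hoc check can at least be centralized in run(); a minimal sketch of a Tool that fails fast when a required property is missing (the property names are hypothetical):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    public class MyJob extends Configured implements Tool {
      private static final String[] REQUIRED_PROPS = {
        "myapp.input.table", "myapp.output.dir"
      };

      @Override
      public int run(String[] args) throws Exception {
        Configuration conf = getConf();
        for (String prop : REQUIRED_PROPS) {
          if (conf.get(prop) == null) {
            System.err.println("Missing required property: " + prop
                + " (pass it with -D" + prop + "=...)");
            return 1;
          }
        }
        // ... configure and submit the Job ...
        return 0;
      }

      public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new Configuration(), new MyJob(), args));
      }
    }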

Re: Reading HDFS files via Spring

2011-06-27 Thread John Armstrong
On Sun, 26 Jun 2011 17:34:34 -0700, Mark wrote: > Hello all, > > We have a recommendation system that reads in similarity data via a > Spring context.xml as follows: > > <bean class="org.apache.mahout.cf.taste.impl.similarity.file.FileItemSimilarity"> ... </bean> > > Is it possible to use Hadoop/HDFS wit

Re: DistributedCache

2011-06-07 Thread John Armstrong
On Tue, 7 Jun 2011 09:41:21 -0300, "Juan P." wrote: > Not 100% clear on what you meant. You are saying I should put the file into > my HDFS cluster or should I use DistributedCache? If you suggest the > latter, > could you address my original question? I mean that you can certainly get away with
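
For completeness, the DistributedCache route looks roughly like this; the file path is hypothetical, and the point is that each node copies the file to local disk once per job instead of every task re-reading the same shared path:

    import java.io.IOException;
    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;

    public class CacheFileExample {
      // Driver side: register a file that already lives on HDFS.
      public static Job createJob(Configuration conf) throws IOException {
        DistributedCache.addCacheFile(URI.create("/config/lookup.properties"), conf);
        return new Job(conf, "uses-cache-file");
      }

      // Task side, e.g. from Mapper.setup(): the framework has already copied
      // the file onto this node's local disk; these are the local paths.
      public static Path[] localCopies(Configuration taskConf) throws IOException {
        return DistributedCache.getLocalCacheFiles(taskConf);
      }
    }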

Re: DistributedCache

2011-06-06 Thread John Armstrong
On Mon, 06 Jun 2011 16:14:14 -0500, Shi Yu wrote: > I still don't understand, in a cluster you have a shared directory to > all the nodes, right? Just put the configuration file in that directory > and load it in all the mappers, isn't that simple? > So I still don't understand why bother Distri

Re: Why inter-rack communication in mapreduce slow?

2011-06-06 Thread John Armstrong
On Mon, 06 Jun 2011 09:34:56 -0400, wrote: > Yeah, that's a good point. > > I wonder though, what the load on the tracker nodes (ports et al.) would > be if an inter-rack fiber switch at tens of Gb/s is getting maxed. > > Seems to me that if there is that much traffic being migrated across > racks

Re: Why inter-rack communication in mapreduce slow?

2011-06-06 Thread John Armstrong
On Mon, 06 Jun 2011 09:26:11 -0400, wrote: > I'm not a hadoop jedi, but in that case, wouldn't one of the hadoop > "trackers" get bottlenecked to resolve those dependencies? > > Again, this exposes the oddity of hadoop IMO, it tries to NOT > be I/O bound, but seems it's very I/O bound... I'm not

Re: Why inter-rack communication in mapreduce slow?

2011-06-06 Thread John Armstrong
On Mon, 06 Jun 2011 09:18:45 -0400, wrote: > I never understood how hadoop can throttle an inter-rack fiber switch. > Its supposed to operate on the principle of move-the-code to the data > because of the I/O cost of moving the data, right? But what happens when a reducer on rack A gets most of i

Re: SequenceFile.Reader

2011-06-02 Thread John Armstrong
On Thu, 2 Jun 2011 15:43:37 -0700, Mark question wrote: > Does anyone know if: SequenceFile.next(key) is actually not reading the > value into memory I think what you're confused by is something I stumbled upon quite by accident. The secret is that there is actually only ONE Key object that
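
For reference, a minimal sketch of reading keys only, assuming a sequence file of Text keys and values at a hypothetical path; next(key) advances the reader and deserializes just the key, and the value is only materialized if you then ask for it with getCurrentValue():

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;

    public class KeysOnly {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        SequenceFile.Reader reader =
            new SequenceFile.Reader(fs, new Path("/data/part-00000"), conf);
        try {
          Text key = new Text();
          Text value = new Text();
          while (reader.next(key)) {          // deserializes the key only
            // reader.getCurrentValue(value); // uncomment to pull the value too
            System.out.println(key);
          }
        } finally {
          reader.close();
        }
      }
    }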

Re: change of default port 8020

2011-06-02 Thread John Armstrong
On Thu, 02 Jun 2011 17:23:08 +0300, George Kousiouris wrote: > Are there anywhere instructions on how to change from the default ports > of Hadoop and HDFS? My main interest is in default port 8020. I think this is part of fs.default.name. You would go into core-site.xml and add (or change)
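
A minimal sketch of the change, assuming you want the namenode RPC on, say, port 9000 instead of 8020; the hostname is hypothetical, and every node and client needs the same edit:

    <!-- core-site.xml -->
    <configuration>
      <property>
        <name>fs.default.name</name>
        <value>hdfs://namenode.example.com:9000</value>
      </property>
    </configuration>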