On 08/09/2012 01:24 AM, Mike Lyon wrote:
I'm takng a wild guess, but i think they are legitimately trying to
unsubscribe but are clueless and don't RTFM
The scariest part is that these are supposedly people with enough
technical chops to set up and program to a Hadoop cluster.
I suddenly do
On 07/26/2012 09:20 PM, Steve Armstrong wrote:
Do you mean I need to deploy the mahout jars to the lib directory of
the master node? Or all the data nodes? Or is there a way to simply
tell the hadoop job launcher to upload the jars itself?
Every node that runs a Task (mapper or reducer) needs a
On 06/18/2012 10:40 AM, Mark Kerzner wrote:
that sounds very interesting, and I may implement such a workflow, but
can I write back to HDFS in the mapper? In the reducer it is a standard
context.write(), but it is a different context.
Both Mapper.Context and Reducer.Context descend from
TaskIn
On 06/18/2012 10:19 AM, Mark Kerzner wrote:
If only reducers could be told to start their work on the first
maps that they see, my processing would begin to show results much earlier,
before all the mappers are done.
The sort/shuffle phase isn't just about ordering the keys, it's about
collect
Another important note: the combiner runs can "stack".
Let's say Prashant is right that the default spill number that triggers
the combiner is 3, and that we have a mapper that generates 9 spills.
These spills will generate 3 combiner runs, which meets the threshold
again, and so we get *anoth
On 02/27/2012 11:53 AM, W.P. McNeill wrote:
You don't need any virtualization. Mac OS X is Linux and runs Hadoop as is.
Nitpick: OS X is NEXTSTEP based on Mach, which is a different
POSIX-compliant system from Linux.
Actually, I think this is what Oozie is for. It seems to leap out as a
great example of a forked workflow.
hth
On 02/15/2012 02:23 PM, W.P. McNeill wrote:
Say I have two Hadoop jobs, A and B, that can be run in parallel. I have
another job, C, that takes the output of both A and B as input.
On Fri, 13 Jan 2012 13:59:12 -0800 (PST), vvkbtnkr
wrote:
> I am running a hadoop jar and keep getting this error -
> java.lang.NoSuchMethodError:
> org.codehaus.jackson.JsonParser.getValueAsLong()
Nobody seems to have answered this while I was on vacation, so...
Okay, here's what I know, havin
On Thu, 15 Sep 2011 12:51:48 -0700, Frank Astier
wrote:
> I’m using IntelliJ and the WordCount example in Hadoop (which uses
> MiniMRCluster). Is it possible to set an IntelliJ debugger breakpoint
> straight into the map function of the mapper? - I’ve tried, but so far,
the
> debugger does not sto
On Fri, 26 Aug 2011 12:20:47 -0700, Ramya Sunil
wrote:
> Can you also post the configuration of the scheduler you are using? You
> might also want to check the jobtracker logs. It would help in further
> debugging.
Where would I find the scheduler configuration? I haven't changed it, so
I assume
On Fri, 26 Aug 2011 11:46:42 -0700, Ramya Sunil
wrote:
> How many tasktrackers do you have? Can you check if your tasktrackers
are
> running and the total available map and reduce capacity in your cluster?
In pseudo-distributed there's one tasktracker, which is running, and the
total map and redu
One of my colleagues has noticed this problem for a while, and now it's
biting me. Jobs seem to be failing before every really starting. It seems
to be limited (so far) to running in pseudo-distributed mode, since that's
where he saw the problem and where I'm now seeing it; it hasn't come up on
o
On Mon, 22 Aug 2011 11:01:23 -0700, "W.P. McNeill"
wrote:
> If it is, what is the proper way to make MyJar.jar available to both the
> Job
> Client and the Task Trackers?
Do you mean the task trackers, or the tasks themselves? What process do
you want to be able to run the code in MyJar.jar?
On Fri, 5 Aug 2011 08:50:02 +0800 (CST), "Daniel,Wu"
wrote:
> The book also
> mentioned the value if mutable, I think the key might also be mutable,
> means as we loop each value in iterable, the content of
the
> key object is reset.
The "mutability" of the value is one of the weirdnesses of Hado
On Thu, 4 Aug 2011 14:07:12 +0800 (CST), "Daniel,Wu"
wrote:
> I am using the new API (released is from cloudera). We can see from the
> output, for each call of reduce function, 100 records were processed,
but
> as the reduce is defined as
> reduce(IntPair key, Iterable values, Context context),
On Wed, 3 Aug 2011 10:35:51 +0800 (CST), "Daniel,Wu"
wrote:
> So the key of a group is determined by the first coming record in the
> group, if we have 3 records in a group
> 1: (1900,35)
> 2:(1900,34)
> 3:(1900,33)
>
> if (1900,35) comes in as the first row, then the result key will be
> (1900,
On Tue, 2 Aug 2011 21:49:22 +0800 (CST), "Daniel,Wu"
wrote:
> we usually use something like values.next() to loop every rows in a
> specific group, but I didn't see any code to loop the list, at least it
> need to get the first row in the list, which is something like
> values.get().
> or will
On Tue, 2 Aug 2011 21:25:47 +0800 (CST), "Daniel,Wu"
wrote:
> at page 243:
> Per my understanding, The reducer is supposed to output the first value
> (the maximum) for each year. But I just don't know how it work.
>
> suppose we have the data
> 1901 200
> 1901 300
> 1901 400
>
> Since grou
On Mon, 1 Aug 2011 15:30:49 -0400, "Aquil H. Abdullah"
wrote:
> Don't I feel sheepish...
Happens to the best, or so they tell me.
> OK, so I've hacked this sample code below, from the ConfigurationPrinter
> example in Hadoop: The Definitive Guide. If -libjars had been added to
the
> configuratio
On Mon, 1 Aug 2011 13:21:27 -0400, "Aquil H. Abdullah"
wrote:
> [AA] I am currently invoking my application as follows:
>
> hadoop jar /home/test/hadoop/test.option.demo.jar
> test.option.demo.OptionDemo -libjar /home/test/hadoop/lib/mytestlib.jar
I believe the problem might be that it's looking
On Mon, 1 Aug 2011 12:11:27 -0400, "Aquil H. Abdullah"
wrote:
> but it still isn't clear to me how the -libjars option is parsed,
whether
> or
> not I need to explicitly add it to the classpath inside my run method,
or
> where it needs to be placed in the command-line?
IIRC it's parsed as a comma
On Thu, 28 Jul 2011 10:05:57 -0400, "Kumar, Ranjan"
wrote:
> I have a class to define data I am reading from a MySQL database.
> According to online tutorials I created a class called MyRecord and
> extended it from Writable, DBWritable. While running it with hadoop I
get a
> NoSuchMethodException
On Tue, 19 Jul 2011 20:47:31 +0100, Kobina Kwarko
wrote:
> Hello,
>
> Please any assistance?? I am using Hadoop for a school project and
managed
> to install it on two computers testing with the wordcount example.
However,
> after stopping Hadoop and restarting the computers (Ubuntu Server 10.10)
On Tue, 05 Jul 2011 08:09:16 -0700, Mark
wrote:
> Is there anyway I can write out the results of my mapreduce job into 1
> local file... ie the opposite of getmerge?
I don't know about writing directly to the local filesystem, but if you
specify a single reducer you should get only one output fi
I've got my map/reduce programs implementing Tool, so I can pass various
parameters as configuration properties. I'm currently using an ad-hoc
method of verifying that a program has been passed all the configuration
properties it requires, but I'm wondering if there's a more "Hadoopy" way
this sho
On Sun, 26 Jun 2011 17:34:34 -0700, Mark
wrote:
> Hello all,
>
> We have a recommendation system that reads in similarity data via a
> Spring context.xml as follows:
>
>
class="org.apache.mahout.cf.taste.impl.similarity.file.FileItemSimilarity">
>
>
>
> Is it possible to use Hadoop/HDFS wit
On Tue, 7 Jun 2011 09:41:21 -0300, "Juan P."
wrote:
> Not 100% clear on what you meant. You are saying I should put the file
into
> my HDFS cluster or should I use DistributedCache? If you suggest the
> latter,
> could you address my original question?
I mean that you can certainly get away with
On Mon, 06 Jun 2011 16:14:14 -0500, Shi Yu wrote:
> I still don't understand, in a cluster you have a shared directory to
> all the nodes, right? Just put the configuration file in that directory
> and load it in all the mappers, isn't that simple?
> So I still don't understand why bother Distri
On Mon, 06 Jun 2011 09:34:56 -0400, wrote:
> Yeah, that's a good point.
>
> I wonder though, what the load on the tracker nodes (port et. al) would
> be if a inter-rack fiber switch at 10's of GBS' is getting maxed.
>
> Seems to me that if there is that much traffic being mitigate across
> racks
On Mon, 06 Jun 2011 09:26:11 -0400, wrote:
> I'm not a hadoop jedi, but in that case, wouldn't one of the hadoop
> "trackers" get bottlenecked to resolve those dependencies?
>
> Again, this exposes the oddity of hadoop IMO, it tries to NOT
> be I/O bound, but seems its very I/O bound...
I'm not
On Mon, 06 Jun 2011 09:18:45 -0400, wrote:
> I never understood how hadoop can throttle an inter-rack fiber switch.
> Its supposed to operate on the principle of move-the-code to the data
> because of the I/O cost of moving the data, right?
But what happens when a reducer on rack A gets most of i
On Thu, 2 Jun 2011 15:43:37 -0700, Mark question
wrote:
> Does anyone knows if : SequenceFile.next(key) is actually not reading
> value into memory
I think what you're confused by is something I stumbled upon quite by
accident. The secret is that there is actually only ONE Key object that
On Thu, 02 Jun 2011 17:23:08 +0300, George Kousiouris
wrote:
> Are there anywhere instructions on how to change from the default ports
> of Hadoop and HDFS? My main interest is in default port 8020.
I think this is part of fs.default.name. You would go into core-site.xml
and add (or change)
33 matches
Mail list logo