Turns out, it does cause problems later on.
I think the problem is that the slaves have, in their hosts files:
127.0.0.1 localhost.localdomain localhost
127.0.0.1 machinename.cse.sc.edu machinename
The reduce phase fails because the reducer cannot get data from the
mappers as it tries to open a
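For reference, the usual fix for this class of failure is that the machine's FQDN must resolve to its routable IP rather than the loopback address, so reducers can fetch map output over the network. A sketch of a corrected hosts file (the IP below is a placeholder, not one from this thread):

```
127.0.0.1      localhost.localdomain localhost
192.168.0.10   machinename.cse.sc.edu machinename
```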
On Jul 25, 2008, at 3:53 PM, Joydeep Sen Sarma wrote:
Just as an aside - there is probably a general perception that streaming
is really slow (at least I had it).
The last time I did some profiling (in 0.15), the primary overheads from
streaming came from the scripting language (python is slow). For
an insanely fast script (bin/cat), I saw signif
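For context, the "insanely fast script" being benchmarked is essentially an identity mapper. A minimal Python sketch of such a streaming mapper (assuming the standard line-oriented stdin/stdout contract of Hadoop Streaming; the function name is illustrative):

```python
import sys

def identity_map(lines):
    """Pass every input line through unchanged -- the streaming
    equivalent of bin/cat, useful for measuring framework overhead."""
    for line in lines:
        yield line

if __name__ == "__main__":
    # Hadoop Streaming feeds records on stdin and reads results from stdout.
    for record in identity_map(sys.stdin):
        sys.stdout.write(record)
```

Because the script does no work at all, any time it takes beyond raw I/O is overhead from the streaming machinery and the interpreter itself.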
On Friday 25 July 2008 15:18:24 James Moore wrote:
> On Thu, Jul 24, 2008 at 10:48 PM, Venkat Seeth <[EMAIL PROTECTED]> wrote:
> > Why don't you use hadoop streaming?
>
> I think that's really a broader question - why doesn't everyone use
> streaming?
>
> There's no real difference between doing Ha
This is a bit scattered but I wanted to post this in case it might
help someone...
Here's a little more detail on the loading problems I've been having.
For now, I'm just trying to call some ruby from the reduce method of
my map/reduce job. I want to move to a more general setup, like the
one Ja
Hi,
Besides knowing "data-local" and "rack-local" map task numbers, I am
interested in the size of data that are transferred on network. E.g.,
the size of intermediate map output transferred (not handled locally). I
wonder if there is such a counter. Thank you.
Best,
-Kevin
I was using BSF to avoid java 6 issues. However I'm having similar
issues using both systems. Basically, I can't load the scripting
engine from within hadoop. I have successfully compiled and run some
stand-alone test examples but am having trouble getting anything to
work from hadoop. One conf
On Thu, Jul 24, 2008 at 8:03 AM, Amber <[EMAIL PROTECTED]> wrote:
> Yes, I think this is the simplest method , but there are problems too:
>
> 1. The reduce stage wouldn't begin until the map stage ends, by which time we
> have already scanned both tables, and the comparison will take almost as long,
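The point about the reduce waiting on the maps applies to the standard reduce-side join pattern: mappers tag each record with its source table, and the join itself can only run once *all* tagged records for a key have arrived. A hypothetical sketch (the "L"/"R" tags and tuple layout are illustrative, not from the thread):

```python
from collections import defaultdict
from itertools import product

def reduce_side_join(tagged_records):
    """tagged_records: iterable of (key, table_tag, value) tuples as a
    mapper would emit them, with table_tag "L" or "R" naming the source
    table. Returns joined (key, left_value, right_value) rows. Note the
    full pass over the input before any row is produced -- this is why
    the reduce stage cannot begin until the map stage ends."""
    by_key = defaultdict(lambda: {"L": [], "R": []})
    for key, tag, value in tagged_records:
        by_key[key][tag].append(value)
    joined = []
    for key, sides in sorted(by_key.items()):
        # Keys present in only one table produce no output (inner join).
        for left, right in product(sides["L"], sides["R"]):
            joined.append((key, left, right))
    return joined
```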
HBase is a project that uses dfs.
If you want to know how to use dfs directly, the "bin/hadoop" script may
be a good entry point.
For example:
"bin/hadoop dfs -cat ***"
where *** is a file name in your dfs.
By following this command, you can see how to access dfs directly.
Hope it helps.
On 2008-7-25, morning
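A few more dfs subcommands in the same vein (standard bin/hadoop usage of that era; the paths are placeholders, and running any of these requires a live DFS):

```
bin/hadoop dfs -ls /                            # list the DFS root
bin/hadoop dfs -put local.txt /data/local.txt   # copy a local file into DFS
bin/hadoop dfs -cat /data/local.txt             # print a DFS file to stdout
bin/hadoop dfs -rm /data/local.txt              # delete it again
```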
On Thu, Jul 24, 2008 at 10:48 PM, Venkat Seeth <[EMAIL PROTECTED]> wrote:
> Why don't you use hadoop streaming?
I think that's really a broader question - why doesn't everyone use
streaming?
There's no real difference between doing Hadoop in
Ruby/Scala/Java/Jython/whatever - these days, Java is j
That sounds really interesting.
On Jul 25, 2008, at 00:42, James Moore wrote:
Funny you should mention it - I'm working on a framework to do JRuby
Hadoop this week. Something like:
class MyHadoopJob < Radoop
  input_format :text_input_format
  output_format :text_output_format
  map_output_key_cl