without leading space
... so stripping it out could mean you couldn't enter some valid
directory names (but who has folders starting with a space?)
I'm sure my input wasn't very useful, but just a comment.
Tim Wintle
On Fri, 2009-04-03 at 11:19 +0100, Steve Loughran wrote:
True, but this way nobody gets the opportunity to learn how to do it
themselves, which can be a tactical error one comes to regret further
down the line. By learning the pain of cluster management today, you get
to keep it under
On Fri, 2009-04-03 at 09:42 -0700, Ricky Ho wrote:
1) I can pick the language that offers a different programming
paradigm (e.g. I may choose a functional language, or logic programming
if it suits the problem better). In fact, I could even choose Erlang
for the map() and Prolog for the
haven't found this a major issue (especially if
there are many times more mappers than machines), since the shuffle and
sort stages take significant time and effort anyway.
Tim Wintle
On Wed, 2009-03-04 at 23:14 +0100, Lukáš Vlček wrote:
Sorry for off topic question
It is very off topic.
Any ideas, best practices, book recommendations, papers, tech talk links ...
I found this a nice little book:
http://developer.yahoo.net/blog/archives/2008/11/allspaw_capacityplanning.html
together.
I'm probably going to be using hadoop more again in the near future so
I'll bookmark that, thanks Steve.
Personally I only need text-based records, so I'm fine using a wrapper
around streaming
Tim Wintle
the scripts (who may not be programmers)
to understand multiple processes etc., just stdin and stdout.
Tim Wintle
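To illustrate the stdin/stdout point (this is a generic sketch, not Tim's actual wrapper): a Hadoop Streaming script just reads lines on standard input and writes tab-separated key/value lines on standard output, so the person writing it never has to think about processes or the framework. A minimal word-count pair in Python might look like:

```python
#!/usr/bin/env python
# Minimal Hadoop Streaming-style mapper and reducer: plain stdin/stdout,
# tab-separated key/value lines. The script knows nothing about Hadoop.
import sys

def mapper(lines):
    """Emit "word\t1" for every word on every input line."""
    for line in lines:
        for word in line.split():
            yield "%s\t1" % word

def reducer(lines):
    """Sum the counts per word; streaming hands the reducer its keys
    already sorted, so a simple run-length accumulation works."""
    current, total = None, 0
    for line in lines:
        word, count = line.rsplit("\t", 1)
        if word != current:
            if current is not None:
                yield "%s\t%d" % (current, total)
            current, total = word, 0
        total += int(count)
    if current is not None:
        yield "%s\t%d" % (current, total)

if __name__ == "__main__":
    # Run as "wc.py map" or "wc.py reduce".
    stage = mapper if (len(sys.argv) < 2 or sys.argv[1] == "map") else reducer
    for out in stage(sys.stdin):
        print(out)
```

You would then hand the same file to streaming as both `-mapper 'wc.py map'` and `-reducer 'wc.py reduce'` (the streaming jar's exact path varies by install).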
On Thu, 2009-02-19 at 13:43 +0800, 柳松 wrote:
Hadoop is designed for high-performance computing equipment, but is
claimed to be fit for everyday PCs.
The phrase "high-performance computing equipment" makes me think of
InfiniBand, fibre all over the place, etc.
Hadoop doesn't need that; it runs well on
I normally find the intermediate stage of copying data from the mappers
to the reducers to be a significant step - but that's not over the
best-quality switches...
The mappers and reducers work on the same boxes, close to the data.
On Wed, 2008-10-01 at 10:59 -0700, Alex Loddengaard wrote:
a) Do I need to install hadoop and start running HDFS (using start-dfs.sh)
on all those machines where the log files are getting created? And then do a
file get from the central HDFS server?
I'd install hadoop on the machine, but you don't have to start any nodes
there - you can log
I haven't used KFS, but I believe a major difference is that you can
(apparently) mount KFS as a regular filesystem under Linux, allowing you to
read and write to it directly without having to re-compile the
application (as far as I know that's not possible with HDFS, although
the last time I
I've set hadoop up on a load of Intel Macs before - I think that sshd is
what Apple calls Remote Login or something like that - it was a GUI
option to allow an account to log in remotely.
Hope that helps
On Tue, 2008-06-17 at 14:27 +0800, j.L wrote:
I want to try hadoop, but I can't run sshd when
I have had exactly the same problem using the command line to cat
files - they can take ages, although I don't know why. Network
utilisation does not seem to be the bottleneck, though.
(Running 0.15.3)
Is the slow part of the reduce while you are waiting for the map data to
copy over to
Have you seen PIG:
http://incubator.apache.org/pig/
It generates Hadoop jobs and is more query-like, and (as far as I
remember) includes union, join, etc.
Tim
On Fri, 2008-02-22 at 09:13 -0800, Chuck Lan wrote:
Hi,
I'm currently looking into how to better scale the performance of our
I would certainly appreciate being able to watch them online too, and
they would help spread the word about hadoop - think of all the people
who watch Google's Tech Talks (am I allowed to say the G word around
here?).
On Thu, 2008-02-21 at 08:34 +0100, Lukas Vlcek wrote:
Online webcast/recorded
I agree, this is a really annoying problem - most of the job appears to
work, but unfortunately the reduce stage doesn't normally.
Interestingly, when hadoop runs on OS X it seems to set the hostname to
the IP (or sets a hostname through Zeroconf). It would be useful if we
could just use the IP
You can set which nodes are allowed to connect in hadoop-site.xml - it's
useful to be able to connect from nodes that aren't in the slaves file,
so that you can push input data directly from another machine that's not
part of the cluster, or add extra machines on the fly (just make sure
they're
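For anyone looking for the relevant settings: in the hadoop-site.xml of that era the whitelist was (if I remember the property names right - check the docs for your version) given as files of allowed hostnames via dfs.hosts and mapred.hosts. A sketch, with hypothetical file paths:

```xml
<!-- hadoop-site.xml: restrict which hosts may join the cluster.
     Property names as in the 0.1x-era configuration; the file paths
     below are placeholders for your own include files. -->
<property>
  <name>dfs.hosts</name>
  <value>/path/to/conf/allowed-datanodes</value>
</property>
<property>
  <name>mapred.hosts</name>
  <value>/path/to/conf/allowed-tasktrackers</value>
</property>
```

Each listed file is a plain-text list of hostnames, one per line; hosts not listed are refused.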
Doesn't the -setrep command force the replication to be increased
immediately?
./hadoop dfs -setrep [replication] path
(I may have misunderstood)
On Thu, 2008-02-07 at 17:05 -0800, Ted Dunning wrote:
Chris Kline reported a problem in early January where a file which had too
few replicated