Re: maybe a bug in hadoop?

2009-06-10 Thread Tim Wintle
without leading space ... so stripping it out could mean you couldn't enter some valid directory names (but who has folders starting with a space?) I'm sure my input wasn't very useful, but just a comment. Tim Wintle
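To illustrate the trade-off, here is a minimal Python sketch. The function names are hypothetical and this is not Hadoop's own path handling. Removing only the line terminator keeps a name that genuinely starts with a space, while a blanket strip() silently turns it into a different name.

```python
def keep_name(raw: str) -> str:
    """Drop only the line terminator, so ' logs' keeps its leading space."""
    return raw.rstrip("\r\n")


def strip_name(raw: str) -> str:
    """Blanket strip(): also removes leading spaces, turning ' logs' into 'logs'."""
    return raw.strip()


# A directory literally named " logs" can no longer be addressed after strip().
print(repr(keep_name(" logs\n")), repr(strip_name(" logs\n")))
```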

Re: Amazon Elastic MapReduce

2009-04-03 Thread Tim Wintle
On Fri, 2009-04-03 at 11:19 +0100, Steve Loughran wrote: True, but this way nobody gets the opportunity to learn how to do it themselves, which can be a tactical error one comes to regret further down the line. By learning the pain of cluster management today, you get to keep it under

Re: How many people is using Hadoop Streaming ?

2009-04-03 Thread Tim Wintle
On Fri, 2009-04-03 at 09:42 -0700, Ricky Ho wrote: 1) I can pick the language that offers a different programming paradigm (e.g. I may choose a functional language, or logic programming if they suit the problem better). In fact, I can even choose Erlang at the map() and Prolog at the
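The reason the map and reduce sides can be written in different languages is that Streaming only passes lines of text over stdin and stdout. As a sketch, a word-count mapper in Python could look like this (word count is an illustrative choice, not something from the thread). The reducer could equally be Erlang or Prolog, as long as it reads the same `word<TAB>1` lines.

```python
import sys
from typing import Iterable, Iterator, TextIO


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit one 'word<TAB>1' record per whitespace-separated token."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def main(stdin: TextIO = sys.stdin, stdout: TextIO = sys.stdout) -> None:
    """Streaming entry point: read input lines from stdin, write records to stdout."""
    for record in mapper(stdin):
        stdout.write(record + "\n")
```

A script used as the Streaming mapper would simply call `main()` when it starts.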

Re: Does reduce start only after the map is completed?

2009-03-07 Thread Tim Wintle
haven't found this a major issue (especially if there are many times more mappers than machines), since the shuffle and sort stages take significant time and effort anyway. Tim Wintle
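A toy, single-process model of the pipeline (not Hadoop's implementation) shows why the reduce function itself has to wait. The shuffle/sort step can only hand a key to reduce once it has collected that key's values from every map task. Hadoop can begin copying map output to reducers before all maps finish, but reduce() runs only after all of it has arrived.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(splits: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Each split stands in for one map task's input; emit (word, 1) pairs."""
    for split in splits:
        for word in split.split():
            yield word, 1


def shuffle_and_sort(pairs: Iterable[Tuple[str, int]]) -> List[Tuple[str, List[int]]]:
    """Group every value by key, then sort by key. This consumes ALL map output."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(groups: List[Tuple[str, List[int]]]) -> Dict[str, int]:
    """Call the reduce function once per key, over that key's complete value list."""
    return {key: sum(values) for key, values in groups}


def word_count(splits: Iterable[str]) -> Dict[str, int]:
    return reduce_phase(shuffle_and_sort(map_phase(splits)))
```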

Re: Off topic: web framework for high traffic service

2009-03-04 Thread Tim Wintle
On Wed, 2009-03-04 at 23:14 +0100, Lukáš Vlček wrote: Sorry for the off-topic question. It is very off topic. Any ideas, best practices, book recommendations, papers, tech talk links ... I found this a nice little book: http://developer.yahoo.net/blog/archives/2008/11/allspaw_capacityplanning.html

Re: the question about the common pc?

2009-02-23 Thread Tim Wintle
together. I'm probably going to be using Hadoop more again in the near future, so I'll bookmark that; thanks, Steve. Personally I only need text-based records, so I'm fine using a wrapper around streaming. Tim Wintle
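For text-based records, such a wrapper mostly has to follow Streaming's default line format: the key is everything before the first tab and the value is the rest, with a tab-less line treated as a key with an empty value. The helper names below are hypothetical; this is a sketch of that convention, not the wrapper mentioned above.

```python
from typing import Iterable, Iterator, Tuple


def read_records(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Split each line at the first tab: key before it, value after it.

    A line with no tab becomes (whole line, "").
    """
    for line in lines:
        key, _sep, value = line.rstrip("\n").partition("\t")
        yield key, value


def format_record(key: str, value: str) -> str:
    """Write one record back out in the same tab-separated form (no newline)."""
    return f"{key}\t{value}"
```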

Re: the question about the common pc?

2009-02-20 Thread Tim Wintle
the scripts (who may not be programmers) to understand multiple processes etc, just stdin and stdout. Tim Wintle
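As a sketch of how little the script author needs to know, here is a reducer that only reads stdin and writes stdout. It relies on the reducer input arriving sorted by key, which the shuffle provides, so consecutive lines with the same key can be grouped with `itertools.groupby`. It assumes every input line has the form `key<TAB>count`, which is an assumption for this example.

```python
import sys
from itertools import groupby
from typing import Iterable, Iterator, TextIO


def reduce_lines(lines: Iterable[str]) -> Iterator[str]:
    """Sum the counts of consecutive lines sharing a key (input must be sorted by key)."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(kv[1]) for kv in group)
        yield f"{key}\t{total}"


def main(stdin: TextIO = sys.stdin, stdout: TextIO = sys.stdout) -> None:
    """No threads or extra processes: just stdin in, stdout out."""
    for out in reduce_lines(stdin):
        stdout.write(out + "\n")
```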

Re: Re:Re: the question about the common pc?

2009-02-18 Thread Tim Wintle
On Thu, 2009-02-19 at 13:43 +0800, 柳松 wrote: Hadoop is designed for High Performance Computing equipment, but claimed to be fit for everyday PCs. The phrase "High Performance Computing equipment" makes me think of InfiniBand, fibre all over the place, etc. Hadoop doesn't need that, it runs well on

Re: architecture diagram

2008-10-01 Thread Tim Wintle
I normally find the intermediate stage of copying data to the reducers from the mappers to be a significant step - but that's not over the best quality switches... The mappers and reducers work on the same boxes, close to the data. On Wed, 2008-10-01 at 10:59 -0700, Alex Loddengaard wrote:

Re: Accessing input files from different servers

2008-09-12 Thread Tim Wintle
a) Do I need to install Hadoop and start running HDFS (using start-dfs.sh) on all those machines where the log files are getting created? And then do a file get from the central HDFS server? I'd install Hadoop on the machine, but you don't have to start any nodes there - you can log
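One way to act on this, sketched under assumptions: the log machine has the Hadoop client installed and its configuration (in this era, `fs.default.name` in hadoop-site.xml) points at the cluster's namenode, and files are pushed with the client's `dfs -put` command. No daemons run on that machine. The file paths in the example are hypothetical.

```python
import subprocess
from typing import List


def put_command(local_path: str, hdfs_path: str, hadoop_bin: str = "hadoop") -> List[str]:
    """Build the client command that copies one local file into HDFS."""
    return [hadoop_bin, "dfs", "-put", local_path, hdfs_path]


def upload(local_path: str, hdfs_path: str, hadoop_bin: str = "hadoop") -> None:
    """Run the copy; raises CalledProcessError if the client exits non-zero."""
    subprocess.run(put_command(local_path, hdfs_path, hadoop_bin), check=True)
```

For example, `upload("/var/log/app.log", "/logs/app.log")` would copy a hypothetical local log into a hypothetical HDFS directory.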

Re: HDFS Vs KFS

2008-08-21 Thread Tim Wintle
I haven't used KFS, but I believe a major difference is that you can (apparently) mount KFS as a standard device under Linux, allowing you to read and write directly to it without having to re-compile the application (as far as I know that's not possible with HDFS, although the last time I

Re: anybody know how to run sshd in LEOPARD

2008-06-17 Thread Tim Wintle
I've set Hadoop up on a load of Intel Macs before - I think that sshd is what Apple calls Remote Login or something like that - it was a GUI option to allow an account to log in remotely. Hope that helps. On Tue, 2008-06-17 at 14:27 +0800, j.L wrote: I want to try Hadoop, but I can't run sshd when

RE: Questions regarding configuration parameters...

2008-02-22 Thread Tim Wintle
I have had exactly the same problem with using the command line to cat files - they can take ages, although I don't know why. Network utilisation does not seem to be the bottleneck, though (running 0.15.3). Is the slow part of the reduce while you are waiting for the map data to copy over to

Re: Calculations involve large datasets

2008-02-22 Thread Tim Wintle
Have you seen Pig? http://incubator.apache.org/pig/ It generates Hadoop code, is more query-like, and (as far as I remember) includes union, join, etc. Tim On Fri, 2008-02-22 at 09:13 -0800, Chuck Lan wrote: Hi, I'm currently looking into how to better scale the performance of our
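To show the kind of work a join turns into once it is expressed as MapReduce, here is a minimal in-memory sketch of a reduce-side inner join. It is an illustration of the general technique, not Pig's actual implementation: the map step tags each record with its source, the shuffle groups records by join key, and the reduce step pairs every left value with every right value for that key.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Tuple


def reduce_side_join(
    left: Iterable[Tuple[Any, Any]], right: Iterable[Tuple[Any, Any]]
) -> List[Tuple[Any, Any, Any]]:
    # Map: tag each (key, value) record with the input it came from.
    tagged = [(k, ("L", v)) for k, v in left] + [(k, ("R", v)) for k, v in right]

    # Shuffle: group the tagged records by join key.
    groups: Dict[Any, List[Tuple[str, Any]]] = defaultdict(list)
    for key, tagged_value in tagged:
        groups[key].append(tagged_value)

    # Reduce: per key, emit the cross product of left and right values (inner join).
    joined: List[Tuple[Any, Any, Any]] = []
    for key in sorted(groups):
        lefts = [v for tag, v in groups[key] if tag == "L"]
        rights = [v for tag, v in groups[key] if tag == "R"]
        for lv in lefts:
            for rv in rights:
                joined.append((key, lv, rv))
    return joined
```

Keys that appear on only one side produce no output, which is what makes this an inner join.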

Re: Hadoop summit / workshop at Yahoo!

2008-02-21 Thread Tim Wintle
I would certainly appreciate being able to watch them online too, and they would help spread the word about hadoop - think of all the people who watch Google's Techtalks (am I allowed to say the G word around here?). On Thu, 2008-02-21 at 08:34 +0100, Lukas Vlcek wrote: Online webcast/recorded

Re: URLs contain non-existant domain names in machines.jsp

2008-02-10 Thread Tim Wintle
I agree, this is a really annoying problem - most of the job appears to work, but unfortunately the reduce stage doesn't normally work. Interestingly, when Hadoop runs on OS X it seems to set the hostname as the IP (or sets a hostname through Zeroconf). It would be useful if we could use just ip

RE: Starting up a larger cluster

2008-02-07 Thread Tim Wintle
You can set which nodes are allowed to connect in hadoop-site.xml - it's useful to be able to connect from nodes that aren't in the slaves file so that you can put in input data direct from another machine that's not part of the cluster, or add extra machines on the fly (just make sure they're

Re: Namenode fails to replicate file

2008-02-07 Thread Tim Wintle
Doesn't the -setrep command force the replication to be increased immediately? ./hadoop dfs -setrep [replication] path (I may have misunderstood) On Thu, 2008-02-07 at 17:05 -0800, Ted Dunning wrote: Chris Kline reported a problem in early January where a file which had too few replicated