Re: wordcount getting slower with more mappers and reducers?

2009-03-04 Thread Nick Cen
I think this may not be related to whether you are using pseudo-distributed mode or truly distributed mode. The speed depends not only on the number of mappers and reducers but also on the problem size and problem type. A simple example is word count: assume we only have 1

HDFS handle creates local files?

2009-03-04 Thread Philip M. White
Hi, all, I have a problem that sounds ridiculous, but I've been struggling with this for a while now. I hope you can help. I have a small Java program that performs some very basic operation within the HDFS. The program either creates a text file or just creates a new blank file or creates a di

Re: wordcount getting slower with more mappers and reducers?

2009-03-04 Thread haizhou zhao
Since you are running Hadoop in pseudo-distributed mode, it is possible that just 1 reduce task will bring better performance, and this will depend on your input's size and content. 2009/3/5 Sandy > Hello all, > > For the sake of benchmarking, I ran the standard hadoop wordcount example > on > an
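The point made in this reply can be illustrated with a toy cost model. This is only a sketch under loud assumptions: the per-task startup overhead, the number of task slots, and the formula itself are all hypothetical and are not taken from the thread or from Hadoop. It shows how, on a single machine with a fixed number of slots, splitting the same work into more tasks can raise total time once the task count exceeds the slots.

```java
public class TaskOverheadModel {
    /**
     * Hypothetical estimate of wall-clock seconds for a job whose work is
     * split evenly into `tasks` tasks on a machine that runs `slots` tasks
     * at a time. Every task pays a fixed startup overhead (an assumption).
     */
    public static double estimateSeconds(double totalWorkSeconds,
                                         double perTaskOverheadSeconds,
                                         int tasks,
                                         int slots) {
        // Tasks run in "waves" of at most `slots` tasks each (integer ceiling).
        int waves = (tasks + slots - 1) / slots;
        // Each task handles an equal share of the total work.
        double perTaskWork = totalWorkSeconds / tasks;
        // Each wave lasts as long as one task: overhead plus its share of work.
        return waves * (perTaskOverheadSeconds + perTaskWork);
    }

    public static void main(String[] args) {
        // Made-up numbers: 10 s of real work, 5 s startup per task, 2 slots.
        for (int tasks : new int[] {1, 2, 4, 8}) {
            System.out.printf("%d tasks: %.2f s%n",
                    tasks, estimateSeconds(10.0, 5.0, tasks, 2));
        }
    }
}
```

With these invented numbers the estimate improves from 1 to 2 tasks and then worsens at 4 and 8, which is the shape of the behaviour described in the thread. Real jobs depend on input size, content, and cluster setup, so the model is illustrative only.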

Re: Off topic: web framework for high traffic service

2009-03-04 Thread Lukáš Vlček
Hi Tim, Thanks for the links. I know this may sound off topic. On the other hand, if you look for example at the eBay architecture (http://highscalability.com/ebay-architecture) then you can see that some concepts are close to a Hadoop-like system (I mean when you want to build something like eBay then s

Re: Off topic: web framework for high traffic service

2009-03-04 Thread Tim Wintle
On Wed, 2009-03-04 at 23:14 +0100, Lukáš Vlček wrote: > Sorry for off topic question It is very off topic. > Any ideas, best practices, book recommendations, papers, tech talk links ... I found this a nice little book:

Re: Importing data from mysql into hadoop

2009-03-04 Thread anand
Thanks. It worked. -- anand kishore http://blog.semanticvoid.com http://twitter.com/semanticvoid On Wed, Mar 4, 2009 at 3:29 PM, Amandeep Khurana wrote: > Put it into the $HADOOP_HOME/lib folder. To be on the safer side, I > generally include it in the job jar. > > Don't forget to put Class.forN

Re: Importing data from mysql into hadoop

2009-03-04 Thread Amandeep Khurana
Put it into the $HADOOP_HOME/lib folder. To be on the safer side, I generally include it in the job jar. Don't forget to put Class.forName(driverClassName); in your job code. Amandeep Amandeep Khurana Computer Science Graduate Student University of California, Santa Cruz On Wed, Mar 4, 2009 at
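A minimal sketch of the Class.forName(driverClassName) step mentioned in this reply. The class name DriverCheck and the helper isLoadable are hypothetical names introduced here, and the default driver class com.mysql.jdbc.Driver is an assumption about the MySQL connector in use; none of them come from the thread. The sketch only checks that the driver class can be found on the classpath, which is what fails when the connector jar is missing from $HADOOP_HOME/lib and from the job jar.

```java
public class DriverCheck {
    /**
     * Tries to load a JDBC driver class by name (hypothetical helper).
     * Returns true if the class is on the classpath, false otherwise.
     */
    public static boolean isLoadable(String driverClassName) {
        try {
            // Loading the class lets the driver register itself with JDBC.
            Class.forName(driverClassName);
            return true;
        } catch (ClassNotFoundException e) {
            // The connector jar is not visible to this JVM.
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed default; pass another driver class name as the first argument.
        String driverClassName =
                args.length > 0 ? args[0] : "com.mysql.jdbc.Driver";
        if (isLoadable(driverClassName)) {
            System.out.println("Driver found: " + driverClassName);
        } else {
            System.out.println("Driver NOT found: " + driverClassName);
        }
    }
}
```

Running a check like this early in the job code gives a clear message about a missing jar before any database connection is attempted.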

Importing data from mysql into hadoop

2009-03-04 Thread anand
Hi, I'm trying out an example of importing data from mysql into hadoop (something like http://svn.apache.org/repos/asf/hadoop/core/trunk/src/examples/org/apache/hadoop/examples/DBCountPageView.java ). I'm connecting to Mysql and hence am providing the mysql-jdbc-connector jar via the '-libjar' opt

Importing data from mysql into hadoop

2009-03-04 Thread anand
Hi, I'm trying out an example of importing data from mysql into hadoop (something like http://svn.apache.org/repos/asf/hadoop/core/trunk/src/examples/org/apache/hadoop/examples/DBCountPageView.java). I'm connecting to Mysql and hence am providing the mysql-jdbc-connector jar via the '-libjar' opti

Re: Hadoop FS shell no longer working with S3 Native

2009-03-04 Thread S D
My fault on this one. I mistakenly thought the environment variables (AMAZON_ACCESS_KEY_ID and AMAZON_SECRET_ACCESS_KEY) would override values set in hadoop-site.xml; I now see that this is not the case for the Hadoop FS shell commands. John On Wed, Mar 4, 2009 at 5:18 PM, S D wrote: > I'm usin

wordcount getting slower with more mappers and reducers?

2009-03-04 Thread Sandy
Hello all, For the sake of benchmarking, I ran the standard hadoop wordcount example on an input file using 2, 4, and 8 mappers and reducers for my job. In other words, I do: time -p bin/hadoop jar hadoop-0.18.3-examples.jar wordcount -m 2 -r 2 sample.txt output time -p bin/hadoop jar hadoop-0.1

Hadoop FS shell no longer working with S3 Native

2009-03-04 Thread S D
I'm using Hadoop 0.19.0 with S3 Native. Up until a few days ago I was able to use the various shell functions successfully; e.g., hadoop dfs -ls . To ensure access to my Amazon S3 Native data store I set the following environment variables: AMAZON_ACCESS_KEY_ID and AMAZON_SECRET_A

Re: Jobs run slower and slower

2009-03-04 Thread Runping Qi
The task (job) tracker log should show when a task was scheduled. The log for an individual task should show when it finished initialization. On Wed, Mar 4, 2009 at 12:29 PM, Sean Laurent wrote: > Hrmmm. I can tell init/execution at the job level, but I don't know how to > figure that out at the in

Re: Jobs run slower and slower

2009-03-04 Thread Sean Laurent
Hrmmm. I can tell init/execution at the job level, but I don't know how to figure that out at the individual map task level. What would be the best way for me to determine that? -Sean On Wed, Mar 4, 2009 at 12:13 PM, Runping Qi wrote: > Do you know the break down of times for a mapper task take

How to Apply patch to Release 0.19.1

2009-03-04 Thread Aviad sela
I need to apply a patch to Release 0.19.1. In order to deploy, I need to compile again and create a new Hadoop-*-core.jar. The development environment consists of Windows XP and Eclipse 3.4 (IBM RAD 7.5). The target platform is AIX 5.3. I have extracted the release from the SVN, and tried to execute

Repartitioned Joins

2009-03-04 Thread Richa Khandelwal
Hi All, Does anyone know of a tweak to map-reduce joins that optimizes them further by moving to the reduce phase only those tuples that actually join across the two tables? There are replicated-join and semi-join strategies, but they belong more to databases than to map-reduce. Thanks, Richa Khandelwal

Re: Running 0.19.2 branch in production before release

2009-03-04 Thread Aaron Kimball
I recommend 0.18.3 for production use and avoiding the 0.19 branch entirely. If your priority is stability, then stay a full minor version behind, not just a revision. - Aaron On Tue, Mar 3, 2009 at 5:28 PM, Nathan Marz wrote: > I would like to get the community's opinion on this. Do you think it's

Re: Jobs run slower and slower

2009-03-04 Thread Runping Qi
Do you know the breakdown of the time a mapper task takes to initialize and to execute the map function? On Wed, Mar 4, 2009 at 8:44 AM, Sean Laurent wrote: > On Tue, Mar 3, 2009 at 10:14 PM, Amar Kamat wrote: > > > Yeah. Maybe it's not a problem with the JobTracker. Can you check (via > >

Re: Jobs run slower and slower

2009-03-04 Thread Sean Laurent
On Tue, Mar 3, 2009 at 10:14 PM, Amar Kamat wrote: > Yeah. Maybe it's not a problem with the JobTracker. Can you check (via > job history) what the best and the worst task runtimes are? You can analyze > the jobs after they complete. > Amar Okay, I ran the same job 35 times last night. Each jo

Re: Mappers become less utilized as time goes on?

2009-03-04 Thread Pedro Vivancos
I have the same problem... Please let me know if you solve it, and how. Thanks. On Wed, Mar 4, 2009 at 2:49 AM, Nathan Marz wrote: > Nope... and there were no failed tasks. > > > > On Mar 3, 2009, at 5:16 PM, Runping Qi wrote: > > Were task Trackers black-listed? >> >> >> On Tue, Mar 3, 20