Re: Upgrade from 0.16.3 to 0.17.0

2008-06-04 Thread Tanton Gibbs
1) Yes, that is normal.  You have to finalize the upgrade manually.
2) Probably.  As I understand it, it keeps a backup of the
pre-upgrade state.
3) You can use hadoop dfsadmin -finalizeUpgrade to finalize it.  See
here: http://wiki.apache.org/hadoop/Hadoop_Upgrade
4) I assume the finalizeUpgrade command will remove it for you.
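If you want to check what Iván describes on each data node before finalizing, here is a minimal Python sketch that reports the layout version found in current/ and previous/ of one storage directory. It assumes each of those directories holds a properties-style file named VERSION with a layoutVersion key. That file name and key are assumptions about the on-disk layout and are not stated in this thread. The sketch only reports. Finalizing (and cleaning up previous/) is still left to hadoop dfsadmin -finalizeUpgrade.

```python
from pathlib import Path


def read_version_file(path):
    """Parse a key=value VERSION file into a dict (assumed properties format)."""
    props = {}
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        # Skip blank lines and '#' comment lines, as in Java properties files.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def upgrade_status(storage_dir):
    """Report layout versions of current/ and previous/ in one storage dir.

    A 'previous' directory is taken to mean an upgrade that has not been
    finalized yet. Nothing is modified or deleted.
    """
    storage = Path(storage_dir)
    status = {}
    for name in ("current", "previous"):
        version_file = storage / name / "VERSION"  # assumed file name
        if version_file.is_file():
            status[name] = int(read_version_file(version_file)["layoutVersion"])
        else:
            status[name] = None
    status["finalized"] = status["previous"] is None
    return status
```

For example, upgrade_status("/path/to/dfs/data") (a hypothetical path) would return something like {"current": -13, "previous": -11, "finalized": False} on a node in the state described in the question.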

On Wed, Jun 4, 2008 at 6:18 AM, Iván de Prado
[EMAIL PROTECTED] wrote:
 I have upgraded from 0.16.3 to 0.17.0 successfully. But after a few days
 the disk usage has increased. I have noticed that there are two
 folders on the data nodes:

 - current - With version -13
 - previous - With version -11

 And I have this message in the HDFS webapp:

 Upgrade for version -13 has been completed. Upgrade is not finalized.

 Questions:

 1) Is that normal?
 2) Could the upgrade increase the disk usage?
 3) Is there any way to finish the upgrade? Any command to force it?
 4) Could I delete the previous folder?

 Thanks,

 Iván de Prado
 www.ivanprado.es




Re: Percent progress of map/reduce in JobClient

2008-06-04 Thread Tanton Gibbs
It doesn't seem to be, to me.  It seems to be an indicator of records processed.
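To make the difference between "percentage of maps completed" and "records processed" concrete, here is a minimal Python sketch of the second reading. It assumes each map task reports the fraction of its input records processed, and that the job-level map percentage is a plain average of those per-task fractions. Both are assumptions for illustration, not taken from the Hadoop source, and the function names are hypothetical.

```python
def task_progress(records_done, records_total):
    """Fraction of one task's input processed, clamped to [0, 1] (assumed model)."""
    if records_total <= 0:
        # Treat a task with no input as already complete.
        return 1.0
    return min(max(records_done / records_total, 0.0), 1.0)


def job_map_percent(tasks):
    """Job-level map percentage as the mean of per-task fractions.

    `tasks` is a list of (records_done, records_total) pairs, one per map task.
    """
    if not tasks:
        return 0
    fractions = [task_progress(done, total) for done, total in tasks]
    return int(100 * sum(fractions) / len(fractions))
```

Under this model, two map tasks at 50% and 100% give a job at 75%, even though only one of the two maps has completed. A job with a single map task that reports no intermediate progress would jump straight from 0% to 100%.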

On Wed, Jun 4, 2008 at 5:53 PM, Daniel Blaisdell [EMAIL PROTECTED] wrote:
 Is the map progress indicator computed as a percentage of maps completed?

 -Daniel

 On Wed, Jun 4, 2008 at 6:51 PM, Tanton Gibbs [EMAIL PROTECTED] wrote:

 From what I've read, there are three reduce phases:
 1. copy
 2. sort
 3. reduce
 0-33% is the copy phase.  I guess if you don't need that phase,
 it could skip it completely.
 After 33%, it waits until it is done sorting before outputting status
 again at 66%; then it updates regularly during the reduce phase up to
 100%.  This has been my experience, at least.

 Tanton

 On Wed, Jun 4, 2008 at 4:19 PM, Stuart Sierra [EMAIL PROTECTED]
 wrote:
  How does Hadoop decide when to update the percent complete for
  map/reduce tasks?  I've been running a small job (~150 MB) on a
  pseudo-distributed cluster.  bin/hadoop jar prints:
 
  08/06/04 17:02:16 INFO mapred.JobClient:  map 0% reduce 0%
  08/06/04 17:05:52 INFO mapred.JobClient:  map 100% reduce 0%
  08/06/04 17:06:05 INFO mapred.JobClient:  map 100% reduce 66%
  08/06/04 17:06:10 INFO mapred.JobClient:  map 100% reduce 67%
  08/06/04 17:06:17 INFO mapred.JobClient:  map 100% reduce 68%
 
  And so on until the job completes.  What seems odd is that I don't get
  any feedback at all on the progress of the map task until it reaches
  100%, and I get no feedback on the reduce task until it reaches 66%.
  After that, I get updates every few seconds.  The TaskTracker shows
  the same thing.  What might cause this?
 
  This is Hadoop 0.17.  The input and output are both text, both ~140MB,
  gzip-compressed down to ~12MB.
 
  Thanks,
  -Stuart
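The three-phase breakdown Tanton describes can be written as a small formula: each of copy, sort and reduce counts for one third of the reported reduce progress. The Python sketch below assumes equal weighting, as described in the reply; the function and constant names are hypothetical and not Hadoop's own.

```python
REDUCE_PHASES = ("copy", "sort", "reduce")


def reduce_percent(phase, phase_fraction):
    """Overall reduce progress, assuming each phase is weighted one third.

    `phase` is one of REDUCE_PHASES and `phase_fraction` is the progress
    within that phase, in [0, 1].
    """
    index = REDUCE_PHASES.index(phase)  # raises ValueError for unknown phases
    phase_fraction = min(max(phase_fraction, 0.0), 1.0)
    return int(100 * (index + phase_fraction) / len(REDUCE_PHASES))
```

With this weighting, a finished copy phase reports 33%, and the start of the reduce phase reports 66%. That is consistent with the jump to "reduce 66%" followed by steady 67%, 68% updates in the log above.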
 




External Jar

2008-05-29 Thread Tanton Gibbs
What is the right way to use a jar file within my map/reduce program?
I want to use the simmetrics code for double metaphone, but I'm not
sure how to include it so that my map/reduce code can see it.

Any pointers?

Tanton


Re: Users Group Meeting Slides

2008-05-22 Thread Tanton Gibbs
Excellent!  Thanks!

On Thu, May 22, 2008 at 1:13 PM, Lukas Vlcek [EMAIL PROTECTED] wrote:
 http://svn.apache.org/repos/asf/lucene/mahout/trunk

 On Thu, May 22, 2008 at 8:10 PM, Tanton Gibbs [EMAIL PROTECTED]
 wrote:

 I checked out the wiki.  I need a canopy clustering algorithm
 for Hadoop.  I'm about to embark on writing one, but if you already have
 one, that would be better.  It doesn't have to be perfect; I can
 improve on it as I go.  However, I couldn't find the SVN repository
 for Mahout.  Any pointers to where I can find the code?

 Thanks!
 Tanton

 On Thu, May 22, 2008 at 11:36 AM, Jeff Eastman
 [EMAIL PROTECTED] wrote:
  I uploaded the slides from my Mahout overview to our wiki
  (http://cwiki.apache.org/confluence/display/MAHOUT/FAQ) along with
  another recent talk by Isabel Drost. Both are similar in content, but
  their differences reflect the rapid evolution of the project in the
  month that separates them. After I got home, I worried a bit that I
  had skipped over too much of the material in an effort to keep it
  brief.
 
  Thanks to Yahoo! for hosting the meeting and providing beer and pizza
  for the 50 or so friends of Hadoop who attended. I thought the meeting
  was fun and informative, and a great way to begin to associate faces
  and richer personalities with the names that fly through my inbox.
  For those of you who were not able to attend, the meeting was quite
  informal, and we got brief, mostly extemporaneous updates on some of
  the projects that were on the agenda at the recent Hadoop Summit:
  Hadoop 0.17, HBase, Pig, Zookeeper, Mahout, ...
 
  In the wrap-up, people seemed to like this agenda, and I think Ajay
  plans to continue with the same general format in subsequent monthly
  meetings. Due to the informality, I did not take notes. Perhaps the
  other presenters can post summaries of their updates for the wider
  community's appreciation.
 
  Jeff
 




 --
 http://blog.lukas-vlcek.com/
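For anyone in the same position as Tanton, here is a minimal single-machine Python sketch of basic canopy clustering. It uses Euclidean distance and two thresholds, a loose t1 and a tight t2 with t1 > t2. Points within t1 of a center join its canopy, and points within t2 are removed from further consideration. The function and parameter names are hypothetical, and this is not Mahout's implementation.

```python
import math


def canopies(points, t1, t2, distance=math.dist):
    """Group points into (possibly overlapping) canopies.

    t1 is the loose threshold and t2 the tight one; t1 must exceed t2.
    Returns a list of (center, members) pairs.
    """
    if t1 <= t2:
        raise ValueError("t1 must be greater than t2")
    remaining = list(points)
    result = []
    while remaining:
        # Take the next remaining point as a new canopy center.
        center = remaining.pop(0)
        members = [center]
        still_candidates = []
        for point in remaining:
            d = distance(center, point)
            if d < t1:
                # Within the loose threshold: join this canopy.
                members.append(point)
            if d >= t2:
                # Outside the tight threshold: may still start or join other canopies.
                still_candidates.append(point)
        remaining = still_candidates
        result.append((center, members))
    return result
```

Because only points inside t2 are removed, a point between t2 and t1 can belong to several canopies. One common way to adapt this to MapReduce is to build canopies per mapper and then run the same procedure over the emitted centers in a reducer, though that is a design choice rather than something described in this thread.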



Hadoop Streaming - revised

2008-05-21 Thread Tanton Gibbs
Ok, I turned on verbose output.  It looks as though it is adding
everything in my /tmp directory to the jar file it builds.  Where do I
tell it not to do that?

Thanks!
Tanton


Hadoop Streaming - final

2008-05-21 Thread Tanton Gibbs
Ok, I figured it out.  Hadoop Streaming adds the entire directory named
by stream.shipped.hadoopstreaming to the jar file it builds.  I
wasn't setting it, so it was defaulting to /tmp, which meant my
entire /tmp directory was getting added to the jar.

I set it to the directory containing my hadoop streaming jar, and
it seems to work fine.

Sorry for the noise.
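To make the pitfall concrete, here is a small Python sketch of "add an entire directory to a jar". It shows why pointing the shipped directory at /tmp ends up packaging everything in /tmp. The function name is hypothetical. It only mimics the behavior Tanton describes and is not Hadoop Streaming's own packaging code.

```python
import os
import zipfile


def package_directory(jar_path, directory):
    """Add every file under `directory` to a zip/jar archive, recursively.

    Returns the archive entry names. Mimics the behavior described in the
    thread; this is not Hadoop Streaming's code.
    """
    names = []
    jar_abs = os.path.abspath(jar_path)
    with zipfile.ZipFile(jar_path, "w", zipfile.ZIP_DEFLATED) as jar:
        for root, _dirs, files in os.walk(directory):
            for filename in sorted(files):
                full_path = os.path.join(root, filename)
                # Never add the archive being written to itself.
                if os.path.abspath(full_path) == jar_abs:
                    continue
                # Store paths relative to the shipped directory.
                arcname = os.path.relpath(full_path, directory)
                jar.write(full_path, arcname)
                names.append(arcname)
    return names
```

Given a directory like /tmp, the returned list would contain every file under it. A directory that holds only the streaming jar keeps the archive down to what the job actually needs.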


Re: How does one learn to program in Hadoop?

2008-05-20 Thread Tanton Gibbs
There are a few videos on YouTube about MapReduce at Google.  You can
get the general idea of how to approach problems in Hadoop from them.
Just search for Hadoop on YouTube.

Also, there are a number of videos from Yahoo! Research on Hadoop;
they have been linked from this mailing list, so you can search the
archives for them.

On Tue, May 20, 2008 at 2:24 AM, Hadoop [EMAIL PROTECTED] wrote:

 Thanks for the link.

 What I have found so far are collections of Hadoop objects (APIs) and
 examples, which I think are not enough to learn how to program using Hadoop.

 Could you please advise me how I can find any resources (books, manuals,
 sites, etc.) that explain how to program using Hadoop?

 --
 View this message in context: 
 http://www.nabble.com/How-does-one-learn-to-program-in-Hadoop--tp17308837p17334761.html
 Sent from the Hadoop core-user mailing list archive at Nabble.com.