Re: Upgrade from 0.16.3 to 0.17.0
1) Yes, that is normal. You have to finalize the upgrade manually.
2) Probably, because (as I understand it) HDFS keeps a backup of the pre-upgrade state.
3) You can use hadoop dfsadmin -finalizeUpgrade to finalize it. See here: http://wiki.apache.org/hadoop/Hadoop_Upgrade
4) I assume the finalizeUpgrade command will do that for you.

On Wed, Jun 4, 2008 at 6:18 AM, Iván de Prado [EMAIL PROTECTED] wrote:

I have upgraded from 0.16.3 to 0.17.0 successfully, but after a few days the disk usage has increased. I have noticed that there are two folders on the data nodes:

- current - With version -13
- previous - With version -11

And I have this message in the HDFS webapp:

Upgrade for version -13 has been completed. Upgrade is not finalized.

Questions:
1) Is that normal?
2) Could the upgrade increase the disk usage?
3) Is there any way to finish the upgrade? Any command to force it?
4) Could I delete the previous folder?

Thanks,
Iván de Prado
www.ivanprado.es
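To get a feel for how much space the pre-upgrade backup discussed in this thread is holding before finalizing, something like the following rough Python sketch could be run against one data node storage directory. The directory names current and previous come from the thread; the function name, the idea of passing in a storage directory path, and the assumption that block files may be hard-linked between the two directories are illustrative and not taken from Hadoop's documentation.

```python
import os


def _files(path):
    """Yield the path of every file found under `path`."""
    for root, _dirs, names in os.walk(path):
        for name in names:
            yield os.path.join(root, name)


def bytes_held_only_by_previous(storage_dir):
    """Return (has_previous, bytes) for one data node storage directory.

    `bytes` counts files under previous/ that are not hard links to a
    file under current/, i.e. roughly the space the backup alone holds.
    The current/previous layout is an assumption based on the thread.
    """
    current = os.path.join(storage_dir, "current")
    previous = os.path.join(storage_dir, "previous")
    if not os.path.isdir(previous):
        return False, 0

    # Remember every inode already reachable from current/.
    seen = set()
    if os.path.isdir(current):
        for path in _files(current):
            st = os.lstat(path)
            seen.add((st.st_dev, st.st_ino))

    # Count each inode that appears only under previous/ exactly once.
    total = 0
    for path in _files(previous):
        st = os.lstat(path)
        key = (st.st_dev, st.st_ino)
        if key not in seen:
            seen.add(key)
            total += st.st_size
    return True, total
```

This only reads file metadata and deletes nothing. Removing the backup itself is best left to hadoop dfsadmin -finalizeUpgrade, as suggested in the reply.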
Re: Percent progress of map/reduce in JobClient
Doesn't seem to be, to me. Seems to be an indicator of records

On Wed, Jun 4, 2008 at 5:53 PM, Daniel Blaisdell [EMAIL PROTECTED] wrote:

Is the map progress indicator computed as a percentage of maps completed?

-Daniel

On Wed, Jun 4, 2008 at 6:51 PM, Tanton Gibbs [EMAIL PROTECTED] wrote:

From what I've read, there are three reduce phases:

1. copy
2. sort
3. reduce

From 0 - 33% is the copy phase. I guess if you don't need that phase it could skip this completely. After 33%, it waits until it is done sorting before outputting status again at 66%, then it updates regularly during the reduce phase to 100%. This has been my experience, at least.

Tanton

On Wed, Jun 4, 2008 at 4:19 PM, Stuart Sierra [EMAIL PROTECTED] wrote:

How does Hadoop decide when to update the percent complete for map/reduce tasks? I've been running a small job (~150 MB) on a pseudo-distributed cluster. bin/hadoop jar prints:

08/06/04 17:02:16 INFO mapred.JobClient: map 0% reduce 0%
08/06/04 17:05:52 INFO mapred.JobClient: map 100% reduce 0%
08/06/04 17:06:05 INFO mapred.JobClient: map 100% reduce 66%
08/06/04 17:06:10 INFO mapred.JobClient: map 100% reduce 67%
08/06/04 17:06:17 INFO mapred.JobClient: map 100% reduce 68%

And so on until the job completes. What seems odd is that I don't get any feedback at all on the progress of the map task until it reaches 100%, and I get no feedback on the reduce task until it reaches 66%. After that, I get updates every few seconds. The TaskTracker shows the same thing. What might cause this?

This is Hadoop 0.17. The input and output are both text, both ~140MB, gzip-compressed down to ~12MB.

Thanks,
-Stuart
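The three-phase description in this thread can be written down as a small Python sketch. It only models what Tanton describes (copy, sort, and reduce each contributing one third of the reported reduce percentage); the function name and the equal weighting are assumptions for illustration, not Hadoop's actual implementation.

```python
# Phase names in the order described in the thread.
PHASES = ("copy", "sort", "reduce")


def reduce_progress(phase, fraction_done):
    """Overall reduce progress in [0, 1], assuming each phase counts for 1/3.

    `phase` is one of PHASES; `fraction_done` is how far along that
    single phase is, from 0.0 to 1.0.
    """
    if phase not in PHASES:
        raise ValueError("unknown phase: %r" % (phase,))
    if not 0.0 <= fraction_done <= 1.0:
        raise ValueError("fraction_done must be between 0.0 and 1.0")
    return (PHASES.index(phase) + fraction_done) / len(PHASES)
```

Under this model, a reducer that has just entered the reduce phase already reports about two thirds. If copy and sort both finish between two status updates on a small job, the first non-zero value printed would be 66%, which would match Stuart's log.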
External Jar
What is the right way to use a jar file within my map/reduce program? I want to use the simmetrics code for double metaphone, but I'm not sure how to include it so that my map/reduce code can see it. Any pointers?

Tanton
Re: Users Group Meeting Slides
Excellent! Thanks!

On Thu, May 22, 2008 at 1:13 PM, Lukas Vlcek [EMAIL PROTECTED] wrote:

http://svn.apache.org/repos/asf/lucene/mahout/trunk

On Thu, May 22, 2008 at 8:10 PM, Tanton Gibbs [EMAIL PROTECTED] wrote:

I checked out the Wiki. I am in need of a canopy clustering algorithm for hadoop. I'm about to embark on writing one, but if you have one already, that would be better. It doesn't have to be perfect, I can improve on it as I go. However, I couldn't find the svn repository for Mahout...any pointers to where I can find the code?

Thanks!
Tanton

On Thu, May 22, 2008 at 11:36 AM, Jeff Eastman [EMAIL PROTECTED] wrote:

I uploaded the slides from my Mahout overview to our wiki (http://cwiki.apache.org/confluence/display/MAHOUT/FAQ) along with another recent talk by Isabel Drost. Both are similar in content but their differences reflect the rapid evolution of the project in the month that separates them in time. After I got home I worried a bit that I had skipped over too much of the material in an effort to keep it brief.

Thanks to Yahoo! for hosting the meeting and providing beer and pizza for the 50 or so friends of Hadoop who attended. I thought the meeting was fun and informative and a great way to begin to associate faces and richer personalities with the names that fly through my in box.

For those of you who were not able to attend, the meeting was quite informal and we got brief, mostly extemporaneous, updates of some of the projects that were on the agenda at the recent Hadoop Summit: Hadoop 0.17, HBase, Pig, Zookeeper, Mahout, ... In the wrapup, people seemed to like this agenda and I think Ajay plans to continue with the same general format in subsequent monthly meetings.

Due to the informality, I did not take notes. Perhaps the other presenters can post summaries of their updates for the wider community's appreciation.

Jeff

-- http://blog.lukas-vlcek.com/
Hadoop Streaming - revised
Ok, I turned on verbose output. It looks as though it is adding everything in my /tmp directory to the jar file it builds. Where do I tell it not to do that? Thanks! Tanton
Hadoop Streaming - final
Ok, I figured it out. Hadoop Streaming adds the entire stream.shipped.hadoopstreaming directory to the jar file. For me, I wasn't setting it and it was defaulting to /tmp. That means my entire /tmp directory was getting added to the jar. I set that directory to the location of my hadoop streaming jar directory and it seemed to work fine. Sorry for the noise.
Re: How does one learn to program in Hadoop?
There are a few videos on YouTube about MapReduce at Google. You can get the general idea of how to approach problems in Hadoop from them. Just search for Hadoop on YouTube. Also, there are a number of videos from Yahoo! Research on Hadoop; they are linked to from this mailing list, so you can search it for them.

On Tue, May 20, 2008 at 2:24 AM, Hadoop [EMAIL PROTECTED] wrote:

Thanks for the link. What I have found so far are collections of Hadoop objects (APIs) and examples, which I think are not enough to learn how to program using Hadoop. Could you please advise me on how I can find any resources (books, manuals, sites, etc.) that explain how to program using Hadoop?