Re: Hadoop-Git-Eclipse
I did not find that screencast useful. This one worked for me:
http://wiki.apache.org/hadoop/EclipseEnvironment

Best,
Deniz

On Jun 8, 2012, at 1:08 AM, shashwat shriparv wrote:

Check out this link:
http://www.cloudera.com/blog/2009/04/configuring-eclipse-for-hadoop-development-a-screencast/

Regards
∞ Shashwat Shriparv

On Fri, Jun 8, 2012 at 1:32 PM, Prajakta Kalmegh prkal...@in.ibm.com wrote:

Hi,

I have done MapReduce programming using Eclipse before, but now I need to learn the Hadoop code internals for one of my projects. I have forked Hadoop on GitHub (https://github.com/apache/hadoop-common) and need to configure it to work with Eclipse. All the links I could find list steps for earlier versions of Hadoop. Right now I am following the instructions in these links:

- http://wiki.apache.org/hadoop/GitAndHadoop
- http://wiki.apache.org/hadoop/EclipseEnvironment
- http://wiki.apache.org/hadoop/HowToContribute

Can someone please point me to the steps for getting the latest Hadoop trunk working in Eclipse? I need to be able to commit changes to my forked repository on GitHub. Thanks in advance.

Regards,
Prajakta
Re: Working with MapFiles
Not sure if this helps in your use case, but you can put all the output files into the distributed cache and then access them in the subsequent map-reduce job. In the driver code:

    // previous MR job's output directory
    String pstr = "hdfs://output_path/";
    FileStatus[] files = fs.listStatus(new Path(pstr));
    for (FileStatus f : files) {
      if (!f.isDir()) {
        DistributedCache.addCacheFile(f.getPath().toUri(), job.getConfiguration());
      }
    }

I think you can also copy these files to a different location in DFS and then put them into the distributed cache.

Deniz

On Mar 29, 2012, at 8:05 AM, Ondřej Klimpera wrote:

Hello,

I have a MapFile as a product of a MapReduce job, and what I need to do is:

1. If MapReduce produced more splits as output, merge them into a single file.
2. Copy this merged MapFile to another HDFS location and use it as a distributed cache file for another MapReduce job.

I'm wondering if it is even possible to merge MapFiles, given their nature, and use them as distributed cache files. What I'm trying to achieve is repeated fast search in this file during another MapReduce job. If my idea is absolutely wrong, can you give me a tip on how to do it? The file is supposed to be about 20 MB large. I'm using Hadoop 0.20.203.

Thanks for your reply :)
Ondrej Klimpera
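On the "repeatedly fast search" point: a MapFile is a sorted SequenceFile (the data file) plus a small index file, so MapFile.Reader.get() can seek close to a key instead of scanning the whole 20 MB. Here is a plain-Java sketch of that lookup idea — not the Hadoop API, just an illustration of keyed lookup via binary search over sorted keys:

```java
import java.util.Arrays;

public class MapFileLookupSketch {
    // Sorted keys stand in for a MapFile's key order; values are parallel.
    private final String[] keys;
    private final String[] values;

    MapFileLookupSketch(String[] keys, String[] values) {
        this.keys = keys;     // must be sorted, as MapFile keys are
        this.values = values;
    }

    // Analogous in spirit to MapFile.Reader.get(): binary-search the keys
    // instead of scanning every record.
    String get(String key) {
        int i = Arrays.binarySearch(keys, key);
        return i >= 0 ? values[i] : null;
    }

    public static void main(String[] args) {
        MapFileLookupSketch m = new MapFileLookupSketch(
            new String[] {"apple", "banana", "cherry"},
            new String[] {"1", "2", "3"});
        System.out.println(m.get("banana")); // prints "2"
        System.out.println(m.get("durian")); // prints "null"
    }
}
```

In the real job you would open the cached MapFile directory with something like `new MapFile.Reader(fs, path, conf)` and call `get(key, value)` in the mapper's setup/map methods — treat that constructor signature as an assumption to check against the 0.20.203 javadocs.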
mapreduce.job.user.name
I am trying to submit a job under a different username. I assumed the mapreduce.job.user.name property was for this, and I have tried setting it in the mapred-site.xml file, but it doesn't seem to work. Any idea what I should do?

Thanks,
Deniz
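In case it helps: as far as I know, in this era of Hadoop the submitting user is derived from the client's identity (UserGroupInformation), and mapreduce.job.user.name is set by the framework from that identity rather than read out of mapred-site.xml. On non-secure 0.20-era clusters, one knob people have used is hadoop.job.ugi in the client-side configuration — a hedged sketch, not verified against your version, and it has no effect once Kerberos security is enabled:

```xml
<!-- Client-side configuration (e.g. core-site.xml on the submitting machine).
     hadoop.job.ugi is honored only by non-secure, pre-Kerberos 0.20-era
     clusters; "otheruser" and "othergroup" are placeholder names. -->
<property>
  <name>hadoop.job.ugi</name>
  <value>otheruser,othergroup</value>
</property>
```

On secure clusters, the supported route is proxy users: the client calls UserGroupInformation.createProxyUser(...).doAs(...), and the cluster must whitelist the impersonating account via hadoop.proxyuser.* settings.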