Re: Hadoop-Git-Eclipse

2012-06-08 Thread Deniz Demir
I did not find that screencast useful. This one worked for me:

http://wiki.apache.org/hadoop/EclipseEnvironment

Best,
Deniz

On Jun 8, 2012, at 1:08 AM, shashwat shriparv wrote:

 Check out this link:
 http://www.cloudera.com/blog/2009/04/configuring-eclipse-for-hadoop-development-a-screencast/
 
 Regards
 
 ∞
 Shashwat Shriparv
 
 
 
 
 On Fri, Jun 8, 2012 at 1:32 PM, Prajakta Kalmegh prkal...@in.ibm.com wrote:
 
 Hi
 
 I have done MapReduce programming using Eclipse before but now I need to
 learn the Hadoop code internals for one of my projects.
 
 I have forked Hadoop from github (https://github.com/apache/hadoop-common)
 and need to configure it to work with Eclipse. All the links I could find
 list steps for earlier versions of Hadoop. Right now I am following the
 instructions given in these links:
 - http://wiki.apache.org/hadoop/GitAndHadoop
 - http://wiki.apache.org/hadoop/EclipseEnvironment
 - http://wiki.apache.org/hadoop/HowToContribute
 
 Can someone please point me to the steps for getting the latest Hadoop
 trunk set up in Eclipse? I also need to be able to commit changes to my
 forked repository on github.
 
 Thanks in advance.
 Regards,
 Prajakta
 
 
 
 
 -- 
 
 
 ∞
 Shashwat Shriparv



Re: Working with MapFiles

2012-03-29 Thread Deniz Demir
Not sure if this helps in your use case, but you can put all of the output 
files into the distributed cache and then access them in the subsequent 
map-reduce job (in the driver code):

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// previous mr-job's output directory
String pstr = "hdfs://output_path/";
FileSystem fs = FileSystem.get(job.getConfiguration());
FileStatus[] files = fs.listStatus(new Path(pstr));
for (FileStatus f : files) {
    if (!f.isDir()) {  // skip subdirectories such as _logs
        DistributedCache.addCacheFile(f.getPath().toUri(),
                job.getConfiguration());
    }
}
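
On the consuming side, here is a minimal sketch of how the next job's mapper 
could pick those files up again (a sketch, assuming the same old-style 
org.apache.hadoop.filecache.DistributedCache API used above; the mapper 
key/value types are placeholders):

import java.io.IOException;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class CacheReadingMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void setup(Context context) throws IOException {
        // Local, task-side paths of the files cached by the driver.
        Path[] cached = DistributedCache.getLocalCacheFiles(context.getConfiguration());
        if (cached != null) {
            for (Path p : cached) {
                // open p with java.io or the local FileSystem as needed
            }
        }
    }
}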

I think you can also copy these files to a different location in dfs and then 
put them into the distributed cache.
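
And for the repeated fast lookups, a MapFile on the task's local disk can be 
queried through MapFile.Reader; a rough sketch (the path and key below are 
placeholders, and Text keys/values are an assumption):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.io.MapFile;
import org.apache.hadoop.io.Text;

Configuration conf = new Configuration();
FileSystem fs = FileSystem.getLocal(conf);
// The path names the MapFile directory (which holds "data" and "index").
MapFile.Reader reader = new MapFile.Reader(fs, "/local/path/to/mapfile", conf);
Text value = new Text();
// get() binary-searches the index, then seeks into the data file.
if (reader.get(new Text("someKey"), value) != null) {
    // value now holds the entry stored under "someKey"
}
reader.close();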


Deniz 


On Mar 29, 2012, at 8:05 AM, Ondřej Klimpera wrote:

 Hello,
 
 I have a MapFile as a product of a MapReduce job, and what I need to do is:
 
 1. If the MapReduce job produced multiple splits as output, merge them into a single file.
 
 2. Copy this merged MapFile to another HDFS location and use it as a 
 Distributed cache file for another MapReduce job.
 
 I'm wondering if it is even possible to merge MapFiles, given their nature, 
 and use them as a distributed cache file.
 
 What I'm trying to achieve is repeated fast searches in this file during 
 another MapReduce job.
 If my idea is completely wrong, can you give me any tips on how to do it?
 
 The file is expected to be about 20 MB.
 I'm using Hadoop 0.20.203.
 
 Thanks for your reply:)
 
 Ondrej Klimpera



mapreduce.job.user.name

2012-03-27 Thread Deniz Demir
I am trying to submit a job under a different username. I assumed the 
mapreduce.job.user.name property was for this, and I have tried setting it in 
the mapred-site.xml file, but it doesn't seem to work. Any idea what I should do?

Thanks,
Deniz
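
One approach that is often suggested for this, sketched here under the 
assumption of a non-secure cluster ("otheruser" is a placeholder; secure 
clusters additionally require proxy-user configuration): submit the job 
inside a doAs() block, since the job's owner is normally derived from the 
submitting UserGroupInformation rather than from mapred-site.xml.

import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.security.UserGroupInformation;

public class SubmitAsOtherUser {
    public static void main(String[] args) throws Exception {
        final Configuration conf = new Configuration();
        // Build an identity for the target user and submit under it.
        UserGroupInformation ugi = UserGroupInformation.createRemoteUser("otheruser");
        ugi.doAs(new PrivilegedExceptionAction<Void>() {
            public Void run() throws Exception {
                Job job = new Job(conf, "job-as-otheruser");
                // ... configure mapper, reducer, input/output paths here ...
                job.submit();
                return null;
            }
        });
    }
}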