Re: Reducers spawned when mapred.reduce.tasks=0
Instantiation of the Reducer has been moved to the place where reduce() is called, in branch 0.19.1. See HADOOP-5002. Hopefully that solves your issue with the configure() method.

Thanks
Amareshwari

Chris K Wensel wrote:

fwiw, we have released a workaround for this issue in Cascading 1.0.5.

http://www.cascading.org/
http://cascading.googlecode.com/files/cascading-1.0.5.tgz

In short, Hadoop 0.19.0 and .1 instantiate the user's Reducer class and subsequently call configure() when there is no intention to use the class (during job/task cleanup tasks). This clearly can cause havoc for users who use configure() to initialize resources used by the reduce() method.

Testing whether jobConf.getNumReduceTasks() is 0 inside the configure() method seems to work out well.

branch-0.19 looks like it won't instantiate the Reducer class during job/task cleanup tasks, so I expect the fix will carry into future releases.

cheers,
ckw

On Mar 12, 2009, at 8:20 PM, Amareshwari Sriramadasu wrote:

Are you seeing reducers getting spawned in the web UI? If so, it is a bug. If not, no reducers are being spawned; it could be a job-setup/job-cleanup task that is running on a reduce slot. See HADOOP-3150 and HADOOP-4261.

-Amareshwari

Chris K Wensel wrote:

May have found the answer, waiting on confirmation from users.

Turns out 0.19.0 and .1 instantiate the reducer class when the task is actually intended for job/task cleanup. branch-0.19 looks like it resolves this issue by not instantiating the reducer class in this case.

I've got a workaround in the next maint release:
http://github.com/cwensel/cascading/tree/wip-1.0.5

ckw

On Mar 12, 2009, at 10:12 AM, Chris K Wensel wrote:

Hey all

Have some users reporting intermittent spawning of Reducers when the job.xml shows mapred.reduce.tasks=0 in 0.19.0 and .1. This is also confirmed when jobConf is queried in the (supposedly ignored) Reducer implementation.

In general this issue would likely go unnoticed, since the default reducer is IdentityReducer. But since the reducer should be ignored in the Mapper-only case, we don't bother to leave the value unset, and the problem subsequently comes to one's attention rather abruptly.

Am happy to open a JIRA, but wanted to see if anyone else is experiencing this issue. Note the issue seems to manifest with or without spec exec.

ckw

--
Chris K Wensel
ch...@wensel.net
http://www.cascading.org/
http://www.scaleunlimited.com/
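The guard Chris describes (checking the reduce-task count inside configure()) can be sketched roughly as follows. This is only an illustration of the pattern in Python, not Hadoop or Cascading code: the class `ResourceReducer`, its `_open_lookup` helper, and the use of a plain dict in place of JobConf are all assumptions made for the example. In a real Java Reducer the check would be on jobConf.getNumReduceTasks().

```python
class ResourceReducer:
    """Hypothetical stand-in for a Reducer that opens resources in configure()."""

    def __init__(self):
        self.lookup = None  # resource that reduce() would rely on

    def configure(self, job_conf):
        # job_conf is a plain dict standing in for Hadoop's JobConf (assumption).
        # With zero reduce tasks, this instance can only have been created for a
        # job/task cleanup task, so skip the expensive initialization.
        if int(job_conf.get("mapred.reduce.tasks", 1)) == 0:
            return
        self.lookup = self._open_lookup()

    def _open_lookup(self):
        # Placeholder for opening a connection, loading a side file, etc.
        return {"ready": True}


# Map-only job: configure() is still called, but nothing is initialized.
map_only = ResourceReducer()
map_only.configure({"mapred.reduce.tasks": "0"})

# Job with reducers: resources are initialized as usual.
with_reducers = ResourceReducer()
with_reducers.configure({"mapred.reduce.tasks": "2"})
```

The point of the guard is that configure() becomes harmless when it is invoked on an instance that will never see a reduce() call.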
Re: Temporary files for mappers and reducers
Just use the current working directory. Each task gets a unique directory that is erased when the task finishes.

-- Owen

On Mar 15, 2009, at 16:08, Mark Kerzner wrote:

Hi,

what would be the best place to put temporary files for a reducer? I believe that since each reducer works on its own machine, at its own time, one can do anything, but I would like a confirmation from the experts.

Thanks,
Mark
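As a rough sketch of Owen's suggestion, a task written in Python (for example, one run under Hadoop Streaming) could keep its scratch data in the current working directory like this. The function name `spill_and_reload` and the `scratch-` prefix are made up for the example, and it assumes the framework started the process with the per-task directory as its working directory.

```python
import os
import tempfile


def spill_and_reload(records):
    """Write records to a scratch file in the current working directory,
    read them back, and return them as a list of strings."""
    # dir=os.getcwd() keeps the file inside the task's own directory
    # (assumed to be the per-task directory described above).
    with tempfile.NamedTemporaryFile(
        mode="w+", dir=os.getcwd(), prefix="scratch-", suffix=".tmp"
    ) as scratch:
        for record in records:
            scratch.write(record + "\n")
        scratch.flush()
        scratch.seek(0)
        # The file is removed automatically when the with-block exits.
        return [line.rstrip("\n") for line in scratch]


reloaded = spill_and_reload(["a", "b", "c"])
```

Because the file is deleted when it is closed, nothing is left behind even before the framework erases the task directory.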
Re: Temporary files for mappers and reducers
If you use the Java system property java.io.tmpdir, your reducer will use the ./tmp directory in the local working directory allocated by the framework for your task.

If you have a specialty file system for transient data, such as a tmpfs, use that.

On Sun, Mar 15, 2009 at 4:08 PM, Mark Kerzner wrote:
> Hi,
>
> what would be the best place to put temporary files for a reducer? I
> believe that since each reducer works on its own machine, at its own
> time, one can do anything, but I would like a confirmation from the
> experts.
>
> Thanks,
> Mark
>

--
Alpha Chapters of my book on Hadoop are available
http://www.apress.com/book/view/9781430219422
Temporary files for mappers and reducers
Hi,

what would be the best place to put temporary files for a reducer? I believe that since each reducer works on its own machine, at its own time, one can do anything, but I would like a confirmation from the experts.

Thanks,
Mark
Re: Compare Files
Map: output a key,value pair of (source, file_num) for every record:

1,1
2,1
3,1
2,2
7,2

Reduce: the values arrive grouped by source:

(1, [1]), (2, [1,2]), (3, [1]), (7, [2])

Output only those keys whose list of values does not contain file 2:

1
3

-Taran

On Sun, Mar 15, 2009 at 7:24 AM, Tamir Kamara wrote:
> Hi,
>
> I have 2 files in this format:
> file1: (source, target)
> file2: (source)
>
> I would like to write an MR job that outputs all records in file1
> whose source isn't in file2. Example:
> file1:
> 1,2
> 2,9
> 3,5
>
> file2:
> 2
> 7
>
> outcome:
> 1,2
> 3,5
>
> Could you help me with this?
>
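The scheme above can be tried out in a small in-memory simulation. This is a sketch in plain Python rather than a Hadoop job: the function names are made up for the example, and the grouping loop stands in for the shuffle. One deliberate difference from the outline above: the map step for file1 carries the whole record along with the file number, so the reduce step can emit "1,2" and "3,5" (the requested outcome) rather than just the keys 1 and 3.

```python
from collections import defaultdict

FILE1, FILE2 = 1, 2  # tags telling the reducer which file a value came from


def map_file1(line):
    # file1 records are "source,target"; key by source, keep the full record.
    source, _target = line.split(",")
    return source, (FILE1, line)


def map_file2(line):
    # file2 records are just "source"; no payload is needed.
    return line.strip(), (FILE2, None)


def reduce_source(values):
    # Drop the whole group if any value came from file2.
    if any(tag == FILE2 for tag, _ in values):
        return []
    return [record for tag, record in values if tag == FILE1]


def anti_join(file1_lines, file2_lines):
    groups = defaultdict(list)  # stands in for the shuffle/group-by-key phase
    for line in file1_lines:
        key, value = map_file1(line)
        groups[key].append(value)
    for line in file2_lines:
        key, value = map_file2(line)
        groups[key].append(value)
    output = []
    for source in sorted(groups):
        output.extend(reduce_source(groups[source]))
    return output


result = anti_join(["1,2", "2,9", "3,5"], ["2", "7"])
print(result)  # ['1,2', '3,5']
```

In a real job the two map functions would be selected by input path, and the tag would travel in the value exactly as it does here.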
hadoop suffix tree
Hi,

Can I use Hadoop to construct a suffix tree in parallel? If so, how? I need your help. Thank you!

2009-03-15
ywm001
Compare Files
Hi,

I have 2 files in this format:
file1: (source, target)
file2: (source)

I would like to write an MR job that outputs all records in file1 whose source isn't in file2. Example:

file1:
1,2
2,9
3,5

file2:
2
7

outcome:
1,2
3,5

Could you help me with this?