Re: Reducers spawned when mapred.reduce.tasks=0

2009-03-15 Thread Amareshwari Sriramadasu
In branch 0.19.1, instantiation of the Reducer has been moved to the point 
where reduce() is called. See HADOOP-5002. That should solve your 
issue with the configure() method.

Thanks
Amareshwari
Chris K Wensel wrote:

fwiw, we have released a workaround for this issue in Cascading 1.0.5.

http://www.cascading.org/
http://cascading.googlecode.com/files/cascading-1.0.5.tgz

In short, Hadoop 0.19.0 and .1 instantiate the user's Reducer class and 
subsequently call configure() even when there is no intention to use the 
class (during job/task cleanup tasks).


This clearly can cause havoc for users who use configure() to 
initialize resources used by the reduce() method.


Testing whether jobConf.getNumReduceTasks() is 0 inside the configure() 
method seems to work out well.
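
A minimal sketch of that guard, written so it runs without Hadoop on the
classpath. FakeJobConf and GuardedReducer are hypothetical stand-ins invented
for illustration; in a real job the check would be made against the JobConf
passed to configure(), using the getNumReduceTasks() call mentioned above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Hadoop's JobConf (assumption: only the one
// accessor discussed in this thread is modelled).
class FakeJobConf {
    private final int numReduceTasks;

    FakeJobConf(int numReduceTasks) {
        this.numReduceTasks = numReduceTasks;
    }

    int getNumReduceTasks() {
        return numReduceTasks;
    }
}

// Hypothetical reducer that sets up a resource in configure().
class GuardedReducer {
    // Stands in for an expensive resource used later by reduce().
    private List<String> buffer;

    void configure(FakeJobConf job) {
        // Map-only job: the framework may still instantiate this class for a
        // cleanup task, so skip resource initialization entirely.
        if (job.getNumReduceTasks() == 0) {
            return;
        }
        buffer = new ArrayList<>();
    }

    boolean isInitialized() {
        return buffer != null;
    }
}
```

With zero reduce tasks configure() returns before touching any resource, so
an unexpected instantiation becomes harmless.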


branch-0.19 looks like it won't instantiate the Reducer class during 
job/task cleanup tasks, so I expect the fix will make its way into future releases.


cheers,

ckw

On Mar 12, 2009, at 8:20 PM, Amareshwari Sriramadasu wrote:


Are you seeing reducers being spawned in the web UI? If so, it is a bug.
If not, no reducers are being spawned; it could be a job-setup/ 
job-cleanup task that is running on a reduce slot. See HADOOP-3150 
and HADOOP-4261.

-Amareshwari
Chris K Wensel wrote:


May have found the answer, waiting on confirmation from users.

Turns out 0.19.0 and .1 instantiate the reducer class when the task 
is actually intended for job/task cleanup.


branch-0.19 looks like it resolves this issue by not instantiating 
the reducer class in this case.


I've got a workaround in the next maint release:
http://github.com/cwensel/cascading/tree/wip-1.0.5

ckw

On Mar 12, 2009, at 10:12 AM, Chris K Wensel wrote:


Hey all

Have some users reporting intermittent spawning of Reducers when 
the job.xml shows mapred.reduce.tasks=0 in 0.19.0 and .1.


This is also confirmed when jobConf is queried in the (supposedly 
ignored) Reducer implementation.


In general this issue would likely go unnoticed since the default 
reducer is IdentityReducer.


But since the reducer should be ignored in the Mapper-only case, we don't 
bother leaving the value unset, and so the problem comes to one's 
attention rather abruptly.


am happy to open a JIRA, but wanted to see if anyone else is 
experiencing this issue.


Note that the issue seems to manifest with or without speculative execution.

ckw

--Chris K Wensel
ch...@wensel.net
http://www.cascading.org/
http://www.scaleunlimited.com/








Re: Temporary files for mappers and reducers

2009-03-15 Thread Owen O'Malley
Just use the current working directory. Each task gets a unique 
directory that is erased when the task finishes.
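
A minimal sketch of writing scratch data into the current working directory,
using only the standard java.nio.file API. The class name TaskScratch and the
".tmp" suffix are assumptions made for illustration; nothing here is
Hadoop-specific.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper (not part of Hadoop).
class TaskScratch {
    // Creates an empty, uniquely named file in the process's current
    // working directory and returns its absolute path.
    static Path createInWorkingDir(String prefix) {
        // An empty relative path resolves against the working directory.
        Path cwd = Paths.get("").toAbsolutePath();
        try {
            return Files.createTempFile(cwd, prefix, ".tmp");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A reducer could call TaskScratch.createInWorkingDir("scratch-") and write to
the returned path without picking an absolute location itself.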


-- Owen

On Mar 15, 2009, at 16:08, Mark Kerzner  wrote:


Hi,

what would be the best place to put temporary files for a reducer? I believe
that since each reducer works on its own machine, at its own time, one can
do anything, but I would like a confirmation from the experts.

Thanks,
Mark


Re: Temporary files for mappers and reducers

2009-03-15 Thread jason hadoop
If you use the Java System Property java.io.tmpdir, your reducer will use
the ./tmp directory in the local working directory allocated by the
framework for your task.

If you have a specialty file system for transient data, such as a tmpfs, use
that.
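
A minimal sketch of relying on java.io.tmpdir: the standard
File.createTempFile(prefix, suffix) call places its file in the directory
named by that system property, so the code needs no hard-coded path. The class
name TmpDirScratch is an assumption made for illustration, and where the
property points inside a task is taken from the description above rather than
verified here.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper (not part of Hadoop).
class TmpDirScratch {
    // The directory the JVM uses for temporary files.
    static File tmpDir() {
        return new File(System.getProperty("java.io.tmpdir"));
    }

    // Creates an empty temp file under java.io.tmpdir.
    // The prefix must be at least three characters long.
    static File create(String prefix) {
        try {
            File f = File.createTempFile(prefix, ".tmp");
            f.deleteOnExit(); // best-effort cleanup when the JVM exits
            return f;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```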



On Sun, Mar 15, 2009 at 4:08 PM, Mark Kerzner  wrote:

> Hi,
>
> what would be the best place to put temporary files for a reducer? I believe
> that since each reducer works on its own machine, at its own time, one can
> do anything, but I would like a confirmation from the experts.
>
> Thanks,
> Mark
>



-- 
Alpha Chapters of my book on Hadoop are available
http://www.apress.com/book/view/9781430219422


Temporary files for mappers and reducers

2009-03-15 Thread Mark Kerzner
Hi,

what would be the best place to put temporary files for a reducer? I believe
that since each reducer works on its own machine, at its own time, one can
do anything, but I would like a confirmation from the experts.

Thanks,
Mark


Re: Compare Files

2009-03-15 Thread Tarandeep Singh
Map: output key/value pairs as (source, file_num):
1,1
2,1
3,1
2,2
7,2

Reduce: (1, [1]), (2, [1,2]), (3, [1]), (7, [2])
Output only those keys whose list of values does not contain file 2:
1
3
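
The same idea can be sketched as a small in-memory simulation in plain Java.
This is not a Hadoop job: the class AntiJoin, the method name run, and the
"1"/"2" tag encoding are assumptions made for illustration. Unlike the outline
above, the sketch carries the target along with the file-1 tag so that whole
file1 records can be emitted, as the question asks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory simulation of the map and reduce steps.
class AntiJoin {
    static List<String> run(List<String> file1, List<String> file2) {
        // "Map" and shuffle: group tagged values by source key.
        // TreeMap keeps keys sorted, much as a reducer sees them.
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String line : file1) {
            String[] parts = line.split(",", 2); // source,target
            grouped.computeIfAbsent(parts[0].trim(), k -> new ArrayList<>())
                   .add("1\t" + parts[1].trim()); // tag 1 plus the target
        }
        for (String line : file2) {
            grouped.computeIfAbsent(line.trim(), k -> new ArrayList<>())
                   .add("2"); // tag 2 only, no payload
        }

        // "Reduce": skip any source that also appeared in file2.
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : grouped.entrySet()) {
            if (e.getValue().contains("2")) {
                continue;
            }
            for (String tagged : e.getValue()) {
                // Drop the two-character "1\t" tag to recover the target.
                out.add(e.getKey() + "," + tagged.substring(2));
            }
        }
        return out;
    }
}
```

Run on the sample data from the question (file1: 1,2 / 2,9 / 3,5 and
file2: 2 / 7), this returns the records 1,2 and 3,5.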

-Taran

On Sun, Mar 15, 2009 at 7:24 AM, Tamir Kamara  wrote:

> Hi,
>
> I have 2 files in this format:
> file1: (source, target)
> file2: (source)
>
> I would like to write a MapReduce job that outputs all records in file1 whose
> source isn't in file2. Example:
> file1:
> 1,2
> 2,9
> 3,5
>
> file2:
> 2
> 7
>
> outcome:
> 1,2
> 3,5
>
> Could you help me with this ?
>


hadoop suffix tree

2009-03-15 Thread ywm001

Hi, can I use Hadoop to construct a suffix tree in parallel?
If so, how?
I need your help. Thank you!


2009-03-15 



ywm001 


Compare Files

2009-03-15 Thread Tamir Kamara
Hi,

I have 2 files in this format:
file1: (source, target)
file2: (source)

I would like to write a MapReduce job that outputs all records in file1 whose
source isn't in file2. Example:
file1:
1,2
2,9
3,5

file2:
2
7

outcome:
1,2
3,5

Could you help me with this ?