Re: [Streaming]What is the difference between streaming options: -file and -CacheFile ?

2008-07-18 Thread Steve Gao
One more small question: why is Hadoop Streaming designed with two 
different options that do the same thing (i.e., control the number of 
reducers)? What's the point?
Thanks

--- On Fri, 7/18/08, Arun C Murthy <[EMAIL PROTECTED]> wrote:
From: Arun C Murthy <[EMAIL PROTECTED]>
Subject: Re: [Streaming]What is the difference between streaming options: -file 
and -CacheFile ?
To: core-user@hadoop.apache.org, "Steve Gao" <[EMAIL PROTECTED]>
Date: Friday, July 18, 2008, 8:27 PM

On Jul 18, 2008, at 4:53 PM, Steve Gao wrote:

> Hi All,
> I am using Hadoop Streaming. I am confused by the streaming  
> options -file and -cacheFile. It seems that they mean the same thing,  
> right?
>

The difference is that -file will 'ship' your local file to  
the cluster, while -cacheFile assumes that the file is already present  
on HDFS at the given path.

> Another confusing pair of options is -numReduceTasks and -jobconf  
> mapred.reduce.tasks. Both are used to control (or give a hint for) the  
> number of reducers.
>

Yes, they are both equivalent.

hth,
Arun


  

Re: [Streaming]What is the difference between streaming options: -file and -CacheFile ?

2008-07-18 Thread Arun C Murthy


On Jul 18, 2008, at 4:53 PM, Steve Gao wrote:


> Hi All,
> I am using Hadoop Streaming. I am confused by the streaming  
> options -file and -cacheFile. It seems that they mean the same thing,  
> right?




The difference is that -file will 'ship' your local file to  
the cluster, while -cacheFile assumes that the file is already present  
on HDFS at the given path.
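
To make the distinction concrete, here is a minimal Python sketch that only assembles (and prints) a streaming command line; it does not run a job. The jar path, the HDFS paths, the namenode address, the script names and the helper function itself are made-up placeholders, not values from this thread. The option is spelled -cacheFile here, and the part after '#' is the symlink name the tasks see.

```python
import shlex

# Placeholder: the streaming jar's name and location depend on your install.
STREAMING_JAR = "/path/to/hadoop-streaming.jar"


def streaming_command(local_files=(), cached_uris=()):
    """Build the argument list for a streaming job (hypothetical helper)."""
    argv = [
        "hadoop", "jar", STREAMING_JAR,
        "-input", "/user/me/input",    # hypothetical HDFS input path
        "-output", "/user/me/output",  # hypothetical HDFS output path
        "-mapper", "mapper.py",
        "-reducer", "reducer.py",
    ]
    for path in local_files:
        # -file: the file lives on the submitting machine and is
        # shipped to the cluster along with the job.
        argv += ["-file", path]
    for uri in cached_uris:
        # -cacheFile: the file is assumed to be on HDFS already;
        # the fragment after '#' names the symlink created for the tasks.
        argv += ["-cacheFile", uri]
    return argv


cmd = streaming_command(
    local_files=["mapper.py", "reducer.py"],
    cached_uris=["hdfs://namenode:9000/user/me/dict.txt#dict.txt"],
)
print(" ".join(shlex.quote(arg) for arg in cmd))
```

In this sketch the two scripts are local and go through -file, while dict.txt is assumed to have been uploaded to HDFS beforehand and is only referenced by -cacheFile.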


> Another confusing pair of options is -numReduceTasks and -jobconf  
> mapred.reduce.tasks. Both are used to control (or give a hint for) the  
> number of reducers.




Yes, they are both equivalent.
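
As an illustration only, the two spellings could be generated like this in Python. The helper names are hypothetical, and the sketch just builds the argument fragments that would be appended to a streaming command line; that the two have the same effect is taken from the statement above, not checked by the code.

```python
def reducers_via_flag(n):
    """Dedicated streaming option (hypothetical helper name)."""
    return ["-numReduceTasks", str(n)]


def reducers_via_jobconf(n):
    """Generic job configuration property (hypothetical helper name)."""
    return ["-jobconf", "mapred.reduce.tasks={}".format(n)]


# Either fragment would be appended to the rest of the streaming arguments.
print(" ".join(reducers_via_flag(4)))
print(" ".join(reducers_via_jobconf(4)))
```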

hth,
Arun


[Streaming]What is the difference between streaming options: -file and -CacheFile ?

2008-07-18 Thread Steve Gao
Hi All,  
    I am using Hadoop Streaming. I am confused by the streaming options -file and 
-cacheFile. It seems that they mean the same thing, right?

    Another confusing pair of options is -numReduceTasks and -jobconf 
mapred.reduce.tasks. Both are used to control (or give a hint for) the number of 
reducers.

  Thanks