Will coding computationally intensive algorithms in C/C++ and running
them in streaming mode improve performance? Just curious.
Xiance
On Aug 13, 2008, at 10:56 AM, Gaurav Veda wrote:
Thank you all for the replies. They do clarify things!
Cheers,
Gaurav
On Tue, Aug 12, 2008 at 8:01 PM, Arun C Murthy <[EMAIL PROTECTED]> wrote:
On Aug 12, 2008, at 3:15 PM, Ashish Venugopal wrote:
There is definitely functionality in "normal" mode that is not
available in streaming, like the ability to write counters to
instrument jobs. I personally just use streaming, so I am interested
to see if there are further key differences...
With hadoop-0.18 (under vote now) you get counters for streaming too:
http://issues.apache.org/jira/browse/HADOOP-1328
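With counters available in Streaming, a script can report them without any Java code. Below is a minimal Python sketch of a word-count mapper that also bumps a counter, assuming the stderr convention described in the Hadoop Streaming documentation: a line of the form `reporter:counter:<group>,<counter>,<amount>` written to standard error. The `WordCount`/`EmptyLines` names are hypothetical; check that your Hadoop version supports this convention.

```python
#!/usr/bin/env python3
"""Sketch of a Hadoop Streaming mapper that also updates a counter.

Assumption: the framework interprets stderr lines of the form
reporter:counter:<group>,<counter>,<amount> as counter updates.
The group/counter names used here are hypothetical.
"""
import sys


def increment_counter(group, counter, amount=1, err=sys.stderr):
    # Streaming picks up counter updates from the task's stderr.
    err.write(f"reporter:counter:{group},{counter},{amount}\n")
    err.flush()


def map_lines(lines, out=sys.stdout, err=sys.stderr):
    for line in lines:
        words = line.split()
        if not words:
            # Count blank input lines instead of emitting anything for them.
            increment_counter("WordCount", "EmptyLines", 1, err)
            continue
        for word in words:
            # Default streaming output format: key<TAB>value, one per line.
            out.write(f"{word}\t1\n")


if __name__ == "__main__":
    map_lines(sys.stdin)
```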
As others have pointed out, the fact that your input/output has to be
'textual' is a major difference: lots of applications need binary
data.
This 'stringification' has serious performance implications too; some
benchmarks I did a while ago for Pig put the slowdown at nearly 3x.
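One common workaround for binary data, not discussed in this thread, is to base64-encode each value so that it can never contain the tab and newline characters that delimit Streaming records. The sketch below (function names are hypothetical) shows the round trip and also makes one source of stringification cost concrete: base64 inflates the payload by about 4/3 before any parsing cost is counted.

```python
import base64
import os


def to_streaming_line(key: str, payload: bytes) -> str:
    # base64 output uses only A-Z, a-z, 0-9, '+', '/' and '=',
    # so the encoded value cannot contain a tab or newline.
    # Assumes the key itself contains no tab or newline.
    return f"{key}\t{base64.b64encode(payload).decode('ascii')}\n"


def from_streaming_line(line: str) -> tuple[str, bytes]:
    # Split on the first tab, mirroring Streaming's default key/value split.
    key, _, encoded = line.rstrip("\n").partition("\t")
    return key, base64.b64decode(encoded)


if __name__ == "__main__":
    blob = os.urandom(300)  # arbitrary binary data; may contain \t or \n bytes
    line = to_streaming_line("record-1", blob)
    key, decoded = from_streaming_line(line)
    assert key == "record-1" and decoded == blob
    # 300 raw bytes become 400 base64 characters: a 4/3 size overhead.
    print(len(blob), len(line.split("\t")[1].rstrip("\n")))
```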
Arun
Ashish
On Tue, Aug 12, 2008 at 3:09 PM, Gaurav Veda <[EMAIL PROTECTED]> wrote:
Hi All,
This might seem silly, but I haven't found a satisfactory answer to
this yet. What are the advantages / disadvantages of using Hadoop
Streaming over the normal mode (wherein you write your own mapper and
reducer in Java)? From what I gather, the real advantage of Hadoop
Streaming is that you can use any executable (in C / Perl / Python
etc.) as a mapper / reducer.
A slight disadvantage is that, by default, the mapper and reducer read
from standard input and write to standard output, though one can
specify one's own input and output formats (and package them with the
default Hadoop Streaming jar file).
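To make the stdin/stdout model concrete: by default Streaming feeds each task its input as lines on standard input and treats each line written to standard output as a key/value pair split at the first tab. The sketch below is a minimal Python reducer under that default; it relies on the framework sorting mapper output by key before the reduce phase, so equal keys arrive on adjacent lines. The word-count summing logic is illustrative only.

```python
#!/usr/bin/env python3
"""Sketch of a Hadoop Streaming reducer that sums integer values per key."""
import sys
from itertools import groupby


def parse(line):
    # Streaming's default: the key is everything before the first tab.
    key, _, value = line.rstrip("\n").partition("\t")
    return key, value


def reduce_lines(lines, out=sys.stdout):
    # Input is sorted by key, so groupby sees each key as one contiguous run.
    pairs = (parse(line) for line in lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(value) for _, value in group)
        out.write(f"{key}\t{total}\n")


if __name__ == "__main__":
    reduce_lines(sys.stdin)
```

Locally, the same pipeline can be approximated by piping the output of any mapper that emits `word<TAB>1` lines through `sort` and then into this script.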
My point is, why should I ever use the normal mode? Streaming seems
just as good. Is there a performance problem, do I have only limited
control over my job if I use streaming mode, or is there some other
issue?
Thanks!
Gaurav
--
Share what you know, learn what you don't !