I agree... as I said in one of the earlier emails, I saw a 50% speedup in a
Perl script which categorizes O(10^9) rows at a time.  I also wrote a very
simple Python script (something like a 'cat') and saw a similar speedup.
These tests were with 1 Gig files.
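For anyone who hasn't tried streaming, here is a minimal sketch of what a 'cat'-like streaming script can look like. It assumes Python 3, and the names `passthrough` and `main` are hypothetical, not taken from the script I actually ran. Hadoop Streaming feeds the mapper its input as lines on stdin and reads newline-terminated lines back from stdout (by default, everything before the first tab is treated as the key), so an identity mapper only has to copy lines through.

```python
import sys
from typing import Iterable, Iterator, Optional, TextIO


def passthrough(lines: Iterable[str]) -> Iterator[str]:
    """Yield every input record unchanged, like `cat` (hypothetical helper)."""
    for line in lines:
        # Streaming reads records back as newline-terminated lines,
        # so make sure the final record also ends with "\n".
        yield line if line.endswith("\n") else line + "\n"


def main(stdin: Optional[TextIO] = None, stdout: Optional[TextIO] = None) -> None:
    """Copy records from stdin to stdout (hypothetical entry point)."""
    src = sys.stdin if stdin is None else stdin
    dst = sys.stdout if stdout is None else stdout
    # writelines() goes through Python's buffered stream; flushing after
    # every record would add overhead on multi-gigabyte inputs.
    dst.writelines(passthrough(src))
```

To use it as a mapper, call `main()` at the bottom of the file and hand the script to the streaming job via its `-mapper` option; it runs as an ordinary Unix filter, which is why existing Perl, R or bash code can be reused largely as-is.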

We were testing this here at DoubleClick, though it's kind of pointless now
given that we have access to Google's MapReduce cluster :-)  We regularly
process 25-100 Gig datasets, and the best part is that we don't have to
rewrite much of our Perl, R or bash code.



On Mon, Mar 31, 2008 at 7:21 PM, Ted Dunning <[EMAIL PROTECTED]> wrote:

>
> My experiences with Groovy are similar.  Noticeable slowdown, but quite
> bearable (almost always better than 50% of best attainable speed).
>
> The highest virtue is that simple programs become simple again.  Word count
> is < 5 lines of code.
>
>
>
>
> On 3/31/08 6:10 PM, "Colin Evans" <[EMAIL PROTECTED]> wrote:
>
> > At Metaweb, we did a lot of comparisons between streaming (using Python)
> > and native Java, and in general streaming was not much slower than
> > native Java -- most of the slowdown came from Python being a slow
> > language.
> >
> > The main problems we found with streaming apps are that they are hard
> > to write and that there are many ways to make simple mistakes in
> > streaming that slow down performance.
> >
> > We've been experimenting with embedding JavaScript (Rhino) and Jython
> > for writing jobs, and have found that performance is good and the apps
> > are much easier to write.  The tight Java integration means that
> > performance bottlenecks get rewritten in Java with little sacrifice to
> > development speed.  One of these days we'll open source these
> > frameworks.
> >
> >
> >
> > Parand Darugar wrote:
> >> Travis Brady wrote:
> >>> This brings up two interesting issues:
> >>>
> >>> 1. Hadoop streaming is a potentially very powerful tool, especially for
> >>> those of us who don't work in Java for whatever reason
> >>> 2. If Hadoop streaming is "at best a jury rigged solution" then that should
> >>> be made known somewhere on the wiki.  If it's really not supposed to be
> >>> used, why is it provided at all?
> >>>
> >>
> >> A set of reasonable performance tests and results would be very
> >> helpful for people deciding whether to go with streaming or not.
> >> Hopefully we can get some numbers from this thread and publish them?
> >> Anyone else compared streaming with native java?
> >>
> >> Best,
> >>
> >> Parand
> >
>
>


-- 
Theodore Van Rooy
http://greentheo.scroggles.com
