Dumbo certainly makes Python Streaming much nicer; there's more info here:

http://wiki.github.com/klbostee/dumbo
http://dumbotics.com/

For example, Dumbo makes it easy to implement combiners in Python.

Zak

On Tue, May 19, 2009 at 8:17 PM, Alex Loddengaard <a...@cloudera.com> wrote:
> You might also check out Dumbo, which is a Hadoop Python module.
>
> <http://www.audioscrobbler.net/development/dumbo/>
>
> Alex
>
> On Tue, May 19, 2009 at 10:35 AM, s d <s.d.sau...@gmail.com> wrote:
>
>> Thanks.
>> So in the overall scheme of things, what is the general feeling about using
>> python for this? I like the ease of deploying and reading python compared
>> with Java but want to make sure using python over hadoop is scalable & is
>> standard practice and not something done only for prototyping and small
>> scale tests.
>>
>> On Tue, May 19, 2009 at 9:48 AM, Alex Loddengaard <a...@cloudera.com> wrote:
>>
>> > Streaming is slightly slower than native Java jobs. Otherwise Python
>> > works great in streaming.
>> >
>> > Alex
>> >
>> > On Tue, May 19, 2009 at 8:36 AM, s d <s.d.sau...@gmail.com> wrote:
>> >
>> > > Hi,
>> > > How robust is using hadoop with python over the streaming protocol? Any
>> > > disadvantages (performance? flexibility?) ? It just strikes me that
>> > > python is so much more convenient when it comes to deploying and
>> > > crunching text files.
>> > > Thanks,
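
To make the combiner remark at the top of this message a bit more concrete, here is a rough sketch of a word count with a combiner, written as plain Python so it runs without Hadoop or Dumbo installed. The `mapper`/`reducer` generator style mirrors what I recall of Dumbo's examples, but `run_local` and `group` are hypothetical helpers made up for this illustration and are not part of Dumbo or Hadoop. The point is only that the combiner pre-aggregates each input split, so fewer pairs reach the reduce step.

```python
from itertools import groupby
from operator import itemgetter


def mapper(key, value):
    # key: line offset (unused here), value: one line of text.
    for word in value.split():
        yield word, 1


def reducer(key, values):
    # Sum all counts seen for one word.
    yield key, sum(values)


# Summing is associative and commutative, so the reducer can double as
# the combiner. With Dumbo itself this would, as far as I recall, be
# wired up as dumbo.run(mapper, reducer, combiner=reducer); treat that
# call as an assumption and check the Dumbo wiki before relying on it.
combiner = reducer


def group(pairs):
    """Sort (key, value) pairs by key and yield (key, [values]) groups."""
    ordered = sorted(pairs, key=itemgetter(0))
    for key, grp in groupby(ordered, key=itemgetter(0)):
        yield key, [value for _, value in grp]


def run_local(splits):
    """Hypothetical local driver: map, combine per split, then reduce.

    Returns the final counts and the number of pairs that would have
    been shuffled to the reducers after combining.
    """
    shuffled = []
    for split in splits:
        mapped = [
            pair
            for offset, line in enumerate(split)
            for pair in mapper(offset, line)
        ]
        # Combine inside each split before anything is "sent" onward.
        for key, values in group(mapped):
            shuffled.extend(combiner(key, values))

    counts = {}
    for key, values in group(shuffled):
        for word, total in reducer(key, values):
            counts[word] = total
    return counts, len(shuffled)


if __name__ == "__main__":
    # Two input splits, each a list of text lines.
    counts, shuffled = run_local([["a b a", "b a"], ["a c"]])
    print(counts, shuffled)
```

In this toy input the mappers emit seven pairs, but only four are shuffled once each split has been combined. On a real cluster that reduction in shuffled data is where a combiner pays off.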