Dumbo certainly makes Python Streaming much nicer; there's more info here:

http://wiki.github.com/klbostee/dumbo
http://dumbotics.com/

For example, Dumbo makes it easy to implement combiners in Python.
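
To show what a combiner buys you, here is a minimal local sketch in plain Python. It does not import Dumbo or touch Hadoop. The mapper and reducer use the (key, value) generator style that Dumbo programs are written in, but run_local and _group_and_apply are hypothetical helpers made up for this illustration, and the two "splits" only stand in for separate map tasks. As far as I recall from the Dumbo tutorial, the real thing hands the same functions to dumbo.run(mapper, reducer, combiner=reducer); treat that as an assumption and check the wiki above.

```python
from itertools import groupby
from operator import itemgetter


def mapper(key, value):
    # Emit (word, 1) for every word in the input line.
    for word in value.split():
        yield word, 1


def reducer(key, values):
    # Sum the counts for one word. Summing is associative and commutative,
    # so the same function is safe to reuse as a combiner.
    yield key, sum(values)


def _group_and_apply(pairs, func):
    # Hypothetical helper: sort by key, group, and apply a reducer-style function.
    ordered = sorted(pairs, key=itemgetter(0))
    for key, group in groupby(ordered, key=itemgetter(0)):
        for out in func(key, (v for _, v in group)):
            yield out


def run_local(lines, mapper, reducer, combiner=None, splits=2):
    # Hypothetical local driver, not part of Dumbo or Hadoop.
    # Each split plays the role of one map task; the combiner runs on a
    # split's map output before it is "shuffled" to the reducer.
    shuffled = []
    for i in range(splits):
        chunk = lines[i::splits]
        pairs = [kv for offset, line in enumerate(chunk)
                 for kv in mapper(offset, line)]
        if combiner is not None:
            pairs = list(_group_and_apply(pairs, combiner))
        shuffled.extend(pairs)
    # Return the final counts and the number of pairs that crossed the shuffle.
    return dict(_group_and_apply(shuffled, reducer)), len(shuffled)
```

Calling run_local(["a b a", "b a", "a a"], mapper, reducer) and then again with combiner=reducer gives the same counts both times, but the number of intermediate pairs sent to the reducer drops from 7 to 4. On a real cluster that reduction in shuffle traffic is the point of a combiner.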

Zak


On Tue, May 19, 2009 at 8:17 PM, Alex Loddengaard <a...@cloudera.com> wrote:
> You might also check out Dumbo, which is a Hadoop Python module.
>
> <http://www.audioscrobbler.net/development/dumbo/>
>
> Alex
>
> On Tue, May 19, 2009 at 10:35 AM, s d <s.d.sau...@gmail.com> wrote:
>
>> Thanks.
>> So in the overall scheme of things, what is the general feeling about using
>> python for this? I like the ease of deploying and reading python compared
>> with Java but want to make sure using python over hadoop is scalable & is
>> standard practice and not something done only for prototyping and small
>> scale tests.
>>
>>
>> On Tue, May 19, 2009 at 9:48 AM, Alex Loddengaard <a...@cloudera.com> wrote:
>>
>> > Streaming is slightly slower than native Java jobs.  Otherwise Python
>> > works great in streaming.
>> >
>> > Alex
>> >
>> > On Tue, May 19, 2009 at 8:36 AM, s d <s.d.sau...@gmail.com> wrote:
>> >
>> > > Hi,
>> > > How robust is using hadoop with python over the streaming protocol? Any
>> > > disadvantages (performance? flexibility?)?  It just strikes me that
>> > > python is so much more convenient when it comes to deploying and
>> > > crunching text files.
>> > > Thanks,
>> > >
>> >
>>
>
