Re: Hadoop & Python

s d Wed, 20 May 2009 08:12:49 -0700

Thanks. Over what range of server counts and file sizes would the
performance hit stay minor? I am concerned about implementing everything in
Python only to rewrite it later in order to scale economically.
Thanks for all the information.


On Tue, May 19, 2009 at 1:30 PM, Amr Awadallah <a...@cloudera.com> wrote:

> S d,
>
>  It is totally fine to use Python streaming if it does the job you are
> after. There will be a slight performance hit, but that is noise, assuming
> your cluster is a small one. If you are operating a large cluster
> continuously, then once your logic has stabilized in Python it might make
> sense to convert some jobs to Java (or C++ via Pipes) to improve
> performance, either to finish sooner or to reduce the number of servers
> needed.
>
>  You should also take a look at Pig and Hive. Both are higher-level
> languages and very easy to learn:
>
> http://www.cloudera.com/hadoop-training-pig-introduction
>
> http://www.cloudera.com/hadoop-training-hive-introduction
>
> -- amr
>
>
> s d wrote:
>
>> Thanks.
>> So in the overall scheme of things, what is the general feeling about
>> using Python for this? I like the ease of deploying and reading Python
>> compared with Java, but I want to make sure that using Python over Hadoop
>> is scalable and is standard practice, not something done only for
>> prototyping and small-scale tests.
>>
>>
>> On Tue, May 19, 2009 at 9:48 AM, Alex Loddengaard <a...@cloudera.com>
>> wrote:
>>
>>
>>
>>> Streaming is slightly slower than native Java jobs. Otherwise, Python
>>> works great in streaming.
>>>
>>> Alex
>>>
>>> On Tue, May 19, 2009 at 8:36 AM, s d <s.d.sau...@gmail.com> wrote:
>>>
>>>
>>>
>>>> Hi,
>>>> How robust is using Hadoop with Python over the streaming protocol? Any
>>>> disadvantages (performance? flexibility?)? It just strikes me that
>>>> Python is so much more convenient when it comes to deploying and
>>>> crunching text files.
>>>> Thanks,
>>>>
>>>>
>>>>
>>>
>>
>>
>
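
For readers who have not seen the streaming approach discussed above, here is
a minimal sketch of a word-count job written for it. Streaming runs the mapper
and reducer as ordinary programs that read lines on stdin and write
tab-separated key/value lines on stdout, with the reducer input arriving
sorted by key. The file name wc.py, the function names, and the choice of
word count as the task are all illustrative assumptions, not anything from
the thread, and the code assumes Python 3.

```python
#!/usr/bin/env python3
"""Hypothetical wc.py: word count for Hadoop Streaming (`wc.py map` / `wc.py reduce`)."""
import sys
from itertools import groupby


def map_lines(lines):
    """Emit one 'word<TAB>1' record per whitespace-separated token."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reduce_lines(lines):
    """Sum the counts for each key; assumes the input is already sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


# Only act when invoked as a script with an explicit mode argument.
if __name__ == "__main__" and sys.argv[1:2] in (["map"], ["reduce"]):
    step = map_lines if sys.argv[1] == "map" else reduce_lines
    sys.stdout.writelines(record + "\n" for record in step(sys.stdin))
```

The same script can be checked locally without a cluster, because a shell
pipeline imitates the map, sort, and reduce phases:

    cat input.txt | python3 wc.py map | sort | python3 wc.py reduce

On a cluster, the two modes would be passed to the streaming jar through its
-mapper and -reducer options. The jar location and input/output paths depend
on the installation and are not shown here.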
