Direct link to HADOOP-4842:

https://issues.apache.org/jira/browse/HADOOP-4842
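
For anyone trying the workaround quoted below on a release without the fix, here is a minimal sketch of what a Python script standing in for combiner.sh might look like. It assumes word-count-style mapper output (one "key<TAB>integer" record per line) and that the input has already been through sort, so equal keys sit next to each other. The names combine, parse and main and the record format are illustrative assumptions, not anything taken from HADOOP-4842 or from the thread.

```python
import sys
from itertools import groupby
from operator import itemgetter


def parse(lines):
    """Yield (key, count) pairs from 'key<TAB>count' lines (assumed format)."""
    for line in lines:
        key, sep, value = line.rstrip("\n").partition("\t")
        if sep:  # skip lines that lack a tab separator
            yield key, int(value)


def combine(lines):
    """Sum the counts of each run of identical keys in already-sorted input."""
    for key, group in groupby(parse(lines), key=itemgetter(0)):
        yield "%s\t%d" % (key, sum(count for _, count in group))


def main():
    # Filter mode: read sorted mapper output on stdin, write partial sums.
    for record in combine(sys.stdin):
        sys.stdout.write(record + "\n")
```

Calling main() at the bottom of the file turns it into a filter that can sit after sort in that pipeline. Since it only ever sees one mapper's output, the reducer still has to do the final aggregation.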

On Tue, May 19, 2009 at 5:04 PM, Peter Skomoroch
<peter.skomor...@gmail.com> wrote:

> Whoops, should have googled it first.  Looks like this is now fixed in
> trunk: HADOOP-4842.  For people stuck on 0.18.3, a workaround appears to be
> appending something like "| sort | sh combiner.sh" to the call of the mapper
> script (via Klaas Bosteels).
>
> Would be great to get this patched into distributions like EMR and Cloudera.
>
>
> On Tue, May 19, 2009 at 4:59 PM, Peter Skomoroch <
> peter.skomor...@gmail.com> wrote:
>
>> One area I'm curious about is the requirement that any combiners in
>> Streaming jobs be Java classes.  Are there any plans to change this in the
>> future?  Prototyping streaming jobs in Python is great, and the ability to
>> use a Python combiner would help performance a lot without needing to move
>> to Java.
>>
>>
>>
>>
>> On Tue, May 19, 2009 at 4:30 PM, Amr Awadallah <a...@cloudera.com> wrote:
>>
>>> S d,
>>>
>>>  It is totally fine to use Python streaming if it does the job you are
>>> after. There will be a slight performance hit, but that is noise assuming
>>> your cluster is a small one. If you are operating a large cluster
>>> continuously, then once your logic has stabilized in Python it might make
>>> sense to convert/operationalize some jobs to Java (or C++ Pipes) to improve
>>> performance, either to finish quicker or to reduce the number of servers
>>> needed.
>>>
>>>  You should also take a look at Pig and Hive; both are higher-level
>>> languages and very easy to learn:
>>>
>>> http://www.cloudera.com/hadoop-training-pig-introduction
>>>
>>> http://www.cloudera.com/hadoop-training-hive-introduction
>>>
>>> -- amr
>>>
>>>
>>> s d wrote:
>>>
>>>> Thanks.
>>>> So in the overall scheme of things, what is the general feeling about
>>>> using python for this? I like the ease of deploying and reading python
>>>> compared with Java but want to make sure using python over hadoop is
>>>> scalable & is standard practice and not something done only for
>>>> prototyping and small scale tests.
>>>>
>>>>
>>>> On Tue, May 19, 2009 at 9:48 AM, Alex Loddengaard <a...@cloudera.com>
>>>> wrote:
>>>>
>>>>
>>>>
>>>>> Streaming is slightly slower than native Java jobs.  Otherwise Python
>>>>> works great in streaming.
>>>>>
>>>>> Alex
>>>>>
>>>>> On Tue, May 19, 2009 at 8:36 AM, s d <s.d.sau...@gmail.com> wrote:
>>>>>
>>>>>
>>>>>
>>>>>> Hi,
>>>>>> How robust is using hadoop with python over the streaming protocol?
>>>>>> Any disadvantages (performance? flexibility?)?  It just strikes me that
>>>>>> python is so much more convenient when it comes to deploying and
>>>>>> crunching text files.
>>>>>> Thanks,
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>>
>> --
>> Peter N. Skomoroch
>> 617.285.8348
>> http://www.datawrangling.com
>> http://delicious.com/pskomoroch
>> http://twitter.com/peteskomoroch
>>
>
>
>
> --
> Peter N. Skomoroch
> 617.285.8348
> http://www.datawrangling.com
> http://delicious.com/pskomoroch
> http://twitter.com/peteskomoroch
>



-- 
Peter N. Skomoroch
617.285.8348
http://www.datawrangling.com
http://delicious.com/pskomoroch
http://twitter.com/peteskomoroch
