One area I'm curious about is the requirement that any combiners in Streaming jobs be Java classes. Are there any plans to change this in the future? Prototyping streaming jobs in Python is great, and being able to use a Python combiner would help performance a lot without needing to move to Java.
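To make the request concrete, here is a minimal sketch of the kind of Python combiner I have in mind. It assumes a word-count style job whose mapper emits tab-separated "key<TAB>count" lines and that the input arrives grouped by key. The function names (combine, main) are hypothetical, and whether a given Hadoop release accepts a script for the combiner is something to check in its streaming documentation.

```python
import sys
from itertools import groupby


def combine(lines):
    """Sum the counts of consecutive lines that share a key.

    Assumes each non-blank line looks like "key<TAB>count" and that lines
    with the same key are adjacent (as in sorted map output). A line with
    no tab will raise an error in this sketch.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines if line.strip())
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{key}\t{total}"


def main():
    # Streaming scripts read records from stdin and write results to stdout.
    for record in combine(sys.stdin):
        print(record)
```

To run this as a script, call main() at the bottom of the file. Because summing is associative, the same logic could also serve as the reducer, which is what makes it a candidate for a combiner in the first place.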
On Tue, May 19, 2009 at 4:30 PM, Amr Awadallah <a...@cloudera.com> wrote:

> S d,
>
> It is totally fine to use Python streaming if it does the job you are
> after, there will be a slight performance hit, but that is noise assuming
> your cluster is a small one. If you are operating a large cluster
> continuously, then once your logic is stabilized using Python it might make
> sense to convert/operationalize some jobs to Java (or C pipes) to improve
> performance for purpose of finishing quicker or reducing number of servers
> needed.
>
> You should also take a look at PIG and Hive, they are both higher level
> languages and very easy to learn:
>
> http://www.cloudera.com/hadoop-training-pig-introduction
>
> http://www.cloudera.com/hadoop-training-hive-introduction
>
> -- amr
>
> s d wrote:
>
>> Thanks.
>> So in the overall scheme of things, what is the general feeling about
>> using python for this? I like the ease of deploying and reading python
>> compared with Java but want to make sure using python over hadoop is
>> scalable & is standard practice and not something done only for
>> prototyping and small scale tests.
>>
>> On Tue, May 19, 2009 at 9:48 AM, Alex Loddengaard <a...@cloudera.com>
>> wrote:
>>
>>> Streaming is slightly slower than native Java jobs. Otherwise Python
>>> works great in streaming.
>>>
>>> Alex
>>>
>>> On Tue, May 19, 2009 at 8:36 AM, s d <s.d.sau...@gmail.com> wrote:
>>>
>>>> Hi,
>>>> How robust is using hadoop with python over the streaming protocol? Any
>>>> disadvantages (performance? flexibility?) ? It just strikes me that
>>>> python is so much more convenient when it comes to deploying and
>>>> crunching text files.
>>>> Thanks,

--
Peter N. Skomoroch
617.285.8348
http://www.datawrangling.com
http://delicious.com/pskomoroch
http://twitter.com/peteskomoroch