Re: Optimized Hadoop

2012-02-23 Thread Schubert Zhang
@Todd,
Yes, in our first code tag we intentionally stayed away from the security and
user-control features.
This is because in our existing production deployments in the enterprise
field, these features are always turned off. I think this is mainly because
of the different business models of Hanborq and others.

But we do plan to be fully compatible with Apache and Cloudera in
the future.

As for the worker-pool implementation, it is true that we will continue to
improve our solution.

Schubert Zhang

Looking at the code, it seems you only support the default task
executor. Do you have plans to support run-as-user through the linux
task-controller? It's a requirement for secure environments. But, it
makes the worker pool model a little tougher since you can't share a
JVM cross-user.
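
To make the constraint in the question above concrete, here is a minimal sketch of a worker pool that only reuses an idle worker for the user it already runs as. It is an illustration under assumptions, not Hanborq's or Hadoop's implementation: the class `PerUserWorkerPool`, its methods, and the `(user, id)` tuples standing in for worker JVMs are all hypothetical.

```python
from collections import defaultdict, deque
from itertools import count


class PerUserWorkerPool:
    """Hypothetical pool: an idle worker is reused only by the same user."""

    def __init__(self):
        self._idle = defaultdict(deque)  # user -> queue of idle workers
        self._ids = count(1)             # source of unique worker ids
        self.spawned = 0                 # how many workers were started

    def acquire(self, user):
        """Return an idle worker owned by `user`, or start a new one."""
        if self._idle[user]:
            return self._idle[user].popleft()
        self.spawned += 1
        return (user, next(self._ids))   # stand-in for launching a JVM

    def release(self, worker):
        """Put a worker back on the idle queue of the user it runs as."""
        user, _ = worker
        self._idle[user].append(worker)


pool = PerUserWorkerPool()
w1 = pool.acquire("alice")
pool.release(w1)
w2 = pool.acquire("alice")  # same user: the idle worker is reused
w3 = pool.acquire("bob")    # different user: a new worker is started
```

With many distinct users, a pool partitioned this way reuses fewer workers than a single shared pool, which is the sense in which run-as-user makes the worker-pool model tougher.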



On Wed, Feb 22, 2012 at 7:34 PM, Dieter Plaetinck 
dieter.plaeti...@intec.ugent.be wrote:

 Great work folks! Very interesting.

 PS: did you notice that if you google for hanborq or HDH, it's very hard to
 find your website, hanborq.com?

 Dieter

 On Tue, 21 Feb 2012 02:17:31 +0800
 Schubert Zhang zson...@gmail.com wrote:

  We just updated the slides for these improvements:
  http://www.slideshare.net/hanborq/hanborq-optimizations-on-hadoop-mapreduce-20120216a

  Updates:
  (1) revised some descriptions to make things clearer and more accurate.
  (2) added some benchmarks to back this up.

  On Sat, Feb 18, 2012 at 11:12 PM, Anty anty@gmail.com wrote:

   On Fri, Feb 17, 2012 at 3:27 AM, Todd Lipcon t...@cloudera.com wrote:

   Hey Schubert,

   Looking at the code on github, it looks like your rewritten shuffle is
   in fact just a backport of the shuffle from MR2. I didn't look closely

   Additionally, the rewritten shuffle in MR2 has some bugs that harm
   overall performance. I have already filed a JIRA to report this,
   with a patch available:
   MAPREDUCE-3685 https://issues.apache.org/jira/browse/MAPREDUCE-3685

   - are there any distinguishing factors?
   Also, the OOB heartbeat and adaptive heartbeat code seems to be the
   same as what's in 1.0?

   -Todd

   On Thu, Feb 16, 2012 at 9:44 AM, Schubert Zhang zson...@gmail.com wrote:

    Here is the presentation describing our work:
    http://www.slideshare.net/hanborq/hanborq-optimizations-on-hadoop-mapreduce-20120216a
    Your advice is welcome.
    It's just a small step, and we are continuing to make more improvements.
    Thanks for your help.

    On Thu, Feb 16, 2012 at 11:01 PM, Anty anty@gmail.com wrote:

    Hi guys,
    We have just delivered an optimized Hadoop. If you are interested, please
    refer to https://github.com/hanborq/hadoop

    --
    Best Regards
    Anty Rao

   --
   Todd Lipcon
   Software Engineer, Cloudera

   --
   Best Regards
   Anty Rao




Re: Optimized Hadoop

2012-02-23 Thread Schubert Zhang
Thanks Dieter. Any comments are welcome.

Hehe, Hanborq Inc. is a small, low-profile company, even though we have
been in the Hadoop ecosystem for 4+ years. In fact, we have been working hard,
busy solving the big data problems of big enterprises in China. Of
course, we have also been working out our business model.

I think our home page (www.hanborq.com) is very simple and ungainly
right now; it seems we should get someone who is good at websites. :-)
But if you Google "Hanborq Hadoop" or "Hanborq MapReduce", you may find what
you want.

Thanks





Re: Optimized Hadoop

2012-02-22 Thread Dieter Plaetinck
Great work folks! Very interesting.

PS: did you notice that if you google for hanborq or HDH, it's very hard to
find your website, hanborq.com?

Dieter
