Re: yslow optimizations

ASF - Maillists Tue, 13 Mar 2012 13:54:19 -0700

The correlation-based optimization in YSmart looks good: it creates a minimal 
number of jobs by exploiting correlations between the jobs. The experiment 
section mentions that they used the CDH distribution for their experimental 
setup. Since the paper was published at ICDCS 2011 in June, a quick glance over 
the CDH3 beta 4 (released in Feb 2011) release history shows Pig 0.8.0, so that 
is probably the version they ran.
 
Looks like they have patched this in Hive 
http://code.google.com/p/ysmart/wiki/HivePatch



On Mar 10, 2012, at 11:16 PM, Dmitriy Ryaboy wrote:

> Yslow does some clever correlation-based optimizations to achieve
> significant speedups. They have a good paper about it:
> http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf
> Note the Hive/Pig numbers.. we are generating unnecessary jobs, and
> too much intermediate data, it seems (not sure which version of Pig
> they ran).
> 
> D
