.. I mean L7, obviously.

On Wed, Dec 14, 2011 at 12:41 PM, Dmitriy Ryaboy <dvrya...@gmail.com> wrote:
> Two questions relating to that:
>
> 1) We currently hardcode parallel 40 in pigmix. Since Pig can now
> automatically select parallelism, would it be better to let it do so?
>
> 2) I noticed that L17 can be greatly optimized. Currently it does this:
>
> register pigperf.jar;
> %default PIGMIX_DIR /user/pig/tests/data/pigmix
> A = load '$PIGMIX_DIR/page_views' using
>     org.apache.pig.test.udf.storefunc.PigPerformanceLoader() as (user, action,
>     timespent, query_term,
>     ip_addr, timestamp, estimated_revenue, page_info, page_links);
> B = foreach A generate user, timestamp;
> C = group B by user;
> D = foreach C {
>     morning = filter B by timestamp < 43200;
>     afternoon = filter B by timestamp >= 43200;
>     generate group, COUNT(morning), COUNT(afternoon);
> }
> store D into 'L7out';
>
> It can be improved to use combiners:
>
> register pigperf.jar;
> %default PIGMIX_DIR /user/pig/tests/data/pigmix
> A = load '$PIGMIX_DIR/page_views' using
>     org.apache.pig.test.udf.storefunc.PigPerformanceLoader() as (user, action,
>     timespent, query_term,
>     ip_addr, timestamp, estimated_revenue, page_info, page_links);
> B = foreach A generate user, timestamp,
>     (timestamp < 43200 ? 1 : 0) as morning, (timestamp >= 43200 ? 1 : 0)
>     as afternoon;
> C = group B by user;
> D = foreach C {
>     generate group, SUM(B.morning), SUM(B.afternoon);
> }
> store D into 'L7out';
>
> Is L17 supposed to test something that precludes the use of combiners, or
> is improving the query fair game?
>
> D
>
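To make the combiner argument concrete, here is a rough sketch in Python (not Pig) of why the rewritten query can be pre-aggregated. It only models the aggregation logic: the 0/1 flags can be summed on any slice of the input and the partial sums merged later, while the filter-then-COUNT form needs each user's full bag first. The function names, the sample rows, and the two-way split are made up for illustration and are not part of pigmix. It also assumes non-null timestamps.

```python
from collections import defaultdict

NOON = 43200  # split point used in the query (seconds since midnight)


def counts_by_filter(rows):
    """Original form: collect each user's full bag, then filter and count."""
    bags = defaultdict(list)
    for user, ts in rows:
        bags[user].append(ts)
    return {
        user: (sum(1 for t in bag if t < NOON), sum(1 for t in bag if t >= NOON))
        for user, bag in bags.items()
    }


def partial_sums(rows):
    """Rewritten form: sum 0/1 flags per user over any slice of the input."""
    acc = defaultdict(lambda: [0, 0])
    for user, ts in rows:
        acc[user][0] += 1 if ts < NOON else 0
        acc[user][1] += 1 if ts >= NOON else 0
    return acc


def merge_partials(parts):
    """Reduce side: add up the partial sums produced for each slice."""
    total = defaultdict(lambda: [0, 0])
    for part in parts:
        for user, (morning, afternoon) in part.items():
            total[user][0] += morning
            total[user][1] += afternoon
    return {user: tuple(sums) for user, sums in total.items()}


# Hypothetical sample data: (user, timestamp) pairs.
rows = [("alice", 100), ("bob", 50000), ("alice", 43200),
        ("alice", 43199), ("bob", 60000)]

# Pretend the input was processed as two separate slices.
combined = merge_partials([partial_sums(rows[:2]), partial_sums(rows[2:])])
assert combined == counts_by_filter(rows)
```

Both forms agree on this sample. The difference is that the second one never has to hold a whole per-user bag before it can start aggregating, which is the property the combiner needs.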