.. I mean L7, obviously.

On Wed, Dec 14, 2011 at 12:41 PM, Dmitriy Ryaboy <dvrya...@gmail.com> wrote:

> Two questions relating to that:
>
> 1) We currently hardcode parallel 40 in PigMix. Since Pig can now
> automatically select parallelism, would it be better to let it do so?
>
> 2) I noticed that L17 can be greatly optimized. Currently it does this:
>
> register pigperf.jar;
> %default PIGMIX_DIR /user/pig/tests/data/pigmix
> A = load '$PIGMIX_DIR/page_views'
>     using org.apache.pig.test.udf.storefunc.PigPerformanceLoader()
>     as (user, action, timespent, query_term,
>         ip_addr, timestamp, estimated_revenue, page_info, page_links);
> B = foreach A generate user, timestamp;
> C = group B by user;
> D = foreach C {
>     morning = filter B by timestamp < 43200;
>     afternoon = filter B by timestamp >= 43200;
>     generate group, COUNT(morning), COUNT(afternoon);
> }
> store D into 'L7out';
>
> It can be improved to use combiners:
>
> register pigperf.jar;
> %default PIGMIX_DIR /user/pig/tests/data/pigmix
> A = load '$PIGMIX_DIR/page_views'
>     using org.apache.pig.test.udf.storefunc.PigPerformanceLoader()
>     as (user, action, timespent, query_term,
>         ip_addr, timestamp, estimated_revenue, page_info, page_links);
> B = foreach A generate user, timestamp,
>       (timestamp < 43200 ? 1 : 0) as morning,
>       (timestamp >= 43200 ? 1 : 0) as afternoon;
> C = group B by user;
> D = foreach C {
>     generate group, SUM(B.morning), SUM(B.afternoon);
> }
> store D into 'L7out';
>
> Is L17 supposed to test something that precludes the use of combiners, or
> is improving the query fair game?
>
> D
>
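
For what it's worth, here is a minimal sketch (in Python rather than Pig Latin, so it is only an illustration of the idea, not of how Pig executes it) of why the rewritten query can use combiners. Counting rows that pass a filter is the same as summing 0/1 flags, and sums can be computed per input split and then added together. The sample rows, the two "splits", and the function names are all made up for the example, and null timestamps are ignored.

```python
from collections import defaultdict

NOON = 43200  # threshold used in the script above


def count_by_filter(rows):
    """Original formulation: collect all rows per user, then filter and count."""
    groups = defaultdict(list)
    for user, ts in rows:
        groups[user].append(ts)
    return {
        user: (sum(1 for t in ts_list if t < NOON),
               sum(1 for t in ts_list if t >= NOON))
        for user, ts_list in groups.items()
    }


def partial_sums(rows):
    """Map/combine side: turn each row into 0/1 flags and pre-sum them per user."""
    acc = defaultdict(lambda: [0, 0])
    for user, ts in rows:
        acc[user][0] += 1 if ts < NOON else 0
        acc[user][1] += 1 if ts >= NOON else 0
    return acc


def merge(partials):
    """Reduce side: add up the partial sums produced for each split."""
    total = defaultdict(lambda: [0, 0])
    for part in partials:
        for user, (morning, afternoon) in part.items():
            total[user][0] += morning
            total[user][1] += afternoon
    return {user: tuple(sums) for user, sums in total.items()}


# Hypothetical (user, timestamp) rows spread over two input splits.
split_1 = [("alice", 100), ("bob", 50000), ("alice", 43200)]
split_2 = [("alice", 43199), ("bob", 10), ("bob", 86399)]

by_filter = count_by_filter(split_1 + split_2)
by_sum = merge([partial_sums(split_1), partial_sums(split_2)])
```

Both by_filter and by_sum hold the same per-user (morning, afternoon) counts. The second path only ever ships two small integers per user per split to the reduce side, which is the saving the combiner version is after.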
