Hi,

I have the following query, in which I want to generate (sld, count of
distinct domains) pairs.
The traffic data comes as (domain, subnet) records, and the sld is obtained
from a second file via a join on subnet.
I had trouble expressing this in a simple fashion, especially the distinct
domains part. Could you take a look at the script below and help me figure
out whether there is a way to simplify it?

Thanks,
Tamir

traffic = LOAD 'traffic.txt' AS (domain:chararray, subnet:long, w:int, e:int, o:int);
traffic1 = FOREACH traffic GENERATE domain, subnet;

traffic_by_subnet = GROUP traffic1 BY subnet;
traffic_by_subnet1 = FOREACH traffic_by_subnet GENERATE group AS subnet, traffic1.domain;

subnet_info = LOAD 'subnet_info.txt' AS (subnet:long, country:chararray, sld:chararray, org:chararray);
us_subnets = FILTER subnet_info BY country eq 'us';
us_subnets1 = FOREACH us_subnets GENERATE subnet, sld;

jr = JOIN traffic_by_subnet1 BY subnet, us_subnets1 by subnet;

r0 = FOREACH jr GENERATE sld, domain;
r1 = GROUP r0 BY sld;
r2 = FOREACH r1 GENERATE group as sld, flatten(r0.domain) as domain;
r3 = GROUP r2 BY domain;
r4 = FOREACH r3 GENERATE r2.sld, COUNT(group) as domains;

store r4 into 'sld-domains-count';
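
For comparison, here is a rough sketch of one possible simplification. It is
not tested against the real data. It assumes a Pig version that accepts == for
chararray comparison and a nested DISTINCT inside FOREACH. The aliases pairs,
joined, sld_domain, by_sld and result are made up for illustration. The idea
is to join first, keep only (sld, domain), group once by sld, and count the
distinct domains inside each group.

```pig
-- Sketch only: aliases below are hypothetical, file names and schemas are from the script above.
traffic = LOAD 'traffic.txt' AS (domain:chararray, subnet:long, w:int, e:int, o:int);
pairs = FOREACH traffic GENERATE domain, subnet;

subnet_info = LOAD 'subnet_info.txt' AS (subnet:long, country:chararray, sld:chararray, org:chararray);
-- Assumption: '==' works on chararray in the Pig version in use.
us_subnets = FILTER subnet_info BY country == 'us';
us_subnets1 = FOREACH us_subnets GENERATE subnet, sld;

-- Join the raw (domain, subnet) records directly, with no pre-grouping by subnet.
joined = JOIN pairs BY subnet, us_subnets1 BY subnet;
sld_domain = FOREACH joined GENERATE us_subnets1::sld AS sld, pairs::domain AS domain;

-- One group per sld, then count the distinct domains inside each group.
by_sld = GROUP sld_domain BY sld;
result = FOREACH by_sld {
    d = DISTINCT sld_domain.domain;
    GENERATE group AS sld, COUNT(d) AS domains;
};

STORE result INTO 'sld-domains-count';
```

If this works as intended, each output record would be one (sld, domains)
pair, where domains is the number of different domains seen for that sld.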
