-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25564/
-----------------------------------------------------------

(Updated Sept. 29, 2014, 12:20 a.m.)


Review request for DataFu, Jonathan Coveney, Jakob Homan, Matthew Hayes, and Sam Shah.


Changes
-------

Updated to new patch.


Repository: datafu


Description
-------

Example use:
group_fields = LOAD '/e8/smalldata/group_fields.txt' AS (groupField:chararray);
with_group = CROSS group_fields, hour_rounded;
with_group = FOREACH with_group GENERATE
    group_fields::groupField AS groupField,
    hour_rounded::sourceNameOrIp AS sourceNameOrIp,
    hour_rounded::destinationNameOrIp AS destinationNameOrIp,
    ...;
with_value_substitution = FOREACH with_group GENERATE
    ChooseFieldByValue(groupField, *) AS groupValue:tuple(value:chararray), *;
with_value_substitution = FOREACH with_value_substitution GENERATE
    FLATTEN(groupValue) AS groupValue:chararray,
    groupField,
    foo,
    bar,
    ...;
all_success = FOREACH (GROUP with_value_substitution BY (groupField, groupValue, day)) GENERATE
    FLATTEN(group) AS (seriesType, groupValue, day),
    (int)COUNT_STAR(with_value_substitution) AS connections:int;
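
The lookup that the example relies on can be sketched in plain Java as follows. This is only an illustration of the assumed semantics: the first argument names a field, the remaining fields form the record, and the value of the named field is returned. It does not use the Pig API and is not the code in the patch. The class name FieldByNameSketch, the Map-based row, and the sample values are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, Pig-free sketch of a "select field by name" lookup.
// A real Pig UDF would work on a Tuple and its schema instead of a Map.
final class FieldByNameSketch {

    // Returns the value stored under fieldName in the row,
    // or null when the name or the row is missing.
    static Object selectFieldByName(String fieldName, Map<String, Object> row) {
        if (fieldName == null || row == null) {
            return null;
        }
        return row.get(fieldName);
    }

    public static void main(String[] args) {
        // One record, keyed by field name (sample values are made up).
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("sourceNameOrIp", "10.0.0.1");
        row.put("destinationNameOrIp", "example.org");

        // Each groupField value plays the role of one row of group_fields
        // after the CROSS in the script above.
        for (String groupField : new String[] {"sourceNameOrIp", "destinationNameOrIp"}) {
            System.out.println(groupField + " -> " + selectFieldByName(groupField, row));
        }
    }
}
```

Under this reading, crossing the data with the list of field names and selecting the named value once per row lets a single GROUP BY cover every grouping field.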


Diffs (updated)
-----

  datafu-pig/src/main/java/datafu/pig/util/SelectFieldByName.java PRE-CREATION 
  datafu-pig/src/test/java/datafu/test/pig/util/SelectFieldByNameTest.java PRE-CREATION 

Diff: https://reviews.apache.org/r/25564/diff/


Testing
-------

This UDF replaced a very inefficient Pig script in which macros that ran many 
individual GROUP BYs took many minutes to plan.

Testing: covered by unit tests and run against real data on a cluster.


Thanks,

Russell Jurney
