-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25564/
-----------------------------------------------------------
(Updated Sept. 29, 2014, 12:20 a.m.)


Review request for DataFu, Jonathan Coveney, Jakob Homan, Matthew Hayes, and Sam Shah.


Changes
-------

Updated to new patch.


Repository: datafu


Description
-------

Example use:

DEFINE SelectFieldByName datafu.pig.util.SelectFieldByName();

group_fields = LOAD '/e8/smalldata/group_fields.txt' AS (groupField:chararray);

with_group = CROSS group_fields, hour_rounded;

with_group = FOREACH with_group GENERATE group_fields::groupField AS groupField,
                                         hour_rounded::sourceNameOrIp AS sourceNameOrIp,
                                         hour_rounded::destinationNameOrIp AS destinationNameOrIp,
                                         ...;

with_value_substitution = FOREACH with_group GENERATE SelectFieldByName(groupField, *) AS groupValue:tuple(value:chararray), *;

with_value_substitution = FOREACH with_value_substitution GENERATE FLATTEN(groupValue) AS groupValue:chararray, groupField, foo, bar, ...;

all_success = FOREACH (GROUP with_value_substitution BY (groupField, groupValue, day))
              GENERATE FLATTEN(group) AS (seriesType, groupValue, day),
                       (int)COUNT_STAR(with_value_substitution) AS connections:int;


Diffs (updated)
-----

  datafu-pig/src/main/java/datafu/pig/util/SelectFieldByName.java PRE-CREATION
  datafu-pig/src/test/java/datafu/test/pig/util/SelectFieldByNameTest.java PRE-CREATION

Diff: https://reviews.apache.org/r/25564/diff/


Testing
-------

This UDF replaced a very inefficient Pig script in which macros performing many individual GROUP BYs took many minutes to plan. It was tested with unit tests and on real data on a cluster.


Thanks,

Russell Jurney
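The Example use in the Description comes down to three steps: pair every row with every field name to group by (the CROSS), look up the value of the named field in each row (the UDF call), and count once per (groupField, groupValue, day). Below is a minimal plain-Java sketch of that pattern, offered only as an illustration: it does not use the Pig API and is not the DataFu implementation. The class and method names (SelectFieldByNameSketch, selectFieldByName, countByGroup), the made-up sample data, the hard-coded "day" field, and returning null for an unknown field name are all assumptions.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical plain-Java sketch of the pattern in the Pig example. It is NOT
// the DataFu SelectFieldByName UDF and does not use the Pig API.
public class SelectFieldByNameSketch {

    // Returns the value of the field named fieldName in a record described by
    // parallel lists of field names and values. Assumption: an unknown field
    // name yields null.
    static Object selectFieldByName(String fieldName, List<String> names, List<?> values) {
        int index = names.indexOf(fieldName);
        return index < 0 ? null : values.get(index);
    }

    // Mirrors CROSS group_fields with the data, the per-row field selection,
    // and one GROUP BY (groupField, groupValue, day) with COUNT_STAR, so a
    // single grouping replaces one GROUP BY per field.
    // Assumption: every record has a field literally named "day".
    static Map<List<Object>, Integer> countByGroup(
            List<String> groupFields, List<String> names, List<? extends List<?>> rows) {
        Map<List<Object>, Integer> counts = new HashMap<>();
        for (String groupField : groupFields) {      // CROSS: every field name with every row
            for (List<?> row : rows) {
                Object groupValue = selectFieldByName(groupField, names, row);
                Object day = selectFieldByName("day", names, row);
                List<Object> key = Arrays.asList(groupField, groupValue, day);
                counts.merge(key, 1, Integer::sum);  // COUNT_STAR per group key
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample data; field names follow the Pig example.
        List<String> names = Arrays.asList("sourceNameOrIp", "destinationNameOrIp", "day");
        List<List<String>> rows = Arrays.asList(
                Arrays.asList("10.0.0.1", "db01", "2014-09-28"),
                Arrays.asList("10.0.0.1", "web01", "2014-09-28"),
                Arrays.asList("10.0.0.2", "db01", "2014-09-28"));
        List<String> groupFields = Arrays.asList("sourceNameOrIp", "destinationNameOrIp");

        Map<List<Object>, Integer> counts = countByGroup(groupFields, names, rows);
        System.out.println(counts.get(Arrays.asList("sourceNameOrIp", "10.0.0.1", "2014-09-28")));    // prints 2
        System.out.println(counts.get(Arrays.asList("destinationNameOrIp", "db01", "2014-09-28"))); // prints 2
        System.out.println(counts.size()); // prints 4
    }
}
```

In this sketch the field names to group by are data rather than code, so grouping by another field means adding a name to the group field list instead of writing another GROUP BY.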