-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25564/
-----------------------------------------------------------

(Updated Oct. 28, 2014, 7:28 p.m.)


Review request for DataFu, Jonathan Coveney, Jakob Homan, Matthew Hayes, and Sam Shah.


Changes
-------

Updated patch with new name, SelectStringFieldByName


Repository: datafu


Description
-------

Example use:

  group_fields = LOAD '/e8/smalldata/group_fields.txt' AS (groupField:chararray);

  with_group = CROSS group_fields, hour_rounded;
  with_group = FOREACH with_group GENERATE
      group_fields::groupField AS groupField,
      hour_rounded::sourceNameOrIp AS sourceNameOrIp,
      hour_rounded::destinationNameOrIp AS destinationNameOrIp,
      ...;

  with_value_substitution = FOREACH with_group GENERATE
      ChooseFieldByValue(groupField, *) AS groupValue:tuple(value:chararray),
      *;
  with_value_substitution = FOREACH with_value_substitution GENERATE
      FLATTEN(groupValue) AS groupValue:chararray,
      groupField,
      foo,
      bar,
      ...;

  all_success = FOREACH (GROUP with_value_substitution BY (groupField, groupValue, day)) GENERATE
      FLATTEN(group) AS (seriesType, groupValue, day),
      (int)COUNT_STAR(with_value_substitution) AS connections:int;


Diffs (updated)
-----

  datafu-pig/src/main/java/datafu/pig/util/SelectStringFieldByName.java PRE-CREATION
  datafu-pig/src/test/java/datafu/test/pig/util/SelectStringFieldByNameTest.java PRE-CREATION

Diff: https://reviews.apache.org/r/25564/diff/


Testing
-------

This UDF was used to replace a very inefficient Pig script in which macros that performed many individual GROUP BYs took many minutes to plan.

Testing: ran the unit tests and used the UDF on real data on a cluster.


Thanks,

Russell Jurney