-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25564/#review54778
-----------------------------------------------------------


datafu-pig/src/main/java/datafu/pig/util/ChooseFieldByValue.java
<https://reviews.apache.org/r/25564/#comment95054>

    Something like this seems more accurate and concise:

    Selects the value for a field within a tuple using that field's name.


datafu-pig/src/main/java/datafu/pig/util/ChooseFieldByValue.java
<https://reviews.apache.org/r/25564/#comment95055>

    I'm not sure if I like the name ChooseFieldByValue. What about SelectFieldByName?


datafu-pig/src/main/java/datafu/pig/util/ChooseFieldByValue.java
<https://reviews.apache.org/r/25564/#comment95052>

    Remove this comment.


datafu-pig/src/main/java/datafu/pig/util/ChooseFieldByValue.java
<https://reviews.apache.org/r/25564/#comment95053>

    Include a message in the exception. Also, something like IllegalArgumentException is probably more appropriate.


datafu-pig/src/main/java/datafu/pig/util/ChooseFieldByValue.java
<https://reviews.apache.org/r/25564/#comment95056>

    Should start at i=1, since it doesn't make sense for the field to select itself.


Sorry it took a while for me to take a look at this.

- Matthew Hayes


On Sept. 15, 2014, 6:58 p.m., Russell Jurney wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/25564/
> -----------------------------------------------------------
>
> (Updated Sept. 15, 2014, 6:58 p.m.)
>
>
> Review request for DataFu, Jonathan Coveney, Jakob Homan, Matthew Hayes, and Sam Shah.
>
>
> Repository: datafu
>
>
> Description
> -------
>
> Example use:
> group_fields = LOAD '/e8/smalldata/group_fields.txt' AS (groupField:chararray);
> with_group = CROSS group_fields, hour_rounded;
> with_group = FOREACH with_group GENERATE group_fields::groupField AS groupField,
>     hour_rounded::sourceNameOrIp AS sourceNameOrIp,
>     hour_rounded::destinationNameOrIp AS destinationNameOrIp,
>     ...;
> with_value_substitution = FOREACH with_group GENERATE
>     ChooseFieldByValue(groupField, *) AS groupValue:tuple(value:chararray), *;
> with_value_substitution = FOREACH with_value_substitution GENERATE
>     FLATTEN(groupValue) AS groupValue:chararray,
>     groupField,
>     foo,
>     bar,
>     ...;
> all_success = FOREACH (GROUP with_value_substitution BY (groupField, groupValue, day)) GENERATE
>     FLATTEN(group) AS (seriesType, groupValue, day),
>     (int)COUNT_STAR(with_value_substitution) AS connections:int;
>
>
> Diffs
> -----
>
>   datafu-pig/src/main/java/datafu/pig/util/ChooseFieldByValue.java PRE-CREATION
>   datafu-pig/src/test/java/datafu/test/pig/util/ChooseFieldByValueTest.java PRE-CREATION
>
> Diff: https://reviews.apache.org/r/25564/diff/
>
>
> Testing
> -------
>
> This UDF was used to replace a very inefficient pig script where macros that did many individual GROUP BY's took many minutes to plan.
>
> Testing: unit tests and used on real data on a cluster.
>
>
> Thanks,
>
> Russell Jurney
>
>
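
For illustration only, the review comments above (include a message in the exception, prefer IllegalArgumentException, start the scan at i=1) could be sketched roughly as follows. This is not the code under review, which is not shown in this thread. It models the tuple as two plain Java lists instead of Pig's Tuple and Schema types, and the class name SelectFieldByNameSketch, the method select, and the sample values are all hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: plain lists stand in for a Pig tuple and its schema.
public class SelectFieldByNameSketch {

    /**
     * names.get(i) is the alias of values.get(i).
     * values.get(0) holds the name of the field whose value should be returned.
     */
    public static Object select(List<String> names, List<Object> values) {
        if (names.size() != values.size()) {
            throw new IllegalArgumentException(
                "Expected one name per value, got " + names.size()
                + " names and " + values.size() + " values");
        }
        if (values.isEmpty() || values.get(0) == null) {
            throw new IllegalArgumentException(
                "First field must hold the name of the field to select");
        }
        String wanted = String.valueOf(values.get(0));
        // Start at i = 1: position 0 is the selector itself, so it is skipped.
        for (int i = 1; i < names.size(); i++) {
            if (wanted.equals(names.get(i))) {
                return values.get(i);
            }
        }
        throw new IllegalArgumentException(
            "No field named '" + wanted + "' among " + names.subList(1, names.size()));
    }

    public static void main(String[] args) {
        List<String> names =
            Arrays.asList("groupField", "sourceNameOrIp", "destinationNameOrIp");
        List<Object> values =
            Arrays.<Object>asList("destinationNameOrIp", "10.0.0.1", "10.0.0.2");
        System.out.println(select(names, values)); // prints 10.0.0.2
    }
}
```

Because the loop begins at index 1, a row whose selector names its own column (for example, groupField = "groupField") fails with a descriptive IllegalArgumentException instead of returning the selector.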
