> On Sept. 29, 2014, 12:56 a.m., Matthew Hayes wrote:
> > datafu-pig/src/main/java/datafu/pig/util/SelectFieldByName.java, line 49
> > <https://reviews.apache.org/r/25564/diff/2/?file=707974#file707974line49>
> >
> > Hmm, something just occurred to me. This does not currently provide
> > the output schema. So this is one problem. But, how do we determine the
> > output schema? If the output value is decided dynamically, then it can
> > vary. One way to address this is to require that all the other values of
> > the tuple are of the same type. Then you just take the schema from the
> > first value. In your example they are all chararray. But this does limit
> > the uses of this UDF.
In practice, this is not an issue. The UDF is used as shown here, and you can
cast the result to whatever type you want.

    with_value_substitution = FOREACH with_group GENERATE
        FLATTEN(ChooseFieldByValue(groupField, *)) AS groupValue:chararray,
        *,
        (int)$period AS periodSeconds:int;

However, I don't see why I can't detect the schema of the selected field and
return that.
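
To make the "all candidate fields share one type" idea concrete, here is a
minimal sketch of the check. It is not the code in the patch: SchemaSketch and
commonCandidateType are hypothetical names, and plain strings stand in for Pig
field types so that the example runs without Pig on the classpath. In the real
UDF, I assume this logic would sit in EvalFunc's outputSchema(Schema input)
and read the types from the input schema.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical illustration only; strings stand in for Pig field types.
final class SchemaSketch {

    /**
     * fieldTypes.get(0) is the selector argument (e.g. groupField); the rest
     * are the candidate fields the UDF may return. Returns the shared type if
     * every candidate has the same one, otherwise empty.
     */
    static Optional<String> commonCandidateType(List<String> fieldTypes) {
        if (fieldTypes.size() < 2) {
            return Optional.empty(); // no candidate fields at all
        }
        String first = fieldTypes.get(1);
        for (int i = 2; i < fieldTypes.size(); i++) {
            if (!first.equals(fieldTypes.get(i))) {
                return Optional.empty(); // mixed types: no single output type
            }
        }
        return Optional.of(first);
    }

    public static void main(String[] args) {
        // All candidates are chararray, as in the example script.
        System.out.println(commonCandidateType(
            Arrays.asList("chararray", "chararray", "chararray")));
        // Mixed candidate types: the caller would have to cast, as today.
        System.out.println(commonCandidateType(
            Arrays.asList("chararray", "chararray", "int")));
    }
}
```

When the result is empty, the UDF could keep its current behaviour (no declared
output schema, caller casts), so the restriction would only apply when a typed
result is wanted.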
- Russell
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25564/#review54788
-----------------------------------------------------------
On Sept. 29, 2014, 12:20 a.m., Russell Jurney wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/25564/
> -----------------------------------------------------------
>
> (Updated Sept. 29, 2014, 12:20 a.m.)
>
>
> Review request for DataFu, Jonathan Coveney, Jakob Homan, Matthew Hayes, and
> Sam Shah.
>
>
> Repository: datafu
>
>
> Description
> -------
>
> Example use:
>
>     group_fields = LOAD '/e8/smalldata/group_fields.txt'
>         AS (groupField:chararray);
>     with_group = CROSS group_fields, hour_rounded;
>     with_group = FOREACH with_group GENERATE
>         group_fields::groupField AS groupField,
>         hour_rounded::sourceNameOrIp AS sourceNameOrIp,
>         hour_rounded::destinationNameOrIp AS destinationNameOrIp,
>         ...;
>     with_value_substitution = FOREACH with_group GENERATE
>         ChooseFieldByValue(groupField, *) AS groupValue:tuple(value:chararray),
>         *;
>     with_value_substitution = FOREACH with_value_substitution GENERATE
>         FLATTEN(groupValue) AS groupValue:chararray,
>         groupField,
>         foo,
>         bar,
>         ...;
>     all_success = FOREACH (GROUP with_value_substitution
>                            BY (groupField, groupValue, day)) GENERATE
>         FLATTEN(group) AS (seriesType, groupValue, day),
>         (int)COUNT_STAR(with_value_substitution) AS connections:int;
>
>
> Diffs
> -----
>
> datafu-pig/src/main/java/datafu/pig/util/SelectFieldByName.java
> PRE-CREATION
> datafu-pig/src/test/java/datafu/test/pig/util/SelectFieldByNameTest.java
> PRE-CREATION
>
> Diff: https://reviews.apache.org/r/25564/diff/
>
>
> Testing
> -------
>
> This UDF was used to replace a very inefficient Pig script in which macros
> that did many individual GROUP BYs took many minutes to plan.
>
> Testing: unit tests, plus runs on real data on a cluster.
>
>
> Thanks,
>
> Russell Jurney
>
>