> On Sept. 29, 2014, 12:56 a.m., Matthew Hayes wrote:
> > datafu-pig/src/main/java/datafu/pig/util/SelectFieldByName.java, line 49
> > <https://reviews.apache.org/r/25564/diff/2/?file=707974#file707974line49>
> >
> >     Hmm, something just occurred to me.  This does not currently provide 
> > the output schema.  So this is one problem.  But, how do we determine the 
> > output schema?  If the output value is decided dynamically, then it can 
> > vary.  One way to address this is to require that all the other values of 
> > the tuple are of the same type.  Then you just take the schema from the 
> > first value.  In your example they are all chararray.  But this does limit 
> > the uses of this UDF.
> 
> Russell Jurney wrote:
>     In practice, this is not an issue. The UDF is used this way, and you can 
> cast it to what you want.
>     
>     with_value_substitution = FOREACH with_group GENERATE 
>         FLATTEN(ChooseFieldByValue(groupField, *)) AS groupValue:chararray,
>         *, 
>         (int)$period AS periodSeconds:int;
>     
>     However, I don't see why I can't detect the schema of the field selected 
> and return that?
> 
> Matthew Hayes wrote:
>     The schema can't be dynamic like that.  I'll have to think about this 
> some more.  I don't like that we have to cast it like this.  One way we can 
> make this better is to have the UDF pick the schema that is the best fit for the 
> types provided.  For example, if all the fields are of the same type, like 
> chararray, then the resulting type is chararray.  Otherwise make the type 
> bytearray and you can cast however you want.  I'd like to hear what other 
> people think about this.  How about emailing datafu dev?
> 
> Russell Jurney wrote:
>     I will bring it up on the list, but I don't think returning a tuple is 
> weird at all. It is highly convenient, and 'just works.'

I'm not saying that returning a tuple is weird.  What is weird to me is not 
defining the schema of the tuple being returned by the UDF.
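
To make the best-fit idea from the quoted discussion concrete, here is a minimal sketch in plain Java. It is not the code under review and does not use Pig's classes: the class BestFitSchema, the enum FieldType, and the method bestFit are hypothetical names invented for illustration. In a real UDF this decision would presumably be made in EvalFunc.outputSchema(Schema input), based on the types of the candidate fields in the input schema.

```java
import java.util.Arrays;
import java.util.List;

public class BestFitSchema {
    // Hypothetical stand-in for Pig's type codes; a real UDF would work
    // with the field types found in the input Schema instead.
    enum FieldType { CHARARRAY, INTEGER, LONG, DOUBLE, BYTEARRAY }

    // If every candidate field has the same type, declare that type.
    // Otherwise fall back to BYTEARRAY so the caller can cast as needed.
    static FieldType bestFit(List<FieldType> candidates) {
        if (candidates.isEmpty()) {
            return FieldType.BYTEARRAY; // nothing to infer from
        }
        FieldType first = candidates.get(0);
        for (FieldType t : candidates) {
            if (t != first) {
                return FieldType.BYTEARRAY; // mixed types
            }
        }
        return first;
    }

    public static void main(String[] args) {
        // All candidates are chararray, as in the example script.
        System.out.println(bestFit(Arrays.asList(
                FieldType.CHARARRAY, FieldType.CHARARRAY)));
        // Mixed candidates: the declared type degrades to bytearray.
        System.out.println(bestFit(Arrays.asList(
                FieldType.CHARARRAY, FieldType.INTEGER)));
    }
}
```

With a rule like this the output schema is fixed at plan time from the input schema alone, so it never depends on which field a given row happens to select.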


- Matthew


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25564/#review54788
-----------------------------------------------------------


On Oct. 2, 2014, 4:19 p.m., Russell Jurney wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/25564/
> -----------------------------------------------------------
> 
> (Updated Oct. 2, 2014, 4:19 p.m.)
> 
> 
> Review request for DataFu, Jonathan Coveney, Jakob Homan, Matthew Hayes, and 
> Sam Shah.
> 
> 
> Repository: datafu
> 
> 
> Description
> -------
> 
> Example use:
> group_fields = LOAD '/e8/smalldata/group_fields.txt' AS 
> (groupField:chararray); 
> with_group = CROSS group_fields, hour_rounded;
> with_group = FOREACH with_group GENERATE group_fields::groupField AS 
> groupField, 
> hour_rounded::sourceNameOrIp AS sourceNameOrIp,
> hour_rounded::destinationNameOrIp AS destinationNameOrIp,
> ...;
> with_value_substitution = FOREACH with_group GENERATE 
> ChooseFieldByValue(groupField, *) AS groupValue:tuple(value:chararray), *;
> with_value_substitution = FOREACH with_value_substitution GENERATE 
> FLATTEN(groupValue) AS groupValue:chararray,
> groupField,
> foo,
> bar,
> ...;
> all_success = FOREACH (GROUP with_value_substitution BY (groupField, 
> groupValue, day)) GENERATE
> FLATTEN(group) AS (seriesType, groupValue, day),
> (int)COUNT_STAR(with_value_substitution) AS connections:int;
> 
> 
> Diffs
> -----
> 
>   datafu-pig/src/main/java/datafu/pig/util/SelectFieldByName.java 
> PRE-CREATION 
>   datafu-pig/src/test/java/datafu/test/pig/util/SelectFieldByNameTest.java 
> PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/25564/diff/
> 
> 
> Testing
> -------
> 
> This UDF was used to replace a very inefficient Pig script in which macros 
> that performed many individual GROUP BYs took many minutes to plan.
> 
> Testing: unit tests and used on real data on a cluster.
> 
> 
> Thanks,
> 
> Russell Jurney
> 
>
