[ https://issues.apache.org/jira/browse/DATAFU-23?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883404#comment-13883404 ]

Matthew Hayes commented on DATAFU-23:
-------------------------------------

It's not clear to me that PadZero is a generally useful function. Perhaps for
this particular use case a more generally useful UDF would be one that can
round to a particular date boundary: for example, round to the nearest hour,
day, week, month, etc. This rounded value could then be used as the grouping
key. That is much simpler, since it avoids dealing with the individual year,
month, day, etc. parameters. What do you think about this?
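A minimal sketch of the date-boundary idea, assuming Java 8's `java.time` API and UTC timestamps in ISO 8601 form. `DateBoundary`, `Unit`, and `truncate` are hypothetical names, not an existing DataFu UDF, and the logic is plain Java rather than a Pig `EvalFunc`. For grouping, "rounding" is taken here to mean rounding *down* (truncating) to the start of the hour, day, week (assumed to start on Monday), or month. Because the result is formatted with a fixed-width pattern, the key comes out zero-padded without a separate padding step.

```java
import java.time.DayOfWeek;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;
import java.time.temporal.TemporalAdjusters;

// Hypothetical helper (not part of DataFu): truncates an ISO 8601 timestamp
// down to the start of the enclosing hour, day, week, or month, in UTC.
final class DateBoundary {

    enum Unit { HOUR, DAY, WEEK, MONTH }

    // Fixed-width pattern, so month/day/hour are always two digits.
    private static final DateTimeFormatter ISO_KEY =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'");

    static String truncate(String isoTimestamp, Unit unit) {
        // Normalize to UTC so every record lands on the same boundaries.
        ZonedDateTime t = ZonedDateTime.parse(isoTimestamp)
                .withZoneSameInstant(ZoneOffset.UTC);
        ZonedDateTime floored;
        switch (unit) {
            case HOUR:
                floored = t.truncatedTo(ChronoUnit.HOURS);
                break;
            case DAY:
                floored = t.truncatedTo(ChronoUnit.DAYS);
                break;
            case WEEK:
                // Assumption: weeks start on Monday.
                floored = t.truncatedTo(ChronoUnit.DAYS)
                        .with(TemporalAdjusters.previousOrSame(DayOfWeek.MONDAY));
                break;
            case MONTH:
                floored = t.truncatedTo(ChronoUnit.DAYS).withDayOfMonth(1);
                break;
            default:
                throw new IllegalArgumentException("unsupported unit: " + unit);
        }
        return ISO_KEY.format(floored);
    }
}
```

Grouping on a single value such as `truncate(ts, Unit.HOUR)` would replace the four `GetYear`/`GetMonth`/`GetDay`/`GetHour` keys in the `GROUP BY`, and the group key could be used as the HBase row key directly instead of being reassembled. A real Pig UDF would still need an `EvalFunc` wrapper and a conversion from Pig's `datetime` type.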

> Create datafu.pig.util.PadZero to pad integers < 10 with 0s
> -----------------------------------------------------------
>
>                 Key: DATAFU-23
>                 URL: https://issues.apache.org/jira/browse/DATAFU-23
>             Project: DataFu
>          Issue Type: Improvement
>            Reporter: Russell Jurney
>         Attachments: DATAFU-23.patch
>
>
> /* Now group by time down to the hour, our time series granularity */
> grouped_by_time = GROUP bytes_in_out BY (GetYear(date_time), GetMonth(date_time),
>                                          GetDay(date_time), GetHour(date_time));
> bytes_per_hour = FOREACH grouped_by_time GENERATE
>                      FLATTEN(group) AS (year, month, day, hour),
>                      SUM(bytes_in_out.sc_bytes) AS total_sc_bytes,
>                      SUM(bytes_in_out.cs_bytes) AS total_cs_bytes;
> /* Now convert time elements back into a key for HBase */
> bytes_per_hour = FOREACH bytes_per_hour GENERATE
>                      ToDate(StringConcat(year, '-', month, '-', day, 'T', hour, ':00:00.000Z')) AS date_time,
>                      total_sc_bytes,
>                      total_cs_bytes;
>
> The previous code will generate malformed ISO 8601 dates that lack zero
> padding, like this: "2005-1-1T1:00:00.000Z"
> Therefore a PadZero utility is needed to regenerate ISO 8601 keys after
> grouping by date pieces.
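For comparison, the zero padding the issue above asks for is small. A sketch, assuming plain Java: `PadZeroSketch`, `pad`, and `hourKey` are hypothetical names, not the contents of the attached DATAFU-23.patch, and this is not a Pig `EvalFunc`.

```java
import java.util.Locale;

// Hypothetical sketch of the requested PadZero behaviour (not the attached
// DATAFU-23.patch), written as plain Java rather than a Pig EvalFunc.
final class PadZeroSketch {

    // Left-pads a non-negative integer to at least two digits: 1 -> "01", 12 -> "12".
    static String pad(int value) {
        if (value < 0) {
            throw new IllegalArgumentException("expected a non-negative value: " + value);
        }
        // Locale.ROOT keeps the digits ASCII regardless of the JVM's default locale.
        return String.format(Locale.ROOT, "%02d", value);
    }

    // Rebuilds the hourly ISO 8601 key from the grouped (year, month, day, hour) fields.
    static String hourKey(int year, int month, int day, int hour) {
        return String.format(Locale.ROOT, "%04d-%s-%sT%s:00:00.000Z",
                year, pad(month), pad(day), pad(hour));
    }
}
```

Even with padding, the key still has to be reassembled from four grouped fields after the `GROUP BY`, which is the bookkeeping a single date-boundary key would avoid.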



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
