[ https://issues.apache.org/jira/browse/DATAFU-23?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883692#comment-13883692 ]

Sam Shah commented on DATAFU-23:
--------------------------------

Agree with others -- this is a workaround for something that should be fixed in 
another way.

> Create datafu.pig.util.PadZero to pad integers < 10 with 0s
> -----------------------------------------------------------
>
>                 Key: DATAFU-23
>                 URL: https://issues.apache.org/jira/browse/DATAFU-23
>             Project: DataFu
>          Issue Type: Improvement
>            Reporter: Russell Jurney
>         Attachments: DATAFU-23.patch
>
>
> /* Now group by time down to the hour, our time series granularity */
> grouped_by_time = GROUP bytes_in_out BY (GetYear(date_time), GetMonth(date_time), GetDay(date_time), GetHour(date_time));
> bytes_per_hour = FOREACH grouped_by_time GENERATE FLATTEN(group) AS (year, month, day, hour),
>                                                   SUM(bytes_in_out.sc_bytes) AS total_sc_bytes,
>                                                   SUM(bytes_in_out.cs_bytes) AS total_cs_bytes;
> /* Now convert time elements back into a key for HBase */
> bytes_per_hour = FOREACH bytes_per_hour GENERATE ToDate(StringConcat(year, '-', month, '-', day, 'T', hour, ':00:00.000Z')) AS date_time,
>                                                  total_sc_bytes,
>                                                  total_cs_bytes;
> The previous code generates malformed ISO8601 dates such as
> "2005-1-1:1:00:00.000Z", because single-digit months, days, and hours are not
> zero-padded. Therefore a PadZero utility is needed to regenerate valid ISO8601
> keys after grouping by the date components.
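For illustration, a minimal sketch of what such a datafu.pig.util.PadZero EvalFunc
might look like is below; this is an assumption for context only, and not necessarily
how the attached DATAFU-23.patch implements it.

    package datafu.pig.util;

    import java.io.IOException;

    import org.apache.pig.EvalFunc;
    import org.apache.pig.data.Tuple;

    /**
     * Illustrative sketch: formats an integer as a string, left-padding
     * values less than 10 with a zero (e.g. 1 becomes "01").
     */
    public class PadZero extends EvalFunc<String>
    {
      @Override
      public String exec(Tuple input) throws IOException
      {
        if (input == null || input.size() == 0 || input.get(0) == null) {
          return null;
        }
        int value = ((Number) input.get(0)).intValue();
        // %02d left-pads single-digit values with a zero
        return String.format("%02d", value);
      }
    }

With such a UDF, the key in the script above could then be rebuilt as, for example,
StringConcat(year, '-', PadZero(month), '-', PadZero(day), 'T', PadZero(hour), ':00:00.000Z').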



