[ https://issues.apache.org/jira/browse/DATAFU-23?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883692#comment-13883692 ]
Sam Shah commented on DATAFU-23:
--------------------------------

Agree with others -- this is a workaround for something that should be fixed in another way.

> Create datafu.pig.util.PadZero to pad integers < 10 with 0s
> -----------------------------------------------------------
>
>                 Key: DATAFU-23
>                 URL: https://issues.apache.org/jira/browse/DATAFU-23
>             Project: DataFu
>          Issue Type: Improvement
>            Reporter: Russell Jurney
>         Attachments: DATAFU-23.patch
>
>
> /* Now group by time down to the hour, our time series granularity */
> grouped_by_time = GROUP bytes_in_out BY (GetYear(date_time), GetMonth(date_time), GetDay(date_time), GetHour(date_time));
> bytes_per_hour = FOREACH grouped_by_time GENERATE FLATTEN(group) AS (year, month, day, hour),
>                  SUM(bytes_in_out.sc_bytes) AS total_sc_bytes,
>                  SUM(bytes_in_out.cs_bytes) AS total_cs_bytes;
> /* Now convert time elements back into a key for HBase */
> bytes_per_hour = FOREACH bytes_per_hour GENERATE ToDate(StringConcat(year, '-', month, '-', day, 'T', hour, ':00:00.000Z')) AS date_time,
>                  total_sc_bytes,
>                  total_cs_bytes;
> The previous code will erroneously generate bad ISO8601 dates, looking like this: "2005-1-1:1:00:00.000Z"
> Therefore a PadZero utility is needed to regenerate ISO8601 keys after grouping by date pieces.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
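
For illustration only, the zero-padding that the issue description asks for can be sketched in plain Java. This is not the attached DATAFU-23.patch and not an existing DataFu API: the class name PadZeroSketch and the methods padZero and isoHourKey are hypothetical, and a real Pig UDF would additionally have to extend Pig's EvalFunc. The sketch assumes non-negative inputs and hour granularity, as in the quoted script.

```java
import java.util.Locale;

// Hypothetical sketch, not the DATAFU-23 patch and not part of DataFu.
final class PadZeroSketch {

    // Left-pad a non-negative integer with zeros up to the given width.
    static String padZero(int value, int width) {
        if (value < 0 || width < 1) {
            throw new IllegalArgumentException("value must be >= 0 and width >= 1");
        }
        // Locale.ROOT keeps the digits ASCII regardless of the default locale.
        return String.format(Locale.ROOT, "%0" + width + "d", value);
    }

    // Rebuild an hour-granularity ISO 8601 key from its date parts.
    static String isoHourKey(int year, int month, int day, int hour) {
        return padZero(year, 4) + "-" + padZero(month, 2) + "-" + padZero(day, 2)
                + "T" + padZero(hour, 2) + ":00:00.000Z";
    }

    public static void main(String[] args) {
        System.out.println(padZero(1, 2));
        System.out.println(isoHourKey(2005, 1, 1, 1));
    }
}
```

With the parts from the example in the description (2005, 1, 1, 1), isoHourKey returns "2005-01-01T01:00:00.000Z" rather than an unpadded string. As the comment above notes, padding is a workaround; avoiding the split into date parts and back would make it unnecessary.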