Russell Jurney created DATAFU-23:
------------------------------------
Summary: Create datafu.pig.util.PadZero to pad integers < 10 with 0s
Key: DATAFU-23
URL: https://issues.apache.org/jira/browse/DATAFU-23
Project: DataFu
Issue Type: Improvement
Reporter: Russell Jurney
/* Now group by time down to the hour, our time series granularity */
grouped_by_time = GROUP bytes_in_out BY (GetYear(date_time), GetMonth(date_time),
                                         GetDay(date_time), GetHour(date_time));
bytes_per_hour = FOREACH grouped_by_time GENERATE
    FLATTEN(group) AS (year, month, day, hour),
    SUM(bytes_in_out.sc_bytes) AS total_sc_bytes,
    SUM(bytes_in_out.cs_bytes) AS total_cs_bytes;

/* Now convert time elements back into a key for HBase */
bytes_per_hour = FOREACH bytes_per_hour GENERATE
    ToDate(StringConcat(year, '-', month, '-', day, 'T', hour, ':00:00.000Z')) AS date_time,
    total_sc_bytes,
    total_cs_bytes;
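The code above generates malformed ISO 8601 dates because single-digit month,
day, and hour values are not zero-padded. It produces, for example,
"2005-1-1T1:00:00.000Z" instead of "2005-01-01T01:00:00.000Z".
A PadZero utility is therefore needed to regenerate ISO 8601 keys after
grouping by date parts.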
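
To make the request concrete, here is a minimal Java sketch of the padding
logic such a utility might wrap. It is not an existing DataFu class: the names
PadZeroSketch, padZero, and isoHourKey are hypothetical. It is written as plain
static methods rather than as a Pig EvalFunc so that it runs without Pig on the
classpath. It assumes non-negative inputs.

```java
import java.util.Locale;

/** Hypothetical sketch of zero-padding logic; not part of DataFu. */
final class PadZeroSketch {
    private PadZeroSketch() {}

    /** Left-pads a non-negative integer with zeros to at least width digits. */
    static String padZero(int value, int width) {
        if (value < 0 || width < 1) {
            throw new IllegalArgumentException("value must be >= 0 and width >= 1");
        }
        // Locale.ROOT keeps the output in ASCII digits regardless of the default locale.
        return String.format(Locale.ROOT, "%0" + width + "d", value);
    }

    /** Rebuilds an hour-granularity ISO 8601 key from grouped date parts. */
    static String isoHourKey(int year, int month, int day, int hour) {
        return padZero(year, 4) + "-" + padZero(month, 2) + "-" + padZero(day, 2)
                + "T" + padZero(hour, 2) + ":00:00.000Z";
    }

    public static void main(String[] args) {
        // Same date parts as the malformed example in this issue.
        System.out.println(isoHourKey(2005, 1, 1, 1));
    }
}
```

In the Pig script, a UDF built on this idea would be applied to month, day, and
hour before they are passed to StringConcat. Values that are already two digits
wide, such as hour 13, pass through unchanged.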