[ https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13405724#comment-13405724 ]

Zhijie Shen commented on PIG-1314:
----------------------------------

There are some issues with loading and storing Pig data. When a DateTime object 
is stored with "Utf8StorageConverter" without first using a UDF to convert it to 
a string, should we serialize it as a millis+timezone composite, or output an 
ISO 8601 datetime string (e.g., 2012-07-03T08:14:19.962+01:00)? The latter would 
behave the same as calling "String ToString(DateTime d)" before storing the 
string. Personally, I prefer the latter, because the data is then directly 
readable from the stored files.
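To make the two serialization options concrete, here is a minimal sketch. It uses `java.time.OffsetDateTime` as a stand-in for the Joda-Time `DateTime` under discussion and does not model `Utf8StorageConverter` itself; the `|` separator in the composite form and the method names are hypothetical, chosen only for illustration.

```java
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class DateTimeStoreSketch {
    // Option 1 (hypothetical layout): epoch millis plus the offset ID, joined by '|'.
    static String toMillisComposite(OffsetDateTime dt) {
        return dt.toInstant().toEpochMilli() + "|" + dt.getOffset().getId();
    }

    // Option 2: ISO 8601 string that keeps the offset and is human readable.
    static String toIsoString(OffsetDateTime dt) {
        return dt.format(DateTimeFormatter.ISO_OFFSET_DATE_TIME);
    }

    public static void main(String[] args) {
        // The example value from the comment: 2012-07-03T08:14:19.962+01:00
        OffsetDateTime dt = OffsetDateTime.of(
                2012, 7, 3, 8, 14, 19, 962_000_000, ZoneOffset.ofHours(1));
        System.out.println(toMillisComposite(dt)); // 1341299659962|+01:00
        System.out.println(toIsoString(dt));       // 2012-07-03T08:14:19.962+01:00
    }
}
```

Both forms carry the same information, but only the second can be read directly from the stored file.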

On the other hand, if a DateTime object is stored in the file as a datetime 
string, when we load it again as a DateTime object, should we use the default 
timezone or the one specified in the string (e.g., +01:00 in the last example)? 
Again, I prefer the second choice. A Pig script may perform a series of 
store/load operations to achieve some goal, so the timezone information needs 
to be preserved. For example, assume +08:00 is the default timezone. A DateTime 
object whose own timezone is -04:00 is stored as a string, which will have 
-04:00 as its suffix. When the string is loaded back as a DateTime object for 
further processing, we should keep the previously used timezone, -04:00, 
rather than the default one.
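The difference between the two loading choices can be sketched as follows, again with `java.time.OffsetDateTime` standing in for Joda-Time's `DateTime`. `DEFAULT_OFFSET` and both method names are hypothetical; the +08:00 default and the -04:00 stored offset come from the example above.

```java
import java.time.OffsetDateTime;
import java.time.ZoneOffset;

public class DateTimeLoadSketch {
    // Hypothetical session default timezone, as in the example (+08:00).
    static final ZoneOffset DEFAULT_OFFSET = ZoneOffset.ofHours(8);

    // Preferred choice: keep the offset carried by the stored string.
    static OffsetDateTime loadKeepingOffset(String s) {
        return OffsetDateTime.parse(s);
    }

    // Alternative: same instant, but re-expressed in the default timezone.
    static OffsetDateTime loadWithDefault(String s) {
        return OffsetDateTime.parse(s).withOffsetSameInstant(DEFAULT_OFFSET);
    }

    public static void main(String[] args) {
        String stored = "2012-07-03T08:14:19.962-04:00";
        OffsetDateTime kept = loadKeepingOffset(stored);
        OffsetDateTime converted = loadWithDefault(stored);
        System.out.println(kept);                    // 2012-07-03T08:14:19.962-04:00
        System.out.println(converted);               // 2012-07-03T20:14:19.962+08:00
        System.out.println(kept.isEqual(converted)); // true: same instant
        System.out.println(kept.equals(converted));  // false: offsets differ
    }
}
```

Both results denote the same instant. Only the first reproduces the stored string exactly, so it is the only one that survives repeated store/load steps without drifting to the default timezone.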

What do you think about this? Thanks!

                
> Add DateTime Support to Pig
> ---------------------------
>
>                 Key: PIG-1314
>                 URL: https://issues.apache.org/jira/browse/PIG-1314
>             Project: Pig
>          Issue Type: Bug
>          Components: data
>    Affects Versions: 0.7.0
>            Reporter: Russell Jurney
>            Assignee: Zhijie Shen
>              Labels: gsoc2012
>         Attachments: PIG-1314-1.patch, PIG-1314-2.patch, joda_vs_builtin.zip
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> Hadoop/Pig are primarily used to parse log data, and most logs have a 
> timestamp component.  Therefore Pig should support dates as a primitive.
> Can someone familiar with adding types to pig comment on how hard this is?  
> We're looking at doing this, rather than using UDFs.  Is this a patch that 
> would be accepted?
> This is a candidate project for Google Summer of Code 2012. More information 
> about the program can be found at 
> https://cwiki.apache.org/confluence/display/PIG/GSoc2012

