[ https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13401613#comment-13401613 ]

Thejas M Nair commented on PIG-1314:
------------------------------------

bq. As far as I know, both the Java built-in Date and Joda DateTime store a 
millisecond offset (in a long integer variable) from midnight UTC on 1 January 
1970, which is not exactly Unix time (Unix time counts seconds). 
Yes, as you noted, the difference is that a 64-bit Unix timestamp can cover up 
to +/- 292 billion years, while Joda DateTime supports only +/- 292 million 
years. That should be sufficient for most practical purposes! :)
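The two ranges follow directly from dividing the largest signed 64-bit value by the length of a year, counted once in seconds and once in milliseconds. The sketch below assumes an average Gregorian year of 365.2425 days; it only bounds the raw long and ignores any further year limits a date library's calendar implementation may impose.

```java
final class TimestampRange {
    // Assumption: average Gregorian year of 365.2425 days, expressed in seconds.
    static final double SECONDS_PER_YEAR = 365.2425 * 24 * 60 * 60;

    // Years reachable in one direction by a signed 64-bit count of seconds (Unix time).
    static double yearsAsSeconds() {
        return Long.MAX_VALUE / SECONDS_PER_YEAR;
    }

    // Same long interpreted as milliseconds, as java.util.Date and Joda DateTime do.
    static double yearsAsMillis() {
        return Long.MAX_VALUE / (SECONDS_PER_YEAR * 1000);
    }

    public static void main(String[] args) {
        System.out.printf("seconds: +/- %.0f billion years%n", yearsAsSeconds() / 1e9);
        System.out.printf("millis:  +/- %.0f million years%n", yearsAsMillis() / 1e6);
    }
}
```

Running `main` prints 292 for both lines; the factor of 1000 between the units is the whole difference.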

bq. The time zone only determines the ISO time string,
It also affects the field values (getDayOfWeek(), getHourOfDay(), etc.). In your 
data, you can have dates belonging to different time zones, and users might want 
to retain that information. 
An example use case where the time zone also needs to be stored: if you want to 
analyze how many people visit a global website during their morning hours, you 
want getHourOfDay() to return the hour in each visitor's local time zone. 
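To illustrate why the zone has to travel with the long, here is a minimal sketch using the JDK's java.time (Java 8+, which postdates this discussion) as a stand-in for Joda DateTime: `getHour()` and `getDayOfWeek()` on `ZonedDateTime` play the role of Joda's `getHourOfDay()` and `getDayOfWeek()`. The instant 1,340,000,000,000 ms (2012-06-18T06:13:20Z) and the zone ids are arbitrary choices for the example.

```java
import java.time.DayOfWeek;
import java.time.Instant;
import java.time.ZoneId;

final class LocalFields {
    // Same millisecond value, different zone: the hour-of-day field changes.
    static int hourOfDay(long epochMillis, String zoneId) {
        return Instant.ofEpochMilli(epochMillis).atZone(ZoneId.of(zoneId)).getHour();
    }

    // The day of week can change too, when the zone shifts the local date.
    static DayOfWeek dayOfWeek(long epochMillis, String zoneId) {
        return Instant.ofEpochMilli(epochMillis).atZone(ZoneId.of(zoneId)).getDayOfWeek();
    }

    public static void main(String[] args) {
        long t = 1_340_000_000_000L; // 2012-06-18T06:13:20Z
        for (String z : new String[] {"UTC", "Asia/Kolkata", "America/Los_Angeles"}) {
            System.out.println(z + ": " + dayOfWeek(t, z) + " hour " + hourOfDay(t, z));
        }
    }
}
```

The same stored long is Monday 06h in UTC, Monday 11h in Kolkata and Sunday 23h in Los Angeles, so a "morning visitors" count is only correct if each value keeps its own zone.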

We need an efficient way to serialize the time zone along with the long. Can you 
propose something? (Maybe make it efficient for the 256 most 'popular' time 
zones and store the zone as a single byte, and not have that byte at all for 
UTC. For other time zones, add a time zone string?) 
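One possible shape for that encoding, as a minimal sketch rather than a proposal Pig adopted: the class name, the five-entry `POPULAR` table and the tag values are all hypothetical. Unlike the suggestion above, this sketch still spends one tag byte on UTC, because the reader needs some marker telling it whether zone data follows the long; removing that byte would require the flag to live somewhere else.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.List;

final class DateTimeCodec {
    // Hypothetical table of "popular" zones; up to 256 entries fit a one-byte index.
    static final List<String> POPULAR = Arrays.asList(
            "America/Los_Angeles", "America/New_York", "Europe/London", "Asia/Kolkata", "Asia/Tokyo");

    // Assumed tag byte written before the long.
    static final byte TAG_UTC = 0;     // nothing follows the long
    static final byte TAG_INDEXED = 1; // one unsigned byte: index into POPULAR
    static final byte TAG_NAMED = 2;   // zone id written with DataOutput.writeUTF

    static final class Decoded {
        final long millis;
        final String zoneId;
        Decoded(long millis, String zoneId) { this.millis = millis; this.zoneId = zoneId; }
    }

    static byte[] encode(long millis, String zoneId) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            int index = POPULAR.indexOf(zoneId);
            if ("UTC".equals(zoneId)) {
                out.writeByte(TAG_UTC);
                out.writeLong(millis);
            } else if (index >= 0) {
                out.writeByte(TAG_INDEXED);
                out.writeLong(millis);
                out.writeByte(index);
            } else {
                out.writeByte(TAG_NAMED);
                out.writeLong(millis);
                out.writeUTF(zoneId);
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static Decoded decode(byte[] data) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            byte tag = in.readByte();
            long millis = in.readLong();
            switch (tag) {
                case TAG_UTC:
                    return new Decoded(millis, "UTC");
                case TAG_INDEXED:
                    return new Decoded(millis, POPULAR.get(in.readUnsignedByte()));
                case TAG_NAMED:
                    return new Decoded(millis, in.readUTF());
                default:
                    throw new IOException("unknown tag " + tag);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With this layout a UTC value costs 9 bytes, a popular zone 10 bytes, and any other zone 11 bytes plus the length of its id.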

bq. When we need to convert the DateTime object to Unix time string, we may use 
the default time zone of the Pig environment 
If the date field carries its own time zone, we don't have to rely on the 
default time zone to convert it to a Unix timestamp (assuming that is what you 
meant by 'unix time *string*').
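A minimal sketch of that point, using java.time's `OffsetDateTime` as a stand-in (the class and method names are hypothetical): when the value carries its own offset, the conversion to epoch milliseconds never consults the JVM's default zone.

```java
import java.time.OffsetDateTime;

final class ToUnix {
    // The input string includes its offset (e.g. +05:30 or Z),
    // so the result does not depend on the machine's default time zone.
    static long unixMillis(String isoWithOffset) {
        return OffsetDateTime.parse(isoWithOffset).toInstant().toEpochMilli();
    }

    public static void main(String[] args) {
        System.out.println(unixMillis("2012-06-18T11:43:20+05:30"));
        System.out.println(unixMillis("2012-06-17T23:13:20-07:00"));
    }
}
```

Both calls yield 1340000000000, the same instant written in two different local forms.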
But for UDFs like DateTime ToDate(String s), where the time zone might not be 
specified, we need a default time zone. I think we should use the default time 
zone of the Pig client machine. Using the default time zone of each TaskTracker 
node could make debugging a nightmare if one of the nodes happens to be set to 
a different time zone. We should also allow the user to set a default time zone 
through a Pig property. 
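The resolution order described above (explicit property first, otherwise the client's default zone) can be sketched as follows. The property key `pig.datetime.default.tz` and the helper names are hypothetical for this example; the idea is that the client resolves the zone once and would ship its id string to the tasks, so no task ever falls back to its own machine's default.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.util.Properties;

final class DefaultZone {
    // Hypothetical property key, used here only for illustration.
    static final String KEY = "pig.datetime.default.tz";

    // Resolved on the client: an explicit property wins, else the client JVM's default zone.
    static ZoneId resolve(Properties props) {
        String id = props.getProperty(KEY);
        return (id == null || id.isEmpty()) ? ZoneId.systemDefault() : ZoneId.of(id);
    }

    // Hypothetical ToDate-style helper: the input has no zone, so the resolved
    // default zone decides which instant it denotes. Returns epoch milliseconds.
    static long toDateMillis(String localIso, ZoneId zone) {
        return LocalDateTime.parse(localIso).atZone(zone).toInstant().toEpochMilli();
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(KEY, "Asia/Tokyo");
        System.out.println(toDateMillis("2012-06-18T06:13:20", resolve(props)));
    }
}
```

The same zone-less string maps to different instants under "UTC" and "Asia/Tokyo" (nine hours apart), which is exactly the per-node inconsistency a single client-side default avoids.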

bq. We probably need one more UDF String ToString(DateTime d, String format, 
String timezone)
Having a time zone argument in this call is necessary only if the user wants to 
print the time in a different time zone. This is useful, but not mandatory. 
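A minimal sketch of the optional argument, again with java.time standing in for Joda (the class and method names are hypothetical): when no target zone is given, the value is printed in the zone stored with it; otherwise it is converted first.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

final class FormatDate {
    // Hypothetical analogue of ToString(DateTime d, String format, String timezone).
    // targetZone == null means "print in the value's own stored zone".
    static String format(long epochMillis, String storedZone, String pattern, String targetZone) {
        ZoneId zone = ZoneId.of(targetZone != null ? targetZone : storedZone);
        return Instant.ofEpochMilli(epochMillis)
                .atZone(zone)
                .format(DateTimeFormatter.ofPattern(pattern));
    }

    public static void main(String[] args) {
        long t = 1_340_000_000_000L; // 2012-06-18T06:13:20Z
        System.out.println(format(t, "Asia/Kolkata", "yyyy-MM-dd HH:mm", null));
        System.out.println(format(t, "Asia/Kolkata", "yyyy-MM-dd HH:mm", "America/Los_Angeles"));
    }
}
```

The first call prints 2012-06-18 11:43 and the second 2012-06-17 23:13, so the three-argument form only matters when the caller wants a zone other than the stored one.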


bq. Since the ISO duration is non-negative (please correct me if I'm wrong), we 
need SubtractDuration as well.
Yes, you are right. I could not find any references to negative values in ISO 
durations. Let's add SubtractDuration.
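A minimal sketch of the add/subtract pair, assuming java.time's `Duration.parse` as the ISO parser (the helper names mirror the proposed UDFs but are not Pig's implementation). `Duration` only accepts the day/hour/minute/second subset of ISO 8601 (no years, months or weeks), treats a day as exactly 24 hours, and is applied to the instant timeline, so across a DST change "P1D" means 24 elapsed hours rather than the same wall-clock time the next day.

```java
import java.time.Duration;
import java.time.ZonedDateTime;

final class Durations {
    // Hypothetical AddDuration analogue: move forward by a non-negative ISO duration.
    static ZonedDateTime addDuration(ZonedDateTime dt, String isoDuration) {
        return dt.plus(Duration.parse(isoDuration));
    }

    // Hypothetical SubtractDuration analogue: since the duration string itself is
    // non-negative, going backwards needs its own operation.
    static ZonedDateTime subtractDuration(ZonedDateTime dt, String isoDuration) {
        return dt.minus(Duration.parse(isoDuration));
    }

    public static void main(String[] args) {
        ZonedDateTime t = ZonedDateTime.parse("2012-06-18T06:13:20Z");
        System.out.println(subtractDuration(t, "PT6H13M20S"));
        System.out.println(addDuration(t, "P1D"));
    }
}
```

As a side note, java.time's parser accepts a leading "-" as an extension, but a UDF that takes strict ISO duration strings would still need the separate subtract.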

Trivia from Wikipedia: a 64-bit Unix timestamp, in the negative direction, goes 
back more than twenty times the age of the universe.

                
> Add DateTime Support to Pig
> ---------------------------
>
>                 Key: PIG-1314
>                 URL: https://issues.apache.org/jira/browse/PIG-1314
>             Project: Pig
>          Issue Type: Bug
>          Components: data
>    Affects Versions: 0.7.0
>            Reporter: Russell Jurney
>            Assignee: Zhijie Shen
>              Labels: gsoc2012
>         Attachments: PIG-1314-1.patch, PIG-1314-2.patch, joda_vs_builtin.zip
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> Hadoop/Pig are primarily used to parse log data, and most logs have a 
> timestamp component. Therefore Pig should support dates as a primitive.
> Can someone familiar with adding types to Pig comment on how hard this is? 
> We're looking at doing this, rather than using UDFs. Is this a patch that 
> would be accepted?
> This is a candidate project for Google Summer of Code 2012. More information 
> about the program can be found at 
> https://cwiki.apache.org/confluence/display/PIG/GSoc2012
