[ 
https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated PIG-1314:
-----------------------------

    Attachment: joda_vs_builtin.zip

As suggested by Thejas, I've compared the performance of JODA with that of the 
builtin datetime-related objects. For each function, I repeated the 
computation 100,000 times and measured the total elapsed time. Please refer to 
the attachment for the code details. Below is a summary of the results (in 
milliseconds):

ISOToSecond: JODA-958 Builtin-1326
ISOToMinute: JODA-532 Builtin-850
ISOToHour: JODA-414 Builtin-680
ISOToDay: JODA-475 Builtin-685
ISOToMonth: JODA-463 Builtin-692
ISOToYear: JODA-462 Builtin-715
ISOSecondsBetween: JODA-961 Builtin-968
ISOMinutesBetween: JODA-734 Builtin-565
ISOHoursBetween: JODA-596 Builtin-656
ISODaysBetween: JODA-592 Builtin-555
ISOMonthsBetween: JODA-586 Builtin-968
ISOYearsBetween: JODA-654 Builtin-952
ISOToUnix: JODA-678 Builtin-6965
UnixToISO: JODA-225 Builtin-206
Custom Format 1 [yyyy.MM.dd G 'at' HH:mm:ss.SSS Z]: JODA-596 Builtin-6914
Custom Format 2 [yyyyy.MMMMM.dd GGG hh:mm aaa]: JODA-534 Builtin-425

Two major conclusions are as follows:
1. The datetime operations implemented with JODA generally perform as well as 
those implemented with the builtin data structures (according to my 
implementation), except when parsing a time string.
2. Based on my implementation, the builtin data structures need about an 
order of magnitude more time to parse a time string when the format has a 
timezone component (i.e., "Z").
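
For readers who want to try the second point without the attachment, here is a 
minimal sketch of the kind of timing loop described above, using only the 
builtin SimpleDateFormat. It is not the code from joda_vs_builtin.zip: the 
class name, method names, patterns, and sample inputs are hypothetical, and 
creating a new formatter on every call is an assumption about how a stateless 
function might behave. Timings depend on the JVM and machine, so the numbers 
above are not expected to be reproduced exactly.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;

// Illustrative only: class and method names are hypothetical and are not
// taken from joda_vs_builtin.zip.
class BuiltinParseTiming {

    // Parses input with the given SimpleDateFormat pattern and returns epoch milliseconds.
    static long parseMillis(String pattern, String input) {
        try {
            return new SimpleDateFormat(pattern, Locale.US).parse(input).getTime();
        } catch (ParseException e) {
            throw new IllegalArgumentException("Cannot parse: " + input, e);
        }
    }

    // Repeats the parse `iterations` times and returns the elapsed milliseconds.
    // Assumption: a new formatter is created on every call.
    static long timeParse(String pattern, String input, int iterations) {
        long checksum = 0; // accumulated so the parsed values are actually used
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            checksum += parseMillis(pattern, input);
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000L;
        if (checksum == 42) {
            System.out.println(); // never expected; keeps checksum live
        }
        return elapsedMs;
    }

    public static void main(String[] args) {
        int n = 100_000; // same repetition count as in the comparison above
        long withZone = timeParse("yyyy-MM-dd'T'HH:mm:ss.SSSZ",
                                  "2012-04-05T10:15:30.000+0000", n);
        long withoutZone = timeParse("yyyy-MM-dd'T'HH:mm:ss.SSS",
                                     "2012-04-05T10:15:30.000", n);
        System.out.println("with Z: " + withZone + " ms, without Z: " + withoutZone + " ms");
    }
}
```

Running main prints the two elapsed times side by side, so the cost of the 
"Z" component can be compared on a given machine.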

To sum up, since JODA performs no worse and is more trustworthy in terms of 
correctness, I vote for continuing with JODA when implementing the datetime 
primitive type.
                
> Add DateTime Support to Pig
> ---------------------------
>
>                 Key: PIG-1314
>                 URL: https://issues.apache.org/jira/browse/PIG-1314
>             Project: Pig
>          Issue Type: Bug
>          Components: data
>    Affects Versions: 0.7.0
>            Reporter: Russell Jurney
>            Assignee: Zhijie Shen
>              Labels: gsoc2012
>         Attachments: joda_vs_builtin.zip
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> Hadoop/Pig are primarily used to parse log data, and most logs have a 
> timestamp component.  Therefore Pig should support dates as a primitive.
> Can someone familiar with adding types to pig comment on how hard this is?  
> We're looking at doing this, rather than use UDFs.  Is this a patch that 
> would be accepted?
> This is a candidate project for Google summer of code 2012. More information 
> about the program can be found at 
> https://cwiki.apache.org/confluence/display/PIG/GSoc2012

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
