[ https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zhijie Shen updated PIG-1314:
-----------------------------

    Attachment: joda_vs_builtin.zip

As suggested by Thejas, I've done a performance comparison between JODA and the builtin datetime-related objects. For each function, I repeated the computation 100,000 times and measured the total time. Please refer to the attachment for the code details. Below is a summary of the results (in milliseconds):

Function                                                JODA   Builtin
ISOToSecond                                              958      1326
ISOToMinute                                              532       850
ISOToHour                                                414       680
ISOToDay                                                 475       685
ISOToMonth                                               463       692
ISOToYear                                                462       715
ISOSecondsBetween                                        961       968
ISOMinutesBetween                                        734       565
ISOHoursBetween                                          596       656
ISODaysBetween                                           592       555
ISOMonthsBetween                                         586       968
ISOYearsBetween                                          654       952
ISOToUnix                                                678      6965
UnixToISO                                                225       206
Custom Format 1 [yyyy.MM.dd G 'at' HH:mm:ss.SSS Z]       596      6914
Custom Format 2 [yyyyy.MMMMM.dd GGG hh:mm aaa]           534       425

Two major conclusions are as follows:

1. Datetime operations implemented with JODA generally perform as well as those implemented with the builtin data structures (according to my implementation), except for parsing a time string.

2. With my implementation, the builtin data structures need about an order of magnitude more time to parse a time string when the format has a timezone component (i.e., "Z"). See ISOToUnix and Custom Format 1 above.

To sum up, since JODA provides no worse performance and more trustworthy correctness, I vote for going on with JODA when implementing the datetime primitive type.
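For readers without the attachment, the kind of measurement described above can be sketched roughly as follows. This is not the code from joda_vs_builtin.zip. It covers only the builtin side (java.text.SimpleDateFormat) so that it runs without extra dependencies, and the class name, method names, pattern string, and sample timestamp are all assumptions made for illustration. Absolute timings will vary by JVM and machine.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;

// Illustrative sketch only: names and the pattern below are assumptions,
// not taken from the attached benchmark.
final class BuiltinParseTiming {
    // ISO-like pattern with a timezone component ("Z"), the case the
    // comment above reports as slow for the builtin classes.
    static final String ISO_PATTERN = "yyyy-MM-dd'T'HH:mm:ss.SSSZ";

    static SimpleDateFormat newFormat() {
        return new SimpleDateFormat(ISO_PATTERN, Locale.US);
    }

    // Parses one timestamp and returns milliseconds since the Unix epoch.
    static long parseMillis(SimpleDateFormat fmt, String iso) {
        try {
            return fmt.parse(iso).getTime();
        } catch (ParseException e) {
            throw new IllegalArgumentException("Not a timestamp in the expected format: " + iso, e);
        }
    }

    // Hypothetical stand-in for an ISOToUnix-style function.
    static long isoToUnixMillis(String iso) {
        return parseMillis(newFormat(), iso);
    }

    // Repeats the parse `iterations` times and returns the elapsed wall time in ms.
    static long timeParse(String iso, int iterations) {
        SimpleDateFormat fmt = newFormat();
        long checksum = 0;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            checksum += parseMillis(fmt, iso);
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000L;
        // Use the checksum so the JIT cannot discard the loop body.
        if (checksum == Long.MIN_VALUE) {
            System.out.println(checksum);
        }
        return elapsedMs;
    }

    public static void main(String[] args) {
        String sample = "2012-04-10T12:30:45.000+0000";
        timeParse(sample, 10_000); // warm-up pass before measuring
        System.out.println("builtin parse x100000: " + timeParse(sample, 100_000) + " ms");
    }
}
```

A JODA counterpart would time the equivalent parse call in the same loop. A single timed loop like this is only a rough indicator, so small differences between the two columns should be read with caution.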
> Add DateTime Support to Pig
> ---------------------------
>
>                 Key: PIG-1314
>                 URL: https://issues.apache.org/jira/browse/PIG-1314
>             Project: Pig
>          Issue Type: Bug
>          Components: data
>    Affects Versions: 0.7.0
>            Reporter: Russell Jurney
>            Assignee: Zhijie Shen
>              Labels: gsoc2012
>         Attachments: joda_vs_builtin.zip
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> Hadoop/Pig are primarily used to parse log data, and most logs have a
> timestamp component. Therefore Pig should support dates as a primitive.
> Can someone familiar with adding types to pig comment on how hard this is?
> We're looking at doing this, rather than use UDFs. Is this a patch that
> would be accepted?
> This is a candidate project for Google summer of code 2012. More information
> about the program can be found at
> https://cwiki.apache.org/confluence/display/PIG/GSoc2012