[jira] [Commented] (PIG-1314) Add DateTime Support to Pig
[ https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13476519#comment-13476519 ]

Dmitriy V. Ryaboy commented on PIG-1314:

A chunk of this is committed, and it's not clear what's left to do. Can we close this and create a new ticket for the remaining work?

> Add DateTime Support to Pig
> ---
>
> Key: PIG-1314
> URL: https://issues.apache.org/jira/browse/PIG-1314
> Project: Pig
> Issue Type: Bug
> Components: data
> Affects Versions: 0.7.0
> Reporter: Russell Jurney
> Assignee: Zhijie Shen
> Labels: gsoc2012
> Attachments: joda_vs_builtin.zip, PIG-1314-1.patch, PIG-1314-2.patch, PIG-1314-3.patch, PIG-1314-4.patch, PIG-1314-5.patch, PIG-1314-6.patch, PIG-1314-7.patch
>
> Original Estimate: 672h
> Remaining Estimate: 672h
>
> Hadoop/Pig are primarily used to parse log data, and most logs have a timestamp component. Therefore Pig should support dates as a primitive.
> Can someone familiar with adding types to pig comment on how hard this is? We're looking at doing this, rather than use UDFs. Is this a patch that would be accepted?
> This is a candidate project for Google summer of code 2012. More information about the program can be found at https://cwiki.apache.org/confluence/display/PIG/GSoc2012

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-1314) Add DateTime Support to Pig
[ https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443630#comment-13443630 ]

Thejas M Nair commented on PIG-1314:

Yes, that was not intentional. Deleted JobControlCompiler.java.orig in svn.
[jira] [Commented] (PIG-1314) Add DateTime Support to Pig
[ https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443614#comment-13443614 ]

Julien Le Dem commented on PIG-1314:

Hi Thejas,
this commit added JobControlCompiler.java.orig which I suspect is not what you intended.
http://svn.apache.org/viewvc/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java.orig?view=log&pathrev=1376800
Could you double check?
Thanks, Julien
[jira] [Commented] (PIG-1314) Add DateTime Support to Pig
[ https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13440860#comment-13440860 ]

Zhijie Shen commented on PIG-1314:

Hi Thejas, let me do that.
[jira] [Commented] (PIG-1314) Add DateTime Support to Pig
[ https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13440858#comment-13440858 ]

Thejas M Nair commented on PIG-1314:

We also need some test cases that set the timezone property. This might not be easy to do in the e2e framework, so unit test cases are a better candidate for this. Please let me know if you need any help.
[jira] [Commented] (PIG-1314) Add DateTime Support to Pig
[ https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13440851#comment-13440851 ]

Thejas M Nair commented on PIG-1314:

PIG-1314-7.patch committed to trunk! Thanks Zhijie.

We need to update the documentation regarding this change. Can you please upload a new patch for that? To see the generated docs, run - ant -Dforrest.home= docs. The files to be edited are under trunk/src/docs/src/documentation/ .

We should also add a few end-to-end test cases for datetime. See https://cwiki.apache.org/confluence/display/PIG/HowToTest#HowToTest-EndtoendTesting . We should have a few queries that do some of the basic operations on datetime, and queries that have order-by, group and join on date fields.

These can be submitted as multiple patches.
[jira] [Commented] (PIG-1314) Add DateTime Support to Pig
[ https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13436528#comment-13436528 ]

Zhijie Shen commented on PIG-1314:

{quote}
I have one suggestion - add getWeeks and weeksBetween, if it isn't inconvenient. I think Jodatime can do this. It is useful when dealing in weeks.
{quote}

Yes, week field should be useful. In addition to it, I think it's better to add getWeekYear as well, because using weeks of year alone may cause ambiguity sometimes. For example, both "2008-12-31" and "2009-01-01" are week 1 of weekyear 2009, though the two dates are in two different years.

In addition, do you think it is better to rename some time UDFs as follows?

getMonth -> getMonthOfYear
getDay -> getDayOfMonth (do we need getDayOfWeek and getDayOfYear as well?)
getHour -> getHourOfDay
getMinute -> getMinuteOfHour
getSecond -> getSecondOfMinute
getMilliSecond -> getMilliOfSecond

The changes will make UDFs' names longer but clearer.
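The week-year ambiguity described in this comment can be checked with a small sketch. The patch under discussion is built on Joda-Time; the code below deliberately swaps in the JDK's java.time ISO week fields as a stand-in so that it runs without extra dependencies. The class and method names (WeekYearSketch, weekOfWeekYear, weekYear) are hypothetical and are not the UDF names from the patch.

```java
import java.time.LocalDate;
import java.time.temporal.IsoFields;

// Hypothetical illustration only: java.time is used here in place of Joda-Time.
class WeekYearSketch {
    // ISO week number of the week that contains the date.
    static int weekOfWeekYear(LocalDate d) {
        return d.get(IsoFields.WEEK_OF_WEEK_BASED_YEAR);
    }

    // ISO week-based year, which can differ from the calendar year near January 1.
    static int weekYear(LocalDate d) {
        return d.get(IsoFields.WEEK_BASED_YEAR);
    }

    public static void main(String[] args) {
        for (String s : new String[] {"2008-12-31", "2009-01-01"}) {
            LocalDate d = LocalDate.parse(s);
            System.out.println(s + ": calendar year=" + d.getYear()
                + ", weekyear=" + weekYear(d)
                + ", week=" + weekOfWeekYear(d));
        }
    }
}
```

Both dates report week 1 of week-year 2009 while their calendar years differ, which is why a week number is only unambiguous when paired with the week-year rather than the calendar year.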
[jira] [Commented] (PIG-1314) Add DateTime Support to Pig
[ https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13436477#comment-13436477 ]

Russell Jurney commented on PIG-1314:

I have one suggestion - add getWeeks and weeksBetween, if it isn't inconvenient. I think Jodatime can do this. It is useful when dealing in weeks.
[jira] [Commented] (PIG-1314) Add DateTime Support to Pig
[ https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13434787#comment-13434787 ]

Zhijie Shen commented on PIG-1314:

{quote}
I believe you should be able to set the default timezone property in PigContext constructor, and also let user override the default. In backend, you can access the value using something like - PigMapReduce.sJobConfInternal.get().get("pig.datetime.default.tz").
{quote}

Thank you, Thejas! Let me investigate this issue.
[jira] [Commented] (PIG-1314) Add DateTime Support to Pig
[ https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13434760#comment-13434760 ]

Russell Jurney commented on PIG-1314:

I agree with Thejas. The user will want to control the timezone of NOW() without having to reconfigure the hadoop cluster/contact the hadoop administrator. Setting this on the client is consistent with Pig as a client-side technology.
[jira] [Commented] (PIG-1314) Add DateTime Support to Pig
[ https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13434733#comment-13434733 ]

Thejas M Nair commented on PIG-1314:

bq. 2. According to your last response, I'm not clear how the default timezone of client can be sent to the server with the code. In my opinion, the default timezone should be specified on the server side by configuration, which should be taken care of by administrators. What do you think about this?

I believe you should be able to set the default timezone property in the PigContext constructor, and also let the user override the default. In the backend, you can access the value using something like - PigMapReduce.sJobConfInternal.get().get("pig.datetime.default.tz").
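The client-default-with-override idea in this comment can be sketched with plain java.util.Properties. This is not Pig's PigContext or job configuration code; only the property name "pig.datetime.default.tz" comes from the thread, and the class and method names (DefaultTimeZoneSketch, withClientDefault, resolveOnBackend) are hypothetical.

```java
import java.time.ZoneId;
import java.util.Properties;

// Hypothetical sketch: plain Properties stand in for Pig's job configuration.
class DefaultTimeZoneSketch {
    // Property name taken from the thread.
    static final String TZ_KEY = "pig.datetime.default.tz";

    // Client side: keep a user-supplied value, otherwise record the client's default zone.
    static Properties withClientDefault(Properties userProps, ZoneId clientDefault) {
        Properties merged = new Properties();
        merged.putAll(userProps);
        if (merged.getProperty(TZ_KEY) == null) {
            merged.setProperty(TZ_KEY, clientDefault.getId());
        }
        return merged;
    }

    // Backend side: read the shipped value instead of the task node's own default zone.
    static ZoneId resolveOnBackend(Properties jobProps) {
        String id = jobProps.getProperty(TZ_KEY);
        if (id == null) {
            throw new IllegalStateException(TZ_KEY + " was not propagated from the client");
        }
        return ZoneId.of(id);
    }

    public static void main(String[] args) {
        Properties user = new Properties();
        Properties job = withClientDefault(user, ZoneId.systemDefault());
        System.out.println("backend uses " + resolveOnBackend(job));
    }
}
```

Failing loudly when the property is missing, rather than silently falling back to the node's zone, is one way to avoid the hard-to-debug case of a misconfigured task node.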
[jira] [Commented] (PIG-1314) Add DateTime Support to Pig
[ https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13428458#comment-13428458 ]

Thejas M Nair commented on PIG-1314:

bq. 1. You've mentioned that we need to propagate the timezone from the client to backend, where the udfs get executed. How should the timezone be propagated to the backend, which I assume is the machine that runs the code?

Yes

bq. Previously I made the timezone setting in pig.properties, which will be loaded when PigServer runs, such that the default timezone will be set. Consequently, if a datetime object is created without specifying the timezone, the default one will be used. However, do you mean some other way?

It is possible that some of the task nodes might be misconfigured and have a different default time zone. In such cases, the results won't be what you want and it will be very difficult to debug. So the default timezone on the client should be used in the nodes as well.

bq. I convert the location-based timezone to the utc-offset one and only use utc-offset style internally. Therefore, the aforementioned two equal datetime objects will not be mis-treated.

Sounds good.
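The conversion from a location-based timezone to a fixed UTC offset mentioned in this comment can be illustrated as follows. The patch uses Joda-Time; this sketch substitutes java.time from the JDK, and the names OffsetNormalizationSketch, offsetAt and normalize are hypothetical.

```java
import java.time.Instant;
import java.time.OffsetDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;

// Hypothetical illustration only: java.time is used here in place of Joda-Time.
class OffsetNormalizationSketch {
    // Resolve a location-based zone id to the fixed offset in effect at the given instant.
    static ZoneOffset offsetAt(String zoneId, Instant instant) {
        return ZoneId.of(zoneId).getRules().getOffset(instant);
    }

    // Re-express a zoned value using only its fixed offset; the instant is unchanged.
    static OffsetDateTime normalize(ZonedDateTime zdt) {
        return zdt.toOffsetDateTime();
    }

    public static void main(String[] args) {
        ZonedDateTime ny = ZonedDateTime.of(2012, 8, 3, 12, 0, 0, 0, ZoneId.of("America/New_York"));
        OffsetDateTime fixed = normalize(ny);
        System.out.println(ny + " -> " + fixed);
        System.out.println("same instant: " + ny.toInstant().equals(fixed.toInstant()));
    }
}
```

Because a location-based zone can map to different offsets over the year, the offset has to be resolved at a specific instant; once resolved, two values written with a zone name and with the matching offset carry the same offset and instant.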
[jira] [Commented] (PIG-1314) Add DateTime Support to Pig
[ https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13413472#comment-13413472 ]

Zhijie Shen commented on PIG-1314:

Hi Thejas,
Thanks for your review. I'll check out your comments.
[jira] [Commented] (PIG-1314) Add DateTime Support to Pig
[ https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13412452#comment-13412452 ]

Thejas M Nair commented on PIG-1314:

Zhijie, I have added comments on your latest patch in https://reviews.apache.org/r/5414/. Yes, let's focus on test cases now, so that we can get an initial version committed.
[jira] [Commented] (PIG-1314) Add DateTime Support to Pig
[ https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406238#comment-13406238 ]

Zhijie Shen commented on PIG-1314:

{quote}
But the timezone part and time part of the datetime string should be optional. Does jodatime support that?
{quote}

Yes, these two parts are not mandatory. The default time value is "00:00:00.000" while the default timezone offset is "+00:00". When the datetime object is output as an ISO-format string, the default parts will be filled in (e.g., 2012-07-03T00:00:00.000Z).
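The optional time and timezone parts with the defaults stated in this comment can be sketched as below. The thread describes Joda-Time behaviour; this example instead builds an equivalent parser with java.time's DateTimeFormatterBuilder, so it shows the idea rather than the patch's code. The names OptionalPartsSketch, PARSER, PRINTER, parse and print are hypothetical.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;

// Hypothetical illustration only: java.time is used here in place of Joda-Time.
class OptionalPartsSketch {
    // Date is required; the time and the offset are optional and default to 00:00 and +00:00.
    static final DateTimeFormatter PARSER = new DateTimeFormatterBuilder()
        .append(DateTimeFormatter.ISO_LOCAL_DATE)
        .optionalStart().appendLiteral('T').append(DateTimeFormatter.ISO_LOCAL_TIME).optionalEnd()
        .optionalStart().appendOffsetId().optionalEnd()
        .parseDefaulting(ChronoField.HOUR_OF_DAY, 0)
        .parseDefaulting(ChronoField.OFFSET_SECONDS, 0)
        .toFormatter();

    // Always prints every part, using "Z" for a zero offset.
    static final DateTimeFormatter PRINTER =
        DateTimeFormatter.ofPattern("uuuu-MM-dd'T'HH:mm:ss.SSSXXX");

    static OffsetDateTime parse(String text) {
        return OffsetDateTime.parse(text, PARSER);
    }

    static String print(OffsetDateTime dt) {
        return PRINTER.format(dt);
    }

    public static void main(String[] args) {
        System.out.println(print(parse("2012-07-03")));
        System.out.println(print(parse("2012-07-03T08:14:19.962+01:00")));
    }
}
```

A date-only input such as 2012-07-03 is printed with the defaulted parts filled in, matching the example string in the comment above.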
[jira] [Commented] (PIG-1314) Add DateTime Support to Pig
[ https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13405971#comment-13405971 ]

Thejas M Nair commented on PIG-1314:

PigStorage is meant to be a human-readable format. So that is another reason to store the timestamp in the ISO string as you suggested.

Yes, if the timezone is specified in the string, pig should use that value. But the timezone part and time part of the datetime string should be optional. Does jodatime support that?
[jira] [Commented] (PIG-1314) Add DateTime Support to Pig
[ https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13405724#comment-13405724 ]

Zhijie Shen commented on PIG-1314:

There are some issues with loading/storing pig data.

When storing a DateTime object with "Utf8StorageConverter" without using UDFs to convert it to some string, should we serialize it as a millis+timezone composite, or output a UTC-style datetime string (e.g., 2012-07-03T08:14:19.962+01:00)? The latter behaves the same as calling "String ToString(DateTime d)" before storing the string. Personally, I like the latter choice, because the data is directly readable from the stored files.

On the other hand, if a datetime object is stored in the file as a datetime string, when we load it again as a datetime object, should we use the default timezone or the one specified in the datetime string (e.g., +01:00 in the last example)? I again prefer the second choice. When we use Pig, it is possible to do a bunch of store/load operations to achieve some goal, so the timezone information needs to be preserved. For example, let's assume +08:00 is the default timezone. A datetime object whose individual timezone is -04:00 is stored as a string, which will have -04:00 as its suffix. When the string is loaded as a datetime object for further processing, we had better keep the previously used timezone, -04:00, instead of the default one.

What do you think about this? Thanks!
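The store/load question raised in this comment can be made concrete with a short round-trip sketch. It uses java.time's OffsetDateTime as a stand-in for the Joda-Time DateTime in the patch and does not involve Utf8StorageConverter; the names StoreLoadSketch, store, loadPreservingOffset and loadIntoDefault are hypothetical.

```java
import java.time.OffsetDateTime;
import java.time.ZoneOffset;

// Hypothetical illustration only: java.time is used here in place of Joda-Time.
class StoreLoadSketch {
    // "Store": write the value as an ISO-8601 string that carries its own offset.
    static String store(OffsetDateTime dt) {
        return dt.toString();
    }

    // "Load", second choice in the comment: keep the offset found in the string.
    static OffsetDateTime loadPreservingOffset(String text) {
        return OffsetDateTime.parse(text);
    }

    // "Load", first choice in the comment: re-express the same instant in the default offset.
    static OffsetDateTime loadIntoDefault(String text, ZoneOffset defaultOffset) {
        return OffsetDateTime.parse(text).withOffsetSameInstant(defaultOffset);
    }

    public static void main(String[] args) {
        String stored = "2012-07-03T08:14:19.962-04:00";
        ZoneOffset defaultOffset = ZoneOffset.ofHours(8);
        System.out.println("preserved: " + loadPreservingOffset(stored));
        System.out.println("defaulted: " + loadIntoDefault(stored, defaultOffset));
    }
}
```

Both loading strategies yield the same instant; they differ only in which offset survives a store/load cycle, which is the property the comment argues should be preserved.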
[jira] [Commented] (PIG-1314) Add DateTime Support to Pig
[ https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13403114#comment-13403114 ]

Zhijie Shen commented on PIG-1314:

Hi Thejas,
I'll take your suggestions. Thanks!
[jira] [Commented] (PIG-1314) Add DateTime Support to Pig
[ https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13402517#comment-13402517 ] Russell Jurney commented on PIG-1314:

This sounds good to me.
[jira] [Commented] (PIG-1314) Add DateTime Support to Pig
[ https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13402467#comment-13402467 ] Thejas M Nair commented on PIG-1314:

bq. Or we temporarily set aside the performance issue right now, and move forward to make timezone serialization work by simply serializing the timezone id string.

We can add features later, but dropping features later won't be good. In my opinion, support for long timezone names is not going to be needed by most people. I think we can accept them only when creating a DateTime field, and state that Pig will not preserve the long name. Pig will retain only the hour+minute offset (no seconds or milliseconds!). The hour+minute offset form is portable and more likely to be supported by other serialization formats.
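The offset-only proposal in the comment above can be sketched roughly as follows. This is an illustration, not the code from any PIG-1314 patch: it uses the JDK's `java.time` as a stand-in for Joda-Time, and the class and method names (`OffsetOnlyZone`, `offsetMinutes`, `toOffsetId`) are hypothetical.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;

// Hypothetical helper: reduces any zone id to an hour+minute UTC offset.
class OffsetOnlyZone {
    // Resolve a zone id (named or offset-style) to whole minutes east of UTC
    // at the given instant. The location name is discarded on purpose.
    static short offsetMinutes(String zoneId, Instant at) {
        ZoneOffset offset = ZoneId.of(zoneId).getRules().getOffset(at);
        // Integer division drops any sub-minute part of the offset.
        return (short) (offset.getTotalSeconds() / 60);
    }

    // Rebuild the portable "+HH:MM" form from the stored minutes.
    static String toOffsetId(short minutes) {
        return ZoneOffset.ofTotalSeconds(minutes * 60).getId();
    }

    public static void main(String[] args) {
        Instant at = Instant.parse("2012-06-27T00:00:00Z");
        short minutes = offsetMinutes("Asia/Singapore", at);
        System.out.println(minutes + " -> " + toOffsetId(minutes));
    }
}
```

The offset has to be resolved at a specific instant because a named zone may observe different offsets over the year; that is part of what is lost when only the offset is kept.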
[jira] [Commented] (PIG-1314) Add DateTime Support to Pig
[ https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13402061#comment-13402061 ] Zhijie Shen commented on PIG-1314:

{quote}
Yes, it will be lossy, but the part that is important for date calculations is preserved. The ISO spec only has offset for timezone. I don't think we have to allow datetime field to be used for storing location information. Does JodaTime preserve the location string ?
{quote}

Yes, I think so. If I get a DateTimeZone object by DateTimeZone.forID("Asia/Singapore"), the returned DateTimeZone object doesn't change to "+08:00", but keeps "Asia/Singapore". We'd better preserve it, because when users want to output the time in a customized format that has "z" in the pattern string, the exact timezone can be output.

{quote}
But won't jodatime support a timezone outside this list, If the user specifies a date using the UTC offset format ?
{quote}

Yes, DateTimeZone.forID() also accepts a UTC offset string as input, such as "+08:00", though it is not in the list. However, the offset can be any value in the range [-23:59:59.999, +23:59:59.999], and the minimal granularity is the millisecond. We would then need a combined lookup table that maps both canonical timezone ids and UTC offsets to their concise representation. Do you have any suggestion here? Or we temporarily set aside the performance issue right now, and move forward to make timezone serialization work by simply serializing the timezone id string.
[jira] [Commented] (PIG-1314) Add DateTime Support to Pig
[ https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13401937#comment-13401937 ] Thejas M Nair commented on PIG-1314:

bq. Several time zones may share the same UTC offset, such that when the reverse operation is to do, it will be unknown which timezone the UTC offset should be converted to.

Yes, it will be lossy, but the part that is important for date calculations is preserved. The ISO spec only has an offset for the timezone. I don't think we have to allow the datetime field to be used for storing location information. Does JodaTime preserve the location string?

bq. I'm not sure whether "getAvailableIDs" returns the same time zone list on all machines or is machine-dependent.

It depends on the release/jar (http://joda-time.sourceforge.net/tz_update.html). As Pig will be shipping this jar to the nodes, it is OK to assume that it will be the same across all nodes for a query. So it is safe to rely on the id for intermediate serialization. But won't JodaTime support a timezone outside this list if the user specifies a date using the UTC offset format?
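The "lossy, but sufficient for date calculations" point in the comment above can be checked with a small experiment. The sketch below is only an illustration and uses the JDK's `java.time` in place of Joda-Time; the class name `LossyOffsetDemo` and the helper `offsetAt` are hypothetical.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;

// Hypothetical demo: two location names collapse to one offset,
// yet field values computed from the offset alone stay the same.
class LossyOffsetDemo {
    static ZoneOffset offsetAt(String zoneId, Instant at) {
        return ZoneId.of(zoneId).getRules().getOffset(at);
    }

    public static void main(String[] args) {
        Instant at = Instant.parse("2012-06-27T00:00:00Z");
        ZoneOffset sg = offsetAt("Asia/Singapore", at);
        ZoneOffset hk = offsetAt("Asia/Hong_Kong", at);

        // Both names resolve to the same offset, so the name cannot be
        // recovered from the offset.
        System.out.println("same offset: " + sg.equals(hk));

        // The hour-of-day from the named zone matches the hour-of-day
        // from the bare offset.
        int hourFromName = at.atZone(ZoneId.of("Asia/Singapore")).getHour();
        int hourFromOffset = at.atOffset(sg).getHour();
        System.out.println("same hour: " + (hourFromName == hourFromOffset));
    }
}
```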
[jira] [Commented] (PIG-1314) Add DateTime Support to Pig
[ https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13401884#comment-13401884 ] Zhijie Shen commented on PIG-1314:

Hi Thejas and Russell,

I'll do serialization for the timezone as well.

{quote}
I think converting the string timezone (location name) to UTC offset in minutes, is one possibility.
{quote}

In my opinion, this kind of compression is lossy. Several time zones may share the same UTC offset, so when the reverse operation is needed, it is unknown which timezone the UTC offset should be converted back to.

{quote}
We need an efficient way to serialize timezone along with the long. Can you propose something ? (Maybe, just make it efficient for 256 most 'popular' timezones and store it a byte. And not have the byte for UTC. For other timezones, add a timezone string ?)
{quote}

The time zone classes in both the builtin library and Joda have the function "getAvailableIDs", which returns all the available time zone strings. On my machine, I got 616 from the builtin time zone class and 558 from the Joda one. We can probably have a one-to-one mapping between the time zone strings and integer ids stored in short variables. However, the "available" in "getAvailableIDs" sounds tricky. I'm not sure whether "getAvailableIDs" returns the same time zone list on all machines or is machine-dependent.
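The one-to-one mapping between zone id strings and short codes suggested above might look like the sketch below. It is an assumption-laden illustration: it uses `java.time.ZoneId.getAvailableZoneIds()` in place of Joda's `getAvailableIDs`, the class `ZoneIdTable` is hypothetical, and the codes are only stable while every node runs the same time zone data.

```java
import java.time.ZoneId;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical lookup table: zone id string <-> short code.
class ZoneIdTable {
    private final List<String> idsByCode;
    private final Map<String, Short> codeById = new HashMap<>();

    ZoneIdTable() {
        // Sort so the code assignment depends only on the set of ids,
        // not on the iteration order of the returned set.
        idsByCode = new ArrayList<>(ZoneId.getAvailableZoneIds());
        Collections.sort(idsByCode);
        for (int i = 0; i < idsByCode.size(); i++) {
            codeById.put(idsByCode.get(i), (short) i);
        }
    }

    short encode(String zoneId) {
        Short code = codeById.get(zoneId);
        if (code == null) {
            throw new IllegalArgumentException("not in table: " + zoneId);
        }
        return code;
    }

    String decode(short code) {
        return idsByCode.get(code);
    }

    int size() {
        return idsByCode.size();
    }

    public static void main(String[] args) {
        ZoneIdTable table = new ZoneIdTable();
        short code = table.encode("Asia/Singapore");
        System.out.println(table.size() + " ids; Asia/Singapore <-> " + table.decode(code));
    }
}
```

Offset-style ids such as "+08:00" are not part of the available-id set, so a table like this would need a second path for them.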
[jira] [Commented] (PIG-1314) Add DateTime Support to Pig
[ https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13401786#comment-13401786 ] Thejas M Nair commented on PIG-1314:

bq. Are we discussing a user-facing API, or an internal storage mechanism?

Some questions were about the interface, some about internal storage.

bq. Regarding the interface, presenting integers to a user as an interface seems wrong to me.

Converting dates to integers is something the user can optionally do; it is not expected to be a common use case. String representation of date literals will also be supported. Most operations will be on the date type itself, without converting it to int/string.

bq. Excluding certain timezones in the name of efficiency also seems wrong to me.

All timezones supported by JodaTime will be supported. I was only proposing that we encode the timezone info efficiently, at least for the most likely used ones. I think converting the string timezone (location name) to a UTC offset in minutes is one possibility.
[jira] [Commented] (PIG-1314) Add DateTime Support to Pig
[ https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13401676#comment-13401676 ] Russell Jurney commented on PIG-1314:

Jodatime seems to solve these problems. Serializing from a string without a timezone, it does things in a reasonable manner. Serializing things from a string with a timezone, it does things in a reasonable manner.

Are we discussing a user-facing API, or an internal storage mechanism? I'm not clear on which. Regarding the interface, presenting integers to a user as an interface seems wrong to me. Excluding certain timezones in the name of efficiency also seems wrong to me. The point of a datetime type is to add timezones, otherwise we can simply use longs.

As an internal storage mechanism, I'm un-opinionated, so long as all timezones are retained at all times.
[jira] [Commented] (PIG-1314) Add DateTime Support to Pig
[ https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13401613#comment-13401613 ] Thejas M Nair commented on PIG-1314:

bq. As far as I know, either Java builtin Date or Joda DateTime uses millisecond-shift (stored in a long integer variable) from the midnight UTC, which is not exactly the Unix time.

Yes, as you noted, the difference is that a Unix timestamp can store up to +/- 292 billion years, while Joda DateTime supports only +/- 292 million years. Which should be sufficient for most practical purposes! :)

bq. The time zone only determines the ISO time string,

It also affects the field values (getDayOfWeek(), getHourOfDay(), etc.). In your data, you can have dates belonging to different timezones, and users might want to retain that information. An example use case where the timezone also needs to be stored: if you want to analyze how many people come to a global website during their morning hours, you want getHourOfDay() to return the hour as per the local timezone.

We need an efficient way to serialize the timezone along with the long. Can you propose something? (Maybe just make it efficient for the 256 most 'popular' timezones and store it in a byte. And not have the byte for UTC. For other timezones, add a timezone string?)

bq. When we need to convert the DateTime object to Unix time string, we may use the default time zone of the Pig environment

If the date field has the timezone value in it, we don't have to rely on the default time zone to convert to a Unix timestamp (assuming that is what you meant by 'unix time *string*'). But for UDFs like DateTime ToDate(String s), where the timezone might not be specified, we need a default timezone. I think we should use the default timezone on the Pig client machine. Using the default time zone on each task tracker node can lead to a nightmare in debugging if one of the nodes happens to have a different timezone. We should allow the user to set a default timezone using a Pig property.

bq. We probably need one more UDF String ToString(DateTime d, String format, String timezone)

Having the timezone argument in this call is necessary only if the user wants to print the time for a different timezone. This is useful, but not mandatory.

bq. Since the ISO duration is non-negative (Please correct me if I'm wrong), we need to SubtractDuration as well.

Yes, you are right. I could not find any references to negative values in ISO durations. Let's add SubtractDuration.

Trivia from Wikipedia: a 64-bit Unix timestamp, in the negative direction, goes back more than twenty times the age of the universe.
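One way to picture "serialize the timezone along with the long" from the comment above is a fixed 10-byte record: epoch milliseconds followed by the offset in minutes. This is a hedged sketch, not Pig's actual binary format; it uses `java.time` rather than Joda-Time, stores an offset instead of a zone name, and the class `DateTimeWire` is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.time.Instant;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;

// Hypothetical wire format: 8 bytes of epoch millis + 2 bytes of offset minutes.
class DateTimeWire {
    static byte[] write(OffsetDateTime value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeLong(value.toInstant().toEpochMilli());
            // Sub-minute offset parts are dropped by the integer division.
            out.writeShort(value.getOffset().getTotalSeconds() / 60);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static OffsetDateTime read(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            long millis = in.readLong();
            short offsetMinutes = in.readShort();
            return Instant.ofEpochMilli(millis)
                    .atOffset(ZoneOffset.ofTotalSeconds(offsetMinutes * 60));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        OffsetDateTime visit = OffsetDateTime.parse("2012-06-25T09:30:00+08:00");
        OffsetDateTime restored = read(write(visit));
        // The local hour-of-day survives the round trip because the offset does.
        System.out.println(restored + " hour=" + restored.getHour());
    }
}
```

A variant that omits the two offset bytes for UTC values, as floated in the comment, would need a marker to tell the two record shapes apart.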
[jira] [Commented] (PIG-1314) Add DateTime Support to Pig
[ https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13401162#comment-13401162 ] Russell Jurney commented on PIG-1314:

Whatever the format is, I think we should serialize/persist DateTimes in a way that the timezone stays with the datetime.
[jira] [Commented] (PIG-1314) Add DateTime Support to Pig
[ https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13401155#comment-13401155 ] Zhijie Shen commented on PIG-1314:

Dear Thejas and Russell,

{quote}
1) Don't persist DateTimes as ints/longs unless you also persist a timezone offset with it somehow (is this possible?).

I forgot about timezone. We need to serialize the timezone information as well, while supporting the same range of dates as JodaTime . With int/long this will not be possible. (Zhijie can you confirm ?)
{quote}

As far as I know, both the Java builtin Date and Joda DateTime use a millisecond shift (stored in a long integer variable) from midnight UTC, which is not exactly the Unix time. Importantly, the millisecond shift has nothing to do with the time zone. For example, both

new DateTime(922337201704319L, DateTimeZone.UTC).getMillis();

and

new DateTime(922337201704319L, DateTimeZone.forID("Asia/Singapore")).getMillis();

will return the same value, that is, 922337201704319L. The time zone only determines the ISO time string, such that the two DateTime objects will output different ISO time strings when toString() is called. Hence I think the long variable which represents the millisecond shift is good for internal serialization. When we need to convert the DateTime object to Unix time string, we may use the default time zone of the Pig environment (I'm still working on this. Please let me know how you think the Pig-wide time zone should be set.) or the user-defined time zone (We probably need one more UDF String ToString(DateTime d, String format, String timezone)).

As to Pig DateTime, the internal Joda DateTime object is created either with the long variable of the millisecond shift or with an ISO time string. Initialization with a long variable (from Long.MIN_VALUE to Long.MAX_VALUE) has no range problem when getMillis() is called, since the result also ranges from Long.MIN_VALUE to Long.MAX_VALUE. When initialized with an ISO time string, the Joda DateTime object only accepts a year in the range [-292275054, 292278993], such that the corresponding millisecond shift is also within [Long.MIN_VALUE, Long.MAX_VALUE]. In summary, the range will be fine when a long is used for serialization. Please correct me if I'm wrong. Thanks a lot!

{quote}
2) Consider using jodatime/ISO8601 durations for date math, as a separate type. i.e. If this extends scope too far, save it for later. http://en.wikipedia.org/wiki/ISO_8601#Durations

+1 . This is much cleaner. Lets use replace the Add* functions with just AddDuration . For example AddDuration(d1, "P3Y"), would return d1 + 3 years.
{quote}

+1. In this way, it is more flexible for users to define the amount of time to add/subtract. Since the ISO duration is non-negative (Please correct me if I'm wrong), we need SubtractDuration as well.
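The claim above, that the millisecond value is independent of the time zone while the printed ISO string is not, can be reproduced with the JDK's `java.time` as a stand-in for Joda-Time. The class `MillisAreZoneFree` and its helper `at` are hypothetical names used only for this sketch; the millisecond value is the one from the comment.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;

// Hypothetical demo: one instant viewed in two zones.
class MillisAreZoneFree {
    static ZonedDateTime at(long epochMillis, String zoneId) {
        return Instant.ofEpochMilli(epochMillis).atZone(ZoneId.of(zoneId));
    }

    public static void main(String[] args) {
        long millis = 922337201704319L;
        ZonedDateTime utc = at(millis, "UTC");
        ZonedDateTime singapore = at(millis, "Asia/Singapore");

        // The underlying instant is identical in both views.
        System.out.println(utc.toInstant().toEpochMilli() == singapore.toInstant().toEpochMilli());

        // Only the rendered local date-time and zone differ.
        System.out.println(utc);
        System.out.println(singapore);
    }
}
```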
[jira] [Commented] (PIG-1314) Add DateTime Support to Pig
[ https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13401107#comment-13401107 ] Thejas M Nair commented on PIG-1314:

bq. 1) Don't persist DateTimes as ints/longs unless you also persist a timezone offset with it somehow (is this possible?).

I forgot about timezone. We need to serialize the timezone information as well, while supporting the same range of dates as JodaTime. With int/long this will not be possible. (Zhijie, can you confirm?)

bq. 2) Consider using jodatime/ISO8601 durations for date math, as a separate type. i.e. If this extends scope too far, save it for later. http://en.wikipedia.org/wiki/ISO_8601#Durations

+1. This is much cleaner. Let's replace the Add* functions with just AddDuration. For example, AddDuration(d1, "P3Y") would return d1 + 3 years.
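The AddDuration idea from the comment above could behave roughly like the sketch below. It is not the Pig UDF: it uses `java.time.Period` in place of Joda-Time, handles only date-based ISO 8601 periods such as "P3Y" or "P1M2D" (a time part like "PT1H" would need `java.time.Duration`), and the class and method names are hypothetical.

```java
import java.time.Period;
import java.time.ZonedDateTime;

// Hypothetical helpers mirroring the proposed AddDuration and its inverse.
class DurationMath {
    static ZonedDateTime addDuration(ZonedDateTime d, String isoPeriod) {
        return d.plus(Period.parse(isoPeriod));
    }

    static ZonedDateTime subtractDuration(ZonedDateTime d, String isoPeriod) {
        return d.minus(Period.parse(isoPeriod));
    }

    public static void main(String[] args) {
        ZonedDateTime d1 = ZonedDateTime.parse("2012-06-25T10:00:00Z");
        System.out.println(addDuration(d1, "P3Y"));       // d1 + 3 years
        System.out.println(subtractDuration(d1, "P3Y"));  // d1 - 3 years
    }
}
```

A single function taking a duration string replaces the family of AddYears/AddMonths/AddDays helpers, at the cost of parsing the string on each call.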
[jira] [Commented] (PIG-1314) Add DateTime Support to Pig
[ https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13401081#comment-13401081 ] Russell Jurney commented on PIG-1314:

A couple comments:

1) Don't persist DateTimes as ints/longs unless you also persist a timezone offset with it somehow (is this possible?). Persisting timezones is one of the key benefits of a DateTime type in my opinion. At Hadoop scale you are often dealing with events from different sites/locations. DateTime needs timezone, or we can just use long/unix time.

2) Consider using jodatime/ISO8601 durations for date math, as a separate type. i.e. If this extends scope too far, save it for later. http://en.wikipedia.org/wiki/ISO_8601#Durations

Although it may be inefficient, I would encourage an ISO8601 string representation during serialization.
[jira] [Commented] (PIG-1314) Add DateTime Support to Pig
[ https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13400705#comment-13400705 ] Thejas M Nair commented on PIG-1314:

bq. what does DateTime DateAdd(DateTime d1) mean? Adding datetime based on the current time?

Discussed this with Daniel. I think it makes sense to replace this with different functions -

// add the number of years specified in the years param to the DateTime date.
// The years param can be positive or negative
AddYears(DateTime date, int years);

Similarly we should have AddMonths, AddDays, AddHours ..
[jira] [Commented] (PIG-1314) Add DateTime Support to Pig
[ https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13400299#comment-13400299 ]
Thejas M Nair commented on PIG-1314:

bq. 1. Pig can also import into and export from HBase storage, which also doesn't have the primitive DateTime. Throw exception in this case as well, correct?

Yes. The exception should be thrown from HBaseStorage.

bq. if we conclude the design for Avro, we should keep to it for the others.

Please note that Pig does not have a way of knowing whether the format will support datetime. The behavior will be controlled by the storage func implementation. But for the ones that are part of the Pig codebase, I think we should throw an exception.

bq. 3. DateTime is serialized as a Long value (Unix timestamp) when it is necessary.

JodaTime supports milliseconds as well. Will we be able to convert all values within the limits of a JodaTime date into a long?

bq. the output datatype of DiffDate(DateTime d1, DateTime d2) should use long instead of int, because the diff may be too large for int range to cover.

Makes sense, we should use a type that is appropriate for the range.

bq. what does DateTime DateAdd(DateTime d1) mean? Adding datetime based on the current time?

Not sure. Daniel, do you know?

bq. we allow explicit cast between datetime and string, correct? Similarly, do we allow explicit cast between datetime and long/int (representing unix timestamp)?

Yes, we should support explicit casts between these types, though conversion to int might not succeed for all datetime values.
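Editorial sketch of the cast question raised in the comment above: a datetime serialized as a long count of milliseconds round-trips cleanly, while a cast to an int count of seconds can fail. The class and method names are hypothetical, and java.time.Instant is a stand-in for the Joda-Time type under discussion.

```java
import java.time.Instant;

// Hypothetical helper class; not Pig's actual cast implementation.
final class EpochCastSketch {
    private EpochCastSketch() {}

    // Serializes an instant as milliseconds since 1970-01-01T00:00:00Z.
    static long toMillis(Instant t) {
        return t.toEpochMilli();
    }

    // Restores an instant from a millisecond count.
    static Instant fromMillis(long millis) {
        return Instant.ofEpochMilli(millis);
    }

    // Explicit cast to an int count of seconds.
    // Math.toIntExact throws ArithmeticException when the value does not fit in an int.
    static int toUnixSecondsInt(Instant t) {
        return Math.toIntExact(t.getEpochSecond());
    }
}
```

The largest instant that fits in an int count of seconds is 2038-01-19T03:14:07Z; one second later the int cast in this sketch throws, which is one way to surface the "conversion to int might not succeed" case instead of silently wrapping around.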
[jira] [Commented] (PIG-1314) Add DateTime Support to Pig
[ https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13292834#comment-13292834 ]
Zhijie Shen commented on PIG-1314:

{quote} Avro might store DateTimes as an ISO string? {quote}

It's possible, but there seems to be one problem. If we store a datetime as an ISO string, how do we determine whether a string is just a string or a datetime when it is loaded? One more issue is that it's good to have all the I/O targets that do not support datetime handle the I/O process uniformly. Hence, if we settle the design for Avro, we should keep to it for the others.
[jira] [Commented] (PIG-1314) Add DateTime Support to Pig
[ https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13291961#comment-13291961 ]
Russell Jurney commented on PIG-1314:

Avro might store DateTimes as an ISO string?
[jira] [Commented] (PIG-1314) Add DateTime Support to Pig
[ https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13288680#comment-13288680 ]
Thejas M Nair commented on PIG-1314:

bq. When adding the DateTime type for Pig, we need to take care of the I/O with AVRO, which still doesn't support the Date/Time type.

StoreFuncs that write in Avro format will need to throw an exception if the schema being stored contains a datetime type. That will force users to serialize datetime as some other type. As long as we are not breaking existing Pig queries that don't use the datetime type, we should be fine. Avro is just one of the many formats.

Regarding AugmentBaseDataVisitor, that is used for example generation (see [sigmod paper on illustrate feature | http://infolab.stanford.edu/~olston/publications/sigmod09.pdf] for details). For example, if there is no value of col1 in the sample that satisfies "col1 > 0", a value of col1 > 0 is generated. This will be useful for the datetime type as well. To have a more realistic value generated (similar to the values in the input), I think we should increment/decrement the smallest field that is non-zero. For example, if the millisecond and second fields are 0 but the hour field is non-zero, increment that. If all time parts are 0 but the day of month is not, increment that.

In the case of boolean, as we don't support > or < operations, these functions do not make sense.

Thanks for bringing this up. I had forgotten about this use case. We should add a few unit tests for example generation that involve datetime.
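Editorial sketch of the "increment/decrement the smallest non-zero field" rule described in the comment above. It assumes the rule walks from milliseconds up to the day of month; the class and method names are hypothetical (they only echo GetSmallerValue and GetLargerValue), and java.time.LocalDateTime stands in for the Joda-Time type.

```java
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;

// Hypothetical sketch; not the AugmentBaseDataVisitor code.
final class ExampleValueSketch {
    private ExampleValueSketch() {}

    // Finds the finest-grained field that is non-zero, checking from
    // milliseconds upwards. Sub-millisecond precision is ignored on purpose.
    static ChronoUnit smallestNonZeroUnit(LocalDateTime v) {
        if (v.getNano() / 1_000_000 != 0) return ChronoUnit.MILLIS;
        if (v.getSecond() != 0) return ChronoUnit.SECONDS;
        if (v.getMinute() != 0) return ChronoUnit.MINUTES;
        if (v.getHour() != 0) return ChronoUnit.HOURS;
        // Day of month is never zero, so it is the fallback when all time parts are zero.
        return ChronoUnit.DAYS;
    }

    // Returns a value slightly larger than v, stepping by one of its smallest non-zero unit.
    static LocalDateTime getLargerValue(LocalDateTime v) {
        return v.plus(1, smallestNonZeroUnit(v));
    }

    // Returns a value slightly smaller than v, stepping by one of its smallest non-zero unit.
    static LocalDateTime getSmallerValue(LocalDateTime v) {
        return v.minus(1, smallestNonZeroUnit(v));
    }
}
```

With this rule, an input such as 2012-06-04T13:00:00 produces 14:00:00 rather than 13:00:00.001, so generated example rows resemble the granularity of the sampled data.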
[jira] [Commented] (PIG-1314) Add DateTime Support to Pig
[ https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13288203#comment-13288203 ]
Zhijie Shen commented on PIG-1314:

One more issue needs to be clarified. In the AugmentBaseDataVisitor class there are two functions, Object GetSmallerValue(Object v) and Object GetLargerValue(Object v). If v is a numeric value, it is increased or decreased by one; if v is a byte array, it is increased or decreased by one byte. What should we do if v is a datetime? I vote for returning null, and am looking forward to the community's opinions. By the way, what about the case where v is a boolean, which seems not to be handled?
[jira] [Commented] (PIG-1314) Add DateTime Support to Pig
[ https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13284603#comment-13284603 ]
Thejas M Nair commented on PIG-1314:

CURRENT_TIME() might be a more intuitive alias for DATETIME(NOW). I think we can consider adding support for DATE and CURRENT_TIMESTAMP() as a next step after adding DATETIME. We can focus on DATETIME in this jira.

I also had a look at the timestamp datatype that was added to Hive, to see if it will be interoperable (through HCatalog). The only difference is that the Hive timestamp type supports storing up to nanosecond precision, while JodaTime supports only up to millisecond precision. Nanoseconds are not likely to be used in most cases, so losing that precision when converting a Hive timestamp to a Pig datetime should be OK in most cases. The range of years supported in both cases is also approximately the same.
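Editorial sketch of the precision loss mentioned in the comment above, when a nanosecond-precision timestamp is converted to a millisecond-precision datetime. The class and method names are hypothetical, and java.time.Instant stands in for both the Hive and the Joda-Time values; the actual conversion path through HCatalog is not shown in this thread.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

// Hypothetical sketch of dropping sub-millisecond digits.
final class PrecisionSketch {
    private PrecisionSketch() {}

    // Keeps everything down to the millisecond and discards the rest.
    static Instant toMillisecondPrecision(Instant nanoPrecise) {
        return nanoPrecise.truncatedTo(ChronoUnit.MILLIS);
    }

    // Number of nanoseconds that the truncation throws away.
    static int lostNanos(Instant nanoPrecise) {
        return nanoPrecise.getNano() % 1_000_000;
    }
}
```

Truncation (rather than rounding) is an assumption made here for simplicity; it guarantees the converted value never moves past the original instant.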
[jira] [Commented] (PIG-1314) Add DateTime Support to Pig
[ https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13284592#comment-13284592 ]
Russell Jurney commented on PIG-1314:

"DATETIME" makes sense, but "TIMESTAMP" is a good (simple) alias for DATETIME(NOW). "DATE" is a good alias for a date-truncated DATETIME. I'm not sure if you would want to implement these in Pig... as there is clearly less utility than in a database, where for instance a TIMESTAMP can be updated whenever a field is written or updated. Maybe "DATE" and not "TIMESTAMP," but only as an afterthought?
[jira] [Commented] (PIG-1314) Add DateTime Support to Pig
[ https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13284575#comment-13284575 ]
Thejas M Nair commented on PIG-1314:

bq. One quick issue: we need to give a name to the new type. We are supposed to use "DATETIME", correct? Or "DATE", "TIMESTAMP"?

"datetime" makes sense when it has both date and time (hrs, mins, secs) parts to it. The problem with using a (Unix) timestamp is that the date range is limited: a signed 32-bit count of seconds reaches only about 68 years past 1970. Using JodaTime, we will be able to support a much larger date range than a timestamp.
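Editorial arithmetic check on the range comparison in the comment above. It assumes "Unix timestamp" means a signed 32-bit count of seconds since 1970 and that the datetime type stores a signed 64-bit count of milliseconds; both are assumptions about the representations being compared, and the class name is hypothetical.

```java
// Hypothetical sketch: how many years a given number of seconds spans.
final class UnixRangeSketch {
    private UnixRangeSketch() {}

    // Average Gregorian year length (365.2425 days) in seconds.
    static final double SECONDS_PER_YEAR = 365.2425 * 24 * 60 * 60;

    // Converts a span in seconds to an approximate number of years.
    static double yearsCoveredBy(long seconds) {
        return seconds / SECONDS_PER_YEAR;
    }

    // Years reachable after 1970 with a signed 32-bit count of seconds.
    static double int32SecondsRangeYears() {
        return yearsCoveredBy(Integer.MAX_VALUE);
    }

    // Years reachable after 1970 with a signed 64-bit count of milliseconds.
    static double int64MillisRangeYears() {
        return yearsCoveredBy(Long.MAX_VALUE / 1000);
    }
}
```

Under these assumptions the 32-bit seconds counter covers roughly 68 years in each direction from 1970, while a 64-bit millisecond counter covers on the order of hundreds of millions of years.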
[jira] [Commented] (PIG-1314) Add DateTime Support to Pig
[ https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13284392#comment-13284392 ]
Zhijie Shen commented on PIG-1314:

One quick issue: we need to give a name to the new type. We are supposed to use "DATETIME", correct? Or "DATE", "TIMESTAMP"?
[jira] [Commented] (PIG-1314) Add DateTime Support to Pig
[ https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13284059#comment-13284059 ]
Russell Jurney commented on PIG-1314:

I concur about JODA. So far as I know, you can't even parse ISO times with Java builtins without using javax.xml.bind.DatatypeConverter, and it is ugly and slow.
[jira] [Commented] (PIG-1314) Add DateTime Support to Pig
[ https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13244985#comment-13244985 ]
Zhijie Shen commented on PIG-1314:

Ah, I forgot to do it. Public now :-)
[jira] [Commented] (PIG-1314) Add DateTime Support to Pig
[ https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13244980#comment-13244980 ]
Prashant Kommireddi commented on PIG-1314:

Thanks Zhijie. Can you please make it public?
[jira] [Commented] (PIG-1314) Add DateTime Support to Pig
[ https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13244979#comment-13244979 ]
Zhijie Shen commented on PIG-1314:

I've pasted the proposal to the official website: http://www.google-melange.com/gsoc/proposal/review/google/gsoc2012/zjshen/21002

Any comments are welcome, so that I can improve the proposal in the remaining days.
[jira] [Commented] (PIG-1314) Add DateTime Support to Pig
[ https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238187#comment-13238187 ]
Daniel Dai commented on PIG-1314:

I would like to mentor this.
[jira] [Commented] (PIG-1314) Add DateTime Support to Pig
[ https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238044#comment-13238044 ]
Zhijie Shen commented on PIG-1314:

Coincidentally, I'm the person who got Boolean working :-) Daniel helped me a lot in working out that issue, so if he'd like to mentor this one, that would also be awesome.
[jira] [Commented] (PIG-1314) Add DateTime Support to Pig
[ https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238037#comment-13238037 ]
Russell Jurney commented on PIG-1314:

I am happy to help with questions about the DateTime UDFs, but I do not remember the internals of my attempt to add Boolean in preparation for DateTime. I suggest the committer who got Boolean working would be a good candidate?
[jira] [Commented] (PIG-1314) Add DateTime Support to Pig
[ https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13237884#comment-13237884 ]
Zhijie Shen commented on PIG-1314:

Hi folks,

Below is my proposal draft. Any comments are welcome :-)

==
Proposal Title: Adding the Datetime Type as a Primitive for Pig
Student Name: Zhijie Shen
Student E-mail: zjshe...@gmail.com
Organization/Project: Apache Software Foundation - Pig
Assigned Mentor: Daniel Dai / Russell Jurney

Proposal Abstract:

Apache Pig is a platform for analyzing large data sets based on Hadoop. Currently Pig does not support a primitive datetime type [1], which is a desired feature. In this proposal, I explain my plan to implement the primitive datetime type, including the details of my solution and schedule. Additionally, I briefly introduce my background and my motivation for applying to GSoC'12.

Detailed Description:

1. Understanding of the Project

1.1 What is Apache Pig?

Apache Pig is a platform for analyzing large data sets. Notably, at Yahoo! 40% of all Hadoop jobs are run with Pig [5]. Pig has its own dataflow language, named Pig Latin, which encapsulates map/reduce jobs step-by-step and offers relational primitives such as LOAD, FOREACH, GROUP, FILTER and JOIN. Pig provides many built-in functions, but also allows users to write their own user-defined functions (UDFs) for particular purposes. There are more benefits: Pig can operate on plain files directly without any schema information; it has a flexible, nested data model, which is more compatible with that of major programming languages; and it provides a debugging environment.

1.2 Why is a primitive datetime type required?

Datetime is a conventional data type in many database management systems as well as programming languages. Within the Hadoop ecosystem, Hive, which is an analog of Pig, also supports a primitive datetime type (timestamp, actually). In contrast, Pig does not fully support this type. Currently, users can only use the string type for datetime data, and rely on UDFs that take datetime strings. However, Pig is supposed to primarily parse log data, and most log data has attributes of the datetime type. Consequently, it is desirable for Pig to support the datetime type as a primitive. By doing so, we can expect the following benefits: a more compact serialized format, support for conventional operators (+/-/==/!=/), a dedicated faster comparator, sortability, fewer runtime conversions from string, and relieving users from deciding the input datetime string format.

2. Roadmap of Implementing the New Feature

2.1 To Do List

2.1.1 Adding Support in Antlr Parser

Pig Latin supports assigning data types explicitly, so the parser must recognize the “datetime” keyword and some constants, such as “now()” and “today()”. The related syntax needs to be added to 5 antlr scripts: AliasMasker.g, AstPrinter.g, AstValidator.g, LogicalPlanGenerator.g, QueryParser.g.

2.1.2 Adding Datetime as a Primitive

The datetime type should be added to the DataType class, and the basic conversions between it and other data types need to be defined. Previously, the internal data structure relied on the Joda datetime data type, which is more powerful than java.util.Date, but much easier to use than java.util.Calendar. Hence it is wise to keep this convention. Moreover, note that implicit type casts from/to the datetime type are not allowed. I also need to change the LoadCaster and StoreCaster interfaces to include bytesToDateTime/toBytes(DateTime) methods, and add the details to the classes that implement these two interfaces. In addition, I need to override the +/-/==/!=/ operators for the datetime type, mapping them to some builtin EvalFuncs. The TypeCheckingExpVisitor class needs to be modified as well to support validation of the datetime type. One important issue is that, according to my previous experience, the data type related code in Pig is widely spread, so I need to be careful that all the related parts are touched.

2.1.3 Refactoring of the Datetime Related UDFs

Thanks to Russell Jurney for having implemented a number of useful datetime related UDFs, which can be utilized for the primitive datetime type as well. Part of the UDF classes located in the “org.apache.pig.piggybank.evaluation.datetime” package under the “contrib” folder need to be moved to the “org.apache.pig.builtin” package under the “src” folder. Below are the related UDFs:

int DiffDate(DateTime d1, DateTime d2)
int YearsBetween(DateTime d1, DateTime d2)
int MonthsBetween(DateTime d1, DateTime d2)
int DaysBetween(DateTime d1, DateTime d2)
int HoursBetween(DateTime d1, DateTime d2)
int MinutesBetween(DateTime d1, DateTime d2)
int SecondsBetween(DateTime d1, DateTime d2)
int GetYear(DateTime d1)
int GetMonth(DateTime d1)
int GetDate(DateTime d1)
int GetHour(Dat
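Editorial sketch of the "between" UDFs listed in the proposal above, written as plain static methods rather than Pig EvalFuncs. The class name, the lower-case method names, the (start, end) argument order, and the long return type are all assumptions for illustration; java.time is used as a stand-in for Joda-Time, and the piggybank implementations may differ.

```java
import java.time.ZonedDateTime;
import java.time.temporal.ChronoUnit;

// Hypothetical sketch; not the piggybank or builtin UDF code.
final class BetweenSketch {
    private BetweenSketch() {}

    // Whole days from start to end; negative when end is before start.
    static long daysBetween(ZonedDateTime start, ZonedDateTime end) {
        return ChronoUnit.DAYS.between(start, end);
    }

    // Whole calendar months from start to end.
    static long monthsBetween(ZonedDateTime start, ZonedDateTime end) {
        return ChronoUnit.MONTHS.between(start, end);
    }

    // Milliseconds from start to end; this is the case that can overflow an int.
    static long millisBetween(ZonedDateTime start, ZonedDateTime end) {
        return ChronoUnit.MILLIS.between(start, end);
    }

    // Field accessor in the style of GetYear(DateTime d1).
    static int getYear(ZonedDateTime d) {
        return d.getYear();
    }
}
```

Returning long from the difference functions avoids overflow for fine-grained units: a span of a few weeks in milliseconds already exceeds the int range.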
[jira] [Commented] (PIG-1314) Add DateTime Support to Pig
[ https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13237885#comment-13237885 ]
Zhijie Shen commented on PIG-1314:

By the way, who would like to mentor this issue?
[jira] [Commented] (PIG-1314) Add DateTime Support to Pig
[ https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13232026#comment-13232026 ] Daniel Dai commented on PIG-1314: Looking forward to your proposal!
[jira] [Commented] (PIG-1314) Add DateTime Support to Pig
[ https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13232004#comment-13232004 ] Zhijie Shen commented on PIG-1314: GSoC is back! I'd like to apply it with this issue. The proposal draft will come in following days:-)
[jira] [Commented] (PIG-1314) Add DateTime Support to Pig
[ https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100761#comment-13100761 ] Daniel Dai commented on PIG-1314: That will be great. Here is a specification I wrote: https://cwiki.apache.org/confluence/display/PIG/DateTime+type+specification. Take a look and we can discuss.
[jira] [Commented] (PIG-1314) Add DateTime Support to Pig
[ https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100380#comment-13100380 ] Zhijie Shen commented on PIG-1314: I've solved the related issue PIG-1429. If nobody is currently working on this issue, I volunteer to investigate into it.
[jira] [Commented] (PIG-1314) Add DateTime Support to Pig
[ https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13027250#comment-13027250 ] Jeremy Hanna commented on PIG-1314: I think this would be nice also when outputting from pig scripts using DBStorage to an RDBMS - to be able to serialize properly to the db's timestamp or date type (without extra UDF work).