[jira] [Commented] (PIG-1314) Add DateTime Support to Pig

2012-10-15 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13476519#comment-13476519
 ] 

Dmitriy V. Ryaboy commented on PIG-1314:


A chunk of this is committed, and it's not clear what's left to do. Can we 
close this and create a new ticket for the remaining work?

> Add DateTime Support to Pig
> ---
>
> Key: PIG-1314
> URL: https://issues.apache.org/jira/browse/PIG-1314
> Project: Pig
>  Issue Type: Bug
>  Components: data
>Affects Versions: 0.7.0
>Reporter: Russell Jurney
>Assignee: Zhijie Shen
>  Labels: gsoc2012
> Attachments: joda_vs_builtin.zip, PIG-1314-1.patch, PIG-1314-2.patch, 
> PIG-1314-3.patch, PIG-1314-4.patch, PIG-1314-5.patch, PIG-1314-6.patch, 
> PIG-1314-7.patch
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> Hadoop/Pig are primarily used to parse log data, and most logs have a 
> timestamp component.  Therefore Pig should support dates as a primitive.
> Can someone familiar with adding types to pig comment on how hard this is?  
> We're looking at doing this, rather than use UDFs.  Is this a patch that 
> would be accepted?
> This is a candidate project for Google summer of code 2012. More information 
> about the program can be found at 
> https://cwiki.apache.org/confluence/display/PIG/GSoc2012

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-1314) Add DateTime Support to Pig

2012-08-28 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443630#comment-13443630
 ] 

Thejas M Nair commented on PIG-1314:


Yes, that was not intentional. Deleted JobControlCompiler.java.orig in svn.




[jira] [Commented] (PIG-1314) Add DateTime Support to Pig

2012-08-28 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443614#comment-13443614
 ] 

Julien Le Dem commented on PIG-1314:


Hi Thejas,
this commit added JobControlCompiler.java.orig which I suspect is not what you 
intended.
http://svn.apache.org/viewvc/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java.orig?view=log&pathrev=1376800
Could you double check?
Thanks, Julien



[jira] [Commented] (PIG-1314) Add DateTime Support to Pig

2012-08-23 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13440860#comment-13440860
 ] 

Zhijie Shen commented on PIG-1314:
--

Hi Thejas, let me do that.





[jira] [Commented] (PIG-1314) Add DateTime Support to Pig

2012-08-23 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13440858#comment-13440858
 ] 

Thejas M Nair commented on PIG-1314:


We also need some test cases that set the timezone property. This might 
not be easy to do in the e2e framework, so unit test cases are a better candidate 
for this. Please let me know if you need any help.






[jira] [Commented] (PIG-1314) Add DateTime Support to Pig

2012-08-23 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13440851#comment-13440851
 ] 

Thejas M Nair commented on PIG-1314:


PIG-1314-7.patch committed to trunk! Thanks Zhijie.

We need to update the documentation regarding this change. Can you please 
upload a new patch for that? To see the generated docs, run - ant 
-Dforrest.home= docs. The files to be edited are 
under - trunk/src/docs/src/documentation/ .

We should also add a few end-to-end test cases for datetime. See 
https://cwiki.apache.org/confluence/display/PIG/HowToTest#HowToTest-EndtoendTesting . 
We should have a few queries that do some of the basic operations on datetime 
values, and queries that use order-by, group and join on date fields. 
These can be submitted as multiple patches.





[jira] [Commented] (PIG-1314) Add DateTime Support to Pig

2012-08-16 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13436528#comment-13436528
 ] 

Zhijie Shen commented on PIG-1314:
--

{quote}
I have one suggestion - add getWeeks and weeksBetween, if it isn't 
inconvenient. I think Jodatime can do this. It is useful when dealing in weeks.
{quote}

Yes, a week field should be useful. In addition, I think it's better to add 
getWeekYear as well, because using the week of the year alone may sometimes be 
ambiguous. For example, both "2008-12-31" and "2009-01-01" are in week 1 of 
weekyear 2009, though the two dates are in two different years.
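
The weekyear example above can be checked with a small sketch. It uses the JDK's java.time classes (Java 8 or later) rather than Joda-Time, purely so that it runs without extra libraries; the class and method names are made up for illustration and are not the UDF names proposed in this ticket.

```java
import java.time.LocalDate;
import java.time.temporal.IsoFields;

// Sketch only: java.time stands in for Joda-Time here.
class WeekYearSketch {
    // ISO-8601 week number within the week-based year.
    static int weekOfWeekYear(LocalDate date) {
        return date.get(IsoFields.WEEK_OF_WEEK_BASED_YEAR);
    }

    // ISO-8601 week-based year; it can differ from the calendar year near January 1.
    static int weekYear(LocalDate date) {
        return date.get(IsoFields.WEEK_BASED_YEAR);
    }

    public static void main(String[] args) {
        for (String s : new String[] {"2008-12-31", "2009-01-01"}) {
            LocalDate d = LocalDate.parse(s);
            System.out.println(s + ": calendar year " + d.getYear()
                    + ", week " + weekOfWeekYear(d)
                    + " of weekyear " + weekYear(d));
        }
    }
}
```

Both dates report week 1 of weekyear 2009 while their calendar years differ, which is why a week number is only unambiguous when paired with the weekyear.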

In addition, do you think it is better to rename some time UDFs as follows?

getMonth -> getMonthOfYear
getDay -> getDayOfMonth (do we need getDayOfWeek and getDayOfYear as well?)
getHour -> getHourOfDay
getMinute -> getMinuteOfHour
getSecond -> getSecondOfMinute
getMilliSecond -> getMilliOfSecond

The changes will make UDFs' names longer but clearer.





[jira] [Commented] (PIG-1314) Add DateTime Support to Pig

2012-08-16 Thread Russell Jurney (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13436477#comment-13436477
 ] 

Russell Jurney commented on PIG-1314:
-

I have one suggestion - add getWeeks and weeksBetween, if it isn't 
inconvenient. I think Jodatime can do this. It is useful when dealing in weeks.





[jira] [Commented] (PIG-1314) Add DateTime Support to Pig

2012-08-14 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13434787#comment-13434787
 ] 

Zhijie Shen commented on PIG-1314:
--

{quote}
I believe you should be able to set the default timezone property in PigContext 
constructor, and also let user override the default. In backend, you can access 
the value using something like - 
PigMapReduce.sJobConfInternal.get().get("pig.datetime.default.tz").
{quote}

Thank you, Thejas! Let me investigate this issue.





[jira] [Commented] (PIG-1314) Add DateTime Support to Pig

2012-08-14 Thread Russell Jurney (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13434760#comment-13434760
 ] 

Russell Jurney commented on PIG-1314:
-

I agree with Thejas. The user will want to control the timezone of NOW() 
without having to reconfigure the hadoop cluster/contact the hadoop 
administrator. Setting this on the client is consistent with Pig as a 
client-side technology.





[jira] [Commented] (PIG-1314) Add DateTime Support to Pig

2012-08-14 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13434733#comment-13434733
 ] 

Thejas M Nair commented on PIG-1314:


bq. 2. According to your last response, I'm not clear how the default timezone 
of client can be sent to the server with the code. In my opinion, the default 
timezone should be specified on the server side by configuration, which should 
be taken care of by administrators. How do you think about this.

I believe you should be able to set the default timezone property in PigContext 
constructor, and also let user override the default. In backend, you can access 
the value using something like - 
PigMapReduce.sJobConfInternal.get().get("pig.datetime.default.tz").
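
A rough sketch of the idea, under stated assumptions: java.util.Properties stands in for the job configuration that travels from the client to the backend, and the class and method names are hypothetical. Only the property name "pig.datetime.default.tz" comes from this thread; no Pig classes are used, and the committed patch may do this differently.

```java
import java.time.ZoneId;
import java.util.Properties;

// Sketch only: Properties is a stand-in for the job configuration.
class DefaultTimeZoneSketch {
    // Property name quoted in the comment above.
    static final String TZ_KEY = "pig.datetime.default.tz";

    // Client side: keep a user-supplied value, otherwise record the client's default.
    static void setDefaultIfAbsent(Properties conf, ZoneId clientDefault) {
        if (conf.getProperty(TZ_KEY) == null) {
            conf.setProperty(TZ_KEY, clientDefault.getId());
        }
    }

    // Backend side: read the propagated value rather than the node's own default.
    static ZoneId resolve(Properties conf) {
        String id = conf.getProperty(TZ_KEY);
        return id == null ? ZoneId.systemDefault() : ZoneId.of(id);
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(TZ_KEY, "+08:00"); // user override set before job submission
        setDefaultIfAbsent(conf, ZoneId.systemDefault());
        System.out.println(resolve(conf));
    }
}
```

Because the value is written on the client and only read on the backend, a task node with a different system timezone would still resolve the same default.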






[jira] [Commented] (PIG-1314) Add DateTime Support to Pig

2012-08-03 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13428458#comment-13428458
 ] 

Thejas M Nair commented on PIG-1314:


bq. 1. You've mentioned that we need to propagate the timezone from the client 
to backend, where the udfs get executed. How the timezone should be propagated 
to the backend, which I assume the machine that runs the code? 
Yes
bq. Previously I made the timezone setting in pig.properties, which will be 
loaded when PigServer runs, such that the default timezone will be set. 
Consequently, if a datetime object is created without specifying the timezone, 
the default one will be used. However, do you mean some other way?
It is possible that some of the task nodes might be misconfigured and have a 
different default time zone. In such cases, the results won't be what you want 
and will be very difficult to debug. So the default timezone on the client 
should be used on the nodes as well.

bq. I convert the location-based timezone to the utc-offset one and only use 
utc-offset style internally. Therefore, the aforementioned two equal datetime 
objects will not be mis-treated.
Sounds good.
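
For illustration, converting a location-based timezone to a UTC offset could look like the sketch below. It uses java.time from the JDK instead of Joda-Time, and the class and method names are hypothetical. Note that the offset of a location-based zone depends on the instant, because of daylight saving time.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;

// Sketch only: java.time stands in for Joda-Time here.
class OffsetNormalizationSketch {
    // Resolve a location-based zone to the fixed UTC offset in effect at the given instant.
    static ZoneOffset toOffset(ZoneId zone, Instant at) {
        return zone.getRules().getOffset(at);
    }

    public static void main(String[] args) {
        ZoneId newYork = ZoneId.of("America/New_York");
        Instant summer = Instant.parse("2012-07-03T12:00:00Z");
        Instant winter = Instant.parse("2012-01-03T12:00:00Z");
        System.out.println(toOffset(newYork, summer)); // -04:00
        System.out.println(toOffset(newYork, winter)); // -05:00
    }
}
```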






[jira] [Commented] (PIG-1314) Add DateTime Support to Pig

2012-07-12 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13413472#comment-13413472
 ] 

Zhijie Shen commented on PIG-1314:
--

Hi Thejas,

Thanks for your review. I'll check out your comments.





[jira] [Commented] (PIG-1314) Add DateTime Support to Pig

2012-07-11 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13412452#comment-13412452
 ] 

Thejas M Nair commented on PIG-1314:


Zhijie,
I have added comments on your latest patch in 
https://reviews.apache.org/r/5414/.
Yes, let's focus on test cases now, so that we can get an initial version 
committed.





[jira] [Commented] (PIG-1314) Add DateTime Support to Pig

2012-07-03 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406238#comment-13406238
 ] 

Zhijie Shen commented on PIG-1314:
--

{quote}
But the timezone part and time part of the datetime string should be optional. 
Does jodatime support that?
{quote}

Yes, these two parts are not mandatory. The default time value is 
"00:00:00.000" while the default timezone offset is "+00:00". When the datetime 
object is output as an ISO-format string, the default parts will be filled in 
(e.g., 2012-07-03T00:00:00.000Z).
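
The defaulting behaviour described above can be sketched with the JDK's java.time classes. This is an approximation for illustration and not the Joda-Time code in the patch; the formatter layout and the class and method names are assumptions.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;

// Sketch only: java.time stands in for Joda-Time here.
class OptionalPartsSketch {
    // Date is required; the time part and the offset part are each optional.
    static final DateTimeFormatter INPUT = new DateTimeFormatterBuilder()
            .append(DateTimeFormatter.ISO_LOCAL_DATE)
            .optionalStart().appendLiteral('T').append(DateTimeFormatter.ISO_LOCAL_TIME).optionalEnd()
            .optionalStart().appendOffsetId().optionalEnd()
            .parseDefaulting(ChronoField.HOUR_OF_DAY, 0)    // missing time -> midnight
            .parseDefaulting(ChronoField.OFFSET_SECONDS, 0) // missing offset -> +00:00
            .toFormatter();

    // Always prints milliseconds and the offset; a zero offset is printed as "Z".
    static final DateTimeFormatter OUTPUT =
            DateTimeFormatter.ofPattern("uuuu-MM-dd'T'HH:mm:ss.SSSXXX");

    static String normalize(String text) {
        return OffsetDateTime.parse(text, INPUT).format(OUTPUT);
    }

    public static void main(String[] args) {
        System.out.println(normalize("2012-07-03"));
        System.out.println(normalize("2012-07-03T08:14:19.962+01:00"));
    }
}
```

A date-only input comes back with the default time and offset filled in, while a fully specified string keeps its own time and offset.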





[jira] [Commented] (PIG-1314) Add DateTime Support to Pig

2012-07-03 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13405971#comment-13405971
 ] 

Thejas M Nair commented on PIG-1314:


PigStorage is meant to be a human-readable format. So that is another reason to 
store the timestamp in the ISO string as you suggested. 
Yes, if the timezone is specified in the string, Pig should use that value. But 
the timezone part and time part of the datetime string should be optional. Does 
jodatime support that?






[jira] [Commented] (PIG-1314) Add DateTime Support to Pig

2012-07-03 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13405724#comment-13405724
 ] 

Zhijie Shen commented on PIG-1314:
--

There are some issues with loading/storing Pig data. When storing a DateTime object 
with "Utf8StorageConverter" without using UDFs to convert it to a string, 
should we serialize it as a millis+timezone composite, or output a UTC-style 
datetime string (e.g., 2012-07-03T08:14:19.962+01:00)? The latter behaves 
the same as calling "String ToString(DateTime d)" before storing the 
string. Personally, I like the latter choice, because the data is directly 
readable from the stored files.

On the other hand, if a datetime object is stored in the file as a datetime 
string, when we load it again as a datetime object, should we use the default 
timezone or the one specified in the datetime string (e.g., +01:00 in the 
last example)? I again prefer the second choice. When we use Pig, it is 
possible to do a series of store/load operations to achieve some goal, and the 
timezone information needs to be preserved. For example, let's assume +08:00 is the 
default timezone. A datetime object whose individual timezone is -04:00 is 
stored as a string, which will have -04:00 as its suffix. When the string is loaded 
as a datetime object for further processing, we'd better keep the previously 
used timezone, -04:00, instead of the default one.
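
The two loading choices can be compared with a small sketch, again using java.time from the JDK rather than Joda-Time or Pig's storage classes. The method names are hypothetical, and loadWithDefault is only one possible reading of "use the default timezone" (same instant, shown at the default offset).

```java
import java.time.OffsetDateTime;
import java.time.ZoneOffset;

// Sketch only: models store/load as plain string conversion.
class OffsetRoundTripSketch {
    // "Store": write the value as an ISO-8601 string, offset included.
    static String store(OffsetDateTime value) {
        return value.toString();
    }

    // Preferred "load": keep the offset found in the string.
    static OffsetDateTime load(String text) {
        return OffsetDateTime.parse(text);
    }

    // Alternative "load": same instant, re-expressed at the default offset.
    static OffsetDateTime loadWithDefault(String text, ZoneOffset defaultOffset) {
        return OffsetDateTime.parse(text).withOffsetSameInstant(defaultOffset);
    }

    public static void main(String[] args) {
        OffsetDateTime original = OffsetDateTime.parse("2012-07-03T08:14:19.962-04:00");
        String stored = store(original);
        System.out.println(load(stored));                                 // keeps -04:00
        System.out.println(loadWithDefault(stored, ZoneOffset.ofHours(8))); // shown at +08:00
    }
}
```

Both loaded values denote the same instant, but only the first still carries the -04:00 offset after the store/load round trip.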

What do you think about this? Thanks!






[jira] [Commented] (PIG-1314) Add DateTime Support to Pig

2012-06-28 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13403114#comment-13403114
 ] 

Zhijie Shen commented on PIG-1314:
--

Hi Thejas, I'll take your suggestions. Thanks!





[jira] [Commented] (PIG-1314) Add DateTime Support to Pig

2012-06-27 Thread Russell Jurney (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13402517#comment-13402517
 ] 

Russell Jurney commented on PIG-1314:
-

This sounds good to me.





[jira] [Commented] (PIG-1314) Add DateTime Support to Pig

2012-06-27 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13402467#comment-13402467
 ] 

Thejas M Nair commented on PIG-1314:


bq. Or we temporarily set aside the performance issue right now, and move 
forward to make timezone serialization work by simply serializing the timezone 
id string.
We can add features later, but dropping features later won't be good. In my 
opinion, support for long timezone names is not going to be needed by most 
people. I think we can support them only for creating a DateTime field, but say 
that Pig will not preserve the long name. Pig will only retain the hour+minute 
offset (no seconds and milliseconds!). The hour+minute offset form is portable 
and more likely to be supported by other serialization formats.
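
One possible shape for such a millis-plus-offset encoding is sketched below. 
This is an illustration under stated assumptions, not the format Pig adopted: 
the 10-byte layout and the class name are hypothetical, and java.time is used 
in place of Joda-Time so the example needs no extra jars.

```java
import java.nio.ByteBuffer;
import java.time.Instant;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;

// Sketch only: hypothetical layout of 8 bytes of epoch millis + 2 bytes of offset minutes.
final class OffsetMinutesCodec {
    static byte[] write(OffsetDateTime value) {
        ByteBuffer buffer = ByteBuffer.allocate(Long.BYTES + Short.BYTES);
        buffer.putLong(value.toInstant().toEpochMilli());
        // Any seconds in the offset are dropped; only whole minutes are kept.
        buffer.putShort((short) (value.getOffset().getTotalSeconds() / 60));
        return buffer.array();
    }

    static OffsetDateTime read(byte[] data) {
        ByteBuffer buffer = ByteBuffer.wrap(data);
        long millis = buffer.getLong();
        int offsetMinutes = buffer.getShort();
        ZoneOffset offset = ZoneOffset.ofTotalSeconds(offsetMinutes * 60);
        return Instant.ofEpochMilli(millis).atOffset(offset);
    }
}
```

Note that java.time limits offsets to +/-18:00, which is narrower than the 
range Joda-Time accepts, so a Joda-based version would need its own bounds.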






[jira] [Commented] (PIG-1314) Add DateTime Support to Pig

2012-06-27 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13402061#comment-13402061
 ] 

Zhijie Shen commented on PIG-1314:
--

{quote}
Yes, it will be lossy, but the part that is important for date calculations is 
preserved. The ISO spec only has offset for timezone. I don't think we have to 
allow datetime field to be used for storing location information. Does JodaTime 
preserve the location string ?
{quote}

Yes, I think so. If I get a DateTimeZone object by 
DateTimeZone.forID("Asia/Singapore"), the returned DateTimeZone object doesn't 
change to "+08:00", but keeps "Asia/Singapore". We'd better preserve it, 
because when users want to output the time in a customized format that has "z" 
in the pattern string, the exact timezone can be output.
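
The difference between a named zone and a bare offset in formatted output can 
be sketched as follows. The example uses java.time rather than Joda-Time (an 
assumption, to stay runnable on a plain JDK) and its "VV" pattern letter, which 
prints the zone ID; the class name is hypothetical.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Sketch only: java.time stands in for Joda-Time; names are hypothetical.
final class ZoneIdVersusOffset {
    // 'VV' prints the ID of the zone attached to the value.
    private static final DateTimeFormatter WITH_ZONE_ID =
            DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm VV", Locale.ROOT);

    static String format(Instant instant, ZoneId zone) {
        return WITH_ZONE_ID.format(instant.atZone(zone));
    }
}
```

Formatting the same instant with ZoneId.of("Asia/Singapore") and with 
ZoneOffset.ofHours(8) gives the same wall-clock time but a different zone 
suffix, which is the information that is lost if only the offset is kept.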

{quote}
But won't jodatime support a timezone outside this list, If the user specifies 
a date using the UTC offset format ?
{quote}

Yes, DateTimeZone.forID() also accepts a UTC offset string as input, such as 
"+08:00", though it is not in the list. However, the offset can be any value in 
the range [-23:59:59.999, +23:59:59.999], with millisecond granularity.

We would then need a combined lookup table that maps both canonical timezone 
ids and UTC offsets to their concise representation. Do you have any 
suggestions here? Or should we temporarily set aside the performance issue for 
now, and move forward to make timezone serialization work by simply 
serializing the timezone id string?





[jira] [Commented] (PIG-1314) Add DateTime Support to Pig

2012-06-26 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13401937#comment-13401937
 ] 

Thejas M Nair commented on PIG-1314:


bq. Several time zones may share the same UTC offset, such that when the 
reverse operation is to do, it will be unknown which timezone the UTC offset 
should be converted to.
Yes, it will be lossy, but the part that is important for date calculations is 
preserved. The ISO spec only has an offset for the timezone. I don't think we 
have to allow a datetime field to be used for storing location information. 
Does JodaTime preserve the location string?

bq. I'm not sure whether "getAvailableIDs" returns the same time zone list on 
all machines or is machine-dependent.
It depends on the release/jar 
(http://joda-time.sourceforge.net/tz_update.html). As Pig will be shipping this 
jar to the nodes, it is OK to assume that it will be the same across all nodes 
for a query. So it is safe to rely on the id for intermediate serialization. 
But won't JodaTime support a timezone outside this list if the user specifies 
a date using the UTC offset format?









[jira] [Commented] (PIG-1314) Add DateTime Support to Pig

2012-06-26 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13401884#comment-13401884
 ] 

Zhijie Shen commented on PIG-1314:
--

Hi Thejas and Russell,

I'll do serialization for timezone as well.

{quote}
I think converting the string timezone (location name) to UTC offset in 
minutes, is one possibility.
{quote}

In my opinion, this kind of compression is lossy. Several time zones may share 
the same UTC offset, so when the reverse conversion is needed, it is unknown 
which timezone the UTC offset should be converted back to.
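
A small sketch of why the mapping cannot be reversed, using java.time in place 
of Joda-Time (an assumption, so that it runs on a plain JDK); the class name is 
hypothetical.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;

// Sketch only: java.time stands in for Joda-Time; names are hypothetical.
final class SharedOffset {
    // The UTC offset that the named zone applies at the given instant.
    static ZoneOffset offsetAt(String zoneId, Instant instant) {
        return ZoneId.of(zoneId).getRules().getOffset(instant);
    }
}
```

For an instant in June 2012, both "Asia/Singapore" and "Asia/Shanghai" resolve 
to +08:00, so the offset alone cannot say which zone it came from. A zone with 
daylight saving time, such as "America/New_York", also resolves to different 
offsets in January and in June, which a stored offset cannot express either.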

{quote}
We need an efficient way to serialize timezone along with the long. Can you 
propose something ? (Maybe, just make it efficient for 256 most 'popular' 
timezones and store it a byte. And not have the byte for UTC. For other 
timezones, add a timezone string ?)
{quote}

The time zone classes in both the builtin library and Joda have the function 
"getAvailableIDs", which returns all the available time zone strings. On my 
machine, I got 616 from the builtin time zone class and 558 from the Joda one. 
We could probably have a one-to-one mapping between the time zone strings and 
integer ids stored in short variables. However, the "available" in 
"getAvailableIDs" sounds tricky. I'm not sure whether "getAvailableIDs" returns 
the same time zone list on all machines or is machine-dependent.
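
The one-to-one mapping could look roughly like the sketch below. It is only an 
illustration: it uses java.time's ZoneId.getAvailableZoneIds() in place of the 
Joda or builtin "getAvailableIDs", the class name is hypothetical, and the 
resulting codes are only stable for one version of the time zone data.

```java
import java.time.ZoneId;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch only: a hypothetical zone-ID-to-short table built from the JDK's zone list.
final class ZoneIdTable {
    private final List<String> idsByCode;
    private final Map<String, Short> codesById = new HashMap<>();

    ZoneIdTable() {
        // Sorting gives a deterministic order for a given time zone database version.
        List<String> ids = new ArrayList<>(ZoneId.getAvailableZoneIds());
        Collections.sort(ids);
        if (ids.size() > Short.MAX_VALUE) {
            throw new IllegalStateException("too many zone IDs for a short code");
        }
        idsByCode = ids;
        for (int i = 0; i < ids.size(); i++) {
            codesById.put(ids.get(i), (short) i);
        }
    }

    // Returns -1 for IDs outside the table (for example a raw offset such as "+08:00"),
    // which a caller would have to serialize some other way.
    short encode(String zoneId) {
        Short code = codesById.get(zoneId);
        if (code == null) {
            return -1;
        }
        return code;
    }

    String decode(short code) {
        return idsByCode.get(code);
    }
}
```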






[jira] [Commented] (PIG-1314) Add DateTime Support to Pig

2012-06-26 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13401786#comment-13401786
 ] 

Thejas M Nair commented on PIG-1314:


bq. Are we discussing a user-facing API, or an internal storage mechanism? 
Some questions were about interface, some about internal storage.

bq. Regarding the interface, presenting integers to a user as an interface 
seems wrong to me. 
Converting dates to integers is something a user can optionally do; this is 
not expected to be a common use case. String representations of date literals 
will also be supported. Most operations will be on the date type itself, 
without converting it to int/string.

bq. Excluding certain timezones in the name of efficiency also seems wrong to 
me.
All timezones supported by JodaTime will be supported. I was only proposing 
that we encode the timezone info efficiently, at least for most likely used 
ones. I think converting the string timezone (location name) to UTC offset in 
minutes, is one possibility. 







[jira] [Commented] (PIG-1314) Add DateTime Support to Pig

2012-06-26 Thread Russell Jurney (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13401676#comment-13401676
 ] 

Russell Jurney commented on PIG-1314:
-

Jodatime seems to solve these problems. Serializing from a string without a 
timezone, it does things in a reasonable manner.  Serializing things from a 
string with a timezone, it does things in a reasonable manner.

Are we discussing a user-facing API, or an internal storage mechanism?  I'm not 
clear on which.  Regarding the interface, presenting integers to a user as an 
interface seems wrong to me.  Excluding certain timezones in the name of 
efficiency also seems wrong to me.  The point of a datetime type is to add 
timezones, otherwise we can simply use longs.

As an internal storage mechanism, I'm un-opinionated, so long as all timezones 
are retained at all times.





[jira] [Commented] (PIG-1314) Add DateTime Support to Pig

2012-06-26 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13401613#comment-13401613
 ] 

Thejas M Nair commented on PIG-1314:


bq. As far as I know, either Java builtin Date or Joda DateTime uses 
millisecond-shift (stored in a long integer variable) from the midnight UTC, 
which is not exactly the Unix time. 
Yes, as you noted, the difference is that a unix timestamp can store up to +/- 
292 billion years, while Joda DateTime supports only +/- 292 million years, 
which should be sufficient for most practical purposes! :)

bq. The time zone only determines the ISO time string,
It also affects the field values (getDayOfWeek(), getHourOfDay(), etc.). In 
your data, you can have dates belonging to different timezones, and users might 
want to retain that information. 
An example of a use case where the timezone also needs to be stored: if you 
want to analyze how many people come to a global website during their morning 
hours, you want getHourOfDay() to return the hour as per the local timezone.
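
The effect on field values can be sketched like this, with java.time's 
getHour() standing in for Joda's getHourOfDay() (an assumption, to keep the 
example runnable on a plain JDK); the class name is hypothetical.

```java
import java.time.Instant;
import java.time.ZoneId;

// Sketch only: java.time stands in for Joda-Time; names are hypothetical.
final class LocalHour {
    // Hour of day (0-23) of the instant as seen on a wall clock in the given zone.
    static int hourOfDay(Instant instant, ZoneId zone) {
        return instant.atZone(zone).getHour();
    }
}
```

For the single instant 2012-06-26T01:30:00Z, this returns 1 in UTC, 9 in 
Asia/Singapore, and 21 in America/New_York, so a "morning visit" can only be 
recognized if the visitor's zone travels with the value.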

We need an efficient way to serialize the timezone along with the long. Can 
you propose something? (Maybe just make it efficient for the 256 most 
'popular' timezones and store it in a byte, with no byte at all for UTC. For 
other timezones, add a timezone string?)

bq. When we need to convert the DateTime object to Unix time string, we may use 
the default time zone of the Pig environment 
If the date field has the timezone value in it, we don't have to rely on the 
default time zone to convert to a unix timestamp (assuming that is what you 
meant by 'unix time *string*').
But for UDFs like DateTime ToDate(String s), where the timezone might not be 
specified, we need a default timezone. I think we should use the default 
timezone on the Pig client machine. Using the default time zone on each task 
tracker node can lead to a debugging nightmare if one of the nodes happens to 
have a different timezone. We should allow the user to set a default timezone 
using a Pig property.
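
A rough sketch of resolving the default once on the client and then applying 
it to zone-less strings follows. The property key, class, and method names are 
hypothetical and chosen only for this illustration, and java.time is used in 
place of Joda-Time.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.util.Properties;

// Sketch only: a hypothetical resolver for a client-side default time zone.
final class DefaultZoneResolver {
    // Hypothetical property key, not taken from the thread.
    static final String DEFAULT_ZONE_KEY = "example.datetime.default.tz";

    // Intended to run once on the client; the resolved ID would then be passed
    // to every task instead of each node consulting its own system default.
    static ZoneId resolve(Properties props) {
        String id = props.getProperty(DEFAULT_ZONE_KEY);
        return id == null ? ZoneId.systemDefault() : ZoneId.of(id);
    }

    // Interprets a string without zone information in the resolved default zone.
    static ZonedDateTime toDate(String text, ZoneId defaultZone) {
        return LocalDateTime.parse(text).atZone(defaultZone);
    }
}
```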

bq. We probably need one more UDF String ToString(DateTime d, String format, 
String timezone)
Having timezone argument in this call is necessary only if user wants to print 
the time for a different timezone. This is useful, but not mandatory. 


bq. Since the ISO duration is non-negative (Please correct me if I'm wrong), we 
need to SubtractDuration as well.
Yes, you are right. I could not find any references to negative values in ISO 
durations. Let's add SubtractDuration.

Trivia from wikipedia: 64 bit unix timestamp, in the negative direction, goes 
back more than twenty times the age of the universe 






[jira] [Commented] (PIG-1314) Add DateTime Support to Pig

2012-06-25 Thread Russell Jurney (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13401162#comment-13401162
 ] 

Russell Jurney commented on PIG-1314:
-

Whatever the format is, I think we should serialize/persist DateTimes in a way 
that the timezone stays with the datetime. 





[jira] [Commented] (PIG-1314) Add DateTime Support to Pig

2012-06-25 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13401155#comment-13401155
 ] 

Zhijie Shen commented on PIG-1314:
--

Dear Thejas and Russell,

{quote}
1) Don't persist DateTimes as ints/longs unless you also persist a timezone 
offset with it somehow (is this possible?).
I forgot about timezone. We need to serialize the timezone information as well, 
while supporting the same range of dates as JodaTime . With int/long this will 
not be possible. (Zhijie can you confirm ?)
{quote}

As far as I know, both the Java builtin Date and Joda DateTime use a 
millisecond offset (stored in a long variable) from midnight UTC on 
1970-01-01, which is not exactly Unix time. Importantly, the millisecond offset 
has nothing to do with the time zone. For example, both

new DateTime(922337201704319L, DateTimeZone.UTC).getMillis();

and

new DateTime(922337201704319L, 
DateTimeZone.forID("Asia/Singapore")).getMillis();

will return the same value, that is, 922337201704319L. The time zone only 
determines the ISO time string, so the two DateTime objects will output 
different ISO time strings when toString() is called. Hence I think the long 
variable that represents the millisecond offset is good for internal 
serialization. When we need to convert the DateTime object to a Unix time 
string, we may use the default time zone of the Pig environment (I'm still 
working on this. Please let me know how you think the Pig-wide time zone should 
be set.) or a user-defined time zone (we probably need one more UDF, String 
ToString(DateTime d, String format, String timezone)).
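
The same observation can be checked with a small self-contained sketch. It 
uses java.time instead of Joda-Time (an assumption, so that it runs on a plain 
JDK), and the class name is hypothetical.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;

// Sketch only: java.time stands in for Joda-Time; names are hypothetical.
final class SameInstant {
    // Attaches a zone to an epoch-millisecond value without changing the instant.
    static ZonedDateTime at(long epochMillis, String zoneId) {
        return Instant.ofEpochMilli(epochMillis).atZone(ZoneId.of(zoneId));
    }

    public static void main(String[] args) {
        long millis = 922337201704319L;
        ZonedDateTime utc = at(millis, "UTC");
        ZonedDateTime singapore = at(millis, "Asia/Singapore");
        // Same millisecond value, different printed representations.
        System.out.println(utc.toInstant().toEpochMilli() + " " + utc);
        System.out.println(singapore.toInstant().toEpochMilli() + " " + singapore);
    }
}
```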

As for Pig DateTime, the internal Joda DateTime object is created either from 
the long millisecond offset or from an ISO time string. Initialization with a 
long (from Long.MIN_VALUE to Long.MAX_VALUE) has no range problem: getMillis() 
returns a result in the same range, Long.MIN_VALUE to Long.MAX_VALUE. When 
initialized with an ISO time string, the Joda DateTime object only accepts 
years in the range [-292275054, 292278993], so the corresponding millisecond 
offset is also within [Long.MIN_VALUE, Long.MAX_VALUE]. In summary, the range 
will be fine when a long is used for serialization.

Please correct me if I'm wrong. Thanks a lot!

{quote}
2) Consider using jodatime/ISO8601 durations for date math, as a separate type. 
i.e. If this extends scope too far, save it for later. 
http://en.wikipedia.org/wiki/ISO_8601#Durations
+1 . This is much cleaner. Lets use replace the Add* functions with just 
AddDuration . For example AddDuration(d1, "P3Y"), would return d1 + 3 years.
{quote}

+1. This way, it is more flexible for users to define the amount of time to 
add/subtract. Since the ISO duration is non-negative (please correct me if I'm 
wrong), we need SubtractDuration as well.
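
As an illustration of the add/subtract pair, here is a minimal sketch built on 
java.time.Period, used in place of Joda-Time as an assumption. The class and 
method names are hypothetical and are not the Pig UDFs themselves. Period only 
parses date-based durations such as "P3Y" or "P2M10D"; time-based parts such as 
"PT1H" would need java.time.Duration.

```java
import java.time.OffsetDateTime;
import java.time.Period;

// Sketch only: hypothetical helpers mirroring the proposed add/subtract pair.
final class DurationMath {
    // Adds a date-based ISO 8601 duration, for example "P3Y" for three years.
    static OffsetDateTime addDuration(OffsetDateTime date, String isoPeriod) {
        return date.plus(Period.parse(isoPeriod));
    }

    // Subtracts the same kind of duration, so the duration string stays non-negative.
    static OffsetDateTime subtractDuration(OffsetDateTime date, String isoPeriod) {
        return date.minus(Period.parse(isoPeriod));
    }
}
```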





[jira] [Commented] (PIG-1314) Add DateTime Support to Pig

2012-06-25 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13401107#comment-13401107
 ] 

Thejas M Nair commented on PIG-1314:


bq. 1) Don't persist DateTimes as ints/longs unless you also persist a timezone 
offset with it somehow (is this possible?).
I forgot about the timezone. We need to serialize the timezone information as 
well, while supporting the same range of dates as JodaTime. With int/long this 
will not be possible. (Zhijie, can you confirm?)

bq. 2) Consider using jodatime/ISO8601 durations for date math, as a separate 
type. i.e. If this extends scope too far, save it for later. 
http://en.wikipedia.org/wiki/ISO_8601#Durations
+1. This is much cleaner. Let's replace the Add* functions with just 
AddDuration. For example, AddDuration(d1, "P3Y") would return d1 + 3 years.






[jira] [Commented] (PIG-1314) Add DateTime Support to Pig

2012-06-25 Thread Russell Jurney (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13401081#comment-13401081
 ] 

Russell Jurney commented on PIG-1314:
-

A couple comments:

1) Don't persist DateTimes as ints/longs unless you also persist a timezone 
offset with it somehow (is this possible?). Persisting timezones is one of the 
key benefits of a DateTime type in my opinion. At Hadoop scale you are often 
dealing with events from different sites/locations. DateTime needs timezone, or 
we can just use long/unix time.
2) Consider using jodatime/ISO8601 durations for date math, as a separate type. 
If this extends the scope too far, save it for later. 
http://en.wikipedia.org/wiki/ISO_8601#Durations

Although it may be inefficient, I would encourage an ISO8601 string 
representation during serialization.





[jira] [Commented] (PIG-1314) Add DateTime Support to Pig

2012-06-25 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13400705#comment-13400705
 ] 

Thejas M Nair commented on PIG-1314:


bq. what does DateTime DateAdd(DateTime d1) mean? Adding datetime based on the 
current time?
Discussed this with Daniel. I think it makes sense to replace this with 
different functions -
// Add the number of years given in the years param to the DateTime date.
// The years param can be positive or negative.
AddYears(DateTime date, int years);

Similarly we should have AddMonths, AddDays, AddHours ..
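
As a rough sketch of what such a family of functions could look like, the following uses java.time.OffsetDateTime in place of Pig's DateTime (an assumption; the thread proposes Joda-Time). The class name is hypothetical, and each amount may be positive or negative.

```java
import java.time.OffsetDateTime;

// Hypothetical sketch of the proposed Add* functions, not the committed Pig UDFs.
final class DateAddSketch {
    /** Add the given number of years; negative values subtract. */
    static OffsetDateTime addYears(OffsetDateTime date, int years) {
        return date.plusYears(years);
    }

    /** Add the given number of months; the day is clamped to the month's end if needed. */
    static OffsetDateTime addMonths(OffsetDateTime date, int months) {
        return date.plusMonths(months);
    }

    /** Add the given number of days; negative values subtract. */
    static OffsetDateTime addDays(OffsetDateTime date, int days) {
        return date.plusDays(days);
    }

    /** Add the given number of hours; negative values subtract. */
    static OffsetDateTime addHours(OffsetDateTime date, int hours) {
        return date.plusHours(hours);
    }
}
```

One function per unit keeps each signature unambiguous, which is the problem the single-argument DateAdd raised.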



> Add DateTime Support to Pig
> ---
>
> Key: PIG-1314
> URL: https://issues.apache.org/jira/browse/PIG-1314
> Project: Pig
>  Issue Type: Bug
>  Components: data
>Affects Versions: 0.7.0
>Reporter: Russell Jurney
>Assignee: Zhijie Shen
>  Labels: gsoc2012
> Attachments: PIG-1314-1.patch, PIG-1314-2.patch, joda_vs_builtin.zip
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> Hadoop/Pig are primarily used to parse log data, and most logs have a 
> timestamp component.  Therefore Pig should support dates as a primitive.
> Can someone familiar with adding types to pig comment on how hard this is?  
> We're looking at doing this, rather than use UDFs.  Is this a patch that 
> would be accepted?
> This is a candidate project for Google summer of code 2012. More information 
> about the program can be found at 
> https://cwiki.apache.org/confluence/display/PIG/GSoc2012





[jira] [Commented] (PIG-1314) Add DateTime Support to Pig

2012-06-24 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13400299#comment-13400299
 ] 

Thejas M Nair commented on PIG-1314:


bq. 1. Pig can also import into and export from HBase storage, which also 
doesn't have the primitive DateTime. Throw exception in this case as well, 
correct?
Yes. The exception should be thrown from HBaseStorage.

bq. if we conclude the design for Avro, we should keep to it for the others.
Please note that Pig does not have a way of knowing whether the format will 
support datetime. The behavior will be controlled by the storage func 
implementation. But for the ones that are part of the Pig codebase, I think we 
should throw an exception.

bq. 3. DateTime is serialized as a Long value (Unix timestamp) when it is 
necessary.
JodaTime supports milliseconds as well. Will we be able to convert all values 
within the limits of a JodaTime date into a long?

bq. the output datatype of DiffDate(DateTime d1, DateTime d2) should use long 
instead of int, because the diff may be too large for int range to cover.
Makes sense, we should use a type that is appropriate for the range.

bq. what does DateTime DateAdd(DateTime d1) mean? Adding datetime based on the 
current time?
Not sure. Daniel, do you know ?

bq. we allow explicit cast between datetime and string, correct? Similarly, do 
we allow explicit cast between datetime and long/int (representing unix 
timestamp)?
Yes, we should support explicit cast between these types. Though conversion to 
int might not be successful for all datetime values. 
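
The two range concerns above (a diff that does not fit in an int, and a cast to int that can fail) can be illustrated with a small sketch. It uses java.time.Instant as a stand-in for the datetime type, which is an assumption, and the method names are hypothetical rather than Pig's.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical sketch, not Pig code: why long is the safer result type.
final class DiffAndCastSketch {
    /** Difference d1 - d2 in milliseconds, returned as a long. */
    static long diffMillis(Instant d1, Instant d2) {
        return Duration.between(d2, d1).toMillis();
    }

    /**
     * Explicit cast of a datetime to Unix seconds held in an int.
     * Throws ArithmeticException when the value does not fit.
     */
    static int toUnixSecondsInt(Instant dt) {
        return Math.toIntExact(dt.getEpochSecond());
    }
}
```

A gap of 31 days is already 2,678,400,000 ms, which exceeds Integer.MAX_VALUE (2,147,483,647), so even modest diffs need a long when measured in milliseconds.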



> Add DateTime Support to Pig
> ---
>
> Key: PIG-1314
> URL: https://issues.apache.org/jira/browse/PIG-1314
> Project: Pig
>  Issue Type: Bug
>  Components: data
>Affects Versions: 0.7.0
>Reporter: Russell Jurney
>Assignee: Zhijie Shen
>  Labels: gsoc2012
> Attachments: PIG-1314-1.patch, PIG-1314-2.patch, joda_vs_builtin.zip
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> Hadoop/Pig are primarily used to parse log data, and most logs have a 
> timestamp component.  Therefore Pig should support dates as a primitive.
> Can someone familiar with adding types to pig comment on how hard this is?  
> We're looking at doing this, rather than use UDFs.  Is this a patch that 
> would be accepted?
> This is a candidate project for Google summer of code 2012. More information 
> about the program can be found at 
> https://cwiki.apache.org/confluence/display/PIG/GSoc2012





[jira] [Commented] (PIG-1314) Add DateTime Support to Pig

2012-06-11 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13292834#comment-13292834
 ] 

Zhijie Shen commented on PIG-1314:
--

{quote}
Avro might store DateTimes as an ISO string?
{quote}

It's possible, but there seems to be one problem. If we store a datetime as an 
iso string, how do we determine whether a string is just a string or a datetime 
when it is loaded?

One more issue is that it's good to have all the IO targets that do not 
support datetime handle the IO process uniformly. Hence, if we settle on a 
design for Avro, we should keep to it for the others.

> Add DateTime Support to Pig
> ---
>
> Key: PIG-1314
> URL: https://issues.apache.org/jira/browse/PIG-1314
> Project: Pig
>  Issue Type: Bug
>  Components: data
>Affects Versions: 0.7.0
>Reporter: Russell Jurney
>Assignee: Zhijie Shen
>  Labels: gsoc2012
> Attachments: PIG-1314-1.patch, joda_vs_builtin.zip
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> Hadoop/Pig are primarily used to parse log data, and most logs have a 
> timestamp component.  Therefore Pig should support dates as a primitive.
> Can someone familiar with adding types to pig comment on how hard this is?  
> We're looking at doing this, rather than use UDFs.  Is this a patch that 
> would be accepted?
> This is a candidate project for Google summer of code 2012. More information 
> about the program can be found at 
> https://cwiki.apache.org/confluence/display/PIG/GSoc2012





[jira] [Commented] (PIG-1314) Add DateTime Support to Pig

2012-06-08 Thread Russell Jurney (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13291961#comment-13291961
 ] 

Russell Jurney commented on PIG-1314:
-

Avro might store DateTimes as an ISO string?

> Add DateTime Support to Pig
> ---
>
> Key: PIG-1314
> URL: https://issues.apache.org/jira/browse/PIG-1314
> Project: Pig
>  Issue Type: Bug
>  Components: data
>Affects Versions: 0.7.0
>Reporter: Russell Jurney
>Assignee: Zhijie Shen
>  Labels: gsoc2012
> Attachments: PIG-1314-1.patch, joda_vs_builtin.zip
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> Hadoop/Pig are primarily used to parse log data, and most logs have a 
> timestamp component.  Therefore Pig should support dates as a primitive.
> Can someone familiar with adding types to pig comment on how hard this is?  
> We're looking at doing this, rather than use UDFs.  Is this a patch that 
> would be accepted?
> This is a candidate project for Google summer of code 2012. More information 
> about the program can be found at 
> https://cwiki.apache.org/confluence/display/PIG/GSoc2012





[jira] [Commented] (PIG-1314) Add DateTime Support to Pig

2012-06-04 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13288680#comment-13288680
 ] 

Thejas M Nair commented on PIG-1314:


bq. When adding the DateTime type for Pig, we need to take care of the I/O with 
AVRO, which still doesn't support the Date/Time type.
StoreFuncs that write in Avro format will need to throw an exception if the 
schema being stored contains a datetime type. That will force users to 
serialize datetime as some other type. As long as we are not breaking existing 
Pig queries that don't use the datetime type, we should be fine. Avro is just 
one of the many formats.

Regarding AugmentBaseDataVisitor, that is used for example generation. (see 
[sigmod paper on illustrate feature | 
http://infolab.stanford.edu/~olston/publications/sigmod09.pdf] for details) . 
For example, if there is no value in col1 in sample that satisfies "col1 > 0", 
a value of col1 > 0 is generated. This will be useful for datetime type as 
well. 
To have a more realistic value generated (similar to values in input), I think 
we should increment/decrement the smallest field that is non zero. For example 
if the millisecond and second fields are 0, but hour field is non zero, 
increment that. If all time parts are 0, but day of month is not, increment 
that.
In case of boolean, as we don't support > or < operations, these functions do 
not make sense. 

Thanks for bringing this up. I had forgotten about this use case. We should 
add a few unit tests for example generation that involve datetime.
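
The "smallest non-zero field" rule suggested above could be sketched as follows. This is only an illustration under stated assumptions: java.time.LocalDateTime stands in for the datetime type, millisecond precision is assumed (as in Joda-Time), and the class is hypothetical rather than the actual AugmentBaseDataVisitor code.

```java
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;

// Hypothetical sketch of larger/smaller value generation for datetime.
final class DateTimeExampleValues {
    /** Pick the smallest field that is non-zero; fall back to days. */
    static ChronoUnit smallestNonZeroUnit(LocalDateTime v) {
        if (v.getNano() / 1_000_000 != 0) return ChronoUnit.MILLIS;
        if (v.getSecond() != 0) return ChronoUnit.SECONDS;
        if (v.getMinute() != 0) return ChronoUnit.MINUTES;
        if (v.getHour() != 0) return ChronoUnit.HOURS;
        // Day of month is never zero, so it is the fallback when all time parts are 0.
        return ChronoUnit.DAYS;
    }

    /** A value slightly larger than v, at the granularity v appears to use. */
    static LocalDateTime getLargerValue(LocalDateTime v) {
        return v.plus(1, smallestNonZeroUnit(v));
    }

    /** A value slightly smaller than v, at the granularity v appears to use. */
    static LocalDateTime getSmallerValue(LocalDateTime v) {
        return v.minus(1, smallestNonZeroUnit(v));
    }
}
```

With this rule, an input of 10:00:00 yields 11:00:00 as the larger value, and a midnight value moves by one day, so generated examples resemble the input data.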

 

> Add DateTime Support to Pig
> ---
>
> Key: PIG-1314
> URL: https://issues.apache.org/jira/browse/PIG-1314
> Project: Pig
>  Issue Type: Bug
>  Components: data
>Affects Versions: 0.7.0
>Reporter: Russell Jurney
>Assignee: Zhijie Shen
>  Labels: gsoc2012
> Attachments: joda_vs_builtin.zip
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> Hadoop/Pig are primarily used to parse log data, and most logs have a 
> timestamp component.  Therefore Pig should support dates as a primitive.
> Can someone familiar with adding types to pig comment on how hard this is?  
> We're looking at doing this, rather than use UDFs.  Is this a patch that 
> would be accepted?
> This is a candidate project for Google summer of code 2012. More information 
> about the program can be found at 
> https://cwiki.apache.org/confluence/display/PIG/GSoc2012





[jira] [Commented] (PIG-1314) Add DateTime Support to Pig

2012-06-03 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13288203#comment-13288203
 ] 

Zhijie Shen commented on PIG-1314:
--

One more issue needs to be clarified:

In the AugmentBaseDataVisitor class, there are two functions: Object 
GetSmallerValue(Object v) and Object GetLargerValue(Object v). If v is a 
numeric value, it is increased or decreased by one; if v is a byte array, it 
is increased or decreased by one byte. What should we do if v is a datetime? 
I vote for returning null, and am looking forward to the community's opinions.

By the way, what if v is a boolean? That case does not seem to be handled.

> Add DateTime Support to Pig
> ---
>
> Key: PIG-1314
> URL: https://issues.apache.org/jira/browse/PIG-1314
> Project: Pig
>  Issue Type: Bug
>  Components: data
>Affects Versions: 0.7.0
>Reporter: Russell Jurney
>Assignee: Zhijie Shen
>  Labels: gsoc2012
> Attachments: joda_vs_builtin.zip
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> Hadoop/Pig are primarily used to parse log data, and most logs have a 
> timestamp component.  Therefore Pig should support dates as a primitive.
> Can someone familiar with adding types to pig comment on how hard this is?  
> We're looking at doing this, rather than use UDFs.  Is this a patch that 
> would be accepted?
> This is a candidate project for Google summer of code 2012. More information 
> about the program can be found at 
> https://cwiki.apache.org/confluence/display/PIG/GSoc2012





[jira] [Commented] (PIG-1314) Add DateTime Support to Pig

2012-05-28 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13284603#comment-13284603
 ] 

Thejas M Nair commented on PIG-1314:


CURRENT_TIME() might be a more intuitive alias for DATETIME(NOW). I think we 
can consider adding support for DATE and CURRENT_TIMESTAMP() as a next step 
after adding  DATETIME. We can focus on DATETIME in this jira.

I also had a look at the timestamp datatype that was added to Hive, to see if 
it will be interoperable (through HCatalog). The only difference is that the 
Hive timestamp type supports storing up to nanosecond precision, while 
jodatime supports only up to millisecond. Nanoseconds are not likely to be 
used in most cases, so losing that precision when converting a Hive timestamp 
to a Pig datetime should be OK in most cases. The range of years supported in 
both cases is also approximately the same.
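
The precision loss described above amounts to truncating the sub-millisecond part. A minimal sketch, assuming java.time.Instant as a stand-in for a nanosecond-precision timestamp (this is neither the Hive nor the Pig API, and the class name is hypothetical):

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

// Hypothetical sketch: drop nanosecond detail to reach millisecond precision.
final class PrecisionSketch {
    /** Keep everything down to the millisecond and discard the rest. */
    static Instant toMillisPrecision(Instant nanoPrecise) {
        return nanoPrecise.truncatedTo(ChronoUnit.MILLIS);
    }
}
```

For example, 12:00:00.123456789 becomes 12:00:00.123; the conversion is lossy but never changes the value by a full millisecond.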





> Add DateTime Support to Pig
> ---
>
> Key: PIG-1314
> URL: https://issues.apache.org/jira/browse/PIG-1314
> Project: Pig
>  Issue Type: Bug
>  Components: data
>Affects Versions: 0.7.0
>Reporter: Russell Jurney
>Assignee: Zhijie Shen
>  Labels: gsoc2012
> Attachments: joda_vs_builtin.zip
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> Hadoop/Pig are primarily used to parse log data, and most logs have a 
> timestamp component.  Therefore Pig should support dates as a primitive.
> Can someone familiar with adding types to pig comment on how hard this is?  
> We're looking at doing this, rather than use UDFs.  Is this a patch that 
> would be accepted?
> This is a candidate project for Google summer of code 2012. More information 
> about the program can be found at 
> https://cwiki.apache.org/confluence/display/PIG/GSoc2012





[jira] [Commented] (PIG-1314) Add DateTime Support to Pig

2012-05-28 Thread Russell Jurney (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13284592#comment-13284592
 ] 

Russell Jurney commented on PIG-1314:
-

"DATETIME" makes sense, but "TIMESTAMP" is a good (simple) alias for 
DATETIME(NOW).  "DATE" is a good alias for a date-truncated DATETIME.

I'm not sure if you would want to implement these in Pig... as there is clearly 
less utility than in a database, where for instance a TIMESTAMP can be updated 
whenever a field is written or updated. Maybe "DATE" and not "TIMESTAMP," but 
only as an afterthought?

> Add DateTime Support to Pig
> ---
>
> Key: PIG-1314
> URL: https://issues.apache.org/jira/browse/PIG-1314
> Project: Pig
>  Issue Type: Bug
>  Components: data
>Affects Versions: 0.7.0
>Reporter: Russell Jurney
>Assignee: Zhijie Shen
>  Labels: gsoc2012
> Attachments: joda_vs_builtin.zip
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> Hadoop/Pig are primarily used to parse log data, and most logs have a 
> timestamp component.  Therefore Pig should support dates as a primitive.
> Can someone familiar with adding types to pig comment on how hard this is?  
> We're looking at doing this, rather than use UDFs.  Is this a patch that 
> would be accepted?
> This is a candidate project for Google summer of code 2012. More information 
> about the program can be found at 
> https://cwiki.apache.org/confluence/display/PIG/GSoc2012





[jira] [Commented] (PIG-1314) Add DateTime Support to Pig

2012-05-28 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13284575#comment-13284575
 ] 

Thejas M Nair commented on PIG-1314:


bq. One quick issue: we need to give a name to the new type. We are supposed to 
use "DATETIME", correct? Or "DATE", "TIMESTAMP"?
"datetime" makes sense when it has both date and time (hrs,mins,secs) parts to 
it. The problem with using a (Unix) timestamp is that the date range is 
limited: a signed 32-bit count of seconds from 1970 runs out in 2038, about 
68 years. Using jodatime, we will be able to support a much larger date range 
than timestamp.
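
To put numbers on the range comparison, here is a small sketch. It assumes the limit in question is that of a signed 32-bit count of seconds since 1970, and contrasts it with a 64-bit count of milliseconds, which is what a millisecond-based library such as Joda-Time works with. The class name is hypothetical.

```java
import java.time.Instant;
import java.time.ZoneOffset;

// Hypothetical sketch comparing the reach of 32-bit seconds and 64-bit millis.
final class TimestampRangeSketch {
    /** Latest instant a signed 32-bit seconds counter can express. */
    static Instant maxInt32Seconds() {
        return Instant.ofEpochSecond(Integer.MAX_VALUE);
    }

    /** Earliest instant a signed 32-bit seconds counter can express. */
    static Instant minInt32Seconds() {
        return Instant.ofEpochSecond(Integer.MIN_VALUE);
    }

    /** Year reached by the largest signed 64-bit milliseconds value. */
    static int maxInt64MillisYear() {
        return Instant.ofEpochMilli(Long.MAX_VALUE).atOffset(ZoneOffset.UTC).getYear();
    }
}
```

The 32-bit counter ends on 2038-01-19T03:14:07Z and starts in 1901, while the 64-bit millisecond counter reaches hundreds of millions of years in either direction.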



> Add DateTime Support to Pig
> ---
>
> Key: PIG-1314
> URL: https://issues.apache.org/jira/browse/PIG-1314
> Project: Pig
>  Issue Type: Bug
>  Components: data
>Affects Versions: 0.7.0
>Reporter: Russell Jurney
>Assignee: Zhijie Shen
>  Labels: gsoc2012
> Attachments: joda_vs_builtin.zip
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> Hadoop/Pig are primarily used to parse log data, and most logs have a 
> timestamp component.  Therefore Pig should support dates as a primitive.
> Can someone familiar with adding types to pig comment on how hard this is?  
> We're looking at doing this, rather than use UDFs.  Is this a patch that 
> would be accepted?
> This is a candidate project for Google summer of code 2012. More information 
> about the program can be found at 
> https://cwiki.apache.org/confluence/display/PIG/GSoc2012





[jira] [Commented] (PIG-1314) Add DateTime Support to Pig

2012-05-28 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13284392#comment-13284392
 ] 

Zhijie Shen commented on PIG-1314:
--

One quick issue: we need to give a name to the new type. We are supposed to use 
"DATETIME", correct? Or "DATE", "TIMESTAMP"?

> Add DateTime Support to Pig
> ---
>
> Key: PIG-1314
> URL: https://issues.apache.org/jira/browse/PIG-1314
> Project: Pig
>  Issue Type: Bug
>  Components: data
>Affects Versions: 0.7.0
>Reporter: Russell Jurney
>Assignee: Zhijie Shen
>  Labels: gsoc2012
> Attachments: joda_vs_builtin.zip
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> Hadoop/Pig are primarily used to parse log data, and most logs have a 
> timestamp component.  Therefore Pig should support dates as a primitive.
> Can someone familiar with adding types to pig comment on how hard this is?  
> We're looking at doing this, rather than use UDFs.  Is this a patch that 
> would be accepted?
> This is a candidate project for Google summer of code 2012. More information 
> about the program can be found at 
> https://cwiki.apache.org/confluence/display/PIG/GSoc2012





[jira] [Commented] (PIG-1314) Add DateTime Support to Pig

2012-05-26 Thread Russell Jurney (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13284059#comment-13284059
 ] 

Russell Jurney commented on PIG-1314:
-

I concur about JODA. So far as I know you can't even parse ISO times with java 
builtins without using javax.xml.bind.DatatypeConverter, and it is ugly and 
slow.


> Add DateTime Support to Pig
> ---
>
> Key: PIG-1314
> URL: https://issues.apache.org/jira/browse/PIG-1314
> Project: Pig
>  Issue Type: Bug
>  Components: data
>Affects Versions: 0.7.0
>Reporter: Russell Jurney
>Assignee: Zhijie Shen
>  Labels: gsoc2012
> Attachments: joda_vs_builtin.zip
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> Hadoop/Pig are primarily used to parse log data, and most logs have a 
> timestamp component.  Therefore Pig should support dates as a primitive.
> Can someone familiar with adding types to pig comment on how hard this is?  
> We're looking at doing this, rather than use UDFs.  Is this a patch that 
> would be accepted?
> This is a candidate project for Google summer of code 2012. More information 
> about the program can be found at 
> https://cwiki.apache.org/confluence/display/PIG/GSoc2012





[jira] [Commented] (PIG-1314) Add DateTime Support to Pig

2012-04-02 Thread Zhijie Shen (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13244985#comment-13244985
 ] 

Zhijie Shen commented on PIG-1314:
--

Ah, I forgot to do it. Public now :-)

> Add DateTime Support to Pig
> ---
>
> Key: PIG-1314
> URL: https://issues.apache.org/jira/browse/PIG-1314
> Project: Pig
>  Issue Type: Bug
>  Components: data
>Affects Versions: 0.7.0
>Reporter: Russell Jurney
>Assignee: Russell Jurney
>  Labels: gsoc2012
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> Hadoop/Pig are primarily used to parse log data, and most logs have a 
> timestamp component.  Therefore Pig should support dates as a primitive.
> Can someone familiar with adding types to pig comment on how hard this is?  
> We're looking at doing this, rather than use UDFs.  Is this a patch that 
> would be accepted?
> This is a candidate project for Google summer of code 2012. More information 
> about the program can be found at 
> https://cwiki.apache.org/confluence/display/PIG/GSoc2012





[jira] [Commented] (PIG-1314) Add DateTime Support to Pig

2012-04-02 Thread Prashant Kommireddi (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13244980#comment-13244980
 ] 

Prashant Kommireddi commented on PIG-1314:
--

Thanks Zhijie. Can you please make it public?

> Add DateTime Support to Pig
> ---
>
> Key: PIG-1314
> URL: https://issues.apache.org/jira/browse/PIG-1314
> Project: Pig
>  Issue Type: Bug
>  Components: data
>Affects Versions: 0.7.0
>Reporter: Russell Jurney
>Assignee: Russell Jurney
>  Labels: gsoc2012
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> Hadoop/Pig are primarily used to parse log data, and most logs have a 
> timestamp component.  Therefore Pig should support dates as a primitive.
> Can someone familiar with adding types to pig comment on how hard this is?  
> We're looking at doing this, rather than use UDFs.  Is this a patch that 
> would be accepted?
> This is a candidate project for Google summer of code 2012. More information 
> about the program can be found at 
> https://cwiki.apache.org/confluence/display/PIG/GSoc2012





[jira] [Commented] (PIG-1314) Add DateTime Support to Pig

2012-04-02 Thread Zhijie Shen (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13244979#comment-13244979
 ] 

Zhijie Shen commented on PIG-1314:
--

I've pasted the proposal to the official website: 
http://www.google-melange.com/gsoc/proposal/review/google/gsoc2012/zjshen/21002

Any comments are welcome, such that I can improve the proposal in the remaining 
days.

> Add DateTime Support to Pig
> ---
>
> Key: PIG-1314
> URL: https://issues.apache.org/jira/browse/PIG-1314
> Project: Pig
>  Issue Type: Bug
>  Components: data
>Affects Versions: 0.7.0
>Reporter: Russell Jurney
>Assignee: Russell Jurney
>  Labels: gsoc2012
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> Hadoop/Pig are primarily used to parse log data, and most logs have a 
> timestamp component.  Therefore Pig should support dates as a primitive.
> Can someone familiar with adding types to pig comment on how hard this is?  
> We're looking at doing this, rather than use UDFs.  Is this a patch that 
> would be accepted?
> This is a candidate project for Google summer of code 2012. More information 
> about the program can be found at 
> https://cwiki.apache.org/confluence/display/PIG/GSoc2012





[jira] [Commented] (PIG-1314) Add DateTime Support to Pig

2012-03-26 Thread Daniel Dai (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238187#comment-13238187
 ] 

Daniel Dai commented on PIG-1314:
-

I would like to mentor this.

> Add DateTime Support to Pig
> ---
>
> Key: PIG-1314
> URL: https://issues.apache.org/jira/browse/PIG-1314
> Project: Pig
>  Issue Type: Bug
>  Components: data
>Affects Versions: 0.7.0
>Reporter: Russell Jurney
>Assignee: Russell Jurney
>  Labels: gsoc2012
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> Hadoop/Pig are primarily used to parse log data, and most logs have a 
> timestamp component.  Therefore Pig should support dates as a primitive.
> Can someone familiar with adding types to pig comment on how hard this is?  
> We're looking at doing this, rather than use UDFs.  Is this a patch that 
> would be accepted?
> This is a candidate project for Google summer of code 2012. More information 
> about the program can be found at 
> https://cwiki.apache.org/confluence/display/PIG/GSoc2012





[jira] [Commented] (PIG-1314) Add DateTime Support to Pig

2012-03-25 Thread Zhijie Shen (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238044#comment-13238044
 ] 

Zhijie Shen commented on PIG-1314:
--

Coincidentally, I'm the person who got Boolean working :-)

Daniel helped me a lot in working out that issue. If he'd like to mentor this 
one, that would also be awesome.

> Add DateTime Support to Pig
> ---
>
> Key: PIG-1314
> URL: https://issues.apache.org/jira/browse/PIG-1314
> Project: Pig
>  Issue Type: Bug
>  Components: data
>Affects Versions: 0.7.0
>Reporter: Russell Jurney
>Assignee: Russell Jurney
>  Labels: gsoc2012
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> Hadoop/Pig are primarily used to parse log data, and most logs have a 
> timestamp component.  Therefore Pig should support dates as a primitive.
> Can someone familiar with adding types to pig comment on how hard this is?  
> We're looking at doing this, rather than use UDFs.  Is this a patch that 
> would be accepted?
> This is a candidate project for Google summer of code 2012. More information 
> about the program can be found at 
> https://cwiki.apache.org/confluence/display/PIG/GSoc2012





[jira] [Commented] (PIG-1314) Add DateTime Support to Pig

2012-03-25 Thread Russell Jurney (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238037#comment-13238037
 ] 

Russell Jurney commented on PIG-1314:
-

I am happy to help with questions about the DateTime UDFs, but do not 
remember the internals of my attempt to add Boolean in preparation for 
DateTime.  I suggest the committer who got Boolean working would be a good 
candidate?

> Add DateTime Support to Pig
> ---
>
> Key: PIG-1314
> URL: https://issues.apache.org/jira/browse/PIG-1314
> Project: Pig
>  Issue Type: Bug
>  Components: data
>Affects Versions: 0.7.0
>Reporter: Russell Jurney
>Assignee: Russell Jurney
>  Labels: gsoc2012
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> Hadoop/Pig are primarily used to parse log data, and most logs have a 
> timestamp component.  Therefore Pig should support dates as a primitive.
> Can someone familiar with adding types to pig comment on how hard this is?  
> We're looking at doing this, rather than use UDFs.  Is this a patch that 
> would be accepted?
> This is a candidate project for Google summer of code 2012. More information 
> about the program can be found at 
> https://cwiki.apache.org/confluence/display/PIG/GSoc2012





[jira] [Commented] (PIG-1314) Add DateTime Support to Pig

2012-03-25 Thread Zhijie Shen (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13237884#comment-13237884
 ] 

Zhijie Shen commented on PIG-1314:
--

Hi folks,

Below is my proposal draft. Any comments are welcome:-)

==

Proposal Title: Adding the Datetime Type as a Primitive for Pig


Student Name: Zhijie Shen 
Student E-mail: zjshe...@gmail.com 

Organization/Project: Apache Software Foundation - Pig 
Assigned Mentor: Daniel Dai /Russell Jurney


Proposal Abstract: 

Apache Pig is a platform for analyzing large data sets based on Hadoop. 
Currently Pig does not support a primitive datetime type [1], which is a 
desired feature. In this proposal, I explain my plan to implement the 
primitive datetime type, including the details of my solution and schedule. 
Additionally, I briefly introduce my background and my motivation for 
applying to GSoC'12. 

Detailed Description: 

1. Understanding of the Project

1.1 What is Apache Pig?

Apache Pig is a platform for analyzing large data sets. Notably, at Yahoo! 40% 
of all Hadoop jobs are run with Pig [5]. Pig has its own dataflow language, 
named Pig Latin, which encapsulates map/reduce jobs step by step, and offers 
relational primitives such as LOAD, FOREACH, GROUP, FILTER and JOIN. Pig 
provides many built-in functions, and also allows users to write their own 
user-defined functions (UDFs) for particular purposes. There are further 
benefits: Pig can operate on plain files directly without any schema 
information; it has a flexible, nested data model, which is more compatible 
with that of major programming languages; and it provides a debugging 
environment.

1.2 Why is a primitive datetime type required?

Datetime is a conventional data type in many database management systems as 
well as programming languages. Within the Hadoop ecosystem, Hive, which is an 
analog of Pig, also supports a primitive datetime type (timestamp, actually). 
In contrast, Pig does not fully support this type. Currently, users can only 
use the string type for datetime data, and rely on UDFs that take datetime 
strings. However, Pig is supposed to primarily parse log data, and most log 
data has attributes of the datetime type. 

Consequently, it is desirable for Pig to support the datetime type as a 
primitive. By doing so, we can expect the following benefits: a more compact 
serialized format, working with conventional operators (+/-/==/!=/), a 
dedicated faster comparator, being sortable, fewer runtime conversions from 
strings, and relieving users from deciding the input datetime string format.
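
As a rough illustration of the first benefits listed above, the sketch below 
compares the size of a datetime held as ISO-8601 text with the same value held 
as epoch milliseconds, and shows that ordering two such values needs no 
parsing. It uses java.time.Instant from the JDK as a stand-in for Joda, and 
the class and method names are hypothetical; it is not Pig's actual 
serialization format.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.time.Instant;

// Hypothetical helper, not part of Pig: contrasts a string-encoded datetime
// with one stored as milliseconds since the epoch.
final class DatetimeEncodingSketch {
    // Encode an instant as a fixed 8-byte big-endian long (epoch milliseconds).
    static byte[] toEpochBytes(Instant t) {
        return ByteBuffer.allocate(Long.BYTES).putLong(t.toEpochMilli()).array();
    }

    // Encode the same instant as ISO-8601 text, as a string column would hold it.
    static byte[] toIsoBytes(Instant t) {
        return t.toString().getBytes(StandardCharsets.UTF_8);
    }

    // Ordering two primitive values is a single long comparison; no parsing.
    static int compare(long millisA, long millisB) {
        return Long.compare(millisA, millisB);
    }

    public static void main(String[] args) {
        Instant t = Instant.parse("2012-03-25T10:15:30Z");
        System.out.println(toEpochBytes(t).length + " bytes as a long");
        System.out.println(toIsoBytes(t).length + " bytes as ISO-8601 text");
    }
}
```

For the sample value, the long form takes 8 bytes and the text form 20.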


2. Roadmap of Implementing the New Feature

2.1 To Do List

2.1.1  Adding Support in Antlr Parser

Pig Latin supports assigning data types explicitly, so the parser must 
recognize the “datetime” keyword and some constants, such as “now()” and 
“today()”. The related syntax needs to be added to 5 antlr scripts: 
AliasMasker.g, AstPrinter.g, AstValidator.g, LogicalPlanGenerator.g, 
QueryParser.g.

2.1.2 Adding Datetime as a Primitive

The datetime type should be added to the DataType class, and the basic 
conversions between it and other data types need to be defined. Previously, 
the internal data structure relied on the Joda DateTime type, which is more 
powerful than java.util.Date, but much easier to use than java.util.Calendar. 
Hence it is wise to keep this convention. Moreover, note that implicit type 
casts from/to the datetime type are not allowed.

I also need to change the LoadCaster and StoreCaster interfaces to include 
bytesToDateTime/toBytes(DateTime) methods, and add the implementations to the 
classes that implement these two interfaces. In addition, I need to override 
the +/-/==/!=/ operators for the datetime type, mapping them to some builtin 
EvalFuncs. The TypeCheckingExpVisitor class needs to be modified as well to 
support datetime type validation. One important issue is that, according to my 
previous experience, the data type related code in Pig is widely spread, so I 
need to be careful that all the related parts are touched.
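
To make the caster idea concrete, here is a minimal sketch of a byte-level 
round trip. The method names mirror the ones mentioned above, but the class is 
a hypothetical stand-alone helper, not Pig's LoadCaster/StoreCaster API. It 
assumes the bytes hold UTF-8 ISO-8601 text, uses java.time.Instant instead of 
Joda, and returns null for unparsable input, which is only this sketch's 
choice.

```java
import java.nio.charset.StandardCharsets;
import java.time.Instant;
import java.time.format.DateTimeParseException;

// Hypothetical stand-in for the caster methods named in the proposal.
final class DatetimeCasterSketch {
    // Parse UTF-8 ISO-8601 bytes into an instant; null if input is missing or malformed.
    static Instant bytesToDateTime(byte[] b) {
        if (b == null) {
            return null;
        }
        try {
            return Instant.parse(new String(b, StandardCharsets.UTF_8).trim());
        } catch (DateTimeParseException e) {
            return null;
        }
    }

    // Render an instant back to UTF-8 ISO-8601 bytes for storage.
    static byte[] toBytes(Instant t) {
        return t.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] raw = "2012-03-25T10:15:30Z".getBytes(StandardCharsets.UTF_8);
        Instant parsed = bytesToDateTime(raw);
        System.out.println(new String(toBytes(parsed), StandardCharsets.UTF_8));
    }
}
```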

2.1.3 Refactoring of the Datetime Related UDFs

Thanks to Russell Jurney for having implemented a number of useful datetime 
related UDFs, which can be utilized for the primitive datetime type as well. 
Some of the UDF classes located in the 
“org.apache.pig.piggybank.evaluation.datetime” package under the “contrib” 
folder need to be moved to the “org.apache.pig.builtin” package under the 
“src” folder. Below are the related UDFs:

int DiffDate(DateTime d1, DateTime d2)
int YearsBetween(DateTime d1, DateTime d2)
int MonthsBetween(DateTime d1, DateTime d2)
int DaysBetween(DateTime d1, DateTime d2)
int HoursBetween(DateTime d1, DateTime d2)
int MinutesBetween(DateTime d1, DateTime d2)
int SecondsBetween(DateTime d1, DateTime d2)
int GetYear(DateTime d1)
int GetMonth(DateTime d1)
int GetDate(DateTime d1)
int GetHour(Dat

[jira] [Commented] (PIG-1314) Add DateTime Support to Pig

2012-03-25 Thread Zhijie Shen (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13237885#comment-13237885
 ] 

Zhijie Shen commented on PIG-1314:
--

By the way, who would like to mentor this issue?





[jira] [Commented] (PIG-1314) Add DateTime Support to Pig

2012-03-17 Thread Daniel Dai (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13232026#comment-13232026
 ] 

Daniel Dai commented on PIG-1314:
-

Looking forward to your proposal!





[jira] [Commented] (PIG-1314) Add DateTime Support to Pig

2012-03-17 Thread Zhijie Shen (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13232004#comment-13232004
 ] 

Zhijie Shen commented on PIG-1314:
--

GSoC is back! I'd like to apply for it with this issue. The proposal draft 
will come in the following days :-)





[jira] [Commented] (PIG-1314) Add DateTime Support to Pig

2011-09-08 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100761#comment-13100761
 ] 

Daniel Dai commented on PIG-1314:
-

That will be great. Here is a specification I wrote: 
https://cwiki.apache.org/confluence/display/PIG/DateTime+type+specification. 
Take a look and we can discuss.





[jira] [Commented] (PIG-1314) Add DateTime Support to Pig

2011-09-08 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100380#comment-13100380
 ] 

Zhijie Shen commented on PIG-1314:
--

I've solved the related issue PIG-1429. If nobody is currently working on this 
issue, I volunteer to look into it.





[jira] [Commented] (PIG-1314) Add DateTime Support to Pig

2011-04-29 Thread Jeremy Hanna (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13027250#comment-13027250
 ] 

Jeremy Hanna commented on PIG-1314:
---

I think this would be nice also when outputting from pig scripts using 
DBStorage to an RDBMS - to be able to serialize properly to the db's timestamp 
or date type (without extra UDF work).
