[ https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848285#action_12848285 ]

Alan Gates commented on PIG-1314:
---------------------------------

Major +1.  Adding DateTime as a Pig primitive is definitely a good idea.  It's 
on our list of things to do (http://wiki.apache.org/pig/PigJournal).  A brief 
overview of the work to be done:

# Add support in the parser both for declaring an input field to be of type 
datetime and for datetime constants
# Add support in TypeChecker for datetime types, including any allowed type 
promotions (i.e., implicit casts)
# Change the LoadCaster interface to include a bytesToDateTime method, and add 
the method to the default implementation
# Determine which builtin UDFs we want for datetime and get agreement from 
the community.  Implement these UDFs.
# Implement any allowed cast operators for datetime (probably just string <-> 
datetime).
# Implement a datetime class that represents a datetime in memory.  This needs 
to implement WritableComparable so that it can be serialized and compared in 
Hadoop
# Implement a raw comparator for the type so it can be used as a key in group 
bys and joins.
# Change physical operators and builtin UDFs to handle processing of datetime 
types.
# Change data conversion and type discovery routines in DataType
# And, of course, add prolific tests
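
To make the in-memory class and raw comparator items more concrete, here is a 
minimal sketch, assuming a datetime is stored as UTC milliseconds since the 
epoch.  The names PigDateTime, compareRaw, toBytes and fromBytes are 
hypothetical.  The class only mirrors the shape of Hadoop's WritableComparable 
(write, readFields, compareTo) and of a raw comparator, using plain java.io so 
it compiles without Hadoop on the classpath; a real patch would implement the 
Hadoop interfaces directly.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical in-memory datetime. Mirrors the shape of Hadoop's
// WritableComparable (write/readFields/compareTo) using plain java.io only.
class PigDateTime implements Comparable<PigDateTime> {
    // Assumption: milliseconds since 1970-01-01T00:00:00Z.
    private long millis;

    // Writables are typically created empty and then filled via readFields.
    PigDateTime() {}

    PigDateTime(long millis) { this.millis = millis; }

    long getMillis() { return millis; }

    // Serialized form: one 8-byte big-endian long.
    public void write(DataOutput out) throws IOException {
        out.writeLong(millis);
    }

    public void readFields(DataInput in) throws IOException {
        millis = in.readLong();
    }

    @Override
    public int compareTo(PigDateTime other) {
        return Long.compare(millis, other.millis);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof PigDateTime && ((PigDateTime) o).millis == millis;
    }

    @Override
    public int hashCode() { return Long.hashCode(millis); }

    // Raw comparison on serialized bytes (the idea behind a raw comparator):
    // decode the two longs in place without building PigDateTime objects.
    static int compareRaw(byte[] b1, int s1, byte[] b2, int s2) {
        return Long.compare(readLong(b1, s1), readLong(b2, s2));
    }

    // Big-endian decode matching DataOutput.writeLong.
    private static long readLong(byte[] b, int start) {
        long v = 0;
        for (int i = 0; i < 8; i++) {
            v = (v << 8) | (b[start + i] & 0xFFL);
        }
        return v;
    }

    static byte[] toBytes(PigDateTime dt) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            dt.write(new DataOutputStream(bos));
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static PigDateTime fromBytes(byte[] bytes) {
        try {
            PigDateTime dt = new PigDateTime();
            dt.readFields(new DataInputStream(new ByteArrayInputStream(bytes)));
            return dt;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Decoding both longs in compareRaw, rather than comparing the serialized bytes 
as unsigned values, keeps datetimes before 1970 (negative millis) ordered 
correctly.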

The other question is backward compatibility.  I can think of only two 
backward-incompatible changes:
# Addition of bytesToDateTime in the LoadCaster interface.  Given that this 
will only require a change if people recompile their implementation, and AFAIK 
there are no implementations of LoadCaster before our default implementation, I 
think this is ok.
# Changes to Pig Latin to specify a field as type datetime, plus however we 
denote datetime strings.  We need to make these as unobtrusive as possible, but 
again I think it will be ok, though we'll need to get community buy-in on it.
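
The LoadCaster change can be illustrated with a deliberately simplified, 
hypothetical interface: SimpleLoadCaster and Utf8CasterSketch are not Pig's 
real types, and the Long (epoch millis) return type of bytesToDateTime is an 
assumption.  Once the method is added to the interface, any existing 
implementer must supply it before it compiles again, which is the 
incompatibility described above.  The sketch also assumes ISO-8601 UTC text on 
input and returns null for unparseable bytes instead of failing the whole job.

```java
import java.nio.charset.StandardCharsets;
import java.time.Instant;
import java.time.format.DateTimeParseException;

// Hypothetical, heavily simplified stand-in for Pig's LoadCaster.
interface SimpleLoadCaster {
    Integer bytesToInteger(byte[] b);

    // The proposed addition: every implementing class must now provide it.
    Long bytesToDateTime(byte[] b);
}

// Hypothetical default, text-based implementation.
class Utf8CasterSketch implements SimpleLoadCaster {
    @Override
    public Integer bytesToInteger(byte[] b) {
        if (b == null) return null;
        try {
            return Integer.valueOf(new String(b, StandardCharsets.UTF_8).trim());
        } catch (NumberFormatException e) {
            return null; // bad data becomes null rather than an error
        }
    }

    // Assumption: datetimes arrive as ISO-8601 UTC text such as
    // "2010-03-23T00:00:00Z" and are returned as epoch milliseconds.
    @Override
    public Long bytesToDateTime(byte[] b) {
        if (b == null) return null;
        try {
            String text = new String(b, StandardCharsets.UTF_8).trim();
            return Instant.parse(text).toEpochMilli();
        } catch (DateTimeParseException e) {
            return null; // unparseable datetime becomes null
        }
    }
}
```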

Would such a patch be accepted?  If it's of good quality and deals with backward 
compatibility concerns, certainly.  In time for 0.8?  I don't know.  We try to 
do a release every three months, with a feature cutoff about a month before 
release (give or take).  Branching and feature cutoff for 0.7 is today, so 
branching and feature cutoff for 0.8 will probably be in June.

If you want to pursue this, the first step should be a brief design that says 
how you'll go about doing it.  It should cover questions like these: Which date 
format will you use (SQL, something else)?  Which date functions do you think 
should be built in?  How do you plan to store this type in memory?  Are there 
existing datetime libraries you can leverage or incorporate to avoid 
reinventing the wheel?  It's easiest to write up the design on Pig's wiki and 
then link to it from this bug.  This will give users and developers a chance to 
review your thoughts and give feedback.


> Add DateTime Support to Pig
> ---------------------------
>
>                 Key: PIG-1314
>                 URL: https://issues.apache.org/jira/browse/PIG-1314
>             Project: Pig
>          Issue Type: Bug
>          Components: data
>    Affects Versions: 0.7.0
>            Reporter: Russell Jurney
>             Fix For: 0.8.0
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> Hadoop/Pig are primarily used to parse log data, and most logs have a 
> timestamp component.  Therefore Pig should support dates as a primitive.
> Can someone familiar with adding types to pig comment on how hard this is?  
> We're looking at doing this, rather than using UDFs.  Is this a patch that 
> would be accepted?

-- 
This message is automatically generated by JIRA.