RE: Please check grammar for TIMESTAMP

2009-03-09 Thread Ashish Thusoo
One immediate issue is that the format string is a lexical token, so a string 
of that format will not conform to the grammar at places where a string literal 
is expected. A better approach is to treat the format as a StringLiteral and 
then do the format checks at typecheck and semantic analysis time.

Ashish 
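
A minimal Java sketch of the approach suggested above: the parser accepts any
string literal, and the format is only checked afterwards, at semantic analysis
time. The class name, the helper names and the exception type are hypothetical
and not part of Hive. Using java.time pattern letters for the format is also an
assumption; the thread does not say which format syntax TIMESTAMP would accept.

```java
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of Hive: validates a TIMESTAMP format that the
// grammar accepted as an ordinary string literal.
class TimestampFormatCheck {

    /** Hypothetical semantic-analysis error for a malformed format string. */
    static class FormatException extends RuntimeException {
        FormatException(String message) {
            super(message);
        }
    }

    /** Strips matching surrounding quotes from a string-literal token's text. */
    static String unquote(String tokenText) {
        if (tokenText.length() >= 2) {
            char first = tokenText.charAt(0);
            char last = tokenText.charAt(tokenText.length() - 1);
            if ((first == '\'' || first == '"') && first == last) {
                return tokenText.substring(1, tokenText.length() - 1);
            }
        }
        return tokenText;
    }

    /** Runs after parsing: rejects patterns that java.time cannot compile. */
    static DateTimeFormatter checkFormat(String tokenText) {
        String pattern = unquote(tokenText);
        try {
            return DateTimeFormatter.ofPattern(pattern);
        } catch (IllegalArgumentException e) {
            throw new FormatException(
                "Invalid TIMESTAMP format '" + pattern + "': " + e.getMessage());
        }
    }

    public static void main(String[] args) {
        checkFormat("'yyyy-MM-dd HH:mm:ss'");
        System.out.println("accepted: yyyy-MM-dd HH:mm:ss");
        try {
            // '{' is a reserved character in java.time patterns.
            checkFormat("'yyyy-MM-dd {'");
        } catch (FormatException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With this split, the grammar needs no dedicated token for the format, and a
malformed format is reported as a semantic error rather than a parse error.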

-----Original Message-----
From: Shyam Sarkar [mailto:shyam_sar...@yahoo.com] 
Sent: Sunday, March 08, 2009 7:16 AM
To: hive-dev@hadoop.apache.org
Subject: Please check grammar for TIMESTAMP

Hi Zheng and others,

Could you please check Hive.g grammar changes for TIMESTAMP (See the comments 
with // Change by Shyam)?
Please review and let me know your feedback. I shall write a short design doc 
later for review after these short exchanges.

Thanks,
shyam_sar...@yahoo.com


  

RE: Please check grammar for TIMESTAMP

2009-03-09 Thread Shyam Sarkar

Dear Ashish,

Thanks for the comment. I found the following in the MySQL 6.0 documentation:

(1) Inside CREATE TABLE, TIMESTAMP does not take a format. It is treated like 
a primitive type (string).

(2) Inside a SELECT clause, TIMESTAMP(MMDDHHMMSS) is called as a routine with 
format information for the output spec.

===  MySQL 6.0 function ===
TIMESTAMP(expr), TIMESTAMP(expr1,expr2) 
With a single argument, this function returns the date or datetime expression 
expr as a datetime value. With two arguments, it adds the time expression expr2 
to the date or datetime expression expr1 and returns the result as a datetime 
value. 
mysql> SELECT TIMESTAMP('2003-12-31');
        -> '2003-12-31 00:00:00'
mysql> SELECT TIMESTAMP('2003-12-31 12:00:00','12:00:00');
        -> '2004-01-01 00:00:00'
===
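
The semantics quoted above can be illustrated with a small Java sketch. It is
neither MySQL's nor Hive's implementation; the class and method names are
hypothetical, and it assumes inputs of the form 'yyyy-MM-dd' or
'yyyy-MM-dd HH:mm:ss' and a time expression below 24 hours.

```java
import java.time.Duration;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;

// Hypothetical illustration of the one- and two-argument TIMESTAMP() function.
class TimestampFunctionSketch {

    private static final DateTimeFormatter DATETIME =
        DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss");

    /** TIMESTAMP(expr): a bare date is widened to midnight, a datetime is kept. */
    static LocalDateTime timestamp(String expr) {
        if (expr.length() == 10) { // assumed shape: 'yyyy-MM-dd'
            return LocalDate.parse(expr).atStartOfDay();
        }
        return LocalDateTime.parse(expr, DATETIME);
    }

    /** TIMESTAMP(expr1, expr2): adds the time expression expr2 to expr1. */
    static LocalDateTime timestamp(String expr1, String expr2) {
        // LocalTime limits this sketch to time expressions below 24 hours.
        LocalTime time = LocalTime.parse(expr2);
        return timestamp(expr1).plus(Duration.ofSeconds(time.toSecondOfDay()));
    }

    /** Renders a value in the 'yyyy-MM-dd HH:mm:ss' form used in the examples. */
    static String format(LocalDateTime value) {
        return value.format(DATETIME);
    }

    public static void main(String[] args) {
        // prints 2003-12-31 00:00:00
        System.out.println(format(timestamp("2003-12-31")));
        // prints 2004-01-01 00:00:00
        System.out.println(format(timestamp("2003-12-31 12:00:00", "12:00:00")));
    }
}
```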

As a result, we have to define TIMESTAMP both as a primitive type and as a 
complex type with format information. I have to upgrade the grammar after 
further inspection. I am going to add a basic design document to JIRA.

Please provide suggestions.

Thanks,
Shyam


--- On Mon, 3/9/09, Ashish Thusoo <athu...@facebook.com> wrote:

> From: Ashish Thusoo <athu...@facebook.com>
> Subject: RE: Please check grammar for TIMESTAMP
> To: "hive-dev@hadoop.apache.org" <hive-dev@hadoop.apache.org>
> Date: Monday, March 9, 2009, 2:52 PM
> One immediate issue is that the format string is a lexical
> token, so a string of that format will not conform to the
> grammar at places where a string literal is expected. A
> better approach is to treat the format as a StringLiteral
> and then do the format checks at typecheck and semantic
> analysis time.
> 
> Ashish 
> 
> -----Original Message-----
> From: Shyam Sarkar [mailto:shyam_sar...@yahoo.com] 
> Sent: Sunday, March 08, 2009 7:16 AM
> To: hive-dev@hadoop.apache.org
> Subject: Please check grammar for TIMESTAMP
> 
> Hi Zheng and others,
> 
> Could you please check Hive.g grammar changes for TIMESTAMP
> (See the comments with // Change by Shyam)?
> Please review and let me know your feedback. I shall write
> a short design doc later for review after these short
> exchanges.
> 
> Thanks,
> shyam_sar...@yahoo.com


  


Please check grammar for TIMESTAMP

2009-03-08 Thread Shyam Sarkar
Hi Zheng and others,

Could you please check Hive.g grammar changes for TIMESTAMP (See the comments 
with // Change by Shyam)?
Please review and let me know your feedback. I shall write a short design doc 
later for review after these short exchanges.

Thanks,
shyam_sar...@yahoo.com


  grammar Hive;

options
{
output=AST;
ASTLabelType=CommonTree;
backtrack=true;
k=1;
}
 
tokens {
TOK_INSERT;
TOK_QUERY;
TOK_SELECT;
TOK_SELECTDI;
TOK_SELEXPR;
TOK_FROM;
TOK_TAB;
TOK_PARTSPEC;
TOK_PARTVAL;
TOK_DIR;
TOK_LOCAL_DIR;
TOK_TABREF;
TOK_SUBQUERY;
TOK_DESTINATION;
TOK_ALLCOLREF;
TOK_COLREF;
TOK_FUNCTION;
TOK_FUNCTIONDI;
TOK_WHERE;
TOK_OP_EQ;
TOK_OP_NE;
TOK_OP_LE;
TOK_OP_LT;
TOK_OP_GE;
TOK_OP_GT;
TOK_OP_DIV;
TOK_OP_ADD;
TOK_OP_SUB;
TOK_OP_MUL;
TOK_OP_MOD;
TOK_OP_BITAND;
TOK_OP_BITNOT;
TOK_OP_BITOR;
TOK_OP_BITXOR;
TOK_OP_AND;
TOK_OP_OR;
TOK_OP_NOT;
TOK_OP_LIKE;
TOK_TRUE;
TOK_FALSE;
TOK_TRANSFORM;
TOK_EXPLIST;
TOK_ALIASLIST;
TOK_GROUPBY;
TOK_ORDERBY;
TOK_CLUSTERBY;
TOK_DISTRIBUTEBY;
TOK_SORTBY;
TOK_UNION;
TOK_JOIN;
TOK_LEFTOUTERJOIN;
TOK_RIGHTOUTERJOIN;
TOK_FULLOUTERJOIN;
TOK_LOAD;
TOK_NULL;
TOK_ISNULL;
TOK_ISNOTNULL;
TOK_TINYINT;
TOK_SMALLINT;
TOK_INT;
TOK_BIGINT;
TOK_BOOLEAN;
TOK_FLOAT;
TOK_DOUBLE;
TOK_DATE;
TOK_DATETIME;
TOK_TIMESTAMP;
TOK_STRING;
TOK_LIST;
TOK_MAP;
TOK_CREATETABLE;
TOK_DESCTABLE;
TOK_ALTERTABLE_RENAME;
TOK_ALTERTABLE_ADDCOLS;
TOK_ALTERTABLE_REPLACECOLS;
TOK_ALTERTABLE_ADDPARTS;
TOK_ALTERTABLE_DROPPARTS;
TOK_ALTERTABLE_SERDEPROPERTIES;
TOK_ALTERTABLE_SERIALIZER;
TOK_ALTERTABLE_PROPERTIES;
TOK_MSCK;
TOK_SHOWTABLES;
TOK_SHOWPARTITIONS;
TOK_CREATEEXTTABLE;
TOK_DROPTABLE;
TOK_TABCOLLIST;
TOK_TABCOL;
TOK_TABLECOMMENT;
TOK_TABLEPARTCOLS;
TOK_TABLEBUCKETS;
TOK_TABLEROWFORMAT;
TOK_TABLEROWFORMATFIELD;
TOK_TABLEROWFORMATCOLLITEMS;
TOK_TABLEROWFORMATMAPKEYS;
TOK_TABLEROWFORMATLINES;
TOK_TBLSEQUENCEFILE;
TOK_TBLTEXTFILE;
TOK_TABLEFILEFORMAT;
TOK_TABCOLNAME;
TOK_TABLELOCATION;
TOK_PARTITIONLOCATION;
TOK_TABLESAMPLE;
TOK_TMP_FILE;
TOK_TABSORTCOLNAMEASC;
TOK_TABSORTCOLNAMEDESC;
TOK_CHARSETLITERAL;
TOK_CREATEFUNCTION;
TOK_EXPLAIN;
TOK_TABLESERIALIZER;
TOK_TABLEPROPERTIES;
TOK_TABLEPROPLIST;
TOK_TABTYPE;
TOK_LIMIT;
TOK_TABLEPROPERTY;
TOK_IFNOTEXISTS;
}


// Package headers
@header {
package org.apache.hadoop.hive.ql.parse;
}
@lexer::header {package org.apache.hadoop.hive.ql.parse;}


@members { 
  Stack<String> msgs = new Stack<String>();
}

@rulecatch {
catch (RecognitionException e) {
 reportError(e);
  throw e;
}
}
 
// starting rule
statement
: explainStatement EOF
| execStatement EOF
;

explainStatement
@init { msgs.push("explain statement"); }
@after { msgs.pop(); }
: KW_EXPLAIN (isExtended=KW_EXTENDED)? execStatement -> ^(TOK_EXPLAIN 
execStatement $isExtended?)
;

execStatement
@init { msgs.push("statement"); }
@after { msgs.pop(); }
: queryStatementExpression
| loadStatement
| ddlStatement
;

loadStatement
@init { msgs.push("load statement"); }
@after { msgs.pop(); }
: KW_LOAD KW_DATA (islocal=KW_LOCAL)? KW_INPATH (path=StringLiteral) 
(isoverwrite=KW_OVERWRITE)? KW_INTO KW_TABLE (tab=tabName) 
-> ^(TOK_LOAD $path $tab $islocal? $isoverwrite?)
;

ddlStatement
@init { msgs.push("ddl statement"); }
@after { msgs.pop(); }
: createStatement
| dropStatement
| alterStatement
| descStatement
| showStatement
| metastoreCheck
| createFunctionStatement
;

ifNotExists
@init { msgs.push("if not exists clause"); }
@after { msgs.pop(); }
: KW_IF KW_NOT KW_EXISTS
-> ^(TOK_IFNOTEXISTS)
;

createStatement
@init { msgs.push("create statement"); }
@after { msgs.pop(); }
: KW_CREATE (ext=KW_EXTERNAL)? KW_TABLE ifNotExists? name=Identifier 
(LPAREN columnNameTypeList RPAREN)? tableComment? tablePartition? tableBuckets? 
tableRowFormat? tableFileFormat? tableLocation?
-> {$ext == null}? ^(TOK_CREATETABLE $name ifNotExists? columnNameTypeList? 
tableComment? tablePartition? tableBuckets? tableRowFormat? tableFileFormat? 
tableLocation?)
-> ^(TOK_CREATEEXTTABLE $name ifNotExists? 
columnNameTypeList? tableComment? tablePartition? tableBuckets? tableRowFormat? 
tableFileFormat? tableLocation?)
;

dropStatement
@init { msgs.push("drop statement"); }
@after { msgs.pop(); }
: KW_DROP KW_TABLE Identifier  -> ^(TOK_DROPTABLE Identifier)
;

alterStatement
@init { msgs.push("alter statement"); }
@after { msgs.pop(); }
: alterStatementRename
| alterStatementAddCol
| alterStatementDropPartitions
| alterStatementAddPartitions
| alterStatementProperties
| alterStatementSerdeProperties
;

alterStatementRename
@init { msgs.push("rename statement"); }
@after { msgs.pop(); }
: KW_ALTER KW_TABLE oldName=Identifier KW_RENAME KW_TO newName=Identifier 
-> ^(TOK_ALTERTABLE_RENAME $oldName $newName)
;

alterStatementAddCol
@init { msgs.push("add column statement"); }
@after { msgs.pop(); }
: KW_ALTER KW_TABLE Identifier (add=KW_ADD | replace=KW_REPLACE) KW_COLUMNS 
LPAREN 

Re: Please check grammar for TIMESTAMP

2009-03-08 Thread Tim Hawkins
Is there going to be any time zone support? That is, will the timestamp be
stored in a recognised standard such as UTC regardless of the actual
time submitted? Given that Hive/Hadoop tends to be used for log
processing and reporting in many use cases, understanding and
normalising time zone details may be necessary, especially where
data is sourced from multiple time zones.


It may be worth considering this issue now, as retrofitting it later
may cause problems.
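
To make the concern concrete, here is a hedged Java sketch of normalising
timestamps from several time zones to UTC before storage. The class and method
names are hypothetical, the 'yyyy-MM-dd HH:mm:ss' log format is an assumption,
and nothing here reflects how Hive actually stores TIMESTAMP values.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical illustration: log lines carry local wall-clock times from
// different zones; converting them to UTC instants makes them comparable.
class UtcNormalizationSketch {

    private static final DateTimeFormatter LOG_FORMAT =
        DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss");

    /** Interprets a local wall-clock time in the given zone as a UTC instant. */
    static Instant toUtc(String localText, String zoneId) {
        LocalDateTime local = LocalDateTime.parse(localText, LOG_FORMAT);
        return local.atZone(ZoneId.of(zoneId)).toInstant();
    }

    /** Renders an instant as a UTC wall-clock string in the log format. */
    static String render(Instant instant) {
        return LOG_FORMAT.format(instant.atOffset(ZoneOffset.UTC));
    }

    public static void main(String[] args) {
        Instant fromIndia = toUtc("2009-03-09 09:00:00", "Asia/Kolkata");
        Instant fromJapan = toUtc("2009-03-09 12:30:00", "Asia/Tokyo");
        // Both lines print 2009-03-09 03:30:00, since the two local times
        // denote the same moment.
        System.out.println(render(fromIndia));
        System.out.println(render(fromJapan));
    }
}
```

If values were stored as local strings instead, the two events above would
sort three and a half hours apart even though they happened simultaneously.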


On 8 Mar 2009, at 14:15, Shyam Sarkar wrote:


> Hi Zheng and others,
>
> Could you please check Hive.g grammar changes for TIMESTAMP (See the
> comments with // Change by Shyam)?
> Please review and let me know your feedback. I shall write a short
> design doc later for review after these short exchanges.
>
>
> Thanks,
> shyam_sar...@yahoo.com



Re: Please check grammar for TIMESTAMP

2009-03-08 Thread Shyam Sarkar

Yes, there will be time zone support. We shall follow the MySQL 6.0 TIMESTAMP 
specification:

http://dev.mysql.com/doc/refman/6.0/en/timestamp.html

Thanks,
shyam_sar...@yahoo.com


--- On Sun, 3/8/09, Tim Hawkins <tim.hawk...@bejant.com> wrote:

> From: Tim Hawkins <tim.hawk...@bejant.com>
> Subject: Re: Please check grammar for TIMESTAMP
> To: hive-dev@hadoop.apache.org
> Date: Sunday, March 8, 2009, 7:22 AM
> Is there going to be any time zone support? That is, will the
> timestamp be stored in a recognised standard such as UTC
> regardless of the actual time submitted? Given that
> Hive/Hadoop tends to be used for log processing and reporting
> in many use cases, understanding and normalising time zone
> details may be necessary, especially where data is
> sourced from multiple time zones.
> 
> It may be worth considering this issue now, as retrofitting
> it later may cause problems.
> 
> On 8 Mar 2009, at 14:15, Shyam Sarkar wrote:
> 
> > Hi Zheng and others,
> > 
> > Could you please check Hive.g grammar changes for
> > TIMESTAMP (See the comments with // Change by Shyam)?
> > Please review and let me know your feedback. I shall
> > write a short design doc later for review after these short
> > exchanges.
> > 
> > Thanks,
> > shyam_sar...@yahoo.com