Pengcheng Xiong created HIVE-13837:
--------------------------------------
Summary: current_timestamp() output format is different in some cases
Key: HIVE-13837
URL: https://issues.apache.org/jira/browse/HIVE-13837
Project: Hive
Issue Type: Bug
Reporter: Pengcheng Xiong
Assignee: Pengcheng Xiong
As [~jdere] reports:
{code}
The current_timestamp() UDF returns its result in a different format in some cases.
select current_timestamp() returns a result with fractional-second (millisecond) precision:
{noformat}
hive> select current_timestamp();
OK
2016-04-14 18:26:58.875
Time taken: 0.077 seconds, Fetched: 1 row(s)
{noformat}
But the output has no fractional seconds for select current_timestamp() from all100k
union select current_timestamp() from over100k limit 5;
{noformat}
hive> select current_timestamp() from all100k union select current_timestamp()
from over100k limit 5;
Query ID = hrt_qa_20160414182956_c4ed48f2-9913-4b3b-8f09-668ebf55b3e3
Total jobs = 1
Launching Job 1 out of 1
Tez session was closed. Reopening...
Session re-established.
Status: Running (Executing on YARN cluster with App id application_1460611908643_0624)
----------------------------------------------------------------------------------------------
VERTICES          MODE   STATUS     TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
----------------------------------------------------------------------------------------------
Map 1 ..........  llap   SUCCEEDED      1          1        0        0       0       0
Map 4 ..........  llap   SUCCEEDED      1          1        0        0       0       0
Reducer 3 ......  llap   SUCCEEDED      1          1        0        0       0       0
----------------------------------------------------------------------------------------------
VERTICES: 03/03 [==========================>>] 100% ELAPSED TIME: 0.92 s
----------------------------------------------------------------------------------------------
OK
2016-04-14 18:29:56
Time taken: 10.558 seconds, Fetched: 1 row(s)
{noformat}
explain plan for select current_timestamp();
{noformat}
hive> explain extended select current_timestamp();
OK
ABSTRACT SYNTAX TREE:
TOK_QUERY
TOK_INSERT
TOK_DESTINATION
TOK_DIR
TOK_TMP_FILE
TOK_SELECT
TOK_SELEXPR
TOK_FUNCTION
current_timestamp
STAGE DEPENDENCIES:
Stage-0 is a root stage
STAGE PLANS:
Stage: Stage-0
Fetch Operator
limit: -1
Processor Tree:
TableScan
alias: _dummy_table
Row Limit Per Split: 1
GatherStats: false
Select Operator
expressions: 2016-04-14 18:30:57.206 (type: timestamp)
outputColumnNames: _col0
ListSink
Time taken: 0.062 seconds, Fetched: 30 row(s)
{noformat}
explain plan for select current_timestamp() from all100k union select
current_timestamp() from over100k limit 5;
{noformat}
hive> explain extended select current_timestamp() from all100k union select
current_timestamp() from over100k limit 5;
OK
ABSTRACT SYNTAX TREE:
TOK_QUERY
TOK_FROM
TOK_SUBQUERY
TOK_QUERY
TOK_FROM
TOK_SUBQUERY
TOK_UNIONALL
TOK_QUERY
TOK_FROM
TOK_TABREF
TOK_TABNAME
all100k
TOK_INSERT
TOK_DESTINATION
TOK_DIR
TOK_TMP_FILE
TOK_SELECT
TOK_SELEXPR
TOK_FUNCTION
current_timestamp
TOK_QUERY
TOK_FROM
TOK_TABREF
TOK_TABNAME
over100k
TOK_INSERT
TOK_DESTINATION
TOK_DIR
TOK_TMP_FILE
TOK_SELECT
TOK_SELEXPR
TOK_FUNCTION
current_timestamp
_u1
TOK_INSERT
TOK_DESTINATION
TOK_DIR
TOK_TMP_FILE
TOK_SELECTDI
TOK_SELEXPR
TOK_ALLCOLREF
_u2
TOK_INSERT
TOK_DESTINATION
TOK_DIR
TOK_TMP_FILE
TOK_SELECT
TOK_SELEXPR
TOK_ALLCOLREF
TOK_LIMIT
5
STAGE DEPENDENCIES:
Stage-1 is a root stage
Stage-0 depends on stages: Stage-1
STAGE PLANS:
Stage: Stage-1
Tez
DagId: hrt_qa_20160414183119_ec8e109e-8975-4799-a142-4a2289f85910:7
Edges:
Map 1 <- Union 2 (CONTAINS)
Map 4 <- Union 2 (CONTAINS)
Reducer 3 <- Union 2 (SIMPLE_EDGE)
DagName:
Vertices:
Map 1
Map Operator Tree:
TableScan
alias: all100k
Statistics: Num rows: 100000 Data size: 15801336 Basic stats:
COMPLETE Column stats: COMPLETE
GatherStats: false
Select Operator
Statistics: Num rows: 100000 Data size: 4000000 Basic
stats: COMPLETE Column stats: COMPLETE
Select Operator
expressions: 2016-04-14 18:31:19.0 (type: timestamp)
outputColumnNames: _col0
Statistics: Num rows: 200000 Data size: 8000000 Basic
stats: COMPLETE Column stats: COMPLETE
Group By Operator
keys: _col0 (type: timestamp)
mode: hash
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 40 Basic stats:
COMPLETE Column stats: COMPLETE
Reduce Output Operator
key expressions: _col0 (type: timestamp)
null sort order: a
sort order: +
Map-reduce partition columns: _col0 (type: timestamp)
Statistics: Num rows: 1 Data size: 40 Basic stats:
COMPLETE Column stats: COMPLETE
tag: -1
TopN: 5
TopN Hash Memory Usage: 0.04
auto parallelism: true
Execution mode: llap
LLAP IO: no inputs
Path -> Alias:
hdfs://os-r6-qugztu-hive-1-5.novalocal:8020/user/hcat/tests/data/all100k
[all100k]
Path -> Partition:
hdfs://os-r6-qugztu-hive-1-5.novalocal:8020/user/hcat/tests/data/all100k
Partition
base file name: all100k
input format: org.apache.hadoop.mapred.TextInputFormat
output format:
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
properties:
COLUMN_STATS_ACCURATE
{"BASIC_STATS":"true","COLUMN_STATS":{"t":"true","si":"true","i":"true","b":"true","f":"true","d":"true","s":"true","dc":"true","bo":"true","v":"true","c":"true","ts":"true"}}
EXTERNAL TRUE
bucket_count -1
columns t,si,i,b,f,d,s,dc,bo,v,c,ts,dt
columns.comments
columns.types
tinyint:smallint:int:bigint:float:double:string:decimal(38,18):boolean:varchar(25):char(25):timestamp:date
field.delim |
file.inputformat org.apache.hadoop.mapred.TextInputFormat
file.outputformat
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
location
hdfs://os-r6-qugztu-hive-1-5.novalocal:8020/user/hcat/tests/data/all100k
name default.all100k
numFiles 1
numRows 100000
rawDataSize 15801336
serialization.ddl struct all100k { byte t, i16 si, i32 i,
i64 b, float f, double d, string s, decimal(38,18) dc, bool bo, varchar(25) v,
char(25) c, timestamp ts, date dt}
serialization.format |
serialization.lib
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
totalSize 15901336
transient_lastDdlTime 1460612683
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
input format: org.apache.hadoop.mapred.TextInputFormat
output format:
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
properties:
COLUMN_STATS_ACCURATE
{"BASIC_STATS":"true","COLUMN_STATS":{"t":"true","si":"true","i":"true","b":"true","f":"true","d":"true","s":"true","dc":"true","bo":"true","v":"true","c":"true","ts":"true"}}
EXTERNAL TRUE
bucket_count -1
columns t,si,i,b,f,d,s,dc,bo,v,c,ts,dt
columns.comments
columns.types
tinyint:smallint:int:bigint:float:double:string:decimal(38,18):boolean:varchar(25):char(25):timestamp:date
field.delim |
file.inputformat org.apache.hadoop.mapred.TextInputFormat
file.outputformat
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
location
hdfs://os-r6-qugztu-hive-1-5.novalocal:8020/user/hcat/tests/data/all100k
name default.all100k
numFiles 1
numRows 100000
rawDataSize 15801336
serialization.ddl struct all100k { byte t, i16 si, i32 i,
i64 b, float f, double d, string s, decimal(38,18) dc, bool bo, varchar(25) v,
char(25) c, timestamp ts, date dt}
serialization.format |
serialization.lib
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
totalSize 15901336
transient_lastDdlTime 1460612683
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
name: default.all100k
name: default.all100k
Truncated Path -> Alias:
hdfs://os-r6-qugztu-hive-1-5.novalocal:8020/user/hcat/tests/data/all100k
[all100k]
Map 4
Map Operator Tree:
TableScan
alias: over100k
Statistics: Num rows: 100000 Data size: 6631229 Basic stats:
COMPLETE Column stats: COMPLETE
GatherStats: false
Select Operator
Statistics: Num rows: 100000 Data size: 4000000 Basic
stats: COMPLETE Column stats: COMPLETE
Select Operator
expressions: 2016-04-14 18:31:19.0 (type: timestamp)
outputColumnNames: _col0
Statistics: Num rows: 200000 Data size: 8000000 Basic
stats: COMPLETE Column stats: COMPLETE
Group By Operator
keys: _col0 (type: timestamp)
mode: hash
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 40 Basic stats:
COMPLETE Column stats: COMPLETE
Reduce Output Operator
key expressions: _col0 (type: timestamp)
null sort order: a
sort order: +
Map-reduce partition columns: _col0 (type: timestamp)
Statistics: Num rows: 1 Data size: 40 Basic stats:
COMPLETE Column stats: COMPLETE
tag: -1
TopN: 5
TopN Hash Memory Usage: 0.04
auto parallelism: true
Execution mode: llap
LLAP IO: no inputs
Path -> Alias:
hdfs://os-r6-qugztu-hive-1-5.novalocal:8020/user/hcat/tests/data/over100k
[over100k]
Path -> Partition:
hdfs://os-r6-qugztu-hive-1-5.novalocal:8020/user/hcat/tests/data/over100k
Partition
base file name: over100k
input format: org.apache.hadoop.mapred.TextInputFormat
output format:
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
properties:
COLUMN_STATS_ACCURATE
{"BASIC_STATS":"true","COLUMN_STATS":{"t":"true","si":"true","i":"true","b":"true","f":"true","d":"true","bo":"true","s":"true","bin":"true"}}
EXTERNAL TRUE
bucket_count -1
columns t,si,i,b,f,d,bo,s,bin
columns.comments
columns.types
tinyint:smallint:int:bigint:float:double:boolean:string:binary
field.delim :
file.inputformat org.apache.hadoop.mapred.TextInputFormat
file.outputformat
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
location
hdfs://os-r6-qugztu-hive-1-5.novalocal:8020/user/hcat/tests/data/over100k
name default.over100k
numFiles 1
numRows 100000
rawDataSize 6631229
serialization.ddl struct over100k { byte t, i16 si, i32 i,
i64 b, float f, double d, bool bo, string s, binary bin}
serialization.format :
serialization.lib
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
totalSize 6731229
transient_lastDdlTime 1460612798
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
input format: org.apache.hadoop.mapred.TextInputFormat
output format:
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
properties:
COLUMN_STATS_ACCURATE
{"BASIC_STATS":"true","COLUMN_STATS":{"t":"true","si":"true","i":"true","b":"true","f":"true","d":"true","bo":"true","s":"true","bin":"true"}}
EXTERNAL TRUE
bucket_count -1
columns t,si,i,b,f,d,bo,s,bin
columns.comments
columns.types
tinyint:smallint:int:bigint:float:double:boolean:string:binary
field.delim :
file.inputformat org.apache.hadoop.mapred.TextInputFormat
file.outputformat
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
location
hdfs://os-r6-qugztu-hive-1-5.novalocal:8020/user/hcat/tests/data/over100k
name default.over100k
numFiles 1
numRows 100000
rawDataSize 6631229
serialization.ddl struct over100k { byte t, i16 si, i32
i, i64 b, float f, double d, bool bo, string s, binary bin}
serialization.format :
serialization.lib
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
totalSize 6731229
transient_lastDdlTime 1460612798
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
name: default.over100k
name: default.over100k
Truncated Path -> Alias:
hdfs://os-r6-qugztu-hive-1-5.novalocal:8020/user/hcat/tests/data/over100k
[over100k]
Reducer 3
Execution mode: vectorized, llap
Needs Tagging: false
Reduce Operator Tree:
Group By Operator
keys: KEY._col0 (type: timestamp)
mode: mergepartial
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 40 Basic stats: COMPLETE
Column stats: COMPLETE
Limit
Number of rows: 5
Statistics: Num rows: 1 Data size: 40 Basic stats: COMPLETE
Column stats: COMPLETE
File Output Operator
compressed: false
GlobalTableId: 0
directory:
hdfs://os-r6-qugztu-hive-1-5.novalocal:8020/tmp/hive/hrt_qa/ec0773d7-0ac2-45c7-b9cb-568bbed2c49c/hive_2016-04-14_18-31-19_532_3480081382837900888-1/-mr-10001/.hive-staging_hive_2016-04-14_18-31-19_532_3480081382837900888-1/-ext-10002
NumFilesPerFileSink: 1
Statistics: Num rows: 1 Data size: 40 Basic stats: COMPLETE
Column stats: COMPLETE
Stats Publishing Key Prefix:
hdfs://os-r6-qugztu-hive-1-5.novalocal:8020/tmp/hive/hrt_qa/ec0773d7-0ac2-45c7-b9cb-568bbed2c49c/hive_2016-04-14_18-31-19_532_3480081382837900888-1/-mr-10001/.hive-staging_hive_2016-04-14_18-31-19_532_3480081382837900888-1/-ext-10002/
table:
input format:
org.apache.hadoop.mapred.SequenceFileInputFormat
output format:
org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
properties:
columns _col0
columns.types timestamp
escape.delim \
hive.serialization.extend.additional.nesting.levels
true
serialization.escape.crlf true
serialization.format 1
serialization.lib
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
serde:
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
TotalFiles: 1
GatherStats: false
MultiFileSpray: false
Union 2
Vertex: Union 2
Stage: Stage-0
Fetch Operator
limit: 5
Processor Tree:
ListSink
Time taken: 0.301 seconds, Fetched: 284 row(s)
{noformat}
In past releases, both queries returned the timestamp in YYYY-MM-DD HH:MM:SS.fff
format.
{code}
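The difference described in the report can be checked mechanically. The following is only a minimal sketch, not part of Hive or of any fix for this issue: the helper names (has_millis_format, plan_timestamp_constants, fraction_is_zero) are hypothetical, and the regular expressions assume the output and EXPLAIN line shapes quoted above. It tests whether a CLI result matches YYYY-MM-DD HH:MM:SS.fff and whether a timestamp constant folded into a plan has lost its fractional part.

```python
import re

# Expected shape per the report: YYYY-MM-DD HH:MM:SS.fff (exactly three fractional digits).
MILLIS_TS = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}$")

# Matches constant-folded timestamp expressions in EXPLAIN output, e.g.
# "expressions: 2016-04-14 18:31:19.0 (type: timestamp)".
PLAN_CONST = re.compile(
    r"expressions:\s+(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}(?:\.\d+)?)\s+\(type: timestamp\)"
)


def has_millis_format(value):
    """Return True if a result line has the YYYY-MM-DD HH:MM:SS.fff shape."""
    return MILLIS_TS.match(value.strip()) is not None


def plan_timestamp_constants(explain_text):
    """Return every timestamp constant found in 'expressions:' lines of a plan."""
    return PLAN_CONST.findall(explain_text)


def fraction_is_zero(constant):
    """Return True if the constant has no fractional part, or only zeros in it."""
    _, _, fraction = constant.partition(".")
    return fraction.strip("0") == ""


if __name__ == "__main__":
    # Result values quoted in the report.
    for result in ("2016-04-14 18:26:58.875", "2016-04-14 18:29:56"):
        print(result, "->", "ok" if has_millis_format(result) else "no fractional seconds")

    # Plan lines quoted in the report (simple query, then the union query).
    plan = (
        "expressions: 2016-04-14 18:30:57.206 (type: timestamp)\n"
        "expressions: 2016-04-14 18:31:19.0 (type: timestamp)\n"
    )
    for constant in plan_timestamp_constants(plan):
        print(constant, "->", "fraction lost" if fraction_is_zero(constant) else "fraction kept")
```

Run against the values above, the simple query passes both checks, while the union query fails the format check and its plan constant (18:31:19.0) is already missing the fractional part. That suggests the precision is dropped when the plan is built rather than when the result is printed, although the report itself does not state the cause.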