I'm getting an error in Hive when executing a query on a table in ORC format.
After several trials, I succeeded to run the same query on the same table in
TEXTFILE format.
I 've been able to reproduce the error with the simple sql script below.
I create the same table in TEXFILE and in ORC and I run a SELECT ...GROUP BY on
the tables.
The first SELECT issued on the TEXTFILE table succeeds.
The second SELECT issued on the ORC table fails.
NB : There is a CONCAT in the query. If I remove the CONCAT the query is
running ok with both tables ...
Example script to reproduce the error :
USE pvr_temp;
DROP TABLE IF EXISTS students_text;
CREATE TABLE students_text (name VARCHAR(64), age INT, datetime TIMESTAMP, gpa
DECIMAL(3, 2)) STORED AS TEXTFILE;
INSERT INTO TABLE students_text VALUES ('fred flintstone', 35, '2015-04-13
13:40:00', 1.28), ('barney rubble', 32, '2015-04-13 13:40:00', 2.32);
SELECT CONCAT(TO_DATE(datetime), '-'), SUM(gpa) FROM students_text GROUP BY
CONCAT(TO_DATE(datetime), '-');
DROP TABLE IF EXISTS students_orc;
CREATE TABLE students_orc (name VARCHAR(64), age INT, datetime TIMESTAMP, gpa
DECIMAL(3, 2)) STORED AS ORC;
INSERT INTO TABLE students_orc VALUES ('fred flintstone', 35, '2015-04-13
SELECT CONCAT(TO_DATE(datetime), '-'), SUM(gpa) FROM students_orc GROUP BY
CONCAT(TO_DATE(datetime), '-');
13:40:00', 1.28), ('barney rubble', 32, '2015-04-13 13:40:00', 2.32);
Log where you can see the error :
[pvr@tpcalr01s ~]$ cat test.log
scan complete in 9ms
Connecting to jdbc:hive2://tpcrmm03s:1
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in
[jar:file:/usr/hdp/2.2.0.0-2041/hadoop/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in
[jar:file:/usr/hdp/2.2.0.0-2041/hive/lib/hive-jdbc-0.14.0.2.2.0.0-2041-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Connected to: Apache Hive (version 0.14.0.2.2.0.0-2041)
Driver: Hive JDBC (version 0.14.0.2.2.0.0-2041)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://tpcrmm03s:1> USE pvr_temp;
No rows affected (0.061 seconds)
0: jdbc:hive2://tpcrmm03s:1> DROP TABLE IF EXISTS students_text;
No rows affected (0.12 seconds)
0: jdbc:hive2://tpcrmm03s:1> CREATE TABLE students_text (name VARCHAR(64),
age INT, datetime TIMESTAMP, gpa DECIMAL(3, 2)) STORED AS TEXTFILE;
No rows affected (0.057 seconds)
0: jdbc:hive2://tpcrmm03s:1> INSERT INTO TABLE students_text VALUES ('fred
flintstone', 35, '2015-04-13 13:40:00', 1.28), ('barney rubble', 32,
'2015-04-13 13:40:00', 2.32);
INFO : Tez session hasn't been created yet. Opening session
INFO :
INFO : Status: Running (Executing on YARN cluster with App id
application_1428656093356_0047)
INFO : Map 1: -/-
INFO : Map 1: 0/1
No rows affected (14.134 seconds)
INFO : Map 1: 0/1
INFO : Map 1: 0(+1)/1
INFO : Map 1: 0(+1)/1
INFO : Map 1: 1/1
INFO : Loading data to table pvr_temp.students_text from
hdfs://tpcrmm01s.priv.atos.fr:8020/tmp/hive/hive/bf19c354-de67-45ae-a3e4-cd57d81acd71/hive_2015-04-13_14-15-08_445_2811483497310651606-20/-ext-1
INFO : Table pvr_temp.students_text stats: [numFiles=1, numRows=2,
totalSize=86, rawDataSize=84]
0: jdbc:hive2://tpcrmm03s:1> SELECT CONCAT(TO_DATE(datetime), '-'),
SUM(gpa) FROM students_text GROUP BY CONCAT(TO_DATE(datetime), '-');
INFO : Session is already open
INFO :
INFO : Status: Running (Executing on YARN cluster with App id
application_1428656093356_0047)
INFO : Map 1: -/- Reducer 2: 0/1
INFO : Map 1: 0/1 Reducer 2: 0/1
INFO : Map 1: 0(+1)/1 Reducer 2: 0/1
INFO : Map 1: 1/1 Reducer 2: 0(+1)/1
INFO : Map 1: 1/1 Reducer 2: 1/1
+--+--+--+
| _c0 | _c1 |
+--+--+--+
| 2015-04-13- | 3.6 |
+--+--+--+
1 row selected (3.258 seconds)
0: jdbc:hive2://tpcrmm03s:1> DROP TABLE IF EXISTS students_orc;
No rows affected (0.109 seconds)
0: jdbc:hive2://tpcrmm03s:1> CREATE TABLE students_orc (name VARCHAR(64),
age INT, datetime TIMESTAMP, gpa DECIMAL(3, 2)) STORED AS ORC;
No rows affected (0.063 seconds)
0: jdbc:hive2://tpcrmm03s:1> INSERT INTO TABLE students_orc VALUES ('fred
flintstone', 35, '2015-04-13 13:40:00', 1.28), ('barney rubble', 32,
'2015-04-13 13:40:00', 2.32);
No rows affected (2.125 seconds)
INFO : Session is already open
INFO :
INFO : Status: Running (Executing on YARN cluster with App id
application_1428656093356_0047)
INFO : Map 1: 0/1
INFO : Map 1: 0(+1)/1
INFO : Map 1: 1/1
INFO : Loading data to table pvr_temp.students_orc from
hdfs://tpcrmm01s.priv.atos.fr:8020/tmp/hive/hive/bf19c354-de67-45ae-a3e4-cd57d81acd71/hive_2015-04-13_14-15-26_056_1247475009666467472-20/-ext-1
INFO : Table pvr_temp.students_orc stats: [numFiles=1, numRows=2,
totalSize=590, rawDataSize=508