Hello,
I'm getting only NULLs when trying to read data from an external table for
which the data was staged. I'm using Cloudera's hive-0.4.0+14.tar.gz with
hadoop-0.20.1+152.tar.gz on a Centos machine.
My steps were:
# CREATE TABLE
CREATE EXTERNAL TABLE mailog_stg(month STRING, day STRING, time STRING, host
STRING, logline STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
"input.regex" = "^([^ ]+) ([^ ]+) ([^ ]+) ([^ ]+) (.+)$",
"output.format.string" = "%1$s %2$s %3$s %4$s %5$s"
)
STORED AS TEXTFILE
LOCATION '/user/data/staging/mailog';
# COPY DATA FILE
hadoop dfs -put /ken/sf45-cc2-maillogs/mail.20091102 /user/data/staging/mailog
# DO A SELECT
hive> select * from mailog_stg;
OK
NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL
I verified my data is out there:
[r...@jims-desktop ~]# hadoop dfs -cat /user/data/staging/mailog/m*|wc
4 77 1016
And in fact when I used LOAD DATA INPATH with a table defined the same way but
without making it external, it worked fine.
Any suggestions how to debug this?
Thanks
Ken