I looked in hive.log while doing the CREATE TABLE and found:

2009-11-30 17:51:53,240 WARN  serde2.RegexSerDe 
(RegexSerDe.java:deserialize(179)) - 1 unmatched rows are found: Nov  2 
23:00:04 sf45-cc2 sendmail[10905]: [ID 801593 mail.info] nA3704qO010905: 
from=nagios, size=286, class=0, nrcpts=1, 
msgid=<[email protected]>, 
relay=nag...@localhost

So instead of grabbing a log file at random, I went back and used exactly the 
same one I did when I was testing the Serde.
This time, the SELECT statement worked fine.

Maybe I'll run my regex over a large sampling of our syslog files and see how 
many don't conform to it. I may have to change my regex to be more forgiving of 
deviations...
_____________________________________________
From: Barclay, Ken
Sent: Monday, November 30, 2009 5:07 PM
To: [email protected]
Subject: Table created using RegexSerDe doesn't work when made external


Hello,

I'm getting only NULLs when trying to read data from an external table for 
which the data was staged. I'm using Cloudera's hive-0.4.0+14.tar.gz with 
hadoop-0.20.1+152.tar.gz on a Centos machine.

My steps were:

# CREATE TABLE

CREATE EXTERNAL TABLE mailog_stg(month STRING, day STRING, time STRING, host 
STRING, logline STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  "input.regex" = "^([^ ]+) ([^ ]+) ([^ ]+) ([^ ]+) (.+)$",
  "output.format.string" = "%1$s %2$s %3$s %4$s %5$s"
)
STORED AS TEXTFILE
LOCATION '/user/data/staging/mailog';

# COPY DATA FILE

hadoop dfs -put /ken/sf45-cc2-maillogs/mail.20091102  /user/data/staging/mailog

# DO A SELECT

hive> select * from mailog_stg;
OK
NULL    NULL    NULL    NULL    NULL
NULL    NULL    NULL    NULL    NULL
NULL    NULL    NULL    NULL    NULL
NULL    NULL    NULL    NULL    NULL

I verified my data is out there:

[r...@jims-desktop ~]#  hadoop dfs -cat /user/data/staging/mailog/m*|wc
      4      77    1016

And in fact when I used LOAD DATA INPATH with a table defined the same way but 
without making it external, it worked fine.

Any suggestions how to debug this?

Thanks
Ken


Reply via email to