Hi,

I have a strange problem: queries hang when I use
org.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDe with the
serialization format
org.apache.hadoop.hive.serde2.thrift.TCTLSeparatedProtocol. I'm loading a
small, very standard Apache log file into a table created with this
example, which I found on the net:

CREATE TABLE apachelog (
ipaddress STRING, identd STRING, user STRING, finishtime STRING,
requestline STRING, returncode INT, size INT)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDe'
WITH SERDEPROPERTIES (
'serialization.format'='org.apache.hadoop.hive.serde2.thrift.TCTLSeparatedProtocol',
'quote.delim'='("|\\[|\\])',
'field.delim'=' ',
'serialization.null.format'='-')
STORED AS TEXTFILE;

After that, I load a small log file. That works fine, but when I run even a
simple query like SELECT count(1) FROM apachelog, the job hangs: it sits
there making no progress until it gets killed after 10 minutes. I must be
missing something very basic.
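
Roughly, the load and query steps look like this (the file path below is
only a placeholder, not the actual location of my log file):

-- placeholder path; substitute the real location of the log file
LOAD DATA LOCAL INPATH '/path/to/access.log' INTO TABLE apachelog;

-- this is the query that hangs
SELECT count(1) FROM apachelog;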

The same setup works well with the RegexSerDe, but I'd rather use the
DynamicSerDe as above. I'm on the hive-0.4 branch, but I'm pretty sure I
saw the same behavior on trunk as well. I can't find anything about this in
the /tmp/<>/hive.log file.

Thanks for your help!
Vijay
