I was trying to add decimal, timestamp, date, array, and map type support to
PyHive DBAPI. To parse the result set correctly, I have to know
the result set schema for each SELECT. For simple types (integer, string,
timestamp, decimal, …), that is not a problem: I can get all the information by
calling HiveServer2.GetResultSetMetadata. But for complex types (array, map,
struct), the nested type information is missing. I can’t find a way to tell
whether a column is an integer array or a string array.

According to TCLIService.thrift, recursively defined types such as
array<int> and map<int, string> should be described by TTypeEntry.arrayEntry
or TTypeEntry.mapEntry, rather than by TTypeEntry.primitiveEntry, in the
first element of TTypeDesc.types. The nested types should reside in
TTypeDesc.types as the following elements, and be pointed to from the first
element.
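To illustrate how I read that pointer scheme, here is a minimal Python sketch. The classes are hypothetical stand-ins that model only the pointer fields; they are not the Thrift-generated TCLIService classes, and the type names are plain strings rather than TTypeId values.

```python
from dataclasses import dataclass
from typing import List, Union


# Hypothetical stand-ins for the Thrift structs; only the pointer
# fields discussed in this message are modeled.
@dataclass
class PrimitiveEntry:
    type_name: str  # e.g. "int" or "string"


@dataclass
class ArrayEntry:
    object_type_ptr: int  # index of the element type in the type list


@dataclass
class MapEntry:
    key_type_ptr: int    # index of the key type in the type list
    value_type_ptr: int  # index of the value type in the type list


Entry = Union[PrimitiveEntry, ArrayEntry, MapEntry]


def render_type(types: List[Entry], ptr: int = 0) -> str:
    """Follow the integer pointers starting at types[ptr] and build a type string."""
    entry = types[ptr]
    if isinstance(entry, ArrayEntry):
        return "array<%s>" % render_type(types, entry.object_type_ptr)
    if isinstance(entry, MapEntry):
        key = render_type(types, entry.key_type_ptr)
        value = render_type(types, entry.value_type_ptr)
        return "map<%s,%s>" % (key, value)
    return entry.type_name


# array<int>: the first element is the array entry, pointing at element 1.
array_of_int = [ArrayEntry(1), PrimitiveEntry("int")]

# map<int, array<string>>: the first element points at the key (1) and the
# value (2); the value is itself an array pointing at element 3.
map_of_arrays = [
    MapEntry(1, 2),
    PrimitiveEntry("int"),
    ArrayEntry(3),
    PrimitiveEntry("string"),
]

print(render_type(array_of_int))
print(render_type(map_of_arrays))
```

With a type list laid out like this, a client could recover the full nested type from the first element alone, which is what I need in PyHive.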

However, when I actually called GetResultSetMetadata for the query
SELECT array(1, 2, 3), I got just a single TTypeEntry.primitiveEntry in
TTypeDesc.types, with TPrimitiveTypeEntry.type = ARRAY_TYPE.

This violates both of the following descriptions in the thrift file:
“TTypeDesc employs a type list that maps integer “pointers” to TTypeEntry
objects” and “The primitive type token. This must satisfy the condition
that type is in the PRIMITIVE_TYPES set.”
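The second condition can be checked mechanically. The sketch below is only an illustration: the helper name is hypothetical, type ids are written as plain strings rather than TTypeId values, and the COMPLEX_TYPES set lists the four complex type names as I understand them from the thrift file.

```python
# Names of the complex type ids, written as strings for illustration.
# This is an assumption based on my reading of TCLIService.thrift.
COMPLEX_TYPES = {"ARRAY_TYPE", "MAP_TYPE", "STRUCT_TYPE", "UNION_TYPE"}


def misplaced_primitive_types(primitive_type_names):
    """Return the type names found in primitive entries that are complex types.

    primitive_type_names: the TPrimitiveTypeEntry.type values of a type
    list, given as strings (hypothetical input format).
    """
    return [name for name in primitive_type_names if name in COMPLEX_TYPES]


# What I observed for SELECT array(1, 2, 3): one primitive entry
# whose type is ARRAY_TYPE.
observed = ["ARRAY_TYPE"]
print(misplaced_primitive_types(observed))
```

A non-empty result means that a primitive entry carries a complex type, which is the case I am seeing.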

I tried the following script.

create temporary table dummy(a int);
insert into table dummy values (1), (2), (3);
create temporary table tt(a int, b string, c map<INT, ARRAY<string>>);
insert into table tt select 1, 'a', map(3, array('a','b','c')) from dummy limit 1;
select * from tt;

I then called GetResultSetMetadata right after executing the SELECT query.
The value of response.schema.columns was:

[TColumnDesc(columnName='tt.a', typeDesc=TTypeDesc(
typeQualifiers=None), arrayEntry=None, mapEntry=None,
structEntry=None, unionEntry=None, userDefinedTypeEntry=None)]),
position=1, comment=None),
 TColumnDesc(columnName='tt.b', typeDesc=TTypeDesc(types=[
typeQualifiers=None), arrayEntry=None, mapEntry=None,
structEntry=None, unionEntry=None, userDefinedTypeEntry=None)]),
position=2, comment=None),
 TColumnDesc(columnName='tt.c', typeDesc=TTypeDesc(types=[
typeQualifiers=None), arrayEntry=None, mapEntry=None,
structEntry=None, unionEntry=None, userDefinedTypeEntry=None)]),
position=3, comment=None)]

However, according to the thrift file, it should be

[TColumnDesc(columnName='tt.a', typeDesc=TTypeDesc(types=[
typeQualifiers=None), arrayEntry=None, mapEntry=None,
structEntry=None, unionEntry=None, userDefinedTypeEntry=None)]),
position=1, comment=None),
 TColumnDesc(columnName='tt.b', typeDesc=TTypeDesc(types=[
typeQualifiers=None), arrayEntry=None, mapEntry=None,
structEntry=None, unionEntry=None, userDefinedTypeEntry=None)]),
position=2, comment=None),
 TColumnDesc(columnName='tt.c', typeDesc=TTypeDesc(types=[
  TTypeEntry(primitiveEntry=None, arrayEntry=None,
mapEntry=TMapTypeEntry(keyTypePtr=1, valueTypePtr=2),
structEntry=None, unionEntry=None, userDefinedTypeEntry=None),
typeQualifiers=None), arrayEntry=None, mapEntry=None,
structEntry=None, unionEntry=None, userDefinedTypeEntry=None),
arrayEntry=TArrayTypeEntry(objectTypePtr=3), mapEntry=None,
structEntry=None, unionEntry=None, userDefinedTypeEntry=None),
typeQualifiers=None), arrayEntry=None, mapEntry=None,
structEntry=None, unionEntry=None, userDefinedTypeEntry=None)
]), position=3, comment=None)]

I found the related function in the Hive codebase.
It seems that this function always puts a TPrimitiveTypeEntry into
TTypeDesc.types, even for complex types like array and map, which is
inconsistent with the thrift file.
