Khazar Mammadli created KUDU-3401:
-------------------------------------

             Summary: Unable to query Kudu tables from Hive with Kudu HMS 
Integration enabled
                 Key: KUDU-3401
                 URL: https://issues.apache.org/jira/browse/KUDU-3401
             Project: Kudu
          Issue Type: Bug
          Components: hms
            Reporter: Khazar Mammadli
            Assignee: Khazar Mammadli


When Kudu HMS integration is enabled, several fields are missing from a table 
created in Impala with a "stored as kudu" query. As a result, querying the 
table from Hive after creating it fails with a ClassNotFoundException:

 
{code:java}
ERROR : Failed
org.apache.hadoop.hive.metastore.api.MetaException: 
java.lang.ClassNotFoundException Class not found
at 
org.apache.hadoop.hive.metastore.HiveMetaStoreUtils.getDeserializer(HiveMetaStoreUtils.java:98)
 ~[hive-exec-3.1.3000.7.1.7.1000-141.jar:3.1.3000.7.1.7.1000-141]
at 
org.apache.hadoop.hive.metastore.HiveMetaStoreUtils.getDeserializer(HiveMetaStoreUtils.java:77)
 ~[hive-exec-3.1.3000.7.1.7.1000-141.jar:3.1.3000.7.1.7.1000-141]
at 
org.apache.hadoop.hive.ql.metadata.Table.getDeserializerFromMetaStore(Table.java:331)
 ~[hive-exec-3.1.3000.7.1.7.1000-141.jar:3.1.3000.7.1.7.1000-141] {code}
 

When the following sample query is run in Impala to create a Kudu table with 
Kudu HMS integration enabled, the table is created with the InputFormat, 
OutputFormat and SerDe Library fields missing:

 
{code:java}
create table default.kudu_test (
col1 string comment 'col1',
col2 string comment 'col2',
primary key (col1)
)
comment 'kudu_test'
stored as kudu;{code}
 
|SerDe Library:| |NULL|
|InputFormat:| |NULL|
|OutputFormat:| |NULL|

Hive Metastore log for the table creation:
INFO  org.apache.hadoop.hive.metastore.HiveMetaStore: [pool-5-thread-124]: 134: 
source:172.25.35.0 create_table: Table(tableName:kudu_test, dbName:default, 
owner:root, createTime:0, lastAccessTime:0, retention:0, 
sd:StorageDescriptor(cols:[FieldSchema(name:col1, type:string, comment:col1), 
FieldSchema(name:col2, type:string, comment:col2)], location:, inputFormat:, 
outputFormat:, compressed:false, numBuckets:0, serdeInfo:SerDeInfo(name:, 
serializationLib:, parameters:{}), bucketCols:[], sortCols:[], parameters:{}), 
partitionKeys:[], parameters:{kudu.table_name=default.kudu_test, 
kudu.table_id=5ac46856863f402fb69941ce7b967945, comment=, 
kudu.master_addresses=c3549-node2.coelab.cloudera.com:7051, 
storage_handler=org.apache.hadoop.hive.kudu.KuduStorageHandler, 
kudu.cluster_id=65c8dfbc8b75485db1328ab42f55fa07}, viewOriginalText:, 
viewExpandedText:, tableType:MANAGED_TABLE, temporary:false, ownerType:USER)

Running the same query in Impala with Kudu HMS integration disabled, on the 
other hand, populates these fields when the table is created:
|SerDe Library:|org.apache.hadoop.hive.kudu.KuduSerDe|NULL|
|InputFormat:|org.apache.hadoop.hive.kudu.KuduInputFormat|NULL|
|OutputFormat:|org.apache.hadoop.hive.kudu.KuduOutputFormat|NULL|

Hive Metastore log for table creation:
INFO  org.apache.hadoop.hive.metastore.HiveMetaStore: [pool-5-thread-173]: 183: 
source:172.25.35.0 create_table_req: Table(tableName:kudu_test, dbName:default, 
owner:root, createTime:0, lastAccessTime:0, retention:0, 
sd:StorageDescriptor(cols:[FieldSchema(name:col1, type:string, comment:col1), 
FieldSchema(name:col2, type:string, comment:col2)], location:null, 
inputFormat:org.apache.hadoop.hive.kudu.KuduInputFormat, 
outputFormat:org.apache.hadoop.hive.kudu.KuduOutputFormat, compressed:false, 
numBuckets:0, serdeInfo:SerDeInfo(name:null, 
serializationLib:org.apache.hadoop.hive.kudu.KuduSerDe, parameters:{}), 
bucketCols:[], sortCols:[], parameters:null), partitionKeys:[], 
parameters:{comment=kudu_test_lbodor_no_hms_integration, 
kudu.master_addresses=c3549-node2.coelab.cloudera.com, 
storage_handler=org.apache.hadoop.hive.kudu.KuduStorageHandler, 
kudu.table_name=impala::default.kudu_test}, viewOriginalText:null, 
viewExpandedText:null, tableType:MANAGED_TABLE, catName:hive, ownerType:USER, 
accessType:8)
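
The difference between the two log lines can be checked mechanically. The following is a minimal sketch, not part of Kudu, Impala or Hive: the helper name missing_storage_fields is hypothetical, and the parsing assumes the field layout seen in the two create_table log lines above.

```python
import re
from typing import List

# Field names as they appear in the HMS create_table log lines above.
STORAGE_FIELDS = ("inputFormat", "outputFormat", "serializationLib")


def missing_storage_fields(log_line: str) -> List[str]:
    """Return the storage fields that are empty or null in an HMS log line.

    Hypothetical helper for illustration only; it assumes each field is
    logged as "name:value" followed by a comma or a closing parenthesis.
    """
    # The log lines are wrapped in this email, so collapse whitespace first.
    flat = " ".join(log_line.split())
    missing = []
    for field in STORAGE_FIELDS:
        # Capture everything after "field:" up to the next "," or ")".
        match = re.search(rf"\b{field}:([^,)]*)", flat)
        value = match.group(1).strip() if match else ""
        if value in ("", "null"):
            missing.append(field)
    return missing


if __name__ == "__main__":
    # Fragment of the log line written with Kudu HMS integration enabled.
    enabled = (
        "location:, inputFormat:, outputFormat:, compressed:false, "
        "numBuckets:0, serdeInfo:SerDeInfo(name:, serializationLib:, "
        "parameters:{})"
    )
    print(missing_storage_fields(enabled))
```

Run against the log line from the integration-enabled case, the helper reports all three fields; against the integration-disabled log line it reports none.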
--------------------------------
Code path for table creation when Kudu HMS integration is enabled (Kudu code 
path):

Quick recap of the steps when creating a Kudu table:

HMSCatalog::CreateTable() -> hive::Table declared and passed to PopulateTable(… 
, &table) -> Thrift client Execute call -> HMSClient::CreateTable(Table (the one 
that just got populated), envcontext (default)) -> 
hms_client.create_table_with_environment_context(table, envcontext). 

CreateTable

[https://github.com/apache/kudu/blob/master/src/kudu/hms/hms_catalog.cc#L146] ->

Populate the fields of table

[https://github.com/apache/kudu/blob/master/src/kudu/hms/hms_catalog.cc#L367]

Hms client call

[https://github.com/apache/kudu/blob/master/src/kudu/hms/hms_client.cc#L280]

----------------------------- 

Code path for table creation when Kudu HMS integration is disabled (Impala 
code path):
CreateTable -> CreateMetaStoreTable

[https://github.com/apache/impala/blob/da3d6fc7f7c656b118bb3570cedf7d7c3158bd0b/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L3191]

-> line 3248: tbl.setSd(createSd(params)); 

CreateSd

[https://github.com/apache/impala/blob/da3d6fc7f7c656b118bb3570cedf7d7c3158bd0b/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L3260|https://github.com/apache/impala/blob/b28da054f3595bb92873433211438306fc22fbc7/fe/src/main/java/org/apache/impala/catalog/HiveStorageDescriptorFactory.java#L36]

 

Comparing the code paths shows that the missing fields are filled in with 
default values by CreateSd when the table is created without Kudu HMS 
integration (through Impala).

These fields are left untouched when Kudu HMS integration is enabled and the 
table is created through the Kudu code path. 
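
To make the gap concrete, here is a minimal sketch of the kind of defaulting the Impala path performs and the Kudu path does not. It is an illustration only, not the actual code of PopulateTable or CreateSd: the StorageDescriptor class and the fill_kudu_defaults function are hypothetical, and the only values taken from this report are the three class names listed for the integration-disabled table.

```python
from dataclasses import dataclass

# Class names taken from the table created with HMS integration disabled.
KUDU_INPUT_FORMAT = "org.apache.hadoop.hive.kudu.KuduInputFormat"
KUDU_OUTPUT_FORMAT = "org.apache.hadoop.hive.kudu.KuduOutputFormat"
KUDU_SERDE = "org.apache.hadoop.hive.kudu.KuduSerDe"


@dataclass
class StorageDescriptor:
    """Hypothetical stand-in holding only the three fields discussed here."""
    input_format: str = ""
    output_format: str = ""
    serialization_lib: str = ""


def fill_kudu_defaults(sd: StorageDescriptor) -> StorageDescriptor:
    """Return a copy with empty fields set to the Kudu defaults.

    Fields that already carry a value are left as they are.
    """
    return StorageDescriptor(
        input_format=sd.input_format or KUDU_INPUT_FORMAT,
        output_format=sd.output_format or KUDU_OUTPUT_FORMAT,
        serialization_lib=sd.serialization_lib or KUDU_SERDE,
    )


if __name__ == "__main__":
    # An untouched descriptor, as the Kudu code path currently sends it.
    untouched = StorageDescriptor()
    print(fill_kudu_defaults(untouched))
```

An untouched descriptor corresponds to what HMS receives in the integration-enabled case; after defaulting, it matches the values seen in the integration-disabled case.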



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
