[ 
https://issues.apache.org/jira/browse/KUDU-3401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17613067#comment-17613067
 ] 

ASF subversion and git services commented on KUDU-3401:
-------------------------------------------------------

Commit 3eb4607e4e04044f92a2100b44574d730c6e05e6 in kudu's branch 
refs/heads/master from mammadli.khazar
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=3eb4607e4 ]

KUDU-3401 Fix table creation with HMS Integration

Hive queries on Kudu Tables were failing with the following stack trace:

ERROR : Failed
org.apache.hadoop.hive.metastore.api.MetaException: 
java.lang.ClassNotFoundException Class not found
at 
org.apache.hadoop.hive.metastore.HiveMetaStoreUtils.getDeserializer(HiveMetaStoreUtils.java:98)
at 
org.apache.hadoop.hive.metastore.HiveMetaStoreUtils.getDeserializer(HiveMetaStoreUtils.java:77)
at 
org.apache.hadoop.hive.ql.metadata.Table.getDeserializerFromMetaStore(Table.java:331)

The issue was due to the Kudu HMS client not sending the fields required
by Hive, namely the input/output formats and the serialization library for
the created table, when making a create table request. Thus, running queries
through Hive on Kudu tables would fail because these fields were missing in
the HMS backend database.
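
One way to spot the condition described above is to inspect the
create_table line that the Hive Metastore logs. The following is a minimal
sketch in Python, not part of the patch: the helper name
missing_storage_fields and the regular expression are assumptions made for
illustration, and the field names are the ones printed in the HMS log
quoted in the issue.

```python
import re

# Fields Hive needs in order to load the table's deserializer.
REQUIRED_FIELDS = ("inputFormat", "outputFormat", "serializationLib")

# Matches "name:value" pairs as printed in the HMS create_table log line.
# An empty value (e.g. "inputFormat:,") yields an empty string.
_FIELD_RE = re.compile(
    r"\b(inputFormat|outputFormat|serializationLib):([^,)\s]*)")


def missing_storage_fields(log_line):
    """Return the required fields that are absent, empty or 'null'.

    Hypothetical helper: it only looks at the text of one log line.
    """
    found = dict(_FIELD_RE.findall(log_line))
    return [name for name in REQUIRED_FIELDS
            if found.get(name, "") in ("", "null")]


# Abbreviated excerpt of the log line from the issue (HMS integration on).
sample = ("sd:StorageDescriptor(location:, inputFormat:, outputFormat:, "
          "compressed:false, numBuckets:0, serdeInfo:SerDeInfo(name:, "
          "serializationLib:, parameters:{}))")
print(missing_storage_fields(sample))
```

For the excerpt above, all three fields are reported as missing, which
matches the NULL values shown for the table in the issue.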

This patch adds the missing Input/Output formats and Serialization
library to table creation with Kudu HMS Integration. The patch also extends
the current test cases to cover the added fields. Manually tested on a
separate cluster by creating a Kudu table with several columns via
"stored as kudu", confirmed the missing data is sent by checking the
parameters of the create_table request in Hive log files, and checked
that the data is written to the HMS Backend Database by going through the
SDS table for INPUT_FORMAT and OUTPUT_FORMAT and the SERDES table for SLIB
to see if the data was filled for the newly created Kudu table.
Ran a few Hive queries on the created Kudu tables and confirmed that no
errors are present.
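
The idea behind the change can be sketched as follows. This is an
illustration in Python under stated assumptions, not the actual C++ code
in hms_catalog.cc: the dataclasses and the function
populate_kudu_storage_fields are hypothetical stand-ins for the Thrift
types, and only the three class names are taken from the HMS log quoted
in the issue.

```python
from dataclasses import dataclass, field

# Class names as they appear in the HMS log for a table created via Impala.
KUDU_INPUT_FORMAT = "org.apache.hadoop.hive.kudu.KuduInputFormat"
KUDU_OUTPUT_FORMAT = "org.apache.hadoop.hive.kudu.KuduOutputFormat"
KUDU_SERDE_LIB = "org.apache.hadoop.hive.kudu.KuduSerDe"


@dataclass
class SerDeInfo:
    # Hypothetical stand-in for the Thrift SerDeInfo struct.
    serialization_lib: str = ""


@dataclass
class StorageDescriptor:
    # Hypothetical stand-in for the Thrift StorageDescriptor struct.
    input_format: str = ""
    output_format: str = ""
    serde_info: SerDeInfo = field(default_factory=SerDeInfo)


def populate_kudu_storage_fields(sd):
    """Fill the three fields Hive needs before the create table request."""
    sd.input_format = KUDU_INPUT_FORMAT
    sd.output_format = KUDU_OUTPUT_FORMAT
    sd.serde_info.serialization_lib = KUDU_SERDE_LIB
    return sd


sd = populate_kudu_storage_fields(StorageDescriptor())
print(sd.input_format, sd.output_format, sd.serde_info.serialization_lib)
```

With these values set, the rows written to the SDS and SERDES tables
would carry the same class names that Impala writes when HMS integration
is disabled.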

Change-Id: Ia1b53b55005e2899d8575b0fb7250351d914afb4
Reviewed-on: http://gerrit.cloudera.org:8080/19026
Reviewed-by: Alexey Serbin <ale...@apache.org>
Reviewed-by: Zoltan Chovan <zcho...@cloudera.com>
Tested-by: Attila Bukor <abu...@apache.org>
Reviewed-by: Attila Bukor <abu...@apache.org>


> Unable to query Kudu tables from Hive with Kudu HMS Integration enabled
> -----------------------------------------------------------------------
>
>                 Key: KUDU-3401
>                 URL: https://issues.apache.org/jira/browse/KUDU-3401
>             Project: Kudu
>          Issue Type: Bug
>          Components: hms
>            Reporter: Khazar Mammadli
>            Assignee: Khazar Mammadli
>            Priority: Major
>
> When Kudu HMS integration is enabled, several fields are missing from a 
> table created in Impala with a "stored as kudu" query. This results in a 
> ClassNotFoundException when trying to query the table from Hive after 
> creating it:
>  
> {code:java}
> ERROR : Failed
> org.apache.hadoop.hive.metastore.api.MetaException: 
> java.lang.ClassNotFoundException Class not found
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreUtils.getDeserializer(HiveMetaStoreUtils.java:98)
>  ~[hive-exec-3.1.3000.7.1.7.1000-141.jar:3.1.3000.7.1.7.1000-141]
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreUtils.getDeserializer(HiveMetaStoreUtils.java:77)
>  ~[hive-exec-3.1.3000.7.1.7.1000-141.jar:3.1.3000.7.1.7.1000-141]
> at 
> org.apache.hadoop.hive.ql.metadata.Table.getDeserializerFromMetaStore(Table.java:331)
>  ~[hive-exec-3.1.3000.7.1.7.1000-141.jar:3.1.3000.7.1.7.1000-141] {code}
>  
> When running the following sample query in Impala to create a Kudu table 
> with Kudu HMS integration enabled, the table gets created with the 
> InputFormat, OutputFormat and SerDe Library fields missing:
>  
> {code:java}
> create table default.kudu_test (
> col1 string comment 'col1',
> col2 string comment 'col2',
> primary key (col1)
> )
> comment 'kudu_test'
> stored as kudu;{code}
>  
> |SerDe Library:| |NULL|
> |InputFormat:| |NULL|
> |OutputFormat:| |NULL|
> Hive Metastore log for the table creation:
> INFO  org.apache.hadoop.hive.metastore.HiveMetaStore: [pool-5-thread-124]: 
> 134: source:172.25.35.0 create_table: Table(tableName:kudu_test, 
> dbName:default, owner:root, createTime:0, lastAccessTime:0, retention:0, 
> sd:StorageDescriptor(cols:[FieldSchema(name:col1, type:string, comment:col1), 
> FieldSchema(name:col2, type:string, comment:col2)], location:, inputFormat:, 
> outputFormat:, compressed:false, numBuckets:0, serdeInfo:SerDeInfo(name:, 
> serializationLib:, parameters:{}), bucketCols:[], sortCols:[], 
> parameters:{}), partitionKeys:[], 
> parameters:{kudu.table_name=default.kudu_test, 
> kudu.table_id=5ac46856863f402fb69941ce7b967945, comment=, 
> kudu.master_addresses=c3549-node2.coelab.cloudera.com:7051, 
> storage_handler=org.apache.hadoop.hive.kudu.KuduStorageHandler, 
> kudu.cluster_id=65c8dfbc8b75485db1328ab42f55fa07}, viewOriginalText:, 
> viewExpandedText:, tableType:MANAGED_TABLE, temporary:false, ownerType:USER)
> Running the same query in Impala with Kudu HMS Integration disabled on the 
> other hand has these fields populated when the table is created:
> |SerDe Library:|org.apache.hadoop.hive.kudu.KuduSerDe|NULL|
> |InputFormat:|org.apache.hadoop.hive.kudu.KuduInputFormat|NULL|
> |OutputFormat:|org.apache.hadoop.hive.kudu.KuduOutputFormat|NULL|
> Hive Metastore log for table creation:
> INFO  org.apache.hadoop.hive.metastore.HiveMetaStore: [pool-5-thread-173]: 
> 183: source:172.25.35.0 create_table_req: Table(tableName:kudu_test, 
> dbName:default, owner:root, createTime:0, lastAccessTime:0, retention:0, 
> sd:StorageDescriptor(cols:[FieldSchema(name:col1, type:string, comment:col1), 
> FieldSchema(name:col2, type:string, comment:col2)], location:null, 
> inputFormat:org.apache.hadoop.hive.kudu.KuduInputFormat, 
> outputFormat:org.apache.hadoop.hive.kudu.KuduOutputFormat, compressed:false, 
> numBuckets:0, serdeInfo:SerDeInfo(name:null, 
> serializationLib:org.apache.hadoop.hive.kudu.KuduSerDe, parameters:{}), 
> bucketCols:[], sortCols:[], parameters:null), partitionKeys:[], 
> parameters:{comment=kudu_test_lbodor_no_hms_integration, 
> kudu.master_addresses=c3549-node2.coelab.cloudera.com, 
> storage_handler=org.apache.hadoop.hive.kudu.KuduStorageHandler, 
> kudu.table_name=impala::default.kudu_test}, viewOriginalText:null, 
> viewExpandedText:null, tableType:MANAGED_TABLE, catName:hive, ownerType:USER, 
> accessType:8)
> --------------------------------
> Code path for table creation when Kudu HMS integration is enabled (Kudu code path):
> Quick recap of the steps when creating a Kudu table:
> HMSCatalog::CreateTable() -> hive::Table declared and passed to 
> PopulateTable(… , &table) -> Thrift client Execute call -> 
> HMSClient::CreateTable(Table(one that just got populated), 
> envcontext(default)) -> 
> hms_client.create_table_with_environment_context(table, envcontext). 
> CreateTable
> [https://github.com/apache/kudu/blob/master/src/kudu/hms/hms_catalog.cc#L146] 
> ->
> Populate the fields of table
> [https://github.com/apache/kudu/blob/master/src/kudu/hms/hms_catalog.cc#L367]
> Hms client call
> [https://github.com/apache/kudu/blob/master/src/kudu/hms/hms_client.cc#L280]
> ----------------------------- 
> Code path for table creation when Kudu HMS integration is disabled (Impala 
> code path):
> CreateTable -> CreateMetaStoreTable
> [https://github.com/apache/impala/blob/da3d6fc7f7c656b118bb3570cedf7d7c3158bd0b/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L3191]
> ->line 3248 tbl.setSd(createSd(params)); 
> CreateSd
> [https://github.com/apache/impala/blob/da3d6fc7f7c656b118bb3570cedf7d7c3158bd0b/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L3260|https://github.com/apache/impala/blob/b28da054f3595bb92873433211438306fc22fbc7/fe/src/main/java/org/apache/impala/catalog/HiveStorageDescriptorFactory.java#L36]
>  
> Comparing the code paths, it is observable that the missing fields are 
> filled via CreateSd with default values when the table is created without 
> Kudu HMS integration (through Impala).
> These fields are left untouched when Kudu HMS integration is enabled and 
> the table is created through the Kudu code path. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
