[ https://issues.apache.org/jira/browse/KUDU-3401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17613067#comment-17613067 ]
ASF subversion and git services commented on KUDU-3401:
-------------------------------------------------------

Commit 3eb4607e4e04044f92a2100b44574d730c6e05e6 in kudu's branch refs/heads/master from mammadli.khazar
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=3eb4607e4 ]

KUDU-3401 Fix table creation with HMS Integration

Hive queries on Kudu tables were failing with the following stack trace:

ERROR : Failed
org.apache.hadoop.hive.metastore.api.MetaException: java.lang.ClassNotFoundException Class not found
at org.apache.hadoop.hive.metastore.HiveMetaStoreUtils.getDeserializer(HiveMetaStoreUtils.java:98)
at org.apache.hadoop.hive.metastore.HiveMetaStoreUtils.getDeserializer(HiveMetaStoreUtils.java:77)
at org.apache.hadoop.hive.ql.metadata.Table.getDeserializerFromMetaStore(Table.java:331)

The issue was due to the Kudu HMS client not sending the fields required by
Hive, namely the input/output formats and the serialization library of the
created table, when making a create table request. As a result, these fields
were missing from the HMS backend database, and queries run through Hive on
Kudu tables failed.

This patch adds the missing input/output formats and serialization library to
table creation with Kudu HMS integration. The patch also extends the current
test cases to cover the added fields.

Manually tested on a separate cluster by creating a Kudu table with several
columns via "stored as kudu". Confirmed that the missing data is sent by
checking the parameters of the create_table request in the Hive log files, and
that the data is written to the HMS backend database by checking the SDS table
for INPUT_FORMAT and OUTPUT_FORMAT and the SERDES table for SLIB for the newly
created Kudu table. Ran a few Hive queries on the created Kudu tables and
confirmed that no errors are present.

Change-Id: Ia1b53b55005e2899d8575b0fb7250351d914afb4
Reviewed-on: http://gerrit.cloudera.org:8080/19026
Reviewed-by: Alexey Serbin <ale...@apache.org>
Reviewed-by: Zoltan Chovan <zcho...@cloudera.com>
Tested-by: Attila Bukor <abu...@apache.org>
Reviewed-by: Attila Bukor <abu...@apache.org>


> Unable to query Kudu tables from Hive with Kudu HMS Integration enabled
> -----------------------------------------------------------------------
>
>                 Key: KUDU-3401
>                 URL: https://issues.apache.org/jira/browse/KUDU-3401
>             Project: Kudu
>          Issue Type: Bug
>          Components: hms
>            Reporter: Khazar Mammadli
>            Assignee: Khazar Mammadli
>            Priority: Major
>
> When Kudu HMS integration is enabled, several fields are missing when a
> table is created on Impala via a "stored as kudu" query. This results in a
> ClassNotFoundException when trying to query the table from Hive after
> creating it:
>
> {code:java}
> ERROR : Failed
> org.apache.hadoop.hive.metastore.api.MetaException: java.lang.ClassNotFoundException Class not found
> at org.apache.hadoop.hive.metastore.HiveMetaStoreUtils.getDeserializer(HiveMetaStoreUtils.java:98) ~[hive-exec-3.1.3000.7.1.7.1000-141.jar:3.1.3000.7.1.7.1000-141]
> at org.apache.hadoop.hive.metastore.HiveMetaStoreUtils.getDeserializer(HiveMetaStoreUtils.java:77) ~[hive-exec-3.1.3000.7.1.7.1000-141.jar:3.1.3000.7.1.7.1000-141]
> at org.apache.hadoop.hive.ql.metadata.Table.getDeserializerFromMetaStore(Table.java:331) ~[hive-exec-3.1.3000.7.1.7.1000-141.jar:3.1.3000.7.1.7.1000-141]
> {code}
>
> When the following sample query is run in Impala to create a Kudu table with
> Kudu HMS integration enabled, the table is created with the InputFormat,
> OutputFormat and SerDe Library fields missing:
>
> {code:sql}
> create table default.kudu_test (
>   col1 string comment 'col1',
>   col2 string comment 'col2',
>   primary key (col1)
> )
> comment 'kudu_test'
> stored as kudu;
> {code}
>
> |SerDe Library:| |NULL|
> |InputFormat:| |NULL|
> |OutputFormat:| |NULL|
>
> Hive Metastore log for the table creation:
>
> INFO org.apache.hadoop.hive.metastore.HiveMetaStore: [pool-5-thread-124]:
> 134: source:172.25.35.0 create_table: Table(tableName:kudu_test,
> dbName:default, owner:root, createTime:0, lastAccessTime:0, retention:0,
> sd:StorageDescriptor(cols:[FieldSchema(name:col1, type:string, comment:col1),
> FieldSchema(name:col2, type:string, comment:col2)], location:, inputFormat:,
> outputFormat:, compressed:false, numBuckets:0, serdeInfo:SerDeInfo(name:,
> serializationLib:, parameters:{}), bucketCols:[], sortCols:[],
> parameters:{}), partitionKeys:[],
> parameters:{kudu.table_name=default.kudu_test,
> kudu.table_id=5ac46856863f402fb69941ce7b967945, comment=,
> kudu.master_addresses=c3549-node2.coelab.cloudera.com:7051,
> storage_handler=org.apache.hadoop.hive.kudu.KuduStorageHandler,
> kudu.cluster_id=65c8dfbc8b75485db1328ab42f55fa07}, viewOriginalText:,
> viewExpandedText:, tableType:MANAGED_TABLE, temporary:false, ownerType:USER)
>
> With Kudu HMS integration disabled, on the other hand, the same query in
> Impala creates the table with these fields populated:
>
> |SerDe Library:|org.apache.hadoop.hive.kudu.KuduSerDe|NULL|
> |InputFormat:|org.apache.hadoop.hive.kudu.KuduInputFormat|NULL|
> |OutputFormat:|org.apache.hadoop.hive.kudu.KuduOutputFormat|NULL|
>
> Hive Metastore log for table creation:
>
> INFO org.apache.hadoop.hive.metastore.HiveMetaStore: [pool-5-thread-173]:
> 183: source:172.25.35.0 create_table_req: Table(tableName:kudu_test,
> dbName:default, owner:root, createTime:0, lastAccessTime:0, retention:0,
> sd:StorageDescriptor(cols:[FieldSchema(name:col1, type:string, comment:col1),
> FieldSchema(name:col2, type:string, comment:col2)], location:null,
> inputFormat:org.apache.hadoop.hive.kudu.KuduInputFormat,
> outputFormat:org.apache.hadoop.hive.kudu.KuduOutputFormat, compressed:false,
> numBuckets:0, serdeInfo:SerDeInfo(name:null,
> serializationLib:org.apache.hadoop.hive.kudu.KuduSerDe, parameters:{}),
> bucketCols:[], sortCols:[], parameters:null), partitionKeys:[],
> parameters:{comment=kudu_test_lbodor_no_hms_integration,
> kudu.master_addresses=c3549-node2.coelab.cloudera.com,
> storage_handler=org.apache.hadoop.hive.kudu.KuduStorageHandler,
> kudu.table_name=impala::default.kudu_test}, viewOriginalText:null,
> viewExpandedText:null, tableType:MANAGED_TABLE, catName:hive, ownerType:USER,
> accessType:8)
>
> --------------------------------
>
> Code path for table creation when Kudu HMS integration is enabled (Kudu
> code path):
>
> Quick recap of the steps when creating a Kudu table:
>
> HMSCatalog::CreateTable() -> hive::Table declared and passed to
> PopulateTable(… , &table) -> Thrift client Execute call ->
> HMSClient::CreateTable(Table(one that just got populated),
> envcontext(default)) ->
> hms_client.create_table_with_environment_context(table, envcontext).
>
> CreateTable
> [https://github.com/apache/kudu/blob/master/src/kudu/hms/hms_catalog.cc#L146]
> ->
> Populate the fields of table
> [https://github.com/apache/kudu/blob/master/src/kudu/hms/hms_catalog.cc#L367]
> ->
> HMS client call
> [https://github.com/apache/kudu/blob/master/src/kudu/hms/hms_client.cc#L280]
>
> -----------------------------
>
> Code path for table creation when Kudu HMS integration is disabled (Impala
> code path):
>
> CreateTable -> CreateMetaStoreTable
> [https://github.com/apache/impala/blob/da3d6fc7f7c656b118bb3570cedf7d7c3158bd0b/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L3191]
> -> line 3248: tbl.setSd(createSd(params));
>
> CreateSd
> [https://github.com/apache/impala/blob/da3d6fc7f7c656b118bb3570cedf7d7c3158bd0b/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L3260|https://github.com/apache/impala/blob/b28da054f3595bb92873433211438306fc22fbc7/fe/src/main/java/org/apache/impala/catalog/HiveStorageDescriptorFactory.java#L36]
>
> Comparing the two code paths shows that, for a table created without Kudu
> HMS integration (through Impala), the missing fields are filled with default
> values by createSd. When Kudu HMS integration is enabled, the table is
> created through the Kudu code path, which leaves these fields untouched.
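
The difference between the two code paths described in the issue can be illustrated with a small, self-contained sketch. This is not the actual patch (the Kudu change lives in the C++ HMS catalog code); the `Table`, `StorageDescriptor` and `SerDeInfo` classes and both helper functions below are hypothetical stand-ins for the Thrift structs, written only to show the idea. The three class names are the values that appear in the HMS log of the table created without Kudu HMS integration.

```python
from dataclasses import dataclass, field
from typing import List

# Values copied from the HMS log of the table created through Impala
# (Kudu HMS integration disabled) in the issue description.
KUDU_INPUT_FORMAT = "org.apache.hadoop.hive.kudu.KuduInputFormat"
KUDU_OUTPUT_FORMAT = "org.apache.hadoop.hive.kudu.KuduOutputFormat"
KUDU_SERDE_LIB = "org.apache.hadoop.hive.kudu.KuduSerDe"


@dataclass
class SerDeInfo:
    """Hypothetical stand-in for the Thrift SerDeInfo struct."""
    serialization_lib: str = ""


@dataclass
class StorageDescriptor:
    """Hypothetical stand-in for the Thrift StorageDescriptor struct."""
    input_format: str = ""
    output_format: str = ""
    serde_info: SerDeInfo = field(default_factory=SerDeInfo)


@dataclass
class Table:
    """Hypothetical stand-in for the Thrift Table struct."""
    db_name: str
    table_name: str
    sd: StorageDescriptor = field(default_factory=StorageDescriptor)


def missing_hive_fields(table: Table) -> List[str]:
    """Return the storage fields Hive needs that are still empty."""
    checks = {
        "inputFormat": table.sd.input_format,
        "outputFormat": table.sd.output_format,
        "serializationLib": table.sd.serde_info.serialization_lib,
    }
    return [name for name, value in checks.items() if not value]


def populate_kudu_storage_fields(table: Table) -> None:
    """Fill in the three fields before the create-table request is sent."""
    table.sd.input_format = KUDU_INPUT_FORMAT
    table.sd.output_format = KUDU_OUTPUT_FORMAT
    table.sd.serde_info.serialization_lib = KUDU_SERDE_LIB


# A freshly declared table mirrors the empty fields seen in the first HMS log.
table = Table(db_name="default", table_name="kudu_test")
before = missing_hive_fields(table)
populate_kudu_storage_fields(table)
after = missing_hive_fields(table)
print("missing before:", before)
print("missing after:", after)
```

A check like `missing_hive_fields` corresponds to the manual verification in the commit message, where the SDS and SERDES tables were inspected for the newly created table.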