[ https://issues.apache.org/jira/browse/HIVE-26035?focusedWorklogId=841829&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-841829 ]
ASF GitHub Bot logged work on HIVE-26035:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 26/Jan/23 18:34
Start Date: 26/Jan/23 18:34
Worklog Time Spent: 10m
Work Description: saihemanth-cloudera commented on code in PR #3905:
URL: https://github.com/apache/hive/pull/3905#discussion_r1087110063
##########
standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/conf/MetastoreConf.java:
##########
@@ -692,6 +692,8 @@ public enum ConfVars {
"Default transaction isolation level for identity generation."),
DATANUCLEUS_USE_LEGACY_VALUE_STRATEGY("datanucleus.rdbms.useLegacyNativeValueStrategy",
"datanucleus.rdbms.useLegacyNativeValueStrategy", true, ""),
+ DATANUCLEUS_QUERY_SQL_ALLOWALL("datanucleus.query.sql.allowAll",
"datanucleus.query.sql.allowAll",
+ true, "Allow insert, update and delete operations from JDO SQL"),
Review Comment:
Can this description be more detailed? For example, it could mention the
performance impact of this config.
##########
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/DirectSqlInsertPart.java:
##########
@@ -0,0 +1,835 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.metastore;
+
+import static org.apache.commons.lang3.StringUtils.repeat;
+import static org.apache.hadoop.hive.metastore.Batchable.NO_BATCHING;
+
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Map;
+
+import javax.jdo.PersistenceManager;
+
+import org.apache.hadoop.hive.metastore.api.MetaException;
+import org.apache.hadoop.hive.metastore.model.MColumnDescriptor;
+import org.apache.hadoop.hive.metastore.model.MFieldSchema;
+import org.apache.hadoop.hive.metastore.model.MOrder;
+import org.apache.hadoop.hive.metastore.model.MPartition;
+import org.apache.hadoop.hive.metastore.model.MPartitionColumnPrivilege;
+import org.apache.hadoop.hive.metastore.model.MPartitionPrivilege;
+import org.apache.hadoop.hive.metastore.model.MSerDeInfo;
+import org.apache.hadoop.hive.metastore.model.MStorageDescriptor;
+import org.apache.hadoop.hive.metastore.model.MStringList;
+import org.datanucleus.ExecutionContext;
+import org.datanucleus.api.jdo.JDOPersistenceManager;
+import org.datanucleus.identity.DatastoreId;
+import org.datanucleus.metadata.AbstractClassMetaData;
+import org.datanucleus.metadata.IdentityType;
+
+/**
+ * This class contains the methods to insert into tables on the underlying database using direct SQL
+ */
+class DirectSqlInsertPart {
+ private final PersistenceManager pm;
+ private final DatabaseProduct dbType;
+ private final int batchSize;
+
+ public DirectSqlInsertPart(PersistenceManager pm, DatabaseProduct dbType, int batchSize) {
+ this.pm = pm;
+ this.dbType = dbType;
+ this.batchSize = batchSize;
+ }
+
+ /**
+ * Interface to execute multiple row insert query in batch for direct SQL
+ */
+ interface BatchExecutionContext {
+ void execute(String batchQueryText, int batchRowCount, int batchParamCount) throws MetaException;
+ }
+
+ private Long getDataStoreId(Class<?> modelClass) throws MetaException {
Review Comment:
Can you add Javadoc for all the newly added methods?
##########
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/DirectSqlInsertPart.java:
##########
@@ -0,0 +1,835 @@
+ private Long getDataStoreId(Class<?> modelClass) throws MetaException {
+ ExecutionContext ec = ((JDOPersistenceManager) pm).getExecutionContext();
+ AbstractClassMetaData cmd = ec.getMetaDataManager().getMetaDataForClass(modelClass, ec.getClassLoaderResolver());
+ if (cmd.getIdentityType() == IdentityType.DATASTORE) {
+ return (Long) ec.getStoreManager().getValueGenerationStrategyValue(ec, cmd, -1);
+ } else {
+ throw new MetaException("Identity type is not datastore.");
+ }
+ }
+
+ private void insertInBatch(String tableName, String columns, int columnCount, int rowCount,
+ BatchExecutionContext bec) throws MetaException {
+ if (rowCount == 0 || columnCount == 0) {
+ return;
+ }
+ int maxRowsInBatch = (batchSize == NO_BATCHING) ? rowCount : batchSize;
+ int maxBatches = rowCount / maxRowsInBatch;
+ int last = rowCount % maxRowsInBatch;
+ String rowFormat = "(" + repeat(",?", columnCount).substring(1) + ")";
+ String query = "";
+ if (maxBatches > 0) {
+ query = dbType.getBatchInsertQuery(tableName, columns, rowFormat, maxRowsInBatch);
+ }
+ int batchParamCount = maxRowsInBatch * columnCount;
+ for (int batch = 0; batch < maxBatches; batch++) {
+ bec.execute(query, maxRowsInBatch, batchParamCount);
Review Comment:
We can pass 'maxRowsInBatch * columnCount' directly to the execute method,
since the batchParamCount variable has no other usages.
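For reference, the batching arithmetic in `insertInBatch` can be sketched in isolation as follows. This is a minimal, self-contained illustration rather than the PR's code: `BatchPlanSketch`, `batchSizes`, and `multiRowInsert` are hypothetical names, the value of `NO_BATCHING` is assumed, and the generic `INSERT ... VALUES (...),(...)` text only stands in for whatever `dbType.getBatchInsertQuery` produces for a given database. Unlike the diff, the sketch treats any non-positive batch size as "no batching" so that it never divides by zero. The parameter count is computed inline at the call site.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class BatchPlanSketch {
  // Assumed sentinel meaning "send all rows in one statement".
  static final int NO_BATCHING = -1;

  /** Returns the row count of each statement: full batches first, then the remainder. */
  static List<Integer> batchSizes(int rowCount, int batchSize) {
    List<Integer> sizes = new ArrayList<>();
    if (rowCount <= 0) {
      return sizes;
    }
    // Sketch-only choice: any non-positive batch size means "no batching".
    int maxRowsInBatch = (batchSize <= 0) ? rowCount : batchSize;
    int fullBatches = rowCount / maxRowsInBatch;
    for (int i = 0; i < fullBatches; i++) {
      sizes.add(maxRowsInBatch);
    }
    int last = rowCount % maxRowsInBatch;
    if (last != 0) {
      sizes.add(last);
    }
    return sizes;
  }

  /** Builds a generic multi-row INSERT with one "(?,?,...)" group per row. */
  static String multiRowInsert(String table, String columns, int columnCount, int rows) {
    String rowFormat = "(" + String.join(",", Collections.nCopies(columnCount, "?")) + ")";
    return "INSERT INTO " + table + " " + columns + " VALUES "
        + String.join(",", Collections.nCopies(rows, rowFormat));
  }

  public static void main(String[] args) {
    int columnCount = 2;
    for (int rows : batchSizes(7, 3)) {
      // The parameter count is computed inline instead of via a separate variable.
      System.out.println(rows * columnCount + " params: "
          + multiRowInsert("T", "(\"A\",\"B\")", columnCount, rows));
    }
  }
}
```

With 7 rows and a batch size of 3, the plan is two full statements of 3 rows followed by one statement for the remaining row, which mirrors the `maxBatches` / `last` split in the diff.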
##########
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/DirectSqlInsertPart.java:
##########
@@ -0,0 +1,835 @@
+ private void insertInBatch(String tableName, String columns, int columnCount, int rowCount,
+ BatchExecutionContext bec) throws MetaException {
Review Comment:
Can the 'bec' variable be renamed to something more meaningful?
##########
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java:
##########
@@ -2606,39 +2606,69 @@ public boolean addPartitions(String catName, String dbName, String tblName, List
tabGrants = this.listAllTableGrants(catName, dbName, tblName);
tabColumnGrants = this.listTableAllColumnGrants(catName, dbName, tblName);
}
- List<Object> toPersist = new ArrayList<>();
+ List<MPartition> mParts = new ArrayList<>();
+ List<List<MPartitionPrivilege>> mPartPrivilegesList = new ArrayList<>();
+ List<List<MPartitionColumnPrivilege>> mPartColPrivilegesList = new ArrayList<>();
for (Partition part : parts) {
if (!part.getTableName().equals(tblName) || !part.getDbName().equals(dbName)) {
throw new MetaException("Partition does not belong to target table "
+ dbName + "." + tblName + ": " + part);
}
MPartition mpart = convertToMPart(part, table, true);
-
- toPersist.add(mpart);
+ mParts.add(mpart);
int now = (int) (System.currentTimeMillis() / 1000);
+ List<MPartitionPrivilege> mPartPrivileges = new ArrayList<>();
if (tabGrants != null) {
for (MTablePrivilege tab: tabGrants) {
- toPersist.add(new MPartitionPrivilege(tab.getPrincipalName(),
- tab.getPrincipalType(), mpart, tab.getPrivilege(), now,
- tab.getGrantor(), tab.getGrantorType(), tab.getGrantOption(),
- tab.getAuthorizer()));
+ MPartitionPrivilege mPartPrivilege = new MPartitionPrivilege(tab.getPrincipalName(), tab.getPrincipalType(),
+ mpart, tab.getPrivilege(), now, tab.getGrantor(), tab.getGrantorType(), tab.getGrantOption(),
+ tab.getAuthorizer());
+ mPartPrivileges.add(mPartPrivilege);
}
}
+ List<MPartitionColumnPrivilege> mPartColumnPrivileges = new ArrayList<>();
if (tabColumnGrants != null) {
for (MTableColumnPrivilege col : tabColumnGrants) {
- toPersist.add(new MPartitionColumnPrivilege(col.getPrincipalName(),
- col.getPrincipalType(), mpart, col.getColumnName(), col.getPrivilege(),
- now, col.getGrantor(), col.getGrantorType(), col.getGrantOption(),
- col.getAuthorizer()));
+ MPartitionColumnPrivilege mPartColumnPrivilege = new MPartitionColumnPrivilege(col.getPrincipalName(),
+ col.getPrincipalType(), mpart, col.getColumnName(), col.getPrivilege(), now, col.getGrantor(),
+ col.getGrantorType(), col.getGrantOption(), col.getAuthorizer());
+ mPartColumnPrivileges.add(mPartColumnPrivilege);
}
}
+ mPartPrivilegesList.add(mPartPrivileges);
+ mPartColPrivilegesList.add(mPartColumnPrivileges);
}
- if (CollectionUtils.isNotEmpty(toPersist)) {
- pm.makePersistentAll(toPersist);
- pm.flush();
- }
+ if (CollectionUtils.isNotEmpty(mParts)) {
+ GetHelper<Void> helper = new GetHelper<Void>(null, null, null, true,
+ true) {
+ @Override
+ protected Void getSqlResult(GetHelper<Void> ctx) throws MetaException {
+ directSql.addPartitions(mParts, mPartPrivilegesList, mPartColPrivilegesList);
+ return null;
+ }
+
+ @Override
+ protected Void getJdoResult(GetHelper<Void> ctx) {
+ List<Object> toPersist = new ArrayList<>(mParts);
+ mPartPrivilegesList.forEach(toPersist::addAll);
+ mPartColPrivilegesList.forEach(toPersist::addAll);
+ pm.makePersistentAll(toPersist);
+ pm.flush();
+ return null;
Review Comment:
Do you mean 'return;'?
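For context on this question: when a method's declared return type is `java.lang.Void` (as with the `GetHelper<Void>` overrides here), Java does not accept a bare `return;`, and `null` is the only value that can be returned. A minimal sketch, in which `Helper` is a hypothetical stand-in for the PR's `GetHelper`:

```java
class VoidReturnSketch {
  /** Hypothetical stand-in for a generic helper with a typed result. */
  abstract static class Helper<T> {
    abstract T getJdoResult();
  }

  /** Runs a helper that is used only for its side effect and therefore yields Void. */
  static Void runSideEffect(StringBuilder log) {
    Helper<Void> helper = new Helper<Void>() {
      @Override
      Void getJdoResult() {
        log.append("persisted");
        // A bare "return;" would not compile here: the return type is Void, not void.
        return null;
      }
    };
    return helper.getJdoResult();
  }

  public static void main(String[] args) {
    StringBuilder log = new StringBuilder();
    System.out.println(runSideEffect(log) + " / " + log);
  }
}
```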
##########
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/DirectSqlInsertPart.java:
##########
@@ -0,0 +1,835 @@
+ private void insertInBatch(String tableName, String columns, int columnCount, int rowCount,
+ BatchExecutionContext bec) throws MetaException {
+ if (rowCount == 0 || columnCount == 0) {
+ return;
+ }
+ int maxRowsInBatch = (batchSize == NO_BATCHING) ? rowCount : batchSize;
+ int maxBatches = rowCount / maxRowsInBatch;
+ int last = rowCount % maxRowsInBatch;
+ String rowFormat = "(" + repeat(",?", columnCount).substring(1) + ")";
+ String query = "";
+ if (maxBatches > 0) {
+ query = dbType.getBatchInsertQuery(tableName, columns, rowFormat, maxRowsInBatch);
+ }
+ int batchParamCount = maxRowsInBatch * columnCount;
+ for (int batch = 0; batch < maxBatches; batch++) {
+ bec.execute(query, maxRowsInBatch, batchParamCount);
+ }
+ if (last != 0) {
+ query = dbType.getBatchInsertQuery(tableName, columns, rowFormat, last);
+ bec.execute(query, last, last * columnCount);
+ }
+ }
+
+ private void insertSerdeInBatch(Map<Long, MSerDeInfo> serdeIdToSerDeInfo) throws MetaException {
+ int rowCount = serdeIdToSerDeInfo.size();
+ String columns = "(\"SERDE_ID\",\"DESCRIPTION\",\"DESERIALIZER_CLASS\",\"NAME\",\"SERDE_TYPE\",\"SLIB\","
+ + "\"SERIALIZER_CLASS\")";
+ int columnCount = 7;
+ BatchExecutionContext bec = new BatchExecutionContext() {
+ final Iterator<Map.Entry<Long, MSerDeInfo>> it = serdeIdToSerDeInfo.entrySet().iterator();
+ @Override
+ public void execute(String batchQueryText, int batchRowCount, int batchParamCount) throws MetaException {
+ Object[] params = new Object[batchParamCount];
+ int paramIndex = 0;
+ for (int index = 0; index < batchRowCount; index++) {
+ Map.Entry<Long, MSerDeInfo> entry = it.next();
+ MSerDeInfo serdeInfo = entry.getValue();
+ params[paramIndex++] = entry.getKey();
+ params[paramIndex++] = serdeInfo.getDescription();
+ params[paramIndex++] = serdeInfo.getDeserializerClass();
+ params[paramIndex++] = serdeInfo.getName();
+ params[paramIndex++] = serdeInfo.getSerdeType();
+ params[paramIndex++] = serdeInfo.getSerializationLib();
+ params[paramIndex++] = serdeInfo.getSerializerClass();
+ }
+ try (QueryWrapper query = new QueryWrapper(pm.newQuery("javax.jdo.query.SQL", batchQueryText))) {
+ MetastoreDirectSqlUtils.executeWithArray(query.getInnerQuery(), params, batchQueryText);
+ }
Review Comment:
This piece of code is repeated in most of the methods in this class. Can we
move it into a new method and call that method from each of them?
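One possible shape for that refactoring is sketched below. It is only an illustration under stated assumptions: `SqlExecutor`, `RowBinder`, and `executeBatch` are hypothetical names, and the `SqlExecutor` callback stands in for the `QueryWrapper` / `MetastoreDirectSqlUtils.executeWithArray` call shown in the diff. Each insert method would then supply only its per-row binding logic.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class BatchHelperSketch {
  /** Hypothetical stand-in for running one parameterized statement. */
  interface SqlExecutor {
    void execute(String queryText, Object[] params);
  }

  /** Hypothetical callback: writes one row's values into params at offset, returns the new offset. */
  interface RowBinder<T> {
    int bind(T row, Object[] params, int offset);
  }

  /** Shared helper: binds the next batchRowCount rows, then runs the statement once. */
  static <T> void executeBatch(SqlExecutor executor, String queryText, Iterator<T> rows,
      int batchRowCount, int batchParamCount, RowBinder<T> binder) {
    Object[] params = new Object[batchParamCount];
    int paramIndex = 0;
    for (int i = 0; i < batchRowCount; i++) {
      paramIndex = binder.bind(rows.next(), params, paramIndex);
    }
    executor.execute(queryText, params);
  }

  public static void main(String[] args) {
    List<String[]> rows = Arrays.asList(new String[] {"1", "a"}, new String[] {"2", "b"});
    executeBatch(
        // The executor here only prints; real code would run the query.
        (sql, params) -> System.out.println(sql + " " + Arrays.toString(params)),
        "INSERT INTO T (\"A\",\"B\") VALUES (?,?),(?,?)", rows.iterator(), 2, 4,
        (row, params, offset) -> {
          params[offset] = row[0];
          params[offset + 1] = row[1];
          return offset + 2;
        });
  }
}
```

Keeping the row iterator outside the helper preserves the behaviour in the diff, where successive batches continue consuming the same iterator.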
Issue Time Tracking
-------------------
Worklog Id: (was: 841829)
Time Spent: 4h (was: 3h 50m)
> Explore moving to directsql for ObjectStore::addPartitions
> ----------------------------------------------------------
>
> Key: HIVE-26035
> URL: https://issues.apache.org/jira/browse/HIVE-26035
> Project: Hive
> Issue Type: Bug
> Reporter: Rajesh Balamohan
> Assignee: Venugopal Reddy K
> Priority: Major
> Labels: pull-request-available
> Time Spent: 4h
> Remaining Estimate: 0h
>
> Currently {{addPartitions}} uses DataNucleus and is very slow for a large
> number of partitions. It would be good to move to direct SQL. Many repeated
> SQL statements can be avoided as well (e.g. SDS, SERDE, TABLE_PARAMS).
--
This message was sent by Atlassian Jira
(v8.20.10#820010)