[ 
https://issues.apache.org/jira/browse/HIVE-26035?focusedWorklogId=839295&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-839295
 ]

ASF GitHub Bot logged work on HIVE-26035:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 16/Jan/23 06:08
            Start Date: 16/Jan/23 06:08
    Worklog Time Spent: 10m 
      Work Description: VenuReddy2103 commented on code in PR #3905:
URL: https://github.com/apache/hive/pull/3905#discussion_r1070867838


##########
standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/conf/MetastoreConf.java:
##########
@@ -753,6 +755,12 @@ public enum ConfVars {
             "SQL. For some DBs like Oracle and MSSQL, there are hardcoded or perf-based limitations\n" +
             "that necessitate this. For DBs that can handle the queries, this isn't necessary and\n" +
             "may impede performance. -1 means no batching, 0 means automatic batching."),
+    DIRECT_SQL_MAX_PARAMS_IN_INSERT("metastore.direct.sql.max.parameters.in.insert",

Review Comment:
   I added the new parameter `metastore.direct.sql.max.parameters.in.insert`
because the performance of a multi-row insert query also depends on the number
of columns in the table, and each table may have a different column count. With
this parameter, we can calculate the maximum number of rows to insert with a
single insert query for a given table, as shown below. The number of rows in a
multi-row insert query therefore varies per table, depending on the number of
columns in that table.

      `int maxRowsInBatch = maxParamsCount / columnCount; // max rows in the query`
      `int maxBatches = rowCount / maxRowsInBatch;`
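
   To make the arithmetic above concrete, here is a minimal, self-contained
sketch. The class name `InsertBatchSketch` and its helper methods are
hypothetical and are not taken from the PR. The sketch assumes that at least
one row is always sent per statement, and it rounds the batch count up so that
a partial final batch is counted, whereas the integer division in the comment
counts only the full batches.

```java
// Hypothetical illustration only; not the code from PR #3905.
final class InsertBatchSketch {

    // Maximum rows per multi-row INSERT, given a cap on bind parameters.
    // Assumption: each row binds exactly one parameter per column.
    static int maxRowsInBatch(int maxParamsCount, int columnCount) {
        if (columnCount <= 0) {
            throw new IllegalArgumentException("columnCount must be positive");
        }
        // Assumption: always allow at least one row per statement.
        return Math.max(1, maxParamsCount / columnCount);
    }

    // Number of INSERT statements needed, counting a partial final batch.
    static int batchCount(int rowCount, int maxRowsInBatch) {
        if (rowCount < 0 || maxRowsInBatch <= 0) {
            throw new IllegalArgumentException("invalid rowCount or maxRowsInBatch");
        }
        int fullBatches = rowCount / maxRowsInBatch;
        int remainder = rowCount % maxRowsInBatch;
        return remainder == 0 ? fullBatches : fullBatches + 1;
    }

    public static void main(String[] args) {
        int maxParamsCount = 1000; // illustrative value, not a Hive default
        int columnCount = 12;
        int rowCount = 250;

        int rows = maxRowsInBatch(maxParamsCount, columnCount);
        int batches = batchCount(rowCount, rows);
        System.out.println("rows per insert = " + rows + ", inserts = " + batches);
    }
}
```

   With these illustrative inputs, a 12-column table gets 83 rows per insert
(996 parameters), and 250 rows need three full inserts plus one insert for the
remaining row.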





Issue Time Tracking
-------------------

    Worklog Id:     (was: 839295)
    Time Spent: 2h  (was: 1h 50m)

> Explore moving to directsql for ObjectStore::addPartitions
> ----------------------------------------------------------
>
>                 Key: HIVE-26035
>                 URL: https://issues.apache.org/jira/browse/HIVE-26035
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Rajesh Balamohan
>            Assignee: Venugopal Reddy K
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 2h
>  Remaining Estimate: 0h
>
> Currently {{addPartitions}} uses DataNucleus and is very slow for a large
> number of partitions. It would be good to move to direct SQL. Many repeated
> SQL statements can be avoided as well (e.g. SDS, SERDE, TABLE_PARAMS).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
