[
https://issues.apache.org/jira/browse/HIVE-26035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17683928#comment-17683928
]
Venugopal Reddy K commented on HIVE-26035:
--
*Without direct SQL (5 concurrent threads, each creating 100 partitions):*
{noformat}
kvenureddy@192 hclient % java -jar
./metastore-benchmarks/target/hmsbench-jar-with-dependencies.jar -H localhost
--savedata /tmp/benchdata --sanitize -N 100 -o
0302bench_results_http_modified-1.csv -C -d testbench_http --params=100 -E
'drop.*' -E 'renameTable.*' -E 'getTableObjectsByName.*' -E 'listTables.*' -E
'listPartitions.*' -E 'getPartitions.*' -E 'getPartitionsByNames.*' -E
'getPartitionNames.*' -E 'listPartition' -E 'getPartition' -E 'getNid' -E
'listDatabases' -E 'getTable' -E 'createTable' -T 5
Operation Mean Med Min Max Err%
addPartition 62.47 55.34 32.66 172.1 37.15
addPartitions.100 191.8 182.3 167.0 292.5 12.68
concurrentPartitionAdd#5.100 1476 1464 1351 2162 6.815
{noformat}
*With direct SQL (5 concurrent threads, each creating 100 partitions):*
{noformat}
kvenureddy@192 hclient % java -jar
./metastore-benchmarks/target/hmsbench-jar-with-dependencies.jar -H localhost
--savedata /tmp/benchdata --sanitize -N 100 -o
0302bench_results_http_modified.csv -C -d testbench_http --params=100 -E
'drop.*' -E 'renameTable.*' -E 'getTableObjectsByName.*' -E 'listTables.*' -E
'listPartitions.*' -E 'getPartitions.*' -E 'getPartitionsByNames.*' -E
'getPartitionNames.*' -E 'listPartition' -E 'getPartition' -E 'getNid' -E
'listDatabases' -E 'getTable' -E 'createTable' -T 5
Operation Mean Med Min Max Err%
addPartition 66.33 59.16 36.85 176.2 40.69
addPartitions.100 81.58 74.11 59.33 240.6 31.03
concurrentPartitionAdd#5.100 410.4 391.4 345.8 1063 19.04
{noformat}
Time to add 100 and 1000 partitions, in milliseconds.
*Base version (without direct SQL):*
||*Operation*||*Mean*||*Med*||*Min*||*Max*||*Err%*||
|*addPartitions.100*|*189.552*|*176.996*|*149.402*|*314.393*|*18.2392*|
|*addPartitions.1000*|*1641.48*|*1624.37*|*1577.92*|*1847.76*|*3.07802*|
|*concurrentPartitionAdd#2.100*|*390.799*|*377.246*|*352.988*|*544.446*|*8.13874*|
|*concurrentPartitionAdd#2.1000*|*3441.06*|*3419.13*|*.46*|*3931.22*|*2.50776*|
*After modification (with direct SQL):*
||*Operation*||*Mean*||*Med*||*Min*||*Max*||*Err%*||
|*addPartitions.100*|*83.0217*|*72.2195*|*58.8024*|*214.897*|*33.4667*|
|*addPartitions.1000*|*506.649*|*496.345*|*473.402*|*687.063*|*6.23298*|
|*concurrentPartitionAdd#2.100*|*178.152*|*168.228*|*150.619*|*304.203*|*14.7953*|
|*concurrentPartitionAdd#2.1000*|*1144.33*|*1132.06*|*1092.98*|*1456.85*|*4.02413*|
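To make the comparison between the two tables easier to read, the mean latencies can be turned into speedup ratios. The snippet below is only a small sketch of that arithmetic: the dictionary and function names are made up for illustration, and the numbers are the Mean values copied from the tables above.

```python
# Mean latencies in milliseconds, copied from the two tables above.
base_ms = {
    "addPartitions.100": 189.552,
    "addPartitions.1000": 1641.48,
    "concurrentPartitionAdd#2.100": 390.799,
    "concurrentPartitionAdd#2.1000": 3441.06,
}
direct_sql_ms = {
    "addPartitions.100": 83.0217,
    "addPartitions.1000": 506.649,
    "concurrentPartitionAdd#2.100": 178.152,
    "concurrentPartitionAdd#2.1000": 1144.33,
}


def speedups(before, after):
    """Return {operation: before_mean / after_mean} for operations in both runs."""
    return {op: before[op] / after[op] for op in before if op in after}


ratios = speedups(base_ms, direct_sql_ms)
for op, ratio in ratios.items():
    # A ratio above 1.0 means the direct SQL run had the lower mean latency.
    print(f"{op}: {ratio:.2f}x")
```

On these numbers every ratio falls between roughly 2x and 3.3x, with the larger gains on the 1000-partition cases.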
> Explore moving to directsql for ObjectStore::addPartitions
> --
>
> Key: HIVE-26035
> URL: https://issues.apache.org/jira/browse/HIVE-26035
> Project: Hive
> Issue Type: Bug
> Reporter: Rajesh Balamohan
> Assignee: Venugopal Reddy K
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> Time Spent: 8.5h
> Remaining Estimate: 0h
>
> Currently {{addPartitions}} uses DataNucleus and is very slow for a large
> number of partitions. It would be good to move to direct SQL. Many repeated
> SQL statements can be avoided as well (e.g. SDS, SERDE, TABLE_PARAMS).