[jira] [Commented] (HIVE-26035) Explore moving to directsql for ObjectStore::addPartitions

2023-02-03 Thread Venugopal Reddy K (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17683928#comment-17683928
 ] 

Venugopal Reddy K commented on HIVE-26035:
--

*Without direct SQL (5 concurrent threads, each creating 100 partitions):*
{noformat}
kvenureddy@192 hclient % java -jar 
./metastore-benchmarks/target/hmsbench-jar-with-dependencies.jar -H localhost 
--savedata /tmp/benchdata --sanitize -N 100 -o 
0302bench_results_http_modified-1.csv -C -d testbench_http --params=100  -E 
'drop.*' -E 'renameTable.*' -E 'getTableObjectsByName.*' -E 'listTables.*' -E 
'listPartitions.*' -E 'getPartitions.*' -E 'getPartitionsByNames.*' -E 
'getPartitionNames.*' -E 'listPartition' -E 'getPartition'  -E 'getNid' -E 
'listDatabases' -E 'getTable' -E 'createTable'  -T 5
Operation                      Mean     Med      Min      Max      Err%    
addPartition                   62.47    55.34    32.66    172.1    37.15   
addPartitions.100              191.8    182.3    167.0    292.5    12.68   
concurrentPartitionAdd#5.100   1476     1464     1351     2162     6.815 
{noformat}
 

*With direct SQL (5 concurrent threads, each creating 100 partitions):*
{noformat}
kvenureddy@192 hclient % java -jar 
./metastore-benchmarks/target/hmsbench-jar-with-dependencies.jar -H localhost 
--savedata /tmp/benchdata --sanitize -N 100 -o 
0302bench_results_http_modified.csv -C -d testbench_http --params=100  -E 
'drop.*' -E 'renameTable.*' -E 'getTableObjectsByName.*' -E 'listTables.*' -E 
'listPartitions.*' -E 'getPartitions.*' -E 'getPartitionsByNames.*' -E 
'getPartitionNames.*' -E 'listPartition' -E 'getPartition'  -E 'getNid' -E 
'listDatabases' -E 'getTable' -E 'createTable'  -T 5 
Operation                      Mean     Med      Min      Max      Err%    
addPartition                   66.33    59.16    36.85    176.2    40.69   
addPartitions.100              81.58    74.11    59.33    240.6    31.03   
concurrentPartitionAdd#5.100   410.4    391.4    345.8    1063     19.04   
{noformat}
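
For a quick read of the two runs above, the per-operation speedup can be computed from the Mean columns. This is only a minimal sketch: the numbers are copied from the output above, while the dictionary and function names are illustrative and not part of hmsbench.

```python
# Mean latencies copied from the two hmsbench runs above (-T 5, --params=100).
without_direct_sql = {
    "addPartition": 62.47,
    "addPartitions.100": 191.8,
    "concurrentPartitionAdd#5.100": 1476.0,
}
with_direct_sql = {
    "addPartition": 66.33,
    "addPartitions.100": 81.58,
    "concurrentPartitionAdd#5.100": 410.4,
}


def speedups(before, after):
    """Return the before/after mean ratio per operation (>1 means faster after)."""
    return {op: before[op] / after[op] for op in before}


ratios = speedups(without_direct_sql, with_direct_sql)
for op, ratio in ratios.items():
    print(f"{op:<30} {ratio:.2f}x")
```

On these means the batched and concurrent cases improve by roughly 2.4x and 3.6x, while the single-partition addPartition mean is slightly higher with direct SQL (with an Err% of about 40 in that run).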
 

Time to add 100 partitions and 1000 partitions, in milliseconds.

*Base version (without direct SQL):*
||Operation||Mean||Med||Min||Max||Err%||
|addPartitions.100|189.552|176.996|149.402|314.393|18.2392|
|addPartitions.1000|1641.48|1624.37|1577.92|1847.76|3.07802|
|concurrentPartitionAdd#2.100|390.799|377.246|352.988|544.446|8.13874|
|concurrentPartitionAdd#2.1000|3441.06|3419.13|.46|3931.22|2.50776|

 

*After modification (with direct SQL):*
||Operation||Mean||Med||Min||Max||Err%||
|addPartitions.100|83.0217|72.2195|58.8024|214.897|33.4667|
|addPartitions.1000|506.649|496.345|473.402|687.063|6.23298|
|concurrentPartitionAdd#2.100|178.152|168.228|150.619|304.203|14.7953|
|concurrentPartitionAdd#2.1000|1144.33|1132.06|1092.98|1456.85|4.02413|
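
To see how the cost scales with batch size, the single-threaded addPartitions means from the two tables above can be divided by the number of partitions added. This is a rough sketch only: the values are copied from the tables, and the names used here are illustrative.

```python
# Mean times in milliseconds, copied from the addPartitions rows of the tables above.
mean_ms = {
    "without direct SQL": {100: 189.552, 1000: 1641.48},
    "with direct SQL": {100: 83.0217, 1000: 506.649},
}


def per_partition_ms(means):
    """Map each batch size to the mean time spent per partition, in ms."""
    return {count: total / count for count, total in means.items()}


per_partition = {label: per_partition_ms(m) for label, m in mean_ms.items()}
for label, costs in per_partition.items():
    for count, cost in costs.items():
        print(f"{label:<20} {count:>5} partitions: {cost:.3f} ms/partition")
```

With these means the per-partition cost drops from about 1.9 ms to about 0.83 ms at 100 partitions and from about 1.6 ms to about 0.51 ms at 1000 partitions, so the gain grows with the batch size.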

 

 

> Explore moving to directsql for ObjectStore::addPartitions
> --
>
> Key: HIVE-26035
> URL: https://issues.apache.org/jira/browse/HIVE-26035
> Project: Hive
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
>Assignee: Venugopal Reddy K
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 8.5h
>  Remaining Estimate: 0h
>
> Currently {{addPartitions}} uses DataNucleus and is very slow for a large 
> number of partitions. It would be good to move to direct SQL. Many repeated 
> SQL statements can be avoided as well (e.g. SDS, SERDE, TABLE_PARAMS).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-26035) Explore moving to directsql for ObjectStore::addPartitions

2023-02-02 Thread Naveen Gangam (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17683595#comment-17683595
 ] 

Naveen Gangam commented on HIVE-26035:
--

[~VenuReddy] The patch has been merged to master. Could you please post the 
performance improvements from the microbenchmarks to this Jira? 
It would be useful for future reference. Thank you.

> Explore moving to directsql for ObjectStore::addPartitions
> --
>
> Key: HIVE-26035
> URL: https://issues.apache.org/jira/browse/HIVE-26035
> Project: Hive
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
>Assignee: Venugopal Reddy K
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 8.5h
>  Remaining Estimate: 0h
>
> Currently {{addPartitions}} uses DataNucleus and is very slow for a large 
> number of partitions. It would be good to move to direct SQL. Many repeated 
> SQL statements can be avoided as well (e.g. SDS, SERDE, TABLE_PARAMS).


