Re: Review Request 72481: HIVE-23234: Optimize TxnHandler::allocateTableWriteIds

2020-05-14 Thread Marton Bod



> On May 14, 2020, 3:07 p.m., Denys Kuzmenko wrote:
> > LGTM, some minor comments

Thanks Denys, I've addressed your comments.


- Marton


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72481/#review220758
---



Re: Review Request 72481: HIVE-23234: Optimize TxnHandler::allocateTableWriteIds

2020-05-14 Thread Marton Bod


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72481/
---

(Updated May 14, 2020, 3:38 p.m.)


Review request for hive, Denys Kuzmenko and Peter Vary.


Repository: hive-git


Description
---

Removed the global mutex on writeId allocation, so write ids can now be 
allocated concurrently for different tables without blocking each other, 
which speeds up execution (perf test results below). Concurrent 
allocateTableWriteIds() operations targeting the same table are still 
serialized by a SELECT ... FOR UPDATE (S4U) if the table is already present in 
next_write_id. Otherwise, the race to insert the table into next_write_id is 
resolved by retrying after catching the duplicate key exception (the thread 
that commits later is the one that retries).

The situation is similar when allocateTableWriteIds() and 
replTableWriteIdState() run concurrently: if they target different tables, 
they no longer block each other. If they target the same table and the table 
is already in next_write_id, replTableWriteIdState() returns early and 
allocateTableWriteIds() updates the next id. If the table is not yet in 
next_write_id, they might attempt to insert the same row concurrently, in 
which case whichever commits later gets a duplicate key exception and retries 
the operation, just as above.
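
As a rough illustration of the flow described above, here is an in-memory 
analogy in Java. It is not the TxnHandler code: the class name, the 
DuplicateKeyException type, and the assumption that ids start at 1 are all 
hypothetical. A ConcurrentHashMap stands in for next_write_id, putIfAbsent() 
stands in for the unique-key insert, and a per-row monitor stands in for the 
row lock taken by SELECT ... FOR UPDATE.

```java
import java.util.concurrent.ConcurrentHashMap;

class WriteIdAllocatorSketch {
    /** Hypothetical stand-in for the database's duplicate key error. */
    static class DuplicateKeyException extends RuntimeException {}

    // Stand-in for next_write_id: table name -> one-element array holding the next free id.
    private final ConcurrentHashMap<String, long[]> nextWriteId = new ConcurrentHashMap<>();

    /** Returns the first of `count` consecutive ids for `table`, retrying after a lost insert race. */
    long allocate(String table, int count) {
        while (true) {
            try {
                return tryAllocate(table, count);
            } catch (DuplicateKeyException e) {
                // Another caller inserted the row first; on retry the row exists
                // and this caller takes the row-lock path instead.
            }
        }
    }

    private long tryAllocate(String table, int count) {
        long[] row = nextWriteId.get(table);
        if (row == null) {
            // First allocation for this table: try to insert the row (assumed to start at id 1).
            long[] fresh = {1L + count};
            if (nextWriteId.putIfAbsent(table, fresh) != null) {
                throw new DuplicateKeyException();
            }
            return 1L;
        }
        // Row exists: lock only this row, so other tables are not blocked.
        synchronized (row) {
            long first = row[0];
            row[0] = first + count;
            return first;
        }
    }

    public static void main(String[] args) {
        WriteIdAllocatorSketch allocator = new WriteIdAllocatorSketch();
        System.out.println(allocator.allocate("db.t1", 3));
        System.out.println(allocator.allocate("db.t1", 2));
        System.out.println(allocator.allocate("db.t2", 1));
    }
}
```

The point of the sketch is that no lock is shared across tables: contention 
only arises between callers that target the same row.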


Diffs (updated)
-

  ql/src/test/org/apache/hadoop/hive/metastore/txn/TestTxnHandler.java 868da0c7a0 
  standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java d59f863b11 
  standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java cf41ef8aaf 
  standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnStore.java 1e177f4a7b 


Diff: https://reviews.apache.org/r/72481/diff/2/

Changes: https://reviews.apache.org/r/72481/diff/1-2/


Testing
---

Unit test in TestTxnHandler
+ Perf tests:
dbType     sameTable  variant   ms/op   error
MYSQL      FALSE      original  46.93   3.041
MYSQL      FALSE      patched   19.283  1.311
MYSQL      TRUE       original  50.185  3.595
MYSQL      TRUE       patched   32.254  2.164
ORACLE     FALSE      original  57.609  4.461
ORACLE     FALSE      patched   25.721  2.551
ORACLE     TRUE       original  59.668  3.172
ORACLE     TRUE       patched   39.061  2.548
POSTGRES   FALSE      original  39.364  2.94
POSTGRES   FALSE      patched   18.518  1.038
POSTGRES   TRUE       original  39.868  2.679
POSTGRES   TRUE       patched   28.874  1.768
SQLSERVER  FALSE      original  45.252  1.643
SQLSERVER  FALSE      patched   24.583  1.529
SQLSERVER  TRUE       original  49.149  3.45
SQLSERVER  TRUE       patched   32.918  1.654
(sameTable=true means that all threads were trying to allocate ids for the same 
db.table,
false means they all targeted different tables)
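
To read the table as speedups rather than raw latencies, the original ms/op 
can be divided by the patched ms/op. The small Java sketch below does that 
with the figures copied from the table; the class and method names are 
illustrative only.

```java
class PerfSpeedupSketch {
    /** Ratio of original to patched latency; values above 1 mean the patch is faster. */
    static double speedup(double originalMs, double patchedMs) {
        return originalMs / patchedMs;
    }

    public static void main(String[] args) {
        String[] labels = {
            "MYSQL different tables", "MYSQL same table",
            "ORACLE different tables", "ORACLE same table",
            "POSTGRES different tables", "POSTGRES same table",
            "SQLSERVER different tables", "SQLSERVER same table"
        };
        // {original ms/op, patched ms/op} pairs taken from the perf table.
        double[][] rows = {
            {46.93, 19.283}, {50.185, 32.254},
            {57.609, 25.721}, {59.668, 39.061},
            {39.364, 18.518}, {39.868, 28.874},
            {45.252, 24.583}, {49.149, 32.918}
        };
        for (int i = 0; i < rows.length; i++) {
            System.out.printf("%-28s %.2fx%n", labels[i], speedup(rows[i][0], rows[i][1]));
        }
    }
}
```

With these numbers the ratio comes out at roughly 1.8x to 2.4x when threads 
target different tables and roughly 1.4x to 1.6x when they target the same 
table.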


Thanks,

Marton Bod

Re: Review Request 72481: HIVE-23234: Optimize TxnHandler::allocateTableWriteIds

2020-05-14 Thread Denys Kuzmenko via Review Board



> On May 8, 2020, 10:23 a.m., Peter Vary wrote:
> > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
> > Line 2067 (original), 2057 (patched)
> > 
> >
> > Why is this change?
> 
> Marton Bod wrote:
> this was causing a checkstyle issue (line length too long)

it doesn't look long, maybe you can remove some leading spaces


- Denys


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72481/#review220689
---



Re: Review Request 72481: HIVE-23234: Optimize TxnHandler::allocateTableWriteIds

2020-05-14 Thread Denys Kuzmenko via Review Board


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72481/#review220758
---



LGTM, some minor comments


standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
Line 2114 (original), 2105 (patched)


you can use txnToWriteIds.size() instead of counter
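
A minimal sketch of the kind of change suggested here, assuming the counter 
was incremented once per element added to the list. The class, the method 
names, and the use of long[] pairs in place of the real element type are 
hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class SizeInsteadOfCounterSketch {
    /** Tracks the number of allocations in a separate counter. */
    static int countWithCounter(long[] txnIds) {
        List<long[]> txnToWriteIds = new ArrayList<>();
        int counter = 0;
        long writeId = 1;
        for (long txnId : txnIds) {
            txnToWriteIds.add(new long[] {txnId, writeId++});
            counter++; // duplicates information the list already holds
        }
        return counter;
    }

    /** Same loop, but the list itself reports how many entries were added. */
    static int countWithSize(long[] txnIds) {
        List<long[]> txnToWriteIds = new ArrayList<>();
        long writeId = 1;
        for (long txnId : txnIds) {
            txnToWriteIds.add(new long[] {txnId, writeId++});
        }
        return txnToWriteIds.size();
    }
}
```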



standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
Line 4120 (original), 4114 (patched)


Could we try not to place every method argument on a new line?



standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnStore.java
Line 49 (original), 49 (patched)


Could you please remove checkLock here as well?


- Denys Kuzmenko



Re: Review Request 72481: HIVE-23234: Optimize TxnHandler::allocateTableWriteIds

2020-05-08 Thread Marton Bod



> On May 8, 2020, 10:23 a.m., Peter Vary wrote:
> > Thanks Marci,
> > A few questions below - probably I just do not understand this part of the 
> > code well enough.
> > 
> > Another question for the perf test: How many threads are you using?
> > 
> > Thanks,
> > Peter

Thanks Peti for the review. See my answers below, they are related to minor 
housekeeping changes that are not core to the optimization story.

The perf tests were run with 8 concurrent threads.


> On May 8, 2020, 10:23 a.m., Peter Vary wrote:
> > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java
> > Line 1102 (original)
> > 
> >
> > Why did we remove this?

checkRetryable never throws MetaException (and hence the code never enters this 
catch block), so I removed it from its throws clause
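
For readers less familiar with the mechanics: in Java, a method may declare a 
checked exception it never throws, and callers are then forced to catch or 
propagate it. Once the exception is dropped from the throws clause, a catch 
block for it no longer compiles if nothing else in the try body can throw it. 
The sketch below illustrates this with hypothetical names (MetaLikeException 
and both method variants are stand-ins, not the Hive signatures).

```java
class ThrowsClauseSketch {
    /** Hypothetical checked exception standing in for the one removed from the throws clause. */
    static class MetaLikeException extends Exception {
        MetaLikeException(String message) {
            super(message);
        }
    }

    // Before: the exception is declared but never actually thrown.
    static void checkBefore(String caller) throws MetaLikeException {
        // no code path here throws MetaLikeException
    }

    // After: the unused declaration is gone.
    static void checkAfter(String caller) {
    }

    static String callBefore() {
        try {
            checkBefore("caller");
            return "ok";
        } catch (MetaLikeException e) {
            // Required by the compiler, but never entered at runtime.
            return "never reached";
        }
    }

    static String callAfter() {
        // No try/catch needed; keeping the old catch block here would be a compile error.
        checkAfter("caller");
        return "ok";
    }
}
```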


> On May 8, 2020, 10:23 a.m., Peter Vary wrote:
> > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java
> > Line 1137 (original)
> > 
> >
> > why did we remove this?

same as above


> On May 8, 2020, 10:23 a.m., Peter Vary wrote:
> > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
> > Lines 1762-1765 (original), 1759-1762 (patched)
> > 
> >
> > Is this a functionality or performance change?

neither really, just some readability refactor


> On May 8, 2020, 10:23 a.m., Peter Vary wrote:
> > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
> > Line 2021 (original), 2013 (patched)
> > 
> >
> > Why is this change required?

It just seems counterintuitive to use string concatenation when we're already 
using a StringBuilder anyway.
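
A small sketch of the difference, with a made-up message format (the class, 
method names, and string contents are illustrative and not taken from the 
patch). Both variants produce the same text; the second avoids building an 
intermediate string before appending.

```java
class AppendSketch {
    /** Concatenates first, then appends the temporary result. */
    static String withConcat(String table, long id) {
        StringBuilder sb = new StringBuilder();
        sb.append("table=" + table + ", id=" + id);
        return sb.toString();
    }

    /** Appends each piece directly to the builder. */
    static String withAppend(String table, long id) {
        StringBuilder sb = new StringBuilder();
        sb.append("table=").append(table).append(", id=").append(id);
        return sb.toString();
    }
}
```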


> On May 8, 2020, 10:23 a.m., Peter Vary wrote:
> > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
> > Line 2067 (original), 2057 (patched)
> > 
> >
> > Why is this change?

this was causing a checkstyle issue (line length too long)


> On May 8, 2020, 10:23 a.m., Peter Vary wrote:
> > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
> > Line 2079 (original), 2070 (patched)
> > 
> >
> > Why is this change?

unnecessary call to Long.toString
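
To illustrate, assuming the value was being fed into a StringBuilder (the 
surrounding code is not shown in this thread, so this is a guess at the 
context): StringBuilder.append has a long overload, so the explicit 
conversion adds nothing. Names below are hypothetical.

```java
class LongAppendSketch {
    /** Converts the long to a String before appending. */
    static String explicit(long writeId) {
        return new StringBuilder("id=").append(Long.toString(writeId)).toString();
    }

    /** Relies on the append(long) overload instead. */
    static String direct(long writeId) {
        return new StringBuilder("id=").append(writeId).toString();
    }
}
```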


> On May 8, 2020, 10:23 a.m., Peter Vary wrote:
> > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
> > Line 2090 (original), 2081 (patched)
> > 
> >
> > Why is this change?

same as above


- Marton


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72481/#review220689
---



Re: Review Request 72481: HIVE-23234: Optimize TxnHandler::allocateTableWriteIds

2020-05-08 Thread Peter Vary via Review Board


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72481/#review220689
---



Thanks Marci,
A few questions below - probably I just do not understand this part of the code 
well enough.

Another question for the perf test: How many threads are you using?

Thanks,
Peter


standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java
Line 1102 (original)


Why did we remove this?



standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java
Line 1137 (original)


why did we remove this?



standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
Lines 1762-1765 (original), 1759-1762 (patched)


Is this a functionality or performance change?



standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
Line 2021 (original), 2013 (patched)


Why is this change required?



standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
Line 2067 (original), 2057 (patched)


Why is this change?



standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
Line 2079 (original), 2070 (patched)


Why is this change?



standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
Line 2090 (original), 2081 (patched)


Why is this change?


- Peter Vary



Review Request 72481: HIVE-23234: Optimize TxnHandler::allocateTableWriteIds

2020-05-07 Thread Marton Bod


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72481/
---

Review request for hive, Denys Kuzmenko and Peter Vary.


Repository: hive-git


Description
---

Removed the global mutex on writeId allocation, so write ids can now be 
allocated concurrently for different tables without blocking each other, 
which speeds up execution (perf test results below). Concurrent 
allocateTableWriteIds() operations targeting the same table are still 
serialized by a SELECT ... FOR UPDATE (S4U) if the table is already present in 
next_write_id. Otherwise, the race to insert the table into next_write_id is 
resolved by retrying after catching the duplicate key exception (the thread 
that commits later is the one that retries).

The situation is similar when allocateTableWriteIds() and 
replTableWriteIdState() run concurrently: if they target different tables, 
they no longer block each other. If they target the same table and the table 
is already in next_write_id, replTableWriteIdState() returns early and 
allocateTableWriteIds() updates the next id. If the table is not yet in 
next_write_id, they might attempt to insert the same row concurrently, in 
which case whichever commits later gets a duplicate key exception and retries 
the operation, just as above.


Diffs
-

  ql/src/test/org/apache/hadoop/hive/metastore/txn/TestTxnHandler.java 868da0c7a0 
  standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java d59f863b11 
  standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java cf41ef8aaf 
  standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnStore.java 1e177f4a7b 


Diff: https://reviews.apache.org/r/72481/diff/1/


Testing
---

Unit test in TestTxnHandler
+ Perf tests:
dbType     sameTable  variant   ms/op   error
MYSQL      FALSE      original  46.93   3.041
MYSQL      FALSE      patched   19.283  1.311
MYSQL      TRUE       original  50.185  3.595
MYSQL      TRUE       patched   32.254  2.164
ORACLE     FALSE      original  57.609  4.461
ORACLE     FALSE      patched   25.721  2.551
ORACLE     TRUE       original  59.668  3.172
ORACLE     TRUE       patched   39.061  2.548
POSTGRES   FALSE      original  39.364  2.94
POSTGRES   FALSE      patched   18.518  1.038
POSTGRES   TRUE       original  39.868  2.679
POSTGRES   TRUE       patched   28.874  1.768
SQLSERVER  FALSE      original  45.252  1.643
SQLSERVER  FALSE      patched   24.583  1.529
SQLSERVER  TRUE       original  49.149  3.45
SQLSERVER  TRUE       patched   32.918  1.654
(sameTable=true means that all threads were trying to allocate ids for the same 
db.table,
false means they all targeted different tables)


Thanks,

Marton Bod
