[ 
https://issues.apache.org/jira/browse/HIVE-21114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16954328#comment-16954328
 ] 

Ashutosh Bapat edited comment on HIVE-21114 at 10/18/19 7:04 AM:
-----------------------------------------------------------------

 

[~anishek] brought me to this one.

I quickly went through the patch, but haven't looked in detail. It looks like 
we will allocate a transaction id for a "read-only"  transaction as well but 
annotate it as "read-only". Whether a transaction (really a statement) is 
read-only is determined by parser/semantic analyzer. According to the SQL 
standard queries can have side-effects meaning an apparent SELECT query might 
change the db (through a subquery with UPDATE or a procedure/function invoked 
by a function running DML etc.). So relying on the parser to decide whether 
it's a read-only query may not be  the right approach.  Since the 
parser/semantic analyzer can deem a "read-only" statement as "write" to be on 
the safer side OR worst the other way round, which would be buggy. Also letting 
a user specify whether a transaction is read-only won't generally help in 
multi-statement scenario since the user may want to be on safer side to 
accommodate writes in a transaction. A strategy that regards a transaction to 
be read-only until a write will work here. The idea is to separate snapshot 
from transaction id. Use the first one for reading data and fetch the later one 
to track writes when the first write happens. So, if a query never allocated a 
write id, it never require to get a transaction id and thus never wrote 
anything. If we go that route, we will not create any transaction ids for 
read-only transactions and won't have corresponding events. Furthermore that 
might help us take out write-ids from the equation altogether.

During my investigation, I had also found another problem. Since the snapshot 
and transaction id are tied together right now, a transaction would never 
update its snapshot from one statement to the other in a multi-statement 
transaction. This means that a true "read committed' multi-statement 
transaction won't be possible. To fix that also requires us to separate 
snapshot and transaction id.

Transactions at serializable isolation are different through. They are 
considered to be "write" even if they are "read-only", since they block writes 
to the dataset already read. That might be another issue to look at.

Please let me know if I am missing something.


was (Author: ashutosh.bapat):
 

[~anishek] brought me to this one.

I quickly went through the patch, but haven't looked in detail. It looks like 
we will allocate a transaction id for a "read-only"  transaction as well but 
annotate it as "read-only". Whether a transaction (really a statement) is 
read-only is determined by parser/semantic analyzer. According to the SQL 
standard queries can have side-effects meaning an apparent SELECT query might 
change the db (through a subquery with UPDATE or a procedure/function invoked 
by a function running DML etc.). So relying on the parser to decide whether 
it's a read-only query may not be  the right approach.  Since the 
parser/semantic analyzer can deem a "read-only" statement as "write" to be on 
the safer side OR worst the other way round, which would be buggy. Also letting 
a user specify whether a transaction is read-only won't generally help in 
multi-statement scenario since the user may want to be on safer side to 
accommodate writes in a transaction. A strategy that regards a transaction to 
be read-only until a write will work here. The idea is to separate snapshot 
from transaction id. Use the first one for reading data and fetch the later one 
to track writes when the first write happens. So, if a query never allocated a 
write id, it never require to get a transaction id and thus never wrote 
anything. If we go that route, we will not create any transaction ids for 
read-only transactions and won't have corresponding events. Furthermore that 
might help us take out write-ids from the equation altogether.

During my investigation, I had also found another problem. Since the snapshot 
and transaction id are tied together right now, a transaction would never 
update its snapshot from one statement to the other in a multi-statement 
transaction. This means that a true "read committed' multi-statement 
transaction won't be possible. To fix that also requires us to separate 
snapshot and transaction id.

Please let me know if I am missing something.

> Create read-only transactions
> -----------------------------
>
>                 Key: HIVE-21114
>                 URL: https://issues.apache.org/jira/browse/HIVE-21114
>             Project: Hive
>          Issue Type: Bug
>          Components: Transactions
>    Affects Versions: 4.0.0
>            Reporter: Eugene Koifman
>            Assignee: Denys Kuzmenko
>            Priority: Major
>         Attachments: HIVE-21114.1.patch, HIVE-21114.2.patch, 
> HIVE-21114.3.patch, HIVE-21114.4.patch, HIVE-21114.5.patch, 
> HIVE-21114.6.patch, HIVE-21114.7.patch
>
>
> With HIVE-21036 we have a way to indicate that a txn is read only.
> We should (at least in auto-commit mode) determine if the single stmt is a 
> read and mark the txn accordingly.  
> Then we can optimize {{TxnHandler.commitTxn()}} so that it doesn't do any 
> checks in write_set etc.
> {{TxnHandler.commitTxn()}} already starts with {{lockTransactionRecord(stmt, 
> txnid, TXN_OPEN)}} so it can read the txn type in the same SQL stmt.
> HiveOperation only has QUERY, which includes Insert and Select, so this 
> requires figuring out how to determine if a query is a SELECT.  By the time 
> {{Driver.openTransaction();}} is called, we have already parsed the query so 
> there should be a way to know if the statement only reads.
> For multi-stmt txns (once these are supported) we should allow user to 
> indicate that a txn is read-only and then not allow any statements that can 
> make modifications in this txn.  This should be a different jira.
> cc [~ikryvenko]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to