Andreas Neumann created TEPHRA-257:
--------------------------------------

             Summary: If start() encounters an RPC timeout, an invalid 
transaction is left behind
                 Key: TEPHRA-257
                 URL: https://issues.apache.org/jira/browse/TEPHRA-257
             Project: Tephra
          Issue Type: Bug
          Components: core
    Affects Versions: 0.13.0-incubating
            Reporter: Andreas Neumann
            Assignee: Poorna Chandra


Suppose the following scenario: 
- a Thrift client starts a transaction
- the server responds, but for whatever reason it is slow 
- by the time the response is sent, the client has timed out the connection
- now the server has started a transaction, but the client has no knowledge of 
it
- that transaction will never be committed or aborted and eventually times out
- it becomes an invalid transaction

This is a common scenario when HDFS is slow and the write load is high, 
because a lot of change ids have to be written to a slow transaction log. In 
that situation we generate invalid transactions systematically, which 
eventually degrades the performance of the entire system.

It would be good if the server could detect this situation and abort the 
transaction immediately. This is safe to do whenever sending the response 
fails, because we then know that the client did not receive it, and hence it 
will not generate data with that transaction id. 
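
The idea can be sketched in plain Java, independent of Thrift. This is only 
an illustration under assumptions: TxManager, ResponseSender and 
startAndRespond are hypothetical names and not part of Tephra or Thrift. It 
shows the transaction being aborted as soon as writing the response fails.

```java
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

public class AbortOnFailedSend {

    /** Hypothetical stand-in for the transaction manager; not Tephra's real API. */
    static class TxManager {
        private long nextId = 1;
        final Set<Long> inProgress = new HashSet<>();

        long start() {
            long id = nextId++;
            inProgress.add(id);
            return id;
        }

        void abort(long id) {
            inProgress.remove(id);
        }
    }

    /** Hypothetical stand-in for writing the RPC response to the client socket. */
    interface ResponseSender {
        void send(long txId) throws IOException;
    }

    /**
     * Starts a transaction and tries to send its id to the client.
     * If the send fails, the client never saw the id, so the transaction
     * is aborted right away instead of being left to time out.
     */
    static long startAndRespond(TxManager manager, ResponseSender sender) throws IOException {
        long txId = manager.start();
        try {
            sender.send(txId);
        } catch (IOException e) {
            manager.abort(txId);
            throw e; // still report the failure to the caller
        }
        return txId;
    }

    public static void main(String[] args) {
        TxManager manager = new TxManager();
        try {
            // Simulate a client that has already timed out the connection.
            startAndRespond(manager, txId -> { throw new IOException("client timed out"); });
        } catch (IOException expected) {
            // The failed send is expected in this simulation.
        }
        System.out.println("in progress after failed send: " + manager.inProgress.size());
    }
}
```

In this sketch the abort happens in the same place where the response is 
written, which is exactly the hook that would have to be added on the server 
side.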

This is a tricky change, though: Thrift does not give us a way to intercept 
exceptions from socket failures. We would have to copy a Thrift class 
(ProcessFunction) and change it to handle exceptions that occur during the 
write of the response. 




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
