[ 
https://issues.apache.org/jira/browse/TEPHRA-247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16326493#comment-16326493
 ] 

Andreas Neumann commented on TEPHRA-247:
----------------------------------------

I do see that it could be possible to work around the region split using a post 
split hook - but I still don't feel comfortable with the approach. The issue we 
are trying to solve is that when the invalid list gets large - and so does the 
transaction object - then we encode, transmit and decode this large object with 
every get() performed in this transaction.

A very important case is a small transaction - say a transaction that performs 
a single get or scan, followed by a put, and then commits. Today, this requires 
sending the transaction only once: for the read operation, and it only gets 
sent to one region, or only the regions involved in the scan. The proposed 
design requires that we send the transaction to every region when the 
transaction starts. That appears to add overhead rather than reducing overhead. 

I feel that if we want to reduce overhead, we have multiple angles to look at 
this:
 * reduce the cost of encoding, transmitting and decoding the tx. This could 
involve:
 ** using a more efficient (faster) or more compact (smaller) codec
 ** caching the encoded transaction on the client side after it was encoded for 
the first time
 ** caching the decoded transaction in region servers after it has been 
decoded for the first time
 * avoid decoding the tx altogether, by using a codec that does not require 
decoding. That is, instead of binary search in an array of tx ids, some 
encoding that allows searching directly on the binary representation. 
 * avoid transmitting the invalid list. A possibility is to rely on the 
existing TransactionStateCache, which has knowledge about the invalid 
transactions in the last snapshot. That could allow us to only transmit the 
invalid transactions added since the last snapshot. 
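
To make the "search directly on the binary representation" idea concrete, 
here is a minimal sketch. It assumes a hypothetical wire format in which the 
invalid list is a sorted run of fixed-width 8-byte big-endian longs; the class 
and method names are made up for illustration and are not part of Tephra, and 
the real codec format may well differ.

```java
import java.nio.ByteBuffer;

// Hypothetical helper, not part of Tephra: looks up a tx id in an encoded,
// sorted invalid list without first decoding it into a long[].
final class EncodedInvalidList {
  private EncodedInvalidList() { }

  // Assumed wire format: sorted tx ids, each written as a fixed-width
  // 8-byte big-endian long (ByteBuffer's default byte order).
  static ByteBuffer encode(long[] sortedTxIds) {
    ByteBuffer buf = ByteBuffer.allocate(sortedTxIds.length * Long.BYTES);
    for (long id : sortedTxIds) {
      buf.putLong(id);
    }
    buf.flip();
    return buf;
  }

  // Binary search directly on the encoded bytes. Uses absolute reads, so the
  // buffer's position and limit are left untouched.
  static boolean contains(ByteBuffer encoded, long txId) {
    int base = encoded.position();
    int lo = 0;
    int hi = encoded.remaining() / Long.BYTES - 1;
    while (lo <= hi) {
      int mid = (lo + hi) >>> 1;
      long candidate = encoded.getLong(base + mid * Long.BYTES);
      if (candidate < txId) {
        lo = mid + 1;
      } else if (candidate > txId) {
        hi = mid - 1;
      } else {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    ByteBuffer encoded = encode(new long[] {3L, 8L, 21L, 34L});
    System.out.println(contains(encoded, 21L));
    System.out.println(contains(encoded, 22L));
  }
}
```

With a fixed-width layout like this, a region server could answer "is this 
write pointer invalid?" against the received bytes as-is. The trade-off is 
that a fixed-width encoding is less compact than a variable-length one, so it 
pulls against the "more compact codec" option above.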

By the way, there is similar overhead in the communication between Transaction 
Manager and the client when the transaction is created. That could be another 
area of improvement.

Thoughts?

 

> Avoid encoding the transaction multiple times
> ---------------------------------------------
>
>                 Key: TEPHRA-247
>                 URL: https://issues.apache.org/jira/browse/TEPHRA-247
>             Project: Tephra
>          Issue Type: Improvement
>          Components: core, manager
>    Affects Versions: 0.12.0-incubating
>            Reporter: Andreas Neumann
>            Assignee: Andreas Neumann
>            Priority: Major
>         Attachments: design.jpg
>
>
> Currently, the same transaction object is encoded again and again for every 
> Get performed in HBase. It would be better to cache the encoded transaction 
> for the duration of the transaction and reuse it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
