[ https://issues.apache.org/jira/browse/TEPHRA-247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16326493#comment-16326493 ]
Andreas Neumann commented on TEPHRA-247:
----------------------------------------

I do see that it could be possible to work around the region split using a post-split hook, but I still don't feel comfortable with the approach.

The issue we are trying to solve is that when the invalid list gets large, and with it the transaction object, we encode, transmit and decode this large object with every get() performed in this transaction.

A very important case is a small transaction: say, a transaction that performs a single get or scan, followed by a put, and then commits. Today, this requires sending the transaction only once, for the read operation, and it gets sent to only one region, or only to the regions involved in the scan. The proposed design requires that we send the transaction to every region when the transaction starts. That appears to add overhead rather than reduce it.

I feel that if we want to reduce overhead, we have multiple angles to look at this:
* reduce the cost of encoding, transmitting and decoding the tx. This could involve:
** using a more efficient (faster) or more compact (smaller) codec
** caching the encoded transaction on the client side after it was encoded for the first time
** caching the decoded transaction in region servers after it has been decoded for the first time
* avoid decoding the tx altogether, by using a codec that does not require decoding. That is, instead of binary search in an array of tx ids, some encoding that allows searching directly on the binary representation.
* avoid transmitting the invalid list. A possibility is to rely on the existing TransactionStateCache, which has knowledge about the invalid transactions in the last snapshot. That could allow us to only transmit the invalid transactions added since the last snapshot.

By the way, there is similar overhead in the communication between Transaction Manager and the client when the transaction is created. That could be another area of improvement.

Thoughts?

> Avoid encoding the transaction multiple times
> ---------------------------------------------
>
>                 Key: TEPHRA-247
>                 URL: https://issues.apache.org/jira/browse/TEPHRA-247
>             Project: Tephra
>          Issue Type: Improvement
>          Components: core, manager
>    Affects Versions: 0.12.0-incubating
>            Reporter: Andreas Neumann
>            Assignee: Andreas Neumann
>            Priority: Major
>         Attachments: design.jpg
>
>
> Currently, the same transaction object is encoded again and again for every
> Get performed in HBase. It would be better to cache the encoded transaction
> for the duration of the transaction and reuse it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
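
For illustration only, the client-side caching option discussed in this thread (encode the transaction once, then reuse the bytes for every get) could look roughly like the sketch below. Every name in it (CachedTxEncoder, Tx, encoded, encodeCount) is hypothetical, and the codec is a toy stand-in; none of this is Tephra's actual API or wire format.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: none of these names are Tephra APIs.
final class CachedTxEncoder {

    /** Stand-in for a transaction: an id plus the (possibly large) invalid list. */
    static final class Tx {
        final long id;
        final long[] invalids;

        Tx(long id, long[] invalids) {
            this.id = id;
            this.invalids = invalids.clone();
        }
    }

    private final Tx tx;
    private byte[] encoded;   // null until the first call to encoded()
    private int encodeCount;  // instrumentation for this example only

    CachedTxEncoder(Tx tx) {
        this.tx = tx;
    }

    /** Returns the encoded form, running the codec on the first call only. */
    synchronized byte[] encoded() {
        if (encoded == null) {
            encoded = encode(tx);
            encodeCount++;
        }
        return encoded;
    }

    synchronized int encodeCount() {
        return encodeCount;
    }

    // Toy codec (an assumption): id, count, then each invalid id as a big-endian long.
    private static byte[] encode(Tx tx) {
        ByteBuffer buf = ByteBuffer.allocate(
            Long.BYTES + Integer.BYTES + Long.BYTES * tx.invalids.length);
        buf.putLong(tx.id).putInt(tx.invalids.length);
        for (long invalid : tx.invalids) {
            buf.putLong(invalid);
        }
        return buf.array();
    }

    public static void main(String[] args) {
        CachedTxEncoder cache = new CachedTxEncoder(new Tx(42L, new long[] {7L, 9L}));
        for (int i = 0; i < 3; i++) {
            byte[] payload = cache.encoded(); // would be attached to each get()
            System.out.println("get #" + i + " reuses " + payload.length + " bytes");
        }
        System.out.println("encode calls: " + cache.encodeCount());
    }
}
```

The sketch hands out the same array on every call, so callers must treat it as read-only. It also assumes the transaction object does not change after the first encoding; if it can change mid-transaction, the cached bytes would have to be dropped and rebuilt.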