[ https://issues.apache.org/jira/browse/TEPHRA-232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16159316#comment-16159316 ]
Andreas Neumann commented on TEPHRA-232:
----------------------------------------

Two issues here:
1. The transaction is encoded and sent over and over again (addressed in TEPHRA-247 and TEPHRA-248).
2. Only the write pointer is needed for puts, which is much smaller (addressed in TEPHRA-234).

We'll keep this one open to track it, but will push the fixes individually under the other JIRAs.

> Transaction metadata sent on each put is too big
> ------------------------------------------------
>
>                 Key: TEPHRA-232
>                 URL: https://issues.apache.org/jira/browse/TEPHRA-232
>             Project: Tephra
>          Issue Type: Bug
>    Affects Versions: 0.11.0-incubating, 0.12.0-incubating
>         Environment: HBase 1.2.0-cdh5.11
>             CentOS 7.3
>             4x machines
>             Bandwidth between machines: 1Gbps
>            Reporter: Micael Capitão
>            Assignee: Poorna Chandra
>            Priority: Minor
>
> I've been testing Tephra 0.11.0 (and more recently 0.12.0) for a project that
> may need transactions on top of HBase, and I find its performance, for
> instance for a bulk load, very poor. Let's not discuss why I am doing a bulk
> load with transactions.
> In my use case I am generating batches of ~10000 elements and inserting them
> with the *put(List<Put> puts)* method. There are no concurrent writers or
> readers.
> If I do the put without transactions it takes ~0.5s. If I use the
> *TransactionAwareHTable* it takes ~12s. In both cases the network bandwidth
> is fully utilised.
> I've tracked the performance killer down to
> *addToOperation(OperationWithAttributes op, Transaction tx)* in
> TransactionAwareHTable. I created a TransactionAwareHTableFix with the
> *addToOperation(txPut, tx)* call commented out, used it in my code, and each
> batch went back to taking ~0.5s.
> I then checked what is done inside the *addToOperation* method and verified
> that the issue lies in the serialization of the Transaction object: the
> serialized Transaction is 104171 bytes long. Since this happens for every
> put, my batch of ~10000 elements carries ~970MB of serialized transactions,
> which explains the ~12s vs ~0.5s, with the network saturated in both cases.
> The transaction metadata, despite being sent to HBase, does not appear to be
> stored, so the final table size, with or without transactions, is the same.
> Is this encode-and-send behaviour expected? It makes Tephra unusable, at
> least with only 1Gbps of bandwidth.
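As a back-of-the-envelope check of the numbers above: 104171 bytes x ~10000 puts is roughly 0.97GB of attribute data per batch, which at 1Gbps adds about 8s of transfer on top of the ~0.5s baseline, consistent with the ~12s observed. Below is a minimal, hypothetical Java sketch of the two fixes the comment outlines; it is not Tephra's actual code, and the class name and attribute keys are invented for illustration.

{code:java}
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

// Hypothetical client-side helper, for illustration only.
public class TxAttributeSketch {

  // Invented attribute keys; the real keys may differ.
  private static final String TX_ATTR = "tephra.tx";
  private static final String WRITE_POINTER_ATTR = "tephra.tx.writePointer";

  private byte[] cachedEncodedTx;     // full serialized transaction (~100KB in this report)
  private byte[] encodedWritePointer; // just 8 bytes

  // TEPHRA-247/248: encode the transaction once when it starts and cache the
  // bytes, instead of re-serializing it for every single operation.
  public void onTransactionStart(long writePointer, byte[] encodedTx) {
    this.cachedEncodedTx = encodedTx;
    this.encodedWritePointer = Bytes.toBytes(writePointer);
  }

  // TEPHRA-234: a put only needs the write pointer, so attach 8 bytes
  // instead of the full serialized transaction.
  public void addToOperation(Put put) {
    put.setAttribute(WRITE_POINTER_ATTR, encodedWritePointer);
  }

  // Reads still need the full transaction snapshot for visibility filtering,
  // but they can reuse the cached bytes rather than encoding again.
  public void addToOperation(Get get) {
    get.setAttribute(TX_ATTR, cachedEncodedTx);
  }
}
{code}

With only the write pointer attached, the attribute overhead for the reported batch would drop from ~970MB to about 80KB (10000 x 8 bytes), in line with the ~0.5s observed without transactions.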