[ https://issues.apache.org/jira/browse/TEPHRA-232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16040621#comment-16040621 ]

Micael Capitão edited comment on TEPHRA-232 at 6/7/17 10:07 AM:
----------------------------------------------------------------

I've found the issue to be related to the number of invalid transactions I have 
piled up during my tests. There are 13015 invalid transactions, which explains 
the size of the serialized Transaction object: presumably one 8-byte id per 
invalid transaction, so 13015 * 8 = 104120 bytes, essentially the whole 
104171-byte payload. My question is: shouldn't those invalid transactions get 
cleared at some point?
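
If there is no automatic pruning, is the expectation that clients truncate the 
invalid list themselves? A minimal sketch of what I mean, assuming I am reading 
the TransactionSystemClient interface correctly (treat the method names as my 
assumption):

{code:java}
import org.apache.tephra.TransactionSystemClient;

public class InvalidTxCleanup {

  // Drops invalid transactions older than maxAgeMillis, assuming the data they
  // touched has already been cleaned up by flushes/major compactions.
  public static void pruneInvalid(TransactionSystemClient txClient, long maxAgeMillis)
      throws Exception {
    System.out.println("invalid list size before: " + txClient.getInvalidSize());
    // Throws InvalidTruncateTimeException if a transaction started before the
    // cutoff is still in progress.
    txClient.truncateInvalidTxBefore(System.currentTimeMillis() - maxAgeMillis);
    System.out.println("invalid list size after: " + txClient.getInvalidSize());
  }
}
{code}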

Another question: can the Transaction object change outside the startTx and 
updateTx calls? I was wondering whether it really needs to be serialized on 
every single operation, or whether it could be serialized once on startTx and 
then again on each updateTx call.
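
If it can't change elsewhere, the encoded bytes could simply be cached. A 
sketch of what I have in mind (not actual Tephra code; the attribute key is a 
placeholder):

{code:java}
import java.io.IOException;
import org.apache.hadoop.hbase.client.OperationWithAttributes;
import org.apache.tephra.Transaction;
import org.apache.tephra.TransactionCodec;

public class CachedTxEncoding {

  private final TransactionCodec txCodec = new TransactionCodec();
  private byte[] encodedTx;  // serialized form, cached per transaction

  // Re-encode only when the Transaction actually changes...
  public void startTx(Transaction tx) throws IOException {
    encodedTx = txCodec.encode(tx);
  }

  public void updateTx(Transaction tx) throws IOException {
    encodedTx = txCodec.encode(tx);
  }

  // ...and reuse the cached bytes for every put/get/delete.
  public void addToOperation(OperationWithAttributes op) {
    op.setAttribute("tephra.tx", encodedTx);  // placeholder attribute key
  }
}
{code}

That would save the repeated encoding cost, though of course the bytes would 
still travel with every operation.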



> Transaction metadata sent on each put is too big
> ------------------------------------------------
>
>                 Key: TEPHRA-232
>                 URL: https://issues.apache.org/jira/browse/TEPHRA-232
>             Project: Tephra
>          Issue Type: Bug
>    Affects Versions: 0.11.0-incubating, 0.12.0-incubating
>         Environment: HBase 1.2.0-cdh5.11
> CentOS 7.3
> 4x machines
> Bandwidth between machines 1Gbps
>            Reporter: Micael Capitão
>            Assignee: Poorna Chandra
>            Priority: Minor
>
> I've been testing Tephra 0.11.0 (and more recently 0.12.0) for a project that 
> may need transactions on top of HBase, and I find its performance, for a bulk 
> load for instance, very poor. Let's not discuss why I am doing a bulk load 
> with transactions.
> In my use case I am generating batches of ~10000 elements and inserting them 
> with the *put(List<Put> puts)* method. There are no concurrent writers or 
> readers.
> If I do the put without transactions it takes ~0.5s. If I use the 
> *TransactionAwareHTable* it takes ~12s.
> In both cases the network bandwidth is fully utilised.
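> For reference, the write path I'm measuring boils down to the usual 
> TransactionContext pattern (names shortened, abort/error handling omitted):
> {code:java}
> TransactionAwareHTable txTable = new TransactionAwareHTable(hTable);
> TransactionContext txContext = new TransactionContext(txClient, txTable);
>
> txContext.start();
> txTable.put(puts);   // batch of ~10000 Puts
> txContext.finish();  // commit
> {code}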
> I've tracked down the performance killer to be the 
> *addToOperation(OperationWithAttributes op, Transaction tx)* method on the 
> TransactionAwareHTable.
> I've created a TransactionAwareHTableFix with the *addToOperation(txPut, tx)* 
> call commented out, used it in my code, and each batch went back to taking 
> ~0.5s.
> Then I checked what is done inside the *addToOperation* method and verified 
> that the issue comes from the serialization of the Transaction object. The 
> serialized Transaction object is 104171 bytes long. Given that this happens 
> for each put, my batch of ~10000 elements carries ~970MB of serialized 
> transactions, which explains the 12s vs ~0.5s, with the network saturated in 
> both cases.
> It seems that the transaction metadata, despite being sent to HBase, is not 
> stored, so the final table size, with or without transactions, is the same.
> Is this metadata encoding and sending behaviour expected? This is making 
> Tephra unusable, at least with only 1Gbps of bandwidth.


