[ https://issues.apache.org/jira/browse/TEPHRA-232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16040621#comment-16040621 ]

Micael Capitão commented on TEPHRA-232:
---------------------------------------

I've found the issue to be related to the number of invalid transactions I have 
piled up during my tests: 13015 invalid transactions, which explains the size of 
the serialized Transaction object. My question is: shouldn't those invalid 
transactions get cleared?
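
A minimal sketch of how the invalid list could be pruned from client code, 
assuming Tephra's TransactionSystemClient API (getInvalidSize, 
truncateInvalidTxBefore) and an already-configured client handle from the 
application's own setup; the 24-hour cutoff is only illustrative, and pruning is 
only safe once the data written by those invalid transactions has been cleaned up:

{code:java}
import java.util.concurrent.TimeUnit;

import org.apache.tephra.InvalidTruncateTimeException;
import org.apache.tephra.TransactionSystemClient;

public class InvalidTxCleanup {

  // txClient is assumed to be the same handle the application already obtains
  // (e.g. through Tephra's Guice modules).
  public static void pruneOldInvalidTx(TransactionSystemClient txClient)
      throws InvalidTruncateTimeException {
    // Number of invalid transactions currently tracked by the server; each of
    // these is carried in the Transaction that is serialized with every Put.
    int before = txClient.getInvalidSize();

    // Drop invalid transactions older than 24 hours (illustrative cutoff),
    // assuming their dirty data has already been removed, e.g. by major
    // compaction with the Tephra coprocessor installed.
    long cutoff = System.currentTimeMillis() - TimeUnit.HOURS.toMillis(24);
    txClient.truncateInvalidTxBefore(cutoff);

    int after = txClient.getInvalidSize();
    System.out.println("Invalid tx pruned: " + (before - after)
        + ", remaining: " + after);
  }
}
{code}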

> Transaction metadata sent on each put is too big
> ------------------------------------------------
>
>                 Key: TEPHRA-232
>                 URL: https://issues.apache.org/jira/browse/TEPHRA-232
>             Project: Tephra
>          Issue Type: Bug
>    Affects Versions: 0.11.0-incubating, 0.12.0-incubating
>         Environment: HBase 1.2.0-cdh5.11
> CentOS 7.3
> 4x machines
> Bandwidth between machines 1Gbps
>            Reporter: Micael Capitão
>            Assignee: Poorna Chandra
>            Priority: Critical
>
> I've been testing Tephra 0.11.0 (and more recently 0.12.0) for a project that 
> may need transactions on top of HBase, and I find its performance, for 
> instance for a bulk load, very poor. Let's not discuss why I am doing a bulk 
> load with transactions.
> In my use case I am generating batches of ~10000 elements and inserting them 
> with the *put(List<Put> puts)* method. There are no concurrent writers or 
> readers.
> If I do the put without transactions it takes ~0.5s. If I use the 
> *TransactionAwareHTable* it takes ~12s.
> In both cases the network bandwidth is fully utilised.
> I've tracked down the performance killer to be the 
> *addToOperation(OperationWithAttributes op, Transaction tx)* on the 
> TransactionAwareHTable.
> I've created a TransactionAwareHTableFix with the *addToOperation(txPut, tx)* 
> call commented out, used it in my code, and each batch started to take ~0.5s.
> Then I checked what is done inside the *addToOperation* method and verified 
> that the issue has to do with the serialization of the Transaction object. The 
> serialized Transaction object is 104171 bytes long. Since it is attached to 
> every put, my batch of ~10000 elements carries ~970MB of serialized 
> transactions, which explains the 12s vs ~0.5s, given that the network is 
> saturated in both cases.
> It seems that the transactions' metadata, despite being sent to HBase, is not 
> stored, so the final table size, with or without transactions, is the same.
> Is this metadata encoding and sending behaviour expected? It makes Tephra 
> unusable, at least with only 1Gbps of bandwidth.
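
A rough sketch of how the per-put overhead described above could be confirmed, 
assuming the Tephra 0.12 client API (TransactionContext, TransactionCodec, and 
TransactionAwareHTable for HBase 1.x); txClient, hTable and batchSize are 
placeholders for the application's own setup:

{code:java}
import java.io.IOException;

import org.apache.hadoop.hbase.client.HTableInterface;
import org.apache.tephra.Transaction;
import org.apache.tephra.TransactionCodec;
import org.apache.tephra.TransactionContext;
import org.apache.tephra.TransactionFailureException;
import org.apache.tephra.TransactionSystemClient;
import org.apache.tephra.hbase.TransactionAwareHTable;

public class TxOverheadCheck {

  // txClient and hTable are assumed to come from the application's existing setup.
  public static void measure(TransactionSystemClient txClient, HTableInterface hTable,
                             int batchSize) throws IOException, TransactionFailureException {
    TransactionAwareHTable txTable = new TransactionAwareHTable(hTable);
    TransactionContext context = new TransactionContext(txClient, txTable);

    context.start();
    Transaction tx = context.getCurrentTransaction();

    // This is the same encoding that addToOperation attaches to every Put,
    // and it includes the full invalid-transaction list.
    byte[] encoded = new TransactionCodec().encode(tx);
    System.out.println("Serialized Transaction: " + encoded.length + " bytes");
    System.out.println("Overhead for a batch of " + batchSize + " Puts: "
        + ((long) encoded.length * batchSize / (1024 * 1024)) + " MB");

    // Nothing was written; just roll the transaction back.
    context.abort();
  }
}
{code}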



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
