[ 
https://issues.apache.org/jira/browse/TEPHRA-232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Micael Capitão updated TEPHRA-232:
----------------------------------
    Affects Version/s: 0.11.0-incubating
                       0.12.0-incubating

> Transaction metadata sent on each put is too big
> ------------------------------------------------
>
>                 Key: TEPHRA-232
>                 URL: https://issues.apache.org/jira/browse/TEPHRA-232
>             Project: Tephra
>          Issue Type: Bug
>    Affects Versions: 0.11.0-incubating, 0.12.0-incubating
>         Environment: HBase 1.2.0-cdh5.11
> CentOS 7.3
> 4x machines
> Bandwidth between machines 1Gbps
>            Reporter: Micael Capitão
>            Assignee: Poorna Chandra
>            Priority: Critical
>
> I've been testing Tephra 0.11.0 (and more recently 0.12.0) for a project that 
> may need transactions on top of HBase, and I find its performance, for 
> instance for a bulk load, very poor. Let's not discuss why I am doing a bulk 
> load with transactions.
> In my use case I am generating batches of ~10000 elements and inserting them 
> with the *put(List<Put> puts)* method. There are no concurrent writers or 
> readers.
> If I do the put without transactions it takes ~0.5s. If I use the 
> *TransactionAwareHTable* it takes ~12s.
> In both cases the network bandwidth is fully utilised.
> I've tracked down the performance killer to be 
> *addToOperation(OperationWithAttributes op, Transaction tx)* on the 
> TransactionAwareHTable.
> I've created a TransactionAwareHTableFix with the call to 
> *addToOperation(txPut, tx)* commented out, used it in my code, and each batch 
> went back to taking ~0.5s.
> Then I checked what is done inside the *addToOperation* method and verified 
> that the issue has to do with the serialization of the Transaction object. 
> The serialized Transaction object is 104171 bytes long. Since this happens 
> for each put, my batch of ~10000 elements carries ~970MB of serialized 
> transactions, which explains the ~12s vs ~0.5s, given that the network is 
> already saturated.
> It seems that the transaction metadata, despite being sent to HBase, is not 
> stored, so the final table size, with or without transactions, is the same.
> Is this metadata encoding and sending behaviour expected? It is making Tephra 
> unusable, at least with only 1Gbps of bandwidth.
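The figures quoted in the report can be sanity-checked with plain arithmetic. This is a standalone sketch, not Tephra or HBase API; all numbers come from the description above:

```java
/**
 * Back-of-the-envelope check of the overhead described in TEPHRA-232.
 * Every constant below is taken from the issue report; nothing here
 * touches the Tephra or HBase APIs.
 */
public class TxOverheadEstimate {
    public static void main(String[] args) {
        long txMetadataBytes = 104_171L; // serialized Transaction attached to each Put
        long putsPerBatch = 10_000L;     // batch size used in the report

        // Total transaction metadata shipped per batch.
        long totalBytes = txMetadataBytes * putsPerBatch;

        // Time to push that metadata over the reported 1 Gbps link.
        double linkBitsPerSec = 1_000_000_000.0;
        double extraSeconds = totalBytes * 8 / linkBitsPerSec;

        System.out.printf("extra payload per batch: %.2f MB%n", totalBytes / 1e6);
        System.out.printf("extra wire time at 1 Gbps: %.1f s%n", extraSeconds);
    }
}
```

This gives roughly 1 GB of metadata per batch, in the same ballpark as the ~970MB quoted, and about 8.3 s of extra wire time at 1 Gbps on a saturated link, which is consistent with the observed ~12s vs ~0.5s gap.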



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
