[
https://issues.apache.org/jira/browse/TEPHRA-232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Micael Capitão updated TEPHRA-232:
----------------------------------
Affects Version/s: 0.11.0-incubating
                   0.12.0-incubating
> Transaction metadata sent on each put is too big
> ------------------------------------------------
>
> Key: TEPHRA-232
> URL: https://issues.apache.org/jira/browse/TEPHRA-232
> Project: Tephra
> Issue Type: Bug
> Affects Versions: 0.11.0-incubating, 0.12.0-incubating
> Environment: HBase 1.2.0-cdh5.11
> CentOS 7.3
> 4x machines
> Bandwidth between machines 1Gbps
> Reporter: Micael Capitão
> Assignee: Poorna Chandra
> Priority: Critical
>
> I've been testing Tephra 0.11.0 (and more recently 0.12.0) for a project that
> may need transactions on top of HBase, and I find its performance, for
> instance for a bulk load, very poor. Let's not discuss why I am doing a bulk
> load with transactions.
> In my use case I am generating batches of ~10000 elements and inserting them
> with the *put(List<Put> puts)* method. There are no concurrent writers or
> readers.
> If I do the put without transactions it takes ~0.5s. If I use the
> *TransactionAwareHTable* it takes ~12s.
> In both cases the network bandwidth is fully utilised.
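> For reference, each batch is written roughly like this. It is a sketch of my
> code, not the exact code: *txClient* is my already configured
> TransactionSystemClient, *hTable* the underlying table and *puts* the
> ~10000-element batch.
> {code:java}
> import java.util.List;
> import org.apache.hadoop.hbase.client.HTableInterface;
> import org.apache.hadoop.hbase.client.Put;
> import org.apache.tephra.TransactionContext;
> import org.apache.tephra.TransactionSystemClient;
> import org.apache.tephra.hbase.TransactionAwareHTable;
>
> // Sketch of the per-batch write path used in my tests.
> void writeBatch(TransactionSystemClient txClient, HTableInterface hTable,
>                 List<Put> puts) throws Exception {
>   TransactionAwareHTable txTable = new TransactionAwareHTable(hTable);
>   TransactionContext txContext = new TransactionContext(txClient, txTable);
>
>   txContext.start();
>   try {
>     txTable.put(puts);   // ~0.5s against the plain table, ~12s through TransactionAwareHTable
>     txContext.finish();
>   } catch (Exception e) {
>     txContext.abort();
>     throw e;
>   }
> }
> {code}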
> I've tracked the performance killer down to
> *addToOperation(OperationWithAttributes op, Transaction tx)* on the
> TransactionAwareHTable.
> I've created a TransactionAwareHTableFix with the call to *addToOperation(txPut,
> tx)* commented out and used it in my code, and each batch started to take ~0.5s.
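> (TransactionAwareHTableFix is literally a copy of TransactionAwareHTable with
> that single call commented out. If *addToOperation* happens to be overridable
> in your compat module, something along these lines should be equivalent; again
> a sketch, not the exact class I use.)
> {code:java}
> import org.apache.hadoop.hbase.client.HTableInterface;
> import org.apache.hadoop.hbase.client.OperationWithAttributes;
> import org.apache.tephra.Transaction;
> import org.apache.tephra.hbase.TransactionAwareHTable;
>
> // Sketch: skip attaching the serialized Transaction to every operation.
> public class TransactionAwareHTableFix extends TransactionAwareHTable {
>   public TransactionAwareHTableFix(HTableInterface hTable) {
>     super(hTable);
>   }
>
>   @Override
>   public void addToOperation(OperationWithAttributes op, Transaction tx) {
>     // no-op: the serialized Transaction is not attached to each Put
>   }
> }
> {code}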
> Then I checked what was being done inside the *addToOperation* method and
> verified that the issue has to do with the serialization of the Transaction
> object. The serialized Transaction object is 104171 bytes long. Considering
> that this happens for each put, my batch of ~10000 elements carries ~970MB of
> serialized transactions, which explains the ~12s vs ~0.5s while the network is
> saturated in both cases.
> It seems that the transaction metadata, despite being sent to HBase, is not
> stored: the final table size is the same with or without transactions.
> Is this metadata encoding and sending behaviour expected? It is making Tephra
> unusable, at least with only 1Gbps of bandwidth.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)