[ https://issues.apache.org/jira/browse/TEPHRA-232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16159316#comment-16159316 ]

Andreas Neumann commented on TEPHRA-232:
----------------------------------------

Two issues here:
1. The transaction is encoded and sent over and over again (addressed in 
TEPHRA-247 and TEPHRA-248).
2. Only the write pointer is needed for puts, which is much smaller (addressed 
in TEPHRA-234); a sketch of that idea follows below.

We'll keep this open to track it, but will push fixes individually for the 
other Jiras. 
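
For illustration, a minimal sketch of the TEPHRA-234 idea, assuming the client 
could attach just the 8-byte write pointer to each Put instead of the fully 
serialized Transaction. The attribute key and class name below are hypothetical, 
not actual Tephra constants:

    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.tephra.Transaction;

    public final class WritePointerOnly {
      // Hypothetical attribute key, for illustration only; the real Tephra
      // attribute carries the fully encoded Transaction, not just the write pointer.
      private static final String TX_WRITE_POINTER_ATTRIBUTE = "tephra.tx.writePointer";

      private WritePointerOnly() { }

      // Attaches 8 bytes per Put instead of a ~100 KB serialized Transaction.
      public static Put attachWritePointer(Put put, Transaction tx) {
        put.setAttribute(TX_WRITE_POINTER_ATTRIBUTE, Bytes.toBytes(tx.getWritePointer()));
        return put;
      }
    }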

> Transaction metadata sent on each put is too big
> ------------------------------------------------
>
>                 Key: TEPHRA-232
>                 URL: https://issues.apache.org/jira/browse/TEPHRA-232
>             Project: Tephra
>          Issue Type: Bug
>    Affects Versions: 0.11.0-incubating, 0.12.0-incubating
>         Environment: HBase 1.2.0-cdh5.11
> CentOS 7.3
> 4x machines
> Bandwidth between machines 1Gbps
>            Reporter: Micael Capitão
>            Assignee: Poorna Chandra
>            Priority: Minor
>
> I've been testing Tephra 0.11.0 (and more recently 0.12.0) for a project that 
> may need transactions on top of HBase, and I find its performance, for 
> instance for a bulk load, very poor. Let's not discuss why I am doing a bulk 
> load with transactions.
> In my use case I am generating batches of ~10000 elements and inserting them 
> with the *put(List<Put> puts)* method. There are no concurrent writers or 
> readers.
> If I do the put without transactions it takes ~0.5s. If I use the 
> *TransactionAwareHTable* it takes ~12s.
> In both cases the network bandwidth is fully utilised.
> I've tracked down the performance killer to be the 
> *addToOperation(OperationWithAttributes op, Transaction tx)* method on the 
> TransactionAwareHTable.
> I've created a TransactionAwareHTableFix with the *addToOperation(txPut, tx)* 
> call commented out, used it in my code, and each batch went back to taking 
> ~0.5s.
> Then I checked what was being done inside the *addToOperation* method and 
> verified that the issue has something to do with the serialization of the 
> Transaction object. The serialized Transaction object is 104171 bytes long. 
> Considering that this happens for each put, my batch of ~10000 elements 
> carries ~970MB of serialized transactions, which explains the ~12s vs ~0.5s, 
> given that the network bandwidth is already saturated.
> It seems that the transaction metadata, despite being sent to HBase, is not 
> stored, so the final table size, with or without transactions, is the same.
> Is this encode-and-send behaviour for the metadata expected? It is making 
> Tephra unusable, at least with only 1Gbps of bandwidth.
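
For reference, the size the reporter measured can be checked with Tephra's 
TransactionCodec. A minimal sketch, assuming a Transaction obtained from an 
active TransactionContext (the surrounding transaction setup is omitted):

    import java.io.IOException;
    import org.apache.tephra.Transaction;
    import org.apache.tephra.TransactionCodec;

    public final class TxSizeCheck {
      private TxSizeCheck() { }

      // 'tx' is assumed to come from an active TransactionContext; this prints
      // the size of the serialized Transaction that addToOperation attaches to
      // every Put.
      public static void printEncodedTxSize(Transaction tx) throws IOException {
        byte[] encoded = new TransactionCodec().encode(tx);
        System.out.println("serialized transaction: " + encoded.length + " bytes");
        // ~104 KB per Put x ~10000 Puts per batch ~= ~1 GB of attribute data
        // sent over the wire
      }
    }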


