Hi,

(I inadvertently deleted the previous reply, so this email is a follow-up to my own previous message.)

I probably have lots of invalidated transactions because of the first tests I was running, in which each transaction took more than 30s. It is possible that the invalidated transactions have piled up.

Below are the stats on the Transaction object. And yes, I have lots of invalid transactions, which explains the absurd size I am getting for the serialized representation. Where does Tephra store that list? ZooKeeper?

2017-06-07 09:50:08 INFO TransactionAwareHTableFix:109 - startTx Encoded transaction size: 104203 bytes
2017-06-07 09:50:08 INFO  TransactionAwareHTableFix:110 - inprogress Tx: 0
2017-06-07 09:50:08 INFO  TransactionAwareHTableFix:111 - invalid Tx: 13015
2017-06-07 09:50:08 INFO TransactionAwareHTableFix:112 - checkpoint write pointers: 0

Another question: can the Transaction object change outside the startTx and updateTx calls? I was wondering whether it really needs to be serialized on every single operation.
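If the answer is that it only changes on those calls, the encoded bytes could be cached and reused across puts in the same transaction. A minimal sketch of that idea (class names and the stand-in Tx type are mine, not Tephra's API; the real codec call would be txCodec.encode(tx)):

```java
public class CachingTxCodecSketch {
    // Hypothetical stand-in for org.apache.tephra.Transaction.
    static final class Tx {
        final long writePointer;
        final long[] invalids;
        Tx(long writePointer, long[] invalids) {
            this.writePointer = writePointer;
            this.invalids = invalids;
        }
    }

    private Tx lastTx;          // transaction seen on the last encode
    private byte[] lastEncoded; // cached serialized form
    int encodeCalls = 0;        // counts non-cached encodes, for illustration

    // Hypothetical "expensive" encode standing in for txCodec.encode(tx).
    private byte[] doEncode(Tx tx) {
        encodeCalls++;
        // Real Thrift serialization elided; size grows with the invalid list.
        return new byte[8 + 8 * tx.invalids.length];
    }

    // Re-encodes only when the Transaction instance changes, i.e. after
    // startTx/updateTx hand out a new object; otherwise returns the cache.
    public byte[] encode(Tx tx) {
        if (tx != lastTx) {
            lastEncoded = doEncode(tx);
            lastTx = tx;
        }
        return lastEncoded;
    }
}
```

With a 10000-element batch this would turn 10000 encode calls into one per transaction, assuming the Transaction really is stable between startTx/updateTx.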


Regards.

On 31/05/17 09:49, Micael Capitão wrote:
Hi all,

I've been testing Tephra 0.11.0 for a project that may need transactions on top of HBase, and I find its performance, for a bulk load for instance, very poor. Let's not discuss why I am doing a bulk load with transactions.

In my use case I am generating batches of ~10000 elements and inserting them with the *put(List<Put> puts)* method. There are no concurrent writers or readers. If I do the put without transactions it takes ~0.5s. If I use the *TransactionAwareHTable* it takes ~12s. I've tracked the performance killer down to *addToOperation(OperationWithAttributes op, Transaction tx)*, more specifically the *txCodec.encode(tx)* call.

I created a TransactionAwareHTableFix with the *addToOperation(txPut, tx)* call commented out, used it in my code, and each batch went back to taking ~0.5s.

I noticed that inside the *TransactionCodec* a new TSerializer and TDeserializer are instantiated on each call to encode/decode. I tried instantiating them once in the constructor instead, but even then each of my batches still took the same ~12s.

Further investigation showed me that the Transaction instance, after being encoded by the TransactionCodec, is 104171 bytes long. So in my 10000-element batch, roughly 1 GB is metadata. Is that supposed to happen?
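Just to make the arithmetic explicit (class and method names are mine, purely for illustration): the encoded transaction is attached to every Put, so the per-batch metadata is simply the encoded size times the batch size.

```java
public class TxOverheadMath {
    // Total metadata bytes: one encoded Transaction per Put in the batch.
    static long totalBytes(long encodedSizeBytes, long batchSize) {
        return encodedSizeBytes * batchSize;
    }

    public static void main(String[] args) {
        long total = totalBytes(104_171L, 10_000L); // 1,041,710,000 bytes
        System.out.println(total / (1024L * 1024L) + " MiB"); // ~993 MiB
    }
}
```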


Regards,

Micael Capitão

--

Micael Capitão
*BIG DATA ENGINEER*

*E-mail: *[email protected]
*Mobile: *(+351) 91 260 94 27 | *Skype*: micaelcapitao


Xpand IT | Delivering Innovation and Technology
Phone: (+351) 21 896 71 50
Fax: (+351) 21 896 71 51
Site: www.xpand-it.com

