Hi,
(I inadvertently deleted the previous reply, so this email is a
follow-up to my own previous message.)
I probably have lots of invalidated transactions because of the first
tests I was performing, which were taking more than 30s per transaction.
It is possible that the invalidated transactions have piled up.
Below are the stats on the Transaction object. And yes, I have lots of
invalid transactions, which explains the absurd size I am getting for
the serialized representation. Where does Tephra store that? ZooKeeper?
2017-06-07 09:50:08 INFO TransactionAwareHTableFix:109 - startTx Encoded transaction size: 104203 bytes
2017-06-07 09:50:08 INFO TransactionAwareHTableFix:110 - inprogress Tx: 0
2017-06-07 09:50:08 INFO TransactionAwareHTableFix:111 - invalid Tx: 13015
2017-06-07 09:50:08 INFO TransactionAwareHTableFix:112 - checkpoint write pointers: 0
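Incidentally, the numbers roughly add up if each entry in the invalid list is a transaction id serialized as an 8-byte long (that is my assumption about the wire format, not something I have verified in the codec):

```java
public class InvalidListSize {
    // Estimated serialized size of the invalid-transaction list,
    // assuming each transaction id is an 8-byte long (an assumption).
    static long estimate(long invalidTxCount) {
        return invalidTxCount * 8L;
    }

    public static void main(String[] args) {
        // 13015 invalid transactions, from the log above
        System.out.println(estimate(13015)); // prints 104120
    }
}
```

104,120 bytes is close to the 104,203 bytes logged, with the remainder presumably being the rest of the transaction fields.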
Another question: can the Transaction object change outside the
startTx and updateTx calls? I was wondering whether it is really
necessary to serialize it on every single operation.
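To illustrate what I mean, here is a sketch (hypothetical names, not Tephra's actual API) of caching the encoded bytes and re-encoding only when startTx/updateTx changes the transaction, instead of encoding on every put:

```java
// Hypothetical sketch: cache the encoded transaction and invalidate the
// cache only when the transaction changes. The names and the String
// stand-in for the Transaction object are illustrative only.
public class CachingTxEncoder {
    private byte[] cached;        // encoded form, valid until the tx changes
    private int encodeCalls = 0;  // demonstrates how often we actually encode

    // stand-in for the expensive txCodec.encode(tx)
    private byte[] encode(String tx) {
        encodeCalls++;
        return tx.getBytes();
    }

    // would be called from startTx/updateTx: the tx changed, drop the cache
    void onTxChanged() {
        cached = null;
    }

    // would be called from addToOperation for every single put
    byte[] encodedTx(String tx) {
        if (cached == null) {
            cached = encode(tx);
        }
        return cached;
    }

    int encodeCalls() {
        return encodeCalls;
    }
}
```

With something like this, a 10,000-put batch would pay the encoding cost once rather than 10,000 times.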
Regards.
On 31/05/17 09:49, Micael Capitão wrote:
Hi all,
I've been testing Tephra 0.11.0 for a project that may need
transactions on top of HBase, and I find its performance, for
instance for a bulk load, very poor. Let's not discuss why I am
doing a bulk load with transactions.
In my use case I am generating batches of ~10000 elements and
inserting them with the *put(List<Put> puts)* method. There are no
concurrent writers or readers.
If I do the put without transactions it takes ~0.5s. If I use the
*TransactionAwareHTable* it takes ~12s.
I've tracked down the performance killer to be the
*addToOperation(OperationWithAttributes op, Transaction tx)*, more
specifically the *txCodec.encode(tx)*.
I've created a TransactionAwareHTableFix with the
*addToOperation(txPut, tx)* call commented out and used it in my code;
each batch then started to take ~0.5s.
I've noticed that inside the *TransactionCodec* a new TSerializer and
TDeserializer are instantiated on each call to encode/decode. I tried
instantiating the serializer/deserializer in the constructor, but even
then each of my batches took the same ~12s.
Further investigation has shown me that the Transaction instance,
after being encoded by the TransactionCodec, is 104171 bytes long. So
in my 10000-element batch, ~970MB is metadata. Is that supposed to
happen?
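For the record, the per-batch overhead is just the encoded size multiplied by the number of puts; a back-of-the-envelope sketch (the exact figure depends on the real batch size, so it lands in the same ballpark as the ~970MB above):

```java
public class BatchOverhead {
    // Metadata bytes carried by one batch when every put embeds the
    // full encoded transaction.
    static long overheadBytes(long encodedTxSize, long putsPerBatch) {
        return encodedTxSize * putsPerBatch;
    }

    public static void main(String[] args) {
        long bytes = overheadBytes(104171, 10000);
        System.out.println(bytes);                 // 1041710000
        System.out.println(bytes / (1024 * 1024)); // ~993 MiB, close to a gigabyte
    }
}
```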
Regards,
Micael Capitão
--
Micael Capitão
*BIG DATA ENGINEER*
*E-mail: *[email protected]
*Mobile: *(+351) 91 260 94 27 | *Skype*: micaelcapitao
Xpand IT | Delivering Innovation and Technology
Phone: (+351) 21 896 71 50
Fax: (+351) 21 896 71 51
Site: www.xpand-it.com <http://www.xpand-it.com>
Facebook <http://www.xpand-it.com/facebook> Linkedin
<http://www.xpand-it.com/linkedin> Twitter
<http://www.xpand-it.com/twitter> Youtube <http://www.xpand-it.com/youtube>