Hi Micael,

The transaction state is kept in memory by the transaction manager, and its
edits are written to a write-ahead log so that the state can be reconstructed
after a failure.
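
To be concrete about where that lives (the property name is from memory, so
please verify it against TxConstants in your version): the snapshots and
write-ahead logs are written to a directory on the cluster file system (HDFS
by default), not to ZooKeeper. A minimal, hypothetical configuration sketch:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    // Hypothetical sketch: the directory where the transaction manager
    // persists its snapshots and WALs (the path here is just an illustration).
    Configuration conf = HBaseConfiguration.create();
    conf.set("data.tx.snapshot.dir", "/tephra/snapshots");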

You are right that the transaction object does not need to be serialized
for each put: I opened two improvement Jiras (TEPHRA-233 and -234) to
address this.
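
To sketch the idea behind those Jiras (illustrative only, not the actual
patch; the attribute-key constant is quoted from memory): the encoded
transaction could be cached once per startTx/updateTx and reused for every
operation, instead of calling txCodec.encode(tx) for each put. In a subclass
or patched copy of TransactionAwareHTable, along the lines of your
TransactionAwareHTableFix, that would look roughly like:

    // Illustrative fragment: cache the encoded transaction once and reuse
    // the bytes for every Put, rather than re-encoding per operation.
    private final TransactionCodec txCodec = new TransactionCodec();
    private byte[] encodedTx;  // hypothetical cache field

    @Override
    public void startTx(Transaction tx) {
      super.startTx(tx);
      encodedTx = encode(tx);          // encode once per transaction
    }

    @Override
    public void updateTx(Transaction tx) {
      super.updateTx(tx);
      encodedTx = encode(tx);          // refresh after checkpoints
    }

    private byte[] encode(Transaction tx) {
      try {
        return txCodec.encode(tx);
      } catch (IOException e) {
        throw new RuntimeException("Unable to encode transaction", e);
      }
    }

    protected void addToOperation(OperationWithAttributes op, Transaction tx) {
      // Reuse the cached bytes instead of re-encoding the transaction.
      op.setAttribute(TxConstants.TX_OPERATION_ATTRIBUTE_KEY, encodedTx);
    }

Note this only removes the repeated encoding cost; with thousands of invalid
transactions in the list, the attribute on every Put would still be ~100KB,
so cleaning up the invalid list matters either way.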

Were you able to clean up the transaction state and rerun your benchmark?
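
For the cleanup itself, something along these lines should work (the method
name is from memory, so please double-check it against TransactionSystemClient
in 0.11.0):

    // Hedged sketch: truncate the invalid-transaction list up to "now".
    // Since your log shows 0 in-progress transactions, this should clear
    // all 13015 invalid entries; txClient is your TransactionSystemClient.
    long cutoff = System.currentTimeMillis();
    try {
      txClient.truncateInvalidTxBefore(cutoff);
    } catch (Exception e) {
      // Fails if transactions started before the cutoff are still in progress.
    }

After that, the encoded Transaction should shrink dramatically.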

Cheers -Andreas.

On Thu, Jun 8, 2017 at 2:02 AM, Micael Capitão <[email protected]>
wrote:

> Hi,
>
> (I had inadvertently deleted the previous reply, so this email is a
> follow-up to my previous message.)
>
> I probably have lots of invalidated transactions because of the first
> tests I was performing, which were taking more than 30s per transaction. It
> is possible that the invalidated transactions have piled up.
>
> Below are the stats on the Transaction object. And yes, I have lots of
> invalid transactions, which explains the absurd size I am getting for the
> serialized representation. Where does Tephra store that? ZooKeeper?
>
> 2017-06-07 09:50:08 INFO  TransactionAwareHTableFix:109 - startTx Encoded
> transaction size: 104203 bytes
> 2017-06-07 09:50:08 INFO  TransactionAwareHTableFix:110 - inprogress Tx: 0
> 2017-06-07 09:50:08 INFO  TransactionAwareHTableFix:111 - invalid Tx: 13015
> 2017-06-07 09:50:08 INFO  TransactionAwareHTableFix:112 - checkpoint write
> pointers: 0
>
> Another question: can the Transaction object change outside of the
> startTx and updateTx calls? I was wondering whether it is really necessary
> to serialize it for each single operation.
>
>
> Regards.
>
>
> On 31/05/17 09:49, Micael Capitão wrote:
>
>> Hi all,
>>
>> I've been testing Tephra 0.11.0 for a project that may need transactions
>> on top of HBase, and I find its performance, for instance for a bulk load,
>> very poor. Let's not discuss why I am doing a bulk load with transactions.
>>
>> In my use case I am generating batches of ~10000 elements and inserting
>> them with the *put(List<Put> puts)* method. There are no concurrent writers
>> or readers.
>> If I do the put without transactions it takes ~0.5s. If I use the
>> *TransactionAwareHTable* it takes ~12s.
>> I've tracked down the performance killer to be the
>> *addToOperation(OperationWithAttributes op, Transaction tx)*, more
>> specifically the *txCodec.encode(tx)*.
>>
>> I've created a TransactionAwareHTableFix with the *addToOperation(txPut,
>> tx)* call commented out, used it in my code, and each batch started to take
>> ~0.5s.
>>
>> I've noticed that inside the *TransactionCodec* you were instantiating a
>> new TSerializer and TDeserializer on each call to encode/decode. I tried
>> instantiating the ser/deser in the constructor, but even that way each of my
>> batches would take the same ~12s.
>>
>> Further investigation has shown me that the Transaction instance, after
>> being encoded by the TransactionCodec, is 104171 bytes long. So in my
>> 10000-element batch, ~970MB is metadata. Is that supposed to happen?
>>
>>
>> Regards,
>>
>> Micael Capitão
>>
>
> --
>
> Micael Capitão
> *BIG DATA ENGINEER*
>
> *E-mail: *[email protected]
> *Mobile: *(+351) 91 260 94 27 | *Skype*: micaelcapitao
>
> Xpand IT | Delivering Innovation and Technology
> Phone: (+351) 21 896 71 50
> Fax: (+351) 21 896 71 51
> Site: www.xpand-it.com <http://www.xpand-it.com>
>
> Facebook <http://www.xpand-it.com/facebook> | Linkedin <http://www.xpand-it.com/linkedin>
> Twitter <http://www.xpand-it.com/twitter> | Youtube <http://www.xpand-it.com/youtube>
>
>
