Hi,

We have a problem when we are writing lots of records to HBase.
We are not specifying timestamps explicitly and so the situation arises where 
multiple records are being written in the same millisecond.
Unfortunately, when two records are written with the same timestamp, the 
later write is treated as an update of the earlier record rather than as a 
separate record, and separate records are what we want.
So we want to be able to guarantee that records are not treated as overwrites 
(unless we explicitly make them so).

As I understand it there are (at least) two different ways to proceed.

The first approach is to increase the resolution of the timestamp.
So we could use something like java.lang.System.nanoTime().
However, although this seems to ameliorate the problem, it seems to introduce 
other problems.
Also, ideally we would like something that guarantees that we don't lose 
writes, rather than just making lost writes less likely.

The second approach is to write a prePut co-processor.
In the prePut I can do a read using the same rowkey, column family and column 
qualifier and omit the timestamp.
As I understand it, this will return the latest timestamp for that cell.
Then I can update the timestamp that I am going to write, if necessary, to make 
sure that the timestamp is always unique.
In this way I can guarantee that none of my writes are accidentally turned into 
updates.
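
A minimal sketch of the "update the timestamp if necessary" step, in plain 
Java with no HBase dependency. The class name UniqueTimestamps and the method 
next are hypothetical, and the sketch assumes a single generator per cell (or 
per process); it does not show the coprocessor hook or the read that would 
supply the last stored timestamp.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper: hands out strictly increasing timestamps, so two
// writes that arrive in the same millisecond never share a timestamp.
final class UniqueTimestamps {
    // Last timestamp issued by this generator.
    private final AtomicLong last = new AtomicLong(Long.MIN_VALUE);

    // Returns proposedMillis if it is newer than everything issued so far,
    // otherwise the last issued timestamp plus one.
    long next(long proposedMillis) {
        return last.updateAndGet(prev -> Math.max(prev + 1, proposedMillis));
    }

    // Convenience overload that proposes the current wall-clock time.
    long next() {
        return next(System.currentTimeMillis());
    }
}
```

Calling next(1000) three times in a row yields 1000, 1001 and 1002, and a 
following next(1001) yields 1003. Note that this only guarantees uniqueness 
among callers sharing one generator instance, and that under a sustained rate 
above one write per millisecond the issued timestamps drift ahead of the wall 
clock.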

However this approach seems to be expensive.
I have to do a read before each write, and although (I believe) it will be on 
the same region server, it's still going to slow things down a lot.
Also I am assuming that the prePut co-processor is executed inside a record 
lock so that I don't have to worry about synchronization.
Is this true?

Is there a better way?

Maybe there is some implementation of this already that I can pick up?

Maybe there is some way that I can implement this more efficiently?

It seems to me that this might be better handled at compaction.
Shouldn't there be some way that I can mark writes with some sort of special 
value of timestamp that means that this write should never be considered as an 
update but always as a separate write?

Any advice gratefully received.

Peter Marron
