How to find out whether an HBase PUT inserts a new row or updates an existing row

2014-11-03 Thread Bora, Venu
Hello,
We have a requirement to determine whether a PUT will create a new row or 
update an existing one. I looked at using preBatchMutate in a co-processor and 
have the code below.

A few things I need to ask:
1) Is there a more efficient way of doing this?
2) Will region.getClosestRowBefore() add additional I/O to go to disk? Or will 
the row be in memory since the row lock was already acquired before 
preBatchMutate is called?
3) Will region.getClosestRowBefore() always give the correct result? Or are 
there scenarios where the previous state will not be visible?


@Override
public void preBatchMutate(ObserverContext<RegionCoprocessorEnvironment> c,
        MiniBatchOperationInProgress<Mutation> miniBatchOp) throws IOException {
    for (int i = 0; i < miniBatchOp.size(); i++) {
        Mutation operation = miniBatchOp.getOperation(i);
        byte[] rowKey = operation.getRow();
        NavigableMap<byte[], List<Cell>> familyCellMap =
                operation.getFamilyCellMap();

        for (Entry<byte[], List<Cell>> entry : familyCellMap.entrySet()) {
            for (Iterator<Cell> iterator = entry.getValue().iterator();
                    iterator.hasNext();) {
                Cell cell = iterator.next();
                byte[] family = CellUtil.cloneFamily(cell);
                Result closestRowBefore =
                        c.getEnvironment().getRegion().getClosestRowBefore(rowKey, family);
                // closestRowBefore would return null if there is no
                // record for the rowKey and family
                if (closestRowBefore != null) {
                    // PUT is doing an update for the given rowKey, family
                } else {
                    // PUT is doing an insert for the given rowKey, family
                }
            }
        }
    }
    super.preBatchMutate(c, miniBatchOp);
}


Thanks
Venu Bora








Re: How to find out whether an HBase PUT inserts a new row or updates an existing row

2014-10-31 Thread Ted Yu
For #2, region.getClosestRowBefore() calls store.getRowKeyAtOrBefore(row)

Take a look at HStore#getRowKeyAtOrBefore() (around line 1619):

  Iterator<StoreFile> sfIterator = this.storeEngine
      .getStoreFileManager()
      .getCandidateFilesForRowKeyBefore(state.getTargetKey());
  while (sfIterator.hasNext()) {
    StoreFile sf = sfIterator.next();
    sfIterator.remove(); // Remove sf from iterator.
    boolean haveNewCandidate = rowAtOrBeforeFromStoreFile(sf, state);

In short, I/O is likely in this code path.
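
For #3, it may also help to keep in mind that getClosestRowBefore() returns the row at or immediately before the given key, so a non-null result does not by itself mean that rowKey already exists. Below is a minimal sketch of that lookup behavior. It assumes a java.util.TreeMap with String keys as a hypothetical stand-in for the sorted rows of one column family; real row keys are byte[], and none of the class or method names in the sketch are HBase API.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy model only: a sorted map stands in for the rows of one column family.
class ClosestRowDemo {

    /** Hypothetical sample data: two stored rows, sorted by key. */
    static NavigableMap<String, String> sampleRegion() {
        NavigableMap<String, String> rows = new TreeMap<>();
        rows.put("row-100", "a");
        rows.put("row-200", "b");
        return rows;
    }

    /** Models an "at or before" lookup: greatest stored key <= rowKey, or null. */
    static String closestRowAtOrBefore(NavigableMap<String, String> rows, String rowKey) {
        return rows.floorKey(rowKey);
    }

    /** The row exists only if the closest key is the requested key itself. */
    static boolean rowExists(NavigableMap<String, String> rows, String rowKey) {
        return rowKey.equals(closestRowAtOrBefore(rows, rowKey));
    }

    public static void main(String[] args) {
        NavigableMap<String, String> rows = sampleRegion();
        // "row-150" is not stored, yet the at-or-before lookup is non-null.
        System.out.println(closestRowAtOrBefore(rows, "row-150")); // row-100
        System.out.println(rowExists(rows, "row-150"));            // false
        System.out.println(rowExists(rows, "row-200"));            // true
        // Nothing sorts at or before "row-050", so the lookup returns null.
        System.out.println(closestRowAtOrBefore(rows, "row-050")); // null
    }
}
```

In the coprocessor, the analogous check would be to compare the row of the returned Result with rowKey, for example Bytes.equals(closestRowBefore.getRow(), rowKey), before treating the PUT as an update.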

On Fri, Oct 31, 2014 at 2:19 PM, Bora, Venu <venu.b...@epsilon.com> wrote:

> Hello,
> We have a requirement to determine whether a PUT will create a new row or
> update an existing one. I looked at using preBatchMutate in a co-processor
> and have the code below.
>
> A few things I need to ask:
> 1) Is there a more efficient way of doing this?
> 2) Will region.getClosestRowBefore() add additional I/O to go to disk? Or
> will the row be in memory since the row lock was already acquired before
> preBatchMutate is called?
> 3) Will region.getClosestRowBefore() always give the correct result? Or
> are there scenarios where the previous state will not be visible?
>
>
> @Override
> public void preBatchMutate(ObserverContext<RegionCoprocessorEnvironment> c,
>         MiniBatchOperationInProgress<Mutation> miniBatchOp) throws IOException {
>     for (int i = 0; i < miniBatchOp.size(); i++) {
>         Mutation operation = miniBatchOp.getOperation(i);
>         byte[] rowKey = operation.getRow();
>         NavigableMap<byte[], List<Cell>> familyCellMap =
>                 operation.getFamilyCellMap();
>
>         for (Entry<byte[], List<Cell>> entry : familyCellMap.entrySet()) {
>             for (Iterator<Cell> iterator = entry.getValue().iterator();
>                     iterator.hasNext();) {
>                 Cell cell = iterator.next();
>                 byte[] family = CellUtil.cloneFamily(cell);
>                 Result closestRowBefore =
>                         c.getEnvironment().getRegion().getClosestRowBefore(rowKey, family);
>                 // closestRowBefore would return null if there is no
>                 // record for the rowKey and family
>                 if (closestRowBefore != null) {
>                     // PUT is doing an update for the given rowKey, family
>                 } else {
>                     // PUT is doing an insert for the given rowKey, family
>                 }
>             }
>         }
>     }
>     super.preBatchMutate(c, miniBatchOp);
> }
>
>
> Thanks
> Venu Bora

 
