How to find out whether an HBase PUT inserts a new row or updates an existing row
Hello,

We have a requirement to determine whether a PUT will create a new row or update an existing one. I looked at using preBatchMutate in a coprocessor and have the code below. A few things I need to ask:

1) Is there a more efficient way of doing this?
2) Will region.getClosestRowBefore() add additional I/O to go to disk? Or will the row be in memory since the row lock was already acquired before preBatchMutate is called?
3) Will region.getClosestRowBefore() always give the correct result? Or are there scenarios where the previous state will not be visible?

    @Override
    public void preBatchMutate(ObserverContext<RegionCoprocessorEnvironment> c,
            MiniBatchOperationInProgress<Mutation> miniBatchOp) throws IOException {
        for (int i = 0; i < miniBatchOp.size(); i++) {
            Mutation operation = miniBatchOp.getOperation(i);
            byte[] rowKey = operation.getRow();
            NavigableMap<byte[], List<Cell>> familyCellMap = operation.getFamilyCellMap();
            for (Entry<byte[], List<Cell>> entry : familyCellMap.entrySet()) {
                for (Iterator<Cell> iterator = entry.getValue().iterator(); iterator.hasNext();) {
                    Cell cell = iterator.next();
                    byte[] family = CellUtil.cloneFamily(cell);
                    Result closestRowBefore =
                            c.getEnvironment().getRegion().getClosestRowBefore(rowKey, family);
                    // closestRowBefore would return null if there is no record for the rowKey and family
                    if (closestRowBefore != null) {
                        // PUT is doing an update for the given rowKey, family
                    } else {
                        // PUT is doing an insert for the given rowKey, family
                    }
                }
            }
        }
        super.preBatchMutate(c, miniBatchOp);
    }

Thanks,
Venu Bora

This e-mail and files transmitted with it are confidential, and are intended solely for the use of the individual or entity to whom this e-mail is addressed. If you are not the intended recipient, or the employee or agent responsible to deliver it to the intended recipient, you are hereby notified that any dissemination, distribution or copying of this communication is strictly prohibited.
If you are not one of the named recipient(s) or otherwise have reason to believe that you received this message in error, please immediately notify sender by e-mail, and destroy the original message. Thank You.
Re: How to find out whether an HBase PUT inserts a new row or updates an existing row
For #2, region.getClosestRowBefore() calls store.getRowKeyAtOrBefore(row).

Take a look at HStore#getRowKeyAtOrBefore() (around line 1619):

    Iterator<StoreFile> sfIterator = this.storeEngine
        .getStoreFileManager()
        .getCandidateFilesForRowKeyBefore(state.getTargetKey());
    while (sfIterator.hasNext()) {
      StoreFile sf = sfIterator.next();
      sfIterator.remove(); // Remove sf from iterator.
      boolean haveNewCandidate = rowAtOrBeforeFromStoreFile(sf, state);

In short, I/O is likely in this code path.

On Fri, Oct 31, 2014 at 2:19 PM, Bora, Venu <venu.b...@epsilon.com> wrote:

Hello, We have a requirement to determine whether a PUT will create a new row or update an existing one. I looked at using preBatchMutate in a co-processor and have the code below. Few things I need to ask:

1) Is there a more efficient way of doing this?
2) Will region.getClosestRowBefore() add additional IO to go to disk? or will the row be in memory since the row lock was already acquired before preBatchMutate is called?
3) Will region.getClosestRowBefore() always give the correct result? Or are there scenarios where the previous state will not be visible?
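
A note on question 3: as the name getRowKeyAtOrBefore() suggests, the lookup discussed in this thread has "at or before" semantics, so a non-null result may belong to a preceding row rather than to the row being written. The sketch below is a toy model only, not HBase code: it assumes a java.util.TreeMap with String keys as a stand-in for one column family of a region, and the class and method names (ClosestRowSketch, closestRowAtOrBefore, looksLikeUpdateNaive, isUpdate) are hypothetical. It shows why the returned row key probably needs to be compared with the PUT's row key before treating the PUT as an update.

```java
import java.util.TreeMap;

// Toy model, NOT the HBase API: a sorted map stands in for one column family.
final class ClosestRowSketch {
    // Sorted row keys -> value, mimicking a store ordered by row key.
    private final TreeMap<String, String> rows = new TreeMap<>();

    void put(String rowKey, String value) {
        rows.put(rowKey, value);
    }

    // "At or before" lookup: the exact row if present, else the nearest
    // preceding row, else null when nothing sorts at or before rowKey.
    String closestRowAtOrBefore(String rowKey) {
        return rows.floorKey(rowKey);
    }

    // Naive check: treats any non-null lookup result as "row exists".
    boolean looksLikeUpdateNaive(String rowKey) {
        return closestRowAtOrBefore(rowKey) != null;
    }

    // Stricter check: the row found must be the row being written.
    boolean isUpdate(String rowKey) {
        return rowKey.equals(closestRowAtOrBefore(rowKey));
    }

    public static void main(String[] args) {
        ClosestRowSketch region = new ClosestRowSketch();
        region.put("row-10", "existing value");

        // "row-20" does not exist, but "row-10" sorts before it.
        System.out.println("naive says update:  " + region.looksLikeUpdateNaive("row-20"));
        System.out.println("strict says update: " + region.isUpdate("row-20"));
    }
}
```

In this model the naive check reports an update for "row-20" even though only "row-10" is stored, while the key comparison reports an insert. Whether the same adjustment is sufficient inside a real coprocessor depends on the HBase version and is not something this sketch establishes.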