> On Dec. 22, 2014, 11:42 p.m., Christopher Tubbs wrote:
> > I'm curious about the expected behavior for when the source data contains 
> > deletes. It seems that those deletes will be merged into the destination, 
> > clobbering any non-deleted data in the destination. Clobbering (or at 
> > least, some sensible merge, based on timestamp) is probably expected 
> > behavior for non-deleted data from the source, but deleted data is tricky, 
> > because the user has no insight into the current state of deletes in the 
> > source. They can "cloneInto" in one instant and have deletes clobber their 
> > destination, and then do it again without any clobbering (because the 
> > source has since compacted).
> > 
> > To prevent this behavior, it seems you'd need to either 1) force a 
> > compaction, or 2) add metadata to the value of the file entry instructing 
> > the tserver to ignore deleted data (delete markers and any data it is 
> > hiding) coming from that particular file, before merging with the other 
> > files, similar to how we re-write timestamps when reading data from 
> > bulk-imported files.
> > 
> > This unexpected behavior can occur even if all other parts of the cloneInto 
> > feature were working perfectly. It gets even more complicated and 
> > unexpected when we begin considering that the user's view of the data is 
> > different than what the underlying data actually is in other ways, as the 
> > result of iterators. It may not be obvious that they are injecting the 
> > underlying data, and not their view of it, into another table. How do you 
> > propose to address that? With additional permissions to ensure not just 
> > everybody can perform this action? With a check to verify that the same 
> > iterators exist on the destination table that existed on the source table?

I see your concerns no different as those for a standard bulk import. And these 
concerns are currently handled by the user doing the bulk import needs to be 
aware of the data they're calling the command on.


- John


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27198/#review65843
-----------------------------------------------------------


On Dec. 22, 2014, 7:59 p.m., John Vines wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/27198/
> -----------------------------------------------------------
> 
> (Updated Dec. 22, 2014, 7:59 p.m.)
> 
> 
> Review request for accumulo.
> 
> 
> Bugs: ACCUMULO-3236
>     https://issues.apache.org/jira/browse/ACCUMULO-3236
> 
> 
> Repository: accumulo
> 
> 
> Description
> -------
> 
> Includes all code to support feature, including thrift changes
> Includes minor code cleanup to TableLocator and items in the Bulk path to 
> remove signature items that are unused (arguments & exceptions)
> Includes renaming of some bulk import functions to clarify their purpose 
> (because they're now multi-purpose)
> 
> Patch is based on 1.6, but we can choose to make it target only 1.7 if we 
> choose (this conversation should be taken up on jira, not in RB)
> 
> 
> Diffs
> -----
> 
>   
> core/src/main/java/org/apache/accumulo/core/client/admin/TableOperations.java 
> 97f538d 
>   
> core/src/main/java/org/apache/accumulo/core/client/impl/RootTabletLocator.java
>  97d476b 
>   
> core/src/main/java/org/apache/accumulo/core/client/impl/TableOperationsImpl.java
>  07df1bd 
>   core/src/main/java/org/apache/accumulo/core/client/impl/TabletLocator.java 
> e396d82 
>   
> core/src/main/java/org/apache/accumulo/core/client/impl/TabletLocatorImpl.java
>  c550f15 
>   
> core/src/main/java/org/apache/accumulo/core/client/impl/TimeoutTabletLocator.java
>  bcbe561 
>   
> core/src/main/java/org/apache/accumulo/core/client/impl/thrift/TableOperation.java
>  7716823 
>   
> core/src/main/java/org/apache/accumulo/core/client/mock/MockTableOperationsImpl.java
>  de19137 
>   
> core/src/main/java/org/apache/accumulo/core/client/mock/impl/MockTabletLocator.java
>  35f160f 
>   
> core/src/main/java/org/apache/accumulo/core/master/thrift/FateOperation.java 
> f65f552 
>   
> core/src/main/java/org/apache/accumulo/core/tabletserver/thrift/TabletClientService.java
>  2ba7674 
>   core/src/main/thrift/client.thrift 38a8076 
>   core/src/main/thrift/master.thrift 38e9227 
>   core/src/main/thrift/tabletserver.thrift 25e0b10 
>   
> core/src/test/java/org/apache/accumulo/core/client/admin/TableOperationsHelperTest.java
>  1d91574 
>   
> core/src/test/java/org/apache/accumulo/core/client/impl/TableOperationsHelperTest.java
>  02838ed 
>   
> server/base/src/main/java/org/apache/accumulo/server/client/BulkImporter.java 
> 4cc13a9 
>   
> server/base/src/main/java/org/apache/accumulo/server/client/ClientServiceHandler.java
>  fe17a62 
>   
> server/base/src/main/java/org/apache/accumulo/server/security/SecurityOperation.java
>  258080c 
>   
> server/base/src/test/java/org/apache/accumulo/server/client/BulkImporterTest.java
>  3680341 
>   
> server/master/src/main/java/org/apache/accumulo/master/FateServiceHandler.java
>  5818da3 
>   
> server/master/src/main/java/org/apache/accumulo/master/tableOps/CloneIntoTable.java
>  PRE-CREATION 
>   server/tserver/src/main/java/org/apache/accumulo/tserver/Tablet.java 
> 9a07a4a 
>   server/tserver/src/main/java/org/apache/accumulo/tserver/TabletServer.java 
> 3f594cc 
>   
> test/src/main/java/org/apache/accumulo/test/performance/thrift/NullTserver.java
>  0591b19 
>   test/src/test/java/org/apache/accumulo/test/functional/CloneIntoIT.java 
> PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/27198/diff/
> 
> 
> Testing
> -------
> 
> Includes CloneIntoIT, which exercises all permutations of the flags. Existing 
> BulkIT still functions as intended for validation of no feature loss in 
> refactoring exiting code for multi-purposing.
> 
> 
> Thanks,
> 
> John Vines
> 
>

Reply via email to