-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27198/#review65843
-----------------------------------------------------------


I'm curious about the expected behavior for when the source data contains 
deletes. It seems that those deletes will be merged into the destination, 
clobbering any non-deleted data in the destination. Clobbering (or at least, 
some sensible merge, based on timestamp) is probably expected behavior for 
non-deleted data from the source, but deleted data is tricky, because the user 
has no insight into the current state of deletes in the source. They can 
"cloneInto" in one instant and have deletes clobber their destination, and then 
do it again without any clobbering (because the source has since compacted).

To prevent this behavior, it seems you'd need to either 1) force a compaction, 
or 2) add metadata to the value of the file entry instructing the tserver to 
ignore deleted data (delete markers and any data it is hiding) coming from that 
particular file, before merging with the other files, similar to how we 
re-write timestamps when reading data from bulk-imported files.

This unexpected behavior can occur even if all other parts of the cloneInto 
feature were working perfectly. It gets even more complicated and unexpected 
when we begin considering that the user's view of the data is different than 
what the underlying data actually is in other ways, as the result of iterators. 
It may not be obvious that they are injecting the underlying data, and not 
their view of it, into another table. How do you propose to address that? With 
additional permissions to ensure not just everybody can perform this action? 
With a check to verify that the same iterators exist on the destination table 
that existed on the source table?

- Christopher Tubbs


On Dec. 22, 2014, 2:59 p.m., John Vines wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/27198/
> -----------------------------------------------------------
> 
> (Updated Dec. 22, 2014, 2:59 p.m.)
> 
> 
> Review request for accumulo.
> 
> 
> Bugs: ACCUMULO-3236
>     https://issues.apache.org/jira/browse/ACCUMULO-3236
> 
> 
> Repository: accumulo
> 
> 
> Description
> -------
> 
> Includes all code to support feature, including thrift changes
> Includes minor code cleanup to TableLocator and items in the Bulk path to 
> remove signature items that are unused (arguments & exceptions)
> Includes renaming of some bulk import functions to clarify their purpose 
> (because they're now multi-purpose)
> 
> Patch is based on 1.6, but we can choose to make it target only 1.7 if we 
> choose (this conversation should be taken up on jira, not in RB)
> 
> 
> Diffs
> -----
> 
>   
> core/src/main/java/org/apache/accumulo/core/client/admin/TableOperations.java 
> 97f538d 
>   
> core/src/main/java/org/apache/accumulo/core/client/impl/RootTabletLocator.java
>  97d476b 
>   
> core/src/main/java/org/apache/accumulo/core/client/impl/TableOperationsImpl.java
>  07df1bd 
>   core/src/main/java/org/apache/accumulo/core/client/impl/TabletLocator.java 
> e396d82 
>   
> core/src/main/java/org/apache/accumulo/core/client/impl/TabletLocatorImpl.java
>  c550f15 
>   
> core/src/main/java/org/apache/accumulo/core/client/impl/TimeoutTabletLocator.java
>  bcbe561 
>   
> core/src/main/java/org/apache/accumulo/core/client/impl/thrift/TableOperation.java
>  7716823 
>   
> core/src/main/java/org/apache/accumulo/core/client/mock/MockTableOperationsImpl.java
>  de19137 
>   
> core/src/main/java/org/apache/accumulo/core/client/mock/impl/MockTabletLocator.java
>  35f160f 
>   
> core/src/main/java/org/apache/accumulo/core/master/thrift/FateOperation.java 
> f65f552 
>   
> core/src/main/java/org/apache/accumulo/core/tabletserver/thrift/TabletClientService.java
>  2ba7674 
>   core/src/main/thrift/client.thrift 38a8076 
>   core/src/main/thrift/master.thrift 38e9227 
>   core/src/main/thrift/tabletserver.thrift 25e0b10 
>   
> core/src/test/java/org/apache/accumulo/core/client/admin/TableOperationsHelperTest.java
>  1d91574 
>   
> core/src/test/java/org/apache/accumulo/core/client/impl/TableOperationsHelperTest.java
>  02838ed 
>   
> server/base/src/main/java/org/apache/accumulo/server/client/BulkImporter.java 
> 4cc13a9 
>   
> server/base/src/main/java/org/apache/accumulo/server/client/ClientServiceHandler.java
>  fe17a62 
>   
> server/base/src/main/java/org/apache/accumulo/server/security/SecurityOperation.java
>  258080c 
>   
> server/base/src/test/java/org/apache/accumulo/server/client/BulkImporterTest.java
>  3680341 
>   
> server/master/src/main/java/org/apache/accumulo/master/FateServiceHandler.java
>  5818da3 
>   
> server/master/src/main/java/org/apache/accumulo/master/tableOps/CloneIntoTable.java
>  PRE-CREATION 
>   server/tserver/src/main/java/org/apache/accumulo/tserver/Tablet.java 
> 9a07a4a 
>   server/tserver/src/main/java/org/apache/accumulo/tserver/TabletServer.java 
> 3f594cc 
>   
> test/src/main/java/org/apache/accumulo/test/performance/thrift/NullTserver.java
>  0591b19 
>   test/src/test/java/org/apache/accumulo/test/functional/CloneIntoIT.java 
> PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/27198/diff/
> 
> 
> Testing
> -------
> 
> Includes CloneIntoIT, which exercises all permutations of the flags. Existing 
> BulkIT still functions as intended for validation of no feature loss in 
> refactoring exiting code for multi-purposing.
> 
> 
> Thanks,
> 
> John Vines
> 
>

Reply via email to