> On Dec. 22, 2014, 11:42 p.m., Christopher Tubbs wrote:
> > I'm curious about the expected behavior for when the source data contains 
> > deletes. It seems that those deletes will be merged into the destination, 
> > clobbering any non-deleted data in the destination. Clobbering (or at 
> > least, some sensible merge, based on timestamp) is probably expected 
> > behavior for non-deleted data from the source, but deleted data is tricky, 
> > because the user has no insight into the current state of deletes in the 
> > source. They can "cloneInto" in one instant and have deletes clobber their 
> > destination, and then do it again without any clobbering (because the 
> > source has since compacted).
> > 
> > To prevent this behavior, it seems you'd need to either 1) force a 
> > compaction, or 2) add metadata to the value of the file entry instructing 
> > the tserver to ignore deleted data (delete markers and any data it is 
> > hiding) coming from that particular file, before merging with the other 
> > files, similar to how we re-write timestamps when reading data from 
> > bulk-imported files.
> > 
> > This unexpected behavior can occur even if all other parts of the cloneInto 
> > feature were working perfectly. It gets even more complicated and 
> > unexpected when we begin considering that the user's view of the data is 
> > different than what the underlying data actually is in other ways, as the 
> > result of iterators. It may not be obvious that they are injecting the 
> > underlying data, and not their view of it, into another table. How do you 
> > propose to address that? With additional permissions to ensure not just 
> > everybody can perform this action? With a check to verify that the same 
> > iterators exist on the destination table that existed on the source table?
> 
> John Vines wrote:
>     I see your concerns no different as those for a standard bulk import. And 
> these concerns are currently handled by the user doing the bulk import needs 
> to be aware of the data they're calling the command on.
> 
> Christopher Tubbs wrote:
>     I agree that they aren't much different than bulk import. However, this 
> is not a "bulkImportFrom" feature, it is a "cloneInto" feature, and it's 
> *very* different behavior than clone. With bulk import, also, we have a 
> special permission, that can be reserved to those who understand its 
> particulars. It's also the case that typical bulk import use cases involve 
> writing data that will not have deletes, or will not come from a source with 
> a different view than the underlying files.
>     
>     I'm just concerned about user expectations, and how we address them.
> 
> John Vines wrote:
>     What if we require the source table to be compacted down to 1 file per 
> tablet to ease concerns about deletes?
>     
>     And I totally agree with the permissions. Would you be okay with 
> requiring bulk import on the destination, or would you rather it's own 
> permission?
> 
> John Vines wrote:
>     Actually, compacting down to 1 file is expensive and may not be required 
> if there are no deletes in play. I think this is best left to user's of the 
> system to quiesce to their needs.
> 
> Christopher Tubbs wrote:
>     Yeah, compacting would be expensive. It's fine to leave it to the user... 
> but we do need to communicate the expected behavior and pitfalls very 
> thoroughly, and perhaps lock the feature down with a special permission, like 
> bulk import has, to prevent non-permitted users from performing this 
> function, who may not understand the pitfalls.

I see a few ways to achieve this, curious what your opinion is-
For the destination table I see 2 options-
1. Just utilize bulk import because it has known pitfalls that need to be 
accomodated
2. Introduce new permission for this

And then for the source table, there's a few more options-
1. READ
2. if we introduce a new permission for the destination table, use that same 
permission
3. A new permisssion for cloning FROM


- John


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27198/#review65843
-----------------------------------------------------------


On Dec. 22, 2014, 7:59 p.m., John Vines wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/27198/
> -----------------------------------------------------------
> 
> (Updated Dec. 22, 2014, 7:59 p.m.)
> 
> 
> Review request for accumulo.
> 
> 
> Bugs: ACCUMULO-3236
>     https://issues.apache.org/jira/browse/ACCUMULO-3236
> 
> 
> Repository: accumulo
> 
> 
> Description
> -------
> 
> Includes all code to support feature, including thrift changes
> Includes minor code cleanup to TableLocator and items in the Bulk path to 
> remove signature items that are unused (arguments & exceptions)
> Includes renaming of some bulk import functions to clarify their purpose 
> (because they're now multi-purpose)
> 
> Patch is based on 1.6, but we can choose to make it target only 1.7 if we 
> choose (this conversation should be taken up on jira, not in RB)
> 
> 
> Diffs
> -----
> 
>   
> core/src/main/java/org/apache/accumulo/core/client/admin/TableOperations.java 
> 97f538d 
>   
> core/src/main/java/org/apache/accumulo/core/client/impl/RootTabletLocator.java
>  97d476b 
>   
> core/src/main/java/org/apache/accumulo/core/client/impl/TableOperationsImpl.java
>  07df1bd 
>   core/src/main/java/org/apache/accumulo/core/client/impl/TabletLocator.java 
> e396d82 
>   
> core/src/main/java/org/apache/accumulo/core/client/impl/TabletLocatorImpl.java
>  c550f15 
>   
> core/src/main/java/org/apache/accumulo/core/client/impl/TimeoutTabletLocator.java
>  bcbe561 
>   
> core/src/main/java/org/apache/accumulo/core/client/impl/thrift/TableOperation.java
>  7716823 
>   
> core/src/main/java/org/apache/accumulo/core/client/mock/MockTableOperationsImpl.java
>  de19137 
>   
> core/src/main/java/org/apache/accumulo/core/client/mock/impl/MockTabletLocator.java
>  35f160f 
>   
> core/src/main/java/org/apache/accumulo/core/master/thrift/FateOperation.java 
> f65f552 
>   
> core/src/main/java/org/apache/accumulo/core/tabletserver/thrift/TabletClientService.java
>  2ba7674 
>   core/src/main/thrift/client.thrift 38a8076 
>   core/src/main/thrift/master.thrift 38e9227 
>   core/src/main/thrift/tabletserver.thrift 25e0b10 
>   
> core/src/test/java/org/apache/accumulo/core/client/admin/TableOperationsHelperTest.java
>  1d91574 
>   
> core/src/test/java/org/apache/accumulo/core/client/impl/TableOperationsHelperTest.java
>  02838ed 
>   
> server/base/src/main/java/org/apache/accumulo/server/client/BulkImporter.java 
> 4cc13a9 
>   
> server/base/src/main/java/org/apache/accumulo/server/client/ClientServiceHandler.java
>  fe17a62 
>   
> server/base/src/main/java/org/apache/accumulo/server/security/SecurityOperation.java
>  258080c 
>   
> server/base/src/test/java/org/apache/accumulo/server/client/BulkImporterTest.java
>  3680341 
>   
> server/master/src/main/java/org/apache/accumulo/master/FateServiceHandler.java
>  5818da3 
>   
> server/master/src/main/java/org/apache/accumulo/master/tableOps/CloneIntoTable.java
>  PRE-CREATION 
>   server/tserver/src/main/java/org/apache/accumulo/tserver/Tablet.java 
> 9a07a4a 
>   server/tserver/src/main/java/org/apache/accumulo/tserver/TabletServer.java 
> 3f594cc 
>   
> test/src/main/java/org/apache/accumulo/test/performance/thrift/NullTserver.java
>  0591b19 
>   test/src/test/java/org/apache/accumulo/test/functional/CloneIntoIT.java 
> PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/27198/diff/
> 
> 
> Testing
> -------
> 
> Includes CloneIntoIT, which exercises all permutations of the flags. Existing 
> BulkIT still functions as intended for validation of no feature loss in 
> refactoring exiting code for multi-purposing.
> 
> 
> Thanks,
> 
> John Vines
> 
>

Reply via email to