> On Dec. 22, 2014, 11:42 p.m., Christopher Tubbs wrote: > > I'm curious about the expected behavior for when the source data contains > > deletes. It seems that those deletes will be merged into the destination, > > clobbering any non-deleted data in the destination. Clobbering (or at > > least, some sensible merge, based on timestamp) is probably expected > > behavior for non-deleted data from the source, but deleted data is tricky, > > because the user has no insight into the current state of deletes in the > > source. They can "cloneInto" in one instant and have deletes clobber their > > destination, and then do it again without any clobbering (because the > > source has since compacted). > > > > To prevent this behavior, it seems you'd need to either 1) force a > > compaction, or 2) add metadata to the value of the file entry instructing > > the tserver to ignore deleted data (delete markers and any data it is > > hiding) coming from that particular file, before merging with the other > > files, similar to how we re-write timestamps when reading data from > > bulk-imported files. > > > > This unexpected behavior can occur even if all other parts of the cloneInto > > feature were working perfectly. It gets even more complicated and > > unexpected when we begin considering that the user's view of the data is > > different than what the underlying data actually is in other ways, as the > > result of iterators. It may not be obvious that they are injecting the > > underlying data, and not their view of it, into another table. How do you > > propose to address that? With additional permissions to ensure not just > > everybody can perform this action? With a check to verify that the same > > iterators exist on the destination table that existed on the source table? > > John Vines wrote: > I see your concerns no different as those for a standard bulk import. And > these concerns are currently handled by the user doing the bulk import needs > to be aware of the data they're calling the command on. > > Christopher Tubbs wrote: > I agree that they aren't much different than bulk import. However, this > is not a "bulkImportFrom" feature, it is a "cloneInto" feature, and it's > *very* different behavior than clone. With bulk import, also, we have a > special permission, that can be reserved to those who understand its > particulars. It's also the case that typical bulk import use cases involve > writing data that will not have deletes, or will not come from a source with > a different view than the underlying files. > > I'm just concerned about user expectations, and how we address them. > > John Vines wrote: > What if we require the source table to be compacted down to 1 file per > tablet to ease concerns about deletes? > > And I totally agree with the permissions. Would you be okay with > requiring bulk import on the destination, or would you rather it's own > permission? > > John Vines wrote: > Actually, compacting down to 1 file is expensive and may not be required > if there are no deletes in play. I think this is best left to user's of the > system to quiesce to their needs. > > Christopher Tubbs wrote: > Yeah, compacting would be expensive. It's fine to leave it to the user... > but we do need to communicate the expected behavior and pitfalls very > thoroughly, and perhaps lock the feature down with a special permission, like > bulk import has, to prevent non-permitted users from performing this > function, who may not understand the pitfalls.
I see a few ways to achieve this, curious what your opinion is- For the destination table I see 2 options- 1. Just utilize bulk import because it has known pitfalls that need to be accomodated 2. Introduce new permission for this And then for the source table, there's a few more options- 1. READ 2. if we introduce a new permission for the destination table, use that same permission 3. A new permisssion for cloning FROM - John ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27198/#review65843 ----------------------------------------------------------- On Dec. 22, 2014, 7:59 p.m., John Vines wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/27198/ > ----------------------------------------------------------- > > (Updated Dec. 22, 2014, 7:59 p.m.) > > > Review request for accumulo. > > > Bugs: ACCUMULO-3236 > https://issues.apache.org/jira/browse/ACCUMULO-3236 > > > Repository: accumulo > > > Description > ------- > > Includes all code to support feature, including thrift changes > Includes minor code cleanup to TableLocator and items in the Bulk path to > remove signature items that are unused (arguments & exceptions) > Includes renaming of some bulk import functions to clarify their purpose > (because they're now multi-purpose) > > Patch is based on 1.6, but we can choose to make it target only 1.7 if we > choose (this conversation should be taken up on jira, not in RB) > > > Diffs > ----- > > > core/src/main/java/org/apache/accumulo/core/client/admin/TableOperations.java > 97f538d > > core/src/main/java/org/apache/accumulo/core/client/impl/RootTabletLocator.java > 97d476b > > core/src/main/java/org/apache/accumulo/core/client/impl/TableOperationsImpl.java > 07df1bd > core/src/main/java/org/apache/accumulo/core/client/impl/TabletLocator.java > e396d82 > > core/src/main/java/org/apache/accumulo/core/client/impl/TabletLocatorImpl.java > c550f15 > > core/src/main/java/org/apache/accumulo/core/client/impl/TimeoutTabletLocator.java > bcbe561 > > core/src/main/java/org/apache/accumulo/core/client/impl/thrift/TableOperation.java > 7716823 > > core/src/main/java/org/apache/accumulo/core/client/mock/MockTableOperationsImpl.java > de19137 > > core/src/main/java/org/apache/accumulo/core/client/mock/impl/MockTabletLocator.java > 35f160f > > core/src/main/java/org/apache/accumulo/core/master/thrift/FateOperation.java > f65f552 > > core/src/main/java/org/apache/accumulo/core/tabletserver/thrift/TabletClientService.java > 2ba7674 > core/src/main/thrift/client.thrift 38a8076 > core/src/main/thrift/master.thrift 38e9227 > core/src/main/thrift/tabletserver.thrift 25e0b10 > > core/src/test/java/org/apache/accumulo/core/client/admin/TableOperationsHelperTest.java > 1d91574 > > core/src/test/java/org/apache/accumulo/core/client/impl/TableOperationsHelperTest.java > 02838ed > > server/base/src/main/java/org/apache/accumulo/server/client/BulkImporter.java > 4cc13a9 > > server/base/src/main/java/org/apache/accumulo/server/client/ClientServiceHandler.java > fe17a62 > > server/base/src/main/java/org/apache/accumulo/server/security/SecurityOperation.java > 258080c > > server/base/src/test/java/org/apache/accumulo/server/client/BulkImporterTest.java > 3680341 > > server/master/src/main/java/org/apache/accumulo/master/FateServiceHandler.java > 5818da3 > > server/master/src/main/java/org/apache/accumulo/master/tableOps/CloneIntoTable.java > PRE-CREATION > server/tserver/src/main/java/org/apache/accumulo/tserver/Tablet.java > 9a07a4a > server/tserver/src/main/java/org/apache/accumulo/tserver/TabletServer.java > 3f594cc > > test/src/main/java/org/apache/accumulo/test/performance/thrift/NullTserver.java > 0591b19 > test/src/test/java/org/apache/accumulo/test/functional/CloneIntoIT.java > PRE-CREATION > > Diff: https://reviews.apache.org/r/27198/diff/ > > > Testing > ------- > > Includes CloneIntoIT, which exercises all permutations of the flags. Existing > BulkIT still functions as intended for validation of no feature loss in > refactoring exiting code for multi-purposing. > > > Thanks, > > John Vines > >