keith-turner opened a new issue, #5387: URL: https://github.com/apache/accumulo/issues/5387
**Is your feature request related to a problem? Please describe.**

Many referenced files are used by only a single tablet, and these files could be deleted by compaction if this were known. Instead, a delete marker is always added for files, and GC has to process this delete marker.

**Describe the solution you'd like**

Each file in a tablet's metadata could have a shared marker that tracks whether more than one tablet references the file.

* When compaction creates a new file, it sets shared=false.
* When a tablet splits, it will set shared=true on any files that go to multiple tablets.
* When a table is cloned, it will set shared=true in the source table on any files that the new table references.
* Bulk import could mark files as shared or not, depending on whether the files go to multiple tablets.
* The FATE operation that commits a compaction could either delete the input files or write delete markers, depending on whether the files were shared.

For this feature to be possible, all of the above operations must be able to be done safely using conditional mutations. The shared marker could be added to the per-file metadata that is already stored in the tablet.

**Describe alternatives you've considered**

#2729 may be an alternative if HDFS supports hard links.

**Additional context**

This feature would reduce the work done by the Accumulo GC process and avoid storing delete markers. The trade-off is that the new shared marker would be required, and compaction commit would now make calls to the namenode to delete files in some cases.
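To make the proposed bookkeeping concrete, here is a minimal Java sketch, assuming a single-process, in-memory stand-in for tablet metadata. `SharedMarkerModel`, `Tablet`, `bulkImport`, `split`, `cloneTablet` and `commitCompaction` are hypothetical names, not Accumulo APIs. The real feature would persist the flag in the per-file tablet metadata and guard each update with conditional mutations; this model only approximates that with a compare-before-write check on the flags the compaction originally read.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the proposed per-file "shared" marker. None of these
// classes or methods exist in Accumulo; they only illustrate the bookkeeping.
public class SharedMarkerModel {

  // A tablet's file metadata: file path -> shared flag.
  static final class Tablet {
    final Map<String, Boolean> files = new HashMap<>();
  }

  // Files the commit would delete itself (stand-in for namenode deletes).
  final Set<String> deletedFiles = new HashSet<>();
  // Files handed to the GC through a delete marker instead.
  final Set<String> deleteMarkers = new HashSet<>();

  // Bulk import: a file is shared only if it lands in more than one tablet.
  void bulkImport(String file, List<Tablet> targets) {
    boolean shared = targets.size() > 1;
    for (Tablet t : targets) {
      t.files.put(file, shared);
    }
  }

  // Split: files copied to both children become shared; files listed in
  // lowOnly go to the low child alone and keep their current flag.
  Tablet[] split(Tablet parent, Set<String> lowOnly) {
    Tablet low = new Tablet();
    Tablet high = new Tablet();
    for (Map.Entry<String, Boolean> e : parent.files.entrySet()) {
      if (lowOnly.contains(e.getKey())) {
        low.files.put(e.getKey(), e.getValue());
      } else {
        low.files.put(e.getKey(), true);
        high.files.put(e.getKey(), true);
      }
    }
    return new Tablet[] {low, high};
  }

  // Clone: the source's files are now referenced by two tables, so both
  // the source and the copy mark every file as shared.
  Tablet cloneTablet(Tablet source) {
    source.files.replaceAll((file, shared) -> true);
    Tablet copy = new Tablet();
    copy.files.putAll(source.files);
    return copy;
  }

  // Compaction commit. inputsAsRead holds the flags seen when the compaction
  // started; if any input is gone or its flag changed (e.g. a concurrent
  // split), the commit is rejected, loosely mimicking a failed conditional
  // mutation.
  boolean commitCompaction(Tablet t, Map<String, Boolean> inputsAsRead, String output) {
    for (Map.Entry<String, Boolean> e : inputsAsRead.entrySet()) {
      if (!e.getValue().equals(t.files.get(e.getKey()))) {
        return false;
      }
    }
    for (Map.Entry<String, Boolean> e : inputsAsRead.entrySet()) {
      t.files.remove(e.getKey());
      if (e.getValue()) {
        deleteMarkers.add(e.getKey()); // another tablet may still reference it
      } else {
        deletedFiles.add(e.getKey()); // sole reference, safe to delete now
      }
    }
    t.files.put(output, false); // a new compaction output is never shared
    return true;
  }
}
```

In this sketch the flag is only ever raised, never lowered: once a file is marked shared, the remaining tablet keeps it shared even after the other reference is compacted away. That errs on the side of writing a delete marker, so the GC still handles any file whose sharing status is uncertain.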