pdeva opened a new issue #8264: permanently deleting old segments should be 
part of datasource lifecycle
URL: https://github.com/apache/incubator-druid/issues/8264
 
 
   ### Description
   
   Currently, datasources have lifecycle rules that simply govern whether or not segments are downloaded from deep storage onto historicals.
   
   However, not all data needs to be preserved for eternity.
   There is a real need to delete very old data.
   
   For example, say for a datasource A you want to (see the rule sketch after this list):
   
   1. load the most recent 90 days of data onto historicals
   2. keep 1 year of data in deep storage
   3. completely delete data older than 1 year
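   Steps 1 and 2 already map onto the coordinator's retention rules. A minimal sketch (the tier name and replicant count here are illustrative, not a recommendation):
   
   ```json
   [
     { "type": "loadByPeriod", "period": "P90D", "tieredReplicants": { "_default_tier": 2 } },
     { "type": "dropForever" }
   ]
   ```
   
   Nothing in this rule language expresses step 3: `dropForever` only unloads segments from historicals, while the data stays in deep storage and the metadata store.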
   
   
   You can partially accomplish step 3 by setting object lifecycle rules in S3 or GCS, no doubt.
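   For example, an S3 lifecycle configuration like the following (the rule ID and deep-storage prefix are hypothetical) would expire datasource A's segment files after a year:
   
   ```json
   {
     "Rules": [
       {
         "ID": "expire-datasource-A",
         "Filter": { "Prefix": "druid/segments/A/" },
         "Status": "Enabled",
         "Expiration": { "Days": 365 }
       }
     ]
   }
   ```
   
   But Druid never learns that those objects are gone, which is exactly the problem described next.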
   
   But the Druid metadata store will still retain segment records for those really old segments. Over time this makes the metadata store grow without bound and slows operations on the segments table. It is worse when you have lots of small segments, since each one contributes its own row to the table.
   
   
   Druid should allow adding a lifecycle rule to a datasource that permanently deletes segments older than a configured age from the metadata store (and deep storage). Currently this is a manual and error-prone operation. But given the importance of getting it right (you are permanently deleting data), it should be as easy as setting load/drop rules for a datasource.
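   
   For reference, the manual path today is to mark the old segments unused and then submit a kill task to the Overlord; something like the following, where the datasource name and interval are illustrative:
   
   ```json
   {
     "type": "kill",
     "dataSource": "A",
     "interval": "2010-01-01/2018-08-01"
   }
   ```
   
   What this issue asks for is, in effect, for Druid to derive and run that step automatically from a per-datasource rule, the same way load/drop rules are applied.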
   
   
   
