Re: One time major deletion/purge vs periodic deletion
It's possible you'll run into compaction headaches. Likely, actually. If you have time-bucketed purges/archives, I'd implement a time-bucketing strategy using rotating tables, each dedicated to a time period, so that when an entire table is ready for archiving you just snapshot its SSTables and then TRUNCATE/nuke the time-bucket table. Queries that span buckets and calculating which table to target on inserts are a major pain in the ass, but at scale you'll probably want to consider doing something like this.
On Wed, Mar 7, 2018 at 8:19 PM, kurt greaves wrote: > The important point to consider is whether you are deleting old data or recently written data. [...]
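A minimal sketch of the rotating-table bookkeeping, assuming one table per calendar month and a hypothetical `events_YYYY_MM` naming scheme (neither comes from the thread). It covers the three chores mentioned: picking the insert target, listing the tables a cross-bucket query has to hit, and finding buckets that are wholly past retention and can be snapshotted and truncated as a unit.

```python
from datetime import datetime

# Hypothetical naming scheme: one table per calendar month, e.g. "events_2018_03".
TABLE_PREFIX = "events"


def _name(year: int, month: int) -> str:
    return f"{TABLE_PREFIX}_{year:04d}_{month:02d}"


def bucket_table(ts: datetime) -> str:
    """Table that an insert stamped with `ts` should be written to."""
    return _name(ts.year, ts.month)


def tables_for_range(start: datetime, end: datetime) -> list:
    """Every bucket table a query over [start, end] has to read, oldest first."""
    tables = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        tables.append(_name(year, month))
        month += 1  # advance one calendar month
        if month == 13:
            year, month = year + 1, 1
    return tables


def expired_tables(existing: list, now: datetime, keep_months: int) -> list:
    """Buckets entirely outside the retention window (the current month counts as one).

    Each of these would be snapshotted for the archive and then truncated or
    dropped as a whole, instead of being purged row by row.
    """
    cutoff = now.year * 12 + (now.month - 1) - (keep_months - 1)
    expired = []
    for name in existing:
        _, year, month = name.rsplit("_", 2)
        if int(year) * 12 + (int(month) - 1) < cutoff:
            expired.append(name)
    return expired
```

For example, a query from 2017-12-15 to 2018-02-01 has to fan out to `events_2017_12`, `events_2018_01` and `events_2018_02` and merge the results client-side, which is exactly the cross-bucket pain point.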
Re: One time major deletion/purge vs periodic deletion
The important point to consider is whether you are deleting old data or recently written data. How old/recent depends on your write rate to the cluster and there's no real formula. Basically you want to avoid deleting a lot of old data all at once, because the tombstones will end up in new SSTables while the data to be deleted lives in higher levels (LCS) or large SSTables (STCS), and those won't get compacted together for a long time. In this case it makes no difference whether you do a big purge or break it up: if your big purge is just old data, all the tombstones will have to stick around for a while until they make it to the higher levels/bigger SSTables. If you have to purge large amounts of old data, the easiest approach is to:
1. make sure you have at least 50% disk free (for large/major compactions), and/or
2. use garbagecollect compactions (3.10+).
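A rough pre-flight check for the two options above, assuming the default package data directory `/var/lib/cassandra/data` (adjust to your `data_file_directories`). The helper names are hypothetical; the only real interface involved is the `nodetool garbagecollect` command line, whose argv the sketch builds without running it.

```python
import shutil

# Assumption: default data directory for package installs; adjust to your data_file_directories.
DATA_DIR = "/var/lib/cassandra/data"


def has_compaction_headroom(free_bytes: int, total_bytes: int, min_free: float = 0.5) -> bool:
    """True if at least `min_free` of the volume is free (the 50% rule of thumb)."""
    return total_bytes > 0 and free_bytes / total_bytes >= min_free


def volume_has_headroom(path: str = DATA_DIR, min_free: float = 0.5) -> bool:
    """Apply the headroom check to the filesystem that holds `path`."""
    usage = shutil.disk_usage(path)
    return has_compaction_headroom(usage.free, usage.total, min_free)


def garbagecollect_command(keyspace: str, tables=(), granularity: str = "ROW") -> list:
    """argv for `nodetool garbagecollect` (Cassandra 3.10+); it is built, not executed.

    ROW removes deleted partitions and rows; CELL also removes deleted or
    overwritten cells.
    """
    if granularity not in ("ROW", "CELL"):
        raise ValueError("granularity must be ROW or CELL")
    return ["nodetool", "garbagecollect", "-g", granularity, keyspace, *tables]
```

The argv could then be handed to `subprocess.run(..., check=True)`. Since garbagecollect rewrites SSTables individually rather than merging everything at once, it should need less free space than a major compaction; the headroom check mainly matters for the major-compaction route.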
Re: One time major deletion/purge vs periodic deletion
Charu, I am aware of what you are trying to do and why. I'm not sure the deleting compaction strategy (DCS) will solve your problem. Consider a process that identifies the data that needs to be deleted and sets a TTL on that row or cell so that it expires some time in the future, such as 10 days out. The process could run daily, hourly, etc., depending on the volume, and it would spread out the actual deletes. -- Rahul Singh rahul.si...@anant.us Anant Corporation
On Mar 7, 2018, 3:26 AM -0500, Ben Slater wrote: > I would say you are better off spreading out the deletes so compactions have the best chance of actually removing them from disk before they become a problem. [...]
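Rahul's suggestion could look roughly like this. The keyspace/table `purge.records`, its columns, and the 90-day "closed" rule are hypothetical stand-ins for the real business rules; the 10-day delay is his example. The sketch only builds statements; each `(cql, params)` pair could be passed to `session.execute` in the DataStax Python driver, which uses `%s` placeholders for non-prepared statements.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Rahul's example delay: the row expires 10 days after the job flags it.
EXPIRY_DELAY = timedelta(days=10)
TTL_SECONDS = int(EXPIRY_DELAY.total_seconds())  # CQL TTLs are in seconds


@dataclass
class Record:
    # Hypothetical schema: purge.records (id text PRIMARY KEY, status text, last_activity date)
    id: str
    status: str
    last_activity: date


def is_purgeable(rec: Record, today: date) -> bool:
    """Stand-in for the real business rules (hypothetical)."""
    return rec.status == "closed" and (today - rec.last_activity).days > 90


def expire_statements(records, today: date):
    """Yield (cql, params) pairs that rewrite each purgeable row with a TTL.

    A TTL applies only to what the statement writes, so every column is
    rewritten together; the row then expires on its own instead of being
    deleted explicitly.
    """
    cql = (
        "INSERT INTO purge.records (id, status, last_activity) "
        f"VALUES (%s, %s, %s) USING TTL {TTL_SECONDS}"
    )
    for rec in records:
        if is_purgeable(rec, today):
            yield cql, (rec.id, rec.status, rec.last_activity)
```

Expired cells still have to be cleaned up by compaction, but because the job runs often and each row gets its own expiry time, the expirations trickle in instead of arriving as one large purge.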
Re: One time major deletion/purge vs periodic deletion
I would say you are better off spreading out the deletes so compactions have the best chance of actually removing them from disk before they become a problem. You will likely need to pay close attention to compaction strategy tuning. I don’t have any personal experience with it, but you may also want to check out the deleting compaction strategy to see if it works for your use case: https://github.com/protectwise/cassandra-util/tree/master/deleting-compaction-strategy Cheers Ben
On Wed, 7 Mar 2018 at 17:19 Charulata Sharma (charshar) wrote: > Well it’s not like that. We don’t just purge. [...]
-- Ben Slater Chief Product Officer <https://www.instaclustr.com/> This email has been sent on behalf of Instaclustr Pty. Limited (Australia) and Instaclustr Inc (USA). This email and any attachments may contain confidential and legally privileged information. If you are not the intended recipient, do not copy or disclose its content, but please reply to this email immediately and highlight the error to the sender and then immediately delete the message.
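One way to spread a backlog of deletes out, sketched under the assumption that the keys to purge are already known: cap each scheduled run at a fixed number of deletes. The function name, the cap, and the weekly cadence are illustrative, not taken from the thread.

```python
from datetime import date, timedelta


def plan_purge_runs(keys, max_per_run: int, first_run: date, every: timedelta):
    """Split a purge backlog into scheduled runs of at most `max_per_run` deletes.

    Returns a list of (run_date, keys_for_that_run). Smaller, regular batches
    give compaction a chance to clear each round of tombstones before the next
    arrives, instead of one large burst landing at once.
    """
    if max_per_run <= 0:
        raise ValueError("max_per_run must be positive")
    keys = list(keys)
    plan = []
    for run, start in enumerate(range(0, len(keys), max_per_run)):
        plan.append((first_run + every * run, keys[start:start + max_per_run]))
    return plan
```

With the numbers from the original question, 90,000 keys capped at 7,000 per weekly run works out to 13 runs, roughly one quarter's worth of weekly jobs.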
Re: One time major deletion/purge vs periodic deletion
Well it’s not like that. We don’t just purge. Business rules decide which records are purged, or archived and then purged, so we cannot rely on TTL. Thanks, Charu
From: Jens Rantil Reply-To: "user@cassandra.apache.org" Date: Tuesday, March 6, 2018 at 12:34 AM To: "user@cassandra.apache.org" Subject: Re: One time major deletion/purge vs periodic deletion
Sounds like you are using Cassandra as a queue. [...]
Re: One time major deletion/purge vs periodic deletion
Sounds like you are using Cassandra as a queue. It's an anti-pattern. What I would do is rely on TTL for removal of data and use the TWCS compaction strategy to handle removal, so you just focus on insertion.
On Tue, Mar 6, 2018, 07:39 Charulata Sharma (charshar) wrote:
> Hi,
> Wanted the community’s feedback on deciding the schedule of the Archive and Purge job.
> Is it better to purge a large volume of data at regular intervals (like running A&P jobs once every 3 months) or purge smaller amounts more frequently (running the job weekly)?
> Some estimates on the number of deletes performed would be… up to 80-90K rows purged in 3 months vs 10K deletes every week?
> Thanks,
> Charu
-- Jens Rantil Backend Developer @ Tink Tink AB, Wallingatan 5, 111 60 Stockholm, Sweden For urgent matters you can reach me at +46-708-84 18 32.
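For data that can simply age out, Jens's suggestion amounts to a table with a default TTL compacted by TWCS. This sketch only renders the DDL; the `metrics.readings` table, its columns, the 90-day TTL and the 3-day window are all assumptions for illustration, while `TimeWindowCompactionStrategy`, `compaction_window_unit`, `compaction_window_size` and `default_time_to_live` are real table options.

```python
def twcs_table_ddl(keyspace: str, table: str, columns: dict, primary_key: str,
                   ttl_days: int, window_days: int = 1) -> str:
    """Render CREATE TABLE for insert-only data that expires via TTL under TWCS."""
    cols = ", ".join(f"{name} {cql_type}" for name, cql_type in columns.items())
    ttl_seconds = ttl_days * 86400  # default_time_to_live is in seconds
    return (
        f"CREATE TABLE {keyspace}.{table} ({cols}, PRIMARY KEY ({primary_key})) "
        "WITH compaction = {'class': 'TimeWindowCompactionStrategy', "
        "'compaction_window_unit': 'DAYS', "
        f"'compaction_window_size': '{window_days}'}} "
        f"AND default_time_to_live = {ttl_seconds};"
    )


# Hypothetical example table: per-sensor readings partitioned by day.
ddl = twcs_table_ddl(
    "metrics", "readings",
    {"sensor_id": "text", "day": "date", "ts": "timestamp", "value": "double"},
    "(sensor_id, day), ts",
    ttl_days=90, window_days=3,
)
```

TWCS works best when rows are written roughly in time order with the same TTL and are never deleted explicitly, so that whole SSTables expire together and can be dropped without compacting tombstones against live data.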