If you want to keep the load out of the Cassandra process and do the join to
SQL offline, take a look at the bin/sstablekeys utility. This will let you
output the keys in an sstable. You will need to do it for every sstable on
every node, create the unique list, and then check it against your SQL db.
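A rough sketch of that pipeline, assuming the per-sstable key dumps from bin/sstablekeys have each been written to a text file (one key per line; the file pattern and the `records`/`row_key` SQL schema below are hypothetical placeholders for your own):

```python
import glob
import sqlite3

def unique_sstable_keys(pattern):
    """Merge per-sstable key dumps (one key per line) into one unique set."""
    keys = set()
    for path in glob.glob(pattern):
        with open(path) as fh:
            keys.update(line.strip() for line in fh if line.strip())
    return keys

def keys_missing_from_sql(keys, conn):
    """Return the keys that have no matching row on the SQL side."""
    missing = set()
    for key in keys:
        row = conn.execute(
            "SELECT 1 FROM records WHERE row_key = ? LIMIT 1", (key,)
        ).fetchone()
        if row is None:
            missing.add(key)
    return missing
```

sqlite3 stands in here for whatever SQL database you actually run; the point is only the shape of the job: dedupe the dumps first, then do one existence check per key.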
Great! I'm glad at least one of those ideas was helpful for you.
That's a road we've travelled before, and as one last suggestion that might
help: you could alter all client writers to Cassandra beforehand so that
they write to BOTH keyspaces BEFORE beginning the SQL-based transfer. This
might help.
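The dual-write idea can be sketched like this (the client objects and their insert signature are hypothetical; substitute the calls of whatever Cassandra client you actually use):

```python
class DualKeyspaceWriter:
    """Mirror every write into both the old and the new keyspace, so live
    traffic keeps both sides current while the bulk transfer runs."""

    def __init__(self, old_ks_client, new_ks_client):
        self.old = old_ks_client
        self.new = new_ks_client

    def write(self, cf, key, columns):
        # Write the original keyspace first; the mirror write to the new
        # keyspace is best-effort until the backfill has completed.
        self.old.insert(cf, key, columns)
        try:
            self.new.insert(cf, key, columns)
        except Exception as exc:
            # Log and continue: a later backfill pass can repair misses.
            print("mirror write failed for %r: %s" % (key, exc))
```

The ordering matters: the old keyspace stays authoritative until you cut over, so a failed mirror write must never fail the client's request.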
Thanks for your suggestions, Eric!
> One of the application uses 1.5TB out of 1.8TB
I'm sorry, maybe that statement was slightly ambiguous. I meant to say that one
application uses 1.5TB, while the others use 300GB, totalling 1.8TB of data.
Our total disk capacity, however, is about 7TB.
Is there any way that you could do that lookup in reverse, where you pull
the records from your SQL database, figure out which keys aren't necessary,
and then delete any unnecessary keys that may or may not exist in
Cassandra?
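That reverse pass could look roughly like this (the candidate/live key sources and the delete call are placeholders for your actual schema and client; deleting a key that never existed in Cassandra is harmless, it just writes a tombstone):

```python
def purge_stale_keys(candidate_keys, live_keys, delete_fn):
    """Delete every candidate key the SQL side no longer knows about.

    candidate_keys: keys that might still exist in Cassandra
    live_keys:      keys the SQL database says should be kept
    delete_fn:      callable issuing the actual Cassandra delete
    """
    stale = set(candidate_keys) - set(live_keys)
    for key in stale:
        delete_fn(key)
    return stale
```

Since deletes are idempotent tombstone writes, you don't need to know in advance which of the stale keys actually exist in Cassandra.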
If that's not a possibility, then what about creating the same Cassandra
We're running an 8-node cluster with different CFs for different applications.
One of the applications uses 1.5TB out of 1.8TB in total, but only because we
started out without a deletion mechanism and implemented one later on. So there
is probably a high amount of old data in there that we don't even need anymore.