So maybe this idea has been sent around before, but I would like to
know what everyone thinks. We have a huge column family called bigdata,
let's say 200 GB per node. We use Cassandra as you would expect: we
never read before writing, and during our bulk loading we can get rates
like 2000 inserts per second per node. This morning I noticed that on
some nodes only, this CF had a lot of reads, which went on for hours.

Since our apps should not have been reading, I dove in. What was
happening was that a node had been down during the bulk load period. As
a result, when it came back up, the other node holding hints went to
deliver them. The problem was that node was doing heavy IO trying to
deliver the hints. I see why.

Cassandra does NOT read before write EXCEPT when delivering a hinted handoff.

This is not a good thing. It means the bigger the bigdata CF gets, the
more IO-intensive delivering the hints becomes on the sender side. The
write rate may be 2000 per second, but those rows cannot be read back
anywhere near that fast.
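
To make the read amplification concrete, here is a rough sketch of how
I understand the current hint path (my own illustration, not the actual
source; the class and method names are made up). The hint stores only a
pointer to the row, so the sender has to do a random read against
bigdata for every hinted key before it can replay the write:

import java.util.List;

class HintDeliverySketch {
    // Pointer-style hint: keyspace, column family and key, but no column data.
    static class Hint {
        final String keyspace;
        final String columnFamily;
        final byte[] rowKey;
        Hint(String keyspace, String columnFamily, byte[] rowKey) {
            this.keyspace = keyspace;
            this.columnFamily = columnFamily;
            this.rowKey = rowKey;
        }
    }

    interface LocalStore {
        // The expensive part: one random read per hinted row on a 200 GB CF.
        byte[] readRow(String keyspace, String columnFamily, byte[] rowKey);
    }

    interface Target {
        // Replays the row through the normal write path on the recovered node.
        void replayWrite(byte[] rowKey, byte[] row);
    }

    static void deliverHints(List<Hint> hints, LocalStore store, Target target) {
        for (Hint h : hints) {
            // The bulk load never paid for reads; delivery pays one per row,
            // which is why the sender goes IO-bound.
            byte[] row = store.readRow(h.keyspace, h.columnFamily, h.rowKey);
            target.replayWrite(h.rowKey, row);
        }
    }
}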

I know you can now disable and throttle HH in 0.7.0, but that is not
good enough, since it only means it takes longer to get consistent, or
you never get consistent at all.
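
For reference, these are the knobs I mean, as best I remember them
(hinted_handoff_enabled is definitely in 0.7's cassandra.yaml; the
throttle setting name below is from memory of later configs and might
not match 0.7.0 exactly):

# cassandra.yaml
hinted_handoff_enabled: true             # false = drop hints entirely
hinted_handoff_throttle_delay_in_ms: 50  # pause between hint rows on delivery (name may differ)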

So here is my thinking: store hints in separate physical files, and/or
possibly deliver those files by streaming.
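
Purely as a sketch of what I mean (all names made up, none of this is
real Cassandra code): append the serialized mutation to a per-endpoint
hint log when the replica is down, then when it comes back stream that
file over sequentially instead of reading bigdata row by row:

import java.io.DataOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

// Hypothetical per-endpoint hint log: serialized mutations are appended at
// write time, so delivery becomes a sequential stream of this file instead
// of random reads against the live bigdata CF.
class HintLogSketch {
    private final DataOutputStream out;

    HintLogSketch(String deadEndpoint) throws IOException {
        // One append-only file per down replica, e.g. hints-10.0.0.5.log
        out = new DataOutputStream(
                new FileOutputStream(new File("hints-" + deadEndpoint + ".log"), true));
    }

    // Called on the write path whenever the target replica is down.
    synchronized void append(byte[] serializedMutation) throws IOException {
        out.writeInt(serializedMutation.length);
        out.write(serializedMutation);
        out.flush();
    }

    // When the replica comes back, ship the whole file with the existing
    // streaming machinery (the way bootstrap and repair move SSTables) and
    // then delete it: sequential IO on the sender, no reads of bigdata.
}

That way the delivery cost scales with the size of the hinted backlog
rather than with how big the CF has grown.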

Maybe there is already a JIRA out there on this. I just woke up, so to
me it is an original idea :)
