[ https://issues.apache.org/jira/browse/CASSANDRA-1470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12970249#action_12970249 ]
Peter Schuller commented on CASSANDRA-1470:
-------------------------------------------

(1) - great

(2) - I'm pretty sure it will get instantly evicted. See
http://lxr.free-electrons.com/source/mm/fadvise.c#L118 and
http://lxr.free-electrons.com/source/mm/truncate.c#L309. (However, I agree that with the mythical "good enough" implementation the hint would really be just that - a hint - but that can easily backfire; sometimes you want instant eviction. In reality I think posix_fadvise() is too limited an interface: while you can imagine an implementation that does the right thing for a particular use case, it is too limited to be generally suitable for everyone...)

On posix_fadvise: yes, I was only thinking of scattered pages as a problem. Contiguous ranges are fine, and are what one wants for fadvise purposes.

On overcommitting: certainly mincore+advise with a fallback to overcommitting would still be an improvement, but my gut feeling is that many real-life cases will have very scattered hotness: pretty much any use case where row keys are distributed randomly with respect to hotness (which I believe is very often the case) and each row is fairly small. Trying to think of when one would expect it *not* to be scattered - I suppose if using OPP and the row keys correspond directly to something correlated with hotness? So something like time series data with OPP, or large rows with RP. But that feels like a fairly narrow subset of use cases.

It is worth noting that for truly large data sets, scattering is fine: the cost of fadvise() per page read stays low, because the contiguous ranges to drop will be fairly large. But "unfortunately" a lot of use cases, I assume, involve data that is either similar to memory size or a few factors larger (significantly smaller than memory is a non-issue, since with the current code it all stays in memory anyway).
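(To make the "mincore+advise" idea above concrete, here is a rough sketch: snapshot which pages of an sstable are resident before compaction reads it, then afterwards evict only the pages that were cold beforehand, coalesced into contiguous ranges so each fadvise() call covers a run of pages rather than one page. The function names and overall flow are illustrative assumptions, not anything Cassandra actually does.)

```c
#define _GNU_SOURCE
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>

/* Record page-cache residency for the whole file before compaction reads
 * it. Caller frees the returned vector; one byte per page, bit 0 set if
 * the page was resident. */
unsigned char *snapshot_residency(int fd, size_t *npages_out)
{
    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0)
        return NULL;
    long pagesz = sysconf(_SC_PAGESIZE);
    size_t npages = ((size_t)st.st_size + pagesz - 1) / pagesz;

    /* mincore() only reports on mapped ranges, so map the file briefly. */
    void *map = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED)
        return NULL;

    unsigned char *vec = malloc(npages);
    if (vec && mincore(map, st.st_size, vec) != 0) {
        free(vec);
        vec = NULL;
    }
    munmap(map, st.st_size);
    if (vec)
        *npages_out = npages;
    return vec;
}

/* After compaction has streamed through the file, drop only the pages
 * that were NOT resident in the snapshot (i.e. the ones compaction
 * itself faulted in), issuing one fadvise() per contiguous run rather
 * than per page. With scattered hotness the runs get short and this
 * degenerates toward per-page cost - which is exactly the concern. */
int evict_cold_ranges(int fd, const unsigned char *before, size_t npages)
{
    long pagesz = sysconf(_SC_PAGESIZE);
    size_t i = 0;
    while (i < npages) {
        if (!(before[i] & 1)) {          /* page was cold before the read */
            size_t start = i;
            while (i < npages && !(before[i] & 1))
                i++;
            if (posix_fadvise(fd, (off_t)start * pagesz,
                              (off_t)(i - start) * pagesz,
                              POSIX_FADV_DONTNEED) != 0)
                return -1;
        } else {
            i++;
        }
    }
    return 0;
}
```

(Note the design trade-off this makes visible: the syscall count is proportional to the number of cold/hot boundaries in the file, which is small for large contiguous data sets and large for randomly scattered hotness.)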
(As an aside - and this is not a serious suggestion, since Cassandra isn't in the business of delivering kernel patches - the implementation seems to iterate over individual pages anyway. So it appears the only thing preventing a more efficient fadvise() for discontiguous ranges is the interface to the kernel, rather than an implementation problem. At least based on a very brief look...)

> use direct io for compaction
> ----------------------------
>
>                 Key: CASSANDRA-1470
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1470
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Pavel Yaskevich
>             Fix For: 0.7.1
>
>         Attachments: 1470-v2.txt, 1470.txt, CASSANDRA-1470-for-0.6.patch,
> CASSANDRA-1470-v10-for-0.7.patch, CASSANDRA-1470-v11-for-0.7.patch,
> CASSANDRA-1470-v12-0.7.patch, CASSANDRA-1470-v2.patch,
> CASSANDRA-1470-v3-0.7-with-LastErrorException-support.patch,
> CASSANDRA-1470-v4-for-0.7.patch, CASSANDRA-1470-v5-for-0.7.patch,
> CASSANDRA-1470-v6-for-0.7.patch, CASSANDRA-1470-v7-for-0.7.patch,
> CASSANDRA-1470-v8-for-0.7.patch, CASSANDRA-1470-v9-for-0.7.patch,
> CASSANDRA-1470.patch,
> use.DirectIORandomAccessFile.for.commitlog.against.1022235.patch
>
> When compaction scans through a group of sstables, it forces the data
> being used for hot reads out of the os buffer cache, which can have a
> dramatic negative effect on performance.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.