[ https://issues.apache.org/jira/browse/CASSANDRA-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13175841#comment-13175841 ]
Pavel Yaskevich commented on CASSANDRA-3623:
--------------------------------------------

bq. Mean while your claim here is that snappy library is taking more CPU because we give it DirectBB?

First of all, I don't claim that it takes more CPU; I claim that it takes longer to decompress data compared to normal reads. Second, I don't think it's a problem with the direct BB itself (btw, there is no way to pass a non-direct buffer) but rather with mmap'ed I/O in that case.

bq. Can you plz conform you tried v2 and gives a worse performance than trunk and it is Linux (v1 doesn't give a better performance gains where as v2 does)?

Yes, I tried v2, and it wasn't easy: first of all it wasn't rebased, then I figured out that I needed to apply CASSANDRA-3611 and change the call to FBUtilities.newCRC32() to "new CRC32()" for it to compile. After that I added "disk_access_mode: mmap" to conf/cassandra.yaml, used stress "./bin/stress -n 300000 -S 512 -I SnappyCompressor" to insert test data (which doesn't fit into the page cache), and tried to read with "./bin/stress -n 300000 -I SnappyCompressor -o read", but got the following exceptions:

{code}
java.lang.RuntimeException: java.lang.UnsupportedOperationException
	at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1283)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.UnsupportedOperationException
	at org.apache.cassandra.io.compress.CompressedMappedFileDataInput.mark(CompressedMappedFileDataInput.java:212)
	at org.apache.cassandra.db.columniterator.SimpleSliceReader.<init>(SimpleSliceReader.java:62)
	at org.apache.cassandra.db.columniterator.SSTableSliceIterator.createReader(SSTableSliceIterator.java:90)
	at org.apache.cassandra.db.columniterator.SSTableSliceIterator.<init>(SSTableSliceIterator.java:66)
	at org.apache.cassandra.db.filter.SliceQueryFilter.getSSTableColumnIterator(SliceQueryFilter.java:66)
	at org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:78)
	at org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:232)
	at org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:62)
	at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1283)
	at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1169)
	at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1136)
	at org.apache.cassandra.db.Table.getRow(Table.java:375)
	at org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:69)
	at org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:800)
	at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1279)
	... 3 more
{code}

and

{code}
java.lang.RuntimeException: java.lang.UnsupportedOperationException
	at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1283)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.UnsupportedOperationException
	at org.apache.cassandra.io.compress.CompressedMappedFileDataInput.reset(CompressedMappedFileDataInput.java:207)
	at org.apache.cassandra.db.columniterator.SimpleSliceReader.computeNext(SimpleSliceReader.java:78)
	at org.apache.cassandra.db.columniterator.SimpleSliceReader.computeNext(SimpleSliceReader.java:40)
	at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
	at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
	at org.apache.cassandra.db.columniterator.SSTableSliceIterator.hasNext(SSTableSliceIterator.java:107)
	at org.apache.cassandra.utils.MergeIterator$Candidate.advance(MergeIterator.java:145)
	at org.apache.cassandra.utils.MergeIterator$ManyToOne.<init>(MergeIterator.java:88)
	at org.apache.cassandra.utils.MergeIterator.get(MergeIterator.java:47)
	at org.apache.cassandra.db.filter.QueryFilter.collateColumns(QueryFilter.java:137)
	at org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:246)
	at org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:62)
	at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1283)
	at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1169)
	at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1136)
	at org.apache.cassandra.db.Table.getRow(Table.java:375)
	at org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:69)
	at org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:800)
	at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1279)
	... 3 more
{code}

After I managed to implement the mark()/reset() methods, I got the following results: current trunk takes 67 sec and your patch 101 sec to run reads on 300000 rows. I tested everything on a server without any network interference, so it seems my results are freer from side effects than yours. I'm still not convinced that mmap'ed I/O is better for compressed data than syscalls, and I know it has side effects that we can't control from Java (mentioned above), so I'm waiting for convincing results or we should close this ticket...
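Both exceptions above come from mark() and reset() being left unimplemented on the mmap'ed input path. For illustration only, here is a minimal standalone sketch of what a position-based mark()/reset() over a memory-mapped file can look like; the class name MappedDataInput is made up and this is not Cassandra's actual CompressedMappedFileDataInput, which additionally has to track compressed chunk boundaries:

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.StandardOpenOption;

// Illustrative reader over a memory-mapped file that supports mark()/reset()
// by remembering the buffer position, instead of throwing
// UnsupportedOperationException.
public class MappedDataInput {
    private final MappedByteBuffer buffer;
    private int markedPosition = -1;

    public MappedDataInput(File file) throws IOException {
        try (FileChannel channel = FileChannel.open(file.toPath(), StandardOpenOption.READ)) {
            this.buffer = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
        }
    }

    public byte readByte() {
        return buffer.get();
    }

    // mark() records the current position in the mapped segment...
    public void mark() {
        markedPosition = buffer.position();
    }

    // ...and reset() rewinds to it, so a slice can be re-read.
    public void reset() {
        if (markedPosition < 0)
            throw new IllegalStateException("mark() was never called");
        buffer.position(markedPosition);
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("mmap-demo", ".bin");
        f.deleteOnExit();
        try (FileOutputStream out = new FileOutputStream(f)) {
            out.write(new byte[] { 10, 20, 30, 40 });
        }
        MappedDataInput in = new MappedDataInput(f);
        in.readByte();              // consume 10
        in.mark();                  // remember position 1
        byte first = in.readByte(); // 20
        in.readByte();              // 30
        in.reset();                 // back to position 1
        byte again = in.readByte(); // 20 again
        System.out.println(first == again); // true
    }
}
```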
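For reference, the two read paths the 67 sec vs. 101 sec numbers are comparing can be sketched generically as follows. This is an assumption-laden illustration of "standard" (read() syscalls) vs. "mmap" disk access in plain java.nio, not Cassandra's CompressedRandomAccessReader; the class and method names are invented:

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Generic illustration of the two read paths under comparison:
// "standard" mode issues read() syscalls into a heap buffer, while
// "mmap" mode maps the file and copies straight out of the page cache.
public class ReadModes {

    // disk_access_mode: standard -- explicit seek() + read() syscalls.
    static byte[] readWithSyscalls(Path path, int offset, int length) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path.toFile(), "r")) {
            byte[] out = new byte[length];
            raf.seek(offset);
            raf.readFully(out);
            return out;
        }
    }

    // disk_access_mode: mmap -- no per-read syscall; page faults do the I/O.
    static byte[] readWithMmap(Path path, int offset, int length) throws IOException {
        try (FileChannel channel = FileChannel.open(path)) {
            MappedByteBuffer map = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            byte[] out = new byte[length];
            map.position(offset);
            map.get(out);
            return out;
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("readmodes", ".bin");
        tmp.toFile().deleteOnExit();
        Files.write(tmp, "hello mmap world".getBytes("US-ASCII"));
        byte[] a = readWithSyscalls(tmp, 6, 4);
        byte[] b = readWithMmap(tmp, 6, 4);
        System.out.println(new String(a, "US-ASCII")); // mmap
        System.out.println(Arrays.equals(a, b));       // true
    }
}
```

Both paths return identical bytes; the difference being benchmarked is purely how the data crosses the kernel boundary, plus the page-fault and TLB behavior that the JVM cannot control, which is the side effect mentioned above.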
> use MMapedBuffer in CompressedSegmentedFile.getSegment
> ------------------------------------------------------
>
>                 Key: CASSANDRA-3623
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3623
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 1.1
>            Reporter: Vijay
>            Assignee: Vijay
>              Labels: compression
>             Fix For: 1.1
>
>         Attachments: 0001-MMaped-Compression-segmented-file-v2.patch, 0001-MMaped-Compression-segmented-file.patch, 0002-tests-for-MMaped-Compression-segmented-file-v2.patch
>
>
> CompressedSegmentedFile.getSegment seems to open a new file and doesn't seem to use the MMap, and hence causes higher CPU on the nodes and higher latencies on reads.
> This ticket is to implement the TODO mentioned in CompressedRandomAccessReader:
> // TODO refactor this to separate concept of "buffer to avoid lots of read() syscalls" and "compression buffer"
> but I think a separate class for the Buffer will be better.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira