[ https://issues.apache.org/jira/browse/CASSANDRA-4297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293852#comment-13293852 ]
Yuki Morishita commented on CASSANDRA-4297: ------------------------------------------- V2 attached based on the review + some test related change. bq. Would prefer to have CDIS just implement IS, and let callers wrap in DIS when desired, similar to how we use SnappyInputStream in IncomingTcpConnection, or LZFInputStream in ISR CDIS now implements InputStream only and renamed to CompressedInputStream. bq. Why the changes to OutboundTcpConnection? The changes are made in order to obtain nio.SocketChannel, socket has to be created using SocketChannel.open. bq. Re MS changes: when would header.file be null? When a node requests range but target node doesn't have corresponding data. I reverted the change in MS to send at least send streaming header when header.file is null. It seems redundant but for now, it's necessary to terminate stream session of requesting node. bq. Chunk[] sort can use Guava Longs.compare done. bq. I suggest adding a comment to explain why sort is necessary (b/c ranges are from replication strategy, so may not be sorted?) Instead of using Set + copy into array, why not use an ArrayList + trimToSize() The reason why I use Set here is to eliminate duplicate chunks. Given two different file section can be mapped to just one chunk. bq. is the FST comment // TODO just use a raw RandomAccessFile since we're managing our own buffer here obsolete? is the CompressedRandomAccessReader path used at all in FST anymore? I removed CRAR from FST in v2. Even if nio is not available (in case of inter-node SSL), streaming uses CompressedFileStreamTask with socket's InputStream to transfer file directly. {quote} Nit: avoid double negation in if statements with else clauses Nit: suggest moving serialization code for Chunk and CompressionParameters into ChunkSerializer and ChunkParametersSerializer classes, respectively, just to make the code discoverable for re-use later {quote} done. bq. Should we make nio transfer the default for uncompressed sstables as well, and add an option to enable compression? Alternatively, now that compression is the default for new sstables, I'd be okay with removing LZF stream compression entirely I don't do any benchmark, but I think always using LZF compression is fine when transferring uncompressed data. bq. Does this over-transfer data on chunk boundaries? Put another way, do we stream data that doesn't actually belong on the target node? (I'm okay with this, just want to be clear about what's happening.) Source node can send unrelated range of data inside chunk, but receiving node ignores (or skips) that part when reading from socket, so, the answer is no. > Use java NIO as much as possible when streaming compressed SSTables > ------------------------------------------------------------------- > > Key: CASSANDRA-4297 > URL: https://issues.apache.org/jira/browse/CASSANDRA-4297 > Project: Cassandra > Issue Type: Improvement > Reporter: Yuki Morishita > Assignee: Yuki Morishita > Priority: Minor > Labels: streaming > Fix For: 1.2 > > Attachments: 4297-v2.txt, 4297.txt > > > Back in 0.8, streaming uses java NIO (FileChannel#transferTo/transferFrom) to > perform zero copy file transfer between nodes. Since 1.0, in order to add new > features like sstable compression and internode encryption we had to switch > to java IO Input/OutputStreams. What we currently do to transfer compressed > SSTable is, in source node, 1) decompress chunk in SSTable, 2) compress using > LZF for network, and in destination node, 3) decompress using LZF as reading > from socket, 4) compress for SSTable on disk. > Now, 1.1 comes out with SSTable compression turned on by default. It is > reasonable to transfer compressed file as is using NIO instead of > decompress/compress in source node. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira