[ 
https://issues.apache.org/jira/browse/CASSANDRA-4297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293852#comment-13293852
 ] 

Yuki Morishita commented on CASSANDRA-4297:
-------------------------------------------


V2 attached based on the review + some test related change.

bq. Would prefer to have CDIS just implement IS, and let callers wrap in DIS 
when desired, similar to how we use SnappyInputStream in IncomingTcpConnection, 
or LZFInputStream in ISR

CDIS now implements InputStream only and renamed to CompressedInputStream.

bq. Why the changes to OutboundTcpConnection?

The changes are made in order to obtain nio.SocketChannel, socket has to be 
created using SocketChannel.open.

bq. Re MS changes: when would header.file be null?

When a node requests range but target node doesn't have corresponding data. I 
reverted the change in MS to send at least send streaming header when 
header.file is null. It seems redundant but for now, it's necessary to 
terminate stream session of requesting node.

bq. Chunk[] sort can use Guava Longs.compare

done.

bq. I suggest adding a comment to explain why sort is necessary (b/c ranges are 
from replication strategy, so may not be sorted?) Instead of using Set + copy 
into array, why not use an ArrayList + trimToSize()

The reason why I use Set here is to eliminate duplicate chunks. Given two 
different file section can be mapped to just one chunk.

bq. is the FST comment // TODO just use a raw RandomAccessFile since we're 
managing our own buffer here obsolete? is the CompressedRandomAccessReader path 
used at all in FST anymore?

I removed CRAR from FST in v2. Even if nio is not available (in case of 
inter-node SSL), streaming uses CompressedFileStreamTask with socket's 
InputStream to transfer file directly.

{quote}
Nit: avoid double negation in if statements with else clauses
Nit: suggest moving serialization code for Chunk and CompressionParameters into 
ChunkSerializer and ChunkParametersSerializer classes, respectively, just to 
make the code discoverable for re-use later
{quote}

done.

bq. Should we make nio transfer the default for uncompressed sstables as well, 
and add an option to enable compression? Alternatively, now that compression is 
the default for new sstables, I'd be okay with removing LZF stream compression 
entirely

I don't do any benchmark, but I think always using LZF compression is fine when 
transferring uncompressed data.

bq. Does this over-transfer data on chunk boundaries? Put another way, do we 
stream data that doesn't actually belong on the target node? (I'm okay with 
this, just want to be clear about what's happening.)

Source node can send unrelated range of data inside chunk, but receiving node 
ignores (or skips) that part when reading from socket, so, the answer is no.

                
> Use java NIO as much as possible when streaming compressed SSTables
> -------------------------------------------------------------------
>
>                 Key: CASSANDRA-4297
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4297
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Yuki Morishita
>            Assignee: Yuki Morishita
>            Priority: Minor
>              Labels: streaming
>             Fix For: 1.2
>
>         Attachments: 4297-v2.txt, 4297.txt
>
>
> Back in 0.8, streaming uses java NIO (FileChannel#transferTo/transferFrom) to 
> perform zero copy file transfer between nodes. Since 1.0, in order to add new 
> features like sstable compression and internode encryption we had to switch 
> to java IO Input/OutputStreams. What we currently do to transfer compressed 
> SSTable is, in source node, 1) decompress chunk in SSTable, 2) compress using 
> LZF for network, and in destination node, 3) decompress using LZF as reading 
> from socket, 4) compress for SSTable on disk.
> Now, 1.1 comes out with SSTable compression turned on by default. It is 
> reasonable to transfer compressed file as is using NIO instead of 
> decompress/compress in source node.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Reply via email to