[ https://issues.apache.org/jira/browse/HADOOP-8148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13258063#comment-13258063 ]
Tim Broberg commented on HADOOP-8148:
-------------------------------------

bq. Sorry for my ignorance in this area, but: this implies that hardware codecs are pipelined? In the LineReader use case, you're saying you would provide a much larger block to the codec ahead of what is being read out of the decompression side?

Yes, command/result handling, DMA transfer, and processing happen in parallel, which is crucial for small-packet performance. Given what I expect will be 32kB - 128kB blocks here, this isn't as huge an issue, but it's still non-trivial. The important pipelining will be performing stream I/O in parallel with (de)compression.

In addition to pipelining, there is parallelism. Non-low-end processors are multicore, so we will want to read several blocks ahead and process them in parallel. (One could also perform multithreaded software (de)compression in much the same fashion, but I have no plans to implement it.)

So, yes, the HW version of the CompressionInputStream would have a separate thread which preemptively reads records in and drops them into an input queue, plugging when that queue is full. The HW processes blocks from the input queue, dumping results into an output queue which feeds the read() call. So the HW is dumping data into buffers before read() gets a chance to provide one.

> Zero-copy ByteBuffer-based compressor / decompressor API
> --------------------------------------------------------
>
>                 Key: HADOOP-8148
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8148
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: io
>            Reporter: Tim Broberg
>            Assignee: Tim Broberg
>         Attachments: hadoop8148.patch
>
>
> Per Todd Lipcon's comment in HDFS-2834, "
> Whenever a native decompression codec is being used, ...
> we generally have the following copies:
> 1) Socket -> DirectByteBuffer (in SocketChannel implementation)
> 2) DirectByteBuffer -> byte[] (in SocketInputStream)
> 3) byte[] -> Native buffer (set up for decompression)
> 4*) decompression to a different native buffer (not really a copy - decompression necessarily rewrites)
> 5) native buffer -> byte[]
> With the proposed improvement we can hopefully eliminate #2 and #3 for all applications, and #2, #3, and #5 for libhdfs.
> "
> The interfaces in the attached patch attempt to address:
> A - Compression and decompression based on ByteBuffers (HDFS-2834)
> B - Zero-copy compression and decompression (HDFS-3051)
> C - Provide the caller a way to know the maximum space required to hold compressed output.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
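The two-queue read-ahead scheme described in the comment can be sketched roughly as below. This is a minimal illustration, not the patch's API: the class name `PipelinedDecompressor`, the `readBlock()` method, and the bounded queues are all hypothetical, and a single worker thread (which preserves block order) stands in for the hardware codec. A reader thread preemptively fills a bounded input queue, blocking when it is full; the worker drains it into an output queue that feeds the consumer's reads.

```java
import java.util.Arrays;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch of the threaded read-ahead described above:
// reader thread -> bounded input queue -> "codec" worker -> output queue -> readBlock().
public class PipelinedDecompressor {
    private static final byte[] POISON = new byte[0]; // end-of-stream sentinel

    // Bounded queues: the reader blocks ("plugs") when the input queue is full.
    private final BlockingQueue<byte[]> inputQ  = new ArrayBlockingQueue<>(4);
    private final BlockingQueue<byte[]> outputQ = new ArrayBlockingQueue<>(4);

    public PipelinedDecompressor(final byte[][] compressedBlocks) {
        // Read-ahead thread: drops blocks into the input queue ahead of any read().
        Thread reader = new Thread(() -> {
            try {
                for (byte[] b : compressedBlocks) inputQ.put(b);
                inputQ.put(POISON);
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });
        // Worker thread: stands in for the hardware codec, processing input
        // blocks and dumping results into the output queue.
        Thread worker = new Thread(() -> {
            try {
                for (byte[] b; (b = inputQ.take()) != POISON; )
                    outputQ.put(decompress(b));
                outputQ.put(POISON);
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });
        reader.start();
        worker.start();
    }

    // Placeholder "decompression": real code would drive the HW codec here.
    private static byte[] decompress(byte[] block) {
        return Arrays.copyOf(block, block.length);
    }

    /** Returns the next decompressed block, or null at end of stream. */
    public byte[] readBlock() throws InterruptedException {
        byte[] b = outputQ.take();
        return b == POISON ? null : b;
    }

    public static void main(String[] args) throws Exception {
        byte[][] blocks = { {1, 2}, {3, 4, 5} };
        PipelinedDecompressor d = new PipelinedDecompressor(blocks);
        for (byte[] b; (b = d.readBlock()) != null; )
            System.out.println(Arrays.toString(b)); // prints [1, 2] then [3, 4, 5]
    }
}
```

With more than one worker thread (the multicore parallelism mentioned above), output blocks could complete out of order, so a real implementation would also need to re-sequence results before handing them to read().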