[ https://issues.apache.org/jira/browse/HADOOP-8148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13258063#comment-13258063 ]
Tim Broberg commented on HADOOP-8148:
-------------------------------------

bq. Sorry for my ignorance in this area, but: this implies that hardware codecs are pipelined? In the LineReader use case, you're saying you would provide a much larger block to the codec ahead of what is being read out of the decompression side?

Yes, command/result handling, DMA transfer, and processing happen in parallel, which is crucial for small-packet performance. Given what I expect will be 32kB - 128kB blocks here, this isn't as huge an issue, but it's still non-trivial. The important pipelining will be performing stream I/O in parallel with (de)compression.

In addition to pipelining, there is parallelism. Non-low-end processors are multicore, so we will want to read several blocks ahead and process them in parallel. (One could also perform multithreaded software (de)compression in much the same fashion, but I have no plans to implement it.)

So, yes, the HW version of the CompressionInputStream would have a separate thread which preemptively reads records in and drops them into an input queue, plugging when that queue is full. The HW processes blocks from the input queue, dumping results into an output queue which feeds the read() call. So the HW is dumping data into buffers before read() gets a chance to provide one.

> Zero-copy ByteBuffer-based compressor / decompressor API
> --------------------------------------------------------
>
>                 Key: HADOOP-8148
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8148
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: io
>            Reporter: Tim Broberg
>            Assignee: Tim Broberg
>         Attachments: hadoop8148.patch
>
>
> Per Todd Lipcon's comment in HDFS-2834, "
> Whenever a native decompression codec is being used, ...
> we generally have the following copies:
> 1) Socket -> DirectByteBuffer (in SocketChannel implementation)
> 2) DirectByteBuffer -> byte[] (in SocketInputStream)
> 3) byte[] -> Native buffer (set up for decompression)
> 4*) decompression to a different native buffer (not really a copy - decompression necessarily rewrites)
> 5) native buffer -> byte[]
> With the proposed improvement we can hopefully eliminate #2 and #3 for all applications, and #2, #3, and #5 for libhdfs.
> "
> The interfaces in the attached patch attempt to address:
> A - Compression and decompression based on ByteBuffers (HDFS-2834)
> B - Zero-copy compression and decompression (HDFS-3051)
> C - Provide the caller a way to know the maximum space required to hold compressed output.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
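The two-queue read-ahead scheme described in the comment can be sketched roughly as below. This is a minimal illustration, not the patch's API: the class name `PipelinedDecompressor`, the `readBlock()` method, and the bounded queues are all hypothetical, and a single worker thread (which preserves block order) stands in for the hardware codec. A reader thread preemptively fills a bounded input queue, blocking when it is full; the worker drains it into an output queue that feeds the consumer's reads.

```java
import java.util.Arrays;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch of the threaded read-ahead described above:
// reader thread -> bounded input queue -> "codec" worker -> output queue -> readBlock().
public class PipelinedDecompressor {
    private static final byte[] POISON = new byte[0]; // end-of-stream sentinel

    // Bounded queues: the reader blocks ("plugs") when the input queue is full.
    private final BlockingQueue<byte[]> inputQ  = new ArrayBlockingQueue<>(4);
    private final BlockingQueue<byte[]> outputQ = new ArrayBlockingQueue<>(4);

    public PipelinedDecompressor(final byte[][] compressedBlocks) {
        // Read-ahead thread: drops blocks into the input queue ahead of any read().
        Thread reader = new Thread(() -> {
            try {
                for (byte[] b : compressedBlocks) inputQ.put(b);
                inputQ.put(POISON);
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });
        // Worker thread: stands in for the hardware codec, processing input
        // blocks and dumping results into the output queue.
        Thread worker = new Thread(() -> {
            try {
                for (byte[] b; (b = inputQ.take()) != POISON; )
                    outputQ.put(decompress(b));
                outputQ.put(POISON);
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });
        reader.start();
        worker.start();
    }

    // Placeholder "decompression": real code would drive the HW codec here.
    private static byte[] decompress(byte[] block) {
        return Arrays.copyOf(block, block.length);
    }

    /** Returns the next decompressed block, or null at end of stream. */
    public byte[] readBlock() throws InterruptedException {
        byte[] b = outputQ.take();
        return b == POISON ? null : b;
    }

    public static void main(String[] args) throws Exception {
        byte[][] blocks = { {1, 2}, {3, 4, 5} };
        PipelinedDecompressor d = new PipelinedDecompressor(blocks);
        for (byte[] b; (b = d.readBlock()) != null; )
            System.out.println(Arrays.toString(b)); // prints [1, 2] then [3, 4, 5]
    }
}
```

With more than one worker thread (the multicore parallelism mentioned above), output blocks could complete out of order, so a real implementation would also need to re-sequence results before handing them to read().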