[ https://issues.apache.org/jira/browse/HDFS-4953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Todd Lipcon updated HDFS-4953:
------------------------------
    Attachment: benchmark.png

Attached a benchmark plot I did in support of this work. The benchmark times processing a 1GB file which fits in buffer cache. The "processing" here is summing up the entire file as if it's a bunch of integers end-to-end (written using SSE, etc. to be as efficient as possible).

The various items plotted here are:
- h: the current libhdfs code, with SCR (short-circuit reads), but without ZCR (zero-copy reads). Averages around 3GB/sec or so.
- m: a C program which mallocs 1GB, reads the data into that buffer, and then runs the analysis on the malloced buffer. This is the upper bound performance. Gets about 8GB/sec.
- mmap-each: a C program which opens a local file, and on each iteration of processing, calls "mmap" and then processes it. Gets about 3GB/sec. "perf top" indicates that this is slow because of page table entry population overhead (minor page faults).
- mmap-populate-each: the same, but with the MAP_POPULATE flag. Gets around 4.5GB/sec. This is faster because it pre-populates the page table entries.
- mmap-once: the same, but only mmaps once, and doesn't count the mmap time. Gets around the same speed as the "m" (malloc) path.
- z: the ZCR implementation _without_ the mmap caching. Gets the same as mmap-each, more or less, because of the same PTE faulting overhead.

These graphs show why we have to have the mmap cache, and indicate that, with that cache, we should be in the same ballpark as the optimal (~9GB/sec/core).

> enable HDFS local reads via mmap
> --------------------------------
>
>                 Key: HDFS-4953
>                 URL: https://issues.apache.org/jira/browse/HDFS-4953
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>    Affects Versions: 2.2.0
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>         Attachments: benchmark.png, HDFS-4953.001.patch, HDFS-4953.002.patch
>
>
> Currently, the short-circuit local read pathway allows HDFS clients to access
> files directly without going through the DataNode.
> However, all of these reads involve a copy at the operating system level,
> since they rely on the read() / pread() / etc. family of kernel interfaces.
> We would like to enable HDFS to read local files via mmap. This would enable
> truly zero-copy reads.
> In the initial implementation, zero-copy reads will only be performed when
> checksums are disabled. Later, we can use the DataNode's cache awareness to
> only perform zero-copy reads when we know that the checksum has already been
> verified.
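
For anyone wanting to reproduce the three local-file mmap variants from the benchmark comment (mmap-each, mmap-populate-each, mmap-once), here is a rough sketch of their shape. It is not the code behind benchmark.png. The function names (sum_u32, map_file, sum_map_each, sum_map_once) are made up for illustration, the summing loop is plain scalar C rather than SSE, timing is left out, and it assumes Linux (MAP_POPULATE is Linux-specific) and a file whose size is a multiple of 4 bytes (any trailing bytes are ignored).

```c
#define _GNU_SOURCE          /* assumption: Linux, needed for MAP_POPULATE */
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Sum a buffer as native-endian 32-bit integers (scalar loop, no SSE). */
uint64_t sum_u32(const uint32_t *p, size_t n)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += p[i];
    return total;
}

/* Map a whole file read-only. extra_flags is 0 or MAP_POPULATE.
 * Returns MAP_FAILED on error (including an empty file). */
void *map_file(const char *path, int extra_flags, size_t *len_out)
{
    assert(len_out != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return MAP_FAILED;
    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0) {
        close(fd);
        return MAP_FAILED;
    }
    void *addr = mmap(NULL, (size_t)st.st_size, PROT_READ,
                      MAP_SHARED | extra_flags, fd, 0);
    close(fd);               /* the mapping stays valid after close */
    if (addr != MAP_FAILED)
        *len_out = (size_t)st.st_size;
    return addr;
}

/* "mmap-each" (extra_flags = 0) or "mmap-populate-each"
 * (extra_flags = MAP_POPULATE): map, sum, and unmap on every pass,
 * so the page table entries are rebuilt each time. */
int sum_map_each(const char *path, int extra_flags, int passes, uint64_t *out)
{
    uint64_t total = 0;
    for (int i = 0; i < passes; i++) {
        size_t len = 0;
        void *addr = map_file(path, extra_flags, &len);
        if (addr == MAP_FAILED)
            return -1;
        total += sum_u32(addr, len / sizeof(uint32_t));
        munmap(addr, len);
    }
    *out = total;
    return 0;
}

/* "mmap-once": map a single time and reuse the mapping for every pass,
 * which is roughly what an mmap cache would buy the ZCR path. */
int sum_map_once(const char *path, int passes, uint64_t *out)
{
    size_t len = 0;
    void *addr = map_file(path, 0, &len);
    if (addr == MAP_FAILED)
        return -1;
    uint64_t total = 0;
    for (int i = 0; i < passes; i++)
        total += sum_u32(addr, len / sizeof(uint32_t));
    munmap(addr, len);
    *out = total;
    return 0;
}
```

To compare the variants, wrap each call in a timer (clock_gettime, for example) and run it against a file that is already in buffer cache. In sum_map_once the first pass still takes the minor page faults, so a fair "mmap-once" number would exclude that pass, just as the benchmark excludes the mmap time.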