The High-Performance Big Data (HiBD) team is pleased to announce the release of the RDMA for Apache Hadoop-2.x 0.9.6 package (for the Hadoop 2.x series) with the following features.

* RDMA for Apache Hadoop-2.x 0.9.6 Features

    - Based on Apache Hadoop 2.6.0
    - High performance design with native InfiniBand and RoCE support
      at the verbs level for HDFS, MapReduce, and RPC components
    - Compliant with Apache Hadoop 2.6.0 APIs and applications
    - Easily configurable for different running modes (HHH, HHH-M,
      HHH-L, and MapReduce over Lustre) and different protocols
      (native InfiniBand, RoCE, and IPoIB)
    - On-demand connection setup
    - HDFS over native InfiniBand and RoCE
        - RDMA-based write
        - RDMA-based replication
        - Parallel replication support
        - Overlapping in different stages of write and replication
        - Enhanced hybrid HDFS design with in-memory and heterogeneous
          storage (HHH)
            - Supports three modes of operation
                - HHH (default) with I/O operations over RAM disk,
                  SSD, and HDD
                - HHH-M (in-memory) with I/O operations in-memory
                - HHH-L (Lustre-integrated) with I/O operations in
                  local storage and Lustre
            - Policies to efficiently utilize heterogeneous storage
              devices (RAM disk, SSD, HDD, and Lustre)
                - Greedy and Balanced policies support
                - Automatic policy selection based on available
                  storage types
            - Hybrid replication (in-memory and persistent storage)
              for HHH default mode
            - Memory replication (in-memory only with lazy
              persistence) for HHH-M mode
            - Lustre-based fault-tolerance for HHH-L mode
                - No HDFS replication
                - Reduced local storage space usage
    - MapReduce over native InfiniBand and RoCE
        - RDMA-based shuffle
        - Pre-fetching and caching of map output
        - In-memory merge
        - Advanced optimization in overlapping
            - map, shuffle, and merge
            - shuffle, merge, and reduce
        - Optional disk-assisted shuffle
        - High performance design of MapReduce over Lustre
            - Supports two shuffle approaches
                - Lustre read based shuffle
                - RDMA based shuffle
            - Hybrid shuffle based on both shuffle approaches
                - Configurable distribution support
            - In-memory merge and overlapping of different phases
    - RPC over native InfiniBand and RoCE
        - JVM-bypassed buffer management
        - RDMA or send/recv based adaptive communication
        - Intelligent buffer allocation and adjustment for
          serialization
    - Tested with
        - Mellanox InfiniBand adapters (DDR, QDR, and FDR)
        - RoCE support with Mellanox adapters
        - Various multi-core platforms
        - RAM disks, SSDs, HDDs, and Lustre

* Bug Fixes (since RDMA for Apache Hadoop-2.x 0.9.5)

    - Fix a hang issue when running WordCount-like benchmarks
        - Thanks to Amit Sangroya@TCS for reporting the issue
    - Fix an issue with the NameNode running in HA-enabled mode
        - Thanks to Qihu Yang@AsiaInfo for reporting the issue

To download the RDMA for Apache Hadoop-2.x 0.9.6 package and the
associated user guide, please visit the following URL:

http://hibd.cse.ohio-state.edu

Sample performance numbers for benchmarks using the RDMA for Apache
Hadoop-2.x 0.9.6 version can be viewed by visiting the `Performance'
tab of the above website.

All questions, feedback, and bug reports are welcome. Please post them
to the rdma-hadoop-discuss mailing list (rdma-hadoop-discuss at
cse.ohio-state.edu).

Thanks,

The High-Performance Big Data (HiBD) Team