The High-Performance Big Data (HiBD) team is pleased to announce the release of the RDMA for Apache Hadoop-2.x 0.9.8 package (for the Hadoop 2.x series) with the following features.

New features compared to RDMA for Apache Hadoop-2.x 0.9.7 are:

    - Based on Apache Hadoop 2.7.1
    - Compliant with Apache Hadoop 2.7.1 and Hortonworks Data Platform
      (HDP) 2.3.0.0 APIs and applications
    - Plugin-based architecture supporting RDMA-based designs for HDFS
      (HHH, HHH-M, HHH-L), MapReduce, MapReduce over Lustre and RPC, etc.
        - Plugin for Apache Hadoop distribution (tested with 2.7.1)
        - Plugin for Hortonworks Data Platform (HDP) (tested with 2.3.0.0)

The complete set of features for RDMA for Apache Hadoop-2.x 0.9.8:

    - Based on Apache Hadoop 2.7.1
    - High performance design with native InfiniBand and RoCE support
      at the verbs level for HDFS, MapReduce, and RPC components
    - Compliant with Apache Hadoop 2.7.1 and Hortonworks Data Platform
      (HDP) 2.3.0.0 APIs and applications
    - Plugin-based architecture supporting RDMA-based designs for HDFS
      (HHH, HHH-M, HHH-L), MapReduce, MapReduce over Lustre and RPC, etc.
        - Plugin for Apache Hadoop distribution (tested with 2.7.1)
        - Plugin for Hortonworks Data Platform (HDP) (tested with 2.3.0.0)
    - Supports deploying Hadoop with Slurm and PBS in different running
      modes (HHH, HHH-M, HHH-L, and MapReduce over Lustre)
    - Easily configurable for different running modes (HHH, HHH-M, HHH-L,
      and MapReduce over Lustre) and different protocols (native
      InfiniBand, RoCE, and IPoIB)
    - On-demand connection setup
    - HDFS over native InfiniBand and RoCE
        - RDMA-based write
        - RDMA-based replication
        - Parallel replication support
        - Overlapping in different stages of write and replication
        - Enhanced hybrid HDFS design with in-memory and heterogeneous
          storage (HHH)
            - Supports three modes of operation
                - HHH (default) with I/O operations over RAM disk, SSD,
                  and HDD
                - HHH-M (in-memory) with I/O operations in-memory
                - HHH-L (Lustre-integrated) with I/O operations in local
                  storage and Lustre
            - Policies to efficiently utilize heterogeneous storage
              devices (RAM Disk, SSD, HDD, and Lustre)
                - Greedy and Balanced policies support
                - Automatic policy selection based on available storage
                  types
            - Hybrid replication (in-memory and persistent storage) for
              HHH default mode
            - Memory replication (in-memory only with lazy persistence)
              for HHH-M mode
            - Lustre-based fault-tolerance for HHH-L mode
                - No HDFS replication
                - Reduced local storage space usage
    - MapReduce over native InfiniBand and RoCE
        - RDMA-based shuffle
        - Pre-fetching and caching of map output
        - In-memory merge
        - Advanced optimization in overlapping
            - map, shuffle, and merge
            - shuffle, merge, and reduce
        - Optional disk-assisted shuffle
        - High performance design of MapReduce over Lustre
            - Supports two shuffle approaches
                - Lustre read based shuffle
                - RDMA based shuffle
            - Hybrid shuffle based on both shuffle approaches
                - Configurable distribution support
            - In-memory merge and overlapping of different phases
    - RPC over native InfiniBand and RoCE
        - JVM-bypassed buffer management
        - RDMA or send/recv based adaptive communication
        - Intelligent buffer allocation and adjustment for serialization
    - Tested with
        - Mellanox InfiniBand adapters (DDR, QDR, and FDR)
        - RoCE support with Mellanox adapters
        - Various multi-core platforms
        - RAM Disks, SSDs, HDDs, and Lustre

To download the RDMA for Apache Hadoop-2.x 0.9.8 package and the associated user guide, please visit the following URL:

http://hibd.cse.ohio-state.edu

Sample performance numbers for benchmarks using RDMA for Apache Hadoop-2.x 0.9.8 can be viewed by visiting the `Performance' tab of the above website.

All questions, feedback and bug reports are welcome. Please post them to the rdma-hadoop-discuss mailing list (rdma-hadoop-discuss at cse.ohio-state.edu).

Thanks,

The High-Performance Big Data (HiBD) Team