[ https://issues.apache.org/jira/browse/SPARK-13004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wang, Gang closed SPARK-13004.
------------------------------
    Resolution: Later

Closing this temporarily while we prepare actionable items, as Sean Owen suggested.

> Support Non-Volatile Data and Operations
> ----------------------------------------
>
>                 Key: SPARK-13004
>                 URL: https://issues.apache.org/jira/browse/SPARK-13004
>             Project: Spark
>          Issue Type: Epic
>          Components: Input/Output, Spark Core
>    Affects Versions: 1.5.0, 1.6.0
>            Reporter: Wang, Gang
>              Labels: Non-VolatileRDD, Non-volatileComputing, RDD, performance
>
> Based on our experiments, SerDe-like operations have a significant
> negative performance impact on the majority of industrial Spark workloads,
> especially when the volume of the datasets is much larger than the system
> memory available in the Spark cluster for caching, checkpointing,
> shuffling/dispatching, and data loading and storing. JVM on-heap memory
> management also degrades performance under the pressure of large memory
> demand and frequent memory allocation/free operations.
> With the trend of adopting advanced server platform technologies, e.g.
> large-memory servers, non-volatile memory, and NVMe/fast SSD array storage,
> this project focuses on adopting the new features provided by these server
> platforms for Spark applications and on retrofitting the use of hybrid
> addressable memory resources onto Spark wherever possible.
> *Data Object Management*
> * Using our non-volatile generic object programming model (NVGOP) to avoid
> SerDe as well as reduce GC overhead.
> * Minimizing the memory footprint by loading data lazily.
> * Fitting RDD schemas naturally in non-volatile and off-heap RDDs.
> * Using non-volatile/off-heap RDDs to transform Spark datasets.
> * Avoiding in-memory caching by way of in-place non-volatile RDD
> operations.
> * Avoiding checkpoints in Spark computations.
> *Data Memory Management*
>
> * Managing heterogeneous memory devices as a unified hybrid memory cache
> pool for Spark.
> * Using non-volatile memory-like devices for Spark checkpoint and shuffle.
> * Supporting automatic reclamation of allocated memory blocks.
> * Providing a unified memory block API for general-purpose memory usage.
>
> *Computing Device Management*
> * AVX instructions, programmable FPGA and GPU.
>
> Our customized Spark prototype has shown some potential improvements.
> [https://github.com/NonVolatileComputing/spark/tree/NonVolatileRDD]
> !http://bigdata-memory.github.io/images/Spark_mlib_kmeans.png|width=300!
> !http://bigdata-memory.github.io/images/total_GC_STW_pausetime.png|width=300!
>
> This epic tries to further improve Spark performance with our
> non-volatile solutions.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)