[ https://issues.apache.org/jira/browse/SPARK-7075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14520499#comment-14520499 ]
Reynold Xin commented on SPARK-7075:
------------------------------------

Yup, I will post more thoughts and plans in the next few days.

> Project Tungsten: Improving Physical Execution and Memory Management
> --------------------------------------------------------------------
>
>                 Key: SPARK-7075
>                 URL: https://issues.apache.org/jira/browse/SPARK-7075
>             Project: Spark
>          Issue Type: Epic
>          Components: Block Manager, Shuffle, Spark Core, SQL
>            Reporter: Reynold Xin
>            Assignee: Reynold Xin
>
> Based on our observation, the majority of Spark workloads are not bottlenecked
> by I/O or network, but rather by CPU and memory. This project focuses on 3
> areas to improve the efficiency of memory and CPU for Spark applications, to
> push performance closer to the limits of the underlying hardware.
> 1. Memory Management and Binary Processing: leveraging application semantics
> to manage memory explicitly and eliminate the overhead of the JVM object model
> and garbage collection
> 2. Cache-aware computation: algorithms and data structures to exploit the
> memory hierarchy
> 3. Code generation: using code generation to exploit modern compilers and CPUs
> Several parts of Project Tungsten leverage the DataFrame model, which gives
> us more semantics about the application. We will also retrofit the
> improvements onto Spark’s RDD API whenever possible.