[ 
https://issues.apache.org/jira/browse/SPARK-7075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-7075:
-------------------------------
    Description: 
Based on our observation, the majority of Spark workloads are bottlenecked not by 
I/O or network, but by CPU and memory. This project focuses on three areas to 
improve the memory and CPU efficiency of Spark applications and push 
performance closer to the limits of the underlying hardware.

1. Memory Management and Binary Processing: leveraging application semantics to 
manage memory explicitly and eliminate the overhead of the JVM object model and 
garbage collection
2. Cache-aware computation: algorithms and data structures that exploit the 
memory hierarchy
3. Code generation: using code generation to exploit modern compilers and CPUs

Several parts of project Tungsten leverage the DataFrame model, which gives us 
more semantics about the application. We will also retrofit the improvements 
onto Spark’s RDD API whenever possible.
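
As a rough illustration of the first area (binary processing), the sketch below 
packs records into one contiguous buffer instead of allocating one JVM object 
per record. It assumes a hypothetical fixed-width layout (an 8-byte id followed 
by a 4-byte age); the class name BinaryRows and its methods are invented for 
this example and are not Spark APIs or Spark's actual row format.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration only; not Spark's row format or API.
class BinaryRows {
    // Assumed fixed-width layout per record: 8-byte id, then 4-byte age.
    static final int ID_OFFSET = 0;
    static final int AGE_OFFSET = 8;
    static final int ROW_SIZE = 12;

    /** Packs parallel arrays of ids and ages into one contiguous buffer. */
    static ByteBuffer pack(long[] ids, int[] ages) {
        if (ids.length != ages.length) {
            throw new IllegalArgumentException("ids and ages must have equal length");
        }
        // A heap buffer keeps the sketch simple; ByteBuffer.allocateDirect
        // would place the same bytes outside the garbage-collected heap.
        ByteBuffer buf = ByteBuffer.allocate(ids.length * ROW_SIZE);
        for (int i = 0; i < ids.length; i++) {
            int base = i * ROW_SIZE;
            buf.putLong(base + ID_OFFSET, ids[i]);
            buf.putInt(base + AGE_OFFSET, ages[i]);
        }
        return buf;
    }

    /** Sums the age field by reading the bytes directly, creating no per-record objects. */
    static long sumAges(ByteBuffer buf, int rowCount) {
        long sum = 0;
        for (int i = 0; i < rowCount; i++) {
            sum += buf.getInt(i * ROW_SIZE + AGE_OFFSET);
        }
        return sum;
    }

    public static void main(String[] args) {
        ByteBuffer rows = pack(new long[] {1L, 2L, 3L}, new int[] {25, 30, 35});
        System.out.println(sumAges(rows, 3));
    }
}
```

Because every field sits at a known offset, the scan in sumAges touches memory 
sequentially, and the garbage collector sees a single buffer rather than one 
object per record.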


  was:
Based on our observation, the majority of Spark workloads are bottlenecked not by 
I/O or network, but by CPU and memory. This project aims to

1. Much more efficient memory usage & robustness
2. Much more efficient execution

We will start with the DataFrame API, which gives us more application 
semantics, and eventually improve core as well.


> Improving Physical Execution and Memory Management
> --------------------------------------------------
>
>                 Key: SPARK-7075
>                 URL: https://issues.apache.org/jira/browse/SPARK-7075
>             Project: Spark
>          Issue Type: Epic
>          Components: Block Manager, Shuffle, Spark Core, SQL
>            Reporter: Reynold Xin
>            Assignee: Reynold Xin
>
> Based on our observation, the majority of Spark workloads are bottlenecked not by 
> I/O or network, but by CPU and memory. This project focuses on three areas to 
> improve the memory and CPU efficiency of Spark applications and push 
> performance closer to the limits of the underlying hardware.
> 1. Memory Management and Binary Processing: leveraging application semantics 
> to manage memory explicitly and eliminate the overhead of the JVM object model 
> and garbage collection
> 2. Cache-aware computation: algorithms and data structures that exploit the 
> memory hierarchy
> 3. Code generation: using code generation to exploit modern compilers and CPUs
> Several parts of project Tungsten leverage the DataFrame model, which gives 
> us more semantics about the application. We will also retrofit the 
> improvements onto Spark’s RDD API whenever possible.



