[GitHub] incubator-spark pull request: Adding an option to persist Spark RD...

alig Thu, 13 Feb 2014 15:24:16 -0800

Github user alig commented on the pull request:

    https://github.com/apache/incubator-spark/pull/468#issuecomment-35038876
  
    @RongGu @haoyuan Please also add a new file called ```tachyon.md``` in docs 
similar to this:
    https://github.com/apache/incubator-spark/blob/master/docs/ec2-scripts.md
    
    That file should simply explain that Spark has a block manager inside the 
Executors that lets you choose memory, disk, or Tachyon. The last option stores 
RDDs off-heap, outside the Executor JVM, on top of the Tachyon memory management 
system. This has the following advantages: a) an executor crash won't lose the 
cached data; b) executors can have a smaller memory footprint, allowing you to 
run more executors on the same machine, since the bulk of the memory will be 
inside Tachyon; c) there won't be GC overhead for data stored in the cache.
    
    You can link to the Tachyon homepage for installation instructions, but 
please describe in this document how to configure the block manager to use 
Tachyon. 
    
    Many thanks!
