GitHub user staple commented on the pull request:

    https://github.com/apache/spark/pull/2362#issuecomment-55681425
  
    @mengxr I ran for 100 iterations. I loaded the data from disk using Python's 
SparkContext.pickleFile() (the disk is an SSD) and did not do any manual caching. 
For more details, you can also see the test script I included in my description 
above.
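
    For reference, the load pattern was roughly the following (a minimal 
sketch, not the exact test script; the app name and path are placeholders):

        from pyspark import SparkContext

        sc = SparkContext(appName="pickleFile-benchmark")  # placeholder app name

        # Load an RDD of pickled Python objects previously written with
        # saveAsPickleFile(); note that no .cache()/.persist() call is made.
        data = sc.pickleFile("/path/to/pickled/records")  # placeholder path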
    
    I also saved the logs from my test runs, in case those are helpful to see. 
During the 10M record run I saw many log messages of the form 'CacheManager: Not 
enough space to cache partition', which I interpreted as partitions failing to 
cache due to memory exhaustion. But I haven't diagnosed the slowdown beyond that.
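
    On that interpretation: as I understand it, with a memory-only storage 
level the CacheManager drops partitions that do not fit and recomputes them 
later, rather than spilling them to disk. Purely as a hypothetical 
illustration (the test above does no manual caching), continuing the sketch 
from before, persisting with MEMORY_AND_DISK would spill instead:

        from pyspark import StorageLevel

        # Hypothetical only -- not part of the test script, which does no
        # manual caching. MEMORY_AND_DISK spills partitions that do not fit
        # in memory to disk instead of dropping and recomputing them.
        data.persist(StorageLevel.MEMORY_AND_DISK)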

