spark streaming disk hit

2015-07-21 Thread Abhishek R. Singh
Is it fair to say that Storm stream processing is completely in memory, whereas Spark Streaming would take a disk hit because of how shuffle works? Does Spark Streaming try to avoid disk usage out of the box? -Abhishek

Re: spark streaming disk hit

2015-07-21 Thread Abhishek R. Singh
Thanks TD - appreciate the response!

On Jul 21, 2015, at 1:54 PM, Tathagata Das t...@databricks.com wrote:
> Most shuffle files are really kept around in the OS's buffer/disk cache, so it is still pretty much in memory. If you are concerned about performance, you have to do a holistic

Re: spark streaming disk hit

2015-07-21 Thread Tathagata Das
Most shuffle files are really kept around in the OS's buffer/disk cache, so it is still pretty much in memory. If you are concerned about performance, you have to do a holistic comparison for end-to-end performance. You could take a look at this.
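
To illustrate the buffer-cache point above, here is a minimal sketch in plain Python. It is not Spark code and says nothing about how Spark writes shuffle files internally. It only shows the general OS behaviour being described: a file that was just written (without forcing a sync) is usually read back from the page cache rather than from the physical disk. The function name write_then_read and the 8 MiB default size are made up for this example, and the timing you see depends entirely on your OS, filesystem, and available memory.

```python
import os
import tempfile
import time


def write_then_read(num_bytes=8 * 1024 * 1024):
    """Write a temp file, read it straight back, and time the read.

    Hypothetical helper for illustration only. Returns the bytes
    written, the bytes read, and the read time in seconds.
    """
    payload = os.urandom(num_bytes)

    # Write without calling fsync, so the OS is free to keep the
    # freshly written pages in its buffer/page cache.
    with tempfile.NamedTemporaryFile(delete=False) as f:
        path = f.name
        f.write(payload)

    try:
        start = time.perf_counter()
        with open(path, "rb") as f:
            data = f.read()
        elapsed = time.perf_counter() - start
    finally:
        # Always clean up the temp file, even if the read fails.
        os.remove(path)

    return payload, data, elapsed


if __name__ == "__main__":
    written, read_back, seconds = write_then_read()
    mib = len(read_back) / (1024 * 1024)
    print("read %.1f MiB back in %.4f s" % (mib, seconds))
```

A read that completes at memory-like speed suggests the data came from the cache, but a micro-benchmark like this is no substitute for the end-to-end comparison recommended above.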