Re: Spark shuffle and inevitability of writing to Disk

2023-05-17 Thread Mich Talebzadeh
OK, I ran a small test which, if my assertion is valid, shows that the shuffle spills to memory and then to disk. The sample code I wrote is as follows:
import sys
from pyspark.sql import SparkSession
from pyspark import SparkContext
from pyspark.sql import SQLContext
from pyspark.sql import

Spark shuffle and inevitability of writing to Disk

2023-05-16 Thread Mich Talebzadeh
Hi,
On the issue of Spark shuffle, it is accepted that shuffle *often involves* some, if not all, of the following:
- Disk I/O
- Data serialization and deserialization
- Network I/O
Excluding the external shuffle service, and without relying on the configuration options provided by Spark for