thanks a lot!

------- Original Message -------
On Saturday, July 2nd, 2022 at 6:07 PM, Sean Owen <sro...@gmail.com> wrote:
> I think that is more accurate, yes. Though shuffle files are also local, not
> on distributed storage, which is an advantage. MR also had map-only
> transforms and chained mappers, but they were harder to use. Not impossible,
> but you could also say Spark just made it easier to do the more efficient
> thing.
>
> On Sat, Jul 2, 2022, 9:34 AM krexos <kre...@protonmail.com.invalid> wrote:
>
>> You said Spark performs IO only when reading data and writing final data to
>> the disk. I thought by that you meant that it only reads the input files of
>> the job and writes the output of the whole job to the disk, but in reality
>> Spark does store intermediate results on disk, just in fewer places than MR
>>
>> ------- Original Message -------
>> On Saturday, July 2nd, 2022 at 5:27 PM, Sid <flinkbyhe...@gmail.com> wrote:
>>
>>> I have explained the same thing in layman's terms. Go through it
>>> once.
>>>
>>> On Sat, 2 Jul 2022, 19:45 krexos, <kre...@protonmail.com.invalid> wrote:
>>>
>>>> I think I understand where Spark saves IO.
>>>>
>>>> in MR we have map -> reduce -> map -> reduce -> map -> reduce ...
>>>>
>>>> which writes results to disk at the end of each such "arrow",
>>>>
>>>> on the other hand in Spark we have
>>>>
>>>> map -> reduce + map -> reduce + map -> reduce ...
>>>>
>>>> which saves about 2 times the IO
>>>>
>>>> thanks everyone,
>>>> krexos
>>>>
>>>> ------- Original Message -------
>>>> On Saturday, July 2nd, 2022 at 1:35 PM, krexos <kre...@protonmail.com>
>>>> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> One of the main "selling points" of Spark is that unlike Hadoop
>>>>> map-reduce that persists intermediate results of its computation to HDFS
>>>>> (disk), Spark keeps all its results in memory. I don't understand this as
>>>>> in reality when a Spark stage finishes [it writes all of the data into
>>>>> shuffle files stored on the
>>>>> disk](https://github.com/JerryLead/SparkInternals/blob/master/markdown/english/4-shuffleDetails.md).
>>>>> How then is this an improvement on map-reduce?
>>>>>
>>>>> Image from https://youtu.be/7ooZ4S7Ay6Y
>>>>>
>>>>> thanks!
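
The IO accounting discussed in the thread above (MR writing at the end of every "arrow", Spark fusing each reduce with the following map, and shuffle files living on local disk rather than distributed storage) can be sketched as a small counting model. This is only an illustration under stated assumptions, not a measurement: the names `WriteCounts`, `mapreduce_writes` and `spark_writes` are hypothetical, each "round" is one map -> reduce pair, and data sizes, replication, spills and caching are all ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WriteCounts:
    """Number of write phases in a chain of map -> reduce rounds (toy model)."""
    local_shuffle: int  # shuffle output written to the workers' local disks
    distributed: int    # output written to distributed storage such as HDFS

    @property
    def total(self) -> int:
        return self.local_shuffle + self.distributed


def mapreduce_writes(rounds: int) -> WriteCounts:
    """Chained MR jobs: each round shuffles map output via local disk,
    then writes its reduce output to distributed storage."""
    if rounds < 1:
        raise ValueError("rounds must be at least 1")
    return WriteCounts(local_shuffle=rounds, distributed=rounds)


def spark_writes(rounds: int) -> WriteCounts:
    """One Spark job: each shuffle boundary writes local shuffle files,
    and only the final result goes to distributed storage
    (assumption: each reduce is pipelined with the next map)."""
    if rounds < 1:
        raise ValueError("rounds must be at least 1")
    return WriteCounts(local_shuffle=rounds, distributed=1)


if __name__ == "__main__":
    for rounds in (1, 3, 10):
        mr = mapreduce_writes(rounds)
        sp = spark_writes(rounds)
        print(rounds, mr.total, sp.total, round(mr.total / sp.total, 2))
```

In this model 3 rounds give 6 write phases for MR against 4 for Spark, and the ratio 2n / (n + 1) approaches 2 as the chain grows, which is consistent with the "about 2 times" estimate. The split between the two fields also reflects the point that the writes Spark keeps are the cheaper local ones.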