Re: What could be the cause of an execution freeze on Hadoop for small datasets?

2023-03-11 Thread sam smith
"In this case your program may work because effectively you are not using the spark in yarn on the hadoop cluster." I am actually using YARN, as mentioned (client mode). I already know that, but it is not just about collectAsList: the execution also freezes, for example, when using save() on the

Re: What could be the cause of an execution freeze on Hadoop for small datasets?

2023-03-11 Thread Mich Talebzadeh
collectAsList brings all the data into the driver, which is a single JVM on a single node. In this case your program may work because effectively you are not using the spark in yarn on the hadoop cluster. The benefit of Spark is that you can process a large amount of data using the memory and
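
To illustrate the point about collectAsList pulling everything into the driver, here is a toy model in plain Python. It does not use Spark at all: lists stand in for partitions, and every function name is hypothetical. It is only a sketch of the data movement, assuming a collect-style action copies every row to one process while a count-style action sends back one number per partition.

```python
# Toy model only: plain Python lists stand in for Spark partitions.
# No Spark API is used; all names here are hypothetical.

def make_partitions(num_rows, num_partitions):
    """Split row ids 0..num_rows-1 round-robin across partitions."""
    partitions = [[] for _ in range(num_partitions)]
    for row in range(num_rows):
        partitions[row % num_partitions].append(row)
    return partitions

def collect_as_list(partitions):
    """Mimic a collect-style action: every row is copied into one driver-side list."""
    driver_rows = []
    for part in partitions:
        driver_rows.extend(part)
    return driver_rows

def count(partitions):
    """Mimic a count-style action: each partition reports one number, the driver sums them."""
    partial_counts = [len(part) for part in partitions]
    return sum(partial_counts), len(partial_counts)

partitions = make_partitions(num_rows=1000, num_partitions=4)
rows_on_driver = collect_as_list(partitions)
total, values_sent_to_driver = count(partitions)
```

In this model the collect-style call leaves all 1000 rows in a single list, whereas the count-style call only moves four integers. Whether that difference explains the freeze reported in this thread is not established by the messages shown here.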

Re: What could be the cause of an execution freeze on Hadoop for small datasets?

2023-03-11 Thread sam smith
Not sure what you mean by your question, but it is not helping in any case. On Sat, 11 Mar 2023 at 19:54, Mich Talebzadeh wrote: > > ... To note that if I execute collectAsList on the dataset at the beginning of the program > What do you think collectAsList does?

Re: What could be the cause of an execution freeze on Hadoop for small datasets?

2023-03-11 Thread Mich Talebzadeh
"... To note that if I execute collectAsList on the dataset at the beginning of the program" What do you think collectAsList does?

What could be the cause of an execution freeze on Hadoop for small datasets?

2023-03-11 Thread sam smith
Hello guys, I am launching a Spark program from code (client mode) to run on Hadoop. If I call methods such as show(), count(), or collectAsList() on the dataset (these show up in the Spark UI) after performing heavy transformations on the columns, then the mentioned methods