Re: about cpu cores

2022-07-11 Thread Yong Walt
(can be set to >1
>> too, but that's a different story). If there are more tasks ready to
>> execute than available cores, some tasks simply wait.
>>
>> On Sun, Jul 10, 2022 at 3:31 AM Yong Walt wrote:
>>
>>> given my spark cluster has 128 cores totally.
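
The waiting behaviour described in the quoted reply can be pictured with a plain JVM thread pool. This is only an analogy and not Spark's scheduler: the object `CoreSlotsSketch`, the method `simulate`, and the numbers 4 and 10 are made up for illustration, with each pool thread standing in for one core.

```scala
import java.util.concurrent.{Executors, TimeUnit}
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical sketch: a fixed pool of `cores` threads plays the role of the
// cluster's cores. It is not how Spark schedules tasks internally.
object CoreSlotsSketch {
  // Submits `tasks` short tasks and returns
  // (peak number running at the same time, number completed).
  def simulate(cores: Int, tasks: Int): (Int, Int) = {
    val pool = Executors.newFixedThreadPool(cores)
    val running = new AtomicInteger(0)
    val peak = new AtomicInteger(0)
    val done = new AtomicInteger(0)
    (1 to tasks).foreach { _ =>
      pool.execute(new Runnable {
        def run(): Unit = {
          val now = running.incrementAndGet()
          peak.updateAndGet(p => math.max(p, now)) // record the highest concurrency seen
          Thread.sleep(20)                         // pretend to do some work
          running.decrementAndGet()
          done.incrementAndGet()
        }
      })
    }
    pool.shutdown()                            // no new tasks; queued ones still run
    pool.awaitTermination(1, TimeUnit.MINUTES) // wait for the queue to drain
    (peak.get, done.get)
  }
}
```

Calling `CoreSlotsSketch.simulate(4, 10)` never reports a peak above 4, yet all 10 tasks finish: the extra tasks sit in the queue until a thread frees up.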

about cpu cores

2022-07-10 Thread Yong Walt
Given that my Spark cluster has 128 cores in total, what will happen if I submit more than 128 jobs to it (each job is assigned only one core)? Thank you.

Re: Will it lead to OOM error?

2022-06-22 Thread Yong Walt
We have many cases like this. It won't cause OOM.

Thanks

On Wed, Jun 22, 2022 at 8:28 PM Sid wrote:

> I have a 150TB CSV file.
>
> I have a total of 100 TB RAM and 100TB disk. So If I do something like this
>
> spark.read.option("header","true").csv(filepath).show(false)
>
> Will it lead to an
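
One likely reason for the answer above, assuming default behaviour, is that `show` prints only the first 20 rows, so the whole file never has to sit in memory at once. The plain-Scala sketch below illustrates the same idea without Spark; `HeadSketch` and `head` are hypothetical names, and this is not what Spark does internally.

```scala
import java.nio.file.Path
import scala.io.Source

// Hypothetical sketch: read a header line plus the first `n` data rows lazily.
object HeadSketch {
  def head(path: Path, n: Int): List[String] = {
    val src = Source.fromFile(path.toFile)
    // getLines() is a lazy iterator, so only n + 1 lines are materialized
    // regardless of how large the file is.
    try src.getLines().take(n + 1).toList
    finally src.close()
  }
}
```

The memory needed here depends on `n`, not on the size of the file.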

Re: Spark Doubts

2022-06-21 Thread Yong Walt
These are the basic concepts in Spark :) You may want to take a little time to read this small book:
https://cloudcache.net/resume/PDDWS2-V2.pdf

regards

On Wed, Jun 22, 2022 at 3:17 AM Sid wrote:

> Hi Team,
>
> I have a few doubts about the below questions:
>
> 1) data frame will reside where? memory?

Re: input file size

2022-06-18 Thread Yong Walt
import java.io.File

// Size in bytes of a file on the local filesystem
val someFile = new File("somefile.txt")
val fileSize = someFile.length

This one?

On Sun, Jun 19, 2022 at 4:33 AM mbreuer wrote:

> Hello Community,
>
> I am working on optimizations for file sizes and number of files. In the
> data frame there is a function input_file_name
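
Since the question is about many files, the same `java.io.File` call can be applied to a whole directory. This is a minimal sketch that assumes the files live on a local filesystem; `FileSizes` and `sizes` are hypothetical names, and paths on HDFS or an object store would need a different API.

```scala
import java.io.File

// Hypothetical helper: maps each regular file directly under `dir`
// to its size in bytes. Local filesystem only.
object FileSizes {
  def sizes(dir: File): Map[String, Long] =
    Option(dir.listFiles())            // listFiles() returns null if dir is missing
      .getOrElse(Array.empty[File])
      .filter(_.isFile)                // skip subdirectories
      .map(f => f.getName -> f.length())
      .toMap
}
```

The resulting map gives both the number of files (its size) and the individual file sizes (its values).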