Dear Sid,

You are asking questions whose answers are already covered on the Apache Spark website, in books, in MOOCs, and in other online resources.

For example, take a look at this one: https://sparkbyexamples.com/spark/spark-dataframe-cache-and-persist-explained/

https://spark.apache.org/docs/latest/sql-programming-guide.html
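
To make question 1 concrete, here is a minimal sketch of caching a DataFrame (the session setup and the input path are placeholders, not from your setup):

import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

val spark = SparkSession.builder().appName("cache-demo").getOrCreate()

// A DataFrame is a lazy query plan: nothing lives in memory or on disk
// until an action runs, and by default nothing is retained afterwards.
val df = spark.read.json("/data/events.json")  // hypothetical input path

// For DataFrames, cache() means MEMORY_AND_DISK: partitions are kept
// in memory and spilled to local disk when they do not fit.
df.cache()
df.count()  // the first action materializes the cached partitions

// persist() lets you choose the storage level explicitly, e.g.:
// df.persist(StorageLevel.MEMORY_ONLY)

df.unpersist()  // release the cached blocks when done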

What do you mean by question 2?

About question 3, it depends on how you load the file. For example, if you have a text file in HDFS and you create an RDD from it, the number of partitions initially equals the number of HDFS blocks, unless you specify the number of partitions when you create the RDD from the file.
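
A small sketch of both cases (reusing the spark session from the sketch above; the HDFS path and the 128 MB default block size are assumptions):

// A 1 GB text file stored in 128 MB HDFS blocks occupies 8 blocks,
// so the RDD initially gets 8 partitions:
val rdd = spark.sparkContext.textFile("hdfs:///data/big.txt")
println(rdd.getNumPartitions)  // 8 in this scenario

// Passing minPartitions asks Spark to split the input more finely:
val rdd2 = spark.sparkContext.textFile("hdfs:///data/big.txt", 32)
println(rdd2.getNumPartitions)  // at least 32

// For DataFrame readers the split size is controlled instead by the
// spark.sql.files.maxPartitionBytes setting (128 MB by default).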

I would suggest first going through a book devoted to Spark, like Spark: The Definitive Guide, or any other similar resource.

Also, I would suggest taking a MOOC on Spark (e.g., on Coursera, edX, etc.).

All the best,

Apostolos


On 21/6/22 22:16, Sid wrote:
Hi Team,

I have a few doubts about the questions below:

1) Where does a DataFrame reside: in memory or on disk? How is memory allocated for a DataFrame?
2) How do you configure each partition?
3) Is there any way to calculate the exact partitions needed to load a specific file?

Thanks,
Sid

--
Apostolos N. Papadopoulos, Associate Professor
Department of Informatics
Aristotle University of Thessaloniki
Thessaloniki, GREECE
tel: ++0030312310991918
email: papad...@csd.auth.gr
twitter: @papadopoulos_ap
web: http://datalab.csd.auth.gr/~apostol

