Dear Sid,

You are asking questions whose answers are already covered on the Apache Spark website, in books, in MOOCs, and in other online resources.

For example, take a look at this one: https://sparkbyexamples.com/spark/spark-dataframe-cache-and-persist-explained/

https://spark.apache.org/docs/latest/sql-programming-guide.html
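
To make question 1 concrete, here is a minimal sketch of caching a DataFrame (the session setup and the input path are placeholders, not from your setup):

import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

val spark = SparkSession.builder().appName("cache-demo").getOrCreate()

// A DataFrame is a lazy query plan: nothing lives in memory or on disk
// until an action runs, and by default nothing is retained afterwards.
val df = spark.read.json("/data/events.json")  // hypothetical input path

// For DataFrames, cache() means MEMORY_AND_DISK: partitions are kept
// in memory and spilled to local disk when they do not fit.
df.cache()
df.count()  // the first action materializes the cached partitions

// persist() lets you choose the storage level explicitly, e.g.:
// df.persist(StorageLevel.MEMORY_ONLY)

df.unpersist()  // release the cached blocks when done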

What do you mean by question 2?

About question 3, it depends on how you load the file. For example, if you have a text file in HDFS and you create an RDD from it, the number of partitions initially equals the number of HDFS blocks, unless you specify the number of partitions when you create the RDD from the file.
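
A small sketch of both cases (reusing the spark session from the sketch above; the HDFS path and the 128 MB default block size are assumptions):

// A 1 GB text file stored in 128 MB HDFS blocks occupies 8 blocks,
// so the RDD initially gets 8 partitions:
val rdd = spark.sparkContext.textFile("hdfs:///data/big.txt")
println(rdd.getNumPartitions)  // 8 in this scenario

// Passing minPartitions asks Spark to split the input more finely:
val rdd2 = spark.sparkContext.textFile("hdfs:///data/big.txt", 32)
println(rdd2.getNumPartitions)  // at least 32

// For DataFrame readers the split size is controlled instead by the
// spark.sql.files.maxPartitionBytes setting (128 MB by default).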

I would suggest first going through a book devoted to Spark, like Spark: The Definitive Guide, or any other similar resource.

Also, I would suggest taking a MOOC on Spark (e.g., on Coursera, edX, etc.).

All the best,

Apostolos


On 21/6/22 22:16, Sid wrote:
Hi Team,

I have a few doubts about the questions below:

1) Where does a DataFrame reside: in memory or on disk? How is memory allocated for a DataFrame?
2) How do you configure each partition?
3) Is there any way to calculate the exact partitions needed to load a specific file?

Thanks,
Sid

--
Apostolos N. Papadopoulos, Associate Professor
Department of Informatics
Aristotle University of Thessaloniki
Thessaloniki, GREECE
tel: ++0030312310991918
email: papad...@csd.auth.gr
twitter: @papadopoulos_ap
web: http://datalab.csd.auth.gr/~apostol

