Dear Sid,
You are asking questions whose answers can be found on the Apache Spark
website, in books, in MOOCs, or in other online resources.
For example, take a look at this one:
https://sparkbyexamples.com/spark/spark-dataframe-cache-and-persist-explained/
https://spark.apache.org/docs/latest/sql-programming-guide.html
What do you mean by question 2?
Regarding question 3, it depends on how you load the file. For example,
if you load a text file from HDFS as an RDD, the initial number of
partitions equals the number of HDFS blocks, unless you specify the
number of partitions when you create the RDD from the file.
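To make that concrete, here is a rough back-of-the-envelope sketch in plain Python (not PySpark) of how the default partition count follows from the file size and the HDFS block size. The 128 MiB default block size and the one-partition-per-block rule are the usual behavior, but the real split is done by Hadoop's input format, so treat this only as an estimate:

```python
import math

def estimated_partitions(file_size_bytes, block_size_bytes=128 * 1024 * 1024):
    """Estimate how many partitions Spark creates when reading a file
    from HDFS: roughly one partition per HDFS block (128 MiB default)."""
    return max(1, math.ceil(file_size_bytes / block_size_bytes))

# A 1 GiB file with the default 128 MiB block size -> about 8 partitions.
print(estimated_partitions(1024 * 1024 * 1024))  # 8
```

If you need more partitions than blocks, you can ask for them when creating the RDD, e.g. sc.textFile(path, minPartitions=16) in PySpark, or repartition the data afterwards.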
I would suggest first going through a book devoted to Spark, like Spark:
The Definitive Guide, or any other similar resource.
I would also suggest taking a MOOC on Spark (e.g., on Coursera, edX,
etc.).
All the best,
Apostolos
On 21/6/22 22:16, Sid wrote:
Hi Team,
I have a few doubts about the below questions:
1) data frame will reside where? memory? disk? memory allocation about
data frame?
2) How do you configure each partition?
3) Is there any way to calculate the exact partitions needed to load a
specific file?
Thanks,
Sid
--
Apostolos N. Papadopoulos, Associate Professor
Department of Informatics
Aristotle University of Thessaloniki
Thessaloniki, GREECE
tel: ++0030312310991918
email: papad...@csd.auth.gr
twitter: @papadopoulos_ap
web: http://datalab.csd.auth.gr/~apostol
---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscr...@spark.apache.org