Hi @Mich Talebzadeh, community,
Where can I find such insights into the Spark architecture?
I found a few sites, listed below, that cover the internals:
1. https://github.com/JerryLead/SparkInternals
2. https://books.japila.pl/apache-spark-internals/overview/
3.
On Mon, Mar 18, 2024 at 1:16 PM Mich Talebzadeh
wrote:
>
> "I may need something like that for synthetic data for testing. Any way to
> do that ?"
>
> Have a look at this.
>
> https://github.com/joke2k/faker
>
No, I was not actually referring to data that can be faked. I want data to
actually
Yes, transformations are indeed executed on the worker nodes, but only
when necessary, usually when an action is called. This lazy evaluation
lets Spark build and optimize the whole execution plan before running it,
applying optimizations such as
On Fri, Mar 15, 2024 at 3:10 AM Mich Talebzadeh
wrote:
>
> No Data Transfer During Creation: --> Data transfer occurs only when an
> action is triggered.
> Distributed Processing: --> DataFrames are distributed for parallel
> execution, not stored entirely on the driver node.
> Lazy Evaluation
Hi,
When you create a DataFrame from Python objects using
spark.createDataFrame, this is what happens:
*Initial Local Creation:*
The DataFrame is initially created in the memory of the driver node. The
data is not yet distributed to executors at this point.
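To make the idea concrete without needing a cluster, here is a toy
pure-Python model of the behaviour described in this thread: the rows sit
locally when the object is created, transformations are only recorded, and
nothing runs until an action is called. LazyFrame and its methods are
hypothetical names invented for this sketch. They are not part of PySpark
and do not reflect how Spark is implemented internally.

```python
# Toy model only: LazyFrame is a hypothetical class, not a PySpark API.
class LazyFrame:
    def __init__(self, rows, plan=()):
        self._rows = list(rows)    # data held locally, like rows on the driver
        self._plan = tuple(plan)   # recorded transformations, not yet executed

    def filter(self, predicate):
        # Transformation: record the step and return a new frame.
        # No row is inspected here.
        return LazyFrame(self._rows, self._plan + (("filter", predicate),))

    def map(self, func):
        # Transformation: also only recorded.
        return LazyFrame(self._rows, self._plan + (("map", func),))

    def collect(self):
        # Action: only now is the recorded plan applied to the data.
        rows = self._rows
        for kind, fn in self._plan:
            if kind == "filter":
                rows = [r for r in rows if fn(r)]
            else:
                rows = [fn(r) for r in rows]
        return rows


calls = []  # records when the user function actually runs


def is_adult(row):
    calls.append(row["name"])
    return row["age"] >= 18


people = LazyFrame([{"name": "Ann", "age": 34}, {"name": "Bob", "age": 12}])
adults = people.filter(is_adult)    # transformation: nothing executed yet
calls_before_action = list(calls)   # still empty at this point
result = adults.collect()           # action: the plan runs now
```

In this sketch is_adult is not called until collect() runs, which mirrors
the point that a transformation only describes work and an action triggers
it.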
*The role of lazy evaluation:*
Spark