Dear guys,
We're performing some tests to evaluate the behavior of transformations and
actions in Spark with Spark SQL. In our tests, we first designed a simple
dataflow with two transformations and one action:
LOAD (result: df_1) > SELECT ALL FROM df_1 (result: df_2) > COUNT(df_2)
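For reference, here is a minimal Scala sketch of that dataflow, assuming a local SparkSession and a hypothetical Parquet input path (both are illustrative, not from the original test setup):

```scala
import org.apache.spark.sql.SparkSession

object DataflowTest {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("transformation-action-test")
      .master("local[*]")
      .getOrCreate()

    // LOAD: a lazy transformation; no data is scanned yet
    val df1 = spark.read.parquet("/path/to/input") // hypothetical path

    // SELECT ALL FROM df_1: another lazy transformation
    val df2 = df1.select("*")

    // COUNT: the action; only here is the whole plan actually executed
    val n = df2.count()
    println(s"rows: $n")

    spark.stop()
  }
}
```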
The execution
Dear all,
We're performing some tests with cache and persist on Datasets. With RDDs, we
know that transformations are lazy, being executed only when an action
occurs. So, for example, we put a .cache() on an RDD after an action, which
in turn is executed as the last operation of a sequence of
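As a point of reference for the discussion, the lazy behavior of cache on an RDD can be sketched as follows (assuming an existing SparkContext named sc and a hypothetical input file):

```scala
import org.apache.spark.storage.StorageLevel

// assuming an existing SparkContext named sc
val rdd = sc.textFile("/path/to/data")  // lazy transformation
  .map(_.toUpperCase)                   // lazy transformation

rdd.cache()  // only marks the RDD for caching; nothing is computed yet
// equivalent to: rdd.persist(StorageLevel.MEMORY_ONLY)

val first  = rdd.count()  // first action: computes the lineage AND fills the cache
val second = rdd.count()  // later actions are served from the cached partitions
```

Note that cache() itself is not an action: the RDD is materialized (and stored) only when the first action after the cache() call runs.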
Dear guys,
I'm investigating the differences between RDDs and DataFrames/Datasets. I
couldn't find the answer to this question: do DataFrames act as a new layer
in the Spark stack? I mean, during execution, is there a conversion to RDDs?
For example, if I create a DataFrame and perform a query, in
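One way to inspect this yourself is to look at the plans Catalyst produces and at the RDD backing a DataFrame. A small sketch, assuming a SparkSession named spark:

```scala
// assuming a SparkSession named spark
val df = spark.range(0, 100).filter("id % 2 = 0")

// Prints the parsed, analyzed, optimized logical plans and the
// physical plan that Catalyst generates for this query
df.explain(true)

// Every DataFrame is ultimately executed as an RDD; .rdd exposes
// the underlying RDD (converted to the external Row type)
val asRdd = df.rdd
println(asRdd.getClass)
```

The physical plan shown by explain(true) is what actually runs on the RDD layer, so the DataFrame API sits on top of the RDD execution engine rather than replacing it.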