About transformations

2016-12-09 Thread brccosta
Dear guys, We're performing some tests to evaluate the behavior of transformations and actions in Spark with Spark SQL. In our tests, we first designed a simple dataflow with 2 transformations and 1 action: LOAD (result: df_1) > SELECT ALL FROM df_1 (result: df_2) > COUNT(df_2). The execution
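
The dataflow in this message (LOAD, then SELECT, then COUNT) depends on lazy evaluation: transformations only record how to compute a result, and the action triggers the work. The sketch below is a toy model in plain Python, not Spark's API; the class `LazyFrame` and its methods `load`, `select_all` and `count` are hypothetical names chosen only to mirror the three steps, and the shared `log` list is an assumption used to show when work actually happens.

```python
from typing import Callable, Iterable, List


class LazyFrame:
    """Toy model of a lazily evaluated dataset (hypothetical, not Spark's API)."""

    def __init__(self, source: Callable[[], Iterable[dict]], log: List[str]):
        self._source = source  # deferred computation that produces the rows
        self._log = log        # records when work is really executed

    @staticmethod
    def load(rows: List[dict], log: List[str]) -> "LazyFrame":
        # Transformation: nothing is read yet, we only capture how to read.
        def source() -> List[dict]:
            log.append("load executed")
            return list(rows)
        return LazyFrame(source, log)

    def select_all(self) -> "LazyFrame":
        # Transformation: wraps the parent plan without running it.
        parent = self._source
        log = self._log

        def source() -> List[dict]:
            rows = parent()
            log.append("select executed")
            return [dict(r) for r in rows]
        return LazyFrame(source, log)

    def count(self) -> int:
        # Action: forces the whole chain of deferred steps to run.
        return len(list(self._source()))


log: List[str] = []
df_1 = LazyFrame.load([{"id": 1}, {"id": 2}, {"id": 3}], log)
df_2 = df_1.select_all()
print(log)          # [] : the two transformations did no work
print(df_2.count()) # 3
print(log)          # ['load executed', 'select executed']
```

Because nothing is stored, calling `count()` a second time in this model reruns the whole chain; avoiding that recomputation is the purpose of `cache()`/`persist()` in Spark.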

Spark SQL - Actions and Transformations

2016-09-13 Thread brccosta
Dear all, We're performing some tests with cache and persist on datasets. With RDDs, we know that transformations are lazy, being executed only when an action occurs. So, for example, we put a .cache() on an RDD after an action, which in turn is executed as the last operation of a sequence of
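
For RDDs and Datasets, `cache()` is itself lazy: it only marks the data for storage, and the data is materialized by the next action that computes it; later actions then reuse it. The following toy model in plain Python (not Spark's API) sketches that behavior for the case raised here, a `.cache()` placed after an action. The class `CachedNode`, its `count`/`total` actions, and the `log` list are hypothetical names used only for illustration.

```python
from typing import Callable, List, Optional


class CachedNode:
    """Toy model of a dataset whose cache() is lazy (hypothetical, not Spark's API)."""

    def __init__(self, compute: Callable[[], List[int]], log: List[str]):
        self._compute = compute           # deferred computation for the rows
        self._log = log                   # records computations vs. cache hits
        self._cache_requested = False
        self._stored: Optional[List[int]] = None

    def cache(self) -> "CachedNode":
        # Only marks the node; nothing is computed or stored here.
        self._cache_requested = True
        return self

    def _rows(self) -> List[int]:
        if self._stored is not None:
            self._log.append("read from cache")
            return self._stored
        self._log.append("computed")
        rows = self._compute()
        if self._cache_requested:
            # The first action after cache() materializes the data.
            self._stored = rows
        return rows

    def count(self) -> int:  # action
        return len(self._rows())

    def total(self) -> int:  # another action
        return sum(self._rows())


log: List[str] = []
node = CachedNode(lambda: [1, 2, 3, 4], log)
node.count()   # computed; no cache requested yet
node.cache()   # lazy: only marks the node
node.count()   # computed again, and now stored
node.total()   # served from the stored rows
print(log)     # ['computed', 'computed', 'read from cache']
```

In this model the `.cache()` issued after the first action does not retroactively keep that action's result; only the next action stores the data.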

RDD and Dataframes

2016-07-07 Thread brccosta
Dear guys, I'm investigating the differences between RDDs and DataFrames/Datasets. I couldn't find the answer to this question: do DataFrames act as a new layer in the Spark stack? I mean, during execution, is there a conversion to RDDs? For example, if I create a DataFrame and perform a query, in