You can take a look at the code that Spark generates:
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.debug.codegenString
import org.apache.spark.sql.functions._

val spark: SparkSession   // your active session
import spark.implicits._

val data = Seq("A", "b", "c").toDF("col")
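As a rough sketch of how to use it (continuing from the snippet above; the concat expression and the " suffix" literal are only placeholders), codegenString can be used to dump the Java code that whole-stage codegen produces for a plan:

val withConcat = data.withColumn("valueconcat", concat(col("col"), lit(" suffix")))
// codegenString takes the executed physical plan and returns the generated code as a String
println(codegenString(withConcat.queryExecution.executedPlan))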
While the SparkSession is running:

Go to localhost:4040.
Select Stages from the menu.
Select the job you are interested in.
You can select additional metrics, including the DAG visualisation.
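If the UI is not on the default port (for example when 4040 is already taken), you can also ask the running session for its address. A small sketch, assuming a live SparkSession named spark:

// uiWebUrl is an Option[String]; it is empty if the UI is disabled
spark.sparkContext.uiWebUrl.foreach(url => println(s"Spark UI at $url"))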
On Tue, 7 Apr 2020, 17:14 yeikel valdes wrote:

Thanks for your input Soma, but I am actually looking to understand the
differences and not only the performance.
On Sun, 05 Apr 2020 02:21:07 -0400 somplastic...@gmail.com wrote:

If you want to measure optimisation in terms of time taken, then here is an
idea :)
public class MyClass {
    public static void main(String[] args) throws InterruptedException {
        long start = System.currentTimeMillis();

        // replace with your add-column code, run on enough data

        long end = System.currentTimeMillis();
        System.out.println("Elapsed time in ms: " + (end - start));
    }
}
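Note that in Spark, withColumn is a lazy transformation, so whatever you time should include an action. Roughly the same idea in Scala, assuming the data Dataset from the question below; the concat call and the count() action are only placeholders:

val start = System.currentTimeMillis()
val result = data.withColumn("valueconcat", concat(col(data.columns.head), lit(" suffix")))
result.count()   // action: forces the job to actually run
println(s"Took ${System.currentTimeMillis() - start} ms")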
Dear Community,

Recently, I had to solve the following problem: "for every entry of a
Dataset[String], concat a constant value". To solve it, I used built-in
functions:
val data = Seq("A","b","c").toDS
scala> data.withColumn("valueconcat",concat(col(data.columns.head),lit("