Re: Serialization or internal functions?

2020-04-09 Thread Vadim Semenov
You can take a look at the code that Spark generates:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.debug.codegenString
val spark: SparkSession
import org.apache.spark.sql.functions._
import spark.implicits._
val data = Seq("A","b","c").toDF("col")
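For context, a minimal self-contained sketch of how the generated code can be inspected (the appName, master, and the "-suffix" literal are assumptions; debugCodegen() comes from the same debug package):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.sql.execution.debug._

val spark = SparkSession.builder().appName("codegen-inspect").master("local[*]").getOrCreate()
import spark.implicits._

val data = Seq("A", "b", "c").toDF("col")
val result = data.withColumn("valueconcat", concat(col("col"), lit("-suffix")))

// Prints the whole-stage generated Java code for the physical plan
result.debugCodegen()

spark.stop()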

Re: Serialization or internal functions?

2020-04-07 Thread Som Lima
Go to localhost:4040 while the SparkSession is running. Select Stages from the menu, then select the job you are interested in. You can also select additional metrics, including the DAG visualisation. On Tue, 7 Apr 2020, 17:14 yeikel valdes, wrote: > Thanks for your input Soma ,
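For reference, a small sketch of how the UI port is controlled (spark.ui.port is a standard Spark setting; the value shown is its default). If 4040 is already taken, Spark binds the next free port (4041, and so on):

import org.apache.spark.sql.SparkSession

// Pin the web UI to a known port so http://localhost:4040 is predictable.
val spark = SparkSession.builder()
  .appName("ui-demo")
  .master("local[*]")
  .config("spark.ui.port", "4040")  // default value, shown explicitly
  .getOrCreate()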

Re: Serialization or internal functions?

2020-04-07 Thread yeikel valdes
Thanks for your input Soma, but I am actually looking to understand the differences, and not only the performance. On Sun, 05 Apr 2020 02:21:07 -0400 somplastic...@gmail.com wrote: If you want to measure optimisation in terms of time taken, then here is an idea :)
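For concreteness, a minimal sketch of the two approaches being compared (the column name "value" is what toDS gives a Dataset[String]; the "-suffix" literal is an assumption). The key difference: the typed map pays a per-row (de)serialization round-trip, while the built-in function runs on Spark's internal row format via generated code:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().appName("compare").master("local[*]").getOrCreate()
import spark.implicits._

val data = Seq("A", "b", "c").toDS  // Dataset[String]; single column named "value"

// Typed API: each value is decoded to a JVM String, the lambda runs,
// and the result is re-encoded -- one serialization round-trip per row.
val viaMap = data.map(s => s + "-suffix")

// Built-in functions: the concat expression is compiled by Catalyst and
// evaluated directly on the internal row format, no object round-trip.
val viaConcat = data.withColumn("valueconcat", concat(col("value"), lit("-suffix")))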

Re: Serialization or internal functions?

2020-04-05 Thread Som Lima
If you want to measure optimisation in terms of time taken, then here is an idea :)

public class MyClass {
    public static void main(String[] args) throws InterruptedException {
        long start = System.currentTimeMillis();
        // replace with your add-column code, run over enough data
        // for the difference to be measurable
        long end = System.currentTimeMillis();
        System.out.println("took " + (end - start) + " ms");
    }
}
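A Scala sketch of the same timing idea applied to the thread's add-column example (the dataset size and the foreach action are assumptions; an action is needed because transformations are lazy):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().appName("timing").master("local[*]").getOrCreate()
import spark.implicits._

val big = (1 to 1000000).map(_.toString).toDF("col")  // "enough data"

val start = System.currentTimeMillis()
big.withColumn("valueconcat", concat(col("col"), lit("-suffix")))
  .foreach(_ => ())  // force execution; transformations alone do nothing
val elapsed = System.currentTimeMillis() - start
println(s"add-column took $elapsed ms")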

Serialization or internal functions?

2020-04-04 Thread email
Dear Community, Recently I had to solve the following problem: "for every entry of a Dataset[String], concat a constant value". To solve it, I used built-in functions:

val data = Seq("A","b","c").toDS
scala> data.withColumn("valueconcat", concat(col(data.columns.head), lit("
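The snippet is cut off by the archive; a sketch of how the built-in-function version reads when complete (the "-suffix" literal is an assumption):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().appName("concat-example").master("local[*]").getOrCreate()
import spark.implicits._

val data = Seq("A", "b", "c").toDS  // Dataset[String]; column named "value"
// Appends the constant to every entry and shows the value/valueconcat columns
data.withColumn("valueconcat", concat(col(data.columns.head), lit("-suffix"))).show()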