Hello,

I am writing a Spark application. It runs correctly but takes a long time
to execute. Can anyone help me optimize my query to improve the processing
speed?


In this application I have to construct histograms and compare them in
order to find the final candidate.
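To make the histogram step concrete, here is a minimal sketch using plain Scala collections rather than Spark. It assumes each matched row reduces to a hypothetical `Match(wtId, coupleTime)` record; a "histogram" here is simply a count of rows per Couple_time value, built separately for each WT_ID.

```scala
// Hypothetical record: one matched row after the offset subtraction (not a Spark type).
final case class Match(wtId: String, coupleTime: Double)

object Histograms {
  // Build one histogram per WT_ID: WT_ID -> (Couple_time value -> number of rows with that value).
  def build(rows: Seq[Match]): Map[String, Map[Double, Int]] =
    rows.groupBy(_.wtId).map { case (id, rs) =>
      id -> rs.groupBy(_.coupleTime).map { case (t, same) => t -> same.size }
    }
}
```

For example, `Histograms.build(Seq(Match("A", 1.0), Match("A", 1.0), Match("A", 2.0)))` gives `Map("A" -> Map(1.0 -> 2, 2.0 -> 1))`. In Spark, the same shape of result for every WT_ID at once would come from a single `k.groupBy("WT_ID", "Couple_time").count()` rather than one filter per WT_ID.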


My code reads a text file of pairs, matches the first field of each pair
against the Address column, subtracts the second field from Couple_time for
the matched candidates, and updates the table.
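The matching-and-subtraction step can be sketched in plain Scala as a single pass with a lookup table, instead of one filter plus one union per pair. This is only an illustration under assumptions: `Record` is a hypothetical stand-in for a row of the parquet table, and the text file is assumed to hold whitespace-separated tokens "address offset address offset ...". Repeated addresses keep every offset, so each listed pair still produces its own output row, as the union loop does.

```scala
// Hypothetical stand-in for one row of the parquet table (not a Spark type).
final case class Record(address: String, coupleTime: Double, wtId: String)

object MatchAndShift {
  // Turn tokens "addr1 off1 addr2 off2 ..." into Address -> all offsets listed for it.
  // A trailing unpaired token is ignored instead of causing an out-of-bounds access.
  def parsePairs(tokens: Seq[String]): Map[String, Seq[Double]] =
    tokens.grouped(2).collect { case Seq(addr, off) => addr -> off.toDouble }
      .toSeq
      .groupBy(_._1)
      .map { case (addr, ps) => addr -> ps.map(_._2) }

  // One pass over the table: keep records whose Address was listed,
  // emitting one shifted copy per matching offset.
  def matchAndShift(table: Seq[Record], offsets: Map[String, Seq[Double]]): Seq[Record] =
    table.flatMap { r =>
      offsets.getOrElse(r.address, Seq.empty).map(off => r.copy(coupleTime = r.coupleTime - off))
    }
}
```

The Spark analogue would be to turn the pairs into a small DataFrame and join it with `df1` on Address (possibly wrapped in `broadcast(...)` from `org.apache.spark.sql.functions`), which builds one query plan instead of several hundred chained unions. Whether that is where the time actually goes here is an assumption, not something measured.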

Here is my code. Please help me optimize it.


val sqlContext = new org.apache.spark.sql.SQLContext(sc)


import sqlContext.implicits._
import org.apache.spark.sql.functions.{desc, max}


// Tokens of the text file: Address, offset, Address, offset, ...
val Array_Ele = sc.textFile("/root/Desktop/database_200/patch_time_All_20_modified_1.txt")
  .flatMap(line => line.split(" "))
  .take(900)


val df1=
sqlContext.read.parquet("hdfs://hadoopm0:8020/tmp/input1/database_modified_No_name_400.parquet")


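// Rows whose Address matches a listed Address, with the paired offset subtracted from Couple_time.
// Every pair, including the first, uses the same select so that unionAll sees identical columns.
var k = df1.filter(df1("Address").equalTo(Array_Ele(0)))
  .select(df1("Address"), (df1("Couple_time") - Array_Ele(1)).as("Couple_time"), df1("WT_ID"))

// Stop before the last index so that Array_Ele(a + 1) always exists.
for (a <- 2 until Array_Ele.length - 1 by 2) {
  k = k.unionAll(
    df1.filter(df1("Address").equalTo(Array_Ele(a)))
      .select(df1("Address"), (df1("Couple_time") - Array_Ele(a + 1)).as("Couple_time"), df1("WT_ID")))
}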


k.cache()


val WT_ID_Sort  = k.groupBy("WT_ID").count().sort(desc("count"))


val temp = WT_ID_Sort.select("WT_ID").rdd.map(r=>r(0)).take(10)


// For each of the top WT_IDs in temp, show the height of the tallest bin
// of its Couple_time histogram (temp holds at most 10 entries).
for (i <- temp.indices) {
  k.filter(k("WT_ID").equalTo(temp(i)))
    .groupBy("Couple_time").count()
    .select(max($"count"))
    .show()
}


Finally, how can I compare all of these tables to find the maximum value?
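Comparing the tables amounts to taking, for each WT_ID, the height of its tallest histogram bin and then picking the WT_ID with the largest such height. A minimal plain-Scala sketch, assuming the histograms are already held as a hypothetical `Map[WT_ID, Map[Couple_time, count]]`:

```scala
object BestCandidate {
  // Given WT_ID -> (Couple_time -> count), return the WT_ID whose tallest bin is highest,
  // together with that bin height. Empty histograms are skipped; None if nothing remains.
  // On ties, which WT_ID is returned is not specified.
  def best(histograms: Map[String, Map[Double, Int]]): Option[(String, Int)] = {
    val peaks = histograms.collect { case (id, bins) if bins.nonEmpty => id -> bins.values.max }
    if (peaks.isEmpty) None else Some(peaks.maxBy(_._2))
  }
}
```

In Spark, the same comparison could be written as one query instead of one `show()` per table: `k.groupBy("WT_ID", "Couple_time").count()`, then `groupBy("WT_ID").agg(max("count"))`, then sorting by that maximum in descending order and taking the first row.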




Thanks,
