Hi,
While running the following Spark code on the cluster described below, the application is split into 3 job IDs.

Cluster configuration (3-node cluster):
  Node 1 - 64 GB, 16 cores
  Node 2 - 64 GB, 16 cores
  Node 3 - 64 GB, 16 cores

In job ID 2, the job gets stuck at stage 51 of 254 and then starts using up disk space. I am not sure why this is happening, and my work is completely ruined. Could someone help me with this?

I have attached a screenshot of the stuck Spark stages for reference. Please let me know if you have any questions about the setup or the code.

Thanks

Code:

  def main(args: Array[String]) {

    Logger.getLogger("org").setLevel(Level.ERROR)

    val ss = SparkSession
      .builder
      .appName("join_association").master("local[*]")
      .getOrCreate()

    import ss.implicits._

    val dframe = ss.read.option("inferSchema", value=true).option("delimiter", ",").csv("in/matrimony.txt")

    dframe.show()
    dframe.printSchema()

    //left_frame
    val dfLeft = dframe.withColumnRenamed("_c1", "left_data")
    val dfRight = dframe.withColumnRenamed("_c1", "right_data")

    //Join
    val joined = dfLeft.join(dfRight, dfLeft.col("_c0") === dfRight.col("_c0"))
      .filter(col("left_data") !== col("right_data"))

    joined.show()

    val result = joined.select(col("left_data"), col("right_data") as "similar_ids")

    result.write.csv("/output")

    ss.stop()
  }

--
REGARDS
BALAKUMAR SEETHARAMAN
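A side note on the self-join in the code above, offered as a diagnostic sketch rather than a confirmed explanation of the stall: an equi-join of a DataFrame with itself on `_c0` emits n * n rows for every key value that occurs n times, so a few very frequent keys can make the join output far larger than the input. The snippet below estimates that row count before running the join. The object name `SelfJoinSizeCheck`, the helper `estimateSelfJoinRows`, and the toy data are hypothetical and not part of the original program; only the column name `_c0` is taken from it.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, sum}

// Hypothetical helper object, not part of the original program.
object SelfJoinSizeCheck {

  // Number of rows an equi-self-join on `key` would emit before any filter:
  // a key value that occurs n times contributes n * n pairs.
  def estimateSelfJoinRows(df: DataFrame, key: String): Long = {
    val perKey = df.groupBy(key).count() // columns: <key>, count
    val row = perKey.agg(sum(col("count") * col("count"))).first()
    // sum over an empty DataFrame is null, so treat that case as zero rows.
    if (row.isNullAt(0)) 0L else row.getLong(0)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("self_join_size_check")
      .master("local[*]") // local mode is enough for the toy data below
      .getOrCreate()
    import spark.implicits._

    // Toy data (assumption): key "a" occurs 3 times, key "b" occurs 2 times.
    val toy = Seq(("a", 1), ("a", 2), ("a", 3), ("b", 4), ("b", 5))
      .toDF("_c0", "_c1")

    // 3 * 3 + 2 * 2 = 13
    println(estimateSelfJoinRows(toy, "_c0"))

    spark.stop()
  }
}
```

Calling `estimateSelfJoinRows(dframe, "_c0")` on the real input would show whether the join output is orders of magnitude larger than the file itself, which is one thing worth ruling out when shuffle data starts spilling to disk.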