Babulal created SPARK-27036:
-------------------------------

             Summary: Even Broadcast thread is timed out, BroadCast Job is not aborted.
                 Key: SPARK-27036
                 URL: https://issues.apache.org/jira/browse/SPARK-27036
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.3.2
            Reporter: Babulal
         Attachments: image-2019-03-04-00-38-52-401.png
While a broadcast-join job is executing, if the broadcast timeout (spark.sql.broadcastTimeout) expires, the broadcast job still runs to completion, whereas it should be aborted on the timeout. The exception is thrown on the console, but the Spark job keeps running.

!image-2019-03-04-00-31-34-364.png!

Spark UI:
!image-2019-03-04-00-32-22-663.png!

Wait for some time:
!image-2019-03-04-00-34-47-884.png!

How to reproduce the issue

Option 1, using SQL:

Create table csv_2 (big table, 1M records):

val rdd1 = spark.sparkContext.parallelize(1 to 1000000, 100).map(x => ("name_" + x, x % 3, x))
val df = rdd1.toDF.selectExpr("_1 as name","_2 as age","_3 as sal","_1 as c1","_1 as c2","_1 as c3","_1 as c4","_1 as c5","_1 as c6","_1 as c7","_1 as c8","_1 as c9","_1 as c10","_1 as c11","_1 as c12","_1 as c13","_1 as c14","_1 as c15","_1 as c16","_1 as c17","_1 as c18","_1 as c19","_1 as c20","_1 as c21","_1 as c22","_1 as c23","_1 as c24","_1 as c25","_1 as c26","_1 as c27","_1 as c28","_1 as c29","_1 as c30")
df.write.csv("D:/data/par1/t4")
spark.sql("create table csv_2 using csv options('path'='D:/data/par1/t4')")

Create table csv_1 (small table, 100K records):

val rdd1 = spark.sparkContext.parallelize(1 to 100000, 100).map(x => ("name_" + x, x % 3, x))
val df = rdd1.toDF.selectExpr("_1 as name","_2 as age","_3 as sal","_1 as c1","_1 as c2","_1 as c3","_1 as c4","_1 as c5","_1 as c6","_1 as c7","_1 as c8","_1 as c9","_1 as c10","_1 as c11","_1 as c12","_1 as c13","_1 as c14","_1 as c15","_1 as c16","_1 as c17","_1 as c18","_1 as c19","_1 as c20","_1 as c21","_1 as c22","_1 as c23","_1 as c24","_1 as c25","_1 as c26","_1 as c27","_1 as c28","_1 as c29","_1 as c30")
df.write.csv("D:/data/par1/t5")
spark.sql("create table csv_1 using csv options('path'='D:/data/par1/t5')")

Set the broadcast threshold high enough (70 MB) that the small table is broadcast, and set a short broadcast timeout (2 seconds):

spark.sql("set spark.sql.autoBroadcastJoinThreshold=73400320").show(false)
spark.sql("set spark.sql.broadcastTimeout=2").show(false)

Run the query below:

spark.sql("create table s using parquet as select t1.* from csv_2 as t1, csv_1 as t2 where t1._c3 = t2._c3")

Option
2: Use an external DataSource, add a delay in its #buildScan implementation, and run the query against that datasource.
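The behavior reported above matches how a wait-with-timeout interacts with a still-running future: the waiting thread throws, but nothing cancels the underlying work (in Spark 2.x the broadcast build is awaited with a timeout from the exchange operator). A minimal plain-Scala sketch of that mechanism, with no Spark dependency and illustrative names and durations:

```scala
import java.util.concurrent.TimeoutException
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object TimeoutDemo {
  @volatile var finished = false

  // Stands in for the broadcast-build future: it keeps running
  // even after the waiting thread has given up.
  val work: Future[Unit] = Future {
    Thread.sleep(2000)
    finished = true
  }

  // Stands in for the awaitResult-style wait bounded by spark.sql.broadcastTimeout.
  def waitWithTimeout(): Boolean =
    try {
      Await.result(work, 500.millis)
      true
    } catch {
      case _: TimeoutException => false // timeout reported to the caller...
    }

  def main(args: Array[String]): Unit = {
    println(s"completed in time: ${waitWithTimeout()}")
    Thread.sleep(2500)
    // ...but the work was never cancelled, so it finishes anyway.
    println(s"work finished anyway: $finished")
  }
}
```

Aborting the job on timeout would require explicitly cancelling the running work (e.g. via a job group) rather than only throwing to the waiter.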
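Option 2 can be sketched as a minimal custom relation against the Spark 2.3 DataSource V1 API. The class names and the 60-second delay are illustrative, not from the original report; the only essential part is the sleep inside buildScan:

```scala
// Hypothetical slow datasource for reproducing the broadcast timeout (Option 2).
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.sources.{BaseRelation, RelationProvider, TableScan}
import org.apache.spark.sql.types.{IntegerType, StructField, StructType}

class SlowRelation(override val sqlContext: SQLContext)
    extends BaseRelation with TableScan {

  override def schema: StructType =
    StructType(Seq(StructField("id", IntegerType)))

  override def buildScan(): RDD[Row] = {
    // Delay longer than spark.sql.broadcastTimeout so the broadcast wait expires
    // while this scan is still running.
    Thread.sleep(60000)
    sqlContext.sparkContext.parallelize(1 to 100).map(Row(_))
  }
}

class SlowSourceProvider extends RelationProvider {
  override def createRelation(sqlContext: SQLContext,
      parameters: Map[String, String]): BaseRelation =
    new SlowRelation(sqlContext)
}
```

Loading this provider (e.g. spark.read.format("SlowSourceProvider").load() with the fully qualified class name) and using it as the broadcast side of a join should trigger the timeout while the scan job continues, matching the reported behavior.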