[ 
https://issues.apache.org/jira/browse/SPARK-27036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Babulal updated SPARK-27036:
----------------------------
    Attachment: image-2019-03-04-00-39-12-210.png

> Even if the broadcast thread times out, the broadcast job is not aborted.
> -------------------------------------------------------------------------
>
>                 Key: SPARK-27036
>                 URL: https://issues.apache.org/jira/browse/SPARK-27036
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.2
>            Reporter: Babulal
>            Priority: Minor
>         Attachments: image-2019-03-04-00-38-52-401.png, 
> image-2019-03-04-00-39-12-210.png, image-2019-03-04-00-39-38-779.png
>
>
> While a broadcast-join job is executing, if the broadcast timeout 
> (spark.sql.broadcastTimeout) is exceeded, the broadcast job still continues 
> to completion, whereas it should be aborted on timeout.
> An exception is thrown in the console, but the Spark job still continues.
> !image-2019-03-04-00-31-34-364.png!
>  Spark UI: !image-2019-03-04-00-32-22-663.png!
>  Wait for some time:
> !image-2019-03-04-00-34-47-884.png!
>  
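For context, the behavior likely comes from the driver merely awaiting the broadcast future with a timeout and never cancelling the underlying job. A minimal sketch of that pattern in plain Scala (an illustration of the suspected mechanism, not Spark's actual source):

```scala
// Sketch: a long-running computation on a separate thread, awaited with a
// timeout. When Await.result throws a TimeoutException, nothing cancels the
// still-running Future, so the underlying work runs to completion anyway --
// analogous to the broadcast job continuing after spark.sql.broadcastTimeout.
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

val relationFuture: Future[Long] = Future {
  Thread.sleep(5000) // stands in for building the broadcast relation
  42L
}

try {
  Await.result(relationFuture, 2.seconds) // analogue of broadcastTimeout
} catch {
  case _: java.util.concurrent.TimeoutException =>
    // The exception surfaces in the console, but relationFuture is never
    // cancelled -- the background work keeps running, which is this bug.
    println("timed out, but the background work keeps running")
}
```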
> How to reproduce the issue
> Option 1, using SQL:
> Create table t1 (a big table, 1M records):
> val rdd1=spark.sparkContext.parallelize(1 to 1000000,100).map(x=> 
> ("name_"+x,x%3,x))
> val df=rdd1.toDF.selectExpr("_1 as name","_2 as age","_3 as sal","_1 as 
> c1","_1 as c2","_1 as c3","_1 as c4","_1 as c5","_1 as c6","_1 as c7","_1 as 
> c8","_1 as c9","_1 as c10","_1 as c11","_1 as c12","_1 as c13","_1 as 
> c14","_1 as c15","_1 as c16","_1 as c17","_1 as c18","_1 as c19","_1 as 
> c20","_1 as c21","_1 as c22","_1 as c23","_1 as c24","_1 as c25","_1 as 
> c26","_1 as c27","_1 as c28","_1 as c29","_1 as c30")
> df.write.csv("D:/data/par1/t4");
> spark.sql("create table csv_2 using csv options('path'='D:/data/par1/t4')");
> Create table t2 (a small table, 100K records):
> val rdd1=spark.sparkContext.parallelize(1 to 100000,100).map(x=> 
> ("name_"+x,x%3,x))
> val df=rdd1.toDF.selectExpr("_1 as name","_2 as age","_3 as sal","_1 as 
> c1","_1 as c2","_1 as c3","_1 as c4","_1 as c5","_1 as c6","_1 as c7","_1 as 
> c8","_1 as c9","_1 as c10","_1 as c11","_1 as c12","_1 as c13","_1 as 
> c14","_1 as c15","_1 as c16","_1 as c17","_1 as c18","_1 as c19","_1 as 
> c20","_1 as c21","_1 as c22","_1 as c23","_1 as c24","_1 as c25","_1 as 
> c26","_1 as c27","_1 as c28","_1 as c29","_1 as c30")
> df.write.csv("D:/data/par1/t5");
> spark.sql("create table csv_1 using csv options('path'='D:/data/par1/t5')");
> spark.sql("set spark.sql.autoBroadcastJoinThreshold=73400320").show(false)
> spark.sql("set spark.sql.broadcastTimeout=2").show(false)
> Run the query below:
> spark.sql("create table s using parquet as select t1.* from csv_2 as t1,csv_1 
> as t2 where t1._c3=t2._c3")
> Option 2: Implement an external data source, add a delay in its #buildScan 
> method, and use that data source in the query.
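For Option 2, a minimal delayed data source might look like the following sketch against the Spark 2.3 DataSource V1 API (the package and class names are illustrative, not from the report): `buildScan` sleeps per record so the scan outlives `spark.sql.broadcastTimeout`.

```scala
// Hypothetical slow data source for reproducing the timeout: each scanned
// record sleeps, so a broadcast of this relation reliably exceeds
// spark.sql.broadcastTimeout while the scan job itself keeps running.
package com.example.slowsource

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.sources.{BaseRelation, RelationProvider, TableScan}
import org.apache.spark.sql.types.{IntegerType, StructField, StructType}

class DefaultSource extends RelationProvider {
  override def createRelation(
      sqlContext: SQLContext,
      parameters: Map[String, String]): BaseRelation =
    new SlowRelation(sqlContext)
}

class SlowRelation(val sqlContext: SQLContext)
    extends BaseRelation with TableScan {

  override def schema: StructType =
    StructType(Seq(StructField("c3", IntegerType)))

  override def buildScan(): RDD[Row] =
    sqlContext.sparkContext.parallelize(1 to 100, 4).map { x =>
      Thread.sleep(1000) // delay so the scan outlives the broadcast timeout
      Row(x)
    }
}
```

With `spark.sql.broadcastTimeout` set low, joining a small table read through this source (`spark.read.format("com.example.slowsource").load()`) against a large table should hit the broadcast timeout while the scan job continues.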



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
