[jira] [Updated] (SPARK-27036) Even Broadcast thread is timed out, BroadCast Job is not aborted.

2019-03-03 Thread Babulal (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Babulal updated SPARK-27036:

Attachment: image-2019-03-04-00-39-38-779.png

> Even Broadcast thread is timed out, BroadCast Job is not aborted.
> -
>
> Key: SPARK-27036
> URL: https://issues.apache.org/jira/browse/SPARK-27036
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.2
>Reporter: Babulal
>Priority: Minor
> Attachments: image-2019-03-04-00-38-52-401.png, 
> image-2019-03-04-00-39-12-210.png, image-2019-03-04-00-39-38-779.png
>
>
> If the broadcast timeout (spark.sql.broadcastTimeout) expires while the 
> broadcast table job is executing, the broadcast job still runs to 
> completion, whereas it should be aborted on timeout.
> An exception is thrown to the console, but the Spark job still continues.
> !image-2019-03-04-00-31-34-364.png!
>  Spark UI !image-2019-03-04-00-32-22-663.png!
>  wait for some time
> !image-2019-03-04-00-34-47-884.png!
>  
> How to Reproduce the Issue
> Option 1, using SQL:
> Create table t1 (big table, 1M records):
> val rdd1 = spark.sparkContext.parallelize(1 to 1000000, 100).map(x => ("name_" + x, x % 3, x))
> val df = rdd1.toDF.selectExpr("_1 as name", "_2 as age", "_3 as sal",
>   "_1 as c1", "_1 as c2", "_1 as c3", "_1 as c4", "_1 as c5", "_1 as c6",
>   "_1 as c7", "_1 as c8", "_1 as c9", "_1 as c10", "_1 as c11", "_1 as c12",
>   "_1 as c13", "_1 as c14", "_1 as c15", "_1 as c16", "_1 as c17", "_1 as c18",
>   "_1 as c19", "_1 as c20", "_1 as c21", "_1 as c22", "_1 as c23", "_1 as c24",
>   "_1 as c25", "_1 as c26", "_1 as c27", "_1 as c28", "_1 as c29", "_1 as c30")
> df.write.csv("D:/data/par1/t4")
> spark.sql("create table csv_2 using csv options('path'='D:/data/par1/t4')")
> Create table t2 (small table, 100K records):
> val rdd1 = spark.sparkContext.parallelize(1 to 100000, 100).map(x => ("name_" + x, x % 3, x))
> val df = rdd1.toDF.selectExpr("_1 as name", "_2 as age", "_3 as sal",
>   "_1 as c1", "_1 as c2", "_1 as c3", "_1 as c4", "_1 as c5", "_1 as c6",
>   "_1 as c7", "_1 as c8", "_1 as c9", "_1 as c10", "_1 as c11", "_1 as c12",
>   "_1 as c13", "_1 as c14", "_1 as c15", "_1 as c16", "_1 as c17", "_1 as c18",
>   "_1 as c19", "_1 as c20", "_1 as c21", "_1 as c22", "_1 as c23", "_1 as c24",
>   "_1 as c25", "_1 as c26", "_1 as c27", "_1 as c28", "_1 as c29", "_1 as c30")
> df.write.csv("D:/data/par1/t5")
> spark.sql("create table csv_1 using csv options('path'='D:/data/par1/t5')")
> spark.sql("set spark.sql.autoBroadcastJoinThreshold=73400320").show(false)
> spark.sql("set spark.sql.broadcastTimeout=2").show(false)
> Run the query below:
> spark.sql("create table s using parquet as select t1.* from csv_2 as t1, csv_1 as t2 where t1._c3=t2._c3")
> Option 2: Use an external data source that adds a delay in #buildScan, and use 
> that data source in the query.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-27036) Even Broadcast thread is timed out, BroadCast Job is not aborted.

2019-03-03 Thread Babulal (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Babulal updated SPARK-27036:

Description: 
If the broadcast timeout (spark.sql.broadcastTimeout) expires while the 
broadcast table job is executing, the broadcast job still runs to completion, 
whereas it should be aborted on timeout.

An exception is thrown to the console, but the Spark job still continues.
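The shape of the problem can be modeled outside Spark: awaiting a future with a timeout throws, but the underlying work keeps running unless it is cancelled explicitly. A minimal, self-contained sketch using plain Java executors (an analogy, not Spark's actual internals):

```scala
import java.util.concurrent.{Executors, TimeUnit, TimeoutException}

object BroadcastTimeoutSketch {
  def main(args: Array[String]): Unit = {
    val pool = Executors.newSingleThreadExecutor()
    // Stand-in for the broadcast build: a task that outlives the timeout.
    val future = pool.submit(new Runnable {
      override def run(): Unit = Thread.sleep(5000)
    })
    try {
      // Like awaiting the broadcast with spark.sql.broadcastTimeout.
      future.get(1, TimeUnit.SECONDS)
    } catch {
      case _: TimeoutException =>
        // The reported bug is the absence of this step: the await throws,
        // but the "broadcast job" runs to completion unless cancelled.
        future.cancel(true)
    }
    pool.shutdown()
  }
}
```

Without the `cancel(true)` call, the worker thread finishes its full five seconds of work even though the caller already gave up after one, which is the behavior the report describes at the job level.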

 

!image-2019-03-04-00-39-38-779.png!

!image-2019-03-04-00-39-12-210.png!

 

 wait for some time

!image-2019-03-04-00-38-52-401.png!

!image-2019-03-04-00-34-47-884.png!

 

How to Reproduce the Issue

Option 1, using SQL:
Create table t1 (big table, 1M records):
val rdd1 = spark.sparkContext.parallelize(1 to 1000000, 100).map(x => ("name_" + x, x % 3, x))
val df = rdd1.toDF.selectExpr("_1 as name", "_2 as age", "_3 as sal",
  "_1 as c1", "_1 as c2", "_1 as c3", "_1 as c4", "_1 as c5", "_1 as c6",
  "_1 as c7", "_1 as c8", "_1 as c9", "_1 as c10", "_1 as c11", "_1 as c12",
  "_1 as c13", "_1 as c14", "_1 as c15", "_1 as c16", "_1 as c17", "_1 as c18",
  "_1 as c19", "_1 as c20", "_1 as c21", "_1 as c22", "_1 as c23", "_1 as c24",
  "_1 as c25", "_1 as c26", "_1 as c27", "_1 as c28", "_1 as c29", "_1 as c30")
df.write.csv("D:/data/par1/t4")
spark.sql("create table csv_2 using csv options('path'='D:/data/par1/t4')")

Create table t2 (small table, 100K records):
val rdd1 = spark.sparkContext.parallelize(1 to 100000, 100).map(x => ("name_" + x, x % 3, x))
val df = rdd1.toDF.selectExpr("_1 as name", "_2 as age", "_3 as sal",
  "_1 as c1", "_1 as c2", "_1 as c3", "_1 as c4", "_1 as c5", "_1 as c6",
  "_1 as c7", "_1 as c8", "_1 as c9", "_1 as c10", "_1 as c11", "_1 as c12",
  "_1 as c13", "_1 as c14", "_1 as c15", "_1 as c16", "_1 as c17", "_1 as c18",
  "_1 as c19", "_1 as c20", "_1 as c21", "_1 as c22", "_1 as c23", "_1 as c24",
  "_1 as c25", "_1 as c26", "_1 as c27", "_1 as c28", "_1 as c29", "_1 as c30")
df.write.csv("D:/data/par1/t5")
spark.sql("create table csv_1 using csv options('path'='D:/data/par1/t5')")

spark.sql("set spark.sql.autoBroadcastJoinThreshold=73400320").show(false)
 spark.sql("set spark.sql.broadcastTimeout=2").show(false)
Run the query below:
spark.sql("create table s using parquet as select t1.* from csv_2 as t1, csv_1 as t2 where t1._c3=t2._c3")

Option 2: Use an external data source that adds a delay in #buildScan, and use 
that data source in the query.
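Option 2 can be sketched as a minimal external data source whose buildScan sleeps, so the broadcast build reliably exceeds spark.sql.broadcastTimeout. All names below (the `example.slow` package, the 10-second delay, the one-column schema) are illustrative assumptions, not code from the report:

```scala
package example.slow

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.sources.{BaseRelation, RelationProvider, TableScan}
import org.apache.spark.sql.types.{IntegerType, StructField, StructType}

// Entry point Spark looks up when the format name is the package name.
class DefaultSource extends RelationProvider {
  override def createRelation(sqlContext: SQLContext,
                              parameters: Map[String, String]): BaseRelation =
    new SlowRelation(sqlContext)
}

class SlowRelation(val sqlContext: SQLContext)
    extends BaseRelation with TableScan {
  override def schema: StructType =
    StructType(Seq(StructField("c3", IntegerType)))

  override def buildScan(): RDD[Row] = {
    Thread.sleep(10000) // the delay in #buildScan, per Option 2
    sqlContext.sparkContext.parallelize(1 to 100).map(Row(_))
  }
}
```

Loading this source with `spark.read.format("example.slow").load()` and joining it under a low spark.sql.broadcastTimeout (e.g. 2) should trigger the timeout while the scan job keeps running, which is the behavior under report.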


[jira] [Updated] (SPARK-27036) Even Broadcast thread is timed out, BroadCast Job is not aborted.

2019-03-03 Thread Babulal (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Babulal updated SPARK-27036:

Attachment: image-2019-03-04-00-39-12-210.png




[jira] [Updated] (SPARK-27036) Even Broadcast thread is timed out, BroadCast Job is not aborted.

2019-03-03 Thread Babulal (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Babulal updated SPARK-27036:

Attachment: image-2019-03-04-00-38-52-401.png
