[jira] [Commented] (SPARK-30316) Data size booms after shuffle when saving a dataframe as parquet

2019-12-21 Thread Terry Kim (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17001833#comment-17001833
 ] 

Terry Kim commented on SPARK-30316:
---

This is a plausible scenario: when you repartition/shuffle the data, the values 
you are storing can be reordered such that, for example, the compression ratio 
becomes worse.
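
As a minimal, hypothetical sketch of both the effect and one common mitigation 
(the paths, partition count, and column name below are made up): a plain 
repartition scatters rows in effectively arbitrary order, while sorting within 
partitions before the write puts similar values back next to each other so that 
Parquet's run-length and dictionary encodings work well again.

{code:scala}
import org.apache.spark.sql.SparkSession

object ParquetSizeAfterShuffle {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("parquet-size-after-shuffle")
      .master("local[*]")
      .getOrCreate()

    // Hypothetical input path, for illustration only.
    val df = spark.read.parquet("/tmp/input.parquet")

    // Plain repartition: rows land in each output file in effectively
    // arbitrary order, which can defeat Parquet's run-length and
    // dictionary encodings and inflate the files.
    df.repartition(200)
      .write.mode("overwrite").parquet("/tmp/shuffled.parquet")

    // Possible mitigation: restore locality by sorting within each
    // partition (the column name here is hypothetical) before writing.
    df.repartition(200)
      .sortWithinPartitions("some_low_cardinality_column")
      .write.mode("overwrite").parquet("/tmp/shuffled_sorted.parquet")

    spark.stop()
  }
}
{code}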

> Data size booms after shuffle when saving a dataframe as parquet
> --
>
> Key: SPARK-30316
> URL: https://issues.apache.org/jira/browse/SPARK-30316
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle, SQL
>Affects Versions: 2.4.4
>Reporter: Cesc 
>Priority: Blocker
>
> When I read the same parquet file and then save it in two ways, with a 
> shuffle and without a shuffle, I found that the sizes of the output parquet 
> files are quite different. For example, with an original parquet file of 
> 800 MB: if I save it without a shuffle, the size is still 800 MB, whereas if 
> I repartition it and then save it in parquet format, the data size increases 
> to 2.5 GB. The row counts, column counts, and contents of the two output 
> files are all the same.
> I wonder:
> firstly, why does the data size increase after a repartition/shuffle?
> secondly, if I need to shuffle the input dataframe, how can I save it as a 
> parquet file efficiently and avoid the data size boom?






[jira] [Created] (SPARK-30326) Raise exception if analyzer exceeds max iterations

2019-12-21 Thread Xin Wu (Jira)
Xin Wu created SPARK-30326:
--

 Summary: Raise exception if analyzer exceeds max iterations
 Key: SPARK-30326
 URL: https://issues.apache.org/jira/browse/SPARK-30326
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.0.0
Reporter: Xin Wu


Currently, both the analyzer and the optimizer just log a warning message if 
rule execution exceeds the maximum number of iterations. They should behave 
differently: the analyzer should raise an exception to indicate that logical 
plan resolution failed, while the optimizer should just log the warning and 
keep the current plan. This is more feasible now that SPARK-30138 has been 
introduced.
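
As a rough, self-contained sketch of the proposed split in behavior (this 
simplified fixed-point runner and all names in it are illustrative, not Spark's 
actual RuleExecutor API):

{code:scala}
object FixedPointSketch {

  final class MaxIterationsReachedException(phase: String, max: Int)
    extends RuntimeException(
      s"$phase did not reach a fixed point within $max iterations")

  // Applies the rules repeatedly until the plan stops changing or the
  // iteration budget is exhausted.
  def run[P](plan: P,
             rules: Seq[P => P],
             maxIterations: Int,
             failOnMaxIterations: Boolean,
             phase: String): P = {
    var current = plan
    var iterations = 0
    var converged = false
    while (!converged && iterations < maxIterations) {
      val next = rules.foldLeft(current)((p, rule) => rule(p))
      converged = next == current
      current = next
      iterations += 1
    }
    if (!converged) {
      if (failOnMaxIterations) {
        // Analyzer-style: an unresolved plan is unusable, so fail loudly.
        throw new MaxIterationsReachedException(phase, maxIterations)
      } else {
        // Optimizer-style: the current plan is still valid, just possibly
        // suboptimal, so log a warning and keep it.
        println(s"WARN: $phase stopped after $maxIterations iterations")
      }
    }
    current
  }
}
{code}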






[jira] [Assigned] (SPARK-29294) Update Kafka to a version that supports Scala 2.13

2019-12-21 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-29294:
-

Assignee: Jungtaek Lim

> Update Kafka to a version that supports Scala 2.13
> --
>
> Key: SPARK-29294
> URL: https://issues.apache.org/jira/browse/SPARK-29294
> Project: Spark
>  Issue Type: Sub-task
>  Components: Structured Streaming
>Affects Versions: 3.0.0
>Reporter: Sean R. Owen
>Assignee: Jungtaek Lim
>Priority: Major
>
> Kafka 2.3.0 doesn't seem to support Scala 2.13 yet. We'll need to update to a 
> version that does once one is published.






[jira] [Resolved] (SPARK-29294) Update Kafka to a version that supports Scala 2.13

2019-12-21 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-29294.
---
Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 26960
[https://github.com/apache/spark/pull/26960]

> Update Kafka to a version that supports Scala 2.13
> --
>
> Key: SPARK-29294
> URL: https://issues.apache.org/jira/browse/SPARK-29294
> Project: Spark
>  Issue Type: Sub-task
>  Components: Structured Streaming
>Affects Versions: 3.0.0
>Reporter: Sean R. Owen
>Assignee: Jungtaek Lim
>Priority: Major
> Fix For: 3.0.0
>
>
> Kafka 2.3.0 doesn't seem to support Scala 2.13 yet. We'll need to update to a 
> version that does once one is published.






[jira] [Assigned] (SPARK-28144) Remove ZKUtils from Kafka tests

2019-12-21 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-28144:
-

Assignee: Jungtaek Lim

> Remove ZKUtils from Kafka tests
> ---
>
> Key: SPARK-28144
> URL: https://issues.apache.org/jira/browse/SPARK-28144
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming, Tests
>Affects Versions: 3.0.0
>Reporter: Gabor Somogyi
>Assignee: Jungtaek Lim
>Priority: Major
>
> ZKUtils has been deprecated since Kafka version 2.0.0, so it would be good to 
> replace it.
> I've taken a look at the possibilities, but it seems there is no working 
> alternative at the moment. Please see KAFKA-8468.






[jira] [Resolved] (SPARK-28144) Remove ZKUtils from Kafka tests

2019-12-21 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-28144.
---
Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 26960
[https://github.com/apache/spark/pull/26960]

> Remove ZKUtils from Kafka tests
> ---
>
> Key: SPARK-28144
> URL: https://issues.apache.org/jira/browse/SPARK-28144
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming, Tests
>Affects Versions: 3.0.0
>Reporter: Gabor Somogyi
>Assignee: Jungtaek Lim
>Priority: Major
> Fix For: 3.0.0
>
>
> ZKUtils has been deprecated since Kafka version 2.0.0, so it would be good to 
> replace it.
> I've taken a look at the possibilities, but it seems there is no working 
> alternative at the moment. Please see KAFKA-8468.






[jira] [Resolved] (SPARK-30056) Skip building test artifacts in `dev/make-distribution.sh`

2019-12-21 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-30056.
---
Resolution: Later

> Skip building test artifacts in `dev/make-distribution.sh`
> --
>
> Key: SPARK-30056
> URL: https://issues.apache.org/jira/browse/SPARK-30056
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 2.4.4, 3.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>
> This issue aims to skip building test artifacts in dev/make-distribution.sh.
> Since Apache Spark 3.0.0 we need to build additional binary distributions, so 
> this helps the release process by speeding up the building of multiple binary 
> distributions.






[jira] [Assigned] (SPARK-30056) Skip building test artifacts in `dev/make-distribution.sh`

2019-12-21 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-30056:
-

Assignee: (was: Dongjoon Hyun)

> Skip building test artifacts in `dev/make-distribution.sh`
> --
>
> Key: SPARK-30056
> URL: https://issues.apache.org/jira/browse/SPARK-30056
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 2.4.4, 3.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>
> This issue aims to skip building test artifacts in dev/make-distribution.sh.
> Since Apache Spark 3.0.0 we need to build additional binary distributions, so 
> this helps the release process by speeding up the building of multiple binary 
> distributions.






[jira] [Updated] (SPARK-30056) Skip building test artifacts in `dev/make-distribution.sh`

2019-12-21 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-30056:
--
Parent: (was: SPARK-30034)
Issue Type: Improvement  (was: Sub-task)

> Skip building test artifacts in `dev/make-distribution.sh`
> --
>
> Key: SPARK-30056
> URL: https://issues.apache.org/jira/browse/SPARK-30056
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 2.4.4, 3.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>
> This issue aims to skip building test artifacts in dev/make-distribution.sh.
> Since Apache Spark 3.0.0 we need to build additional binary distributions, so 
> this helps the release process by speeding up the building of multiple binary 
> distributions.






[jira] [Assigned] (SPARK-30280) Update documentation

2019-12-21 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-30280:
-

Assignee: Yuming Wang

> Update documentation
> 
>
> Key: SPARK-30280
> URL: https://issues.apache.org/jira/browse/SPARK-30280
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
>







[jira] [Resolved] (SPARK-30280) Update documentation

2019-12-21 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-30280.
---
Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 26919
[https://github.com/apache/spark/pull/26919]

> Update documentation
> 
>
> Key: SPARK-30280
> URL: https://issues.apache.org/jira/browse/SPARK-30280
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
> Fix For: 3.0.0
>
>







[jira] [Assigned] (SPARK-30318) Bump jetty to 9.3.27.v20190418

2019-12-21 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen reassigned SPARK-30318:


Assignee: Sandeep Katta

> Bump jetty to 9.3.27.v20190418
> --
>
> Key: SPARK-30318
> URL: https://issues.apache.org/jira/browse/SPARK-30318
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.4.4
>Reporter: Sandeep Katta
>Assignee: Sandeep Katta
>Priority: Minor
>
> Upgrade jetty to 9.3.27.v20190418 to fix CVE-2019-10241 and CVE-2019-10247






[jira] [Resolved] (SPARK-30318) Bump jetty to 9.3.27.v20190418

2019-12-21 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen resolved SPARK-30318.
--
Fix Version/s: 2.4.5
   Resolution: Fixed

Issue resolved by pull request 26967
[https://github.com/apache/spark/pull/26967]

> Bump jetty to 9.3.27.v20190418
> --
>
> Key: SPARK-30318
> URL: https://issues.apache.org/jira/browse/SPARK-30318
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.4.4
>Reporter: Sandeep Katta
>Assignee: Sandeep Katta
>Priority: Minor
> Fix For: 2.4.5
>
>
> Upgrade jetty to 9.3.27.v20190418 to fix CVE-2019-10241 and CVE-2019-10247






[jira] [Updated] (SPARK-30318) Bump jetty to 9.3.27.v20190418

2019-12-21 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen updated SPARK-30318:
-
Issue Type: Improvement  (was: Bug)
  Priority: Minor  (was: Major)

> Bump jetty to 9.3.27.v20190418
> --
>
> Key: SPARK-30318
> URL: https://issues.apache.org/jira/browse/SPARK-30318
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.4.4
>Reporter: Sandeep Katta
>Priority: Minor
>
> Upgrade jetty to 9.3.27.v20190418 to fix CVE-2019-10241 and CVE-2019-10247






[jira] [Commented] (SPARK-30325) Stage retry and executor crash cause app to hang forever

2019-12-21 Thread haiyangyu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17001661#comment-17001661
 ] 

haiyangyu commented on SPARK-30325:
---

[https://github.com/apache/spark/pull/26975]

> Stage retry and executor crash cause app to hang forever
> --
>
> Key: SPARK-30325
> URL: https://issues.apache.org/jira/browse/SPARK-30325
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.0, 2.4.4
>Reporter: haiyangyu
>Priority: Major
> Attachments: image-2019-12-21-17-11-38-565.png, 
> image-2019-12-21-17-15-51-512.png, image-2019-12-21-17-16-40-998.png, 
> image-2019-12-21-17-17-42-244.png
>
>
> h3. Corner case
> The bug occurs in the following corner case:
>  # A stage hits a fetchFailed while some of its tasks haven't finished, so 
> the scheduler resubmits a new retry stage with those unfinished tasks.
>  # An unfinished task from the origin stage then finishes while the same task 
> on the new retry stage hasn't finished yet, which marks that task's partition 
> on the new retry stage as successful.  !image-2019-12-21-17-11-38-565.png|width=427,height=154!
>  # The executor running those 'successful' tasks crashes, which makes the 
> taskSetManager run executorLost to reschedule the tasks on that executor. 
> Because those 'successful' tasks never actually finished, copiesRunning is 
> decremented twice and ends up at -1. 
> !image-2019-12-21-17-15-51-512.png|width=437,height=340!!image-2019-12-21-17-16-40-998.png|width=398,height=139!
>  # 'dequeueTaskFromList' only reschedules a task whose copiesRunning equals 
> 0; since it is now -1, the tasks can never be rescheduled and the app hangs 
> forever. !image-2019-12-21-17-17-42-244.png|width=366,height=282!
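
For readers following along, here is a self-contained toy model of the double 
decrement described above (it is not Spark's actual TaskSetManager, and every 
name in it is illustrative):

{code:scala}
object CopiesRunningToyModel {
  // One copy of the task is running on the soon-to-crash executor.
  var copiesRunning: Int = 1

  // Step 2: the origin-stage copy of the same partition finishes, so the
  // retry stage marks the partition as successful and decrements the counter.
  def markedSuccessfulByOtherAttempt(): Unit = copiesRunning -= 1

  // Step 3: the executor hosting the 'successful' (never truly finished)
  // copy crashes, and executorLost decrements the counter a second time.
  def executorLost(): Unit = copiesRunning -= 1

  // Step 4: rescheduling only considers tasks whose copiesRunning is 0,
  // so a counter stuck at -1 is never offered again and the app hangs.
  def reschedulable: Boolean = copiesRunning == 0

  def main(args: Array[String]): Unit = {
    markedSuccessfulByOtherAttempt()
    executorLost()
    println(s"copiesRunning = $copiesRunning, reschedulable = $reschedulable")
    // Prints: copiesRunning = -1, reschedulable = false
  }
}
{code}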






[jira] [Updated] (SPARK-30325) Stage retry and executor crash cause app to hang forever

2019-12-21 Thread haiyangyu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haiyangyu updated SPARK-30325:
--
Description: 
h3. Corner case

The bug occurs in the following corner case:
 # A stage hits a fetchFailed while some of its tasks haven't finished, so the 
scheduler resubmits a new retry stage with those unfinished tasks.
 # An unfinished task from the origin stage then finishes while the same task 
on the new retry stage hasn't finished yet, which marks that task's partition 
on the new retry stage as successful.  !image-2019-12-21-17-11-38-565.png|width=427,height=154!
 # The executor running those 'successful' tasks crashes, which makes the 
taskSetManager run executorLost to reschedule the tasks on that executor. 
Because those 'successful' tasks never actually finished, copiesRunning is 
decremented twice and ends up at -1. 
!image-2019-12-21-17-15-51-512.png|width=437,height=340!!image-2019-12-21-17-16-40-998.png|width=398,height=139!
 # 'dequeueTaskFromList' only reschedules a task whose copiesRunning equals 0; 
since it is now -1, the tasks can never be rescheduled and the app hangs 
forever. !image-2019-12-21-17-17-42-244.png|width=366,height=282!

  was:
h3. Corner case

The bug occurs in the following corner case:
 # A stage hits a fetchFailed while some of its tasks haven't finished, so the 
scheduler resubmits a new retry stage with those unfinished tasks.
 # An unfinished task from the origin stage then finishes while the same task 
on the new retry stage hasn't finished yet, which marks that task's partition 
on the new retry stage as successful. !image-2019-12-21-17-11-38-565.png!
 # The executor running those 'successful' tasks crashes, which makes the 
taskSetManager run executorLost to reschedule the tasks on that executor. 
Because those 'successful' tasks never actually finished, copiesRunning is 
decremented twice and ends up at -1.
 # 'dequeueTaskFromList' only reschedules a task whose copiesRunning equals 0; 
since it is now -1, the tasks can never be rescheduled and the app hangs 
forever.


> Stage retry and executor crash cause app to hang forever
> --
>
> Key: SPARK-30325
> URL: https://issues.apache.org/jira/browse/SPARK-30325
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.0, 2.4.4
>Reporter: haiyangyu
>Priority: Major
> Attachments: image-2019-12-21-17-11-38-565.png, 
> image-2019-12-21-17-15-51-512.png, image-2019-12-21-17-16-40-998.png, 
> image-2019-12-21-17-17-42-244.png
>
>
> h3. Corner case
> The bug occurs in the following corner case:
>  # A stage hits a fetchFailed while some of its tasks haven't finished, so 
> the scheduler resubmits a new retry stage with those unfinished tasks.
>  # An unfinished task from the origin stage then finishes while the same task 
> on the new retry stage hasn't finished yet, which marks that task's partition 
> on the new retry stage as successful.  !image-2019-12-21-17-11-38-565.png|width=427,height=154!
>  # The executor running those 'successful' tasks crashes, which makes the 
> taskSetManager run executorLost to reschedule the tasks on that executor. 
> Because those 'successful' tasks never actually finished, copiesRunning is 
> decremented twice and ends up at -1. 
> !image-2019-12-21-17-15-51-512.png|width=437,height=340!!image-2019-12-21-17-16-40-998.png|width=398,height=139!
>  # 'dequeueTaskFromList' only reschedules a task whose copiesRunning equals 
> 0; since it is now -1, the tasks can never be rescheduled and the app hangs 
> forever. !image-2019-12-21-17-17-42-244.png|width=366,height=282!






[jira] [Updated] (SPARK-30325) Stage retry and executor crash cause app to hang forever

2019-12-21 Thread haiyangyu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haiyangyu updated SPARK-30325:
--
Attachment: image-2019-12-21-17-17-42-244.png

> Stage retry and executor crash cause app to hang forever
> --
>
> Key: SPARK-30325
> URL: https://issues.apache.org/jira/browse/SPARK-30325
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.0, 2.4.4
>Reporter: haiyangyu
>Priority: Major
> Attachments: image-2019-12-21-17-11-38-565.png, 
> image-2019-12-21-17-15-51-512.png, image-2019-12-21-17-16-40-998.png, 
> image-2019-12-21-17-17-42-244.png
>
>
> h3. Corner case
> The bug occurs in the following corner case:
>  # A stage hits a fetchFailed while some of its tasks haven't finished, so 
> the scheduler resubmits a new retry stage with those unfinished tasks.
>  # An unfinished task from the origin stage then finishes while the same task 
> on the new retry stage hasn't finished yet, which marks that task's partition 
> on the new retry stage as successful. !image-2019-12-21-17-11-38-565.png!
>  # The executor running those 'successful' tasks crashes, which makes the 
> taskSetManager run executorLost to reschedule the tasks on that executor. 
> Because those 'successful' tasks never actually finished, copiesRunning is 
> decremented twice and ends up at -1.
>  # 'dequeueTaskFromList' only reschedules a task whose copiesRunning equals 
> 0; since it is now -1, the tasks can never be rescheduled and the app hangs 
> forever.






[jira] [Updated] (SPARK-30325) Stage retry and executor crash cause app to hang forever

2019-12-21 Thread haiyangyu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haiyangyu updated SPARK-30325:
--
Attachment: image-2019-12-21-17-16-40-998.png

> Stage retry and executor crash cause app to hang forever
> --
>
> Key: SPARK-30325
> URL: https://issues.apache.org/jira/browse/SPARK-30325
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.0, 2.4.4
>Reporter: haiyangyu
>Priority: Major
> Attachments: image-2019-12-21-17-11-38-565.png, 
> image-2019-12-21-17-15-51-512.png, image-2019-12-21-17-16-40-998.png
>
>
> h3. Corner case
> The bug occurs in the following corner case:
>  # A stage hits a fetchFailed while some of its tasks haven't finished, so 
> the scheduler resubmits a new retry stage with those unfinished tasks.
>  # An unfinished task from the origin stage then finishes while the same task 
> on the new retry stage hasn't finished yet, which marks that task's partition 
> on the new retry stage as successful. !image-2019-12-21-17-11-38-565.png!
>  # The executor running those 'successful' tasks crashes, which makes the 
> taskSetManager run executorLost to reschedule the tasks on that executor. 
> Because those 'successful' tasks never actually finished, copiesRunning is 
> decremented twice and ends up at -1.
>  # 'dequeueTaskFromList' only reschedules a task whose copiesRunning equals 
> 0; since it is now -1, the tasks can never be rescheduled and the app hangs 
> forever.






[jira] [Updated] (SPARK-30325) Stage retry and executor crash cause app to hang forever

2019-12-21 Thread haiyangyu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haiyangyu updated SPARK-30325:
--
Attachment: image-2019-12-21-17-15-51-512.png

> Stage retry and executor crash cause app to hang forever
> --
>
> Key: SPARK-30325
> URL: https://issues.apache.org/jira/browse/SPARK-30325
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.0, 2.4.4
>Reporter: haiyangyu
>Priority: Major
> Attachments: image-2019-12-21-17-11-38-565.png, 
> image-2019-12-21-17-15-51-512.png
>
>
> h3. Corner case
> The bug occurs in the following corner case:
>  # A stage hits a fetchFailed while some of its tasks haven't finished, so 
> the scheduler resubmits a new retry stage with those unfinished tasks.
>  # An unfinished task from the origin stage then finishes while the same task 
> on the new retry stage hasn't finished yet, which marks that task's partition 
> on the new retry stage as successful. !image-2019-12-21-17-11-38-565.png!
>  # The executor running those 'successful' tasks crashes, which makes the 
> taskSetManager run executorLost to reschedule the tasks on that executor. 
> Because those 'successful' tasks never actually finished, copiesRunning is 
> decremented twice and ends up at -1.
>  # 'dequeueTaskFromList' only reschedules a task whose copiesRunning equals 
> 0; since it is now -1, the tasks can never be rescheduled and the app hangs 
> forever.






[jira] [Updated] (SPARK-30325) Stage retry and executor crash cause app to hang forever

2019-12-21 Thread haiyangyu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haiyangyu updated SPARK-30325:
--
Attachment: image-2019-12-21-17-11-38-565.png

> Stage retry and executor crash cause app to hang forever
> --
>
> Key: SPARK-30325
> URL: https://issues.apache.org/jira/browse/SPARK-30325
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.0, 2.4.4
>Reporter: haiyangyu
>Priority: Major
> Attachments: image-2019-12-21-17-11-38-565.png
>
>
> Kill tasks that succeeded in the origin stage when the new retry stage has 
> started the same task and it hasn't finished yet.
> This can decrease stage run time and resource cost and, most important of 
> all, avoid a bug that can hang the app.
> The bug occurs in the following corner case:
> 1. A stage hits a fetchFailed while some of its tasks haven't finished, so 
> the scheduler resubmits a new retry stage with those unfinished tasks.
> 2. An unfinished task from the origin stage then finishes while the same task 
> on the new retry stage hasn't finished yet, which marks that task's partition 
> on the new retry stage as successful.
> 3. The executor running those 'successful' tasks crashes, which makes the 
> taskSetManager run executorLost to reschedule the tasks on that executor. 
> Because those 'successful' tasks never actually finished, copiesRunning is 
> decremented twice and ends up at -1.
> 4. 'dequeueTaskFromList' only reschedules a task whose copiesRunning equals 
> 0; since it is now -1, the tasks can never be rescheduled and the app hangs 
> forever.






[jira] [Updated] (SPARK-30325) Stage retry and executor crash cause app to hang forever

2019-12-21 Thread haiyangyu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haiyangyu updated SPARK-30325:
--
Description: 
h3. Corner case

The bug occurs in the following corner case:
 # A stage hits a fetchFailed while some of its tasks haven't finished, so the 
scheduler resubmits a new retry stage with those unfinished tasks.
 # An unfinished task from the origin stage then finishes while the same task 
on the new retry stage hasn't finished yet, which marks that task's partition 
on the new retry stage as successful. !image-2019-12-21-17-11-38-565.png!
 # The executor running those 'successful' tasks crashes, which makes the 
taskSetManager run executorLost to reschedule the tasks on that executor. 
Because those 'successful' tasks never actually finished, copiesRunning is 
decremented twice and ends up at -1.
 # 'dequeueTaskFromList' only reschedules a task whose copiesRunning equals 0; 
since it is now -1, the tasks can never be rescheduled and the app hangs 
forever.

  was:
Kill tasks that succeeded in the origin stage when the new retry stage has 
started the same task and it hasn't finished yet.
This can decrease stage run time and resource cost and, most important of all, 
avoid a bug that can hang the app.
The bug occurs in the following corner case:
1. A stage hits a fetchFailed while some of its tasks haven't finished, so the 
scheduler resubmits a new retry stage with those unfinished tasks.
2. An unfinished task from the origin stage then finishes while the same task 
on the new retry stage hasn't finished yet, which marks that task's partition 
on the new retry stage as successful.
3. The executor running those 'successful' tasks crashes, which makes the 
taskSetManager run executorLost to reschedule the tasks on that executor. 
Because those 'successful' tasks never actually finished, copiesRunning is 
decremented twice and ends up at -1.
4. 'dequeueTaskFromList' only reschedules a task whose copiesRunning equals 0; 
since it is now -1, the tasks can never be rescheduled and the app hangs 
forever.


> Stage retry and executor crash cause app to hang forever
> --
>
> Key: SPARK-30325
> URL: https://issues.apache.org/jira/browse/SPARK-30325
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.0, 2.4.4
>Reporter: haiyangyu
>Priority: Major
> Attachments: image-2019-12-21-17-11-38-565.png
>
>
> h3. Corner case
> The bug occurs in the following corner case:
>  # A stage hits a fetchFailed while some of its tasks haven't finished, so 
> the scheduler resubmits a new retry stage with those unfinished tasks.
>  # An unfinished task from the origin stage then finishes while the same task 
> on the new retry stage hasn't finished yet, which marks that task's partition 
> on the new retry stage as successful. !image-2019-12-21-17-11-38-565.png!
>  # The executor running those 'successful' tasks crashes, which makes the 
> taskSetManager run executorLost to reschedule the tasks on that executor. 
> Because those 'successful' tasks never actually finished, copiesRunning is 
> decremented twice and ends up at -1.
>  # 'dequeueTaskFromList' only reschedules a task whose copiesRunning equals 
> 0; since it is now -1, the tasks can never be rescheduled and the app hangs 
> forever.






[jira] [Created] (SPARK-30325) Stage retry and executor crash cause app to hang forever

2019-12-21 Thread haiyangyu (Jira)
haiyangyu created SPARK-30325:
-

 Summary: Stage retry and executor crash cause app to hang forever
 Key: SPARK-30325
 URL: https://issues.apache.org/jira/browse/SPARK-30325
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 2.4.4, 2.4.0
Reporter: haiyangyu









[jira] [Updated] (SPARK-30325) Stage retry and executor crash cause app to hang forever

2019-12-21 Thread haiyangyu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haiyangyu updated SPARK-30325:
--
Description: 
Kill tasks that succeeded in the origin stage when the new retry stage has 
started the same task and it hasn't finished yet.
This can decrease stage run time and resource cost and, most important of all, 
avoid a bug that can hang the app.
The bug occurs in the following corner case:
1. A stage hits a fetchFailed while some of its tasks haven't finished, so the 
scheduler resubmits a new retry stage with those unfinished tasks.
2. An unfinished task from the origin stage then finishes while the same task 
on the new retry stage hasn't finished yet, which marks that task's partition 
on the new retry stage as successful.
3. The executor running those 'successful' tasks crashes, which makes the 
taskSetManager run executorLost to reschedule the tasks on that executor. 
Because those 'successful' tasks never actually finished, copiesRunning is 
decremented twice and ends up at -1.
4. 'dequeueTaskFromList' only reschedules a task whose copiesRunning equals 0; 
since it is now -1, the tasks can never be rescheduled and the app hangs 
forever.

> Stage retry and executor crash cause app to hang forever
> --
>
> Key: SPARK-30325
> URL: https://issues.apache.org/jira/browse/SPARK-30325
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.0, 2.4.4
>Reporter: haiyangyu
>Priority: Major
>
> Kill tasks that succeeded in the origin stage when the new retry stage has 
> started the same task and it hasn't finished yet.
> This can decrease stage run time and resource cost and, most important of 
> all, avoid a bug that can hang the app.
> The bug occurs in the following corner case:
> 1. A stage hits a fetchFailed while some of its tasks haven't finished, so 
> the scheduler resubmits a new retry stage with those unfinished tasks.
> 2. An unfinished task from the origin stage then finishes while the same task 
> on the new retry stage hasn't finished yet, which marks that task's partition 
> on the new retry stage as successful.
> 3. The executor running those 'successful' tasks crashes, which makes the 
> taskSetManager run executorLost to reschedule the tasks on that executor. 
> Because those 'successful' tasks never actually finished, copiesRunning is 
> decremented twice and ends up at -1.
> 4. 'dequeueTaskFromList' only reschedules a task whose copiesRunning equals 
> 0; since it is now -1, the tasks can never be rescheduled and the app hangs 
> forever.


