[GitHub] [spark] kuwii commented on pull request #39190: [SPARK-41683][CORE] Fix issue of getting incorrect property numActiveStages in jobs API

2023-01-21 Thread via GitHub


kuwii commented on PR #39190:
URL: https://github.com/apache/spark/pull/39190#issuecomment-1399216960

   I tried the example code in the 
[JIRA](https://issues.apache.org/jira/browse/SPARK-24415), and it is not 
affected by this change. The tasks shown in the stage are the same before and 
after this change.
   
   
![image](https://user-images.githubusercontent.com/10705175/213859013-55e9b979-64a1-4ebd-b3a8-8b19e1d6a04e.png)
   
   Also, `numActiveStages` for that example is `-1` as well. I think we didn't 
notice it because the property currently seems to be exposed only in the 
`jobs` REST API, not in the web UI.
   
   
![image](https://user-images.githubusercontent.com/10705175/213861066-02e896bd-dd34-4a05-8d41-4e44ccd1fa6a.png)
   
   I've checked the comments about these lines in that 
[PR](https://github.com/apache/spark/pull/22209). The code here handles stage 
metrics when the `onStageCompleted` event is dropped for some reason. But as 
mentioned in this PR, I think the logic that decrements `activeStages` here is 
incorrect and should be removed from the handling of the `onJobEnd` event.
   
   
![image](https://user-images.githubusercontent.com/10705175/213859974-c9922f03-d93a-4144-bf06-c0e16829be6f.png)
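To make the reasoning above concrete, here is a toy model in Python. It is not Spark's listener code: `ToyJob`, its fields, and the `decrement_for_pending` flag are hypothetical names invented for this sketch. It assumes a job with two stages where one stage is never submitted, and compares job-end handling with and without the extra decrement.

```python
from dataclasses import dataclass, field


@dataclass
class ToyJob:
    """Hypothetical stand-in for per-job bookkeeping (not Spark's class)."""
    stage_ids: list
    active_stages: int = 0
    pending: set = field(default_factory=set)   # stages never submitted so far
    skipped: set = field(default_factory=set)

    def __post_init__(self):
        # Every stage starts out pending until a submit event arrives.
        self.pending = set(self.stage_ids)

    def on_stage_submitted(self, stage_id):
        self.pending.discard(stage_id)
        self.active_stages += 1

    def on_stage_completed(self, stage_id):
        self.active_stages -= 1

    def on_job_end(self, decrement_for_pending):
        # Stages still pending when the job ends are marked as skipped.
        for stage_id in sorted(self.pending):
            self.skipped.add(stage_id)
            if decrement_for_pending:
                # A pending stage was never counted as active, so this
                # subtraction pushes the counter below zero.
                self.active_stages -= 1
        self.pending.clear()


def run_job(decrement_for_pending):
    """Replay a job whose stage 1 is skipped and whose stage 2 runs normally."""
    job = ToyJob(stage_ids=[1, 2])
    job.on_stage_submitted(2)
    job.on_stage_completed(2)
    job.on_job_end(decrement_for_pending)
    return job.active_stages
```

In this model, `run_job(True)` ends with a counter of `-1`, while `run_job(False)` ends at `0`, because the skipped stage was never added to the active count in the first place.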


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] kuwii commented on pull request #39190: [SPARK-41683][CORE] Fix issue of getting incorrect property numActiveStages in jobs API

2023-01-20 Thread GitBox


kuwii commented on PR #39190:
URL: https://github.com/apache/spark/pull/39190#issuecomment-1398579998

   I'm not familiar with how Spark creates and runs jobs and stages for a 
query, but I think it may be related to this case. I can reproduce this locally 
with Spark running on YARN, using this code:
   
   ```python
   from pyspark import SparkConf, SparkContext
   from pyspark.sql import SQLContext
   
   # Minimal application: no configuration beyond the app name.
   conf = SparkConf().setAppName('test')
   sc = SparkContext(conf=conf)
   spark = SQLContext(sc).sparkSession
   
   # A single count() is enough to trigger the jobs described below.
   spark.range(1, 100).count()
   ```
   
   Executing `count` creates two jobs: job 0 with stage 0, and job 1 with 
stages 1 and 2.
   
   
![image](https://user-images.githubusercontent.com/10705175/213734447-2b1748e2-f073-4d68-b2b0-7793fbd80ca0.png)
   
   For some reason, stage 1 is always skipped; it is never even submitted.
   
   
![image](https://user-images.githubusercontent.com/10705175/213736105-c5d0eedc-ed0a-4f23-933b-eebe34244db5.png)
   
   This is the case mentioned in the PR description. Because of the incorrect 
logic for updating `numActiveStages`, the value will be `-1` in the jobs API. 
This PR fixes it.
   
   
![image](https://user-images.githubusercontent.com/10705175/213740564-47b6e6eb-8d09-4eca-a340-3a98c912c69a.png)
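One way to check the value from the same script, sketched under assumptions: the helpers below are hypothetical, and they assume the driver's REST API is reachable at `sc.uiWebUrl` (on YARN the UI may sit behind the ResourceManager proxy, so the URL might need adjusting). The `jobId` and `numActiveStages` keys are the fields returned by the jobs endpoint.

```python
import json
from urllib.request import urlopen


def live_jobs_url(ui_url, app_id):
    """Build the jobs endpoint URL for a running application."""
    return f"{ui_url.rstrip('/')}/api/v1/applications/{app_id}/jobs"


def active_stage_counts(jobs):
    """Map each job ID to its reported numActiveStages."""
    return {job["jobId"]: job["numActiveStages"] for job in jobs}


def fetch_active_stage_counts(sc):
    """Query the live REST API of the given SparkContext (needs a running app)."""
    with urlopen(live_jobs_url(sc.uiWebUrl, sc.applicationId)) as resp:
        return active_stage_counts(json.load(resp))
```

Calling `fetch_active_stage_counts(sc)` after the `count()` above would show the per-job values without going through the web UI.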





[GitHub] [spark] kuwii commented on pull request #39190: [SPARK-41683][CORE] Fix issue of getting incorrect property numActiveStages in jobs API

2023-01-20 Thread GitBox


kuwii commented on PR #39190:
URL: https://github.com/apache/spark/pull/39190#issuecomment-1398263751

   @srowen We found this issue in some Spark applications. Here is the event 
log of one example, which can be loaded through the history server:
   
[application_1671519030791_0001_1.zip](https://github.com/apache/spark/files/10465796/application_1671519030791_0001_1.zip)
   
   In `/api/v1/applications/application_1671519030791_0001/1/jobs`, 
`numActiveStages` is less than 0 for jobs 3, 4, 5, and 8.
   
![image](https://user-images.githubusercontent.com/10705175/213685483-36a1f933-3f0c-4c9a-a0b0-50e293794067.png)
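A minimal sketch of how this check could be scripted once the event log is loaded. The history server address (`http://localhost:18080`) and both helper names are assumptions made for illustration; only the endpoint path and the `numActiveStages` field come from the comment above.

```python
import json
from urllib.request import urlopen

# Assumed address of a local history server; adjust to your deployment.
HISTORY_SERVER = "http://localhost:18080"


def negative_active_stage_jobs(jobs):
    """Return the sorted IDs of jobs whose numActiveStages is below zero."""
    return sorted(job["jobId"] for job in jobs if job["numActiveStages"] < 0)


def check_application(app_id, attempt_id, base_url=HISTORY_SERVER):
    """Fetch the jobs list for one application attempt and flag bad counters."""
    url = f"{base_url}/api/v1/applications/{app_id}/{attempt_id}/jobs"
    with urlopen(url) as resp:
        return negative_active_stage_jobs(json.load(resp))
```

For the event log attached above, the call would be `check_application("application_1671519030791_0001", "1")`.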
   





[GitHub] [spark] kuwii commented on pull request #39190: [SPARK-41683][CORE] Fix issue of getting incorrect property numActiveStages in jobs API

2023-01-18 Thread GitBox


kuwii commented on PR #39190:
URL: https://github.com/apache/spark/pull/39190#issuecomment-1396457172

   Hi @srowen, could you please take a look at this PR? Thanks.





[GitHub] [spark] kuwii commented on pull request #39190: [SPARK-41683][CORE] Fix issue of getting incorrect property numActiveStages in jobs API

2023-01-11 Thread GitBox


kuwii commented on PR #39190:
URL: https://github.com/apache/spark/pull/39190#issuecomment-1379868894

   > Hi. this impacts Jobs API so this is a user facing change right?
   
   @VindhyaG Thanks for the comment. I've updated the PR description.





[GitHub] [spark] kuwii commented on pull request #39190: [SPARK-41683][CORE] Fix issue of getting incorrect property numActiveStages in jobs API

2023-01-03 Thread GitBox


kuwii commented on PR #39190:
URL: https://github.com/apache/spark/pull/39190#issuecomment-1370535541

   Kindly ping @ankuriitg @vanzin @mridulm @thejdeep 
   Could you please take a look at this PR? Thanks.





[GitHub] [spark] kuwii commented on pull request #39190: [SPARK-41683][CORE] Fix issue of getting incorrect property numActiveStages in jobs API

2022-12-22 Thread GitBox


kuwii commented on PR #39190:
URL: https://github.com/apache/spark/pull/39190#issuecomment-1363653779

   Related Change: #22209
   Kindly ping @ankuriitg @vanzin

