[GitHub] [spark] AngersZhuuuu commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

2021-07-22 Thread GitBox


AngersZh commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-884917062


   retest this please


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AngersZhuuuu commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

2021-07-22 Thread GitBox


AngersZh commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-884916961


   > @AngersZh Thanks. Please update the screenshot in the PR description 
as well.
   
   Done





[GitHub] [spark] AngersZhuuuu commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

2021-07-20 Thread GitBox


AngersZh commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-883463577


   > @AngersZh can you update the screenshot in the pr description?
   
   Done





[GitHub] [spark] AngersZhuuuu commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

2021-07-20 Thread GitBox


AngersZh commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-883354294


   retest this please





[GitHub] [spark] AngersZhuuuu commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

2021-07-20 Thread GitBox


AngersZh commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-882546959


   retest this please





[GitHub] [spark] AngersZhuuuu commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

2021-07-19 Thread GitBox


AngersZh commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-882546959


   retest this please





[GitHub] [spark] AngersZhuuuu commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

2021-07-18 Thread GitBox


AngersZh commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-882176352


   Any more suggestions?





[GitHub] [spark] AngersZhuuuu commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

2021-07-15 Thread GitBox


AngersZh commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-881127826


   ping @cloud-fan @gengliangwang @maropu 





[GitHub] [spark] AngersZhuuuu commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

2021-03-24 Thread GitBox


AngersZh commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-806307672


   gentle ping @cloud-fan 





[GitHub] [spark] AngersZhuuuu commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

2021-03-09 Thread GitBox


AngersZh commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-794964106


   > We can periodically log something in the built-in file commit protocol, 
but there is nothing we can do if people are using a custom file commit 
protocol.
   
   A new thread that logs this would work, but it looks odd. (We use this approach to 
show the Thrift server's progress.)
   
   > I checked other plan nodes and found that the file scan node has a 
"metadata time" metrics. I think it makes sense to have something similar in 
the write nodes, but we need to think about the naming and what to include 
(shall we include the hive LOAD TABLE time?).
   
   If possible, I think the more comprehensive the information, the better. As I 
mentioned in https://github.com/apache/spark/pull/31522#issuecomment-793672126, 
if we have the `LOAD TABLE` time in the SQL tab's node, it will be easier for us to 
explain to our users/customers.
   






[GitHub] [spark] AngersZhuuuu commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

2021-03-09 Thread GitBox


AngersZh commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-794853587


   > And for the extreme case like taking hours on committing, I think more 
important thing is to log periodically to let end users determine whether the 
Spark driver is hang or not, without enabling DEBUG log for sure. Maybe 
off-topic, but if we'd like to have priority on these things, I'd rather say 
that's more needed.
   
   Yes, I think you have pointed out the most important concern: there is no log 
when the bad case happens. I think this idea is good. WDYT @cloud-fan?
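
The periodic logging discussed above can be sketched independently of Spark. The following is a minimal Python illustration, not the PR's implementation and not Spark's API: the helper name `log_while_running`, the logger name, and the 60-second default interval are all assumptions made for this example. A background thread emits a progress line at a fixed interval until the wrapped operation returns, so a reader of the log can tell a slow commit from a hung driver.

```python
import logging
import threading
import time
from contextlib import contextmanager

logger = logging.getLogger("commit_progress")  # hypothetical logger name


@contextmanager
def log_while_running(description, interval_seconds=60.0, log=logger.info):
    """Call `log` every `interval_seconds` until the wrapped block finishes."""
    start = time.monotonic()
    finished = threading.Event()

    def heartbeat():
        # Event.wait returns False on timeout and True once `finished` is set,
        # so the loop ends as soon as the wrapped block completes.
        while not finished.wait(interval_seconds):
            elapsed = time.monotonic() - start
            log(f"{description} still running after {elapsed:.1f}s")

    worker = threading.Thread(target=heartbeat, daemon=True)
    worker.start()
    try:
        yield
    finally:
        # Stop the heartbeat thread before reporting the total duration.
        finished.set()
        worker.join()
        log(f"{description} finished in {time.monotonic() - start:.1f}s")
```

A caller would wrap the slow step, for example `with log_while_running("job commit"):` around whatever performs the commit. As noted in the thread, this only helps where the wrapping code is under our control; a custom commit protocol would have to opt in itself.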






[GitHub] [spark] AngersZhuuuu commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

2021-03-09 Thread GitBox


AngersZh commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-793672126


   > How often is that? We can also improve the log to make it easier to search 
for a certain job.
   
   Another case: when we run MSCK on a table, the SQL tab shows nothing.
   ![Screenshot 2021-03-09 6:16:20 PM](https://user-images.githubusercontent.com/46485123/110455575-92a36980-8103-11eb-9a3a-565d3d4f7c15.png)
   
   When it is slow, the only thing we can find out, from the stage page, is how 
long it took to collect the path info.
   ![Screenshot 2021-03-09 6:15:55 PM](https://user-images.githubusercontent.com/46485123/110455524-83bcb700-8103-11eb-82d6-8e36eb962197.png)
   
   But after collecting the path info and partition statistics, it also needs to 
interact with Hive. Sometimes that is slow, and users ask why the job finished in 
only 2 minutes while the SQL's duration is 10 minutes.
   
   These duration metrics are also important for such commands, so that a Spark 
admin can quickly find the reason and reply to the user.
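
To make the 2-minute versus 10-minute gap concrete, here is a small Python sketch of per-phase timing. It is only an illustration under stated assumptions: `PhaseTimer` and the phase names are hypothetical and not part of Spark, and attributing the remaining 8 minutes to the Hive interaction is an assumption made for the example. The point is that once each driver-side phase is timed separately, the difference between the SQL duration and the job duration is no longer unexplained.

```python
import time
from contextlib import contextmanager


class PhaseTimer:
    """Accumulates wall-clock time per named phase, in milliseconds."""

    def __init__(self, clock=time.monotonic):
        # `clock` returns seconds; it is injectable so the timer can be tested.
        self._clock = clock
        self.durations_ms = {}

    @contextmanager
    def phase(self, name):
        start = self._clock()
        try:
            yield
        finally:
            elapsed_ms = (self._clock() - start) * 1000.0
            self.durations_ms[name] = self.durations_ms.get(name, 0.0) + elapsed_ms

    def unaccounted_ms(self, total_ms):
        """Part of `total_ms` that no recorded phase covers."""
        return total_ms - sum(self.durations_ms.values())
```

With a 10-minute total (600,000 ms) and only the 2-minute job recorded, `unaccounted_ms` returns 480,000 ms; recording the metastore phase as well brings it to zero, which is the kind of breakdown an admin would want to see in the SQL tab.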
   






[GitHub] [spark] AngersZhuuuu commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

2021-03-08 Thread GitBox


AngersZh commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-793484750


   In fact, the cluster environments of many companies are not that healthy, and 
there are often slow nodes that make the commit and the Hive metadata load 
table/partition very slow. We can indeed see this in the log, but for a 
long-running service, especially the Spark Thrift Server, where a lot of SQL is 
running, we also need to go through the background log to find and confirm 
which SQL a log entry belongs to. Normally, we only look at these metrics when 
our SQL runs for a long time or there is a problem.






[GitHub] [spark] AngersZhuuuu commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

2021-02-24 Thread GitBox


AngersZh commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-785554038


   > Normally, committing a job should be fast. I don't think it is a good idea 
to put this in the SQL graph. For debug purposes, the log message should be 
enough.
   > Besides, the name "duration of committing the job" can be confusing to 
end-users.
   > I have to leave -1 for this one.
   
   All right. For quick debugging, showing all of these messages directly may be 
more helpful for Spark admins.






[GitHub] [spark] AngersZhuuuu commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

2021-02-22 Thread GitBox


AngersZh commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-783854398


   > The file commit is a driver side thing, why do we need to update 
`BasicWriteJobStatsTracker`? I think we can follow `BroadcastExchangeExec` and 
simply call `SQLMetrics.postDriverMetricUpdates`
   
   Since we compute WritingCommand's metrics on the driver side and all of the 
metrics are stored in `BasicWriteJobStatsTracker`, I changed 
`BasicWriteJobStatsTracker`.
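
The design choice described here, keeping the driver-side commit time next to the per-task write statistics in one tracker, can be sketched in plain Python. This is not Spark code: the class `WriteJobStatsTracker`, its methods, and the metric names (in particular `jobCommitTimeMs`) are hypothetical stand-ins chosen for the illustration.

```python
import time


class WriteJobStatsTracker:
    """Hypothetical stand-in for a write-job stats tracker; not Spark's API."""

    def __init__(self):
        # One dictionary holds every metric reported for the write node.
        self.metrics = {"numFiles": 0, "numOutputBytes": 0, "jobCommitTimeMs": 0}

    def process_task_stats(self, task_stats):
        """Fold per-task results into the job-level totals."""
        for stats in task_stats:
            self.metrics["numFiles"] += stats["numFiles"]
            self.metrics["numOutputBytes"] += stats["numOutputBytes"]

    def record_commit(self, commit_fn, clock=time.monotonic):
        """Run the driver-side commit and store how long it took."""
        start = clock()
        try:
            return commit_fn()
        finally:
            # Recorded even if the commit raises, so a failed commit is timed too.
            self.metrics["jobCommitTimeMs"] += int((clock() - start) * 1000)
```

The alternative raised in the review, posting the commit time as a separate driver metric update, would keep the tracker unchanged; the sketch only shows why storing it in the tracker is convenient when all other write metrics already live there.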






[GitHub] [spark] AngersZhuuuu commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

2021-02-19 Thread GitBox


AngersZh commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-782549928


   Gentle ping @HeartSaVioR @dongjoon-hyun @HyukjinKwon @maropu @cloud-fan 
Could you help review this? I think it would really help, since slow `INSERT` 
statements are often caused by the file commit.






[GitHub] [spark] AngersZhuuuu commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

2021-02-08 Thread GitBox


AngersZh commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-775725455


   Gentle ping @dongjoon-hyun @HeartSaVioR @HyukjinKwon 


