[GitHub] [spark] AngersZhuuuu commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.
AngersZh commented on pull request #31522: URL: https://github.com/apache/spark/pull/31522#issuecomment-884917062 retest this please -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AngersZhuuuu commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.
AngersZh commented on pull request #31522: URL: https://github.com/apache/spark/pull/31522#issuecomment-884916961
> @AngersZh Thanks. Please update the screenshot in the PR description as well.
Done
[GitHub] [spark] AngersZhuuuu commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.
AngersZh commented on pull request #31522: URL: https://github.com/apache/spark/pull/31522#issuecomment-883463577
> @AngersZh can you update the screenshot in the pr description?
Done
[GitHub] [spark] AngersZhuuuu commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.
AngersZh commented on pull request #31522: URL: https://github.com/apache/spark/pull/31522#issuecomment-883354294 retest this please
[GitHub] [spark] AngersZhuuuu commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.
AngersZh commented on pull request #31522: URL: https://github.com/apache/spark/pull/31522#issuecomment-882546959 retest this please
[GitHub] [spark] AngersZhuuuu commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.
AngersZh commented on pull request #31522: URL: https://github.com/apache/spark/pull/31522#issuecomment-882176352 Any more suggestions?
[GitHub] [spark] AngersZhuuuu commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.
AngersZh commented on pull request #31522: URL: https://github.com/apache/spark/pull/31522#issuecomment-881127826 ping @cloud-fan @gengliangwang @maropu
[GitHub] [spark] AngersZhuuuu commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.
AngersZh commented on pull request #31522: URL: https://github.com/apache/spark/pull/31522#issuecomment-806307672 gentle ping @cloud-fan
[GitHub] [spark] AngersZhuuuu commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.
AngersZh commented on pull request #31522: URL: https://github.com/apache/spark/pull/31522#issuecomment-794964106
> We can periodically log something in the built-in file commit protocol, but there is nothing we can do if people are using a custom file commit protocol.
A new thread that logs this could work, but it looks odd. (We use this approach to show the Thrift server's progress.)
> I checked other plan nodes and found that the file scan node has a "metadata time" metrics. I think it makes sense to have something similar in the write nodes, but we need to think about the naming and what to include (shall we include the hive LOAD TABLE time?).
If possible, I think the more comprehensive the information, the better. As I mentioned in https://github.com/apache/spark/pull/31522#issuecomment-793672126, if the `LOAD TABLE` time is shown in the SQL tab's node, it will be easier for us to explain things to our users/customers.
[GitHub] [spark] AngersZhuuuu commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.
AngersZh commented on pull request #31522: URL: https://github.com/apache/spark/pull/31522#issuecomment-794853587
> And for the extreme case like taking hours on committing, I think more important thing is to log periodically to let end users determine whether the Spark driver is hang or not, without enabling DEBUG log for sure. Maybe off-topic, but if we'd like to have priority on these things, I'd rather say that's more needed.
Yes, I think you have pointed out the most important concern: nothing is logged when the bad case happens. I think this idea is good. WDYT @cloud-fan?
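The periodic-logging idea quoted above can be illustrated with a small, language-agnostic sketch. This is not Spark code and not what the PR implements: the helper name `log_while_running`, its parameters, and the 30-second default interval are all assumptions made for illustration. It wraps a long-running step and emits a progress line at a fixed interval from a background thread, so a reader of the log can tell a slow commit from a hung driver.

```python
import logging
import threading
import time
from contextlib import contextmanager

logger = logging.getLogger("commit_heartbeat")


@contextmanager
def log_while_running(description, interval_seconds=30.0, log=logger.info):
    """Log a progress line every `interval_seconds` while the body runs.

    Hypothetical helper for illustration only; not part of Spark.
    """
    started = time.monotonic()
    finished = threading.Event()

    def heartbeat():
        # Event.wait returns False on timeout and True once the event is set,
        # so the loop ends as soon as the wrapped step completes.
        while not finished.wait(interval_seconds):
            elapsed = time.monotonic() - started
            log("%s still running after %.1f s" % (description, elapsed))

    worker = threading.Thread(target=heartbeat, daemon=True)
    worker.start()
    try:
        yield
    finally:
        # Stop the heartbeat thread before reporting the total duration.
        finished.set()
        worker.join()
        elapsed = time.monotonic() - started
        log("%s finished in %.1f s" % (description, elapsed))
```

A caller would wrap the slow step, for example `with log_while_running("job commit"): commit_job()`, where `commit_job` stands in for whatever the commit protocol does. As the quoted comment notes, this kind of wrapper only helps where the caller controls the commit path; a custom commit protocol would need its own logging.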
[GitHub] [spark] AngersZhuuuu commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.
AngersZh commented on pull request #31522: URL: https://github.com/apache/spark/pull/31522#issuecomment-793672126
> How often is that? We can also improve the log to make it easier to search for a certain job.
Another case: when we run MSCK on a table, the SQL tab shows nothing.
![Screenshot 2021-03-09 6.16.20 PM](https://user-images.githubusercontent.com/46485123/110455575-92a36980-8103-11eb-9a3a-565d3d4f7c15.png)
When it is slow, the stage page only tells us how long it took to collect the path info.
![Screenshot 2021-03-09 6.15.55 PM](https://user-images.githubusercontent.com/46485123/110455524-83bcb700-8103-11eb-82d6-8e36eb962197.png)
But after collecting the path info and partition statistics, the command still needs to interact with Hive, and that is sometimes slow. Users then ask why the job finished in only 2 minutes while the SQL duration is 10 minutes. Duration metrics for such commands are also important so that a Spark admin can quickly find the reason and reply to the user.
[GitHub] [spark] AngersZhuuuu commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.
AngersZh commented on pull request #31522: URL: https://github.com/apache/spark/pull/31522#issuecomment-793484750
In fact, the cluster environments of many companies are not so healthy, and slow nodes often make the commit and the Hive metadata load table/partition steps very slow. We can indeed see this in the log, but for a long-running service, especially the Spark Thrift Server, a lot of SQL runs on it, so we also have to go to the backend log and work out which SQL statement each log entry belongs to. Usually we look at these metrics only when a SQL statement runs for a long time or something goes wrong.
[GitHub] [spark] AngersZhuuuu commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.
AngersZh commented on pull request #31522: URL: https://github.com/apache/spark/pull/31522#issuecomment-785554038
> Normally, committing a job should be fast. I don't think it is a good idea to put this in the SQL graph. For debug purposes, the log message should be enough.
> Besides, the name "duration of committing the job" can be confusing to end-users.
> I have to leave -1 for this one.
All right. For quick debugging, showing all of this information directly may be more helpful for Spark admins.
[GitHub] [spark] AngersZhuuuu commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.
AngersZh commented on pull request #31522: URL: https://github.com/apache/spark/pull/31522#issuecomment-783854398
> The file commit is a driver side thing, why do we need to update `BasicWriteJobStatsTracker`? I think we can follow `BroadcastExchangeExec` and simply call `SQLMetrics.postDriverMetricUpdates`
We compute WritingCommand's metrics on the driver side, and all of those metrics are stored in `BasicWriteJobStatsTracker`, so I changed `BasicWriteJobStatsTracker`.
[GitHub] [spark] AngersZhuuuu commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.
AngersZh commented on pull request #31522: URL: https://github.com/apache/spark/pull/31522#issuecomment-782549928 Gentle ping @HeartSaVioR @dongjoon-hyun @HyukjinKwon @maropu @cloud-fan Could you help review this? I think it is really helpful, since a slow `INSERT` statement is very often caused by the file commit.
[GitHub] [spark] AngersZhuuuu commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.
AngersZh commented on pull request #31522: URL: https://github.com/apache/spark/pull/31522#issuecomment-775725455 Gentle ping @dongjoon-hyun @HeartSaVioR @HyukjinKwon