[GitHub] spark issue #15052: [SPARK-17500][PySpark]Make DiskBytesSpilled metric in Py...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/15052 OK well if you see evidence later that the disk spilled bytes are unreasonably high, it's worth reinvestigating to see if there's a problem like this. If you aren't seeing bad metrics though, then maybe there's no problem. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15052: [SPARK-17500][PySpark]Make DiskBytesSpilled metric in Py...
Github user djvulee commented on the issue: https://github.com/apache/spark/pull/15052 @srowen Yes, the file seems always empty before write, so the origin way is OK. Sorry for this PR is not thoughtful enough, I just get a mislead by the other method in the shuffle.py, which used the pos before write. I will close this PR. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15052: [SPARK-17500][PySpark]Make DiskBytesSpilled metric in Py...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/15052 I get that, but if it's always true, then there was no problem to begin with. That's what the code seems to think right now. I haven't looked at the code much but that's the question -- are you sure the files are non-empty in some scenario? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15052: [SPARK-17500][PySpark]Make DiskBytesSpilled metric in Py...
Github user djvulee commented on the issue: https://github.com/apache/spark/pull/15052 @srowen No. It does not matter whether the file is empty or not, if the file is empty, the `getsize()` just return 0, and this should be OK. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15052: [SPARK-17500][PySpark]Make DiskBytesSpilled metric in Py...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/15052 Is the idea that the file may be non empty when written ? There is at least one more instance of this call but maybe the file is known to be empty before. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15052: [SPARK-17500][PySpark]Make DiskBytesSpilled metric in Py...
Github user djvulee commented on the issue: https://github.com/apache/spark/pull/15052 @srowen I update PR using an increment way to update the DiskBytesSpilled metrics. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15052: [SPARK-17500][PySpark]Make DiskBytesSpilled metric in Py...
Github user djvulee commented on the issue: https://github.com/apache/spark/pull/15052 @srowen you are right, I will correct it soon. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15052: [SPARK-17500][PySpark]Make DiskBytesSpilled metric in Py...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/15052 Given how DiskBytesSpilled is used, and still used in other parts of the code, this doesn't look correct. It seems to be a global that is always incremented. Here you reset the value in certain cases, effectively. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15052: [SPARK-17500][PySpark]Make DiskBytesSpilled metric in Py...
Github user djvulee commented on the issue: https://github.com/apache/spark/pull/15052 @srowen @davies mind taking a look? This PR is very simple. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15052: [SPARK-17500][PySpark]Make DiskBytesSpilled metric in Py...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15052 Can one of the admins verify this patch? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org