GitHub user kayousterhout commented on the pull request:
https://github.com/apache/spark/pull/62#issuecomment-36483058
Unfortunately this isn't very useful for getting network bandwidth. Consider
a simple case where two shuffle reads (for one task) occur simultaneously
and each takes time t: the metric will report a time of 2t. The metric would
also report 2t if the two shuffle reads happened one after another, so it's
hard to extract raw network bandwidth from the metric. I thought about
adding a metric that records only the time spent on network activity
(without double counting when fetches overlap) to get bandwidth, but
@pwendell said this existed before and was also misleading: if many tasks
are fetching shuffle data at the same time, each will individually show low
bandwidth, even if the aggregate bandwidth is good.
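
To make the double counting concrete, here is a minimal sketch in plain
Python. It is not Spark code: the function names and the (start, end)
interval representation are hypothetical, chosen only to contrast a metric
that sums per-fetch durations with one that counts overlapping time once.

```python
def summed_fetch_time(intervals):
    """Sum each fetch's duration; overlapping fetches are counted twice."""
    return sum(end - start for start, end in intervals)


def wall_clock_fetch_time(intervals):
    """Length of the union of the intervals; overlapping time counts once."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Gap before this fetch: close the previous merged interval.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlaps (or touches) the current merged interval: extend it.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


t = 10  # hypothetical duration of one shuffle read, in arbitrary time units
simultaneous = [(0, t), (0, t)]    # two reads at the same time
sequential = [(0, t), (t, 2 * t)]  # two reads one after another

# The summed metric cannot tell the two cases apart: both give 2t.
print(summed_fetch_time(simultaneous), summed_fetch_time(sequential))
# The non-overlapping metric gives t for the first case and 2t for the second.
print(wall_clock_fetch_time(simultaneous), wall_clock_fetch_time(sequential))
```

Even the non-overlapping version only describes a single task. If several
tasks share a link and fetch concurrently, each task's bytes divided by its
own fetch time will look low while the aggregate is fine, which is the
second problem described above.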
On Sun, Mar 2, 2014 at 9:02 PM, Shivaram Venkataraman <
[email protected]> wrote:
> Hmm -- I have been confused by this before, but if I am reading the
> comment right, this could be useful to get an estimate of the raw
> network bandwidth used for shuffle? If not, could we have an explicit
> metric for that?