Interesting. For #3:
bq. reading data from,

I guess you meant reading from disk.

On Wed, Apr 20, 2016 at 10:45 AM, atootoonchian <a...@levyx.com> wrote:

> The current Spark logging mechanism can be improved by adding the following
> parameters. They will help in understanding system bottlenecks and provide
> useful guidelines for Spark application developers to design an optimized
> application.
>
> 1. Shuffle Read Local Time: Time for a task to read shuffle data from local
> storage.
> 2. Shuffle Read Remote Time: Time for a task to read shuffle data from a
> remote node.
> 3. Distribution of processing time between computation, I/O, and network: Show
> the distribution of each task's processing time between computation, reading
> data from, and reading data from network.
> 4. Average I/O bandwidth: Average I/O throughput for each task when
> it fetches data from disk.
> 5. Average network bandwidth: Average network throughput for each task when
> it fetches data from remote nodes.
>
> --
> View this message in context:
> http://apache-spark-developers-list.1001551.n3.nabble.com/Improving-system-design-logging-in-spark-tp17291.html
> Sent from the Apache Spark Developers List mailing list archive at
> Nabble.com.
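For metrics 4 and 5, a minimal sketch of how the averages could be derived, assuming per-task byte and time counters are already being recorded (the field names below are made up for illustration; they are not actual Spark TaskMetrics fields):

```python
# Hypothetical per-task measurements: bytes read and seconds spent,
# split into local (disk) and remote (network) shuffle reads.
tasks = [
    {"local_bytes": 64 * 1024**2, "local_read_s": 0.5,
     "remote_bytes": 128 * 1024**2, "remote_read_s": 2.0},
    {"local_bytes": 32 * 1024**2, "local_read_s": 0.25,
     "remote_bytes": 256 * 1024**2, "remote_read_s": 4.0},
]

def avg_bandwidth_mb_s(tasks, bytes_key, time_key):
    """Average per-task throughput in MB/s for the given byte/time fields."""
    rates = [t[bytes_key] / t[time_key] / 1024**2 for t in tasks]
    return sum(rates) / len(rates)

io_bw = avg_bandwidth_mb_s(tasks, "local_bytes", "local_read_s")    # metric 4
net_bw = avg_bandwidth_mb_s(tasks, "remote_bytes", "remote_read_s") # metric 5
print(f"avg I/O bandwidth: {io_bw:.0f} MB/s, "
      f"avg network bandwidth: {net_bw:.0f} MB/s")
# → avg I/O bandwidth: 128 MB/s, avg network bandwidth: 64 MB/s
```

Averaging per-task rates (rather than dividing total bytes by total time) keeps slow stragglers visible in the metric, which seems closer to the per-task framing of the proposal.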