There is one piece of information that'd be useful to know, which is the
source of the input. Even in the presence of an IOException, the input
metrics still specify that the task is reading from Hadoop.
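
To illustrate what I mean, here's a minimal sketch of that behavior (the
InputMetrics below is a simplified stand-in, not the actual Spark class):

    import scala.util.{Failure, Success, Try}

    // Hypothetical stand-in for Spark's InputMetrics, for illustration only.
    case class InputMetrics(readMethod: String, var bytesRead: Long = 0L)

    def metricsForHadoopSplit(splitLength: => Long): InputMetrics = {
      // The source is known up front, so it is recorded even when sizing fails.
      val metrics = InputMetrics(readMethod = "Hadoop")
      Try(splitLength) match {
        case Success(len) => metrics.bytesRead = len
        case Failure(e: java.io.IOException) =>
          // Same outcome as the quoted snippet: log and keep the metrics object.
          Console.err.println(s"Unable to get input size for task: $e")
        case Failure(e) => throw e
      }
      metrics
    }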

However, I'm slightly confused by this -- I think usually we'd want to
report the number of bytes actually read, rather than the total input
size. For example, if there is a limit (only read the first 5 records),
the actual number of bytes read can be much smaller than the total split
size.
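
As a hypothetical sketch of the distinction (the counting iterator and
the per-record size estimate are mine, not Spark's): if bytes are
accumulated as records are consumed, an early stop such as take(5) is
reflected, whereas split.getLength() reports the full split size
regardless.

    import java.io.{BufferedReader, StringReader}

    // Counts an approximate byte total for the records actually consumed.
    class CountingLineIterator(reader: BufferedReader) extends Iterator[String] {
      var bytesRead = 0L
      private var nextLine: String = reader.readLine()
      def hasNext: Boolean = nextLine != null
      def next(): String = {
        val line = nextLine
        bytesRead += line.length + 1  // crude estimate: line plus newline
        nextLine = reader.readLine()
        line
      }
    }

    val data = (1 to 1000).map(i => s"record-$i").mkString("\n")
    val it = new CountingLineIterator(new BufferedReader(new StringReader(data)))
    it.take(5).foreach(_ => ())  // a "limit": consume only the first 5 records
    // it.bytesRead is now ~45 bytes; data.length (the whole "split") is ~11 KB.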

Kay, am I misinterpreting this?
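
For what it's worth, the distinction Ted asks about below could be made
by leaving the metrics unset when the size lookup fails -- again a
sketch against the simplified stand-in type, not the actual Spark code:

    import scala.util.{Failure, Success, Try}

    case class InputMetrics(readMethod: String, bytesRead: Long)

    def inputMetricsOrNone(splitLength: => Long): Option[InputMetrics] =
      Try(splitLength) match {
        case Success(len) =>
          Some(InputMetrics(readMethod = "Hadoop", bytesRead = len))
        case Failure(e: java.io.IOException) =>
          // The failure is now visible to callers: None, rather than a
          // metrics object whose byte count was never filled in.
          Console.err.println(s"Unable to get input size for task: $e")
          None
        case Failure(e) => throw e
      }

    // e.g. context.taskMetrics.inputMetrics =
    //   inputMetricsOrNone(split.inputSplit.value.getLength())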



On Sat, Jul 26, 2014 at 7:42 AM, Ted Yu <yuzhih...@gmail.com> wrote:

> Hi,
> Starting at line 203:
>       try {
>         /* bytesRead may not exactly equal the bytes read by a task:
>          * split boundaries aren't always at record boundaries, so tasks
>          * may need to read into other splits to complete a record. */
>         inputMetrics.bytesRead = split.inputSplit.value.getLength()
>       } catch {
>         case e: java.io.IOException =>
>           logWarning("Unable to get input size to set InputMetrics for task", e)
>       }
>       context.taskMetrics.inputMetrics = Some(inputMetrics)
>
> If there is an IOException, context.taskMetrics.inputMetrics is still
> set by wrapping inputMetrics - as if there hadn't been any error.
>
> I wonder if the above code should distinguish the error condition.
>
> Cheers
>
