[
https://issues.apache.org/jira/browse/HADOOP-13028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15280782#comment-15280782
]
Colin Patrick McCabe edited comment on HADOOP-13028 at 5/11/16 8:43 PM:
In the past I've written code for Spark that used reflection to make use of
APIs that may or may not be present in Hadoop. HBase often does this as well,
so that it can use multiple versions of Hadoop. It seems like this wouldn't be
a lot of code. Is that feasible in this case?
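The reflection approach described above can be sketched roughly like this. This is a minimal illustration, not the actual Spark or HBase code; the `getStatistics` method name is a hypothetical placeholder for whatever optional API is being probed:

```java
import java.lang.reflect.Method;

/**
 * Sketch of probing for an optional API at runtime via reflection,
 * so the code still links and runs against Hadoop versions that
 * lack the method. All names here are illustrative placeholders.
 */
public class OptionalApiProbe {

    /**
     * Invoke obj.getStatistics() if such a public zero-argument
     * method exists; return null when the running Hadoop version
     * does not provide it.
     */
    public static Object tryGetStatistics(Object obj) {
        try {
            Method m = obj.getClass().getMethod("getStatistics");
            return m.invoke(obj);
        } catch (NoSuchMethodException e) {
            return null; // API not present in this version
        } catch (ReflectiveOperationException e) {
            return null; // present but not invocable
        }
    }

    /** Stand-in for a stream that does expose the optional API. */
    public static class DemoStream {
        public String getStatistics() {
            return "bytesRead=42";
        }
    }
}
```

The caller degrades gracefully: on an object without the method it simply gets null back instead of a `NoSuchMethodError` at link time.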
I just find the argument that we should overload an existing unrelated API to
output statistics very off-putting. I guess you could argue that the
statistics are part of the stream state, and toString is intended to reflect
stream state. But it will result in very long output from toString, which
probably isn't what most existing callers want. And it's not consistent with
the way any other Hadoop streams work, including the other S3 ones like S3N.
[~andrew.wang], [~cnauroth], [~liuml07], what do you think about this? Is it
acceptable to overload {{toString}} in this way, to output statistics? The
argument seems to be that this is easier than using reflection to get the
actual stream statistics object.
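For context, the pattern being debated looks roughly like the following. This is a hypothetical illustration only; the class and field names are not the actual S3AInputStream code:

```java
/**
 * Hypothetical sketch of a stream whose toString() is overloaded to
 * dump statistics, the pattern under discussion. Names are
 * illustrative, not the real S3A API.
 */
public class CountingStream {
    private long bytesRead;
    private long seekCount;

    public void recordRead(long n) {
        bytesRead += n;
    }

    public void recordSeek() {
        seekCount++;
    }

    @Override
    public String toString() {
        // Callers that log the stream expecting a short identifier
        // now get a multi-field statistics dump instead.
        return "CountingStream{bytesRead=" + bytesRead
                + ", seekCount=" + seekCount + "}";
    }
}
```

Even this two-counter toy produces a noticeably long string; a real stream with dozens of counters would be far longer, which is the concern about existing toString callers.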
> add low level counter metrics for S3A; use in read performance tests
>
>
> Key: HADOOP-13028
> URL: https://issues.apache.org/jira/browse/HADOOP-13028
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3, metrics
>Affects Versions: 2.8.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: HADOOP-13028-001.patch, HADOOP-13028-002.patch,
> HADOOP-13028-004.patch, HADOOP-13028-005.patch, HADOOP-13028-006.patch,
> HADOOP-13028-007.patch, HADOOP-13028-008.patch, HADOOP-13028-009.patch,
> HADOOP-13028-branch-2-008.patch, HADOOP-13028-branch-2-009.patch,
> HADOOP-13028-branch-2-010.patch, HADOOP-13028-branch-2-011.patch,
> org.apache.hadoop.fs.s3a.scale.TestS3AInputStreamPerformance-output.txt,
> org.apache.hadoop.fs.s3a.scale.TestS3AInputStreamPerformance-output.txt
>
>
> Against S3 (and other object stores), opening connections can be expensive,
> and closing connections may be expensive too (a sign of a regression).
> The S3A FS and individual input streams should have counters of the number of
> open/close/failure+reconnect operations, and timers of how long these take.
> This can be used downstream to measure the efficiency of the code (how often
> connections are being made), connection reliability, etc.
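A minimal sketch of what such per-stream counters might look like, assuming thread-safe counters and a timed open operation. The field and method names here are illustrative, not the patch's actual API:

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Sketch of low-level per-stream counters of the kind the issue asks
 * for: counts of open/close/reconnect operations plus time spent
 * opening. Names are illustrative placeholders only.
 */
public class StreamStatistics {
    public final AtomicLong openOperations = new AtomicLong();
    public final AtomicLong closeOperations = new AtomicLong();
    public final AtomicLong reconnectOperations = new AtomicLong();
    public final AtomicLong openDurationNanos = new AtomicLong();

    /** Run an open action, recording both the count and duration. */
    public void recordOpen(Runnable open) {
        long start = System.nanoTime();
        try {
            open.run();
        } finally {
            openOperations.incrementAndGet();
            openDurationNanos.addAndGet(System.nanoTime() - start);
        }
    }
}
```

Downstream code (or a performance test) can read the counters after a workload to see how often connections were made and how long the opens took in aggregate.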
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)