[jira] [Commented] (HDFS-10742) Measurement of lock held time in FsDatasetImpl

Chris Douglas (JIRA) Sun, 04 Sep 2016 16:56:44 -0700

    [ 
https://issues.apache.org/jira/browse/HDFS-10742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15463720#comment-15463720
 ]


Chris Douglas commented on HDFS-10742:
--------------------------------------

bq. Removed per-method measurement and aggregate stats to minimize overhead for 
cheap operations

Recording stack traces, etc. may be useful. I raised those overheads for two 
reasons. First, that diagnostic data should only be built when necessary. 
Previous versions of the patch generated diagnostic data that was ultimately 
discarded. Second, minutiae like "GC-friendliness", whether a wrapper class on 
{{Lock}} adds dispatch overhead, class loading overhead, etc. when this JIRA 
logs critical sections longer than *300ms* are irrelevant, literally by orders 
of magnitude.

I'll ask again after the motivation. Is the intent to identify subsystems that 
do too much work (and/or I/O) in a critical section? To measure where the DN is 
spending cycles? In some of these cases, stack traces could be very helpful. As 
long as this telemetry can be tuned by configuration, it shouldn't create new 
problems for operators. To that point, instead of reusing the {{FsDatasetImpl}} 
log, this may want to use its own (so disabling/redirecting it doesn't affect 
other, critical information written to that log).

bq. Was your concern was with exposing AutoCloseableLock in hadoop utils or 
just an overall objection to this idiom even if it's not exposed outside the DN.

I have two objections, one to the idiom and one to the design.

{{AutoCloseableLock}} is a new abstraction that doesn't elide any details of 
the thing it obscures. Every path for {{Lock}} objects is possible, so it 
doesn't make it easier to reason about lock correctness. It creates a special 
type for the _simplest_ possible case. This case is so simple, we have 
automated tooling to detect it. Further, instead of every Java developer 
looking at a {{Lock}} and scanning for well-worn expectations, we introduce an 
abstraction that requires investigation. No advantage beyond developer 
convenience has been raised, and it is a demonstrably _inconvenient_ 
abstraction.

If the interface were limited so it could _only_ be used in the resource idiom 
(as [proposed|https://s.apache.org/Eyaw] earlier), then a developer could know 
that the {{Lock}} acquisition was limited to scope. As semantic sugar it would 
still be pointless, because Java programmers also know how to use 
{{synchronized}}.

This could be redeemed by changing the design. If the {{AutoCloseableLock}} 
constructor accepted a {{Lock}} instance and cut its API to scope-limited 
changes, then telemetry like this could be added as implementations of the 
{{Lock}} type with all the perceived advantages of ACL. Devs can reason about 
the simpler API of ACL (scope-based acquisition), but still monitor all 
acquisitions as required by this JIRA (impractical to implement using 
{{synchronized}}).

I remain skeptical of {{AutoCloseableLock}}, since one could just use the 
{{Lock}} in interface and existing tooling to implement the same. But if its 
virtues are obvious to everyone but me I won't stand in the way of including it.

> Measurement of lock held time in FsDatasetImpl
> ----------------------------------------------
>
>                 Key: HDFS-10742
>                 URL: https://issues.apache.org/jira/browse/HDFS-10742
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode
>    Affects Versions: 3.0.0-alpha2
>            Reporter: Chen Liang
>            Assignee: Chen Liang
>         Attachments: HDFS-10742.001.patch, HDFS-10742.002.patch, 
> HDFS-10742.003.patch, HDFS-10742.004.patch, HDFS-10742.005.patch, 
> HDFS-10742.006.patch, HDFS-10742.007.patch, HDFS-10742.008.patch, 
> HDFS-10742.009.patch, HDFS-10742.010.patch, HDFS-10742.011.patch
>
>
> This JIRA proposes to measure the time the of lock of {{FsDatasetImpl}} is 
> held by a thread. Doing so will allow us to measure lock statistics.
> This can be done by extending the {{AutoCloseableLock}} lock object in 
> {{FsDatasetImpl}}. In the future we can also consider replacing the lock with 
> a read-write lock.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Commented] (HDFS-10742) Measurement of lock held time in FsDatasetImpl

Reply via email to