[ 
https://issues.apache.org/jira/browse/TEZ-3967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17084977#comment-17084977
 ] 

László Bodor commented on TEZ-3967:
-----------------------------------

[~jeagles]: attached  [^TEZ-3967.02.patch] , which contains the following 
basically:
1. DagStatusHandler: which encapsulates the dag status refresh logic
2. I noticed that getCachedCounters were invalid, [it didn't refresh 
cachedCounters at 
all|https://github.com/apache/tez/blob/214c544472f80997755b62c382f016fbe8373177/tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/DAGImpl.java]...but
 with this patch I started to use cached counters

the basic idea is to refresh different fields with a different frequency 
(please let me know which should be configurable in your opinion)
2a) dag status refresh: 500ms - DAG_STATUS_BUILD_INTERVAL_MS
2b) counters refresh: >1000ms - COUNTER_CACHE_INTERVAL_THRESHOLD_FAST_MS

3. DrainDispatcher minor fix (I've found sometimes that it was hanging in the 
while loop, even when queue was empty), this class is used only in tests

4. TestDAGImpl: added some dag.entityUpdateTracker.stop(), becuase we manually 
set entityUpdateTracker into DAGImpl, but there is another one, which already 
started its thread (I haven't indentified error because of this, so just for 
clarity's sake)

> DAGImpl: dag lock is unfair and can starve the writers
> ------------------------------------------------------
>
>                 Key: TEZ-3967
>                 URL: https://issues.apache.org/jira/browse/TEZ-3967
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Gopal Vijayaraghavan
>            Assignee: László Bodor
>            Priority: Major
>         Attachments: TEZ-3967.01.patch, TEZ-3967.02.patch
>
>
> Found when debugging HIVE-20103, that a reader arriving when another reader 
> is active can postpone a writer from obtaining a write-lock.
> This is fundamentally bad for the DAGImpl as useful progress can only happen 
> when the writeLock is held.
> {code}
>   public void handle(DAGEvent event) {
> ...
>     try {
>       writeLock.lock();
> {code}
> {code}
>    java.lang.Thread.State: WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for  <0x00007efb02246f40> (a 
> java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>         at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>         at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
>         at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
>         at 
> java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
>         at org.apache.tez.dag.app.dag.impl.DAGImpl.handle(DAGImpl.java:1162)
>         at org.apache.tez.dag.app.dag.impl.DAGImpl.handle(DAGImpl.java:149)
>         at 
> org.apache.tez.dag.app.DAGAppMaster$DagEventDispatcher.handle(DAGAppMaster.java:2251)
>         at 
> org.apache.tez.dag.app.DAGAppMaster$DagEventDispatcher.handle(DAGAppMaster.java:2242)
>         at 
> org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:180)
>         at 
> org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:115)
>         at java.lang.Thread.run(Thread.java:745)
> {code}
> while read-lock is passed around between 
> {code}
>        at 
> org.apache.tez.dag.app.dag.impl.DAGImpl.getDAGStatus(DAGImpl.java:901)
>         at 
> org.apache.tez.dag.app.dag.impl.DAGImpl.getDAGStatus(DAGImpl.java:940)
>         at 
> org.apache.tez.dag.api.client.DAGClientHandler.getDAGStatus(DAGClientHandler.java:73)
> {code}
> calls.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to