[ https://issues.apache.org/jira/browse/TEZ-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Bikas Saha updated TEZ-2217:
----------------------------
    Attachment: TEZ-2217.2.patch

New patch. The problem was that the expire time was not updated until the min-held container expire time had actually elapsed. If task requests came in just before that update happened, then in the next allocation cycle the min-held containers would be released, because they had just crossed the expire time boundary. It looks like the timing of the next DAG is currently hitting that race condition and probably was not hitting it earlier. Can you please try this out?

> The min-held-containers constraint is not enforced during query runtime
> ------------------------------------------------------------------------
>
>                 Key: TEZ-2217
>                 URL: https://issues.apache.org/jira/browse/TEZ-2217
>             Project: Apache Tez
>          Issue Type: Bug
>    Affects Versions: 0.6.0, 0.7.0
>            Reporter: Gopal V
>            Assignee: Bikas Saha
>         Attachments: TEZ-2217-debug.txt.bz2, TEZ-2217.1.patch, TEZ-2217.2.patch, TEZ-2217.txt.bz2
>
>
> The min-held containers constraint is respected during query idle times, but
> is not respected when a query is actually in motion.
> The AM releases unused containers during dag execution without checking for
> min-held containers.
> {code}
> 2015-03-20 15:41:53,475 INFO [DelayedContainerManager] rm.YarnTaskSchedulerService: Container's idle timeout expired. Releasing container, containerId=container_1424502260528_1348_01_000013, containerExpiryTime=1426891313264, idleTimeoutMin=5000
> 2015-03-20 15:41:53,475 INFO [DelayedContainerManager] rm.YarnTaskSchedulerService: Releasing unused container: container_1424502260528_1348_01_000013
> {code}
> This is actually useful only after the AM has received a soft pre-emption
> message, doing it on an idle cluster slows down one of the most common query
> patterns in BI systems.
> {code}
> create temporary table smalltable as ...;
> select ... bigtable JOIN smalltable ON ...;
> {code}
> The smaller query in the beginning throws away the pre-warmed capacity.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
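
For anyone trying to picture the race described in the comment, here is a toy model in Python. It is not Tez code and does not claim to show what the patch does. `HeldContainer`, `sweep`, and the `check_min_when_busy` flag are hypothetical names made up for illustration; only the 5000 ms timeout is taken from the quoted log. The sketch assumes one pass over unused held containers, in which the min-held floor is always honoured while idle and is honoured with pending task requests only when the flag is set.

```python
from dataclasses import dataclass

IDLE_TIMEOUT_MS = 5000  # same value as idleTimeoutMin in the quoted log


@dataclass
class HeldContainer:
    name: str
    expire_at_ms: int


def sweep(held, now_ms, min_held, tasks_pending, check_min_when_busy):
    """Return (kept, released) after one pass over unused held containers.

    Toy model only: with no pending tasks the min-held floor is always
    honoured; with pending tasks it is honoured only when
    check_min_when_busy is True.
    """
    enforce_floor = (not tasks_pending) or check_min_when_busy
    kept, released = [], []
    for c in held:
        expired = now_ms >= c.expire_at_ms
        remaining = len(held) - len(released)
        if expired and enforce_floor and remaining <= min_held:
            # Renew the expiry instead of releasing, so the floor is preserved.
            kept.append(HeldContainer(c.name, now_ms + IDLE_TIMEOUT_MS))
        elif expired:
            released.append(c)
        else:
            kept.append(c)
    return kept, released


if __name__ == "__main__":
    held = [HeldContainer(f"c{i}", 5000) for i in range(3)]
    # Task requests arrive just after the expiry boundary was crossed.
    for flag in (False, True):
        kept, released = sweep(held, 5001, min_held=3,
                               tasks_pending=True, check_min_when_busy=flag)
        print(flag, len(kept), len(released))
```

With the flag off, all three containers are released even though `min_held` is 3, which mirrors the symptom in the issue. With the flag on, they are kept and their expiry is pushed out by one timeout.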