[ https://issues.apache.org/jira/browse/HIVE-17684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16554856#comment-16554856 ]
Sahil Takiar commented on HIVE-17684:
-------------------------------------

I'll look into the serialization issues; it appears to be a classpath issue, and it actually only affects {{TestSparkCliDriver}}. I checked the other test failures and am seeing a different stack trace. It looks like the GC monitor is causing a number of q-tests to fail, perhaps because it is tuned too aggressively for our tests. For example, {{TestCliDriver}} {{union21.q}} is failing due to:

{code:java}
2018-07-23T23:12:36,334 ERROR [f9eb6e6a-a735-48ff-97c9-466febf5387a main] exec.Task: Hive Runtime Error: Map local work exhausted memory
org.apache.hadoop.hive.ql.exec.mapjoin.MapJoinMemoryExhaustionError: GC time percentage = 60, exceeded threshold.
	at org.apache.hadoop.hive.ql.exec.Operator.checkGcOverhead(Operator.java:1654)
	at org.apache.hadoop.hive.ql.exec.HashTableSinkOperator.process(HashTableSinkOperator.java:230)
	at org.apache.hadoop.hive.ql.exec.Operator.baseForward(Operator.java:1021)
	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:967)
	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:954)
	at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
	at org.apache.hadoop.hive.ql.exec.Operator.baseForward(Operator.java:1021)
	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:967)
	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:954)
	at org.apache.hadoop.hive.ql.exec.FilterOperator.process(FilterOperator.java:126)
	at org.apache.hadoop.hive.ql.exec.Operator.baseForward(Operator.java:1021)
	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:967)
	at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:125)
	at org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask.startForward(MapredLocalTask.java:460)
	at org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask.startForward(MapredLocalTask.java:431)
	at org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask.executeInProcess(MapredLocalTask.java:392)
	at org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask.execute(MapredLocalTask.java:162)
{code}

> HoS memory issues with MapJoinMemoryExhaustionHandler
> -----------------------------------------------------
>
>                 Key: HIVE-17684
>                 URL: https://issues.apache.org/jira/browse/HIVE-17684
>             Project: Hive
>          Issue Type: Bug
>          Components: Spark
>            Reporter: Sahil Takiar
>            Assignee: Misha Dmitriev
>            Priority: Major
>         Attachments: HIVE-17684.01.patch, HIVE-17684.02.patch, HIVE-17684.03.patch
>
> We have seen a number of memory issues due to the {{HashTableSinkOperator}}'s use of the {{MapJoinMemoryExhaustionHandler}}. This handler is meant to detect scenarios where the small table is taking up too much space in memory, in which case a {{MapJoinMemoryExhaustionError}} is thrown.
> The configs that control this logic are:
> {{hive.mapjoin.localtask.max.memory.usage}} (default 0.90)
> {{hive.mapjoin.followby.gby.localtask.max.memory.usage}} (default 0.55)
> The handler uses the {{MemoryMXBean}} and the following ratio to estimate how much memory the {{HashMap}} is consuming:
> {{MemoryMXBean#getHeapMemoryUsage().getUsed() / MemoryMXBean#getHeapMemoryUsage().getMax()}}
> The issue is that {{MemoryMXBean#getHeapMemoryUsage().getUsed()}} can be inaccurate: it counts all memory on the heap, both reachable and unreachable, so it may include a lot of garbage data that the JVM simply hasn't taken the time to reclaim yet. This can lead to intermittent failures of the check even though a simple GC would have reclaimed enough space for the process to continue working.
> We should re-think the usage of {{MapJoinMemoryExhaustionHandler}} for HoS. In Hive-on-MR this probably made sense, because every Hive Task ran in a dedicated container, so a Hive Task could assume it created most of the data on the heap.
> However, in Hive-on-Spark there can be multiple Hive Tasks running in a single executor, each doing different things.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
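For reference, the used/max ratio described in the issue, and a GC-time-percentage figure in the spirit of the "GC time percentage = 60" error, can both be reproduced with the standard {{java.lang.management}} API. This is a minimal standalone sketch, not Hive's actual implementation; the class name, threshold constant, and the uptime-based percentage arithmetic are illustrative assumptions:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

public class MemoryCheckSketch {

    // Illustrative threshold mirroring hive.mapjoin.localtask.max.memory.usage
    // (default 0.90); not read from Hive configuration here.
    static final double MAX_MEMORY_USAGE = 0.90;

    // The ratio the handler computes. Note that getUsed() counts
    // unreachable (garbage) objects too, which is why the check can
    // trip even when a GC would have freed plenty of space.
    static double heapRatio(MemoryMXBean bean) {
        MemoryUsage heap = bean.getHeapMemoryUsage();
        long max = heap.getMax(); // -1 if the heap max is undefined
        return max > 0 ? (double) heap.getUsed() / max : 0.0;
    }

    // Cumulative stop-the-world collection time across all collectors,
    // used below to derive a rough GC-time percentage of JVM uptime.
    static long totalGcTimeMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 if unsupported
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        MemoryMXBean bean = ManagementFactory.getMemoryMXBean();
        double ratio = heapRatio(bean);
        System.out.println("heap used/max ratio = " + ratio);
        System.out.println("over threshold: " + (ratio > MAX_MEMORY_USAGE));

        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        long gcPct = uptime > 0 ? 100 * totalGcTimeMillis() / uptime : 0;
        System.out.println("GC time percentage = " + gcPct);
    }
}
```

Running this in a freshly started JVM illustrates the problem: the "used" figure includes any garbage not yet collected, so the ratio can momentarily sit near the threshold even when almost all of that memory is reclaimable.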