[ https://issues.apache.org/jira/browse/HIVE-16166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15933583#comment-15933583 ]

Misha Dmitriev commented on HIVE-16166:
---------------------------------------

I ran 'mvn test -Dtest=TestMiniLlapLocalCliDriver -Dqfile=vector_if_expr.q' locally, and it passed. I then checked the hive log at
http://104.198.109.242/logs/PreCommit-HIVE-Build-4250/failed/141-TestMiniLlapLocalCliDriver-skewjoinopt15.q-vector_coalesce.q-orc_ppd_decimal.q-and-27-more/logs/hive.log

It does have a bunch of exception stack traces, but they don't look related to my changes. At least I don't see 'StringInternUtils' (my class, where an NPE or some such is most likely to happen), and the NPEs scattered across this log are all of the same type and show no trace of the code that I've modified.

I can't see where in this log the problematic test (vector_if_expr) starts. Or do all the tests run in parallel?

> HS2 may still waste up to 15% of memory on duplicate strings
> ------------------------------------------------------------
>
>                 Key: HIVE-16166
>                 URL: https://issues.apache.org/jira/browse/HIVE-16166
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Misha Dmitriev
>            Assignee: Misha Dmitriev
>         Attachments: ch_2_excerpt.txt, HIVE-16166.01.patch, HIVE-16166.02.patch
>
>
> A heap dump obtained from one of our users shows that 15% of memory is wasted
> on duplicate strings, despite the recent optimizations that I made. The
> problematic strings just come from different sources this time. See the
> excerpt from the jxray (www.jxray.com) analysis attached.
> Adding String.intern() calls in the appropriate places reduces the overhead
> of duplicate strings with this workload to ~6%. The remaining duplicates come
> mostly from JDK internal and MapReduce data structures, and thus are more
> difficult to fix.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)