[jira] [Commented] (HIVE-17304) ThreadMXBean based memory allocation monitory for hash table loader
[ https://issues.apache.org/jira/browse/HIVE-17304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16213456#comment-16213456 ]

Prasanth Jayachandran commented on HIVE-17304:

No. This needs further testing. I need to do some in-depth analysis to see which one is more accurate to the actual value based on heapdumps. So it's not ready for commit yet.

> ThreadMXBean based memory allocation monitory for hash table loader
> ---
>
> Key: HIVE-17304
> URL: https://issues.apache.org/jira/browse/HIVE-17304
> Project: Hive
> Issue Type: Bug
> Affects Versions: 3.0.0
> Reporter: Prasanth Jayachandran
> Assignee: Prasanth Jayachandran
> Attachments: HIVE-17304.1.patch, HIVE-17304.2.patch
>
>
> Hash table memory monitoring is based on java data model which can be
> unreliable because of various reasons (wrong object size estimation, adding
> new variables to any class without accounting its size for memory monitoring,
> etc.). We can use allocation size per thread that is provided by ThreadMXBean
> and fallback to DataModel in case if JDK doesn't support thread based
> allocations.

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
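For readers unfamiliar with the JDK facility the issue description refers to, here is a minimal sketch of reading per-thread allocation counters through ThreadMXBean and falling back to a caller-supplied estimate when the counters are unavailable. This is not the code from the attached patches: the class name ThreadAllocationProbe, its method names, and the dataModelEstimate parameter are hypothetical, and the sketch assumes a HotSpot-style JVM that ships com.sun.management.ThreadMXBean.

```java
import java.lang.management.ManagementFactory;

// Hypothetical helper, for illustration only; not part of Hive.
final class ThreadAllocationProbe {

    // Returns the bytes allocated so far by the current thread, or -1 if this
    // JVM does not expose per-thread allocation counters.
    static long currentThreadAllocatedBytes() {
        java.lang.management.ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        if (bean instanceof com.sun.management.ThreadMXBean) {
            com.sun.management.ThreadMXBean sunBean = (com.sun.management.ThreadMXBean) bean;
            if (sunBean.isThreadAllocatedMemorySupported()) {
                if (!sunBean.isThreadAllocatedMemoryEnabled()) {
                    sunBean.setThreadAllocatedMemoryEnabled(true);
                }
                return sunBean.getThreadAllocatedBytes(Thread.currentThread().getId());
            }
        }
        return -1L;
    }

    // Runs the loader on the calling thread and returns the allocation delta.
    // If the counters are unavailable, returns the supplied estimate instead
    // (standing in for the data-model based estimate).
    static long measureOrFallback(Runnable loader, long dataModelEstimate) {
        long before = currentThreadAllocatedBytes();
        loader.run();
        long after = currentThreadAllocatedBytes();
        if (before < 0 || after < 0) {
            return dataModelEstimate;
        }
        return after - before;
    }
}
```

Calling ThreadAllocationProbe.measureOrFallback(loader, estimate) around the work done by a loader thread gives the bytes that thread allocated while loading. Note that getThreadAllocatedBytes reports cumulative allocation, including objects that have since become garbage, so the delta is an upper bound on what the loader retained rather than a live-size measurement.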
[jira] [Commented] (HIVE-17304) ThreadMXBean based memory allocation monitory for hash table loader
[ https://issues.apache.org/jira/browse/HIVE-17304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16213445#comment-16213445 ]

Ashutosh Chauhan commented on HIVE-17304:

[~prasanth_j] Shall we commit this?
[jira] [Commented] (HIVE-17304) ThreadMXBean based memory allocation monitory for hash table loader
[ https://issues.apache.org/jira/browse/HIVE-17304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16146556#comment-16146556 ]

Hive QA commented on HIVE-17304:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12884351/HIVE-17304.2.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 11014 tests executed
*Failed tests:*
{noformat}
TestTxnCommandsBase - did not produce a TEST-*.xml file (likely timed out) (batchId=280)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata] (batchId=61)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[mergejoin] (batchId=158)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[bucketizedhiveinputformat] (batchId=170)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=169)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=234)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6593/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6593/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6593/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12884351 - PreCommit-HIVE-Build
[jira] [Commented] (HIVE-17304) ThreadMXBean based memory allocation monitory for hash table loader
[ https://issues.apache.org/jira/browse/HIVE-17304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16146111#comment-16146111 ]

Sergey Shelukhin commented on HIVE-17304:

+1 with some testing
[jira] [Commented] (HIVE-17304) ThreadMXBean based memory allocation monitory for hash table loader
[ https://issues.apache.org/jira/browse/HIVE-17304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16146107#comment-16146107 ]

Prasanth Jayachandran commented on HIVE-17304:

The config changed because the estimates are very close to the actual values in most cases (vectorized at least). I have seen some heapdumps with 2GB hash tables where the estimates from log lines are also very close to 2GB (within 5%). The initial 2x factor was added earlier primarily for non-vectorized cases + object overhead + key/value size misestimation. Also, the 2x factor is applied after memory oversubscription, which already gives some more room for hash tables. With this patch, even in the non-vectorized case we are pretty close when ThreadMXBean info is used. The idea is to get close to noconditional task size + oversubscribed memory. So relaxed it to 1.5x :)
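The arithmetic behind the relaxed factor can be illustrated with a small sketch. It reads the comment above as: the memory limit for the hash table is the inflation factor multiplied by (noconditional task size + oversubscribed memory). That reading, the class name InflationThreshold, and the example sizes are assumptions for illustration and are not taken from the patch.

```java
// Hypothetical helper, for illustration only; not part of Hive.
final class InflationThreshold {

    // limit = inflation factor * (noconditional task size + oversubscribed memory)
    static long limitBytes(double inflationFactor, long noConditionalTaskSize, long oversubscribedBytes) {
        return (long) (inflationFactor * (noConditionalTaskSize + oversubscribedBytes));
    }

    // True when the monitored hash table size has gone past the limit.
    static boolean exceeds(long usedBytes, long limitBytes) {
        return usedBytes > limitBytes;
    }
}
```

With an assumed noconditional task size of 1 GiB and 256 MiB of oversubscribed memory, a factor of 2.0 gives a limit of 2,684,354,560 bytes (2.5 GiB), while 1.5 gives 2,013,265,920 bytes (1.875 GiB). A tighter factor is only safe if the monitored size tracks the real size closely, which is the point of the comment.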
[jira] [Commented] (HIVE-17304) ThreadMXBean based memory allocation monitory for hash table loader
[ https://issues.apache.org/jira/browse/HIVE-17304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16146089#comment-16146089 ]

Sergey Shelukhin commented on HIVE-17304:

Why did the config change? Otherwise looks good. Might need some realistic testing.
[jira] [Commented] (HIVE-17304) ThreadMXBean based memory allocation monitory for hash table loader
[ https://issues.apache.org/jira/browse/HIVE-17304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16124653#comment-16124653 ]

Hive QA commented on HIVE-17304:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12881589/HIVE-17304.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 12 failed/errored test(s), 11004 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[materialized_view_create_rewrite] (batchId=240)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_move] (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_only] (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_move_only] (batchId=243)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning] (batchId=169)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning_mapjoin_only] (batchId=170)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=169)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=235)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=180)
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testConcurrentStatements (batchId=228)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6368/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6368/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6368/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 12 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12881589 - PreCommit-HIVE-Build
[jira] [Commented] (HIVE-17304) ThreadMXBean based memory allocation monitory for hash table loader
[ https://issues.apache.org/jira/browse/HIVE-17304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16124382#comment-16124382 ]

Prasanth Jayachandran commented on HIVE-17304:

This provides better estimates, and the estimates are pretty close to the actual object size (observed from heapdumps), at least for the vectorized case. As a result, also bringing down the inflation factor from 2.0 to 1.5. Still testing this patch on a larger dataset.