[ https://issues.apache.org/jira/browse/HIVE-19206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16462921#comment-16462921 ]
Hive QA commented on HIVE-19206: -------------------------------- Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12921497/HIVE-19206.3.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 48 failed/errored test(s), 14316 tests executed *Failed tests:* {noformat} TestDbNotificationListener - did not produce a TEST-*.xml file (likely timed out) (batchId=247) TestHCatHiveCompatibility - did not produce a TEST-*.xml file (likely timed out) (batchId=247) TestNonCatCallsWithCatalog - did not produce a TEST-*.xml file (likely timed out) (batchId=217) TestSequenceFileReadWrite - did not produce a TEST-*.xml file (likely timed out) (batchId=247) TestTxnExIm - did not produce a TEST-*.xml file (likely timed out) (batchId=286) org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[smb_mapjoin_13] (batchId=253) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucket_map_join_1] (batchId=68) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucket_map_join_2] (batchId=60) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketmapjoin_negative3] (batchId=29) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby_sort_1_23] (batchId=81) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby_sort_skew_1_23] (batchId=9) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[smb_mapjoin_13] (batchId=32) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sort_merge_join_desc_7] (batchId=27) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[explainuser_4] (batchId=164) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[sysdb] (batchId=163) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[tez_dynpart_hashjoin_1] (batchId=174) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[tez_smb_main] (batchId=160) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[union_fast_stats] (batchId=167) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[union_stats] (batchId=159) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_5] (batchId=105) org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[insertsel_fail] (batchId=95) org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[load_data_parquet_empty] (batchId=96) org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[udf_reflect_neg] (batchId=96) org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[udf_test_error] (batchId=96) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[bucket_map_join_1] (batchId=137) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[bucket_map_join_2] (batchId=133) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[bucketmapjoin_negative3] (batchId=120) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[groupby_sort_1_23] (batchId=143) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[groupby_sort_skew_1_23] (batchId=111) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[smb_mapjoin_13] (batchId=122) org.apache.hadoop.hive.ql.TestAcidOnTez.testCtasTezUnion (batchId=228) org.apache.hadoop.hive.ql.TestAcidOnTez.testNonStandardConversion01 (batchId=228) org.apache.hadoop.hive.ql.TestMTQueries.testMTQueries1 (batchId=232) org.apache.hadoop.hive.ql.parse.TestCopyUtils.testPrivilegedDistCpWithSameUserAsCurrentDoesNotTryToImpersonate (batchId=231) org.apache.hadoop.hive.ql.parse.TestReplicationOnHDFSEncryptedZones.targetAndSourceHaveDifferentEncryptionZoneKeys (batchId=231) org.apache.hive.beeline.TestBeeLineWithArgs.testQueryProgress (batchId=235) org.apache.hive.beeline.TestBeeLineWithArgs.testQueryProgressParallel (batchId=235) org.apache.hive.jdbc.TestSSL.testSSLFetchHttp (batchId=239) org.apache.hive.jdbc.TestTriggersWorkloadManager.testMultipleTriggers2 (batchId=241) org.apache.hive.jdbc.TestTriggersWorkloadManager.testTriggerCustomCreatedFiles (batchId=241) org.apache.hive.jdbc.TestTriggersWorkloadManager.testTriggerCustomNonExistent (batchId=241) org.apache.hive.jdbc.TestTriggersWorkloadManager.testTriggerCustomReadOps (batchId=241) org.apache.hive.jdbc.TestTriggersWorkloadManager.testTriggerHighBytesRead (batchId=241) org.apache.hive.jdbc.TestTriggersWorkloadManager.testTriggerHighBytesWrite (batchId=241) org.apache.hive.jdbc.TestTriggersWorkloadManager.testTriggerHighShuffleBytes (batchId=241) org.apache.hive.jdbc.TestTriggersWorkloadManager.testTriggerSlowQueryElapsedTime (batchId=241) org.apache.hive.jdbc.TestTriggersWorkloadManager.testTriggerSlowQueryExecutionTime (batchId=241) org.apache.hive.jdbc.TestTriggersWorkloadManager.testTriggerVertexRawInputSplitsNoKill (batchId=241) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/10658/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/10658/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-10658/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.YetusPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 48 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12921497 - PreCommit-HIVE-Build > Automatic memory management for open streaming writers > ------------------------------------------------------ > > Key: HIVE-19206 > URL: https://issues.apache.org/jira/browse/HIVE-19206 > Project: Hive > Issue Type: Sub-task > Components: Streaming > Affects Versions: 3.0.0, 3.1.0 > Reporter: Prasanth Jayachandran > Assignee: Prasanth Jayachandran > Priority: Major > Attachments: HIVE-19206.1.patch, HIVE-19206.2.patch, > HIVE-19206.3.patch > > > Problem: > When there are 100s of record updaters open, the amount of memory required > by orc writers keeps growing because of ORC's internal buffers. This can lead > to potential high GC or OOM during streaming ingest. > Solution: > The high level idea is for the streaming connection to remember all the open > record updaters and flush the record updater periodically (at some interval). > Records written to each record updater can be used as a metric to determine > the candidate record updaters for flushing. > If stripe size of orc file is 64MB, the default memory management check > happens only after every 5000 rows which may which may be too late when there > are too many concurrent writers in a process. Example case would be 100 > writers open and each of them have almost full stripe of 64MB buffered data, > this would take 100*64MB ~=6GB of memory. When all of the record writers > flush, the memory usage drops down to 100*~2MB which is just ~200MB memory > usage. -- This message was sent by Atlassian JIRA (v7.6.3#76005)