[jira] [Commented] (HIVE-19206) Automatic memory management for open streaming writers

Hive QA (JIRA) Thu, 03 May 2018 11:50:26 -0700

    [ 
https://issues.apache.org/jira/browse/HIVE-19206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16462921#comment-16462921
 ]


Hive QA commented on HIVE-19206:
--------------------------------



Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12921497/HIVE-19206.3.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 48 failed/errored test(s), 14316 tests 
executed
*Failed tests:*
{noformat}
TestDbNotificationListener - did not produce a TEST-*.xml file (likely timed 
out) (batchId=247)
TestHCatHiveCompatibility - did not produce a TEST-*.xml file (likely timed 
out) (batchId=247)
TestNonCatCallsWithCatalog - did not produce a TEST-*.xml file (likely timed 
out) (batchId=217)
TestSequenceFileReadWrite - did not produce a TEST-*.xml file (likely timed 
out) (batchId=247)
TestTxnExIm - did not produce a TEST-*.xml file (likely timed out) (batchId=286)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[smb_mapjoin_13] 
(batchId=253)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucket_map_join_1] 
(batchId=68)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucket_map_join_2] 
(batchId=60)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketmapjoin_negative3] 
(batchId=29)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby_sort_1_23] 
(batchId=81)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby_sort_skew_1_23] 
(batchId=9)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[smb_mapjoin_13] 
(batchId=32)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sort_merge_join_desc_7] 
(batchId=27)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[explainuser_4]
 (batchId=164)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[sysdb] 
(batchId=163)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[tez_dynpart_hashjoin_1]
 (batchId=174)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[tez_smb_main]
 (batchId=160)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[union_fast_stats]
 (batchId=167)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[union_stats]
 (batchId=159)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_5] 
(batchId=105)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[insertsel_fail] 
(batchId=95)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[load_data_parquet_empty]
 (batchId=96)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[udf_reflect_neg] 
(batchId=96)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[udf_test_error] 
(batchId=96)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[bucket_map_join_1] 
(batchId=137)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[bucket_map_join_2] 
(batchId=133)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[bucketmapjoin_negative3]
 (batchId=120)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[groupby_sort_1_23] 
(batchId=143)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[groupby_sort_skew_1_23]
 (batchId=111)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[smb_mapjoin_13] 
(batchId=122)
org.apache.hadoop.hive.ql.TestAcidOnTez.testCtasTezUnion (batchId=228)
org.apache.hadoop.hive.ql.TestAcidOnTez.testNonStandardConversion01 
(batchId=228)
org.apache.hadoop.hive.ql.TestMTQueries.testMTQueries1 (batchId=232)
org.apache.hadoop.hive.ql.parse.TestCopyUtils.testPrivilegedDistCpWithSameUserAsCurrentDoesNotTryToImpersonate
 (batchId=231)
org.apache.hadoop.hive.ql.parse.TestReplicationOnHDFSEncryptedZones.targetAndSourceHaveDifferentEncryptionZoneKeys
 (batchId=231)
org.apache.hive.beeline.TestBeeLineWithArgs.testQueryProgress (batchId=235)
org.apache.hive.beeline.TestBeeLineWithArgs.testQueryProgressParallel 
(batchId=235)
org.apache.hive.jdbc.TestSSL.testSSLFetchHttp (batchId=239)
org.apache.hive.jdbc.TestTriggersWorkloadManager.testMultipleTriggers2 
(batchId=241)
org.apache.hive.jdbc.TestTriggersWorkloadManager.testTriggerCustomCreatedFiles 
(batchId=241)
org.apache.hive.jdbc.TestTriggersWorkloadManager.testTriggerCustomNonExistent 
(batchId=241)
org.apache.hive.jdbc.TestTriggersWorkloadManager.testTriggerCustomReadOps 
(batchId=241)
org.apache.hive.jdbc.TestTriggersWorkloadManager.testTriggerHighBytesRead 
(batchId=241)
org.apache.hive.jdbc.TestTriggersWorkloadManager.testTriggerHighBytesWrite 
(batchId=241)
org.apache.hive.jdbc.TestTriggersWorkloadManager.testTriggerHighShuffleBytes 
(batchId=241)
org.apache.hive.jdbc.TestTriggersWorkloadManager.testTriggerSlowQueryElapsedTime
 (batchId=241)
org.apache.hive.jdbc.TestTriggersWorkloadManager.testTriggerSlowQueryExecutionTime
 (batchId=241)
org.apache.hive.jdbc.TestTriggersWorkloadManager.testTriggerVertexRawInputSplitsNoKill
 (batchId=241)
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/10658/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/10658/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-10658/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 48 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12921497 - PreCommit-HIVE-Build

> Automatic memory management for open streaming writers
> ------------------------------------------------------
>
>                 Key: HIVE-19206
>                 URL: https://issues.apache.org/jira/browse/HIVE-19206
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Streaming
>    Affects Versions: 3.0.0, 3.1.0
>            Reporter: Prasanth Jayachandran
>            Assignee: Prasanth Jayachandran
>            Priority: Major
>         Attachments: HIVE-19206.1.patch, HIVE-19206.2.patch, 
> HIVE-19206.3.patch
>
>
> Problem:
>  When there are 100s of record updaters open, the amount of memory required 
> by orc writers keeps growing because of ORC's internal buffers. This can lead 
> to potential high GC or OOM during streaming ingest.
> Solution:
>  The high level idea is for the streaming connection to remember all the open 
> record updaters and flush the record updater periodically (at some interval). 
> Records written to each record updater can be used as a metric to determine 
> the candidate record updaters for flushing. 
>  If stripe size of orc file is 64MB, the default memory management check 
> happens only after every 5000 rows which may which may be too late when there 
> are too many concurrent writers in a process. Example case would be 100 
> writers open and each of them have almost full stripe of 64MB buffered data, 
> this would take 100*64MB ~=6GB of memory. When all of the record writers 
> flush, the memory usage drops down to 100*~2MB which is just ~200MB memory 
> usage.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HIVE-19206) Automatic memory management for open streaming writers

Reply via email to