[ https://issues.apache.org/jira/browse/IMPALA-13075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17846065#comment-17846065 ]
Riza Suminto commented on IMPALA-13075:
---------------------------------------

Comparing [^Mem_Limit_1G_Failed.txt] and [^Batch_size_0_Success.txt], I see that both have "Per-Host Resource Estimates" beyond the 1GB MEM_LIMIT, and "Per-Host Resource Estimates" that is barely under MEM_LIMIT for some nodes.

That said, I also see abnormally high "PeakMemoryUsage" for HASH_JOIN_NODE (id=8) at cdp-datahub-prod-worker4.cdp-prod.c8g3-pfxs.cloudera.site:27000:
{code:java}
Fragment F07
  - InactiveTotalTime: 0ns (0)
  - TotalTime: 0ns (0)
  Instance eb4b5cd4eb0fa41e:80d577e000000053 (host=cdp-datahub-prod-worker4.cdp-prod.c8g3-pfxs.cloudera.site:27000)
  ...
    HASH_JOIN_NODE (id=8)
      ExecOption: Join Build-Side Prepared Asynchronously
      - InactiveTotalTime: 0ns (0)
      - LocalTime: 2.7m (164763538054)
      - PeakMemoryUsage: 7.6 GiB (8132571136)
      - ProbeRows: 65,536 (65536)
      - ProbeRowsPartitioned: 0 (0)
      - ProbeTime: 2.7m (164759538041)
      - RowsReturned: 0 (0)
      - RowsReturnedRate: 0 per second (0)
      - TotalTime: 2.8m (166909545061)
{code}
This high memory usage seems linked to IMPALA-3286, where HashTableCtx::ExprValuesCache::capacity_ is pushed up by the BATCH_SIZE value.
[https://github.com/apache/impala/blob/0d215da8d4e3f93ad3c1cd72aa801fbcb9464fb0/be/src/exec/hash-table.cc#L369-L373]

Note also that the failed query does not have "Probe Side Codegen Enabled", unlike the successful query.
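The numbers quoted in this thread can be cross-checked with a little arithmetic. The following is a minimal sketch, not part of the original comment: the variable names are made up for illustration, and it assumes the "GB"/"MB" figures in the error message and ExecSummary are binary units (GiB/MiB), which is consistent with 8132571136 bytes being reported as about 7.57 GB.

```python
# Cross-check of figures copied from the profile, error message, and ExecSummary.
# Assumption: "GB"/"MB" in the Impala output are binary units (GiB/MiB).
GIB = 1024 ** 3

def to_gib(num_bytes: int) -> float:
    """Convert a raw byte count to GiB."""
    return num_bytes / GIB

mem_limit_bytes = 1073741824   # MEM_LIMIT from the query options (1g)
join_peak_bytes = 8132571136   # PeakMemoryUsage of HASH_JOIN_NODE (id=8)

# How far a single join node exceeds the whole query's limit.
overshoot_factor = join_peak_bytes / mem_limit_bytes

# "Memory left in query limit" is Limit minus Total from the error message.
limit_gb, total_gb = 1.00, 7.80
memory_left_gb = limit_gb - total_gb

# Peak Mem of 08:HASH JOIN in the two ExecSummary excerpts, in MiB.
peak_batch_65536_mib = 7.57 * 1024   # BATCH_SIZE=65536 run
peak_batch_default_mib = 123.48      # BATCH_SIZE=0 (default) run
peak_ratio = peak_batch_65536_mib / peak_batch_default_mib

print(f"MEM_LIMIT:          {to_gib(mem_limit_bytes):.2f} GiB")
print(f"Join node peak:     {to_gib(join_peak_bytes):.2f} GiB")
print(f"Overshoot factor:   {overshoot_factor:.2f}x")
print(f"Memory left:        {memory_left_gb:.2f} GB")
print(f"Peak ratio of runs: {peak_ratio:.1f}x")
```

Read this way, the join node alone uses several times the 1 GB limit in the failed run, while the same node stays well under the limit with the default BATCH_SIZE.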
> Setting very high BATCH_SIZE can blow up memory usage of fragments
> ------------------------------------------------------------------
>
> Key: IMPALA-13075
> URL: https://issues.apache.org/jira/browse/IMPALA-13075
> Project: IMPALA
> Issue Type: Improvement
> Components: Backend
> Affects Versions: Impala 4.0.0
> Reporter: Ezra Zerihun
> Priority: Major
> Attachments: Batch_size_0_Success.txt, Failed (1).txt, Failed_Cognos_pool.txt, Mem_Limit_1G_Failed.txt, Success (1).txt, Success_Tableau_Pool.txt
>
>
> In Impala 4.0, setting a very high BATCH_SIZE or near max limit of 65536 can
> cause some fragment's memory usage to spike way past the query's defined
> MEM_LIMIT or pool's Maximum Query Memory Limit with Clamp on. So even though
> MEM_LIMIT is set reasonable, the query can still fail with out of memory and
> a huge amount of memory used on fragment. Reducing BATCH_SIZE to a reasonable
> amount or back to default will allow the query to run without issue and use
> reasonable amount of memory within query's MEM_LIMIT or pool's Maximum Query
> Memory Limit.
>
> 1) set BATCH_SIZE=65536; set MEM_LIMIT=1g;
>
> {code:java}
> Query State: EXCEPTION
> Impala Query State: ERROR
> Query Status: Memory limit exceeded: Error occurred on backend ...:27000 by fragment ...
> Memory left in process limit: 145.53 GB
> Memory left in query limit: -6.80 GB
> Query(...): memory limit exceeded. Limit=1.00 GB Reservation=86.44 MB ReservationLimit=819.20 MB OtherMemory=7.71 GB Total=7.80 GB Peak=7.84 GB
>   Unclaimed reservations: Reservation=8.50 MB OtherMemory=0 Total=8.50 MB Peak=56.44 MB
>   Runtime Filter Bank: Reservation=4.00 MB ReservationLimit=4.00 MB OtherMemory=0 Total=4.00 MB Peak=4.00 MB
>   Fragment ...: Reservation=1.94 MB OtherMemory=7.59 GB Total=7.59 GB Peak=7.63 GB
>     HASH_JOIN_NODE (id=8): Reservation=1.94 MB OtherMemory=7.57 GB Total=7.57 GB Peak=7.57 GB
>       Exprs: Total=7.57 GB Peak=7.57 GB
>       Hash Join Builder (join_node_id=8): Total=0 Peak=1.95 MB
> ...
> Query Options (set by configuration): BATCH_SIZE=65536,MEM_LIMIT=1073741824,CLIENT_IDENTIFIER=Impala Shell v4.0.0.7.2.16.0-287 (5ae3917) built on Mon Jan 9 21:23:59 UTC 2023,DEFAULT_FILE_FORMAT=PARQUET,...
> ...
> ExecSummary:
> ...
> 09:AGGREGATE       32  32    0.000ns    0.000ns       0    4.83M   36.31 MB  212.78 MB  STREAMING
> 08:HASH JOIN       32  32    5s149ms      2m44s       0  194.95M    7.57 GB    1.94 MB  RIGHT OUTER JOIN, PARTITIONED
> |--18:EXCHANGE     32  32   93.750us    1.000ms  10.46K    1.55K    1.65 MB    2.56 MB  HASH(...
> {code}
>
>
> 2) set BATCH_SIZE=0; set MEM_LIMIT=1g;
>
> {code:java}
> Query State: FINISHED
> Impala Query State: FINISHED
> ...
> Query Options (set by configuration and planner): MEM_LIMIT=1073741824,CLIENT_IDENTIFIER=Impala Shell v4.0.0.7.2.16.0-287 (5ae3917) built on Mon Jan 9 21:23:59 UTC 2023,DEFAULT_FILE_FORMAT=PARQUET,...
> ...
> ExecSummary:
> ...
> 09:AGGREGATE       32  32  593.748us   18.999ms      45    4.83M   34.06 MB  212.78 MB  STREAMING
> 08:HASH JOIN       32  32   10s873ms      5m47s  10.47K  194.95M  123.48 MB    1.94 MB  RIGHT OUTER JOIN, PARTITIONED
> |--18:EXCHANGE     32  32    0.000ns    0.000ns  10.46K    1.55K  344.00 KB    1.69 MB  HASH(...
> {code}
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org