[jira] [Commented] (IMPALA-13075) Setting very high BATCH_SIZE can blow up memory usage of fragments
[ https://issues.apache.org/jira/browse/IMPALA-13075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17855178#comment-17855178 ]

ASF subversion and git services commented on IMPALA-13075:
----------------------------------------------------------

Commit b1320bd1d646eba3f044ef647b7d4497487d4674 in impala's branch refs/heads/master from Riza Suminto
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=b1320bd1d ]

IMPALA-13075: Cap memory usage for ExprValuesCache at 256KB

ExprValuesCache uses BATCH_SIZE as a deciding factor to set its capacity.
It bounds the capacity such that expr_values_array_ memory usage stays
below 256KB. This patch tightens that limit to include all memory usage
from ExprValuesCache::MemUsage() instead of expr_values_array_ only.
Therefore, setting a very high BATCH_SIZE will not push the total memory
usage of ExprValuesCache beyond 256KB.

Simplify table dimension creation methods and fix a few flake8 warnings
in test_dimensions.py.

Testing:
- Add test_join_queries.py::TestExprValueCache.
- Pass core tests.

Change-Id: Iee27cbbe8d3100301d05a6516b62c45975a8d0e0
Reviewed-on: http://gerrit.cloudera.org:8080/21455
Reviewed-by: Riza Suminto
Tested-by: Impala Public Jenkins

> Setting very high BATCH_SIZE can blow up memory usage of fragments
> --
>
>                 Key: IMPALA-13075
>                 URL: https://issues.apache.org/jira/browse/IMPALA-13075
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>    Affects Versions: Impala 4.0.0
>            Reporter: Ezra Zerihun
>            Assignee: Riza Suminto
>            Priority: Major
>             Fix For: Impala 4.5.0
>
>
> In Impala 4.0, setting a very high BATCH_SIZE at or near the max limit of 65536 can
> cause some fragments' memory usage to spike way past the query's defined
> MEM_LIMIT or the pool's Maximum Query Memory Limit with Clamp on. So even though
> MEM_LIMIT is set reasonably, the query can still fail with out of memory and
> a huge amount of memory used on a fragment.
> Reducing BATCH_SIZE to a reasonable
> amount or back to the default will allow the query to run without issue and use a
> reasonable amount of memory within the query's MEM_LIMIT or the pool's Maximum Query
> Memory Limit.
>
> 1) set BATCH_SIZE=65536; set MEM_LIMIT=1g;
>
> {code:java}
> Query State: EXCEPTION
> Impala Query State: ERROR
> Query Status: Memory limit exceeded: Error occurred on backend ...:27000
> by fragment ... Memory left in process limit: 145.53 GB Memory left in query
> limit: -6.80 GB Query(...): memory limit exceeded. Limit=1.00 GB
> Reservation=86.44 MB ReservationLimit=819.20 MB OtherMemory=7.71 GB
> Total=7.80 GB Peak=7.84 GB Unclaimed reservations: Reservation=8.50 MB
> OtherMemory=0 Total=8.50 MB Peak=56.44 MB Runtime Filter Bank:
> Reservation=4.00 MB ReservationLimit=4.00 MB OtherMemory=0 Total=4.00 MB
> Peak=4.00 MB Fragment ...: Reservation=1.94 MB OtherMemory=7.59 GB
> Total=7.59 GB Peak=7.63 GB HASH_JOIN_NODE (id=8): Reservation=1.94 MB
> OtherMemory=7.57 GB Total=7.57 GB Peak=7.57 GB Exprs: Total=7.57 GB
> Peak=7.57 GB Hash Join Builder (join_node_id=8): Total=0 Peak=1.95 MB
> ...
> Query Options (set by configuration):
> BATCH_SIZE=65536,MEM_LIMIT=1073741824,CLIENT_IDENTIFIER=Impala Shell
> v4.0.0.7.2.16.0-287 (5ae3917) built on Mon Jan 9 21:23:59 UTC
> 2023,DEFAULT_FILE_FORMAT=PARQUET,...
> ...
> ExecSummary:
> ...
> 09:AGGREGATE 32 32 0.000ns 0.000ns 0
> 4.83M 36.31 MB 212.78 MB STREAMING
> 08:HASH JOIN 32 32 5s149ms 2m44s 0
> 194.95M 7.57 GB 1.94 MB RIGHT OUTER JOIN, PARTITIONED
> |--18:EXCHANGE 32 32 93.750us 1.000ms 10.46K
> 1.55K 1.65 MB 2.56 MB HASH(...
> {code}
>
>
> 2) set BATCH_SIZE=0; set MEM_LIMIT=1g;
>
> {code:java}
> Query State: FINISHED
> Impala Query State: FINISHED
> ...
> Query Options (set by configuration and planner):
> MEM_LIMIT=1073741824,CLIENT_IDENTIFIER=Impala Shell v4.0.0.7.2.16.0-287
> (5ae3917) built on Mon Jan 9 21:23:59 UTC
> 2023,DEFAULT_FILE_FORMAT=PARQUET,...
> ...
> ExecSummary:
> ...
> 09:AGGREGATE 32 32 593.748us 18.999ms 45
> 4.83M 34.06 MB 212.78 MB STREAMING
> 08:HASH JOIN 32 32 10s873ms 5m47s 10.47K
> 194.95M 123.48 MB 1.94 MB RIGHT OUTER JOIN, PARTITIONED
> |--18:EXCHANGE 32 32 0.000ns 0.000ns 10.46K
> 1.55K 344.00 KB 1.69 MB HASH(...
> {code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org
[ https://issues.apache.org/jira/browse/IMPALA-13075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849193#comment-17849193 ]

Riza Suminto commented on IMPALA-13075:
---

Filed a patch to tighten the memory usage of ExprValuesCache at: [https://gerrit.cloudera.org/c/21455/]
[ https://issues.apache.org/jira/browse/IMPALA-13075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848720#comment-17848720 ]

Riza Suminto commented on IMPALA-13075:
---

I think I managed to reproduce the issue myself with TPC-DS Q97:
{code:java}
set RUNTIME_FILTER_MODE=OFF;
set BATCH_SIZE=65536;
set MEM_LIMIT=149mb;
use tpcds_partitioned_parquet_snap;
with ssci as (
  select ss_customer_sk customer_sk
        ,ss_item_sk item_sk
  from store_sales,date_dim
  where ss_sold_date_sk = d_date_sk
    and d_month_seq between 1199 and 1199 + 11
  group by ss_customer_sk
          ,ss_item_sk),
csci as (
  select cs_bill_customer_sk customer_sk
        ,cs_item_sk item_sk
  from catalog_sales,date_dim
  where cs_sold_date_sk = d_date_sk
    and d_month_seq between 1199 and 1199 + 11
  group by cs_bill_customer_sk
          ,cs_item_sk)
select sum(case when ssci.customer_sk is not null and csci.customer_sk is null then 1 else 0 end) store_only
      ,sum(case when ssci.customer_sk is null and csci.customer_sk is not null then 1 else 0 end) catalog_only
      ,sum(case when ssci.customer_sk is not null and csci.customer_sk is not null then 1 else 0 end) store_and_catalog
from ssci full outer join csci on (ssci.customer_sk=csci.customer_sk and ssci.item_sk = csci.item_sk)
limit 100;{code}
The query failed with the following status in the profile:
{code:java}
Query State: EXCEPTION
Impala Query State: ERROR
Query Status: Memory limit exceeded: ParquetColumnChunkReader::ReadDataPage() failed to allocate 258 bytes for decompressed data. HDFS_SCAN_NODE (id=4) could not allocate 258.00 B without exceeding limit.
Error occurred on backend rsuminto-22746:27000 by fragment 084e26373d5447b1:c38102460006
Memory left in process limit: 11.62 GB
Memory left in query limit: -92.75 KB
Query(084e26373d5447b1:c3810246): memory limit exceeded.
Limit=149.00 MB Reservation=116.00 MB ReservationLimit=117.00 MB OtherMemory=33.09 MB Total=149.09 MB Peak=149.88 MB
  Unclaimed reservations: Reservation=51.00 MB OtherMemory=0 Total=51.00 MB Peak=117.00 MB
  Fragment 084e26373d5447b1:c38102460005: Reservation=0 OtherMemory=0 Total=0 Peak=3.56 MB
    HDFS_SCAN_NODE (id=5): Reservation=0 OtherMemory=0 Total=0 Peak=3.02 MB
    KrpcDataStreamSender (dst_id=13): Total=0 Peak=49.12 KB
      RowBatchSerialization: Total=0 Peak=6.46 KB
    CodeGen: Total=1.86 KB Peak=1.86 KB
  Fragment 084e26373d5447b1:c38102460006: Reservation=11.06 MB OtherMemory=12.29 MB Total=23.36 MB Peak=28.42 MB
    AGGREGATION_NODE (id=7): Reservation=9.00 MB OtherMemory=8.79 MB Total=17.79 MB Peak=18.61 MB
      GroupingAggregator 0: Reservation=9.00 MB OtherMemory=452.00 KB Total=9.44 MB Peak=9.44 MB
        Exprs: Total=452.00 KB Peak=452.00 KB
    HASH_JOIN_NODE (id=6): Reservation=1.94 MB OtherMemory=1.64 MB Total=3.58 MB Peak=4.58 MB
      Exprs: Total=584.00 KB Peak=584.00 KB
      Hash Join Builder (join_node_id=6): Total=584.00 KB Peak=1.07 MB
        Hash Join Builder (join_node_id=6) Exprs: Total=584.00 KB Peak=584.00 KB
    HDFS_SCAN_NODE (id=4): Reservation=128.00 KB OtherMemory=1.32 MB Total=1.44 MB Peak=6.20 MB
    EXCHANGE_NODE (id=13): Reservation=0 OtherMemory=0 Total=0 Peak=16.00 KB
      KrpcDeferredRpcs: Total=0 Peak=0
    KrpcDataStreamSender (dst_id=14): Total=42.66 KB Peak=42.66 KB
      RowBatchSerialization: Total=0 Peak=0
  Fragment 084e26373d5447b1:c38102460001: Reservation=0 OtherMemory=0 Total=0 Peak=3.37 MB
    HDFS_SCAN_NODE (id=1): Reservation=0 OtherMemory=0 Total=0 Peak=2.83 MB
    KrpcDataStreamSender (dst_id=10): Total=0 Peak=49.12 KB
      RowBatchSerialization: Total=0 Peak=6.46 KB
    CodeGen: Total=1.86 KB Peak=1.86 KB
  Fragment 084e26373d5447b1:c38102460002: Reservation=19.94 MB OtherMemory=16.35 MB Total=36.28 MB Peak=38.69 MB
    AGGREGATION_NODE (id=3): Reservation=17.00 MB OtherMemory=9.61 MB Total=26.61 MB Peak=26.61 MB
      GroupingAggregator 0: Reservation=17.00 MB OtherMemory=452.00 KB Total=17.44 MB Peak=17.44 MB
        Exprs: Total=452.00 KB Peak=452.00 KB
    HASH_JOIN_NODE (id=2): Reservation=1.94 MB OtherMemory=1.64 MB Total=3.58 MB Peak=4.58 MB
      Exprs: Total=584.00 KB Peak=584.00 KB
      Hash Join Builder (join_node_id=2): Total=584.00 KB Peak=1.07 MB
        Hash Join Builder (join_node_id=2) Exprs: Total=584.00 KB Peak=584.00 KB
    HDFS_SCAN_NODE (id=0): Reservation=1.00 MB OtherMemory=4.32 MB Total=5.32 MB Peak=7.96 MB
      Queued Batches: Total=4.32 MB Peak=5.64 MB
    EXCHANGE_NODE (id=10): Reservation=0 OtherMemory=0 Total=0 Peak=16.00 KB
      KrpcDeferredRpcs: Total=0 Peak=0
    KrpcDataStreamSender (dst_id=11): Total=254.71 KB Peak=294.71 KB
      RowBatchSerialization: Total=128.05 KB Peak=144.05 KB
  Fragment 084e26373d5447b1:c38102460009: Reservation=34.00 MB OtherMemory=3.84 MB Total=37.84 MB Peak=37.84 MB
    AGGREGATION_NODE (id=9): Total=4.00 KB Peak=4.00 KB
{code}
[ https://issues.apache.org/jira/browse/IMPALA-13075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17846065#comment-17846065 ]

Riza Suminto commented on IMPALA-13075:
---

Comparing [^Mem_Limit_1G_Failed.txt] and [^Batch_size_0_Success.txt], I see that both have "Per-Host Resource Estimates" beyond the 1GB MEM_LIMIT, and "Per-Host Resource Estimates" that are barely under MEM_LIMIT for some nodes. That being said, I also see an abnormally high "PeakMemoryUsage" for HASH_JOIN_NODE (id=8) at cdp-datahub-prod-worker4.cdp-prod.c8g3-pfxs.cloudera.site:27000:
{code:java}
Fragment F07
  - InactiveTotalTime: 0ns (0)
  - TotalTime: 0ns (0)
  Instance eb4b5cd4eb0fa41e:80d577e00053 (host=cdp-datahub-prod-worker4.cdp-prod.c8g3-pfxs.cloudera.site:27000)
  ...
  HASH_JOIN_NODE (id=8)
    ExecOption: Join Build-Side Prepared Asynchronously
    - InactiveTotalTime: 0ns (0)
    - LocalTime: 2.7m (164763538054)
    - PeakMemoryUsage: 7.6 GiB (8132571136)
    - ProbeRows: 65,536 (65536)
    - ProbeRowsPartitioned: 0 (0)
    - ProbeTime: 2.7m (164759538041)
    - RowsReturned: 0 (0)
    - RowsReturnedRate: 0 per second (0)
    - TotalTime: 2.8m (166909545061)
{code}
This high memory usage seems linked to IMPALA-3286, where HashTableCtx::ExprValuesCache::capacity_ is pushed up by the BATCH_SIZE number:
[https://github.com/apache/impala/blob/0d215da8d4e3f93ad3c1cd72aa801fbcb9464fb0/be/src/exec/hash-table.cc#L369-L373]
Note also that the failed query does not have "Probe Side Codegen Enabled" as the successful query does.
[ https://issues.apache.org/jira/browse/IMPALA-13075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17846040#comment-17846040 ]

Ezra Zerihun commented on IMPALA-13075:
---

[^Failed (1).txt] [^Success (1).txt] [^Failed_Cognos_pool.txt] [^Mem_Limit_1G_Failed.txt] [^Success_Tableau_Pool.txt] [^Batch_size_0_Success.txt]

The main comparison, which I made in the description, is [^Mem_Limit_1G_Failed.txt] vs [^Batch_size_0_Success.txt].
[ https://issues.apache.org/jira/browse/IMPALA-13075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17846017#comment-17846017 ]

Riza Suminto commented on IMPALA-13075:
---

Yes, the BATCH_SIZE number is a basic unit of how Impala estimates and allocates memory: [https://cwiki.apache.org/confluence/display/IMPALA/Impala+Row+Batches]

Both the Frontend Planner and the Backend Executor respect this BATCH_SIZE number. If MEM_LIMIT is still above the minimum memory resource requirement, I would expect that the query can still get admitted and run even if it is not performant (i.e., it needs to spill rows to disk). Each fragment claims its minimum memory requirement right after it is instantiated.

Please attach the full query profiles of both the good and the bad run so we can analyze this more.
[ https://issues.apache.org/jira/browse/IMPALA-13075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17846008#comment-17846008 ]

Ezra Zerihun commented on IMPALA-13075:
---

This seems to be expected behavior, since a high BATCH_SIZE will store more rows in memory; even the documentation mentions the higher memory footprint. But I have query profiles from a customer who observed the behavior above and did not realize why queries failed with out of memory when their pool set BATCH_SIZE to the max limit of 65536. So I thought to file this improvement Jira in case the memory consumption of setting a high BATCH_SIZE can be improved. If not, feel free to close.