[jira] [Commented] (IMPALA-13075) Setting very high BATCH_SIZE can blow up memory usage of fragments

2024-06-14 Thread ASF subversion and git services (Jira)


[ https://issues.apache.org/jira/browse/IMPALA-13075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17855178#comment-17855178 ]

ASF subversion and git services commented on IMPALA-13075:
--

Commit b1320bd1d646eba3f044ef647b7d4497487d4674 in impala's branch 
refs/heads/master from Riza Suminto
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=b1320bd1d ]

IMPALA-13075: Cap memory usage for ExprValuesCache at 256KB

ExprValuesCache uses BATCH_SIZE as a deciding factor to set its
capacity. It bounds the capacity such that expr_values_array_ memory
usage stays below 256KB. This patch tightens that limit to include all
memory usage from ExprValuesCache::MemUsage() instead of
expr_values_array_ only. Therefore, setting a very high BATCH_SIZE will
not push the total memory usage of ExprValuesCache beyond 256KB.

Simplify table dimension creation methods and fix a few flake8 warnings in
test_dimensions.py.

Testing:
- Add test_join_queries.py::TestExprValueCache.
- Pass core tests.

Change-Id: Iee27cbbe8d3100301d05a6516b62c45975a8d0e0
Reviewed-on: http://gerrit.cloudera.org:8080/21455
Reviewed-by: Riza Suminto 
Tested-by: Impala Public Jenkins 
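
The capping logic the commit describes can be sketched roughly as follows. This is a hedged illustration, not Impala's actual code: the names `ComputeCapacity`, `kMaxCacheBytes`, and `bytes_per_row` are made up for the example; the real implementation lives in HashTableCtx::ExprValuesCache in be/src/exec/hash-table.cc.

```cpp
#include <algorithm>
#include <cstdint>

// Fixed memory budget for the cache, per the commit message.
constexpr int64_t kMaxCacheBytes = 256 * 1024;  // 256KB

// Clamp the row capacity so that capacity * bytes_per_row stays within the
// budget, where bytes_per_row covers ALL per-row arrays (the fix), not just
// the expression values array (the old behavior).
inline int ComputeCapacity(int batch_size, int64_t bytes_per_row) {
  if (bytes_per_row <= 0) return batch_size;
  int64_t max_rows = kMaxCacheBytes / bytes_per_row;
  // Never below 1 row; never above the requested batch size.
  return static_cast<int>(
      std::max<int64_t>(1, std::min<int64_t>(batch_size, max_rows)));
}
```

With BATCH_SIZE=65536 and, say, 16 bytes per cached row, the capacity would be clamped to 16384 rows, keeping the cache at the 256KB budget no matter how large BATCH_SIZE is.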


> Setting very high BATCH_SIZE can blow up memory usage of fragments
> --
>
> Key: IMPALA-13075
> URL: https://issues.apache.org/jira/browse/IMPALA-13075
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 4.0.0
>Reporter: Ezra Zerihun
>Assignee: Riza Suminto
>Priority: Major
> Fix For: Impala 4.5.0
>
>
> In Impala 4.0, setting a very high BATCH_SIZE at or near the maximum of 
> 65536 can cause some fragments' memory usage to spike far past the query's 
> MEM_LIMIT or the pool's Maximum Query Memory Limit (with Clamp enabled). So 
> even though MEM_LIMIT is set to a reasonable value, the query can still fail 
> with an out-of-memory error and a huge amount of memory used on a fragment. 
> Reducing BATCH_SIZE to a reasonable value, or back to the default, allows 
> the query to run without issue and stay within the query's MEM_LIMIT or the 
> pool's Maximum Query Memory Limit.
>  
> 1) set BATCH_SIZE=65536; set MEM_LIMIT=1g;
>  
> {code:java}
>     Query State: EXCEPTION
>     Impala Query State: ERROR
>     Query Status: Memory limit exceeded: Error occurred on backend ...:27000 
> by fragment ... Memory left in process limit: 145.53 GB Memory left in query 
> limit: -6.80 GB Query(...): memory limit exceeded. Limit=1.00 GB 
> Reservation=86.44 MB ReservationLimit=819.20 MB OtherMemory=7.71 GB 
> Total=7.80 GB Peak=7.84 GB   Unclaimed reservations: Reservation=8.50 MB 
> OtherMemory=0 Total=8.50 MB Peak=56.44 MB   Runtime Filter Bank: 
> Reservation=4.00 MB ReservationLimit=4.00 MB OtherMemory=0 Total=4.00 MB 
> Peak=4.00 MB   Fragment ...: Reservation=1.94 MB OtherMemory=7.59 GB 
> Total=7.59 GB Peak=7.63 GB     HASH_JOIN_NODE (id=8): Reservation=1.94 MB 
> OtherMemory=7.57 GB Total=7.57 GB Peak=7.57 GB       Exprs: Total=7.57 GB 
> Peak=7.57 GB       Hash Join Builder (join_node_id=8): Total=0 Peak=1.95 MB
> ...
> Query Options (set by configuration): 
> BATCH_SIZE=65536,MEM_LIMIT=1073741824,CLIENT_IDENTIFIER=Impala Shell 
> v4.0.0.7.2.16.0-287 (5ae3917) built on Mon Jan  9 21:23:59 UTC 
> 2023,DEFAULT_FILE_FORMAT=PARQUET,...
> ...
>    ExecSummary:
> ...
> 09:AGGREGATE                    32     32    0.000ns    0.000ns        0      
>  4.83M   36.31 MB      212.78 MB  STREAMING                                 
> 08:HASH JOIN                    32     32    5s149ms      2m44s        0     
> 194.95M    7.57 GB        1.94 MB  RIGHT OUTER JOIN, PARTITIONED
> |--18:EXCHANGE                  32     32   93.750us    1.000ms   10.46K      
>  1.55K    1.65 MB        2.56 MB  HASH(...
> {code}
>  
>  
> 2) set BATCH_SIZE=0; set MEM_LIMIT=1g;
>  
> {code:java}
> Query State: FINISHED
> Impala Query State: FINISHED
> ...
> Query Options (set by configuration and planner): 
> MEM_LIMIT=1073741824,CLIENT_IDENTIFIER=Impala Shell v4.0.0.7.2.16.0-287 
> (5ae3917) built on Mon Jan  9 21:23:59 UTC 
> 2023,DEFAULT_FILE_FORMAT=PARQUET,...
> ...
>     ExecSummary:
> ...
> 09:AGGREGATE                    32     32  593.748us   18.999ms       45      
>  4.83M    34.06 MB      212.78 MB  STREAMING
> 08:HASH JOIN                    32     32   10s873ms      5m47s   10.47K     
> 194.95M   123.48 MB        1.94 MB  RIGHT OUTER JOIN, PARTITIONED
> |--18:EXCHANGE                  32     32    0.000ns    0.000ns   10.46K      
>  1.55K   344.00 KB        1.69 MB  HASH(...
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org

[jira] [Commented] (IMPALA-13075) Setting very high BATCH_SIZE can blow up memory usage of fragments

2024-05-24 Thread Riza Suminto (Jira)


[ https://issues.apache.org/jira/browse/IMPALA-13075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17849193#comment-17849193 ]

Riza Suminto commented on IMPALA-13075:
---

Filed a patch to tighten the memory usage of ExprValuesCache at:

[https://gerrit.cloudera.org/c/21455/] 




[jira] [Commented] (IMPALA-13075) Setting very high BATCH_SIZE can blow up memory usage of fragments

2024-05-22 Thread Riza Suminto (Jira)


[ https://issues.apache.org/jira/browse/IMPALA-13075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17848720#comment-17848720 ]

Riza Suminto commented on IMPALA-13075:
---

I think I managed to reproduce the issue myself with TPC-DS Q97:

 
{code:java}
set RUNTIME_FILTER_MODE=OFF;
set BATCH_SIZE=65536;
set MEM_LIMIT=149mb;

use tpcds_partitioned_parquet_snap;

with ssci as (
select ss_customer_sk customer_sk
  ,ss_item_sk item_sk
from store_sales,date_dim
where ss_sold_date_sk = d_date_sk
  and d_month_seq between 1199 and 1199 + 11
group by ss_customer_sk
,ss_item_sk),
csci as(
 select cs_bill_customer_sk customer_sk
  ,cs_item_sk item_sk
from catalog_sales,date_dim
where cs_sold_date_sk = d_date_sk
  and d_month_seq between 1199 and 1199 + 11
group by cs_bill_customer_sk
,cs_item_sk)
 select  sum(case when ssci.customer_sk is not null and csci.customer_sk is 
null then 1 else 0 end) store_only
  ,sum(case when ssci.customer_sk is null and csci.customer_sk is not null 
then 1 else 0 end) catalog_only
  ,sum(case when ssci.customer_sk is not null and csci.customer_sk is not 
null then 1 else 0 end) store_and_catalog
from ssci full outer join csci on (ssci.customer_sk=csci.customer_sk
   and ssci.item_sk = csci.item_sk)
limit 100;{code}
The query failed with the following status in the profile:
{code:java}
Query State: EXCEPTION
Impala Query State: ERROR
Query Status: Memory limit exceeded: 
ParquetColumnChunkReader::ReadDataPage() failed to allocate 258 bytes for 
decompressed data.
HDFS_SCAN_NODE (id=4) could not allocate 258.00 B without exceeding limit.
Error occurred on backend rsuminto-22746:27000 by fragment 
084e26373d5447b1:c38102460006
Memory left in process limit: 11.62 GB
Memory left in query limit: -92.75 KB
Query(084e26373d5447b1:c3810246): memory limit exceeded. Limit=149.00 
MB Reservation=116.00 MB ReservationLimit=117.00 MB OtherMemory=33.09 MB 
Total=149.09 MB Peak=149.88 MB
  Unclaimed reservations: Reservation=51.00 MB OtherMemory=0 Total=51.00 MB 
Peak=117.00 MB
  Fragment 084e26373d5447b1:c38102460005: Reservation=0 OtherMemory=0 
Total=0 Peak=3.56 MB
HDFS_SCAN_NODE (id=5): Reservation=0 OtherMemory=0 Total=0 Peak=3.02 MB
KrpcDataStreamSender (dst_id=13): Total=0 Peak=49.12 KB
  RowBatchSerialization: Total=0 Peak=6.46 KB
  CodeGen: Total=1.86 KB Peak=1.86 KB
  Fragment 084e26373d5447b1:c38102460006: Reservation=11.06 MB 
OtherMemory=12.29 MB Total=23.36 MB Peak=28.42 MB
AGGREGATION_NODE (id=7): Reservation=9.00 MB OtherMemory=8.79 MB 
Total=17.79 MB Peak=18.61 MB
  GroupingAggregator 0: Reservation=9.00 MB OtherMemory=452.00 KB 
Total=9.44 MB Peak=9.44 MB
Exprs: Total=452.00 KB Peak=452.00 KB
HASH_JOIN_NODE (id=6): Reservation=1.94 MB OtherMemory=1.64 MB Total=3.58 
MB Peak=4.58 MB
  Exprs: Total=584.00 KB Peak=584.00 KB
  Hash Join Builder (join_node_id=6): Total=584.00 KB Peak=1.07 MB
Hash Join Builder (join_node_id=6) Exprs: Total=584.00 KB Peak=584.00 KB
HDFS_SCAN_NODE (id=4): Reservation=128.00 KB OtherMemory=1.32 MB Total=1.44 
MB Peak=6.20 MB
EXCHANGE_NODE (id=13): Reservation=0 OtherMemory=0 Total=0 Peak=16.00 KB
  KrpcDeferredRpcs: Total=0 Peak=0
KrpcDataStreamSender (dst_id=14): Total=42.66 KB Peak=42.66 KB
  RowBatchSerialization: Total=0 Peak=0
  Fragment 084e26373d5447b1:c38102460001: Reservation=0 OtherMemory=0 
Total=0 Peak=3.37 MB
HDFS_SCAN_NODE (id=1): Reservation=0 OtherMemory=0 Total=0 Peak=2.83 MB
KrpcDataStreamSender (dst_id=10): Total=0 Peak=49.12 KB
  RowBatchSerialization: Total=0 Peak=6.46 KB
  CodeGen: Total=1.86 KB Peak=1.86 KB
  Fragment 084e26373d5447b1:c38102460002: Reservation=19.94 MB 
OtherMemory=16.35 MB Total=36.28 MB Peak=38.69 MB
AGGREGATION_NODE (id=3): Reservation=17.00 MB OtherMemory=9.61 MB 
Total=26.61 MB Peak=26.61 MB
  GroupingAggregator 0: Reservation=17.00 MB OtherMemory=452.00 KB 
Total=17.44 MB Peak=17.44 MB
Exprs: Total=452.00 KB Peak=452.00 KB
HASH_JOIN_NODE (id=2): Reservation=1.94 MB OtherMemory=1.64 MB Total=3.58 
MB Peak=4.58 MB
  Exprs: Total=584.00 KB Peak=584.00 KB
  Hash Join Builder (join_node_id=2): Total=584.00 KB Peak=1.07 MB
Hash Join Builder (join_node_id=2) Exprs: Total=584.00 KB Peak=584.00 KB
HDFS_SCAN_NODE (id=0): Reservation=1.00 MB OtherMemory=4.32 MB Total=5.32 
MB Peak=7.96 MB
  Queued Batches: Total=4.32 MB Peak=5.64 MB
EXCHANGE_NODE (id=10): Reservation=0 OtherMemory=0 Total=0 Peak=16.00 KB
  KrpcDeferredRpcs: Total=0 Peak=0
KrpcDataStreamSender (dst_id=11): Total=254.71 KB Peak=294.71 KB
  RowBatchSerialization: Total=128.05 KB Peak=144.05 KB
  Fragment 084e26373d5447b1:c38102460009: Reservation=34.00 MB 
OtherMemory=3.84 MB Total=37.84 MB Peak=37.84 MB
AGGREGATION_NODE (id=9): Total=4.00 KB Peak=4.00 KB
...{code}

[jira] [Commented] (IMPALA-13075) Setting very high BATCH_SIZE can blow up memory usage of fragments

2024-05-13 Thread Riza Suminto (Jira)


[ https://issues.apache.org/jira/browse/IMPALA-13075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17846065#comment-17846065 ]

Riza Suminto commented on IMPALA-13075:
---

Comparing [^Mem_Limit_1G_Failed.txt] and [^Batch_size_0_Success.txt], I see 
that both have "Per-Host Resource Estimates" beyond the 1GB MEM_LIMIT, and 
estimates that are barely under MEM_LIMIT for some nodes.

That being said, I also see an abnormally high "PeakMemoryUsage" for 
HASH_JOIN_NODE (id=8) at 
cdp-datahub-prod-worker4.cdp-prod.c8g3-pfxs.cloudera.site:27000:

 
{code:java}
Fragment F07
  - InactiveTotalTime: 0ns (0)
  - TotalTime: 0ns (0)
  Instance eb4b5cd4eb0fa41e:80d577e00053 
(host=cdp-datahub-prod-worker4.cdp-prod.c8g3-pfxs.cloudera.site:27000)
...
  HASH_JOIN_NODE (id=8)
ExecOption: Join Build-Side Prepared Asynchronously
- InactiveTotalTime: 0ns (0)
- LocalTime: 2.7m (164763538054)
- PeakMemoryUsage: 7.6 GiB (8132571136)
- ProbeRows: 65,536 (65536)
- ProbeRowsPartitioned: 0 (0)
- ProbeTime: 2.7m (164759538041)
- RowsReturned: 0 (0)
- RowsReturnedRate: 0 per second (0)
- TotalTime: 2.8m (166909545061) {code}
This high memory usage seems linked to IMPALA-3286, where 
HashTableCtx::ExprValuesCache::capacity_ is pushed up by the BATCH_SIZE value.

[https://github.com/apache/impala/blob/0d215da8d4e3f93ad3c1cd72aa801fbcb9464fb0/be/src/exec/hash-table.cc#L369-L373]
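
To illustrate why capping only one array can overshoot, here is a hedged sketch with made-up names (`CacheLayout`, `CapOnExprArrayOnly`), not the actual Impala code: if the row capacity is derived from the expr values array's per-row bytes alone, the other per-row arrays in the cache still scale with that capacity, so the combined footprint can exceed the intended 256KB budget.

```cpp
#include <algorithm>
#include <cstdint>

constexpr int64_t kBudget = 256 * 1024;  // intended 256KB budget

// Per-row footprint of the cache's arrays (illustrative layout).
struct CacheLayout {
  int64_t expr_values_bytes_per_row;  // the only term the old cap considered
  int64_t hash_bytes_per_row;         // e.g. a 4-byte hash per row
  int64_t null_bits_per_row;          // packed null indicators
  int64_t TotalPerRow() const {
    return expr_values_bytes_per_row + hash_bytes_per_row + null_bits_per_row;
  }
};

// Old-style cap: budget divided by the expr values array's bytes only.
inline int CapOnExprArrayOnly(int batch_size, const CacheLayout& l) {
  return static_cast<int>(std::max<int64_t>(
      1, std::min<int64_t>(batch_size,
                           kBudget / l.expr_values_bytes_per_row)));
}
```

For example, at 8 bytes of expr values, 4 bytes of hash, and 1 byte of null bits per row, this cap allows 32768 rows, whose combined footprint (32768 × 13 bytes ≈ 416KB) exceeds the 256KB budget; bounding the total MemUsage() instead closes that gap.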

Note also that the failed query does not have "Probe Side Codegen Enabled", 
unlike the successful query.


[jira] [Commented] (IMPALA-13075) Setting very high BATCH_SIZE can blow up memory usage of fragments

2024-05-13 Thread Ezra Zerihun (Jira)


[ https://issues.apache.org/jira/browse/IMPALA-13075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17846040#comment-17846040 ]

Ezra Zerihun commented on IMPALA-13075:
---

[^Failed (1).txt]

[^Success (1).txt]

[^Failed_Cognos_pool.txt]

[^Mem_Limit_1G_Failed.txt]

[^Success_Tableau_Pool.txt]

[^Batch_size_0_Success.txt]

 

The main comparison, which I made in the description, is:
[^Mem_Limit_1G_Failed.txt] vs [^Batch_size_0_Success.txt]




[jira] [Commented] (IMPALA-13075) Setting very high BATCH_SIZE can blow up memory usage of fragments

2024-05-13 Thread Riza Suminto (Jira)


[ https://issues.apache.org/jira/browse/IMPALA-13075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17846017#comment-17846017 ]

Riza Suminto commented on IMPALA-13075:
---

Yes, BATCH_SIZE is a basic unit of how Impala estimates and allocates memory.
[https://cwiki.apache.org/confluence/display/IMPALA/Impala+Row+Batches] 

Both the Frontend Planner and the Backend Executor respect BATCH_SIZE. If 
MEM_LIMIT is still above the minimum memory resource requirement, I would 
expect that the query can still be admitted and run, even if it is not 
performant (i.e., it needs to spill rows to disk). Each fragment claims its 
minimum memory requirement right after it is instantiated.
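
A back-of-envelope illustration of why memory scales with BATCH_SIZE (the 128-byte row width is an assumed example value, not a measurement): any buffer sized per-row grows linearly with BATCH_SIZE, per operator, per fragment instance.

```cpp
#include <cstdint>

// Assumed average materialized row width for the example.
constexpr int64_t kRowWidthBytes = 128;

// Bytes needed by one row batch of the given size.
inline int64_t BatchBytes(int batch_size) {
  return batch_size * kRowWidthBytes;
}
// BatchBytes(1024)  == 131072   (128 KiB at the default BATCH_SIZE)
// BatchBytes(65536) == 8388608  (8 MiB per such buffer)
```

At BATCH_SIZE=65536, every operator that buffers even a single batch of such rows needs 64× the memory it would at the default of 1024, which compounds across operators and fragment instances.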

Please attach the full query profile of both good and bad run so we can analyze 
it more.




[jira] [Commented] (IMPALA-13075) Setting very high BATCH_SIZE can blow up memory usage of fragments

2024-05-13 Thread Ezra Zerihun (Jira)


[ https://issues.apache.org/jira/browse/IMPALA-13075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17846008#comment-17846008 ]

Ezra Zerihun commented on IMPALA-13075:
---

This seems to be expected behavior, since a high BATCH_SIZE stores more rows 
in memory; even the documentation mentions the higher memory footprint.

But I have query profiles from a customer who observed the behavior above and 
did not realize why queries failed with out-of-memory errors when their pool 
set BATCH_SIZE to the max limit of 65536. So I thought to file this 
improvement Jira in case the memory consumption of a high BATCH_SIZE can be 
improved. If not, feel free to close.
