[jira] [Commented] (DRILL-8479) mergejoin memory leak on exception

2024-03-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17823852#comment-17823852
 ] 

ASF GitHub Bot commented on DRILL-8479:
---

cgivre merged PR #2878:
URL: https://github.com/apache/drill/pull/2878




> mergejoin memory leak on exception
> -
>
> Key: DRILL-8479
> URL: https://issues.apache.org/jira/browse/DRILL-8479
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.21.1
>Reporter: shihuafeng
>Priority: Critical
> Attachments: 0001-mergejoin-leak.patch
>
>
> *Describe the bug*
> mergejoin leaks memory when RecordIterator fails to allocate memory and throws OutOfMemoryException.
> *Steps to reproduce the behavior*:
>  # prepare data for TPC-H scale factor 1
>  # set direct memory to 5 GB
>  # set planner.enable_hashjoin = false to ensure the merge join operator is used
>  # set drill.memory.debug.allocator = true (to check for memory leaks)
>  # run TPC-H query 8 with 20 concurrent clients (see the sketch after these steps)
>  # when an OutOfMemoryException or a null exception occurs, stop all queries
>  # check for memory leaks
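> A minimal sketch of steps 3 and 5, assuming a JDBC connection to a single local drillbit; the class name and connection URL are illustrative, and drill.memory.debug.allocator is a boot option assumed to be set in drill-override.conf or as a JVM system property:
> {code:java}
> import java.sql.Connection;
> import java.sql.DriverManager;
> import java.sql.Statement;
> 
> public class MergeJoinLeakRepro {
>   public static void main(String[] args) throws Exception {
>     try (Connection conn = DriverManager.getConnection("jdbc:drill:drillbit=localhost");
>          Statement stmt = conn.createStatement()) {
>       // Disable hash join for this session so the planner picks the merge join operator.
>       stmt.execute("ALTER SESSION SET `planner.enable_hashjoin` = false");
>       // Submit TPC-H query 8 (quoted below) from 20 concurrent clients, e.g. one thread per client.
>     }
>   }
> }
> {code}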
> *Expected behavior*
> When all queries have stopped, direct memory should be 0 and no leak log like the following should appear.
> {code:java}
> Allocator(op:2:0:11:MergeJoinPOP) 100/73728/4874240/100 
> (res/actual/peak/limit){code}
> *Error detail, log output or screenshots*
> {code:java}
> Unable to allocate buffer of size XX (rounded from XX) due to memory limit 
> (). Current allocation: xx{code}
> [^0001-mergejoin-leak.patch]
> SQL:
> {code:java}
> select o_year, sum(case when nation = 'CHINA' then volume else 0 end) / 
> sum(volume) as mkt_share from ( select extract(year from o_orderdate) as 
> o_year, l_extendedprice * 1.0 as volume, n2.n_name as nation from 
> hive.tpch1s.part, hive.tpch1s.supplier, hive.tpch1s.lineitem, 
> hive.tpch1s.orders, hive.tpch1s.customer, hive.tpch1s.nation n1, 
> hive.tpch1s.nation n2, hive.tpch1s.region where p_partkey = l_partkey and 
> s_suppkey = l_suppkey and l_orderkey = o_orderkey and o_custkey = c_custkey 
> and c_nationkey = n1.n_nationkey and n1.n_regionkey = r_regionkey and r_name 
> = 'ASIA' and s_nationkey = n2.n_nationkey and o_orderdate between date 
> '1995-01-01' and date '1996-12-31' and p_type = 'LARGE BRUSHED BRASS') as 
> all_nations group by o_year order by o_year
> {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8479) mergejoin memory leak on exception

2024-03-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17823846#comment-17823846
 ] 

ASF GitHub Bot commented on DRILL-8479:
---

shfshihuafeng commented on code in PR #2878:
URL: https://github.com/apache/drill/pull/2878#discussion_r1513773463


##
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/MergeJoinBatch.java:
##
@@ -297,7 +297,14 @@ public void close() {
       batchMemoryManager.getAvgOutputRowWidth(), batchMemoryManager.getTotalOutputRecords());
 
     super.close();
-    leftIterator.close();
+    try {
+      leftIterator.close();
+    } catch (Exception e) {

Review Comment:
   stack 
   
   ```
   Caused by: org.apache.drill.exec.ops.QueryCancelledException: null
     at org.apache.drill.exec.work.fragment.FragmentExecutor$ExecutorStateImpl.checkContinue(FragmentExecutor.java:533)
     at org.apache.drill.exec.record.AbstractRecordBatch.checkContinue(AbstractRecordBatch.java:278)
     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:105)
     at org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:59)
     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:165)
     at org.apache.drill.exec.record.RecordIterator.clearInflightBatches(RecordIterator.java:359)
     at org.apache.drill.exec.record.RecordIterator.close(RecordIterator.java:365)
     at org.apache.drill.exec.physical.impl.join.MergeJoinBatch.close(MergeJoinBatch.java:301)
   ```







[jira] [Commented] (DRILL-8479) mergejoin memory leak on exception

2024-03-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17823843#comment-17823843
 ] 

ASF GitHub Bot commented on DRILL-8479:
---

shfshihuafeng commented on code in PR #2878:
URL: https://github.com/apache/drill/pull/2878#discussion_r1513768876


##
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/MergeJoinBatch.java:
##
@@ -297,7 +297,14 @@ public void close() {
       batchMemoryManager.getAvgOutputRowWidth(), batchMemoryManager.getTotalOutputRecords());
 
     super.close();
-    leftIterator.close();
+    try {
+      leftIterator.close();
+    } catch (Exception e) {
+      rightIterator.close();
+      throw UserException.executionError(e)

Review Comment:
   It throws the exception from the clearInflightBatches() method, but the memory has already been released by clear(), so it does not cause a memory leak; see the following code:
   
   ```
   public void close() {
     clear();                 // releases the buffers held by this iterator
     clearInflightBatches();  // drains incoming; may still throw if the query was cancelled
   }
   ```







[jira] [Commented] (DRILL-8479) mergejoin memory leak on exception

2024-03-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17823842#comment-17823842
 ] 

ASF GitHub Bot commented on DRILL-8479:
---

shfshihuafeng commented on code in PR #2878:
URL: https://github.com/apache/drill/pull/2878#discussion_r1513766703


##
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/MergeJoinBatch.java:
##
@@ -297,7 +297,14 @@ public void close() {
       batchMemoryManager.getAvgOutputRowWidth(), batchMemoryManager.getTotalOutputRecords());
 
     super.close();
-    leftIterator.close();
+    try {
+      leftIterator.close();
+    } catch (Exception e) {

Review Comment:
   Add the exception info?
   ```
   try {
     leftIterator.close();
   } catch (QueryCancelledException qce) {
     throw UserException.executionError(qce)
         .message("Failed when depleting incoming batches, probably because the query was cancelled or the executor hit an error")
         .build(logger);
   } catch (Exception e) {
     throw UserException.internalError(e)
         .message("Failed when depleting incoming batches")
         .build(logger);
   } finally {
     // TODO: catch and report the exception here, or let it propagate by default?
     rightIterator.close();
   }
   ```







[jira] [Commented] (DRILL-8479) mergejoin memory leak on exception

2024-03-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17823643#comment-17823643
 ] 

ASF GitHub Bot commented on DRILL-8479:
---

cgivre commented on code in PR #2878:
URL: https://github.com/apache/drill/pull/2878#discussion_r1512908376


##
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/MergeJoinBatch.java:
##
@@ -297,7 +297,14 @@ public void close() {
       batchMemoryManager.getAvgOutputRowWidth(), batchMemoryManager.getTotalOutputRecords());
 
     super.close();
-    leftIterator.close();
+    try {
+      leftIterator.close();
+    } catch (Exception e) {
+      rightIterator.close();
+      throw UserException.executionError(e)

Review Comment:
   What happens if the right iterator doesn't close properly?
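   One way to ensure the right iterator is always closed, even if closing the left one fails, is to move rightIterator.close() into a finally block. A sketch of that pattern only, assuming the existing MergeJoinBatch fields (leftIterator, rightIterator, logger); not necessarily the exact change merged in PR #2878:
   ```
   @Override
   public void close() {
     super.close();
     try {
       leftIterator.close();
     } catch (Exception e) {
       throw UserException.executionError(e)
           .message("Failed when depleting incoming batches")
           .build(logger);
     } finally {
       // Runs whether or not leftIterator.close() threw, so the right side's buffers
       // are always released; an exception thrown here would propagate in place of
       // any exception from the catch block above.
       rightIterator.close();
     }
   }
   ```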







[jira] [Commented] (DRILL-8479) mergejoin memory leak on exception

2024-03-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17823641#comment-17823641
 ] 

ASF GitHub Bot commented on DRILL-8479:
---

shfshihuafeng commented on code in PR #2878:
URL: https://github.com/apache/drill/pull/2878#discussion_r1512773128


##
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/MergeJoinBatch.java:
##
@@ -297,7 +297,14 @@ public void close() {
       batchMemoryManager.getAvgOutputRowWidth(), batchMemoryManager.getTotalOutputRecords());
 
     super.close();
-    leftIterator.close();
+    try {
+      leftIterator.close();
+    } catch (Exception e) {

Review Comment:
   @cgivre 
   In my test case it throws QueryCancelledException because some minor fragment threw an OutOfMemoryException and reported the failure to the foreman.
   
   The foreman then sends QueryCancel commands to the other minor fragments, so QueryCancelledException is thrown when incoming.next() calls checkContinue().
   
   Although the checkContinue phase always throws the same QueryCancelledException, I am not sure in general what causes it (in my test case the OutOfMemoryException caused it).
   
   ```
   public void clearInflightBatches() {
     while (lastOutcome == IterOutcome.OK || lastOutcome == IterOutcome.OK_NEW_SCHEMA) {
       // Clear all buffers from incoming.
       for (VectorWrapper wrapper : incoming) {
         wrapper.getValueVector().clear();
       }
       lastOutcome = incoming.next();
     }
   }

   public void checkContinue() {
     if (!shouldContinue()) {
       throw new QueryCancelledException();
     }
   }
   ```
   
   **stack**
   
   ```
   Caused by: org.apache.drill.exec.ops.QueryCancelledException: null
     at org.apache.drill.exec.work.fragment.FragmentExecutor$ExecutorStateImpl.checkContinue(FragmentExecutor.java:533)
     at org.apache.drill.exec.record.AbstractRecordBatch.checkContinue(AbstractRecordBatch.java:278)
     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:105)
     at org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:59)
     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:165)
     at org.apache.drill.exec.record.RecordIterator.clearInflightBatches(RecordIterator.java:359)
     at org.apache.drill.exec.record.RecordIterator.close(RecordIterator.java:365)
     at org.apache.drill.exec.physical.impl.join.MergeJoinBatch.close(MergeJoinBatch.java:301)
   ```





