[jira] [Commented] (DRILL-8501) Json Conversion UDF Not Respecting System JSON Options

2024-07-04 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17863057#comment-17863057
 ] 

ASF GitHub Bot commented on DRILL-8501:
---

cgivre opened a new pull request, #2921:
URL: https://github.com/apache/drill/pull/2921

   # [DRILL-8501](https://issues.apache.org/jira/browse/DRILL-8501): Json 
Conversion UDF Not Respecting System JSON Options
   
   ## Description
   The `convert_fromJSON()` function was ignoring Drill system configuration 
variables for reading JSON.  This PR adds support for `allTextMode` and 
`readNumbersAsDouble` to this function.  Once merged, the `convert_fromJSON()` 
function will follow the system settings.
   
   I also split one of the unit test files because it had all the UDF tests 
mixed with NaN tests. 
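   
   For reference, a short sketch of the behaviour this enables, assuming 
Drill's standard JSON option names (`store.json.all_text_mode` and 
`store.json.read_numbers_as_double`); the sample JSON is illustrative:
   
   ```sql
   -- With all_text_mode enabled, convert_fromJSON() reads numeric fields
   -- back as VARCHAR rather than BIGINT/FLOAT8:
   ALTER SYSTEM SET `store.json.all_text_mode` = true;
   SELECT convert_fromJSON('{"num": 1, "str": "two"}') AS j FROM (VALUES(1));
   ```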
   
   ## Documentation
   No user facing changes.
   
   ## Testing
   Added unit tests.  




> Json Conversion UDF Not Respecting System JSON Options
> --
>
> Key: DRILL-8501
> URL: https://issues.apache.org/jira/browse/DRILL-8501
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - JSON
>Affects Versions: 1.21.2
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Major
> Fix For: 1.22.0
>
>
> The convert_fromJSON() UDF does not respect the system JSON options of 
> allTextMode and readNumbersAsDouble.  
> This PR fixes that.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8498) Sqlline illegal reflective access warning

2024-06-12 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17854278#comment-17854278
 ] 

ASF GitHub Bot commented on DRILL-8498:
---

jnturton merged PR #2915:
URL: https://github.com/apache/drill/pull/2915




> Sqlline illegal reflective access warning
> -
>
> Key: DRILL-8498
> URL: https://issues.apache.org/jira/browse/DRILL-8498
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - CLI
>Affects Versions: 1.21.1
>Reporter: Maksym Rymar
>Assignee: Maksym Rymar
>Priority: Minor
>
> Sqlline shows the following warnings when connecting to Drill:
> {code:java}
> apache drill> !connect jdbc:drill:drillbit=localhost;
> WARNING: An illegal reflective access operation has occurred
> WARNING: Illegal reflective access by javassist.util.proxy.SecurityActions 
> (file:/apache-drill-1.21.2/jars/3rdparty/javassist-3.28.0-GA.jar)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8498) Sqlline illegal reflective access warning

2024-06-12 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17854277#comment-17854277
 ] 

ASF GitHub Bot commented on DRILL-8498:
---

jnturton commented on PR #2915:
URL: https://github.com/apache/drill/pull/2915#issuecomment-2162255485

   Thank you!




> Sqlline illegal reflective access warning
> -
>
> Key: DRILL-8498
> URL: https://issues.apache.org/jira/browse/DRILL-8498
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - CLI
>Affects Versions: 1.21.1
>Reporter: Maksym Rymar
>Assignee: Maksym Rymar
>Priority: Minor
>
> Sqlline shows the following warnings when connecting to Drill:
> {code:java}
> apache drill> !connect jdbc:drill:drillbit=localhost;
> WARNING: An illegal reflective access operation has occurred
> WARNING: Illegal reflective access by javassist.util.proxy.SecurityActions 
> (file:/apache-drill-1.21.2/jars/3rdparty/javassist-3.28.0-GA.jar)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8495) Tried to remove unmanaged buffer

2024-05-17 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17847325#comment-17847325
 ] 

ASF GitHub Bot commented on DRILL-8495:
---

jnturton merged PR #2913:
URL: https://github.com/apache/drill/pull/2913




> Tried to remove unmanaged buffer
> 
>
> Key: DRILL-8495
> URL: https://issues.apache.org/jira/browse/DRILL-8495
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.21.1
>Reporter: Maksym Rymar
>Assignee: Maksym Rymar
>Priority: Major
>
>  
> Drill throws an exception on a Hive table:
> {code:java}
>   (java.lang.IllegalStateException) Tried to remove unmanaged buffer.
>     org.apache.drill.exec.ops.BufferManagerImpl.replace():51
>     io.netty.buffer.DrillBuf.reallocIfNeeded():101
>     
> org.apache.drill.exec.store.hive.writers.primitive.HiveStringWriter.write():38
>     
> org.apache.drill.exec.store.hive.readers.HiveDefaultRecordReader.readHiveRecordAndInsertIntoRecordBatch():416
>     
> org.apache.drill.exec.store.hive.readers.HiveDefaultRecordReader.next():402
>     org.apache.drill.exec.physical.impl.ScanBatch.internalNext():235
>     org.apache.drill.exec.physical.impl.ScanBatch.next():299
>     
> org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next():237
>     org.apache.drill.exec.record.AbstractRecordBatch.next():109
>     org.apache.drill.exec.record.AbstractRecordBatch.next():101
>     org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():59
>     
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():93
>     org.apache.drill.exec.record.AbstractRecordBatch.next():160
>     
> org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next():237
>     org.apache.drill.exec.physical.impl.BaseRootExec.next():103
>     
> org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():81
>     org.apache.drill.exec.physical.impl.BaseRootExec.next():93
>     org.apache.drill.exec.work.fragment.FragmentExecutor.lambda$run$0():321
>     java.security.AccessController.doPrivileged():-2
>     javax.security.auth.Subject.doAs():422
>     org.apache.hadoop.security.UserGroupInformation.doAs():1899
>     org.apache.drill.exec.work.fragment.FragmentExecutor.run():310
>     org.apache.drill.common.SelfCleaningRunnable.run():38
>     java.util.concurrent.ThreadPoolExecutor.runWorker():1149
>     java.util.concurrent.ThreadPoolExecutor$Worker.run():624
>     java.lang.Thread.run():748 {code}
>  
>  
> Reproduce:
>  # Create Hive table:
> {code:java}
> create table if NOT EXISTS students(id int, name string, surname string) 
> stored as parquet;{code}
>  # Insert a new row with 2 string values of size > 256 bytes:
> {code:java}
> insert into students values (1, 'Veeery long name', 'bg surname');{code}
>  # Execute Drill query:
> {code:java}
> select * from hive.`students` {code}
>  
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8492) Allow Parquet TIME_MICROS and TIMESTAMP_MICROS columns to be read as 64-bit integer values

2024-05-17 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17847324#comment-17847324
 ] 

ASF GitHub Bot commented on DRILL-8492:
---

jnturton commented on PR #2907:
URL: https://github.com/apache/drill/pull/2907#issuecomment-2117674516

   It's always bugged me that we don't have a globally accessible way of 
accessing at least one of DrillbitContext, QueryContext, FragmentContext or 
just OptionManager. We hardly want to have to spray these things through APIs 
everywhere in Drill. I'll take a look at whether something can be done...




> Allow Parquet TIME_MICROS and TIMESTAMP_MICROS  columns to be read as 64-bit 
> integer values
> ---
>
> Key: DRILL-8492
> URL: https://issues.apache.org/jira/browse/DRILL-8492
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Parquet
>Affects Versions: 1.21.1
>Reporter: Peter Franzen
>Priority: Major
>
> When reading Parquet columns of type {{time_micros}} and 
> {{timestamp_micros}}, Drill truncates the microsecond values to 
> milliseconds in order to convert them to SQL timestamps.
> It is currently not possible to read the original microsecond values (as 
> 64-bit values, not SQL timestamps) through Drill.
> One solution for allowing the original 64-bit values to be read is to add two 
> options similar to "store.parquet.reader.int96_as_timestamp" to control 
> whether microsecond times and timestamps are truncated to millisecond 
> timestamps or read as non-truncated 64-bit values.
> These options would be added to {{org.apache.drill.exec.ExecConstants}} and 
> {{org.apache.drill.exec.server.options.SystemOptionManager}}.
> They would also be added to "drill-module.conf":
> {{   store.parquet.reader.time_micros_as_int64: false,}}
> {{   store.parquet.reader.timestamp_micros_as_int64: false,}}
> These options would then be used in the same places as 
> {{store.parquet.reader.int96_as_timestamp}}:
>  * org.apache.drill.exec.store.parquet.columnreaders.ColumnReaderFactory
>  * org.apache.drill.exec.store.parquet.columnreaders.ParquetToDrillTypeConverter
>  * org.apache.drill.exec.store.parquet2.DrillParquetGroupConverter
> to create an int64 reader instead of a time/timestamp reader when the 
> corresponding option is set to true.
> In addition to this, 
> {{org.apache.drill.exec.store.parquet.metadata.FileMetadataCollector}} must 
> be altered to _not_ truncate the min and max values for 
> time_micros/timestamp_micros if the corresponding option is true. This class 
> doesn't have a reference to an {{OptionManager}}, so the two new options 
> must be extracted from the {{OptionManager}} when the {{ParquetReaderConfig}} 
> instance is created.
> Filtering on microsecond columns would be done using 64-bit values rather 
> than TIME/TIMESTAMP values when the new options are true, e.g.
> {{SELECT * FROM <table> WHERE <time_column> = 1705914906694751;}}
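
A sketch of what the proposed additions to {{ExecConstants}} might look like 
(the option names come from this issue; the validator style and description 
strings are assumptions modelled on the existing {{int96_as_timestamp}} option):

{code:java}
// in org.apache.drill.exec.ExecConstants:
String PARQUET_READER_TIME_MICROS_AS_INT64 = "store.parquet.reader.time_micros_as_int64";
BooleanValidator PARQUET_READER_TIME_MICROS_AS_INT64_VALIDATOR = new BooleanValidator(
    PARQUET_READER_TIME_MICROS_AS_INT64,
    new OptionDescription("Read Parquet TIME_MICROS columns as 64-bit integers instead of truncating to TIME."));

String PARQUET_READER_TIMESTAMP_MICROS_AS_INT64 = "store.parquet.reader.timestamp_micros_as_int64";
BooleanValidator PARQUET_READER_TIMESTAMP_MICROS_AS_INT64_VALIDATOR = new BooleanValidator(
    PARQUET_READER_TIMESTAMP_MICROS_AS_INT64,
    new OptionDescription("Read Parquet TIMESTAMP_MICROS columns as 64-bit integers instead of truncating to TIMESTAMP."));
{code}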



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8495) Tried to remove unmanaged buffer

2024-05-17 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17847241#comment-17847241
 ] 

ASF GitHub Bot commented on DRILL-8495:
---

rymarm commented on PR #2913:
URL: https://github.com/apache/drill/pull/2913#issuecomment-2117347650

   @jnturton I addressed the checkstyle issues and the failing Java tests. Should 
be fine now)




> Tried to remove unmanaged buffer
> 
>
> Key: DRILL-8495
> URL: https://issues.apache.org/jira/browse/DRILL-8495
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.21.1
>Reporter: Maksym Rymar
>Assignee: Maksym Rymar
>Priority: Major
>
>  
> Drill throws an exception on a Hive table:
> {code:java}
>   (java.lang.IllegalStateException) Tried to remove unmanaged buffer.
>     org.apache.drill.exec.ops.BufferManagerImpl.replace():51
>     io.netty.buffer.DrillBuf.reallocIfNeeded():101
>     
> org.apache.drill.exec.store.hive.writers.primitive.HiveStringWriter.write():38
>     
> org.apache.drill.exec.store.hive.readers.HiveDefaultRecordReader.readHiveRecordAndInsertIntoRecordBatch():416
>     
> org.apache.drill.exec.store.hive.readers.HiveDefaultRecordReader.next():402
>     org.apache.drill.exec.physical.impl.ScanBatch.internalNext():235
>     org.apache.drill.exec.physical.impl.ScanBatch.next():299
>     
> org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next():237
>     org.apache.drill.exec.record.AbstractRecordBatch.next():109
>     org.apache.drill.exec.record.AbstractRecordBatch.next():101
>     org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():59
>     
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():93
>     org.apache.drill.exec.record.AbstractRecordBatch.next():160
>     
> org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next():237
>     org.apache.drill.exec.physical.impl.BaseRootExec.next():103
>     
> org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():81
>     org.apache.drill.exec.physical.impl.BaseRootExec.next():93
>     org.apache.drill.exec.work.fragment.FragmentExecutor.lambda$run$0():321
>     java.security.AccessController.doPrivileged():-2
>     javax.security.auth.Subject.doAs():422
>     org.apache.hadoop.security.UserGroupInformation.doAs():1899
>     org.apache.drill.exec.work.fragment.FragmentExecutor.run():310
>     org.apache.drill.common.SelfCleaningRunnable.run():38
>     java.util.concurrent.ThreadPoolExecutor.runWorker():1149
>     java.util.concurrent.ThreadPoolExecutor$Worker.run():624
>     java.lang.Thread.run():748 {code}
>  
>  
> Reproduce:
>  # Create Hive table:
> {code:java}
> create table if NOT EXISTS students(id int, name string, surname string) 
> stored as parquet;{code}
>  # Insert a new row with 2 string values of size > 256 bytes:
> {code:java}
> insert into students values (1, 'Veeery long name', 'bg surname');{code}
>  # Execute Drill query:
> {code:java}
> select * from hive.`students` {code}
>  
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8495) Tried to remove unmanaged buffer

2024-05-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17846934#comment-17846934
 ] 

ASF GitHub Bot commented on DRILL-8495:
---

jnturton commented on PR #2913:
URL: https://github.com/apache/drill/pull/2913#issuecomment-2115109847

   P.S. I see that checkstyle is still upset.




> Tried to remove unmanaged buffer
> 
>
> Key: DRILL-8495
> URL: https://issues.apache.org/jira/browse/DRILL-8495
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.21.1
>Reporter: Maksym Rymar
>Assignee: Maksym Rymar
>Priority: Major
>
>  
> Drill throws an exception on a Hive table:
> {code:java}
>   (java.lang.IllegalStateException) Tried to remove unmanaged buffer.
>     org.apache.drill.exec.ops.BufferManagerImpl.replace():51
>     io.netty.buffer.DrillBuf.reallocIfNeeded():101
>     
> org.apache.drill.exec.store.hive.writers.primitive.HiveStringWriter.write():38
>     
> org.apache.drill.exec.store.hive.readers.HiveDefaultRecordReader.readHiveRecordAndInsertIntoRecordBatch():416
>     
> org.apache.drill.exec.store.hive.readers.HiveDefaultRecordReader.next():402
>     org.apache.drill.exec.physical.impl.ScanBatch.internalNext():235
>     org.apache.drill.exec.physical.impl.ScanBatch.next():299
>     
> org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next():237
>     org.apache.drill.exec.record.AbstractRecordBatch.next():109
>     org.apache.drill.exec.record.AbstractRecordBatch.next():101
>     org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():59
>     
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():93
>     org.apache.drill.exec.record.AbstractRecordBatch.next():160
>     
> org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next():237
>     org.apache.drill.exec.physical.impl.BaseRootExec.next():103
>     
> org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():81
>     org.apache.drill.exec.physical.impl.BaseRootExec.next():93
>     org.apache.drill.exec.work.fragment.FragmentExecutor.lambda$run$0():321
>     java.security.AccessController.doPrivileged():-2
>     javax.security.auth.Subject.doAs():422
>     org.apache.hadoop.security.UserGroupInformation.doAs():1899
>     org.apache.drill.exec.work.fragment.FragmentExecutor.run():310
>     org.apache.drill.common.SelfCleaningRunnable.run():38
>     java.util.concurrent.ThreadPoolExecutor.runWorker():1149
>     java.util.concurrent.ThreadPoolExecutor$Worker.run():624
>     java.lang.Thread.run():748 {code}
>  
>  
> Reproduce:
>  # Create Hive table:
> {code:java}
> create table if NOT EXISTS students(id int, name string, surname string) 
> stored as parquet;{code}
>  # Insert a new row with 2 string values of size > 256 bytes:
> {code:java}
> insert into students values (1, 'Veeery long name', 'bg surname');{code}
>  # Execute Drill query:
> {code:java}
> select * from hive.`students` {code}
>  
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8495) Tried to remove unmanaged buffer

2024-05-15 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17846594#comment-17846594
 ] 

ASF GitHub Bot commented on DRILL-8495:
---

rymarm opened a new pull request, #2913:
URL: https://github.com/apache/drill/pull/2913

   # [DRILL-8495](https://issues.apache.org/jira/browse/DRILL-8495): Tried to 
remove unmanaged buffer
   
   The root cause of the issue is that multiple HiveWriters share the same 
`DrillBuf`, and during execution any of them may reallocate the buffer when it 
is too small for a value (256+ bytes). Since `drillBuf.reallocIfNeeded(int 
size)` returns a new instance of `DrillBuf`, all the other writers still hold 
a reference to the old buffer, which is unmanaged after the 
`drillBuf.reallocIfNeeded(int size)` call.
   
   ## Description
   
   `HiveValueWriterFactory` now creates a unique `DrillBuf` for each writer. 
   
   HiveWriters are actually used one at a time, so we could in principle use a 
single buffer for all the writers. To do this, I could create a holder class 
for the `DrillBuf`, so that each writer references the same holder, which 
would store the new buffer from every `drillBuf.reallocIfNeeded(int size)` 
call. But such logic looked slightly confusing, so I decided to simply let 
each HiveWriter use its own buffer.
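   
   A minimal sketch of the failure mode (the variable names are illustrative; 
`getManagedBuffer()` and `reallocIfNeeded()` are the actual Drill calls from 
the stack trace):
   
   ```java
   // Two writers start out sharing one managed buffer.
   DrillBuf shared = bufferManager.getManagedBuffer(); // 256 bytes by default
   DrillBuf heldByWriterA = shared;
   DrillBuf heldByWriterB = shared;
   
   // Writer A hits a value larger than the buffer. reallocIfNeeded() returns
   // a NEW DrillBuf, and the BufferManager replaces the old one, which
   // becomes unmanaged.
   heldByWriterA = heldByWriterA.reallocIfNeeded(512);
   
   // Writer B still holds the stale reference; its next reallocation asks the
   // BufferManager to replace a buffer it no longer manages:
   heldByWriterB = heldByWriterB.reallocIfNeeded(512);
   // -> java.lang.IllegalStateException: Tried to remove unmanaged buffer.
   ```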
   
   ## Documentation
   \-
   
   ## Testing
   Added a new unit test that queries a Hive table with variable-length values 
of Binary, VarChar, Char, and String types.
   




> Tried to remove unmanaged buffer
> 
>
> Key: DRILL-8495
> URL: https://issues.apache.org/jira/browse/DRILL-8495
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.21.1
>Reporter: Maksym Rymar
>Assignee: Maksym Rymar
>Priority: Major
>
>  
> Drill throws an exception on a Hive table:
> {code:java}
>   (java.lang.IllegalStateException) Tried to remove unmanaged buffer.
>     org.apache.drill.exec.ops.BufferManagerImpl.replace():51
>     io.netty.buffer.DrillBuf.reallocIfNeeded():101
>     
> org.apache.drill.exec.store.hive.writers.primitive.HiveStringWriter.write():38
>     
> org.apache.drill.exec.store.hive.readers.HiveDefaultRecordReader.readHiveRecordAndInsertIntoRecordBatch():416
>     
> org.apache.drill.exec.store.hive.readers.HiveDefaultRecordReader.next():402
>     org.apache.drill.exec.physical.impl.ScanBatch.internalNext():235
>     org.apache.drill.exec.physical.impl.ScanBatch.next():299
>     
> org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next():237
>     org.apache.drill.exec.record.AbstractRecordBatch.next():109
>     org.apache.drill.exec.record.AbstractRecordBatch.next():101
>     org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():59
>     
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():93
>     org.apache.drill.exec.record.AbstractRecordBatch.next():160
>     
> org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next():237
>     org.apache.drill.exec.physical.impl.BaseRootExec.next():103
>     
> org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():81
>     org.apache.drill.exec.physical.impl.BaseRootExec.next():93
>     org.apache.drill.exec.work.fragment.FragmentExecutor.lambda$run$0():321
>     java.security.AccessController.doPrivileged():-2
>     javax.security.auth.Subject.doAs():422
>     org.apache.hadoop.security.UserGroupInformation.doAs():1899
>     org.apache.drill.exec.work.fragment.FragmentExecutor.run():310
>     org.apache.drill.common.SelfCleaningRunnable.run():38
>     java.util.concurrent.ThreadPoolExecutor.runWorker():1149
>     java.util.concurrent.ThreadPoolExecutor$Worker.run():624
>     java.lang.Thread.run():748 {code}
>  
>  
> Reproduce:
>  # Create Hive table:
> {code:java}
> create table if NOT EXISTS students(id int, name string, surname string) 
> stored as parquet;{code}
>  # Insert a new row with 2 string values of size > 256 bytes:
> {code:java}
> insert into students values (1, 'Veeery long name', 

[jira] [Commented] (DRILL-8492) Allow Parquet TIME_MICROS and TIMESTAMP_MICROS columns to be read as 64-bit integer values

2024-05-13 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17845996#comment-17845996
 ] 

ASF GitHub Bot commented on DRILL-8492:
---

handmadecode commented on PR #2907:
URL: https://github.com/apache/drill/pull/2907#issuecomment-2108103905

   > Follow up: I see use of the FragmentContext class for accessing config 
options in the old Parquet reader, perhaps it's a good vehicle...
   
   `FragmentContext` is used to access the new config options where an instance 
already was available, i.e. in `ColumnReaderFactory`, 
`ParquetToDrillTypeConverter`, and `DrillParquetGroupConverter`.
   
   `FileMetadataCollector` doesn't have access to a `FragmentContext`, only to 
a `ParquetReaderConfig`.
   I can only find a FragmentContext in one of the two call paths to 
`FileMetadataCollector::addColumnMetadata`, so trying to inject a 
`FragmentContext` into `FileMetadataCollector` will probably have an impact on 
quite a few other classes.
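   
   For illustration, where a `FragmentContext` is available the new options 
can be read roughly like this (a sketch, assuming the option names proposed in 
the issue):
   
   ```java
   // context is an org.apache.drill.exec.ops.FragmentContext
   boolean timeMicrosAsInt64 =
       context.getOptions().getBoolean("store.parquet.reader.time_micros_as_int64");
   boolean timestampMicrosAsInt64 =
       context.getOptions().getBoolean("store.parquet.reader.timestamp_micros_as_int64");
   ```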




> Allow Parquet TIME_MICROS and TIMESTAMP_MICROS  columns to be read as 64-bit 
> integer values
> ---
>
> Key: DRILL-8492
> URL: https://issues.apache.org/jira/browse/DRILL-8492
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Parquet
>Affects Versions: 1.21.1
>Reporter: Peter Franzen
>Priority: Major
>
> When reading Parquet columns of type {{time_micros}} and 
> {{timestamp_micros}}, Drill truncates the microsecond values to 
> milliseconds in order to convert them to SQL timestamps.
> It is currently not possible to read the original microsecond values (as 
> 64-bit values, not SQL timestamps) through Drill.
> One solution for allowing the original 64-bit values to be read is to add two 
> options similar to "store.parquet.reader.int96_as_timestamp" to control 
> whether microsecond times and timestamps are truncated to millisecond 
> timestamps or read as non-truncated 64-bit values.
> These options would be added to {{org.apache.drill.exec.ExecConstants}} and 
> {{org.apache.drill.exec.server.options.SystemOptionManager}}.
> They would also be added to "drill-module.conf":
> {{   store.parquet.reader.time_micros_as_int64: false,}}
> {{   store.parquet.reader.timestamp_micros_as_int64: false,}}
> These options would then be used in the same places as 
> {{store.parquet.reader.int96_as_timestamp}}:
>  * org.apache.drill.exec.store.parquet.columnreaders.ColumnReaderFactory
>  * org.apache.drill.exec.store.parquet.columnreaders.ParquetToDrillTypeConverter
>  * org.apache.drill.exec.store.parquet2.DrillParquetGroupConverter
> to create an int64 reader instead of a time/timestamp reader when the 
> corresponding option is set to true.
> In addition to this, 
> {{org.apache.drill.exec.store.parquet.metadata.FileMetadataCollector}} must 
> be altered to _not_ truncate the min and max values for 
> time_micros/timestamp_micros if the corresponding option is true. This class 
> doesn't have a reference to an {{OptionManager}}, so the two new options 
> must be extracted from the {{OptionManager}} when the {{ParquetReaderConfig}} 
> instance is created.
> Filtering on microsecond columns would be done using 64-bit values rather 
> than TIME/TIMESTAMP values when the new options are true, e.g.
> {{SELECT * FROM <table> WHERE <time_column> = 1705914906694751;}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8492) Allow Parquet TIME_MICROS and TIMESTAMP_MICROS columns to be read as 64-bit integer values

2024-05-13 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17845816#comment-17845816
 ] 

ASF GitHub Bot commented on DRILL-8492:
---

jnturton commented on PR #2907:
URL: https://github.com/apache/drill/pull/2907#issuecomment-2106890548

   > However, I could very well have overlooked some way to access the global 
configuration, and I'd be grateful for any pointers.
   
   We should find existing examples in the Parquet format plugin. E.g. the 
"old" reader is affected by the option store.parquet.reader.pagereader.async.




> Allow Parquet TIME_MICROS and TIMESTAMP_MICROS  columns to be read as 64-bit 
> integer values
> ---
>
> Key: DRILL-8492
> URL: https://issues.apache.org/jira/browse/DRILL-8492
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Parquet
>Affects Versions: 1.21.1
>Reporter: Peter Franzen
>Priority: Major
>
> When reading Parquet columns of type {{time_micros}} and 
> {{timestamp_micros}}, Drill truncates the microsecond values to 
> milliseconds in order to convert them to SQL timestamps.
> It is currently not possible to read the original microsecond values (as 
> 64-bit values, not SQL timestamps) through Drill.
> One solution for allowing the original 64-bit values to be read is to add two 
> options similar to "store.parquet.reader.int96_as_timestamp" to control 
> whether microsecond times and timestamps are truncated to millisecond 
> timestamps or read as non-truncated 64-bit values.
> These options would be added to {{org.apache.drill.exec.ExecConstants}} and 
> {{org.apache.drill.exec.server.options.SystemOptionManager}}.
> They would also be added to "drill-module.conf":
> {{   store.parquet.reader.time_micros_as_int64: false,}}
> {{   store.parquet.reader.timestamp_micros_as_int64: false,}}
> These options would then be used in the same places as 
> {{store.parquet.reader.int96_as_timestamp}}:
>  * org.apache.drill.exec.store.parquet.columnreaders.ColumnReaderFactory
>  * org.apache.drill.exec.store.parquet.columnreaders.ParquetToDrillTypeConverter
>  * org.apache.drill.exec.store.parquet2.DrillParquetGroupConverter
> to create an int64 reader instead of a time/timestamp reader when the 
> corresponding option is set to true.
> In addition to this, 
> {{org.apache.drill.exec.store.parquet.metadata.FileMetadataCollector}} must 
> be altered to _not_ truncate the min and max values for 
> time_micros/timestamp_micros if the corresponding option is true. This class 
> doesn't have a reference to an {{OptionManager}}, so the two new options 
> must be extracted from the {{OptionManager}} when the {{ParquetReaderConfig}} 
> instance is created.
> Filtering on microsecond columns would be done using 64-bit values rather 
> than TIME/TIMESTAMP values when the new options are true, e.g.
> {{SELECT * FROM <table> WHERE <time_column> = 1705914906694751;}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8492) Allow Parquet TIME_MICROS and TIMESTAMP_MICROS columns to be read as 64-bit integer values

2024-05-12 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17845701#comment-17845701
 ] 

ASF GitHub Bot commented on DRILL-8492:
---

handmadecode commented on PR #2907:
URL: https://github.com/apache/drill/pull/2907#issuecomment-2106221734

   > Awesome work. I can backport this too because you've left default 
behaviour unchanged (and it's self contained). My only question is about 
ParquetReaderConfig 

> Allow Parquet TIME_MICROS and TIMESTAMP_MICROS  columns to be read as 64-bit 
> integer values
> ---
>
> Key: DRILL-8492
> URL: https://issues.apache.org/jira/browse/DRILL-8492
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Parquet
>Affects Versions: 1.21.1
>Reporter: Peter Franzen
>Priority: Major
>
> When reading Parquet columns of type {{time_micros}} and 
> {{timestamp_micros}}, Drill truncates the microsecond values to 
> milliseconds in order to convert them to SQL timestamps.
> It is currently not possible to read the original microsecond values (as 
> 64-bit values, not SQL timestamps) through Drill.
> One solution for allowing the original 64-bit values to be read is to add two 
> options similar to "store.parquet.reader.int96_as_timestamp" to control 
> whether microsecond times and timestamps are truncated to millisecond 
> timestamps or read as non-truncated 64-bit values.
> These options would be added to {{org.apache.drill.exec.ExecConstants}} and 
> {{org.apache.drill.exec.server.options.SystemOptionManager}}.
> They would also be added to "drill-module.conf":
> {{   store.parquet.reader.time_micros_as_int64: false,}}
> {{   store.parquet.reader.timestamp_micros_as_int64: false,}}
> These options would then be used in the same places as 
> {{store.parquet.reader.int96_as_timestamp}}:
>  * org.apache.drill.exec.store.parquet.columnreaders.ColumnReaderFactory
>  * org.apache.drill.exec.store.parquet.columnreaders.ParquetToDrillTypeConverter
>  * org.apache.drill.exec.store.parquet2.DrillParquetGroupConverter
> to create an int64 reader instead of a time/timestamp reader when the 
> corresponding option is set to true.
> In addition to this, 
> {{org.apache.drill.exec.store.parquet.metadata.FileMetadataCollector}} must 
> be altered to _not_ truncate the min and max values for 
> time_micros/timestamp_micros if the corresponding option is true. This class 
> doesn't have a reference to an {{OptionManager}}, so the two new options 
> must be extracted from the {{OptionManager}} when the {{ParquetReaderConfig}} 
> instance is created.
> Filtering on microsecond columns would be done using 64-bit values rather 
> than TIME/TIMESTAMP values when the new options are true, e.g.
> {{SELECT * FROM <table> WHERE <time_column> = 1705914906694751;}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8480) Cleanup before finished. 0 out of 1 streams have finished

2024-05-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17845661#comment-17845661
 ] 

ASF GitHub Bot commented on DRILL-8480:
---

jnturton merged PR #2897:
URL: https://github.com/apache/drill/pull/2897




> Cleanup before finished. 0 out of 1 streams have finished
> -
>
> Key: DRILL-8480
> URL: https://issues.apache.org/jira/browse/DRILL-8480
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.21.1
>Reporter: Maksym Rymar
>Assignee: Maksym Rymar
>Priority: Major
> Attachments: 1a349ff1-d1f9-62bf-ed8c-26346c548005.sys.drill, 
> tableWithNumber2.parquet
>
>
> Drill fails to execute a query with the following exception:
> {code:java}
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
> IllegalStateException: Cleanup before finished. 0 out of 1 streams have 
> finished
> Fragment: 1:0
> Please, refer to logs for more information.
> [Error Id: 270da8f4-0bb6-4985-bf4f-34853138881c on 
> compute7.vmcluster.com:31010]
>         at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:657)
>         at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:395)
>         at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:245)
>         at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:362)
>         at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>         at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>         at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>         at java.base/java.lang.Thread.run(Thread.java:829)
> Caused by: java.lang.IllegalStateException: Cleanup before finished. 0 out of 
> 1 streams have finished
>         at 
> org.apache.drill.exec.work.batch.BaseRawBatchBuffer.close(BaseRawBatchBuffer.java:111)
>         at 
> org.apache.drill.common.AutoCloseables.close(AutoCloseables.java:91)
>         at 
> org.apache.drill.common.AutoCloseables.close(AutoCloseables.java:71)
>         at 
> org.apache.drill.exec.work.batch.AbstractDataCollector.close(AbstractDataCollector.java:121)
>         at 
> org.apache.drill.common.AutoCloseables.close(AutoCloseables.java:91)
>         at 
> org.apache.drill.exec.work.batch.IncomingBuffers.close(IncomingBuffers.java:144)
>         at 
> org.apache.drill.exec.ops.FragmentContextImpl.suppressingClose(FragmentContextImpl.java:581)
>         at 
> org.apache.drill.exec.ops.FragmentContextImpl.close(FragmentContextImpl.java:567)
>         at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.closeOutResources(FragmentExecutor.java:417)
>         at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:240)
>         ... 5 common frames omitted
>         Suppressed: java.lang.IllegalStateException: Cleanup before finished. 
> 0 out of 1 streams have finished
>                 ... 15 common frames omitted
>         Suppressed: java.lang.IllegalStateException: Memory was leaked by 
> query. Memory leaked: (32768)
> Allocator(op:1:0:8:UnorderedReceiver) 100/32768/32768/100 
> (res/actual/peak/limit)
>                 at 
> org.apache.drill.exec.memory.BaseAllocator.close(BaseAllocator.java:519)
>                 at 
> org.apache.drill.exec.ops.BaseOperatorContext.close(BaseOperatorContext.java:159)
>                 at 
> org.apache.drill.exec.ops.OperatorContextImpl.close(OperatorContextImpl.java:77)
>                 at 
> org.apache.drill.exec.ops.FragmentContextImpl.suppressingClose(FragmentContextImpl.java:581)
>                 at 
> org.apache.drill.exec.ops.FragmentContextImpl.close(FragmentContextImpl.java:571)
>                 ... 7 common frames omitted
>         Suppressed: java.lang.IllegalStateException: Memory was leaked by 
> query. Memory leaked: (1016640)
> Allocator(frag:1:0) 3000/1016640/30016640/90715827882 
> (res/actual/peak/limit)
>                 at 
> org.apache.drill.exec.memory.BaseAllocator.close(BaseAllocator.java:519)
>                 at 
> org.apache.drill.exec.ops.FragmentContextImpl.suppressingClose(FragmentContextImpl.java:581)
>                 at 
> org.apache.drill.exec.ops.FragmentContextImpl.close(FragmentContextImpl.java:574)
>                 ... 7 common frames omitted {code}
> Steps to reproduce:
>   1. Enable unequal join:
> {code:java}
> alter session set `planner.enable_nljoin_for_scalar_only`=false; {code}
>   2. Disable join optimization to prevent Drill from flipping the sides of the 
> join, which may break the query execution, because the NestedLoopJoin operator 
> that executes unequal joins supports 

[jira] [Commented] (DRILL-8474) Add Daffodil Format Plugin

2024-05-09 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17845113#comment-17845113
 ] 

ASF GitHub Bot commented on DRILL-8474:
---

mbeckerle commented on code in PR #2909:
URL: https://github.com/apache/drill/pull/2909#discussion_r1595903385


##
contrib/format-daffodil/src/main/java/org/apache/drill/exec/store/daffodil/schema/DaffodilDataProcessorFactory.java:
##
@@ -0,0 +1,165 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.daffodil.schema;
+
+import org.apache.daffodil.japi.Compiler;
+import org.apache.daffodil.japi.Daffodil;
+import org.apache.daffodil.japi.DataProcessor;
+import org.apache.daffodil.japi.Diagnostic;
+import org.apache.daffodil.japi.InvalidParserException;
+import org.apache.daffodil.japi.InvalidUsageException;
+import org.apache.daffodil.japi.ProcessorFactory;
+import org.apache.daffodil.japi.ValidationMode;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.net.URI;
+import java.net.URISyntaxException;
+import java.nio.channels.Channels;
+import java.util.List;
+import java.util.Objects;
+
+/**
+ * Compiles a DFDL schema (mostly for tests) or loads a pre-compiled DFDL 
schema so that one can
+ * obtain a DataProcessor for use with DaffodilMessageParser.
+ * 
+ * TODO: Needs to use a cache to avoid reloading/recompiling every time.
+ */
+public class DaffodilDataProcessorFactory {
+  // Default constructor is used.
+
+  private static final Logger logger = 
LoggerFactory.getLogger(DaffodilDataProcessorFactory.class);
+
+  private DataProcessor dp;
+
+  /**
+   * Gets a Daffodil DataProcessor given the necessary arguments to compile or 
reload it.
+   *
+   * @param schemaFileURI
+   * pre-compiled dfdl schema (.bin extension) or DFDL schema source (.xsd 
extension)
+   * @param validationMode
+   * Use true to request Daffodil built-in 'limited' validation. Use false 
for no validation.
+   * @param rootName
+   * Local name of root element of the message. Can be null to use the 
first element declaration
+   * of the primary schema file. Ignored if reloading a pre-compiled 
schema.
+   * @param rootNS
+   * Namespace URI as a string. Can be null to use the target namespace of 
the primary schema
+   * file or if it is unambiguous what element is the rootName. Ignored if 
reloading a
+   * pre-compiled schema.
+   * @return the DataProcessor
+   * @throws CompileFailure
+   * - if schema compilation fails
+   */
+  public DataProcessor getDataProcessor(URI schemaFileURI, boolean 
validationMode, String rootName,
+  String rootNS)
+  throws CompileFailure {
+
+DaffodilDataProcessorFactory dmp = new DaffodilDataProcessorFactory();
+boolean isPrecompiled = schemaFileURI.toString().endsWith(".bin");
+if (isPrecompiled) {
+  if (Objects.nonNull(rootName) && !rootName.isEmpty()) {
+// A usage error. You shouldn't supply the name and optionally 
namespace if loading
+// precompiled schema because those are built into it. Should be null 
or "".
+logger.warn("Root element name '{}' is ignored when used with 
precompiled DFDL schema.",
+rootName);
+  }
+  try {
+dmp.loadSchema(schemaFileURI);
+  } catch (IOException | InvalidParserException e) {
+throw new CompileFailure(e);
+  }
+  dmp.setupDP(validationMode, null);
+} else {
+  List<Diagnostic> pfDiags;
+  try {
+pfDiags = dmp.compileSchema(schemaFileURI, rootName, rootNS);
+  } catch (URISyntaxException | IOException e) {
+throw new CompileFailure(e);
+  }
+  dmp.setupDP(validationMode, pfDiags);
+}
+return dmp.dp;
+  }
+
+  private void loadSchema(URI schemaFileURI) throws IOException, 
InvalidParserException {
+Compiler c = Daffodil.compiler();
+dp = c.reload(Channels.newChannel(schemaFileURI.toURL().openStream()));
+  }
+
+  private List<Diagnostic> compileSchema(URI schemaFileURI, String rootName, String rootNS)
+  throws URISyntaxException, IOException, CompileFailure {
+Compiler c = 

[jira] [Commented] (DRILL-8492) Allow Parquet TIME_MICROS and TIMESTAMP_MICROS columns to be read as 64-bit integer values

2024-05-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17844316#comment-17844316
 ] 

ASF GitHub Bot commented on DRILL-8492:
---

cgivre commented on PR #2907:
URL: https://github.com/apache/drill/pull/2907#issuecomment-2098546012

   > > LGTM +1 Thanks for the contribution! Can you please update the 
documentation for the Parquet reader to include this? Otherwise looks good!
   > 
   > Happy to contribute!
   > 
   > Do you mean the documentation in the `drill-site` repo? 
(https://github.com/apache/drill-site/blob/master/_docs/en/data-sources-and-file-formats/040-parquet-format.md)
   
   That's the one!




> Allow Parquet TIME_MICROS and TIMESTAMP_MICROS  columns to be read as 64-bit 
> integer values
> ---
>
> Key: DRILL-8492
> URL: https://issues.apache.org/jira/browse/DRILL-8492
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Parquet
>Affects Versions: 1.21.1
>Reporter: Peter Franzen
>Priority: Major
>
> When reading Parquet columns of type {{time_micros}} and 
> {{timestamp_micros}}, Drill truncates the microsecond values to 
> milliseconds in order to convert them to SQL timestamps.
> It is currently not possible to read the original microsecond values (as 
> 64-bit values, not SQL timestamps) through Drill.
> One solution for allowing the original 64-bit values to be read is to add two 
> options similar to "store.parquet.reader.int96_as_timestamp" to control 
> whether microsecond times and timestamps are truncated to millisecond 
> timestamps or read as non-truncated 64-bit values.
> These options would be added to {{org.apache.drill.exec.ExecConstants}} and 
> {{org.apache.drill.exec.server.options.SystemOptionManager}}.
> They would also be added to "drill-module.conf":
> {{   store.parquet.reader.time_micros_as_int64: false,}}
> {{   store.parquet.reader.timestamp_micros_as_int64: false,}}
> These options would then be used in the same places as 
> {{store.parquet.reader.int96_as_timestamp}}:
>  * org.apache.drill.exec.store.parquet.columnreaders.ColumnReaderFactory
>  * org.apache.drill.exec.store.parquet.columnreaders.ParquetToDrillTypeConverter
>  * org.apache.drill.exec.store.parquet2.DrillParquetGroupConverter
> to create an int64 reader instead of a time/timestamp reader when the 
> corresponding option is set to true.
> In addition to this, 
> {{org.apache.drill.exec.store.parquet.metadata.FileMetadataCollector}} must 
> be altered to _not_ truncate the min and max values for 
> time_micros/timestamp_micros if the corresponding option is true. This class 
> doesn't have a reference to an {{OptionManager}}, so the two new options 
> must be extracted from the {{OptionManager}} when the {{ParquetReaderConfig}} 
> instance is created.
> Filtering on microsecond columns would be done using 64-bit values rather 
> than TIME/TIMESTAMP values when the new options are true, e.g.
> {{SELECT * FROM <table> WHERE <time_column> = 1705914906694751;}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8492) Allow Parquet TIME_MICROS and TIMESTAMP_MICROS columns to be read as 64-bit integer values

2024-05-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17844315#comment-17844315
 ] 

ASF GitHub Bot commented on DRILL-8492:
---

handmadecode commented on PR #2907:
URL: https://github.com/apache/drill/pull/2907#issuecomment-2098543073

   > LGTM +1 Thanks for the contribution! Can you please update the 
documentation for the Parquet reader to include this? Otherwise looks good!
   
   Happy to contribute!
   
   Do you mean the documentation in the `drill-site` repo? 
(https://github.com/apache/drill-site/blob/master/_docs/en/data-sources-and-file-formats/040-parquet-format.md)




> Allow Parquet TIME_MICROS and TIMESTAMP_MICROS  columns to be read as 64-bit 
> integer values
> ---
>
> Key: DRILL-8492
> URL: https://issues.apache.org/jira/browse/DRILL-8492
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Parquet
>Affects Versions: 1.21.1
>Reporter: Peter Franzen
>Priority: Major
>
> When reading Parquet columns of type {{time_micros}} and 
> {{timestamp_micros}}, Drill truncates the microsecond values to 
> milliseconds in order to convert them to SQL timestamps.
> It is currently not possible to read the original microsecond values (as 
> 64-bit values, not SQL timestamps) through Drill.
> One solution for allowing the original 64-bit values to be read is to add two 
> options similar to "store.parquet.reader.int96_as_timestamp" to control 
> whether microsecond times and timestamps are truncated to millisecond 
> timestamps or read as non-truncated 64-bit values.
> These options would be added to {{org.apache.drill.exec.ExecConstants}} and 
> {{org.apache.drill.exec.server.options.SystemOptionManager}}.
> They would also be added to "drill-module.conf":
> {{   store.parquet.reader.time_micros_as_int64: false,}}
> {{   store.parquet.reader.timestamp_micros_as_int64: false,}}
> These options would then be used in the same places as 
> {{store.parquet.reader.int96_as_timestamp}}:
>  * org.apache.drill.exec.store.parquet.columnreaders.ColumnReaderFactory
>  * org.apache.drill.exec.store.parquet.columnreaders.ParquetToDrillTypeConverter
>  * org.apache.drill.exec.store.parquet2.DrillParquetGroupConverter
> to create an int64 reader instead of a time/timestamp reader when the 
> corresponding option is set to true.
> In addition to this, 
> {{org.apache.drill.exec.store.parquet.metadata.FileMetadataCollector}} must 
> be altered to _not_ truncate the min and max values for 
> time_micros/timestamp_micros if the corresponding option is true. This class 
> doesn't have a reference to an {{OptionManager}}, so the two new options 
> must be extracted from the {{OptionManager}} when the {{ParquetReaderConfig}} 
> instance is created.
> Filtering on microsecond columns would be done using 64-bit values rather 
> than TIME/TIMESTAMP values when the new options are true, e.g.
> {{SELECT * FROM <table> WHERE <time_column> = 1705914906694751;}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8474) Add Daffodil Format Plugin

2024-05-06 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17843832#comment-17843832
 ] 

ASF GitHub Bot commented on DRILL-8474:
---

mbeckerle commented on PR #2909:
URL: https://github.com/apache/drill/pull/2909#issuecomment-209976

   > Hi Mike, Are you free at all this week? My apologies... We're in the 
middle of putting an offer on a house and my life is very hectic at the moment. 
Best, 

> Add Daffodil Format Plugin
> --
>
> Key: DRILL-8474
> URL: https://issues.apache.org/jira/browse/DRILL-8474
> Project: Apache Drill
>  Issue Type: New Feature
>Affects Versions: 1.21.1
>Reporter: Charles Givre
>Priority: Major
> Fix For: 1.22.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8474) Add Daffodil Format Plugin

2024-05-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17843601#comment-17843601
 ] 

ASF GitHub Bot commented on DRILL-8474:
---

cgivre commented on PR #2909:
URL: https://github.com/apache/drill/pull/2909#issuecomment-2095044801

   Hi Mike, 
   Are you free at all this week?  My apologies... We're in the middle of 
putting an offer on a house and my life is very hectic at the moment.
   Best,
   

> Add Daffodil Format Plugin
> --
>
> Key: DRILL-8474
> URL: https://issues.apache.org/jira/browse/DRILL-8474
> Project: Apache Drill
>  Issue Type: New Feature
>Affects Versions: 1.21.1
>Reporter: Charles Givre
>Priority: Major
> Fix For: 1.22.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8488) HashJoinPOP memory leak is caused by OutOfMemoryException

2024-05-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17842680#comment-17842680
 ] 

ASF GitHub Bot commented on DRILL-8488:
---

cgivre merged PR #2900:
URL: https://github.com/apache/drill/pull/2900




> HashJoinPOP memory leak is caused by  OutOfMemoryException
> --
>
> Key: DRILL-8488
> URL: https://issues.apache.org/jira/browse/DRILL-8488
> Project: Apache Drill
>  Issue Type: Bug
>  Components:  Server
>Affects Versions: 1.21.1
>Reporter: shihuafeng
>Priority: Major
> Fix For: 1.22.0
>
>
> See [DRILL-8485|https://issues.apache.org/jira/browse/DRILL-8485]: HashJoinPOP 
> memory leak is caused by an OOM exception when reading data from an InputStream.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8489) Sender memory leak when rpc encode exception

2024-05-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17842679#comment-17842679
 ] 

ASF GitHub Bot commented on DRILL-8489:
---

cgivre merged PR #2901:
URL: https://github.com/apache/drill/pull/2901




> Sender memory leak when rpc encode exception
> 
>
> Key: DRILL-8489
> URL: https://issues.apache.org/jira/browse/DRILL-8489
> Project: Apache Drill
>  Issue Type: Bug
>  Components:  Server
>Affects Versions: 1.21.1
>Reporter: shihuafeng
>Priority: Major
> Fix For: 1.22.0
>
>
> When encode() throws an exception, Netty can release the message if it is an 
> instance of ReferenceCounted. But Drill converts the message to 
> OutboundRpcMessage, so Netty cannot release it; this causes memory leaks in 
> the sender.
> Exception info:
> {code:java}
> 2024-04-16 16:25:57,998 [DataClient-7] ERROR 
> o.a.d.exec.rpc.RpcExceptionHandler - Exception in RPC communication.  
> Connection: /10.32.112.138:47924 <--> /10.32.112.138:31012 (data client).  
> Closing connection.
> io.netty.handler.codec.EncoderException: 
> org.apache.drill.exec.exception.OutOfMemoryException: Unable to allocate 
> buffer of size 4096 due to memory limit (9223372036854775807). Current 
> allocation: 0
>         at 
> io.netty.handler.codec.MessageToMessageEncoder.write(MessageToMessageEncoder.java:107)
>         at 
> io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:881)
>         at 
> io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:940)
>         at 
> io.netty.channel.AbstractChannelHandlerContext$WriteTask.run(AbstractChannelHandlerContext.java:1247)
>         at 
> io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:173)
>         at 
> io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:166)
>         at 
> io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:470)
>         at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:569)
>         at 
> io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
>         at 
> io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
>         at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.drill.exec.exception.OutOfMemoryException: Unable to 
> allocate buffer of size 4096 due to memory limit (9223372036854775807). 
> Current allocation: 0
>         at 
> org.apache.drill.exec.memory.BaseAllocator.buffer(BaseAllocator.java:245)
>         at 
> org.apache.drill.exec.memory.BaseAllocator.buffer(BaseAllocator.java:220)
>         at 
> org.apache.drill.exec.memory.DrillByteBufAllocator.buffer(DrillByteBufAllocator.java:55)
>         at 
> org.apache.drill.exec.memory.DrillByteBufAllocator.buffer(DrillByteBufAllocator.java:50)
>         at org.apache.drill.exec.rpc.RpcEncoder.encode(safeRelease.java:87)
>         at org.apache.drill.exec.rpc.RpcEncoder.encode(RpcEncoder.java:38)
>         at 
> io.netty.handler.codec.MessageToMessageEncoder.write(MessageToMessageEncoder.java:90){code}
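
A generic sketch of the failure mode and the release-on-failure remedy 
(hypothetical wrapper and encoder names; only the Netty APIs are real):

{code:java}
import io.netty.buffer.ByteBuf;
import io.netty.channel.ChannelHandlerContext;
import io.netty.handler.codec.MessageToMessageEncoder;
import io.netty.util.ReferenceCountUtil;
import java.util.List;

// A plain wrapper, like OutboundRpcMessage, is NOT ReferenceCounted, so
// Netty's MessageToMessageEncoder cannot release it when encode() throws.
class WrappedMsg {
  final ByteBuf body;
  WrappedMsg(ByteBuf body) { this.body = body; }
}

class SafeEncoder extends MessageToMessageEncoder<WrappedMsg> {
  @Override
  protected void encode(ChannelHandlerContext ctx, WrappedMsg msg, List<Object> out) {
    try {
      ByteBuf header = ctx.alloc().buffer(4096); // allocation may fail under memory pressure
      out.add(header);
      out.add(msg.body.retain());
    } catch (Exception e) {
      ReferenceCountUtil.release(msg.body); // release the wrapped buffer ourselves
      throw e;
    }
  }
}
{code}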



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8474) Add Daffodil Format Plugin

2024-04-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17841807#comment-17841807
 ] 

ASF GitHub Bot commented on DRILL-8474:
---

mbeckerle commented on PR #2909:
URL: https://github.com/apache/drill/pull/2909#issuecomment-2081781546

   Tests are now failing due to these two things in TestDaffodilReader.scala
   ```
 String schemaURIRoot = "file:///opt/drill/contrib/format-daffodil/src/test/resources/";
   ```
   That's an absolute URI that is used to obtain access to the schema files in 
this statement:
   ```
 private String selectRow(String schema, String file) {
   return "SELECT * FROM table(dfs.`data/" + file + "` " + " (type => 'daffodil'," + " " +
       "validationMode => 'true', " + " schemaURI => '" + schemaURIRoot + "schema/" + schema +
       ".dfdl.xsd'," + " rootName => 'row'," + " rootNamespace => null " + "))";
 }
   ```
   This is assembling a select statement, and puts this absolute schemaURI into 
the schemaURI part of the select. 
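   
   For concreteness, the assembled query looks roughly like this (the data 
file and schema names are illustrative):
   
   ```sql
   SELECT * FROM table(dfs.`data/someData.dat` (type => 'daffodil',
     validationMode => 'true',
     schemaURI => 'file:///opt/drill/contrib/format-daffodil/src/test/resources/schema/someSchema.dfdl.xsd',
     rootName => 'row', rootNamespace => null))
   ```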
   
   What should I be doing to arrange for these schema URIs to be found?
   
   The schemas are a large, complex set of files, not just a single file. Many 
files must be found relative to the initial root schema file (potentially 
hundreds of files), as they include/import other schema files using relative 
paths. 
   




> Add Daffodil Format Plugin
> --
>
> Key: DRILL-8474
> URL: https://issues.apache.org/jira/browse/DRILL-8474
> Project: Apache Drill
>  Issue Type: New Feature
>Affects Versions: 1.21.1
>Reporter: Charles Givre
>Priority: Major
> Fix For: 1.22.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8474) Add Daffodil Format Plugin

2024-04-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17841775#comment-17841775
 ] 

ASF GitHub Bot commented on DRILL-8474:
---

cgivre commented on code in PR #2909:
URL: https://github.com/apache/drill/pull/2909#discussion_r1582375084


##
exec/vector/src/main/java/org/apache/drill/exec/record/metadata/MapBuilder.java:
##
@@ -185,6 +192,26 @@ public MapBuilder resumeMap() {
 return (MapBuilder) parent;
   }
 
+  /**
+   * Depending on whether the parent is a schema builder or map builder
+   * we resume appropriately.
+   */
+  @Override
+  public void resume() {
+if (Objects.isNull(parent))

Review Comment:
   I just built Drill using the following command:
   
   ```sh
   mvn clean install -DskipTests
   ```
   When I did that, I was getting the same error as on GitHub.  After adding 
the braces as described above, it built without issues.
   With that said, I think you can just run checkstyle with:
   
   ```sh
   mvn checkstyle:checkstyle
   ```





> Add Daffodil Format Plugin
> --
>
> Key: DRILL-8474
> URL: https://issues.apache.org/jira/browse/DRILL-8474
> Project: Apache Drill
>  Issue Type: New Feature
>Affects Versions: 1.21.1
>Reporter: Charles Givre
>Priority: Major
> Fix For: 1.22.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8474) Add Daffodil Format Plugin

2024-04-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17841774#comment-17841774
 ] 

ASF GitHub Bot commented on DRILL-8474:
---

cgivre commented on code in PR #2909:
URL: https://github.com/apache/drill/pull/2909#discussion_r1582375084


##
exec/vector/src/main/java/org/apache/drill/exec/record/metadata/MapBuilder.java:
##
@@ -185,6 +192,26 @@ public MapBuilder resumeMap() {
 return (MapBuilder) parent;
   }
 
+  /**
+   * Depending on whether the parent is a schema builder or map builder
+   * we resume appropriately.
+   */
+  @Override
+  public void resume() {
+if (Objects.isNull(parent))

Review Comment:
   I just built Drill using the following command:
   
   ```sh
   mvn clean install -DskipTests
   ```
   
   I think you can just run checkstyle with:
   
   ```sh
   mvn checkstyle:checkstyle
   ```





> Add Daffodil Format Plugin
> --
>
> Key: DRILL-8474
> URL: https://issues.apache.org/jira/browse/DRILL-8474
> Project: Apache Drill
>  Issue Type: New Feature
>Affects Versions: 1.21.1
>Reporter: Charles Givre
>Priority: Major
> Fix For: 1.22.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8474) Add Daffodil Format Plugin

2024-04-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17841768#comment-17841768
 ] 

ASF GitHub Bot commented on DRILL-8474:
---

mbeckerle commented on code in PR #2909:
URL: https://github.com/apache/drill/pull/2909#discussion_r1582367382


##
exec/vector/src/main/java/org/apache/drill/exec/record/metadata/MapBuilder.java:
##
@@ -185,6 +192,26 @@ public MapBuilder resumeMap() {
 return (MapBuilder) parent;
   }
 
+  /**
+   * Depending on whether the parent is a schema builder or map builder
+   * we resume appropriately.
+   */
+  @Override
+  public void resume() {
+if (Objects.isNull(parent))

Review Comment:
   What is the maven command line to just make it run this checkstyle?





> Add Daffodil Format Plugin
> --
>
> Key: DRILL-8474
> URL: https://issues.apache.org/jira/browse/DRILL-8474
> Project: Apache Drill
>  Issue Type: New Feature
>Affects Versions: 1.21.1
>Reporter: Charles Givre
>Priority: Major
> Fix For: 1.22.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8474) Add Daffodil Format Plugin

2024-04-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17841667#comment-17841667
 ] 

ASF GitHub Bot commented on DRILL-8474:
---

cgivre commented on code in PR #2909:
URL: https://github.com/apache/drill/pull/2909#discussion_r1582206247


##
exec/vector/src/main/java/org/apache/drill/exec/record/metadata/MapBuilder.java:
##
@@ -185,6 +192,26 @@ public MapBuilder resumeMap() {
 return (MapBuilder) parent;
   }
 
+  /**
+   * Depending on whether the parent is a schema builder or map builder
+   * we resume appropriately.
+   */
+  @Override
+  public void resume() {
+if (Objects.isNull(parent))

Review Comment:
   @mbeckerle Confirmed.  I successfully built your branch by adding the 
aforementioned braces.  I'll save you some additional trouble: there's another 
checkstyle violation in `DaffodilBatchReader`.  Drill doesn't like star 
imports for some reason.
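   For example, this is the kind of change checkstyle wants (a minimal sketch; 
the actual import list in `DaffodilBatchReader` will differ):
   ```java
   // Flagged by checkstyle (star import):
   // import org.apache.drill.exec.record.metadata.*;

   // Preferred: explicit imports.
   import org.apache.drill.exec.record.metadata.SchemaBuilder;
   import org.apache.drill.exec.record.metadata.TupleMetadata;
   ```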





> Add Daffodil Format Plugin
> --
>
> Key: DRILL-8474
> URL: https://issues.apache.org/jira/browse/DRILL-8474
> Project: Apache Drill
>  Issue Type: New Feature
>Affects Versions: 1.21.1
>Reporter: Charles Givre
>Priority: Major
> Fix For: 1.22.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8474) Add Daffodil Format Plugin

2024-04-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17841663#comment-17841663
 ] 

ASF GitHub Bot commented on DRILL-8474:
---

cgivre commented on code in PR #2909:
URL: https://github.com/apache/drill/pull/2909#discussion_r1582202511


##
exec/vector/src/main/java/org/apache/drill/exec/record/metadata/MapBuilder.java:
##
@@ -185,6 +192,26 @@ public MapBuilder resumeMap() {
 return (MapBuilder) parent;
   }
 
+  /**
+   * Depending on whether the parent is a schema builder or map builder
+   * we resume appropriately.
+   */
+  @Override
+  public void resume() {
+if (Objects.isNull(parent))

Review Comment:
   @mbeckerle I don't know why checkstyle is pointing you at the wrong file, 
but you'll need braces here as well as at line 203, 
   
   i.e.:
   ```java
   if (parent instanceof MapBuilder) {
 resumeMap();
   }
   ```
   





> Add Daffodil Format Plugin
> --
>
> Key: DRILL-8474
> URL: https://issues.apache.org/jira/browse/DRILL-8474
> Project: Apache Drill
>  Issue Type: New Feature
>Affects Versions: 1.21.1
>Reporter: Charles Givre
>Priority: Major
> Fix For: 1.22.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8474) Add Daffodil Format Plugin

2024-04-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17841637#comment-17841637
 ] 

ASF GitHub Bot commented on DRILL-8474:
---

shfshihuafeng commented on PR #2909:
URL: https://github.com/apache/drill/pull/2909#issuecomment-2081475418

   > This fails its tests due to a maven checkstyle failure. It's complaining 
about Drill:Exec:Vectors, which my code has no changes to.
   > 
   > Can someone advise on what is wrong here?
   
   ```java
   if (Objects.isNull(parent)) {
     throw new IllegalStateException("Call to resume() on MapBuilder with no parent.");
   }
   ```
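   Putting the review threads together, a minimal sketch of a checkstyle-clean 
`resume()` might look like the following; the schema-builder branch is assumed 
from the method's javadoc, not taken from the actual patch:
   ```java
   @Override
   public void resume() {
     if (Objects.isNull(parent)) {
       throw new IllegalStateException("Call to resume() on MapBuilder with no parent.");
     }
     if (parent instanceof MapBuilder) {
       resumeMap();     // parent is a map builder
     } else {
       resumeSchema();  // assumed branch: parent is a schema builder
     }
   }
   ```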




> Add Daffodil Format Plugin
> --
>
> Key: DRILL-8474
> URL: https://issues.apache.org/jira/browse/DRILL-8474
> Project: Apache Drill
>  Issue Type: New Feature
>Affects Versions: 1.21.1
>Reporter: Charles Givre
>Priority: Major
> Fix For: 1.22.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8474) Add Daffodil Format Plugin

2024-04-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17841636#comment-17841636
 ] 

ASF GitHub Bot commented on DRILL-8474:
---

shfshihuafeng commented on PR #2909:
URL: https://github.com/apache/drill/pull/2909#issuecomment-2081475241

   > This fails its tests due to a maven checkstyle failure. It's complaining 
about Drill:Exec:Vectors, which my code has no changes to.
   > 
   > Can someone advise on what is wrong here?
   
   The checkstyle violation is at 
exec/vector/src/main/java/org/apache/drill/exec/record/metadata/MapBuilder.java:201.
   I think you need to add braces `{}` to the `if` construct, like the following:
   ```
   if (Objects.isNull(parent)) {
     throw new IllegalStateException("Call to resume() on MapBuilder with no parent.");
   }
   ```




> Add Daffodil Format Plugin
> --
>
> Key: DRILL-8474
> URL: https://issues.apache.org/jira/browse/DRILL-8474
> Project: Apache Drill
>  Issue Type: New Feature
>Affects Versions: 1.21.1
>Reporter: Charles Givre
>Priority: Major
> Fix For: 1.22.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8493) Drill Unable to Read XML Files with Namespaces

2024-04-27 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17841556#comment-17841556
 ] 

ASF GitHub Bot commented on DRILL-8493:
---

cgivre merged PR #2908:
URL: https://github.com/apache/drill/pull/2908




> Drill Unable to Read XML Files with Namespaces
> --
>
> Key: DRILL-8493
> URL: https://issues.apache.org/jira/browse/DRILL-8493
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Format - XML
>Affects Versions: 1.21.1
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Major
> Fix For: 1.21.2
>
>
> This is a bug fix whereby Drill ignores all data when an XML file has a 
> namespace.  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-2835) IndexOutOfBoundsException in partition sender when doing streaming aggregate with LIMIT

2024-04-27 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-2835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17841535#comment-17841535
 ] 

ASF GitHub Bot commented on DRILL-2835:
---

mbeckerle commented on PR #2909:
URL: https://github.com/apache/drill/pull/2909#issuecomment-2081179778

   This fails its tests due to a maven checkstyle failure. It's complaining 
about Drill:Exec:Vectors, which my code has no changes to. 
   
   Can someone advise on what is wrong here?
   




> IndexOutOfBoundsException in partition sender when doing streaming aggregate 
> with LIMIT 
> 
>
> Key: DRILL-2835
> URL: https://issues.apache.org/jira/browse/DRILL-2835
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - RPC
>Affects Versions: 0.8.0
>Reporter: Aman Sinha
>Assignee: Venki Korukanti
>Priority: Major
> Fix For: 0.9.0
>
> Attachments: DRILL-2835-1.patch, DRILL-2835-2.patch
>
>
> Following CTAS run on a TPC-DS 100GB scale factor on a 10-node cluster: 
> {code}
> alter session set `planner.enable_hashagg` = false;
> alter session set `planner.enable_multiphase_agg` = true;
> create table dfs.tmp.stream9 as 
> select cr_call_center_sk , cr_catalog_page_sk ,  cr_item_sk , cr_reason_sk , 
> cr_refunded_addr_sk , count(*) from catalog_returns_dri100 
>  group by cr_call_center_sk , cr_catalog_page_sk ,  cr_item_sk , cr_reason_sk 
> , cr_refunded_addr_sk
>  limit 100
> ;
> {code}
> {code}
> Caused by: java.lang.IndexOutOfBoundsException: index: 1023, length: 1 
> (expected: range(0, 0))
> at io.netty.buffer.DrillBuf.checkIndexD(DrillBuf.java:200) 
> ~[drill-java-exec-0.9.0-SNAPSHOT-rebuffed.jar:4.0.24.Final]
> at io.netty.buffer.DrillBuf.chk(DrillBuf.java:222) 
> ~[drill-java-exec-0.9.0-SNAPSHOT-rebuffed.jar:4.0.24.Final]
> at io.netty.buffer.DrillBuf.setByte(DrillBuf.java:621) 
> ~[drill-java-exec-0.9.0-SNAPSHOT-rebuffed.jar:4.0.24.Final]
> at 
> org.apache.drill.exec.vector.UInt1Vector$Mutator.set(UInt1Vector.java:342) 
> ~[drill-java-exec-0.9.0-SNAPSHOT-rebuffed.jar:0.9.0-SNAPSHOT]
> at 
> org.apache.drill.exec.vector.NullableBigIntVector$Mutator.set(NullableBigIntVector.java:372)
>  ~[drill-java-exec-0.9.0-SNAPSHOT-rebuffed.jar:0.9.0-SNAPSHOT]
> at 
> org.apache.drill.exec.vector.NullableBigIntVector.copyFrom(NullableBigIntVector.java:284)
>  ~[drill-java-exec-0.9.0-SNAPSHOT-rebuffed.jar:0.9.0-SNAPSHOT]
> at 
> org.apache.drill.exec.test.generated.PartitionerGen4$OutgoingRecordBatch.doEval(PartitionerTemplate.java:370)
>  ~[na:na]
> at 
> org.apache.drill.exec.test.generated.PartitionerGen4$OutgoingRecordBatch.copy(PartitionerTemplate.java:249)
>  ~[na:na]
> at 
> org.apache.drill.exec.test.generated.PartitionerGen4.doCopy(PartitionerTemplate.java:208)
>  ~[na:na]
> at 
> org.apache.drill.exec.test.generated.PartitionerGen4.partitionBatch(PartitionerTemplate.java:176)
>  ~[na:na]
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8474) Add Daffodil Format Plugin

2024-04-27 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17841530#comment-17841530
 ] 

ASF GitHub Bot commented on DRILL-8474:
---

mbeckerle closed pull request #2836: DRILL-8474: Add Daffodil Format Plugin
URL: https://github.com/apache/drill/pull/2836




> Add Daffodil Format Plugin
> --
>
> Key: DRILL-8474
> URL: https://issues.apache.org/jira/browse/DRILL-8474
> Project: Apache Drill
>  Issue Type: New Feature
>Affects Versions: 1.21.1
>Reporter: Charles Givre
>Priority: Major
> Fix For: 1.22.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8474) Add Daffodil Format Plugin

2024-04-27 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17841531#comment-17841531
 ] 

ASF GitHub Bot commented on DRILL-8474:
---

mbeckerle commented on PR #2836:
URL: https://github.com/apache/drill/pull/2836#issuecomment-2081176156

   Creating a new squashed PR so as to avoid loss of the comments on this PR. 




> Add Daffodil Format Plugin
> --
>
> Key: DRILL-8474
> URL: https://issues.apache.org/jira/browse/DRILL-8474
> Project: Apache Drill
>  Issue Type: New Feature
>Affects Versions: 1.21.1
>Reporter: Charles Givre
>Priority: Major
> Fix For: 1.22.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8474) Add Daffodil Format Plugin

2024-04-27 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17841528#comment-17841528
 ] 

ASF GitHub Bot commented on DRILL-8474:
---

mbeckerle commented on PR #2836:
URL: https://github.com/apache/drill/pull/2836#issuecomment-2081164073

   This now passes all the daffodil contrib tests using the published official 
Daffodil 3.7.0.
   
   It does not yet run in any scalable fashion, but the metadata/data 
interfacing is complete. 
   
   I would like to squash this to a single commit before merging, and it needs 
to be tested rebased onto the latest Drill commit. 




> Add Daffodil Format Plugin
> --
>
> Key: DRILL-8474
> URL: https://issues.apache.org/jira/browse/DRILL-8474
> Project: Apache Drill
>  Issue Type: New Feature
>Affects Versions: 1.21.1
>Reporter: Charles Givre
>Priority: Major
> Fix For: 1.22.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8493) Drill Unable to Read XML Files with Namespaces

2024-04-26 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17841368#comment-17841368
 ] 

ASF GitHub Bot commented on DRILL-8493:
---

cgivre opened a new pull request, #2908:
URL: https://github.com/apache/drill/pull/2908

   # [DRILL-8493](https://issues.apache.org/jira/browse/DRILL-8493): Drill 
Unable to Read XML Files with Namespaces
   
   
   ## Description
   This PR fixes an issue whereby if an XML file has a namespace defined, Drill 
may ignore all data.
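   For context, the failing inputs are ordinary XML documents whose root 
element declares a default namespace, e.g. 
`<books xmlns="http://example.com/ns">...</books>` (contents illustrative).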
   
   
   ## Documentation
   No user facing changes.
   
   ## Testing
   Added unit test.




> Drill Unable to Read XML Files with Namespaces
> --
>
> Key: DRILL-8493
> URL: https://issues.apache.org/jira/browse/DRILL-8493
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Format - XML
>Affects Versions: 1.21.1
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Major
> Fix For: 1.21.2
>
>
> This is a bug fix whereby Drill ignores all data when an XML file has a 
> namespace.  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8489) Sender memory leak when rpc encode exception

2024-04-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17837949#comment-17837949
 ] 

ASF GitHub Bot commented on DRILL-8489:
---

shfshihuafeng opened a new pull request, #2901:
URL: https://github.com/apache/drill/pull/2901

   # [DRILL-8489](https://issues.apache.org/jira/browse/DRILL-8489): Sender 
memory leak when rpc encode exception
   
   ## Description
   
   When encode throws an exception, Netty can release the message if it is an 
instance of ReferenceCounted. But Drill converts the message to an 
OutboundRpcMessage, which Netty cannot release, so the underlying buffers leak. 
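   As background, the general Netty ownership contract is sketched below (a 
generic pattern with illustrative names, not Drill's actual encoder code): a 
handler that wraps a reference-counted payload in a plain envelope must 
release the payload itself when an exception prevents the write.
   ```java
   import io.netty.channel.ChannelHandlerContext;
   import io.netty.channel.ChannelPromise;
   import io.netty.util.ReferenceCountUtil;

   // Once a ReferenceCounted payload is wrapped in a non-ReferenceCounted
   // envelope, Netty can no longer release it on failure; the wrapper must.
   void safeWrite(ChannelHandlerContext ctx, Object envelope, Object payload,
                  ChannelPromise promise) {
     try {
       ctx.write(envelope, promise);
     } catch (RuntimeException e) {
       ReferenceCountUtil.release(payload); // no-op if payload is not ReferenceCounted
       throw e;
     }
   }
   ```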
   
   ## Documentation
   (Please describe user-visible changes similar to what should appear in the 
Drill documentation.)
   
   ## Testing
   1. export DRILL_MAX_DIRECT_MEMORY=${DRILL_MAX_DIRECT_MEMORY:-"2G"}
   2. tpch 1s
   3. tpch sql 8
   
   ```
   select
   o_year,
   sum(case when nation = 'CHINA' then volume else 0 end) / sum(volume) as 
mkt_share
   from (
   select
   extract(year from o_orderdate) as o_year,
   l_extendedprice * 1.0 as volume,
   n2.n_name as nation
   from hive.tpch1s.part, hive.tpch1s.supplier, hive.tpch1s.lineitem, 
hive.tpch1s.orders, hive.tpch1s.customer, hive.tpch1s.nation n1, 
hive.tpch1s.nation n2, hive.tpch1s.region
   where
   p_partkey = l_partkey
   and s_suppkey = l_suppkey
   and l_orderkey = o_orderkey
   and o_custkey = c_custkey
   and c_nationkey = n1.n_nationkey
   and n1.n_regionkey = r_regionkey
   and r_name = 'ASIA'
   and s_nationkey = n2.n_nationkey
   and o_orderdate between date '1995-01-01'
   and date '1996-12-31'
   and p_type = 'LARGE BRUSHED BRASS') as all_nations
   group by o_year
   order by o_year;   
   ```
   
   4. This scenario is relatively easy to reproduce by running the following 
script:
   ```sh
   drill_home=/data/shf/apache-drill-1.22.0-SNAPSHOT/bin
   fileName=/data/shf/1s/shf.txt
   
   random_sql(){
     #for i in `seq 1 3`
     while true
     do
       num=$((RANDOM%22+1))
       if [ -f $fileName ]; then
         echo "$fileName exists"   # stop flag file found: stop launching queries
         exit 0
       else
         $drill_home/sqlline -u "jdbc:drill:zk=jupiter-2:2181/drill_shf/jupiterbits_shf1" -f tpch_sql8.sql >> sql8.log 2>&1
       fi
     done
   }
   
   main(){
     unset HADOOP_CLASSPATH
     # TPCH power test: 25 concurrent query loops
     for i in `seq 1 25`
     do
       random_sql &
     done
   }
   
   main   # start the test
   ```




> Sender memory leak when rpc encode exception
> 
>
> Key: DRILL-8489
> URL: https://issues.apache.org/jira/browse/DRILL-8489
> Project: Apache Drill
>  Issue Type: Bug
>  Components:  Server
>Affects Versions: 1.21.1
>Reporter: shihuafeng
>Priority: Major
> Fix For: 1.22.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8488) HashJoinPOP memory leak is caused by OutOfMemoryException

2024-04-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17837525#comment-17837525
 ] 

ASF GitHub Bot commented on DRILL-8488:
---

shfshihuafeng opened a new pull request, #2900:
URL: https://github.com/apache/drill/pull/2900

   # [DRILL-8488](https://issues.apache.org/jira/browse/DRILL-8488): 
HashJoinPOP memory leak is caused by  OutOfMemoryException
   
   
   ## Description
   
   We should catch Drill's OutOfMemoryException instead of OutOfMemoryError.
   
   ```java
   public DrillBuf buffer(final int initialRequestSize, BufferManager manager) {
     assertOpen();
   
     Preconditions.checkArgument(initialRequestSize >= 0, "the requested size must be non-negative");
   
     if (initialRequestSize == 0) {
       return empty;
     }
   
     // round to next largest power of two if we're within a chunk since that is how our allocator operates
     final int actualRequestSize = initialRequestSize < CHUNK_SIZE ?
         nextPowerOfTwo(initialRequestSize)
         : initialRequestSize;
     AllocationOutcome outcome = allocateBytes(actualRequestSize);
     if (!outcome.isOk()) {
       // note: throws OutOfMemoryException, not OutOfMemoryError
       throw new OutOfMemoryException(createErrorMsg(this, actualRequestSize, initialRequestSize));
     }
   }
   ```
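   A minimal sketch of the caller-side pattern this fix applies (the cleanup 
hook below is hypothetical and stands in for releasing partially built 
vectors):
   ```java
   // Sketch: catch Drill's OutOfMemoryException (an unchecked allocator
   // exception), not java.lang.OutOfMemoryError (true JVM heap exhaustion).
   DrillBuf readBuffer(BufferAllocator allocator, int dataLength) {
     try {
       return allocator.buffer(dataLength);
     } catch (OutOfMemoryException e) {
       releasePartiallyBuiltVectors(); // hypothetical cleanup hook
       throw UserException.memoryError(e).message("Allocation failed").build(logger);
     }
   }
   ```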
   
   ## Documentation
   (Please describe user-visible changes similar to what should appear in the 
Drill documentation.)
   
   ## Testing
   [DRILL-8485](https://issues.apache.org/jira/browse/DRILL-8485)
   




> HashJoinPOP memory leak is caused by  OutOfMemoryException
> --
>
> Key: DRILL-8488
> URL: https://issues.apache.org/jira/browse/DRILL-8488
> Project: Apache Drill
>  Issue Type: Bug
>  Components:  Server
>Affects Versions: 1.21.1
>Reporter: shihuafeng
>Priority: Major
> Fix For: 1.22.0
>
>
> [DRILL-8485|[DRILL-8485] HashJoinPOP memory leak is caused by an oom 
> exception when read data from InputStream - ASF JIRA (apache.org)] 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8480) Cleanup before finished. 0 out of 1 streams have finished

2024-04-10 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835829#comment-17835829
 ] 

ASF GitHub Bot commented on DRILL-8480:
---

rymarm commented on PR #2897:
URL: https://github.com/apache/drill/pull/2897#issuecomment-2048081465

   @cgivre Actually, there is one more thing that I would fix in the scope of 
this PR:
   
https://github.com/apache/drill/blob/a726a4544dfbf1427f41fb916d3d976bd511189b/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/NestedLoopJoinBatch.java#L396-L399
   
   These few lines seem to be a kludge that may not work in some very rare 
cases, but I forgot what issue occurs if we remove them. 
   I would like to take a look at this, but I can do it in a separate PR 
because the current issue is completely fixed by the current changes. 




> Cleanup before finished. 0 out of 1 streams have finished
> -
>
> Key: DRILL-8480
> URL: https://issues.apache.org/jira/browse/DRILL-8480
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.21.1
>Reporter: Maksym Rymar
>Assignee: Maksym Rymar
>Priority: Major
> Attachments: 1a349ff1-d1f9-62bf-ed8c-26346c548005.sys.drill, 
> tableWithNumber2.parquet
>
>
> Drill fails to execute a query with the following exception:
> {code:java}
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
> IllegalStateException: Cleanup before finished. 0 out of 1 streams have 
> finished
> Fragment: 1:0
> Please, refer to logs for more information.
> [Error Id: 270da8f4-0bb6-4985-bf4f-34853138881c on 
> compute7.vmcluster.com:31010]
>         at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:657)
>         at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:395)
>         at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:245)
>         at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:362)
>         at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>         at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>         at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>         at java.base/java.lang.Thread.run(Thread.java:829)
> Caused by: java.lang.IllegalStateException: Cleanup before finished. 0 out of 
> 1 streams have finished
>         at 
> org.apache.drill.exec.work.batch.BaseRawBatchBuffer.close(BaseRawBatchBuffer.java:111)
>         at 
> org.apache.drill.common.AutoCloseables.close(AutoCloseables.java:91)
>         at 
> org.apache.drill.common.AutoCloseables.close(AutoCloseables.java:71)
>         at 
> org.apache.drill.exec.work.batch.AbstractDataCollector.close(AbstractDataCollector.java:121)
>         at 
> org.apache.drill.common.AutoCloseables.close(AutoCloseables.java:91)
>         at 
> org.apache.drill.exec.work.batch.IncomingBuffers.close(IncomingBuffers.java:144)
>         at 
> org.apache.drill.exec.ops.FragmentContextImpl.suppressingClose(FragmentContextImpl.java:581)
>         at 
> org.apache.drill.exec.ops.FragmentContextImpl.close(FragmentContextImpl.java:567)
>         at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.closeOutResources(FragmentExecutor.java:417)
>         at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:240)
>         ... 5 common frames omitted
>         Suppressed: java.lang.IllegalStateException: Cleanup before finished. 
> 0 out of 1 streams have finished
>                 ... 15 common frames omitted
>         Suppressed: java.lang.IllegalStateException: Memory was leaked by 
> query. Memory leaked: (32768)
> Allocator(op:1:0:8:UnorderedReceiver) 100/32768/32768/100 
> (res/actual/peak/limit)
>                 at 
> org.apache.drill.exec.memory.BaseAllocator.close(BaseAllocator.java:519)
>                 at 
> org.apache.drill.exec.ops.BaseOperatorContext.close(BaseOperatorContext.java:159)
>                 at 
> org.apache.drill.exec.ops.OperatorContextImpl.close(OperatorContextImpl.java:77)
>                 at 
> org.apache.drill.exec.ops.FragmentContextImpl.suppressingClose(FragmentContextImpl.java:581)
>                 at 
> org.apache.drill.exec.ops.FragmentContextImpl.close(FragmentContextImpl.java:571)
>                 ... 7 common frames omitted
>         Suppressed: java.lang.IllegalStateException: Memory was leaked by 
> query. Memory leaked: (1016640)
> Allocator(frag:1:0) 3000/1016640/30016640/90715827882 
> (res/actual/peak/limit)
>                 at 
> org.apache.drill.exec.memory.BaseAllocator.close(BaseAllocator.java:519)
>                 at 
> 

[jira] [Commented] (DRILL-8480) Cleanup before finished. 0 out of 1 streams have finished

2024-04-10 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835737#comment-17835737
 ] 

ASF GitHub Bot commented on DRILL-8480:
---

cgivre commented on PR #2897:
URL: https://github.com/apache/drill/pull/2897#issuecomment-2047593214

   @rymarm What is the status of this PR?  Is it ready for merging?




> Cleanup before finished. 0 out of 1 streams have finished
> -
>
> Key: DRILL-8480
> URL: https://issues.apache.org/jira/browse/DRILL-8480
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.21.1
>Reporter: Maksym Rymar
>Assignee: Maksym Rymar
>Priority: Major
> Attachments: 1a349ff1-d1f9-62bf-ed8c-26346c548005.sys.drill, 
> tableWithNumber2.parquet
>
>
> Drill fails to execute a query with the following exception:
> {code:java}
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
> IllegalStateException: Cleanup before finished. 0 out of 1 streams have 
> finished
> Fragment: 1:0
> Please, refer to logs for more information.
> [Error Id: 270da8f4-0bb6-4985-bf4f-34853138881c on 
> compute7.vmcluster.com:31010]
>         at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:657)
>         at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:395)
>         at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:245)
>         at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:362)
>         at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>         at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>         at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>         at java.base/java.lang.Thread.run(Thread.java:829)
> Caused by: java.lang.IllegalStateException: Cleanup before finished. 0 out of 
> 1 streams have finished
>         at 
> org.apache.drill.exec.work.batch.BaseRawBatchBuffer.close(BaseRawBatchBuffer.java:111)
>         at 
> org.apache.drill.common.AutoCloseables.close(AutoCloseables.java:91)
>         at 
> org.apache.drill.common.AutoCloseables.close(AutoCloseables.java:71)
>         at 
> org.apache.drill.exec.work.batch.AbstractDataCollector.close(AbstractDataCollector.java:121)
>         at 
> org.apache.drill.common.AutoCloseables.close(AutoCloseables.java:91)
>         at 
> org.apache.drill.exec.work.batch.IncomingBuffers.close(IncomingBuffers.java:144)
>         at 
> org.apache.drill.exec.ops.FragmentContextImpl.suppressingClose(FragmentContextImpl.java:581)
>         at 
> org.apache.drill.exec.ops.FragmentContextImpl.close(FragmentContextImpl.java:567)
>         at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.closeOutResources(FragmentExecutor.java:417)
>         at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:240)
>         ... 5 common frames omitted
>         Suppressed: java.lang.IllegalStateException: Cleanup before finished. 
> 0 out of 1 streams have finished
>                 ... 15 common frames omitted
>         Suppressed: java.lang.IllegalStateException: Memory was leaked by 
> query. Memory leaked: (32768)
> Allocator(op:1:0:8:UnorderedReceiver) 100/32768/32768/100 
> (res/actual/peak/limit)
>                 at 
> org.apache.drill.exec.memory.BaseAllocator.close(BaseAllocator.java:519)
>                 at 
> org.apache.drill.exec.ops.BaseOperatorContext.close(BaseOperatorContext.java:159)
>                 at 
> org.apache.drill.exec.ops.OperatorContextImpl.close(OperatorContextImpl.java:77)
>                 at 
> org.apache.drill.exec.ops.FragmentContextImpl.suppressingClose(FragmentContextImpl.java:581)
>                 at 
> org.apache.drill.exec.ops.FragmentContextImpl.close(FragmentContextImpl.java:571)
>                 ... 7 common frames omitted
>         Suppressed: java.lang.IllegalStateException: Memory was leaked by 
> query. Memory leaked: (1016640)
> Allocator(frag:1:0) 3000/1016640/30016640/90715827882 
> (res/actual/peak/limit)
>                 at 
> org.apache.drill.exec.memory.BaseAllocator.close(BaseAllocator.java:519)
>                 at 
> org.apache.drill.exec.ops.FragmentContextImpl.suppressingClose(FragmentContextImpl.java:581)
>                 at 
> org.apache.drill.exec.ops.FragmentContextImpl.close(FragmentContextImpl.java:574)
>                 ... 7 common frames omitted {code}
> Steps to reproduce:
>   1.Enable unequal join:
> {code:java}
> alter session set `planner.enable_nljoin_for_scalar_only`=false; {code}
>   2. Disable join optimization to prevent Drill from flipping sides of 
> join that may break 

[jira] [Commented] (DRILL-8486) ParquetDecodingException: could not read bytes at offset

2024-04-10 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835735#comment-17835735
 ] 

ASF GitHub Bot commented on DRILL-8486:
---

cgivre merged PR #2898:
URL: https://github.com/apache/drill/pull/2898




> ParquetDecodingException: could not read bytes at offset 
> -
>
> Key: DRILL-8486
> URL: https://issues.apache.org/jira/browse/DRILL-8486
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Affects Versions: 1.21.1
>Reporter: Maksym Rymar
>Assignee: Maksym Rymar
>Priority: Major
> Attachments: test.parquet
>
>
> Drill fails to read a parquet file with the following exception:
>  
> {code:java}
> Caused by: org.apache.parquet.io.ParquetDecodingException: could not read 
> bytes at offset 591804
>   at 
> org.apache.parquet.column.values.plain.BinaryPlainValuesReader.readBytes(BinaryPlainValuesReader.java:42)
>   at 
> org.apache.drill.exec.store.parquet.columnreaders.VarLenColumnBulkInput$ValuesReaderWrapper.getNextEntry(VarLenColumnBulkInput.java:754)
>   ... 43 common frames omitted
> Caused by: java.io.EOFException: null
>   at 
> org.apache.parquet.bytes.SingleBufferInputStream.read(SingleBufferInputStream.java:52)
>   at 
> org.apache.parquet.bytes.BytesUtils.readIntLittleEndian(BytesUtils.java:83)
>   at 
> org.apache.parquet.column.values.plain.BinaryPlainValuesReader.readBytes(BinaryPlainValuesReader.java:39)
>   ... 44 common frames omitted {code}
>  
>  
> This issue only affects queries with {{store.parquet.flat.reader.bulk}} set 
> to {{{}true{}}}(by default).
> Attaching the parquet file for the reproduce: [^test.parquet].
> Query: {{select log, app_name from dfs.tmp.`test.parquet`}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8483) SpilledRecordBatch memory leak when the program threw an exception during the process of building a hash table

2024-03-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17832322#comment-17832322
 ] 

ASF GitHub Bot commented on DRILL-8483:
---

cgivre merged PR #2888:
URL: https://github.com/apache/drill/pull/2888




> SpilledRecordBatch memory leak when the program threw an exception during the 
> process of building a hash table
> --
>
> Key: DRILL-8483
> URL: https://issues.apache.org/jira/browse/DRILL-8483
> Project: Apache Drill
>  Issue Type: Bug
>  Components:  Server
>Affects Versions: 1.21.1
>Reporter: shihuafeng
>Priority: Major
> Fix For: 1.21.2
>
>
> During the process of reading data from disk and building hash tables in 
> memory, if an exception is thrown, it will result in a SpilledRecordBatch 
> memory leak.
> The exception log is as follows:
> {code:java}
> Caused by: org.apache.drill.exec.exception.OutOfMemoryException: Unable to 
> allocate buffer of size 8192 due to memory limit (41943040). Current 
> allocation: 3684352
>         at 
> org.apache.drill.exec.memory.BaseAllocator.buffer(BaseAllocator.java:241)
>         at 
> org.apache.drill.exec.memory.BaseAllocator.buffer(BaseAllocator.java:216)
>         at 
> org.apache.drill.exec.vector.VarCharVector.allocateNew(VarCharVector.java:411)
>         at 
> org.apache.drill.exec.vector.NullableVarCharVector.allocateNew(NullableVarCharVector.java:270)
>         at 
> org.apache.drill.exec.physical.impl.common.HashPartition.allocateNewVectorContainer(HashPartition.java:215)
>         at 
> org.apache.drill.exec.physical.impl.common.HashPartition.allocateNewCurrentBatchAndHV(HashPartition.java:238)
>         at 
> org.apache.drill.exec.physical.impl.common.HashPartition.(HashPartition.java:165){code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8484) HashJoinPOP memory leak is caused by an oom exception when read data from Stream with container

2024-03-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17832321#comment-17832321
 ] 

ASF GitHub Bot commented on DRILL-8484:
---

cgivre merged PR #2889:
URL: https://github.com/apache/drill/pull/2889




> HashJoinPOP memory leak is caused by  an oom exception when read data from 
> Stream with container 
> -
>
> Key: DRILL-8484
> URL: https://issues.apache.org/jira/browse/DRILL-8484
> Project: Apache Drill
>  Issue Type: Bug
>  Components:  Server
>Affects Versions: 1.21.1
>Reporter: shihuafeng
>Priority: Major
> Fix For: 1.22.0
>
>
> *Describe the bug*
> An OOM exception occurred when reading data from a stream with a container, 
> resulting in a HashJoinPOP memory leak.
> *To Reproduce*
> Prepare data for TPC-H 1s, then:
>  # run 30 concurrent copies of TPC-H SQL 8
>  # set direct memory to 5g
>  # when an OutOfMemoryException occurred, stop all SQL
>  # check for the memory leak
> *leak  info* 
> {code:java}
>    Allocator(frag:5:0) 500/100/31067136/40041943040 
> (res/actual/peak/limit)
>       child allocators: 1
>         Allocator(op:5:0:1:HashJoinPOP) 100/16384/22822912/41943040 
> (res/actual/peak/limit)
>           child allocators: 0
>           ledgers: 2
>             ledger[1882757] allocator: op:5:0:1:HashJoinPOP), isOwning: true, 
> size: 8192, references: 2, life: 16936270178816167..0, allocatorManager: 
> [1703465, life: 16936270178813617..0] holds 4 buffers.
>                 DrillBuf[2041995], udle: [1703441 0..957]{code}
> {code:java}
>  {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8486) ParquetDecodingException: could not read bytes at offset

2024-03-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17831924#comment-17831924
 ] 

ASF GitHub Bot commented on DRILL-8486:
---

rymarm opened a new pull request, #2898:
URL: https://github.com/apache/drill/pull/2898

   # [DRILL-8486](https://issues.apache.org/jira/browse/DRILL-8486): fix 
handling of long variable length entries during bulk parquet reading
   
   ## Description
   
   During a bulk read of a Parquet file, Drill improperly handles a long 
variable-length entry. Drill reads the value, but after finding that it cannot 
fit the value into the current batch, it just moves on without persisting the 
read value. Since the value was not pushed back to the reader object, the 
total-read and left-to-read record counts are left in an improper state, which 
causes data reading to fail later.
   
   This issue hasn't surfaced before because the conditions to get into this 
state are rare.
   
   **Solution**
   
   Push the value back to the reader object so that it is read in the next 
iteration if the current batch can't hold it.
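   A hedged sketch of the idea (the names below are illustrative, not the 
actual `VarLenColumnBulkInput` fields):
   ```java
   // If the decoded entry does not fit into the current batch, push it back
   // so the next iteration re-reads it instead of silently dropping it.
   if (entryLength > remainingBatchBytes) {
     valuesReader.pushBack(entry); // hypothetical: restores the reader position
     return null;                  // finish this batch; the entry is handled next time
   }
   ```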
   
   ## Documentation
\-
   
   ## Testing
   Manual testing with a parquet file from the Jira ticket: 
[DRILL-8486](https://issues.apache.org/jira/browse/DRILL-8486). It's hard to 
reproduce this particular issue with random data.
   




> ParquetDecodingException: could not read bytes at offset 
> -
>
> Key: DRILL-8486
> URL: https://issues.apache.org/jira/browse/DRILL-8486
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Affects Versions: 1.21.1
>Reporter: Maksym Rymar
>Assignee: Maksym Rymar
>Priority: Major
> Attachments: test.parquet
>
>
> Drill fails to read a parquet file with the following exception:
>  
> {code:java}
> Caused by: org.apache.parquet.io.ParquetDecodingException: could not read 
> bytes at offset 591804
>   at 
> org.apache.parquet.column.values.plain.BinaryPlainValuesReader.readBytes(BinaryPlainValuesReader.java:42)
>   at 
> org.apache.drill.exec.store.parquet.columnreaders.VarLenColumnBulkInput$ValuesReaderWrapper.getNextEntry(VarLenColumnBulkInput.java:754)
>   ... 43 common frames omitted
> Caused by: java.io.EOFException: null
>   at 
> org.apache.parquet.bytes.SingleBufferInputStream.read(SingleBufferInputStream.java:52)
>   at 
> org.apache.parquet.bytes.BytesUtils.readIntLittleEndian(BytesUtils.java:83)
>   at 
> org.apache.parquet.column.values.plain.BinaryPlainValuesReader.readBytes(BinaryPlainValuesReader.java:39)
>   ... 44 common frames omitted {code}
>  
>  
> This issue only affects queries with {{store.parquet.flat.reader.bulk}} set 
> to {{{}true{}}}(by default).
> Attaching the parquet file for the reproduce: [^test.parquet].
> Query: {{select log, app_name from dfs.tmp.`test.parquet`}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8480) Cleanup before finished. 0 out of 1 streams have finished

2024-03-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17831903#comment-17831903
 ] 

ASF GitHub Bot commented on DRILL-8480:
---

rymarm opened a new pull request, #2897:
URL: https://github.com/apache/drill/pull/2897

   
   
   # [DRILL-8480](https://issues.apache.org/jira/browse/DRILL-8480): Make 
Nested Loop Join operator properly process empty batches and batches with new 
schema
   
   ## Description
   The Nested Loop Join operator (`NestedLoopJoinBatch`, `NestedLoopJoin`) 
improperly handles the batch iteration outcome `OK` with 0 records. Drill's 
design for batch processing involves five states:
   * `NONE` (batch can have only 0 records)
   * `OK` (batch can have 0+ records)
   * `OK_NEW_SCHEMA` (batch can have 0+ records)
   * `NOT_YET` (undefined)
   * `EMIT` (batch can have 0+ records)
   
   In some circumstances the Nested Loop Join operator could receive an `OK` 
outcome with 0 records, and instead of requesting the next batch, the operator 
stopped data processing and returned the `NONE` outcome to upstream 
batches (operators) without freeing the resources of underlying batches.
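   A simplified sketch of the corrected handling (not the actual 
`NestedLoopJoinBatch` code): an `OK` outcome with 0 records should mean "ask 
for the next batch", not "the stream is finished".
   ```java
   IterOutcome outcome = next(right);
   while (outcome == IterOutcome.OK && right.getRecordCount() == 0) {
     outcome = next(right); // skip empty batches instead of returning NONE
   }
   ```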
   
   
   ## Documentation
   -
   
   ## Testing
   Manual testing with a file from the Jira ticket 
[DRILL-8480](https://issues.apache.org/jira/browse/DRILL-8480)




> Cleanup before finished. 0 out of 1 streams have finished
> -
>
> Key: DRILL-8480
> URL: https://issues.apache.org/jira/browse/DRILL-8480
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.21.1
>Reporter: Maksym Rymar
>Assignee: Maksym Rymar
>Priority: Major
> Attachments: 1a349ff1-d1f9-62bf-ed8c-26346c548005.sys.drill, 
> tableWithNumber2.parquet
>
>
> Drill fails to execute a query with the following exception:
> {code:java}
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
> IllegalStateException: Cleanup before finished. 0 out of 1 streams have 
> finished
> Fragment: 1:0
> Please, refer to logs for more information.
> [Error Id: 270da8f4-0bb6-4985-bf4f-34853138881c on 
> compute7.vmcluster.com:31010]
>         at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:657)
>         at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:395)
>         at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:245)
>         at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:362)
>         at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>         at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>         at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>         at java.base/java.lang.Thread.run(Thread.java:829)
> Caused by: java.lang.IllegalStateException: Cleanup before finished. 0 out of 
> 1 streams have finished
>         at 
> org.apache.drill.exec.work.batch.BaseRawBatchBuffer.close(BaseRawBatchBuffer.java:111)
>         at 
> org.apache.drill.common.AutoCloseables.close(AutoCloseables.java:91)
>         at 
> org.apache.drill.common.AutoCloseables.close(AutoCloseables.java:71)
>         at 
> org.apache.drill.exec.work.batch.AbstractDataCollector.close(AbstractDataCollector.java:121)
>         at 
> org.apache.drill.common.AutoCloseables.close(AutoCloseables.java:91)
>         at 
> org.apache.drill.exec.work.batch.IncomingBuffers.close(IncomingBuffers.java:144)
>         at 
> org.apache.drill.exec.ops.FragmentContextImpl.suppressingClose(FragmentContextImpl.java:581)
>         at 
> org.apache.drill.exec.ops.FragmentContextImpl.close(FragmentContextImpl.java:567)
>         at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.closeOutResources(FragmentExecutor.java:417)
>         at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:240)
>         ... 5 common frames omitted
>         Suppressed: java.lang.IllegalStateException: Cleanup before finished. 
> 0 out of 1 streams have finished
>                 ... 15 common frames omitted
>         Suppressed: java.lang.IllegalStateException: Memory was leaked by 
> query. Memory leaked: (32768)
> Allocator(op:1:0:8:UnorderedReceiver) 100/32768/32768/100 
> (res/actual/peak/limit)
>                 at 
> org.apache.drill.exec.memory.BaseAllocator.close(BaseAllocator.java:519)
>                 at 
> org.apache.drill.exec.ops.BaseOperatorContext.close(BaseOperatorContext.java:159)
>                 at 
> org.apache.drill.exec.ops.OperatorContextImpl.close(OperatorContextImpl.java:77)
>                 at 
> org.apache.drill.exec.ops.FragmentContextImpl.suppressingClose(FragmentContextImpl.java:581)
>        

[jira] [Commented] (DRILL-8484) HashJoinPOP memory leak is caused by an oom exception when read data from Stream with container

2024-03-26 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17831155#comment-17831155
 ] 

ASF GitHub Bot commented on DRILL-8484:
---

shfshihuafeng commented on PR #2889:
URL: https://github.com/apache/drill/pull/2889#issuecomment-2021914479

   > @shfshihuafeng Can you please resolve merge conflicts.
   
   It is done.




> HashJoinPOP memory leak is caused by  an oom exception when read data from 
> Stream with container 
> -
>
> Key: DRILL-8484
> URL: https://issues.apache.org/jira/browse/DRILL-8484
> Project: Apache Drill
>  Issue Type: Bug
>  Components:  Server
>Affects Versions: 1.21.1
>Reporter: shihuafeng
>Priority: Major
> Fix For: 1.22.0
>
>
> *Describe the bug*
> An OOM exception occurred when reading data from a stream with a container, 
> resulting in a HashJoinPOP memory leak.
> *To Reproduce*
> Prepare data for TPC-H 1s, then:
>  # run 30 concurrent copies of TPC-H SQL 8
>  # set direct memory to 5g
>  # when an OutOfMemoryException occurred, stop all SQL
>  # check for the memory leak
> *leak  info* 
> {code:java}
>    Allocator(frag:5:0) 500/100/31067136/40041943040 
> (res/actual/peak/limit)
>       child allocators: 1
>         Allocator(op:5:0:1:HashJoinPOP) 100/16384/22822912/41943040 
> (res/actual/peak/limit)
>           child allocators: 0
>           ledgers: 2
>             ledger[1882757] allocator: op:5:0:1:HashJoinPOP), isOwning: true, 
> size: 8192, references: 2, life: 16936270178816167..0, allocatorManager: 
> [1703465, life: 16936270178813617..0] holds 4 buffers.
>                 DrillBuf[2041995], udle: [1703441 0..957]{code}
> {code:java}
>  {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8484) HashJoinPOP memory leak is caused by an oom exception when read data from Stream with container

2024-03-26 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17830942#comment-17830942
 ] 

ASF GitHub Bot commented on DRILL-8484:
---

cgivre commented on PR #2889:
URL: https://github.com/apache/drill/pull/2889#issuecomment-2020523775

   @shfshihuafeng Can you please resolve merge conflicts.




> HashJoinPOP memory leak is caused by  an oom exception when read data from 
> Stream with container 
> -
>
> Key: DRILL-8484
> URL: https://issues.apache.org/jira/browse/DRILL-8484
> Project: Apache Drill
>  Issue Type: Bug
>  Components:  Server
>Affects Versions: 1.21.1
>Reporter: shihuafeng
>Priority: Major
> Fix For: 1.22.0
>
>
> *Describe the bug*
> An OOM exception occurred when reading data from a stream with a container, 
> resulting in a HashJoinPOP memory leak.
> *To Reproduce*
> Prepare data for TPC-H 1s, then:
>  # run 30 concurrent copies of TPC-H SQL 8
>  # set direct memory to 5g
>  # when an OutOfMemoryException occurred, stop all SQL
>  # check for the memory leak
> *leak  info* 
> {code:java}
>    Allocator(frag:5:0) 500/100/31067136/40041943040 
> (res/actual/peak/limit)
>       child allocators: 1
>         Allocator(op:5:0:1:HashJoinPOP) 100/16384/22822912/41943040 
> (res/actual/peak/limit)
>           child allocators: 0
>           ledgers: 2
>             ledger[1882757] allocator: op:5:0:1:HashJoinPOP), isOwning: true, 
> size: 8192, references: 2, life: 16936270178816167..0, allocatorManager: 
> [1703465, life: 16936270178813617..0] holds 4 buffers.
>                 DrillBuf[2041995], udle: [1703441 0..957]{code}
> {code:java}
>  {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8485) HashJoinPOP memory leak is caused by an oom exception when read data from InputStream

2024-03-24 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17830330#comment-17830330
 ] 

ASF GitHub Bot commented on DRILL-8485:
---

shfshihuafeng commented on PR #2891:
URL: https://github.com/apache/drill/pull/2891#issuecomment-2017129746

   > LGTM +1 Thanks @shfshihuafeng for all these memory leak fixes.
   
   I am honored to get your approval.




> HashJoinPOP memory leak is caused by an oom exception when read data from 
> InputStream
> -
>
> Key: DRILL-8485
> URL: https://issues.apache.org/jira/browse/DRILL-8485
> Project: Apache Drill
>  Issue Type: Bug
>  Components:  Server
>Affects Versions: 1.21.1
>Reporter: shihuafeng
>Priority: Major
> Fix For: 1.21.1
>
>
> When traversing the fieldList while reading data from an InputStream, if the 
> intermediate process throws an exception, we cannot release the previously 
> constructed vectors. This results in a memory leak.
> It is similar to DRILL-8484.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8485) HashJoinPOP memory leak is caused by an oom exception when read data from InputStream

2024-03-24 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17830305#comment-17830305
 ] 

ASF GitHub Bot commented on DRILL-8485:
---

cgivre merged PR #2891:
URL: https://github.com/apache/drill/pull/2891




> HashJoinPOP memory leak is caused by an oom exception when read data from 
> InputStream
> -
>
> Key: DRILL-8485
> URL: https://issues.apache.org/jira/browse/DRILL-8485
> Project: Apache Drill
>  Issue Type: Bug
>  Components:  Server
>Affects Versions: 1.21.1
>Reporter: shihuafeng
>Priority: Major
> Fix For: 1.21.1
>
>
> When traversing the fieldList while reading data from an InputStream, if the 
> intermediate process throws an exception, we cannot release the previously 
> constructed vectors. This results in a memory leak.
> It is similar to DRILL-8484.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8485) HashJoinPOP memory leak is caused by an oom exception when read data from InputStream

2024-03-24 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17830261#comment-17830261
 ] 

ASF GitHub Bot commented on DRILL-8485:
---

shfshihuafeng opened a new pull request, #2891:
URL: https://github.com/apache/drill/pull/2891

   …n read data from InputStream
   
   # [DRILL-8485](https://issues.apache.org/jira/browse/DRILL-8485): 
HashJoinPOP memory leak is caused by an oom exception when read data from 
InputStream
   
   ## Description
   
   it is similar to 
[DRILL-8484](https://issues.apache.org/jira/browse/DRILL-8484)
   
   **exception info** 
   ```
   Caused by: org.apache.drill.exec.exception.OutOfMemoryException: Unable to 
allocate buffer of size 16384 (rounded from 15364) due to memory limit 
(41943040). Current allocation: 4337664
   at 
org.apache.drill.exec.memory.BaseAllocator.buffer(BaseAllocator.java:241)
   at 
org.apache.drill.exec.memory.BaseAllocator.buffer(BaseAllocator.java:216)
   at 
org.apache.drill.exec.memory.BaseAllocator.read(BaseAllocator.java:856)
   ```
   **leak  info**
   
   ```
 Allocator(frag:5:1) 500/100/27824128/40041943040 
(res/actual/peak/limit)
 child allocators: 1
   Allocator(op:5:1:1:HashJoinPOP) 100/16384/22822912/41943040 
(res/actual/peak/limit)
 child allocators: 0
 ledgers: 2
   ledger[442780] allocator: op:5:1:1:HashJoinPOP), isOwning: true, 
size: 8192, references: 2, life: 4486836603491..0, allocatorManager: [390894, 
life: 4486836601180..0] holds 4 buffers. 
   DrillBuf[458469], udle: [390895 1024..8192]
event log for: DrillBuf[458469]
   ```
   ## Documentation
   (Please describe user-visible changes similar to what should appear in the 
Drill documentation.)
   
   ## Testing
   The testing method for DRILL-8485 is similar to that for 
[DRILL-8484](https://issues.apache.org/jira/browse/DRILL-8484): we can throw 
an exception in the readVectors method.
   




> HashJoinPOP memory leak is caused by an oom exception when read data from 
> InputStream
> -
>
> Key: DRILL-8485
> URL: https://issues.apache.org/jira/browse/DRILL-8485
> Project: Apache Drill
>  Issue Type: Bug
>  Components:  Server
>Affects Versions: 1.21.1
>Reporter: shihuafeng
>Priority: Major
> Fix For: 1.21.1
>
>
> When traversing the fieldList while reading data from an InputStream, if the 
> intermediate process throws an exception, we cannot release the previously 
> constructed vectors. This results in a memory leak.
> It is similar to DRILL-8484.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8484) HashJoinPOP memory leak is caused by an oom exception when read data from Stream with container

2024-03-18 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17828185#comment-17828185
 ] 

ASF GitHub Bot commented on DRILL-8484:
---

shfshihuafeng commented on code in PR #2889:
URL: https://github.com/apache/drill/pull/2889#discussion_r1529743261


##
exec/java-exec/src/main/java/org/apache/drill/exec/cache/VectorAccessibleSerializable.java:
##
@@ -155,12 +157,18 @@ public void readFromStreamWithContainer(VectorContainer 
myContainer, InputStream
 for (SerializedField metaData : fieldList) {
   final int dataLength = metaData.getBufferLength();
   final MaterializedField field = MaterializedField.create(metaData);
-  final DrillBuf buf = allocator.buffer(dataLength);
-  final ValueVector vector;
+  DrillBuf buf = null;
+  ValueVector vector = null;
   try {
+buf = allocator.buffer(dataLength);
 buf.writeBytes(input, dataLength);
 vector = TypeHelper.getNewVector(field, allocator);
 vector.load(metaData, buf);
+  } catch (OutOfMemoryException oom) {
+for (ValueVector valueVector : vectorList) {
+  valueVector.clear();
+}
+throw UserException.memoryError(oom).message("Allocator memory 
failed").build(logger);

Review Comment:
 When we prepare to allocate memory with `allocator.buffer(dataLength)` for 
the HashJoinPOP allocator, and the actual memory exceeds maxAllocation (a 
parameter calculated by calling computeOperatorMemory), an exception is 
thrown, as in my test below.
 Users can adjust the direct memory parameter (DRILL_MAX_DIRECT_MEMORY) or 
reduce concurrency based on actual conditions. 
   
   **throw exception code**
   ```java
   public DrillBuf buffer(final int initialRequestSize, BufferManager manager) {
     assertOpen();
     // ... (argument checks and size rounding elided) ...
     AllocationOutcome outcome = allocateBytes(actualRequestSize);
     if (!outcome.isOk()) {
       throw new OutOfMemoryException(createErrorMsg(this, actualRequestSize, initialRequestSize));
     }
   ```
   **my test scenario**
   
   ```
   Caused by: org.apache.drill.exec.exception.OutOfMemoryException: Unable to 
allocate buffer of size 16384 (rounded from 14359) due to memory limit 
(41943040). Current allocation: 22583616
   at 
org.apache.drill.exec.memory.BaseAllocator.buffer(BaseAllocator.java:241)
   at 
org.apache.drill.exec.memory.BaseAllocator.buffer(BaseAllocator.java:216)
   at 
org.apache.drill.exec.cache.VectorAccessibleSerializable.readFromStreamWithContainer(VectorAccessibleSerializable.java:172)
   ```
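   For example (illustrative value), direct memory can be raised with 
`export DRILL_MAX_DIRECT_MEMORY="8G"` in `drill-env.sh` before starting the 
drillbit.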





> HashJoinPOP memory leak is caused by  an oom exception when read data from 
> Stream with container 
> -
>
> Key: DRILL-8484
> URL: https://issues.apache.org/jira/browse/DRILL-8484
> Project: Apache Drill
>  Issue Type: Bug
>  Components:  Server
>Affects Versions: 1.21.1
>Reporter: shihuafeng
>Priority: Major
> Fix For: 1.22.0
>
>
> *Describe the bug*
> An OOM exception occurred when reading data from a stream with a container, 
> resulting in a HashJoinPOP memory leak.
> *To Reproduce*
> Prepare data for TPC-H 1s, then:
>  # run 30 concurrent copies of TPC-H SQL 8
>  # set direct memory to 5g
>  # when an OutOfMemoryException occurred, stop all SQL
>  # check for the memory leak
> *leak  info* 
> {code:java}
>    Allocator(frag:5:0) 500/100/31067136/40041943040 
> (res/actual/peak/limit)
>       child allocators: 1
>         Allocator(op:5:0:1:HashJoinPOP) 100/16384/22822912/41943040 
> (res/actual/peak/limit)
>           child allocators: 0
>           ledgers: 2
>             ledger[1882757] allocator: op:5:0:1:HashJoinPOP), isOwning: true, 
> size: 8192, references: 2, life: 16936270178816167..0, allocatorManager: 
> [1703465, life: 16936270178813617..0] holds 4 buffers.
>                 DrillBuf[2041995], udle: [1703441 0..957]{code}
> {code:java}
>  {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8484) HashJoinPOP memory leak is caused by an oom exception when read data from Stream with container

2024-03-18 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17828184#comment-17828184
 ] 

ASF GitHub Bot commented on DRILL-8484:
---

shfshihuafeng commented on code in PR #2889:
URL: https://github.com/apache/drill/pull/2889#discussion_r1529743261


##
exec/java-exec/src/main/java/org/apache/drill/exec/cache/VectorAccessibleSerializable.java:
##
@@ -155,12 +157,18 @@ public void readFromStreamWithContainer(VectorContainer 
myContainer, InputStream
 for (SerializedField metaData : fieldList) {
   final int dataLength = metaData.getBufferLength();
   final MaterializedField field = MaterializedField.create(metaData);
-  final DrillBuf buf = allocator.buffer(dataLength);
-  final ValueVector vector;
+  DrillBuf buf = null;
+  ValueVector vector = null;
   try {
+buf = allocator.buffer(dataLength);
 buf.writeBytes(input, dataLength);
 vector = TypeHelper.getNewVector(field, allocator);
 vector.load(metaData, buf);
+  } catch (OutOfMemoryException oom) {
+for (ValueVector valueVector : vectorList) {
+  valueVector.clear();
+}
+throw UserException.memoryError(oom).message("Allocator memory 
failed").build(logger);

Review Comment:
 when we prepare to allocate memory using "allocator.buffer(dataLength)" for 
the HashJoinPOP allocator, and the actual memory > maxAllocation (a parameter 
calculated by calling computeOperatorMemory), an exception is thrown, as in 
my test below.
 The user can adjust the direct memory parameter (DRILL_MAX_DIRECT_MEMORY) or 
reduce concurrency based on actual conditions. 
   
   ```
   Caused by: org.apache.drill.exec.exception.OutOfMemoryException: Unable to 
allocate buffer of size 16384 (rounded from 14359) due to memory limit 
(41943040). Current allocation: 22583616
   at 
org.apache.drill.exec.memory.BaseAllocator.buffer(BaseAllocator.java:241)
   at 
org.apache.drill.exec.memory.BaseAllocator.buffer(BaseAllocator.java:216)
   at 
org.apache.drill.exec.cache.VectorAccessibleSerializable.readFromStreamWithContainer(VectorAccessibleSerializable.java:172)
   ```



##
exec/java-exec/src/main/java/org/apache/drill/exec/cache/VectorAccessibleSerializable.java:
##
@@ -155,12 +157,18 @@ public void readFromStreamWithContainer(VectorContainer 
myContainer, InputStream
 for (SerializedField metaData : fieldList) {
   final int dataLength = metaData.getBufferLength();
   final MaterializedField field = MaterializedField.create(metaData);
-  final DrillBuf buf = allocator.buffer(dataLength);
-  final ValueVector vector;
+  DrillBuf buf = null;
+  ValueVector vector = null;
   try {
+buf = allocator.buffer(dataLength);
 buf.writeBytes(input, dataLength);
 vector = TypeHelper.getNewVector(field, allocator);
 vector.load(metaData, buf);
+  } catch (OutOfMemoryException oom) {
+for (ValueVector valueVector : vectorList) {
+  valueVector.clear();
+}
+throw UserException.memoryError(oom).message("Allocator memory 
failed").build(logger);

Review Comment:
 when we prepare to allocate memory using "allocator.buffer(dataLength)" for 
the HashJoinPOP allocator, and the actual memory > maxAllocation (a parameter 
calculated by calling computeOperatorMemory), an exception is thrown, as in 
my test below.
 The user can adjust the direct memory parameter (DRILL_MAX_DIRECT_MEMORY) or 
reduce concurrency based on actual conditions. 
   
   ```
   Caused by: org.apache.drill.exec.exception.OutOfMemoryException: Unable to 
allocate buffer of size 16384 (rounded from 14359) due to memory limit 
(41943040). Current allocation: 22583616
   at 
org.apache.drill.exec.memory.BaseAllocator.buffer(BaseAllocator.java:241)
   at 
org.apache.drill.exec.memory.BaseAllocator.buffer(BaseAllocator.java:216)
   at 
org.apache.drill.exec.cache.VectorAccessibleSerializable.readFromStreamWithContainer(VectorAccessibleSerializable.java:172)
   ```





> HashJoinPOP memory leak is caused by an OOM exception when reading data from 
> a stream with a container 
> -
>
> Key: DRILL-8484
> URL: https://issues.apache.org/jira/browse/DRILL-8484
> Project: Apache Drill
>  Issue Type: Bug
>  Components:  Server
>Affects Versions: 1.21.1
>Reporter: shihuafeng
>Priority: Major
> Fix For: 1.22.0
>
>
> *Describe the bug*
> An OOM exception occurred when reading data from a stream with a container, 
> resulting in a HashJoinPOP memory leak. 
> *To Reproduce*
> prepare data for tpch 1s
>  # 30 concurrent for tpch sql8
>  # set direct memory 5g
>  # when it had an OutOfMemoryException, 

[jira] [Commented] (DRILL-8484) HashJoinPOP memory leak is caused by an OOM exception when reading data from a stream with a container

2024-03-18 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17828183#comment-17828183
 ] 

ASF GitHub Bot commented on DRILL-8484:
---

shfshihuafeng commented on code in PR #2889:
URL: https://github.com/apache/drill/pull/2889#discussion_r1529743261


##
exec/java-exec/src/main/java/org/apache/drill/exec/cache/VectorAccessibleSerializable.java:
##
@@ -155,12 +157,18 @@ public void readFromStreamWithContainer(VectorContainer 
myContainer, InputStream
 for (SerializedField metaData : fieldList) {
   final int dataLength = metaData.getBufferLength();
   final MaterializedField field = MaterializedField.create(metaData);
-  final DrillBuf buf = allocator.buffer(dataLength);
-  final ValueVector vector;
+  DrillBuf buf = null;
+  ValueVector vector = null;
   try {
+buf = allocator.buffer(dataLength);
 buf.writeBytes(input, dataLength);
 vector = TypeHelper.getNewVector(field, allocator);
 vector.load(metaData, buf);
+  } catch (OutOfMemoryException oom) {
+for (ValueVector valueVector : vectorList) {
+  valueVector.clear();
+}
+throw UserException.memoryError(oom).message("Allocator memory 
failed").build(logger);

Review Comment:
 when we prepare to allocate memory using "allocator.buffer(dataLength)" for 
the HashJoinPOP allocator, and the memory > maxAllocation (a parameter 
calculated by calling computeOperatorMemory), an exception is thrown, as in 
my test below.
 The user can adjust the direct memory parameter (DRILL_MAX_DIRECT_MEMORY) or 
reduce concurrency based on actual conditions. 
   
   ```
   Caused by: org.apache.drill.exec.exception.OutOfMemoryException: Unable to 
allocate buffer of size 16384 (rounded from 14359) due to memory limit 
(41943040). Current allocation: 22583616
   at 
org.apache.drill.exec.memory.BaseAllocator.buffer(BaseAllocator.java:241)
   at 
org.apache.drill.exec.memory.BaseAllocator.buffer(BaseAllocator.java:216)
   at 
org.apache.drill.exec.cache.VectorAccessibleSerializable.readFromStreamWithContainer(VectorAccessibleSerializable.java:172)
   ```





> HashJoinPOP memory leak is caused by an OOM exception when reading data from 
> a stream with a container 
> -
>
> Key: DRILL-8484
> URL: https://issues.apache.org/jira/browse/DRILL-8484
> Project: Apache Drill
>  Issue Type: Bug
>  Components:  Server
>Affects Versions: 1.21.1
>Reporter: shihuafeng
>Priority: Major
> Fix For: 1.22.0
>
>
> *Describe the bug*
> An OOM exception occurred when reading data from a stream with a container, 
> resulting in a HashJoinPOP memory leak. 
> *To Reproduce*
> prepare data for tpch 1s
>  # 30 concurrent for tpch sql8
>  # set direct memory 5g
>  # when it had an OutOfMemoryException, stop all sql.
>  # find the memory leak
> *leak info* 
> {code:java}
>    Allocator(frag:5:0) 500/100/31067136/40041943040 
> (res/actual/peak/limit)
>       child allocators: 1
>         Allocator(op:5:0:1:HashJoinPOP) 100/16384/22822912/41943040 
> (res/actual/peak/limit)
>           child allocators: 0
>           ledgers: 2
>             ledger[1882757] allocator: op:5:0:1:HashJoinPOP), isOwning: true, 
> size: 8192, references: 2, life: 16936270178816167..0, allocatorManager: 
> [1703465, life: 16936270178813617..0] holds 4 buffers.
>                 DrillBuf[2041995], udle: [1703441 0..957]{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8484) HashJoinPOP memory leak is caused by an OOM exception when reading data from a stream with a container

2024-03-18 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17828182#comment-17828182
 ] 

ASF GitHub Bot commented on DRILL-8484:
---

shfshihuafeng commented on code in PR #2889:
URL: https://github.com/apache/drill/pull/2889#discussion_r1529743261


##
exec/java-exec/src/main/java/org/apache/drill/exec/cache/VectorAccessibleSerializable.java:
##
@@ -155,12 +157,18 @@ public void readFromStreamWithContainer(VectorContainer 
myContainer, InputStream
 for (SerializedField metaData : fieldList) {
   final int dataLength = metaData.getBufferLength();
   final MaterializedField field = MaterializedField.create(metaData);
-  final DrillBuf buf = allocator.buffer(dataLength);
-  final ValueVector vector;
+  DrillBuf buf = null;
+  ValueVector vector = null;
   try {
+buf = allocator.buffer(dataLength);
 buf.writeBytes(input, dataLength);
 vector = TypeHelper.getNewVector(field, allocator);
 vector.load(metaData, buf);
+  } catch (OutOfMemoryException oom) {
+for (ValueVector valueVector : vectorList) {
+  valueVector.clear();
+}
+throw UserException.memoryError(oom).message("Allocator memory 
failed").build(logger);

Review Comment:
   when we prepare to allocate memory using "allocator.buffer(dataLength)" for 
the HashJoinPOP allocator, and the memory > maxAllocation (a parameter 
calculated by calling computeOperatorMemory), an exception is thrown, as in my 
test below. The user can adjust the direct memory parameter 
(DRILL_MAX_DIRECT_MEMORY) or reduce concurrency based on actual conditions. 
   
   ```
   Caused by: org.apache.drill.exec.exception.OutOfMemoryException: Unable to 
allocate buffer of size 16384 (rounded from 14359) due to memory limit 
(41943040). Current allocation: 22583616
   at 
org.apache.drill.exec.memory.BaseAllocator.buffer(BaseAllocator.java:241)
   at 
org.apache.drill.exec.memory.BaseAllocator.buffer(BaseAllocator.java:216)
   at 
org.apache.drill.exec.cache.VectorAccessibleSerializable.readFromStreamWithContainer(VectorAccessibleSerializable.java:172)
   ```





> HashJoinPOP memory leak is caused by an OOM exception when reading data from 
> a stream with a container 
> -
>
> Key: DRILL-8484
> URL: https://issues.apache.org/jira/browse/DRILL-8484
> Project: Apache Drill
>  Issue Type: Bug
>  Components:  Server
>Affects Versions: 1.21.1
>Reporter: shihuafeng
>Priority: Major
> Fix For: 1.22.0
>
>
> *Describe the bug*
> An OOM exception occurred when reading data from a stream with a container, 
> resulting in a HashJoinPOP memory leak. 
> *To Reproduce*
> prepare data for tpch 1s
>  # 30 concurrent for tpch sql8
>  # set direct memory 5g
>  # when it had an OutOfMemoryException, stop all sql.
>  # find the memory leak
> *leak info* 
> {code:java}
>    Allocator(frag:5:0) 500/100/31067136/40041943040 
> (res/actual/peak/limit)
>       child allocators: 1
>         Allocator(op:5:0:1:HashJoinPOP) 100/16384/22822912/41943040 
> (res/actual/peak/limit)
>           child allocators: 0
>           ledgers: 2
>             ledger[1882757] allocator: op:5:0:1:HashJoinPOP), isOwning: true, 
> size: 8192, references: 2, life: 16936270178816167..0, allocatorManager: 
> [1703465, life: 16936270178813617..0] holds 4 buffers.
>                 DrillBuf[2041995], udle: [1703441 0..957]{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8484) HashJoinPOP memory leak is caused by an OOM exception when reading data from a stream with a container

2024-03-18 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17828026#comment-17828026
 ] 

ASF GitHub Bot commented on DRILL-8484:
---

cgivre commented on code in PR #2889:
URL: https://github.com/apache/drill/pull/2889#discussion_r1528802856


##
exec/java-exec/src/main/java/org/apache/drill/exec/cache/VectorAccessibleSerializable.java:
##
@@ -155,12 +157,18 @@ public void readFromStreamWithContainer(VectorContainer 
myContainer, InputStream
 for (SerializedField metaData : fieldList) {
   final int dataLength = metaData.getBufferLength();
   final MaterializedField field = MaterializedField.create(metaData);
-  final DrillBuf buf = allocator.buffer(dataLength);
-  final ValueVector vector;
+  DrillBuf buf = null;
+  ValueVector vector = null;
   try {
+buf = allocator.buffer(dataLength);
 buf.writeBytes(input, dataLength);
 vector = TypeHelper.getNewVector(field, allocator);
 vector.load(metaData, buf);
+  } catch (OutOfMemoryException oom) {
+for (ValueVector valueVector : vectorList) {
+  valueVector.clear();
+}
+throw UserException.memoryError(oom).message("Allocator memory 
failed").build(logger);

Review Comment:
   Do we know what would cause an error like this? If so, what would the user 
need to do to fix it?





> HashJoinPOP memory leak is caused by an OOM exception when reading data from 
> a stream with a container 
> -
>
> Key: DRILL-8484
> URL: https://issues.apache.org/jira/browse/DRILL-8484
> Project: Apache Drill
>  Issue Type: Bug
>  Components:  Server
>Affects Versions: 1.21.1
>Reporter: shihuafeng
>Priority: Major
> Fix For: 1.22.0
>
>
> *Describe the bug*
> An OOM exception occurred when reading data from a stream with a container, 
> resulting in a HashJoinPOP memory leak. 
> *To Reproduce*
> prepare data for tpch 1s
>  # 30 concurrent for tpch sql8
>  # set direct memory 5g
>  # when it had an OutOfMemoryException, stop all sql.
>  # find the memory leak
> *leak info* 
> {code:java}
>    Allocator(frag:5:0) 500/100/31067136/40041943040 
> (res/actual/peak/limit)
>       child allocators: 1
>         Allocator(op:5:0:1:HashJoinPOP) 100/16384/22822912/41943040 
> (res/actual/peak/limit)
>           child allocators: 0
>           ledgers: 2
>             ledger[1882757] allocator: op:5:0:1:HashJoinPOP), isOwning: true, 
> size: 8192, references: 2, life: 16936270178816167..0, allocatorManager: 
> [1703465, life: 16936270178813617..0] holds 4 buffers.
>                 DrillBuf[2041995], udle: [1703441 0..957]{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8484) HashJoinPOP memory leak is caused by an OOM exception when reading data from a stream with a container

2024-03-18 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17827949#comment-17827949
 ] 

ASF GitHub Bot commented on DRILL-8484:
---

shfshihuafeng opened a new pull request, #2889:
URL: https://github.com/apache/drill/pull/2889

   …en read data from Stream with container
   
   # [DRILL-8484](https://issues.apache.org/jira/browse/DRILL-8484): 
HashJoinPOP memory leak is caused by an OOM exception when reading data from 
a stream with a container 
   
   ## Description
   
   
   
   
   ## Documentation
   (Please describe user-visible changes similar to what should appear in the 
Drill documentation.)
   
   ## Testing
   You can add debugging code to reproduce this scenario as follows, or run a 
tpch test like [drill8483](https://github.com/apache/drill/pull/2888)
   **(1) debug code**
   ```
   public void readFromStreamWithContainer(VectorContainer myContainer, InputStream input) throws IOException {
     final VectorContainer container = new VectorContainer();
     final UserBitShared.RecordBatchDef batchDef = UserBitShared.RecordBatchDef.parseDelimitedFrom(input);
     recordCount = batchDef.getRecordCount();
     if (batchDef.hasCarriesTwoByteSelectionVector() && batchDef.getCarriesTwoByteSelectionVector()) {
       if (sv2 == null) {
         sv2 = new SelectionVector2(allocator);
       }
       sv2.allocateNew(recordCount * SelectionVector2.RECORD_SIZE);
       sv2.getBuffer().setBytes(0, input, recordCount * SelectionVector2.RECORD_SIZE);
       svMode = BatchSchema.SelectionVectorMode.TWO_BYTE;
     }
     final List<ValueVector> vectorList = Lists.newArrayList();
     final List<SerializedField> fieldList = batchDef.getFieldList();
     int i = 0;
     for (SerializedField metaData : fieldList) {
       i++;
       final int dataLength = metaData.getBufferLength();
       final MaterializedField field = MaterializedField.create(metaData);
       final DrillBuf buf = allocator.buffer(dataLength);
       ValueVector vector = null;
       try {
         buf.writeBytes(input, dataLength);
         vector = TypeHelper.getNewVector(field, allocator);
         if (i == 3) {
           // Fault injection: force an OOM on the third field to reproduce the leak.
           logger.warn("shf test memory except");
           throw new OutOfMemoryException("test memory except");
         }
         vector.load(metaData, buf);
       } catch (Exception e) {
         if (vectorList.size() > 0) {
           for (ValueVector valueVector : vectorList) {
             DrillBuf[] buffers = valueVector.getBuffers(false);
             logger.warn("shf leak buffers " + Arrays.asList(buffers));
             // valueVector.clear();
           }
         }
         throw e;
       } finally {
         buf.release();
       }
       vectorList.add(vector);
     }
   ```
   **(2) run the following sql (tpch8)**
   
   ```
   select
   o_year,
   sum(case when nation = 'CHINA' then volume else 0 end) / sum(volume) as 
mkt_share
   from (
   select
   extract(year from o_orderdate) as o_year,
   l_extendedprice * 1.0 as volume,
   n2.n_name as nation
   from hive.tpch1s.part, hive.tpch1s.supplier, hive.tpch1s.lineitem, 
hive.tpch1s.orders, hive.tpch1s.customer, hive.tpch1s.nation n1, 
hive.tpch1s.nation n2, hive.tpch1s.region
   where
   p_partkey = l_partkey
   and s_suppkey = l_suppkey
   and l_orderkey = o_orderkey
   and o_custkey = c_custkey
   and c_nationkey = n1.n_nationkey
   and n1.n_regionkey = r_regionkey
   and r_name = 'ASIA'
   and s_nationkey = n2.n_nationkey
   and o_orderdate between date '1995-01-01'
   and date '1996-12-31'
   and p_type = 'LARGE BRUSHED BRASS') as all_nations
   group by o_year
   order by o_year;
   ```
   **(3) you find a memory leak, but no sql is running**
   
   (screenshot: https://github.com/apache/drill/assets/25974968/e716ab12-4eeb-4a69-9c0f-07664bcb80a4)
   




> HashJoinPOP memory leak is caused by an OOM exception when reading data from 
> a stream with a container 
> -
>
> Key: DRILL-8484
> URL: https://issues.apache.org/jira/browse/DRILL-8484
> Project: Apache Drill
>  Issue Type: Bug
>  Components:  Server
>Affects Versions: 1.21.1
>Reporter: shihuafeng
>Priority: Major
> Fix For: 1.22.0
>
>
> *Describe the bug*
> An OOM exception occurred when reading data from a stream with a container, 
> resulting in a HashJoinPOP memory leak. 
> *To Reproduce*
> prepare data for tpch 1s
>  # 30 concurrent for tpch sql8
>  # set direct memory 5g
>  # when it had an OutOfMemoryException, stop all sql.
>  # find the memory leak
> *leak info* 
> {code:java}
>    Allocator(frag:5:0) 500/100/31067136/40041943040 
> (res/actual/peak/limit)
>       child allocators: 1
>         Allocator(op:5:0:1:HashJoinPOP) 100/16384/22822912/41943040 
> 

[jira] [Commented] (DRILL-8483) SpilledRecordBatch memory leak when the program threw an exception during the process of building a hash table

2024-03-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17824832#comment-17824832
 ] 

ASF GitHub Bot commented on DRILL-8483:
---

shfshihuafeng opened a new pull request, #2888:
URL: https://github.com/apache/drill/pull/2888

   …exception during the process of building a hash table (#2887)
   
   # [DRILL-8483](https://issues.apache.org/jira/browse/DRILL-8483): 
SpilledRecordBatch memory leak when the program threw an exception during the 
process of building a hash table
   
   (Please replace `PR Title` with actual PR Title)
   
   ## Description
   
   During the process of reading data from disk and building hash tables in 
memory, if an exception is thrown, it will result in a SpilledRecordBatch 
memory leak.
   
   ## Documentation
   (Please describe user-visible changes similar to what should appear in the 
Drill documentation.)
   
   ## Testing
   prepare data for tpch 1s
   1. 30 concurrent for tpch sql8
   2. set direct memory 5g
   3. when it had an OutOfMemoryException, stop all sql.
   4. find the memory leak
   
   test script
   
   ```
   random_sql(){
     #for i in `seq 1 3`
     while true
     do
       num=$((RANDOM%22+1))
       if [ -f $fileName ]; then
         echo "$fileName exists"
         exit 0
       else
         $drill_home/sqlline -u "jdbc:drill:zk=ip:2181/drillbits_shf" -f tpch_sql8.sql >> sql8.log 2>&1
       fi
     done
   }
   
   main(){
     #sleep 2h
   
     #TPCH power test
     for i in `seq 1 30`
     do
       random_sql &
     done
   }
   ```




> SpilledRecordBatch memory leak when the program threw an exception during the 
> process of building a hash table
> --
>
> Key: DRILL-8483
> URL: https://issues.apache.org/jira/browse/DRILL-8483
> Project: Apache Drill
>  Issue Type: Bug
>  Components:  Server
>Affects Versions: 1.21.1
>Reporter: shihuafeng
>Priority: Major
> Fix For: 1.21.2
>
>
> During the process of reading data from disk and building hash tables in 
> memory, if an exception is thrown, it will result in a SpilledRecordBatch 
> memory leak.
> exception log as follows
> {code:java}
> Caused by: org.apache.drill.exec.exception.OutOfMemoryException: Unable to 
> allocate buffer of size 8192 due to memory limit (41943040). Current 
> allocation: 3684352
>         at 
> org.apache.drill.exec.memory.BaseAllocator.buffer(BaseAllocator.java:241)
>         at 
> org.apache.drill.exec.memory.BaseAllocator.buffer(BaseAllocator.java:216)
>         at 
> org.apache.drill.exec.vector.VarCharVector.allocateNew(VarCharVector.java:411)
>         at 
> org.apache.drill.exec.vector.NullableVarCharVector.allocateNew(NullableVarCharVector.java:270)
>         at 
> org.apache.drill.exec.physical.impl.common.HashPartition.allocateNewVectorContainer(HashPartition.java:215)
>         at 
> org.apache.drill.exec.physical.impl.common.HashPartition.allocateNewCurrentBatchAndHV(HashPartition.java:238)
>         at 
> org.apache.drill.exec.physical.impl.common.HashPartition.(HashPartition.java:165){code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8479) mergejoin memory leak on exception

2024-03-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17823852#comment-17823852
 ] 

ASF GitHub Bot commented on DRILL-8479:
---

cgivre merged PR #2878:
URL: https://github.com/apache/drill/pull/2878




> mergejoin memory leak on exception
> -
>
> Key: DRILL-8479
> URL: https://issues.apache.org/jira/browse/DRILL-8479
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.21.1
>Reporter: shihuafeng
>Priority: Critical
> Attachments: 0001-mergejoin-leak.patch
>
>
> *Describe the bug*
> mergejoin leaks when RecordIterator hits an OutOfMemoryException while 
> allocating memory.
> *Steps to reproduce the behavior*:
>  # prepare data for tpch 1s
>  # set direct memory 5g
>  # set planner.enable_hashjoin = false to ensure the mergejoin operator is used.
>  # set drill.memory.debug.allocator = true (check for memory leaks)
>  # 20 concurrent for tpch sql8
>  # when it had an OutOfMemoryException or a null exception, stop all sql.
>  # find the memory leak
> *Expected behavior*
>       when all sql stop, direct memory should be 0 and we should not 
> find a leak log like the following.
> {code:java}
> Allocator(op:2:0:11:MergeJoinPOP) 100/73728/4874240/100 
> (res/actual/peak/limit){code}
> *Error detail, log output or screenshots*
> {code:java}
> Unable to allocate buffer of size XX (rounded from XX) due to memory limit 
> (). Current allocation: xx{code}
> [^0001-mergejoin-leak.patch]
> sql 
> {code:java}
> // code placeholder
> select o_year, sum(case when nation = 'CHINA' then volume else 0 end) / 
> sum(volume) as mkt_share from ( select extract(year from o_orderdate) as 
> o_year, l_extendedprice * 1.0 as volume, n2.n_name as nation from 
> hive.tpch1s.part, hive.tpch1s.supplier, hive.tpch1s.lineitem, 
> hive.tpch1s.orders, hive.tpch1s.customer, hive.tpch1s.nation n1, 
> hive.tpch1s.nation n2, hive.tpch1s.region where p_partkey = l_partkey and 
> s_suppkey = l_suppkey and l_orderkey = o_orderkey and o_custkey = c_custkey 
> and c_nationkey = n1.n_nationkey and n1.n_regionkey = r_regionkey and r_name 
> = 'ASIA' and s_nationkey = n2.n_nationkey and o_orderdate between date 
> '1995-01-01' and date '1996-12-31' and p_type = 'LARGE BRUSHED BRASS') as 
> all_nations group by o_year order by o_year
> {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8479) mergejoin memory leak on exception

2024-03-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17823846#comment-17823846
 ] 

ASF GitHub Bot commented on DRILL-8479:
---

shfshihuafeng commented on code in PR #2878:
URL: https://github.com/apache/drill/pull/2878#discussion_r1513773463


##
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/MergeJoinBatch.java:
##
@@ -297,7 +297,14 @@ public void close() {
   batchMemoryManager.getAvgOutputRowWidth(), 
batchMemoryManager.getTotalOutputRecords());
 
 super.close();
-leftIterator.close();
+try {
+  leftIterator.close();
+} catch (Exception e) {

Review Comment:
   stack 
   
   ```
   Caused by: org.apache.drill.exec.ops.QueryCancelledException: null
   at 
org.apache.drill.exec.work.fragment.FragmentExecutor$ExecutorStateImpl.checkContinue(FragmentExecutor.java:533)
   at 
org.apache.drill.exec.record.AbstractRecordBatch.checkContinue(AbstractRecordBatch.java:278)
   at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:105)
   at 
org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:59)
   at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:165)
   at 
org.apache.drill.exec.record.RecordIterator.clearInflightBatches(RecordIterator.java:359)
   at 
org.apache.drill.exec.record.RecordIterator.close(RecordIterator.java:365)
   at 
org.apache.drill.exec.physical.impl.join.MergeJoinBatch.close(MergeJoinBatch.java:301)
   ```





> mergejoin memory leak on exception
> -
>
> Key: DRILL-8479
> URL: https://issues.apache.org/jira/browse/DRILL-8479
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.21.1
>Reporter: shihuafeng
>Priority: Critical
> Attachments: 0001-mergejoin-leak.patch
>
>
> *Describe the bug*
> mergejoin leaks when RecordIterator hits an OutOfMemoryException while 
> allocating memory.
> *Steps to reproduce the behavior*:
>  # prepare data for tpch 1s
>  # set direct memory 5g
>  # set planner.enable_hashjoin = false to ensure the mergejoin operator is used.
>  # set drill.memory.debug.allocator = true (check for memory leaks)
>  # 20 concurrent for tpch sql8
>  # when it had an OutOfMemoryException or a null exception, stop all sql.
>  # find the memory leak
> *Expected behavior*
>       when all sql stop, direct memory should be 0 and we should not 
> find a leak log like the following.
> {code:java}
> Allocator(op:2:0:11:MergeJoinPOP) 100/73728/4874240/100 
> (res/actual/peak/limit){code}
> *Error detail, log output or screenshots*
> {code:java}
> Unable to allocate buffer of size XX (rounded from XX) due to memory limit 
> (). Current allocation: xx{code}
> [^0001-mergejoin-leak.patch]
> sql 
> {code:java}
> // code placeholder
> select o_year, sum(case when nation = 'CHINA' then volume else 0 end) / 
> sum(volume) as mkt_share from ( select extract(year from o_orderdate) as 
> o_year, l_extendedprice * 1.0 as volume, n2.n_name as nation from 
> hive.tpch1s.part, hive.tpch1s.supplier, hive.tpch1s.lineitem, 
> hive.tpch1s.orders, hive.tpch1s.customer, hive.tpch1s.nation n1, 
> hive.tpch1s.nation n2, hive.tpch1s.region where p_partkey = l_partkey and 
> s_suppkey = l_suppkey and l_orderkey = o_orderkey and o_custkey = c_custkey 
> and c_nationkey = n1.n_nationkey and n1.n_regionkey = r_regionkey and r_name 
> = 'ASIA' and s_nationkey = n2.n_nationkey and o_orderdate between date 
> '1995-01-01' and date '1996-12-31' and p_type = 'LARGE BRUSHED BRASS') as 
> all_nations group by o_year order by o_year
> {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8479) mergejoin memory leak on exception

2024-03-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17823843#comment-17823843
 ] 

ASF GitHub Bot commented on DRILL-8479:
---

shfshihuafeng commented on code in PR #2878:
URL: https://github.com/apache/drill/pull/2878#discussion_r1513768876


##
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/MergeJoinBatch.java:
##
@@ -297,7 +297,14 @@ public void close() {
   batchMemoryManager.getAvgOutputRowWidth(), 
batchMemoryManager.getTotalOutputRecords());
 
 super.close();
-leftIterator.close();
+try {
+  leftIterator.close();
+} catch (Exception e) {
+  rightIterator.close();
+  throw UserException.executionError(e)

Review Comment:
 it throws the exception from the method clearInflightBatches(), but the 
memory has already been cleared by clear(), so it does not affect memory 
leaks; see the following code:
   
   `public void close() { clear(); clearInflightBatches(); }`
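   
   As a companion sketch (minimal, and not the committed change), a 
try/finally ordering would guarantee the right iterator is closed even when 
the left close throws, e.g. a QueryCancelledException raised while draining 
in-flight batches:
   ```
   try {
     leftIterator.close();
   } finally {
     // Runs even if leftIterator.close() threw; note that an exception thrown
     // here would replace the original one unless handled explicitly.
     rightIterator.close();
   }
   ```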





> mergejoin memory leak on exception
> -
>
> Key: DRILL-8479
> URL: https://issues.apache.org/jira/browse/DRILL-8479
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.21.1
>Reporter: shihuafeng
>Priority: Critical
> Attachments: 0001-mergejoin-leak.patch
>
>
> *Describe the bug*
> mergejoin leaks when RecordIterator hits an OutOfMemoryException while 
> allocating memory.
> *Steps to reproduce the behavior*:
>  # prepare data for tpch 1s
>  # set direct memory 5g
>  # set planner.enable_hashjoin = false to ensure the mergejoin operator is used.
>  # set drill.memory.debug.allocator = true (check for memory leaks)
>  # 20 concurrent for tpch sql8
>  # when it had an OutOfMemoryException or a null exception, stop all sql.
>  # find the memory leak
> *Expected behavior*
>       when all sql stop, direct memory should be 0 and we should not 
> find a leak log like the following.
> {code:java}
> Allocator(op:2:0:11:MergeJoinPOP) 100/73728/4874240/100 
> (res/actual/peak/limit){code}
> *Error detail, log output or screenshots*
> {code:java}
> Unable to allocate buffer of size XX (rounded from XX) due to memory limit 
> (). Current allocation: xx{code}
> [^0001-mergejoin-leak.patch]
> sql 
> {code:java}
> // code placeholder
> select o_year, sum(case when nation = 'CHINA' then volume else 0 end) / 
> sum(volume) as mkt_share from ( select extract(year from o_orderdate) as 
> o_year, l_extendedprice * 1.0 as volume, n2.n_name as nation from 
> hive.tpch1s.part, hive.tpch1s.supplier, hive.tpch1s.lineitem, 
> hive.tpch1s.orders, hive.tpch1s.customer, hive.tpch1s.nation n1, 
> hive.tpch1s.nation n2, hive.tpch1s.region where p_partkey = l_partkey and 
> s_suppkey = l_suppkey and l_orderkey = o_orderkey and o_custkey = c_custkey 
> and c_nationkey = n1.n_nationkey and n1.n_regionkey = r_regionkey and r_name 
> = 'ASIA' and s_nationkey = n2.n_nationkey and o_orderdate between date 
> '1995-01-01' and date '1996-12-31' and p_type = 'LARGE BRUSHED BRASS') as 
> all_nations group by o_year order by o_year
> {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8479) mergejoin memory leak on exception

2024-03-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17823842#comment-17823842
 ] 

ASF GitHub Bot commented on DRILL-8479:
---

shfshihuafeng commented on code in PR #2878:
URL: https://github.com/apache/drill/pull/2878#discussion_r1513766703


##
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/MergeJoinBatch.java:
##
@@ -297,7 +297,14 @@ public void close() {
   batchMemoryManager.getAvgOutputRowWidth(), 
batchMemoryManager.getTotalOutputRecords());
 
 super.close();
-leftIterator.close();
+try {
+  leftIterator.close();
+} catch (Exception e) {

Review Comment:
 Add exception info?
   ```
   try {
     leftIterator.close();
   } catch (QueryCancelledException qce) {
     throw UserException.executionError(qce)
         .message("Failed when depleting incoming batches, probably because " +
             "the query was cancelled after an executor error")
         .build(logger);
   } catch (Exception e) {
     throw UserException.internalError(e)
         .message("Failed when depleting incoming batches")
         .build(logger);
   } finally {
     // TODO: log the exception info here, or let the exception propagate by default?
     rightIterator.close();
   }
   ```





> mergejoin memory leak on exception
> -
>
> Key: DRILL-8479
> URL: https://issues.apache.org/jira/browse/DRILL-8479
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.21.1
>Reporter: shihuafeng
>Priority: Critical
> Attachments: 0001-mergejoin-leak.patch
>
>
> *Describe the bug*
> mergejoin leaks when RecordIterator hits an OutOfMemoryException while 
> allocating memory.
> *Steps to reproduce the behavior*:
>  # prepare data for tpch 1s
>  # set direct memory 5g
>  # set planner.enable_hashjoin = false to ensure the mergejoin operator is used.
>  # set drill.memory.debug.allocator = true (check for memory leaks)
>  # 20 concurrent for tpch sql8
>  # when it had an OutOfMemoryException or a null exception, stop all sql.
>  # find the memory leak
> *Expected behavior*
>       when all sql stop, direct memory should be 0 and we should not 
> find a leak log like the following.
> {code:java}
> Allocator(op:2:0:11:MergeJoinPOP) 100/73728/4874240/100 
> (res/actual/peak/limit){code}
> *Error detail, log output or screenshots*
> {code:java}
> Unable to allocate buffer of size XX (rounded from XX) due to memory limit 
> (). Current allocation: xx{code}
> [^0001-mergejoin-leak.patch]
> sql 
> {code:java}
> // code placeholder
> select o_year, sum(case when nation = 'CHINA' then volume else 0 end) / 
> sum(volume) as mkt_share from ( select extract(year from o_orderdate) as 
> o_year, l_extendedprice * 1.0 as volume, n2.n_name as nation from 
> hive.tpch1s.part, hive.tpch1s.supplier, hive.tpch1s.lineitem, 
> hive.tpch1s.orders, hive.tpch1s.customer, hive.tpch1s.nation n1, 
> hive.tpch1s.nation n2, hive.tpch1s.region where p_partkey = l_partkey and 
> s_suppkey = l_suppkey and l_orderkey = o_orderkey and o_custkey = c_custkey 
> and c_nationkey = n1.n_nationkey and n1.n_regionkey = r_regionkey and r_name 
> = 'ASIA' and s_nationkey = n2.n_nationkey and o_orderdate between date 
> '1995-01-01' and date '1996-12-31' and p_type = 'LARGE BRUSHED BRASS') as 
> all_nations group by o_year order by o_year
> {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8479) mergejoin memory leak on exception

2024-03-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17823643#comment-17823643
 ] 

ASF GitHub Bot commented on DRILL-8479:
---

cgivre commented on code in PR #2878:
URL: https://github.com/apache/drill/pull/2878#discussion_r1512908376


##
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/MergeJoinBatch.java:
##
@@ -297,7 +297,14 @@ public void close() {
   batchMemoryManager.getAvgOutputRowWidth(), 
batchMemoryManager.getTotalOutputRecords());
 
 super.close();
-leftIterator.close();
+try {
+  leftIterator.close();
+} catch (Exception e) {
+  rightIterator.close();
+  throw UserException.executionError(e)

Review Comment:
   What happens if the right iterator doesn't close properly?





> mergejoin memory leak on exception
> -
>
> Key: DRILL-8479
> URL: https://issues.apache.org/jira/browse/DRILL-8479
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.21.1
>Reporter: shihuafeng
>Priority: Critical
> Attachments: 0001-mergejoin-leak.patch
>
>
> *Describe the bug*
> mergejoin leaks when RecordIterator hits an OutOfMemoryException while 
> allocating memory.
> *Steps to reproduce the behavior*:
>  # prepare data for tpch 1s
>  # set direct memory 5g
>  # set planner.enable_hashjoin = false to ensure the mergejoin operator is used.
>  # set drill.memory.debug.allocator = true (check for memory leaks)
>  # 20 concurrent for tpch sql8
>  # when it had an OutOfMemoryException or a null exception, stop all sql.
>  # find the memory leak
> *Expected behavior*
>       when all sql stop, direct memory should be 0 and we should not 
> find a leak log like the following.
> {code:java}
> Allocator(op:2:0:11:MergeJoinPOP) 100/73728/4874240/100 
> (res/actual/peak/limit){code}
> *Error detail, log output or screenshots*
> {code:java}
> Unable to allocate buffer of size XX (rounded from XX) due to memory limit 
> (). Current allocation: xx{code}
> [^0001-mergejoin-leak.patch]
> sql 
> {code:java}
> // code placeholder
> select o_year, sum(case when nation = 'CHINA' then volume else 0 end) / 
> sum(volume) as mkt_share from ( select extract(year from o_orderdate) as 
> o_year, l_extendedprice * 1.0 as volume, n2.n_name as nation from 
> hive.tpch1s.part, hive.tpch1s.supplier, hive.tpch1s.lineitem, 
> hive.tpch1s.orders, hive.tpch1s.customer, hive.tpch1s.nation n1, 
> hive.tpch1s.nation n2, hive.tpch1s.region where p_partkey = l_partkey and 
> s_suppkey = l_suppkey and l_orderkey = o_orderkey and o_custkey = c_custkey 
> and c_nationkey = n1.n_nationkey and n1.n_regionkey = r_regionkey and r_name 
> = 'ASIA' and s_nationkey = n2.n_nationkey and o_orderdate between date 
> '1995-01-01' and date '1996-12-31' and p_type = 'LARGE BRUSHED BRASS') as 
> all_nations group by o_year order by o_year
> {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8479) mergejoin memory leak on exception

2024-03-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17823641#comment-17823641
 ] 

ASF GitHub Bot commented on DRILL-8479:
---

shfshihuafeng commented on code in PR #2878:
URL: https://github.com/apache/drill/pull/2878#discussion_r1512773128


##
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/MergeJoinBatch.java:
##
@@ -297,7 +297,14 @@ public void close() {
   batchMemoryManager.getAvgOutputRowWidth(), 
batchMemoryManager.getTotalOutputRecords());
 
 super.close();
-leftIterator.close();
+try {
+  leftIterator.close();
+} catch (Exception e) {

Review Comment:
   @cgivre 
   In my test case it throws a "QueryCancelledException" because some minor 
fragment threw an OutOfMemoryException and informed the foreman of the failure.
   
   The foreman then sends "QueryCancel" commands to the other minor fragments; 
QueryCancelledException is thrown after the method "incoming.next()" calls 
checkContinue().
   
   Although the "checkContinue" phase throws a fixed "QueryCancelledException" 
message, I am not sure what is causing it (in my test case an 
OutOfMemoryException caused the exception).
   
   
```
public void clearInflightBatches() {
  while (lastOutcome == IterOutcome.OK || lastOutcome == IterOutcome.OK_NEW_SCHEMA) {
    // Clear all buffers from incoming.
    for (VectorWrapper<?> wrapper : incoming) {
      wrapper.getValueVector().clear();
    }
    lastOutcome = incoming.next();
  }
}

public void checkContinue() {
  if (!shouldContinue()) {
    throw new QueryCancelledException();
  }
}
```
   
   **stack**
   
   ```
   Caused by: org.apache.drill.exec.ops.QueryCancelledException: null
   at 
org.apache.drill.exec.work.fragment.FragmentExecutor$ExecutorStateImpl.checkContinue(FragmentExecutor.java:533)
   at 
org.apache.drill.exec.record.AbstractRecordBatch.checkContinue(AbstractRecordBatch.java:278)
   at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:105)
   at 
org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:59)
   at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:165)
   at 
org.apache.drill.exec.record.RecordIterator.clearInflightBatches(RecordIterator.java:359)
   at 
org.apache.drill.exec.record.RecordIterator.close(RecordIterator.java:365)
   at 
org.apache.drill.exec.physical.impl.join.MergeJoinBatch.close(MergeJoinBatch.java:301)
   ```





> mergejoin memory leak on exception
> -
>
> Key: DRILL-8479
> URL: https://issues.apache.org/jira/browse/DRILL-8479
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.21.1
>Reporter: shihuafeng
>Priority: Critical
> Attachments: 0001-mergejoin-leak.patch
>
>
> *Describe the bug*
> mergejoin leaks when RecordIterator hits an OutOfMemoryException while 
> allocating memory.
> *Steps to reproduce the behavior*:
>  # prepare data for tpch 1s
>  # set direct memory 5g
>  # set planner.enable_hashjoin = false to ensure the mergejoin operator is used.
>  # set drill.memory.debug.allocator = true (check for memory leaks)
>  # 20 concurrent for tpch sql8
>  # when it had an OutOfMemoryException or a null exception, stop all sql.
>  # find the memory leak
> *Expected behavior*
>       when all sql stop, direct memory should be 0 and we should not 
> find a leak log like the following.
> {code:java}
> Allocator(op:2:0:11:MergeJoinPOP) 100/73728/4874240/100 
> (res/actual/peak/limit){code}
> *Error detail, log output or screenshots*
> {code:java}
> Unable to allocate buffer of size XX (rounded from XX) due to memory limit 
> (). Current allocation: xx{code}
> [^0001-mergejoin-leak.patch]
> sql 
> {code:java}
> // code placeholder
> select o_year, sum(case when nation = 'CHINA' then volume else 0 end) / 
> sum(volume) as mkt_share from ( select extract(year from o_orderdate) as 
> o_year, l_extendedprice * 1.0 as volume, n2.n_name as nation from 
> hive.tpch1s.part, hive.tpch1s.supplier, hive.tpch1s.lineitem, 
> hive.tpch1s.orders, hive.tpch1s.customer, hive.tpch1s.nation n1, 
> hive.tpch1s.nation n2, hive.tpch1s.region where p_partkey = l_partkey and 
> s_suppkey = l_suppkey and l_orderkey = o_orderkey and o_custkey = c_custkey 
> and c_nationkey = n1.n_nationkey and n1.n_regionkey = r_regionkey and r_name 
> = 'ASIA' and s_nationkey = n2.n_nationkey and o_orderdate between date 
> '1995-01-01' and date '1996-12-31' and p_type = 'LARGE BRUSHED BRASS') as 
> all_nations group by o_year order by o_year
> {code}
>  
>  



--
This message was 

[jira] [Commented] (DRILL-8479) mergejoin memory leak on exception

2024-03-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17823611#comment-17823611
 ] 

ASF GitHub Bot commented on DRILL-8479:
---

shfshihuafeng commented on code in PR #2878:
URL: https://github.com/apache/drill/pull/2878#discussion_r1512773128


##
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/MergeJoinBatch.java:
##
@@ -297,7 +297,14 @@ public void close() {
   batchMemoryManager.getAvgOutputRowWidth(), 
batchMemoryManager.getTotalOutputRecords());
 
 super.close();
-leftIterator.close();
+try {
+  leftIterator.close();
+} catch (Exception e) {

Review Comment:
   @cgivre 
   In my test case it throws a "QueryCancelledException" because some minor 
fragment threw an OutOfMemoryException and informed the foreman of the failure.
   
   The foreman then sends "QueryCancel" commands to the other minor fragments; 
QueryCancelledException is thrown after the method "incoming.next()" calls 
checkContinue().
   
   Although the "checkContinue" phase throws a fixed "QueryCancelledException" 
message, I am not sure what is causing it (in my test case an 
OutOfMemoryException caused the exception).
   
   
```
public void clearInflightBatches() {
  while (lastOutcome == IterOutcome.OK || lastOutcome == IterOutcome.OK_NEW_SCHEMA) {
    // Clear all buffers from incoming.
    for (VectorWrapper<?> wrapper : incoming) {
      wrapper.getValueVector().clear();
    }
    lastOutcome = incoming.next();
  }
}

public void checkContinue() {
  if (!shouldContinue()) {
    throw new QueryCancelledException();
  }
}
```
   
   **stack**
   
   ```
   Caused by: org.apache.drill.exec.ops.QueryCancelledException: null
   at 
org.apache.drill.exec.work.fragment.FragmentExecutor$ExecutorStateImpl.checkContinue(FragmentExecutor.java:533)
   at 
org.apache.drill.exec.record.AbstractRecordBatch.checkContinue(AbstractRecordBatch.java:278)
   at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:105)
   at 
org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:59)
   at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:165)
   at 
org.apache.drill.exec.record.RecordIterator.clearInflightBatches(RecordIterator.java:359)
   at 
org.apache.drill.exec.record.RecordIterator.close(RecordIterator.java:365)
   at 
org.apache.drill.exec.physical.impl.join.MergeJoinBatch.close(MergeJoinBatch.java:301)
   ```





> mergejoin memory leak on exception
> -
>
> Key: DRILL-8479
> URL: https://issues.apache.org/jira/browse/DRILL-8479
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.21.1
>Reporter: shihuafeng
>Priority: Critical
> Attachments: 0001-mergejoin-leak.patch
>
>
> *Describe the bug*
> mergejoin leaks when RecordIterator hits an OutOfMemoryException while 
> allocating memory.
> *Steps to reproduce the behavior*:
>  # prepare data for tpch 1s
>  # set direct memory 5g
>  # set planner.enable_hashjoin = false to ensure the mergejoin operator is used.
>  # set drill.memory.debug.allocator = true (check for memory leaks)
>  # 20 concurrent for tpch sql8
>  # when it had an OutOfMemoryException or a null exception, stop all sql.
>  # find the memory leak
> *Expected behavior*
>       when all sql stop, direct memory should be 0 and we should not 
> find a leak log like the following.
> {code:java}
> Allocator(op:2:0:11:MergeJoinPOP) 100/73728/4874240/100 
> (res/actual/peak/limit){code}
> *Error detail, log output or screenshots*
> {code:java}
> Unable to allocate buffer of size XX (rounded from XX) due to memory limit 
> (). Current allocation: xx{code}
> [^0001-mergejoin-leak.patch]
> sql 
> {code:java}
> // code placeholder
> select o_year, sum(case when nation = 'CHINA' then volume else 0 end) / 
> sum(volume) as mkt_share from ( select extract(year from o_orderdate) as 
> o_year, l_extendedprice * 1.0 as volume, n2.n_name as nation from 
> hive.tpch1s.part, hive.tpch1s.supplier, hive.tpch1s.lineitem, 
> hive.tpch1s.orders, hive.tpch1s.customer, hive.tpch1s.nation n1, 
> hive.tpch1s.nation n2, hive.tpch1s.region where p_partkey = l_partkey and 
> s_suppkey = l_suppkey and l_orderkey = o_orderkey and o_custkey = c_custkey 
> and c_nationkey = n1.n_nationkey and n1.n_regionkey = r_regionkey and r_name 
> = 'ASIA' and s_nationkey = n2.n_nationkey and o_orderdate between date 
> '1995-01-01' and date '1996-12-31' and p_type = 'LARGE BRUSHED BRASS') as 
> all_nations group by o_year order by o_year
> {code}
>  
>  



--
This message was 

[jira] [Commented] (DRILL-8479) mergejoin memory leak on exception

2024-03-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17823597#comment-17823597
 ] 

ASF GitHub Bot commented on DRILL-8479:
---

shfshihuafeng commented on code in PR #2878:
URL: https://github.com/apache/drill/pull/2878#discussion_r1512773128


##
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/MergeJoinBatch.java:
##
@@ -297,7 +297,14 @@ public void close() {
   batchMemoryManager.getAvgOutputRowWidth(), 
batchMemoryManager.getTotalOutputRecords());
 
 super.close();
-leftIterator.close();
+try {
+  leftIterator.close();
+} catch (Exception e) {

Review Comment:
   @cgivre 
   In my test case it throws a "QueryCancelledException" because some minor 
fragment threw an OutOfMemoryException and informed the foreman of the failure.
   
   The foreman then sends "QueryCancel" commands to the other minor fragments; 
QueryCancelledException is thrown after the method "incoming.next()" calls 
checkContinue().
   
   Although the "checkContinue" phase throws a fixed "QueryCancelledException" 
message, I am not sure what is causing it (in my test case an 
OutOfMemoryException caused the exception).
   
   
```
public void clearInflightBatches() {
  while (lastOutcome == IterOutcome.OK || lastOutcome == IterOutcome.OK_NEW_SCHEMA) {
    // Clear all buffers from incoming.
    for (VectorWrapper<?> wrapper : incoming) {
      wrapper.getValueVector().clear();
    }
    lastOutcome = incoming.next();
  }
}

public void checkContinue() {
  if (!shouldContinue()) {
    throw new QueryCancelledException();
  }
}
```
   
   **stack**
   
   ```
   Caused by: org.apache.drill.exec.ops.QueryCancelledException: null
   at 
org.apache.drill.exec.work.fragment.FragmentExecutor$ExecutorStateImpl.checkContinue(FragmentExecutor.java:533)
   at 
org.apache.drill.exec.record.AbstractRecordBatch.checkContinue(AbstractRecordBatch.java:278)
   at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:105)
   at 
org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:59)
   at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:165)
   at 
org.apache.drill.exec.record.RecordIterator.clearInflightBatches(RecordIterator.java:359)
   at 
org.apache.drill.exec.record.RecordIterator.close(RecordIterator.java:365)
   at 
org.apache.drill.exec.physical.impl.join.MergeJoinBatch.close(MergeJoinBatch.java:301)
   ```





> mergejoin memory leak on exception
> -
>
> Key: DRILL-8479
> URL: https://issues.apache.org/jira/browse/DRILL-8479
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.21.1
>Reporter: shihuafeng
>Priority: Critical
> Attachments: 0001-mergejoin-leak.patch
>
>
> *Describe the bug*
> mergejoin leaks when RecordIterator hits an OutOfMemoryException while 
> allocating memory.
> *Steps to reproduce the behavior*:
>  # prepare data for tpch 1s
>  # set direct memory 5g
>  # set planner.enable_hashjoin = false to ensure the mergejoin operator is used.
>  # set drill.memory.debug.allocator = true (check for memory leaks)
>  # 20 concurrent for tpch sql8
>  # when it had an OutOfMemoryException or a null exception, stop all sql.
>  # find the memory leak
> *Expected behavior*
>       when all sql stop, direct memory should be 0 and we should not 
> find a leak log like the following.
> {code:java}
> Allocator(op:2:0:11:MergeJoinPOP) 100/73728/4874240/100 
> (res/actual/peak/limit){code}
> *Error detail, log output or screenshots*
> {code:java}
> Unable to allocate buffer of size XX (rounded from XX) due to memory limit 
> (). Current allocation: xx{code}
> [^0001-mergejoin-leak.patch]
> sql 
> {code:java}
> // code placeholder
> select o_year, sum(case when nation = 'CHINA' then volume else 0 end) / 
> sum(volume) as mkt_share from ( select extract(year from o_orderdate) as 
> o_year, l_extendedprice * 1.0 as volume, n2.n_name as nation from 
> hive.tpch1s.part, hive.tpch1s.supplier, hive.tpch1s.lineitem, 
> hive.tpch1s.orders, hive.tpch1s.customer, hive.tpch1s.nation n1, 
> hive.tpch1s.nation n2, hive.tpch1s.region where p_partkey = l_partkey and 
> s_suppkey = l_suppkey and l_orderkey = o_orderkey and o_custkey = c_custkey 
> and c_nationkey = n1.n_nationkey and n1.n_regionkey = r_regionkey and r_name 
> = 'ASIA' and s_nationkey = n2.n_nationkey and o_orderdate between date 
> '1995-01-01' and date '1996-12-31' and p_type = 'LARGE BRUSHED BRASS') as 
> all_nations group by o_year order by o_year
> {code}
>  
>  



--
This message 

[jira] [Commented] (DRILL-8479) mergejoin memory leak on exception

2024-03-04 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17823405#comment-17823405
 ] 

ASF GitHub Bot commented on DRILL-8479:
---

cgivre commented on code in PR #2878:
URL: https://github.com/apache/drill/pull/2878#discussion_r1512093859


##
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/MergeJoinBatch.java:
##
@@ -297,7 +297,14 @@ public void close() {
   batchMemoryManager.getAvgOutputRowWidth(), 
batchMemoryManager.getTotalOutputRecords());
 
 super.close();
-leftIterator.close();
+try {
+  leftIterator.close();
+} catch (Exception e) {

Review Comment:
   Do we know what kind(s) of exceptions to expect here? Also, can we throw a 
better error message? Specifically, can we tell the user more information 
about the cause of the crash and how to fix it?





> mergejoin memory leak on exception
> -
>
> Key: DRILL-8479
> URL: https://issues.apache.org/jira/browse/DRILL-8479
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.21.1
>Reporter: shihuafeng
>Priority: Critical
> Attachments: 0001-mergejoin-leak.patch
>
>
> *Describe the bug*
> mergejoin leaks when RecordIterator hits an OutOfMemoryException while 
> allocating memory.
> *Steps to reproduce the behavior*:
>  # prepare data for tpch 1s
>  # set direct memory 5g
>  # set planner.enable_hashjoin = false to ensure the mergejoin operator is used.
>  # set drill.memory.debug.allocator = true (check for memory leaks)
>  # 20 concurrent for tpch sql8
>  # when it had an OutOfMemoryException or a null exception, stop all sql.
>  # find the memory leak
> *Expected behavior*
>       when all sql stop, direct memory should be 0 and we should not 
> find a leak log like the following.
> {code:java}
> Allocator(op:2:0:11:MergeJoinPOP) 100/73728/4874240/100 
> (res/actual/peak/limit){code}
> *Error detail, log output or screenshots*
> {code:java}
> Unable to allocate buffer of size XX (rounded from XX) due to memory limit 
> (). Current allocation: xx{code}
> [^0001-mergejoin-leak.patch]
> sql 
> {code:java}
> // code placeholder
> select o_year, sum(case when nation = 'CHINA' then volume else 0 end) / 
> sum(volume) as mkt_share from ( select extract(year from o_orderdate) as 
> o_year, l_extendedprice * 1.0 as volume, n2.n_name as nation from 
> hive.tpch1s.part, hive.tpch1s.supplier, hive.tpch1s.lineitem, 
> hive.tpch1s.orders, hive.tpch1s.customer, hive.tpch1s.nation n1, 
> hive.tpch1s.nation n2, hive.tpch1s.region where p_partkey = l_partkey and 
> s_suppkey = l_suppkey and l_orderkey = o_orderkey and o_custkey = c_custkey 
> and c_nationkey = n1.n_nationkey and n1.n_regionkey = r_regionkey and r_name 
> = 'ASIA' and s_nationkey = n2.n_nationkey and o_orderdate between date 
> '1995-01-01' and date '1996-12-31' and p_type = 'LARGE BRUSHED BRASS') as 
> all_nations group by o_year order by o_year
> {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8482) Assign region throws an exception when some regions are deployed on affinity nodes and some on non-affinity nodes

2024-03-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17822878#comment-17822878
 ] 

ASF GitHub Bot commented on DRILL-8482:
---

cgivre merged PR #2885:
URL: https://github.com/apache/drill/pull/2885




> Assign region throws an exception when some regions are deployed on affinity 
> nodes and some on non-affinity nodes
> -
>
> Key: DRILL-8482
> URL: https://issues.apache.org/jira/browse/DRILL-8482
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - HBase
>Affects Versions: 1.21.1
>Reporter: shihuafeng
>Priority: Major
> Fix For: 1.22.0
>
> Attachments: 
> 0001-DRILL-8482-Assign-region-throw-exception-when-some-r.patch
>
>
> [^0001-DRILL-8482-Assign-region-throw-exception-when-some-r.patch]
> *Describe the bug*
>    Assign region throws an exception when some regions are deployed on 
> affinity nodes and some on non-affinity nodes.
> *To Reproduce*
> Steps to reproduce the behavior:
>  # 
> {code:java}
> NavigableMap<HRegionInfo, ServerName> regionsToScan = Maps.newTreeMap();
> regionsToScan.put(new HRegionInfo(TABLE_NAME, splits[0], splits[1]), SERVER_A);
> regionsToScan.put(new HRegionInfo(TABLE_NAME, splits[1], splits[2]), SERVER_A);
> regionsToScan.put(new HRegionInfo(TABLE_NAME, splits[2], splits[3]), SERVER_B);
> regionsToScan.put(new HRegionInfo(TABLE_NAME, splits[3], splits[4]), SERVER_B);
> regionsToScan.put(new HRegionInfo(TABLE_NAME, splits[6], splits[7]), SERVER_D);
> regionsToScan.put(new HRegionInfo(TABLE_NAME, splits[7], splits[8]), SERVER_D);
> regionsToScan.put(new HRegionInfo(TABLE_NAME, splits[8], splits[9]), SERVER_D);
> regionsToScan.put(new HRegionInfo(TABLE_NAME, splits[9], splits[10]), SERVER_D);
> final List<DrillbitEndpoint> endpoints = Lists.newArrayList();
> endpoints.add(DrillbitEndpoint.newBuilder().setAddress(HOST_A).setControlPort(1234).build());
> endpoints.add(DrillbitEndpoint.newBuilder().setAddress(HOST_B).setControlPort(1234).build());
> endpoints.add(DrillbitEndpoint.newBuilder().setAddress(HOST_C).setControlPort(1234).build());
> HBaseGroupScan scan = new HBaseGroupScan();
> scan.setRegionsToScan(regionsToScan);
> scan.setHBaseScanSpec(new HBaseScanSpec(TABLE_NAME_STR, splits[0], splits[0], null));
> scan.applyAssignments(endpoints);{code}
> *Expected behavior*
>  A has 3 regions
>  B has 2 regions
>  C has 3 regions
> *Error detail, log output or screenshots*
> {code:java}
> Caused by: java.lang.NullPointerException: null
>         at 
> org.apache.drill.exec.store.hbase.HBaseGroupScan.applyAssignments(HBaseGroupScan.java:283){code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8482) Assign region throw exception when some region is deployed on affinity node and some on non-affinity node

2024-03-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17822730#comment-17822730
 ] 

ASF GitHub Bot commented on DRILL-8482:
---

shfshihuafeng commented on PR #2885:
URL: https://github.com/apache/drill/pull/2885#issuecomment-1974124079

   @cgivre Yes. When the HBase regions are distributed as follows and you run 
select * from the table, we do not get a result.
   
   ```
   NavigableMap<HRegionInfo, ServerName> regionsToScan = Maps.newTreeMap();
   regionsToScan.put(new HRegionInfo(TABLE_NAME, splits[0], splits[1]), 
SERVER_A);
   regionsToScan.put(new HRegionInfo(TABLE_NAME, splits[1], splits[2]), 
SERVER_A);
   regionsToScan.put(new HRegionInfo(TABLE_NAME, splits[2], splits[3]), 
SERVER_B);
   regionsToScan.put(new HRegionInfo(TABLE_NAME, splits[3], splits[4]), 
SERVER_B);
   regionsToScan.put(new HRegionInfo(TABLE_NAME, splits[6], splits[7]), 
SERVER_D);
   regionsToScan.put(new HRegionInfo(TABLE_NAME, splits[7], splits[8]), 
SERVER_D);
   regionsToScan.put(new HRegionInfo(TABLE_NAME, splits[8], splits[9]), 
SERVER_D);
   regionsToScan.put(new HRegionInfo(TABLE_NAME, splits[9], splits[10]), 
SERVER_D);
   final List<DrillbitEndpoint> endpoints = Lists.newArrayList();
   
endpoints.add(DrillbitEndpoint.newBuilder().setAddress(HOST_A).setControlPort(1234).build());
   
endpoints.add(DrillbitEndpoint.newBuilder().setAddress(HOST_B).setControlPort(1234).build());
   
endpoints.add(DrillbitEndpoint.newBuilder().setAddress(HOST_C).setControlPort(1234).build());
   ```




> Assign region throw exception when some region is deployed on affinity node 
> and some on non-affinity node
> -
>
> Key: DRILL-8482
> URL: https://issues.apache.org/jira/browse/DRILL-8482
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - HBase
>Affects Versions: 1.21.1
>Reporter: shihuafeng
>Priority: Major
> Fix For: 1.22.0
>
> Attachments: 
> 0001-DRILL-8482-Assign-region-throw-exception-when-some-r.patch
>
>
> *[^0001-DRILL-8482-Assign-region-throw-exception-when-some-r.patch]Describe 
> the bug*
>    Assigning regions throws an exception when some regions are deployed on an 
> affinity node and some on a non-affinity node.
> *To Reproduce*
> Steps to reproduce the behavior:
>  # 
> {code:java}
> NavigableMap<HRegionInfo, ServerName> regionsToScan = Maps.newTreeMap();
> regionsToScan.put(new HRegionInfo(TABLE_NAME, splits[0], splits[1]), 
> SERVER_A);
> regionsToScan.put(new HRegionInfo(TABLE_NAME, splits[1], splits[2]), 
> SERVER_A);
> regionsToScan.put(new HRegionInfo(TABLE_NAME, splits[2], splits[3]), 
> SERVER_B);
> regionsToScan.put(new HRegionInfo(TABLE_NAME, splits[3], splits[4]), 
> SERVER_B);
> regionsToScan.put(new HRegionInfo(TABLE_NAME, splits[6], splits[7]), 
> SERVER_D);
> regionsToScan.put(new HRegionInfo(TABLE_NAME, splits[7], splits[8]), 
> SERVER_D);
> regionsToScan.put(new HRegionInfo(TABLE_NAME, splits[8], splits[9]), 
> SERVER_D);
> regionsToScan.put(new HRegionInfo(TABLE_NAME, splits[9], splits[10]), 
> SERVER_D);
> final List<DrillbitEndpoint> endpoints = Lists.newArrayList();
> endpoints.add(DrillbitEndpoint.newBuilder().setAddress(HOST_A).setControlPort(1234).build());
> endpoints.add(DrillbitEndpoint.newBuilder().setAddress(HOST_B).setControlPort(1234).build());
> endpoints.add(DrillbitEndpoint.newBuilder().setAddress(HOST_C).setControlPort(1234).build());
> HBaseGroupScan scan = new HBaseGroupScan();
> scan.setRegionsToScan(regionsToScan);
> scan.setHBaseScanSpec(new HBaseScanSpec(TABLE_NAME_STR, splits[0], splits[0], 
> null));
> scan.applyAssignments(endpoints);{code}
> *Expected behavior*
>  A has 3 regions
>  B has 2 regions
>  C has 3 regions
> *Error detail, log output or screenshots*
> {code:java}
> Caused by: java.lang.NullPointerException: null
>         at 
> org.apache.drill.exec.store.hbase.HBaseGroupScan.applyAssignments(HBaseGroupScan.java:283){code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8482) Assign region throw exception when some region is deployed on affinity node and some on non-affinity node

2024-03-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17822598#comment-17822598
 ] 

ASF GitHub Bot commented on DRILL-8482:
---

cgivre commented on PR #2885:
URL: https://github.com/apache/drill/pull/2885#issuecomment-1973318466

   @shfshihuafeng Is this a bug?




> Assign region throw exception when some region is deployed on affinity node 
> and some on non-affinity node
> -
>
> Key: DRILL-8482
> URL: https://issues.apache.org/jira/browse/DRILL-8482
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - HBase
>Affects Versions: 1.21.1
>Reporter: shihuafeng
>Priority: Major
> Fix For: 1.22.0
>
> Attachments: 
> 0001-DRILL-8482-Assign-region-throw-exception-when-some-r.patch
>
>
> *[^0001-DRILL-8482-Assign-region-throw-exception-when-some-r.patch]Describe 
> the bug*
>    Assigning regions throws an exception when some regions are deployed on an 
> affinity node and some on a non-affinity node.
> *To Reproduce*
> Steps to reproduce the behavior:
>  # 
> {code:java}
> NavigableMap<HRegionInfo, ServerName> regionsToScan = Maps.newTreeMap();
> regionsToScan.put(new HRegionInfo(TABLE_NAME, splits[0], splits[1]), 
> SERVER_A);
> regionsToScan.put(new HRegionInfo(TABLE_NAME, splits[1], splits[2]), 
> SERVER_A);
> regionsToScan.put(new HRegionInfo(TABLE_NAME, splits[2], splits[3]), 
> SERVER_B);
> regionsToScan.put(new HRegionInfo(TABLE_NAME, splits[3], splits[4]), 
> SERVER_B);
> regionsToScan.put(new HRegionInfo(TABLE_NAME, splits[6], splits[7]), 
> SERVER_D);
> regionsToScan.put(new HRegionInfo(TABLE_NAME, splits[7], splits[8]), 
> SERVER_D);
> regionsToScan.put(new HRegionInfo(TABLE_NAME, splits[8], splits[9]), 
> SERVER_D);
> regionsToScan.put(new HRegionInfo(TABLE_NAME, splits[9], splits[10]), 
> SERVER_D);
> final List<DrillbitEndpoint> endpoints = Lists.newArrayList();
> endpoints.add(DrillbitEndpoint.newBuilder().setAddress(HOST_A).setControlPort(1234).build());
> endpoints.add(DrillbitEndpoint.newBuilder().setAddress(HOST_B).setControlPort(1234).build());
> endpoints.add(DrillbitEndpoint.newBuilder().setAddress(HOST_C).setControlPort(1234).build());
> HBaseGroupScan scan = new HBaseGroupScan();
> scan.setRegionsToScan(regionsToScan);
> scan.setHBaseScanSpec(new HBaseScanSpec(TABLE_NAME_STR, splits[0], splits[0], 
> null));
> scan.applyAssignments(endpoints);{code}
> *Expected behavior*
>  A has 3 regions
>  B has 2 regions
>  C has 3 regions
> *Error detail, log output or screenshots*
> {code:java}
> Caused by: java.lang.NullPointerException: null
>         at 
> org.apache.drill.exec.store.hbase.HBaseGroupScan.applyAssignments(HBaseGroupScan.java:283){code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8482) Assign region throw exception when some region is deployed on affinity node and some on non-affinity node

2024-03-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17822459#comment-17822459
 ] 

ASF GitHub Bot commented on DRILL-8482:
---

shfshihuafeng opened a new pull request, #2885:
URL: https://github.com/apache/drill/pull/2885

   … on affinity node and some on non-affinity node
   
   # [DRILL-8482](https://issues.apache.org/jira/browse/DRILL-8482): 
   
   Assign region throw exception when some region is deployed on affinity node 
and some on non-affinity node
   
   ## Description
   
Assigning regions throws an exception when some regions are deployed on an 
affinity node and some on a non-affinity node.
   
   ## Documentation
   (Please describe user-visible changes similar to what should appear in the 
Drill documentation.)
   
   ## Testing
 
   Refer to the unit test case 
TestHBaseRegionScanAssignments#testHBaseGroupScanAssignmentSomeAfinedAndSomeWithOrphans




> Assign region throw exception when some region is deployed on affinity node 
> and some on non-affinity node
> -
>
> Key: DRILL-8482
> URL: https://issues.apache.org/jira/browse/DRILL-8482
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - HBase
>Affects Versions: 1.21.1
>Reporter: shihuafeng
>Priority: Major
> Fix For: 1.22.0
>
> Attachments: 
> 0001-DRILL-8482-Assign-region-throw-exception-when-some-r.patch
>
>
> *[^0001-DRILL-8482-Assign-region-throw-exception-when-some-r.patch]Describe 
> the bug*
>    Assigning regions throws an exception when some regions are deployed on an 
> affinity node and some on a non-affinity node.
> *To Reproduce*
> Steps to reproduce the behavior:
>  # 
> {code:java}
> NavigableMap<HRegionInfo, ServerName> regionsToScan = Maps.newTreeMap();
> regionsToScan.put(new HRegionInfo(TABLE_NAME, splits[0], splits[1]), 
> SERVER_A);
> regionsToScan.put(new HRegionInfo(TABLE_NAME, splits[1], splits[2]), 
> SERVER_A);
> regionsToScan.put(new HRegionInfo(TABLE_NAME, splits[2], splits[3]), 
> SERVER_B);
> regionsToScan.put(new HRegionInfo(TABLE_NAME, splits[3], splits[4]), 
> SERVER_B);
> regionsToScan.put(new HRegionInfo(TABLE_NAME, splits[6], splits[7]), 
> SERVER_D);
> regionsToScan.put(new HRegionInfo(TABLE_NAME, splits[7], splits[8]), 
> SERVER_D);
> regionsToScan.put(new HRegionInfo(TABLE_NAME, splits[8], splits[9]), 
> SERVER_D);
> regionsToScan.put(new HRegionInfo(TABLE_NAME, splits[9], splits[10]), 
> SERVER_D);
> final List<DrillbitEndpoint> endpoints = Lists.newArrayList();
> endpoints.add(DrillbitEndpoint.newBuilder().setAddress(HOST_A).setControlPort(1234).build());
> endpoints.add(DrillbitEndpoint.newBuilder().setAddress(HOST_B).setControlPort(1234).build());
> endpoints.add(DrillbitEndpoint.newBuilder().setAddress(HOST_C).setControlPort(1234).build());
> HBaseGroupScan scan = new HBaseGroupScan();
> scan.setRegionsToScan(regionsToScan);
> scan.setHBaseScanSpec(new HBaseScanSpec(TABLE_NAME_STR, splits[0], splits[0], 
> null));
> scan.applyAssignments(endpoints);{code}
> *Expected behavior*
>  A has 3 regions
>  B has 2 regions
>  C has 3 regions
> *Error detail, log output or screenshots*
> {code:java}
> Caused by: java.lang.NullPointerException: null
>         at 
> org.apache.drill.exec.store.hbase.HBaseGroupScan.applyAssignments(HBaseGroupScan.java:283){code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8475) Update the binary distributions LICENSE

2024-02-26 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820931#comment-17820931
 ] 

ASF GitHub Bot commented on DRILL-8475:
---

cgivre merged PR #2879:
URL: https://github.com/apache/drill/pull/2879




> Update the binary distributions LICENSE
> ---
>
> Key: DRILL-8475
> URL: https://issues.apache.org/jira/browse/DRILL-8475
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.21.1
>Reporter: Calvin Kirs
>Assignee: James Turton
>Priority: Blocker
> Fix For: 1.21.2
>
> Attachments: dependencies.txt, drill-dep-list.txt
>
>
> I checked the latest released version, and it does not follow the 
> corresponding rules[1]. This is very important and I hope it will be taken 
> seriously by the PMC team. I'd be happy to do it if needed.
> [1] [https://infra.apache.org/licensing-howto.html#binary]
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8475) Update the binary distributions LICENSE

2024-02-22 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819870#comment-17819870
 ] 

ASF GitHub Bot commented on DRILL-8475:
---

cgivre commented on PR #2879:
URL: https://github.com/apache/drill/pull/2879#issuecomment-1960638549

   @jnturton Are we close to merging this?




> Update the binary distributions LICENSE
> ---
>
> Key: DRILL-8475
> URL: https://issues.apache.org/jira/browse/DRILL-8475
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.21.1
>Reporter: Calvin Kirs
>Assignee: James Turton
>Priority: Blocker
> Fix For: 1.21.2
>
> Attachments: dependencies.txt, drill-dep-list.txt
>
>
> I checked the latest released version, and it does not follow the 
> corresponding rules[1]. This is very important and I hope it will be taken 
> seriously by the PMC team. I'd be happy to do it if needed.
> [1] [https://infra.apache.org/licensing-howto.html#binary]
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8475) Update the binary distributions LICENSE

2024-02-04 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17814095#comment-17814095
 ] 

ASF GitHub Bot commented on DRILL-8475:
---

jnturton commented on PR #2879:
URL: https://github.com/apache/drill/pull/2879#issuecomment-1925767446

   TODO: determine whether too much has been pruned from the JDBC driver, 
specifically libraries related to Kerberos.




> Update the binary distributions LICENSE
> ---
>
> Key: DRILL-8475
> URL: https://issues.apache.org/jira/browse/DRILL-8475
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.21.1
>Reporter: Calvin Kirs
>Assignee: James Turton
>Priority: Blocker
> Fix For: 1.21.2
>
> Attachments: dependencies.txt, drill-dep-list.txt
>
>
> I checked the latest released version, and it does not follow the 
> corresponding rules[1]. This is very important and I hope it will be taken 
> seriously by the PMC team. I'd be happy to do it if needed.
> [1] [https://infra.apache.org/licensing-howto.html#binary]
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8475) The binary version License and NOTICE do not comply with the corresponding terms.

2024-02-04 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17814055#comment-17814055
 ] 

ASF GitHub Bot commented on DRILL-8475:
---

jnturton opened a new pull request, #2879:
URL: https://github.com/apache/drill/pull/2879

   # [DRILL-8475](https://issues.apache.org/jira/browse/DRILL-8475): Update the 
binary dist LICENSE
   
   ## Description
   
   The LICENSE file included in the binary distributions of Drill becomes an 
artifact that is generated automatically by the 
org.codehaus.mojo:license-maven-plugin (and so is no longer part of the Git 
source tree). Dependencies that it cannot detect are kept in the 
LICENSE-base.txt file which is combined with the generated license notices by a 
new Freemarker template. Various other dependency related changes are included 
as part of this work. It is still possible that fat jars have introduced hidden 
dependencies, but I propose that those are analysed in a subsequent Jira issue.
   
   ## Documentation
   Comments and updated dev docs.
   
   ## Testing
   Comparison of the jars/ directory of a Drill build against the generated 
LICENSE file to check that every bundled jar has a license notice in LICENSE.
   




> The binary version License and NOTICE do not comply with the corresponding 
> terms.
> -
>
> Key: DRILL-8475
> URL: https://issues.apache.org/jira/browse/DRILL-8475
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.21.1
>Reporter: Calvin Kirs
>Assignee: James Turton
>Priority: Blocker
> Fix For: 1.21.2
>
> Attachments: dependencies.txt, drill-dep-list.txt
>
>
> I checked the latest released version, and it does not follow the 
> corresponding rules[1]. This is very important and I hope it will be taken 
> seriously by the PMC team. I'd be happy to do it if needed.
> [1] [https://infra.apache.org/licensing-howto.html#binary]
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8479) mergejoin memory leak when exception

2024-01-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17810231#comment-17810231
 ] 

ASF GitHub Bot commented on DRILL-8479:
---

shfshihuafeng opened a new pull request, #2878:
URL: https://github.com/apache/drill/pull/2878

   … (#2876)
   
   # [DRILL-8479](https://issues.apache.org/jira/browse/DRILL-8479): mergejoin 
leak when depleting incoming batches throws an exception
   
   ## Description
   
   When a fragment fails, close() is called on MergeJoinBatch, but if 
leftIterator.close() throws an exception, rightIterator.close() is never 
called, so its memory is not released.
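   
   As a minimal sketch of the fix idea (hypothetical field names, not the 
actual patch), a try/finally guarantees that the right iterator is closed even 
when the left close() throws:
   
   ```java
   // Close both iterators even if the first close() throws, so the right
   // side's buffers are still released. (Field names are illustrative.)
   @Override
   public void close() {
     try {
       leftIterator.close();
     } finally {
       rightIterator.close();
     }
   }
   ```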
   
   ## Documentation
   (Please describe user-visible changes similar to what should appear in the 
Drill documentation.)
   
   ## Testing
   
   The test method is the same as in the linked PR; only one parameter needs to 
be modified: set planner.enable_hashjoin = false to ensure the mergejoin 
operator is used (see the SQL below).
   [](https://github.com/apache/drill/pull/2875)
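   
   For reference, the session option mentioned above is set with standard Drill 
SQL before running the TPC-H query:
   
   ```sql
   -- force the merge join operator for this session
   ALTER SESSION SET `planner.enable_hashjoin` = false;
   ```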
   
   




> mergejoin memory leak when exception
> -
>
> Key: DRILL-8479
> URL: https://issues.apache.org/jira/browse/DRILL-8479
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.21.1
>Reporter: shihuafeng
>Priority: Critical
> Attachments: 0001-mergejoin-leak.patch
>
>
> *Describe the bug*
> mergejoin leaks when RecordIterator hits an allocation failure with an
> OutOfMemoryException
> *Steps to reproduce the behavior*:
>  # prepare data for tpch 1s
>  # set direct memory 5g
>  # set planner.enable_hashjoin = false to ensure the mergejoin operator is used
>  # set drill.memory.debug.allocator = true (to check for memory leaks)
>  # run 20 concurrent tpch sql8 queries
>  # when it hit an OutOfMemoryException or a null EXCEPTION, stop all SQL
>  # check for memory leaks
> *Expected behavior*
>       once all SQL statements have stopped, direct memory should be 0 and we
> should not find leak log lines like the following.
> {code:java}
> Allocator(op:2:0:11:MergeJoinPOP) 100/73728/4874240/100 
> (res/actual/peak/limit){code}
> *Error detail, log output or screenshots*
> {code:java}
> Unable to allocate buffer of size XX (rounded from XX) due to memory limit 
> (). Current allocation: xx{code}
> [^0001-mergejoin-leak.patch]
> sql 
> {code:java}
> // code placeholder
> select o_year, sum(case when nation = 'CHINA' then volume else 0 end) / 
> sum(volume) as mkt_share from ( select extract(year from o_orderdate) as 
> o_year, l_extendedprice * 1.0 as volume, n2.n_name as nation from 
> hive.tpch1s.part, hive.tpch1s.supplier, hive.tpch1s.lineitem, 
> hive.tpch1s.orders, hive.tpch1s.customer, hive.tpch1s.nation n1, 
> hive.tpch1s.nation n2, hive.tpch1s.region where p_partkey = l_partkey and 
> s_suppkey = l_suppkey and l_orderkey = o_orderkey and o_custkey = c_custkey 
> and c_nationkey = n1.n_nationkey and n1.n_regionkey = r_regionkey and r_name 
> = 'ASIA' and s_nationkey = n2.n_nationkey and o_orderdate between date 
> '1995-01-01' and date '1996-12-31' and p_type = 'LARGE BRUSHED BRASS') as 
> all_nations group by o_year order by o_year
> {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8478) HashPartition memory leak when exception

2024-01-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17810171#comment-17810171
 ] 

ASF GitHub Bot commented on DRILL-8478:
---

shfshihuafeng commented on code in PR #2875:
URL: https://github.com/apache/drill/pull/2875#discussion_r1464222619


##
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/AbstractHashBinaryRecordBatch.java:
##
@@ -1312,7 +1312,9 @@ private void cleanup() {
 }
 // clean (and deallocate) each partition, and delete its spill file
 for (HashPartition partn : partitions) {
-  partn.close();
+  if (partn != null) {
+partn.close();
+  }

Review Comment:
The (partn != null) check is necessary; see the comment above on fix idea 1.





> HashPartition memory leak when  exception
> -
>
> Key: DRILL-8478
> URL: https://issues.apache.org/jira/browse/DRILL-8478
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.21.1
>Reporter: shihuafeng
>Priority: Major
> Fix For: 1.21.2
>
> Attachments: 
> 0001-DRILL-8478.-HashPartition-memory-leak-when-it-alloca.patch
>
>
> *Describe the bug*
> HashPartition leaks when a memory allocation fails with an OutOfMemoryException
> *To Reproduce*
> Steps to reproduce the behavior:
>  # prepare data for tpch 1s
>  # run 20 concurrent tpch sql8 queries
>  # set direct memory 5g
>  # when it hit an OutOfMemoryException, stop all SQL
>  # check for memory leaks
> *Expected behavior*
> (1) I set \{DRILL_MAX_DIRECT_MEMORY:-"5G"}
> (2) I ran sql8 (SQL detail in Additional context) with 20 concurrent queries
> (3) it hit an OutOfMemoryException when creating the HashPartition
> *Error detail, log output or screenshots*
> Unable to allocate buffer of size 262144 (rounded from 262140) due to memory 
> limit (41943040). Current allocation: 20447232
>  
> sql 
> {code:java}
> // code placeholder
> select o_year, sum(case when nation = 'CHINA' then volume else 0 end) / 
> sum(volume) as mkt_share from ( select extract(year from o_orderdate) as 
> o_year, l_extendedprice * 1.0 as volume, n2.n_name as nation from 
> hive.tpch1s.part, hive.tpch1s.supplier, hive.tpch1s.lineitem, 
> hive.tpch1s.orders, hive.tpch1s.customer, hive.tpch1s.nation n1, 
> hive.tpch1s.nation n2, hive.tpch1s.region where p_partkey = l_partkey and 
> s_suppkey = l_suppkey and l_orderkey = o_orderkey and o_custkey = c_custkey 
> and c_nationkey = n1.n_nationkey and n1.n_regionkey = r_regionkey and r_name 
> = 'ASIA' and s_nationkey = n2.n_nationkey and o_orderdate between date 
> '1995-01-01' and date '1996-12-31' and p_type = 'LARGE BRUSHED BRASS') as 
> all_nations group by o_year order by o_year
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8478) HashPartition memory leak when exception

2024-01-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17810170#comment-17810170
 ] 

ASF GitHub Bot commented on DRILL-8478:
---

shfshihuafeng commented on code in PR #2875:
URL: https://github.com/apache/drill/pull/2875#discussion_r1464211148


##
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/HashPartition.java:
##
@@ -157,11 +162,11 @@ public HashPartition(FragmentContext context, 
BufferAllocator allocator, Chained
 .build(logger);
 } catch (SchemaChangeException sce) {
   throw new IllegalStateException("Unexpected Schema Change while creating 
a hash table",sce);
-}
-this.hjHelper = semiJoin ? null : new HashJoinHelper(context, allocator);
-tmpBatchesList = new ArrayList<>();
-if (numPartitions > 1) {
-  allocateNewCurrentBatchAndHV();
+} catch (OutOfMemoryException oom) {
+  close();

Review Comment:
   ### 1. fix idea
   The design is that if any operator fails, the entire operator stack is 
closed. But partitions is an array whose slots are initialized to null; if a 
HashPartition object is not created successfully, its constructor throws an 
exception, so every slot from the failing index onward stays null.
   
   `  for (int part = 0; part < numPartitions; part++) {
 partitions[part] = new HashPartition(context, allocator, baseHashTable,
 buildBatch, probeBatch, semiJoin, RECORDS_PER_BATCH, spillSet, 
part,
 spilledState.getCycle(), numPartitions);
   }`
   
   For example:
   
   the partitions array length is 32 (numPartitions = 32). If the constructor 
throws at partition 10, partitions[11]-partitions[31] remain null, and the 
object at index 10 failed during construction but had already allocated memory.
   
   When close() is called, the partially-built HashPartition at index 10 is 
never closed, because its slot in the array is null.
   
   
   ### 2. another fix idea
   
   We do not throw the exception and do not call close(), but catch it 
instead. The HashPartition object still gets created, so when close() is 
called later its memory can be released.
   
   ```
   //1. add isException parameter when construct HashPartition object
   
   HashPartition(FragmentContext context, BufferAllocator allocator, 
ChainedHashTable baseHashTable,
  RecordBatch buildBatch, RecordBatch probeBatch, 
boolean semiJoin,
  int recordsPerBatch, SpillSet spillSet, int partNum, 
int cycleNum, int numPartitions , boolean **isException** )
   //2. catch except to ensure  HashPartition object has been created
 } catch (OutOfMemoryException oom) {
//do not call  close ,do  not  throw except
 isException =true;
   }
   //3.deal with exception
   AbstractHashBinaryRecordBatch#initializeBuild
   boolean isException = false;
   try {
 for (int part = 0; part < numPartitions; part++) {
   if (isException) {
 break;
   }
   partitions[part] = new HashPartition(context, allocator, 
baseHashTable,
   buildBatch, probeBatch, semiJoin, RECORDS_PER_BATCH, spillSet, 
part,
   spilledState.getCycle(), numPartitions,**isException** );
 }
   } catch (Exception e) {
 isException = true;
   }
   if (isException ){
 throw UserException.memoryError(exceptions[0])
 .message("Failed to allocate hash partition.")
 .build(logger);
   }
   ```
   





> HashPartition memory leak when  exception
> -
>
> Key: DRILL-8478
> URL: https://issues.apache.org/jira/browse/DRILL-8478
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.21.1
>Reporter: shihuafeng
>Priority: Major
> Fix For: 1.21.2
>
> Attachments: 
> 0001-DRILL-8478.-HashPartition-memory-leak-when-it-alloca.patch
>
>
> *Describe the bug*
> HashPartition leaks when a memory allocation fails with an OutOfMemoryException
> *To Reproduce*
> Steps to reproduce the behavior:
>  # prepare data for tpch 1s
>  # run 20 concurrent tpch sql8 queries
>  # set direct memory 5g
>  # when it hit an OutOfMemoryException, stop all SQL
>  # check for memory leaks
> *Expected behavior*
> (1) I set \{DRILL_MAX_DIRECT_MEMORY:-"5G"}
> (2) I ran sql8 (SQL detail in Additional context) with 20 concurrent queries
> (3) it hit an OutOfMemoryException when creating the HashPartition
> *Error detail, log output or screenshots*
> Unable to allocate buffer of size 262144 (rounded from 262140) due to memory 
> limit (41943040). Current allocation: 20447232
>  
> sql 
> {code:java}
> // code placeholder
> select o_year, sum(case when nation = 'CHINA' then volume else 0 end) / 
> sum(volume) as mkt_share from ( select extract(year from o_orderdate) as 
> o_year, l_extendedprice * 1.0 as volume, n2.n_name as nation from 

[jira] [Commented] (DRILL-8478) HashPartition memory leak when exception

2024-01-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17810168#comment-17810168
 ] 

ASF GitHub Bot commented on DRILL-8478:
---

shfshihuafeng commented on code in PR #2875:
URL: https://github.com/apache/drill/pull/2875#discussion_r1464211148


##
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/HashPartition.java:
##
@@ -157,11 +162,11 @@ public HashPartition(FragmentContext context, 
BufferAllocator allocator, Chained
 .build(logger);
 } catch (SchemaChangeException sce) {
   throw new IllegalStateException("Unexpected Schema Change while creating 
a hash table",sce);
-}
-this.hjHelper = semiJoin ? null : new HashJoinHelper(context, allocator);
-tmpBatchesList = new ArrayList<>();
-if (numPartitions > 1) {
-  allocateNewCurrentBatchAndHV();
+} catch (OutOfMemoryException oom) {
+  close();

Review Comment:
   ### 1. fix idea
   The design is that if any operator fails, the entire operator stack is 
closed. But partitions is an array whose slots are initialized to null; if a 
HashPartition object is not created successfully, its constructor throws an 
exception, so every slot from the failing index onward stays null.
   
   `  for (int part = 0; part < numPartitions; part++) {
 partitions[part] = new HashPartition(context, allocator, baseHashTable,
 buildBatch, probeBatch, semiJoin, RECORDS_PER_BATCH, spillSet, 
part,
 spilledState.getCycle(), numPartitions);
   }`
   
   For example:
   
   the partitions array length is 32 (numPartitions = 32). If the constructor 
throws at partition 10, partitions[11]-partitions[31] remain null, and the 
object at index 10 failed during construction but had already allocated memory.
   
   When close() is called, the partially-built HashPartition at index 10 is 
never closed, because its slot in the array is null.
   
   
   ### 2. another fix idea
   
   We do not throw the exception and do not call close(), but catch it 
instead. The HashPartition object still gets created, so when close() is 
called later its memory can be released.
   
   ```
   //add isException parameter when construct HashPartition object
   
   HashPartition(FragmentContext context, BufferAllocator allocator, 
ChainedHashTable baseHashTable,
  RecordBatch buildBatch, RecordBatch probeBatch, 
boolean semiJoin,
  int recordsPerBatch, SpillSet spillSet, int partNum, 
int cycleNum, int numPartitions , boolean **isException** )
   
 } catch (OutOfMemoryException oom) {
//do not call  close ,do  not  throw except
 isException =true;
   }
   
   AbstractHashBinaryRecordBatch#initializeBuild
   boolean isException = false;
   try {
 for (int part = 0; part < numPartitions; part++) {
   if (isException) {
 break;
   }
   partitions[part] = new HashPartition(context, allocator, 
baseHashTable,
   buildBatch, probeBatch, semiJoin, RECORDS_PER_BATCH, spillSet, 
part,
   spilledState.getCycle(), numPartitions,**isException** );
 }
   } catch (Exception e) {
 isException = true;
   }
   if (isException ){
 throw UserException.memoryError(exceptions[0])
 .message("Failed to allocate hash partition.")
 .build(logger);
   }
   ```
   





> HashPartition memory leak when  exception
> -
>
> Key: DRILL-8478
> URL: https://issues.apache.org/jira/browse/DRILL-8478
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.21.1
>Reporter: shihuafeng
>Priority: Major
> Fix For: 1.21.2
>
> Attachments: 
> 0001-DRILL-8478.-HashPartition-memory-leak-when-it-alloca.patch
>
>
> *Describe the bug*
> HashPartition leaks when a memory allocation fails with an OutOfMemoryException
> *To Reproduce*
> Steps to reproduce the behavior:
>  # prepare data for tpch 1s
>  # run 20 concurrent tpch sql8 queries
>  # set direct memory 5g
>  # when it hit an OutOfMemoryException, stop all SQL
>  # check for memory leaks
> *Expected behavior*
> (1) I set \{DRILL_MAX_DIRECT_MEMORY:-"5G"}
> (2) I ran sql8 (SQL detail in Additional context) with 20 concurrent queries
> (3) it hit an OutOfMemoryException when creating the HashPartition
> *Error detail, log output or screenshots*
> Unable to allocate buffer of size 262144 (rounded from 262140) due to memory 
> limit (41943040). Current allocation: 20447232
>  
> sql 
> {code:java}
> // code placeholder
> select o_year, sum(case when nation = 'CHINA' then volume else 0 end) / 
> sum(volume) as mkt_share from ( select extract(year from o_orderdate) as 
> o_year, l_extendedprice * 1.0 as volume, n2.n_name as nation from 
> hive.tpch1s.part, hive.tpch1s.supplier, hive.tpch1s.lineitem, 
> hive.tpch1s.orders, 

[jira] [Commented] (DRILL-8478) HashPartition memory leak when exception

2024-01-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17810167#comment-17810167
 ] 

ASF GitHub Bot commented on DRILL-8478:
---

shfshihuafeng commented on code in PR #2875:
URL: https://github.com/apache/drill/pull/2875#discussion_r1464211148


##
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/HashPartition.java:
##
@@ -157,11 +162,11 @@ public HashPartition(FragmentContext context, 
BufferAllocator allocator, Chained
 .build(logger);
 } catch (SchemaChangeException sce) {
   throw new IllegalStateException("Unexpected Schema Change while creating 
a hash table",sce);
-}
-this.hjHelper = semiJoin ? null : new HashJoinHelper(context, allocator);
-tmpBatchesList = new ArrayList<>();
-if (numPartitions > 1) {
-  allocateNewCurrentBatchAndHV();
+} catch (OutOfMemoryException oom) {
+  close();

Review Comment:
   ### 1. fix idea
   The design is that if any operator fails, the entire operator stack is 
closed. But partitions is an array whose slots are initialized to null; if a 
HashPartition object is not created successfully, its constructor throws an 
exception, so every slot from the failing index onward stays null.
   
   `  for (int part = 0; part < numPartitions; part++) {
 partitions[part] = new HashPartition(context, allocator, baseHashTable,
 buildBatch, probeBatch, semiJoin, RECORDS_PER_BATCH, spillSet, 
part,
 spilledState.getCycle(), numPartitions);
   }`
   
   For example:
   
   the partitions array length is 32 (numPartitions = 32). If the constructor 
throws at partition 10, partitions[11]-partitions[31] remain null, and the 
object at index 10 failed during construction but had already allocated memory.
   
   When close() is called, the partially-built HashPartition at index 10 is 
never closed, because its slot in the array is null.
   
   
   ### 2. another fix idea
   
   We do not throw the exception and do not call close(), but catch it 
instead. The HashPartition object still gets created, so when close() is 
called later its memory can be released.
   
   ```
   //add isException parameter when construct HashPartition object
   
   HashPartition(FragmentContext context, BufferAllocator allocator, 
ChainedHashTable baseHashTable,
  RecordBatch buildBatch, RecordBatch probeBatch, 
boolean semiJoin,
  int recordsPerBatch, SpillSet spillSet, int partNum, 
int cycleNum, int numPartitions , boolean **isException** )
   
 } catch (OutOfMemoryException oom) {
//do not call  close ,do  not  throw except
 isException =true;
   }
   
   AbstractHashBinaryRecordBatch#initializeBuild
   boolean isException = false;
   try {
 for (int part = 0; part < numPartitions; part++) {
   if (isException) {
 break;
   }
   partitions[part] = new HashPartition(context, allocator, 
baseHashTable,
   buildBatch, probeBatch, semiJoin, RECORDS_PER_BATCH, spillSet, 
part,
   spilledState.getCycle(), numPartitions,**isException** );
 }
   } catch (Exception e) {
 isException = true;
   }
   if (isException ){
 throw UserException.memoryError(exceptions[0])
 .message("Failed to allocate hash partition.")
 .build(logger);
   }
   ```
   





> HashPartition memory leak when  exception
> -
>
> Key: DRILL-8478
> URL: https://issues.apache.org/jira/browse/DRILL-8478
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.21.1
>Reporter: shihuafeng
>Priority: Major
> Fix For: 1.21.2
>
> Attachments: 
> 0001-DRILL-8478.-HashPartition-memory-leak-when-it-alloca.patch
>
>
> *Describe the bug*
> HashPartition leaks when a memory allocation fails with an OutOfMemoryException
> *To Reproduce*
> Steps to reproduce the behavior:
>  # prepare data for tpch 1s
>  # run 20 concurrent tpch sql8 queries
>  # set direct memory 5g
>  # when it hit an OutOfMemoryException, stop all SQL
>  # check for memory leaks
> *Expected behavior*
> (1) I set \{DRILL_MAX_DIRECT_MEMORY:-"5G"}
> (2) I ran sql8 (SQL detail in Additional context) with 20 concurrent queries
> (3) it hit an OutOfMemoryException when creating the HashPartition
> *Error detail, log output or screenshots*
> Unable to allocate buffer of size 262144 (rounded from 262140) due to memory 
> limit (41943040). Current allocation: 20447232
>  
> sql 
> {code:java}
> // code placeholder
> select o_year, sum(case when nation = 'CHINA' then volume else 0 end) / 
> sum(volume) as mkt_share from ( select extract(year from o_orderdate) as 
> o_year, l_extendedprice * 1.0 as volume, n2.n_name as nation from 
> hive.tpch1s.part, hive.tpch1s.supplier, hive.tpch1s.lineitem, 
> 

[jira] [Commented] (DRILL-8478) HashPartition memory leak when exception

2024-01-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17810166#comment-17810166
 ] 

ASF GitHub Bot commented on DRILL-8478:
---

shfshihuafeng commented on code in PR #2875:
URL: https://github.com/apache/drill/pull/2875#discussion_r1464211148


##
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/HashPartition.java:
##
@@ -157,11 +162,11 @@ public HashPartition(FragmentContext context, 
BufferAllocator allocator, Chained
 .build(logger);
 } catch (SchemaChangeException sce) {
   throw new IllegalStateException("Unexpected Schema Change while creating 
a hash table",sce);
-}
-this.hjHelper = semiJoin ? null : new HashJoinHelper(context, allocator);
-tmpBatchesList = new ArrayList<>();
-if (numPartitions > 1) {
-  allocateNewCurrentBatchAndHV();
+} catch (OutOfMemoryException oom) {
+  close();

Review Comment:
   ### 1. fix idea
   The design is that if any operator fails, the entire operator stack is 
closed. But partitions is an array whose slots are initialized to null; if a 
HashPartition object is not created successfully, its constructor throws an 
exception, so every slot from the failing index onward stays null.
   
   `  for (int part = 0; part < numPartitions; part++) {
 partitions[part] = new HashPartition(context, allocator, baseHashTable,
 buildBatch, probeBatch, semiJoin, RECORDS_PER_BATCH, spillSet, 
part,
 spilledState.getCycle(), numPartitions);
   }`
   
   For example:
   
   the partitions array length is 32 (numPartitions = 32). If the constructor 
throws at partition 10, partitions[11]-partitions[31] remain null, and the 
object at index 10 failed during construction but had already allocated memory.
   
   When close() is called, the partially-built HashPartition at index 10 is 
never closed, because its slot in the array is null.
   
   
   ### 2. another fix idea
   
   We do not throw the exception and do not call close(), but catch it 
instead. The HashPartition object still gets created, so when close() is 
called later its memory can be released.
   
   ```
   HashPartition(FragmentContext context, BufferAllocator allocator, 
ChainedHashTable baseHashTable,
  RecordBatch buildBatch, RecordBatch probeBatch, 
boolean semiJoin,
  int recordsPerBatch, SpillSet spillSet, int partNum, 
int cycleNum, int numPartitions , boolean **isException** )
   
 } catch (OutOfMemoryException oom) {
//do not call  close ,do  not  throw except
 isException = true;
   }
   
   AbstractHashBinaryRecordBatch#initializeBuild
   boolean isException = false;
   try {
 for (int part = 0; part < numPartitions; part++) {
   if (isException) {
 break;
   }
   partitions[part] = new HashPartition(context, allocator, 
baseHashTable,
   buildBatch, probeBatch, semiJoin, RECORDS_PER_BATCH, spillSet, 
part,
   spilledState.getCycle(), numPartitions,**isException** );
 }
   } catch (Exception e) {
 isException = true;
   }
   if (isException ){
 throw UserException.memoryError(exceptions[0])
 .message("Failed to allocate hash partition.")
 .build(logger);
   }
   ```
   





> HashPartition memory leak when  exception
> -
>
> Key: DRILL-8478
> URL: https://issues.apache.org/jira/browse/DRILL-8478
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.21.1
>Reporter: shihuafeng
>Priority: Major
> Fix For: 1.21.2
>
> Attachments: 
> 0001-DRILL-8478.-HashPartition-memory-leak-when-it-alloca.patch
>
>
> *Describe the bug*
> HashPartition leaks when a memory allocation fails with an OutOfMemoryException
> *To Reproduce*
> Steps to reproduce the behavior:
>  # prepare data for tpch 1s
>  # run 20 concurrent tpch sql8 queries
>  # set direct memory 5g
>  # when it hit an OutOfMemoryException, stop all SQL
>  # check for memory leaks
> *Expected behavior*
> (1) I set \{DRILL_MAX_DIRECT_MEMORY:-"5G"}
> (2) I ran sql8 (SQL detail in Additional context) with 20 concurrent queries
> (3) it hit an OutOfMemoryException when creating the HashPartition
> *Error detail, log output or screenshots*
> Unable to allocate buffer of size 262144 (rounded from 262140) due to memory 
> limit (41943040). Current allocation: 20447232
>  
> sql 
> {code:java}
> // code placeholder
> select o_year, sum(case when nation = 'CHINA' then volume else 0 end) / 
> sum(volume) as mkt_share from ( select extract(year from o_orderdate) as 
> o_year, l_extendedprice * 1.0 as volume, n2.n_name as nation from 
> hive.tpch1s.part, hive.tpch1s.supplier, hive.tpch1s.lineitem, 
> hive.tpch1s.orders, hive.tpch1s.customer, hive.tpch1s.nation n1, 
> 

[jira] [Commented] (DRILL-8478) HashPartition memory leak when exception

2024-01-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17810162#comment-17810162
 ] 

ASF GitHub Bot commented on DRILL-8478:
---

shfshihuafeng commented on code in PR #2875:
URL: https://github.com/apache/drill/pull/2875#discussion_r1464211148


##
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/HashPartition.java:
##
@@ -157,11 +162,11 @@ public HashPartition(FragmentContext context, 
BufferAllocator allocator, Chained
 .build(logger);
 } catch (SchemaChangeException sce) {
   throw new IllegalStateException("Unexpected Schema Change while creating 
a hash table",sce);
-}
-this.hjHelper = semiJoin ? null : new HashJoinHelper(context, allocator);
-tmpBatchesList = new ArrayList<>();
-if (numPartitions > 1) {
-  allocateNewCurrentBatchAndHV();
+} catch (OutOfMemoryException oom) {
+  close();

Review Comment:
   ### 1. fix idea
   The design is that if any operator fails, the entire operator stack is 
closed. But partitions is an array whose slots are initialized to null; if a 
HashPartition object is not created successfully, its constructor throws an 
exception, so every slot from the failing index onward stays null.
   
   `  for (int part = 0; part < numPartitions; part++) {
 partitions[part] = new HashPartition(context, allocator, baseHashTable,
 buildBatch, probeBatch, semiJoin, RECORDS_PER_BATCH, spillSet, 
part,
 spilledState.getCycle(), numPartitions);
   }`
   
   For example:
   
   the partitions array length is 32 (numPartitions = 32). If the constructor 
throws at partition 10, partitions[11]-partitions[31] remain null, and the 
object at index 10 failed during construction but had already allocated memory.
   
   When close() is called, the partially-built HashPartition at index 10 is 
never closed, because its slot in the array is null.
   
   
   ### 2. another fix idea
   
   We do not throw the exception and do not call close(), but catch it 
instead. The HashPartition object still gets created, so when close() is 
called later its memory can be released.
   
   ```
 } catch (OutOfMemoryException oom) {
//do not call  close ,only throw except
 throw UserException.memoryError(oom)
 .message("Failed to allocate hash partition.")
 .build(logger);
   }
   
   AbstractHashBinaryRecordBatch#initializeBuild
   boolean isException = false;
   try {
 for (int part = 0; part < numPartitions; part++) {
   if (isException) {
 break;
   }
   partitions[part] = new HashPartition(context, allocator, 
baseHashTable,
   buildBatch, probeBatch, semiJoin, RECORDS_PER_BATCH, spillSet, 
part,
   spilledState.getCycle(), numPartitions);
 }
   } catch (Exception e) {
 isException = true;
   }
   if (isException ){
 throw UserException.memoryError(exceptions[0])
 .message("Failed to allocate hash partition.")
 .build(logger);
   }
   ```
   





> HashPartition memory leak when  exception
> -
>
> Key: DRILL-8478
> URL: https://issues.apache.org/jira/browse/DRILL-8478
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.21.1
>Reporter: shihuafeng
>Priority: Major
> Fix For: 1.21.2
>
> Attachments: 
> 0001-DRILL-8478.-HashPartition-memory-leak-when-it-alloca.patch
>
>
> *Describe the bug*
> HashPartition leaks when a memory allocation fails with an OutOfMemoryException
> *To Reproduce*
> Steps to reproduce the behavior:
>  # prepare data for tpch 1s
>  # run 20 concurrent tpch sql8 queries
>  # set direct memory 5g
>  # when it hit an OutOfMemoryException, stop all SQL
>  # check for memory leaks
> *Expected behavior*
> (1) I set \{DRILL_MAX_DIRECT_MEMORY:-"5G"}
> (2) I ran sql8 (SQL detail in Additional context) with 20 concurrent queries
> (3) it hit an OutOfMemoryException when creating the HashPartition
> *Error detail, log output or screenshots*
> Unable to allocate buffer of size 262144 (rounded from 262140) due to memory 
> limit (41943040). Current allocation: 20447232
>  
> sql 
> {code:java}
> // code placeholder
> select o_year, sum(case when nation = 'CHINA' then volume else 0 end) / 
> sum(volume) as mkt_share from ( select extract(year from o_orderdate) as 
> o_year, l_extendedprice * 1.0 as volume, n2.n_name as nation from 
> hive.tpch1s.part, hive.tpch1s.supplier, hive.tpch1s.lineitem, 
> hive.tpch1s.orders, hive.tpch1s.customer, hive.tpch1s.nation n1, 
> hive.tpch1s.nation n2, hive.tpch1s.region where p_partkey = l_partkey and 
> s_suppkey = l_suppkey and l_orderkey = o_orderkey and o_custkey = c_custkey 
> and c_nationkey = n1.n_nationkey and n1.n_regionkey = r_regionkey and r_name 
> = 'ASIA' and s_nationkey = 

[jira] [Commented] (DRILL-8478) HashPartition memory leak when exception

2024-01-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17810161#comment-17810161
 ] 

ASF GitHub Bot commented on DRILL-8478:
---

shfshihuafeng commented on code in PR #2875:
URL: https://github.com/apache/drill/pull/2875#discussion_r1464211148


##
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/HashPartition.java:
##
@@ -157,11 +162,11 @@ public HashPartition(FragmentContext context, 
BufferAllocator allocator, Chained
 .build(logger);
 } catch (SchemaChangeException sce) {
   throw new IllegalStateException("Unexpected Schema Change while creating 
a hash table",sce);
-}
-this.hjHelper = semiJoin ? null : new HashJoinHelper(context, allocator);
-tmpBatchesList = new ArrayList<>();
-if (numPartitions > 1) {
-  allocateNewCurrentBatchAndHV();
+} catch (OutOfMemoryException oom) {
+  close();

Review Comment:
   ### 1. fix idea
   The design is that if any operator fails, the entire operator stack is 
closed. But partitions is an array whose slots are initialized to null; if a 
HashPartition object is not created successfully, its constructor throws an 
exception, so every slot from the failing index onward stays null.
   
   `  for (int part = 0; part < numPartitions; part++) {
 partitions[part] = new HashPartition(context, allocator, baseHashTable,
 buildBatch, probeBatch, semiJoin, RECORDS_PER_BATCH, spillSet, 
part,
 spilledState.getCycle(), numPartitions);
   }`
   
   For example:
   
   the partitions array length is 32 (numPartitions = 32). If the constructor 
throws at partition 10, partitions[11]-partitions[31] remain null, and the 
object at index 10 failed during construction but had already allocated memory.
   
   When close() is called, the partially-built HashPartition at index 10 is 
never closed, because its slot in the array is null.
   
   
   2. another fix idea
   
   We do not throw the exception and do not call close(), but catch it 
instead. The HashPartition object still gets created, so when close() is 
called later its memory can be released.
   
   ```
 } catch (OutOfMemoryException oom) {
//do not call  close ,only throw except
 throw UserException.memoryError(oom)
 .message("Failed to allocate hash partition.")
 .build(logger);
   }
   
   AbstractHashBinaryRecordBatch#initializeBuild
   boolean isException = false;
   try {
 for (int part = 0; part < numPartitions; part++) {
   if (isException) {
 break;
   }
   partitions[part] = new HashPartition(context, allocator, 
baseHashTable,
   buildBatch, probeBatch, semiJoin, RECORDS_PER_BATCH, spillSet, 
part,
   spilledState.getCycle(), numPartitions);
 }
   } catch (Exception e) {
 isException = true;
   }
   if (isException ){
 throw UserException.memoryError(exceptions[0])
 .message("Failed to allocate hash partition.")
 .build(logger);
   }
   ```
   





> HashPartition memory leak when  exception
> -
>
> Key: DRILL-8478
> URL: https://issues.apache.org/jira/browse/DRILL-8478
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.21.1
>Reporter: shihuafeng
>Priority: Major
> Fix For: 1.21.2
>
> Attachments: 
> 0001-DRILL-8478.-HashPartition-memory-leak-when-it-alloca.patch
>
>
> *Describe the bug*
> HashPartition leaks when a memory allocation fails with an OutOfMemoryException
> *To Reproduce*
> Steps to reproduce the behavior:
>  # prepare data for tpch 1s
>  # run 20 concurrent tpch sql8 queries
>  # set direct memory 5g
>  # when it hit an OutOfMemoryException, stop all SQL
>  # check for memory leaks
> *Expected behavior*
> (1) I set \{DRILL_MAX_DIRECT_MEMORY:-"5G"}
> (2) I ran sql8 (SQL detail in Additional context) with 20 concurrent queries
> (3) it hit an OutOfMemoryException when creating the HashPartition
> *Error detail, log output or screenshots*
> Unable to allocate buffer of size 262144 (rounded from 262140) due to memory 
> limit (41943040). Current allocation: 20447232
>  
> sql 
> {code:java}
> // code placeholder
> select o_year, sum(case when nation = 'CHINA' then volume else 0 end) / 
> sum(volume) as mkt_share from ( select extract(year from o_orderdate) as 
> o_year, l_extendedprice * 1.0 as volume, n2.n_name as nation from 
> hive.tpch1s.part, hive.tpch1s.supplier, hive.tpch1s.lineitem, 
> hive.tpch1s.orders, hive.tpch1s.customer, hive.tpch1s.nation n1, 
> hive.tpch1s.nation n2, hive.tpch1s.region where p_partkey = l_partkey and 
> s_suppkey = l_suppkey and l_orderkey = o_orderkey and o_custkey = c_custkey 
> and c_nationkey = n1.n_nationkey and n1.n_regionkey = r_regionkey and r_name 
> = 'ASIA' and s_nationkey = 

[jira] [Commented] (DRILL-8474) Add Daffodil Format Plugin

2024-01-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17810092#comment-17810092
 ] 

ASF GitHub Bot commented on DRILL-8474:
---

mbeckerle commented on PR #2836:
URL: https://github.com/apache/drill/pull/2836#issuecomment-1906827568

   Ok, so the geo-ip UDF stuff has no special mechanisms or description about 
those resource files, so the generic code that "scans" must find them and drag 
them along automatically. 
   
   That's the behavior I want. 
   
   What is "Drill's 3rd Party Jar folder"? 
   
   If a magic folder just gets dragged over to all nodes, and drill uses a 
class loader that arranges for jars in that folder to be searched, then there 
is very little to do, since a DFDL schema can be just a set of jar files 
containing related resources, and the classes for Daffodil's own UDFs and 
layers which are java code extensions of its own kind. 




> Add Daffodil Format Plugin
> --
>
> Key: DRILL-8474
> URL: https://issues.apache.org/jira/browse/DRILL-8474
> Project: Apache Drill
>  Issue Type: New Feature
>Affects Versions: 1.21.1
>Reporter: Charles Givre
>Priority: Major
> Fix For: 1.22.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8478) HashPartition memory leak when exception

2024-01-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17810091#comment-17810091
 ] 

ASF GitHub Bot commented on DRILL-8478:
---

paul-rogers commented on code in PR #2875:
URL: https://github.com/apache/drill/pull/2875#discussion_r1463921977


##
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/AbstractHashBinaryRecordBatch.java:
##
@@ -1312,7 +1312,9 @@ private void cleanup() {
 }
 // clean (and deallocate) each partition, and delete its spill file
 for (HashPartition partn : partitions) {
-  partn.close();
+  if (partn != null) {
+partn.close();
+  }

Review Comment:
   The above is OK as a work-around. I wonder, however, where the code added a 
null pointer to the partition list. That should never happen. If it does, it 
should be fixed at the point where the null pointer is added to the list. 
Fixing it here is incomplete: there are other places where we loop through the 
list, and those will also fail.
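
   One way to avoid null slots entirely, sketched here as a hypothetical 
illustration rather than the actual Drill fix: store only fully constructed 
partitions, and close the already-created ones when construction fails.
   
   ```java
   // Hypothetical sketch: track how many partitions were fully constructed;
   // on failure, close those and rethrow instead of leaving null slots behind.
   HashPartition[] partitions = new HashPartition[numPartitions];
   int created = 0;
   try {
     for (; created < numPartitions; created++) {
       partitions[created] = new HashPartition(/* ... constructor args ... */);
     }
   } catch (OutOfMemoryException oom) {
     for (int i = 0; i < created; i++) {
       partitions[i].close();
     }
     throw oom;
   }
   ```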



##
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/HashPartition.java:
##
@@ -157,11 +162,11 @@ public HashPartition(FragmentContext context, 
BufferAllocator allocator, Chained
 .build(logger);
 } catch (SchemaChangeException sce) {
   throw new IllegalStateException("Unexpected Schema Change while creating 
a hash table",sce);
-}
-this.hjHelper = semiJoin ? null : new HashJoinHelper(context, allocator);
-tmpBatchesList = new ArrayList<>();
-if (numPartitions > 1) {
-  allocateNewCurrentBatchAndHV();
+} catch (OutOfMemoryException oom) {
+  close();

Review Comment:
   This call is _probably_ fine. However, the design is that if any operator 
fails, the entire operator stack is closed. So, `close()` should be called by 
the fragment executor. There is probably no harm in calling `close()` here, as 
long as the `close()` method is safe to call twice.
   
   If the fragment executor _does not_ call close when the failure occurs 
during setup, then there is a bug since failing to call `close()` results in 
just this kind of error.
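
   To illustrate "safe to call twice", a minimal idempotent close() might look 
like this (a hypothetical sketch; the partitions field is assumed, not taken 
from the actual HashPartition code):
   
   ```java
   // Null out each slot as it is closed so that a second close() is a no-op.
   @Override
   public void close() {
     if (partitions == null) {
       return; // already closed
     }
     for (int i = 0; i < partitions.length; i++) {
       if (partitions[i] != null) {
         partitions[i].close();
         partitions[i] = null;
       }
     }
     partitions = null; // mark fully closed
   }
   ```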





> HashPartition memory leak when  exception
> -
>
> Key: DRILL-8478
> URL: https://issues.apache.org/jira/browse/DRILL-8478
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.21.1
>Reporter: shihuafeng
>Priority: Major
> Fix For: 1.21.2
>
> Attachments: 
> 0001-DRILL-8478.-HashPartition-memory-leak-when-it-alloca.patch
>
>
> *Describe the bug*
> HashPartition leaks when a memory allocation fails with an OutOfMemoryException
> *To Reproduce*
> Steps to reproduce the behavior:
>  # prepare data for tpch 1s
>  # run 20 concurrent tpch sql8 queries
>  # set direct memory 5g
>  # when it hit an OutOfMemoryException, stop all SQL
>  # check for memory leaks
> *Expected behavior*
> (1) I set \{DRILL_MAX_DIRECT_MEMORY:-"5G"}
> (2) I ran sql8 (SQL detail in Additional context) with 20 concurrent queries
> (3) it hit an OutOfMemoryException when creating the HashPartition
> *Error detail, log output or screenshots*
> Unable to allocate buffer of size 262144 (rounded from 262140) due to memory 
> limit (41943040). Current allocation: 20447232
>  
> sql 
> {code:java}
> // code placeholder
> select o_year, sum(case when nation = 'CHINA' then volume else 0 end) / 
> sum(volume) as mkt_share from ( select extract(year from o_orderdate) as 
> o_year, l_extendedprice * 1.0 as volume, n2.n_name as nation from 
> hive.tpch1s.part, hive.tpch1s.supplier, hive.tpch1s.lineitem, 
> hive.tpch1s.orders, hive.tpch1s.customer, hive.tpch1s.nation n1, 
> hive.tpch1s.nation n2, hive.tpch1s.region where p_partkey = l_partkey and 
> s_suppkey = l_suppkey and l_orderkey = o_orderkey and o_custkey = c_custkey 
> and c_nationkey = n1.n_nationkey and n1.n_regionkey = r_regionkey and r_name 
> = 'ASIA' and s_nationkey = n2.n_nationkey and o_orderdate between date 
> '1995-01-01' and date '1996-12-31' and p_type = 'LARGE BRUSHED BRASS') as 
> all_nations group by o_year order by o_year
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8474) Add Daffodil Format Plugin

2024-01-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17810070#comment-17810070
 ] 

ASF GitHub Bot commented on DRILL-8474:
---

cgivre commented on PR #2836:
URL: https://github.com/apache/drill/pull/2836#issuecomment-1906689793

   > > > @cgivre @paul-rogers is there an example of a Drill UDF that is not 
part of the drill repository tree?
   > > > I'd like to understand the mechanisms for distributing any jar files 
and dependencies of the UDF that drill uses. I can't find any such in the 
quasi-USFs that are in the Drill tree, because well, since they are part of 
Drill, and so are their dependencies, this problem doesn't exist.
   > > 
   > > 
   > > @mbeckerle Here's an example: 
https://github.com/datadistillr/drill-humanname-functions. I'm sorry we weren't 
able to connect last week.
   > 
   > If I understand this correctly, if a jar is on the classpath and has 
drill-module.conf in its root dir, then drill will find it and read that HOCON 
file to get the package to add to drill.classpath.scanning.packages.
   
   I believe that is correct.
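   
   For illustration, a minimal drill-module.conf along those lines might look 
like this (the package name is hypothetical):
   
{code}
# Tells Drill's classpath scanner which package in this jar to scan
drill.classpath.scanning.packages += "com.example.drill.udfs"
{code}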
   
   > 
   > Drill then appears to scan jars for class files for those packages. Not 
sure what it is doing with the class files. I imagine it is repackaging them 
somehow so Drill can use them on the drill distributed nodes. But it isn't yet 
clear to me how this aspect works. Do these classes just get loaded on the 
distributed drill nodes? Or is the classpath augmented in some way on the drill 
nodes so that they see a jar that contains all these classes?
   > 
   > I have two questions:
   > 
   > (1) what about dependencies? The UDF may depend on libraries which depend 
on other libraries, etc.
   
   So UDFs are a bit of a special case, but if they do have dependencies, you 
have to also include those JAR files in the UDF directory, or in Drill's 3rd 
party JAR folder.   I'm not that good with maven, but I've often wondered about 
making a so-called fat-JAR which includes the dependencies as part of the UDF 
JAR file.
   
   > 
   > (2) what about non-class files, e.g., things under src/main/resources of 
the project that go into the jar, but aren't "class" files? How do those things 
also get moved? How would code running in the drill node access these? The 
usual method is to call getResource(URL) with a URL that gives the path within 
a jar file to the resource in question.
   
   Take a look at this UDF. 
https://github.com/datadistillr/drill-geoip-functions
   This UDF has a few external resources including a CSV file and the MaxMind 
databases.
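   
   For what it's worth, a minimal sketch of reading such a bundled resource 
from the classpath (the resource path here is hypothetical):
   
{code:java}
import java.io.InputStream;

public class ResourceExample {
  public static void main(String[] args) throws Exception {
    // Resolves the path inside whichever jar on the classpath contains it,
    // which is how files under src/main/resources travel with the UDF.
    try (InputStream in =
        ResourceExample.class.getResourceAsStream("/lookup-table.csv")) {
      if (in == null) {
        throw new IllegalStateException("Resource not found on classpath");
      }
      System.out.println("Read " + in.readAllBytes().length + " bytes");
    }
  }
}
{code}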
   
   
   > 
   > Thanks for any info.
   
   




> Add Daffodil Format Plugin
> --
>
> Key: DRILL-8474
> URL: https://issues.apache.org/jira/browse/DRILL-8474
> Project: Apache Drill
>  Issue Type: New Feature
>Affects Versions: 1.21.1
>Reporter: Charles Givre
>Priority: Major
> Fix For: 1.22.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8474) Add Daffodil Format Plugin

2024-01-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17810051#comment-17810051
 ] 

ASF GitHub Bot commented on DRILL-8474:
---

mbeckerle commented on PR #2836:
URL: https://github.com/apache/drill/pull/2836#issuecomment-1906561549

   > > @cgivre @paul-rogers is there an example of a Drill UDF that is not part 
of the drill repository tree?
   > > I'd like to understand the mechanisms for distributing any jar files and 
dependencies of the UDF that drill uses. I can't find any such in the 
quasi-UDFs that are in the Drill tree, because well, since they are part of 
Drill, and so are their dependencies, this problem doesn't exist.
   > 
   > @mbeckerle Here's an example: 
https://github.com/datadistillr/drill-humanname-functions. I'm sorry we weren't 
able to connect last week.
   
   If I understand this correctly, if a jar is on the classpath and has 
drill-module.conf in its root dir, then drill will find it and read that HOCON 
file to get the package to add to drill.classpath.scanning.packages. 
   
   Drill then appears to scan jars for class files for those packages. Not sure 
what it is doing with the class files. I imagine it is repackaging them somehow 
so Drill can use them on the drill distributed nodes. But it isn't yet clear to 
me how this aspect works. Do these classes just get loaded on the distributed 
drill nodes? Or is the classpath augmented in some way on the drill nodes so 
that they see a jar that contains all these classes?
   
   I have two questions: 
   
   (1) what about dependencies? The UDF may depend on libraries which depend on 
other libraries, etc. 
   
   (2) what about non-class files, e.g., things under src/main/resources of the 
project that go into the jar, but aren't "class" files? How do those things 
also get moved? How would code running in the drill node access these? The 
usual method is to call getResource(URL) with a URL that gives the path within 
a jar file to the resource in question. 
   
   Thanks for any info. 
   




> Add Daffodil Format Plugin
> --
>
> Key: DRILL-8474
> URL: https://issues.apache.org/jira/browse/DRILL-8474
> Project: Apache Drill
>  Issue Type: New Feature
>Affects Versions: 1.21.1
>Reporter: Charles Givre
>Priority: Major
> Fix For: 1.22.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8478) HashPartition memory leak when exception

2024-01-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17809982#comment-17809982
 ] 

ASF GitHub Bot commented on DRILL-8478:
---

jnturton merged PR #2875:
URL: https://github.com/apache/drill/pull/2875




> HashPartition memory leak when  exception
> -
>
> Key: DRILL-8478
> URL: https://issues.apache.org/jira/browse/DRILL-8478
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.21.1
>Reporter: shihuafeng
>Priority: Major
> Fix For: 1.21.2
>
> Attachments: 
> 0001-DRILL-8478.-HashPartition-memory-leak-when-it-alloca.patch
>
>
> *Describe the bug*
> HashPartition leaks memory when a memory allocation fails with an
> OutOfMemoryException.
> *To Reproduce*
> Steps to reproduce the behavior:
>  # prepare data for TPC-H scale factor 1
>  # run TPC-H sql8 with 20 concurrent queries
>  # set direct memory to 5 GB
>  # when an OutOfMemoryException occurred, stop all queries
>  # observe the memory leak
> *Expected behavior*
> (1) I set \{DRILL_MAX_DIRECT_MEMORY:-"5G"}
> (2) I ran sql8 (SQL detail under Additional context) with 20 concurrent queries
> (3) An OutOfMemoryException occurred when creating a HashPartition
> *Error detail, log output or screenshots*
> Unable to allocate buffer of size 262144 (rounded from 262140) due to memory 
> limit (41943040). Current allocation: 20447232
>  
> sql 
> {code:java}
> // code placeholder
> select o_year, sum(case when nation = 'CHINA' then volume else 0 end) / 
> sum(volume) as mkt_share from ( select extract(year from o_orderdate) as 
> o_year, l_extendedprice * 1.0 as volume, n2.n_name as nation from 
> hive.tpch1s.part, hive.tpch1s.supplier, hive.tpch1s.lineitem, 
> hive.tpch1s.orders, hive.tpch1s.customer, hive.tpch1s.nation n1, 
> hive.tpch1s.nation n2, hive.tpch1s.region where p_partkey = l_partkey and 
> s_suppkey = l_suppkey and l_orderkey = o_orderkey and o_custkey = c_custkey 
> and c_nationkey = n1.n_nationkey and n1.n_regionkey = r_regionkey and r_name 
> = 'ASIA' and s_nationkey = n2.n_nationkey and o_orderdate between date 
> '1995-01-01' and date '1996-12-31' and p_type = 'LARGE BRUSHED BRASS') as 
> all_nations group by o_year order by o_year
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8478) HashPartition memory leak when exception

2024-01-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17809816#comment-17809816
 ] 

ASF GitHub Bot commented on DRILL-8478:
---

shfshihuafeng commented on PR #2875:
URL: https://github.com/apache/drill/pull/2875#issuecomment-1905599592

   > [An unused import crept 
in](https://github.com/apache/drill/actions/runs/7622586264/job/20762475705#step:6:1277),
 could you remove it please?
   
   removed it




> HashPartition memory leak when  exception
> -
>
> Key: DRILL-8478
> URL: https://issues.apache.org/jira/browse/DRILL-8478
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.21.1
>Reporter: shihuafeng
>Priority: Major
> Fix For: 1.21.2
>
> Attachments: 
> 0001-DRILL-8478.-HashPartition-memory-leak-when-it-alloca.patch
>
>
> *Describe the bug*
> HashPartition leaks memory when a memory allocation fails with an
> OutOfMemoryException.
> *To Reproduce*
> Steps to reproduce the behavior:
>  # prepare data for TPC-H scale factor 1
>  # run TPC-H sql8 with 20 concurrent queries
>  # set direct memory to 5 GB
>  # when an OutOfMemoryException occurred, stop all queries
>  # observe the memory leak
> *Expected behavior*
> (1) I set \{DRILL_MAX_DIRECT_MEMORY:-"5G"}
> (2) I ran sql8 (SQL detail under Additional context) with 20 concurrent queries
> (3) An OutOfMemoryException occurred when creating a HashPartition
> *Error detail, log output or screenshots*
> Unable to allocate buffer of size 262144 (rounded from 262140) due to memory 
> limit (41943040). Current allocation: 20447232
>  
> sql 
> {code:java}
> // code placeholder
> select o_year, sum(case when nation = 'CHINA' then volume else 0 end) / 
> sum(volume) as mkt_share from ( select extract(year from o_orderdate) as 
> o_year, l_extendedprice * 1.0 as volume, n2.n_name as nation from 
> hive.tpch1s.part, hive.tpch1s.supplier, hive.tpch1s.lineitem, 
> hive.tpch1s.orders, hive.tpch1s.customer, hive.tpch1s.nation n1, 
> hive.tpch1s.nation n2, hive.tpch1s.region where p_partkey = l_partkey and 
> s_suppkey = l_suppkey and l_orderkey = o_orderkey and o_custkey = c_custkey 
> and c_nationkey = n1.n_nationkey and n1.n_regionkey = r_regionkey and r_name 
> = 'ASIA' and s_nationkey = n2.n_nationkey and o_orderdate between date 
> '1995-01-01' and date '1996-12-31' and p_type = 'LARGE BRUSHED BRASS') as 
> all_nations group by o_year order by o_year
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8478) HashPartition memory leak when exception

2024-01-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17809814#comment-17809814
 ] 

ASF GitHub Bot commented on DRILL-8478:
---

jnturton commented on PR #2875:
URL: https://github.com/apache/drill/pull/2875#issuecomment-1905598192

   [An unused import crept 
in](https://github.com/apache/drill/actions/runs/7622586264/job/20762475705#step:6:1277),
 could you remove it please?




> HashPartition memory leak when  exception
> -
>
> Key: DRILL-8478
> URL: https://issues.apache.org/jira/browse/DRILL-8478
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.21.1
>Reporter: shihuafeng
>Priority: Major
> Fix For: 1.21.2
>
> Attachments: 
> 0001-DRILL-8478.-HashPartition-memory-leak-when-it-alloca.patch
>
>
> *Describe the bug*
> HashPartition leaks memory when a memory allocation fails with an
> OutOfMemoryException.
> *To Reproduce*
> Steps to reproduce the behavior:
>  # prepare data for TPC-H scale factor 1
>  # run TPC-H sql8 with 20 concurrent queries
>  # set direct memory to 5 GB
>  # when an OutOfMemoryException occurred, stop all queries
>  # observe the memory leak
> *Expected behavior*
> (1) I set \{DRILL_MAX_DIRECT_MEMORY:-"5G"}
> (2) I ran sql8 (SQL detail under Additional context) with 20 concurrent queries
> (3) An OutOfMemoryException occurred when creating a HashPartition
> *Error detail, log output or screenshots*
> Unable to allocate buffer of size 262144 (rounded from 262140) due to memory 
> limit (41943040). Current allocation: 20447232
>  
> sql 
> {code:java}
> // code placeholder
> select o_year, sum(case when nation = 'CHINA' then volume else 0 end) / 
> sum(volume) as mkt_share from ( select extract(year from o_orderdate) as 
> o_year, l_extendedprice * 1.0 as volume, n2.n_name as nation from 
> hive.tpch1s.part, hive.tpch1s.supplier, hive.tpch1s.lineitem, 
> hive.tpch1s.orders, hive.tpch1s.customer, hive.tpch1s.nation n1, 
> hive.tpch1s.nation n2, hive.tpch1s.region where p_partkey = l_partkey and 
> s_suppkey = l_suppkey and l_orderkey = o_orderkey and o_custkey = c_custkey 
> and c_nationkey = n1.n_nationkey and n1.n_regionkey = r_regionkey and r_name 
> = 'ASIA' and s_nationkey = n2.n_nationkey and o_orderdate between date 
> '1995-01-01' and date '1996-12-31' and p_type = 'LARGE BRUSHED BRASS') as 
> all_nations group by o_year order by o_year
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8478) HashPartition memory leak when exception

2024-01-22 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17809792#comment-17809792
 ] 

ASF GitHub Bot commented on DRILL-8478:
---

shfshihuafeng commented on code in PR #2875:
URL: https://github.com/apache/drill/pull/2875#discussion_r1462854154


##
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/HashPartition.java:
##
@@ -157,11 +162,11 @@ public HashPartition(FragmentContext context, 
BufferAllocator allocator, Chained
 .build(logger);
 } catch (SchemaChangeException sce) {
   throw new IllegalStateException("Unexpected Schema Change while creating 
a hash table",sce);
-}
-this.hjHelper = semiJoin ? null : new HashJoinHelper(context, allocator);
-tmpBatchesList = new ArrayList<>();
-if (numPartitions > 1) {
-  allocateNewCurrentBatchAndHV();
+} catch (OutOfMemoryException oom) {
+  close();
+  throw UserException.memoryError(oom)
+  .message("OutOfMemory while allocate memory for hash partition.")

Review Comment:
   I resubmitted the PR and added the test steps.





> HashPartition memory leak when  exception
> -
>
> Key: DRILL-8478
> URL: https://issues.apache.org/jira/browse/DRILL-8478
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.21.1
>Reporter: shihuafeng
>Priority: Major
> Fix For: 1.21.2
>
> Attachments: 
> 0001-DRILL-8478.-HashPartition-memory-leak-when-it-alloca.patch
>
>
> *Describe the bug*
> HashPartition leaks memory when a memory allocation fails with an
> OutOfMemoryException.
> *To Reproduce*
> Steps to reproduce the behavior:
>  # prepare data for TPC-H scale factor 1
>  # run TPC-H sql8 with 20 concurrent queries
>  # set direct memory to 5 GB
>  # when an OutOfMemoryException occurred, stop all queries
>  # observe the memory leak
> *Expected behavior*
> (1) I set \{DRILL_MAX_DIRECT_MEMORY:-"5G"}
> (2) I ran sql8 (SQL detail under Additional context) with 20 concurrent queries
> (3) An OutOfMemoryException occurred when creating a HashPartition
> *Error detail, log output or screenshots*
> Unable to allocate buffer of size 262144 (rounded from 262140) due to memory 
> limit (41943040). Current allocation: 20447232
>  
> sql 
> {code:java}
> // code placeholder
> select o_year, sum(case when nation = 'CHINA' then volume else 0 end) / 
> sum(volume) as mkt_share from ( select extract(year from o_orderdate) as 
> o_year, l_extendedprice * 1.0 as volume, n2.n_name as nation from 
> hive.tpch1s.part, hive.tpch1s.supplier, hive.tpch1s.lineitem, 
> hive.tpch1s.orders, hive.tpch1s.customer, hive.tpch1s.nation n1, 
> hive.tpch1s.nation n2, hive.tpch1s.region where p_partkey = l_partkey and 
> s_suppkey = l_suppkey and l_orderkey = o_orderkey and o_custkey = c_custkey 
> and c_nationkey = n1.n_nationkey and n1.n_regionkey = r_regionkey and r_name 
> = 'ASIA' and s_nationkey = n2.n_nationkey and o_orderdate between date 
> '1995-01-01' and date '1996-12-31' and p_type = 'LARGE BRUSHED BRASS') as 
> all_nations group by o_year order by o_year
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8478) HashPartition memory leak when exception

2024-01-22 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17809771#comment-17809771
 ] 

ASF GitHub Bot commented on DRILL-8478:
---

paul-rogers commented on code in PR #2875:
URL: https://github.com/apache/drill/pull/2875#discussion_r1462817821


##
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/HashPartition.java:
##
@@ -157,11 +162,11 @@ public HashPartition(FragmentContext context, 
BufferAllocator allocator, Chained
 .build(logger);
 } catch (SchemaChangeException sce) {
   throw new IllegalStateException("Unexpected Schema Change while creating 
a hash table",sce);
-}
-this.hjHelper = semiJoin ? null : new HashJoinHelper(context, allocator);
-tmpBatchesList = new ArrayList<>();
-if (numPartitions > 1) {
-  allocateNewCurrentBatchAndHV();
+} catch (OutOfMemoryException oom) {
+  close();
+  throw UserException.memoryError(oom)
+  .message("OutOfMemory while allocate memory for hash partition.")

Review Comment:
   Suggested: `"Failed to allocate hash partition."`
   
   The `memoryError()` already indicates that it is an OOM error.
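   
   For reference, a sketch of how the catch block reads with that message (the 
`.build(logger)` call is assumed from the surrounding code, which the diff 
truncates):
   
{code:java}
} catch (OutOfMemoryException oom) {
  // Release whatever the constructor managed to allocate before rethrowing.
  close();
  throw UserException.memoryError(oom)
      .message("Failed to allocate hash partition.")
      .build(logger);
}
{code}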
   



##
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/AbstractHashBinaryRecordBatch.java:
##
@@ -1312,7 +1313,9 @@ private void cleanup() {
 }
 // clean (and deallocate) each partition, and delete its spill file
 for (HashPartition partn : partitions) {
-  partn.close();
+  if (Objects.nonNull(partn)) {

Review Comment:
   Simpler `if (partn != null) {`
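   
   With that change the cleanup loop would read roughly as follows (a sketch; 
`partitions` may hold nulls when construction failed part-way):
   
{code:java}
// Guard each close() with a plain null check.
for (HashPartition partn : partitions) {
  if (partn != null) {
    partn.close();
  }
}
{code}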





> HashPartition memory leak when  exception
> -
>
> Key: DRILL-8478
> URL: https://issues.apache.org/jira/browse/DRILL-8478
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.21.1
>Reporter: shihuafeng
>Priority: Major
> Fix For: 1.21.2
>
> Attachments: 
> 0001-DRILL-8478.-HashPartition-memory-leak-when-it-alloca.patch
>
>
> *Describe the bug*
> HashPartition leaks memory when a memory allocation fails with an
> OutOfMemoryException.
> *To Reproduce*
> Steps to reproduce the behavior:
>  # prepare data for TPC-H scale factor 1
>  # run TPC-H sql8 with 20 concurrent queries
>  # set direct memory to 5 GB
>  # when an OutOfMemoryException occurred, stop all queries
>  # observe the memory leak
> *Expected behavior*
> (1) I set \{DRILL_MAX_DIRECT_MEMORY:-"5G"}
> (2) I ran sql8 (SQL detail under Additional context) with 20 concurrent queries
> (3) An OutOfMemoryException occurred when creating a HashPartition
> *Error detail, log output or screenshots*
> Unable to allocate buffer of size 262144 (rounded from 262140) due to memory 
> limit (41943040). Current allocation: 20447232
>  
> sql 
> {code:java}
> // code placeholder
> select o_year, sum(case when nation = 'CHINA' then volume else 0 end) / 
> sum(volume) as mkt_share from ( select extract(year from o_orderdate) as 
> o_year, l_extendedprice * 1.0 as volume, n2.n_name as nation from 
> hive.tpch1s.part, hive.tpch1s.supplier, hive.tpch1s.lineitem, 
> hive.tpch1s.orders, hive.tpch1s.customer, hive.tpch1s.nation n1, 
> hive.tpch1s.nation n2, hive.tpch1s.region where p_partkey = l_partkey and 
> s_suppkey = l_suppkey and l_orderkey = o_orderkey and o_custkey = c_custkey 
> and c_nationkey = n1.n_nationkey and n1.n_regionkey = r_regionkey and r_name 
> = 'ASIA' and s_nationkey = n2.n_nationkey and o_orderdate between date 
> '1995-01-01' and date '1996-12-31' and p_type = 'LARGE BRUSHED BRASS') as 
> all_nations group by o_year order by o_year
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8478) HashPartition memory leak when exception

2024-01-22 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17809772#comment-17809772
 ] 

ASF GitHub Bot commented on DRILL-8478:
---

paul-rogers commented on code in PR #2875:
URL: https://github.com/apache/drill/pull/2875#discussion_r1462817821


##
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/HashPartition.java:
##
@@ -157,11 +162,11 @@ public HashPartition(FragmentContext context, 
BufferAllocator allocator, Chained
 .build(logger);
 } catch (SchemaChangeException sce) {
   throw new IllegalStateException("Unexpected Schema Change while creating 
a hash table",sce);
-}
-this.hjHelper = semiJoin ? null : new HashJoinHelper(context, allocator);
-tmpBatchesList = new ArrayList<>();
-if (numPartitions > 1) {
-  allocateNewCurrentBatchAndHV();
+} catch (OutOfMemoryException oom) {
+  close();
+  throw UserException.memoryError(oom)
+  .message("OutOfMemory while allocate memory for hash partition.")

Review Comment:
   Suggested: `"Failed to allocate hash partition."`
   
   The `memoryError()` already indicates that it is an OOM error.
   





> HashPartition memory leak when  exception
> -
>
> Key: DRILL-8478
> URL: https://issues.apache.org/jira/browse/DRILL-8478
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.21.1
>Reporter: shihuafeng
>Priority: Major
> Fix For: 1.21.2
>
> Attachments: 
> 0001-DRILL-8478.-HashPartition-memory-leak-when-it-alloca.patch
>
>
> *Describe the bug*
> HashPartition leaks memory when a memory allocation fails with an
> OutOfMemoryException.
> *To Reproduce*
> Steps to reproduce the behavior:
>  # prepare data for TPC-H scale factor 1
>  # run TPC-H sql8 with 20 concurrent queries
>  # set direct memory to 5 GB
>  # when an OutOfMemoryException occurred, stop all queries
>  # observe the memory leak
> *Expected behavior*
> (1) I set \{DRILL_MAX_DIRECT_MEMORY:-"5G"}
> (2) I ran sql8 (SQL detail under Additional context) with 20 concurrent queries
> (3) An OutOfMemoryException occurred when creating a HashPartition
> *Error detail, log output or screenshots*
> Unable to allocate buffer of size 262144 (rounded from 262140) due to memory 
> limit (41943040). Current allocation: 20447232
>  
> sql 
> {code:java}
> // code placeholder
> select o_year, sum(case when nation = 'CHINA' then volume else 0 end) / 
> sum(volume) as mkt_share from ( select extract(year from o_orderdate) as 
> o_year, l_extendedprice * 1.0 as volume, n2.n_name as nation from 
> hive.tpch1s.part, hive.tpch1s.supplier, hive.tpch1s.lineitem, 
> hive.tpch1s.orders, hive.tpch1s.customer, hive.tpch1s.nation n1, 
> hive.tpch1s.nation n2, hive.tpch1s.region where p_partkey = l_partkey and 
> s_suppkey = l_suppkey and l_orderkey = o_orderkey and o_custkey = c_custkey 
> and c_nationkey = n1.n_nationkey and n1.n_regionkey = r_regionkey and r_name 
> = 'ASIA' and s_nationkey = n2.n_nationkey and o_orderdate between date 
> '1995-01-01' and date '1996-12-31' and p_type = 'LARGE BRUSHED BRASS') as 
> all_nations group by o_year order by o_year
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8478) HashPartition memory leak when exception

2024-01-22 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17809763#comment-17809763
 ] 

ASF GitHub Bot commented on DRILL-8478:
---

shfshihuafeng opened a new pull request, #2875:
URL: https://github.com/apache/drill/pull/2875

   # [DRILL-](https://issues.apache.org/jira/browse/DRILL-): PR Title
   
   DRILL-8478. HashPartition memory leak when memory allocation fails with 
OutOfMemoryException (#2874)
   
   ## Description
   
When a memory allocation for a HashPartition fails with an OutOfMemoryException, 
memory is leaked: the HashPartition object is never fully constructed, so it 
cannot be cleaned up in the closing phase.
   
   
   ## Documentation
   (Please describe user-visible changes similar to what should appear in the 
Drill documentation.)
   
   ## Testing
   (Please describe how this PR has been tested.)
   




> HashPartition memory leak when  exception
> -
>
> Key: DRILL-8478
> URL: https://issues.apache.org/jira/browse/DRILL-8478
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.21.1
>Reporter: shihuafeng
>Priority: Major
> Fix For: 1.21.2
>
> Attachments: 
> 0001-DRILL-8478.-HashPartition-memory-leak-when-it-alloca.patch
>
>
> *Describe the bug*
> HashPartition leaks memory when a memory allocation fails with an
> OutOfMemoryException.
> *To Reproduce*
> Steps to reproduce the behavior:
>  # prepare data for TPC-H scale factor 1
>  # run TPC-H sql8 with 20 concurrent queries
>  # set direct memory to 5 GB
>  # when an OutOfMemoryException occurred, stop all queries
>  # observe the memory leak
> *Expected behavior*
> (1) I set \{DRILL_MAX_DIRECT_MEMORY:-"5G"}
> (2) I ran sql8 (SQL detail under Additional context) with 20 concurrent queries
> (3) An OutOfMemoryException occurred when creating a HashPartition
> *Error detail, log output or screenshots*
> Unable to allocate buffer of size 262144 (rounded from 262140) due to memory 
> limit (41943040). Current allocation: 20447232
>  
> sql 
> {code:java}
> // code placeholder
> select o_year, sum(case when nation = 'CHINA' then volume else 0 end) / 
> sum(volume) as mkt_share from ( select extract(year from o_orderdate) as 
> o_year, l_extendedprice * 1.0 as volume, n2.n_name as nation from 
> hive.tpch1s.part, hive.tpch1s.supplier, hive.tpch1s.lineitem, 
> hive.tpch1s.orders, hive.tpch1s.customer, hive.tpch1s.nation n1, 
> hive.tpch1s.nation n2, hive.tpch1s.region where p_partkey = l_partkey and 
> s_suppkey = l_suppkey and l_orderkey = o_orderkey and o_custkey = c_custkey 
> and c_nationkey = n1.n_nationkey and n1.n_regionkey = r_regionkey and r_name 
> = 'ASIA' and s_nationkey = n2.n_nationkey and o_orderdate between date 
> '1995-01-01' and date '1996-12-31' and p_type = 'LARGE BRUSHED BRASS') as 
> all_nations group by o_year order by o_year
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8474) Add Daffodil Format Plugin

2024-01-21 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17809174#comment-17809174
 ] 

ASF GitHub Bot commented on DRILL-8474:
---

cgivre commented on PR #2836:
URL: https://github.com/apache/drill/pull/2836#issuecomment-1902751729

   > @cgivre @paul-rogers is there an example of a Drill UDF that is not part 
of the drill repository tree?
   > 
   > I'd like to understand the mechanisms for distributing any jar files and 
dependencies of the UDF that drill uses. I can't find any such in the 
quasi-UDFs that are in the Drill tree, because well, since they are part of 
Drill, and so are their dependencies, this problem doesn't exist.
   
   
   @mbeckerle Here's an example: 
https://github.com/datadistillr/drill-humanname-functions. I'm sorry we 
weren't able to connect last week.
   
   




> Add Daffodil Format Plugin
> --
>
> Key: DRILL-8474
> URL: https://issues.apache.org/jira/browse/DRILL-8474
> Project: Apache Drill
>  Issue Type: New Feature
>Affects Versions: 1.21.1
>Reporter: Charles Givre
>Priority: Major
> Fix For: 1.22.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8474) Add Daffodil Format Plugin

2024-01-21 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17809173#comment-17809173
 ] 

ASF GitHub Bot commented on DRILL-8474:
---

mbeckerle commented on PR #2836:
URL: https://github.com/apache/drill/pull/2836#issuecomment-1902750285

   @cgivre @paul-rogers is there an example of a Drill UDF that is not part of 
the drill repository tree? 
   
   I'd like to understand the mechanisms for distributing any jar files and 
dependencies of the UDF that drill uses. I can't find any such in the 
quasi-UDFs that are in the Drill tree, because well, since they are part of 
Drill, and so are their dependencies, this problem doesn't exist. 




> Add Daffodil Format Plugin
> --
>
> Key: DRILL-8474
> URL: https://issues.apache.org/jira/browse/DRILL-8474
> Project: Apache Drill
>  Issue Type: New Feature
>Affects Versions: 1.21.1
>Reporter: Charles Givre
>Priority: Major
> Fix For: 1.22.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8474) Add Daffodil Format Plugin

2024-01-21 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17809172#comment-17809172
 ] 

ASF GitHub Bot commented on DRILL-8474:
---

mbeckerle commented on code in PR #2836:
URL: https://github.com/apache/drill/pull/2836#discussion_r1461099077


##
contrib/format-daffodil/src/main/java/org/apache/drill/exec/store/daffodil/schema/DrillDaffodilSchemaVisitor.java:
##
@@ -0,0 +1,229 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.daffodil.schema;
+
+import org.apache.daffodil.runtime1.api.ChoiceMetadata;
+import org.apache.daffodil.runtime1.api.ComplexElementMetadata;
+import org.apache.daffodil.runtime1.api.ElementMetadata;
+import org.apache.daffodil.runtime1.api.InfosetSimpleElement;
+import org.apache.daffodil.runtime1.api.MetadataHandler;
+import org.apache.daffodil.runtime1.api.SequenceMetadata;
+import org.apache.daffodil.runtime1.api.SimpleElementMetadata;
+import org.apache.drill.common.types.TypeProtos.MinorType;
+import org.apache.drill.exec.record.metadata.MapBuilder;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.Stack;
+
+/**
+ * This class transforms a DFDL/Daffodil schema into a Drill Schema.
+ */
+public class DrillDaffodilSchemaVisitor extends MetadataHandler {
+  private static final Logger logger = 
LoggerFactory.getLogger(DrillDaffodilSchemaVisitor.class);
+  /**
+   * Unfortunately, SchemaBuilder and MapBuilder, while similar, do not share 
a base class so we
+   * have a stack of MapBuilders, and when empty we use the SchemaBuilder

Review Comment:
   This is fixed in the latest commit. Created MapBuilderLike interface shared 
by SchemaBuilder and MapBuilder. I only populated it with the methods I needed. 
   
   The corresponding problem doesn't really occur in the rowWriter area as 
tupleWriter is the common underlying class used. 
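   
   For readers following along, a hedged sketch of the shape such an interface 
might take (the method names here are hypothetical; the actual MapBuilderLike 
in the commit may differ):
   
{code:java}
import org.apache.drill.common.types.TypeProtos.MinorType;

// Only the operations the schema visitor needs, so SchemaBuilder and
// MapBuilder can both implement it without sharing a base class.
public interface MapBuilderLike {
  MapBuilderLike addMap(String name);
  MapBuilderLike addNullable(String name, MinorType type);
}
{code}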





> Add Daffodil Format Plugin
> --
>
> Key: DRILL-8474
> URL: https://issues.apache.org/jira/browse/DRILL-8474
> Project: Apache Drill
>  Issue Type: New Feature
>Affects Versions: 1.21.1
>Reporter: Charles Givre
>Priority: Major
> Fix For: 1.22.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8474) Add Daffodil Format Plugin

2024-01-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17807233#comment-17807233
 ] 

ASF GitHub Bot commented on DRILL-8474:
---

mbeckerle commented on code in PR #2836:
URL: https://github.com/apache/drill/pull/2836#discussion_r1453422371


##
contrib/format-daffodil/src/main/java/org/apache/drill/exec/store/daffodil/DaffodilBatchReader.java:
##
@@ -0,0 +1,181 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.daffodil;
+
+import org.apache.daffodil.japi.DataProcessor;
+import org.apache.drill.common.AutoCloseables;
+import org.apache.drill.common.exceptions.CustomErrorContext;
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.exec.physical.impl.scan.v3.ManagedReader;
+import org.apache.drill.exec.physical.impl.scan.v3.file.FileDescrip;
+import org.apache.drill.exec.physical.impl.scan.v3.file.FileSchemaNegotiator;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import 
org.apache.drill.exec.store.daffodil.schema.DaffodilDataProcessorFactory;
+import org.apache.drill.exec.store.dfs.DrillFileSystem;
+import org.apache.drill.exec.store.dfs.easy.EasySubScan;
+import org.apache.hadoop.fs.Path;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.net.URI;
+import java.net.URISyntaxException;
+import java.util.Objects;
+
+import static 
org.apache.drill.exec.store.daffodil.schema.DaffodilDataProcessorFactory.*;
+import static 
org.apache.drill.exec.store.daffodil.schema.DrillDaffodilSchemaUtils.daffodilDataProcessorToDrillSchema;
+
+public class DaffodilBatchReader implements ManagedReader {
+
+  private static final Logger logger = 
LoggerFactory.getLogger(DaffodilBatchReader.class);
+  private final RowSetLoader rowSetLoader;
+  private final CustomErrorContext errorContext;
+  private final DaffodilMessageParser dafParser;
+  private final InputStream dataInputStream;
+
+  public DaffodilBatchReader(DaffodilReaderConfig readerConfig, EasySubScan 
scan,
+  FileSchemaNegotiator negotiator) {
+
+errorContext = negotiator.parentErrorContext();
+DaffodilFormatConfig dafConfig = readerConfig.plugin.getConfig();
+
+String schemaURIString = dafConfig.getSchemaURI(); // 
"schema/complexArray1.dfdl.xsd";
+String rootName = dafConfig.getRootName();
+String rootNamespace = dafConfig.getRootNamespace();
+boolean validationMode = dafConfig.getValidationMode();
+
+URI dfdlSchemaURI;
+try {
+  dfdlSchemaURI = new URI(schemaURIString);
+} catch (URISyntaxException e) {
+  throw UserException.validationError(e).build(logger);
+}
+
+FileDescrip file = negotiator.file();
+DrillFileSystem fs = file.fileSystem();
+URI fsSchemaURI = fs.getUri().resolve(dfdlSchemaURI);
+
+DaffodilDataProcessorFactory dpf = new DaffodilDataProcessorFactory();
+DataProcessor dp;
+try {
+  dp = dpf.getDataProcessor(fsSchemaURI, validationMode, rootName, 
rootNamespace);
+} catch (CompileFailure e) {
+  throw UserException.dataReadError(e)
+  .message(String.format("Failed to get Daffodil DFDL processor for: 
%s", fsSchemaURI))
+  .addContext(errorContext).addContext(e.getMessage()).build(logger);
+}
+// Create the corresponding Drill schema.
+// Note: this could be a very large schema. Think of a large complex RDBMS 
schema,
+// all of it, hundreds of tables, but all part of the same metadata tree.
+TupleMetadata drillSchema = daffodilDataProcessorToDrillSchema(dp);
+// Inform Drill about the schema
+negotiator.tableSchema(drillSchema, true);
+
+//
+// DATA TIME: Next we construct the runtime objects, and open files.
+//
+// We get the DaffodilMessageParser, which is a stateful driver for 
daffodil that
+// actually does the parsing.
+rowSetLoader = negotiator.build().writer();
+
+// We construct the Daffodil InfosetOutputter which the daffodil parser 
uses to
+// convert infoset event 

[jira] [Commented] (DRILL-8188) Convert HDF5 format to EVF2

2024-01-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17806651#comment-17806651
 ] 

ASF GitHub Bot commented on DRILL-8188:
---

jnturton merged PR #2515:
URL: https://github.com/apache/drill/pull/2515




> Convert HDF5 format to EVF2
> ---
>
> Key: DRILL-8188
> URL: https://issues.apache.org/jira/browse/DRILL-8188
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.20.0
>Reporter: Cong Luo
>Assignee: Cong Luo
>Priority: Major
>
> Use EVF V2 instead of old V1.
> Also, fixed a few bugs in V2 framework.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8188) Convert HDF5 format to EVF2

2024-01-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17806649#comment-17806649
 ] 

ASF GitHub Bot commented on DRILL-8188:
---

jnturton commented on code in PR #2515:
URL: https://github.com/apache/drill/pull/2515#discussion_r1446231938


##
contrib/format-hdf5/src/main/java/org/apache/drill/exec/store/hdf5/HDF5BatchReader.java:
##
@@ -171,107 +168,109 @@ public HDF5ReaderConfig(HDF5FormatPlugin plugin, 
HDF5FormatConfig formatConfig)
 }
   }
 
-  public HDF5BatchReader(HDF5ReaderConfig readerConfig, int maxRecords) {
-this.readerConfig = readerConfig;
-this.maxRecords = maxRecords;
+  public HDF5BatchReader(HDF5ReaderConfig config, EasySubScan scan, 
FileSchemaNegotiator negotiator) {
+errorContext = negotiator.parentErrorContext();
+file = negotiator.file();
+readerConfig = config;
 dataWriters = new ArrayList<>();
-this.showMetadataPreview = readerConfig.formatConfig.showPreview();
-  }
+showMetadataPreview = readerConfig.formatConfig.showPreview();
 
-  @Override
-  public boolean open(FileSchemaNegotiator negotiator) {
-split = negotiator.split();
-errorContext = negotiator.parentErrorContext();
 // Since the HDF file reader uses a stream to actually read the file, the 
file name from the
 // module is incorrect.
-fileName = split.getPath().getName();
-try {
-  openFile(negotiator);
-} catch (IOException e) {
-  throw UserException
-.dataReadError(e)
-.addContext("Failed to close input file: %s", split.getPath())
-.addContext(errorContext)
-.build(logger);
-}
+fileName = file.split().getPath().getName();
 
-ResultSetLoader loader;
-if (readerConfig.defaultPath == null) {
-  // Get file metadata
-  List metadata = getFileMetadata(hdfFile, new 
ArrayList<>());
-  metadataIterator = metadata.iterator();
-
-  // Schema for Metadata query
-  SchemaBuilder builder = new SchemaBuilder()
-.addNullable(PATH_COLUMN_NAME, MinorType.VARCHAR)
-.addNullable(DATA_TYPE_COLUMN_NAME, MinorType.VARCHAR)
-.addNullable(FILE_NAME_COLUMN_NAME, MinorType.VARCHAR)
-.addNullable(DATA_SIZE_COLUMN_NAME, MinorType.BIGINT)
-.addNullable(IS_LINK_COLUMN_NAME, MinorType.BIT)
-.addNullable(ELEMENT_COUNT_NAME, MinorType.BIGINT)
-.addNullable(DATASET_DATA_TYPE_NAME, MinorType.VARCHAR)
-.addNullable(DIMENSIONS_FIELD_NAME, MinorType.VARCHAR);
-
-  negotiator.tableSchema(builder.buildSchema(), false);
-
-  loader = negotiator.build();
-  dimensions = new int[0];
-  rowWriter = loader.writer();
-
-} else {
-  // This is the case when the default path is specified. Since the user 
is explicitly asking for a dataset
-  // Drill can obtain the schema by getting the datatypes below and 
ultimately mapping that schema to columns
-  Dataset dataSet = hdfFile.getDatasetByPath(readerConfig.defaultPath);
-  dimensions = dataSet.getDimensions();
-
-  loader = negotiator.build();
-  rowWriter = loader.writer();
-  writerSpec = new WriterSpec(rowWriter, negotiator.providedSchema(),
-  negotiator.parentErrorContext());
-  if (dimensions.length <= 1) {
-buildSchemaFor1DimensionalDataset(dataSet);
-  } else if (dimensions.length == 2) {
-buildSchemaFor2DimensionalDataset(dataSet);
-  } else {
-// Case for datasets of greater than 2D
-// These are automatically flattened
-buildSchemaFor2DimensionalDataset(dataSet);
+{ // Opens an HDF5 file

Review Comment:
   I guess some of these could become private methods but it's a minor point 
for me.
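   
   A toy illustration of that suggestion, hoisting a labelled block into a 
named private method (names hypothetical):
   
{code:java}
public class ReaderSketch {
  public ReaderSketch() {
    openHdf5File(); // was the inline "{ // Opens an HDF5 file" block
  }

  private void openHdf5File() {
    // Stream-based open; the file name comes from the split because the
    // name reported by the module is incorrect for streamed reads.
  }
}
{code}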



##
contrib/format-hdf5/src/main/java/org/apache/drill/exec/store/hdf5/HDF5BatchReader.java:
##
@@ -171,107 +164,104 @@ public HDF5ReaderConfig(HDF5FormatPlugin plugin, 
HDF5FormatConfig formatConfig)
 }
   }
 
-  public HDF5BatchReader(HDF5ReaderConfig readerConfig, int maxRecords) {
-this.readerConfig = readerConfig;
-this.maxRecords = maxRecords;
+  public HDF5BatchReader(HDF5ReaderConfig config, EasySubScan scan, 
FileSchemaNegotiator negotiator) {
+errorContext = negotiator.parentErrorContext();
+file = negotiator.file();
+readerConfig = config;
 dataWriters = new ArrayList<>();
-this.showMetadataPreview = readerConfig.formatConfig.showPreview();
-  }
+showMetadataPreview = readerConfig.formatConfig.showPreview();
 
-  @Override
-  public boolean open(FileSchemaNegotiator negotiator) {
-split = negotiator.split();
-errorContext = negotiator.parentErrorContext();
 // Since the HDF file reader uses a stream to actually read the file, the 
file name from the
 // module is incorrect.
-fileName = split.getPath().getName();
-try {
-  openFile(negotiator);
-} catch (IOException e) {
-  throw 

[jira] [Commented] (DRILL-8474) Add Daffodil Format Plugin

2024-01-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17806487#comment-17806487
 ] 

ASF GitHub Bot commented on DRILL-8474:
---

cgivre commented on PR #2836:
URL: https://github.com/apache/drill/pull/2836#issuecomment-1890990577

   > > @mbeckerle With respect to style, I tried to reply to that comment, but 
the thread won't let me. In any event, Drill classes will typically start with 
the constructor, then have whatever methods are appropriate for the class. The 
logger creation usually happens before the constructor. I think all of your 
other classes followed this format, so the one or two that didn't kind of 
jumped out at me.
   > 
   > @cgivre I believe the style issues are all fixed. The build did not get 
any codestyle issues.
   
   The issue I was referring to was more around the organization of a few 
classes.  Usually we'll have the constructor (if present) at the top followed 
by any class methods.  I think there was a class or two where the constructor 
was at the bottom or something like that.  In any event, consider the issue 
resolved.
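   
   For anyone new to the codebase, a skeletal example of the layout convention 
described above (a sketch, not a real Drill class):
   
{code:java}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class ExampleReader {
  // Logger creation comes before the constructor.
  private static final Logger logger = LoggerFactory.getLogger(ExampleReader.class);

  // Constructor first...
  public ExampleReader() {
  }

  // ...then the class methods.
  public void open() {
    logger.info("ExampleReader opened");
  }
}
{code}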




> Add Daffodil Format Plugin
> --
>
> Key: DRILL-8474
> URL: https://issues.apache.org/jira/browse/DRILL-8474
> Project: Apache Drill
>  Issue Type: New Feature
>Affects Versions: 1.21.1
>Reporter: Charles Givre
>Priority: Major
> Fix For: 1.22.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8474) Add Daffodil Format Plugin

2024-01-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17806486#comment-17806486
 ] 

ASF GitHub Bot commented on DRILL-8474:
---

cgivre commented on code in PR #2836:
URL: https://github.com/apache/drill/pull/2836#discussion_r1451758017


##
contrib/format-daffodil/src/main/java/org/apache/drill/exec/store/daffodil/DaffodilBatchReader.java:
##
@@ -0,0 +1,181 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.daffodil;
+
+import org.apache.daffodil.japi.DataProcessor;
+import org.apache.drill.common.AutoCloseables;
+import org.apache.drill.common.exceptions.CustomErrorContext;
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.exec.physical.impl.scan.v3.ManagedReader;
+import org.apache.drill.exec.physical.impl.scan.v3.file.FileDescrip;
+import org.apache.drill.exec.physical.impl.scan.v3.file.FileSchemaNegotiator;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import 
org.apache.drill.exec.store.daffodil.schema.DaffodilDataProcessorFactory;
+import org.apache.drill.exec.store.dfs.DrillFileSystem;
+import org.apache.drill.exec.store.dfs.easy.EasySubScan;
+import org.apache.hadoop.fs.Path;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.net.URI;
+import java.net.URISyntaxException;
+import java.util.Objects;
+
+import static 
org.apache.drill.exec.store.daffodil.schema.DaffodilDataProcessorFactory.*;
+import static 
org.apache.drill.exec.store.daffodil.schema.DrillDaffodilSchemaUtils.daffodilDataProcessorToDrillSchema;
+
+public class DaffodilBatchReader implements ManagedReader {
+
+  private static final Logger logger = 
LoggerFactory.getLogger(DaffodilBatchReader.class);
+  private final RowSetLoader rowSetLoader;
+  private final CustomErrorContext errorContext;
+  private final DaffodilMessageParser dafParser;
+  private final InputStream dataInputStream;
+
+  public DaffodilBatchReader(DaffodilReaderConfig readerConfig, EasySubScan 
scan,
+  FileSchemaNegotiator negotiator) {
+
+errorContext = negotiator.parentErrorContext();
+DaffodilFormatConfig dafConfig = readerConfig.plugin.getConfig();
+
+String schemaURIString = dafConfig.getSchemaURI(); // 
"schema/complexArray1.dfdl.xsd";
+String rootName = dafConfig.getRootName();
+String rootNamespace = dafConfig.getRootNamespace();
+boolean validationMode = dafConfig.getValidationMode();
+
+URI dfdlSchemaURI;
+try {
+  dfdlSchemaURI = new URI(schemaURIString);
+} catch (URISyntaxException e) {
+  throw UserException.validationError(e).build(logger);
+}
+
+FileDescrip file = negotiator.file();
+DrillFileSystem fs = file.fileSystem();
+URI fsSchemaURI = fs.getUri().resolve(dfdlSchemaURI);
+
+DaffodilDataProcessorFactory dpf = new DaffodilDataProcessorFactory();
+DataProcessor dp;
+try {
+  dp = dpf.getDataProcessor(fsSchemaURI, validationMode, rootName, 
rootNamespace);
+} catch (CompileFailure e) {
+  throw UserException.dataReadError(e)
+  .message(String.format("Failed to get Daffodil DFDL processor for: 
%s", fsSchemaURI))
+  .addContext(errorContext).addContext(e.getMessage()).build(logger);
+}
+// Create the corresponding Drill schema.
+// Note: this could be a very large schema. Think of a large complex RDBMS 
schema,
+// all of it, hundreds of tables, but all part of the same metadata tree.
+TupleMetadata drillSchema = daffodilDataProcessorToDrillSchema(dp);
+// Inform Drill about the schema
+negotiator.tableSchema(drillSchema, true);
+
+//
+// DATA TIME: Next we construct the runtime objects, and open files.
+//
+// We get the DaffodilMessageParser, which is a stateful driver for 
daffodil that
+// actually does the parsing.
+rowSetLoader = negotiator.build().writer();
+
+// We construct the Daffodil InfosetOutputter which the daffodil parser 
uses to
+// convert infoset event 

[jira] [Commented] (DRILL-8474) Add Daffodil Format Plugin

2024-01-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17806484#comment-17806484
 ] 

ASF GitHub Bot commented on DRILL-8474:
---

cgivre commented on code in PR #2836:
URL: https://github.com/apache/drill/pull/2836#discussion_r1451757410


##
contrib/format-daffodil/src/main/java/org/apache/drill/exec/store/daffodil/schema/DrillDaffodilSchemaVisitor.java:
##
@@ -0,0 +1,229 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.daffodil.schema;
+
+import org.apache.daffodil.runtime1.api.ChoiceMetadata;
+import org.apache.daffodil.runtime1.api.ComplexElementMetadata;
+import org.apache.daffodil.runtime1.api.ElementMetadata;
+import org.apache.daffodil.runtime1.api.InfosetSimpleElement;
+import org.apache.daffodil.runtime1.api.MetadataHandler;
+import org.apache.daffodil.runtime1.api.SequenceMetadata;
+import org.apache.daffodil.runtime1.api.SimpleElementMetadata;
+import org.apache.drill.common.types.TypeProtos.MinorType;
+import org.apache.drill.exec.record.metadata.MapBuilder;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.Stack;
+
+/**
+ * This class transforms a DFDL/Daffodil schema into a Drill Schema.
+ */
+public class DrillDaffodilSchemaVisitor extends MetadataHandler {
+  private static final Logger logger = 
LoggerFactory.getLogger(DrillDaffodilSchemaVisitor.class);
+  /**
+   * Unfortunately, SchemaBuilder and MapBuilder, while similar, do not share 
a base class so we
+   * have a stack of MapBuilders, and when empty we use the SchemaBuilder

Review Comment:
   This is likely music to @paul-rogers's ears.





> Add Daffodil Format Plugin
> --
>
> Key: DRILL-8474
> URL: https://issues.apache.org/jira/browse/DRILL-8474
> Project: Apache Drill
>  Issue Type: New Feature
>Affects Versions: 1.21.1
>Reporter: Charles Givre
>Priority: Major
> Fix For: 1.22.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8474) Add Daffodil Format Plugin

2024-01-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17806482#comment-17806482
 ] 

ASF GitHub Bot commented on DRILL-8474:
---

cgivre commented on code in PR #2836:
URL: https://github.com/apache/drill/pull/2836#discussion_r1451756763


##
contrib/format-daffodil/src/main/java/org/apache/drill/exec/store/daffodil/schema/DaffodilDataProcessorFactory.java:
##
@@ -0,0 +1,162 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.daffodil.schema;
+
+import org.apache.daffodil.japi.Compiler;
+import org.apache.daffodil.japi.Daffodil;
+import org.apache.daffodil.japi.DataProcessor;
+import org.apache.daffodil.japi.Diagnostic;
+import org.apache.daffodil.japi.InvalidParserException;
+import org.apache.daffodil.japi.InvalidUsageException;
+import org.apache.daffodil.japi.ProcessorFactory;
+import org.apache.daffodil.japi.ValidationMode;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.net.URI;
+import java.net.URISyntaxException;
+import java.nio.channels.Channels;
+import java.util.List;
+import java.util.Objects;
+
+/**
+ * Compiles a DFDL schema (mostly for tests) or loads a pre-compiled DFDL 
schema so that one can
+ * obtain a DataProcessor for use with DaffodilMessageParser.
+ * 
+ * TODO: Needs to use a cache to avoid reloading/recompiling every time.
+ */
+public class DaffodilDataProcessorFactory {
+  // Default constructor is used.
+
+  private static final Logger logger = 
LoggerFactory.getLogger(DaffodilDataProcessorFactory.class);
+
+  private DataProcessor dp;
+
+  /**
+   * Gets a Daffodil DataProcessor given the necessary arguments to compile or 
reload it.
+   *
+   * @param schemaFileURI
+   * pre-compiled dfdl schema (.bin extension) or DFDL schema source (.xsd 
extension)
+   * @param validationMode
+   * Use true to request Daffodil built-in 'limited' validation. Use false 
for no validation.
+   * @param rootName
+   * Local name of root element of the message. Can be null to use the 
first element declaration
+   * of the primary schema file. Ignored if reloading a pre-compiled 
schema.
+   * @param rootNS
+   * Namespace URI as a string. Can be null to use the target namespace of 
the primary schema
+   * file or if it is unambiguous what element is the rootName. Ignored if 
reloading a
+   * pre-compiled schema.
+   * @return the DataProcessor
+   * @throws CompileFailure
+   * - if schema compilation fails
+   */
+  public DataProcessor getDataProcessor(URI schemaFileURI, boolean 
validationMode, String rootName,
+  String rootNS)
+  throws CompileFailure {
+
+DaffodilDataProcessorFactory dmp = new DaffodilDataProcessorFactory();
+boolean isPrecompiled = schemaFileURI.toString().endsWith(".bin");
+if (isPrecompiled) {
+  if (Objects.nonNull(rootName) && !rootName.isEmpty()) {
+// A usage error. You shouldn't supply the name and optionally 
namespace if loading
+// precompiled schema because those are built into it. Should be null 
or "".
+logger.warn("Root element name '{}' is ignored when used with 
precompiled DFDL schema.",
+rootName);
+  }
+  try {
+dmp.loadSchema(schemaFileURI);
+  } catch (IOException | InvalidParserException e) {
+throw new CompileFailure(e);
+  }
+  dmp.setupDP(validationMode, null);
+} else {
+  List pfDiags;
+  try {
+pfDiags = dmp.compileSchema(schemaFileURI, rootName, rootNS);
+  } catch (URISyntaxException | IOException e) {
+throw new CompileFailure(e);
+  }
+  dmp.setupDP(validationMode, pfDiags);
+}
+return dmp.dp;
+  }
+
+  private void loadSchema(URI schemaFileURI) throws IOException, 
InvalidParserException {
+Compiler c = Daffodil.compiler();
+dp = c.reload(Channels.newChannel(schemaFileURI.toURL().openStream()));

Review Comment:
   This definitely seems like an area where there is potential for a lot of 
different things to go wrong.  My view is we should just do our best to provide 
clear error messages.
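   
   On the TODO in the class javadoc about caching, a minimal sketch of what 
such a cache might look like (hypothetical, not the PR's implementation):
   
{code:java}
import java.net.URI;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Keyed by schema URI so each DFDL schema is compiled or reloaded at most
// once per process; callers pass the expensive compile/reload as a function.
public class CompiledSchemaCache<V> {
  private final ConcurrentHashMap<URI, V> cache = new ConcurrentHashMap<>();

  public V get(URI schemaUri, Function<URI, V> compileOrReload) {
    return cache.computeIfAbsent(schemaUri, compileOrReload);
  }
}
{code}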
