[jira] [Created] (DRILL-8495) Tried to remove unmanaged buffer

2024-05-13 Thread Maksym Rymar (Jira)
Maksym Rymar created DRILL-8495:
---

 Summary: Tried to remove unmanaged buffer
 Key: DRILL-8495
 URL: https://issues.apache.org/jira/browse/DRILL-8495
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.21.1
Reporter: Maksym Rymar
Assignee: Maksym Rymar


 

Drill throws an exception when querying a Hive table:
{code:java}
  (java.lang.IllegalStateException) Tried to remove unmanaged buffer.
    org.apache.drill.exec.ops.BufferManagerImpl.replace():51
    io.netty.buffer.DrillBuf.reallocIfNeeded():101
    org.apache.drill.exec.store.hive.writers.primitive.HiveStringWriter.write():38
    org.apache.drill.exec.store.hive.readers.HiveDefaultRecordReader.readHiveRecordAndInsertIntoRecordBatch():416
    org.apache.drill.exec.store.hive.readers.HiveDefaultRecordReader.next():402
    org.apache.drill.exec.physical.impl.ScanBatch.internalNext():235
    org.apache.drill.exec.physical.impl.ScanBatch.next():299
    org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next():237
    org.apache.drill.exec.record.AbstractRecordBatch.next():109
    org.apache.drill.exec.record.AbstractRecordBatch.next():101
    org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():59
    org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():93
    org.apache.drill.exec.record.AbstractRecordBatch.next():160
    org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next():237
    org.apache.drill.exec.physical.impl.BaseRootExec.next():103
    org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():81
    org.apache.drill.exec.physical.impl.BaseRootExec.next():93
    org.apache.drill.exec.work.fragment.FragmentExecutor.lambda$run$0():321
    java.security.AccessController.doPrivileged():-2
    javax.security.auth.Subject.doAs():422
    org.apache.hadoop.security.UserGroupInformation.doAs():1899
    org.apache.drill.exec.work.fragment.FragmentExecutor.run():310
    org.apache.drill.common.SelfCleaningRunnable.run():38
    java.util.concurrent.ThreadPoolExecutor.runWorker():1149
    java.util.concurrent.ThreadPoolExecutor$Worker.run():624
    java.lang.Thread.run():748 {code}
Reproduce:
 # Create Hive table:

{code:java}
CREATE TABLE IF NOT EXISTS students (id INT, name STRING, surname STRING) STORED AS PARQUET;{code}

 # Insert a row with two string values, each longer than 256 bytes (the literals below are shortened placeholders for such values):

{code:java}
INSERT INTO students VALUES (1, 'Veeery long name', 'bg surname');{code}

 # Execute Drill query:

{code:java}
select * from hive.`students` {code}
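A toy model of how this error can arise is sketched below. It rests on an assumption the report does not confirm: that one buffer is handed to several string writers, and that {{reallocIfNeeded()}} asks the buffer manager to swap the old buffer for a larger one. The first value over 256 bytes then removes the shared buffer from the manager's registry, and the next writer, still holding the old reference, trips the "unmanaged buffer" check. All classes here are hypothetical stand-ins, not Drill's {{BufferManagerImpl}}, {{DrillBuf}} or {{HiveStringWriter}}.

{code:java}
import java.util.IdentityHashMap;
import java.util.Map;

public class UnmanagedBufferDemo {
    /** Hypothetical buffer; stands in for a managed off-heap buffer. */
    static final class Buf {
        final byte[] bytes;
        Buf(int capacity) { this.bytes = new byte[capacity]; }
        int capacity() { return bytes.length; }
    }

    /** Hypothetical registry of managed buffers, tracked by reference identity. */
    static final class ToyBufferManager {
        private final Map<Buf, Boolean> managed = new IdentityHashMap<>();

        Buf allocate(int capacity) {
            Buf b = new Buf(capacity);
            managed.put(b, Boolean.TRUE);
            return b;
        }

        /** Replaces a tracked buffer with a larger one; rejects buffers it no longer tracks. */
        Buf replace(Buf old, int newCapacity) {
            if (managed.remove(old) == null) {
                throw new IllegalStateException("Tried to remove unmanaged buffer.");
            }
            return allocate(newCapacity);
        }
    }

    /** Hypothetical string writer that grows its buffer when a value does not fit. */
    static final class ToyStringWriter {
        private final ToyBufferManager manager;
        private Buf buf;

        ToyStringWriter(ToyBufferManager manager, Buf initial) {
            this.manager = manager;
            this.buf = initial;
        }

        void write(byte[] value) {
            if (value.length > buf.capacity()) {
                // Only this writer learns about the new buffer; other holders keep the old one.
                buf = manager.replace(buf, value.length);
            }
            System.arraycopy(value, 0, buf.bytes, 0, value.length);
        }
    }

    public static void main(String[] args) {
        ToyBufferManager manager = new ToyBufferManager();
        Buf shared = manager.allocate(256);           // one 256-byte buffer shared by both writers
        ToyStringWriter name = new ToyStringWriter(manager, shared);
        ToyStringWriter surname = new ToyStringWriter(manager, shared);

        byte[] longValue = new byte[300];             // longer than 256 bytes, as in the report
        name.write(longValue);                        // replaces the shared buffer: succeeds
        try {
            surname.write(longValue);                 // still holds the replaced buffer
            System.out.println("no error");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());       // Tried to remove unmanaged buffer.
        }
    }
}
{code}

In this model, giving each writer its own buffer avoids the error; whether the real fix for DRILL-8495 takes that route is not stated in the report.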

--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8492) Allow Parquet TIME_MICROS and TIMESTAMP_MICROS columns to be read as 64-bit integer values

2024-05-13 Thread ASF GitHub Bot (Jira)


[ https://issues.apache.org/jira/browse/DRILL-8492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845996#comment-17845996 ]

ASF GitHub Bot commented on DRILL-8492:
---

handmadecode commented on PR #2907:
URL: https://github.com/apache/drill/pull/2907#issuecomment-2108103905

   > Follow up: I see use of the FragmentContext class for accessing config options in the old Parquet reader, perhaps it's a good vehicle...
   
   `FragmentContext` is used to access the new config options where an instance was already available, i.e. in `ColumnReaderFactory`, `ParquetToDrillTypeConverter`, and `DrillParquetGroupConverter`.
   
   `FileMetadataCollector` doesn't have access to a `FragmentContext`, only to a `ParquetReaderConfig`.
   I can only find a `FragmentContext` in one of the two call paths to `FileMetadataCollector::addColumnMetadata`, so trying to inject a `FragmentContext` into `FileMetadataCollector` would probably affect quite a few other classes.
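The alternative proposed in the issue, copying the option values into the reader config when it is created, can be sketched as follows. `OptionSource`, `ReaderConfig` and `MetadataCollector` are hypothetical stand-ins for `OptionManager`, `ParquetReaderConfig` and `FileMetadataCollector`; only the two option names and their `false` defaults come from the issue. The point of the sketch is that the collector needs only the config snapshot, never a `FragmentContext`.

{code:java}
import java.util.Map;

public class ReaderConfigSnapshotDemo {
    /** Hypothetical option lookup; stands in for an option manager. */
    interface OptionSource {
        boolean getBoolean(String name);
    }

    /** Hypothetical immutable reader config: option values are copied once, at creation. */
    static final class ReaderConfig {
        static final String TIME_MICROS_AS_INT64 = "store.parquet.reader.time_micros_as_int64";
        static final String TIMESTAMP_MICROS_AS_INT64 = "store.parquet.reader.timestamp_micros_as_int64";

        final boolean timeMicrosAsInt64;
        final boolean timestampMicrosAsInt64;

        private ReaderConfig(boolean timeMicrosAsInt64, boolean timestampMicrosAsInt64) {
            this.timeMicrosAsInt64 = timeMicrosAsInt64;
            this.timestampMicrosAsInt64 = timestampMicrosAsInt64;
        }

        static ReaderConfig from(OptionSource options) {
            return new ReaderConfig(
                options.getBoolean(TIME_MICROS_AS_INT64),
                options.getBoolean(TIMESTAMP_MICROS_AS_INT64));
        }
    }

    /** Hypothetical metadata collector: sees only the config, never the option source. */
    static final class MetadataCollector {
        private final ReaderConfig config;

        MetadataCollector(ReaderConfig config) { this.config = config; }

        boolean truncatesTimeMicros() { return !config.timeMicrosAsInt64; }
        boolean truncatesTimestampMicros() { return !config.timestampMicrosAsInt64; }
    }

    public static void main(String[] args) {
        // Session values; anything missing falls back to false, the default proposed in the issue.
        Map<String, Boolean> session = Map.of(ReaderConfig.TIMESTAMP_MICROS_AS_INT64, true);
        OptionSource options = name -> session.getOrDefault(name, false);

        MetadataCollector collector = new MetadataCollector(ReaderConfig.from(options));
        System.out.println(collector.truncatesTimeMicros());      // true
        System.out.println(collector.truncatesTimestampMicros()); // false
    }
}
{code}

Snapshotting at creation keeps the collector's constructor signature small, at the cost of not seeing option changes made after the config was built.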




> Allow Parquet TIME_MICROS and TIMESTAMP_MICROS columns to be read as 64-bit integer values
> ---
>
> Key: DRILL-8492
> URL: https://issues.apache.org/jira/browse/DRILL-8492
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Parquet
>Affects Versions: 1.21.1
>Reporter: Peter Franzen
>Priority: Major
>
> When reading Parquet columns of type {{time_micros}} and {{timestamp_micros}}, Drill truncates the microsecond values to milliseconds in order to convert them to SQL timestamps.
> It is currently not possible to read the original microsecond values (as 64-bit values, not SQL timestamps) through Drill.
> One way to allow reading the original 64-bit values is to add two options, similar to "store.parquet.reader.int96_as_timestamp", that control whether microsecond times and timestamps are truncated to millisecond timestamps or read as non-truncated 64-bit values.
> These options would be added to {{org.apache.drill.exec.ExecConstants}} and {{org.apache.drill.exec.server.options.SystemOptionManager}}.
> They would also be added to "drill-module.conf":
> {{store.parquet.reader.time_micros_as_int64: false,}}
> {{store.parquet.reader.timestamp_micros_as_int64: false,}}
> These options would then be used in the same places as {{store.parquet.reader.int96_as_timestamp}}:
>  * org.apache.drill.exec.store.parquet.columnreaders.ColumnReaderFactory
>  * org.apache.drill.exec.store.parquet.columnreaders.ParquetToDrillTypeConverter
>  * org.apache.drill.exec.store.parquet2.DrillParquetGroupConverter
> to create an int64 reader instead of a time/timestamp reader when the corresponding option is set to true.
> In addition, {{org.apache.drill.exec.store.parquet.metadata.FileMetadataCollector}} must be altered to _not_ truncate the min and max values for time_micros/timestamp_micros if the corresponding option is true. This class doesn't have a reference to an {{OptionManager}}, so the two new options must be extracted from the {{OptionManager}} when the {{ParquetReaderConfig}} instance is created.
> Filtering on microsecond columns would be done using 64-bit values rather than TIME/TIMESTAMP values when the new options are true, e.g.
> {{SELECT *  FROM  WHERE  = 1705914906694751;}}
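The arithmetic behind the truncation, and why the min/max statistics must follow the same option as the reader, can be checked with the sketch below. It assumes truncation is a floor division by 1000 and that row-group pruning keeps a row group only when the filter literal lies within [min, max]; {{read}}, {{collect}} and {{MinMax}} are hypothetical helpers, not Drill code. With the literal from the example filter, statistics still truncated to milliseconds no longer bracket a 64-bit microsecond literal, so such a row group could be skipped wrongly.

{code:java}
public class MicrosAsInt64Demo {
    /** Truncation described in the issue: microseconds to milliseconds (floor division assumed). */
    static long truncateToMillis(long micros) {
        return Math.floorDiv(micros, 1000L);
    }

    /** Hypothetical read path: raw int64 when the option is on, truncated value otherwise. */
    static long read(long micros, boolean microsAsInt64) {
        return microsAsInt64 ? micros : truncateToMillis(micros);
    }

    /** Hypothetical row-group statistics used for pruning. */
    static final class MinMax {
        final long min;
        final long max;

        MinMax(long min, long max) {
            this.min = min;
            this.max = max;
        }

        /** A row group is kept only if the literal could fall inside [min, max]. */
        boolean mightContain(long literal) {
            return literal >= min && literal <= max;
        }
    }

    /** Hypothetical min/max collection, with or without millisecond truncation. */
    static MinMax collect(long[] micros, boolean truncate) {
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        for (long m : micros) {
            long v = truncate ? truncateToMillis(m) : m;
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        return new MinMax(min, max);
    }

    public static void main(String[] args) {
        long micros = 1705914906694751L;                 // literal from the issue's example filter
        System.out.println(read(micros, false));         // 1705914906694 (last 751 microseconds lost)
        System.out.println(read(micros, true));          // 1705914906694751

        long[] column = {micros};
        // Values read as int64 but statistics still truncated: the literal falls outside [min, max].
        System.out.println(collect(column, true).mightContain(micros));  // false
        // Untruncated statistics: the row group is kept.
        System.out.println(collect(column, false).mightContain(micros)); // true
    }
}
{code}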





[jira] [Commented] (DRILL-8492) Allow Parquet TIME_MICROS and TIMESTAMP_MICROS columns to be read as 64-bit integer values

2024-05-13 Thread ASF GitHub Bot (Jira)


[ https://issues.apache.org/jira/browse/DRILL-8492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845816#comment-17845816 ]

ASF GitHub Bot commented on DRILL-8492:
---

jnturton commented on PR #2907:
URL: https://github.com/apache/drill/pull/2907#issuecomment-2106890548

   > However, I could very well have overlooked some way to access the global configuration, and I'd be grateful for any pointers.
   
   We should find existing examples in the Parquet format plugin, e.g. the "old" reader is affected by the option `store.parquet.reader.pagereader.async`.


