[jira] [Created] (DRILL-8495) Tried to remove unmanaged buffer
Maksym Rymar created DRILL-8495:
---

             Summary: Tried to remove unmanaged buffer
                 Key: DRILL-8495
                 URL: https://issues.apache.org/jira/browse/DRILL-8495
             Project: Apache Drill
          Issue Type: Bug
    Affects Versions: 1.21.1
            Reporter: Maksym Rymar
            Assignee: Maksym Rymar


Drill throws an exception on Hive table:
{code:java}
(java.lang.IllegalStateException) Tried to remove unmanaged buffer.
    org.apache.drill.exec.ops.BufferManagerImpl.replace():51
    io.netty.buffer.DrillBuf.reallocIfNeeded():101
    org.apache.drill.exec.store.hive.writers.primitive.HiveStringWriter.write():38
    org.apache.drill.exec.store.hive.readers.HiveDefaultRecordReader.readHiveRecordAndInsertIntoRecordBatch():416
    org.apache.drill.exec.store.hive.readers.HiveDefaultRecordReader.next():402
    org.apache.drill.exec.physical.impl.ScanBatch.internalNext():235
    org.apache.drill.exec.physical.impl.ScanBatch.next():299
    org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next():237
    org.apache.drill.exec.record.AbstractRecordBatch.next():109
    org.apache.drill.exec.record.AbstractRecordBatch.next():101
    org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():59
    org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():93
    org.apache.drill.exec.record.AbstractRecordBatch.next():160
    org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next():237
    org.apache.drill.exec.physical.impl.BaseRootExec.next():103
    org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():81
    org.apache.drill.exec.physical.impl.BaseRootExec.next():93
    org.apache.drill.exec.work.fragment.FragmentExecutor.lambda$run$0():321
    java.security.AccessController.doPrivileged():-2
    javax.security.auth.Subject.doAs():422
    org.apache.hadoop.security.UserGroupInformation.doAs():1899
    org.apache.drill.exec.work.fragment.FragmentExecutor.run():310
    org.apache.drill.common.SelfCleaningRunnable.run():38
    java.util.concurrent.ThreadPoolExecutor.runWorker():1149
    java.util.concurrent.ThreadPoolExecutor$Worker.run():624
    java.lang.Thread.run():748
{code}

Reproduce:
 # Create Hive table:
{code:java}
create table if NOT EXISTS students(id int, name string, surname string) stored as parquet;{code}
 # Insert a new row with 2 string values of size > 256 bytes:
{code:java}
insert into students values (1, 'Veeery long name', 'bg surname');{code}
 # Execute Drill query:
{code:java}
select * from hive.`students` {code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
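
The top three frames of the trace (a writer calling reallocIfNeeded, which calls replace on the buffer manager) suggest one way such an error can arise: a caller keeps using a buffer reference after the manager has already swapped that buffer for a larger one. The sketch below is a simplified, hypothetical model of that failure mode and is not Drill's code; ToyBuf, ToyBufferManager and their methods are stand-ins invented for illustration, and the report itself does not state the root cause.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified, hypothetical model; none of these types are Drill classes.
final class UnmanagedBufferDemo {

    /** Toy buffer: only its address (identity) and capacity matter here. */
    static final class ToyBuf {
        final long address;
        final int capacity;

        ToyBuf(long address, int capacity) {
            this.address = address;
            this.capacity = capacity;
        }
    }

    /** Toy manager that tracks the buffers it has handed out, keyed by address. */
    static final class ToyBufferManager {
        private final Map<Long, ToyBuf> managed = new HashMap<>();
        private long nextAddress = 1;

        ToyBuf getManagedBuffer(int size) {
            ToyBuf buf = new ToyBuf(nextAddress++, size);
            managed.put(buf.address, buf);
            return buf;
        }

        /** Swaps a tracked buffer for a new one; rejects buffers it does not track. */
        ToyBuf replace(ToyBuf old, int newSize) {
            if (managed.remove(old.address) == null) {
                throw new IllegalStateException("Tried to remove unmanaged buffer.");
            }
            return getManagedBuffer(newSize);
        }

        /** Returns the same buffer if it is large enough, otherwise a replacement. */
        ToyBuf reallocIfNeeded(ToyBuf buf, int size) {
            return buf.capacity >= size ? buf : replace(buf, size);
        }
    }

    public static void main(String[] args) {
        ToyBufferManager manager = new ToyBufferManager();
        ToyBuf original = manager.getManagedBuffer(256);

        // First oversized value: the manager swaps in a larger buffer.
        ToyBuf grown = manager.reallocIfNeeded(original, 300);
        System.out.println("grown capacity: " + grown.capacity);

        // Second oversized value written through the stale reference:
        // the manager no longer tracks 'original', so replace() fails.
        try {
            manager.reallocIfNeeded(original, 300);
        } catch (IllegalStateException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

In this model the error goes away if the caller always continues with the buffer returned by reallocIfNeeded instead of the one it passed in. Whether that matches the actual defect would need to be confirmed against the Drill sources.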
[jira] [Commented] (DRILL-8492) Allow Parquet TIME_MICROS and TIMESTAMP_MICROS columns to be read as 64-bit integer values
    [ https://issues.apache.org/jira/browse/DRILL-8492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845996#comment-17845996 ]

ASF GitHub Bot commented on DRILL-8492:
---

handmadecode commented on PR #2907:
URL: https://github.com/apache/drill/pull/2907#issuecomment-2108103905

   > Follow up: I see use of the FragmentContext class for accessing config options in the old Parquet reader, perhaps it's a good vehicle...

   `FragmentContext` is used to access the new config options where an instance was already available, i.e. in `ColumnReaderFactory`, `ParquetToDrillTypeConverter`, and `DrillParquetGroupConverter`.

   `FileMetadataCollector` doesn't have access to a `FragmentContext`, only to a `ParquetReaderConfig`.

   I can only find a `FragmentContext` in one of the two call paths to `FileMetadataCollector::addColumnMetadata`, so trying to inject a `FragmentContext` into `FileMetadataCollector` will probably have an impact on quite a few other classes.


> Allow Parquet TIME_MICROS and TIMESTAMP_MICROS columns to be read as 64-bit integer values
> ---
>
>                 Key: DRILL-8492
>                 URL: https://issues.apache.org/jira/browse/DRILL-8492
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Storage - Parquet
>    Affects Versions: 1.21.1
>            Reporter: Peter Franzen
>            Priority: Major
>
> When reading Parquet columns of type {{time_micros}} and {{timestamp_micros}}, Drill truncates the microsecond values to milliseconds in order to convert them to SQL timestamps.
> It is currently not possible to read the original microsecond values (as 64-bit values, not SQL timestamps) through Drill.
> One way to allow reading the original 64-bit values is to add two options, similar to "store.parquet.reader.int96_as_timestamp", that control whether microsecond times and timestamps are truncated to millisecond timestamps or read as non-truncated 64-bit values.
> These options would be added to {{org.apache.drill.exec.ExecConstants}} and {{org.apache.drill.exec.server.options.SystemOptionManager}}.
> They would also be added to "drill-module.conf":
> {{ store.parquet.reader.time_micros_as_int64: false,}}
> {{ store.parquet.reader.timestamp_micros_as_int64: false,}}
> These options would then be used in the same places as {{store.parquet.reader.int96_as_timestamp}}:
> * org.apache.drill.exec.store.parquet.columnreaders.ColumnReaderFactory
> * org.apache.drill.exec.store.parquet.columnreaders.ParquetToDrillTypeConverter
> * org.apache.drill.exec.store.parquet2.DrillParquetGroupConverter
> to create an int64 reader instead of a time/timestamp reader when the corresponding option is set to true.
> In addition to this, {{org.apache.drill.exec.store.parquet.metadata.FileMetadataCollector}} must be altered to _not_ truncate the min and max values for time_micros/timestamp_micros if the corresponding option is true. This class doesn’t have a reference to an {{OptionManager}}, so the two new options must be extracted from the {{OptionManager}} when the {{ParquetReaderConfig}} instance is created.
> Filtering on microsecond columns would be done using 64-bit values rather than TIME/TIMESTAMP values when the new options are true, e.g.
> {{SELECT * FROM <table> WHERE <column> = 1705914906694751;}}
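
To make the precision loss described in the issue concrete, here is a small sketch using the microsecond value from the example filter. It assumes the truncation amounts to an integer division by 1000; the class and method names are hypothetical and are not taken from the Drill code base.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

// Hypothetical helper for illustration; not part of Drill.
final class MicrosTruncationDemo {

    /** Truncates microseconds since the epoch to milliseconds (assumed: division by 1000). */
    static long microsToMillis(long micros) {
        return Math.floorDiv(micros, 1000L);
    }

    /** Converts microseconds since the epoch to an Instant without losing precision. */
    static Instant microsToInstant(long micros) {
        return Instant.EPOCH.plus(micros, ChronoUnit.MICROS);
    }

    public static void main(String[] args) {
        long micros = 1705914906694751L; // value used in the issue's example filter

        long millis = microsToMillis(micros);
        System.out.println(millis);                   // 1705914906694
        System.out.println(micros - millis * 1000L);  // 751 microseconds are lost

        // The millisecond timestamp and the full-precision instant differ
        // only in the sub-millisecond digits.
        System.out.println(Instant.ofEpochMilli(millis));
        System.out.println(microsToInstant(micros));
    }
}
```

A filter such as the one in the issue can only match the stored value exactly if the column is exposed as the untruncated 64-bit integer, which is what the two proposed options are meant to allow.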
[jira] [Commented] (DRILL-8492) Allow Parquet TIME_MICROS and TIMESTAMP_MICROS columns to be read as 64-bit integer values
    [ https://issues.apache.org/jira/browse/DRILL-8492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845816#comment-17845816 ]

ASF GitHub Bot commented on DRILL-8492:
---

jnturton commented on PR #2907:
URL: https://github.com/apache/drill/pull/2907#issuecomment-2106890548

   > However, I could very well have overlooked some way to access the global configuration, and I'd be grateful for any pointers.

   We should find existing examples in the Parquet format plugin. E.g. the "old" reader is affected by the option store.parquet.reader.pagereader.async.
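
The open question in this thread is how a class that sees neither a FragmentContext nor an OptionManager can learn the values of the two new options. The issue description proposes copying them into the reader config when that config is created. A minimal sketch of that pattern follows; ToyReaderConfig and ToyMetadataCollector are hypothetical stand-ins, a plain Map plays the role of the option source, and only the two option names and their default of false come from the issue text.

```java
import java.util.Collections;
import java.util.Map;

// Hypothetical sketch of the "snapshot options at config creation" pattern; not Drill code.
final class OptionSnapshotDemo {

    static final String TIME_MICROS_AS_INT64 = "store.parquet.reader.time_micros_as_int64";
    static final String TIMESTAMP_MICROS_AS_INT64 = "store.parquet.reader.timestamp_micros_as_int64";

    /** Immutable snapshot of the two options, taken once when the config is built. */
    static final class ToyReaderConfig {
        final boolean timeMicrosAsInt64;
        final boolean timestampMicrosAsInt64;

        ToyReaderConfig(Map<String, Boolean> options) {
            // Both options default to false, as in the proposed drill-module.conf entries.
            this.timeMicrosAsInt64 = options.getOrDefault(TIME_MICROS_AS_INT64, false);
            this.timestampMicrosAsInt64 = options.getOrDefault(TIMESTAMP_MICROS_AS_INT64, false);
        }
    }

    /** Collector that only ever sees the config snapshot, never the option source. */
    static final class ToyMetadataCollector {
        private final ToyReaderConfig config;

        ToyMetadataCollector(ToyReaderConfig config) {
            this.config = config;
        }

        /** Keeps a timestamp min/max value in microseconds, or truncates it to milliseconds. */
        long timestampStatistic(long micros) {
            return config.timestampMicrosAsInt64 ? micros : Math.floorDiv(micros, 1000L);
        }
    }

    public static void main(String[] args) {
        long micros = 1705914906694751L;

        ToyMetadataCollector defaults =
            new ToyMetadataCollector(new ToyReaderConfig(Collections.emptyMap()));
        System.out.println(defaults.timestampStatistic(micros)); // 1705914906694

        ToyMetadataCollector asInt64 = new ToyMetadataCollector(
            new ToyReaderConfig(Collections.singletonMap(TIMESTAMP_MICROS_AS_INT64, true)));
        System.out.println(asInt64.timestampStatistic(micros));  // 1705914906694751
    }
}
```

The trade-off in this sketch is that the collector needs no new dependency, at the cost of the config carrying two extra fields that must be filled in wherever it is constructed.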