[ https://issues.apache.org/jira/browse/HUDI-1912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Danny Chen updated HUDI-1912:
-----------------------------
    Fix Version/s: 0.11.0
                       (was: 0.10.0)

> Presto defaults to GenericHiveRecordCursor for all Hudi tables
> --------------------------------------------------------------
>
>                 Key: HUDI-1912
>                 URL: https://issues.apache.org/jira/browse/HUDI-1912
>             Project: Apache Hudi
>          Issue Type: Sub-task
>          Components: Presto Integration
>    Affects Versions: 0.7.0
>            Reporter: satish
>            Assignee: Sagar Sumit
>            Priority: Blocker
>             Fix For: 0.11.0
>
>
> See code here
> https://github.com/prestodb/presto/blob/2ad67dcf000be86ebc5ff7732bbb9994c8e324a8/presto-hive/src/main/java/com/facebook/presto/hive/parquet/ParquetPageSourceFactory.java#L168
> Starting Hudi 0.7, HoodieInputFormat comes with
> UseRecordReaderFromInputFormat annotation. As a result, we are skipping all
> optimizations in parquet PageSource and using basic GenericHiveRecordCursor
> which has several limitations:
> 1) No support for timestamp
> 2) No support for synthesized columns
> 3) No support for vectorized reading?
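To make the fallback described above concrete, here is a minimal, self-contained sketch of how a marker annotation on an input format class can steer reader selection. It is not Presto's or Hudi's actual code: the nested annotation only borrows the name UseRecordReaderFromInputFormat from the issue, and PlainParquetInputFormat, AnnotatedHudiInputFormat, ReadPath and choose are hypothetical names introduced for illustration.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

public class ReaderSelectionSketch {
    // Hypothetical stand-in for the marker annotation named in the issue.
    // RUNTIME retention is required so that reflection can see it.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface UseRecordReaderFromInputFormat {}

    // Hypothetical stand-in for an input format without the marker.
    static class PlainParquetInputFormat {}

    // Hypothetical stand-in for an input format that carries the marker,
    // as the issue reports for HoodieInputFormat starting with Hudi 0.7.
    @UseRecordReaderFromInputFormat
    static class AnnotatedHudiInputFormat {}

    // The two read paths the issue contrasts.
    enum ReadPath { OPTIMIZED_PARQUET_PAGE_SOURCE, GENERIC_RECORD_CURSOR }

    // Assumed, simplified selection rule: if the input format class carries
    // the marker, skip the optimized page source and fall back to the
    // generic record cursor.
    static ReadPath choose(Class<?> inputFormat) {
        if (inputFormat.isAnnotationPresent(UseRecordReaderFromInputFormat.class)) {
            return ReadPath.GENERIC_RECORD_CURSOR;
        }
        return ReadPath.OPTIMIZED_PARQUET_PAGE_SOURCE;
    }

    public static void main(String[] args) {
        System.out.println(choose(PlainParquetInputFormat.class));
        System.out.println(choose(AnnotatedHudiInputFormat.class));
    }
}
```

Running main prints OPTIMIZED_PARQUET_PAGE_SOURCE for the unannotated class and GENERIC_RECORD_CURSOR for the annotated one. Under this assumed rule, every table whose input format carries the marker takes the slower path, which is consistent with the behavior the issue title describes.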
> Example errors we saw:
> Error#1
> {code}
> java.lang.IllegalStateException: column type must be regular
>     at com.google.common.base.Preconditions.checkState(Preconditions.java:507)
>     at com.facebook.presto.hive.GenericHiveRecordCursor.<init>(GenericHiveRecordCursor.java:167)
>     at com.facebook.presto.hive.GenericHiveRecordCursorProvider.createRecordCursor(GenericHiveRecordCursorProvider.java:79)
>     at com.facebook.presto.hive.HivePageSourceProvider.createHivePageSource(HivePageSourceProvider.java:449)
>     at com.facebook.presto.hive.HivePageSourceProvider.createPageSource(HivePageSourceProvider.java:177)
>     at com.facebook.presto.spi.connector.classloader.ClassLoaderSafeConnectorPageSourceProvider.createPageSource(ClassLoaderSafeConnectorPageSourceProvider.java:63)
>     at com.facebook.presto.split.PageSourceManager.createPageSource(PageSourceManager.java:80)
>     at com.facebook.presto.operator.ScanFilterAndProjectOperator.getOutput(ScanFilterAndProjectOperator.java:231)
>     at com.facebook.presto.operator.Driver.processInternal(Driver.java:418)
>     at com.facebook.presto.operator.Driver.lambda$processFor$9(Driver.java:301)
>     at com.facebook.presto.operator.Driver.tryWithLock(Driver.java:722)
>     at com.facebook.presto.operator.Driver.processFor(Driver.java:294)
>     at com.facebook.presto.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:1077)
>     at com.facebook.presto.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:162)
>     at com.facebook.presto.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:545)
>     at com.facebook.presto.$gen.Presto_0_247_17f857e____20210506_210241_1.run(Unknown Source)
>     at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>     at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>     at java.base/java.lang.Thread.run(Thread.java:834)
> {code}
> Error#2
> {code}
> java.lang.ClassCastException: class org.apache.hadoop.io.LongWritable cannot be cast to class org.apache.hadoop.hive.serde2.io.TimestampWritable (org.apache.hadoop.io.LongWritable and org.apache.hadoop.hive.serde2.io.TimestampWritable are in unnamed module of loader com.facebook.presto.server.PluginClassLoader @5c4e86e7)
>     at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableTimestampObjectInspector.getPrimitiveJavaObject(WritableTimestampObjectInspector.java:39)
>     at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableTimestampObjectInspector.getPrimitiveJavaObject(WritableTimestampObjectInspector.java:25)
>     at com.facebook.presto.hive.GenericHiveRecordCursor.parseLongColumn(GenericHiveRecordCursor.java:286)
>     at com.facebook.presto.hive.GenericHiveRecordCursor.parseColumn(GenericHiveRecordCursor.java:550)
>     at com.facebook.presto.hive.GenericHiveRecordCursor.isNull(GenericHiveRecordCursor.java:508)
>     at com.facebook.presto.hive.HiveRecordCursor.isNull(HiveRecordCursor.java:233)
>     at com.facebook.presto.spi.RecordPageSource.getNextPage(RecordPageSource.java:112)
>     at com.facebook.presto.operator.TableScanOperator.getOutput(TableScanOperator.java:251)
>     at com.facebook.presto.operator.Driver.processInternal(Driver.java:418)
>     at com.facebook.presto.operator.Driver.lambda$processFor$9(Driver.java:301)
>     at com.facebook.presto.operator.Driver.tryWithLock(Driver.java:722)
>     at com.facebook.presto.operator.Driver.processFor(Driver.java:294)
>     at com.facebook.presto.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:1077)
>     at com.facebook.presto.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:162)
>     at com.facebook.presto.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:545)
>     at com.facebook.presto.$gen.Presto_0_247_17f857e____20210506_210241_1.run(Unknown Source)
>     at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>     at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>     at java.base/java.lang.Thread.run(Thread.java:834)
> {code}
> In addition to errors above, performance also seems to have slowed down
> substantially.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)