[ 
https://issues.apache.org/jira/browse/HUDI-1912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17346580#comment-17346580
 ] 

satish commented on HUDI-1912:
------------------------------

[~bhasudha] [~vinoth] [~shivnarayan] FYI. I'd appreciate it if we could figure
out the right way to fix this. Right now, this path is chosen based on the
static annotation 'UseRecordReaderFromInputFormat'. But we want regular COW
tables to use the parquet page source so they can leverage its built-in
optimizations.
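
To illustrate the concern, here is a minimal, self-contained sketch. It is not
the Presto or Hudi code: the annotation is re-declared locally as a stand-in,
and the class names, the TableType enum, and both helper methods are
hypothetical. It only contrasts a decision made from a class-level annotation
with one that also looks at the table type; the assumption that only non-COW
tables need the record reader is mine.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

class ReaderPathSketch {
    // Local stand-in for the annotation named in this thread (assumption:
    // it is a runtime-visible, class-level marker).
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface UseRecordReaderFromInputFormat {}

    // Hypothetical table types, used only for illustration.
    enum TableType { COPY_ON_WRITE, MERGE_ON_READ }

    // Hypothetical input format that carries the marker annotation.
    @UseRecordReaderFromInputFormat
    static class AnnotatedInputFormat {}

    // Hypothetical input format without the marker.
    static class PlainInputFormat {}

    // Static decision: depends only on the class, so every table that uses
    // an annotated input format takes the record-reader path.
    static boolean usesRecordReaderStatic(Class<?> inputFormat) {
        return inputFormat.isAnnotationPresent(UseRecordReaderFromInputFormat.class);
    }

    // One possible per-table decision (an assumption, not a proposed patch):
    // honor the annotation only when the table is not copy-on-write.
    static boolean usesRecordReaderDynamic(Class<?> inputFormat, TableType type) {
        return usesRecordReaderStatic(inputFormat) && type != TableType.COPY_ON_WRITE;
    }

    public static void main(String[] args) {
        System.out.println(usesRecordReaderStatic(AnnotatedInputFormat.class));
        System.out.println(
                usesRecordReaderDynamic(AnnotatedInputFormat.class, TableType.COPY_ON_WRITE));
    }
}
```

With the static check, an annotated input format always selects the
record-reader path regardless of table type, which matches the behavior
described in the issue below.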

> Presto defaults to GenericHiveRecordCursor for all Hudi tables
> --------------------------------------------------------------
>
>                 Key: HUDI-1912
>                 URL: https://issues.apache.org/jira/browse/HUDI-1912
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: Presto Integration
>            Reporter: satish
>            Priority: Blocker
>
> See code here 
> https://github.com/prestodb/presto/blob/2ad67dcf000be86ebc5ff7732bbb9994c8e324a8/presto-hive/src/main/java/com/facebook/presto/hive/parquet/ParquetPageSourceFactory.java#L168
> Starting with Hudi 0.7, HoodieInputFormat comes with the 
> UseRecordReaderFromInputFormat annotation. As a result, we are skipping all 
> optimizations in the parquet PageSource and using the basic 
> GenericHiveRecordCursor, which has several limitations:
> 1) No support for timestamp
> 2) No support for synthesized columns
> 3) No support for vectorized reading?
> Example errors we saw:
> Error#1
> {code}
> java.lang.IllegalStateException: column type must be regular
>       at com.google.common.base.Preconditions.checkState(Preconditions.java:507)
>       at com.facebook.presto.hive.GenericHiveRecordCursor.<init>(GenericHiveRecordCursor.java:167)
>       at com.facebook.presto.hive.GenericHiveRecordCursorProvider.createRecordCursor(GenericHiveRecordCursorProvider.java:79)
>       at com.facebook.presto.hive.HivePageSourceProvider.createHivePageSource(HivePageSourceProvider.java:449)
>       at com.facebook.presto.hive.HivePageSourceProvider.createPageSource(HivePageSourceProvider.java:177)
>       at com.facebook.presto.spi.connector.classloader.ClassLoaderSafeConnectorPageSourceProvider.createPageSource(ClassLoaderSafeConnectorPageSourceProvider.java:63)
>       at com.facebook.presto.split.PageSourceManager.createPageSource(PageSourceManager.java:80)
>       at com.facebook.presto.operator.ScanFilterAndProjectOperator.getOutput(ScanFilterAndProjectOperator.java:231)
>       at com.facebook.presto.operator.Driver.processInternal(Driver.java:418)
>       at com.facebook.presto.operator.Driver.lambda$processFor$9(Driver.java:301)
>       at com.facebook.presto.operator.Driver.tryWithLock(Driver.java:722)
>       at com.facebook.presto.operator.Driver.processFor(Driver.java:294)
>       at com.facebook.presto.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:1077)
>       at com.facebook.presto.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:162)
>       at com.facebook.presto.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:545)
>       at com.facebook.presto.$gen.Presto_0_247_17f857e____20210506_210241_1.run(Unknown Source)
>       at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>       at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>       at java.base/java.lang.Thread.run(Thread.java:834)
> {code}
> Error#2
> {code}
> java.lang.ClassCastException: class org.apache.hadoop.io.LongWritable cannot be cast to class org.apache.hadoop.hive.serde2.io.TimestampWritable (org.apache.hadoop.io.LongWritable and org.apache.hadoop.hive.serde2.io.TimestampWritable are in unnamed module of loader com.facebook.presto.server.PluginClassLoader @5c4e86e7)
>       at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableTimestampObjectInspector.getPrimitiveJavaObject(WritableTimestampObjectInspector.java:39)
>       at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableTimestampObjectInspector.getPrimitiveJavaObject(WritableTimestampObjectInspector.java:25)
>       at com.facebook.presto.hive.GenericHiveRecordCursor.parseLongColumn(GenericHiveRecordCursor.java:286)
>       at com.facebook.presto.hive.GenericHiveRecordCursor.parseColumn(GenericHiveRecordCursor.java:550)
>       at com.facebook.presto.hive.GenericHiveRecordCursor.isNull(GenericHiveRecordCursor.java:508)
>       at com.facebook.presto.hive.HiveRecordCursor.isNull(HiveRecordCursor.java:233)
>       at com.facebook.presto.spi.RecordPageSource.getNextPage(RecordPageSource.java:112)
>       at com.facebook.presto.operator.TableScanOperator.getOutput(TableScanOperator.java:251)
>       at com.facebook.presto.operator.Driver.processInternal(Driver.java:418)
>       at com.facebook.presto.operator.Driver.lambda$processFor$9(Driver.java:301)
>       at com.facebook.presto.operator.Driver.tryWithLock(Driver.java:722)
>       at com.facebook.presto.operator.Driver.processFor(Driver.java:294)
>       at com.facebook.presto.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:1077)
>       at com.facebook.presto.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:162)
>       at com.facebook.presto.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:545)
>       at com.facebook.presto.$gen.Presto_0_247_17f857e____20210506_210241_1.run(Unknown Source)
>       at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>       at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>       at java.base/java.lang.Thread.run(Thread.java:834)
> {code}
> In addition to the errors above, query performance also seems to have 
> degraded substantially.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
