[ https://issues.apache.org/jira/browse/HIVE-27900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17789007#comment-17789007 ]
zhangbutao commented on HIVE-27900: ----------------------------------- I can not reproduce this issue on master code. My env is : 1) Hive master branch: You can compile hive code using the cmd: {code:java} mvn clean install -DskipTests -Piceberg -Pdist{code} 2)Tez 0.10.2 I recommend you use 0.10.2 to test as 0.10.3 is not released. We can not make sure 0.10.3 work well with Hive. 3) Hadoop 3.3.1 BTW, if the table _*local.test.b_qqd_shop_rfm_parquet_snappy* is_ empty without data, will the issue occur again in your env? > hive can not read iceberg-parquet table > --------------------------------------- > > Key: HIVE-27900 > URL: https://issues.apache.org/jira/browse/HIVE-27900 > Project: Hive > Issue Type: Bug > Components: Iceberg integration > Affects Versions: 4.0.0-beta-1 > Reporter: yongzhi.shao > Priority: Major > > We found that using HIVE4-BETA version, we could not query the > Iceberg-Parquet table with vectorised execution turned on. > {code:java} > --spark-sql(3.4.1+iceberg 1.4.2) > CREATE TABLE local.test.b_qqd_shop_rfm_parquet_snappy ( > a string,b string,c string) > USING iceberg > LOCATION '/iceberg-catalog/warehouse/test/b_qqd_shop_rfm_parquet_snappy' > TBLPROPERTIES ( > 'current-snapshot-id' = '5138351937447353683', > 'format' = 'iceberg/parquet', > 'format-version' = '2', > 'read.orc.vectorization.enabled' = 'true', > 'write.format.default' = 'parquet', > 'write.metadata.delete-after-commit.enabled' = 'true', > 'write.metadata.previous-versions-max' = '3', > 'write.parquet.compression-codec' = 'snappy'); > --hive-sql > CREATE EXTERNAL TABLE iceberg_dwd.b_qqd_shop_rfm_parquet_snappy > STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' > LOCATION > 'hdfs://xxxxxxx/iceberg-catalog/warehouse/test/b_qqd_shop_rfm_parquet_snappy/' > TBLPROPERTIES > ('iceberg.catalog'='location_based_table','engine.hive.enabled'='true'); > set hive.default.fileformat=orc; > set hive.default.fileformat.managed=orc; > create table test_parquet_as_orc as select * from > b_qqd_shop_rfm_parquet_snappy limit 100; > , TaskAttempt 2 failed, info=[Error: Node: xxxx/xxx.xxxx.xx.xx: Error while > running task ( failure ) : > attempt_1696729618575_69586_1_00_000000_2:java.lang.RuntimeException: > java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: > Hive Runtime Error while processing row > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:348) > at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:276) > at > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:381) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:82) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:69) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:69) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:39) > at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) > at > com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:131) > at > com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:76) > at > com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:82) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:750) > Caused by: java.lang.RuntimeException: > org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while > processing row > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:110) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:83) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:414) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:293) > ... 16 more > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime > Error while processing row > at > org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:993) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:101) > ... 19 more > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: > java.lang.NullPointerException > at > org.apache.hadoop.hive.ql.exec.vector.reducesink.VectorReduceSinkEmptyKeyOperator.process(VectorReduceSinkEmptyKeyOperator.java:137) > at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:919) > at > org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:158) > at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:919) > at > org.apache.hadoop.hive.ql.exec.vector.VectorLimitOperator.process(VectorLimitOperator.java:108) > at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:919) > at > org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:171) > at > org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.deliverVectorizedRowBatch(VectorMapOperator.java:809) > at > org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:878) > ... 20 more > Caused by: java.lang.NullPointerException > at > org.apache.hadoop.hive.common.io.NonSyncByteArrayOutputStream.write(NonSyncByteArrayOutputStream.java:110) > at > org.apache.hadoop.hive.serde2.lazybinary.fast.LazyBinarySerializeWrite.writeString(LazyBinarySerializeWrite.java:280) > at > org.apache.hadoop.hive.ql.exec.vector.VectorSerializeRow$VectorSerializeStringWriter.serialize(VectorSerializeRow.java:532) > at > org.apache.hadoop.hive.ql.exec.vector.VectorSerializeRow.serializeWrite(VectorSerializeRow.java:316) > at > org.apache.hadoop.hive.ql.exec.vector.VectorSerializeRow.serializeWrite(VectorSerializeRow.java:297) > at > org.apache.hadoop.hive.ql.exec.vector.reducesink.VectorReduceSinkEmptyKeyOperator.process(VectorReduceSinkEmptyKeyOperator.java:113) > ... 28 more {code} > > # TEZ_VERSION 0.10.3 SNAPSHOT > # iceberg table is cow table. insert small data will get same error. > # using orc-iceberg is ok. > # disable vectorized and using iceberg-parquet is ok. -- This message was sent by Atlassian Jira (v8.20.10#820010)