[ https://issues.apache.org/jira/browse/FLINK-25529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17491959#comment-17491959 ]

Yuan Zhu commented on FLINK-25529:
----------------------------------

Replacing orc-core-1.5.6 with orc-core-1.5.6-nohive throws an exception as well.
{code:java}
Caused by: java.lang.NoSuchMethodError: org.apache.orc.TypeDescription.createRowBatch()Lorg/apache/hadoop/hive/ql/exec/vector/VectorizedRowBatch;
    at org.apache.flink.orc.writer.OrcBulkWriter.<init>(OrcBulkWriter.java:47)
    at org.apache.flink.orc.writer.OrcBulkWriterFactory.create(OrcBulkWriterFactory.java:106)
    at org.apache.flink.table.filesystem.FileSystemTableSink$ProjectionBulkFactory.create(FileSystemTableSink.java:593)
    at org.apache.flink.streaming.api.functions.sink.filesystem.BulkBucketWriter.openNew(BulkBucketWriter.java:75)
    at org.apache.flink.streaming.api.functions.sink.filesystem.OutputStreamBasedPartFileWriter$OutputStreamBasedBucketWriter.openNewInProgressFile(OutputStreamBasedPartFileWriter.java:90)
    at org.apache.flink.streaming.api.functions.sink.filesystem.BulkBucketWriter.openNewInProgressFile(BulkBucketWriter.java:36)
    at org.apache.flink.streaming.api.functions.sink.filesystem.Bucket.rollPartFile(Bucket.java:243)
    at org.apache.flink.streaming.api.functions.sink.filesystem.Bucket.write(Bucket.java:220)
    at org.apache.flink.streaming.api.functions.sink.filesystem.Buckets.onElement(Buckets.java:305)
    at org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSinkHelper.onElement(StreamingFileSinkHelper.java:103)
    at org.apache.flink.table.filesystem.stream.AbstractStreamingWriter.processElement(AbstractStreamingWriter.java:140)
 {code}
orc-core-1.5.6-nohive.jar only contains
org.apache.orc.TypeDescription.createRowBatch()Lorg/apache/orc/storage/ql/exec/vector/VectorizedRowBatch;
The nohive variant relocates the Hive storage classes to the org.apache.orc.storage package, so the hive-flavoured signature that OrcBulkWriter was compiled against does not exist there.
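The mismatch can be made visible with a small reflection check (an illustrative sketch, not part of Flink; the TypeDescription class name is the real one, but the OrcBatchCheck helper is hypothetical):
{code:java}
import java.lang.reflect.Method;

// Illustrative sketch: reports which VectorizedRowBatch variant
// TypeDescription.createRowBatch() on the current classpath returns.
public class OrcBatchCheck {

    static String rowBatchReturnType(String className) {
        try {
            Method m = Class.forName(className).getMethod("createRowBatch");
            return m.getReturnType().getName();
        } catch (ClassNotFoundException e) {
            return "class not on classpath";
        } catch (NoSuchMethodException e) {
            return "no createRowBatch()";
        }
    }

    public static void main(String[] args) {
        // With orc-core this is expected to print
        //   org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch
        // while orc-core-nohive relocates it to
        //   org.apache.orc.storage.ql.exec.vector.VectorizedRowBatch
        System.out.println(rowBatchReturnType("org.apache.orc.TypeDescription"));
    }
}
{code}
Note that the JVM resolves a method by its full descriptor, including the return type, so relocating VectorizedRowBatch changes the method's binary signature even though the Java source looks identical; that is why the caller fails with NoSuchMethodError rather than ClassNotFoundException.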

 

It seems the only option left is a workaround.
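One such workaround (assuming the mapred writer's performance is acceptable for the job) is to leave the option at its default, so the Hive mapred writer is used instead of the bulk writer path that needs the missing orc-core classes:
{code:sql}
-- Default setting; the mapred writer does not go through OrcBulkWriterFactory.
set table.exec.hive.fallback-mapred-writer = true;
{code}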

> java.lang.ClassNotFoundException: org.apache.orc.PhysicalWriter when writing 
> in bulk into hive-2.1.1 orc table
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-25529
>                 URL: https://issues.apache.org/jira/browse/FLINK-25529
>             Project: Flink
>          Issue Type: Bug
>          Components: Connectors / Hive
>         Environment: hive 2.1.1
> flink 1.12.4
>            Reporter: Yuan Zhu
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: lib.jpg
>
>
> I tried to write data in bulk into hive-2.1.1 with orc format, and encountered 
> java.lang.ClassNotFoundException: org.apache.orc.PhysicalWriter.
>  
> The bulk writer is selected by setting table.exec.hive.fallback-mapred-writer = false:
>  
> {code:java}
> SET 'table.sql-dialect'='hive';
> create table orders(
>     order_id int,
>     order_date timestamp,
>     customer_name string,
>     price decimal(10,3),
>     product_id int,
>     order_status boolean
> )partitioned by (dt string)
> stored as orc;
>  
> SET 'table.sql-dialect'='default';
> create table datagen_source (
>     order_id int,
>     order_date timestamp(9),
>     customer_name varchar,
>     price decimal(10,3),
>     product_id int,
>     order_status boolean
> ) with ('connector' = 'datagen');
> create catalog myhive with ('type' = 'hive', 'hive-conf-dir' = '/mnt/conf');
> set table.exec.hive.fallback-mapred-writer = false;
> insert into myhive.`default`.orders
> /*+ OPTIONS(
>     'sink.partition-commit.trigger'='process-time',
>     'sink.partition-commit.policy.kind'='metastore,success-file',
>     'sink.rolling-policy.file-size'='128MB',
>     'sink.rolling-policy.rollover-interval'='10s',
>     'sink.rolling-policy.check-interval'='10s',
>     'auto-compaction'='true',
>     'compaction.file-size'='1MB'    ) */
> select * , date_format(now(),'yyyy-MM-dd') as dt from datagen_source;
> {code}
> [ERROR] Could not execute SQL statement. Reason:
> java.lang.ClassNotFoundException: org.apache.orc.PhysicalWriter
>  
> The jars in my lib dir are listed in the attachment.
> In HiveTableSink#createStreamSink (line:270), createBulkWriterFactory is called when 
> table.exec.hive.fallback-mapred-writer is false.
> If the table is stored as orc, HiveShimV200#createOrcBulkWriterFactory is invoked. 
> OrcBulkWriterFactory depends on org.apache.orc.PhysicalWriter from orc-core, 
> but flink-connector-hive excludes orc-core because it conflicts with hive-exec.
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
