voon created HUDI-6103:
--------------------------

             Summary: Validate that fieldNames are valid for streaming reads
                 Key: HUDI-6103
                 URL: https://issues.apache.org/jira/browse/HUDI-6103
             Project: Apache Hudi
          Issue Type: Improvement
          Components: flink, flink-sql
            Reporter: voon
            Assignee: voon


The current error message that is thrown when an invalid fieldName is provided 
in the FlinkSQL table source DDL is ambiguous and not helpful.

 

Example:

 
{code:java}
Caused by: java.lang.ArrayIndexOutOfBoundsException: -1
    at 
org.apache.hudi.table.format.cow.ParquetSplitReaderUtil.lambda$genPartColumnarRowReader$0(ParquetSplitReaderUtil.java:119)
    at java.util.stream.IntPipeline$4$1.accept(IntPipeline.java:250)
    at 
java.util.Spliterators$IntArraySpliterator.forEachRemaining(Spliterators.java:1032)
    at java.util.Spliterator$OfInt.forEachRemaining(Spliterator.java:693)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
    at 
java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
    at 
java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:566)
    at 
org.apache.hudi.table.format.cow.ParquetSplitReaderUtil.genPartColumnarRowReader(ParquetSplitReaderUtil.java:121)
    at 
org.apache.hudi.table.format.RecordIterators.getParquetRecordIterator(RecordIterators.java:56)
    at 
org.apache.hudi.table.format.mor.MergeOnReadInputFormat.getBaseFileIterator(MergeOnReadInputFormat.java:341)
    at 
org.apache.hudi.table.format.mor.MergeOnReadInputFormat.getBaseFileIterator(MergeOnReadInputFormat.java:316)
    at 
org.apache.hudi.table.format.mor.MergeOnReadInputFormat.initIterator(MergeOnReadInputFormat.java:200)
    at 
org.apache.hudi.table.format.mor.MergeOnReadInputFormat.open(MergeOnReadInputFormat.java:185)
    at 
org.apache.hudi.table.format.mor.MergeOnReadInputFormat.open(MergeOnReadInputFormat.java:91)
    at 
org.apache.flink.streaming.api.functions.source.InputFormatSourceFunction.run(InputFormatSourceFunction.java:84)
    at 
org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:110)
    at 
org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:67)
    at 
org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.run(SourceStreamTask.java:333)
 {code}
 

 

Such user-errors can be easily fixed, but the error message that is thrown does 
not make such an error obvious. 

 

This Jira ticket aims to add better error message so to make such errors more 
obvious.

 

 

*How to trigger the error*

For a source table with a schema as such:

 
{code:java}
CREATE TABLE `table_with_correct_schema` (
    `id` INT,
    `user_id` INT,
    `name` STRING,
    `partition_col` STRING
) PARTITIONED BY (`partition_col`)
WITH (
    'connector' = 'hudi',
     ...
){code}
 

Change a column to an incorrect name as such when reading:
{code:java}
CREATE TABLE `table_with_correct_schema` (
    `id` INT,
    `user_id_with_typo123` INT,
    `name` STRING,
    `partition_col` STRING
) PARTITIONED BY (`partition_col`)
WITH (
    'connector' = 'hudi',
     ...
){code}
 

 

 

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to