voon created HUDI-6103: -------------------------- Summary: Validate that fieldNames are valid for streaming reads Key: HUDI-6103 URL: https://issues.apache.org/jira/browse/HUDI-6103 Project: Apache Hudi Issue Type: Improvement Components: flink, flink-sql Reporter: voon Assignee: voon
The current error message that is thrown when an invalid fieldName is provided in the FlinkSQL table source DDL is ambiguous and not helpful. Example: {code:java} Caused by: java.lang.ArrayIndexOutOfBoundsException: -1 at org.apache.hudi.table.format.cow.ParquetSplitReaderUtil.lambda$genPartColumnarRowReader$0(ParquetSplitReaderUtil.java:119) at java.util.stream.IntPipeline$4$1.accept(IntPipeline.java:250) at java.util.Spliterators$IntArraySpliterator.forEachRemaining(Spliterators.java:1032) at java.util.Spliterator$OfInt.forEachRemaining(Spliterator.java:693) at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482) at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472) at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708) at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:566) at org.apache.hudi.table.format.cow.ParquetSplitReaderUtil.genPartColumnarRowReader(ParquetSplitReaderUtil.java:121) at org.apache.hudi.table.format.RecordIterators.getParquetRecordIterator(RecordIterators.java:56) at org.apache.hudi.table.format.mor.MergeOnReadInputFormat.getBaseFileIterator(MergeOnReadInputFormat.java:341) at org.apache.hudi.table.format.mor.MergeOnReadInputFormat.getBaseFileIterator(MergeOnReadInputFormat.java:316) at org.apache.hudi.table.format.mor.MergeOnReadInputFormat.initIterator(MergeOnReadInputFormat.java:200) at org.apache.hudi.table.format.mor.MergeOnReadInputFormat.open(MergeOnReadInputFormat.java:185) at org.apache.hudi.table.format.mor.MergeOnReadInputFormat.open(MergeOnReadInputFormat.java:91) at org.apache.flink.streaming.api.functions.source.InputFormatSourceFunction.run(InputFormatSourceFunction.java:84) at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:110) at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:67) at org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.run(SourceStreamTask.java:333) {code} Such user-errors can be easily fixed, but the error message that is thrown does not make such an error obvious. This Jira ticket aims to add better error message so to make such errors more obvious. *How to trigger the error* For a source table with a schema as such: {code:java} CREATE TABLE `table_with_correct_schema` ( `id` INT, `user_id` INT, `name` STRING, `partition_col` STRING ) PARTITIONED BY (`partition_col`) WITH ( 'connector' = 'hudi', ... ){code} Change a column to an incorrect name as such when reading: {code:java} CREATE TABLE `table_with_correct_schema` ( `id` INT, `user_id_with_typo123` INT, `name` STRING, `partition_col` STRING ) PARTITIONED BY (`partition_col`) WITH ( 'connector' = 'hudi', ... ){code} -- This message was sent by Atlassian Jira (v8.20.10#820010)