[ https://issues.apache.org/jira/browse/HUDI-7930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
David Perkins updated HUDI-7930:
--------------------------------
Description:

I have run into an issue with tables that have an array of rows in Flink. I am able to write data, but after compaction, reads produce this exception:

{{java.lang.RuntimeException: Unsupported type in the list: optional binary item1 (STRING)}}

The error only occurs after a compaction happens and produces parquet files. I'm using Hudi 0.14.1 and Flink 1.17.2, writing to Azure ADLS. I have tried 'Merge on Read' and 'Copy on Write' tables.

Steps to reproduce the error:

1. Create a table with an array of rows

{{CREATE temporary TABLE TestTable (}}
{{-- Additional keys and foreign keys}}
{{rowId STRING NOT NULL,}}
{{myArray ARRAY< ROW< item1 STRING, item2 STRING > >}}
{{) WITH (}}
{{'connector' = 'hudi',}}
{{'path' = 'abfs://<container>@<storage_account>.dfs.core.windows.net/hudi/testtable',}}
{{'table.type' = 'MERGE_ON_READ',}}
{{'write.batch.size' = '1',}}
{{'hoodie.compact.inline' = 'true',}}
{{'hoodie.compact.inline.max.delta.commits' = '1',}}
{{'compaction.async.enabled' = 'false',}}
{{'compaction.delta_commits' = '1',}}
{{'hoodie.datasource.write.recordkey.field' = 'rowId'}}
{{);}}

2. Insert some data

{{insert into TestTable values}}
{{('1', ARRAY[ROW('1.item1', '1.item2')]),}}
{{('2', ARRAY[ROW('2.item1', '2.item2')]),}}
{{('3', ARRAY[ROW('3.item1', '3.item2')]),}}
{{('4', ARRAY[ROW('4.item1', '4.item2')]),}}
{{('5', ARRAY[ROW('5.item1', '5.item2')]),}}
{{('6', ARRAY[ROW('6.item1', '6.item2')]),}}
{{('7', ARRAY[ROW('7.item1', '7.item2')]),}}
{{('8', ARRAY[ROW('8.item1', '8.item2')]),}}
{{('9', ARRAY[ROW('9.item1', '9.item2')]),}}
{{('10', ARRAY[ROW('10.item1', '10.item2')])}}
{{;}}

3. Query

{{Select * from TestTable;}}

was:

I have run into an issue with Merge On Read tables that have an array of rows in Flink. I am able to write data, but after compaction, reads produce this exception:

{{java.lang.RuntimeException: Unsupported type in the list: optional binary item1 (STRING)}}

The error only occurs after a compaction happens and produces parquet files. I'm using Hudi 0.14.1 and Flink 1.17.2, writing to Azure ADLS. I haven't tried switching to Copy on Write tables, but will try that next.

Steps to reproduce the error:

1. Create a table with an array of rows

{{CREATE temporary TABLE TestTable (}}
{{-- Additional keys and foreign keys}}
{{rowId STRING NOT NULL,}}
{{myArray ARRAY< ROW< item1 STRING, item2 STRING > >}}
{{) WITH (}}
{{'connector' = 'hudi',}}
{{'path' = 'abfs://<container>@<storage_account>.dfs.core.windows.net/hudi/testtable',}}
{{'table.type' = 'MERGE_ON_READ',}}
{{'write.batch.size' = '1',}}
{{'hoodie.compact.inline' = 'true',}}
{{'hoodie.compact.inline.max.delta.commits' = '1',}}
{{'compaction.async.enabled' = 'false',}}
{{'compaction.delta_commits' = '1',}}
{{'hoodie.datasource.write.recordkey.field' = 'rowId'}}
{{);}}

2. Insert some data

{{insert into TestTable values}}
{{('1', ARRAY[ROW('1.item1', '1.item2')]),}}
{{('2', ARRAY[ROW('2.item1', '2.item2')]),}}
{{('3', ARRAY[ROW('3.item1', '3.item2')]),}}
{{('4', ARRAY[ROW('4.item1', '4.item2')]),}}
{{('5', ARRAY[ROW('5.item1', '5.item2')]),}}
{{('6', ARRAY[ROW('6.item1', '6.item2')]),}}
{{('7', ARRAY[ROW('7.item1', '7.item2')]),}}
{{('8', ARRAY[ROW('8.item1', '8.item2')]),}}
{{('9', ARRAY[ROW('9.item1', '9.item2')]),}}
{{('10', ARRAY[ROW('10.item1', '10.item2')])}}
{{;}}

3. Query

{{Select * from TestTable;}}

> After Compaction RuntimeException: Unsupported type in the list: optional binary xxx (STRING)
> ---------------------------------------------------------------------------------------------
>
> Key: HUDI-7930
> URL: https://issues.apache.org/jira/browse/HUDI-7930
> Project: Apache Hudi
> Issue Type: Bug
> Components: compaction
> Affects Versions: 0.14.1
> Reporter: David Perkins
> Priority: Critical
>
> I have run into an issue with tables that have an array of rows in Flink.
> I am able to write data, but after compaction, reads produce this exception:
>
> {{java.lang.RuntimeException: Unsupported type in the list: optional binary item1 (STRING)}}
>
> The error only occurs after a compaction happens and produces parquet files.
> I'm using Hudi 0.14.1 and Flink 1.17.2, writing to Azure ADLS. I have tried
> 'Merge on Read' and 'Copy on Write' tables.
>
> Steps to reproduce the error:
>
> 1. Create a table with an array of rows
>
> {{CREATE temporary TABLE TestTable (}}
> {{-- Additional keys and foreign keys}}
> {{rowId STRING NOT NULL,}}
> {{myArray ARRAY< ROW< item1 STRING, item2 STRING > >}}
> {{) WITH (}}
> {{'connector' = 'hudi',}}
> {{'path' = 'abfs://<container>@<storage_account>.dfs.core.windows.net/hudi/testtable',}}
> {{'table.type' = 'MERGE_ON_READ',}}
> {{'write.batch.size' = '1',}}
> {{'hoodie.compact.inline' = 'true',}}
> {{'hoodie.compact.inline.max.delta.commits' = '1',}}
> {{'compaction.async.enabled' = 'false',}}
> {{'compaction.delta_commits' = '1',}}
> {{'hoodie.datasource.write.recordkey.field' = 'rowId'}}
> {{);}}
>
> 2. Insert some data
>
> {{insert into TestTable values}}
> {{('1', ARRAY[ROW('1.item1', '1.item2')]),}}
> {{('2', ARRAY[ROW('2.item1', '2.item2')]),}}
> {{('3', ARRAY[ROW('3.item1', '3.item2')]),}}
> {{('4', ARRAY[ROW('4.item1', '4.item2')]),}}
> {{('5', ARRAY[ROW('5.item1', '5.item2')]),}}
> {{('6', ARRAY[ROW('6.item1', '6.item2')]),}}
> {{('7', ARRAY[ROW('7.item1', '7.item2')]),}}
> {{('8', ARRAY[ROW('8.item1', '8.item2')]),}}
> {{('9', ARRAY[ROW('9.item1', '9.item2')]),}}
> {{('10', ARRAY[ROW('10.item1', '10.item2')])}}
> {{;}}
>
> 3. Query
>
> {{Select * from TestTable;}}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)