[ 
https://issues.apache.org/jira/browse/HUDI-7930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Perkins updated HUDI-7930:
--------------------------------
    Description: 
I have run into an issue in Flink with tables that have an array of rows. I am 
able to write data, but after compaction, reads fail with this exception:

{{java.lang.RuntimeException: Unsupported type in the list: optional binary 
item1 (STRING)}}

The error only occurs after a compaction has run and produced Parquet files. 
I'm using Hudi 0.14.1 and Flink 1.17.2, writing to Azure ADLS. I have tried 
'Merge on Read' and 'Copy on Write' tables.

Steps to reproduce the error.
1. Create a table with an array of rows

{{CREATE temporary TABLE TestTable (}}
{{-- Additional keys and foreign keys}}
{{rowId STRING NOT NULL,}}
{{myArray ARRAY< ROW< item1 STRING, item2 STRING > >}}
{{) WITH (}}
{{'connector' = 'hudi',}}
{{'path' = 
'abfs://<container>@<storage_account>.dfs.core.windows.net/hudi/testtable',}}
{{'table.type' = 'MERGE_ON_READ',}}
{{'write.batch.size' = '1',}}
{{'hoodie.compact.inline' = 'true',}}
{{'hoodie.compact.inline.max.delta.commits' = '1',}}
{{'compaction.async.enabled' = 'false',}}
{{'compaction.delta_commits' = '1',}}
{{'hoodie.datasource.write.recordkey.field' = 'rowId'}}
{{);}}

2. Insert some data
{{insert into TestTable values}}
{{('1', ARRAY[ROW('1.item1', '1.item2')]),}}
{{('2', ARRAY[ROW('2.item1', '2.item2')]),}}
{{('3', ARRAY[ROW('3.item1', '3.item2')]),}}
{{('4', ARRAY[ROW('4.item1', '4.item2')]),}}
{{('5', ARRAY[ROW('5.item1', '5.item2')]),}}
{{('6', ARRAY[ROW('6.item1', '6.item2')]),}}
{{('7', ARRAY[ROW('7.item1', '7.item2')]),}}
{{('8', ARRAY[ROW('8.item1', '8.item2')]),}}
{{('9', ARRAY[ROW('9.item1', '9.item2')]),}}
{{('10', ARRAY[ROW('10.item1', '10.item2')])}}
{{;}}

3. Query
{{Select * from TestTable;}}
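
For comparison, a Copy on Write variant of the table in step 1 might look like 
the sketch below. It assumes the same column layout and storage account. The 
table name TestTableCow and the testtable_cow path suffix are placeholders that 
do not appear in the steps above, and the compaction options are dropped on the 
assumption that they only apply to Merge on Read tables.

```sql
-- Sketch only: Copy on Write variant of the reproduction table.
-- TestTableCow and the testtable_cow path are hypothetical names.
CREATE TEMPORARY TABLE TestTableCow (
  rowId STRING NOT NULL,
  myArray ARRAY<ROW<item1 STRING, item2 STRING>>
) WITH (
  'connector' = 'hudi',
  'path' = 'abfs://<container>@<storage_account>.dfs.core.windows.net/hudi/testtable_cow',
  'table.type' = 'COPY_ON_WRITE',
  'hoodie.datasource.write.recordkey.field' = 'rowId'
);
```

The insert and query from steps 2 and 3 can then be run against this table, 
changing only the table name.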

  was:
I have run into an issue in Flink with Merge On Read tables that have an array 
of rows. I am able to write data, but after compaction, reads fail with this 
exception:

{{java.lang.RuntimeException: Unsupported type in the list: optional binary 
item1 (STRING)}}

The error only occurs after a compaction happens and produces parquet files. 
I'm using Hudi 0.14.1 and Flink 1.17.2, writing to Azure ADLS. I haven't tried 
switching to Copy on Write tables, but will try that next.

Steps to reproduce the error.
1. Create a table with an array of rows

{{CREATE temporary TABLE TestTable (}}
{{-- Additional keys and foreign keys}}
{{rowId STRING NOT NULL,}}
{{myArray ARRAY< ROW< item1 STRING, item2 STRING > >}}
{{) WITH (}}
{{'connector' = 'hudi',}}
{{'path' = 
'abfs://<container>@<storage_account>.dfs.core.windows.net/hudi/testtable',}}
{{'table.type' = 'MERGE_ON_READ',}}
{{'write.batch.size' = '1',}}
{{'hoodie.compact.inline' = 'true',}}
{{'hoodie.compact.inline.max.delta.commits' = '1',}}
{{'compaction.async.enabled' = 'false',}}
{{'compaction.delta_commits' = '1',}}
{{'hoodie.datasource.write.recordkey.field' = 'rowId'}}
{{);}}

2. Insert some data
{{insert into TestTable values}}
{{('1', ARRAY[ROW('1.item1', '1.item2')]),}}
{{('2', ARRAY[ROW('2.item1', '2.item2')]),}}
{{('3', ARRAY[ROW('3.item1', '3.item2')]),}}
{{('4', ARRAY[ROW('4.item1', '4.item2')]),}}
{{('5', ARRAY[ROW('5.item1', '5.item2')]),}}
{{('6', ARRAY[ROW('6.item1', '6.item2')]),}}
{{('7', ARRAY[ROW('7.item1', '7.item2')]),}}
{{('8', ARRAY[ROW('8.item1', '8.item2')]),}}
{{('9', ARRAY[ROW('9.item1', '9.item2')]),}}
{{('10', ARRAY[ROW('10.item1', '10.item2')])}}
{{;}}

3. Query
{{Select * from TestTable;}}


> After Compaction RuntimeException: Unsupported type in the list: optional 
> binary xxx (STRING)
> ---------------------------------------------------------------------------------------------
>
>                 Key: HUDI-7930
>                 URL: https://issues.apache.org/jira/browse/HUDI-7930
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: compaction
>    Affects Versions: 0.14.1
>            Reporter: David Perkins
>            Priority: Critical
>
> I have run into an issue in Flink with tables that have an array of rows. I 
> am able to write data, but after compaction, reads fail with this exception:
> {{java.lang.RuntimeException: Unsupported type in the list: optional binary 
> item1 (STRING)}}
> The error only occurs after a compaction has run and produced Parquet files. 
> I'm using Hudi 0.14.1 and Flink 1.17.2, writing to Azure ADLS. I have tried 
> 'Merge on Read' and 'Copy on Write' tables.
> Steps to reproduce the error.
> 1. Create a table with an array of rows
> {{CREATE temporary TABLE TestTable (}}
> {{-- Additional keys and foreign keys}}
> {{rowId STRING NOT NULL,}}
> {{myArray ARRAY< ROW< item1 STRING, item2 STRING > >}}
> {{) WITH (}}
> {{'connector' = 'hudi',}}
> {{'path' = 
> 'abfs://<container>@<storage_account>.dfs.core.windows.net/hudi/testtable',}}
> {{'table.type' = 'MERGE_ON_READ',}}
> {{'write.batch.size' = '1',}}
> {{'hoodie.compact.inline' = 'true',}}
> {{'hoodie.compact.inline.max.delta.commits' = '1',}}
> {{'compaction.async.enabled' = 'false',}}
> {{'compaction.delta_commits' = '1',}}
> {{'hoodie.datasource.write.recordkey.field' = 'rowId'}}
> {{);}}
> 2. Insert some data
> {{insert into TestTable values}}
> {{('1', ARRAY[ROW('1.item1', '1.item2')]),}}
> {{('2', ARRAY[ROW('2.item1', '2.item2')]),}}
> {{('3', ARRAY[ROW('3.item1', '3.item2')]),}}
> {{('4', ARRAY[ROW('4.item1', '4.item2')]),}}
> {{('5', ARRAY[ROW('5.item1', '5.item2')]),}}
> {{('6', ARRAY[ROW('6.item1', '6.item2')]),}}
> {{('7', ARRAY[ROW('7.item1', '7.item2')]),}}
> {{('8', ARRAY[ROW('8.item1', '8.item2')]),}}
> {{('9', ARRAY[ROW('9.item1', '9.item2')]),}}
> {{('10', ARRAY[ROW('10.item1', '10.item2')])}}
> {{;}}
> 3. Query
> {{Select * from TestTable;}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
