[jira] [Updated] (HUDI-7315) Disable constructing NOT filter predicate when pushing down its wrapped filter unsupported, as its operand's primitive value is incomparable.
[ https://issues.apache.org/jira/browse/HUDI-7315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yao Zhang updated HUDI-7315: Description: This issue is extended from HUDI-7309, as the risk still exists for the NOT filter predicate when the predicate it wraps does not support pushing down (e.g. an expression with a Decimal-typed operand). It is similar to the AND/OR filter issue in HUDI-7309. Though I have not yet reproduced the NOT filter issue in practice, the risk still exists. We should fix it. was: This issue is extended from HUDI-7309, as the risk still exists when the predicate it wraps does not support pushing down (e.g. an expression with a Decimal-typed operand). It is similar to the AND/OR filter issue in HUDI-7309. Though I have not yet reproduced the NOT filter issue in practice, the risk still exists. We should fix it. > Disable constructing NOT filter predicate when pushing down its wrapped > filter unsupported, as its operand's primitive value is incomparable. > - > > Key: HUDI-7315 > URL: https://issues.apache.org/jira/browse/HUDI-7315 > Project: Apache Hudi > Issue Type: Bug > Components: flink >Affects Versions: 0.14.0, 0.14.1 > Environment: Flink 1.17.1 > Hudi 0.14.x >Reporter: Yao Zhang >Assignee: Yao Zhang >Priority: Major > > This issue is extended from HUDI-7309, as the risk still exists for the NOT > filter predicate when the predicate it wraps does not support pushing down > (e.g. an expression with a Decimal-typed operand). > It is similar to the AND/OR filter issue in HUDI-7309. Though I have not > yet reproduced the NOT filter issue in practice, the risk still exists. We should > fix it. -- This message was sent by Atlassian Jira (v8.20.10#820010)
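The proposed fix can be sketched as follows. This is a simplified model, not Hudi's actual ExpressionPredicates code: the Filter, Eq and Not types are hypothetical stand-ins for parquet's FilterPredicate classes. The point is that when the wrapped predicate could not be pushed down (its converted filter is null, e.g. a Decimal-typed operand), the NOT predicate should also decline push-down and return null instead of wrapping a null filter:

```java
// Simplified sketch of the proposed NOT fix; types are hypothetical
// stand-ins for parquet's FilterPredicate hierarchy.
class NotPredicateSketch {
    interface Filter {}

    record Eq(String column, long value) implements Filter {}

    record Not(Filter inner) implements Filter {}

    /** Proposed behavior: skip push-down when the wrapped filter is unsupported. */
    static Filter not(Filter wrapped) {
        if (wrapped == null) {
            // Push-down disabled for this sub-tree; the Flink engine
            // filters the rows instead of parquet.
            return null;
        }
        return new Not(wrapped);
    }
}
```

With this guard, a NOT over an unsupported (Decimal) comparison simply yields no parquet predicate, mirroring the AND/OR handling from HUDI-7309.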
[jira] [Updated] (HUDI-7315) Disable constructing NOT filter predicate when pushing down its wrapped filter unsupported, as its operand's primitive value is incomparable.
[ https://issues.apache.org/jira/browse/HUDI-7315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yao Zhang updated HUDI-7315: Summary: Disable constructing NOT filter predicate when pushing down its wrapped filter unsupported, as its operand's primitive value is incomparable. (was: Disable constructing NOT filter predicate when pushing down its wrapped filter unsupported, as its operand's primitive value is uncomparable.) > Disable constructing NOT filter predicate when pushing down its wrapped > filter unsupported, as its operand's primitive value is incomparable. > - > > Key: HUDI-7315 > URL: https://issues.apache.org/jira/browse/HUDI-7315 > Project: Apache Hudi > Issue Type: Bug > Components: flink >Affects Versions: 0.14.0, 0.14.1 > Environment: Flink 1.17.1 > Hudi 0.14.x >Reporter: Yao Zhang >Assignee: Yao Zhang >Priority: Major > > This issue is extended from HUDI-7309, as the risk still exists when the > predicate it wraps does not support pushing down (e.g. expression with the > operand typed Decimal). > It is similar to the issue of AND/OR filter in HUDI-7309. Though I have not > yet reproduced NOT filter issue in practice, the risk still exists. We should > fix it.
[jira] [Created] (HUDI-7315) Disable constructing NOT filter predicate when pushing down its wrapped filter unsupported, as its operand's primitive value is uncomparable.
Yao Zhang created HUDI-7315: --- Summary: Disable constructing NOT filter predicate when pushing down its wrapped filter unsupported, as its operand's primitive value is uncomparable. Key: HUDI-7315 URL: https://issues.apache.org/jira/browse/HUDI-7315 Project: Apache Hudi Issue Type: Bug Components: flink Affects Versions: 0.14.1, 0.14.0 Environment: Flink 1.17.1 Hudi 0.14.x Reporter: Yao Zhang Assignee: Yao Zhang This issue is extended from HUDI-7309, as the risk still exists when the predicate it wraps does not support pushing down (e.g. an expression with a Decimal-typed operand). It is similar to the AND/OR filter issue in HUDI-7309. Though I have not yet reproduced the NOT filter issue in practice, the risk still exists. We should fix it.
[jira] [Created] (HUDI-7311) Comparing date with date literal in string format causes class cast exception during filter push down
Yao Zhang created HUDI-7311: --- Summary: Comparing date with date literal in string format causes class cast exception during filter push down Key: HUDI-7311 URL: https://issues.apache.org/jira/browse/HUDI-7311 Project: Apache Hudi Issue Type: Bug Components: flink Affects Versions: 0.14.1, 0.14.0 Reporter: Yao Zhang Assignee: Yao Zhang Given any table with a date-typed field (e.g. a field d_date of type date), execute SQL with a condition on this field in the where clause. {code:sql} select d_date from xxx where d_date = '2020-01-01' {code} An exception will occur: {code:java} Caused by: java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Integer at org.apache.hudi.source.ExpressionPredicates.toParquetPredicate(ExpressionPredicates.java:613) at org.apache.hudi.source.ExpressionPredicates.access$100(ExpressionPredicates.java:64) at org.apache.hudi.source.ExpressionPredicates$ColumnPredicate.filter(ExpressionPredicates.java:226) at org.apache.hudi.table.format.RecordIterators.getParquetRecordIterator(RecordIterators.java:68) at org.apache.hudi.table.format.cow.CopyOnWriteInputFormat.open(CopyOnWriteInputFormat.java:130) at org.apache.hudi.table.format.cow.CopyOnWriteInputFormat.open(CopyOnWriteInputFormat.java:66) at org.apache.flink.streaming.api.functions.source.InputFormatSourceFunction.run(InputFormatSourceFunction.java:84) at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:110) at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:67) at org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.run(SourceStreamTask.java:333) {code} Hudi's Flink integration cannot convert a date literal in String form to Integer (the primitive type of date), although the same SQL works well in Flink without Hudi. In summary, we should add automatic literal type conversion before filter push-down.
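The literal auto-conversion could look like the sketch below. The helper is hypothetical, not Hudi's API; it only illustrates the idea that parquet stores a Flink DATE as INT32 days since the epoch, so a String literal such as '2020-01-01' must be parsed and converted to an Integer before the parquet predicate is built:

```java
import java.time.LocalDate;

// Hypothetical helper illustrating the proposed literal conversion for
// DATE columns before filter push-down.
class DateLiteralConversion {
    static Integer toEpochDay(Object literal) {
        if (literal instanceof Integer i) {
            return i; // already in parquet's INT32 primitive form
        }
        if (literal instanceof String s) {
            // ISO-8601 date string -> days since 1970-01-01, narrowed to int
            return (int) LocalDate.parse(s).toEpochDay();
        }
        throw new IllegalArgumentException("Unsupported DATE literal: " + literal);
    }
}
```

For example, '2020-01-01' becomes 18262, which matches the Integer value the ColumnPredicate expects.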
[jira] [Updated] (HUDI-7309) Disable filter pushing down when the parquet type corresponding to its field logical type is not comparable
[ https://issues.apache.org/jira/browse/HUDI-7309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yao Zhang updated HUDI-7309: Description: Given the table web_sales from TPCDS: {code:sql} CREATE TABLE web_sales ( ws_sold_date_sk int, ws_sold_time_sk int, ws_ship_date_sk int, ws_item_sk int, ws_bill_customer_sk int, ws_bill_cdemo_sk int, ws_bill_hdemo_sk int, ws_bill_addr_sk int, ws_ship_customer_sk int, ws_ship_cdemo_sk int, ws_ship_hdemo_sk int, ws_ship_addr_sk int, ws_web_page_sk int, ws_web_site_sk int, ws_ship_mode_sk int, ws_warehouse_sk int, ws_promo_sk int, ws_order_number int, ws_quantity int, ws_wholesale_cost decimal(7,2), ws_list_price decimal(7,2), ws_sales_price decimal(7,2), ws_ext_discount_amt decimal(7,2), ws_ext_sales_price decimal(7,2), ws_ext_wholesale_cost decimal(7,2), ws_ext_list_price decimal(7,2), ws_ext_tax decimal(7,2), ws_coupon_amt decimal(7,2), ws_ext_ship_cost decimal(7,2), ws_net_paid decimal(7,2), ws_net_paid_inc_tax decimal(7,2), ws_net_paid_inc_ship decimal(7,2), ws_net_paid_inc_ship_tax decimal(7,2), ws_net_profit decimal(7,2) ) with ( 'connector' = 'hudi', 'path' = 'hdfs://path/to/web_sales', 'table.type' = 'COPY_ON_WRITE', 'hoodie.datasource.write.recordkey.field' = 'ws_item_sk,ws_order_number' ); {code} And execute: {code:sql} select * from web_sales where ws_sold_date_sk = 2451268 and ws_sales_price between 100.00 and 150.00 {code} An exception will occur: {code:java} Caused by: java.lang.NullPointerException: left cannot be null at java.util.Objects.requireNonNull(Objects.java:228) at org.apache.parquet.filter2.predicate.Operators$BinaryLogicalFilterPredicate.(Operators.java:257) at org.apache.parquet.filter2.predicate.Operators$And.(Operators.java:301) at org.apache.parquet.filter2.predicate.FilterApi.and(FilterApi.java:249) at org.apache.hudi.source.ExpressionPredicates$And.filter(ExpressionPredicates.java:551) at org.apache.hudi.source.ExpressionPredicates$Or.filter(ExpressionPredicates.java:589) at 
org.apache.hudi.source.ExpressionPredicates$Or.filter(ExpressionPredicates.java:589) at org.apache.hudi.table.format.RecordIterators.getParquetRecordIterator(RecordIterators.java:68) at org.apache.hudi.table.format.cow.CopyOnWriteInputFormat.open(CopyOnWriteInputFormat.java:130) at org.apache.hudi.table.format.cow.CopyOnWriteInputFormat.open(CopyOnWriteInputFormat.java:66) at org.apache.flink.streaming.api.functions.source.InputFormatSourceFunction.run(InputFormatSourceFunction.java:84) at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:110) at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:67) at org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.run(SourceStreamTask.java:333) {code} After further investigation, the decimal type is not comparable in the form in which it is stored in the parquet format (a fixed-length byte array). Pushing this filter down to parquet predicates is not supported (ExpressionPredicates::toParquetPredicate does not provide decimal conversion). Then, when it constructs the AND filter, both operand filters are null. That is how this issue reproduces. If we execute this SQL: {code:sql} select * from web_sales where ws_sold_date_sk = 2451268 and ws_sales_price between 100.00 and 150.00 {code} It works without any problems, as the predicates generated by the push-down process are null; the Flink engine then filters the data instead of parquet. To solve this, I plan to add null checks to both AND and OR filter predicate construction. If pushing down the field type is not supported, the generated filter will be null. Push-down can then be disabled by not constructing the AND or OR filter when either of its operands is null. was: Given the table web_sales from TPCDS: {code:sql} CREATE TABLE web_sales (
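The planned null checks can be sketched like this. The types are simplified stand-ins, not Hudi's actual classes: when either operand of an AND/OR could not be converted to a parquet predicate (null), the compound filter is not constructed, so Flink falls back to engine-side filtering instead of hitting FilterApi.and(null, ...):

```java
// Simplified sketch of the proposed AND/OR null checks; Filter/And/Or/Eq
// are hypothetical stand-ins for parquet's FilterPredicate classes.
class CompoundPredicateSketch {
    interface Filter {}

    record Eq(String column, long value) implements Filter {}

    record And(Filter left, Filter right) implements Filter {}

    record Or(Filter left, Filter right) implements Filter {}

    static Filter and(Filter left, Filter right) {
        if (left == null || right == null) {
            return null; // push-down disabled for this sub-tree
        }
        return new And(left, right);
    }

    static Filter or(Filter left, Filter right) {
        if (left == null || right == null) {
            return null;
        }
        return new Or(left, right);
    }
}
```

In the failing query, the BETWEEN over a decimal column converts to null operands, so the whole conjunction yields null and no parquet predicate is built.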
[jira] [Created] (HUDI-7309) Disable filter pushing down when the parquet type corresponding to its field logical type is not comparable
Yao Zhang created HUDI-7309: --- Summary: Disable filter pushing down when the parquet type corresponding to its field logical type is not comparable Key: HUDI-7309 URL: https://issues.apache.org/jira/browse/HUDI-7309 Project: Apache Hudi Issue Type: Bug Components: flink Affects Versions: 0.14.1, 0.14.0 Environment: Hudi 0.14.0 Hudi 0.14.1rc1 Flink 1.17.1 Reporter: Yao Zhang Assignee: Yao Zhang Given the table web_sales from TPCDS: {code:sql} CREATE TABLE web_sales ( ws_sold_date_sk int, ws_sold_time_sk int, ws_ship_date_sk int, ws_item_sk int, ws_bill_customer_sk int, ws_bill_cdemo_sk int, ws_bill_hdemo_sk int, ws_bill_addr_sk int, ws_ship_customer_sk int, ws_ship_cdemo_sk int, ws_ship_hdemo_sk int, ws_ship_addr_sk int, ws_web_page_sk int, ws_web_site_sk int, ws_ship_mode_sk int, ws_warehouse_sk int, ws_promo_sk int, ws_order_number int, ws_quantity int, ws_wholesale_cost decimal(7,2), ws_list_price decimal(7,2), ws_sales_price decimal(7,2), ws_ext_discount_amt decimal(7,2), ws_ext_sales_price decimal(7,2), ws_ext_wholesale_cost decimal(7,2), ws_ext_list_price decimal(7,2), ws_ext_tax decimal(7,2), ws_coupon_amt decimal(7,2), ws_ext_ship_cost decimal(7,2), ws_net_paid decimal(7,2), ws_net_paid_inc_tax decimal(7,2), ws_net_paid_inc_ship decimal(7,2), ws_net_paid_inc_ship_tax decimal(7,2), ws_net_profit decimal(7,2) ) with ( 'connector' = 'hudi', 'path' = 'hdfs://path/to/web_sales', 'table.type' = 'COPY_ON_WRITE', 'hoodie.datasource.write.recordkey.field' = 'ws_item_sk,ws_order_number' ); {code} And execute: {code:sql} select * from web_sales where ws_sold_date_sk = 2451268 and ws_sales_price between 100.00 and 150.00 {code} An exception will occur: {code:java} Caused by: java.lang.NullPointerException: left cannot be null at java.util.Objects.requireNonNull(Objects.java:228) at org.apache.parquet.filter2.predicate.Operators$BinaryLogicalFilterPredicate.(Operators.java:257) at org.apache.parquet.filter2.predicate.Operators$And.(Operators.java:301) at 
org.apache.parquet.filter2.predicate.FilterApi.and(FilterApi.java:249) at org.apache.hudi.source.ExpressionPredicates$And.filter(ExpressionPredicates.java:551) at org.apache.hudi.source.ExpressionPredicates$Or.filter(ExpressionPredicates.java:589) at org.apache.hudi.source.ExpressionPredicates$Or.filter(ExpressionPredicates.java:589) at org.apache.hudi.table.format.RecordIterators.getParquetRecordIterator(RecordIterators.java:68) at org.apache.hudi.table.format.cow.CopyOnWriteInputFormat.open(CopyOnWriteInputFormat.java:130) at org.apache.hudi.table.format.cow.CopyOnWriteInputFormat.open(CopyOnWriteInputFormat.java:66) at org.apache.flink.streaming.api.functions.source.InputFormatSourceFunction.run(InputFormatSourceFunction.java:84) at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:110) at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:67) at org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.run(SourceStreamTask.java:333) {code} After further investigation, the decimal type is not comparable in the form in which it is stored in the parquet format (a fixed-length byte array). Pushing this filter down to parquet predicates is not supported (ExpressionPredicates::toParquetPredicate does not provide decimal conversion). Then, when it constructs the AND filter, both operand filters are null. That is how this issue reproduces. If we execute this SQL: {code:sql} select * from web_sales where ws_sold_date_sk = 2451268 and ws_sales_price between 100.00 and 150.00 {code} It works without any problems, as the predicates generated by the push-down process are null; the Flink engine then filters the data instead of parquet. To solve this, I plan to add null checks to both AND and OR filter predicate construction.
[jira] [Created] (HUDI-7303) Date field type unexpectedly convert to Long when using date comparison operator
Yao Zhang created HUDI-7303: --- Summary: Date field type unexpectedly convert to Long when using date comparison operator Key: HUDI-7303 URL: https://issues.apache.org/jira/browse/HUDI-7303 Project: Apache Hudi Issue Type: Bug Components: flink Affects Versions: 0.14.1, 0.14.0 Environment: Flink 1.15.4 Hudi 0.14.0 Flink 1.17.1 Hudi 0.14.0 Flink 1.17.1 Hudi 0.14.1rc1 Reporter: Yao Zhang Assignee: Yao Zhang Given the table date_dim from TPCDS as an example: {code:sql} CREATE TABLE date_dim ( d_date_sk int, d_date_id varchar(16) NOT NULL, d_date date, d_month_seq int, d_week_seq int, d_quarter_seq int, d_year int, d_dow int, d_moy int, d_dom int, d_qoy int, d_fy_year int, d_fy_quarter_seq int, d_fy_week_seq int, d_day_name varchar(9), d_quarter_name varchar(6), d_holiday char(1), d_weekend char(1), d_following_holiday char(1), d_first_dom int, d_last_dom int, d_same_day_ly int, d_same_day_lq int, d_current_day char(1), d_current_week char(1), d_current_month char(1), d_current_quarter char(1), d_current_year char(1)) with ( 'connector' = 'hudi', 'path' = 'hdfs:///table_path/date_dim', 'table.type' = 'COPY_ON_WRITE'); {code} When you execute the following select statement, an exception will be thrown: {code:sql} select * from date_dim where d_date between cast('1999-02-22' as date) and (cast('1999-02-22' as date) + INTERVAL '30' day); {code} The exception is: {code:java} java.lang.IllegalArgumentException: FilterPredicate column: d_date's declared type (java.lang.Long) does not match the schema found in file metadata. 
Column d_date is of type: INT32 Valid types for this column are: [class java.lang.Integer] at org.apache.parquet.filter2.predicate.ValidTypeMap.assertTypeValid(ValidTypeMap.java:125) ~[hudi-flink1.17-bundle-0.14.0.jar:0.14.0] at org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.validateColumn(SchemaCompatibilityValidator.java:179) ~[hudi-flink1.17-bundle-0.14.0.jar:0.14.0] at org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.validateColumnFilterPredicate(SchemaCompatibilityValidator.java:149) ~[hudi-flink1.17-bundle-0.14.0.jar:0.14.0] at org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.visit(SchemaCompatibilityValidator.java:113) ~[hudi-flink1.17-bundle-0.14.0.jar:0.14.0] at org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.visit(SchemaCompatibilityValidator.java:56) ~[hudi-flink1.17-bundle-0.14.0.jar:0.14.0] at org.apache.parquet.filter2.predicate.Operators$GtEq.accept(Operators.java:246) ~[hudi-flink1.17-bundle-0.14.0.jar:0.14.0] at org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.visit(SchemaCompatibilityValidator.java:119) ~[hudi-flink1.17-bundle-0.14.0.jar:0.14.0] at org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.visit(SchemaCompatibilityValidator.java:56) ~[hudi-flink1.17-bundle-0.14.0.jar:0.14.0] at org.apache.parquet.filter2.predicate.Operators$And.accept(Operators.java:306) ~[hudi-flink1.17-bundle-0.14.0.jar:0.14.0] at org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.validate(SchemaCompatibilityValidator.java:61) ~[hudi-flink1.17-bundle-0.14.0.jar:0.14.0] at org.apache.parquet.filter2.compat.RowGroupFilter.visit(RowGroupFilter.java:95) ~[hudi-flink1.17-bundle-0.14.0.jar:0.14.0] at org.apache.parquet.filter2.compat.RowGroupFilter.visit(RowGroupFilter.java:45) ~[hudi-flink1.17-bundle-0.14.0.jar:0.14.0] at org.apache.parquet.filter2.compat.FilterCompat$FilterPredicateCompat.accept(FilterCompat.java:149) 
~[hudi-flink1.17-bundle-0.14.0.jar:0.14.0] at org.apache.parquet.filter2.compat.RowGroupFilter.filterRowGroups(RowGroupFilter.java:67) ~[hudi-flink1.17-bundle-0.14.0.jar:0.14.0] at org.apache.hudi.table.format.cow.vector.reader.ParquetColumnarRowSplitReader.(ParquetColumnarRowSplitReader.java:142) ~[hudi-flink1.17-bundle-0.14.0.jar:0.14.0] at org.apache.hudi.table.format.cow.ParquetSplitReaderUtil.genPartColumnarRowReader(ParquetSplitReaderUtil.java:153) ~[hudi-flink1.17-bundle-0.14.0.jar:0.14.0] at org.apache.hudi.table.format.RecordIterators.getParquetRecordIterator(RecordIterators.java:78) ~[hudi-flink1.17-bundle-0.14.0.jar:0.14.0] at org.apache.hudi.table.format.cow.CopyOnWriteInputFormat.open(CopyOnWriteInputFormat.java:130) ~[hudi-flink1.17-bundle-0.14.0.jar:0.14.0] at org.apache.hudi.table.format.cow.CopyOnWriteInputFormat.open(CopyOnWriteInputFormat.java:66) ~[hudi-flink1.17-bundle-0.14.0.jar:0.14.0] at org.apache.flink.streaming.api.functions.source.InputFormatSourceFunction.run(InputFormatSourceFunction.java:84) ~[flink-dist-1.17.1.jar:1.17.1] a
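For reference, a DATE value always fits in an int: both Flink and parquet model it as days since 1970-01-01, stored as INT32, which is why a predicate built over Long fails the SchemaCompatibilityValidator for this column. A minimal illustration (the helper name is hypothetical, not Hudi's API):

```java
import java.time.LocalDate;

// Illustrates why the predicate literal for a DATE column must stay an
// Integer: parquet declares the column INT32 (epoch days), so promoting
// the value to Long breaks schema validation.
class DateColumnType {
    static int dateLiteralForParquet(String isoDate) {
        long epochDay = LocalDate.parse(isoDate).toEpochDay();
        // Narrow explicitly; any DATE representable in parquet fits INT32.
        return Math.toIntExact(epochDay);
    }
}
```

So '1999-02-22' should reach the parquet predicate as the int 10644, not as a java.lang.Long.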
[jira] [Updated] (HUDI-7297) Exception thrown when field type mismatch is ambiguous
[ https://issues.apache.org/jira/browse/HUDI-7297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yao Zhang updated HUDI-7297: Description: If you create a table with mismatched field types in Flink SQL, for example you define a field as bigint while the actual field type is int, an IllegalArgumentException would be thrown like below: java.lang.IllegalArgumentException: Unexpected type: INT32 The exception is way too ambiguous. It is difficult to figure out which field type is incorrect and what the correct type is. You have to refer to the source code. Currently I plan to make the exception message more informative. was: If you create a table with mismatched field types, for example you define a field as bigint while the actual field type is int, an IllegalArgumentException would be thrown like below: java.lang.IllegalArgumentException: Unexpected type: INT32 The exception is way too ambiguous. It is difficult to figure out which field type is incorrect and what the correct type is. You have to refer to the source code. Currently I plan to make the exception message more informative. > Exception thrown when field type mismatch is ambiguous > -- > > Key: HUDI-7297 > URL: https://issues.apache.org/jira/browse/HUDI-7297 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Yao Zhang >Assignee: Yao Zhang >Priority: Minor > > If you create a table with mismatched field types in Flink SQL, for example > you define a field as bigint while the actual field type is int, an > IllegalArgumentException would be thrown like below: > java.lang.IllegalArgumentException: Unexpected type: INT32 > The exception is way too ambiguous. It is difficult to figure out which field > type is incorrect and what the correct type is. You have to refer to the > source code. > Currently I plan to make the exception message more informative.
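A more informative message could be built along these lines. The helper and message wording are hypothetical, not the actual patch: the idea is to name the field, the type declared in the DDL, and the parquet type found in the file, instead of the bare "Unexpected type: INT32":

```java
// Sketch of a more informative type-mismatch error (hypothetical helper).
class TypeMismatchMessage {
    static IllegalArgumentException typeMismatch(
            String fieldName, String declaredType, String parquetType) {
        return new IllegalArgumentException(String.format(
                "Unexpected type for field '%s': declared as %s in the table schema, "
                        + "but the parquet file stores %s. Please correct the field type in the DDL.",
                fieldName, declaredType, parquetType));
    }
}
```

With such a message the user can fix the DDL directly, without reading Hudi's source code to map INT32 back to the offending column.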
[jira] [Created] (HUDI-7297) Exception thrown when field type mismatch is ambiguous
Yao Zhang created HUDI-7297: --- Summary: Exception thrown when field type mismatch is ambiguous Key: HUDI-7297 URL: https://issues.apache.org/jira/browse/HUDI-7297 Project: Apache Hudi Issue Type: Improvement Reporter: Yao Zhang Assignee: Yao Zhang If you create a table with mismatched field types, for example you define a field as bigint while the actual field type is int, an IllegalArgumentException would be thrown like below: java.lang.IllegalArgumentException: Unexpected type: INT32 The exception is way too ambiguous. It is difficult to figure out which field type is incorrect and what the correct type is. You have to refer to the source code. Currently I plan to make the exception message more informative.
[jira] [Created] (HUDI-6394) Fixed the issue that CreateHoodieTableCommand does not provide detailed exception stack trace
Yao Zhang created HUDI-6394: --- Summary: Fixed the issue that CreateHoodieTableCommand does not provide detailed exception stack trace Key: HUDI-6394 URL: https://issues.apache.org/jira/browse/HUDI-6394 Project: Apache Hudi Issue Type: Bug Components: spark Reporter: Yao Zhang Assignee: Yao Zhang Fix For: 0.14.0 If we encounter an exception when creating a table using Hudi with Spark, the log only shows the exception class name and its message. For ClassNotFoundException that might be enough to trace the root cause. However, for other types of exceptions, the message alone might not be clear enough to show where the underlying problem lies. I think we should log the detailed stack trace. Another benefit is that the typical pattern of a stack trace in the log file could help people, or even an automated shell script, to capture the exception. If it is deemed necessary to add the stack trace to the log file, I would like to do it.
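The suggested change can be illustrated with a small stdlib-only sketch (the helper is hypothetical, not CreateHoodieTableCommand's code): render the complete stack trace to a string and log that, instead of only the exception class name and message:

```java
import java.io.PrintWriter;
import java.io.StringWriter;

// Hypothetical helper: renders the full stack trace so the create-table
// failure log shows where the underlying problem lies, not just
// "SomeException: message".
class StackTraceFormat {
    static String fullTrace(Throwable t) {
        StringWriter sw = new StringWriter();
        // printStackTrace(Writer) includes the exception chain and all
        // "at ..." frames, giving log-scraping scripts a stable pattern.
        t.printStackTrace(new PrintWriter(sw, true));
        return sw.toString();
    }
}
```

In practice the same effect is achieved by passing the Throwable itself to the logging call rather than logging only its message.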
[jira] [Created] (HUDI-5725) Creating Hudi table with misspelled table type in Flink leads to Flink cluster crash
Yao Zhang created HUDI-5725: --- Summary: Creating Hudi table with misspelled table type in Flink leads to Flink cluster crash Key: HUDI-5725 URL: https://issues.apache.org/jira/browse/HUDI-5725 Project: Apache Hudi Issue Type: Bug Components: flink Affects Versions: 0.12.2, 0.11.1 Environment: Flink 1.13.2 Hadoop 3.1.1 Hudi 0.14.0-SNAPSHOT Reporter: Yao Zhang Assignee: Yao Zhang Fix For: 0.14.0 Create table with the following SQL: {code:sql} CREATE TABLE t1( uuid VARCHAR(20), name VARCHAR(10), age INT, ts TIMESTAMP(3), `partition` VARCHAR(20) ) PARTITIONED BY (`partition`) WITH ( 'connector' = 'hudi', 'path' = 'hdfs:///hudi/t1', 'table.type' = 'MERGE_ON_REA D' ); {code} The value table.type contains an unexpected line break in 'MERGE_ON_READ'. After the table creation, insert an arbitrary line of data. Flink cluster will immediately crash with the exception below: {code:java} Caused by: java.lang.IllegalArgumentException: No enum constant org.apache.hudi.common.model.HoodieTableType.MERGE_ON_REA D at java.lang.Enum.valueOf(Enum.java:238) ~[?:1.8.0_121] at org.apache.hudi.common.model.HoodieTableType.valueOf(HoodieTableType.java:30) ~[hudi-flink1.13-bundle-0.14.0-SNAPSHOT.jar:0.14.0-SNAPSHOT] at org.apache.hudi.sink.StreamWriteOperatorCoordinator$TableState.(StreamWriteOperatorCoordinator.java:630) ~[hudi-flink1.13-bundle-0.14.0-SNAPSHOT.jar:0.14.0-SNAPSHOT] at org.apache.hudi.sink.StreamWriteOperatorCoordinator$TableState.create(StreamWriteOperatorCoordinator.java:640) ~[hudi-flink1.13-bundle-0.14.0-SNAPSHOT.jar:0.14.0-SNAPSHOT] at org.apache.hudi.sink.StreamWriteOperatorCoordinator.start(StreamWriteOperatorCoordinator.java:187) ~[hudi-flink1.13-bundle-0.14.0-SNAPSHOT.jar:0.14.0-SNAPSHOT] at org.apache.flink.runtime.operators.coordination.OperatorCoordinatorHolder.start(OperatorCoordinatorHolder.java:194) ~[flink-dist_2.11-1.13.2.jar:1.13.2] at 
org.apache.flink.runtime.scheduler.DefaultOperatorCoordinatorHandler.startAllOperatorCoordinators(DefaultOperatorCoordinatorHandler.java:85) ~[flink-dist_2.11-1.13.2.jar:1.13.2] at org.apache.flink.runtime.scheduler.SchedulerBase.startScheduling(SchedulerBase.java:592) ~[flink-dist_2.11-1.13.2.jar:1.13.2] at org.apache.flink.runtime.jobmaster.JobMaster.startScheduling(JobMaster.java:955) ~[flink-dist_2.11-1.13.2.jar:1.13.2] at org.apache.flink.runtime.jobmaster.JobMaster.startJobExecution(JobMaster.java:873) ~[flink-dist_2.11-1.13.2.jar:1.13.2] at org.apache.flink.runtime.jobmaster.JobMaster.onStart(JobMaster.java:383) ~[flink-dist_2.11-1.13.2.jar:1.13.2] at org.apache.flink.runtime.rpc.RpcEndpoint.internalCallOnStart(RpcEndpoint.java:181) ~[flink-dist_2.11-1.13.2.jar:1.13.2] at org.apache.flink.runtime.rpc.akka.AkkaRpcActor$StoppedState.start(AkkaRpcActor.java:605) ~[flink-dist_2.11-1.13.2.jar:1.13.2] at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleControlMessage(AkkaRpcActor.java:180) ~[flink-dist_2.11-1.13.2.jar:1.13.2] at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26) ~[flink-dist_2.11-1.13.2.jar:1.13.2] at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21) ~[flink-dist_2.11-1.13.2.jar:1.13.2] at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123) ~[flink-dist_2.11-1.13.2.jar:1.13.2] at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21) ~[flink-dist_2.11-1.13.2.jar:1.13.2] at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170) ~[flink-dist_2.11-1.13.2.jar:1.13.2] ... 12 more {code} The expected behavior is that the Flink cluster keeps running, reports something like 'Illegal table type', and gives the user a chance to correct the SQL statement.
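A defensive parse along these lines would produce the expected behavior. The helper is hypothetical (not Hudi's HoodieTableType code): normalize the configured value and, on failure, raise an actionable error listing the valid table types at table-creation time, instead of letting a raw Enum.valueOf crash the operator coordinator:

```java
import java.util.Arrays;
import java.util.Locale;

// Hypothetical validation sketch for the 'table.type' option.
class TableTypeValidation {
    enum HoodieTableType { COPY_ON_WRITE, MERGE_ON_READ }

    static HoodieTableType parse(String configured) {
        String normalized =
                configured == null ? "" : configured.trim().toUpperCase(Locale.ROOT);
        try {
            return HoodieTableType.valueOf(normalized);
        } catch (IllegalArgumentException e) {
            // Fail fast with a clear message the SQL client can surface,
            // instead of crashing the cluster at job start.
            throw new IllegalArgumentException(
                    "Illegal table type '" + configured + "'. Valid values: "
                            + Arrays.toString(HoodieTableType.values()), e);
        }
    }
}
```

Note that trimming alone would not repair a value with an internal line break like 'MERGE_ON_REA D'; the point is that such a value is rejected with a readable message during DDL validation rather than at runtime.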
[jira] [Commented] (HUDI-4485) Hudi cli got empty result for command show fsview all
[ https://issues.apache.org/jira/browse/HUDI-4485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17601562#comment-17601562 ] Yao Zhang commented on HUDI-4485: - Hi [~codope] , Finally all unit test issues have been resolved and CI passed. Could you please help review this PR? Thank you very much. > Hudi cli got empty result for command show fsview all > - > > Key: HUDI-4485 > URL: https://issues.apache.org/jira/browse/HUDI-4485 > Project: Apache Hudi > Issue Type: Bug > Components: cli >Affects Versions: 0.11.1 > Environment: Hudi version : 0.11.1 > Spark version : 3.1.1 > Hive version : 3.1.0 > Hadoop version : 3.1.1 >Reporter: Yao Zhang >Assignee: Yao Zhang >Priority: Major > Labels: pull-request-available > Fix For: 0.13.0 > > Attachments: spring-shell-1.2.0.RELEASE.jar > > > This issue is from: [[SUPPORT] Hudi cli got empty result for command show > fsview all · Issue #6177 · apache/hudi > (github.com)|https://github.com/apache/hudi/issues/6177] > *Describe the problem you faced* > Hudi cli got empty result after running command show fsview all. > ![image]([https://user-images.githubusercontent.com/7007327/180346750-6a55f472-45ac-46cf-8185-3c4fc4c76434.png]) > The type of table t1 is COW and I am sure that the parquet file is actually > generated inside data folder. Also, the parquet files are not damaged as the > data could be retrieved correctly by reading as Hudi table or directly > reading each parquet file(using Spark). > *To Reproduce* > Steps to reproduce the behavior: > 1. Enter Flink SQL client. > 2. Execute the SQL and check the data was written successfully. 
> ```sql > CREATE TABLE t1( > uuid VARCHAR(20), > name VARCHAR(10), > age INT, > ts TIMESTAMP(3), > `partition` VARCHAR(20) > ) > PARTITIONED BY (`partition`) > WITH ( > 'connector' = 'hudi', > 'path' = 'hdfs:///path/to/table/', > 'table.type' = 'COPY_ON_WRITE' > ); > – insert data using values > INSERT INTO t1 VALUES > ('id1','Danny',23,TIMESTAMP '1970-01-01 00:00:01','par1'), > ('id2','Stephen',33,TIMESTAMP '1970-01-01 00:00:02','par1'), > ('id3','Julian',53,TIMESTAMP '1970-01-01 00:00:03','par2'), > ('id4','Fabian',31,TIMESTAMP '1970-01-01 00:00:04','par2'), > ('id5','Sophia',18,TIMESTAMP '1970-01-01 00:00:05','par3'), > ('id6','Emma',20,TIMESTAMP '1970-01-01 00:00:06','par3'), > ('id7','Bob',44,TIMESTAMP '1970-01-01 00:00:07','par4'), > ('id8','Han',56,TIMESTAMP '1970-01-01 00:00:08','par4'); > ``` > 3. Enter Hudi cli and execute `show fsview all` > *Expected behavior* > `show fsview all` in Hudi cli should return all file slices. > *Environment Description* > * Hudi version : 0.11.1 > * Spark version : 3.1.1 > * Hive version : 3.1.0 > * Hadoop version : 3.1.1 > * Storage (HDFS/S3/GCS..) : HDFS > * Running on Docker? (yes/no) : no > *Additional context* > No. > *Stacktrace* > N/A > > Temporary solution: > I modified and recompiled spring-shell 1.2.0.RELEASE. Please download the > attachment and replace the same file in ${HUDI_CLI_DIR}/target/lib/.
[jira] [Created] (HUDI-4765) Compared inserting data via spark-sql with spark-shell,_hoodie_record_key generation logic is different, which might affects data upsert
Yao Zhang created HUDI-4765: --- Summary: Compared inserting data via spark-sql with spark-shell,_hoodie_record_key generation logic is different, which might affects data upsert Key: HUDI-4765 URL: https://issues.apache.org/jira/browse/HUDI-4765 Project: Apache Hudi Issue Type: Bug Components: spark, spark-sql Affects Versions: 0.11.1 Environment: Spark 3.1.1 Hudi 0.11.1 Reporter: Yao Zhang Create table using spark-sql: {code:sql} create table hudi_mor_tbl ( id int, name string, price double, ts bigint ) using hudi tblproperties ( type = 'mor', primaryKey = 'id', preCombineField = 'ts' ) location 'hdfs:///hudi/hudi_mor_tbl'; {code} And then insert data via spark-shell and spark-sql respectively: {code:scala} import org.apache.spark.sql._ import org.apache.spark.sql.types._ val fields = Array( StructField("id", IntegerType, true), StructField("name", StringType, true), StructField("price", DoubleType, true), StructField("ts", LongType, true) ) val simpleSchema = StructType(fields) val data = Seq(Row(2, "a2", 200.0, 100L)) val df = spark.createDataFrame(data, simpleSchema) df.write.format("hudi"). option(PRECOMBINE_FIELD_OPT_KEY, "ts"). option(RECORDKEY_FIELD_OPT_KEY, "id"). option(TABLE_NAME, "hudi_mor_tbl"). option(TABLE_TYPE_OPT_KEY, "MERGE_ON_READ"). mode(Append). save("hdfs:///hudi/hudi_mor_tbl") {code} {code:sql} insert into hudi_mor_tbl select 1, 'a1', 20, 1000; {code} After that, we query the table and can see the two rows below: {code}
+-------------------+--------------------+------------------+----------------------+--------------------+---+----+-----+----+
|_hoodie_commit_time|_hoodie_commit_seqno|_hoodie_record_key|_hoodie_partition_path|   _hoodie_file_name| id|name|price|  ts|
+-------------------+--------------------+------------------+----------------------+--------------------+---+----+-----+----+
|  20220902012710792|20220902012710792...|                 2|                      |c3eff8c8-fa47-48c...|  2|  a2|200.0| 100|
|  20220902012813658|20220902012813658...|              id:1|                      |c3eff8c8-fa47-48c...|  1|  a1| 20.0|1000|
+-------------------+--------------------+------------------+----------------------+--------------------+---+----+-----+----+
{code} The '_hoodie_record_key' field for spark-sql inserted data is 'id:1' while that for spark-shell is '2'. 
It seems that spark-sql uses '[primaryKey_field_name]:[primaryKey_field_value]' to construct the '_hoodie_record_key' field, whereas spark-shell uses the raw primary-key value. As a result, if we insert one row via spark-sql and then upsert it via spark-shell, we get two duplicate rows, which is not what we expect. Did I miss some configuration that might lead to this issue? If not, I personally think we should make the default record key generation logic consistent. -- This message was sent by Atlassian Jira (v8.20.10#820010)
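The mismatch described above can be reduced to two different key-construction rules. The sketch below is purely illustrative (the class and method names are hypothetical, and Hudi's actual key generators are more involved): it only shows why the two string formats can never match each other during upsert.

```java
// Illustrates the two record-key formats observed above: the spark-shell
// write path emits just the primary-key value ("2"), while spark-sql
// prefixes the field name ("id:1"). Because the strings differ, an upsert
// keyed on one format can never match a row written with the other.
public class RecordKeyDemo {
    // spark-shell style: the raw primary-key value
    static String simpleStyleKey(String value) {
        return value;
    }

    // spark-sql style: "<field>:<value>"
    static String namedFieldStyleKey(String field, String value) {
        return field + ":" + value;
    }

    public static void main(String[] args) {
        System.out.println(simpleStyleKey("2"));           // "2"
        System.out.println(namedFieldStyleKey("id", "1")); // "id:1"
    }
}
```

Since `_hoodie_record_key` is the identity used for upsert matching, rows written under one rule are invisible to the other, which produces the duplicates reported above.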
[jira] [Comment Edited] (HUDI-4718) Hudi cli does not support Kerberized Hadoop cluster
[ https://issues.apache.org/jira/browse/HUDI-4718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17584707#comment-17584707 ] Yao Zhang edited comment on HUDI-4718 at 8/25/22 8:26 AM: -- Could anyone assign this issue to me? Thanks. I plan to add this feature after HUDI-4485 is merged, once the Spring Shell version has been bumped to 2.1.1. was (Author: paul8263): Could anyone assign this issue to me? Thanks. > Hudi cli does not support Kerberized Hadoop cluster > --- > > Key: HUDI-4718 > URL: https://issues.apache.org/jira/browse/HUDI-4718 > Project: Apache Hudi > Issue Type: Bug > Components: cli >Reporter: Yao Zhang >Priority: Major > Fix For: 0.13.0 > > > Hudi cli connect command cannot read table from Kerberized Hadoop cluster and > there is no way to perform Kerberos authentication. > I plan to add this feature. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HUDI-4718) Hudi cli does not support Kerberized Hadoop cluster
[ https://issues.apache.org/jira/browse/HUDI-4718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17584707#comment-17584707 ] Yao Zhang commented on HUDI-4718: - Could anyone assign this issue to me? Thanks. > Hudi cli does not support Kerberized Hadoop cluster > --- > > Key: HUDI-4718 > URL: https://issues.apache.org/jira/browse/HUDI-4718 > Project: Apache Hudi > Issue Type: Bug > Components: cli >Reporter: Yao Zhang >Priority: Major > Fix For: 0.13.0 > > > Hudi cli connect command cannot read table from Kerberized Hadoop cluster and > there is no way to perform Kerberos authentication. > I plan to add this feature. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HUDI-4718) Hudi cli does not support Kerberized Hadoop cluster
Yao Zhang created HUDI-4718: --- Summary: Hudi cli does not support Kerberized Hadoop cluster Key: HUDI-4718 URL: https://issues.apache.org/jira/browse/HUDI-4718 Project: Apache Hudi Issue Type: Bug Components: cli Reporter: Yao Zhang Fix For: 0.13.0 Hudi cli connect command cannot read table from Kerberized Hadoop cluster and there is no way to perform Kerberos authentication. I plan to add this feature. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HUDI-4485) Hudi cli got empty result for command show fsview all
[ https://issues.apache.org/jira/browse/HUDI-4485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17582773#comment-17582773 ] Yao Zhang commented on HUDI-4485: - Hi [~codope] , Personally I tend to bump Spring shell to 2.1.1, although it may lead to lots of work. Keeping it updated might help minimize the bugs and make it easier to implement new features. > Hudi cli got empty result for command show fsview all > - > > Key: HUDI-4485 > URL: https://issues.apache.org/jira/browse/HUDI-4485 > Project: Apache Hudi > Issue Type: Bug > Components: cli >Affects Versions: 0.11.1 > Environment: Hudi version : 0.11.1 > Spark version : 3.1.1 > Hive version : 3.1.0 > Hadoop version : 3.1.1 >Reporter: Yao Zhang >Priority: Minor > Fix For: 0.13.0 > > Attachments: spring-shell-1.2.0.RELEASE.jar > > > This issue is from: [[SUPPORT] Hudi cli got empty result for command show > fsview all · Issue #6177 · apache/hudi > (github.com)|https://github.com/apache/hudi/issues/6177] > {*}{{*}}Describe the problem you faced{{*}}{*} > Hudi cli got empty result after running command show fsview all. > ![image]([https://user-images.githubusercontent.com/7007327/180346750-6a55f472-45ac-46cf-8185-3c4fc4c76434.png]) > The type of table t1 is COW and I am sure that the parquet file is actually > generated inside data folder. Also, the parquet files are not damaged as the > data could be retrieved correctly by reading as Hudi table or directly > reading each parquet file(using Spark). > {*}{{*}}To Reproduce{{*}}{*} > Steps to reproduce the behavior: > 1. Enter Flink SQL client. > 2. Execute the SQL and check the data was written successfully. 
> ```sql > CREATE TABLE t1( > uuid VARCHAR(20), > name VARCHAR(10), > age INT, > ts TIMESTAMP(3), > `partition` VARCHAR(20) > ) > PARTITIONED BY (`partition`) > WITH ( > 'connector' = 'hudi', > 'path' = 'hdfs:///path/to/table/', > 'table.type' = 'COPY_ON_WRITE' > ); > – insert data using values > INSERT INTO t1 VALUES > ('id1','Danny',23,TIMESTAMP '1970-01-01 00:00:01','par1'), > ('id2','Stephen',33,TIMESTAMP '1970-01-01 00:00:02','par1'), > ('id3','Julian',53,TIMESTAMP '1970-01-01 00:00:03','par2'), > ('id4','Fabian',31,TIMESTAMP '1970-01-01 00:00:04','par2'), > ('id5','Sophia',18,TIMESTAMP '1970-01-01 00:00:05','par3'), > ('id6','Emma',20,TIMESTAMP '1970-01-01 00:00:06','par3'), > ('id7','Bob',44,TIMESTAMP '1970-01-01 00:00:07','par4'), > ('id8','Han',56,TIMESTAMP '1970-01-01 00:00:08','par4'); > ``` > 3. Enter Hudi cli and execute `show fsview all` > {*}{{*}}Expected behavior{{*}}{*} > `show fsview all` in Hudi cli should return all file slices. > {*}{{*}}Environment Description{{*}}{*} > * Hudi version : 0.11.1 > * Spark version : 3.1.1 > * Hive version : 3.1.0 > * Hadoop version : 3.1.1 > * Storage (HDFS/S3/GCS..) : HDFS > * Running on Docker? (yes/no) : no > {*}{{*}}Additional context{{*}}{*} > No. > {*}{{*}}Stacktrace{{*}}{*} > N/A > > Temporary solution: > I modified and recompiled spring-shell 1.2.0.RELEASE. Please download the > attachment and replace the same file in ${HUDI_CLI_DIR}/target/lib/. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HUDI-4589) "show fsview all" hudi-cli fails for a hudi table written via flink
[ https://issues.apache.org/jira/browse/HUDI-4589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17577812#comment-17577812 ] Yao Zhang commented on HUDI-4589: - Hi [~shivnarayan] and [~xichaomin] , Thank you for your reply. This issue is similar to HUDI-4485. The problem is caused not only by the wrong default value of the pathRegex parameter, but also by a feature of spring-shell 1.2.0.RELEASE, which truncates everything between block comment identifiers on the command line. Please view the details in HUDI-4485. > "show fsview all" hudi-cli fails for a hudi table written via flink > --- > > Key: HUDI-4589 > URL: https://issues.apache.org/jira/browse/HUDI-4589 > Project: Apache Hudi > Issue Type: Bug >Reporter: sivabalan narayanan >Priority: Major > > The type of table t1 is COW and I am sure that the parquet file is actually > generated inside data folder. Also, the parquet files are not damaged as the > data could be retrieved correctly by reading as Hudi table or directly > reading each parquet file(using Spark). > > *To Reproduce* > Steps to reproduce the behavior: > # Enter Flink SQL client. > # Execute the SQL and check the data was written successfully. 
> CREATE TABLE t1( uuid VARCHAR(20), name VARCHAR(10), age INT, ts > TIMESTAMP(3), `partition` VARCHAR(20) ) PARTITIONED BY (`partition`) WITH ( > 'connector' = 'hudi', 'path' = 'hdfs:///path/to/table/', 'table.type' = > 'COPY_ON_WRITE' ); -- insert data using values INSERT INTO t1 VALUES > ('id1','Danny',23,TIMESTAMP '1970-01-01 00:00:01','par1'), > ('id2','Stephen',33,TIMESTAMP '1970-01-01 00:00:02','par1'), > ('id3','Julian',53,TIMESTAMP '1970-01-01 00:00:03','par2'), > ('id4','Fabian',31,TIMESTAMP '1970-01-01 00:00:04','par2'), > ('id5','Sophia',18,TIMESTAMP '1970-01-01 00:00:05','par3'), > ('id6','Emma',20,TIMESTAMP '1970-01-01 00:00:06','par3'), > ('id7','Bob',44,TIMESTAMP '1970-01-01 00:00:07','par4'), > ('id8','Han',56,TIMESTAMP '1970-01-01 00:00:08','par4'); # Enter Hudi cli and > execute {{show fsview all}} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HUDI-4485) Hudi cli got empty result for command show fsview all
[ https://issues.apache.org/jira/browse/HUDI-4485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17577058#comment-17577058 ] Yao Zhang commented on HUDI-4485: - Hi [~codope] , Thank your for your reply. My Jira ID is paul8263. Should we bump spring shell version or fix 1.2.0.RELEASE as what the temporary solution does? > Hudi cli got empty result for command show fsview all > - > > Key: HUDI-4485 > URL: https://issues.apache.org/jira/browse/HUDI-4485 > Project: Apache Hudi > Issue Type: Bug > Components: cli >Affects Versions: 0.11.1 > Environment: Hudi version : 0.11.1 > Spark version : 3.1.1 > Hive version : 3.1.0 > Hadoop version : 3.1.1 >Reporter: Yao Zhang >Priority: Minor > Fix For: 0.13.0 > > Attachments: spring-shell-1.2.0.RELEASE.jar > > > This issue is from: [[SUPPORT] Hudi cli got empty result for command show > fsview all · Issue #6177 · apache/hudi > (github.com)|https://github.com/apache/hudi/issues/6177] > {*}{{*}}Describe the problem you faced{{*}}{*} > Hudi cli got empty result after running command show fsview all. > ![image]([https://user-images.githubusercontent.com/7007327/180346750-6a55f472-45ac-46cf-8185-3c4fc4c76434.png]) > The type of table t1 is COW and I am sure that the parquet file is actually > generated inside data folder. Also, the parquet files are not damaged as the > data could be retrieved correctly by reading as Hudi table or directly > reading each parquet file(using Spark). > {*}{{*}}To Reproduce{{*}}{*} > Steps to reproduce the behavior: > 1. Enter Flink SQL client. > 2. Execute the SQL and check the data was written successfully. 
> ```sql > CREATE TABLE t1( > uuid VARCHAR(20), > name VARCHAR(10), > age INT, > ts TIMESTAMP(3), > `partition` VARCHAR(20) > ) > PARTITIONED BY (`partition`) > WITH ( > 'connector' = 'hudi', > 'path' = 'hdfs:///path/to/table/', > 'table.type' = 'COPY_ON_WRITE' > ); > – insert data using values > INSERT INTO t1 VALUES > ('id1','Danny',23,TIMESTAMP '1970-01-01 00:00:01','par1'), > ('id2','Stephen',33,TIMESTAMP '1970-01-01 00:00:02','par1'), > ('id3','Julian',53,TIMESTAMP '1970-01-01 00:00:03','par2'), > ('id4','Fabian',31,TIMESTAMP '1970-01-01 00:00:04','par2'), > ('id5','Sophia',18,TIMESTAMP '1970-01-01 00:00:05','par3'), > ('id6','Emma',20,TIMESTAMP '1970-01-01 00:00:06','par3'), > ('id7','Bob',44,TIMESTAMP '1970-01-01 00:00:07','par4'), > ('id8','Han',56,TIMESTAMP '1970-01-01 00:00:08','par4'); > ``` > 3. Enter Hudi cli and execute `show fsview all` > {*}{{*}}Expected behavior{{*}}{*} > `show fsview all` in Hudi cli should return all file slices. > {*}{{*}}Environment Description{{*}}{*} > * Hudi version : 0.11.1 > * Spark version : 3.1.1 > * Hive version : 3.1.0 > * Hadoop version : 3.1.1 > * Storage (HDFS/S3/GCS..) : HDFS > * Running on Docker? (yes/no) : no > {*}{{*}}Additional context{{*}}{*} > No. > {*}{{*}}Stacktrace{{*}}{*} > N/A > > Temporary solution: > I modified and recompiled spring-shell 1.2.0.RELEASE. Please download the > attachment and replace the same file in ${HUDI_CLI_DIR}/target/lib/. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-4485) Hudi cli got empty result for command show fsview all
[ https://issues.apache.org/jira/browse/HUDI-4485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yao Zhang updated HUDI-4485: Attachment: spring-shell-1.2.0.RELEASE.jar > Hudi cli got empty result for command show fsview all > - > > Key: HUDI-4485 > URL: https://issues.apache.org/jira/browse/HUDI-4485 > Project: Apache Hudi > Issue Type: Bug > Components: cli >Affects Versions: 0.11.1 > Environment: Hudi version : 0.11.1 > Spark version : 3.1.1 > Hive version : 3.1.0 > Hadoop version : 3.1.1 >Reporter: Yao Zhang >Priority: Minor > Fix For: 0.13.0 > > Attachments: spring-shell-1.2.0.RELEASE.jar > > > This issue is from: [[SUPPORT] Hudi cli got empty result for command show > fsview all · Issue #6177 · apache/hudi > (github.com)|https://github.com/apache/hudi/issues/6177] > **Describe the problem you faced** > Hudi cli got empty result after running command show fsview all. > ![image](https://user-images.githubusercontent.com/7007327/180346750-6a55f472-45ac-46cf-8185-3c4fc4c76434.png) > The type of table t1 is COW and I am sure that the parquet file is actually > generated inside data folder. Also, the parquet files are not damaged as the > data could be retrieved correctly by reading as Hudi table or directly > reading each parquet file(using Spark). > **To Reproduce** > Steps to reproduce the behavior: > 1. Enter Flink SQL client. > 2. Execute the SQL and check the data was written successfully. 
> ```sql > CREATE TABLE t1( > uuid VARCHAR(20), > name VARCHAR(10), > age INT, > ts TIMESTAMP(3), > `partition` VARCHAR(20) > ) > PARTITIONED BY (`partition`) > WITH ( > 'connector' = 'hudi', > 'path' = 'hdfs:///path/to/table/', > 'table.type' = 'COPY_ON_WRITE' > ); > -- insert data using values > INSERT INTO t1 VALUES > ('id1','Danny',23,TIMESTAMP '1970-01-01 00:00:01','par1'), > ('id2','Stephen',33,TIMESTAMP '1970-01-01 00:00:02','par1'), > ('id3','Julian',53,TIMESTAMP '1970-01-01 00:00:03','par2'), > ('id4','Fabian',31,TIMESTAMP '1970-01-01 00:00:04','par2'), > ('id5','Sophia',18,TIMESTAMP '1970-01-01 00:00:05','par3'), > ('id6','Emma',20,TIMESTAMP '1970-01-01 00:00:06','par3'), > ('id7','Bob',44,TIMESTAMP '1970-01-01 00:00:07','par4'), > ('id8','Han',56,TIMESTAMP '1970-01-01 00:00:08','par4'); > ``` > 3. Enter Hudi cli and execute `show fsview all` > **Expected behavior** > `show fsview all` in Hudi cli should return all file slices. > **Environment Description** > * Hudi version : 0.11.1 > * Spark version : 3.1.1 > * Hive version : 3.1.0 > * Hadoop version : 3.1.1 > * Storage (HDFS/S3/GCS..) : HDFS > * Running on Docker? (yes/no) : no > **Additional context** > No. > **Stacktrace** > N/A > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-4485) Hudi cli got empty result for command show fsview all
[ https://issues.apache.org/jira/browse/HUDI-4485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yao Zhang updated HUDI-4485: Description: This issue is from: [[SUPPORT] Hudi cli got empty result for command show fsview all · Issue #6177 · apache/hudi (github.com)|https://github.com/apache/hudi/issues/6177] {*}{{*}}Describe the problem you faced{{*}}{*} Hudi cli got empty result after running command show fsview all. ![image]([https://user-images.githubusercontent.com/7007327/180346750-6a55f472-45ac-46cf-8185-3c4fc4c76434.png]) The type of table t1 is COW and I am sure that the parquet file is actually generated inside data folder. Also, the parquet files are not damaged as the data could be retrieved correctly by reading as Hudi table or directly reading each parquet file(using Spark). {*}{{*}}To Reproduce{{*}}{*} Steps to reproduce the behavior: 1. Enter Flink SQL client. 2. Execute the SQL and check the data was written successfully. ```sql CREATE TABLE t1( uuid VARCHAR(20), name VARCHAR(10), age INT, ts TIMESTAMP(3), `partition` VARCHAR(20) ) PARTITIONED BY (`partition`) WITH ( 'connector' = 'hudi', 'path' = 'hdfs:///path/to/table/', 'table.type' = 'COPY_ON_WRITE' ); – insert data using values INSERT INTO t1 VALUES ('id1','Danny',23,TIMESTAMP '1970-01-01 00:00:01','par1'), ('id2','Stephen',33,TIMESTAMP '1970-01-01 00:00:02','par1'), ('id3','Julian',53,TIMESTAMP '1970-01-01 00:00:03','par2'), ('id4','Fabian',31,TIMESTAMP '1970-01-01 00:00:04','par2'), ('id5','Sophia',18,TIMESTAMP '1970-01-01 00:00:05','par3'), ('id6','Emma',20,TIMESTAMP '1970-01-01 00:00:06','par3'), ('id7','Bob',44,TIMESTAMP '1970-01-01 00:00:07','par4'), ('id8','Han',56,TIMESTAMP '1970-01-01 00:00:08','par4'); ``` 3. Enter Hudi cli and execute `show fsview all` {*}{{*}}Expected behavior{{*}}{*} `show fsview all` in Hudi cli should return all file slices. 
{*}{{*}}Environment Description{{*}}{*} * Hudi version : 0.11.1 * Spark version : 3.1.1 * Hive version : 3.1.0 * Hadoop version : 3.1.1 * Storage (HDFS/S3/GCS..) : HDFS * Running on Docker? (yes/no) : no {*}{{*}}Additional context{{*}}{*} No. {*}{{*}}Stacktrace{{*}}{*} N/A Temporary solution: I modified and recompiled spring-shell 1.2.0.RELEASE. Please download the attachment and replace the same file in ${HUDI_CLI_DIR}/target/lib/. was: This issue is from: [[SUPPORT] Hudi cli got empty result for command show fsview all · Issue #6177 · apache/hudi (github.com)|https://github.com/apache/hudi/issues/6177] *{*}Describe the problem you faced{*}* Hudi cli got empty result after running command show fsview all. ![image]([https://user-images.githubusercontent.com/7007327/180346750-6a55f472-45ac-46cf-8185-3c4fc4c76434.png]) The type of table t1 is COW and I am sure that the parquet file is actually generated inside data folder. Also, the parquet files are not damaged as the data could be retrieved correctly by reading as Hudi table or directly reading each parquet file(using Spark). *{*}To Reproduce{*}* Steps to reproduce the behavior: 1. Enter Flink SQL client. 2. Execute the SQL and check the data was written successfully. 
```sql CREATE TABLE t1( uuid VARCHAR(20), name VARCHAR(10), age INT, ts TIMESTAMP(3), `partition` VARCHAR(20) ) PARTITIONED BY (`partition`) WITH ( 'connector' = 'hudi', 'path' = 'hdfs:///path/to/table/', 'table.type' = 'COPY_ON_WRITE' ); – insert data using values INSERT INTO t1 VALUES ('id1','Danny',23,TIMESTAMP '1970-01-01 00:00:01','par1'), ('id2','Stephen',33,TIMESTAMP '1970-01-01 00:00:02','par1'), ('id3','Julian',53,TIMESTAMP '1970-01-01 00:00:03','par2'), ('id4','Fabian',31,TIMESTAMP '1970-01-01 00:00:04','par2'), ('id5','Sophia',18,TIMESTAMP '1970-01-01 00:00:05','par3'), ('id6','Emma',20,TIMESTAMP '1970-01-01 00:00:06','par3'), ('id7','Bob',44,TIMESTAMP '1970-01-01 00:00:07','par4'), ('id8','Han',56,TIMESTAMP '1970-01-01 00:00:08','par4'); ``` 3. Enter Hudi cli and execute `show fsview all` *{*}Expected behavior{*}* `show fsview all` in Hudi cli should return all file slices. *{*}Environment Description{*}* * Hudi version : 0.11.1 * Spark version : 3.1.1 * Hive version : 3.1.0 * Hadoop version : 3.1.1 * Storage (HDFS/S3/GCS..) : HDFS * Running on Docker? (yes/no) : no *{*}Additional context{*}* No. *{*}Stacktrace{*}* N/A Temporary solution: I modified and reocmpiled spring-shell 1.2.0.RELEASE. Please download the attachment and replace the same file in ${HUDI_CLI_DIR}/target/lib/. > Hudi cli got empty result for command show fsview all > - > > Key: HUDI-4485 > URL: https://issues.apache.org/jira/browse/HUDI-4485 > Project: Apache Hudi > Issue Type: Bug > Components: cli >Affects Versions: 0.11.1 > Environment: Hudi version : 0.11.1 > Spark version : 3.1.1 > Hive version : 3.1.0 > Hadoop version :
[jira] [Updated] (HUDI-4485) Hudi cli got empty result for command show fsview all
[ https://issues.apache.org/jira/browse/HUDI-4485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yao Zhang updated HUDI-4485: Description: This issue is from: [[SUPPORT] Hudi cli got empty result for command show fsview all · Issue #6177 · apache/hudi (github.com)|https://github.com/apache/hudi/issues/6177] *{*}Describe the problem you faced{*}* Hudi cli got empty result after running command show fsview all. ![image]([https://user-images.githubusercontent.com/7007327/180346750-6a55f472-45ac-46cf-8185-3c4fc4c76434.png]) The type of table t1 is COW and I am sure that the parquet file is actually generated inside data folder. Also, the parquet files are not damaged as the data could be retrieved correctly by reading as Hudi table or directly reading each parquet file(using Spark). *{*}To Reproduce{*}* Steps to reproduce the behavior: 1. Enter Flink SQL client. 2. Execute the SQL and check the data was written successfully. ```sql CREATE TABLE t1( uuid VARCHAR(20), name VARCHAR(10), age INT, ts TIMESTAMP(3), `partition` VARCHAR(20) ) PARTITIONED BY (`partition`) WITH ( 'connector' = 'hudi', 'path' = 'hdfs:///path/to/table/', 'table.type' = 'COPY_ON_WRITE' ); – insert data using values INSERT INTO t1 VALUES ('id1','Danny',23,TIMESTAMP '1970-01-01 00:00:01','par1'), ('id2','Stephen',33,TIMESTAMP '1970-01-01 00:00:02','par1'), ('id3','Julian',53,TIMESTAMP '1970-01-01 00:00:03','par2'), ('id4','Fabian',31,TIMESTAMP '1970-01-01 00:00:04','par2'), ('id5','Sophia',18,TIMESTAMP '1970-01-01 00:00:05','par3'), ('id6','Emma',20,TIMESTAMP '1970-01-01 00:00:06','par3'), ('id7','Bob',44,TIMESTAMP '1970-01-01 00:00:07','par4'), ('id8','Han',56,TIMESTAMP '1970-01-01 00:00:08','par4'); ``` 3. Enter Hudi cli and execute `show fsview all` *{*}Expected behavior{*}* `show fsview all` in Hudi cli should return all file slices. 
*{*}Environment Description{*}* * Hudi version : 0.11.1 * Spark version : 3.1.1 * Hive version : 3.1.0 * Hadoop version : 3.1.1 * Storage (HDFS/S3/GCS..) : HDFS * Running on Docker? (yes/no) : no *{*}Additional context{*}* No. *{*}Stacktrace{*}* N/A Temporary solution: I modified and reocmpiled spring-shell 1.2.0.RELEASE. Please download the attachment and replace the same file in ${HUDI_CLI_DIR}/target/lib/. was: This issue is from: [[SUPPORT] Hudi cli got empty result for command show fsview all · Issue #6177 · apache/hudi (github.com)|https://github.com/apache/hudi/issues/6177] **Describe the problem you faced** Hudi cli got empty result after running command show fsview all. ![image](https://user-images.githubusercontent.com/7007327/180346750-6a55f472-45ac-46cf-8185-3c4fc4c76434.png) The type of table t1 is COW and I am sure that the parquet file is actually generated inside data folder. Also, the parquet files are not damaged as the data could be retrieved correctly by reading as Hudi table or directly reading each parquet file(using Spark). **To Reproduce** Steps to reproduce the behavior: 1. Enter Flink SQL client. 2. Execute the SQL and check the data was written successfully. ```sql CREATE TABLE t1( uuid VARCHAR(20), name VARCHAR(10), age INT, ts TIMESTAMP(3), `partition` VARCHAR(20) ) PARTITIONED BY (`partition`) WITH ( 'connector' = 'hudi', 'path' = 'hdfs:///path/to/table/', 'table.type' = 'COPY_ON_WRITE' ); -- insert data using values INSERT INTO t1 VALUES ('id1','Danny',23,TIMESTAMP '1970-01-01 00:00:01','par1'), ('id2','Stephen',33,TIMESTAMP '1970-01-01 00:00:02','par1'), ('id3','Julian',53,TIMESTAMP '1970-01-01 00:00:03','par2'), ('id4','Fabian',31,TIMESTAMP '1970-01-01 00:00:04','par2'), ('id5','Sophia',18,TIMESTAMP '1970-01-01 00:00:05','par3'), ('id6','Emma',20,TIMESTAMP '1970-01-01 00:00:06','par3'), ('id7','Bob',44,TIMESTAMP '1970-01-01 00:00:07','par4'), ('id8','Han',56,TIMESTAMP '1970-01-01 00:00:08','par4'); ``` 3. 
Enter Hudi cli and execute `show fsview all` **Expected behavior** `show fsview all` in Hudi cli should return all file slices. **Environment Description** * Hudi version : 0.11.1 * Spark version : 3.1.1 * Hive version : 3.1.0 * Hadoop version : 3.1.1 * Storage (HDFS/S3/GCS..) : HDFS * Running on Docker? (yes/no) : no **Additional context** No. **Stacktrace** N/A > Hudi cli got empty result for command show fsview all > - > > Key: HUDI-4485 > URL: https://issues.apache.org/jira/browse/HUDI-4485 > Project: Apache Hudi > Issue Type: Bug > Components: cli >Affects Versions: 0.11.1 > Environment: Hudi version : 0.11.1 > Spark version : 3.1.1 > Hive version : 3.1.0 > Hadoop version : 3.1.1 >Reporter: Yao Zhang >Priority: Minor > Fix For: 0.13.0 > > Attachments: spring-shell-1.2.0.RELEASE.jar > > > This issue is from: [[SUPPORT] Hudi cli got empt
[jira] [Commented] (HUDI-4485) Hudi cli got empty result for command show fsview all
[ https://issues.apache.org/jira/browse/HUDI-4485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17572899#comment-17572899 ] Yao Zhang commented on HUDI-4485: - Hi [~codope] , The root cause is that one of hudi-cli's dependencies, Spring Shell 1.2.0, automatically erases everything between '/*' and '*/' (including the identifiers). It seems that this feature cannot be turned off. I am contacting the Spring Shell community for support. If a newer version can solve this problem I would prefer to upgrade. We need to discuss whether bumping the version to 2.1.0 is a good practice. > Hudi cli got empty result for command show fsview all > - > > Key: HUDI-4485 > URL: https://issues.apache.org/jira/browse/HUDI-4485 > Project: Apache Hudi > Issue Type: Bug > Components: cli >Affects Versions: 0.11.1 > Environment: Hudi version : 0.11.1 > Spark version : 3.1.1 > Hive version : 3.1.0 > Hadoop version : 3.1.1 >Reporter: Yao Zhang >Priority: Minor > Fix For: 0.13.0 > > > This issue is from: [[SUPPORT] Hudi cli got empty result for command show > fsview all · Issue #6177 · apache/hudi > (github.com)|https://github.com/apache/hudi/issues/6177] > **Describe the problem you faced** > Hudi cli got empty result after running command show fsview all. > ![image](https://user-images.githubusercontent.com/7007327/180346750-6a55f472-45ac-46cf-8185-3c4fc4c76434.png) > The type of table t1 is COW and I am sure that the parquet file is actually > generated inside data folder. Also, the parquet files are not damaged as the > data could be retrieved correctly by reading as Hudi table or directly > reading each parquet file(using Spark). > **To Reproduce** > Steps to reproduce the behavior: > 1. Enter Flink SQL client. > 2. Execute the SQL and check the data was written successfully. 
> ```sql > CREATE TABLE t1( > uuid VARCHAR(20), > name VARCHAR(10), > age INT, > ts TIMESTAMP(3), > `partition` VARCHAR(20) > ) > PARTITIONED BY (`partition`) > WITH ( > 'connector' = 'hudi', > 'path' = 'hdfs:///path/to/table/', > 'table.type' = 'COPY_ON_WRITE' > ); > -- insert data using values > INSERT INTO t1 VALUES > ('id1','Danny',23,TIMESTAMP '1970-01-01 00:00:01','par1'), > ('id2','Stephen',33,TIMESTAMP '1970-01-01 00:00:02','par1'), > ('id3','Julian',53,TIMESTAMP '1970-01-01 00:00:03','par2'), > ('id4','Fabian',31,TIMESTAMP '1970-01-01 00:00:04','par2'), > ('id5','Sophia',18,TIMESTAMP '1970-01-01 00:00:05','par3'), > ('id6','Emma',20,TIMESTAMP '1970-01-01 00:00:06','par3'), > ('id7','Bob',44,TIMESTAMP '1970-01-01 00:00:07','par4'), > ('id8','Han',56,TIMESTAMP '1970-01-01 00:00:08','par4'); > ``` > 3. Enter Hudi cli and execute `show fsview all` > **Expected behavior** > `show fsview all` in Hudi cli should return all file slices. > **Environment Description** > * Hudi version : 0.11.1 > * Spark version : 3.1.1 > * Hive version : 3.1.0 > * Hadoop version : 3.1.1 > * Storage (HDFS/S3/GCS..) : HDFS > * Running on Docker? (yes/no) : no > **Additional context** > No. > **Stacktrace** > N/A > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Comment Edited] (HUDI-4485) Hudi cli got empty result for command show fsview all
[ https://issues.apache.org/jira/browse/HUDI-4485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17571705#comment-17571705 ] Yao Zhang edited comment on HUDI-4485 at 7/29/22 2:55 AM: -- This problem is caused by the default value of the pathRegex parameter of the command 'show fsview all'. The default value is \*/\*/\*, which corresponds to a folder structure partitioned by two columns. That is to say, the command show fsview all would return empty if the table was not partitioned by two columns. Personally, I plan to change the default value to \*/\* and enrich the parameter explanation. Correct me if I am wrong. Thanks. was (Author: paul8263): This problem is caused by the default value of the pathRegex parameter of the command 'show fsview all'. The default value is */*/*, which corresponds to a folder structure partitioned by two columns. That is to say, the command show fsview all would return empty if the table was not partitioned by two columns. Personally, I plan to change the default value to */* and enrich the parameter explanation. Correct me if I am wrong. Thanks. > Hudi cli got empty result for command show fsview all > - > > Key: HUDI-4485 > URL: https://issues.apache.org/jira/browse/HUDI-4485 > Project: Apache Hudi > Issue Type: Bug > Components: cli >Affects Versions: 0.11.1 > Environment: Hudi version : 0.11.1 > Spark version : 3.1.1 > Hive version : 3.1.0 > Hadoop version : 3.1.1 >Reporter: Yao Zhang >Priority: Minor > Fix For: 0.12.0 > > > This issue is from: [[SUPPORT] Hudi cli got empty result for command show > fsview all · Issue #6177 · apache/hudi > (github.com)|https://github.com/apache/hudi/issues/6177] > **Describe the problem you faced** > Hudi cli got empty result after running command show fsview all. 
> ![image](https://user-images.githubusercontent.com/7007327/180346750-6a55f472-45ac-46cf-8185-3c4fc4c76434.png) > The type of table t1 is COW and I am sure that the parquet file is actually > generated inside data folder. Also, the parquet files are not damaged as the > data could be retrieved correctly by reading as Hudi table or directly > reading each parquet file(using Spark). > **To Reproduce** > Steps to reproduce the behavior: > 1. Enter Flink SQL client. > 2. Execute the SQL and check the data was written successfully. > ```sql > CREATE TABLE t1( > uuid VARCHAR(20), > name VARCHAR(10), > age INT, > ts TIMESTAMP(3), > `partition` VARCHAR(20) > ) > PARTITIONED BY (`partition`) > WITH ( > 'connector' = 'hudi', > 'path' = 'hdfs:///path/to/table/', > 'table.type' = 'COPY_ON_WRITE' > ); > -- insert data using values > INSERT INTO t1 VALUES > ('id1','Danny',23,TIMESTAMP '1970-01-01 00:00:01','par1'), > ('id2','Stephen',33,TIMESTAMP '1970-01-01 00:00:02','par1'), > ('id3','Julian',53,TIMESTAMP '1970-01-01 00:00:03','par2'), > ('id4','Fabian',31,TIMESTAMP '1970-01-01 00:00:04','par2'), > ('id5','Sophia',18,TIMESTAMP '1970-01-01 00:00:05','par3'), > ('id6','Emma',20,TIMESTAMP '1970-01-01 00:00:06','par3'), > ('id7','Bob',44,TIMESTAMP '1970-01-01 00:00:07','par4'), > ('id8','Han',56,TIMESTAMP '1970-01-01 00:00:08','par4'); > ``` > 3. Enter Hudi cli and execute `show fsview all` > **Expected behavior** > `show fsview all` in Hudi cli should return all file slices. > **Environment Description** > * Hudi version : 0.11.1 > * Spark version : 3.1.1 > * Hive version : 3.1.0 > * Hadoop version : 3.1.1 > * Storage (HDFS/S3/GCS..) : HDFS > * Running on Docker? (yes/no) : no > **Additional context** > No. > **Stacktrace** > N/A > -- This message was sent by Atlassian Jira (v8.20.10#820010)
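The effect of the `*/*/*` default described in the comment above can be illustrated with ordinary glob matching. This is a sketch using `java.nio` path matching (hudi-cli actually globs partition paths through the Hadoop FileSystem API, so the class below is only an analogy): each `*` matches exactly one path segment, so a three-segment pattern never matches a table partitioned by a single column.

```java
import java.nio.file.FileSystems;
import java.nio.file.Paths;

// Shows why a default pathRegex of '*/*/*' returns nothing for a table
// partitioned by one column: '*' does not cross directory separators,
// so the pattern requires exactly three path segments.
public class PathRegexDemo {
    static boolean matches(String glob, String path) {
        return FileSystems.getDefault()
                .getPathMatcher("glob:" + glob)
                .matches(Paths.get(path));
    }

    public static void main(String[] args) {
        // single-level partition (as in the Flink t1 table): no match
        System.out.println(matches("*/*/*", "par1/file1.parquet"));    // false
        // two-level partition layout: matches
        System.out.println(matches("*/*/*", "2022/07/file1.parquet")); // true
        // the proposed default '*/*' matches the single-level layout
        System.out.println(matches("*/*", "par1/file1.parquet"));      // true
    }
}
```

This matches the proposal in the comment: changing the default from `*/*/*` to `*/*` makes the common single-column partition layout visible to `show fsview all`.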
[jira] [Comment Edited] (HUDI-4485) Hudi cli got empty result for command show fsview all
[ https://issues.apache.org/jira/browse/HUDI-4485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17572719#comment-17572719 ] Yao Zhang edited comment on HUDI-4485 at 7/29/22 2:53 AM: -- Hi all, After further investigation I found that Spring-shell 1.2.0 deals with block comment in command line. The relevant codes are: org.springframework.shell.core.AbstractShell::executeCommand {code:java} // We support simple block comments; ie a single pair per line if (!inBlockComment && line.contains("/*") && line.contains("*/")) { blockCommentBegin(); String lhs = line.substring(0, line.lastIndexOf("/*")); if (line.contains("*/")) { line = lhs + line.substring(line.lastIndexOf("*/") + 2); blockCommentFinish(); } else { line = lhs; } } if (inBlockComment) { if (!line.contains("*/")) { return new CommandResult(true); } blockCommentFinish(); line = line.substring(line.lastIndexOf("*/") + 2); } // We also support inline comments (but only at start of line, otherwise valid // command options like http://www.helloworld.com will fail as per ROO-517) if (!inBlockComment && (line.trim().startsWith("//") || line.trim().startsWith("#"))) { // # support in ROO-1116 line = ""; } {code} The codes above remove the last occurance of "/* xxx \*/" in side a command line string. That's why we pass '*/*/*' to pathRegex and finally we will get '\*/\*\*'. Moreover, the block comment removal logic above is buggy as in the case of '\*/\*/\*' , the begin comment block identifier is '\*/\*(/\*)' as quoted in the string, also the end comment block identifier is '\*/(\*/)\*'. characters before begin identifier and after end identifier will be kept. That's why we get '\*/\*\*'. Finally, I suggest we should disable erasing block comment in hudi cli command line. Unfortunately, Spring shell 1.2.0 does not provide such as configuration that can disable block comment processing. 
Also, I tried using a converter that appends '/**/' to every command string, but it did not work because Spring Shell processes block comments before invoking converters.
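The transformation described above can be checked with a minimal snippet that mirrors the single-pair branch of the Spring Shell logic (a simplified stand-in, not the actual AbstractShell class). It also shows why, at the string level, appending an empty '/**/' comment would sacrifice itself and leave the original pathRegex intact:

```java
public class BlockCommentDemo {
    // Mirrors the "begin and end on the same line" branch of
    // AbstractShell::executeCommand: cut from the last "/*" and
    // resume two characters after the last "*/".
    static String stripBlockComment(String line) {
        if (line.contains("/*") && line.contains("*/")) {
            String lhs = line.substring(0, line.lastIndexOf("/*"));
            line = lhs + line.substring(line.lastIndexOf("*/") + 2);
        }
        return line;
    }

    public static void main(String[] args) {
        // The default pathRegex loses its middle and becomes '*/**'
        System.out.println(stripBlockComment("*/*/*"));        // prints */**
        // An appended empty comment is consumed instead, keeping '*/*/*'
        System.out.println(stripBlockComment("*/*/*" + "/**/")); // prints */*/*
    }
}
```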
> Hudi cli got empty result for command show fsview all
> -
>
> Key: HUDI-4485
> URL: https://issues.apache.org/jira/browse/HUDI-4485
> Project: Apache Hudi
> Issue Type: Bug
> Components: cli
> Affects Versions: 0.11.1
> Environment: Hudi version : 0.11.1
> Spark version : 3.1.1
> Hive version : 3.1.0
> Hadoop version : 3.1.1
> Reporter: Yao Zhang
> Priority: Minor
> Fix For: 0.12.0
>
> This issue is from: [[SUPPORT] Hudi cli got empty result for command show fsview all · Issue #6177 · apache/hudi (github.com)|https://github.com/apache/hudi/issues/6177]
> **Describe the problem you faced**
> Hudi cli got empty result after running command show fsview all.
[jira] [Updated] (HUDI-4485) Hudi cli got empty result for command show fsview all
[ https://issues.apache.org/jira/browse/HUDI-4485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yao Zhang updated HUDI-4485: Status: In Progress (was: Open)
[jira] [Commented] (HUDI-4485) Hudi cli got empty result for command show fsview all
[ https://issues.apache.org/jira/browse/HUDI-4485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17571705#comment-17571705 ] Yao Zhang commented on HUDI-4485:
--
This problem is caused by the default value of the pathRegex parameter of the command 'show fsview all'. The default value is */*/*, which corresponds to a folder structure partitioned by two columns. That is to say, show fsview all returns an empty result if the table is not partitioned by two columns. Personally, I plan to change the default value to */* and enrich the parameter explanation. Correct me if I am wrong. Thanks.
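The segment-counting behavior behind this can be illustrated with java.nio glob matching (only a stand-in for whatever matcher Hudi actually uses; the path "par1/f1.parquet" is a hypothetical partition-relative layout for a table partitioned by one column, as in the t1 example):

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

public class PathRegexDemo {
    public static void main(String[] args) {
        // One partition column: base-path-relative files look like partition/file
        Path oneLevel = Paths.get("par1/f1.parquet");

        // Each '*' in a glob matches within a single path segment, so
        // '*/*/*' requires exactly three segments (two partition levels + file)
        PathMatcher threeSegments = FileSystems.getDefault().getPathMatcher("glob:*/*/*");
        PathMatcher twoSegments = FileSystems.getDefault().getPathMatcher("glob:*/*");

        System.out.println(threeSegments.matches(oneLevel)); // false -> empty fsview
        System.out.println(twoSegments.matches(oneLevel));   // true
    }
}
```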
[jira] [Commented] (HUDI-4485) Hudi cli got empty result for command show fsview all
[ https://issues.apache.org/jira/browse/HUDI-4485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17571699#comment-17571699 ] Yao Zhang commented on HUDI-4485:
--
Could someone assign this issue to me? Thanks.
[jira] [Created] (HUDI-4485) Hudi cli got empty result for command show fsview all
Yao Zhang created HUDI-4485:
---
Summary: Hudi cli got empty result for command show fsview all
Key: HUDI-4485
URL: https://issues.apache.org/jira/browse/HUDI-4485
Project: Apache Hudi
Issue Type: Bug
Components: cli
Affects Versions: 0.11.1
Environment: Hudi version : 0.11.1
Spark version : 3.1.1
Hive version : 3.1.0
Hadoop version : 3.1.1
Reporter: Yao Zhang
Fix For: 0.12.0