[jira] [Updated] (HUDI-7315) Disable constructing NOT filter predicate when pushing down its wrapped filter unsupported, as its operand's primitive value is incomparable.
[ https://issues.apache.org/jira/browse/HUDI-7315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yao Zhang updated HUDI-7315: Description: This issue is extended from HUDI-7309, as the risk still exists for the NOT filter predicate when the predicate it wraps does not support pushing down (e.g. an expression with a Decimal-typed operand). It is similar to the AND/OR filter issue in HUDI-7309. Though I have not yet reproduced the NOT filter issue in practice, the risk still exists. We should fix it. was: This issue is extended from HUDI-7309, as the risk still exists when the predicate it wraps does not support pushing down (e.g. an expression with a Decimal-typed operand). It is similar to the AND/OR filter issue in HUDI-7309. Though I have not yet reproduced the NOT filter issue in practice, the risk still exists. We should fix it. > Disable constructing NOT filter predicate when pushing down its wrapped > filter unsupported, as its operand's primitive value is incomparable. > - > > Key: HUDI-7315 > URL: https://issues.apache.org/jira/browse/HUDI-7315 > Project: Apache Hudi > Issue Type: Bug > Components: flink >Affects Versions: 0.14.0, 0.14.1 > Environment: Flink 1.17.1 > Hudi 0.14.x >Reporter: Yao Zhang >Assignee: Yao Zhang >Priority: Major > > This issue is extended from HUDI-7309, as the risk still exists for the NOT > filter predicate when the predicate it wraps does not support pushing down > (e.g. an expression with a Decimal-typed operand). > It is similar to the AND/OR filter issue in HUDI-7309. Though I have not > yet reproduced the NOT filter issue in practice, the risk still exists. We should > fix it. -- This message was sent by Atlassian Jira (v8.20.10#820010)
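The proposed fix can be sketched as follows. This is a simplified model, not Hudi's actual ExpressionPredicates code: the Filter, Eq and Not types are hypothetical stand-ins for parquet's FilterPredicate classes. The point is that when the wrapped predicate could not be pushed down (its converted filter is null, e.g. a Decimal-typed operand), the NOT predicate should also decline push-down and return null instead of wrapping a null filter:

```java
// Simplified sketch of the proposed NOT fix; types are hypothetical
// stand-ins for parquet's FilterPredicate hierarchy.
class NotPredicateSketch {
    interface Filter {}

    record Eq(String column, long value) implements Filter {}

    record Not(Filter inner) implements Filter {}

    /** Proposed behavior: skip push-down when the wrapped filter is unsupported. */
    static Filter not(Filter wrapped) {
        if (wrapped == null) {
            // Push-down disabled for this sub-tree; the Flink engine
            // filters the rows instead of parquet.
            return null;
        }
        return new Not(wrapped);
    }
}
```

With this guard, a NOT over an unsupported (Decimal) comparison simply yields no parquet predicate, mirroring the AND/OR handling from HUDI-7309.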
[jira] [Updated] (HUDI-7315) Disable constructing NOT filter predicate when pushing down its wrapped filter unsupported, as its operand's primitive value is incomparable.
[ https://issues.apache.org/jira/browse/HUDI-7315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yao Zhang updated HUDI-7315: Summary: Disable constructing NOT filter predicate when pushing down its wrapped filter unsupported, as its operand's primitive value is incomparable. (was: Disable constructing NOT filter predicate when pushing down its wrapped filter unsupported, as its operand's primitive value is uncomparable.) > Disable constructing NOT filter predicate when pushing down its wrapped > filter unsupported, as its operand's primitive value is incomparable. > - > > Key: HUDI-7315 > URL: https://issues.apache.org/jira/browse/HUDI-7315 > Project: Apache Hudi > Issue Type: Bug > Components: flink >Affects Versions: 0.14.0, 0.14.1 > Environment: Flink 1.17.1 > Hudi 0.14.x >Reporter: Yao Zhang >Assignee: Yao Zhang >Priority: Major > > This issue is extended from HUDI-7309, as the risk still exists when the > predicate it wraps does not support pushing down (e.g. expression with the > operand typed Decimal). > It is similar to the issue of AND/OR filter in HUDI-7309. Though I have not > yet reproduced NOT filter issue in practice, the risk still exists. We should > fix it.
[jira] [Created] (HUDI-7315) Disable constructing NOT filter predicate when pushing down its wrapped filter unsupported, as its operand's primitive value is uncomparable.
Yao Zhang created HUDI-7315: --- Summary: Disable constructing NOT filter predicate when pushing down its wrapped filter unsupported, as its operand's primitive value is uncomparable. Key: HUDI-7315 URL: https://issues.apache.org/jira/browse/HUDI-7315 Project: Apache Hudi Issue Type: Bug Components: flink Affects Versions: 0.14.1, 0.14.0 Environment: Flink 1.17.1 Hudi 0.14.x Reporter: Yao Zhang Assignee: Yao Zhang This issue is extended from HUDI-7309, as the risk still exists when the predicate it wraps does not support pushing down (e.g. an expression with a Decimal-typed operand). It is similar to the AND/OR filter issue in HUDI-7309. Though I have not yet reproduced the NOT filter issue in practice, the risk still exists. We should fix it.
[jira] [Created] (HUDI-7311) Comparing date with date literal in string format causes class cast exception during filter push down
Yao Zhang created HUDI-7311: --- Summary: Comparing date with date literal in string format causes class cast exception during filter push down Key: HUDI-7311 URL: https://issues.apache.org/jira/browse/HUDI-7311 Project: Apache Hudi Issue Type: Bug Components: flink Affects Versions: 0.14.1, 0.14.0 Reporter: Yao Zhang Assignee: Yao Zhang Given any table with a date-typed field (e.g. a field d_date of type date), execute SQL with a condition on this field in the where clause. {code:sql} select d_date from xxx where d_date = '2020-01-01' {code} An exception will occur: {code:java} Caused by: java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Integer at org.apache.hudi.source.ExpressionPredicates.toParquetPredicate(ExpressionPredicates.java:613) at org.apache.hudi.source.ExpressionPredicates.access$100(ExpressionPredicates.java:64) at org.apache.hudi.source.ExpressionPredicates$ColumnPredicate.filter(ExpressionPredicates.java:226) at org.apache.hudi.table.format.RecordIterators.getParquetRecordIterator(RecordIterators.java:68) at org.apache.hudi.table.format.cow.CopyOnWriteInputFormat.open(CopyOnWriteInputFormat.java:130) at org.apache.hudi.table.format.cow.CopyOnWriteInputFormat.open(CopyOnWriteInputFormat.java:66) at org.apache.flink.streaming.api.functions.source.InputFormatSourceFunction.run(InputFormatSourceFunction.java:84) at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:110) at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:67) at org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.run(SourceStreamTask.java:333) {code} Hudi's Flink integration cannot convert a date literal in String form to Integer (the primitive type of date), although the same SQL works well in Flink without Hudi. In summary, we should add automatic literal type conversion before filter push-down.
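The literal auto-conversion could look like the sketch below. The helper is hypothetical, not Hudi's API; it only illustrates the idea that parquet stores a Flink DATE as INT32 days since the epoch, so a String literal such as '2020-01-01' must be parsed and converted to an Integer before the parquet predicate is built:

```java
import java.time.LocalDate;

// Hypothetical helper illustrating the proposed literal conversion for
// DATE columns before filter push-down.
class DateLiteralConversion {
    static Integer toEpochDay(Object literal) {
        if (literal instanceof Integer i) {
            return i; // already in parquet's INT32 primitive form
        }
        if (literal instanceof String s) {
            // ISO-8601 date string -> days since 1970-01-01, narrowed to int
            return (int) LocalDate.parse(s).toEpochDay();
        }
        throw new IllegalArgumentException("Unsupported DATE literal: " + literal);
    }
}
```

For example, '2020-01-01' becomes 18262, which matches the Integer value the ColumnPredicate expects.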
[jira] [Updated] (HUDI-7309) Disable filter pushing down when the parquet type corresponding to its field logical type is not comparable
[ https://issues.apache.org/jira/browse/HUDI-7309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yao Zhang updated HUDI-7309: Description: Given the table web_sales from TPCDS: {code:sql} CREATE TABLE web_sales ( ws_sold_date_sk int, ws_sold_time_sk int, ws_ship_date_sk int, ws_item_sk int, ws_bill_customer_sk int, ws_bill_cdemo_sk int, ws_bill_hdemo_sk int, ws_bill_addr_sk int, ws_ship_customer_sk int, ws_ship_cdemo_sk int, ws_ship_hdemo_sk int, ws_ship_addr_sk int, ws_web_page_sk int, ws_web_site_sk int, ws_ship_mode_sk int, ws_warehouse_sk int, ws_promo_sk int, ws_order_number int, ws_quantity int, ws_wholesale_cost decimal(7,2), ws_list_price decimal(7,2), ws_sales_price decimal(7,2), ws_ext_discount_amt decimal(7,2), ws_ext_sales_price decimal(7,2), ws_ext_wholesale_cost decimal(7,2), ws_ext_list_price decimal(7,2), ws_ext_tax decimal(7,2), ws_coupon_amt decimal(7,2), ws_ext_ship_cost decimal(7,2), ws_net_paid decimal(7,2), ws_net_paid_inc_tax decimal(7,2), ws_net_paid_inc_ship decimal(7,2), ws_net_paid_inc_ship_tax decimal(7,2), ws_net_profit decimal(7,2) ) with ( 'connector' = 'hudi', 'path' = 'hdfs://path/to/web_sales', 'table.type' = 'COPY_ON_WRITE', 'hoodie.datasource.write.recordkey.field' = 'ws_item_sk,ws_order_number' ); {code} And execute: {code:sql} select * from web_sales where ws_sold_date_sk = 2451268 and ws_sales_price between 100.00 and 150.00 {code} An exception will occur: {code:java} Caused by: java.lang.NullPointerException: left cannot be null at java.util.Objects.requireNonNull(Objects.java:228) at org.apache.parquet.filter2.predicate.Operators$BinaryLogicalFilterPredicate.(Operators.java:257) at org.apache.parquet.filter2.predicate.Operators$And.(Operators.java:301) at org.apache.parquet.filter2.predicate.FilterApi.and(FilterApi.java:249) at org.apache.hudi.source.ExpressionPredicates$And.filter(ExpressionPredicates.java:551) at org.apache.hudi.source.ExpressionPredicates$Or.filter(ExpressionPredicates.java:589) at 
org.apache.hudi.source.ExpressionPredicates$Or.filter(ExpressionPredicates.java:589) at org.apache.hudi.table.format.RecordIterators.getParquetRecordIterator(RecordIterators.java:68) at org.apache.hudi.table.format.cow.CopyOnWriteInputFormat.open(CopyOnWriteInputFormat.java:130) at org.apache.hudi.table.format.cow.CopyOnWriteInputFormat.open(CopyOnWriteInputFormat.java:66) at org.apache.flink.streaming.api.functions.source.InputFormatSourceFunction.run(InputFormatSourceFunction.java:84) at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:110) at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:67) at org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.run(SourceStreamTask.java:333) {code} After further investigation, the decimal type is not comparable in the form in which it is stored in the parquet format (a fixed-length byte array). Pushing this filter down to parquet predicates is not supported (ExpressionPredicates::toParquetPredicate does not provide decimal conversion). Then, when it constructs the AND filter, both operand filters are null. That is how this issue reproduces. If we execute this SQL: {code:sql} select * from web_sales where ws_sold_date_sk = 2451268 and ws_sales_price between 100.00 and 150.00 {code} It works without any problems, as the predicates generated by the push-down process are null; the Flink engine then filters the data instead of parquet. To solve this, I plan to add null checks to both AND and OR filter predicate construction. If pushing down the field type is not supported, the generated filter will be null. Push-down can then be disabled by not constructing the AND or OR filter when either of its operands is null. was: Given the table web_sales from TPCDS: {code:sql} CREATE TABLE web_sales (
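The planned null checks can be sketched like this. The types are simplified stand-ins, not Hudi's actual classes: when either operand of an AND/OR could not be converted to a parquet predicate (null), the compound filter is not constructed, so Flink falls back to engine-side filtering instead of hitting FilterApi.and(null, ...):

```java
// Simplified sketch of the proposed AND/OR null checks; Filter/And/Or/Eq
// are hypothetical stand-ins for parquet's FilterPredicate classes.
class CompoundPredicateSketch {
    interface Filter {}

    record Eq(String column, long value) implements Filter {}

    record And(Filter left, Filter right) implements Filter {}

    record Or(Filter left, Filter right) implements Filter {}

    static Filter and(Filter left, Filter right) {
        if (left == null || right == null) {
            return null; // push-down disabled for this sub-tree
        }
        return new And(left, right);
    }

    static Filter or(Filter left, Filter right) {
        if (left == null || right == null) {
            return null;
        }
        return new Or(left, right);
    }
}
```

In the failing query, the BETWEEN over a decimal column converts to null operands, so the whole conjunction yields null and no parquet predicate is built.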
[jira] [Created] (HUDI-7309) Disable filter pushing down when the parquet type corresponding to its field logical type is not comparable
Yao Zhang created HUDI-7309: --- Summary: Disable filter pushing down when the parquet type corresponding to its field logical type is not comparable Key: HUDI-7309 URL: https://issues.apache.org/jira/browse/HUDI-7309 Project: Apache Hudi Issue Type: Bug Components: flink Affects Versions: 0.14.1, 0.14.0 Environment: Hudi 0.14.0 Hudi 0.14.1rc1 Flink 1.17.1 Reporter: Yao Zhang Assignee: Yao Zhang Given the table web_sales from TPCDS: {code:sql} CREATE TABLE web_sales ( ws_sold_date_sk int, ws_sold_time_sk int, ws_ship_date_sk int, ws_item_sk int, ws_bill_customer_sk int, ws_bill_cdemo_sk int, ws_bill_hdemo_sk int, ws_bill_addr_sk int, ws_ship_customer_sk int, ws_ship_cdemo_sk int, ws_ship_hdemo_sk int, ws_ship_addr_sk int, ws_web_page_sk int, ws_web_site_sk int, ws_ship_mode_sk int, ws_warehouse_sk int, ws_promo_sk int, ws_order_number int, ws_quantity int, ws_wholesale_cost decimal(7,2), ws_list_price decimal(7,2), ws_sales_price decimal(7,2), ws_ext_discount_amt decimal(7,2), ws_ext_sales_price decimal(7,2), ws_ext_wholesale_cost decimal(7,2), ws_ext_list_price decimal(7,2), ws_ext_tax decimal(7,2), ws_coupon_amt decimal(7,2), ws_ext_ship_cost decimal(7,2), ws_net_paid decimal(7,2), ws_net_paid_inc_tax decimal(7,2), ws_net_paid_inc_ship decimal(7,2), ws_net_paid_inc_ship_tax decimal(7,2), ws_net_profit decimal(7,2) ) with ( 'connector' = 'hudi', 'path' = 'hdfs://path/to/web_sales', 'table.type' = 'COPY_ON_WRITE', 'hoodie.datasource.write.recordkey.field' = 'ws_item_sk,ws_order_number' ); {code} And execute: {code:sql} select * from web_sales where ws_sold_date_sk = 2451268 and ws_sales_price between 100.00 and 150.00 {code} An exception will occur: {code:java} Caused by: java.lang.NullPointerException: left cannot be null at java.util.Objects.requireNonNull(Objects.java:228) at org.apache.parquet.filter2.predicate.Operators$BinaryLogicalFilterPredicate.(Operators.java:257) at org.apache.parquet.filter2.predicate.Operators$And.(Operators.java:301) at 
org.apache.parquet.filter2.predicate.FilterApi.and(FilterApi.java:249) at org.apache.hudi.source.ExpressionPredicates$And.filter(ExpressionPredicates.java:551) at org.apache.hudi.source.ExpressionPredicates$Or.filter(ExpressionPredicates.java:589) at org.apache.hudi.source.ExpressionPredicates$Or.filter(ExpressionPredicates.java:589) at org.apache.hudi.table.format.RecordIterators.getParquetRecordIterator(RecordIterators.java:68) at org.apache.hudi.table.format.cow.CopyOnWriteInputFormat.open(CopyOnWriteInputFormat.java:130) at org.apache.hudi.table.format.cow.CopyOnWriteInputFormat.open(CopyOnWriteInputFormat.java:66) at org.apache.flink.streaming.api.functions.source.InputFormatSourceFunction.run(InputFormatSourceFunction.java:84) at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:110) at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:67) at org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.run(SourceStreamTask.java:333) {code} After further investigation, the decimal type is not comparable in the form in which it is stored in the parquet format (a fixed-length byte array). Pushing this filter down to parquet predicates is not supported (ExpressionPredicates::toParquetPredicate does not provide decimal conversion). Then, when it constructs the AND filter, both operand filters are null. That is how this issue reproduces. If we execute this SQL: {code:sql} select * from web_sales where ws_sold_date_sk = 2451268 and ws_sales_price between 100.00 and 150.00 {code} It works without any problems, as the predicates generated by the push-down process are null; the Flink engine then filters the data instead of parquet. To solve this, I plan to add null checks to both AND and OR filter predicate construction.
[jira] [Created] (HUDI-7303) Date field type unexpectedly convert to Long when using date comparison operator
Yao Zhang created HUDI-7303: --- Summary: Date field type unexpectedly convert to Long when using date comparison operator Key: HUDI-7303 URL: https://issues.apache.org/jira/browse/HUDI-7303 Project: Apache Hudi Issue Type: Bug Components: flink Affects Versions: 0.14.1, 0.14.0 Environment: Flink 1.15.4 Hudi 0.14.0 Flink 1.17.1 Hudi 0.14.0 Flink 1.17.1 Hudi 0.14.1rc1 Reporter: Yao Zhang Assignee: Yao Zhang Given the table date_dim from TPCDS as an example: {code:sql} CREATE TABLE date_dim ( d_date_sk int, d_date_id varchar(16) NOT NULL, d_date date, d_month_seq int, d_week_seq int, d_quarter_seq int, d_year int, d_dow int, d_moy int, d_dom int, d_qoy int, d_fy_year int, d_fy_quarter_seq int, d_fy_week_seq int, d_day_name varchar(9), d_quarter_name varchar(6), d_holiday char(1), d_weekend char(1), d_following_holiday char(1), d_first_dom int, d_last_dom int, d_same_day_ly int, d_same_day_lq int, d_current_day char(1), d_current_week char(1), d_current_month char(1), d_current_quarter char(1), d_current_year char(1)) with ( 'connector' = 'hudi', 'path' = 'hdfs:///table_path/date_dim', 'table.type' = 'COPY_ON_WRITE'); {code} When you execute the following select statement, an exception will be thrown: {code:sql} select * from date_dim where d_date between cast('1999-02-22' as date) and (cast('1999-02-22' as date) + INTERVAL '30' day); {code} The exception is: {code:java} java.lang.IllegalArgumentException: FilterPredicate column: d_date's declared type (java.lang.Long) does not match the schema found in file metadata. 
Column d_date is of type: INT32 Valid types for this column are: [class java.lang.Integer] at org.apache.parquet.filter2.predicate.ValidTypeMap.assertTypeValid(ValidTypeMap.java:125) ~[hudi-flink1.17-bundle-0.14.0.jar:0.14.0] at org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.validateColumn(SchemaCompatibilityValidator.java:179) ~[hudi-flink1.17-bundle-0.14.0.jar:0.14.0] at org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.validateColumnFilterPredicate(SchemaCompatibilityValidator.java:149) ~[hudi-flink1.17-bundle-0.14.0.jar:0.14.0] at org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.visit(SchemaCompatibilityValidator.java:113) ~[hudi-flink1.17-bundle-0.14.0.jar:0.14.0] at org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.visit(SchemaCompatibilityValidator.java:56) ~[hudi-flink1.17-bundle-0.14.0.jar:0.14.0] at org.apache.parquet.filter2.predicate.Operators$GtEq.accept(Operators.java:246) ~[hudi-flink1.17-bundle-0.14.0.jar:0.14.0] at org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.visit(SchemaCompatibilityValidator.java:119) ~[hudi-flink1.17-bundle-0.14.0.jar:0.14.0] at org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.visit(SchemaCompatibilityValidator.java:56) ~[hudi-flink1.17-bundle-0.14.0.jar:0.14.0] at org.apache.parquet.filter2.predicate.Operators$And.accept(Operators.java:306) ~[hudi-flink1.17-bundle-0.14.0.jar:0.14.0] at org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.validate(SchemaCompatibilityValidator.java:61) ~[hudi-flink1.17-bundle-0.14.0.jar:0.14.0] at org.apache.parquet.filter2.compat.RowGroupFilter.visit(RowGroupFilter.java:95) ~[hudi-flink1.17-bundle-0.14.0.jar:0.14.0] at org.apache.parquet.filter2.compat.RowGroupFilter.visit(RowGroupFilter.java:45) ~[hudi-flink1.17-bundle-0.14.0.jar:0.14.0] at org.apache.parquet.filter2.compat.FilterCompat$FilterPredicateCompat.accept(FilterCompat.java:149) 
~[hudi-flink1.17-bundle-0.14.0.jar:0.14.0] at org.apache.parquet.filter2.compat.RowGroupFilter.filterRowGroups(RowGroupFilter.java:67) ~[hudi-flink1.17-bundle-0.14.0.jar:0.14.0] at org.apache.hudi.table.format.cow.vector.reader.ParquetColumnarRowSplitReader.(ParquetColumnarRowSplitReader.java:142) ~[hudi-flink1.17-bundle-0.14.0.jar:0.14.0] at org.apache.hudi.table.format.cow.ParquetSplitReaderUtil.genPartColumnarRowReader(ParquetSplitReaderUtil.java:153) ~[hudi-flink1.17-bundle-0.14.0.jar:0.14.0] at org.apache.hudi.table.format.RecordIterators.getParquetRecordIterator(RecordIterators.java:78) ~[hudi-flink1.17-bundle-0.14.0.jar:0.14.0] at org.apache.hudi.table.format.cow.CopyOnWriteInputFormat.open(CopyOnWriteInputFormat.java:130) ~[hudi-flink1.17-bundle-0.14.0.jar:0.14.0] at org.apache.hudi.table.format.cow.CopyOnWriteInputFormat.open(CopyOnWriteInputFormat.java:66) ~[hudi-flink1.17-bundle-0.14.0.jar:0.14.0] at org.apache.flink.streaming.api.functions.source.InputFormatSourceFunction.run(InputFormatSourceFunction.java:84) ~[flink-dist-1.17.1.jar:1.17.1] a
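For reference, a DATE value always fits in an int: both Flink and parquet model it as days since 1970-01-01, stored as INT32, which is why a predicate built over Long fails the SchemaCompatibilityValidator for this column. A minimal illustration (the helper name is hypothetical, not Hudi's API):

```java
import java.time.LocalDate;

// Illustrates why the predicate literal for a DATE column must stay an
// Integer: parquet declares the column INT32 (epoch days), so promoting
// the value to Long breaks schema validation.
class DateColumnType {
    static int dateLiteralForParquet(String isoDate) {
        long epochDay = LocalDate.parse(isoDate).toEpochDay();
        // Narrow explicitly; any DATE representable in parquet fits INT32.
        return Math.toIntExact(epochDay);
    }
}
```

So '1999-02-22' should reach the parquet predicate as the int 10644, not as a java.lang.Long.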
[jira] [Updated] (HUDI-7297) Exception thrown when field type mismatch is ambiguous
[ https://issues.apache.org/jira/browse/HUDI-7297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yao Zhang updated HUDI-7297: Description: If you create a table with mismatched field types in Flink SQL, for example you define a field as bigint while the actual field type is int, an IllegalArgumentException would be thrown like below: java.lang.IllegalArgumentException: Unexpected type: INT32 The exception is way too ambiguous. It is difficult to figure out which field type is incorrect and what the correct type is. You have to refer to the source code. Currently I plan to make the exception message more informative. was: If you create a table with mismatched field types, for example you define a field as bigint while the actual field type is int, an IllegalArgumentException would be thrown like below: java.lang.IllegalArgumentException: Unexpected type: INT32 The exception is way too ambiguous. It is difficult to figure out which field type is incorrect and what the correct type is. You have to refer to the source code. Currently I plan to make the exception message more informative. > Exception thrown when field type mismatch is ambiguous > -- > > Key: HUDI-7297 > URL: https://issues.apache.org/jira/browse/HUDI-7297 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Yao Zhang >Assignee: Yao Zhang >Priority: Minor > > If you create a table with mismatched field types in Flink SQL, for example > you define a field as bigint while the actual field type is int, an > IllegalArgumentException would be thrown like below: > java.lang.IllegalArgumentException: Unexpected type: INT32 > The exception is way too ambiguous. It is difficult to figure out which field > type is incorrect and what the correct type is. You have to refer to the > source code. > Currently I plan to make the exception message more informative.
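A more informative message could be built along these lines. The helper and message wording are hypothetical, not the actual patch: the idea is to name the field, the type declared in the DDL, and the parquet type found in the file, instead of the bare "Unexpected type: INT32":

```java
// Sketch of a more informative type-mismatch error (hypothetical helper).
class TypeMismatchMessage {
    static IllegalArgumentException typeMismatch(
            String fieldName, String declaredType, String parquetType) {
        return new IllegalArgumentException(String.format(
                "Unexpected type for field '%s': declared as %s in the table schema, "
                        + "but the parquet file stores %s. Please correct the field type in the DDL.",
                fieldName, declaredType, parquetType));
    }
}
```

With such a message the user can fix the DDL directly, without reading Hudi's source code to map INT32 back to the offending column.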
[jira] [Created] (HUDI-7297) Exception thrown when field type mismatch is ambiguous
Yao Zhang created HUDI-7297: --- Summary: Exception thrown when field type mismatch is ambiguous Key: HUDI-7297 URL: https://issues.apache.org/jira/browse/HUDI-7297 Project: Apache Hudi Issue Type: Improvement Reporter: Yao Zhang Assignee: Yao Zhang If you create a table with mismatched field types, for example you define a field as bigint while the actual field type is int, an IllegalArgumentException would be thrown like below: java.lang.IllegalArgumentException: Unexpected type: INT32 The exception is way too ambiguous. It is difficult to figure out which field type is incorrect and what the correct type is. You have to refer to the source code. Currently I plan to make the exception message more informative.
[jira] [Created] (HUDI-6394) Fixed the issue that CreateHoodieTableCommand does not provide detailed exception stack trace
Yao Zhang created HUDI-6394: --- Summary: Fixed the issue that CreateHoodieTableCommand does not provide detailed exception stack trace Key: HUDI-6394 URL: https://issues.apache.org/jira/browse/HUDI-6394 Project: Apache Hudi Issue Type: Bug Components: spark Reporter: Yao Zhang Assignee: Yao Zhang Fix For: 0.14.0 If we encounter an exception when creating a table using Hudi with Spark, the log only shows the exception class name and its message. For ClassNotFoundException that might be enough to trace the root cause. However, for other types of exceptions, the message alone might not be clear enough to show where the underlying problem lies. I think we should log the detailed stack trace. Another benefit is that the typical pattern of a stack trace in the log file could help people, or even an automated shell script, to capture the exception. If it is deemed necessary to add the stack trace to the log file, I would like to do it.
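The suggested change can be illustrated with a small stdlib-only sketch (the helper is hypothetical, not CreateHoodieTableCommand's code): render the complete stack trace to a string and log that, instead of only the exception class name and message:

```java
import java.io.PrintWriter;
import java.io.StringWriter;

// Hypothetical helper: renders the full stack trace so the create-table
// failure log shows where the underlying problem lies, not just
// "SomeException: message".
class StackTraceFormat {
    static String fullTrace(Throwable t) {
        StringWriter sw = new StringWriter();
        // printStackTrace(Writer) includes the exception chain and all
        // "at ..." frames, giving log-scraping scripts a stable pattern.
        t.printStackTrace(new PrintWriter(sw, true));
        return sw.toString();
    }
}
```

In practice the same effect is achieved by passing the Throwable itself to the logging call rather than logging only its message.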
[jira] [Created] (HUDI-5725) Creating Hudi table with misspelled table type in Flink leads to Flink cluster crash
Yao Zhang created HUDI-5725: --- Summary: Creating Hudi table with misspelled table type in Flink leads to Flink cluster crash Key: HUDI-5725 URL: https://issues.apache.org/jira/browse/HUDI-5725 Project: Apache Hudi Issue Type: Bug Components: flink Affects Versions: 0.12.2, 0.11.1 Environment: Flink 1.13.2 Hadoop 3.1.1 Hudi 0.14.0-SNAPSHOT Reporter: Yao Zhang Assignee: Yao Zhang Fix For: 0.14.0 Create table with the following SQL: {code:sql} CREATE TABLE t1( uuid VARCHAR(20), name VARCHAR(10), age INT, ts TIMESTAMP(3), `partition` VARCHAR(20) ) PARTITIONED BY (`partition`) WITH ( 'connector' = 'hudi', 'path' = 'hdfs:///hudi/t1', 'table.type' = 'MERGE_ON_REA D' ); {code} The value table.type contains an unexpected line break in 'MERGE_ON_READ'. After the table creation, insert an arbitrary line of data. Flink cluster will immediately crash with the exception below: {code:java} Caused by: java.lang.IllegalArgumentException: No enum constant org.apache.hudi.common.model.HoodieTableType.MERGE_ON_REA D at java.lang.Enum.valueOf(Enum.java:238) ~[?:1.8.0_121] at org.apache.hudi.common.model.HoodieTableType.valueOf(HoodieTableType.java:30) ~[hudi-flink1.13-bundle-0.14.0-SNAPSHOT.jar:0.14.0-SNAPSHOT] at org.apache.hudi.sink.StreamWriteOperatorCoordinator$TableState.(StreamWriteOperatorCoordinator.java:630) ~[hudi-flink1.13-bundle-0.14.0-SNAPSHOT.jar:0.14.0-SNAPSHOT] at org.apache.hudi.sink.StreamWriteOperatorCoordinator$TableState.create(StreamWriteOperatorCoordinator.java:640) ~[hudi-flink1.13-bundle-0.14.0-SNAPSHOT.jar:0.14.0-SNAPSHOT] at org.apache.hudi.sink.StreamWriteOperatorCoordinator.start(StreamWriteOperatorCoordinator.java:187) ~[hudi-flink1.13-bundle-0.14.0-SNAPSHOT.jar:0.14.0-SNAPSHOT] at org.apache.flink.runtime.operators.coordination.OperatorCoordinatorHolder.start(OperatorCoordinatorHolder.java:194) ~[flink-dist_2.11-1.13.2.jar:1.13.2] at 
org.apache.flink.runtime.scheduler.DefaultOperatorCoordinatorHandler.startAllOperatorCoordinators(DefaultOperatorCoordinatorHandler.java:85) ~[flink-dist_2.11-1.13.2.jar:1.13.2] at org.apache.flink.runtime.scheduler.SchedulerBase.startScheduling(SchedulerBase.java:592) ~[flink-dist_2.11-1.13.2.jar:1.13.2] at org.apache.flink.runtime.jobmaster.JobMaster.startScheduling(JobMaster.java:955) ~[flink-dist_2.11-1.13.2.jar:1.13.2] at org.apache.flink.runtime.jobmaster.JobMaster.startJobExecution(JobMaster.java:873) ~[flink-dist_2.11-1.13.2.jar:1.13.2] at org.apache.flink.runtime.jobmaster.JobMaster.onStart(JobMaster.java:383) ~[flink-dist_2.11-1.13.2.jar:1.13.2] at org.apache.flink.runtime.rpc.RpcEndpoint.internalCallOnStart(RpcEndpoint.java:181) ~[flink-dist_2.11-1.13.2.jar:1.13.2] at org.apache.flink.runtime.rpc.akka.AkkaRpcActor$StoppedState.start(AkkaRpcActor.java:605) ~[flink-dist_2.11-1.13.2.jar:1.13.2] at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleControlMessage(AkkaRpcActor.java:180) ~[flink-dist_2.11-1.13.2.jar:1.13.2] at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26) ~[flink-dist_2.11-1.13.2.jar:1.13.2] at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21) ~[flink-dist_2.11-1.13.2.jar:1.13.2] at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123) ~[flink-dist_2.11-1.13.2.jar:1.13.2] at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21) ~[flink-dist_2.11-1.13.2.jar:1.13.2] at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170) ~[flink-dist_2.11-1.13.2.jar:1.13.2] ... 12 more {code} The expected behavior is that the Flink cluster keeps running, reports something like 'Illegal table type', and gives the user a chance to correct the SQL statement.
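A defensive parse along these lines would produce the expected behavior. The helper is hypothetical (not Hudi's HoodieTableType code): normalize the configured value and, on failure, raise an actionable error listing the valid table types at table-creation time, instead of letting a raw Enum.valueOf crash the operator coordinator:

```java
import java.util.Arrays;
import java.util.Locale;

// Hypothetical validation sketch for the 'table.type' option.
class TableTypeValidation {
    enum HoodieTableType { COPY_ON_WRITE, MERGE_ON_READ }

    static HoodieTableType parse(String configured) {
        String normalized =
                configured == null ? "" : configured.trim().toUpperCase(Locale.ROOT);
        try {
            return HoodieTableType.valueOf(normalized);
        } catch (IllegalArgumentException e) {
            // Fail fast with a clear message the SQL client can surface,
            // instead of crashing the cluster at job start.
            throw new IllegalArgumentException(
                    "Illegal table type '" + configured + "'. Valid values: "
                            + Arrays.toString(HoodieTableType.values()), e);
        }
    }
}
```

Note that trimming alone would not repair a value with an internal line break like 'MERGE_ON_REA D'; the point is that such a value is rejected with a readable message during DDL validation rather than at runtime.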
[jira] [Commented] (HUDI-4485) Hudi cli got empty result for command show fsview all
[ https://issues.apache.org/jira/browse/HUDI-4485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17601562#comment-17601562 ] Yao Zhang commented on HUDI-4485: - Hi [~codope] , Finally all unit test issues have been resolved and CI passed. Could you please help review this PR? Thank you very much. > Hudi cli got empty result for command show fsview all > - > > Key: HUDI-4485 > URL: https://issues.apache.org/jira/browse/HUDI-4485 > Project: Apache Hudi > Issue Type: Bug > Components: cli >Affects Versions: 0.11.1 > Environment: Hudi version : 0.11.1 > Spark version : 3.1.1 > Hive version : 3.1.0 > Hadoop version : 3.1.1 >Reporter: Yao Zhang >Assignee: Yao Zhang >Priority: Major > Labels: pull-request-available > Fix For: 0.13.0 > > Attachments: spring-shell-1.2.0.RELEASE.jar > > > This issue is from: [[SUPPORT] Hudi cli got empty result for command show > fsview all · Issue #6177 · apache/hudi > (github.com)|https://github.com/apache/hudi/issues/6177] > *Describe the problem you faced* > Hudi cli got empty result after running command show fsview all. > ![image]([https://user-images.githubusercontent.com/7007327/180346750-6a55f472-45ac-46cf-8185-3c4fc4c76434.png]) > The type of table t1 is COW and I am sure that the parquet file is actually > generated inside data folder. Also, the parquet files are not damaged as the > data could be retrieved correctly by reading as Hudi table or directly > reading each parquet file(using Spark). > *To Reproduce* > Steps to reproduce the behavior: > 1. Enter Flink SQL client. > 2. Execute the SQL and check the data was written successfully. 
> ```sql > CREATE TABLE t1( > uuid VARCHAR(20), > name VARCHAR(10), > age INT, > ts TIMESTAMP(3), > `partition` VARCHAR(20) > ) > PARTITIONED BY (`partition`) > WITH ( > 'connector' = 'hudi', > 'path' = 'hdfs:///path/to/table/', > 'table.type' = 'COPY_ON_WRITE' > ); > – insert data using values > INSERT INTO t1 VALUES > ('id1','Danny',23,TIMESTAMP '1970-01-01 00:00:01','par1'), > ('id2','Stephen',33,TIMESTAMP '1970-01-01 00:00:02','par1'), > ('id3','Julian',53,TIMESTAMP '1970-01-01 00:00:03','par2'), > ('id4','Fabian',31,TIMESTAMP '1970-01-01 00:00:04','par2'), > ('id5','Sophia',18,TIMESTAMP '1970-01-01 00:00:05','par3'), > ('id6','Emma',20,TIMESTAMP '1970-01-01 00:00:06','par3'), > ('id7','Bob',44,TIMESTAMP '1970-01-01 00:00:07','par4'), > ('id8','Han',56,TIMESTAMP '1970-01-01 00:00:08','par4'); > ``` > 3. Enter Hudi cli and execute `show fsview all` > *Expected behavior* > `show fsview all` in Hudi cli should return all file slices. > *Environment Description* > * Hudi version : 0.11.1 > * Spark version : 3.1.1 > * Hive version : 3.1.0 > * Hadoop version : 3.1.1 > * Storage (HDFS/S3/GCS..) : HDFS > * Running on Docker? (yes/no) : no > *Additional context* > No. > *Stacktrace* > N/A > > Temporary solution: > I modified and recompiled spring-shell 1.2.0.RELEASE. Please download the > attachment and replace the same file in ${HUDI_CLI_DIR}/target/lib/.
[jira] [Created] (HUDI-4765) Compared inserting data via spark-sql with spark-shell,_hoodie_record_key generation logic is different, which might affects data upsert
Yao Zhang created HUDI-4765: --- Summary: Compared inserting data via spark-sql with spark-shell,_hoodie_record_key generation logic is different, which might affects data upsert Key: HUDI-4765 URL: https://issues.apache.org/jira/browse/HUDI-4765 Project: Apache Hudi Issue Type: Bug Components: spark, spark-sql Affects Versions: 0.11.1 Environment: Spark 3.1.1 Hudi 0.11.1 Reporter: Yao Zhang Create table using spark-sql: {code:sql} create table hudi_mor_tbl ( id int, name string, price double, ts bigint ) using hudi tblproperties ( type = 'mor', primaryKey = 'id', preCombineField = 'ts' ) location 'hdfs:///hudi/hudi_mor_tbl'; {code} And then insert data via spark-shell and spark-sql respectively: {code:scala} import org.apache.spark.sql._ import org.apache.spark.sql.types._ val fields = Array( StructField("id", IntegerType, true), StructField("name", StringType, true), StructField("price", DoubleType, true), StructField("ts", LongType, true) ) val simpleSchema = StructType(fields) val data = Seq(Row(2, "a2", 200.0, 100L)) val df = spark.createDataFrame(data, simpleSchema) df.write.format("hudi"). option(PRECOMBINE_FIELD_OPT_KEY, "ts"). option(RECORDKEY_FIELD_OPT_KEY, "id"). option(TABLE_NAME, "hudi_mor_tbl"). option(TABLE_TYPE_OPT_KEY, "MERGE_ON_READ"). mode(Append). save("hdfs:///hudi/hudi_mor_tbl") {code} {code:sql} insert into hudi_mor_tbl select 1, 'a1', 20, 1000; {code} After that, we query the table and can see the two rows below: {code}
+-------------------+--------------------+------------------+----------------------+--------------------+---+----+-----+----+
|_hoodie_commit_time|_hoodie_commit_seqno|_hoodie_record_key|_hoodie_partition_path|   _hoodie_file_name| id|name|price|  ts|
+-------------------+--------------------+------------------+----------------------+--------------------+---+----+-----+----+
|  20220902012710792|20220902012710792...|                 2|                      |c3eff8c8-fa47-48c...|  2|  a2|200.0| 100|
|  20220902012813658|20220902012813658...|              id:1|                      |c3eff8c8-fa47-48c...|  1|  a1| 20.0|1000|
+-------------------+--------------------+------------------+----------------------+--------------------+---+----+-----+----+
{code} The '_hoodie_record_key' field for spark-sql inserted data is 'id:1' while that for spark-shell is '2'. 
It seems that spark-sql uses '[primaryKey_field_name]:[primaryKey_field_value]' to construct the '_hoodie_record_key' field, whereas spark-shell uses the raw primary-key value. As a result, if we insert one row via spark-sql and then upsert it via spark-shell, we get two duplicate rows, which is not what we expect. Did I miss some configuration that might lead to this issue? If not, I personally think we should make the default record key generation logic consistent. -- This message was sent by Atlassian Jira (v8.20.10#820010)
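The mismatch described above can be reduced to two different key-construction rules. The sketch below is purely illustrative (the class and method names are hypothetical, and Hudi's actual key generators are more involved): it only shows why the two string formats can never match each other during upsert.

```java
// Illustrates the two record-key formats observed above: the spark-shell
// write path emits just the primary-key value ("2"), while spark-sql
// prefixes the field name ("id:1"). Because the strings differ, an upsert
// keyed on one format can never match a row written with the other.
public class RecordKeyDemo {
    // spark-shell style: the raw primary-key value
    static String simpleStyleKey(String value) {
        return value;
    }

    // spark-sql style: "<field>:<value>"
    static String namedFieldStyleKey(String field, String value) {
        return field + ":" + value;
    }

    public static void main(String[] args) {
        System.out.println(simpleStyleKey("2"));           // "2"
        System.out.println(namedFieldStyleKey("id", "1")); // "id:1"
    }
}
```

Since `_hoodie_record_key` is the identity used for upsert matching, rows written under one rule are invisible to the other, which produces the duplicates reported above.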
[jira] [Comment Edited] (HUDI-4718) Hudi cli does not support Kerberized Hadoop cluster
[ https://issues.apache.org/jira/browse/HUDI-4718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17584707#comment-17584707 ] Yao Zhang edited comment on HUDI-4718 at 8/25/22 8:26 AM: -- Could anyone assign this issue to me? Thanks. I plan to add this feature after HUDI-4485 is merged, once the Spring Shell version has been bumped to 2.1.1. was (Author: paul8263): Could anyone assign this issue to me? Thanks. > Hudi cli does not support Kerberized Hadoop cluster > --- > > Key: HUDI-4718 > URL: https://issues.apache.org/jira/browse/HUDI-4718 > Project: Apache Hudi > Issue Type: Bug > Components: cli >Reporter: Yao Zhang >Priority: Major > Fix For: 0.13.0 > > > Hudi cli connect command cannot read table from Kerberized Hadoop cluster and > there is no way to perform Kerberos authentication. > I plan to add this feature. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HUDI-4718) Hudi cli does not support Kerberized Hadoop cluster
[ https://issues.apache.org/jira/browse/HUDI-4718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17584707#comment-17584707 ] Yao Zhang commented on HUDI-4718: - Could anyone assign this issue to me? Thanks. > Hudi cli does not support Kerberized Hadoop cluster > --- > > Key: HUDI-4718 > URL: https://issues.apache.org/jira/browse/HUDI-4718 > Project: Apache Hudi > Issue Type: Bug > Components: cli >Reporter: Yao Zhang >Priority: Major > Fix For: 0.13.0 > > > Hudi cli connect command cannot read table from Kerberized Hadoop cluster and > there is no way to perform Kerberos authentication. > I plan to add this feature. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HUDI-4718) Hudi cli does not support Kerberized Hadoop cluster
Yao Zhang created HUDI-4718: --- Summary: Hudi cli does not support Kerberized Hadoop cluster Key: HUDI-4718 URL: https://issues.apache.org/jira/browse/HUDI-4718 Project: Apache Hudi Issue Type: Bug Components: cli Reporter: Yao Zhang Fix For: 0.13.0 Hudi cli connect command cannot read table from Kerberized Hadoop cluster and there is no way to perform Kerberos authentication. I plan to add this feature. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HUDI-4485) Hudi cli got empty result for command show fsview all
[ https://issues.apache.org/jira/browse/HUDI-4485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17582773#comment-17582773 ] Yao Zhang commented on HUDI-4485: - Hi [~codope] , Personally I tend to bump Spring shell to 2.1.1, although it may lead to lots of work. Keeping it updated might help minimize the bugs and make it easier to implement new features. > Hudi cli got empty result for command show fsview all > - > > Key: HUDI-4485 > URL: https://issues.apache.org/jira/browse/HUDI-4485 > Project: Apache Hudi > Issue Type: Bug > Components: cli >Affects Versions: 0.11.1 > Environment: Hudi version : 0.11.1 > Spark version : 3.1.1 > Hive version : 3.1.0 > Hadoop version : 3.1.1 >Reporter: Yao Zhang >Priority: Minor > Fix For: 0.13.0 > > Attachments: spring-shell-1.2.0.RELEASE.jar > > > This issue is from: [[SUPPORT] Hudi cli got empty result for command show > fsview all · Issue #6177 · apache/hudi > (github.com)|https://github.com/apache/hudi/issues/6177] > {*}{{*}}Describe the problem you faced{{*}}{*} > Hudi cli got empty result after running command show fsview all. > ![image]([https://user-images.githubusercontent.com/7007327/180346750-6a55f472-45ac-46cf-8185-3c4fc4c76434.png]) > The type of table t1 is COW and I am sure that the parquet file is actually > generated inside data folder. Also, the parquet files are not damaged as the > data could be retrieved correctly by reading as Hudi table or directly > reading each parquet file(using Spark). > {*}{{*}}To Reproduce{{*}}{*} > Steps to reproduce the behavior: > 1. Enter Flink SQL client. > 2. Execute the SQL and check the data was written successfully. 
> ```sql > CREATE TABLE t1( > uuid VARCHAR(20), > name VARCHAR(10), > age INT, > ts TIMESTAMP(3), > `partition` VARCHAR(20) > ) > PARTITIONED BY (`partition`) > WITH ( > 'connector' = 'hudi', > 'path' = 'hdfs:///path/to/table/', > 'table.type' = 'COPY_ON_WRITE' > ); > – insert data using values > INSERT INTO t1 VALUES > ('id1','Danny',23,TIMESTAMP '1970-01-01 00:00:01','par1'), > ('id2','Stephen',33,TIMESTAMP '1970-01-01 00:00:02','par1'), > ('id3','Julian',53,TIMESTAMP '1970-01-01 00:00:03','par2'), > ('id4','Fabian',31,TIMESTAMP '1970-01-01 00:00:04','par2'), > ('id5','Sophia',18,TIMESTAMP '1970-01-01 00:00:05','par3'), > ('id6','Emma',20,TIMESTAMP '1970-01-01 00:00:06','par3'), > ('id7','Bob',44,TIMESTAMP '1970-01-01 00:00:07','par4'), > ('id8','Han',56,TIMESTAMP '1970-01-01 00:00:08','par4'); > ``` > 3. Enter Hudi cli and execute `show fsview all` > {*}{{*}}Expected behavior{{*}}{*} > `show fsview all` in Hudi cli should return all file slices. > {*}{{*}}Environment Description{{*}}{*} > * Hudi version : 0.11.1 > * Spark version : 3.1.1 > * Hive version : 3.1.0 > * Hadoop version : 3.1.1 > * Storage (HDFS/S3/GCS..) : HDFS > * Running on Docker? (yes/no) : no > {*}{{*}}Additional context{{*}}{*} > No. > {*}{{*}}Stacktrace{{*}}{*} > N/A > > Temporary solution: > I modified and recompiled spring-shell 1.2.0.RELEASE. Please download the > attachment and replace the same file in ${HUDI_CLI_DIR}/target/lib/. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HUDI-4589) "show fsview all" hudi-cli fails for a hudi table written via flink
[ https://issues.apache.org/jira/browse/HUDI-4589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17577812#comment-17577812 ] Yao Zhang commented on HUDI-4589: - Hi [~shivnarayan] and [~xichaomin] , Thank you for your reply. This issue is similar to HUDI-4485. The problem is caused not only by the wrong default value of the pathRegex parameter, but also by a feature of spring-shell 1.2.0.RELEASE, which truncates everything between block comment identifiers on the command line. Please view the details in HUDI-4485. > "show fsview all" hudi-cli fails for a hudi table written via flink > --- > > Key: HUDI-4589 > URL: https://issues.apache.org/jira/browse/HUDI-4589 > Project: Apache Hudi > Issue Type: Bug >Reporter: sivabalan narayanan >Priority: Major > > The type of table t1 is COW and I am sure that the parquet file is actually > generated inside data folder. Also, the parquet files are not damaged as the > data could be retrieved correctly by reading as Hudi table or directly > reading each parquet file(using Spark). > > *To Reproduce* > Steps to reproduce the behavior: > # Enter Flink SQL client. > # Execute the SQL and check the data was written successfully. 
> CREATE TABLE t1( uuid VARCHAR(20), name VARCHAR(10), age INT, ts > TIMESTAMP(3), `partition` VARCHAR(20) ) PARTITIONED BY (`partition`) WITH ( > 'connector' = 'hudi', 'path' = 'hdfs:///path/to/table/', 'table.type' = > 'COPY_ON_WRITE' ); -- insert data using values INSERT INTO t1 VALUES > ('id1','Danny',23,TIMESTAMP '1970-01-01 00:00:01','par1'), > ('id2','Stephen',33,TIMESTAMP '1970-01-01 00:00:02','par1'), > ('id3','Julian',53,TIMESTAMP '1970-01-01 00:00:03','par2'), > ('id4','Fabian',31,TIMESTAMP '1970-01-01 00:00:04','par2'), > ('id5','Sophia',18,TIMESTAMP '1970-01-01 00:00:05','par3'), > ('id6','Emma',20,TIMESTAMP '1970-01-01 00:00:06','par3'), > ('id7','Bob',44,TIMESTAMP '1970-01-01 00:00:07','par4'), > ('id8','Han',56,TIMESTAMP '1970-01-01 00:00:08','par4'); # Enter Hudi cli and > execute {{show fsview all}} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HUDI-4485) Hudi cli got empty result for command show fsview all
[ https://issues.apache.org/jira/browse/HUDI-4485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17577058#comment-17577058 ] Yao Zhang commented on HUDI-4485: - Hi [~codope] , Thank your for your reply. My Jira ID is paul8263. Should we bump spring shell version or fix 1.2.0.RELEASE as what the temporary solution does? > Hudi cli got empty result for command show fsview all > - > > Key: HUDI-4485 > URL: https://issues.apache.org/jira/browse/HUDI-4485 > Project: Apache Hudi > Issue Type: Bug > Components: cli >Affects Versions: 0.11.1 > Environment: Hudi version : 0.11.1 > Spark version : 3.1.1 > Hive version : 3.1.0 > Hadoop version : 3.1.1 >Reporter: Yao Zhang >Priority: Minor > Fix For: 0.13.0 > > Attachments: spring-shell-1.2.0.RELEASE.jar > > > This issue is from: [[SUPPORT] Hudi cli got empty result for command show > fsview all · Issue #6177 · apache/hudi > (github.com)|https://github.com/apache/hudi/issues/6177] > {*}{{*}}Describe the problem you faced{{*}}{*} > Hudi cli got empty result after running command show fsview all. > ![image]([https://user-images.githubusercontent.com/7007327/180346750-6a55f472-45ac-46cf-8185-3c4fc4c76434.png]) > The type of table t1 is COW and I am sure that the parquet file is actually > generated inside data folder. Also, the parquet files are not damaged as the > data could be retrieved correctly by reading as Hudi table or directly > reading each parquet file(using Spark). > {*}{{*}}To Reproduce{{*}}{*} > Steps to reproduce the behavior: > 1. Enter Flink SQL client. > 2. Execute the SQL and check the data was written successfully. 
> ```sql > CREATE TABLE t1( > uuid VARCHAR(20), > name VARCHAR(10), > age INT, > ts TIMESTAMP(3), > `partition` VARCHAR(20) > ) > PARTITIONED BY (`partition`) > WITH ( > 'connector' = 'hudi', > 'path' = 'hdfs:///path/to/table/', > 'table.type' = 'COPY_ON_WRITE' > ); > – insert data using values > INSERT INTO t1 VALUES > ('id1','Danny',23,TIMESTAMP '1970-01-01 00:00:01','par1'), > ('id2','Stephen',33,TIMESTAMP '1970-01-01 00:00:02','par1'), > ('id3','Julian',53,TIMESTAMP '1970-01-01 00:00:03','par2'), > ('id4','Fabian',31,TIMESTAMP '1970-01-01 00:00:04','par2'), > ('id5','Sophia',18,TIMESTAMP '1970-01-01 00:00:05','par3'), > ('id6','Emma',20,TIMESTAMP '1970-01-01 00:00:06','par3'), > ('id7','Bob',44,TIMESTAMP '1970-01-01 00:00:07','par4'), > ('id8','Han',56,TIMESTAMP '1970-01-01 00:00:08','par4'); > ``` > 3. Enter Hudi cli and execute `show fsview all` > {*}{{*}}Expected behavior{{*}}{*} > `show fsview all` in Hudi cli should return all file slices. > {*}{{*}}Environment Description{{*}}{*} > * Hudi version : 0.11.1 > * Spark version : 3.1.1 > * Hive version : 3.1.0 > * Hadoop version : 3.1.1 > * Storage (HDFS/S3/GCS..) : HDFS > * Running on Docker? (yes/no) : no > {*}{{*}}Additional context{{*}}{*} > No. > {*}{{*}}Stacktrace{{*}}{*} > N/A > > Temporary solution: > I modified and recompiled spring-shell 1.2.0.RELEASE. Please download the > attachment and replace the same file in ${HUDI_CLI_DIR}/target/lib/. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-4485) Hudi cli got empty result for command show fsview all
[ https://issues.apache.org/jira/browse/HUDI-4485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yao Zhang updated HUDI-4485: Attachment: spring-shell-1.2.0.RELEASE.jar > Hudi cli got empty result for command show fsview all > - > > Key: HUDI-4485 > URL: https://issues.apache.org/jira/browse/HUDI-4485 > Project: Apache Hudi > Issue Type: Bug > Components: cli >Affects Versions: 0.11.1 > Environment: Hudi version : 0.11.1 > Spark version : 3.1.1 > Hive version : 3.1.0 > Hadoop version : 3.1.1 >Reporter: Yao Zhang >Priority: Minor > Fix For: 0.13.0 > > Attachments: spring-shell-1.2.0.RELEASE.jar > > > This issue is from: [[SUPPORT] Hudi cli got empty result for command show > fsview all · Issue #6177 · apache/hudi > (github.com)|https://github.com/apache/hudi/issues/6177] > **Describe the problem you faced** > Hudi cli got empty result after running command show fsview all. > ![image](https://user-images.githubusercontent.com/7007327/180346750-6a55f472-45ac-46cf-8185-3c4fc4c76434.png) > The type of table t1 is COW and I am sure that the parquet file is actually > generated inside data folder. Also, the parquet files are not damaged as the > data could be retrieved correctly by reading as Hudi table or directly > reading each parquet file(using Spark). > **To Reproduce** > Steps to reproduce the behavior: > 1. Enter Flink SQL client. > 2. Execute the SQL and check the data was written successfully. 
> ```sql > CREATE TABLE t1( > uuid VARCHAR(20), > name VARCHAR(10), > age INT, > ts TIMESTAMP(3), > `partition` VARCHAR(20) > ) > PARTITIONED BY (`partition`) > WITH ( > 'connector' = 'hudi', > 'path' = 'hdfs:///path/to/table/', > 'table.type' = 'COPY_ON_WRITE' > ); > -- insert data using values > INSERT INTO t1 VALUES > ('id1','Danny',23,TIMESTAMP '1970-01-01 00:00:01','par1'), > ('id2','Stephen',33,TIMESTAMP '1970-01-01 00:00:02','par1'), > ('id3','Julian',53,TIMESTAMP '1970-01-01 00:00:03','par2'), > ('id4','Fabian',31,TIMESTAMP '1970-01-01 00:00:04','par2'), > ('id5','Sophia',18,TIMESTAMP '1970-01-01 00:00:05','par3'), > ('id6','Emma',20,TIMESTAMP '1970-01-01 00:00:06','par3'), > ('id7','Bob',44,TIMESTAMP '1970-01-01 00:00:07','par4'), > ('id8','Han',56,TIMESTAMP '1970-01-01 00:00:08','par4'); > ``` > 3. Enter Hudi cli and execute `show fsview all` > **Expected behavior** > `show fsview all` in Hudi cli should return all file slices. > **Environment Description** > * Hudi version : 0.11.1 > * Spark version : 3.1.1 > * Hive version : 3.1.0 > * Hadoop version : 3.1.1 > * Storage (HDFS/S3/GCS..) : HDFS > * Running on Docker? (yes/no) : no > **Additional context** > No. > **Stacktrace** > N/A > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-4485) Hudi cli got empty result for command show fsview all
[ https://issues.apache.org/jira/browse/HUDI-4485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yao Zhang updated HUDI-4485: Description: This issue is from: [[SUPPORT] Hudi cli got empty result for command show fsview all · Issue #6177 · apache/hudi (github.com)|https://github.com/apache/hudi/issues/6177] {*}{{*}}Describe the problem you faced{{*}}{*} Hudi cli got empty result after running command show fsview all. ![image]([https://user-images.githubusercontent.com/7007327/180346750-6a55f472-45ac-46cf-8185-3c4fc4c76434.png]) The type of table t1 is COW and I am sure that the parquet file is actually generated inside data folder. Also, the parquet files are not damaged as the data could be retrieved correctly by reading as Hudi table or directly reading each parquet file(using Spark). {*}{{*}}To Reproduce{{*}}{*} Steps to reproduce the behavior: 1. Enter Flink SQL client. 2. Execute the SQL and check the data was written successfully. ```sql CREATE TABLE t1( uuid VARCHAR(20), name VARCHAR(10), age INT, ts TIMESTAMP(3), `partition` VARCHAR(20) ) PARTITIONED BY (`partition`) WITH ( 'connector' = 'hudi', 'path' = 'hdfs:///path/to/table/', 'table.type' = 'COPY_ON_WRITE' ); – insert data using values INSERT INTO t1 VALUES ('id1','Danny',23,TIMESTAMP '1970-01-01 00:00:01','par1'), ('id2','Stephen',33,TIMESTAMP '1970-01-01 00:00:02','par1'), ('id3','Julian',53,TIMESTAMP '1970-01-01 00:00:03','par2'), ('id4','Fabian',31,TIMESTAMP '1970-01-01 00:00:04','par2'), ('id5','Sophia',18,TIMESTAMP '1970-01-01 00:00:05','par3'), ('id6','Emma',20,TIMESTAMP '1970-01-01 00:00:06','par3'), ('id7','Bob',44,TIMESTAMP '1970-01-01 00:00:07','par4'), ('id8','Han',56,TIMESTAMP '1970-01-01 00:00:08','par4'); ``` 3. Enter Hudi cli and execute `show fsview all` {*}{{*}}Expected behavior{{*}}{*} `show fsview all` in Hudi cli should return all file slices. 
{*}{{*}}Environment Description{{*}}{*} * Hudi version : 0.11.1 * Spark version : 3.1.1 * Hive version : 3.1.0 * Hadoop version : 3.1.1 * Storage (HDFS/S3/GCS..) : HDFS * Running on Docker? (yes/no) : no {*}{{*}}Additional context{{*}}{*} No. {*}{{*}}Stacktrace{{*}}{*} N/A Temporary solution: I modified and recompiled spring-shell 1.2.0.RELEASE. Please download the attachment and replace the same file in ${HUDI_CLI_DIR}/target/lib/. was: This issue is from: [[SUPPORT] Hudi cli got empty result for command show fsview all · Issue #6177 · apache/hudi (github.com)|https://github.com/apache/hudi/issues/6177] *{*}Describe the problem you faced{*}* Hudi cli got empty result after running command show fsview all. ![image]([https://user-images.githubusercontent.com/7007327/180346750-6a55f472-45ac-46cf-8185-3c4fc4c76434.png]) The type of table t1 is COW and I am sure that the parquet file is actually generated inside data folder. Also, the parquet files are not damaged as the data could be retrieved correctly by reading as Hudi table or directly reading each parquet file(using Spark). *{*}To Reproduce{*}* Steps to reproduce the behavior: 1. Enter Flink SQL client. 2. Execute the SQL and check the data was written successfully. 
```sql CREATE TABLE t1( uuid VARCHAR(20), name VARCHAR(10), age INT, ts TIMESTAMP(3), `partition` VARCHAR(20) ) PARTITIONED BY (`partition`) WITH ( 'connector' = 'hudi', 'path' = 'hdfs:///path/to/table/', 'table.type' = 'COPY_ON_WRITE' ); – insert data using values INSERT INTO t1 VALUES ('id1','Danny',23,TIMESTAMP '1970-01-01 00:00:01','par1'), ('id2','Stephen',33,TIMESTAMP '1970-01-01 00:00:02','par1'), ('id3','Julian',53,TIMESTAMP '1970-01-01 00:00:03','par2'), ('id4','Fabian',31,TIMESTAMP '1970-01-01 00:00:04','par2'), ('id5','Sophia',18,TIMESTAMP '1970-01-01 00:00:05','par3'), ('id6','Emma',20,TIMESTAMP '1970-01-01 00:00:06','par3'), ('id7','Bob',44,TIMESTAMP '1970-01-01 00:00:07','par4'), ('id8','Han',56,TIMESTAMP '1970-01-01 00:00:08','par4'); ``` 3. Enter Hudi cli and execute `show fsview all` *{*}Expected behavior{*}* `show fsview all` in Hudi cli should return all file slices. *{*}Environment Description{*}* * Hudi version : 0.11.1 * Spark version : 3.1.1 * Hive version : 3.1.0 * Hadoop version : 3.1.1 * Storage (HDFS/S3/GCS..) : HDFS * Running on Docker? (yes/no) : no *{*}Additional context{*}* No. *{*}Stacktrace{*}* N/A Temporary solution: I modified and reocmpiled spring-shell 1.2.0.RELEASE. Please download the attachment and replace the same file in ${HUDI_CLI_DIR}/target/lib/. > Hudi cli got empty result for command show fsview all > - > > Key: HUDI-4485 > URL: https://issues.apache.org/jira/browse/HUDI-4485 > Project: Apache Hudi > Issue Type: Bug > Components: cli >Affects Versions: 0.11.1 > Environment: Hudi version : 0.11.1 > Spark version : 3.1.1 > Hive version : 3.1.0 > Hadoop version :
[jira] [Updated] (HUDI-4485) Hudi cli got empty result for command show fsview all
[ https://issues.apache.org/jira/browse/HUDI-4485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yao Zhang updated HUDI-4485: Description: This issue is from: [[SUPPORT] Hudi cli got empty result for command show fsview all · Issue #6177 · apache/hudi (github.com)|https://github.com/apache/hudi/issues/6177] *{*}Describe the problem you faced{*}* Hudi cli got empty result after running command show fsview all. ![image]([https://user-images.githubusercontent.com/7007327/180346750-6a55f472-45ac-46cf-8185-3c4fc4c76434.png]) The type of table t1 is COW and I am sure that the parquet file is actually generated inside data folder. Also, the parquet files are not damaged as the data could be retrieved correctly by reading as Hudi table or directly reading each parquet file(using Spark). *{*}To Reproduce{*}* Steps to reproduce the behavior: 1. Enter Flink SQL client. 2. Execute the SQL and check the data was written successfully. ```sql CREATE TABLE t1( uuid VARCHAR(20), name VARCHAR(10), age INT, ts TIMESTAMP(3), `partition` VARCHAR(20) ) PARTITIONED BY (`partition`) WITH ( 'connector' = 'hudi', 'path' = 'hdfs:///path/to/table/', 'table.type' = 'COPY_ON_WRITE' ); – insert data using values INSERT INTO t1 VALUES ('id1','Danny',23,TIMESTAMP '1970-01-01 00:00:01','par1'), ('id2','Stephen',33,TIMESTAMP '1970-01-01 00:00:02','par1'), ('id3','Julian',53,TIMESTAMP '1970-01-01 00:00:03','par2'), ('id4','Fabian',31,TIMESTAMP '1970-01-01 00:00:04','par2'), ('id5','Sophia',18,TIMESTAMP '1970-01-01 00:00:05','par3'), ('id6','Emma',20,TIMESTAMP '1970-01-01 00:00:06','par3'), ('id7','Bob',44,TIMESTAMP '1970-01-01 00:00:07','par4'), ('id8','Han',56,TIMESTAMP '1970-01-01 00:00:08','par4'); ``` 3. Enter Hudi cli and execute `show fsview all` *{*}Expected behavior{*}* `show fsview all` in Hudi cli should return all file slices. 
*{*}Environment Description{*}* * Hudi version : 0.11.1 * Spark version : 3.1.1 * Hive version : 3.1.0 * Hadoop version : 3.1.1 * Storage (HDFS/S3/GCS..) : HDFS * Running on Docker? (yes/no) : no *{*}Additional context{*}* No. *{*}Stacktrace{*}* N/A Temporary solution: I modified and reocmpiled spring-shell 1.2.0.RELEASE. Please download the attachment and replace the same file in ${HUDI_CLI_DIR}/target/lib/. was: This issue is from: [[SUPPORT] Hudi cli got empty result for command show fsview all · Issue #6177 · apache/hudi (github.com)|https://github.com/apache/hudi/issues/6177] **Describe the problem you faced** Hudi cli got empty result after running command show fsview all. ![image](https://user-images.githubusercontent.com/7007327/180346750-6a55f472-45ac-46cf-8185-3c4fc4c76434.png) The type of table t1 is COW and I am sure that the parquet file is actually generated inside data folder. Also, the parquet files are not damaged as the data could be retrieved correctly by reading as Hudi table or directly reading each parquet file(using Spark). **To Reproduce** Steps to reproduce the behavior: 1. Enter Flink SQL client. 2. Execute the SQL and check the data was written successfully. ```sql CREATE TABLE t1( uuid VARCHAR(20), name VARCHAR(10), age INT, ts TIMESTAMP(3), `partition` VARCHAR(20) ) PARTITIONED BY (`partition`) WITH ( 'connector' = 'hudi', 'path' = 'hdfs:///path/to/table/', 'table.type' = 'COPY_ON_WRITE' ); -- insert data using values INSERT INTO t1 VALUES ('id1','Danny',23,TIMESTAMP '1970-01-01 00:00:01','par1'), ('id2','Stephen',33,TIMESTAMP '1970-01-01 00:00:02','par1'), ('id3','Julian',53,TIMESTAMP '1970-01-01 00:00:03','par2'), ('id4','Fabian',31,TIMESTAMP '1970-01-01 00:00:04','par2'), ('id5','Sophia',18,TIMESTAMP '1970-01-01 00:00:05','par3'), ('id6','Emma',20,TIMESTAMP '1970-01-01 00:00:06','par3'), ('id7','Bob',44,TIMESTAMP '1970-01-01 00:00:07','par4'), ('id8','Han',56,TIMESTAMP '1970-01-01 00:00:08','par4'); ``` 3. 
Enter Hudi cli and execute `show fsview all` **Expected behavior** `show fsview all` in Hudi cli should return all file slices. **Environment Description** * Hudi version : 0.11.1 * Spark version : 3.1.1 * Hive version : 3.1.0 * Hadoop version : 3.1.1 * Storage (HDFS/S3/GCS..) : HDFS * Running on Docker? (yes/no) : no **Additional context** No. **Stacktrace** N/A > Hudi cli got empty result for command show fsview all > - > > Key: HUDI-4485 > URL: https://issues.apache.org/jira/browse/HUDI-4485 > Project: Apache Hudi > Issue Type: Bug > Components: cli >Affects Versions: 0.11.1 > Environment: Hudi version : 0.11.1 > Spark version : 3.1.1 > Hive version : 3.1.0 > Hadoop version : 3.1.1 >Reporter: Yao Zhang >Priority: Minor > Fix For: 0.13.0 > > Attachments: spring-shell-1.2.0.RELEASE.jar > > > This issue is from: [[SUPPORT] Hudi cli got empt
[jira] [Commented] (HUDI-4485) Hudi cli got empty result for command show fsview all
[ https://issues.apache.org/jira/browse/HUDI-4485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17572899#comment-17572899 ] Yao Zhang commented on HUDI-4485: - Hi [~codope] , The root cause is that one of hudi-cli's dependencies, Spring Shell 1.2.0, automatically erases everything between '/*' and '*/' (including the identifiers). It seems that this feature cannot be turned off. I am contacting the Spring Shell community for support. If a newer version can solve this problem I would prefer to upgrade. We need to discuss whether bumping the version to 2.1.0 is a good practice. > Hudi cli got empty result for command show fsview all > - > > Key: HUDI-4485 > URL: https://issues.apache.org/jira/browse/HUDI-4485 > Project: Apache Hudi > Issue Type: Bug > Components: cli >Affects Versions: 0.11.1 > Environment: Hudi version : 0.11.1 > Spark version : 3.1.1 > Hive version : 3.1.0 > Hadoop version : 3.1.1 >Reporter: Yao Zhang >Priority: Minor > Fix For: 0.13.0 > > > This issue is from: [[SUPPORT] Hudi cli got empty result for command show > fsview all · Issue #6177 · apache/hudi > (github.com)|https://github.com/apache/hudi/issues/6177] > **Describe the problem you faced** > Hudi cli got empty result after running command show fsview all. > ![image](https://user-images.githubusercontent.com/7007327/180346750-6a55f472-45ac-46cf-8185-3c4fc4c76434.png) > The type of table t1 is COW and I am sure that the parquet file is actually > generated inside data folder. Also, the parquet files are not damaged as the > data could be retrieved correctly by reading as Hudi table or directly > reading each parquet file(using Spark). > **To Reproduce** > Steps to reproduce the behavior: > 1. Enter Flink SQL client. > 2. Execute the SQL and check the data was written successfully. 
> ```sql > CREATE TABLE t1( > uuid VARCHAR(20), > name VARCHAR(10), > age INT, > ts TIMESTAMP(3), > `partition` VARCHAR(20) > ) > PARTITIONED BY (`partition`) > WITH ( > 'connector' = 'hudi', > 'path' = 'hdfs:///path/to/table/', > 'table.type' = 'COPY_ON_WRITE' > ); > -- insert data using values > INSERT INTO t1 VALUES > ('id1','Danny',23,TIMESTAMP '1970-01-01 00:00:01','par1'), > ('id2','Stephen',33,TIMESTAMP '1970-01-01 00:00:02','par1'), > ('id3','Julian',53,TIMESTAMP '1970-01-01 00:00:03','par2'), > ('id4','Fabian',31,TIMESTAMP '1970-01-01 00:00:04','par2'), > ('id5','Sophia',18,TIMESTAMP '1970-01-01 00:00:05','par3'), > ('id6','Emma',20,TIMESTAMP '1970-01-01 00:00:06','par3'), > ('id7','Bob',44,TIMESTAMP '1970-01-01 00:00:07','par4'), > ('id8','Han',56,TIMESTAMP '1970-01-01 00:00:08','par4'); > ``` > 3. Enter Hudi cli and execute `show fsview all` > **Expected behavior** > `show fsview all` in Hudi cli should return all file slices. > **Environment Description** > * Hudi version : 0.11.1 > * Spark version : 3.1.1 > * Hive version : 3.1.0 > * Hadoop version : 3.1.1 > * Storage (HDFS/S3/GCS..) : HDFS > * Running on Docker? (yes/no) : no > **Additional context** > No. > **Stacktrace** > N/A > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Comment Edited] (HUDI-4485) Hudi cli got empty result for command show fsview all
[ https://issues.apache.org/jira/browse/HUDI-4485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17571705#comment-17571705 ] Yao Zhang edited comment on HUDI-4485 at 7/29/22 2:55 AM: -- This problem is caused by the default value of the pathRegex parameter of the command 'show fsview all'. The default value is \*/\*/\*, which corresponds to a folder structure partitioned by two columns. That is to say, the command show fsview all would return empty if the table was not partitioned by two columns. Personally, I plan to change the default value to \*/\* and enrich the parameter explanation. Correct me if I am wrong. Thanks. was (Author: paul8263): This problem is caused by the default value of the pathRegex parameter of the command 'show fsview all'. The default value is */*/*, which corresponds to a folder structure partitioned by two columns. That is to say, the command show fsview all would return empty if the table was not partitioned by two columns. Personally, I plan to change the default value to */* and enrich the parameter explanation. Correct me if I am wrong. Thanks. > Hudi cli got empty result for command show fsview all > - > > Key: HUDI-4485 > URL: https://issues.apache.org/jira/browse/HUDI-4485 > Project: Apache Hudi > Issue Type: Bug > Components: cli >Affects Versions: 0.11.1 > Environment: Hudi version : 0.11.1 > Spark version : 3.1.1 > Hive version : 3.1.0 > Hadoop version : 3.1.1 >Reporter: Yao Zhang >Priority: Minor > Fix For: 0.12.0 > > > This issue is from: [[SUPPORT] Hudi cli got empty result for command show > fsview all · Issue #6177 · apache/hudi > (github.com)|https://github.com/apache/hudi/issues/6177] > **Describe the problem you faced** > Hudi cli got empty result after running command show fsview all. 
> ![image](https://user-images.githubusercontent.com/7007327/180346750-6a55f472-45ac-46cf-8185-3c4fc4c76434.png) > The type of table t1 is COW and I am sure that the parquet file is actually > generated inside data folder. Also, the parquet files are not damaged as the > data could be retrieved correctly by reading as Hudi table or directly > reading each parquet file(using Spark). > **To Reproduce** > Steps to reproduce the behavior: > 1. Enter Flink SQL client. > 2. Execute the SQL and check the data was written successfully. > ```sql > CREATE TABLE t1( > uuid VARCHAR(20), > name VARCHAR(10), > age INT, > ts TIMESTAMP(3), > `partition` VARCHAR(20) > ) > PARTITIONED BY (`partition`) > WITH ( > 'connector' = 'hudi', > 'path' = 'hdfs:///path/to/table/', > 'table.type' = 'COPY_ON_WRITE' > ); > -- insert data using values > INSERT INTO t1 VALUES > ('id1','Danny',23,TIMESTAMP '1970-01-01 00:00:01','par1'), > ('id2','Stephen',33,TIMESTAMP '1970-01-01 00:00:02','par1'), > ('id3','Julian',53,TIMESTAMP '1970-01-01 00:00:03','par2'), > ('id4','Fabian',31,TIMESTAMP '1970-01-01 00:00:04','par2'), > ('id5','Sophia',18,TIMESTAMP '1970-01-01 00:00:05','par3'), > ('id6','Emma',20,TIMESTAMP '1970-01-01 00:00:06','par3'), > ('id7','Bob',44,TIMESTAMP '1970-01-01 00:00:07','par4'), > ('id8','Han',56,TIMESTAMP '1970-01-01 00:00:08','par4'); > ``` > 3. Enter Hudi cli and execute `show fsview all` > **Expected behavior** > `show fsview all` in Hudi cli should return all file slices. > **Environment Description** > * Hudi version : 0.11.1 > * Spark version : 3.1.1 > * Hive version : 3.1.0 > * Hadoop version : 3.1.1 > * Storage (HDFS/S3/GCS..) : HDFS > * Running on Docker? (yes/no) : no > **Additional context** > No. > **Stacktrace** > N/A > -- This message was sent by Atlassian Jira (v8.20.10#820010)
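The effect of the `*/*/*` default described in the comment above can be illustrated with ordinary glob matching. This is a sketch using `java.nio` path matching (hudi-cli actually globs partition paths through the Hadoop FileSystem API, so the class below is only an analogy): each `*` matches exactly one path segment, so a three-segment pattern never matches a table partitioned by a single column.

```java
import java.nio.file.FileSystems;
import java.nio.file.Paths;

// Shows why a default pathRegex of '*/*/*' returns nothing for a table
// partitioned by one column: '*' does not cross directory separators,
// so the pattern requires exactly three path segments.
public class PathRegexDemo {
    static boolean matches(String glob, String path) {
        return FileSystems.getDefault()
                .getPathMatcher("glob:" + glob)
                .matches(Paths.get(path));
    }

    public static void main(String[] args) {
        // single-level partition (as in the Flink t1 table): no match
        System.out.println(matches("*/*/*", "par1/file1.parquet"));    // false
        // two-level partition layout: matches
        System.out.println(matches("*/*/*", "2022/07/file1.parquet")); // true
        // the proposed default '*/*' matches the single-level layout
        System.out.println(matches("*/*", "par1/file1.parquet"));      // true
    }
}
```

This matches the proposal in the comment: changing the default from `*/*/*` to `*/*` makes the common single-column partition layout visible to `show fsview all`.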
[jira] [Comment Edited] (HUDI-4485) Hudi cli got empty result for command show fsview all
[ https://issues.apache.org/jira/browse/HUDI-4485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17572719#comment-17572719 ] Yao Zhang edited comment on HUDI-4485 at 7/29/22 2:53 AM: -- Hi all, After further investigation I found that Spring-shell 1.2.0 deals with block comment in command line. The relevant codes are: org.springframework.shell.core.AbstractShell::executeCommand {code:java} // We support simple block comments; ie a single pair per line if (!inBlockComment && line.contains("/*") && line.contains("*/")) { blockCommentBegin(); String lhs = line.substring(0, line.lastIndexOf("/*")); if (line.contains("*/")) { line = lhs + line.substring(line.lastIndexOf("*/") + 2); blockCommentFinish(); } else { line = lhs; } } if (inBlockComment) { if (!line.contains("*/")) { return new CommandResult(true); } blockCommentFinish(); line = line.substring(line.lastIndexOf("*/") + 2); } // We also support inline comments (but only at start of line, otherwise valid // command options like http://www.helloworld.com will fail as per ROO-517) if (!inBlockComment && (line.trim().startsWith("//") || line.trim().startsWith("#"))) { // # support in ROO-1116 line = ""; } {code} The codes above remove the last occurance of "/* xxx \*/" in side a command line string. That's why we pass '*/*/*' to pathRegex and finally we will get '\*/\*\*'. Moreover, the block comment removal logic above is buggy as in the case of '\*/\*/\*' , the begin comment block identifier is '\*/\*(/\*)' as quoted in the string, also the end comment block identifier is '\*/(\*/)\*'. characters before begin identifier and after end identifier will be kept. That's why we get '\*/\*\*'. Finally, I suggest we should disable erasing block comment in hudi cli command line. Unfortunately, Spring shell 1.2.0 does not provide such as configuration that can disable block comment processing. 
Also, I tried using a converter that appends '/**/' to every command string, but it did not work because Spring Shell processes block comments before invoking converters.
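The transformation described above can be checked with a minimal snippet that mirrors the single-pair branch of the Spring Shell logic (a simplified stand-in, not the actual AbstractShell class). It also shows why, at the string level, appending an empty '/**/' comment would sacrifice itself and leave the original pathRegex intact:

```java
public class BlockCommentDemo {
    // Mirrors the "begin and end on the same line" branch of
    // AbstractShell::executeCommand: cut from the last "/*" and
    // resume two characters after the last "*/".
    static String stripBlockComment(String line) {
        if (line.contains("/*") && line.contains("*/")) {
            String lhs = line.substring(0, line.lastIndexOf("/*"));
            line = lhs + line.substring(line.lastIndexOf("*/") + 2);
        }
        return line;
    }

    public static void main(String[] args) {
        // The default pathRegex loses its middle and becomes '*/**'
        System.out.println(stripBlockComment("*/*/*"));        // prints */**
        // An appended empty comment is consumed instead, keeping '*/*/*'
        System.out.println(stripBlockComment("*/*/*" + "/**/")); // prints */*/*
    }
}
```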
> Hudi cli got empty result for command show fsview all
> -
>
> Key: HUDI-4485
> URL: https://issues.apache.org/jira/browse/HUDI-4485
> Project: Apache Hudi
> Issue Type: Bug
> Components: cli
> Affects Versions: 0.11.1
> Environment: Hudi version : 0.11.1
> Spark version : 3.1.1
> Hive version : 3.1.0
> Hadoop version : 3.1.1
> Reporter: Yao Zhang
> Priority: Minor
> Fix For: 0.12.0
>
> This issue is from: [[SUPPORT] Hudi cli got empty result for command show fsview all · Issue #6177 · apache/hudi (github.com)|https://github.com/apache/hudi/issues/6177]
> **Describe the problem you faced**
> Hudi cli got empty result after running command show fsview all.
[jira] [Updated] (HUDI-4485) Hudi cli got empty result for command show fsview all
[ https://issues.apache.org/jira/browse/HUDI-4485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yao Zhang updated HUDI-4485: Status: In Progress (was: Open)
[jira] [Commented] (HUDI-4485) Hudi cli got empty result for command show fsview all
[ https://issues.apache.org/jira/browse/HUDI-4485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17571705#comment-17571705 ] Yao Zhang commented on HUDI-4485:
--
This problem is caused by the default value of the pathRegex parameter of the command 'show fsview all'. The default value is */*/*, which corresponds to a folder structure partitioned by two columns. That is to say, show fsview all returns an empty result if the table is not partitioned by two columns. Personally, I plan to change the default value to */* and enrich the parameter explanation. Correct me if I am wrong. Thanks.
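The segment-counting behavior behind this can be illustrated with java.nio glob matching (only a stand-in for whatever matcher Hudi actually uses; the path "par1/f1.parquet" is a hypothetical partition-relative layout for a table partitioned by one column, as in the t1 example):

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

public class PathRegexDemo {
    public static void main(String[] args) {
        // One partition column: base-path-relative files look like partition/file
        Path oneLevel = Paths.get("par1/f1.parquet");

        // Each '*' in a glob matches within a single path segment, so
        // '*/*/*' requires exactly three segments (two partition levels + file)
        PathMatcher threeSegments = FileSystems.getDefault().getPathMatcher("glob:*/*/*");
        PathMatcher twoSegments = FileSystems.getDefault().getPathMatcher("glob:*/*");

        System.out.println(threeSegments.matches(oneLevel)); // false -> empty fsview
        System.out.println(twoSegments.matches(oneLevel));   // true
    }
}
```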
[jira] [Commented] (HUDI-4485) Hudi cli got empty result for command show fsview all
[ https://issues.apache.org/jira/browse/HUDI-4485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17571699#comment-17571699 ] Yao Zhang commented on HUDI-4485:
--
Could someone assign this issue to me? Thanks.
[jira] [Created] (HUDI-4485) Hudi cli got empty result for command show fsview all
Yao Zhang created HUDI-4485:
---
Summary: Hudi cli got empty result for command show fsview all
Key: HUDI-4485
URL: https://issues.apache.org/jira/browse/HUDI-4485
Project: Apache Hudi
Issue Type: Bug
Components: cli
Affects Versions: 0.11.1
Environment: Hudi version : 0.11.1
Spark version : 3.1.1
Hive version : 3.1.0
Hadoop version : 3.1.1
Reporter: Yao Zhang
Fix For: 0.12.0