[jira] [Updated] (HUDI-5557) Wrong candidate files found in metadata table

Alexey Kudinkin (Jira) Fri, 17 Feb 2023 09:10:07 -0800


     [ 
https://issues.apache.org/jira/browse/HUDI-5557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Alexey Kudinkin updated HUDI-5557:
----------------------------------
    Priority: Blocker  (was: Critical)

> Wrong candidate files found in metadata table 
> ----------------------------------------------
>
>                 Key: HUDI-5557
>                 URL: https://issues.apache.org/jira/browse/HUDI-5557
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: metadata, spark-sql
>    Affects Versions: 0.12.1
>            Reporter: ruofan
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 0.13.1
>
>
> Suppose a Hudi table has five fields, but only two of them are indexed. When 
> one part of a SQL filter condition references indexed fields and another 
> part references non-indexed fields, the candidate files returned from the 
> metadata table are wrong.
> For example, consider the following Hudi table schema:
> {code:java}
> name: varchar(128)
> age: int
> addr: varchar(128)
> city: varchar(32)
> job: varchar(32) {code}
> Table properties:
> {code:java}
> hoodie.table.type=MERGE_ON_READ
> hoodie.metadata.enable=true
> hoodie.metadata.index.column.stats.enable=true
> hoodie.metadata.index.column.stats.column.list='name,city'
> hoodie.enable.data.skipping=true {code}
> SQL query:
> {code:java}
> select * from hudi_table where name='tom' and age=18;  {code}
> If we set hoodie.enable.data.skipping=false, the query returns the expected 
> data. But if we set hoodie.enable.data.skipping=true, the expected data is 
> not found.
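
The behaviour the report expects can be illustrated with a minimal sketch. This is not Hudi's implementation and makes no claim about the root cause of the bug: the names FileStats, may_match and candidate_files are hypothetical, and the file names and min/max ranges are made up for illustration. The one assumption it encodes is that a filter on a column without column stats (here, age) cannot prove anything about a file, so it must keep every file rather than exclude it.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class FileStats:
    """Hypothetical per-file min/max ranges, recorded for indexed columns only."""
    name: str
    ranges: Dict[str, Tuple[object, object]]


def may_match(stats: FileStats, column: str, value) -> bool:
    """Return False only when the stats prove the file cannot hold column == value."""
    if column not in stats.ranges:
        # No stats for this column: nothing can be proven, so keep the file.
        return True
    lo, hi = stats.ranges[column]
    return lo <= value <= hi


def candidate_files(files: List[FileStats], equality_filters: Dict[str, object]) -> List[str]:
    """Files that may satisfy every `column == value` filter (AND semantics)."""
    return [
        f.name
        for f in files
        if all(may_match(f, col, val) for col, val in equality_filters.items())
    ]


# Illustrative stats: only name and city are indexed, as in the report.
files = [
    FileStats("file_a", {"name": ("alice", "mike"), "city": ("berlin", "paris")}),
    FileStats("file_b", {"name": ("sam", "zoe"), "city": ("austin", "rome")}),
]

# Mirrors: select * from hudi_table where name='tom' and age=18
result = candidate_files(files, {"name": "tom", "age": 18})
```

Under these assumptions, the filter on name narrows the candidates while the filter on age leaves them unchanged, so a file that may contain name='tom' is never dropped because of the non-indexed column.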



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
