[GitHub] [hudi] liqiquan opened a new issue, #6511: [SUPPORT]

GitBox Fri, 26 Aug 2022 02:12:19 -0700


liqiquan opened a new issue, #6511:
URL: https://github.com/apache/hudi/issues/6511


   **_Tips before filing an issue_**
   
   - Have you gone through our [FAQs](https://hudi.apache.org/learn/faq/)?
   
   - Join the mailing list to engage in conversations and get faster support at 
dev-subscr...@hudi.apache.org.
   
   - If you have triaged this as a bug, then file an 
[issue](https://issues.apache.org/jira/projects/HUDI/issues) directly.
   
   **Describe the problem you faced**
   
   When a Hudi table is written in insert_overwrite_table mode, Presto reads and 
returns data from all versions of the parquet files instead of only the latest 
version.
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1. Write the Hudi table at least twice using the insert_overwrite_table mode.
   2. Read the table from step 1 with Presto. If the catalog is hudi, reading 
the Hudi table works normally.
   3. Read the table from step 1 with Presto. If the catalog is hive, the 
versions cannot be distinguished when reading the Hudi table, and the data of 
all versions of the parquet files is read.
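
Step 1 could be sketched in PySpark roughly as follows. This is a minimal sketch, not the reporter's actual job: the table name, key fields, and path are hypothetical placeholders, and `df` is assumed to be an existing Spark DataFrame in a session with the Hudi bundle on the classpath.

```python
def hudi_overwrite_options(table_name, record_key, precombine_field):
    """Build Hudi write options for an insert_overwrite_table write.

    All argument values are hypothetical placeholders chosen by the caller.
    """
    return {
        "hoodie.table.name": table_name,
        # The write operation named in the issue.
        "hoodie.datasource.write.operation": "insert_overwrite_table",
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.precombine.field": precombine_field,
    }


def write_overwrite(df, base_path, options):
    """Write `df` (assumed to be a Spark DataFrame) to `base_path` as a Hudi table."""
    df.write.format("hudi").options(**options).mode("append").save(base_path)


# Hypothetical values for illustration only.
options = hudi_overwrite_options("demo_table", "id", "ts")
print(options["hoodie.datasource.write.operation"])
```

Calling `write_overwrite(df, "/tmp/demo_table", options)` twice with 100 rows each would correspond to the two writes described in step 1.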
   
   For example, I wrote twice, 100 records each time. When reading with Presto, 
it should return the 100 records of the latest version, but it actually returns 
all 200 records.
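
The arithmetic in the example above can be illustrated with a toy model. This is not Hudi's or Presto's actual implementation; the `ParquetFile` type and the commit timestamps are hypothetical. It only contrasts a reader that scans every parquet file with one that keeps only the files from the latest overwrite.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParquetFile:
    commit_time: str  # hypothetical commit timestamp of the write
    row_count: int


def count_all_files(files):
    """Count rows the way a version-unaware reader would: scan every file."""
    return sum(f.row_count for f in files)


def count_latest_version(files):
    """Count rows only from files written by the most recent overwrite."""
    if not files:
        return 0
    latest = max(f.commit_time for f in files)
    return sum(f.row_count for f in files if f.commit_time == latest)


# Two insert_overwrite_table writes of 100 records each.
files = [
    ParquetFile("20220826010000", 100),
    ParquetFile("20220826020000", 100),
]
print(count_all_files(files))       # 200, the observed result
print(count_latest_version(files))  # 100, the expected result
```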
   
   **Expected behavior**
   
   Presto should return only the data of the latest version (100 records in the 
example above), whether the catalog is hive or hudi.
   
   **Environment Description**
   
   * Hudi version : 0.11.1
   
   * Spark version : 3.2.2
   
   * Hive version : 2.7.3
   
   * Hadoop version : 3.3.2
   
   * Presto version : 0.275
   
   * Storage (HDFS/S3/GCS..) :
   
   * Running on Docker? (yes/no) :
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
