[ https://issues.apache.org/jira/browse/HUDI-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Balaji Varadarajan updated HUDI-1157:
-------------------------------------
    Parent: HUDI-1265  (was: HUDI-242)

> Optimization whether to query Bootstrapped table using
> HoodieBootstrapRelation vs Sparks Parquet datasource
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: HUDI-1157
>                 URL: https://issues.apache.org/jira/browse/HUDI-1157
>             Project: Apache Hudi
>          Issue Type: Sub-task
>          Components: bootstrap
>            Reporter: Udit Mehrotra
>            Priority: Major
>
> This has been discussed in
> [https://github.com/apache/hudi/pull/1702#discussion_r466317612]
> As of now, while querying using *DataSource*, we check whether the table has
> been bootstrapped by the presence of *bootstrap base path* in the
> *hoodie.properties* file, and based on that we query the table using either
> *HoodieBootstrapRelation* or *Spark Parquet Data Source*. However, there
> could be a scenario where all the files in the originally bootstrapped table
> have either been *upserted/deleted* and thus have been fully bootstrapped,
> with their data moved over to the target hoodie table. For such tables,
> we can start querying them using *Spark Parquet Data Source*, which will be
> faster with all of Spark's optimizations.
> So, basically, we need a way to check whether all of the files have been fully
> bootstrapped and moved over to the target location.
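
The check described in the issue could look roughly like the sketch below. It is only an illustration under stated assumptions, not Hudi code: `LatestBaseFile`, `is_fully_bootstrapped`, `choose_relation` and the returned labels are all hypothetical names, and the model assumes that each file group's latest base file records whether it still depends on a source file under the bootstrap base path.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class LatestBaseFile:
    """Hypothetical stand-in for the latest base file of one file group."""
    path: str
    # Source file under the bootstrap base path that this file still depends
    # on, or None once its data has been rewritten into the target table.
    bootstrap_source_path: Optional[str] = None


def is_fully_bootstrapped(latest_base_files: Iterable[LatestBaseFile]) -> bool:
    """True when no latest base file still points at a bootstrap source file."""
    return all(f.bootstrap_source_path is None for f in latest_base_files)


def choose_relation(bootstrap_base_path: Optional[str],
                    latest_base_files: Iterable[LatestBaseFile]) -> str:
    """Pick the read path; the returned labels are illustrative only."""
    if bootstrap_base_path is None:
        return "parquet"    # table was never bootstrapped
    if is_fully_bootstrapped(latest_base_files):
        return "parquet"    # every source file has been upserted or deleted
    return "bootstrap"      # some file still needs HoodieBootstrapRelation
```

With this model, a table whose *hoodie.properties* still carries a bootstrap base path would only be routed to *HoodieBootstrapRelation* while at least one latest base file keeps a bootstrap source reference. How cheaply that per-file information can be obtained at query time is the open question of this issue and is not addressed by the sketch.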