I think the discussion is already happening in the PR itself. Please chime in there as well if that suits you.
On Sat, May 2, 2020 at 6:01 AM Shiyan Xu <xu.shiyan.raym...@gmail.com> wrote:
> Hi all,
>
> When reading from a schema-inferable source such as parquet and no new
> data is found, then, if I understand correctly, no schema can be inferred,
> and none needs to be.
>
> The method org.apache.hudi.utilities.sources.InputBatch#getSchemaProvider
> requires a non-null schemaProvider, and
> org.apache.hudi.utilities.deltastreamer.DeltaSync#readFromSource calls
> getSchemaProvider() in all cases, including the no-new-data case. As a
> result, an exception is thrown asking to set a schema provider, even when
> reading from a schema-inferable parquet source. I think this is not ideal.
>
> I have a short draft PR that accepts a null schema provider when there is
> no new data:
> https://github.com/apache/incubator-hudi/pull/1584/files
> I actually prefer another approach: having getSchemaProvider() return
> Option<SchemaProvider>.
>
> In case I have misunderstood the logic or use case, I'd like to ask for
> some feedback on this change.
>
> Thank you.
>
> Regards,
> Raymond
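
For anyone following along, the Option-returning approach mentioned in the quoted message could look roughly like the sketch below. This is only an illustration under stated assumptions, not the code in the PR: the classes are simplified stand-ins rather than the real Hudi ones, java.util.Optional is used in place of Hudi's own Option type to keep the example self-contained, and the describe() helper and the error message are hypothetical.

```java
import java.util.Optional;

public class InputBatchSketch {

    /** Hypothetical stand-in for Hudi's SchemaProvider; only one method is modelled. */
    public interface SchemaProvider {
        String getSourceSchema();
    }

    /** Simplified stand-in for InputBatch; not the real Hudi class. */
    public static final class InputBatch<T> {
        private final Optional<T> batch;
        private final String checkpointForNextBatch;
        private final SchemaProvider schemaProvider; // may be null when there is no new data

        public InputBatch(Optional<T> batch, String checkpointForNextBatch, SchemaProvider schemaProvider) {
            this.batch = batch;
            this.checkpointForNextBatch = checkpointForNextBatch;
            this.schemaProvider = schemaProvider;
        }

        public Optional<T> getBatch() {
            return batch;
        }

        public String getCheckpointForNextBatch() {
            return checkpointForNextBatch;
        }

        /** Returns an empty Optional instead of throwing when no provider was set. */
        public Optional<SchemaProvider> getSchemaProvider() {
            return Optional.ofNullable(schemaProvider);
        }
    }

    /** Hypothetical caller: only insists on a schema when there is data to process. */
    public static String describe(InputBatch<String> input) {
        if (!input.getBatch().isPresent()) {
            return "no new data, schema not needed";
        }
        return input.getSchemaProvider()
                .map(provider -> "schema: " + provider.getSourceSchema())
                .orElseThrow(() -> new IllegalStateException("schema provider required for a non-empty batch"));
    }

    public static void main(String[] args) {
        // No new data: a missing schema provider is tolerated.
        InputBatch<String> empty = new InputBatch<>(Optional.empty(), "ckpt-1", null);
        System.out.println(describe(empty));

        // New data: the provider is present and is used.
        InputBatch<String> full = new InputBatch<>(Optional.of("rows"), "ckpt-2", () -> "record");
        System.out.println(describe(full));
    }
}
```

With this shape, the caller decides whether a missing provider is an error, so the no-new-data path can skip the schema entirely while a non-empty batch without a provider still fails loudly.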