I think the discussion is taking place on the PR itself.
Please chime in there as well if that suits you.

On Sat, May 2, 2020 at 6:01 AM Shiyan Xu <xu.shiyan.raym...@gmail.com>
wrote:

> Hi all,
>
> When reading from a schema-inferable source such as parquet and no new data
> is found, then, if I understand correctly, no schema can be inferred, and
> none is needed.
>
> The method org.apache.hudi.utilities.sources.InputBatch#getSchemaProvider
> requires a non-null schemaProvider, and
> org.apache.hudi.utilities.deltastreamer.DeltaSync#readFromSource calls
> getSchemaProvider() in all cases, including the no-new-data case. As a
> result, an exception is thrown asking the user to set a schema provider,
> even when reading from a schema-inferable parquet source. I don't think
> this is ideal.
>
> I have a short draft PR that accepts a null schema provider when there is
> no new data:
> https://github.com/apache/incubator-hudi/pull/1584/files
> I would actually prefer another approach: having getSchemaProvider() return
> Option<SchemaProvider>.
>
> In case I have misunderstood the logic or the use case, I'd like to ask for
> some feedback on this change.
>
> Thank you.
>
> Regards,
> Raymond
>
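For readers following the thread, the second approach Raymond mentions (an optional return value from getSchemaProvider()) could look roughly like the sketch below. It is only an illustration under stated assumptions: the InputBatch and SchemaProvider types are simplified, hypothetical stand-ins rather than the real Hudi classes, java.util.Optional is used in place of Hudi's own Option type so the example is self-contained, and the exception message is made up.

```java
import java.util.Optional;

public class OptionalSchemaProviderSketch {

    /** Hypothetical stand-in for Hudi's SchemaProvider, reduced to one method. */
    interface SchemaProvider {
        String getSourceSchema();
    }

    /** Hypothetical, simplified InputBatch; not the real Hudi class. */
    static final class InputBatch<T> {
        private final T batch;                        // null when no new data was read
        private final String checkpointForNextBatch;
        private final SchemaProvider schemaProvider;  // null when no schema could be inferred

        InputBatch(T batch, String checkpointForNextBatch, SchemaProvider schemaProvider) {
            this.batch = batch;
            this.checkpointForNextBatch = checkpointForNextBatch;
            this.schemaProvider = schemaProvider;
        }

        Optional<T> getBatch() {
            return Optional.ofNullable(batch);
        }

        String getCheckpointForNextBatch() {
            return checkpointForNextBatch;
        }

        // The proposed change: absence is a legal result instead of an error.
        Optional<SchemaProvider> getSchemaProvider() {
            return Optional.ofNullable(schemaProvider);
        }
    }

    /** Caller-side logic: a schema is only demanded when there is data to process. */
    static String describe(InputBatch<String> input) {
        if (!input.getBatch().isPresent()) {
            return "no new data, schema not needed";
        }
        SchemaProvider provider = input.getSchemaProvider()
            .orElseThrow(() -> new IllegalStateException(
                "schema provider required when data is present")); // made-up message
        return "schema: " + provider.getSourceSchema();
    }

    public static void main(String[] args) {
        InputBatch<String> empty = new InputBatch<>(null, "ckpt-1", null);
        InputBatch<String> withData =
            new InputBatch<>("rows", "ckpt-2", () -> "{\"type\":\"record\"}");
        System.out.println(describe(empty));
        System.out.println(describe(withData));
    }
}
```

With this shape, the no-new-data path never touches the schema provider, and the missing-provider check moves to the one place where a schema is actually needed.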
