I saw the talk on Spark data sources, and looking at the interfaces, it seems that the schema needs to be provided upfront. This works for many data sources, but I need to integrate a system that supports schema evolution by letting users change the schema without affecting existing rows. Each row carries a schema hint (an id and a version), which lets developers evolve the schema over time and perform migrations at will.

Since the data source API expects the schema upfront, one workaround would be to build a union of all schema versions and populate row values accordingly. This handles the cases where columns have been added or deleted across versions, but breaks down when a column's type has changed.

Would it be possible to change the API so that the schema can be provided per row, instead of requiring the data source to declare it upfront?
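To make the limitation concrete, here is a small sketch of the union-of-schemas workaround. It uses plain Python dicts (column name to type name) rather than the real Spark API, and all names in it are hypothetical; the point is only that merging succeeds for added/removed columns but has no sensible answer when a type changes:

```python
def union_schemas(versions):
    """Merge several schema versions into one superset schema.

    Works when columns are only added or removed across versions,
    but raises when a column's type changed between versions.
    """
    merged = {}
    for schema in versions:
        for col, typ in schema.items():
            if col in merged and merged[col] != typ:
                # No single upfront schema can represent both types.
                raise TypeError(
                    f"column {col!r} changed type: {merged[col]} vs {typ}")
            merged[col] = typ
    return merged

v1 = {"id": "long", "name": "string"}
v2 = {"id": "long", "name": "string", "email": "string"}  # column added
print(union_schemas([v1, v2]))
# -> {'id': 'long', 'name': 'string', 'email': 'string'}

v3 = {"id": "string", "name": "string"}  # 'id' changed long -> string
try:
    union_schemas([v1, v3])
except TypeError as e:
    print("conflict:", e)
```

A per-row schema API would sidestep exactly the conflict branch above, which is why the union approach only covers part of the evolution story.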
Thanks, Aniket