I agree with adding a table capability for this. This is something that we support in our Spark branch so that users can evolve tables without breaking existing ETL jobs -- when you add an optional column, it shouldn't fail the existing pipeline writing data to a table. I can contribute the changes to validation if people are interested.
On Wed, May 13, 2020 at 2:57 PM Russell Spitzer <russell.spit...@gmail.com> wrote:

> In DSV1 this was pretty easy to do because the burden of verification for
> writes had to be in the datasource; the new setup makes partial writes
> difficult.
>
> resolveOutputColumns checks the table schema against the write plan's
> output and will fail any request that doesn't contain every column
> specified in the table schema.
>
> I would like it if instead we either made this check optional for a
> datasource, perhaps with an "allow partial writes" trait for the table, or
> just allowed analysis to fail in "withInputDataSchema", where an
> implementer could throw exceptions on underspecified writes.
>
> The use case here is that C* (and many other sinks) have mandated columns
> that must be present during an insert, as well as those which are not
> required.
>
> Please let me know if I've misread this.
>
> Thanks for your time again,
> Russ

--
Ryan Blue
Software Engineer
Netflix
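To make the proposal concrete, here is a minimal sketch of the validation semantics being discussed. This is not Spark's actual API: the `Column`, `strictMissing`, and `partialMissing` names are hypothetical, and the example only models the difference between today's strict check (every table column must appear in the write plan's output) and a relaxed "allow partial writes" check (only mandated columns must appear, so adding an optional column doesn't break existing jobs).

```java
import java.util.*;

// Hypothetical illustration only -- not Spark's resolveOutputColumns.
// Models strict vs. partial write validation against a table schema.
public class PartialWriteValidation {

    // A table column: its name plus whether the sink mandates it on insert
    // (e.g. a Cassandra primary key column would be required).
    record Column(String name, boolean required) {}

    // Today's behavior: every table column must be present in the write.
    static List<String> strictMissing(List<Column> table, Set<String> write) {
        List<String> missing = new ArrayList<>();
        for (Column c : table) {
            if (!write.contains(c.name())) missing.add(c.name());
        }
        return missing;
    }

    // Proposed relaxation: only required columns must be present.
    static List<String> partialMissing(List<Column> table, Set<String> write) {
        List<String> missing = new ArrayList<>();
        for (Column c : table) {
            if (c.required() && !write.contains(c.name())) missing.add(c.name());
        }
        return missing;
    }

    public static void main(String[] args) {
        List<Column> table = List.of(
            new Column("id", true),      // mandated by the sink
            new Column("name", true),
            new Column("notes", false)); // newly added optional column

        // An existing ETL job that predates the "notes" column:
        Set<String> write = Set.of("id", "name");

        System.out.println(strictMissing(table, write));  // [notes] -> strict check fails the write
        System.out.println(partialMissing(table, write)); // []      -> partial write is accepted
    }
}
```

Under this model, a table could opt in to the relaxed check via a capability or trait, while sinks with mandated columns would still get an analysis-time error when a required column is missing.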