ACCEPT_ANY_SCHEMA isn't a good way to solve the problem because you often want at least some checking in Spark to validate that the rows match the table. It's a good way to get unblocked, but not a long-term solution.
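
To make the trade-off concrete, here is a rough sketch of the kind of check a sink would have to do for itself once Spark's schema validation is bypassed: reject unknown columns and missing required columns, but allow optional ones to be omitted. It is plain Java with no Spark dependency; PartialWriteCheck, validate, and the column names are hypothetical and are not part of Spark's API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for a sink-side check; not part of the Spark API. */
public class PartialWriteCheck {

    /**
     * tableColumns maps column name to true if the sink requires it on insert.
     * Returns human-readable problems; an empty list means the write is acceptable.
     */
    public static List<String> validate(Map<String, Boolean> tableColumns,
                                        List<String> writeColumns) {
        List<String> problems = new ArrayList<>();
        Set<String> written = new HashSet<>(writeColumns);

        // Every column being written must exist in the table.
        for (String name : writeColumns) {
            if (!tableColumns.containsKey(name)) {
                problems.add("unknown column: " + name);
            }
        }
        // Required columns must be present; optional ones may be omitted.
        for (Map.Entry<String, Boolean> column : tableColumns.entrySet()) {
            if (column.getValue() && !written.contains(column.getKey())) {
                problems.add("missing required column: " + column.getKey());
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        Map<String, Boolean> table = new LinkedHashMap<>();
        table.put("id", true);    // required
        table.put("ts", true);    // required
        table.put("note", false); // optional

        // Partial write that omits only the optional column: prints []
        System.out.println(validate(table, Arrays.asList("id", "ts")));
        // Prints [unknown column: extra, missing required column: ts]
        System.out.println(validate(table, Arrays.asList("id", "note", "extra")));
    }
}
```

With ACCEPT_ANY_SCHEMA, logic of this sort ends up living in each source rather than in Spark's analyzer, which is the duplication the paragraph above is concerned about.
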

On Thu, May 14, 2020 at 4:57 AM Russell Spitzer <russell.spit...@gmail.com> wrote:

> Yeah! That is working for me. Thanks!
>
> On Thu, May 14, 2020 at 12:10 AM Wenchen Fan <cloud0...@gmail.com> wrote:
>
>> I think we already have this table capability: ACCEPT_ANY_SCHEMA. Can you
>> try that?
>>
>> On Thu, May 14, 2020 at 6:17 AM Russell Spitzer <
>> russell.spit...@gmail.com> wrote:
>>
>>> I would really appreciate that. I'm probably going to just write a
>>> planner rule for now which matches up my table schema with the query output
>>> if they are valid, and fails analysis otherwise. This approach is how I got
>>> metadata columns in, so I believe it would work for writing as well.
>>>
>>> On Wed, May 13, 2020 at 5:13 PM Ryan Blue <rb...@netflix.com> wrote:
>>>
>>>> I agree with adding a table capability for this. This is something that
>>>> we support in our Spark branch so that users can evolve tables without
>>>> breaking existing ETL jobs -- when you add an optional column, it shouldn't
>>>> fail the existing pipeline writing data to a table. I can contribute the
>>>> changes to validation if people are interested.
>>>>
>>>> On Wed, May 13, 2020 at 2:57 PM Russell Spitzer <
>>>> russell.spit...@gmail.com> wrote:
>>>>
>>>>> In DSV1 this was pretty easy to do because the burden of
>>>>> verification for writes had to be in the datasource; the new setup makes
>>>>> partial writes difficult.
>>>>>
>>>>> resolveOutputColumns checks the table schema against the write plan's
>>>>> output and will fail any requests which don't contain every column as
>>>>> specified in the table schema.
>>>>> I would like it if we either made this check optional for a
>>>>> datasource, perhaps with an "allow partial writes" trait for the table,
>>>>> or just allowed analysis to fail on "withInputDataSchema", where an
>>>>> implementer could throw exceptions on underspecified writes.
>>>>>
>>>>>
>>>>> The use case here is that C* (and many other sinks) have mandated
>>>>> columns that must be present during an insert as well as those
>>>>> which are not required.
>>>>>
>>>>> Please let me know if I've misread this,
>>>>>
>>>>> Thanks for your time again,
>>>>> Russ
>>>>>
>>>>
>>>>
>>>> --
>>>> Ryan Blue
>>>> Software Engineer
>>>> Netflix
>>>>
>>>

--
Ryan Blue
Software Engineer
Netflix