Re: [DatasourceV2] Allowing Partial Writes to DSV2 Tables

2020-05-14 Thread Ryan Blue
ACCEPT_ANY_SCHEMA isn't a good way to solve the problem because you often want at least some checking in Spark to validate that the rows match. It's a good way to be unblocked, but not a long-term solution.

On Thu, May 14, 2020 at 4:57 AM Russell Spitzer wrote:
> Yeah! That is working for me.
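Ryan's concern can be illustrated with a simplified model. As we understand it, when a table reports ACCEPT_ANY_SCHEMA, Spark skips its own output-schema check and the source alone must catch mistakes. The Python below is a hypothetical sketch, not Spark code: `resolve_write`, its `accept_any_schema` flag and `AnalysisError` are invented names, and the "checked" branch models a partial write that allows a subset of columns but still rejects unknown ones.

```python
from __future__ import annotations


class AnalysisError(Exception):
    """Hypothetical stand-in for a failed analysis."""


def resolve_write(table_columns: list[str], query_columns: list[str],
                  accept_any_schema: bool) -> list[str]:
    """Simplified model of where schema checking happens for a write."""
    if accept_any_schema:
        # No Spark-side check: the query output goes to the source as-is,
        # so the source alone must catch mistakes.
        return list(query_columns)
    # Hypothetical checked partial write: writing a subset of columns is
    # allowed, but every column written must exist in the table.
    unknown = [c for c in query_columns if c not in table_columns]
    if unknown:
        raise AnalysisError(f"Unknown columns: {unknown}")
    return list(query_columns)


table_columns = ["id", "name", "comment"]
query_columns = ["id", "comentt"]  # typo in the job's output

print(resolve_write(table_columns, query_columns, accept_any_schema=True))
# ['id', 'comentt'] reaches the source unchecked

try:
    resolve_write(table_columns, query_columns, accept_any_schema=False)
except AnalysisError as e:
    print(e)  # Unknown columns: ['comentt']
```

The misspelled column passes silently in the first call and is rejected in the second, which is the kind of validation Ryan wants to keep in Spark.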

Re: [DatasourceV2] Allowing Partial Writes to DSV2 Tables

2020-05-14 Thread Russell Spitzer
Yeah! That is working for me. Thanks!

On Thu, May 14, 2020 at 12:10 AM Wenchen Fan wrote:
> I think we already have this table capability: ACCEPT_ANY_SCHEMA. Can you
> try that?
>
> On Thu, May 14, 2020 at 6:17 AM Russell Spitzer
> wrote:
>
>> I would really appreciate that, I'm probably going

Re: [DatasourceV2] Allowing Partial Writes to DSV2 Tables

2020-05-13 Thread Wenchen Fan
I think we already have this table capability: ACCEPT_ANY_SCHEMA. Can you try that?

On Thu, May 14, 2020 at 6:17 AM Russell Spitzer wrote:
> I would really appreciate that, I'm probably going to just write a planner
> rule for now which matches up my table schema with the query output if they
>

Re: [DatasourceV2] Allowing Partial Writes to DSV2 Tables

2020-05-13 Thread Russell Spitzer
I would really appreciate that, I'm probably going to just write a planner rule for now which matches up my table schema with the query output if they are valid, and fails analysis otherwise. This approach is how I got metadata columns in, so I believe it would work for writing as well.

On Wed,
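The matching logic of the planner rule Russell describes might look roughly like the Python sketch below. It models the logic only, under our own assumptions: columns are matched by name, a missing nullable column is filled with null, and anything else fails analysis. Types must match exactly here, whereas a real rule would likely allow compatible casts. `Field`, `AnalysisError` and `align_partial_write` are hypothetical names, not Spark APIs.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """Hypothetical stand-in for a schema field."""
    name: str
    data_type: str
    nullable: bool = True


class AnalysisError(Exception):
    """Hypothetical stand-in for failing analysis."""


def align_partial_write(table: list[Field], query: list[Field]) -> list[str | None]:
    """Align query output to the table schema by name.

    Returns one entry per table column, in table order: the query column to
    read from, or None where the rule would insert a null literal.
    """
    table_by_name = {f.name: f for f in table}
    query_names = {f.name for f in query}

    # Every query column must exist in the table with a matching type.
    for q in query:
        t = table_by_name.get(q.name)
        if t is None:
            raise AnalysisError(f"Column {q.name!r} does not exist in the table")
        if t.data_type != q.data_type:
            raise AnalysisError(f"Column {q.name!r}: expected {t.data_type}, got {q.data_type}")

    projection: list[str | None] = []
    for t in table:
        if t.name in query_names:
            projection.append(t.name)
        elif t.nullable:
            projection.append(None)  # missing optional column: fill with null
        else:
            raise AnalysisError(f"Required column {t.name!r} is missing")
    return projection


# The table gained an optional "comment" column; an existing job still
# produces only (name, id), in a different order.
table = [Field("id", "long", nullable=False), Field("name", "string"), Field("comment", "string")]
old_job_output = [Field("name", "string"), Field("id", "long")]

print(align_partial_write(table, old_job_output))  # ['id', 'name', None]
```

The usage lines model the schema-evolution case Ryan raises in this thread: adding an optional column does not break the existing writer, while omitting the required `id` column or writing a mistyped one still fails.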

Re: [DatasourceV2] Allowing Partial Writes to DSV2 Tables

2020-05-13 Thread Ryan Blue
I agree with adding a table capability for this. This is something that we support in our Spark branch so that users can evolve tables without breaking existing ETL jobs: when you add an optional column, the existing pipelines writing data to the table shouldn't fail. I can contribute the

[DatasourceV2] Allowing Partial Writes to DSV2 Tables

2020-05-13 Thread Russell Spitzer
In DSV1 this was pretty easy to do because the burden of verifying writes fell on the datasource; the new setup makes partial writes difficult. resolveOutputColumns checks the table schema against the write plan's output and will fail any requests which don't contain every column as
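The strict behavior described above, where any write that omits a table column is rejected, can be modeled in a few lines. This is a simplified Python sketch, not Spark's actual resolveOutputColumns: `Field`, `AnalysisError` and `strict_resolve` are hypothetical names, the error message is invented, and the real check also validates types and column counts.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """Hypothetical stand-in for a schema field (name, type, nullability)."""
    name: str
    data_type: str
    nullable: bool = True


class AnalysisError(Exception):
    """Hypothetical stand-in for a failed analysis."""


def strict_resolve(table: list[Field], query: list[Field]) -> list[str]:
    """Every table column must be produced by the query, so a write that
    omits any column is rejected, even if that column is nullable."""
    produced = {f.name for f in query}
    missing = [f.name for f in table if f.name not in produced]
    if missing:
        raise AnalysisError(f"Cannot write incompatible data: missing columns {missing}")
    # Output columns in table order (by-name matching).
    return [f.name for f in table]


table = [Field("id", "long", nullable=False), Field("name", "string"), Field("comment", "string")]
query = [Field("id", "long"), Field("name", "string")]

try:
    strict_resolve(table, query)
except AnalysisError as e:
    print(e)  # Cannot write incompatible data: missing columns ['comment']
```

Even though `comment` is nullable, the partial write fails, which is the limitation this thread sets out to address.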