I would really appreciate that. I'm probably going to just write a planner rule for now which checks the query output against my table schema and fails analysis if they aren't compatible. This approach is how I got metadata columns in, so I believe it would work for writing as well.
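Roughly what I have in mind, as a sketch only: an injected check rule that walks the analyzed plan and compares an AppendData node's query output against the table's columns. The class name, the mandatory column set, and the choice of injectCheckRule are placeholders on my part, not a finished implementation (and a real rule would surface a proper AnalysisException rather than a generic SparkException):

import org.apache.spark.SparkException
import org.apache.spark.sql.{SparkSession, SparkSessionExtensions}
import org.apache.spark.sql.catalyst.plans.logical.{AppendData, LogicalPlan}

// Hypothetical extensions class, registered via spark.sql.extensions.
class PartialWriteCheckExtensions extends (SparkSessionExtensions => Unit) {
  override def apply(extensions: SparkSessionExtensions): Unit = {
    extensions.injectCheckRule(buildCheck)
  }

  private def buildCheck(session: SparkSession): LogicalPlan => Unit = { plan =>
    val mandatoryColumns = Set("id")  // hypothetical: e.g. C* partition key columns

    plan.foreach {
      case append: AppendData =>
        val tableColumns = append.table.output.map(_.name).toSet
        val writtenColumns = append.query.output.map(_.name).toSet

        // Fail analysis if the query writes columns the table does not have...
        val unknown = writtenColumns -- tableColumns
        if (unknown.nonEmpty) {
          throw new SparkException(
            s"Unknown columns in write: ${unknown.mkString(", ")}")
        }
        // ...or omits columns the sink mandates; any other column may be left out.
        val missing = mandatoryColumns -- writtenColumns
        if (missing.nonEmpty) {
          throw new SparkException(
            s"Write is missing mandated columns: ${missing.mkString(", ")}")
        }
      case _ => // leave other plans untouched
    }
  }
}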
On Wed, May 13, 2020 at 5:13 PM Ryan Blue <rb...@netflix.com> wrote:

> I agree with adding a table capability for this. This is something that we
> support in our Spark branch so that users can evolve tables without
> breaking existing ETL jobs -- when you add an optional column, it shouldn't
> fail the existing pipeline writing data to a table. I can contribute the
> changes to validation if people are interested.
>
> On Wed, May 13, 2020 at 2:57 PM Russell Spitzer <russell.spit...@gmail.com>
> wrote:
>
>> In DSV1 this was pretty easy to do because the burden of verification
>> for writes had to be in the datasource; the new setup makes partial writes
>> difficult.
>>
>> resolveOutputColumns checks the table schema against the write plan's
>> output and will fail any requests which don't contain every column
>> specified in the table schema.
>> I would like it if instead we either made this check optional for a
>> datasource, perhaps an "allow partial writes" trait for the table, or just
>> allowed analysis to fail on "withInputDataSchema", where an implementer
>> could throw exceptions on underspecified writes.
>>
>> The use case here is that C* (and many other sinks) have mandated columns
>> that must be present during an insert as well as those
>> which are not required.
>>
>> Please let me know if I've misread this.
>>
>> Thanks for your time again,
>> Russ
>>
>
> --
> Ryan Blue
> Software Engineer
> Netflix
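P.S. For anyone following along, a rough sketch of what the capability-based opt-out discussed above could look like on the table side. I'm using TableCapability.ACCEPT_ANY_SCHEMA as a stand-in for the "allow partial writes" trait, since as far as I can tell it already signals that the analyzer should skip strict output column resolution; the CassandraTable class and its mandated column set are made-up names for illustration:

import java.util
import org.apache.spark.sql.connector.catalog.{SupportsWrite, TableCapability}
import org.apache.spark.sql.connector.write.{LogicalWriteInfo, WriteBuilder}
import org.apache.spark.sql.types.StructType

// Hypothetical DSv2 table that accepts partial writes and enforces its own rules.
class CassandraTable(tableSchema: StructType) extends SupportsWrite {
  override def name(): String = "cassandra_table"  // hypothetical
  override def schema(): StructType = tableSchema

  // Declaring ACCEPT_ANY_SCHEMA lets the source receive a partial column list and
  // validate it itself (e.g. mandated primary-key columns) instead of the analyzer.
  override def capabilities(): util.Set[TableCapability] =
    util.EnumSet.of(TableCapability.BATCH_WRITE, TableCapability.ACCEPT_ANY_SCHEMA)

  override def newWriteBuilder(info: LogicalWriteInfo): WriteBuilder = {
    val required = Set("id")  // hypothetical mandated columns
    val written = info.schema().fieldNames.toSet
    require(required.subsetOf(written),
      s"Missing mandated columns: ${(required -- written).mkString(", ")}")
    ???  // build the actual writer here
  }
}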