ACCEPT_ANY_SCHEMA isn't a good way to solve the problem because you often
want at least some checking in Spark to validate the rows match. It's a
good way to be unblocked, but not a long-term solution.
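The trade-off described above can be sketched as a capability-gated check. This is a simplified model in plain Scala, not Spark's actual analyzer internals; `TableModel`, `acceptAnySchema`, and `analyzeWrite` are illustrative names:

```scala
// Simplified model of how an ACCEPT_ANY_SCHEMA-style capability changes
// write-side analysis: with the capability set, the engine skips schema
// validation entirely, so the datasource must do all row checking itself.
case class TableModel(columns: Set[String], acceptAnySchema: Boolean)

// Returns true if the write plan passes analysis against the table.
def analyzeWrite(table: TableModel, queryOutput: Set[String]): Boolean =
  table.acceptAnySchema || queryOutput == table.columns  // all-or-nothing check
```

With `acceptAnySchema = true` even a completely mismatched query output passes analysis, which is why it unblocks partial writes but also drops the validation you usually still want.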

On Thu, May 14, 2020 at 4:57 AM Russell Spitzer <russell.spit...@gmail.com>
wrote:

> Yeah! That is working for me. Thanks!
>
> On Thu, May 14, 2020 at 12:10 AM Wenchen Fan <cloud0...@gmail.com> wrote:
>
>> I think we already have this table capability: ACCEPT_ANY_SCHEMA. Can you
>> try that?
>>
>> On Thu, May 14, 2020 at 6:17 AM Russell Spitzer <
>> russell.spit...@gmail.com> wrote:
>>
>>> I would really appreciate that. For now I'm probably going to just write a
>>> planner rule which matches up my table schema with the query output
>>> if they are valid, and fails analysis otherwise. This approach is how I got
>>> metadata columns in, so I believe it would work for writing as well.
>>>
>>> On Wed, May 13, 2020 at 5:13 PM Ryan Blue <rb...@netflix.com> wrote:
>>>
>>>> I agree with adding a table capability for this. This is something that
>>>> we support in our Spark branch so that users can evolve tables without
>>>> breaking existing ETL jobs -- when you add an optional column, it shouldn't
>>>> fail the existing pipeline writing data to a table. I can contribute the
>>>> changes to validation if people are interested.
>>>>
>>>> On Wed, May 13, 2020 at 2:57 PM Russell Spitzer <
>>>> russell.spit...@gmail.com> wrote:
>>>>
>>>>> In DSv1 this was pretty easy to do because the burden of
>>>>> verification for writes was on the datasource; the new setup makes
>>>>> partial writes difficult.
>>>>>
>>>>> resolveOutputColumns checks the table schema against the write plan's
>>>>> output and will fail any request that doesn't contain every column
>>>>> specified in the table schema.
>>>>> I would like it if instead we either made this check optional for a
>>>>> datasource, perhaps via an "allow partial writes" trait for the table, or
>>>>> allowed analysis to fail in "withInputDataSchema", where an implementer
>>>>> could throw exceptions on underspecified writes.
>>>>>
>>>>>
>>>>> The use case here is that C* (and many other sinks) has columns that
>>>>> are mandatory and must be present during an insert, as well as columns
>>>>> which are not required.
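The required-vs-optional split described here can be sketched as a standalone check in plain Scala. The `Column` type and `validatePartialWrite` are hypothetical names for illustration, not an existing Spark or connector API:

```scala
// Hypothetical partial-write validation: required columns must be present,
// optional ones may be omitted, and unknown columns are rejected.
case class Column(name: String, required: Boolean)

def validatePartialWrite(
    tableSchema: Seq[Column],
    queryOutput: Seq[String]): Either[String, Seq[String]] = {
  val known = tableSchema.map(_.name).toSet
  val unknown = queryOutput.filterNot(known)
  val missingRequired =
    tableSchema.filter(_.required).map(_.name).filterNot(queryOutput.toSet)
  if (unknown.nonEmpty)
    Left(s"Unknown columns: ${unknown.mkString(", ")}")
  else if (missingRequired.nonEmpty)
    Left(s"Missing required columns: ${missingRequired.mkString(", ")}")
  else
    Right(queryOutput)
}
```

For a Cassandra-style table where the key column `id` is mandatory and `note` is optional, writing only `id` would pass, while writing only `note` would fail analysis with a clear message.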
>>>>>
>>>>> Please let me know if I've misread this,
>>>>>
>>>>> Thanks for your time again,
>>>>> Russ
>>>>>
>>>>
>>>>
>>>> --
>>>> Ryan Blue
>>>> Software Engineer
>>>> Netflix
>>>>
>>>

-- 
Ryan Blue
Software Engineer
Netflix
