Here's a patch with the validation idea: 
https://github.com/apache/druid/pull/15920

It adds validation for the most problematic case (mixing strings and arrays), 
provides a way to override the validation, and makes the warning log on the 
controller task when arrayIngestMode is 'mvd' more friendly and explanatory.

Depending on which direction you're going in, the errors look like either:

  Cannot write into field[flags] using type[VARCHAR ARRAY] and 
arrayIngestMode[mvd], since the existing type is[VARCHAR ARRAY]. Try setting 
arrayIngestMode to[array] to retain the SQL type[VARCHAR ARRAY]

Or:

  Cannot write into field[flags] using type[VARCHAR ARRAY] and 
arrayIngestMode[array], since the existing type is[VARCHAR]. Try wrapping this 
field using ARRAY_TO_MV(...) AS "flags"

The "try" language pushes people towards the behavior we'd like for the future: 
using arrayIngestMode[array] and wrapping MVDs in ARRAY_TO_MV.

I'm changing my vote to a plain 0, given that _most_ of the changes related to 
arrayIngestMode went out in Druid 28. However I do think it would be nice to 
get a patch like this in, given that the Druid 29 web console is pushing more 
people to change their arrayIngestMode.

Gian

On 2024/02/16 22:24:23 Gian Merlino wrote:
> I just learned that arrayIngestMode is not actually new, just
> https://github.com/apache/druid/pull/15588 is. However this will still make
> it more likely that people accidentally break their tables, so I am still
> -0. Just, slightly less so. I still think it would be a good idea, for
> Druid 29, to add string-to-array type validation to Druid 29's INSERT /
> REPLACE handling to compensate for the new web console support for
> arrayIngestMode, and the UI changes to push people towards setting it to
> "array".
> 
> I could be convinced that it's ok to do that in a 29.0.1. I don't think it
> should wait for 30, given the impact that can happen if people end up with
> mixed types without planning for it.
> 
> On Fri, Feb 16, 2024 at 2:16 PM Gian Merlino <g...@apache.org> wrote:
> 
> > Thanks for managing this release!
> >
> > My vote is -0, let me explain why. I am concerned about usability issues
> > with the new arrayIngestMode feature. There are various issues when mixing
> > MVD strings and string arrays in the same column: as soon as arrays show up
> > in a column, various "classic MVD-style" queries will fail at validation or
> > at runtime. I believe that the new feature, and especially the changes to
> > the web console in https://github.com/apache/druid/pull/15588, will make
> > it more likely that people will do this by accident and experience
> > brokenness.
> >
> > When this occurs, there is not an easy way to fix it; data needs to be
> > reingested or queries need to be adjusted. I believe that in some cases,
> > queries can't be adjusted without suffering from performance loss.
> >
> > This is something that could happen even before Druid 29, if you did a
> > Kafka ingest with auto types or useSchemaDiscovery, followed by a SQL
> > REPLACE from that table into itself. In that case, arrays written by Kafka
> > ingest would get rewritten as MVDs. But with this Druid 29 RC, there are
> > additional pathways created that enable people to get into this problematic
> > scenario with increased likelihood. I suggest we make some adjustments that
> > prevent it, such as:
> >
> > - Validating that SQL INSERT and SQL REPLACE do not insert an array type
> > into a column that previously contained strings, or vice versa.
> > - Making it obvious in the web console whether a tab is in
> > "arrayIngestMode: array" or "arrayIngestMode: mvd" or "server default".
> >
> > Thank you for considering this viewpoint.
> >
> > Gian
> >
> > On Tue, Feb 13, 2024 at 4:34 AM Laksh Singla <lakshsin...@apache.org>
> > wrote:
> >
> >> Hi all,
> >>
> >> I have created a build for Apache Druid 29.0.0, release
> >> candidate 1.
> >>
> >> Thanks to everyone who has helped contribute to the release! You can read
> >> the proposed release notes here:
> >> https://github.com/apache/druid/issues/15896
> >>
> >> The release candidate has been tagged in GitHub as
> >> druid-29.0.0-rc1 (869bd3978f0c835ef8eb7c1f25c468e23472a81b),
> >> available here:
> >> https://github.com/apache/druid/tree/druid-29.0.0-rc1
> >>
> >> The artifacts to be voted on are located here:
> >> https://dist.apache.org/repos/dist/dev/druid/29.0.0-rc1/
> >>
> >> A staged Maven repository is available for review at:
> >> https://repository.apache.org/content/repositories/orgapachedruid-1061/
> >>
> >> Staged druid.apache.org website documentation is available here:
> >> https://druid.staged.apache.org/docs/29.0.0/design/
> >>
> >> A Docker image containing the binary of the release candidate can be
> >> retrieved via:
> >> docker pull apache/druid:29.0.0-rc1
> >>
> >> artifact checksums
> >> src:
> >>
> >> 1948fab4500f3571591f887a638631b6a05f040b88b35004406ca852e16884e5d2269a74e8d07c79c477d8b02a2489efec85776f8ef46f4e8defedce4efb9931
> >> bin:
> >>
> >> f2a11ddc71b59a648d01d7c220a6fae527c34d702d4f1e7aed954803a3576543c4cd7149f34e8948eccf89240435f09a8512db83b46991ac6e354eeba8cbada4
> >> docker: 2bddcf692f2137dc4094908b63d2043423fca895ba0f479f0db34ec8016c4472
> >>
> >> Release artifacts are signed with the following key:
> >> https://people.apache.org/keys/committer/lakshsingla.asc
> >>
> >> This key and the key of other committers can also be found in the
> >> project's
> >> KEYS file here:
> >> https://dist.apache.org/repos/dist/release/druid/KEYS
> >>
> >> (If you are a committer, please feel free to add your own key to that file
> >> by following the instructions in the file's header.)
> >>
> >>
> >> Verify checksums:
> >> diff <(shasum -a512 apache-druid-29.0.0-src.tar.gz | \
> >> cut -d ' ' -f1) \
> >> <(cat apache-druid-29.0.0-src.tar.gz.sha512 ; echo)
> >>
> >> diff <(shasum -a512 apache-druid-29.0.0-bin.tar.gz | \
> >> cut -d ' ' -f1) \
> >> <(cat apache-druid-29.0.0-bin.tar.gz.sha512 ; echo)
> >>
> >> Verify signatures:
> >> gpg --verify apache-druid-29.0.0-src.tar.gz.asc \
> >> apache-druid-29.0.0-src.tar.gz
> >>
> >> gpg --verify apache-druid-29.0.0-bin.tar.gz.asc \
> >> apache-druid-29.0.0-bin.tar.gz
> >>
> >> Please review the proposed artifacts and vote. Note that Apache has
> >> specific requirements that must be met before +1 binding votes can be cast
> >> by PMC members. Please refer to the policy at
> >> http://www.apache.org/legal/release-policy.html#policy for more details.
> >>
> >> As part of the validation process, the release artifacts can be generated
> >> from source by running:
> >> mvn clean install -Papache-release,dist -Dgpg.skip
> >>
> >> The RAT license check can be run from source by:
> >> mvn apache-rat:check -Prat
> >>
> >> This vote will be open for at least 72 hours. The vote will pass if a
> >> majority of at least three +1 PMC votes are cast.
> >>
> >> [ ] +1 Release this package as Apache Druid 29.0.0
> >> [ ] 0 I don't feel strongly about it, but I'm okay with the release
> >> [ ] -1 Do not release this package because...
> >>
> >> Thanks!
> >>
> >
> 

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@druid.apache.org
For additional commands, e-mail: dev-h...@druid.apache.org

Reply via email to