Re: [DISCUSS] Handle libicu upgrades better

Nick Vatamaniuc Thu, 13 Jan 2022 08:38:33 -0800

Hi Ronny,

If it makes it easier to build on some platforms, it could make sense.
Alternatively, we could find some way for both of them to point to a
single libicu library.


On some OSes (e.g. Linux distros), dynamically linking to a system
libicu also makes sense because it's often the easiest way to get
security updates. libicu has had quite a number of high-risk CVEs over
the years [1].

[1] https://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=icu

-Nick

On Wed, Jan 12, 2022 at 2:47 PM Ronny Berndt
<ro...@kioskkinder.com.invalid> wrote:
>
> Hi,
>
> To avoid ending up with different versions of the ICU libs, why not use the
> version shipped in the SpiderMonkey tree (ESR versions only), link against
> it in the build process, and not rely on the system version?
>
> The Windows version of CouchDB isn’t available for the current release, and
> the build process for this OS is stuck at the moment. Maybe this is a broader
> discussion, and maybe it would be a good idea to combine it with the Erlang
> version update process ([DISCUSS] Erlang version update process for
> convenience binaries).
>
> - Ronny
>
> > On 12.01.2022 at 16:31, Will Young <lostnetwork...@gmail.com> wrote:
> >
> > Hi Nick,
> >
> >  I like the way this breaks down the problem into something that can
> > work with the existing maintenance mechanisms. On the UCA version, it
> > looks to me like the major version tracks the last Unicode version
> > that had a collation change (version 9.0?), while the ICU version
> > changes with each release, which would be more frequent than actual
> > collation changes. Looking at the ICU release notes, I get the
> > impression that the real frequency of change may be somewhere in
> > between, because of bug fixes or additions to Unicode that directly
> > get a differing order in the root collation. For example, ICU 54 seems
> > like a clean match of UCA version and collation change, while it seems
> > like 59 could have changed some emoji sort orders that may already
> > have been reflected in 58's UCA version?
> >
> > Another question I have about ICU synchronization is SpiderMonkey's
> > use of ICU. Since all build instructions keep Erlang and mozjs
> > linked to the same system ICU, I think there could never be a need to
> > record an ICU-related version from the query server. But I've never
> > seen instructions to set locales in relation to the query server, or
> > to do anything to ensure a function is using the root collator. So I
> > don't think the build setup reflects an actual need for SpiderMonkey
> > to be truly in sync on aspects of ICU like collation setup, and
> > everything important is happening in the Erlang/NIFs?
> > Thanks,
> > -Will
> >
> > On Tue, Jan 11, 2022 at 19:06, Nick Vatamaniuc
> > <vatam...@gmail.com> wrote:
> >>
> >> I threw together a draft PR https://github.com/apache/couchdb/pull/3889
> >>
> >> Would that work? There are two tricks there. The first is re-using a field
> >> position from an older <2.3.1 format; this should allow transparently
> >> downgrading back to 3.2.1, as we ignore that field there. The second is
> >> using a map, so it should allow adding extra info to the views in the
> >> future (custom collation tailorings?).
> >>
> >> Thanks,
> >> -Nick
> >>
> >> On Sat, Nov 20, 2021 at 12:32 PM Nick Vatamaniuc <vatam...@gmail.com> 
> >> wrote:
> >>>
> >>> Thanks, Adam. And thanks for the tip about the view header, Bob.
> >>>
> >>> Wonder if a disk version would make sense for views. Noticed Eric did
> >>> a nice job transparently migrating 2.x -> 3.x view files when we
> >>> removed key seq indices. Perhaps something like that would work for
> >>> adding a collator version.
> >>>
> >>> Cheers,
> >>> -Nick
> >>>
> >>> On Fri, Nov 19, 2021 at 9:09 AM Adam Kocoloski <kocol...@apache.org> 
> >>> wrote:
> >>>>
> >>>> That seems like a smart solution Nick.
> >>>>
> >>>> Adam
> >>>>
> >>>>> On Nov 19, 2021, at 7:28 AM, Robert Newson <b...@rsn.io> wrote:
> >>>>>
> >>>>> Noting that the upgrade channel for views was misconceived (by me) as 
> >>>>> there is no version number in the header for them. You’d need to add it.
> >>>>>
> >>>>> B.
> >>>>>
> >>>>>> On 18 Nov 2021, at 07:12, Nick Vatamaniuc <vatam...@gmail.com> wrote:
> >>>>>>
> >>>>>> Thinking more about this issue I wonder if we can avoid resetting and
> >>>>>> rebuilding everything from scratch, and instead, let the upgrade
> >>>>>> happen in the background, while still serving the existing view data.
> >>>>>>
> >>>>>> The realization was that collation doesn't affect the emitted keys and
> >>>>>> values themselves, only their order in the view b-trees. That means
> >>>>>> we'd just have to rebuild b-trees, and that is exactly what our view
> >>>>>> compactor already does.
> >>>>>>
> >>>>>> When we detect a libicu version discrepancy we'd submit the view for
> >>>>>> compaction. We even have a dedicated "upgrade" [1] channel in smoosh
> >>>>>> which handles file version format upgrades, but we'll tweak that logic
> >>>>>> to trigger on libicu version mismatches as well.
> >>>>>>
> >>>>>> Would this work? Does anyone see any issue with that approach?
> >>>>>>
> >>>>>> [1] 
> >>>>>> https://github.com/apache/couchdb/blob/3.x/src/smoosh/src/smoosh_server.erl#L435-L442
> >>>>>>
> >>>>>> Cheers,
> >>>>>> -Nick
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>> On Fri, Oct 29, 2021 at 7:01 PM Nick Vatamaniuc <vatam...@apache.org> 
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>> Hello everyone,
> >>>>>>>
> >>>>>>> CouchDB by default uses the libicu library to sort its view rows.
> >>>>>>> When views are built, we do not record or track the version of the
> >>>>>>> collation algorithm. The issue is that the ICU library may modify the
> >>>>>>> collation order between major libicu versions, and when that happens,
> >>>>>>> views built with the older versions may experience data loss. I wanted
> >>>>>>> to discuss the option to record the libicu collator version in each
> >>>>>>> view, then warn the user when there is a mismatch, and optionally
> >>>>>>> either ignore the mismatch or automatically rebuild the views.
> >>>>>>>
> >>>>>>> Imagine, for example, searching patient records using start/end keys.
> >>>>>>> It could be possible that, say, the first letter of their name now
> >>>>>>> collates differently in a new libicu. That would prevent the patient
> >>>>>>> record from showing up in the view results for some important
> >>>>>>> procedure or medication. Users might not even be aware of this kind of
> >>>>>>> data loss occurring, since there won't be any error in the API or
> >>>>>>> warning in the logs.
> >>>>>>>
> >>>>>>> I was thinking about how to solve this. There were a few commits already
> >>>>>>> to clean up our collation drivers [1], expose the libicu and collation
> >>>>>>> algorithm versions in the new _versions endpoint [2], and some other
> >>>>>>> minor fixes in that area. As the next steps we could:
> >>>>>>>
> >>>>>>> 1) Modify our views to keep track of the collation algorithm
> >>>>>>> version. We could attempt to transparently upgrade the view header
> >>>>>>> format -- read the old view file, update the header with an extra
> >>>>>>> libicu collation version field, that updates the signature, and then,
> >>>>>>> save the file with the new header and new signature. This avoids view
> >>>>>>> rebuilds, just records the collator version in the view and moves the
> >>>>>>> files to a new name.
> >>>>>>>
> >>>>>>> 2) Do what PostgreSQL does, and 2a) emit a warning with the view
> >>>>>>> results when the current libicu version doesn't match the version in
> >>>>>>> the view [3]. That means altering the view results to add a "warning":
> >>>>>>> "..." field. An alternative, 2b), is to emit a warning in
> >>>>>>> _design/$ddoc/_info only. Users would have to know, after an OS
> >>>>>>> version upgrade or after restoring backups, to look at
> >>>>>>> _design/$ddoc/_info for each ddoc in each db. Of course, there may be
> >>>>>>> users who used the "raw" collation option, or who know they are using
> >>>>>>> just plain ASCII character sets in their views. So we'd have a
> >>>>>>> configuration setting to ignore the warnings as well.
> >>>>>>>
> >>>>>>> 3) Users who see the warning could then either rebuild the view
> >>>>>>> with the new collator library manually, or it could happen
> >>>>>>> automatically based on a configuration option, basically "when
> >>>>>>> collator versions are mismatched, invalidate and rebuild all the
> >>>>>>> views".
> >>>>>>>
> >>>>>>> 4) We'd have a way for the users to assert (POST a ddoc update) that
> >>>>>>> they double-checked the new ICU version and are convinced that a
> >>>>>>> particular view would not experience data loss with the new collator.
> >>>>>>> That should make the warning go away, and the view to not be rebuilt.
> >>>>>>> This can't be just a naive "collator" option setting as both per-view
> >>>>>>> and per-design options are used when computing the view signature, and
> >>>>>>> any changes there would result in the view being rebuilt. Perhaps we
> >>>>>>> can add it to the design docs as a separate option which is excluded
> >>>>>>> from the signature hash, like the "autoupdate" setting for background
> >>>>>>> index builder ("collation_version_accept"?). PostgreSQL also offers
> >>>>>>> this option with the ALTER COLLATION ... REFRESH VERSION command [3].
> >>>>>>>
> >>>>>>> What do we think, is this a reasonable approach? Is there something
> >>>>>>> easier / simpler we can do?
> >>>>>>>
> >>>>>>> Thanks!
> >>>>>>> -Nick
> >>>>>>>
> >>>>>>> [1] 
> >>>>>>> https://github.com/apache/couchdb/pull/3746/commits/28f26f52fe2e170d98658311dafa8198d96b8061
> >>>>>>> [2] 
> >>>>>>> https://github.com/apache/couchdb/commit/c1bb4e4856edd93255d75ebe158b4da38bbf3333
> >>>>>>> [3] https://www.postgresql.org/docs/13/sql-altercollation.html
> >>>>>
> >>>>
>
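
The failure mode described in the first message of this thread (a start/end
key lookup silently missing a row after a collator change) can be sketched
roughly as follows. This is only an illustration under stated assumptions:
the two "alphabets" are made-up stand-ins for two collator versions, and
`make_sort_key`, `range_query`, `needs_rebuild` and the `collator_version`
field are hypothetical names, not ICU or CouchDB APIs. A sorted Python list
stands in for the view b-tree.

```python
import bisect

# Hypothetical alphabets standing in for two collator versions.
# Real ICU rules are far richer; only the position of "ä" differs here.
OLD_ALPHABET = "abcdefghijklmnopqrstuvwxyzä"  # "ä" sorts after "z"
NEW_ALPHABET = "aäbcdefghijklmnopqrstuvwxyz"  # "ä" sorts right after "a"


def make_sort_key(alphabet):
    """Return a function mapping a string to a list of per-character ranks."""
    rank = {ch: i for i, ch in enumerate(alphabet)}
    return lambda s: [rank[ch] for ch in s]


old_key = make_sort_key(OLD_ALPHABET)
new_key = make_sort_key(NEW_ALPHABET)


def range_query(rows, sort_key, start, end):
    """Binary search for start <= row <= end; assumes rows are sorted by sort_key."""
    keys = [sort_key(row) for row in rows]
    lo = bisect.bisect_left(keys, sort_key(start))
    hi = bisect.bisect_right(keys, sort_key(end))
    return rows[lo:hi]


def needs_rebuild(stored_version, current_version):
    """Illustrative check: flag an index whose recorded collator version differs."""
    return stored_version != current_version


names = ["adams", "ärnst", "brown", "zimmer"]

# Index built, and its collator version recorded, under the old collator.
index = {"collator_version": "old", "rows": sorted(names, key=old_key)}

# After the "upgrade", lookups use the new collator on the stale row order.
stale_result = range_query(index["rows"], new_key, "a", "b")

# Re-sort the same rows under the new collator; the rows themselves do not change.
if needs_rebuild(index["collator_version"], "new"):
    index = {"collator_version": "new", "rows": sorted(names, key=new_key)}
rebuilt_result = range_query(index["rows"], new_key, "a", "b")

print(stale_result)
print(rebuilt_result)
```

In this toy setup the stale lookup returns only "adams", while the lookup
after re-sorting returns both "adams" and "ärnst". No error is raised in
either case, which is why recording the collator version with the index is
what makes the mismatch detectable at all.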
