On 19/06/2020 13:44, Boris Kolpackov wrote:

Hi Roger,

Thanks for getting the ball rolling. See my comments below.

Roger Leigh <rle...@codelibre.net> writes:

One of the issues I encountered was difficulty in building on modern
platforms, Windows in particular, which was the impetus for developing
the CMake build now incorporated officially in the Xerces-C 3.2.x
releases.

I would suggest we drop support for autotools in 4.0.0. I personally
view both (autotools and CMake) as pretty bad and I don't see a reason
to maintain two bad options.

I wouldn't object to this.  It halves the testing required on Unix platforms.  I can't say I adore CMake: it's a big improvement over the autotools, though still not as nice as I would like.  It serves its purpose, and my use of it is primarily pragmatic (I have submitted a fair few upstream contributions though, including FindXercesC and FindXalanC).

These are too deeply entrenched to consider removing at this point, but
C++11 brings native char16_t and char32_t character types which could
alleviate some of the problems.

Agree. Do you know what's the story with char16_t vs wchar_t on Windows?
Specifically, will (Windows-only) codebases that pass L""-strings to
Xerces-C++ API need to be changed?

They are not directly compatible, but a simple typecast is sufficient to convert them.  We already do this in the Win32 transcoder if I recall correctly.

I'm not entirely sure how to class code written using L"".  It's not really portable, being Windows-only as you say (Windows being the only platform where wchar_t is 16-bit and usable as XMLCh). And it's not strictly portable even to different builds of Xerces-C, given that XMLCh is configurable and that wchar_t isn't even the default (char16_t is the default for C++11 and above).

As a result, I think that for any users who are using L"", they would have three options:

1. Replace L"" with u"".  This would work with Xerces-C 4.0 and C++11, but would not be backward compatible with older Xerces-C versions.

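2. Add reinterpret_cast<const XMLCh *>() around L"" strings when passing them to Xerces-C (a static_cast will not compile between unrelated pointer types such as const wchar_t * and const char16_t *).  This works on Windows, where wchar_t is 16-bit, and with older Xerces versions, so would be useful as a transitional step for codebases which want to support 4.0 and 3.2 and earlier.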

3. Add reinterpret_cast<const XMLCh *>() around u"" strings when passing them to Xerces-C.  As for (2) this provides the same portability to old and new versions (it's a no-op when XMLCh is char16_t).  This might be a better option when you want to use C++11 but you also need to support 3.2 and earlier.

So long as we had these clearly documented, I think that would provide reasonable guidance and an effective means to transition while retaining compatibility with older versions.

Personally, I'd go with (3) for my own projects which are already using C++11 and UTF-8-encoded source files and then switch to (1) once all the target platforms are using Xerces-C 4.x.
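
To make the options concrete, here is a minimal sketch.  It does not use the real Xerces-C++ headers: XMLCh is assumed to be char16_t (as in a C++11 build), and xmlLength() is a hypothetical stand-in for any API taking a const XMLCh *.  The cast is spelled reinterpret_cast because a static_cast does not compile between unrelated pointer types.

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for the Xerces-C++ typedef; assumed here to be char16_t.
using XMLCh = char16_t;

// Hypothetical function standing in for any API that accepts a
// null-terminated XMLCh string.  Returns the number of code units.
std::size_t xmlLength(const XMLCh* s)
{
    assert(s != nullptr);
    std::size_t n = 0;
    while (s[n] != 0)
        ++n;
    return n;
}

// Option (1): a plain u"" literal.  Only compiles when XMLCh is char16_t.
std::size_t viaPlainLiteral()
{
    return xmlLength(u"root");
}

// Option (3): a u"" literal with a cast.  A no-op when XMLCh is char16_t,
// and still compiles when XMLCh is some other 16-bit type.
std::size_t viaCastLiteral()
{
    return xmlLength(reinterpret_cast<const XMLCh*>(u"root"));
}

#if defined(_WIN32)
// Option (2): an L"" literal with a cast.  Only meaningful where wchar_t
// is 16-bit, hence the guard and the size check.
static_assert(sizeof(wchar_t) == sizeof(XMLCh), "wchar_t must be 16-bit");
std::size_t viaWideLiteral()
{
    return xmlLength(reinterpret_cast<const XMLCh*>(L"root"));
}
#endif
```

Options (2) and (3) rely on the two character types having the same size and representation, which is why (2) is restricted to Windows in this sketch.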

* having exceptions derive from std::runtime_error; existing types can
remain compatible with wide strings

* support use of streams, including stringstreams, directly with Xerces e.g.
InputSource, perhaps as a set of adaptors

* where the C++ language and standard replace functionality in Xerces, it
would be worth considering replacement where there is a benefit; language
thread support might come under this category

Sounds great! If we require C++11 (or later), I see no reason not to
switch to std::thread & friends.
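
As a rough illustration of the first point, an exception type could derive from std::runtime_error while still carrying a wide message for existing callers.  This is only a sketch: XMLErrorSketch and its getMessage() accessor are hypothetical names for illustration, not the existing Xerces-C++ exception classes, and the narrow and wide messages are simply passed in separately rather than transcoded.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical exception type: usable through std::exception::what()
// while keeping a wide-string message for existing code.
class XMLErrorSketch : public std::runtime_error
{
public:
    XMLErrorSketch(const std::string& narrow, std::u16string wide)
        : std::runtime_error(narrow), wide_(std::move(wide))
    {
    }

    // Wide-string accessor, for callers that expect 16-bit characters.
    const char16_t* getMessage() const noexcept { return wide_.c_str(); }

private:
    std::u16string wide_;
};

// Catches the exception through the standard base class, as generic
// application code would, and returns the narrow message.
std::string catchGenerically()
{
    try {
        throw XMLErrorSketch("unexpected end of input",
                             u"unexpected end of input");
    } catch (const std::exception& e) {
        return e.what();
    }
}

// Usage sketch: both views of the message remain available.
void demo()
{
    assert(catchGenerically() == "unexpected end of input");
    XMLErrorSketch e("bad", u"bad");
    assert(std::u16string(e.getMessage()) == u"bad");
}
```

A real implementation would need to decide how the narrow message is derived from the wide one, which this sketch leaves open.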

I think here we could also look to Xalan-C for examples.  It has historically made use of C++98 features including some support for strings, streams etc., and we might be able to directly copy some of its implementation choices (providing they make sense).

One of the nice aspects of Xerces-C is that it compiles like greased lightning due to not making heavy use of standard library headers.  It would be nice to retain that if possible.  We could possibly constrain use of streams to specific modules, for example.

On the maintainability side, I'd like to reduce the number of configuration
options to keep testing and support within reason.

Agree. One thing that I would like to keep is the ability to build
Xerces-C++ without any third-party library dependencies.

Yes, I think that point is well made, particularly when it comes to the network accessors, transcoders and the like.  For each of these, the choices should include a self-contained option, or none at all.

* We have three message loaders and three sets of translations for en_US,
but no other translations. Some or all of them might be worth thinking
about dropping given the complete lack of utility these provide.

To me keeping only ICU and inmemory sounds like the way to go.

Agreed.

* We have several network accessors. But with the modern push for using
HTTPS everywhere, should Xerces be providing its own or should we simply
require CURL or platform-specific functionality?

Yes, I believe there should be only two options: no network support and
CURL.

There are also several transcoder options. Again, I think we should
only keep the built-in stuff and ICU.

These both make sense.  For the networking side, I think we could also argue the case for the macOS accessor (cfurl), so long as it supports SSL.  The ones I think we should drop are the two direct socket implementations (unix socket and win32 socket).

Finally, I should note that while the above might look quite disruptive, I'm
not suggesting any sort of API breakage at this point.

I agree. I think we should be mindful of migration efforts that will be
required on the user's side.

Absolutely.  Many users of Xerces-C have well established codebases, and we should take care not to break them.  I think every change suggested here should be possible without making any API break.

The only possible source of breakage is the XMLCh switch, and that's not really a break at all given how flexible that type is today: portable code already handles it being of varying type.

Another thing worth discussing is which C++ standard we should target. I
think at a minimum C++11 but perhaps we should be bold and aim a bit
higher?

In fact, IMO, talking about targeting a C++ standard like C++11 or C++14
is not very useful since every major C++ compiler (GCC, Clang, MSVC)
completes support for the next standard over multiple releases. As
a result, while compilers may not have complete support, they often
include a perfectly usable subset of the features.

In this light, what we found more useful is to specify the minimum
versions of the three major compilers that we are willing to support
and any features that are available in all three are fair game.

I would consider C++11 a minimum, but I would prefer C++14 as the baseline if possible.  In practice, most compilers supporting C++11 also support a useful subset of C++14.

I agree that picking the minimum version of MSVC, GCC and LLVM is a useful means of determining the subset of features which are permitted.

I should note that when Xerces-C 3.2 was released, and included in the FreeBSD ports, I built it with C++11 and XMLCh=char16_t and ensured that every (packaged) open source user of Xerces-C++ was capable of being built both with C++11 and with char16_t [amberfish, apache-xml-security-c, cegui, enigma, freecad, gdal, glest, kdepim, libepp-nicbr, libkolabxml, opensaml, passwordsafe, pktanon, qbox, qgis, qgis-ltr, shibboleth-sp, sumo, traingame, xalan-c, xmlcopyeditor, xmltooling, xsd, zorba].  Every patch has been submitted and incorporated upstream as far as I'm aware. This means that making the switch to C++11 or C++14 should be completely transparent for most Xerces-C users.

In another project I am working on (build2) we had good results with
picking GCC 4.9, Clang 3.7, and MSVC 14u3 and getting a very usable
subset of C++14 (including move capture and generic lambdas).

I'm on slightly more recent versions, as are the AppVeyor and Travis CI builds, but so long as we can agree on the required C++ subset we can identify the minimum version requirement.
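
For reference, the two C++14 features mentioned above look roughly like this.  It is a small standalone sketch with hypothetical function names, unrelated to the Xerces-C++ sources, and it needs a compiler in C++14 mode or later.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Generic lambda (C++14): the parameter type is deduced at each call,
// so the same lambda works for std::string and for int.
std::string genericLambdaDemo()
{
    auto twice = [](auto x) { return x + x; };
    return twice(std::string("ab")) + std::to_string(twice(21));
}

// Move capture (C++14 init-capture): the closure takes ownership of a
// move-only object, which a C++11 lambda cannot capture directly.
int moveCaptureDemo()
{
    auto p = std::make_unique<int>(42);
    auto f = [q = std::move(p)] { return *q; };
    assert(p == nullptr); // ownership has moved into the closure
    return f();
}
```

Here genericLambdaDemo() returns "abab42" and moveCaptureDemo() returns 42.
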
Another issue that we will need to decide on is which standard we
are going to build for (at least by default) and whether we will
be making it configurable. The problem here is that there is no
guarantee that code built for different standards is ABI-compatible
(and there are cases where the C++ standard itself broke this
compatibility). As a result, the only sure way to avoid surprises
is to build everything (Xerces-C++ and the application that uses
it) for the same standard.

For example, in build2 by default we use the latest available
standard for any given compiler/version but there is also a way to
override it for the entire build configuration. I am not sure if
CMake has anything like this.

It does, see https://cmake.org/cmake/help/latest/prop_tgt/CXX_STANDARD.html

We currently have it set like this: https://github.com/apache/xerces-c/blob/master/CMakeLists.txt#L37 :

   # Try C++17, then fall back to C++14, C++11 then C++98.  Used for
   # feature tests for optional features.
   set(CMAKE_CXX_STANDARD 17)

CMAKE_CXX_STANDARD sets the requested standard version.  If unavailable, it will fall back to earlier standards.  Since Xerces-C++ doesn't have a minimum standard at present, we allow it to fall back without restriction and then do specific feature tests to see what's available.  If we require C++14, we will need to add

   set(CMAKE_CXX_STANDARD 14)

   set(CMAKE_CXX_STANDARD_REQUIRED ON)

in order to mandate C++14.  It will then fail if it is not possible to achieve this.

What I do in my projects is something like this:

   # Prefer C++17 support
   if(NOT CMAKE_CXX_STANDARD)
     set(CMAKE_CXX_STANDARD 17)
     set(CMAKE_CXX_STANDARD_REQUIRED FALSE)
   endif()

This is to permit the user to override it, and only default if unset.


Kind regards,

Roger
