[jira] [Commented] (XERCESC-2206) Use char16_t and unicode literals to replace various XMLCh types
[ https://issues.apache.org/jira/browse/XERCESC-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17133718#comment-17133718 ]

Scott Cantor commented on XERCESC-2206:
---

I think that would be great (for me); I just didn't want to speak for somebody doing the work as to whether it's worth doing it but not releasing the result.

I'm well aware of the nightmare of those 16-bit APIs, believe me. It is *mostly* possible to create a wide-string specialization for XMLCh, though it's not 100% C++ compliant due to locales. It will certainly be hugely welcome to have it for real once I can move to it.

> Use char16_t and unicode literals to replace various XMLCh types
>
> Key: XERCESC-2206
> URL: https://issues.apache.org/jira/browse/XERCESC-2206
> Project: Xerces-C++
> Issue Type: Bug
> Components: Miscellaneous
> Affects Versions: 3.3.0
> Reporter: Roger Leigh
> Assignee: Roger Leigh
> Priority: Major
> Fix For: 3.3.0
>
> Currently, XMLCh can be a variety of 16-bit types depending upon the
> platform, from wchar_t, uint16_t, unsigned short, to char16_t.
> To reduce the platform-specific variability, fix XMLCh to char16_t, and also
> permit the use of u"" unicode string literals in the codebase. This will
> allow replacement of Unicode constants with direct use of literals.
> This will additionally reduce the size of the test matrix with only one
> character variant to test.

--
This message was sent by Atlassian Jira (v8.3.4#803005)

To unsubscribe, e-mail: c-dev-unsubscr...@xerces.apache.org
For additional commands, e-mail: c-dev-h...@xerces.apache.org
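As a rough illustration of what the quoted issue description proposes, the sketch below contrasts a constant spelled out as numeric code units (the only portable option while XMLCh may be `wchar_t`, `uint16_t` or `unsigned short`) with the same constant written as a `u""` literal. The `XMLCh` alias here is an assumption made for the example; in Xerces-C++ the type comes from the library headers. The names `kNameOld`, `kNameNew`, `unitLength` and `sameString` are hypothetical and not part of any Xerces-C++ API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Assumption for this sketch only: XMLCh is fixed to char16_t, as the issue proposes.
using XMLCh = char16_t;

// Portable-across-typedefs style: one numeric UTF-16 code unit per element ("item").
static const XMLCh kNameOld[] = { 0x0069, 0x0074, 0x0065, 0x006D, 0x0000 };

// Literal style, available once XMLCh is guaranteed to be char16_t.
static const XMLCh kNameNew[] = u"item";

// Hypothetical helper: length of a null-terminated XMLCh string in code units.
std::size_t unitLength(const XMLCh* s) {
    std::size_t n = 0;
    while (s[n] != 0) {
        ++n;
    }
    return n;
}

// Hypothetical helper: compare two null-terminated XMLCh strings.
bool sameString(const XMLCh* a, const XMLCh* b) {
    return std::u16string(a) == std::u16string(b);
}
```

Both arrays hold the same code units, so the literal form is a readability change rather than a behavioural one.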
[jira] [Commented] (XERCESC-2206) Use char16_t and unicode literals to replace various XMLCh types
[ https://issues.apache.org/jira/browse/XERCESC-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17133691#comment-17133691 ]

Roger Leigh commented on XERCESC-2206:
---

There would certainly be no urgency in releasing what's on the master branch. We could potentially stage the changes there and leave it until next year, and maybe also queue up any breaking changes which couldn't be applied to 3.2 until now for compatibility reasons. This would permit us to do the work without committing to support two releases at the same time, if that would be acceptable?

In benchmarking my application code, I've found that over 50% of the total CPU time could end up spent in transcoding, and a big part of that was conversion of UTF-8 to UTF-16 as input to Xerces-C++, and then more for reconversion of the output. If it were possible, I'd find much more value in UTF-8 end-to-end without involving UTF-16 or UTF-32. But being able to use UTF-16 literals and std::u16string directly would reduce the overheads by a fairly significant amount.
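To make the transcoding cost mentioned in the comment above more concrete, here is a minimal sketch of the per-byte work a UTF-8 to UTF-16 conversion involves. It is not the transcoder Xerces-C++ uses; `utf8ToUtf16` is a hypothetical function written for this example, and it assumes well-formed UTF-8 input (it performs no validation and simply stops on a truncated sequence).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: convert UTF-8 to UTF-16.
// Assumption: the input is well-formed UTF-8; no validation is done.
std::u16string utf8ToUtf16(const std::string& in) {
    std::u16string out;
    out.reserve(in.size());  // upper bound: never more code units than input bytes
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char b0 = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t len = 0;
        // The lead byte determines the sequence length and the initial payload bits.
        if (b0 < 0x80)                { cp = b0;        len = 1; }
        else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1F; len = 2; }
        else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; len = 3; }
        else                          { cp = b0 & 0x07; len = 4; }
        if (i + len > in.size()) {
            break;  // truncated sequence at end of input: stop
        }
        // Each continuation byte contributes six payload bits.
        for (std::size_t k = 1; k < len; ++k) {
            cp = (cp << 6) | (static_cast<unsigned char>(in[i + k]) & 0x3F);
        }
        i += len;
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Code points above the BMP become a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

Every input string passes through a loop like this on the way in, and through the reverse on the way out, which is the overhead that UTF-16 literals and `std::u16string` would avoid for strings known at compile time.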
[jira] [Commented] (XERCESC-2206) Use char16_t and unicode literals to replace various XMLCh types
[ https://issues.apache.org/jira/browse/XERCESC-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17132655#comment-17132655 ]

Scott Cantor commented on XERCESC-2206:
---

I don't know that it's going to cause "pain", but I'm trying to say that it will necessitate maintaining two branches to do this, when we obviously are having trouble finding bodies to maintain even one. It's not easy for me to say that the benefit is worth it. In my experience, if you can't drop the old, there's little value in adding technical debt for the new.

I think what I would probably argue is that having staged patches that update the code may be "good enough" vs. actually shipping them and having to support both.
[jira] [Commented] (XERCESC-2206) Use char16_t and unicode literals to replace various XMLCh types
[ https://issues.apache.org/jira/browse/XERCESC-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17132650#comment-17132650 ]

Roger Leigh commented on XERCESC-2206:
---

I certainly don't want to proceed with anything if it will cause you problems. My understanding from earlier conversations was that branching off xerces-3.2 would unblock such changes on the master branch, so that they wouldn't be disruptive for the stable release. Apologies if I've misunderstood.

The changes I have made so far on master bring the language requirement up to C++98, and some of these changes would raise it up to C++11. If we need to discuss that further, e.g. on the mailing list with a wider audience or in person, I'll be happy to do so. The intent is to make the library easier and more convenient to use, rather than causing unnecessary pain.

Some of the tickets I've opened, such as XERCESC-2204, are more for discussion than any immediate action. The overriding intent for these is to minimise the combinatorial explosion of interacting configuration options, to maximise test coverage and to minimise the maintenance required, so that an end user is not going to get an untested combination.
[jira] [Commented] (XERCESC-2206) Use char16_t and unicode literals to replace various XMLCh types
[ https://issues.apache.org/jira/browse/XERCESC-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17130676#comment-17130676 ]

Scott Cantor commented on XERCESC-2206:
---

Installing both sets of headers is probably optional; I think it's enough to just allow side-by-side install of the library, which would work for 3.2 and 3.3 (the minor version is embedded into the library name, wisdom of that decision notwithstanding).

I would agree that it's not likely to be a major hassle to maintain both; I'm just noting I would have to for a while (independently of whether I'm doing it officially or not).
[jira] [Commented] (XERCESC-2206) Use char16_t and unicode literals to replace various XMLCh types
[ https://issues.apache.org/jira/browse/XERCESC-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17130670#comment-17130670 ]

Roger Leigh commented on XERCESC-2206:
---

None of this is intended to cause any *API* break. But if there's any question of needing it to be parallel installable, we could make this preparatory work for a version 4.0 so that distributors can have xerces-c3 and xerces-c4 co-installable. For the few bugfixes and security updates which need applying, it won't be too much work to apply them to both branches.

My intent here is to make Xerces-C++ viable for the longer term by making it transparently usable with contemporary compilers and standard libraries from the last decade, rather than have it stuck in the pre-Standard mid-90s era, which is mutually incompatible with the rest of the C++ world. I am not suggesting using C++20 or C++17, or anything remotely bleeding edge. We would be using a strictly limited subset of features, which currently would be: char16_t, thread, and eventually optional use of streams and the standard exception types, and possibly std::u16string.

None of those changes are intended to cause any breakage for existing C++ code using Xerces-C++. The char16_t change should be entirely transparent since it's already in 3.2.x if the compiler supports it, and is tested against most open source Xerces-C++ users. What it will enable is the guaranteed ability to utilise unicode literals and string literals in code calling Xerces, which will make applications simpler and more readable.
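A small sketch of what calling code might look like once `u""` literals and `std::u16string` can be passed straight through, as the comment above describes. The `XMLCh` alias is an assumption for the example, and `countColons` is a hypothetical stand-in for any function that accepts a null-terminated `const XMLCh*`; it is not a Xerces-C++ API. `makeQName` is likewise a made-up helper.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Assumption for this sketch only: XMLCh is char16_t.
using XMLCh = char16_t;

// Hypothetical stand-in for a library function taking a null-terminated
// XMLCh string. It counts ':' separators and is NOT a real Xerces-C++ call.
std::size_t countColons(const XMLCh* qname) {
    std::size_t n = 0;
    for (; *qname != 0; ++qname) {
        if (*qname == u':') {
            ++n;
        }
    }
    return n;
}

// Hypothetical helper: build a qualified name from literals and
// std::u16string values, with no transcoding step in between.
std::u16string makeQName(const std::u16string& prefix,
                         const std::u16string& local) {
    return prefix + u":" + local;
}
```

With these definitions, `countColons(makeQName(u"xs", u"element").c_str())` passes a standard-library string directly to a `const XMLCh*` parameter, and `countColons(u"plain")` passes a literal; neither needs a conversion helper.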
[jira] [Commented] (XERCESC-2206) Use char16_t and unicode literals to replace various XMLCh types
[ https://issues.apache.org/jira/browse/XERCESC-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17130648#comment-17130648 ]

Scott Cantor commented on XERCESC-2206:
---

I suppose maintaining two branches is an option, but it's certainly not an ideal state and doesn't really get the project into a better state unless I just fork.
[jira] [Commented] (XERCESC-2206) Use char16_t and unicode literals to replace various XMLCh types
[ https://issues.apache.org/jira/browse/XERCESC-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17130630#comment-17130630 ]

Roger Leigh commented on XERCESC-2206:
---

I have deliberately set the version to 3.3 so this is solely for the "master" branch and not intended to disrupt the "xerces-3.2" branch; likewise for all the issues created today.
[jira] [Commented] (XERCESC-2206) Use char16_t and unicode literals to replace various XMLCh types
[ https://issues.apache.org/jira/browse/XERCESC-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17130605#comment-17130605 ]

Scott Cantor commented on XERCESC-2206:
---

Same as the other issue: too many of my platforms lack such support for that to be possible right now.