[jira] [Commented] (XERCESC-2206) Use char16_t and unicode literals to replace various XMLCh types

2020-06-11 Thread Scott Cantor (Jira)


[ 
https://issues.apache.org/jira/browse/XERCESC-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17133718#comment-17133718
 ] 

Scott Cantor commented on XERCESC-2206:
---

I think that would be great (for me); I just didn't want to speak for whoever 
is doing the work as to whether it's worth doing without releasing the result.

I'm well aware of the nightmare of those 16-bit APIs, believe me. It is 
*mostly* possible to create a wide-string specialization for XMLCh, though it's 
not 100% C++ compliant due to locales. It will certainly be hugely welcome to 
have it for real once I can move to it.
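
To make the "wide-string specialization" idea concrete, here is a minimal sketch, assuming a platform where XMLCh is an unsigned 16-bit integer rather than char16_t. The names XMLCh16, XMLChTraits, XMLChString and from_ascii are hypothetical and are not taken from Xerces-C++. The standard only supplies char_traits for the built-in character types, so the sketch brings its own traits class. It does not supply stream or locale facets for the type, which is presumably the compliance gap referred to above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <cwchar>   // std::mbstate_t
#include <ios>      // std::streamoff, std::streampos
#include <string>

// Hypothetical stand-in for a platform where XMLCh is not char16_t.
typedef unsigned short XMLCh16;

// Minimal character traits for XMLCh16 (illustrative, not from Xerces-C++).
struct XMLChTraits {
    typedef XMLCh16        char_type;
    typedef unsigned int   int_type;   // holds every char_type value plus eof()
    typedef std::streamoff off_type;
    typedef std::streampos pos_type;
    typedef std::mbstate_t state_type;

    static void assign(char_type& d, const char_type& s) { d = s; }
    static bool eq(char_type a, char_type b) { return a == b; }
    static bool lt(char_type a, char_type b) { return a < b; }

    static int compare(const char_type* a, const char_type* b, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i) {
            if (lt(a[i], b[i])) return -1;
            if (lt(b[i], a[i])) return 1;
        }
        return 0;
    }
    static std::size_t length(const char_type* s) {
        std::size_t n = 0;
        while (s[n] != 0) ++n;
        return n;
    }
    static const char_type* find(const char_type* s, std::size_t n, const char_type& c) {
        for (std::size_t i = 0; i < n; ++i)
            if (eq(s[i], c)) return s + i;
        return 0;
    }
    static char_type* move(char_type* d, const char_type* s, std::size_t n) {
        if (n != 0) std::memmove(d, s, n * sizeof(char_type));
        return d;
    }
    static char_type* copy(char_type* d, const char_type* s, std::size_t n) {
        if (n != 0) std::memcpy(d, s, n * sizeof(char_type));
        return d;
    }
    static char_type* assign(char_type* d, std::size_t n, char_type c) {
        for (std::size_t i = 0; i < n; ++i) d[i] = c;
        return d;
    }

    static int_type eof() { return static_cast<int_type>(-1); }
    static int_type not_eof(int_type c) { return c == eof() ? 0 : c; }
    static char_type to_char_type(int_type c) { return static_cast<char_type>(c); }
    static int_type to_int_type(char_type c) { return c; }
    static bool eq_int_type(int_type a, int_type b) { return a == b; }
};

typedef std::basic_string<XMLCh16, XMLChTraits> XMLChString;

// Widen a plain ASCII string one byte at a time (no real transcoding).
XMLChString from_ascii(const char* s) {
    XMLChString out;
    for (; *s != '\0'; ++s)
        out.push_back(static_cast<XMLCh16>(static_cast<unsigned char>(*s)));
    return out;
}

// Usage check: ordinary string operations work with the custom traits.
inline void xmlch_string_demo() {
    XMLChString s = from_ascii("xml");
    s += from_ascii("ns");
    assert(s.size() == 5);
    assert(s == from_ascii("xmlns"));
}
```

With XMLCh fixed to char16_t, none of this would be needed, since std::u16string already comes with standard traits.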

> Use char16_t and unicode literals to replace various XMLCh types
> 
>
> Key: XERCESC-2206
> URL: https://issues.apache.org/jira/browse/XERCESC-2206
> Project: Xerces-C++
> Issue Type: Bug
> Components: Miscellaneous
> Affects Versions: 3.3.0
> Reporter: Roger Leigh
> Assignee: Roger Leigh
> Priority: Major
> Fix For: 3.3.0
>
>
> Currently, XMLCh can be a variety of 16-bit types depending upon the 
> platform, from wchar_t, uint16_t, unsigned short, to char16_t.
> To reduce the platform-specific variability, fix XMLCh to char16_t, and also 
> permit the use of u"" unicode string literals in the codebase.  This will 
> allow replacement of Unicode constants with direct use of literals.
> This will additionally reduce the size of the test matrix with only one 
> character variant to test.
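
As a rough illustration of the replacement the issue describes, the sketch below declares the same constant twice: once as a hand-written array of code units and once as a u"" literal. It assumes XMLCh is fixed to char16_t as proposed. The typedef is local to the example, and the constant and helper names are hypothetical, not taken from the Xerces-C++ sources.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: XMLCh is char16_t, as the issue proposes.
typedef char16_t XMLCh;

// Old style: every UTF-16 code unit spelled out by hand.
static const XMLCh kVersionOld[] = {
    0x0076, 0x0065, 0x0072, 0x0073, 0x0069, 0x006F, 0x006E, 0x0000  // "version"
};

// New style: the compiler produces the same code units from a literal.
static const XMLCh kVersionNew[] = u"version";

static_assert(sizeof(kVersionOld) == sizeof(kVersionNew),
              "both arrays hold 7 code units plus the terminator");

// Number of code units before the terminating zero.
std::size_t xmlch_length(const XMLCh* s) {
    std::size_t n = 0;
    while (s[n] != 0) ++n;
    return n;
}

// Compare two null-terminated XMLCh strings code unit by code unit.
bool xmlch_equal(const XMLCh* a, const XMLCh* b) {
    while (*a != 0 && *a == *b) { ++a; ++b; }
    return *a == *b;
}

// Usage check: the literal and the hand-written array are interchangeable.
inline void literal_demo() {
    assert(xmlch_equal(kVersionOld, kVersionNew));
    assert(xmlch_length(kVersionNew) == 7);
}
```

The literal form only compiles where XMLCh really is char16_t, which is why the issue couples the two changes.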



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: c-dev-unsubscr...@xerces.apache.org
For additional commands, e-mail: c-dev-h...@xerces.apache.org



[jira] [Commented] (XERCESC-2206) Use char16_t and unicode literals to replace various XMLCh types

2020-06-11 Thread Roger Leigh (Jira)


[ 
https://issues.apache.org/jira/browse/XERCESC-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17133691#comment-17133691
 ] 

Roger Leigh commented on XERCESC-2206:
--

There would certainly be no urgency in releasing what's on the master branch.  
We could potentially stage the changes there and leave it until next year, and 
maybe also queue up any breaking changes which couldn't be applied for 
compatibility reasons before now for 3.2.  This would permit us to do the work 
without committing to support two releases at the same time, if that would be 
acceptable?

In benchmarking my application code, I've found that over 50% of the total CPU 
time can end up being spent in transcoding. A big part of that is converting 
UTF-8 to UTF-16 as input to Xerces-C++, with more spent converting the output 
back again.  If it were possible, I'd find much more value in UTF-8 end-to-end 
without involving UTF-16 or UTF-32.  But being able to use UTF-16 literals and 
std::u16string directly would reduce the overhead by a fairly significant amount.
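
For a sense of the work involved in each such conversion, here is a minimal sketch of a UTF-8 to UTF-16 decoder. It is not the Xerces-C++ transcoder and says nothing about where the measured time actually goes; the function name utf8_to_utf16 is hypothetical. An application that keeps its strings as UTF-8 has to run a loop of this kind over every string on the way in, and the inverse on the way out.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Decode UTF-8 into UTF-16 code units. Malformed input becomes U+FFFD.
// Simplification: overlong encodings are not rejected.
std::u16string utf8_to_utf16(const std::string& in) {
    std::u16string out;
    out.reserve(in.size());  // UTF-16 never needs more units than UTF-8 bytes
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp = 0xFFFD;   // replacement character unless decoded below
        std::size_t len = 1;
        if (lead < 0x80)                { cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; len = 2; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; len = 3; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; len = 4; }

        // Fold in the continuation bytes, each of the form 10xxxxxx.
        bool ok = (i + len <= in.size());
        for (std::size_t k = 1; ok && k < len; ++k) {
            const unsigned char cont = static_cast<unsigned char>(in[i + k]);
            if ((cont & 0xC0) != 0x80) ok = false;
            else cp = (cp << 6) | (cont & 0x3F);
        }
        if (!ok) { cp = 0xFFFD; len = 1; }  // skip one byte and resynchronise
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) cp = 0xFFFD;

        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Code points above the BMP are written as a surrogate pair.
            const char32_t v = cp - 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (v >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (v & 0x3FF)));
        }
        i += len;
    }
    return out;
}

// Usage check: ASCII passes through, multi-byte sequences collapse.
inline void transcode_demo() {
    assert(utf8_to_utf16("abc") == u"abc");
    assert(utf8_to_utf16("\xC3\xA9") == u"\u00E9");
}
```

Code that holds its strings as char16_t from the start skips this step entirely, which is the saving described above.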




[jira] [Commented] (XERCESC-2206) Use char16_t and unicode literals to replace various XMLCh types

2020-06-10 Thread Scott Cantor (Jira)


[ 
https://issues.apache.org/jira/browse/XERCESC-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17132655#comment-17132655
 ] 

Scott Cantor commented on XERCESC-2206:
---

I don't know that it's going to cause "pain", but what I'm trying to say is 
that it will necessitate maintaining two branches, when we are obviously having 
trouble finding bodies to maintain even one.

It's not easy for me to say that the benefit is worth it. In my experience, if 
you can't drop the old, there's little value in adding technical debt for the 
new.

I think what I would probably argue is that having staged patches that update 
the code may be "good enough" vs. actually shipping them and having to support 
both.




[jira] [Commented] (XERCESC-2206) Use char16_t and unicode literals to replace various XMLCh types

2020-06-10 Thread Roger Leigh (Jira)


[ 
https://issues.apache.org/jira/browse/XERCESC-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17132650#comment-17132650
 ] 

Roger Leigh commented on XERCESC-2206:
--

I certainly don't want to proceed with anything if it will cause you problems.  
My understanding from earlier conversations was that branching off xerces-3.2 
would unblock such changes on the master branch, so that they wouldn't be 
disruptive for the stable release.  Apologies if I've misunderstood.  The 
changes I have made so far on master bring the language requirement up to 
C++98, and some of these changes would raise it up to C++11.  If we need to 
discuss that further e.g. on the mailing list with a wider audience or in 
person, I'll be happy to do so.  The intent is to make the library easier and 
more convenient to use, rather than causing unnecessary pain.

Some of the tickets I've opened, such as XERCESC-2204, are more for discussion 
than any immediate action.  The overriding intent for these is to minimise the 
combinatorial explosion of interacting configuration options, both to maximise 
test coverage and to minimise maintenance, so that an end user is not going to 
get an untested combination.




[jira] [Commented] (XERCESC-2206) Use char16_t and unicode literals to replace various XMLCh types

2020-06-10 Thread Scott Cantor (Jira)


[ 
https://issues.apache.org/jira/browse/XERCESC-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17130676#comment-17130676
 ] 

Scott Cantor commented on XERCESC-2206:
---

Installing both sets of headers is probably optional; I think it's enough to 
just allow side-by-side installation of the library, which would work for 3.2 
and 3.3 (the minor version is embedded into the library name, wisdom of that 
decision notwithstanding).

I would agree that it's not likely to be a major hassle to maintain both, I'm 
just noting I would have to for a while (independently of whether I'm doing it 
officially or not).




[jira] [Commented] (XERCESC-2206) Use char16_t and unicode literals to replace various XMLCh types

2020-06-10 Thread Roger Leigh (Jira)


[ 
https://issues.apache.org/jira/browse/XERCESC-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17130670#comment-17130670
 ] 

Roger Leigh commented on XERCESC-2206:
--

None of this is intended to cause any *API* break.  But if there's any question 
of needing it to be parallel installable, we could make this preparatory work 
for a version 4.0 so that distributors can have xerces-c3 and xerces-c4 
co-installable.

For the few bugfixes and security updates which need applying, it won't be too 
much work to apply them to both branches.

My intent here is to make Xerces-C++ viable for the longer term by making it 
transparently usable with contemporary compilers and standard libraries from 
the last decade, rather than have it stuck in the pre-Standard mid-90s-era 
which is mutually incompatible with the rest of the C++ world.  I am not 
suggesting using C++20 or C++17, or anything remotely bleeding edge.  We would 
be using a strictly limited subset of features, which currently would be: 
char16_t, thread and eventually optional use of streams and the standard 
exception types, and possibly std::u16string.

None of those changes are intended to cause any breakage for existing C++ code 
using Xerces-C++.  The char16_t change should be entirely transparent since 
it's already in 3.2.x if the compiler supports it, and is tested against most 
open source Xerces-C++ users.  What it will enable is the guaranteed ability to 
utilise unicode literals and string literals in code calling Xerces which will 
make applications simpler and more readable.




[jira] [Commented] (XERCESC-2206) Use char16_t and unicode literals to replace various XMLCh types

2020-06-10 Thread Scott Cantor (Jira)


[ 
https://issues.apache.org/jira/browse/XERCESC-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17130648#comment-17130648
 ] 

Scott Cantor commented on XERCESC-2206:
---

I suppose maintaining two branches is an option but it's not an ideal state 
certainly and doesn't really get the project into a better state unless I just 
fork.




[jira] [Commented] (XERCESC-2206) Use char16_t and unicode literals to replace various XMLCh types

2020-06-10 Thread Roger Leigh (Jira)


[ 
https://issues.apache.org/jira/browse/XERCESC-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17130630#comment-17130630
 ] 

Roger Leigh commented on XERCESC-2206:
--

I have deliberately set the version to 3.3 so this is solely for the "master" 
branch and not intended to disrupt the "xerces-3.2" branch; likewise for all 
the issues created today.




[jira] [Commented] (XERCESC-2206) Use char16_t and unicode literals to replace various XMLCh types

2020-06-10 Thread Scott Cantor (Jira)


[ 
https://issues.apache.org/jira/browse/XERCESC-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17130605#comment-17130605
 ] 

Scott Cantor commented on XERCESC-2206:
---

Same as the other issue: too many of my platforms lack such support for that 
to be possible right now.
