Re: Adding new IBM extended charsets

Nasser Ebrahim Fri, 06 Jul 2018 06:58:28 -0700

Hi Sherman,

Thank you for your valuable input. I would like to clarify some of your
points.

> I would assume the best option might be (3), which keeps only those
> IBM charsets that might be used/configured as the default charset for
> VM startup on IBM platforms, and leaves the rest to the ICU4J charsets
> provider.

I understand that your preferred option is 3 [Remove all extended charsets
from JDK (keep only default charsets) and use the extended charsets from
third party like ICU4J]. Just to confirm: do you mean that we should keep
only the standard charsets in the JDK, remove all the extended charsets,
and use them from ICU4J, or do you mean applying that only to the new
extended charsets? I think it is better to stay consistent: either take
all extended charsets from ICU4J or maintain all extended charsets in the
JDK. Keeping some extended charsets in the JDK and using ICU4J for the
others may confuse Java users.

> There are hundreds of IBM-specific charsets, in various versions, that
> probably are not going to be used by most Java developers in most
> normal use scenarios. I would assume it's NOT in our best interest to
> implement, test and support/maintain them in the openjdk project/repo.

As I explained in the previous note, the IBM charsets are not applicable
only to IBM platforms. They can apply to any platform on which Java has to
access an application or database on an IBM platform. For example, a Java
application on Linux or Windows requires EBCDIC code pages when it uses
JDBC to access DB2 on z/OS.
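
As a rough illustration of that point, the sketch below checks at run time
whether a named charset can be resolved by the running JDK and encodes a
string with it only if so. The class and method names (CharsetProbe,
encodeIfSupported) are hypothetical, and IBM1047 is only an example name;
whether it resolves depends on which charset providers a given JDK build
includes.

```java
import java.nio.charset.Charset;
import java.util.Optional;

// Hypothetical helper for illustration; not part of any JDK API.
final class CharsetProbe {

    // Returns the encoded bytes if the running JDK can resolve the charset
    // name, or an empty Optional if no installed provider supplies it.
    static Optional<byte[]> encodeIfSupported(String charsetName, String text) {
        if (!Charset.isSupported(charsetName)) {
            return Optional.empty();
        }
        return Optional.of(text.getBytes(Charset.forName(charsetName)));
    }

    public static void main(String[] args) {
        // "IBM1047" is an assumed example name; pass another name as args[0].
        String name = args.length > 0 ? args[0] : "IBM1047";
        Optional<byte[]> bytes = encodeIfSupported(name, "HELLO");
        System.out.println(name + (bytes.isPresent()
                ? " is available, encoded length " + bytes.get().length
                : " is not available in this JDK build"));
    }
}
```

An application that talks to a remote system in a specific code page could
run a check like this at startup and fail early with a clear message if the
charset is missing.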

To be precise, there are 75 charsets from the IBM JDK that are currently
missing in OpenJDK. Of those 75 charsets, 2 are default charsets on the
AIX platform. Of the remaining 73, some are default charsets on the z/OS
platform. We would like to contribute those 75 charsets to OpenJDK.

> For those to be kept in the openjdk repo, they could/should be separated
> from the extended charsets and kept in a separate package/module as well,
> and probably only configured to build into the binaries for IBM
> platforms, and maintained by the AIX port.

Yes. I agree it makes sense to keep all IBM charsets in a separate charset
provider/module and include them as needed to reduce the footprint. If
everybody agrees, we can start working on creating a separate charset
provider/module for IBM charsets in the JDK. Please advise.
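
To make the idea of a separate provider more concrete, here is a minimal
sketch of a java.nio.charset.spi.CharsetProvider subclass. It is not the
proposed implementation: the class names and the charset name X-Sketch-IBM
are hypothetical, and the stand-in charset simply borrows the ISO-8859-1
coders so that the example runs without real IBM mapping tables.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;
import java.nio.charset.spi.CharsetProvider;
import java.util.Collections;
import java.util.Iterator;

// Hypothetical stand-in charset. A real IBM charset would supply its own
// encoder and decoder built from the code page's mapping tables.
final class SketchCharset extends Charset {
    SketchCharset() {
        super("X-Sketch-IBM", new String[] {"x-sketch-ibm-alias"});
    }

    @Override public boolean contains(Charset cs) {
        return cs instanceof SketchCharset;
    }

    @Override public CharsetDecoder newDecoder() {
        // Assumption for the sketch only: reuse the ISO-8859-1 decoder.
        return StandardCharsets.ISO_8859_1.newDecoder();
    }

    @Override public CharsetEncoder newEncoder() {
        // Assumption for the sketch only: reuse the ISO-8859-1 encoder.
        return StandardCharsets.ISO_8859_1.newEncoder();
    }
}

// Hypothetical provider that resolves the stand-in charset by name or alias.
// A provider loaded through the service mechanism would need to be a public
// class with a public no-argument constructor.
final class SketchIbmCharsetProvider extends CharsetProvider {
    private final Charset sketch = new SketchCharset();

    @Override public Iterator<Charset> charsets() {
        return Collections.singletonList(sketch).iterator();
    }

    @Override public Charset charsetForName(String name) {
        if (name.equalsIgnoreCase(sketch.name())) {
            return sketch;
        }
        for (String alias : sketch.aliases()) {
            if (name.equalsIgnoreCase(alias)) {
                return sketch;
            }
        }
        return null; // not one of ours; other providers may still know it
    }
}
```

A module that ships such a provider would register it in its
module-info.java with a "provides java.nio.charset.spi.CharsetProvider
with ..." clause, which is what lets Charset.forName find the charsets only
when that module is present.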

Thank you,
Nasser Ebrahim

From:   Xueming Shen <xueming.s...@oracle.com>
To:     core-libs-dev@openjdk.java.net
Date:   07/05/2018 10:14 PM
Subject:        Re: Adding new IBM extended charsets
Sent by:        "core-libs-dev" <core-libs-dev-boun...@openjdk.java.net>

On 7/4/18, 5:41 AM, Nasser Ebrahim wrote:
> Hello,
>
> I am starting this mail thread to discuss adding new IBM extended
> charsets. The question is whether we need to add the new extended
> charsets to jdk.charsets or to a new separate charset provider/module
> like jdk.ibmcharsets. This discussion continues the suggestion from
> Alan Bateman in the mail chain:
> http://mail.openjdk.java.net/pipermail/core-libs-dev/2018-May/053316.html
>
>
> I am copying his input from that mail thread to start the discussion:
> "I think we should start a discussion here about moving some or all of
> the IBM charsets to their own service provider module. I realize the AIX
> port might want to include some of them in its build of java.base but
> they aren't interesting to include in java.base, or even jdk.charsets,
> on most platform"
>
> First, let me clarify whether IBM charsets are applicable only to IBM
> platforms like AIX or to other platforms as well. All IBM charsets are
> applicable to any platform, including Linux and Windows, if that
> platform needs to communicate with an application or database on an
> IBM platform like AIX. That is the reason we traditionally add them to
> jdk.charsets. However, we agree with Alan that those IBM charsets are
> not required if the JDK is not communicating with any
> applications/databases on IBM platforms. Hence, it makes sense to
> consider a separate charset provider/module for IBM charsets and use
> build parameters to decide whether to generate the new charset provider
> for a given platform.
>
> Let me list out all the possible options I can think of for adding new
> extended charsets so that we can discuss and decide which is the best
> option.
>
> 1) Continue to add new extended charsets to jdk.charsets.
> The advantage of this approach is that there is no need to add a new
> charset provider and all extended charsets are placed in one module.
> Also, any extended charset is applicable to any platform that needs to
> communicate with an application/database on a different platform. The
> disadvantage is that the number of charsets in jdk.charsets keeps
> increasing and bloats its size. Also, many of those charsets may not be
> used in the lifetime of the JDK unless it is communicating with
> applications/databases on those platforms.
>
> 2) Create a new charset provider and module (say jdk.ibmcharsets) for
> all IBM charsets and include the new module in the JDK as needed.
> The advantage of this approach is that the footprint of jdk.charsets
> can be reduced and the new module is included only if it is required.
> The disadvantage is that a new charset provider needs to be created.
> Also, extended charsets will be located in two different modules, and
> often both modules are required.
>
> 3) Remove all extended charsets from JDK (keep only default charsets)
> and use the extended charsets from a third party like ICU4J.
> I believe this option may have been discussed in the past and there may
> be a valid reason not to pursue it. I am still listing it to ensure
> that we have considered this option as well. The advantage of this
> approach is that we can avoid maintaining the same charsets in two
> different open source communities. The disadvantage is that the release
> cycles of the two communities may differ and we may need to maintain
> the level ourselves for LTS releases, as we may not want to change the
> specification in a service stream.
>
> Please share your thoughts on your preferred option and list any other
> options that I missed. Thank you for your time.
>
>
Hi Nasser,

I would assume the best option might be (3), which keeps only those
IBM charsets that might be used/configured as the default charset for
VM startup on IBM platforms, and leaves the rest to the ICU4J charsets
provider.

There are hundreds of IBM-specific charsets, in various versions, that
probably are not going to be used by most Java developers in most
normal use scenarios. I would assume it's NOT in our best interest to
implement, test and support/maintain them in the openjdk project/repo.
Since the jar is in the Maven repo already, any request for support of
those charsets can be pointed there accordingly. I did not know there was
such a package/jar in Maven. I did check ICU4J years ago and there was no
NIO provider interface provided back then.

For those to be kept in the openjdk repo, they could/should be separated
from the extended charsets and kept in a separate package/module as well,
and probably only configured to build into the binaries for IBM
platforms, and maintained by the AIX port.

Thanks,
Sherman
