Re: Adding new IBM extended charsets

2018-08-22 Thread Nasser Ebrahim
Sure Magnus. Thank you for your suggestion.

Once I am ready with the prototype, I will start the infrastructure-related
discussion on build-...@openjdk.java.net and continue the discussion about
streamlining the extended charsets here.

Thank you,
Nasser Ebrahim













Re: Adding new IBM extended charsets

2018-08-22 Thread Nasser Ebrahim
Hi Alan,

Thank you for your valuable inputs. I will start a discussion with the
ICU4J community to explore whether the compatibility and performance
differences can be resolved, so that we can use ICU4J for most of the
extended charsets and remove them from the JDK build. As we discussed
earlier, significant changes are required on the ICU4J side to resolve the
functional and performance differences before the JDK can consume it
directly, so this should be considered a long-term solution.

In the meantime, I can explore the other option you suggested: making the
IBM charsets specific to the AIX platform and optional on other platforms
through makefile changes. I will try to create a prototype of the make/src
file changes that generates the IBM charsets as a separate module, built
by default only on AIX and optional on other platforms.

Please let me know if you have any inputs.

Thank you,
Nasser Ebrahim









Re: Adding new IBM extended charsets

2018-08-22 Thread Magnus Ihse Bursie

On 2018-08-05 20:38, Alan Bateman wrote:


As regards the way forward then I think we have to put infrastructure
into the build to make it easy to allow specific charsets to be included
or excluded from specific platforms. As things stand, and as you have
found with your updates to the stdcs- files, the charsets are generated
to be included in either java.base or jdk.charsets. We need another input
to the configurability to make it possible to include or exclude so that
the mainstream platforms do not have to include the IBM charsets. There
are several details around this, particularly around aliases, but if we
can get that done then we have a lot of flexibility.


And when you are ready to discuss the required build changes, please 
head over to build-...@openjdk.java.net. ;-)


The build part of the charset handling is less-than-ideal. I hope we can 
find a way to clean this up as well, as part of streamlining the 
included charsets.


/Magnus




Re: Adding new IBM extended charsets

2018-08-06 Thread Florian Weimer

On 07/06/2018 02:23 PM, Nasser Ebrahim wrote:

Hi Florian,

Thank you for your response. iconv is platform dependent and not good for
the platform agnostic nature of Java. Also, many charsets in Java are not
available across platforms. I believe Java decided to have its own
charsets due to those reasons so that it can work seamlessly on any
supported platforms.


But then the tables will occasionally be different from what the
platform uses for conversion.  Wouldn't compatibility with the rest
of the system installation be preferred in this context?


Thanks,
Florian


Re: Adding new IBM extended charsets

2018-08-05 Thread Alan Bateman

On 24/07/2018 09:56, Nasser Ebrahim wrote:

Thank you Martin, Sherman and Alan for your valuable inputs.

I have done some initial analysis on the ICU4J. There are some
compatibility issues on the ICU4J charsets with JDK charsets but I am
more concerned about its performance as the JDK optimizations do not
exist in that implementation. I think we need to work with the ICU4J
community to resolve those issues before we remove those charsets from
the JDK.

If you can work with the ICU4J project on these issues then I think we
have a way forward. An additional issue with their downloads is that
they target JDK 6 and don't seem to have thought about deploying as
modules with JDK 9 or newer yet. Their downloads can be used as
automatic modules but it requires renaming their JAR files due to the
unusual naming that they use to encode the version string. A simple
Automatic-Module-Name attribute would make it easy for developers to
deploy their charset provider on the module path, and they can still
target JDK 6.
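
To illustrate the Automatic-Module-Name point, here is a minimal sketch
(not taken from the ICU4J project) that writes an otherwise empty JAR with
that manifest attribute and asks the module system what name it assigns.
The JAR file name, the module name com.example.charsets and the class
AutomaticModuleNameDemo are hypothetical and chosen only for the example.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.lang.module.ModuleDescriptor;
import java.lang.module.ModuleFinder;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.jar.Attributes;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

final class AutomaticModuleNameDemo {

    /** Writes a JAR containing only a manifest with Automatic-Module-Name. */
    static Path writeJar(Path dir, String fileName, String moduleName) {
        Manifest manifest = new Manifest();
        Attributes main = manifest.getMainAttributes();
        // A Manifest-Version entry is expected in a well-formed manifest.
        main.put(Attributes.Name.MANIFEST_VERSION, "1.0");
        main.putValue("Automatic-Module-Name", moduleName);
        Path jar = dir.resolve(fileName);
        try (OutputStream out = Files.newOutputStream(jar);
             JarOutputStream jos = new JarOutputStream(out, manifest)) {
            // No class files are needed for this demonstration.
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return jar;
    }

    /** Lets the module system scan the JAR and returns the resulting descriptor. */
    static ModuleDescriptor describe(Path jar) {
        return ModuleFinder.of(jar).findAll().iterator().next().descriptor();
    }

    /** Convenience wrapper: create the JAR in a temp directory and describe it. */
    static ModuleDescriptor demo(String fileName, String moduleName) {
        try {
            Path dir = Files.createTempDirectory("amn-demo");
            return describe(writeJar(dir, fileName, moduleName));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical file and module names, for illustration only.
        ModuleDescriptor d = demo("charset-provider-demo.jar", "com.example.charsets");
        System.out.println(d.name() + " automatic=" + d.isAutomatic());
    }
}
```

With the attribute present, the module name no longer depends on the JAR
file name, so no renaming is needed to place the JAR on the module path.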


As regards the way forward then I think we have to put infrastructure
into the build to make it easy to allow specific charsets to be included
or excluded from specific platforms. As things stand, and as you have
found with your updates to the stdcs- files, the charsets are
generated to be included in either java.base or jdk.charsets. We need
another input to the configurability to make it possible to include or
exclude so that the mainstream platforms do not have to include the IBM
charsets. There are several details around this, particularly around
aliases, but if we can get that done then we have a lot of flexibility.
My personal view is that we should work towards excluding the IBM
charsets from the mainstream platforms, starting with a cull of the
EBCDIC charsets. If the ICU4J project can get their issues sorted out in
a similar time frame then it makes for a simple migration story -- the
JDK includes the standard charsets and many additional charsets. If you
need others then download the ICU4J charset provider and deploy it on
your class path or module path.
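
As a rough sketch of what that migration story could look like from the
application side, code can check at run time whether a charset is available
(from the JDK or from any provider deployed on the class path or module
path) and fall back otherwise. The class CharsetLookup and its methods are
hypothetical helpers; only the java.nio.charset calls are standard API.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

final class CharsetLookup {

    /** Returns the named charset if the JDK or an installed provider supplies it. */
    static Optional<Charset> find(String name) {
        try {
            return Charset.isSupported(name)
                    ? Optional.of(Charset.forName(name))
                    : Optional.empty();
        } catch (IllegalArgumentException e) {
            // Also covers IllegalCharsetNameException for syntactically illegal names.
            return Optional.empty();
        }
    }

    /** Returns the named charset, or the given fallback when it is not available. */
    static Charset findOrDefault(String name, Charset fallback) {
        return find(name).orElse(fallback);
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "IBM1047";
        System.out.println(name + " -> "
                + find(name).map(Charset::name)
                            .orElse("not available; a charset provider would be needed"));
        System.out.println("fallback -> " + findOrDefault(name, StandardCharsets.UTF_8));
    }
}
```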


-Alan


Re: Adding new IBM extended charsets

2018-07-24 Thread Nasser Ebrahim
Thank you Martin, Sherman and Alan for your valuable inputs. 

I have done some initial analysis of ICU4J. There are some compatibility
issues between the ICU4J charsets and the JDK charsets, but I am more
concerned about performance, as the JDK optimizations do not exist in that
implementation. I think we need to work with the ICU4J community to
resolve those issues before we remove those charsets from the JDK.

The primary reason we are interested in contributing the charsets to
openjdk is so that Java users of all locales get a seamless experience when
they move between openjdk and other implementations. I agree it is good
from a footprint and maintenance perspective if we are able to reduce the
number of charsets.

I believe the maintenance effort for the charsets is usually small, as we
hardly make any changes to charsets once they are developed. Also, the
charsets are usually independent of each other and hence will usually not
affect Java users unless they are used. As more members of my team would
like to participate actively in the openjdk community, I hope maintenance
of any issues reported on the IBM charsets will not be a problem going
forward. As we discussed before, the footprint issue can be avoided if we
enable the IBM charsets as needed with a build flag.

As you advised, we can enable the IBM charsets only for the AIX platform by
default, and users can enable them on other platforms as needed. If all of
you agree, we can start working on moving all IBM charsets from
jdk.charsets to a different module, jdk.ibm.charsets, and enable them only
for the AIX platform by default. We can consider removing them from the JDK
in future if the community finds them to be an overhead or not adding
value.

Please advise. 

Thank you,
Nasser Ebrahim










Re: Adding new IBM extended charsets

2018-07-19 Thread Alan Bateman

On 19/07/2018 08:27, Xueming Shen wrote:

Hi Nasser,

From openjdk's perspective it would be preferred to direct developers to
use the charset implementation provided by IBM, or by a reliable third
party that has the appropriate knowledge, experience and resources to
support/maintain those charsets, such as the icu4j charset project. I have
been pulling the data from that huge icu-charset-data file and
implementing/maintaining them based on my best knowledge, but I'm sure
engineers from IBM or the icu project can probably do a much better job of
implementing/maintaining/updating those charsets going forward.

As a first step we can separate those IBM charsets from jdk.charsets into a
separate package somewhere and configure them to be built into java.base
and jdk.charsets, for the aix platform only. Then we can further discuss
the best way to handle/distribute those charsets that are not needed for
the java.base module (for vm startup). As I said, it would be ideal if we
can remove them from the openjdk repo/binaries completely and direct the
developer/user to use the icu4j charset provider for those encodings, when
needed. But given the possible compatibility concern, we might want to
phase this work out gradually in the next major release.

I agree and in terms of phasing then I don't think it would be too
disruptive if the EBCDIC charsets were dropped from jdk.charsets in JDK
12, at least on the mainstream platforms. As we've established in this
thread, the ICU4J project does seem to publish its charset provider to
Maven so there are alternatives for applications that really need these
charsets.


Nasser - do you do any testing with the ICU4J charsets? I quickly tried 
62.1 and it seems to work fine on the class path. I didn't check for any 
compatibility differences or compare the performance but maybe you have. 
It's a bit awkward to test this provider as an automatic module due to
the unusual naming of these JAR files. They may not have looked at
modules yet but the ability to link the icu4j.charsets module into a
run-time image seems something that people may want to do in the future.


-Alan


Re: Adding new IBM extended charsets

2018-07-19 Thread Xueming Shen

Hi Nasser,

From openjdk's perspective it would be preferred to direct developers to
use the charset implementation provided by IBM, or by a reliable third
party that has the appropriate knowledge, experience and resources to
support/maintain those charsets, such as the icu4j charset project. I have
been pulling the data from that huge icu-charset-data file and
implementing/maintaining them based on my best knowledge, but I'm sure
engineers from IBM or the icu project can probably do a much better job of
implementing/maintaining/updating those charsets going forward.

As a first step we can separate those IBM charsets from jdk.charsets into a
separate package somewhere and configure them to be built into java.base
and jdk.charsets, for the aix platform only. Then we can further discuss
the best way to handle/distribute those charsets that are not needed for
the java.base module (for vm startup). As I said, it would be ideal if we
can remove them from the openjdk repo/binaries completely and direct the
developer/user to use the icu4j charset provider for those encodings, when
needed. But given the possible compatibility concern, we might want to
phase this work out gradually in the next major release.

Thanks,
Sherman










Re: Adding new IBM extended charsets

2018-07-17 Thread Martin Buchholz
History:  I recall 15 years ago wondering why no one from IBM was
maintaining the IBM charsets in the jdk.  Today anything but UTF-* is
increasingly legacy, so the case for including them is weaker than back
then.  Probably today the IBM charsets are best maintained outside the jdk
sources as a third party package, especially considering that openjdk has
been kicking some other components like corba out of the nest.

But there is some synergy - some of the openjdk charset tests check
invariants that apply to all charset implementations.  I may have even
fixed some IBM charset code many years ago.


Re: Adding new IBM extended charsets

2018-07-17 Thread Nasser Ebrahim
Hi Alan,

Thank you for your inputs. I would like to clarify that not all of the IBM
charsets (IBM) in jdk.charsets are IBM platform specific charsets. For
example, only 43 of the 72 IBM charsets in jdk.charsets are EBCDIC or IBM
platform specific charsets. Similarly, many of the 75 charsets which we
would like to contribute are not EBCDIC charsets.
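
For anyone who wants to see which IBM-named charsets a particular JDK build
actually ships, a small sketch along these lines can list them. It assumes,
purely as a heuristic, that the charsets of interest have canonical names
starting with "IBM" or "x-IBM"; the counts it prints depend on the JDK build
and need not match the numbers quoted above. The class name IbmCharsetList
is hypothetical.

```java
import java.nio.charset.Charset;
import java.util.Collection;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

final class IbmCharsetList {

    /** Keeps names that start with "IBM" or "x-IBM" (case-insensitive), sorted. */
    static List<String> ibmNames(Collection<String> names) {
        return names.stream()
                .filter(n -> {
                    String u = n.toUpperCase(Locale.ROOT);
                    return u.startsWith("IBM") || u.startsWith("X-IBM");
                })
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // availableCharsets() covers the JDK charsets plus any installed providers.
        List<String> found = ibmNames(Charset.availableCharsets().keySet());
        System.out.println(found.size() + " IBM-named charsets available:");
        found.forEach(System.out::println);
    }
}
```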

I feel we should have a standard guideline for the extended charsets. If 
we are keeping the extended charsets in the JDK, then we may want to 
consider all ICU/IANA approved charsets in JDK. Otherwise, we may want to 
keep only the standard charsets in JDK and remove all the extended 
charsets so that all extended charsets can be taken from third party 
libraries like ICU4J.

If we decide to keep the extended charsets, then maybe we can classify
the extended charsets as ASCII and EBCDIC, with corresponding modules
jdk.ascii.charset and jdk.ebcdic.charset. Then, depending on the platform,
we can consider including either or both of the charset modules.

Please advise.

Thank you,
Nasser Ebrahim











Re: Adding new IBM extended charsets

2018-07-08 Thread Alan Bateman

On 06/07/2018 14:56, Nasser Ebrahim wrote:

:
I understood your preferred option is 3 [Remove all extended charsets from
JDK (keep only default charsets) and use the extended charsets from third
party like ICU4J]. Just to confirm, did you mean we need to keep only the
standard charsets in the JDK and remove all the extended charsets from JDK
and use them from ICU4J, or did you mean apply that only for the new
extended charsets? I think it is better to keep the consistency - either
take all extended charsets from ICU4J or maintain all extended charsets
with JDK. Keeping some extended charsets within JDK and using ICU4J for
other extended charsets may confuse the Java user.

I think the suggestion in Sherman's mail is to drop the 70 or so IBM
charsets from jdk.charsets. This will reduce the size of jdk.charsets
and eliminate the need to maintain these charsets (at least on non-AIX
builds). If developers need these charsets, say when connecting to a
database on an IBM system, then they can deploy the ICU4J provider on
the class path or module path.


I don't think the suggestion impacts the 11 IBM charsets in java.base on
non-AIX builds or the non-IBM charsets in jdk.charsets. There may be
opportunities to drop some of these but that can be looked at separately.


Also I don't think the suggestion impacts the additional 12 IBM charsets 
that are included in the AIX build of java.base at this time. From the 
review threads, it seems there are supported locales on AIX that map to 
these charsets so this is why they are in java.base.


-Alan.


Re: Adding new IBM extended charsets

2018-07-06 Thread Nasser Ebrahim
Hi Sherman,

Thank you for your valuable inputs. I would like to clarify some of your 
points.

> I would assume the best option might be (3), in which only keeps those
> ibm charsets that might be used/configured to be the default charset for
> vm startup on IBM platform, and leaves the rest to icu4j charsets
> provider.

I understood your preferred option is 3 [Remove all extended charsets from
JDK (keep only default charsets) and use the extended charsets from third
party like ICU4J]. Just to confirm, did you mean we need to keep only the
standard charsets in the JDK and remove all the extended charsets from JDK
and use them from ICU4J, or did you mean apply that only for the new
extended charsets? I think it is better to keep the consistency - either
take all extended charsets from ICU4J or maintain all extended charsets
with JDK. Keeping some extended charsets within JDK and using ICU4J for
other extended charsets may confuse the Java user.

> There are hundreds of IBM specific charsets, and with various versions
> that probably are not going to be used by most of the Java developers in
> most normal use scenarios. I would assume it's NOT in our best interest
> to implement, test and support/maintain them in openjdk project/repo.

As I explained in the previous note, the IBM charsets are not applicable
only to IBM platforms. They are applicable to any platform where Java has
to access an application or database on an IBM platform. For example, a
Java application on Linux or Windows may require EBCDIC code pages for
JDBC to access DB2 on z/OS.
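
To make the EBCDIC point concrete, here is a small sketch that encodes the
same text in US-ASCII and, if the running JDK provides it, in the EBCDIC
code page IBM037. The letter 'A' is 0x41 in ASCII but 0xC1 in code page 037,
which is why data exchanged with such systems needs the matching charset.
The class EbcdicDemo and the choice of IBM037 are illustrative assumptions;
the sketch does not involve JDBC or DB2.

```java
import java.nio.charset.Charset;

final class EbcdicDemo {

    /** Encodes text with the named charset; the caller must ensure it is supported. */
    static byte[] encode(String text, String charsetName) {
        return text.getBytes(Charset.forName(charsetName));
    }

    /** Formats bytes as space-separated upper-case hex pairs. */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "ABC";
        System.out.println("US-ASCII: " + hex(encode(text, "US-ASCII")));
        // IBM037 may be absent from a reduced run-time image, so check first.
        if (Charset.isSupported("IBM037")) {
            System.out.println("IBM037:   " + hex(encode(text, "IBM037")));
        } else {
            System.out.println("IBM037 is not available in this runtime");
        }
    }
}
```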
 
To be precise, there are 75 charsets in the IBM JDK which are currently
missing from openjdk. Of those 75 missing charsets, 2 are default charsets
on the AIX platform. Of the remaining 73 charsets, some are default
charsets for the z/OS platform. We would like to contribute those 75
charsets to openJDK.
 
> For those to be kept in openjdk repo, they could/should be separated
> from the extended charsets and kept into a separate package/module as
> well, and probably are only configured to build into the binaries for
> IBM platforms, and maintained by the AIX port.

Yes. I agree it makes sense to keep all IBM charsets as a separate charset
provider / module and include them as needed to reduce the footprint. If
everybody agrees, we can start working towards creating a separate charset
provider/module for IBM charsets in the JDK. Please advise.
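
For readers unfamiliar with the mechanism, the following is a minimal sketch
of what a separate charset provider looks like at the API level. It is not
the proposed IBM provider: DemoCharset, DemoCharsetProvider and the name
X-DEMO-LATIN1 are hypothetical, and the charset simply delegates to
ISO-8859-1 to keep the example short. A real provider would supply its own
encoders and decoders and would be registered as a service (for example with
a "provides java.nio.charset.spi.CharsetProvider with ..." clause in its
module declaration) so that Charset.forName can find it.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;
import java.nio.charset.spi.CharsetProvider;
import java.util.Collections;
import java.util.Iterator;

/** Hypothetical charset that reuses the ISO-8859-1 coders under a new name. */
final class DemoCharset extends Charset {
    DemoCharset() {
        super("X-DEMO-LATIN1", new String[] { "x-demo-alias" });
    }

    @Override
    public boolean contains(Charset cs) {
        return cs instanceof DemoCharset || StandardCharsets.ISO_8859_1.contains(cs);
    }

    @Override
    public CharsetDecoder newDecoder() {
        return StandardCharsets.ISO_8859_1.newDecoder();
    }

    @Override
    public CharsetEncoder newEncoder() {
        return StandardCharsets.ISO_8859_1.newEncoder();
    }
}

/** Hypothetical provider exposing the single demo charset. */
final class DemoCharsetProvider extends CharsetProvider {
    private final Charset demo = new DemoCharset();

    @Override
    public Iterator<Charset> charsets() {
        return Collections.singletonList(demo).iterator();
    }

    @Override
    public Charset charsetForName(String name) {
        if (demo.name().equalsIgnoreCase(name)) {
            return demo;
        }
        for (String alias : demo.aliases()) {
            if (alias.equalsIgnoreCase(name)) {
                return demo;
            }
        }
        return null; // unknown to this provider
    }

    public static void main(String[] args) {
        // Called directly here; without service registration Charset.forName
        // would not see this provider.
        Charset cs = new DemoCharsetProvider().charsetForName("x-demo-latin1");
        System.out.println(cs.name() + " encodes 4 chars to "
                + "caf\u00e9".getBytes(cs).length + " bytes");
    }
}
```

Packaging such a provider in its own module is what would allow a build to
include or omit it per platform without touching jdk.charsets.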
 
Thank you,
Nasser Ebrahim





Re: Adding new IBM extended charsets

2018-07-06 Thread Nasser Ebrahim
Hi Florian,

Thank you for your response. iconv is platform dependent and not good for 
the platform agnostic nature of Java. Also, many charsets in Java are not 
available across platforms. I believe Java decided to have its own 
charsets due to those reasons so that it can work seamlessly on any 
supported platforms.

Thank you,
Nasser Ebrahim 



From:   Florian Weimer 
To: Nasser Ebrahim , Java Core Libs 

Date:   07/04/2018 07:11 PM
Subject:Re: Adding new IBM extended charsets



On 07/04/2018 02:41 PM, Nasser Ebrahim wrote:
> Please share your thoughts on your preferred option and list out any
> other options which I missed out. Thank you for your time.

Could you use the platform iconv implementation instead?  That would 
avoid shipping the tables in the JDK.

Thanks,
Florian







Re: Adding new IBM extended charsets

2018-07-05 Thread Xueming Shen

On 7/4/18, 5:41 AM, Nasser Ebrahim wrote:

Hello,

I am starting this mail thread to discuss adding new IBM extended
charsets. The question is whether we need to add the new extended
charsets to jdk.charsets or to a new separate charset provider/module like
jdk.ibmcharsets. This discussion is a continuation of the suggestion from
Alan Bateman in the mail chain -
http://mail.openjdk.java.net/pipermail/core-libs-dev/2018-May/053316.html.


I am copying his inputs from that mail thread to start the discussion:
"I think we should start a discussion here about moving some or all of the
IBM charsets to their own service provider module. I realize the AIX port
might want to include some of them in its build of java.base but they
aren't interesting to include in java.base, or even jdk.charsets, on most
platform"

First, let me clarify whether IBM charsets are applicable only to IBM
platforms like AIX or applicable to other platforms as well. All IBM
charsets are applicable to any platform, including Linux and Windows, if
that platform needs to communicate with an application or database on
IBM platforms like AIX. That is the reason we traditionally add them to
jdk.charsets. However, we agree with Alan that those IBM charsets are
not required if the JDK is not communicating with any
applications/databases on IBM platforms. Hence, it makes sense to consider
a separate charset provider / module for IBM charsets and use build
parameters to decide whether to generate the new charset provider or not
for any platform.

Let me list out all the possible options I can think of for adding new
extended charsets so that we can discuss and decide which is the best
option.

1) Continue to add new extended charsets to jdk.charsets.
The advantage with this approach is that no need to add new charset
provider and all extended charsets are placed in one module. Also, any
extended charset is applicable to any platform if they need to communicate
with application/database in different platforms. The disadvantage is that
the number of charsets in jdk.charsets keep increasing and blot its size.
Also, many of those charsets may not be used in the lifetime of the JDK
unless it is communicating  with application/databases of those platforms.

2) Create a new charset provider and module (say jdk.ibmcharsets) for all
IBM charsets and include the new module in JDK on a need basis.
The advantage with this approach is that the foot print of jdk.charsets
can be reduced and can include the new module only if it is required. The
disadvantage is that a new charset provider needs to be created. Also,
extended charsets will be located in two different modules and many a
times both the modules are required.

3) Remove all extended charsets from JDK (keep only default charsets) and
use the extended charsets from third party like ICU4J.
I believe this option might be discussed in the past and there might be
valid reason not to pursue this option. Am still listing it to ensure that
we have considered this option as well. The advantage with this approach
is that we can avoid maintaining the same charsets by two different open
source communities. The disadvantage with this option is that the release
cycle of the two communities may be different and we may need to maintain
the level ourselves for LTS releases as we may not want to change the
specification in a service stream.

Please share your thoughts on your preferred option and list out any other
options which I missed out. Thank you for your time.



Hi Nasser,

I would assume the best option might be (3), keeping only those IBM
charsets that might be used or configured as the default charset for VM
startup on IBM platforms, and leaving the rest to the ICU4J charset
provider.

There are hundreds of IBM-specific charsets, in various versions, that
probably are not going to be used by most Java developers in most normal
use scenarios. I would assume it's NOT in our best interest to implement,
test and support/maintain them in the OpenJDK project/repo. Since the jar
is already in the Maven repository, any request for support of those
charsets can be pointed there accordingly. I did not know there was such a
package/jar in Maven. I did check ICU4J years ago and there was no NIO
provider interface provided back then.

For those to be kept in the OpenJDK repo, they could/should be separated
from the extended charsets and kept in a separate package/module as well,
and probably only be configured to build into the binaries for IBM
platforms, and maintained by the AIX port.
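
As an illustration only (a sketch, not something proposed in this thread): if
some IBM charsets were left out of a given JDK build, application code could
probe at run time whether a charset is supplied by any installed provider
before relying on it. The class name CharsetProbe and the example name
"IBM1047" are assumptions made for this sketch; Charset.isSupported and
Charset.forName are the standard java.nio.charset calls.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.util.Optional;

// Hypothetical helper for illustration; not part of the JDK or ICU4J.
final class CharsetProbe {
    private CharsetProbe() {}

    /** Returns the charset if some installed provider supplies it, else empty. */
    static Optional<Charset> find(String name) {
        try {
            return Charset.isSupported(name)
                    ? Optional.of(Charset.forName(name))
                    : Optional.empty();
        } catch (IllegalCharsetNameException e) {
            // The name is syntactically invalid, so no provider can supply it.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // "IBM1047" is only an example name; whether it resolves depends on
        // which charset modules/providers this particular runtime includes.
        String name = args.length > 0 ? args[0] : "IBM1047";
        System.out.println(name + " available: " + find(name).isPresent());
    }
}
```

An application that gets an empty result could then report that an extra
charset provider (for example a third-party jar on the class path) is needed,
rather than failing later with UnsupportedCharsetException.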

Thanks,
Sherman





Re: Adding new IBM extended charsets

2018-07-04 Thread Florian Weimer

On 07/04/2018 02:41 PM, Nasser Ebrahim wrote:

> Please share your thoughts on your preferred option and list out any other
> options which I missed out. Thank you for your time.


Could you use the platform iconv implementation instead?  That would 
avoid shipping the tables in the JDK.


Thanks,
Florian


Adding new IBM extended charsets

2018-07-04 Thread Nasser Ebrahim
Hello,

I am starting this mail thread to discuss adding new IBM extended
charsets. The question is whether we need to add the new extended
charsets to jdk.charsets or to a new separate charset provider/module like
jdk.ibmcharsets. This discussion continues the suggestion from
Alan Bateman in the mail chain -
http://mail.openjdk.java.net/pipermail/core-libs-dev/2018-May/053316.html. 


I am copying his input from that mail thread to start the discussion:
"I think we should start a discussion here about moving some or all of the 
IBM charsets to their own service provider module. I realize the AIX port 
might want to include some of them in its build of java.base but they 
aren't interesting to include in java.base, or even jdk.charsets, on most 
platform"

First, let me clarify whether IBM charsets are applicable only to IBM
platforms like AIX or to other platforms as well. All IBM charsets are
applicable to any platform, including Linux and Windows, if that platform
needs to communicate with an application or database on an IBM platform
like AIX. That is why we have traditionally added them to jdk.charsets.
However, we agree with Alan that those IBM charsets are not required if
the JDK is not communicating with any applications or databases on IBM
platforms. Hence, it makes sense to consider a separate charset provider /
module for IBM charsets and to use build parameters to decide whether to
generate the new charset provider for a given platform.

Let me list out all the possible options I can think of for adding new 
extended charsets so that we can discuss and decide which is the best 
option.

1) Continue to add new extended charsets to jdk.charsets.
The advantage of this approach is that there is no need to add a new
charset provider, and all extended charsets are placed in one module. Also,
any extended charset is applicable to any platform that needs to
communicate with applications/databases on different platforms. The
disadvantage is that the number of charsets in jdk.charsets keeps
increasing and bloats its size. Also, many of those charsets may never be
used in the lifetime of the JDK unless it is communicating with
applications/databases on those platforms.

2) Create a new charset provider and module (say jdk.ibmcharsets) for all
IBM charsets and include the new module in the JDK as needed.
The advantage of this approach is that the footprint of jdk.charsets can be
reduced and the new module can be included only if it is required. The
disadvantage is that a new charset provider needs to be created. Also,
extended charsets will be located in two different modules, and often both
modules will be required.

3) Remove all extended charsets from the JDK (keep only default charsets)
and use the extended charsets from a third party like ICU4J.
I believe this option may have been discussed in the past and there may be
a valid reason not to pursue it. I am still listing it to ensure that we
have considered this option as well. The advantage of this approach is that
we can avoid having two different open source communities maintain the same
charsets. The disadvantage is that the release cycles of the two
communities may differ, and we may need to maintain the level ourselves for
LTS releases, as we may not want to change the specification in a service
stream.
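
To make option 2 a little more concrete, here is a minimal sketch of what a
separate charset provider could look like, assuming the standard
java.nio.charset.spi.CharsetProvider SPI. All names in it
(DemoIbmCharsetProvider, DemoCharset, "x-Demo-IBM" and its aliases) are
hypothetical, and the placeholder charset simply borrows the ISO-8859-1
coders; it does not implement any real IBM code page.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;
import java.nio.charset.spi.CharsetProvider;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Map;

// Hypothetical provider for illustration. A deployable provider must be a
// public class with a public no-argument constructor and be registered as a
// java.nio.charset.spi.CharsetProvider service.
final class DemoIbmCharsetProvider extends CharsetProvider {

    // Placeholder charset: reuses ISO-8859-1 coders, NOT a real IBM mapping.
    static final class DemoCharset extends Charset {
        DemoCharset() {
            super("x-Demo-IBM", new String[] {"demo-ibm", "cp-demo"});
        }

        @Override
        public boolean contains(Charset cs) {
            return cs instanceof DemoCharset
                    || StandardCharsets.ISO_8859_1.contains(cs);
        }

        @Override
        public CharsetDecoder newDecoder() {
            return StandardCharsets.ISO_8859_1.newDecoder();
        }

        @Override
        public CharsetEncoder newEncoder() {
            return StandardCharsets.ISO_8859_1.newEncoder();
        }
    }

    // Canonical names and aliases, lower-cased for case-insensitive lookup.
    private final Map<String, Charset> byName = new LinkedHashMap<>();

    DemoIbmCharsetProvider() {
        register(new DemoCharset());
    }

    private void register(Charset cs) {
        byName.put(cs.name().toLowerCase(Locale.ROOT), cs);
        for (String alias : cs.aliases()) {
            byName.put(alias.toLowerCase(Locale.ROOT), cs);
        }
    }

    @Override
    public Iterator<Charset> charsets() {
        // Each charset once, even though it is stored under several names.
        return new LinkedHashSet<>(byName.values()).iterator();
    }

    @Override
    public Charset charsetForName(String charsetName) {
        // Returning null tells the lookup to try other providers.
        return byName.get(charsetName.toLowerCase(Locale.ROOT));
    }
}
```

Such a provider is normally made visible to Charset.forName either through a
META-INF/services/java.nio.charset.spi.CharsetProvider file on the class path
or through a "provides java.nio.charset.spi.CharsetProvider with ..." clause
in the module declaration, which is what would let a module like the
suggested jdk.ibmcharsets be included or left out per platform.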

Please share your thoughts on your preferred option and list out any other 
options which I missed out. Thank you for your time.

Regards,
Nasser Ebrahim