Re: Adding new IBM extended charsets
Sure Magnus. Thank you for your suggestion. Once I am ready with the prototype, I will initiate the infrastructure-related discussion on build-...@openjdk.java.net and continue the discussion about streamlining the extended charsets here.

Thank you, Nasser Ebrahim

From: Magnus Ihse Bursie
To: Alan Bateman, Nasser Ebrahim, core-libs-dev@openjdk.java.net, build-dev
Date: 08/22/2018 02:27 PM
Subject: Re: Adding new IBM extended charsets

On 2018-08-05 20:38, Alan Bateman wrote:
> As regards the way forward, I think we have to put infrastructure into the build to make it easy to allow specific charsets to be included or excluded from specific platforms. As things stand, and as you have found with your updates to the stdcs- files, the charsets are generated to be included in either java.base or jdk.charsets. We need another input to the configurability to make it possible to include or exclude so that the mainstream platforms do not have to include the IBM charsets. There are several details around this, particularly around aliases, but if we can get that done then we have a lot of flexibility.

And when you are ready to discuss the required build changes, please head over to build-...@openjdk.java.net. ;-) The build part of the charset handling is less-than-ideal. I hope we can find a way to clean this up as well, as part of streamlining the included charsets. /Magnus
Re: Adding new IBM extended charsets
Hi Alan, Thank you for your valuable inputs. I will initiate the discussion with ICU4J community to explore the possibility of using ICU4J by resolving the compatibility and performance difference so that we can use ICU4J for most of the extended charsets and remove them JDK build. As we discussed earlier, significant changes are required on ICU4J side to resolve the functional and performance difference for JDK to directly consume it and hence may be considered as a long term solution. In the mean time, I can explore the other option you have suggested to make the IBM charsets specific to AIX platform and keep optional for other platforms by making the make file changes. I will try to create a prototype to do the make/src file changes which enable generating IBM charsets as a separate module only on AIX platform and keep optional for other platforms. Please let me know if you have any inputs. Thank you, Nasser Ebrahim From: Alan Bateman To: Nasser Ebrahim , core-libs-dev@openjdk.java.net, Xueming Shen Date: 08/06/2018 12:08 AM Subject:Re: Adding new IBM extended charsets On 24/07/2018 09:56, Nasser Ebrahim wrote: Thank you Martin, Sherman and Alan for your valuable inputs. I have done some initial analysis on the ICU4J. There are some compatibility issues on the ICU4J charsets with JDK charsets but am more concerned about its performance as JDK optimization do no exist in that implementation. I think we need to work with the ICU4J community to resolve those issues before we remove those charsets from JDK. If you can work with the ICU4J project on these issues then I think we have a way forward. An additional issue with their downloads is that they target JDK 6 and don't seem to have thought about deploying as modules with JDK 9 or newer yet. Their downloads can be used as automatic modules but it requires renaming their JAR files due to unusual naming that they use to encode the version string. 
A simple Automatic-Module-Name attribute would make it easy for developers to deploy their charset provider on the module path, and they can still target JDK 6. As regards the way forward, I think we have to put infrastructure into the build to make it easy to allow specific charsets to be included or excluded from specific platforms. As things stand, and as you have found with your updates to the stdcs- files, the charsets are generated to be included in either java.base or jdk.charsets. We need another input to the configurability to make it possible to include or exclude so that the mainstream platforms do not have to include the IBM charsets. There are several details around this, particularly around aliases, but if we can get that done then we have a lot of flexibility. My personal view is that we should work towards excluding the IBM charsets from the mainstream platforms, starting with a cull of the EBCDIC charsets. If the ICU4J project can get their issues sorted out in a similar time frame then it makes for a simple migration story -- the JDK includes the standard charsets and many additional charsets. If you need others then download the ICU4J charset provider and deploy it on your class path or module path. -Alan
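To make the "class path or module path" point concrete, here is a minimal sketch (not taken from the thread) that reports which module supplies a charset at run time. The class name `CharsetOrigin` and the helper `describe` are hypothetical; only standard `java.nio.charset.Charset` and `java.lang.Module` calls are used, so JDK 9 or newer is assumed.

```java
import java.nio.charset.Charset;

// Hypothetical helper for illustration; not part of the JDK or ICU4J.
class CharsetOrigin {
    /** Describes where the named charset comes from, or reports that it is missing. */
    static String describe(String name) {
        if (!Charset.isSupported(name)) {
            return name + ": not available; a third-party provider would have to be deployed";
        }
        Charset cs = Charset.forName(name);
        Module module = cs.getClass().getModule();
        // Providers loaded from the class path live in an unnamed module.
        String origin = module.isNamed() ? module.getName() : "unnamed module (class path)";
        return name + " -> " + cs.name() + " from " + origin;
    }

    public static void main(String[] args) {
        System.out.println(describe("US-ASCII")); // a standard charset
        System.out.println(describe("IBM037"));   // an EBCDIC charset, if present in this runtime
    }
}
```

On a runtime where the EBCDIC charsets have been removed, the second call would report the charset as unavailable until a provider such as the ICU4J one is deployed.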
Re: Adding new IBM extended charsets
On 2018-08-05 20:38, Alan Bateman wrote:
> As regards the way forward, I think we have to put infrastructure into the build to make it easy to allow specific charsets to be included or excluded from specific platforms. As things stand, and as you have found with your updates to the stdcs- files, the charsets are generated to be included in either java.base or jdk.charsets. We need another input to the configurability to make it possible to include or exclude so that the mainstream platforms do not have to include the IBM charsets. There are several details around this, particularly around aliases, but if we can get that done then we have a lot of flexibility.

And when you are ready to discuss the required build changes, please head over to build-...@openjdk.java.net. ;-) The build part of the charset handling is less-than-ideal. I hope we can find a way to clean this up as well, as part of streamlining the included charsets. /Magnus
Re: Adding new IBM extended charsets
On 07/06/2018 02:23 PM, Nasser Ebrahim wrote:
> Hi Florian, Thank you for your response. iconv is platform dependent and not good for the platform agnostic nature of Java. Also, many charsets in Java are not available across platforms. I believe Java decided to have its own charsets due to those reasons so that it can work seamlessly on any supported platforms.

But then the tables will occasionally be different from what the platform uses for conversion. Wouldn't compatibility with the rest of the system installation be preferred in this context?

Thanks, Florian
Re: Adding new IBM extended charsets
On 24/07/2018 09:56, Nasser Ebrahim wrote: Thank you Martin, Sherman and Alan for your valuable inputs. I have done some initial analysis on the ICU4J. There are some compatibility issues on the ICU4J charsets with JDK charsets but am more concerned about its performance as JDK optimization do no exist in that implementation. I think we need to work with the ICU4J community to resolve those issues before we remove those charsets from JDK. If you can work with the ICU4J project on these issues then I think we have a way forward. An additional issue with their downloads is that they target JDK 6 and don't seem to have thought about deploying as modules with JDK 9 or newer yet. Their downloads can be used as automatic modules but it requires renaming their JAR files due to unusual naming that they use to encode the version string. A simple Automatic-Module-Name attribute would make it easy for developers to deploy their charset provider on the module path, they can still target JDK 6. As regards the way forward then I think we have to put infrastructure into the build to make it easy to allow specific charsets be included or excluded from specific platforms. As things stand, and as have you have found with your updates to the stdcs- files, the charsets are generated to be included in either java.base or jdk.charsets. We need another input to the configurability to make it possible to include or exclude so that the main stream platforms do not have to include the IBM charsets. There are several details around this, particularly around aliases, but if we can get that done then we have a lot of flexibility. My personal view is that we should work towards excluding the IBM charsets from the main stream platforms, starting with a cull of the EBCDIC charsets. If the ICU4J project can get their issues sorted out in a similar time frame then it makes for a simple migration story -- the JDK includes the standard charsets and many additional charsets. 
If you need others then download the ICU4J charset provider and deploy it on your class path or module path. -Alan
Re: Adding new IBM extended charsets
Thank you Martin, Sherman and Alan for your valuable inputs. I have done some initial analysis of ICU4J. There are some compatibility issues between the ICU4J charsets and the JDK charsets, but I am more concerned about performance, as the JDK optimizations do not exist in that implementation. I think we need to work with the ICU4J community to resolve those issues before we remove those charsets from the JDK. The primary reason we are interested in contributing the charsets to openjdk is so that Java users of all locales get a seamless experience when they move between openjdk and other implementations. I agree it is good from a footprint and maintenance perspective if we are able to reduce the number of charsets. I believe the maintenance effort on the charsets is usually low, as we hardly make any changes to the charsets once developed. Also, the charsets are usually independent of each other and hence will usually not affect Java users unless they are used. As more members of my team would like to actively participate in the openjdk community, I hope maintenance of any issues reported on IBM charsets will not be an issue going forward. As we discussed before, the footprint issue can be avoided if we enable the IBM charsets on an as-needed basis with a build flag. As you advised, we can enable the IBM charsets only for the AIX platform by default, and users can enable them on other platforms as needed. If all of you agree, we can start working on moving all IBM charsets from jdk.charsets to a different module, jdk.ibm.charsets, and enable them only for the AIX platform by default. We can consider removing them from the JDK in the future if the community finds them to be an overhead or not adding value. Please advise.
Thank you, Nasser Ebrahim From: Alan Bateman To: Xueming Shen , Nasser Ebrahim Cc: core-libs-dev@openjdk.java.net Date: 07/19/2018 03:44 PM Subject:Re: Adding new IBM extended charsets On 19/07/2018 08:27, Xueming Shen wrote: > Hi Nasser, > > From openjdk's perspective It would be preferred to direct the develop > to use the charset > implementation provided by IBM, or the reliable third party that has > the appropriate knowledge, > experience and resource to support/maintain those charsets such as the > icu4j charset > project. I have been pulling the data from that huge icu-charset-data > file and implement/maintain > them based on my best knowledge, but I'm sure engineers from IBM or > the icu project probably > can do a much better job to implement/maintain/update those charsets > going forward. > > As first step we can separate those IBM charsets from the jdk.charset > into a separate package > somewhere and configure them to be built into java.base and > jdk.charsets, for aix platform only. > Then we can further discuss the best way to handle/distribute those > charsets that are not needed > for the java.base module (for vm startup). As I said, it would be > ideal if we can remove them from the > openjdk repo/binaries complete and direct the developer/user to use > the icu4j charset provider > for those encodings, when needed. But given the possible compatibility > concern, we might want to > phase this work out gradually in next major release. I agree and in terms of phasing then I don't think it would be too disruptive if the EBCDIC charsets were dropped from jdk.charsets in JDK 12, at least on the main stream platforms. As we've established in this thread, the ICU4J project does seem to publish its charset provider to Maven so there are alternatives for applications that really need these charsets Nasser - do you do any testing with the ICU4J charsets? I quickly tried 62.1 and it seems to work fine on the class path. 
I didn't check for any compatibility differences or compare the performance but maybe you have. It's a bit awkward to test this provider as an automatic module due to the unusual naming of these JAR files. They may not have looked at modules yet but the ability to link the icu4j.charsets module into a run-time image seems something that people may want to do in the future. -Alan
Re: Adding new IBM extended charsets
On 19/07/2018 08:27, Xueming Shen wrote: Hi Nasser, From openjdk's perspective It would be preferred to direct the develop to use the charset implementation provided by IBM, or the reliable third party that has the appropriate knowledge, experience and resource to support/maintain those charsets such as the icu4j charset project. I have been pulling the data from that huge icu-charset-data file and implement/maintain them based on my best knowledge, but I'm sure engineers from IBM or the icu project probably can do a much better job to implement/maintain/update those charsets going forward. As first step we can separate those IBM charsets from the jdk.charset into a separate package somewhere and configure them to be built into java.base and jdk.charsets, for aix platform only. Then we can further discuss the best way to handle/distribute those charsets that are not needed for the java.base module (for vm startup). As I said, it would be ideal if we can remove them from the openjdk repo/binaries complete and direct the developer/user to use the icu4j charset provider for those encodings, when needed. But given the possible compatibility concern, we might want to phase this work out gradually in next major release. I agree and in terms of phasing then I don't think it would be too disruptive if the EBCDIC charsets were dropped from jdk.charsets in JDK 12, at least on the main stream platforms. As we've established in this thread, the ICU4J project does seem to publish its charset provider to Maven so there are alternatives for applications that really need these charsets Nasser - do you do any testing with the ICU4J charsets? I quickly tried 62.1 and it seems to work fine on the class path. I didn't check for any compatibility differences or compare the performance but maybe you have. It's a bit awkward to test this provider as an automatic module due to the unusual naming of these JAR files. 
They may not have looked at modules yet but the ability to link the icu4j.charsets module into a run-time image seems something that people may want to do in the future. -Alan
Re: Adding new IBM extended charsets
Hi Nasser, From openjdk's perspective It would be preferred to direct the develop to use the charset implementation provided by IBM, or the reliable third party that has the appropriate knowledge, experience and resource to support/maintain those charsets such as the icu4j charset project. I have been pulling the data from that huge icu-charset-data file and implement/maintain them based on my best knowledge, but I'm sure engineers from IBM or the icu project probably can do a much better job to implement/maintain/update those charsets going forward. As first step we can separate those IBM charsets from the jdk.charset into a separate package somewhere and configure them to be built into java.base and jdk.charsets, for aix platform only. Then we can further discuss the best way to handle/distribute those charsets that are not needed for the java.base module (for vm startup). As I said, it would be ideal if we can remove them from the openjdk repo/binaries complete and direct the developer/user to use the icu4j charset provider for those encodings, when needed. But given the possible compatibility concern, we might want to phase this work out gradually in next major release. Thanks, Sherman On 7/17/18, 6:48 AM, Nasser Ebrahim wrote: Hi Alan, Thank you for your inputs. I would like to clarify that all the IBM charsets (IBM) in jdk.charsets are not IBM platform specific charsets. For example, only 43 charsets out of 72 IBM in jdk.charsets are EBCDIC or IBM platform specific charsets. Similarly, many charsets in the list of 75 charsets which we would like to contribute are not EBCDIC charsets. I feel we should have a standard guideline for the extended charsets. If we are keeping the extended charsets in the JDK, then we may want to consider all ICU/IANA approved charsets in JDK. Otherwise, we may want to keep only the standard charsets in JDK and remove all the extended charsets so that all extended charsets can be taken from third party libraries like ICU4J. 
If we decided to keep the extended charsets, then may be we can classify the extended charsets as ASCII and EBCDIC and the corresponding modules as jdk.ascii.charset and jdk.ebcdic.charset. Then, depends upon the platform, we can consider including either of the charset module or both. Please advise. Thank you, Nasser Ebrahim From: Alan Bateman To: Nasser Ebrahim , Xueming Shen , core-libs-dev@openjdk.java.net Date: 07/09/2018 01:25 AM Subject: Re: Adding new IBM extended charsets On 06/07/2018 14:56, Nasser Ebrahim wrote: > : > I understood you preferred option is 3 [Remove all extended charsets from > JDK (keep only default charsets) and use the extended charsets from third > party like ICU4J]. Just to confirm, so you meant we need to keep only the > standard charsets in the JDK and remove all the extended charsets from JDK > and use them from ICU4J OR you meant apply that only for the new extended > charsets. I think it is better to keep the consistency - either take all > extended charsets from ICU4J or maintain all extended charsets with JDK. > Keeping some extended charsets within JDK and use ICU4J for other extended > charsets may confuse the Java user. I think the suggestion in Sherman's mail is to drop the 70 or so IBM charsets from jdk.charsets. This will reduce the size of jdk.charsets and eliminate the need to maintain these charsets (at least on non-AIX builds). If developers need these charsets, say when connecting to database on an IBM system, then they can deploy the ICU4J provider on the class path or module path. I don't think the suggestion impacts the 11 IBM charsets in java.base on non-AIX builds or the non-IBM charsets in jdk.charsets. They may be opportunities to drop some of these but that can be looked at separately. Also I don't think the suggestion impacts the additional 12 IBM charsets that are included in the AIX build of java.base at this time. 
From the review threads, it seems there are supported locales on AIX that map to these charsets so this is why they are in java.base. -Alan.
Re: Adding new IBM extended charsets
History: I recall 15 years ago wondering why no one from IBM was maintaining the IBM charsets in the jdk. Today anything but UTF-* is increasingly legacy, so the case for including them is weaker than back then. Probably today the IBM charsets are best maintained outside the jdk sources as a third party package, especially considering that openjdk has been kicking some other components like corba out of the nest. But there is some synergy - some of the openjdk charset tests check invariants that apply to all charset implementations. I may have even fixed some IBM charset code many years ago.
Re: Adding new IBM extended charsets
Hi Alan, Thank you for your inputs. I would like to clarify that all the IBM charsets (IBM) in jdk.charsets are not IBM platform specific charsets. For example, only 43 charsets out of 72 IBM in jdk.charsets are EBCDIC or IBM platform specific charsets. Similarly, many charsets in the list of 75 charsets which we would like to contribute are not EBCDIC charsets. I feel we should have a standard guideline for the extended charsets. If we are keeping the extended charsets in the JDK, then we may want to consider all ICU/IANA approved charsets in JDK. Otherwise, we may want to keep only the standard charsets in JDK and remove all the extended charsets so that all extended charsets can be taken from third party libraries like ICU4J. If we decided to keep the extended charsets, then may be we can classify the extended charsets as ASCII and EBCDIC and the corresponding modules as jdk.ascii.charset and jdk.ebcdic.charset. Then, depends upon the platform, we can consider including either of the charset module or both. Please advise. Thank you, Nasser Ebrahim From: Alan Bateman To: Nasser Ebrahim , Xueming Shen , core-libs-dev@openjdk.java.net Date: 07/09/2018 01:25 AM Subject:Re: Adding new IBM extended charsets On 06/07/2018 14:56, Nasser Ebrahim wrote: > : > I understood you preferred option is 3 [Remove all extended charsets from > JDK (keep only default charsets) and use the extended charsets from third > party like ICU4J]. Just to confirm, so you meant we need to keep only the > standard charsets in the JDK and remove all the extended charsets from JDK > and use them from ICU4J OR you meant apply that only for the new extended > charsets. I think it is better to keep the consistency - either take all > extended charsets from ICU4J or maintain all extended charsets with JDK. > Keeping some extended charsets within JDK and use ICU4J for other extended > charsets may confuse the Java user. 
I think the suggestion in Sherman's mail is to drop the 70 or so IBM charsets from jdk.charsets. This will reduce the size of jdk.charsets and eliminate the need to maintain these charsets (at least on non-AIX builds). If developers need these charsets, say when connecting to a database on an IBM system, then they can deploy the ICU4J provider on the class path or module path. I don't think the suggestion impacts the 11 IBM charsets in java.base on non-AIX builds or the non-IBM charsets in jdk.charsets. There may be opportunities to drop some of these but that can be looked at separately. Also I don't think the suggestion impacts the additional 12 IBM charsets that are included in the AIX build of java.base at this time. From the review threads, it seems there are supported locales on AIX that map to these charsets so this is why they are in java.base. -Alan.
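The counts quoted in this exchange (72 IBM charsets in jdk.charsets, 11 in java.base, and so on) can be roughly cross-checked on any given runtime. The sketch below is an illustration only: it assumes that an "IBM" or "x-IBM" prefix on the canonical name identifies an IBM charset, which is a heuristic and not a rule stated in the thread, and the class name `IbmCharsetCount` is hypothetical.

```java
import java.nio.charset.Charset;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper; the name-prefix test is an assumption, not a JDK rule.
class IbmCharsetCount {
    /** Canonical names of installed charsets that start with "IBM" or "x-IBM". */
    static List<String> ibmCharsetNames() {
        return Charset.availableCharsets().keySet().stream()
                .filter(n -> n.startsWith("IBM") || n.startsWith("x-IBM"))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = ibmCharsetNames();
        System.out.println(names.size() + " IBM-named charsets in this runtime:");
        names.forEach(System.out::println);
    }
}
```

The result depends on the JDK version, the platform, and which modules are linked into the run-time image, so it will not necessarily match the figures above.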
Re: Adding new IBM extended charsets
On 06/07/2018 14:56, Nasser Ebrahim wrote: : I understood you preferred option is 3 [Remove all extended charsets from JDK (keep only default charsets) and use the extended charsets from third party like ICU4J]. Just to confirm, so you meant we need to keep only the standard charsets in the JDK and remove all the extended charsets from JDK and use them from ICU4J OR you meant apply that only for the new extended charsets. I think it is better to keep the consistency - either take all extended charsets from ICU4J or maintain all extended charsets with JDK. Keeping some extended charsets within JDK and use ICU4J for other extended charsets may confuse the Java user. I think the suggestion in Sherman's mail is to drop the 70 or so IBM charsets from jdk.charsets. This will reduce the size of jdk.charsets and eliminate the need to maintain these charsets (at least on non-AIX builds). If developers need these charsets, say when connecting to database on an IBM system, then they can deploy the ICU4J provider on the class path or module path. I don't think the suggestion impacts the 11 IBM charsets in java.base on non-AIX builds or the non-IBM charsets in jdk.charsets. They may be opportunities to drop some of these but that can be looked at separately. Also I don't think the suggestion impacts the additional 12 IBM charsets that are included in the AIX build of java.base at this time. From the review threads, it seems there are supported locales on AIX that map to these charsets so this is why they are in java.base. -Alan.
Re: Adding new IBM extended charsets
Hi Sherman, Thank you for your valuable inputs. I would like to clarify some of your points. > I would assume the best option might be (3), in which only keeps those > ibm charsets that might be used/configured to be the default charset for > vm startup on IBM platform, and leaves the rest to icu4j charsets provider. I understood you preferred option is 3 [Remove all extended charsets from JDK (keep only default charsets) and use the extended charsets from third party like ICU4J]. Just to confirm, so you meant we need to keep only the standard charsets in the JDK and remove all the extended charsets from JDK and use them from ICU4J OR you meant apply that only for the new extended charsets. I think it is better to keep the consistency - either take all extended charsets from ICU4J or maintain all extended charsets with JDK. Keeping some extended charsets within JDK and use ICU4J for other extended charsets may confuse the Java user. > There are hundreds of IBM specific charsets, and with various versions that > probably are not going to be used by most of the Java developers in most > normal use scenarios. I would assume it's NOT in our best interest to > implement, > test and support/maintain them in openjdk project/repo. As I explained in the previous note, the IBM charsets are not just applicable to IBM Platforms. It can be applicable to any platforms if Java has to access an application or database on IBM platforms. For example, if a Java application on Linux or Windows requires EBCDIC code pages for JDBC to access DB2 on z/OS. Precisely, there are 75 charsets from IBM JDK which are currently missing in openjdk. Within the 75 missing charsets, 2 charsets are default charsets on AIX platform. Out of the remaining 73 charsets, some of them are default charsets for z/OS platform. We would like to contribute those 75 charsets to openJDK. 
> For those to be kept in openjdk repo, they could/should be separated from > the extended charsets and kept into a separate package/module as well, and > probably are only configured to build into the binaries for IBM > platforms, and > maintained by the AIX port. Yes. I agree it makes sense to keep all IBM charsets as a separate charset provider / module and include them on a need basis to reduce the footprint. If everybody agrees, we can start working towards creating a separate charset provider/module for IBM charsets in JDK. Please advise. Thank you, Nasser Ebrahim From: Xueming Shen To: core-libs-dev@openjdk.java.net Date: 07/05/2018 10:14 PM Subject: Re: Adding new IBM extended charsets Sent by:"core-libs-dev" On 7/4/18, 5:41 AM, Nasser Ebrahim wrote: > Hello, > > Am starting this mail thread to discuss about adding new IBM extended > charsets. The questions is whether we need to add the new extended > charsets to jdk.charsets or to a new separate charset provider/module like > jdk.ibmcharsets. This discussion is in continuation of the suggestion from > Alan Bateman in the mail chain - > http://mail.openjdk.java.net/pipermail/core-libs-dev/2018-May/053316.html . > > > Am copying his inputs from that mail thread to start the discussion: > "I think we should start a discussion here about moving some or all of the > IBM charsets to their own service provider module. I realize the AIX port > might want to include some of them in its build of java.base but they > aren't interesting to include in java.base, or even jdk.charsets, on most > platform" > > First, let me clarify whether IBM charsets are applicable only to IBM > platforms like AIX or applicable to other platforms as well. All IBM > charsets are applicable to any platforms including Linux and windows if > those platforms needs to communicate with an application or database in > IBM platforms like AIX. That is the reason, we traditionally add them to > the jdk.charsets. 
However, we agree with Alan that those IBM charsets are > not required if the JDK is not communicating to any applications/databases > on IBM platforms. Hence, it makes sense to consider a separate charset > provider / module for IBM charsets and use build parameters to decide > whether to generate the new charset provider or not for any platforms. > > Let me list out all the possible options I can think of for adding new > extended charsets so that we can discuss and decide which is the best > option. > > 1) Continue to add new extended charsets to jdk.charsets. > The advantage with this approach is that no need to add new charset > provider and all extended charsets are placed in one module. Also, any > extended charset is applicable to any platform if they need to communicate > with application/database in different platforms. The disadvantage is that > the number of charsets in jdk.ch
Re: Adding new IBM extended charsets
Hi Florian, Thank you for your response. iconv is platform dependent and not good for the platform agnostic nature of Java. Also, many charsets in Java are not available across platforms. I believe Java decided to have its own charsets due to those reasons so that it can work seamlessly on any supported platforms. Thank you, Nasser Ebrahim From: Florian Weimer To: Nasser Ebrahim , Java Core Libs Date: 07/04/2018 07:11 PM Subject:Re: Adding new IBM extended charsets On 07/04/2018 02:41 PM, Nasser Ebrahim wrote: > Please share your thoughts on your preferred option and list out any other > options which I missed out. Thank you for your time. Could you use the platform iconv implementation instead? That would avoid shipping the tables in the JDK. Thanks, Florian
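As an illustration of the platform-independence argument, the following sketch encodes the same text with a JDK-provided charset and prints the bytes as hex. Because the conversion table ships with the JDK rather than coming from the platform's iconv, the bytes are the same wherever the charset is present. The class `EbcdicBytes` and its `hex` helper are hypothetical names, and the code assumes IBM037 may be absent from some run-time images, so it checks first.

```java
import java.nio.charset.Charset;

// Hypothetical helper for illustration only.
class EbcdicBytes {
    /** Encodes text with the named charset and returns hex, or null if the charset is absent. */
    static String hex(String text, String charsetName) {
        if (!Charset.isSupported(charsetName)) {
            return null;
        }
        byte[] bytes = text.getBytes(Charset.forName(charsetName));
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xFF)); // mask to print the byte as unsigned
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("US-ASCII: " + hex("A", "US-ASCII"));
        System.out.println("IBM037:   " + hex("A", "IBM037")); // null if not in this runtime
    }
}
```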
Re: Adding new IBM extended charsets
On 7/4/18, 5:41 AM, Nasser Ebrahim wrote: Hello, Am starting this mail thread to discuss about adding new IBM extended charsets. The questions is whether we need to add the new extended charsets to jdk.charsets or to a new separate charset provider/module like jdk.ibmcharsets. This discussion is in continuation of the suggestion from Alan Bateman in the mail chain - http://mail.openjdk.java.net/pipermail/core-libs-dev/2018-May/053316.html. Am copying his inputs from that mail thread to start the discussion: "I think we should start a discussion here about moving some or all of the IBM charsets to their own service provider module. I realize the AIX port might want to include some of them in its build of java.base but they aren't interesting to include in java.base, or even jdk.charsets, on most platform" First, let me clarify whether IBM charsets are applicable only to IBM platforms like AIX or applicable to other platforms as well. All IBM charsets are applicable to any platforms including Linux and windows if those platforms needs to communicate with an application or database in IBM platforms like AIX. That is the reason, we traditionally add them to the jdk.charsets. However, we agree with Alan that those IBM charsets are not required if the JDK is not communicating to any applications/databases on IBM platforms. Hence, it makes sense to consider a separate charset provider / module for IBM charsets and use build parameters to decide whether to generate the new charset provider or not for any platforms. Let me list out all the possible options I can think of for adding new extended charsets so that we can discuss and decide which is the best option. 1) Continue to add new extended charsets to jdk.charsets. The advantage with this approach is that no need to add new charset provider and all extended charsets are placed in one module. 
Also, any extended charset is applicable to any platform if it needs to communicate with an application/database on a different platform. The disadvantage is that the number of charsets in jdk.charsets keeps increasing and bloats its size. Also, many of those charsets may not be used in the lifetime of the JDK unless it is communicating with applications/databases on those platforms.
2) Create a new charset provider and module (say jdk.ibmcharsets) for all IBM charsets and include the new module in the JDK on an as-needed basis. The advantage of this approach is that the footprint of jdk.charsets can be reduced and the new module can be included only if it is required. The disadvantage is that a new charset provider needs to be created. Also, extended charsets will be located in two different modules, and many times both modules are required.
3) Remove all extended charsets from the JDK (keep only default charsets) and use the extended charsets from a third party like ICU4J. I believe this option may have been discussed in the past and there may be a valid reason not to pursue it. I am still listing it to ensure that we have considered this option as well. The advantage of this approach is that we can avoid maintaining the same charsets in two different open source communities. The disadvantage of this option is that the release cycles of the two communities may differ, and we may need to maintain the level ourselves for LTS releases as we may not want to change the specification in a service stream.
Please share your thoughts on your preferred option and list any other options which I missed. Thank you for your time.

Hi Nasser, I would assume the best option might be (3), which keeps only those IBM charsets that might be used/configured as the default charset for VM startup on IBM platforms, and leaves the rest to the icu4j charsets provider.
There are hundreds of IBM specific charsets, and with various versions that probably are not going to be used by most of the Java developers in most normal use scenarios. I would assume it's NOT in our best interest to implement, test and support/maintain them in openjdk project/repo. Since the jar is maven repo already, any request for the support of those charsets can be pointed to there accordingly. I did not know there is such package/jar in the maven. I did check the icu4j years ago and there was no nio provider interface provided back then. For those to be kept in openjdk repo, they could/should be separated from the extended charsets and kept into a separate package/module as well, and probably are only configured to build into the binaries for IBM platforms, and maintained by the AIX port. Thanks, Sherman
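To illustrate what a separate charset provider involves, here is a minimal Java sketch built on the standard java.nio.charset.spi.CharsetProvider SPI. It is not taken from the JDK or from ICU4J: the class name DemoCharsetProvider, the charset name X-DEMO-IBM and its alias are hypothetical, and the placeholder charset simply delegates to ISO-8859-1 instead of implementing a real IBM code page.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;
import java.nio.charset.spi.CharsetProvider;
import java.util.Iterator;
import java.util.List;

// Hypothetical provider: all names here are placeholders for illustration.
public class DemoCharsetProvider extends CharsetProvider {

    // Placeholder charset. A real IBM charset would supply its own
    // mapping tables; this one reuses the ISO-8859-1 coders.
    static final class DemoCharset extends Charset {
        DemoCharset() {
            super("X-DEMO-IBM", new String[] {"x-demo-ibm-alias"});
        }

        @Override
        public boolean contains(Charset cs) {
            return cs instanceof DemoCharset;
        }

        @Override
        public CharsetDecoder newDecoder() {
            return StandardCharsets.ISO_8859_1.newDecoder();
        }

        @Override
        public CharsetEncoder newEncoder() {
            return StandardCharsets.ISO_8859_1.newEncoder();
        }
    }

    private final Charset demo = new DemoCharset();

    // Enumerates every charset this provider supports.
    @Override
    public Iterator<Charset> charsets() {
        return List.of(demo).iterator();
    }

    // Resolves a canonical name or alias (case-insensitively);
    // returns null for names this provider does not know.
    @Override
    public Charset charsetForName(String name) {
        if (demo.name().equalsIgnoreCase(name)) {
            return demo;
        }
        for (String alias : demo.aliases()) {
            if (alias.equalsIgnoreCase(name)) {
                return demo;
            }
        }
        return null;
    }
}
```

A provider like this is located through the service-provider mechanism; in a modular build it would typically be declared with a `provides java.nio.charset.spi.CharsetProvider with ...` clause in the module's module-info.java. Whether such a module is built only on AIX is a build-configuration question that this sketch does not address.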
Re: Adding new IBM extended charsets
On 07/04/2018 02:41 PM, Nasser Ebrahim wrote:

> Please share your thoughts on your preferred option and list out any other options which I missed out. Thank you for your time.

Could you use the platform iconv implementation instead? That would avoid shipping the tables in the JDK.

Thanks,
Florian
Adding new IBM extended charsets
Hello,

I am starting this mail thread to discuss adding new IBM extended charsets. The question is whether we need to add the new extended charsets to jdk.charsets or to a new, separate charset provider/module such as jdk.ibmcharsets. This discussion continues the suggestion from Alan Bateman in the mail chain - http://mail.openjdk.java.net/pipermail/core-libs-dev/2018-May/053316.html. I am copying his inputs from that mail thread to start the discussion:

"I think we should start a discussion here about moving some or all of the IBM charsets to their own service provider module. I realize the AIX port might want to include some of them in its build of java.base but they aren't interesting to include in java.base, or even jdk.charsets, on most platform"

First, let me clarify whether IBM charsets are applicable only to IBM platforms like AIX or to other platforms as well. All IBM charsets are applicable to any platform, including Linux and Windows, if that platform needs to communicate with an application or database on an IBM platform like AIX. That is the reason we have traditionally added them to jdk.charsets. However, we agree with Alan that those IBM charsets are not required if the JDK is not communicating with any applications/databases on IBM platforms. Hence, it makes sense to consider a separate charset provider/module for IBM charsets and to use build parameters to decide whether to generate the new charset provider for a given platform.

Let me list all the possible options I can think of for adding new extended charsets so that we can discuss and decide which is the best option.

1) Continue to add new extended charsets to jdk.charsets. The advantage of this approach is that there is no need to add a new charset provider and all extended charsets are placed in one module. Also, any extended charset is applicable to any platform that needs to communicate with applications/databases on different platforms. The disadvantage is that the number of charsets in jdk.charsets keeps increasing and bloats its size. Also, many of those charsets may not be used in the lifetime of the JDK unless it is communicating with applications/databases on those platforms.

2) Create a new charset provider and module (say jdk.ibmcharsets) for all IBM charsets and include the new module in the JDK on a need basis. The advantage of this approach is that the footprint of jdk.charsets can be reduced and the new module can be included only if it is required. The disadvantage is that a new charset provider needs to be created. Also, extended charsets will be located in two different modules, and often both modules will be required.

3) Remove all extended charsets from the JDK (keep only the default charsets) and use the extended charsets from a third party like ICU4J. I believe this option may have been discussed in the past and there may be a valid reason not to pursue it. I am still listing it to ensure that we have considered this option as well. The advantage of this approach is that we avoid having two different open source communities maintain the same charsets. The disadvantage is that the release cycles of the two communities may differ, and we may need to maintain the level ourselves for LTS releases, as we may not want to change the specification in a service stream.

Please share your thoughts on your preferred option and list out any other options which I missed out. Thank you for your time.

Regards,
Nasser Ebrahim
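Under options 2 and 3, an IBM charset may or may not be present in a given runtime, so application code would need to check before using it. A minimal Java sketch of such a check follows. It uses only the standard java.nio.charset.Charset API; the class name CharsetProbe, the helper lookupOrFallback and the example name "IBM1047" are chosen purely for illustration, and no claim is made about which module provides that charset on any platform.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
public class CharsetProbe {

    // Returns the named charset if this runtime provides it,
    // otherwise the supplied fallback.
    static Charset lookupOrFallback(String name, Charset fallback) {
        try {
            return Charset.isSupported(name) ? Charset.forName(name) : fallback;
        } catch (IllegalCharsetNameException e) {
            // The name itself is syntactically invalid.
            return fallback;
        }
    }

    public static void main(String[] args) {
        // "IBM1047" is only an example name; availability depends on
        // which charset providers are present in the runtime image.
        Charset cs = lookupOrFallback("IBM1047", StandardCharsets.UTF_8);
        System.out.println("Using charset: " + cs.name());
    }
}
```

The same check works regardless of whether the charset comes from java.base, jdk.charsets, a separate module, or a third-party provider on the class path, since Charset.forName consults all installed providers.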