Re: Review/comment needed for the new public java.util.Base64 class
Ulf, The base64 implementation is in TL right now. It does address some of the issue you suggested in this email. And I remember I did try use "byte[]" alphabets, which I don't recall bring us any benefit, but I'm not sure in which setup. The latest is at http://cr.openjdk.java.net/~sherman/8004212/webrev/ in which I'm trying to fix a corner case of incorrect return value from decode(buf, buf). The Base64 now is in TL does not necessary mean "the issue is closed". You can continue suggest/comment on the latest version for any possible performance improvement, bug fix and even API change, if necessary. -Sherman On 11/30/2012 02:01 PM, Ulf Zibis wrote: Hi Sherman, is this ssue still open ? -Ulf Am 03.11.2012 21:33, schrieb Ulf Zibis: Am 30.10.2012 23:40, schrieb Xueming Shen: I'm "confused" about the order of xxcode() and Xxcoder. In other places e.g. charsets, we have de... before en..., which is also true alphabetical should not be an issue. javadoc output should be in alphabetic order. If it is really concerns you, I can do a copy/paste:-) Yes, it doesn't matter much, but would be a nice conform style, so for me "personally" I would like the copy/paste:-) - What (should) happen(s), if lineSeparator.length == 0 ? do nothing. expected? I would say, empty line separator should not be allowed, so you might check: Objects.requireNonEmpty(lineSeparator); 603 static { 604 Arrays.fill(fromBase64, -1); 605 for (int i = 0; i < Encoder.toBase64.length; i++) 606 fromBase64[Encoder.toBase64[i]] = i; 607 fromBase64['='] = -2; 608 } This causes the inner Encoder class to be loaded, even if not needed. So maybe move the toBase64 table to the outer class. Have you compared performance with fromBase64 as byte[] (including local 'b' in decode() as byte)? understood. It seems you have mixed 2 tweaks to one. ;-) See additional paragraph at the end ... but it appears the hotspot likes it the constant/fixed length lookup table is inside encoder. Yes, but see my suggestion 12 lines below. Same as you suggestion in your previous email to use String in source and expand it during runtime. It saves about 500 bytes but slows the server vm. Are you really sure? As it only runs once at class init, JIT should not compile this code. Executing the bytecode to init the final int[], value for value, by interpreter should not be faster as expanding a String source in a loop. Those repeating lines of "isURL? " is also supposed to be a performance tweak to eliminate the array boundary check in loop. Yep, so I was thinking about: 653 private final int[] base64; 654 private final boolean isMIME; 655 656 private Decoder(boolean isURL, boolean isMIME) { 657 base64 = isURL ? fromBase64URL : fromBase64; 658 this.isMIME = isMIME; 659 } . 911 private int decodeBuffer(ByteBuffer src, ByteBuffer dst) { 912 int[] base64 = this.base64; // local copy for performance reasons (but maybe superfluous if HotSpot is intelligent enough) or: 911 private int decodeBuffer(ByteBuffer src, ByteBuffer dst, int[] base64) { . Why you mix the algorithms to check the padding? : 824 if (b == -2) { // padding byte 890 if (b == '=') { It is supposed reduce one "if" check for normal base64 character inside the loop. I'm not that sure it really helps though. 925 if (b == '=') { --> causes one additional "if" in the _main_ path of the loop, so should be slower for regular input 859 if (b == -2) { // padding byte --> causes one additional "if" in the _side_ path of the loop, so should only affect irregular input Maybe the following code is little faster as no loading of the constant '-2' is required: 858 if ((b = base64[b]) < 0) { 859 if (++b < 0) { // padding byte '=' Once again (the following was meant for the decode algorithm, not initialisation): Have you compared performance with fromBase64 as byte[] (including local 'b' in decode() as byte)? Retrieving the bytes by b = base64[x] then could be done without address-shift and smaller/faster LoadB operations could be used by JIT. In an int[] table, the index offset must be shifted by 2 before. Maybe this doesn't directly affect the performance on x86/IA64 CPU, but wastes space in CPU cache for other tasks as a side effect. On ARM architectures I imagine, directly addressing a byte-stepped table could be faster than addressing a 4-byte-stepped table. At least the footprint of the table would be smaller. Little nit: You could delete line 641 for conformity with 629. -Ulf
Re: Review/comment needed for the new public java.util.Base64 class
Hi Sherman, is this ssue still open ? -Ulf Am 03.11.2012 21:33, schrieb Ulf Zibis: Am 30.10.2012 23:40, schrieb Xueming Shen: I'm "confused" about the order of xxcode() and Xxcoder. In other places e.g. charsets, we have de... before en..., which is also true alphabetical should not be an issue. javadoc output should be in alphabetic order. If it is really concerns you, I can do a copy/paste:-) Yes, it doesn't matter much, but would be a nice conform style, so for me "personally" I would like the copy/paste:-) - What (should) happen(s), if lineSeparator.length == 0 ? do nothing. expected? I would say, empty line separator should not be allowed, so you might check: Objects.requireNonEmpty(lineSeparator); 603 static { 604 Arrays.fill(fromBase64, -1); 605 for (int i = 0; i < Encoder.toBase64.length; i++) 606 fromBase64[Encoder.toBase64[i]] = i; 607 fromBase64['='] = -2; 608 } This causes the inner Encoder class to be loaded, even if not needed. So maybe move the toBase64 table to the outer class. Have you compared performance with fromBase64 as byte[] (including local 'b' in decode() as byte)? understood. It seems you have mixed 2 tweaks to one. ;-) See additional paragraph at the end ... but it appears the hotspot likes it the constant/fixed length lookup table is inside encoder. Yes, but see my suggestion 12 lines below. Same as you suggestion in your previous email to use String in source and expand it during runtime. It saves about 500 bytes but slows the server vm. Are you really sure? As it only runs once at class init, JIT should not compile this code. Executing the bytecode to init the final int[], value for value, by interpreter should not be faster as expanding a String source in a loop. Those repeating lines of "isURL? " is also supposed to be a performance tweak to eliminate the array boundary check in loop. Yep, so I was thinking about: 653 private final int[] base64; 654 private final boolean isMIME; 655 656 private Decoder(boolean isURL, boolean isMIME) { 657 base64 = isURL ? fromBase64URL : fromBase64; 658 this.isMIME = isMIME; 659 } . 911 private int decodeBuffer(ByteBuffer src, ByteBuffer dst) { 912 int[] base64 = this.base64; // local copy for performance reasons (but maybe superfluous if HotSpot is intelligent enough) or: 911 private int decodeBuffer(ByteBuffer src, ByteBuffer dst, int[] base64) { . Why you mix the algorithms to check the padding? : 824 if (b == -2) { // padding byte 890 if (b == '=') { It is supposed reduce one "if" check for normal base64 character inside the loop. I'm not that sure it really helps though. 925 if (b == '=') { --> causes one additional "if" in the _main_ path of the loop, so should be slower for regular input 859 if (b == -2) { // padding byte --> causes one additional "if" in the _side_ path of the loop, so should only affect irregular input Maybe the following code is little faster as no loading of the constant '-2' is required: 858 if ((b = base64[b]) < 0) { 859 if (++b < 0) { // padding byte '=' Once again (the following was meant for the decode algorithm, not initialisation): Have you compared performance with fromBase64 as byte[] (including local 'b' in decode() as byte)? Retrieving the bytes by b = base64[x] then could be done without address-shift and smaller/faster LoadB operations could be used by JIT. In an int[] table, the index offset must be shifted by 2 before. Maybe this doesn't directly affect the performance on x86/IA64 CPU, but wastes space in CPU cache for other tasks as a side effect. On ARM architectures I imagine, directly addressing a byte-stepped table could be faster than addressing a 4-byte-stepped table. At least the footprint of the table would be smaller. Little nit: You could delete line 641 for conformity with 629. -Ulf
Re: Review/comment needed for the new public java.util.Base64 class
Am 30.10.2012 23:40, schrieb Xueming Shen: I'm "confused" about the order of xxcode() and Xxcoder. In other places e.g. charsets, we have de... before en..., which is also true alphabetical should not be an issue. javadoc output should be in alphabetic order. If it is really concerns you, I can do a copy/paste:-) Yes, it doesn't matter much, but would be a nice conform style, so for me "personally" I would like the copy/paste:-) - What (should) happen(s), if lineSeparator.length == 0 ? do nothing. expected? I would say, empty line separator should not be allowed, so you might check: Objects.requireNonEmpty(lineSeparator); 603 static { 604 Arrays.fill(fromBase64, -1); 605 for (int i = 0; i < Encoder.toBase64.length; i++) 606 fromBase64[Encoder.toBase64[i]] = i; 607 fromBase64['='] = -2; 608 } This causes the inner Encoder class to be loaded, even if not needed. So maybe move the toBase64 table to the outer class. Have you compared performance with fromBase64 as byte[] (including local 'b' in decode() as byte)? understood. It seems you have mixed 2 tweaks to one. ;-) See additional paragraph at the end ... but it appears the hotspot likes it the constant/fixed length lookup table is inside encoder. Yes, but see my suggestion 12 lines below. Same as you suggestion in your previous email to use String in source and expand it during runtime. It saves about 500 bytes but slows the server vm. Are you really sure? As it only runs once at class init, JIT should not compile this code. Executing the bytecode to init the final int[], value for value, by interpreter should not be faster as expanding a String source in a loop. Those repeating lines of "isURL? " is also supposed to be a performance tweak to eliminate the array boundary check in loop. Yep, so I was thinking about: 653 private final int[] base64; 654 private final boolean isMIME; 655 656 private Decoder(boolean isURL, boolean isMIME) { 657 base64 = isURL ? fromBase64URL : fromBase64; 658 this.isMIME = isMIME; 659 } . 911 private int decodeBuffer(ByteBuffer src, ByteBuffer dst) { 912 int[] base64 = this.base64; // local copy for performance reasons (but maybe superfluous if HotSpot is intelligent enough) or: 911 private int decodeBuffer(ByteBuffer src, ByteBuffer dst, int[] base64) { . Why you mix the algorithms to check the padding? : 824 if (b == -2) { // padding byte 890 if (b == '=') { It is supposed reduce one "if" check for normal base64 character inside the loop. I'm not that sure it really helps though. 925 if (b == '=') { --> causes one additional "if" in the _main_ path of the loop, so should be slower for regular input 859 if (b == -2) { // padding byte --> causes one additional "if" in the _side_ path of the loop, so should only affect irregular input Maybe the following code is little faster as no loading of the constant '-2' is required: 858 if ((b = base64[b]) < 0) { 859 if (++b < 0) { // padding byte '=' Once again (the following was meant for the decode algorithm, not initialisation): Have you compared performance with fromBase64 as byte[] (including local 'b' in decode() as byte)? Retrieving the bytes by b = base64[x] then could be done without address-shift and smaller/faster LoadB operations could be used by JIT. In an int[] table, the index offset must be shifted by 2 before. Maybe this doesn't directly affect the performance on x86/IA64 CPU, but wastes space in CPU cache for other tasks as a side effect. On ARM architectures I imagine, directly addressing a byte-stepped table could be faster than addressing a 4-byte-stepped table. At least the footprint of the table would be smaller. Little nit: You could delete line 641 for conformity with 629. -Ulf
Re: Review/comment needed for the new public java.util.Base64 class
Hi Ulf, thanks for the comments. Webrev has been updated to address most of your comments. http://cr.openjdk.java.net/~sherman/4235519/webrev (1) all @param and @return tags have been "normalized". (2) private constructor BAse64() {} has been moved up. (3) MIMELINEMAX and CRLF have been moved into encoder. (4) -> lineLength >>2<<2; Your questions: I'm "confused" about the order of xxcode() and Xxcoder. In other places e.g. charsets, we have de... before en..., which is also true alphabetical should not be an issue. javadoc output should be in alphabetic order. If it is really concerns you, I can do a copy/paste:-) - What (should) happen(s), if lineSeparator.length == 0 ? do nothing. expected? - Isn't one of these little faster, or at least more clear? : lineLength -= lineLength % 4 lineLength & 0xFFFC lineLength << 2 >> 2 603 static { 604 Arrays.fill(fromBase64, -1); 605 for (int i = 0; i < Encoder.toBase64.length; i++) 606 fromBase64[Encoder.toBase64[i]] = i; 607 fromBase64['='] = -2; 608 } This causes the inner Encoder class to be loaded, even if not needed. So maybe move the toBase64 table to the outer class. Have you compared performance with fromBase64 as byte[] (including local 'b' in decode() as byte)? understood. but it appears the hotspot likes it the constant/fixed length lookup table is inside encoder. Same as you suggestion in your previous email to use String in source and expand it during runtime. It saves about 500 bytes but slows the server vm. Those repeating lines of "isURL? " is also supposed to be a performance tweak to eliminate the array boundary check in loop. Why you mix the algorithms to check the padding? : 824 if (b == -2) { // padding byte 890 if (b == '=') { It is supposed reduce one "if" check for normal base64 character inside the loop. I'm not that sure it really helps though. -Sherman
Re: Review/comment needed for the new public java.util.Base64 class
Hi Sherman, thanks for your details. Has this discussion been on the list, and I have missed it? I see a problem with hiding the "singleton" choice: Developers might tend to repetitively invoke public static Encoder getEncoder(int lineLength, byte[] lineSeparator) instead reusing the the same instance. (Note: It should not harm, to move the Objects.requireNonNull(lineSeparator); to the general private constructor.) In case of a public constant Encoder.RFC4648 ... developer would be aware about the re-usability of the encoder. IMHO, the get...() methods are just waste of source lines and bytecode footprint. But in contrast to the v4 I like the inner class approach: Base64.De/Encoder. Little nit: In lines 89,100,127,128,138,149,159 the indentation is 5 instead 4. -Ulf Am 30.10.2012 02:51, schrieb Xueming Shen: Ulf, my apology. Some how I missed your email. We tried various prototypes for this simple utility class. See http://cr.openjdk.java.net/~sherman/base64/ The v4 might be close to the static constant approach you suggested. While It's hard to draw a clear line on which one is better, we concluded that the proposed approach provides the best balance among usability, readability and extensibility. For example, the "get" approach allows us to hide the "singleton" choice to the implementation. It provides a consistent interface "fixed" basic/url/mime type en/decoder and the configurable floating length/linefeed encoder. -Sherman On 10/29/12 11:15 AM, Ulf Zibis wrote: Hi Sherman, can you give me a short answer please? -Ulf Am 23.10.2012 16:57, schrieb Ulf Zibis: Am 23.10.2012 15:04, schrieb Alan Bateman: I'm not sure that getUrlEncoder is the most suitable name to get a base64url encoder. The reason is that the method name makes it sound like it returns a URLEncoder or or at least an encoder for HTML forms. While more verbose, getBase64UrlEncoder is clear that it returns a base64url encoder. I'm wondering, why there are those get... methods at all. Alternatively you could make the appropriate constructors and predifined static variants public. So one only should use: Base64.Encoder encoder = new Base64.Encoder(...); Base64.Encoder urlEncoder = Base64.Encoder.RFC4648_URLSAFE; No need for those looong method names. -Ulf
Re: Review/comment needed for the new public java.util.Base64 class
Oops: lineLength << 2 >> 2 I meant:: lineLength >> 2 << 2 -Ulf
Re: Review/comment needed for the new public java.util.Base64 class
Am 24.10.2012 04:56, schrieb Xueming Shen: Thanks for the review. I hope I addressed most of the comments in the updated webrev at http://cr.openjdk.java.net/~sherman/4235519/webrev Hi Sherman, my additional comments: I'm "confused" about the order of xxcode() and Xxcoder. In other places e.g. charsets, we have de... before en..., which is also true alphabetical 1026 private Base64() {} 1027 private static final int MIMELINEMAX = 76; 1028 private static final byte[] crlf = new byte[] {'\r', '\n'}; - difficult to find, please move (more) to the beginning, at least not in between inner class definitions. - I more would like CRLF than crlf 118 * @param lineLengththe length of each output line (rounded down 119 * to nearest multiple of 4). This could be interpreted as: the parameter must be rounded before, so maybe better: 118 * @param lineLengththe length of each output line (if not a multiple of 4, 119 * it will be rounded down accordingly). 126 public static Encoder getEncoder(int lineLength, byte[] lineSeparator) { 127 Objects.requireNonNull(lineSeparator); 128 return new Encoder(false, lineSeparator, lineLength / 4 * 4); 129 } - What (should) happen(s), if lineSeparator.length == 0 ? - Isn't one of these little faster, or at least more clear? : lineLength -= lineLength % 4 lineLength & 0xFFFC lineLength << 2 >> 2 Broken indentation (why at all, compared to lines 213..215?): 208 private final byte[] newline; 209 private final intlinemax; 210 private final boolean isURL; Unconventional indentation and even broken in lines 223..224 (maybe more?): 247 * @param src the byte array to encode 248 * @param dst the output byte array 249 * @returnThe number of bytes written to the output byte array 250 * 251 * @throwsIllegalArgumentException if {@code dst} does not have 252 *enough space for encoding all input bytes. More conventional: 247 * @param src the byte array to encode 248 * @param dst the output byte array 249 * @return The number of bytes written to the output byte array 250 * 251 * @throws IllegalArgumentException if {@code dst} does not have 252 * enough space for encoding all input bytes. 603 static { 604 Arrays.fill(fromBase64, -1); 605 for (int i = 0; i < Encoder.toBase64.length; i++) 606 fromBase64[Encoder.toBase64[i]] = i; 607 fromBase64['='] = -2; 608 } This causes the inner Encoder class to be loaded, even if not needed. So maybe move the toBase64 table to the outer class. Have you compared performance with fromBase64 as byte[] (including local 'b' in decode() as byte)? Retrieving the bytes by b = base64[x] then could be done without address-shift and smaller/faster LoadB operations could be used by JIT. At least the footprint of the table would be smaller. Missing spaces: 805 return new DecInputStream(is, isURL?fromBase64URL:fromBase64, isMIME); Why at all you repeat this code many times? Why not? : 631 this.base64 = isURL ? fromBase64URL : fromBase64; Also it would at least be helpful for readers to define final, e.g.: 809 final int[] base64 = isURL ? fromBase64URL : fromBase64; Why you mix the algorithms to check the padding? : 824 if (b == -2) { // padding byte 890 if (b == '=') { -Ulf
Re: Review/comment needed for the new public java.util.Base64 class
Ulf, my apology. Some how I missed your email. We tried various prototypes for this simple utility class. See http://cr.openjdk.java.net/~sherman/base64/ The v4 might be close to the static constant approach you suggested. While It's hard to draw a clear line on which one is better, we concluded that the proposed approach provides the best balance among usability, readability and extensibility. For example, the "get" approach allows us to hide the "singleton" choice to the implementation. It provides a consistent interface "fixed" basic/url/mime type en/decoder and the configurable floating length/linefeed encoder. -Sherman On 10/29/12 11:15 AM, Ulf Zibis wrote: Hi Sherman, can you give me a short answer please? -Ulf Am 23.10.2012 16:57, schrieb Ulf Zibis: Am 23.10.2012 15:04, schrieb Alan Bateman: I'm not sure that getUrlEncoder is the most suitable name to get a base64url encoder. The reason is that the method name makes it sound like it returns a URLEncoder or or at least an encoder for HTML forms. While more verbose, getBase64UrlEncoder is clear that it returns a base64url encoder. I'm wondering, why there are those get... methods at all. Alternatively you could make the appropriate constructors and predifined static variants public. So one only should use: Base64.Encoder encoder = new Base64.Encoder(...); Base64.Encoder urlEncoder = Base64.Encoder.RFC4648_URLSAFE; No need for those looong method names. -Ulf
Re: Review/comment needed for the new public java.util.Base64 class
Hi Sherman, can you give me a short answer please? -Ulf Am 23.10.2012 16:57, schrieb Ulf Zibis: Am 23.10.2012 15:04, schrieb Alan Bateman: I'm not sure that getUrlEncoder is the most suitable name to get a base64url encoder. The reason is that the method name makes it sound like it returns a URLEncoder or or at least an encoder for HTML forms. While more verbose, getBase64UrlEncoder is clear that it returns a base64url encoder. I'm wondering, why there are those get... methods at all. Alternatively you could make the appropriate constructors and predifined static variants public. So one only should use: Base64.Encoder encoder = new Base64.Encoder(...); Base64.Encoder urlEncoder = Base64.Encoder.RFC4648_URLSAFE; No need for those looong method names. -Ulf
Re: Review/comment needed for the new public java.util.Base64 class
On 10/23/2012 8:50 PM, Mike Duigou wrote: I'm eager to use this! Some comments: -code is used in some places rather than {@code } fixed - The initial would perhaps be better as a. I don't know if definition lists are allowed in javadoc. - public OutputStream wrap(OutputStream os) closing the underlying output stream precludes many potential uses. FilterOutputStreams do not consistently close the underlying stream. It has been an issue for a while. Obviously there are users that prefer the underlying stream not get closed when the outer one is closed. I think we have couple rfes that ask for a "close this stream only but not the underlying one" close. But I don't think we can do this in the default close(), otherwise, I'm sure we will gen another side of complain, the close() does not close the underlying stream:-) - spelling error "methold" fixed - InputStream wrap() it might be nice if the return stream EOF when the padding character(s) are consumed but not close() or otherwise alter the source InputStream I need give it a little more time tomorrow. - private static byte[] getBytes(ByteBuffer bb) should document, even though it's a private method, that the bytes returned should be not considered mutable since they might be shared. I have been thinking of rewriting the de/encode(ByteBuffer) to get rid of getBytes(...). Probably need to move some piece around. If I end up keeping it I will put some comment int. - the "if (ret != dst.length) // TBD: not necessary" is worrisome as it will generate garbage. while I'm pretty sure I don't need this for encode (it is definitely needed for decode()), I just keep it for safe:-) I removed the "TBD" comment though. - I'm still wishing String encode(byte[] buf) didn't use an intermediate byte array. :-) - "stream after use, in which" -> "stream after use, during which" fixed. Thanks for taking a look. -Sherman On Oct 23 2012, at 19:56 , Xueming Shen wrote: Hi Alan, Thanks for the review. I hope I addressed most of the comments in the updated webrev at http://cr.openjdk.java.net/~sherman/4235519/webrev mainly (1) Pulled the base64 "terms" up to the class doc and then be referenced from various methods (2) Gave up the C style de/encode(byte[], null), just leave the invoker to have enough space. I remember Paul once suggested to have a convenient method to return the estimated length of the byte[] needed to en/decode a specified input byte array/buffer. I think we can do it later when it is really desired. I agreed your argument that if people need to get the size and prepare a "new" array, they bet off just call de/encode(byte[]) (3) Gave up the "liberal" decoding design, tight the spec/impl to treat the input as "illegal base64" if the padding character appears in the middle of input. (4) Some clarification of spec for thread safe, null exception and the pos/limit of input/output byte buffer as suggested. (5) Fixed the typos I still don't have a better name for method "getUrlEn/Decoder()", Base64.getBase64UrlEn/Decoder() does not feel good for me. -Sherman On 10/23/12 6:04 AM, Alan Bateman wrote: On 18/10/2012 03:10, Xueming Shen wrote: : webrev: http://cr.openjdk.java.net/~sherman/4235519/webrev I took another pass over this, focusing on the API as that is what we have to get right. Performance is important too but I think the priority has to be the API first. Overall I think it is quite nice and easy to use. I wonder if the Base64 class description should try to establish the terms "base64" and "base64url" so that they can be referenced from the various methods? That would avoid needing to duplicate references to the RFC 4668 and RFC 2045 in so many places. I think it would also be useful if the specification indicated whether Encoders and Decoders are safe for use by concurrent threads. Also make it clear that NPE is thrown if any of the parameters are passed as null (unless otherwise specified of course). I'm not sure that getUrlEncoder is the most suitable name to get a base64url encoder. The reason is that the method name makes it sound like it returns a URLEncoder or or at least an encoder for HTML forms. While more verbose, getBase64UrlEncoder is clear that it returns a base64url encoder. Do you think getEncoder(int,byte[]) will be used much? If not then perhaps it should be left out for the first version (I guess part of this is getting used to providing the line separate as a byte array). For the javadoc then Encoder and Decoder will need @since 1.8. They should probably cross reference each other too. encode(byte[]) should be clearer that it encodes all bytes in the given array. Also I think it needs to be clear that the returned byte array is appropriately sized -- as currently worded it doesn't make it clear that they can't be extra elements at the end (odd as it might be). Typo at line 215, "byter array" -> "byte array" Typo at l
Re: Review/comment needed for the new public java.util.Base64 class
I'm eager to use this! Some comments: - code is used in some places rather than {@code } - The initial would perhaps be better as a . I don't know if definition lists are allowed in javadoc. - public OutputStream wrap(OutputStream os) closing the underlying output stream precludes many potential uses. FilterOutputStreams do not consistently close the underlying stream. - spelling error "methold" - InputStream wrap() it might be nice if the return stream EOF when the padding character(s) are consumed but not close() or otherwise alter the source InputStream - private static byte[] getBytes(ByteBuffer bb) should document, even though it's a private method, that the bytes returned should be not considered mutable since they might be shared. - the "if (ret != dst.length) // TBD: not necessary" is worrisome as it will generate garbage. - I'm still wishing String encode(byte[] buf) didn't use an intermediate byte array. :-) - "stream after use, in which" -> "stream after use, during which" On Oct 23 2012, at 19:56 , Xueming Shen wrote: > Hi Alan, > > Thanks for the review. I hope I addressed most of the comments in the > updated webrev at > > http://cr.openjdk.java.net/~sherman/4235519/webrev > > mainly > > (1) Pulled the base64 "terms" up to the class doc and then be referenced from > various methods > (2) Gave up the C style de/encode(byte[], null), just leave the invoker to > have enough space. I > remember Paul once suggested to have a convenient method to return the > estimated length > of the byte[] needed to en/decode a specified input byte array/buffer. I > think we can do it > later when it is really desired. I agreed your argument that if people > need to get the size and > prepare a "new" array, they bet off just call de/encode(byte[]) > (3) Gave up the "liberal" decoding design, tight the spec/impl to treat the > input as "illegal base64" > if the padding character appears in the middle of input. > (4) Some clarification of spec for thread safe, null exception and the > pos/limit of input/output byte > buffer as suggested. > (5) Fixed the typos > > I still don't have a better name for method "getUrlEn/Decoder()", > Base64.getBase64UrlEn/Decoder() > does not feel good for me. > > -Sherman > > > On 10/23/12 6:04 AM, Alan Bateman wrote: >> On 18/10/2012 03:10, Xueming Shen wrote: >>> : >>> >>> webrev: >>> http://cr.openjdk.java.net/~sherman/4235519/webrev >> I took another pass over this, focusing on the API as that is what we have >> to get right. Performance is important too but I think the priority has to >> be the API first. >> >> Overall I think it is quite nice and easy to use. >> >> I wonder if the Base64 class description should try to establish the terms >> "base64" and "base64url" so that they can be referenced from the various >> methods? That would avoid needing to duplicate references to the RFC 4668 >> and RFC 2045 in so many places. >> >> I think it would also be useful if the specification indicated whether >> Encoders and Decoders are safe for use by concurrent threads. Also make it >> clear that NPE is thrown if any of the parameters are passed as null (unless >> otherwise specified of course). >> >> I'm not sure that getUrlEncoder is the most suitable name to get a base64url >> encoder. The reason is that the method name makes it sound like it returns a >> URLEncoder or or at least an encoder for HTML forms. While more verbose, >> getBase64UrlEncoder is clear that it returns a base64url encoder. >> >> Do you think getEncoder(int,byte[]) will be used much? If not then perhaps >> it should be left out for the first version (I guess part of this is getting >> used to providing the line separate as a byte array). >> >> For the javadoc then Encoder and Decoder will need @since 1.8. They should >> probably cross reference each other too. >> >> encode(byte[]) should be clearer that it encodes all bytes in the given >> array. Also I think it needs to be clear that the returned byte array is >> appropriately sized -- as currently worded it doesn't make it clear that >> they can't be extra elements at the end (odd as it might be). >> >> Typo at line 215, "byter array" -> "byte array" >> >> Typo at line 246, "methold" -> "method". >> >> Typo at line 247, "arry" -> "array" >> >> Type at line 254, "encocde" -> "encode" >> >> Typo at line 277, "buffter" -> "buffer". Another one at line 702. >> >> Minor comment, but I assume you should move the >> @SuppressWarnings("deprecation") on encodeToString to after the method >> comment rather than before. I see the same thing in decode(String). >> >> I think encode(ByteBuffer) needs to be clear as to the >> position/limit/capacity of the returned buffer. >> >> I'm not sure so about encode(ba, null) returning the required length, it >> feels odd and a bit like some of the win32 APIs. If the intention is that >> the caller allocates the byte[] and then calls e
Re: Review/comment needed for the new public java.util.Base64 class
Hi Alan, Thanks for the review. I hope I addressed most of the comments in the updated webrev at http://cr.openjdk.java.net/~sherman/4235519/webrev mainly (1) Pulled the base64 "terms" up to the class doc and then be referenced from various methods (2) Gave up the C style de/encode(byte[], null), just leave the invoker to have enough space. I remember Paul once suggested to have a convenient method to return the estimated length of the byte[] needed to en/decode a specified input byte array/buffer. I think we can do it later when it is really desired. I agreed your argument that if people need to get the size and prepare a "new" array, they bet off just call de/encode(byte[]) (3) Gave up the "liberal" decoding design, tight the spec/impl to treat the input as "illegal base64" if the padding character appears in the middle of input. (4) Some clarification of spec for thread safe, null exception and the pos/limit of input/output byte buffer as suggested. (5) Fixed the typos I still don't have a better name for method "getUrlEn/Decoder()", Base64.getBase64UrlEn/Decoder() does not feel good for me. -Sherman On 10/23/12 6:04 AM, Alan Bateman wrote: On 18/10/2012 03:10, Xueming Shen wrote: : webrev: http://cr.openjdk.java.net/~sherman/4235519/webrev I took another pass over this, focusing on the API as that is what we have to get right. Performance is important too but I think the priority has to be the API first. Overall I think it is quite nice and easy to use. I wonder if the Base64 class description should try to establish the terms "base64" and "base64url" so that they can be referenced from the various methods? That would avoid needing to duplicate references to the RFC 4668 and RFC 2045 in so many places. I think it would also be useful if the specification indicated whether Encoders and Decoders are safe for use by concurrent threads. Also make it clear that NPE is thrown if any of the parameters are passed as null (unless otherwise specified of course). I'm not sure that getUrlEncoder is the most suitable name to get a base64url encoder. The reason is that the method name makes it sound like it returns a URLEncoder or or at least an encoder for HTML forms. While more verbose, getBase64UrlEncoder is clear that it returns a base64url encoder. Do you think getEncoder(int,byte[]) will be used much? If not then perhaps it should be left out for the first version (I guess part of this is getting used to providing the line separate as a byte array). For the javadoc then Encoder and Decoder will need @since 1.8. They should probably cross reference each other too. encode(byte[]) should be clearer that it encodes all bytes in the given array. Also I think it needs to be clear that the returned byte array is appropriately sized -- as currently worded it doesn't make it clear that they can't be extra elements at the end (odd as it might be). Typo at line 215, "byter array" -> "byte array" Typo at line 246, "methold" -> "method". Typo at line 247, "arry" -> "array" Type at line 254, "encocde" -> "encode" Typo at line 277, "buffter" -> "buffer". Another one at line 702. Minor comment, but I assume you should move the @SuppressWarnings("deprecation") on encodeToString to after the method comment rather than before. I see the same thing in decode(String). I think encode(ByteBuffer) needs to be clear as to the position/limit/capacity of the returned buffer. I'm not sure so about encode(ba, null) returning the required length, it feels odd and a bit like some of the win32 APIs. If the intention is that the caller allocates the byte[] and then calls encode again then it would be easier to just encode(ba). For the decoder methods then IllegalArgumentException may be thrown mid-flight, meaning that some bytes may have been written into output buffer or array even though an exception is thrown. I think this needs to be made clear in the spec. It also makes me wonder if this is the right exception, it feels like a more specialized malformed input exception. Another point about the decode methods is that they stop at the padding bytes and so this allows for bytes after the padding. I'm curious about this choice and whether you considered this as an error? I see how this influences decode(ByteBuffer,ByteBuffer) to return a boolean and I just wonder if there are other choices to consider. That's mostly it for now. I didn't check in the IDE but there are lot of imports that are don't appear to be used, perhaps left over from previous iterations? -Alan
Re: Review/comment needed for the new public java.util.Base64 class
Am 23.10.2012 15:04, schrieb Alan Bateman: I'm not sure that getUrlEncoder is the most suitable name to get a base64url encoder. The reason is that the method name makes it sound like it returns a URLEncoder or or at least an encoder for HTML forms. While more verbose, getBase64UrlEncoder is clear that it returns a base64url encoder. I'm wondering, why there are those get... methods at all. Alternatively you could make the appropriate constructors and predifined static variants public. So one only should use: Base64.Encoder encoder = new Base64.Encoder(...); Base64.Encoder urlEncoder = Base64.Encoder.RFC4648_URLSAFE; No need for those looong method names. -Ulf
Re: Review/comment needed for the new public java.util.Base64 class
On 18/10/2012 03:10, Xueming Shen wrote: : webrev: http://cr.openjdk.java.net/~sherman/4235519/webrev I took another pass over this, focusing on the API as that is what we have to get right. Performance is important too but I think the priority has to be the API first. Overall I think it is quite nice and easy to use. I wonder if the Base64 class description should try to establish the terms "base64" and "base64url" so that they can be referenced from the various methods? That would avoid needing to duplicate references to the RFC 4668 and RFC 2045 in so many places. I think it would also be useful if the specification indicated whether Encoders and Decoders are safe for use by concurrent threads. Also make it clear that NPE is thrown if any of the parameters are passed as null (unless otherwise specified of course). I'm not sure that getUrlEncoder is the most suitable name to get a base64url encoder. The reason is that the method name makes it sound like it returns a URLEncoder or or at least an encoder for HTML forms. While more verbose, getBase64UrlEncoder is clear that it returns a base64url encoder. Do you think getEncoder(int,byte[]) will be used much? If not then perhaps it should be left out for the first version (I guess part of this is getting used to providing the line separate as a byte array). For the javadoc then Encoder and Decoder will need @since 1.8. They should probably cross reference each other too. encode(byte[]) should be clearer that it encodes all bytes in the given array. Also I think it needs to be clear that the returned byte array is appropriately sized -- as currently worded it doesn't make it clear that they can't be extra elements at the end (odd as it might be). Typo at line 215, "byter array" -> "byte array" Typo at line 246, "methold" -> "method". Typo at line 247, "arry" -> "array" Type at line 254, "encocde" -> "encode" Typo at line 277, "buffter" -> "buffer". Another one at line 702. Minor comment, but I assume you should move the @SuppressWarnings("deprecation") on encodeToString to after the method comment rather than before. I see the same thing in decode(String). I think encode(ByteBuffer) needs to be clear as to the position/limit/capacity of the returned buffer. I'm not sure so about encode(ba, null) returning the required length, it feels odd and a bit like some of the win32 APIs. If the intention is that the caller allocates the byte[] and then calls encode again then it would be easier to just encode(ba). For the decoder methods then IllegalArgumentException may be thrown mid-flight, meaning that some bytes may have been written into output buffer or array even though an exception is thrown. I think this needs to be made clear in the spec. It also makes me wonder if this is the right exception, it feels like a more specialized malformed input exception. Another point about the decode methods is that they stop at the padding bytes and so this allows for bytes after the padding. I'm curious about this choice and whether you considered this as an error? I see how this influences decode(ByteBuffer,ByteBuffer) to return a boolean and I just wonder if there are other choices to consider. That's mostly it for now. I didn't check in the IDE but there are lot of imports that are don't appear to be used, perhaps left over from previous iterations? -Alan
Re: Review/comment needed for the new public java.util.Base64 class
Understood. I do have the code:-) but I'm hesitated to go SharedSecrets simply for performance gain of a utility method. This definitely can be addressed if it turns out to be a real issue standing in critical path. -Sherman sherman@sherman-linux:~/Workspace/jdk8/test/java/util/Base64$ java PermBase64 20 1000 j.u.Base64.encode(ba) : 528745 j.u.Base64.encodeString(ba): 739703 j.u.Base64.encode(bb) : 486216 j.u.Base64.encode(bb, bb) : 538544 j.u.Base64.encode(bb, bb)-D: 850947 migBase64.encode(ba) : 659474 vs sherman@sherman-linux:~/Workspace/jdk8/test/java/util/Base64$ java PermBase64 20 1000 j.u.Base64.encode(ba) : 519391 j.u.Base64.encodeString(ba): 964854 j.u.Base64.encode(bb) : 490138 j.u.Base64.encode(bb, bb) : 539027 j.u.Base64.encode(bb, bb)-D: 786438 migBase64.encode(ba) : 660572 -Sherman On 10/19/2012 04:59 PM, Mike Duigou wrote: For me the greater concern, which is hard to measure, is the GC pressure added by the discarded byte array. Mike On Oct 19 2012, at 17:03 , Xueming Shen wrote: I see a 20% performance gain on server vm if switch to pure char[] based encoding and then use the sharedSecrets to avoid the copy. The dis-advantage is (1) have to use the sharedSecrets and (2) can't share the same between the encode(byte[]) and encode(String). Anyway it appears to be an alternative for performance improvement. -Sherman On 10/18/2012 01:07 PM, Mike Duigou wrote: I wonder if there would be advantage in using a SharedSecrets mechanism to allow construction of a String directly from a char array. The intermediate byte array seems wasteful especially for what is likely to be a heavily used path. Mike On Oct 17 2012, at 19:10 , Xueming Shen wrote: Hi Webrev has been updated with following changes (1) added a pair of en/decode(ByteBuffer src, ByteBuffer dst) methods (2) some minor spec clarification regarding the "end of decoding" (3) performance tuning. webrev: http://cr.openjdk.java.net/~sherman/4235519/webrev some performance scores: http://cr.openjdk.java.net/~sherman/4235519/score3 -Sherman
Re: Review/comment needed for the new public java.util.Base64 class
For me the greater concern, which is hard to measure, is the GC pressure added by the discarded byte array. Mike On Oct 19 2012, at 17:03 , Xueming Shen wrote: > > I see a 20% performance gain on server vm if switch to pure char[] based > encoding > and then use the sharedSecrets to avoid the copy. The dis-advantage is (1) > have to > use the sharedSecrets and (2) can't share the same between the encode(byte[]) > and encode(String). > > Anyway it appears to be an alternative for performance improvement. > > -Sherman > > > On 10/18/2012 01:07 PM, Mike Duigou wrote: >> I wonder if there would be advantage in using a SharedSecrets mechanism to >> allow construction of a String directly from a char array. The intermediate >> byte array seems wasteful especially for what is likely to be a heavily used >> path. >> >> Mike >> >> On Oct 17 2012, at 19:10 , Xueming Shen wrote: >> >>> Hi >>> >>> Webrev has been updated with following changes >>> >>> (1) added a pair of en/decode(ByteBuffer src, ByteBuffer dst) methods >>> (2) some minor spec clarification regarding the "end of decoding" >>> (3) performance tuning. >>> >>> webrev: >>> http://cr.openjdk.java.net/~sherman/4235519/webrev >>> >>> some performance scores: >>> http://cr.openjdk.java.net/~sherman/4235519/score3 >>> >>> -Sherman >
Re: Review/comment needed for the new public java.util.Base64 class
I see a 20% performance gain on server vm if switch to pure char[] based encoding and then use the sharedSecrets to avoid the copy. The dis-advantage is (1) have to use the sharedSecrets and (2) can't share the same between the encode(byte[]) and encode(String). Anyway it appears to be an alternative for performance improvement. -Sherman On 10/18/2012 01:07 PM, Mike Duigou wrote: I wonder if there would be advantage in using a SharedSecrets mechanism to allow construction of a String directly from a char array. The intermediate byte array seems wasteful especially for what is likely to be a heavily used path. Mike On Oct 17 2012, at 19:10 , Xueming Shen wrote: Hi Webrev has been updated with following changes (1) added a pair of en/decode(ByteBuffer src, ByteBuffer dst) methods (2) some minor spec clarification regarding the "end of decoding" (3) performance tuning. webrev: http://cr.openjdk.java.net/~sherman/4235519/webrev some performance scores: http://cr.openjdk.java.net/~sherman/4235519/score3 -Sherman
Re: Review/comment needed for the new public java.util.Base64 class
On 10/18/2012 07:43 AM, Alan Bateman wrote: On 18/10/2012 03:10, Xueming Shen wrote: Hi Webrev has been updated with following changes (1) added a pair of en/decode(ByteBuffer src, ByteBuffer dst) methods (2) some minor spec clarification regarding the "end of decoding" (3) performance tuning. webrev: http://cr.openjdk.java.net/~sherman/4235519/webrev some performance scores: http://cr.openjdk.java.net/~sherman/4235519/score3 -Sherman Have you done any performance tuning on the new methods that encode/decode into an existing ByteBuffer? Just wondering as the direct buffer case seems more expensive that I would expect. It would be interesting to see using the put(int,byte) and fixing up the position at the end would help. It might also be something to look at with the compiler folks. get(index) and put(index, byte) do help, see 20%+ improvement. sherman@sherman-linux:~/Workspace/jdk8/test/java/util/Base64$ java PermBase64 20 1000 j.u.Base64.encode(ba) : 702000 j.u.Base64.encode(bb) : 690024 j.u.Base64.encode(bb, bb) : 910582 j.u.Base64.encode(bb, bb)-D: 1198606 migBase64.encode(ba) : 661271 -- j.u.Base64.decode(ba) : 797152 j.u.Base64.decode(bb) : 802968 j.u.Base64.decode(bb, bb) : 999577 j.u.Base64.decode(bb, bb)-D: 1472003 migBase64.decode(ba) : 1458450 webrev has been updated accordingly. Now implementations of de/encodeArray and de/encodeBuffer are identical except the access operation. -Sherman
Re: Review/comment needed for the new public java.util.Base64 class
I wonder if there would be advantage in using a SharedSecrets mechanism to allow construction of a String directly from a char array. The intermediate byte array seems wasteful especially for what is likely to be a heavily used path. Mike On Oct 17 2012, at 19:10 , Xueming Shen wrote: > Hi > > Webrev has been updated with following changes > > (1) added a pair of en/decode(ByteBuffer src, ByteBuffer dst) methods > (2) some minor spec clarification regarding the "end of decoding" > (3) performance tuning. > > webrev: > http://cr.openjdk.java.net/~sherman/4235519/webrev > > some performance scores: > http://cr.openjdk.java.net/~sherman/4235519/score3 > > -Sherman
Re: Review/comment needed for the new public java.util.Base64 class
Michael, The current approach assumes that the base64 en/decoding of a file, stream will be handled by the en/decoder.wrap(...). The en/decode(bb, bb) is basically for one time invocation input and output from/to existing buffers, with the hope of avoiding the coding-life-circle management, so it assumes all input will be passed in in the src buffer and only needs limit support for cases that the existing output buffer is not sufficient. We will probably have to go with the "endOfInput" flag approach, if the current version is still not enough for most of "common" use cases. -Sherman On 10/18/2012 11:01 AM, Michael Schierl wrote: Am 18.10.2012 04:10, schrieb Xueming Shen: Hi Webrev has been updated with following changes (1) added a pair of en/decode(ByteBuffer src, ByteBuffer dst) methods I think line 443 needs an additional check for atom == 0. Same for the array case. At least if you do not want an endOfInput flag in the parameters. If you have an endOfInput flag, you can use that in the condition instead of atom == 0. Consider this example where trying to encode 25 bytes using an input ByteBuffer of 16 bytes and an output ByteBuffer of 32 bytes (Of course, you'd use larger buffers in practice, but the same can happen with larger buffers too, as long as your buffer size is not divisible by three, or you do not completely fill the buffer with each read. Both can happen if the media you are reading from uses a different sector/block size - like when attaching a base64 attachment of a large file from disk to an email). First you fill 16 bytes of input into input buffer, then you call encode the first time. it will encode the first 15 bytes to 20 bytes of output, then encode the 16th byte to XY==. You flush the output and fill the next 9 bytes into the input buffer. The next call to encode will encode the 9 bytes to 12 bytes. So, in total you have 36 bytes, but the padding is in the middle. With the check, the first call will encode the first 15 bytes to 20 bytes of output and leave the 16th byte in the buffer. You flush the output and load 9 bytes to the 1 remaining byte in the input buffer. The second call will encode the first 9 bytes to 12 bytes and leave one byte left over. You flush the buffer and call a third time, which will encode the last byte to XY==. Still 36 bytes, this time with padding at the end. Regards, Michael
Re: Review/comment needed for the new public java.util.Base64 class
Am 18.10.2012 04:10, schrieb Xueming Shen: > Hi > > Webrev has been updated with following changes > > (1) added a pair of en/decode(ByteBuffer src, ByteBuffer dst) methods I think line 443 needs an additional check for atom == 0. Same for the array case. At least if you do not want an endOfInput flag in the parameters. If you have an endOfInput flag, you can use that in the condition instead of atom == 0. Consider this example where trying to encode 25 bytes using an input ByteBuffer of 16 bytes and an output ByteBuffer of 32 bytes (Of course, you'd use larger buffers in practice, but the same can happen with larger buffers too, as long as your buffer size is not divisible by three, or you do not completely fill the buffer with each read. Both can happen if the media you are reading from uses a different sector/block size - like when attaching a base64 attachment of a large file from disk to an email). First you fill 16 bytes of input into input buffer, then you call encode the first time. it will encode the first 15 bytes to 20 bytes of output, then encode the 16th byte to XY==. You flush the output and fill the next 9 bytes into the input buffer. The next call to encode will encode the 9 bytes to 12 bytes. So, in total you have 36 bytes, but the padding is in the middle. With the check, the first call will encode the first 15 bytes to 20 bytes of output and leave the 16th byte in the buffer. You flush the output and load 9 bytes to the 1 remaining byte in the input buffer. The second call will encode the first 9 bytes to 12 bytes and leave one byte left over. You flush the buffer and call a third time, which will encode the last byte to XY==. Still 36 bytes, this time with padding at the end. Regards, Michael
Re: Review/comment needed for the new public java.util.Base64 class
Am 18.10.2012 16:04, schrieb Xueming Shen: On 10/18/2012 6:19 AM, Ulf Zibis wrote: Oops, you are working at 6 in the morning? Hi Sherman, you could create the code table by help of a String constant, which would speed up the loading and save a little footprint on the class file. (you have forgotten those tricks from sun.nio.cs coders? ;-) ) The implementation probably needs more tuning. Hopefully, some day we have arrays of primitive types in the class file's constant pool. But my main question: Why don't you enqueue those coders in the well known sun.nio.cs.ext provider? Base64 encoding is not a charset. Mainly it is supposed to be byte[]<->byte[] coding. Even for byte[]<=>char[]/String, Base64 "encode" byte[] to char[]/String and "decode" from char[]/String to byte[]. The "wrong" direction compared to charset en/decoder. 1:0 for you. I only had the ASCII/Unicode characters -> Base64 "en"coding in mind, which seems in fact a Java-String to byte[] to char[]/String 2-step-encoding. At least, I think, class Base64 should be hosted in java.nio.charset rather than common java.util. As I believe, that the usage of this encoding is far from common you might be surprised how "common" it might be:-) or surprised, why it wasn't part of the JDK before. -Ulf
Re: Review/comment needed for the new public java.util.Base64 class
On 10/18/12 7:43 AM, Alan Bateman wrote: On 18/10/2012 03:10, Xueming Shen wrote: Hi Webrev has been updated with following changes (1) added a pair of en/decode(ByteBuffer src, ByteBuffer dst) methods (2) some minor spec clarification regarding the "end of decoding" (3) performance tuning. webrev: http://cr.openjdk.java.net/~sherman/4235519/webrev some performance scores: http://cr.openjdk.java.net/~sherman/4235519/score3 -Sherman Have you done any performance tuning on the new methods that encode/decode into an existing ByteBuffer? Just wondering as the direct buffer case seems more expensive that I would expect. It would be interesting to see using the put(int,byte) and fixing up the position at the end would help. It might also be something to look at with the compiler folks. no "real" tuning yet on the direct buffer case. Just tried to get the API out for review/comment first. In any case, I think encode(ByteBuffer,ByteBuffer,int) looks reasonable and just requires using the idiom to get it right. I'm less sure about decode returning a boolean. Shouldn't it be an error if there are bytes remaining after the = padding? It depends on expectation. Padding partially serves as a "end of base64 byte stream" indicator. I'm just looking for ways to allow the method return an int to indicate the number of bytes written to the output buffer (and realize that we aren't going to get symmetry with the encode method). Yes, even a "int" return value will not get us the symmetry with encode(bb, bb). A boolean return provides a more liberal behavior. -Sherman
Re: Review/comment needed for the new public java.util.Base64 class
On 18/10/2012 03:10, Xueming Shen wrote: Hi Webrev has been updated with following changes (1) added a pair of en/decode(ByteBuffer src, ByteBuffer dst) methods (2) some minor spec clarification regarding the "end of decoding" (3) performance tuning. webrev: http://cr.openjdk.java.net/~sherman/4235519/webrev some performance scores: http://cr.openjdk.java.net/~sherman/4235519/score3 -Sherman Have you done any performance tuning on the new methods that encode/decode into an existing ByteBuffer? Just wondering as the direct buffer case seems more expensive that I would expect. It would be interesting to see using the put(int,byte) and fixing up the position at the end would help. It might also be something to look at with the compiler folks. In any case, I think encode(ByteBuffer,ByteBuffer,int) looks reasonable and just requires using the idiom to get it right. I'm less sure about decode returning a boolean. Shouldn't it be an error if there are bytes remaining after the = padding? I'm just looking for ways to allow the method return an int to indicate the number of bytes written to the output buffer (and realize that we aren't going to get symmetry with the encode method). -Alan.
Re: Review/comment needed for the new public java.util.Base64 class
On 18/10/2012 14:19, Ulf Zibis wrote: Hi Sherman, you could create the code table by help of a String constant, which would speed up the loading and save a little footprint on the class file. (you have forgotten those tricks from sun.nio.cs coders? ;-) ) But my main question: Why don't you enqueue those coders in the well known sun.nio.cs.ext provider? At least, I think, class Base64 should be hosted in java.nio.charset rather than common java.util. As I believe, that the usage of this encoding is far from common, adding it to the lazy loaded charsets.jar IMO should be more reasonable. The encoders and decoders in java.nio.charset are for translating between bytes and Unicode characters and that is not the case here. I think java.util is probably the right place for this. -Alan
Re: Review/comment needed for the new public java.util.Base64 class
On 10/18/2012 6:19 AM, Ulf Zibis wrote: Hi Sherman, you could create the code table by help of a String constant, which would speed up the loading and save a little footprint on the class file. (you have forgotten those tricks from sun.nio.cs coders? ;-) ) The implementation probably needs more tuning. But my main question: Why don't you enqueue those coders in the well known sun.nio.cs.ext provider? Base64 encoding is not a charset. Mainly it is supposed to be byte[]<->byte[] coding. Even for byte[]<=>char[]/String, Base64 "encode" byte[] to char[]/String and "decode" from char[]/String to byte[]. The "wrong" direction compared to charset en/decoder. At least, I think, class Base64 should be hosted in java.nio.charset rather than common java.util. As I believe, that the usage of this encoding is far from common you might be surprised how "common" it might be:-) -Sherman , adding it to the lazy loaded charsets.jar IMO should be more reasonable. -Ulf Am 18.10.2012 04:10, schrieb Xueming Shen: Hi Webrev has been updated with following changes (1) added a pair of en/decode(ByteBuffer src, ByteBuffer dst) methods (2) some minor spec clarification regarding the "end of decoding" (3) performance tuning. webrev: http://cr.openjdk.java.net/~sherman/4235519/webrev some performance scores: http://cr.openjdk.java.net/~sherman/4235519/score3 -Sherman
Re: Review/comment needed for the new public java.util.Base64 class
Hi Sherman, you could create the code table by help of a String constant, which would speed up the loading and save a little footprint on the class file. (you have forgotten those tricks from sun.nio.cs coders? ;-) ) But my main question: Why don't you enqueue those coders in the well known sun.nio.cs.ext provider? At least, I think, class Base64 should be hosted in java.nio.charset rather than common java.util. As I believe, that the usage of this encoding is far from common, adding it to the lazy loaded charsets.jar IMO should be more reasonable. -Ulf Am 18.10.2012 04:10, schrieb Xueming Shen: Hi Webrev has been updated with following changes (1) added a pair of en/decode(ByteBuffer src, ByteBuffer dst) methods (2) some minor spec clarification regarding the "end of decoding" (3) performance tuning. webrev: http://cr.openjdk.java.net/~sherman/4235519/webrev some performance scores: http://cr.openjdk.java.net/~sherman/4235519/score3 -Sherman
Re: Review/comment needed for the new public java.util.Base64 class
Hi Webrev has been updated with following changes (1) added a pair of en/decode(ByteBuffer src, ByteBuffer dst) methods (2) some minor spec clarification regarding the "end of decoding" (3) performance tuning. webrev: http://cr.openjdk.java.net/~sherman/4235519/webrev some performance scores: http://cr.openjdk.java.net/~sherman/4235519/score3 -Sherman
Re: Review/comment needed for the new public java.util.Base64 class
On 10/13/12 10:21 AM, Xueming Shen wrote: With moving some code piece around (moved those lookup table into En/Decoder) and some array access code tuning (with the hope of eliminating the array boundary access) in encode0(), it appears the hotspot now can better optimize the decoder loop for the -server mode for encoding. decode0 also get 10% +boost with simply moving in the lookup table. The latest version is at (decode0() vs decode1()) --> encode0() vs encode1()) encode0 is the latest/faster one. encode1 is the one in webrev. -Sherman http://cr.openjdk.java.net/~sherman/4235519/Base64.java The latest scores is at http://cr.openjdk.java.net/~sherman/4235519/score2 (compared to http://cr.openjdk.java.net/~sherman/4235519/score1) -Sherman On 10/11/12 11:38 PM, Xueming Shen wrote: On 10/11/2012 02:43 AM, Stephen Colebourne wrote: There are lots of other public base 64 implementations to test/check against. http://commons.apache.org/net/api-3.1/org/apache/commons/net/util/Base64.html http://www.cs.berkeley.edu/~jonah/bc/org/bouncycastle/util/encoders/Base64.html http://migbase64.sourceforge.net/ (claims to be fast) I did a quick performance check against migbase64 with the basic base64 de/encoding using this non-scientific benchmark. http://cr.openjdk.java.net/~sherman/4235519/PermBase64.java Here is the scores http://cr.openjdk.java.net/~sherman/4235519/scores It's a tie, j.u.Base64 wins the decoding, but needs some work on encoding side. java -server PermBase64 20 50 j.u.Base64.encode() : 1390212 migBase64.encode() : 720517 j.u.Base64.decode() : 1200642 migBase64.decode() : 2070015 java -server PermBase64 20 100 j.u.Base64.encode() : 1239407 migBase64.encode() : 740404 j.u.Base64.decode() : 1257092 migBase64.decode() : 2012910 java -server PermBase64 20 1000 j.u.Base64.encode() : 1062212 migBase64.encode() : 657342 j.u.Base64.decode() : 1133740 migBase64.decode() : 1930612 java -client PermBase64 20 50 j.u.Base64.encode() : 961331 migBase64.encode() : 875635 j.u.Base64.decode() : 1528790 migBase64.decode() : 2188473 java -client PermBase64 20 100 j.u.Base64.encode() : 966453 migBase64.encode() : 858082 j.u.Base64.decode() : 1459159 migBase64.decode() : 2115045 java -client PermBase64 20 1000 j.u.Base64.encode() : 954401 migBase64.encode() : 854903 j.u.Base64.decode() : 1444603 migBase64.decode() : 2038865
Re: Review/comment needed for the new public java.util.Base64 class
With moving some code piece around (moved those lookup table into En/Decoder) and some array access code tuning (with the hope of eliminating the array boundary access) in encode0(), it appears the hotspot now can better optimize the decoder loop for the -server mode for encoding. decode0 also get 10% +boost with simply moving in the lookup table. The latest version is at (decode0() vs decode1()) http://cr.openjdk.java.net/~sherman/4235519/Base64.java The latest scores is at http://cr.openjdk.java.net/~sherman/4235519/score2 (compared to http://cr.openjdk.java.net/~sherman/4235519/score1) -Sherman On 10/11/12 11:38 PM, Xueming Shen wrote: On 10/11/2012 02:43 AM, Stephen Colebourne wrote: There are lots of other public base 64 implementations to test/check against. http://commons.apache.org/net/api-3.1/org/apache/commons/net/util/Base64.html http://www.cs.berkeley.edu/~jonah/bc/org/bouncycastle/util/encoders/Base64.html http://migbase64.sourceforge.net/ (claims to be fast) I did a quick performance check against migbase64 with the basic base64 de/encoding using this non-scientific benchmark. http://cr.openjdk.java.net/~sherman/4235519/PermBase64.java Here is the scores http://cr.openjdk.java.net/~sherman/4235519/scores It's a tie, j.u.Base64 wins the decoding, but needs some work on encoding side. java -server PermBase64 20 50 j.u.Base64.encode() : 1390212 migBase64.encode() : 720517 j.u.Base64.decode() : 1200642 migBase64.decode() : 2070015 java -server PermBase64 20 100 j.u.Base64.encode() : 1239407 migBase64.encode() : 740404 j.u.Base64.decode() : 1257092 migBase64.decode() : 2012910 java -server PermBase64 20 1000 j.u.Base64.encode() : 1062212 migBase64.encode() : 657342 j.u.Base64.decode() : 1133740 migBase64.decode() : 1930612 java -client PermBase64 20 50 j.u.Base64.encode() : 961331 migBase64.encode() : 875635 j.u.Base64.decode() : 1528790 migBase64.decode() : 2188473 java -client PermBase64 20 100 j.u.Base64.encode() : 966453 migBase64.encode() : 858082 j.u.Base64.decode() : 1459159 migBase64.decode() : 2115045 java -client PermBase64 20 1000 j.u.Base64.encode() : 954401 migBase64.encode() : 854903 j.u.Base64.decode() : 1444603 migBase64.decode() : 2038865
Re: Review/comment needed for the new public java.util.Base64 class
On 10/13/2012 11:21 AM, Peter Levart wrote: One way to do it without state in encoder is 1st as suggested to have the 'endOfStream' flag and in addition when new-line breaks are requested, to only encode "full lines" at each invocation. So If one wants to use this line-breakage feature he/she must make sure that there is enough space (line_size+2 at worst) left in destination buffer ...and (forgot to add) enough bytes remaining in source buffer or the endOfStream==true... to do any encoding or the encoder will not do anything (return 0 or false). Regards, Peter On 10/12/2012 10:58 PM, Xueming Shen wrote: Yes, I'm trying to figure out where to have this position info stored... On 10/12/2012 01:48 PM, Michael Schierl wrote: Am 12.10.2012 22:12, schrieb Xueming Shen: Hi, It appears to be possible to do something like boolean de/encode(ByteBuffer src, ByteBuffer dst); returns true if all remaining bytes in src are en/decoded, false, the dst is not big enough for all output bytes, the src.position() will be advanced to the position of next un-en/decoded byte, dst.position() will be updated accordingly as well. to avoid the en/decoder to hold an internal state. If it was unclear, that was what I tried to suggest. I thought you refered to the automatic adding of newlines when you said you'd need internal state (although that state is small enough - how many encoded bytes remaining before adding the next newline - so that it might be possible to pass it as a parameter, too). I'm not sure how important that adding of newlines really is (in all cases where I was guilty of using sun.misc.Base64Encoder, I always replaced them out after encoding), so maybe the ByteBuffer API could live without it. Note that without an explicit end of stream flag, it might not be easily possible to encode the last block - if you have a 16 byte buffer with 14 bytes filled, you cannot be sure whether you can encode the last two bytes to XYZ= because it is end of stream or you have to wait for the next byte. An option without the flag might be to assume when the input is less than 3 bytes that it is the end of the stream. But I doubt it will make the API easier to understand or use. :-) Regards, Michael
Re: Review/comment needed for the new public java.util.Base64 class
One way to do it without state in encoder is 1st as suggested to have the 'endOfStream' flag and in addition when new-line breaks are requested, to only encode "full lines" at each invocation. So If one wants to use this line-breakage feature he/she must make sure that there is enough space (line_size+2 at worst) left in destination buffer to do any encoding or the encoder will not do anything (return 0 or false). Regards, Peter On 10/12/2012 10:58 PM, Xueming Shen wrote: Yes, I'm trying to figure out where to have this position info stored... On 10/12/2012 01:48 PM, Michael Schierl wrote: Am 12.10.2012 22:12, schrieb Xueming Shen: Hi, It appears to be possible to do something like boolean de/encode(ByteBuffer src, ByteBuffer dst); returns true if all remaining bytes in src are en/decoded, false, the dst is not big enough for all output bytes, the src.position() will be advanced to the position of next un-en/decoded byte, dst.position() will be updated accordingly as well. to avoid the en/decoder to hold an internal state. If it was unclear, that was what I tried to suggest. I thought you refered to the automatic adding of newlines when you said you'd need internal state (although that state is small enough - how many encoded bytes remaining before adding the next newline - so that it might be possible to pass it as a parameter, too). I'm not sure how important that adding of newlines really is (in all cases where I was guilty of using sun.misc.Base64Encoder, I always replaced them out after encoding), so maybe the ByteBuffer API could live without it. Note that without an explicit end of stream flag, it might not be easily possible to encode the last block - if you have a 16 byte buffer with 14 bytes filled, you cannot be sure whether you can encode the last two bytes to XYZ= because it is end of stream or you have to wait for the next byte. An option without the flag might be to assume when the input is less than 3 bytes that it is the end of the stream. But I doubt it will make the API easier to understand or use. :-) Regards, Michael
Re: Review/comment needed for the new public java.util.Base64 class
Yes, I'm trying to figure out where to have this position info stored... On 10/12/2012 01:48 PM, Michael Schierl wrote: Am 12.10.2012 22:12, schrieb Xueming Shen: Hi, It appears to be possible to do something like boolean de/encode(ByteBuffer src, ByteBuffer dst); returns true if all remaining bytes in src are en/decoded, false, the dst is not big enough for all output bytes, the src.position() will be advanced to the position of next un-en/decoded byte, dst.position() will be updated accordingly as well. to avoid the en/decoder to hold an internal state. If it was unclear, that was what I tried to suggest. I thought you refered to the automatic adding of newlines when you said you'd need internal state (although that state is small enough - how many encoded bytes remaining before adding the next newline - so that it might be possible to pass it as a parameter, too). I'm not sure how important that adding of newlines really is (in all cases where I was guilty of using sun.misc.Base64Encoder, I always replaced them out after encoding), so maybe the ByteBuffer API could live without it. Note that without an explicit end of stream flag, it might not be easily possible to encode the last block - if you have a 16 byte buffer with 14 bytes filled, you cannot be sure whether you can encode the last two bytes to XYZ= because it is end of stream or you have to wait for the next byte. An option without the flag might be to assume when the input is less than 3 bytes that it is the end of the stream. But I doubt it will make the API easier to understand or use. :-) Regards, Michael
Re: Review/comment needed for the new public java.util.Base64 class
Am 12.10.2012 22:12, schrieb Xueming Shen: > Hi, > > It appears to be possible to do something like > > boolean de/encode(ByteBuffer src, ByteBuffer dst); > > returns true if all remaining bytes in src are en/decoded, false, the dst > is not big enough for all output bytes, the src.position() will be advanced > to the position of next un-en/decoded byte, dst.position() will be updated > accordingly as well. > > to avoid the en/decoder to hold an internal state. If it was unclear, that was what I tried to suggest. I thought you refered to the automatic adding of newlines when you said you'd need internal state (although that state is small enough - how many encoded bytes remaining before adding the next newline - so that it might be possible to pass it as a parameter, too). I'm not sure how important that adding of newlines really is (in all cases where I was guilty of using sun.misc.Base64Encoder, I always replaced them out after encoding), so maybe the ByteBuffer API could live without it. Note that without an explicit end of stream flag, it might not be easily possible to encode the last block - if you have a 16 byte buffer with 14 bytes filled, you cannot be sure whether you can encode the last two bytes to XYZ= because it is end of stream or you have to wait for the next byte. An option without the flag might be to assume when the input is less than 3 bytes that it is the end of the stream. But I doubt it will make the API easier to understand or use. :-) Regards, Michael
Re: Review/comment needed for the new public java.util.Base64 class
On 12/10/2012 21:12, Xueming Shen wrote: Hi, It appears to be possible to do something like boolean de/encode(ByteBuffer src, ByteBuffer dst); returns true if all remaining bytes in src are en/decoded, false, the dst is not big enough for all output bytes, the src.position() will be advanced to the position of next un-en/decoded byte, dst.position() will be updated accordingly as well. to avoid the en/decoder to hold an internal state. Right, there shouldn't be any need for internal state and src.remaining() will indicate if there are any bytes in src that couldn't be encoded into dst because of insufficient space. The return value should probably be the number of bytes written to dst. -Alan
Re: Review/comment needed for the new public java.util.Base64 class
Hi, It appears to be possible to do something like boolean de/encode(ByteBuffer src, ByteBuffer dst); returns true if all remaining bytes in src are en/decoded, false, the dst is not big enough for all output bytes, the src.position() will be advanced to the position of next un-en/decoded byte, dst.position() will be updated accordingly as well. to avoid the en/decoder to hold an internal state. -Sherman On 10/12/2012 12:47 PM, Ariel Weisberg wrote: Hi, Thanks for doing this BTW. I think that including ByteBuffer API even if it isn't as efficient as raw byte arrays is better then not having it in the API at all. If that means allocating a byte array for the output and then doing a put on the ByteBuffer that is fine. Down the line if someone has a particularly powerful itch to scratch WRT to performance they can add more code to the library to make it more efficient at handling them and then everyone will benefit or they can do their own implementation. Thanks, Ariel On Fri, Oct 12, 2012, at 02:56 PM, Xueming Shen wrote: Hi, The exactly reason I was trying to skip en/decode(ByteBuffer in, ByteByuffer out) for now. I'm struggling with/can't make up my mind on whether or not the en/decoder should have internal state, like the charset en/decoder. It appears the API is being pushed going that direction though:-) -Sherman On 10/12/2012 11:39 AM, Michael Schierl wrote: Hello, (sorry if the threading is broken, but I was not subscribed to the list and only found the discussion on Twitter and read it in the mailing list archive) Ariel Weisberg wrote on Thu Oct 11 11:30:56 PDT 2012: I know that ByteBuffers are pain, but I did notice that you can't specify a source/dest pair when using ByteBuffers and that ByteBuffers without arrays have to be copied. I don't see a simple safe way to normalize access to them the way you can if everything is a byte array. Agreed. One of the advantages of using byte buffers is reducing allocations, resulting in fewer garbage collections. In addition, in this implementation the ByteBuffers have to contain the full data. What I like about most byte buffers APIs is that I can pass in a ByteBuffer with incomplete data or maybe an output ByteBuffer that is too small to hold the complete result, and it will just process as much as it can, and leave the rest for the next round (which should work well for Base64, too, as it always processes chunks of 3 or 4 bytes). So, a useful ByteBuffer API in my opinion needs a method like public boolean encode(ByteBuffer in, ByteBuffer out, boolean endOfInput); public boolean decode(ByteBuffer in, ByteBuffer out, boolean endOfInput); (similar to CharsetEncoder#encode) that can process partial input and will return true if all processable input has been processed (i. e. in has to be refilled) or false if some input could not have been processed (i. e. out has to be flushed). Users have to call it again and again until they call it with endOfInput=true and get true back (Using an enum as result similar to CoderResult#UNDERFLOW and CoderResult#OVERFLOW might be another option if the boolean results are too cryptic). Having a ByteBuffer Base64 API might be useful (although I'm not sure yet if I ever need it), but as it is now, it is mostly useless for serious ByteBuffer usage, as if I have to split and copy the data manually anyway, I can as well use the array APIs. Just my 0.02 EUR, Michael
Re: Review/comment needed for the new public java.util.Base64 class
Hi, Thanks for doing this BTW. I think that including ByteBuffer API even if it isn't as efficient as raw byte arrays is better then not having it in the API at all. If that means allocating a byte array for the output and then doing a put on the ByteBuffer that is fine. Down the line if someone has a particularly powerful itch to scratch WRT to performance they can add more code to the library to make it more efficient at handling them and then everyone will benefit or they can do their own implementation. Thanks, Ariel On Fri, Oct 12, 2012, at 02:56 PM, Xueming Shen wrote: > Hi, > > The exactly reason I was trying to skip en/decode(ByteBuffer in, > ByteByuffer out) > for now. I'm struggling with/can't make up my mind on whether or not the > en/decoder > should have internal state, like the charset en/decoder. It appears the > API is being > pushed going that direction though:-) > > -Sherman > > On 10/12/2012 11:39 AM, Michael Schierl wrote: > > Hello, > > > > (sorry if the threading is broken, but I was not subscribed to the list > > and only found the discussion on Twitter and read it in the mailing list > > archive) > > > > Ariel Weisberg wrote on Thu Oct 11 11:30:56 PDT 2012: > >> I know that ByteBuffers are pain, but I did notice that you can't > >> specify a source/dest pair when using ByteBuffers and that ByteBuffers > >> without arrays have to be copied. I don't see a simple safe way to > >> normalize access to them the way you can if everything is a byte array. > > Agreed. One of the advantages of using byte buffers is reducing > > allocations, resulting in fewer garbage collections. > > > > In addition, in this implementation the ByteBuffers have to contain the > > full data. > > > > What I like about most byte buffers APIs is that I can pass in a > > ByteBuffer with incomplete data or maybe an output ByteBuffer that is > > too small to hold the complete result, and it will just process as much > > as it can, and leave the rest for the next round (which should work well > > for Base64, too, as it always processes chunks of 3 or 4 bytes). > > > > So, a useful ByteBuffer API in my opinion needs a method like > > > > public boolean encode(ByteBuffer in, ByteBuffer out, > >boolean endOfInput); > > > > public boolean decode(ByteBuffer in, ByteBuffer out, > >boolean endOfInput); > > > > (similar to CharsetEncoder#encode) that can process partial input and > > will return true if all processable input has been processed (i. e. in > > has to be refilled) or false if some input could not have been processed > > (i. e. out has to be flushed). > > > > Users have to call it again and again until they call it with > > endOfInput=true and get true back (Using an enum as result similar to > > CoderResult#UNDERFLOW and CoderResult#OVERFLOW might be another option > > if the boolean results are too cryptic). > > > > Having a ByteBuffer Base64 API might be useful (although I'm not sure > > yet if I ever need it), but as it is now, it is mostly useless for > > serious ByteBuffer usage, as if I have to split and copy the data > > manually anyway, I can as well use the array APIs. > > > > > > Just my 0.02 EUR, > > > > > > Michael >
Re: Review/comment needed for the new public java.util.Base64 class
Hi, The exactly reason I was trying to skip en/decode(ByteBuffer in, ByteByuffer out) for now. I'm struggling with/can't make up my mind on whether or not the en/decoder should have internal state, like the charset en/decoder. It appears the API is being pushed going that direction though:-) -Sherman On 10/12/2012 11:39 AM, Michael Schierl wrote: Hello, (sorry if the threading is broken, but I was not subscribed to the list and only found the discussion on Twitter and read it in the mailing list archive) Ariel Weisberg wrote on Thu Oct 11 11:30:56 PDT 2012: I know that ByteBuffers are pain, but I did notice that you can't specify a source/dest pair when using ByteBuffers and that ByteBuffers without arrays have to be copied. I don't see a simple safe way to normalize access to them the way you can if everything is a byte array. Agreed. One of the advantages of using byte buffers is reducing allocations, resulting in fewer garbage collections. In addition, in this implementation the ByteBuffers have to contain the full data. What I like about most byte buffers APIs is that I can pass in a ByteBuffer with incomplete data or maybe an output ByteBuffer that is too small to hold the complete result, and it will just process as much as it can, and leave the rest for the next round (which should work well for Base64, too, as it always processes chunks of 3 or 4 bytes). So, a useful ByteBuffer API in my opinion needs a method like public boolean encode(ByteBuffer in, ByteBuffer out, boolean endOfInput); public boolean decode(ByteBuffer in, ByteBuffer out, boolean endOfInput); (similar to CharsetEncoder#encode) that can process partial input and will return true if all processable input has been processed (i. e. in has to be refilled) or false if some input could not have been processed (i. e. out has to be flushed). Users have to call it again and again until they call it with endOfInput=true and get true back (Using an enum as result similar to CoderResult#UNDERFLOW and CoderResult#OVERFLOW might be another option if the boolean results are too cryptic). Having a ByteBuffer Base64 API might be useful (although I'm not sure yet if I ever need it), but as it is now, it is mostly useless for serious ByteBuffer usage, as if I have to split and copy the data manually anyway, I can as well use the array APIs. Just my 0.02 EUR, Michael
Re: Review/comment needed for the new public java.util.Base64 class
Hello, (sorry if the threading is broken, but I was not subscribed to the list and only found the discussion on Twitter and read it in the mailing list archive) Ariel Weisberg wrote on Thu Oct 11 11:30:56 PDT 2012: > I know that ByteBuffers are pain, but I did notice that you can't > specify a source/dest pair when using ByteBuffers and that ByteBuffers > without arrays have to be copied. I don't see a simple safe way to > normalize access to them the way you can if everything is a byte array. Agreed. One of the advantages of using byte buffers is reducing allocations, resulting in fewer garbage collections. In addition, in this implementation the ByteBuffers have to contain the full data. What I like about most byte buffers APIs is that I can pass in a ByteBuffer with incomplete data or maybe an output ByteBuffer that is too small to hold the complete result, and it will just process as much as it can, and leave the rest for the next round (which should work well for Base64, too, as it always processes chunks of 3 or 4 bytes). So, a useful ByteBuffer API in my opinion needs a method like public boolean encode(ByteBuffer in, ByteBuffer out, boolean endOfInput); public boolean decode(ByteBuffer in, ByteBuffer out, boolean endOfInput); (similar to CharsetEncoder#encode) that can process partial input and will return true if all processable input has been processed (i. e. in has to be refilled) or false if some input could not have been processed (i. e. out has to be flushed). Users have to call it again and again until they call it with endOfInput=true and get true back (Using an enum as result similar to CoderResult#UNDERFLOW and CoderResult#OVERFLOW might be another option if the boolean results are too cryptic). Having a ByteBuffer Base64 API might be useful (although I'm not sure yet if I ever need it), but as it is now, it is mostly useless for serious ByteBuffer usage, as if I have to split and copy the data manually anyway, I can as well use the array APIs. Just my 0.02 EUR, Michael
Re: Review/comment needed for the new public java.util.Base64 class
On 11/10/2012 19:30, Ariel Weisberg wrote: : I know that ByteBuffers are pain, but I did notice that you can't specify a source/dest pair when using ByteBuffers and that ByteBuffers without arrays have to be copied. I don't see a simple safe way to normalize access to them the way you can if everything is a byte array. I agree, encode/decode methods where the destination is a given ByteBuffer would be desirable (and probably more useful than returning a new ByteBuffer each time). Byte arrays are so commonly used that it probably justifying having both variants as proposed. -Alan
Re: Review/comment needed for the new public java.util.Base64 class
On 10/11/2012 02:43 AM, Stephen Colebourne wrote: There are lots of other public base 64 implementations to test/check against. http://commons.apache.org/net/api-3.1/org/apache/commons/net/util/Base64.html http://www.cs.berkeley.edu/~jonah/bc/org/bouncycastle/util/encoders/Base64.html http://migbase64.sourceforge.net/ (claims to be fast) I did a quick performance check against migbase64 with the basic base64 de/encoding using this non-scientific benchmark. http://cr.openjdk.java.net/~sherman/4235519/PermBase64.java Here is the scores http://cr.openjdk.java.net/~sherman/4235519/scores It's a tie, j.u.Base64 wins the decoding, but needs some work on encoding side. java -server PermBase64 20 50 j.u.Base64.encode() : 1390212 migBase64.encode() : 720517 j.u.Base64.decode() : 1200642 migBase64.decode() : 2070015 java -server PermBase64 20 100 j.u.Base64.encode() : 1239407 migBase64.encode() : 740404 j.u.Base64.decode() : 1257092 migBase64.decode() : 2012910 java -server PermBase64 20 1000 j.u.Base64.encode() : 1062212 migBase64.encode() : 657342 j.u.Base64.decode() : 1133740 migBase64.decode() : 1930612 java -client PermBase64 20 50 j.u.Base64.encode() : 961331 migBase64.encode() : 875635 j.u.Base64.decode() : 1528790 migBase64.decode() : 2188473 java -client PermBase64 20 100 j.u.Base64.encode() : 966453 migBase64.encode() : 858082 j.u.Base64.decode() : 1459159 migBase64.decode() : 2115045 java -client PermBase64 20 1000 j.u.Base64.encode() : 954401 migBase64.encode() : 854903 j.u.Base64.decode() : 1444603 migBase64.decode() : 2038865
Re: Review/comment needed for the new public java.util.Base64 class
Hi, It looks like it tackles the issue of encoding binary data as text in an isolated fashion and doesn't seem open to adding other ways of encoding binary data as text such as hex or yEnc. Having a common interface in java.util for different encodings would be great. I am not asking for more implementation, just a more abstract interface for Decoder and Encoder. I know that ByteBuffers are pain, but I did notice that you can't specify a source/dest pair when using ByteBuffers and that ByteBuffers without arrays have to be copied. I don't see a simple safe way to normalize access to them the way you can if everything is a byte array. Thanks, Ariel On Thu, Oct 11, 2012, at 12:40 PM, Xueming Shen wrote: > On 10/11/2012 02:43 AM, Stephen Colebourne wrote: > > The class level javadoc is quite short. It also has hyperlinks in the > > first sentence, which means that they are visible in package level > > javadoc. Consider having no hyperlinks in the first sentence, and > > expanding a little on what base 64 is. > > > > Sure, will come up something longer. > > There are lots of other public base 64 implementations to test/check > > against. > > http://commons.apache.org/net/api-3.1/org/apache/commons/net/util/Base64.html > > http://www.cs.berkeley.edu/~jonah/bc/org/bouncycastle/util/encoders/Base64.html > > http://migbase64.sourceforge.net/ (claims to be fast) > > I did compare the result against the apache version, the difference > appears to > be the apache (1)append line feeds at the end of the encoded bytes > (2)skip the > padding '=' characters for URL-safe style. Will try other > implementations. > > > The arrays are defined inconsistently within the code (3 styles). > >private Encoder(byte[] base64, byte[] newline, int linemax) > >byte [] getBytes(ByteBuffer bb) > >private static final byte toBase64[] = > > I'd strongly encourage one style be used, and that it is the first of > > the three above. > > Good catch, the later two are not intentional, the leftover of old code. > webrev has > been updated according. > > Thanks! > -Sherman > > > > Stephen > > > > > > On 10 October 2012 18:54, Xueming Shen wrote: > >> A standard/public API for Base64 encoding and decoding has been long > >> overdue. JDK8 has a JEP [1] for this particular request. > >> > >> Here is the draft proposal to add a public Base64 utility class for JDK8. > >> > >> http://cr.openjdk.java.net/~sherman/4235519/webrev > >> > >> This class basically provides 4 variants of Base64 encoding scheme, > >> > >> (1) the default RFC 4648, which uses standard mapping, no line breaks, > >> (2) the URL-safe RFE 4648, no line breaks, use "-" and "_" to replace "+" > >> and > >> "/" for the mapping > >> (3) the default MIME style, as in RFC 2045 (and its earlier versions), > >> which > >> uses > >> "standard" base64 mapping, a 76 characters per line limit and uses > >> crlf > >> pair > >> \r\n for line break. > >> (4) an extend-able MIME base64, with the char-per-line and the line > >> separator(s) > >> specified by developer. > >> > >> The encoder/decoder now provides encoding and decoding for byte[], String, > >> ByteBuffer and a pair of "EncoderInputStream" and "DecoderOutputStrream", > >> which we believe/hope should satisfy most of the common use cases. > >> Additional > >> method(s) can be added if strongly desired. > >> > >> We tried couple slightly different styles of design for such this "simple" > >> utility > >> class [2]. We kinda concluded that the version proposed probably provides > >> the best balance among readability, usability and extensibility. > >> > >> Please help review and comment on the API and implementation. > >> > >> Thanks! > >> -Sherman > >> > >> [1] http://openjdk.java.net/jeps/135 > >> [2] http://cr.openjdk.java.net/~sherman/base64/ >
Re: Review/comment needed for the new public java.util.Base64 class
On 10/11/2012 02:43 AM, Stephen Colebourne wrote: The class level javadoc is quite short. It also has hyperlinks in the first sentence, which means that they are visible in package level javadoc. Consider having no hyperlinks in the first sentence, and expanding a little on what base 64 is. Sure, will come up something longer. There are lots of other public base 64 implementations to test/check against. http://commons.apache.org/net/api-3.1/org/apache/commons/net/util/Base64.html http://www.cs.berkeley.edu/~jonah/bc/org/bouncycastle/util/encoders/Base64.html http://migbase64.sourceforge.net/ (claims to be fast) I did compare the result against the apache version, the difference appears to be the apache (1)append line feeds at the end of the encoded bytes (2)skip the padding '=' characters for URL-safe style. Will try other implementations. The arrays are defined inconsistently within the code (3 styles). private Encoder(byte[] base64, byte[] newline, int linemax) byte [] getBytes(ByteBuffer bb) private static final byte toBase64[] = I'd strongly encourage one style be used, and that it is the first of the three above. Good catch, the later two are not intentional, the leftover of old code. webrev has been updated according. Thanks! -Sherman Stephen On 10 October 2012 18:54, Xueming Shen wrote: A standard/public API for Base64 encoding and decoding has been long overdue. JDK8 has a JEP [1] for this particular request. Here is the draft proposal to add a public Base64 utility class for JDK8. http://cr.openjdk.java.net/~sherman/4235519/webrev This class basically provides 4 variants of Base64 encoding scheme, (1) the default RFC 4648, which uses standard mapping, no line breaks, (2) the URL-safe RFE 4648, no line breaks, use "-" and "_" to replace "+" and "/" for the mapping (3) the default MIME style, as in RFC 2045 (and its earlier versions), which uses "standard" base64 mapping, a 76 characters per line limit and uses crlf pair \r\n for line break. (4) an extend-able MIME base64, with the char-per-line and the line separator(s) specified by developer. The encoder/decoder now provides encoding and decoding for byte[], String, ByteBuffer and a pair of "EncoderInputStream" and "DecoderOutputStrream", which we believe/hope should satisfy most of the common use cases. Additional method(s) can be added if strongly desired. We tried couple slightly different styles of design for such this "simple" utility class [2]. We kinda concluded that the version proposed probably provides the best balance among readability, usability and extensibility. Please help review and comment on the API and implementation. Thanks! -Sherman [1] http://openjdk.java.net/jeps/135 [2] http://cr.openjdk.java.net/~sherman/base64/
Re: Review/comment needed for the new public java.util.Base64 class
The class level javadoc is quite short. It also has hyperlinks in the first sentence, which means that they are visible in package level javadoc. Consider having no hyperlinks in the first sentence, and expanding a little on what base 64 is. There are lots of other public base 64 implementations to test/check against. http://commons.apache.org/net/api-3.1/org/apache/commons/net/util/Base64.html http://www.cs.berkeley.edu/~jonah/bc/org/bouncycastle/util/encoders/Base64.html http://migbase64.sourceforge.net/ (claims to be fast) The arrays are defined inconsistently within the code (3 styles). private Encoder(byte[] base64, byte[] newline, int linemax) byte [] getBytes(ByteBuffer bb) private static final byte toBase64[] = I'd strongly encourage one style be used, and that it is the first of the three above. Stephen On 10 October 2012 18:54, Xueming Shen wrote: > > A standard/public API for Base64 encoding and decoding has been long > overdue. JDK8 has a JEP [1] for this particular request. > > Here is the draft proposal to add a public Base64 utility class for JDK8. > > http://cr.openjdk.java.net/~sherman/4235519/webrev > > This class basically provides 4 variants of Base64 encoding scheme, > > (1) the default RFC 4648, which uses standard mapping, no line breaks, > (2) the URL-safe RFE 4648, no line breaks, use "-" and "_" to replace "+" > and > "/" for the mapping > (3) the default MIME style, as in RFC 2045 (and its earlier versions), which > uses > "standard" base64 mapping, a 76 characters per line limit and uses crlf > pair > \r\n for line break. > (4) an extend-able MIME base64, with the char-per-line and the line > separator(s) > specified by developer. > > The encoder/decoder now provides encoding and decoding for byte[], String, > ByteBuffer and a pair of "EncoderInputStream" and "DecoderOutputStrream", > which we believe/hope should satisfy most of the common use cases. > Additional > method(s) can be added if strongly desired. > > We tried couple slightly different styles of design for such this "simple" > utility > class [2]. We kinda concluded that the version proposed probably provides > the best balance among readability, usability and extensibility. > > Please help review and comment on the API and implementation. > > Thanks! > -Sherman > > [1] http://openjdk.java.net/jeps/135 > [2] http://cr.openjdk.java.net/~sherman/base64/
Re: Review/comment needed for the new public java.util.Base64 class
On 11/10/2012 04:15, Xueming Shen wrote: There is no plan yet. The sun.misc.BASE64En/Decoder should already trigger a compiler warning for it's a sun private API. @Deprecated annotation might be a good fit. -Sherman Yes, I think we should deprecate them. The other thing suggested in the JEP is that we would also look usages in the JDK, also other implementations (as there are many) to see if they can be retired. This can of course be done later. It also an opportunity for contributions, say for example someone might like to look at java.util.prefs to retire the implementation there. Another one that might be able to retire is the implementation in the com.sun.net.httpserver.BasicAuthenticator class and there are others. -Alan
Re: Review/comment needed for the new public java.util.Base64 class
On 10/10/12 8:39 PM, Weijun Wang wrote: On 10/11/2012 11:32 AM, Xueming Shen wrote: On 10/10/12 8:16 PM, Weijun Wang wrote: On 10/11/2012 11:09 AM, Xueming Shen wrote: On 10/10/12 6:51 PM, Weijun Wang wrote: Several questions: 1. In encode0(byte[] src, byte[] dst) 281 if (linepos == linemax && (atom != 0 || sp < sl)) { Maybe atom != 0 is not necessary? The logic here is that if we reached the last atom (atom == 0), but if there is still byte(s) in src (sp < sl), we will need to output the last special unit, which has one or two padding charactere '=', in this case, we still need to output the line separator(s). But when atom != 0, it seems sp < sl should always be true. Yes, the sp < sl part only counts if atom == 0. Means if it's NOT last atom, (atom != 0) output the line feeds, if it IS the last atom (atom == 0), only output the line feeds if there are leftover byte (sp < sl), which means the last 4-byte unit (with one or two '=') will be after this line feed. So, the value of sp < sl is always the same as (atom != 0 || sp < sl). That's why I said atom != 0 is not necessary. You're absolutely correct! Somehow I kept thinking you were saying sp < sl is not necessary:-) not the other way around. webrev will be updated shortly. Thanks! -Sherman -Max -Sherman -Max 2. Is it necessary to explicitly mention in the spec that there is no CrLf at the end of a MIME encoded string? I'm struggling with which is the appropriate/desired behavior, output the crlf for the last line or not. Apache's common coder appears to append the crlf for the last line, but our sun.misc version does not (but sun.misc.BASE64 actually appends the line separator if the last line happens to have 76 characters, I would assume this is a bug though). The current implement tries to match the sun.misc. I'm happy to go either way. But, as you suggested, it might be worth explicitly describing whatever behavior we choose. 3. The test confirms decoding can correctly reverse the encoding but it says nothing about the correctness of the encoding. Maybe we can just use "10. Test Vectors" of RFC 4648? I do have a version of TestBase64 actually compares encoded results of j.u.Base64, sun.misc.BASE64Encoder and org.apache.commons.codec.binary.Base64. Maybe I should at least plug in the code for comparing with sun.misc.Base64Encoder. Thanks! -Sherman On 10/11/2012 01:54 AM, Xueming Shen wrote: A standard/public API for Base64 encoding and decoding has been long overdue. JDK8 has a JEP [1] for this particular request. Here is the draft proposal to add a public Base64 utility class for JDK8. http://cr.openjdk.java.net/~sherman/4235519/webrev This class basically provides 4 variants of Base64 encoding scheme, (1) the default RFC 4648, which uses standard mapping, no line breaks, (2) the URL-safe RFE 4648, no line breaks, use "-" and "_" to replace "+" and "/" for the mapping (3) the default MIME style, as in RFC 2045 (and its earlier versions), which uses "standard" base64 mapping, a 76 characters per line limit and uses crlf pair \r\n for line break. (4) an extend-able MIME base64, with the char-per-line and the line separator(s) specified by developer. The encoder/decoder now provides encoding and decoding for byte[], String, ByteBuffer and a pair of "EncoderInputStream" and "DecoderOutputStrream", which we believe/hope should satisfy most of the common use cases. Additional method(s) can be added if strongly desired. We tried couple slightly different styles of design for such this "simple" utility class [2]. We kinda concluded that the version proposed probably provides the best balance among readability, usability and extensibility. Please help review and comment on the API and implementation. Thanks! -Sherman [1] http://openjdk.java.net/jeps/135 [2] http://cr.openjdk.java.net/~sherman/base64/
Re: Review/comment needed for the new public java.util.Base64 class
On 10/11/2012 11:32 AM, Xueming Shen wrote: On 10/10/12 8:16 PM, Weijun Wang wrote: On 10/11/2012 11:09 AM, Xueming Shen wrote: On 10/10/12 6:51 PM, Weijun Wang wrote: Several questions: 1. In encode0(byte[] src, byte[] dst) 281 if (linepos == linemax && (atom != 0 || sp < sl)) { Maybe atom != 0 is not necessary? The logic here is that if we reached the last atom (atom == 0), but if there is still byte(s) in src (sp < sl), we will need to output the last special unit, which has one or two padding charactere '=', in this case, we still need to output the line separator(s). But when atom != 0, it seems sp < sl should always be true. Yes, the sp < sl part only counts if atom == 0. Means if it's NOT last atom, (atom != 0) output the line feeds, if it IS the last atom (atom == 0), only output the line feeds if there are leftover byte (sp < sl), which means the last 4-byte unit (with one or two '=') will be after this line feed. So, the value of sp < sl is always the same as (atom != 0 || sp < sl). That's why I said atom != 0 is not necessary. -Max -Sherman -Max 2. Is it necessary to explicitly mention in the spec that there is no CrLf at the end of a MIME encoded string? I'm struggling with which is the appropriate/desired behavior, output the crlf for the last line or not. Apache's common coder appears to append the crlf for the last line, but our sun.misc version does not (but sun.misc.BASE64 actually appends the line separator if the last line happens to have 76 characters, I would assume this is a bug though). The current implement tries to match the sun.misc. I'm happy to go either way. But, as you suggested, it might be worth explicitly describing whatever behavior we choose. 3. The test confirms decoding can correctly reverse the encoding but it says nothing about the correctness of the encoding. Maybe we can just use "10. Test Vectors" of RFC 4648? I do have a version of TestBase64 actually compares encoded results of j.u.Base64, sun.misc.BASE64Encoder and org.apache.commons.codec.binary.Base64. Maybe I should at least plug in the code for comparing with sun.misc.Base64Encoder. Thanks! -Sherman On 10/11/2012 01:54 AM, Xueming Shen wrote: A standard/public API for Base64 encoding and decoding has been long overdue. JDK8 has a JEP [1] for this particular request. Here is the draft proposal to add a public Base64 utility class for JDK8. http://cr.openjdk.java.net/~sherman/4235519/webrev This class basically provides 4 variants of Base64 encoding scheme, (1) the default RFC 4648, which uses standard mapping, no line breaks, (2) the URL-safe RFE 4648, no line breaks, use "-" and "_" to replace "+" and "/" for the mapping (3) the default MIME style, as in RFC 2045 (and its earlier versions), which uses "standard" base64 mapping, a 76 characters per line limit and uses crlf pair \r\n for line break. (4) an extend-able MIME base64, with the char-per-line and the line separator(s) specified by developer. The encoder/decoder now provides encoding and decoding for byte[], String, ByteBuffer and a pair of "EncoderInputStream" and "DecoderOutputStrream", which we believe/hope should satisfy most of the common use cases. Additional method(s) can be added if strongly desired. We tried couple slightly different styles of design for such this "simple" utility class [2]. We kinda concluded that the version proposed probably provides the best balance among readability, usability and extensibility. Please help review and comment on the API and implementation. Thanks! -Sherman [1] http://openjdk.java.net/jeps/135 [2] http://cr.openjdk.java.net/~sherman/base64/
Re: Review/comment needed for the new public java.util.Base64 class
On 10/10/12 8:16 PM, Weijun Wang wrote: On 10/11/2012 11:09 AM, Xueming Shen wrote: On 10/10/12 6:51 PM, Weijun Wang wrote: Several questions: 1. In encode0(byte[] src, byte[] dst) 281 if (linepos == linemax && (atom != 0 || sp < sl)) { Maybe atom != 0 is not necessary? The logic here is that if we reached the last atom (atom == 0), but if there is still byte(s) in src (sp < sl), we will need to output the last special unit, which has one or two padding charactere '=', in this case, we still need to output the line separator(s). But when atom != 0, it seems sp < sl should always be true. Yes, the sp < sl part only counts if atom == 0. Means if it's NOT last atom, (atom != 0) output the line feeds, if it IS the last atom (atom == 0), only output the line feeds if there are leftover byte (sp < sl), which means the last 4-byte unit (with one or two '=') will be after this line feed. -Sherman -Max 2. Is it necessary to explicitly mention in the spec that there is no CrLf at the end of a MIME encoded string? I'm struggling with which is the appropriate/desired behavior, output the crlf for the last line or not. Apache's common coder appears to append the crlf for the last line, but our sun.misc version does not (but sun.misc.BASE64 actually appends the line separator if the last line happens to have 76 characters, I would assume this is a bug though). The current implement tries to match the sun.misc. I'm happy to go either way. But, as you suggested, it might be worth explicitly describing whatever behavior we choose. 3. The test confirms decoding can correctly reverse the encoding but it says nothing about the correctness of the encoding. Maybe we can just use "10. Test Vectors" of RFC 4648? I do have a version of TestBase64 actually compares encoded results of j.u.Base64, sun.misc.BASE64Encoder and org.apache.commons.codec.binary.Base64. Maybe I should at least plug in the code for comparing with sun.misc.Base64Encoder. Thanks! -Sherman On 10/11/2012 01:54 AM, Xueming Shen wrote: A standard/public API for Base64 encoding and decoding has been long overdue. JDK8 has a JEP [1] for this particular request. Here is the draft proposal to add a public Base64 utility class for JDK8. http://cr.openjdk.java.net/~sherman/4235519/webrev This class basically provides 4 variants of Base64 encoding scheme, (1) the default RFC 4648, which uses standard mapping, no line breaks, (2) the URL-safe RFE 4648, no line breaks, use "-" and "_" to replace "+" and "/" for the mapping (3) the default MIME style, as in RFC 2045 (and its earlier versions), which uses "standard" base64 mapping, a 76 characters per line limit and uses crlf pair \r\n for line break. (4) an extend-able MIME base64, with the char-per-line and the line separator(s) specified by developer. The encoder/decoder now provides encoding and decoding for byte[], String, ByteBuffer and a pair of "EncoderInputStream" and "DecoderOutputStrream", which we believe/hope should satisfy most of the common use cases. Additional method(s) can be added if strongly desired. We tried couple slightly different styles of design for such this "simple" utility class [2]. We kinda concluded that the version proposed probably provides the best balance among readability, usability and extensibility. Please help review and comment on the API and implementation. Thanks! -Sherman [1] http://openjdk.java.net/jeps/135 [2] http://cr.openjdk.java.net/~sherman/base64/
Re: Review/comment needed for the new public java.util.Base64 class
On 10/11/2012 11:09 AM, Xueming Shen wrote: On 10/10/12 6:51 PM, Weijun Wang wrote: Several questions: 1. In encode0(byte[] src, byte[] dst) 281 if (linepos == linemax && (atom != 0 || sp < sl)) { Maybe atom != 0 is not necessary? The logic here is that if we reached the last atom (atom == 0), but if there is still byte(s) in src (sp < sl), we will need to output the last special unit, which has one or two padding charactere '=', in this case, we still need to output the line separator(s). But when atom != 0, it seems sp < sl should always be true. -Max 2. Is it necessary to explicitly mention in the spec that there is no CrLf at the end of a MIME encoded string? I'm struggling with which is the appropriate/desired behavior, output the crlf for the last line or not. Apache's common coder appears to append the crlf for the last line, but our sun.misc version does not (but sun.misc.BASE64 actually appends the line separator if the last line happens to have 76 characters, I would assume this is a bug though). The current implement tries to match the sun.misc. I'm happy to go either way. But, as you suggested, it might be worth explicitly describing whatever behavior we choose. 3. The test confirms decoding can correctly reverse the encoding but it says nothing about the correctness of the encoding. Maybe we can just use "10. Test Vectors" of RFC 4648? I do have a version of TestBase64 actually compares encoded results of j.u.Base64, sun.misc.BASE64Encoder and org.apache.commons.codec.binary.Base64. Maybe I should at least plug in the code for comparing with sun.misc.Base64Encoder. Thanks! -Sherman On 10/11/2012 01:54 AM, Xueming Shen wrote: A standard/public API for Base64 encoding and decoding has been long overdue. JDK8 has a JEP [1] for this particular request. Here is the draft proposal to add a public Base64 utility class for JDK8. http://cr.openjdk.java.net/~sherman/4235519/webrev This class basically provides 4 variants of Base64 encoding scheme, (1) the default RFC 4648, which uses standard mapping, no line breaks, (2) the URL-safe RFE 4648, no line breaks, use "-" and "_" to replace "+" and "/" for the mapping (3) the default MIME style, as in RFC 2045 (and its earlier versions), which uses "standard" base64 mapping, a 76 characters per line limit and uses crlf pair \r\n for line break. (4) an extend-able MIME base64, with the char-per-line and the line separator(s) specified by developer. The encoder/decoder now provides encoding and decoding for byte[], String, ByteBuffer and a pair of "EncoderInputStream" and "DecoderOutputStrream", which we believe/hope should satisfy most of the common use cases. Additional method(s) can be added if strongly desired. We tried couple slightly different styles of design for such this "simple" utility class [2]. We kinda concluded that the version proposed probably provides the best balance among readability, usability and extensibility. Please help review and comment on the API and implementation. Thanks! -Sherman [1] http://openjdk.java.net/jeps/135 [2] http://cr.openjdk.java.net/~sherman/base64/
Re: Review/comment needed for the new public java.util.Base64 class
There is no plan yet. The sun.misc.BASE64En/Decoder should already trigger a compiler warning for it's a sun private API. @Deprecated annotation might be a good fit. -Sherman On 10/10/12 1:40 PM, Joe Darcy wrote: Hello, On 10/10/2012 1:03 PM, Iris Clark wrote: Hi, Sherman. I'm glad to see this coming in. As you said, long overdue. I'm curious. What are the plans are to encourage migration from the JDK private and unsupported sun.misc.BASE64{En,DE}coder classes? Compile-time warning? Documentation? Something else? On that point, sun.misc.BASE64{En,DE}coder may present a fine opportunity to use the @Deprecated annotation... -Joe Thanks, iris -Original Message- From: Xueming Shen Sent: Wednesday, October 10, 2012 10:55 AM To: core-libs-dev Subject: Review/comment needed for the new public java.util.Base64 class A standard/public API for Base64 encoding and decoding has been long overdue. JDK8 has a JEP [1] for this particular request. Here is the draft proposal to add a public Base64 utility class for JDK8. http://cr.openjdk.java.net/~sherman/4235519/webrev This class basically provides 4 variants of Base64 encoding scheme, (1) the default RFC 4648, which uses standard mapping, no line breaks, (2) the URL-safe RFE 4648, no line breaks, use "-" and "_" to replace "+" and "/" for the mapping (3) the default MIME style, as in RFC 2045 (and its earlier versions), which uses "standard" base64 mapping, a 76 characters per line limit and uses crlf pair \r\n for line break. (4) an extend-able MIME base64, with the char-per-line and the line separator(s) specified by developer. The encoder/decoder now provides encoding and decoding for byte[], String, ByteBuffer and a pair of "EncoderInputStream" and "DecoderOutputStrream", which we believe/hope should satisfy most of the common use cases. Additional method(s) can be added if strongly desired. We tried couple slightly different styles of design for such this "simple" utility class [2]. We kinda concluded that the version proposed probably provides the best balance among readability, usability and extensibility. Please help review and comment on the API and implementation. Thanks! -Sherman [1] http://openjdk.java.net/jeps/135 [2] http://cr.openjdk.java.net/~sherman/base64/
Re: Review/comment needed for the new public java.util.Base64 class
On 10/10/12 6:51 PM, Weijun Wang wrote: Several questions: 1. In encode0(byte[] src, byte[] dst) 281 if (linepos == linemax && (atom != 0 || sp < sl)) { Maybe atom != 0 is not necessary? The logic here is that if we reached the last atom (atom == 0), but if there is still byte(s) in src (sp < sl), we will need to output the last special unit, which has one or two padding charactere '=', in this case, we still need to output the line separator(s). 2. Is it necessary to explicitly mention in the spec that there is no CrLf at the end of a MIME encoded string? I'm struggling with which is the appropriate/desired behavior, output the crlf for the last line or not. Apache's common coder appears to append the crlf for the last line, but our sun.misc version does not (but sun.misc.BASE64 actually appends the line separator if the last line happens to have 76 characters, I would assume this is a bug though). The current implement tries to match the sun.misc. I'm happy to go either way. But, as you suggested, it might be worth explicitly describing whatever behavior we choose. 3. The test confirms decoding can correctly reverse the encoding but it says nothing about the correctness of the encoding. Maybe we can just use "10. Test Vectors" of RFC 4648? I do have a version of TestBase64 actually compares encoded results of j.u.Base64, sun.misc.BASE64Encoder and org.apache.commons.codec.binary.Base64. Maybe I should at least plug in the code for comparing with sun.misc.Base64Encoder. Thanks! -Sherman On 10/11/2012 01:54 AM, Xueming Shen wrote: A standard/public API for Base64 encoding and decoding has been long overdue. JDK8 has a JEP [1] for this particular request. Here is the draft proposal to add a public Base64 utility class for JDK8. http://cr.openjdk.java.net/~sherman/4235519/webrev This class basically provides 4 variants of Base64 encoding scheme, (1) the default RFC 4648, which uses standard mapping, no line breaks, (2) the URL-safe RFE 4648, no line breaks, use "-" and "_" to replace "+" and "/" for the mapping (3) the default MIME style, as in RFC 2045 (and its earlier versions), which uses "standard" base64 mapping, a 76 characters per line limit and uses crlf pair \r\n for line break. (4) an extend-able MIME base64, with the char-per-line and the line separator(s) specified by developer. The encoder/decoder now provides encoding and decoding for byte[], String, ByteBuffer and a pair of "EncoderInputStream" and "DecoderOutputStrream", which we believe/hope should satisfy most of the common use cases. Additional method(s) can be added if strongly desired. We tried couple slightly different styles of design for such this "simple" utility class [2]. We kinda concluded that the version proposed probably provides the best balance among readability, usability and extensibility. Please help review and comment on the API and implementation. Thanks! -Sherman [1] http://openjdk.java.net/jeps/135 [2] http://cr.openjdk.java.net/~sherman/base64/
Re: Review/comment needed for the new public java.util.Base64 class
Several questions: 1. In encode0(byte[] src, byte[] dst) 281 if (linepos == linemax && (atom != 0 || sp < sl)) { Maybe atom != 0 is not necessary? 2. Is it necessary to explicitly mention in the spec that there is no CrLf at the end of a MIME encoded string? 3. The test confirms decoding can correctly reverse the encoding but it says nothing about the correctness of the encoding. Maybe we can just use "10. Test Vectors" of RFC 4648? Thanks Max On 10/11/2012 01:54 AM, Xueming Shen wrote: A standard/public API for Base64 encoding and decoding has been long overdue. JDK8 has a JEP [1] for this particular request. Here is the draft proposal to add a public Base64 utility class for JDK8. http://cr.openjdk.java.net/~sherman/4235519/webrev This class basically provides 4 variants of Base64 encoding scheme, (1) the default RFC 4648, which uses standard mapping, no line breaks, (2) the URL-safe RFE 4648, no line breaks, use "-" and "_" to replace "+" and "/" for the mapping (3) the default MIME style, as in RFC 2045 (and its earlier versions), which uses "standard" base64 mapping, a 76 characters per line limit and uses crlf pair \r\n for line break. (4) an extend-able MIME base64, with the char-per-line and the line separator(s) specified by developer. The encoder/decoder now provides encoding and decoding for byte[], String, ByteBuffer and a pair of "EncoderInputStream" and "DecoderOutputStrream", which we believe/hope should satisfy most of the common use cases. Additional method(s) can be added if strongly desired. We tried couple slightly different styles of design for such this "simple" utility class [2]. We kinda concluded that the version proposed probably provides the best balance among readability, usability and extensibility. Please help review and comment on the API and implementation. Thanks! -Sherman [1] http://openjdk.java.net/jeps/135 [2] http://cr.openjdk.java.net/~sherman/base64/
Re: Review/comment needed for the new public java.util.Base64 class
Hello, On 10/10/2012 1:03 PM, Iris Clark wrote: Hi, Sherman. I'm glad to see this coming in. As you said, long overdue. I'm curious. What are the plans are to encourage migration from the JDK private and unsupported sun.misc.BASE64{En,DE}coder classes? Compile-time warning? Documentation? Something else? On that point, sun.misc.BASE64{En,DE}coder may present a fine opportunity to use the @Deprecated annotation... -Joe Thanks, iris -Original Message- From: Xueming Shen Sent: Wednesday, October 10, 2012 10:55 AM To: core-libs-dev Subject: Review/comment needed for the new public java.util.Base64 class A standard/public API for Base64 encoding and decoding has been long overdue. JDK8 has a JEP [1] for this particular request. Here is the draft proposal to add a public Base64 utility class for JDK8. http://cr.openjdk.java.net/~sherman/4235519/webrev This class basically provides 4 variants of Base64 encoding scheme, (1) the default RFC 4648, which uses standard mapping, no line breaks, (2) the URL-safe RFE 4648, no line breaks, use "-" and "_" to replace "+" and "/" for the mapping (3) the default MIME style, as in RFC 2045 (and its earlier versions), which uses "standard" base64 mapping, a 76 characters per line limit and uses crlf pair \r\n for line break. (4) an extend-able MIME base64, with the char-per-line and the line separator(s) specified by developer. The encoder/decoder now provides encoding and decoding for byte[], String, ByteBuffer and a pair of "EncoderInputStream" and "DecoderOutputStrream", which we believe/hope should satisfy most of the common use cases. Additional method(s) can be added if strongly desired. We tried couple slightly different styles of design for such this "simple" utility class [2]. We kinda concluded that the version proposed probably provides the best balance among readability, usability and extensibility. Please help review and comment on the API and implementation. Thanks! -Sherman [1] http://openjdk.java.net/jeps/135 [2] http://cr.openjdk.java.net/~sherman/base64/
RE: Review/comment needed for the new public java.util.Base64 class
Hi, Sherman. I'm glad to see this coming in. As you said, long overdue. I'm curious. What are the plans are to encourage migration from the JDK private and unsupported sun.misc.BASE64{En,DE}coder classes? Compile-time warning? Documentation? Something else? Thanks, iris -Original Message- From: Xueming Shen Sent: Wednesday, October 10, 2012 10:55 AM To: core-libs-dev Subject: Review/comment needed for the new public java.util.Base64 class A standard/public API for Base64 encoding and decoding has been long overdue. JDK8 has a JEP [1] for this particular request. Here is the draft proposal to add a public Base64 utility class for JDK8. http://cr.openjdk.java.net/~sherman/4235519/webrev This class basically provides 4 variants of Base64 encoding scheme, (1) the default RFC 4648, which uses standard mapping, no line breaks, (2) the URL-safe RFE 4648, no line breaks, use "-" and "_" to replace "+" and "/" for the mapping (3) the default MIME style, as in RFC 2045 (and its earlier versions), which uses "standard" base64 mapping, a 76 characters per line limit and uses crlf pair \r\n for line break. (4) an extend-able MIME base64, with the char-per-line and the line separator(s) specified by developer. The encoder/decoder now provides encoding and decoding for byte[], String, ByteBuffer and a pair of "EncoderInputStream" and "DecoderOutputStrream", which we believe/hope should satisfy most of the common use cases. Additional method(s) can be added if strongly desired. We tried couple slightly different styles of design for such this "simple" utility class [2]. We kinda concluded that the version proposed probably provides the best balance among readability, usability and extensibility. Please help review and comment on the API and implementation. Thanks! -Sherman [1] http://openjdk.java.net/jeps/135 [2] http://cr.openjdk.java.net/~sherman/base64/
Review/comment needed for the new public java.util.Base64 class
A standard/public API for Base64 encoding and decoding has been long overdue. JDK8 has a JEP [1] for this particular request. Here is the draft proposal to add a public Base64 utility class for JDK8. http://cr.openjdk.java.net/~sherman/4235519/webrev This class basically provides 4 variants of Base64 encoding scheme, (1) the default RFC 4648, which uses standard mapping, no line breaks, (2) the URL-safe RFE 4648, no line breaks, use "-" and "_" to replace "+" and "/" for the mapping (3) the default MIME style, as in RFC 2045 (and its earlier versions), which uses "standard" base64 mapping, a 76 characters per line limit and uses crlf pair \r\n for line break. (4) an extend-able MIME base64, with the char-per-line and the line separator(s) specified by developer. The encoder/decoder now provides encoding and decoding for byte[], String, ByteBuffer and a pair of "EncoderInputStream" and "DecoderOutputStrream", which we believe/hope should satisfy most of the common use cases. Additional method(s) can be added if strongly desired. We tried couple slightly different styles of design for such this "simple" utility class [2]. We kinda concluded that the version proposed probably provides the best balance among readability, usability and extensibility. Please help review and comment on the API and implementation. Thanks! -Sherman [1] http://openjdk.java.net/jeps/135 [2] http://cr.openjdk.java.net/~sherman/base64/