subject:"Review\/comment needed for the new public java.util.Base64 class"

Re: Review/comment needed for the new public java.util.Base64 class

2012-11-30 Thread Xueming Shen


Ulf,

The base64 implementation is in TL right now. It does address some of the
issue you suggested in this email. And I remember I did try use "byte[]"
alphabets, which I don't recall bring us any benefit, but I'm not sure in which
setup.  The latest is at

http://cr.openjdk.java.net/~sherman/8004212/webrev/

in which I'm trying to fix a corner case of incorrect return value from 
decode(buf, buf).

The Base64 now is in TL does not necessary mean "the issue is closed". You
can continue suggest/comment on the latest version for any possible performance
improvement, bug fix and even API change, if necessary.

-Sherman


On 11/30/2012 02:01 PM, Ulf Zibis wrote:

Hi Sherman,

is this ssue still open ?

-Ulf


Am 03.11.2012 21:33, schrieb Ulf Zibis:

Am 30.10.2012 23:40, schrieb Xueming Shen:

I'm "confused" about the order of xxcode() and Xxcoder.
In other places e.g. charsets, we have de... before en..., which is also true 
alphabetical


should not be an issue. javadoc output should be in alphabetic order. If it is 
really
concerns you, I can do a copy/paste:-)


Yes, it doesn't matter much, but would be a nice conform style, so for me 
"personally" I would like the copy/paste:-)


- What (should) happen(s), if lineSeparator.length == 0 ?

do nothing. expected?


I would say, empty line separator should not be allowed, so you might check:
 Objects.requireNonEmpty(lineSeparator);


 603 static {
 604 Arrays.fill(fromBase64, -1);
 605 for (int i = 0; i < Encoder.toBase64.length; i++)
 606 fromBase64[Encoder.toBase64[i]] = i;
 607 fromBase64['='] = -2;
 608 }
This causes the inner Encoder class to be loaded, even if not needed. So maybe 
move the toBase64 table to the outer class.
Have you compared performance with fromBase64 as byte[] (including local 'b' in 
decode() as byte)?


understood. 


It seems you have mixed 2 tweaks to one. ;-) See additional paragraph at the 
end ...


but it appears the hotspot likes it the constant/fixed length lookup
table is inside encoder. 

Yes, but see my suggestion 12 lines below.


Same as you suggestion in your previous email to use
String in source and expand it during runtime. It saves about 500 bytes but 
slows
the server vm.


Are you really sure? As it only runs once at class init, JIT should not compile 
this code.
Executing the bytecode to init the final int[], value for value, by interpreter 
should not be faster as expanding a String source in a loop.


Those repeating lines of "isURL? " is also supposed to be a
performance tweak to eliminate the array boundary check in loop.


Yep, so I was thinking about:
 653 private final int[] base64;
 654 private final boolean isMIME;
 655
 656 private Decoder(boolean isURL, boolean isMIME) {
 657 base64 = isURL ? fromBase64URL : fromBase64;
 658 this.isMIME = isMIME;
 659 }
.
 911 private int decodeBuffer(ByteBuffer src, ByteBuffer dst) {
 912 int[] base64 = this.base64; // local copy for performance 
reasons (but maybe superfluous if HotSpot is intelligent enough)
or:
 911 private int decodeBuffer(ByteBuffer src, ByteBuffer dst, int[] 
base64) {
.


Why you mix the algorithms to check the padding? :
 824 if (b == -2) {   // padding byte
 890 if (b == '=') {


It is supposed reduce one "if" check for normal base64 character inside the
loop. I'm not that sure it really helps though.


 925 if (b == '=') {
--> causes one additional "if" in the _main_ path of the loop, so should be 
slower for regular input

 859 if (b == -2) {   // padding byte
--> causes one additional "if" in the _side_ path of the loop, so should only 
affect irregular input
Maybe the following code is little faster as no loading of the constant '-2' is 
required:
 858 if ((b = base64[b]) < 0) {
 859 if (++b < 0) {   // padding byte '='


Once again (the following was meant for the decode algorithm, not 
initialisation):
Have you compared performance with fromBase64 as byte[] (including local 'b' in 
decode() as byte)?
Retrieving the bytes by b = base64[x] then could be done without address-shift 
and smaller/faster LoadB operations could be used by JIT. In an int[] table, 
the index offset must be shifted by 2 before. Maybe this doesn't directly 
affect the performance on x86/IA64 CPU, but wastes space in CPU cache for other 
tasks as a side effect. On ARM architectures I imagine, directly addressing a 
byte-stepped table could be faster than addressing a 4-byte-stepped table. At 
least the footprint of the table would be smaller.

Little nit:
You could delete line 641 for conformity with 629.

-Ulf

Re: Review/comment needed for the new public java.util.Base64 class

2012-11-30 Thread Ulf Zibis


Hi Sherman,

is this ssue still open ?

-Ulf


Am 03.11.2012 21:33, schrieb Ulf Zibis:

Am 30.10.2012 23:40, schrieb Xueming Shen:

I'm "confused" about the order of xxcode() and Xxcoder.
In other places e.g. charsets, we have de... before en..., which is also true 
alphabetical


should not be an issue. javadoc output should be in alphabetic order. If it is 
really
concerns you, I can do a copy/paste:-)


Yes, it doesn't matter much, but would be a nice conform style, so for me "personally" I would 
like the copy/paste:-)



- What (should) happen(s), if lineSeparator.length == 0 ?

do nothing. expected?


I would say, empty line separator should not be allowed, so you might check:
 Objects.requireNonEmpty(lineSeparator);


 603 static {
 604 Arrays.fill(fromBase64, -1);
 605 for (int i = 0; i < Encoder.toBase64.length; i++)
 606 fromBase64[Encoder.toBase64[i]] = i;
 607 fromBase64['='] = -2;
 608 }
This causes the inner Encoder class to be loaded, even if not needed. So maybe move the toBase64 
table to the outer class.

Have you compared performance with fromBase64 as byte[] (including local 'b' in 
decode() as byte)?


understood. 


It seems you have mixed 2 tweaks to one. ;-) See additional paragraph at the 
end ...


but it appears the hotspot likes it the constant/fixed length lookup
table is inside encoder. 

Yes, but see my suggestion 12 lines below.


Same as you suggestion in your previous email to use
String in source and expand it during runtime. It saves about 500 bytes but 
slows
the server vm.


Are you really sure? As it only runs once at class init, JIT should not compile 
this code.
Executing the bytecode to init the final int[], value for value, by interpreter should not be 
faster as expanding a String source in a loop.



Those repeating lines of "isURL? " is also supposed to be a
performance tweak to eliminate the array boundary check in loop.


Yep, so I was thinking about:
 653 private final int[] base64;
 654 private final boolean isMIME;
 655
 656 private Decoder(boolean isURL, boolean isMIME) {
 657 base64 = isURL ? fromBase64URL : fromBase64;
 658 this.isMIME = isMIME;
 659 }
.
 911 private int decodeBuffer(ByteBuffer src, ByteBuffer dst) {
 912 int[] base64 = this.base64; // local copy for performance reasons (but maybe 
superfluous if HotSpot is intelligent enough)

or:
 911 private int decodeBuffer(ByteBuffer src, ByteBuffer dst, int[] 
base64) {
.


Why you mix the algorithms to check the padding? :
 824 if (b == -2) {   // padding byte
 890 if (b == '=') {


It is supposed reduce one "if" check for normal base64 character inside the
loop. I'm not that sure it really helps though.


 925 if (b == '=') {
--> causes one additional "if" in the _main_ path of the loop, so should be 
slower for regular input

 859 if (b == -2) {   // padding byte
--> causes one additional "if" in the _side_ path of the loop, so should only 
affect irregular input
Maybe the following code is little faster as no loading of the constant '-2' is 
required:
 858 if ((b = base64[b]) < 0) {
 859 if (++b < 0) {   // padding byte '='


Once again (the following was meant for the decode algorithm, not 
initialisation):
Have you compared performance with fromBase64 as byte[] (including local 'b' in 
decode() as byte)?
Retrieving the bytes by b = base64[x] then could be done without address-shift and smaller/faster 
LoadB operations could be used by JIT. In an int[] table, the index offset must be shifted by 2 
before. Maybe this doesn't directly affect the performance on x86/IA64 CPU, but wastes space in 
CPU cache for other tasks as a side effect. On ARM architectures I imagine, directly addressing a 
byte-stepped table could be faster than addressing a 4-byte-stepped table. At least the footprint 
of the table would be smaller.


Little nit:
You could delete line 641 for conformity with 629.

-Ulf

Re: Review/comment needed for the new public java.util.Base64 class

2012-11-03 Thread Ulf Zibis


Am 30.10.2012 23:40, schrieb Xueming Shen:

I'm "confused" about the order of xxcode() and Xxcoder.
In other places e.g. charsets, we have de... before en..., which is also true 
alphabetical


should not be an issue. javadoc output should be in alphabetic order. If it is 
really
concerns you, I can do a copy/paste:-)


Yes, it doesn't matter much, but would be a nice conform style, so for me "personally" I would like 
the copy/paste:-)



- What (should) happen(s), if lineSeparator.length == 0 ?

do nothing. expected?


I would say, empty line separator should not be allowed, so you might check:
 Objects.requireNonEmpty(lineSeparator);


 603 static {
 604 Arrays.fill(fromBase64, -1);
 605 for (int i = 0; i < Encoder.toBase64.length; i++)
 606 fromBase64[Encoder.toBase64[i]] = i;
 607 fromBase64['='] = -2;
 608 }
This causes the inner Encoder class to be loaded, even if not needed. So maybe move the toBase64 
table to the outer class.

Have you compared performance with fromBase64 as byte[] (including local 'b' in 
decode() as byte)?


understood. 


It seems you have mixed 2 tweaks to one. ;-) See additional paragraph at the 
end ...


but it appears the hotspot likes it the constant/fixed length lookup
table is inside encoder. 

Yes, but see my suggestion 12 lines below.


Same as you suggestion in your previous email to use
String in source and expand it during runtime. It saves about 500 bytes but 
slows
the server vm.


Are you really sure? As it only runs once at class init, JIT should not compile 
this code.
Executing the bytecode to init the final int[], value for value, by interpreter should not be faster 
as expanding a String source in a loop.



Those repeating lines of "isURL? " is also supposed to be a
performance tweak to eliminate the array boundary check in loop.


Yep, so I was thinking about:
 653 private final int[] base64;
 654 private final boolean isMIME;
 655
 656 private Decoder(boolean isURL, boolean isMIME) {
 657 base64 = isURL ? fromBase64URL : fromBase64;
 658 this.isMIME = isMIME;
 659 }
.
 911 private int decodeBuffer(ByteBuffer src, ByteBuffer dst) {
 912 int[] base64 = this.base64; // local copy for performance reasons (but maybe 
superfluous if HotSpot is intelligent enough)

or:
 911 private int decodeBuffer(ByteBuffer src, ByteBuffer dst, int[] 
base64) {
.


Why you mix the algorithms to check the padding? :
 824 if (b == -2) {   // padding byte
 890 if (b == '=') {


It is supposed reduce one "if" check for normal base64 character inside the
loop. I'm not that sure it really helps though.


 925 if (b == '=') {
--> causes one additional "if" in the _main_ path of the loop, so should be 
slower for regular input

 859 if (b == -2) {   // padding byte
--> causes one additional "if" in the _side_ path of the loop, so should only 
affect irregular input
Maybe the following code is little faster as no loading of the constant '-2' is 
required:
 858 if ((b = base64[b]) < 0) {
 859 if (++b < 0) {   // padding byte '='


Once again (the following was meant for the decode algorithm, not 
initialisation):
Have you compared performance with fromBase64 as byte[] (including local 'b' in 
decode() as byte)?
Retrieving the bytes by b = base64[x] then could be done without address-shift and smaller/faster 
LoadB operations could be used by JIT. In an int[] table, the index offset must be shifted by 2 
before. Maybe this doesn't directly affect the performance on x86/IA64 CPU, but wastes space in CPU 
cache for other tasks as a side effect. On ARM architectures I imagine, directly addressing a 
byte-stepped table could be faster than addressing a 4-byte-stepped table. At least the footprint of 
the table would be smaller.


Little nit:
You could delete line 641 for conformity with 629.

-Ulf

Re: Review/comment needed for the new public java.util.Base64 class

2012-10-30 Thread Xueming Shen


Hi Ulf, thanks for the comments.

Webrev has been updated to address most of your comments.

http://cr.openjdk.java.net/~sherman/4235519/webrev

(1) all @param and @return tags have been "normalized".
(2) private constructor BAse64() {} has been moved up.
(3) MIMELINEMAX and CRLF have been moved into encoder.
(4) -> lineLength >>2<<2;

Your questions:




I'm "confused" about the order of xxcode() and Xxcoder.
In other places e.g. charsets, we have de... before en..., which is 
also true alphabetical


should not be an issue. javadoc output should be in alphabetic order. If 
it is really

concerns you, I can do a copy/paste:-)





- What (should) happen(s), if lineSeparator.length == 0 ?

do nothing. expected?




- Isn't one of these little faster, or at least more clear? :
lineLength -= lineLength % 4
lineLength & 0xFFFC
lineLength << 2 >> 2


 603 static {
 604 Arrays.fill(fromBase64, -1);
 605 for (int i = 0; i < Encoder.toBase64.length; i++)
 606 fromBase64[Encoder.toBase64[i]] = i;
 607 fromBase64['='] = -2;
 608 }
This causes the inner Encoder class to be loaded, even if not needed. 
So maybe move the toBase64 table to the outer class.
Have you compared performance with fromBase64 as byte[] (including 
local 'b' in decode() as byte)?


understood. but it appears the hotspot likes it the constant/fixed 
length lookup
table is inside encoder. Same as you suggestion in your previous email 
to use
String in source and expand it during runtime. It saves about 500 bytes 
but slows
the server vm. Those repeating lines of "isURL? " is also supposed 
to be a

performance tweak to eliminate the array boundary check in loop.




Why you mix the algorithms to check the padding? :
 824 if (b == -2) {   // padding byte
 890 if (b == '=') {


It is supposed reduce one "if" check for normal base64 character inside the
loop. I'm not that sure it really helps though.

-Sherman

Re: Review/comment needed for the new public java.util.Base64 class

2012-10-30 Thread Ulf Zibis


Hi Sherman,

thanks for your details.
Has this discussion been on the list, and I have missed it?

I see a problem with hiding the "singleton" choice:
Developers might tend to repetitively invoke
public static Encoder getEncoder(int lineLength, byte[] lineSeparator)
instead reusing the the same instance.
(Note: It should not harm, to move the
 Objects.requireNonNull(lineSeparator);
to the general private constructor.)
In case of a public constant
Encoder.RFC4648
... developer would be aware about the re-usability of the encoder.

IMHO, the get...() methods are just waste of source lines and bytecode 
footprint.

But in contrast to the v4 I like the inner class approach: Base64.De/Encoder.

Little nit: In lines 89,100,127,128,138,149,159 the indentation is 5 instead 4.

-Ulf



Am 30.10.2012 02:51, schrieb Xueming Shen:

Ulf, my apology. Some how I missed your email.

We tried various prototypes for this simple utility class. See
http://cr.openjdk.java.net/~sherman/base64/

The v4 might be close to the static constant approach you suggested. While It's 
hard
to draw a clear line on which one is better, we concluded that the proposed 
approach
provides the best balance among usability, readability and extensibility. For 
example,
the "get" approach allows us to hide the "singleton" choice to the 
implementation. It
provides a consistent interface "fixed" basic/url/mime type en/decoder and the 
configurable
floating length/linefeed encoder.

-Sherman

On 10/29/12 11:15 AM, Ulf Zibis wrote:

Hi Sherman,

can you give me a short answer please?

-Ulf

Am 23.10.2012 16:57, schrieb Ulf Zibis:

Am 23.10.2012 15:04, schrieb Alan Bateman:
I'm not sure that getUrlEncoder is the most suitable name to get a base64url encoder. The 
reason is that the method name makes it sound like it returns a URLEncoder or or at least an 
encoder for HTML forms. While more verbose, getBase64UrlEncoder is clear that it returns a 
base64url encoder.


I'm wondering, why there are those get... methods at all.

Alternatively you could make the appropriate constructors and predifined static variants public. 
So one only should use:

Base64.Encoder encoder = new Base64.Encoder(...);
Base64.Encoder urlEncoder = Base64.Encoder.RFC4648_URLSAFE;

No need for those looong method names.

-Ulf

Re: Review/comment needed for the new public java.util.Base64 class

2012-10-30 Thread Ulf Zibis


Oops:

lineLength << 2 >> 2

I meant::
lineLength >> 2 << 2

-Ulf

Re: Review/comment needed for the new public java.util.Base64 class

2012-10-30 Thread Ulf Zibis


Am 24.10.2012 04:56, schrieb Xueming Shen:

Thanks for the review. I hope I addressed most of the comments in the
updated webrev at

http://cr.openjdk.java.net/~sherman/4235519/webrev


Hi Sherman,

my additional comments:

I'm "confused" about the order of xxcode() and Xxcoder.
In other places e.g. charsets, we have de... before en..., which is also true 
alphabetical

1026 private Base64() {}
1027 private static final int MIMELINEMAX = 76;
1028 private static final byte[] crlf = new byte[] {'\r', '\n'};
- difficult to find, please move (more) to the beginning, at least not in between inner class 
definitions.

- I more would like CRLF than crlf

 118  * @param lineLengththe length of each output line (rounded down
 119  *  to nearest multiple of 4).
This could be interpreted as: the parameter must be rounded before, so maybe 
better:
 118  * @param lineLengththe length of each output line (if not a 
multiple of 4,
 119  *  it will be rounded down accordingly).


 126 public static Encoder getEncoder(int lineLength, byte[] lineSeparator) 
{
 127  Objects.requireNonNull(lineSeparator);
 128  return new Encoder(false, lineSeparator, lineLength / 4 * 4);
 129 }
- What (should) happen(s), if lineSeparator.length == 0 ?
- Isn't one of these little faster, or at least more clear? :
lineLength -= lineLength % 4
lineLength & 0xFFFC
lineLength << 2 >> 2

Broken indentation (why at all, compared to lines 213..215?):
 208 private final byte[] newline;
 209 private final intlinemax;
 210 private final boolean isURL;

Unconventional indentation and even broken in lines 223..224 (maybe more?):
 247  * @param   src   the byte array to encode
 248  * @param   dst   the output byte array
 249  * @returnThe number of bytes written to the output byte 
array
 250  *
 251  * @throwsIllegalArgumentException if {@code dst} does not 
have
 252  *enough space for encoding all input bytes.

More conventional:
 247  * @param  src  the byte array to encode
 248  * @param  dst  the output byte array
 249  * @return The number of bytes written to the output byte array
 250  *
 251  * @throws IllegalArgumentException if {@code dst} does not have
 252  * enough space for encoding all input bytes.

 603 static {
 604 Arrays.fill(fromBase64, -1);
 605 for (int i = 0; i < Encoder.toBase64.length; i++)
 606 fromBase64[Encoder.toBase64[i]] = i;
 607 fromBase64['='] = -2;
 608 }
This causes the inner Encoder class to be loaded, even if not needed. So maybe move the toBase64 
table to the outer class.

Have you compared performance with fromBase64 as byte[] (including local 'b' in 
decode() as byte)?
Retrieving the bytes by b = base64[x] then could be done without address-shift and smaller/faster 
LoadB operations could be used by JIT. At least the footprint of the table would be smaller.


Missing spaces:
 805 return new DecInputStream(is, isURL?fromBase64URL:fromBase64, 
isMIME);
Why at all you repeat this code many times? Why not? :
 631 this.base64 = isURL ? fromBase64URL : fromBase64;
Also it would at least be helpful for readers to define final, e.g.:
 809 final int[] base64 = isURL ? fromBase64URL : fromBase64;

Why you mix the algorithms to check the padding? :
 824 if (b == -2) {   // padding byte
 890 if (b == '=') {


-Ulf

Re: Review/comment needed for the new public java.util.Base64 class

2012-10-29 Thread Xueming Shen


Ulf, my apology. Some how I missed your email.

We tried various prototypes for this simple utility class. See
http://cr.openjdk.java.net/~sherman/base64/

The v4 might be close to the static constant approach you suggested. 
While It's hard
to draw a clear line on which one is better, we concluded that the 
proposed approach
provides the best balance among usability, readability and 
extensibility. For example,
the "get" approach allows us to hide the "singleton" choice to the 
implementation. It
provides a consistent interface "fixed" basic/url/mime type en/decoder 
and the configurable

floating length/linefeed encoder.

-Sherman

On 10/29/12 11:15 AM, Ulf Zibis wrote:

Hi Sherman,

can you give me a short answer please?

-Ulf

Am 23.10.2012 16:57, schrieb Ulf Zibis:

Am 23.10.2012 15:04, schrieb Alan Bateman:
I'm not sure that getUrlEncoder is the most suitable name to get a 
base64url encoder. The reason is that the method name makes it sound 
like it returns a URLEncoder or or at least an encoder for HTML 
forms. While more verbose, getBase64UrlEncoder is clear that it 
returns a base64url encoder.


I'm wondering, why there are those get... methods at all.

Alternatively you could make the appropriate constructors and 
predifined static variants public. So one only should use:

Base64.Encoder encoder = new Base64.Encoder(...);
Base64.Encoder urlEncoder = Base64.Encoder.RFC4648_URLSAFE;

No need for those looong method names.

-Ulf

Re: Review/comment needed for the new public java.util.Base64 class

2012-10-29 Thread Ulf Zibis


Hi Sherman,

can you give me a short answer please?

-Ulf

Am 23.10.2012 16:57, schrieb Ulf Zibis:

Am 23.10.2012 15:04, schrieb Alan Bateman:
I'm not sure that getUrlEncoder is the most suitable name to get a base64url encoder. The reason 
is that the method name makes it sound like it returns a URLEncoder or or at least an encoder for 
HTML forms. While more verbose, getBase64UrlEncoder is clear that it returns a base64url encoder.


I'm wondering, why there are those get... methods at all.

Alternatively you could make the appropriate constructors and predifined static variants public. 
So one only should use:

Base64.Encoder encoder = new Base64.Encoder(...);
Base64.Encoder urlEncoder = Base64.Encoder.RFC4648_URLSAFE;

No need for those looong method names.

-Ulf

Re: Review/comment needed for the new public java.util.Base64 class

2012-10-23 Thread Xueming Shen




On 10/23/2012 8:50 PM, Mike Duigou wrote:

I'm eager to use this!

Some comments:

-code  is used in some places rather than {@code }

fixed


- The initial  would perhaps be better as a. I don't know if definition 
lists are allowed in javadoc.

- public OutputStream wrap(OutputStream os) closing the underlying output 
stream precludes many potential uses. FilterOutputStreams do not consistently 
close the underlying stream.
It has been an issue for a while. Obviously there are users that prefer 
the underlying stream not
get closed when the outer one is closed.  I think we have couple rfes 
that ask for a "close this
stream only but not the underlying one" close. But I don't think we can 
do this in the default
close(), otherwise, I'm sure we will gen another side of complain, the 
close() does not close the

underlying stream:-)


- spelling error "methold"

fixed


- InputStream wrap() it might be nice if the return stream EOF when the padding 
character(s) are consumed but not close() or otherwise alter the source 
InputStream

I need give it a little more time tomorrow.


- private static byte[] getBytes(ByteBuffer bb) should document, even though 
it's a private method, that the bytes returned should be not considered mutable 
since they might be shared.
I have been thinking of rewriting the de/encode(ByteBuffer) to get rid 
of getBytes(...). Probably
need to move some piece around. If I end up keeping it I will put some 
comment int.


- the "if (ret != dst.length) // TBD: not necessary" is worrisome as it will 
generate garbage.
while I'm pretty sure I don't need this for encode (it is definitely 
needed for

decode()), I just keep it for safe:-) I removed the "TBD" comment though.


- I'm still wishing String encode(byte[] buf) didn't use an intermediate byte 
array. :-)

- "stream after use, in which" ->  "stream after use, during which"

fixed.

Thanks for taking a look.

-Sherman


On Oct 23 2012, at 19:56 , Xueming Shen wrote:


Hi Alan,

Thanks for the review. I hope I addressed most of the comments in the
updated webrev at

http://cr.openjdk.java.net/~sherman/4235519/webrev

mainly

(1) Pulled the base64 "terms" up to the class doc and then be referenced from 
various methods
(2) Gave up the C style de/encode(byte[], null), just leave the invoker to have 
enough space. I
 remember Paul once suggested to have a convenient method to return the 
estimated length
 of the byte[] needed to en/decode a specified input byte array/buffer. I 
think we can do it
 later when it is really desired. I agreed your argument that if people 
need to get the size and
 prepare a "new" array, they bet off just call de/encode(byte[])
(3) Gave up the "liberal" decoding design, tight the spec/impl to treat the input as 
"illegal base64"
 if the padding character appears in the middle of input.
(4) Some clarification of spec for thread safe, null exception and the 
pos/limit of input/output byte
 buffer as suggested.
(5) Fixed the typos

I still don't have a better name for method "getUrlEn/Decoder()", 
Base64.getBase64UrlEn/Decoder()
does not feel good for me.

-Sherman


On 10/23/12 6:04 AM, Alan Bateman wrote:

On 18/10/2012 03:10, Xueming Shen wrote:

:

webrev:
http://cr.openjdk.java.net/~sherman/4235519/webrev

I took another pass over this, focusing on the API as that is what we have to 
get right. Performance is important too but I think the priority has to be the 
API first.

Overall I think it is quite nice and easy to use.

I wonder if the Base64 class description should try to establish the terms "base64" and 
"base64url" so that they can be referenced from the various methods? That would avoid 
needing to duplicate references to the RFC 4668 and RFC 2045 in so many places.

I think it would also be useful if the specification indicated whether Encoders 
and Decoders are safe for use by concurrent threads. Also make it clear that 
NPE is thrown if any of the parameters are passed as null (unless otherwise 
specified of course).

I'm not sure that getUrlEncoder is the most suitable name to get a base64url 
encoder. The reason is that the method name makes it sound like it returns a 
URLEncoder or or at least an encoder for HTML forms. While more verbose, 
getBase64UrlEncoder is clear that it returns a base64url encoder.

Do you think getEncoder(int,byte[]) will be used much? If not then perhaps it 
should be left out for the first version (I guess part of this is getting used 
to providing the line separate as a byte array).

For the javadoc then Encoder and Decoder will need @since 1.8. They should 
probably cross reference each other too.

encode(byte[]) should be clearer that it encodes all bytes in the given array. 
Also I think it needs to be clear that the returned byte array is appropriately 
sized -- as currently worded it doesn't make it clear that they can't be extra 
elements at the end (odd as it might be).

Typo at line 215, "byter array" ->  "byte array"

Typo at l

Re: Review/comment needed for the new public java.util.Base64 class

2012-10-23 Thread Mike Duigou

I'm eager to use this! 

Some comments:

- code is used in some places rather than {@code }

- The initial  would perhaps be better as a . I don't know if 
definition lists are allowed in javadoc.

- public OutputStream wrap(OutputStream os) closing the underlying output 
stream precludes many potential uses. FilterOutputStreams do not consistently 
close the underlying stream.

- spelling error "methold"

- InputStream wrap() it might be nice if the return stream EOF when the padding 
character(s) are consumed but not close() or otherwise alter the source 
InputStream

- private static byte[] getBytes(ByteBuffer bb) should document, even though 
it's a private method, that the bytes returned should be not considered mutable 
since they might be shared.

- the "if (ret != dst.length) // TBD: not necessary" is worrisome as it will 
generate garbage.

- I'm still wishing String encode(byte[] buf) didn't use an intermediate byte 
array. :-)

- "stream after use, in which" -> "stream after use, during which"

On Oct 23 2012, at 19:56 , Xueming Shen wrote:

> Hi Alan,
> 
> Thanks for the review. I hope I addressed most of the comments in the
> updated webrev at
> 
> http://cr.openjdk.java.net/~sherman/4235519/webrev
> 
> mainly
> 
> (1) Pulled the base64 "terms" up to the class doc and then be referenced from 
> various methods
> (2) Gave up the C style de/encode(byte[], null), just leave the invoker to 
> have enough space. I
> remember Paul once suggested to have a convenient method to return the 
> estimated length
> of the byte[] needed to en/decode a specified input byte array/buffer. I 
> think we can do it
> later when it is really desired. I agreed your argument that if people 
> need to get the size and
> prepare a "new" array, they bet off just call de/encode(byte[])
> (3) Gave up the "liberal" decoding design, tight the spec/impl to treat the 
> input as "illegal base64"
> if the padding character appears in the middle of input.
> (4) Some clarification of spec for thread safe, null exception and the 
> pos/limit of input/output byte
> buffer as suggested.
> (5) Fixed the typos
> 
> I still don't have a better name for method "getUrlEn/Decoder()", 
> Base64.getBase64UrlEn/Decoder()
> does not feel good for me.
> 
> -Sherman
> 
> 
> On 10/23/12 6:04 AM, Alan Bateman wrote:
>> On 18/10/2012 03:10, Xueming Shen wrote:
>>> :
>>> 
>>> webrev:
>>> http://cr.openjdk.java.net/~sherman/4235519/webrev
>> I took another pass over this, focusing on the API as that is what we have 
>> to get right. Performance is important too but I think the priority has to 
>> be the API first.
>> 
>> Overall I think it is quite nice and easy to use.
>> 
>> I wonder if the Base64 class description should try to establish the terms 
>> "base64" and "base64url" so that they can be referenced from the various 
>> methods? That would avoid needing to duplicate references to the RFC 4668 
>> and RFC 2045 in so many places.
>> 
>> I think it would also be useful if the specification indicated whether 
>> Encoders and Decoders are safe for use by concurrent threads. Also make it 
>> clear that NPE is thrown if any of the parameters are passed as null (unless 
>> otherwise specified of course).
>> 
>> I'm not sure that getUrlEncoder is the most suitable name to get a base64url 
>> encoder. The reason is that the method name makes it sound like it returns a 
>> URLEncoder or or at least an encoder for HTML forms. While more verbose, 
>> getBase64UrlEncoder is clear that it returns a base64url encoder.
>> 
>> Do you think getEncoder(int,byte[]) will be used much? If not then perhaps 
>> it should be left out for the first version (I guess part of this is getting 
>> used to providing the line separate as a byte array).
>> 
>> For the javadoc then Encoder and Decoder will need @since 1.8. They should 
>> probably cross reference each other too.
>> 
>> encode(byte[]) should be clearer that it encodes all bytes in the given 
>> array. Also I think it needs to be clear that the returned byte array is 
>> appropriately sized -- as currently worded it doesn't make it clear that 
>> they can't be extra elements at the end (odd as it might be).
>> 
>> Typo at line 215, "byter array" -> "byte array"
>> 
>> Typo at line 246, "methold" -> "method".
>> 
>> Typo at line 247, "arry" -> "array"
>> 
>> Type at line 254, "encocde" -> "encode"
>> 
>> Typo at line 277, "buffter" -> "buffer". Another one at line 702.
>> 
>> Minor comment, but I assume you should move the 
>> @SuppressWarnings("deprecation") on encodeToString to after the method 
>> comment rather than before. I see the same thing in decode(String).
>> 
>> I think encode(ByteBuffer) needs to be clear as to the 
>> position/limit/capacity of the returned buffer.
>> 
>> I'm not sure so about encode(ba, null) returning the required length, it 
>> feels odd and a bit like some of the win32 APIs. If the intention is that 
>> the caller allocates the byte[] and then calls e

Re: Review/comment needed for the new public java.util.Base64 class

2012-10-23 Thread Xueming Shen


Hi Alan,

Thanks for the review. I hope I addressed most of the comments in the
updated webrev at

http://cr.openjdk.java.net/~sherman/4235519/webrev

mainly

(1) Pulled the base64 "terms" up to the class doc and then be referenced 
from various methods
(2) Gave up the C style de/encode(byte[], null), just leave the invoker 
to have enough space. I
 remember Paul once suggested to have a convenient method to return 
the estimated length
 of the byte[] needed to en/decode a specified input byte 
array/buffer. I think we can do it
 later when it is really desired. I agreed your argument that if 
people need to get the size and

 prepare a "new" array, they bet off just call de/encode(byte[])
(3) Gave up the "liberal" decoding design, tight the spec/impl to treat 
the input as "illegal base64"

 if the padding character appears in the middle of input.
(4) Some clarification of spec for thread safe, null exception and the 
pos/limit of input/output byte

 buffer as suggested.
(5) Fixed the typos

I still don't have a better name for method "getUrlEn/Decoder()", 
Base64.getBase64UrlEn/Decoder()

does not feel good for me.

-Sherman


On 10/23/12 6:04 AM, Alan Bateman wrote:

On 18/10/2012 03:10, Xueming Shen wrote:

:

webrev:
http://cr.openjdk.java.net/~sherman/4235519/webrev
I took another pass over this, focusing on the API as that is what we 
have to get right. Performance is important too but I think the 
priority has to be the API first.


Overall I think it is quite nice and easy to use.

I wonder if the Base64 class description should try to establish the 
terms "base64" and "base64url" so that they can be referenced from the 
various methods? That would avoid needing to duplicate references to 
the RFC 4668 and RFC 2045 in so many places.


I think it would also be useful if the specification indicated whether 
Encoders and Decoders are safe for use by concurrent threads. Also 
make it clear that NPE is thrown if any of the parameters are passed 
as null (unless otherwise specified of course).


I'm not sure that getUrlEncoder is the most suitable name to get a 
base64url encoder. The reason is that the method name makes it sound 
like it returns a URLEncoder or or at least an encoder for HTML forms. 
While more verbose, getBase64UrlEncoder is clear that it returns a 
base64url encoder.


Do you think getEncoder(int,byte[]) will be used much? If not then 
perhaps it should be left out for the first version (I guess part of 
this is getting used to providing the line separate as a byte array).


For the javadoc then Encoder and Decoder will need @since 1.8. They 
should probably cross reference each other too.


encode(byte[]) should be clearer that it encodes all bytes in the 
given array. Also I think it needs to be clear that the returned byte 
array is appropriately sized -- as currently worded it doesn't make it 
clear that they can't be extra elements at the end (odd as it might be).


Typo at line 215, "byter array" -> "byte array"

Typo at line 246, "methold" -> "method".

Typo at line 247, "arry" -> "array"

Type at line 254, "encocde" -> "encode"

Typo at line 277, "buffter" -> "buffer". Another one at line 702.

Minor comment, but I assume you should move the 
@SuppressWarnings("deprecation") on encodeToString to after the method 
comment rather than before. I see the same thing in decode(String).


I think encode(ByteBuffer) needs to be clear as to the 
position/limit/capacity of the returned buffer.


I'm not sure so about encode(ba, null) returning the required length, 
it feels odd and a bit like some of the win32 APIs. If the intention 
is that the caller allocates the byte[] and then calls encode again 
then it would be easier to just encode(ba).


For the decoder methods then IllegalArgumentException may be thrown 
mid-flight, meaning that some bytes may have been written into output 
buffer or array even though an exception is thrown. I think this needs 
to be made clear in the spec. It also makes me wonder if this is the 
right exception, it feels like a more specialized malformed input 
exception.


Another point about the decode methods is that they stop at the 
padding bytes and so this allows for bytes after the padding. I'm 
curious about this choice and whether you considered this as an error? 
I see how this influences decode(ByteBuffer,ByteBuffer) to return a 
boolean and I just wonder if there are other choices to consider.


That's mostly it for now. I didn't check in the IDE but there are lot 
of imports that are don't appear to be  used, perhaps left over from 
previous iterations?


-Alan

Re: Review/comment needed for the new public java.util.Base64 class

2012-10-23 Thread Ulf Zibis


Am 23.10.2012 15:04, schrieb Alan Bateman:
I'm not sure that getUrlEncoder is the most suitable name to get a base64url encoder. The reason 
is that the method name makes it sound like it returns a URLEncoder or or at least an encoder for 
HTML forms. While more verbose, getBase64UrlEncoder is clear that it returns a base64url encoder.


I'm wondering, why there are those get... methods at all.

Alternatively you could make the appropriate constructors and predifined static variants public. So 
one only should use:

Base64.Encoder encoder = new Base64.Encoder(...);
Base64.Encoder urlEncoder = Base64.Encoder.RFC4648_URLSAFE;

No need for those looong method names.

-Ulf

Re: Review/comment needed for the new public java.util.Base64 class

2012-10-23 Thread Alan Bateman


On 18/10/2012 03:10, Xueming Shen wrote:

:

webrev:
http://cr.openjdk.java.net/~sherman/4235519/webrev
I took another pass over this, focusing on the API as that is what we 
have to get right. Performance is important too but I think the priority 
has to be the API first.


Overall I think it is quite nice and easy to use.

I wonder if the Base64 class description should try to establish the 
terms "base64" and "base64url" so that they can be referenced from the 
various methods? That would avoid needing to duplicate references to the 
RFC 4668 and RFC 2045 in so many places.


I think it would also be useful if the specification indicated whether 
Encoders and Decoders are safe for use by concurrent threads. Also make 
it clear that NPE is thrown if any of the parameters are passed as null 
(unless otherwise specified of course).


I'm not sure that getUrlEncoder is the most suitable name to get a 
base64url encoder. The reason is that the method name makes it sound 
like it returns a URLEncoder or or at least an encoder for HTML forms. 
While more verbose, getBase64UrlEncoder is clear that it returns a 
base64url encoder.


Do you think getEncoder(int,byte[]) will be used much? If not then 
perhaps it should be left out for the first version (I guess part of 
this is getting used to providing the line separate as a byte array).


For the javadoc then Encoder and Decoder will need @since 1.8. They 
should probably cross reference each other too.


encode(byte[]) should be clearer that it encodes all bytes in the given 
array. Also I think it needs to be clear that the returned byte array is 
appropriately sized -- as currently worded it doesn't make it clear that 
they can't be extra elements at the end (odd as it might be).


Typo at line 215, "byter array" -> "byte array"

Typo at line 246, "methold" -> "method".

Typo at line 247, "arry" -> "array"

Type at line 254, "encocde" -> "encode"

Typo at line 277, "buffter" -> "buffer". Another one at line 702.

Minor comment, but I assume you should move the 
@SuppressWarnings("deprecation") on encodeToString to after the method 
comment rather than before. I see the same thing in decode(String).


I think encode(ByteBuffer) needs to be clear as to the 
position/limit/capacity of the returned buffer.


I'm not sure so about encode(ba, null) returning the required length, it 
feels odd and a bit like some of the win32 APIs. If the intention is 
that the caller allocates the byte[] and then calls encode again then it 
would be easier to just encode(ba).


For the decoder methods then IllegalArgumentException may be thrown 
mid-flight, meaning that some bytes may have been written into output 
buffer or array even though an exception is thrown. I think this needs 
to be made clear in the spec. It also makes me wonder if this is the 
right exception, it feels like a more specialized malformed input exception.


Another point about the decode methods is that they stop at the padding 
bytes and so this allows for bytes after the padding. I'm curious about 
this choice and whether you considered this as an error? I see how this 
influences decode(ByteBuffer,ByteBuffer) to return a boolean and I just 
wonder if there are other choices to consider.


That's mostly it for now. I didn't check in the IDE but there are lot of 
imports that are don't appear to be  used, perhaps left over from 
previous iterations?


-Alan

Re: Review/comment needed for the new public java.util.Base64 class

2012-10-19 Thread Xueming Shen

Understood. I do have the code:-) but I'm hesitated to go SharedSecrets 
simply for
performance gain of a utility method. This definitely can be addressed 
if it turns out

to be a real issue standing in critical path.

-Sherman


sherman@sherman-linux:~/Workspace/jdk8/test/java/util/Base64$ java  
PermBase64 20 1000

 j.u.Base64.encode(ba)  : 528745
 j.u.Base64.encodeString(ba): 739703
 j.u.Base64.encode(bb)  : 486216
 j.u.Base64.encode(bb, bb)  : 538544
 j.u.Base64.encode(bb, bb)-D: 850947
  migBase64.encode(ba)  : 659474
 vs

sherman@sherman-linux:~/Workspace/jdk8/test/java/util/Base64$ java  
PermBase64 20 1000

 j.u.Base64.encode(ba)  : 519391
 j.u.Base64.encodeString(ba): 964854
 j.u.Base64.encode(bb)  : 490138
 j.u.Base64.encode(bb, bb)  : 539027
 j.u.Base64.encode(bb, bb)-D: 786438
  migBase64.encode(ba)  : 660572

-Sherman

On 10/19/2012 04:59 PM, Mike Duigou wrote:

For me the greater concern, which is hard to measure, is the GC pressure added 
by the discarded byte array.

Mike

On Oct 19 2012, at 17:03 , Xueming Shen wrote:


I see a 20% performance gain on server vm if switch to pure char[] based 
encoding
and then use the sharedSecrets to avoid the copy. The dis-advantage is (1) have 
to
use the sharedSecrets and (2) can't share the same between the encode(byte[])
and encode(String).

Anyway it appears to be an alternative for performance improvement.

-Sherman


On 10/18/2012 01:07 PM, Mike Duigou wrote:

I wonder if there would be advantage in using a SharedSecrets mechanism to 
allow construction of a String directly from a char array. The intermediate 
byte array seems wasteful especially for what is likely to be a heavily used 
path.

Mike

On Oct 17 2012, at 19:10 , Xueming Shen wrote:


Hi

Webrev has been updated with following changes

(1) added a pair of en/decode(ByteBuffer src, ByteBuffer dst) methods
(2) some minor spec clarification regarding the "end of decoding"
(3) performance tuning.

webrev:
http://cr.openjdk.java.net/~sherman/4235519/webrev

some performance scores:
http://cr.openjdk.java.net/~sherman/4235519/score3

-Sherman

Re: Review/comment needed for the new public java.util.Base64 class

2012-10-19 Thread Mike Duigou

For me the greater concern, which is hard to measure, is the GC pressure added 
by the discarded byte array.

Mike

On Oct 19 2012, at 17:03 , Xueming Shen wrote:

> 
> I see a 20% performance gain on server vm if switch to pure char[] based 
> encoding
> and then use the sharedSecrets to avoid the copy. The dis-advantage is (1) 
> have to
> use the sharedSecrets and (2) can't share the same between the encode(byte[])
> and encode(String).
> 
> Anyway it appears to be an alternative for performance improvement.
> 
> -Sherman
> 
> 
> On 10/18/2012 01:07 PM, Mike Duigou wrote:
>> I wonder if there would be advantage in using a SharedSecrets mechanism to 
>> allow construction of a String directly from a char array. The intermediate 
>> byte array seems wasteful especially for what is likely to be a heavily used 
>> path.
>> 
>> Mike
>> 
>> On Oct 17 2012, at 19:10 , Xueming Shen wrote:
>> 
>>> Hi
>>> 
>>> Webrev has been updated with following changes
>>> 
>>> (1) added a pair of en/decode(ByteBuffer src, ByteBuffer dst) methods
>>> (2) some minor spec clarification regarding the "end of decoding"
>>> (3) performance tuning.
>>> 
>>> webrev:
>>> http://cr.openjdk.java.net/~sherman/4235519/webrev
>>> 
>>> some performance scores:
>>> http://cr.openjdk.java.net/~sherman/4235519/score3
>>> 
>>> -Sherman
>

Re: Review/comment needed for the new public java.util.Base64 class

2012-10-19 Thread Xueming Shen



I see a 20% performance gain on server vm if switch to pure char[] based 
encoding
and then use the sharedSecrets to avoid the copy. The dis-advantage is 
(1) have to
use the sharedSecrets and (2) can't share the same between the 
encode(byte[])

and encode(String).

Anyway it appears to be an alternative for performance improvement.

-Sherman


On 10/18/2012 01:07 PM, Mike Duigou wrote:

I wonder if there would be advantage in using a SharedSecrets mechanism to 
allow construction of a String directly from a char array. The intermediate 
byte array seems wasteful especially for what is likely to be a heavily used 
path.

Mike

On Oct 17 2012, at 19:10 , Xueming Shen wrote:


Hi

Webrev has been updated with following changes

(1) added a pair of en/decode(ByteBuffer src, ByteBuffer dst) methods
(2) some minor spec clarification regarding the "end of decoding"
(3) performance tuning.

webrev:
http://cr.openjdk.java.net/~sherman/4235519/webrev

some performance scores:
http://cr.openjdk.java.net/~sherman/4235519/score3

-Sherman

Re: Review/comment needed for the new public java.util.Base64 class

2012-10-18 Thread Xueming Shen


On 10/18/2012 07:43 AM, Alan Bateman wrote:

On 18/10/2012 03:10, Xueming Shen wrote:

Hi

Webrev has been updated with following changes

(1) added a pair of en/decode(ByteBuffer src, ByteBuffer dst) methods
(2) some minor spec clarification regarding the "end of decoding"
(3) performance tuning.

webrev:
http://cr.openjdk.java.net/~sherman/4235519/webrev

some performance scores:
http://cr.openjdk.java.net/~sherman/4235519/score3

-Sherman
Have you done any performance tuning on the new methods that 
encode/decode into an existing ByteBuffer? Just wondering as the 
direct buffer case seems more expensive that I would expect. It would 
be interesting to see using the put(int,byte) and fixing up the 
position at the end would help. It might also be something to look at 
with the compiler folks.


get(index) and put(index, byte) do help, see 20%+ improvement.

sherman@sherman-linux:~/Workspace/jdk8/test/java/util/Base64$ java 
PermBase64 20 1000

 j.u.Base64.encode(ba)  : 702000
 j.u.Base64.encode(bb)  : 690024
 j.u.Base64.encode(bb, bb)  : 910582
 j.u.Base64.encode(bb, bb)-D: 1198606
  migBase64.encode(ba)  : 661271
--
 j.u.Base64.decode(ba)  : 797152
 j.u.Base64.decode(bb)  : 802968
 j.u.Base64.decode(bb, bb)  : 999577
 j.u.Base64.decode(bb, bb)-D: 1472003
  migBase64.decode(ba)  : 1458450

webrev has been updated accordingly. Now implementations of
de/encodeArray and de/encodeBuffer are identical except the
access operation.

-Sherman

Re: Review/comment needed for the new public java.util.Base64 class

2012-10-18 Thread Mike Duigou

I wonder if there would be advantage in using a SharedSecrets mechanism to 
allow construction of a String directly from a char array. The intermediate 
byte array seems wasteful especially for what is likely to be a heavily used 
path.

Mike

On Oct 17 2012, at 19:10 , Xueming Shen wrote:

> Hi
> 
> Webrev has been updated with following changes
> 
> (1) added a pair of en/decode(ByteBuffer src, ByteBuffer dst) methods
> (2) some minor spec clarification regarding the "end of decoding"
> (3) performance tuning.
> 
> webrev:
> http://cr.openjdk.java.net/~sherman/4235519/webrev
> 
> some performance scores:
> http://cr.openjdk.java.net/~sherman/4235519/score3
> 
> -Sherman

Re: Review/comment needed for the new public java.util.Base64 class

2012-10-18 Thread Xueming Shen


Michael,


The current approach assumes that the base64 en/decoding of a file, stream
will be handled by the en/decoder.wrap(...). The en/decode(bb, bb) is 
basically

for one time invocation input and output from/to existing buffers, with the
hope of avoiding the coding-life-circle management, so it assumes all input
will be passed in in the src buffer and only needs limit support for 
cases that
the existing output buffer is not sufficient. We will probably have to 
go with
the "endOfInput" flag approach, if the current version is still not 
enough for

most of "common" use cases.

-Sherman


On 10/18/2012 11:01 AM, Michael Schierl wrote:

Am 18.10.2012 04:10, schrieb Xueming Shen:

Hi

Webrev has been updated with following changes

(1) added a pair of en/decode(ByteBuffer src, ByteBuffer dst) methods

I think line 443 needs an additional check for atom == 0. Same for the
array case. At least if you do not want an endOfInput flag in the
parameters. If you have an endOfInput flag, you can use that in the
condition instead of atom == 0.

Consider this example where trying to encode 25 bytes using an input
ByteBuffer of 16 bytes and an output ByteBuffer of 32 bytes (Of course,
you'd use larger buffers in practice, but the same can happen with
larger buffers too, as long as your buffer size is not divisible by
three, or you do not completely fill the buffer with each read. Both can
happen if the media you are reading from uses a different sector/block
size - like when attaching a base64 attachment of a large file from disk
to an email).

First you fill 16 bytes of input into input buffer, then you call encode
the first time. it will encode the first 15 bytes to 20 bytes of output,
then encode the 16th byte to XY==. You flush the output and fill the
next 9 bytes into the input buffer. The next call to encode will encode
the 9 bytes to 12 bytes. So, in total you have 36 bytes, but the padding
is in the middle.

With the check, the first call will encode the first 15 bytes to 20
bytes of output and leave the 16th byte in the buffer. You flush the
output and load 9 bytes to the 1 remaining byte in the input buffer. The
second call will encode the first 9 bytes to 12 bytes and leave one byte
left over. You flush the buffer and call a third time, which will encode
the last byte to XY==. Still 36 bytes, this time with padding at the end.


Regards,


Michael

Re: Review/comment needed for the new public java.util.Base64 class

2012-10-18 Thread Michael Schierl

Am 18.10.2012 04:10, schrieb Xueming Shen:
> Hi
> 
> Webrev has been updated with following changes
> 
> (1) added a pair of en/decode(ByteBuffer src, ByteBuffer dst) methods

I think line 443 needs an additional check for atom == 0. Same for the
array case. At least if you do not want an endOfInput flag in the
parameters. If you have an endOfInput flag, you can use that in the
condition instead of atom == 0.

Consider this example where trying to encode 25 bytes using an input
ByteBuffer of 16 bytes and an output ByteBuffer of 32 bytes (Of course,
you'd use larger buffers in practice, but the same can happen with
larger buffers too, as long as your buffer size is not divisible by
three, or you do not completely fill the buffer with each read. Both can
happen if the media you are reading from uses a different sector/block
size - like when attaching a base64 attachment of a large file from disk
to an email).

First you fill 16 bytes of input into input buffer, then you call encode
the first time. it will encode the first 15 bytes to 20 bytes of output,
then encode the 16th byte to XY==. You flush the output and fill the
next 9 bytes into the input buffer. The next call to encode will encode
the 9 bytes to 12 bytes. So, in total you have 36 bytes, but the padding
is in the middle.

With the check, the first call will encode the first 15 bytes to 20
bytes of output and leave the 16th byte in the buffer. You flush the
output and load 9 bytes to the 1 remaining byte in the input buffer. The
second call will encode the first 9 bytes to 12 bytes and leave one byte
left over. You flush the buffer and call a third time, which will encode
the last byte to XY==. Still 36 bytes, this time with padding at the end.


Regards,


Michael

Re: Review/comment needed for the new public java.util.Base64 class

2012-10-18 Thread Ulf Zibis


Am 18.10.2012 16:04, schrieb Xueming Shen:


On 10/18/2012 6:19 AM, Ulf Zibis wrote:


Oops, you are working at 6 in the morning?


Hi Sherman,

you could create the code table by help of a String constant, which would speed up the loading 
and save a little footprint on the class file. (you have forgotten those tricks from sun.nio.cs 
coders? ;-) )


The implementation probably needs more tuning.


Hopefully, some day we have arrays of primitive types in the class file's 
constant pool.


But my main question:
Why don't you enqueue those coders in the well known sun.nio.cs.ext provider?


Base64 encoding is not a charset. Mainly it is supposed to be byte[]<->byte[] 
coding.
Even for byte[]<=>char[]/String, Base64 "encode" byte[] to char[]/String and 
"decode"
from char[]/String to byte[]. The "wrong" direction compared to charset 
en/decoder.


1:0 for you.
I only had the ASCII/Unicode characters -> Base64 "en"coding in mind, which seems in fact a 
Java-String to byte[] to char[]/String 2-step-encoding.





At least, I think, class Base64 should be hosted in java.nio.charset rather 
than common java.util.
As I believe, that the usage of this encoding is far from common


you might be surprised how "common" it might be:-)


or surprised, why it wasn't part of the JDK before.

-Ulf

Re: Review/comment needed for the new public java.util.Base64 class

2012-10-18 Thread Xueming Shen


On 10/18/12 7:43 AM, Alan Bateman wrote:

On 18/10/2012 03:10, Xueming Shen wrote:

Hi

Webrev has been updated with following changes

(1) added a pair of en/decode(ByteBuffer src, ByteBuffer dst) methods
(2) some minor spec clarification regarding the "end of decoding"
(3) performance tuning.

webrev:
http://cr.openjdk.java.net/~sherman/4235519/webrev

some performance scores:
http://cr.openjdk.java.net/~sherman/4235519/score3

-Sherman
Have you done any performance tuning on the new methods that 
encode/decode into an existing ByteBuffer? Just wondering as the 
direct buffer case seems more expensive that I would expect. It would 
be interesting to see using the put(int,byte) and fixing up the 
position at the end would help. It might also be something to look at 
with the compiler folks.


no "real" tuning yet on the direct buffer case. Just tried to get the 
API out for review/comment first.


 



In any case, I think encode(ByteBuffer,ByteBuffer,int) looks 
reasonable and just requires using the idiom to get it right. I'm less 
sure about decode returning a boolean. Shouldn't it be an error if 
there are bytes remaining after the = padding? 


It depends on expectation. Padding partially serves as a "end of base64 
byte stream" indicator.


I'm just looking for ways to allow the method return an int to 
indicate the number of bytes written to the output buffer (and realize 
that we aren't going to get symmetry with the encode method).


Yes, even a "int" return value will not get us the symmetry with 
encode(bb, bb). A boolean return provides

a more liberal behavior.

-Sherman

Re: Review/comment needed for the new public java.util.Base64 class

2012-10-18 Thread Alan Bateman


On 18/10/2012 03:10, Xueming Shen wrote:

Hi

Webrev has been updated with following changes

(1) added a pair of en/decode(ByteBuffer src, ByteBuffer dst) methods
(2) some minor spec clarification regarding the "end of decoding"
(3) performance tuning.

webrev:
http://cr.openjdk.java.net/~sherman/4235519/webrev

some performance scores:
http://cr.openjdk.java.net/~sherman/4235519/score3

-Sherman
Have you done any performance tuning on the new methods that 
encode/decode into an existing ByteBuffer? Just wondering as the direct 
buffer case seems more expensive that I would expect. It would be 
interesting to see using the put(int,byte) and fixing up the position at 
the end would help. It might also be something to look at with the 
compiler folks.


In any case, I think encode(ByteBuffer,ByteBuffer,int) looks reasonable 
and just requires using the idiom to get it right. I'm less sure about 
decode returning a boolean. Shouldn't it be an error if there are bytes 
remaining after the = padding? I'm just looking for ways to allow the 
method return an int to indicate the number of bytes written to the 
output buffer (and realize that we aren't going to get symmetry with the 
encode method).


-Alan.

Re: Review/comment needed for the new public java.util.Base64 class

2012-10-18 Thread Alan Bateman


On 18/10/2012 14:19, Ulf Zibis wrote:

Hi Sherman,

you could create the code table by help of a String constant, which 
would speed up the loading and save a little footprint on the class 
file. (you have forgotten those tricks from sun.nio.cs coders? ;-) )


But my main question:
Why don't you enqueue those coders in the well known sun.nio.cs.ext 
provider?
At least, I think, class Base64 should be hosted in java.nio.charset 
rather than common java.util.
As I believe, that the usage of this encoding is far from common, 
adding it to the lazy loaded charsets.jar IMO should be more reasonable.
The encoders and decoders in java.nio.charset are for translating 
between bytes and Unicode characters and that is not the case here. I 
think java.util is probably the right place for this.


-Alan

Re: Review/comment needed for the new public java.util.Base64 class

2012-10-18 Thread Xueming Shen



On 10/18/2012 6:19 AM, Ulf Zibis wrote:

Hi Sherman,

you could create the code table by help of a String constant, which 
would speed up the loading and save a little footprint on the class 
file. (you have forgotten those tricks from sun.nio.cs coders? ;-) )


The implementation probably needs more tuning.



But my main question:
Why don't you enqueue those coders in the well known sun.nio.cs.ext 
provider?


Base64 encoding is not a charset. Mainly it is supposed to be 
byte[]<->byte[] coding.
Even for byte[]<=>char[]/String, Base64 "encode" byte[] to char[]/String 
and "decode"
from char[]/String to byte[]. The "wrong" direction compared to charset 
en/decoder.


At least, I think, class Base64 should be hosted in java.nio.charset 
rather than common java.util.

As I believe, that the usage of this encoding is far from common


you might be surprised how "common" it might be:-)

-Sherman

, adding it to the lazy loaded charsets.jar IMO should be more 
reasonable.


-Ulf


Am 18.10.2012 04:10, schrieb Xueming Shen:

Hi

Webrev has been updated with following changes

(1) added a pair of en/decode(ByteBuffer src, ByteBuffer dst) methods
(2) some minor spec clarification regarding the "end of decoding"
(3) performance tuning.

webrev:
http://cr.openjdk.java.net/~sherman/4235519/webrev

some performance scores:
http://cr.openjdk.java.net/~sherman/4235519/score3

-Sherman

Re: Review/comment needed for the new public java.util.Base64 class

2012-10-18 Thread Ulf Zibis


Hi Sherman,

you could create the code table by help of a String constant, which would speed up the loading and 
save a little footprint on the class file. (you have forgotten those tricks from sun.nio.cs coders? 
;-) )


But my main question:
Why don't you enqueue those coders in the well known sun.nio.cs.ext provider?
At least, I think, class Base64 should be hosted in java.nio.charset rather 
than common java.util.
As I believe, that the usage of this encoding is far from common, adding it to the lazy loaded 
charsets.jar IMO should be more reasonable.


-Ulf


Am 18.10.2012 04:10, schrieb Xueming Shen:

Hi

Webrev has been updated with following changes

(1) added a pair of en/decode(ByteBuffer src, ByteBuffer dst) methods
(2) some minor spec clarification regarding the "end of decoding"
(3) performance tuning.

webrev:
http://cr.openjdk.java.net/~sherman/4235519/webrev

some performance scores:
http://cr.openjdk.java.net/~sherman/4235519/score3

-Sherman

Re: Review/comment needed for the new public java.util.Base64 class

2012-10-17 Thread Xueming Shen


Hi

Webrev has been updated with following changes

(1) added a pair of en/decode(ByteBuffer src, ByteBuffer dst) methods
(2) some minor spec clarification regarding the "end of decoding"
(3) performance tuning.

webrev:
http://cr.openjdk.java.net/~sherman/4235519/webrev

some performance scores:
http://cr.openjdk.java.net/~sherman/4235519/score3

-Sherman

Re: Review/comment needed for the new public java.util.Base64 class

2012-10-13 Thread Xueming Shen


On 10/13/12 10:21 AM, Xueming Shen wrote:


With moving some code piece around (moved those lookup table into 
En/Decoder)
and some array access code tuning (with the hope of eliminating the 
array boundary
access) in encode0(), it appears the hotspot now can better optimize 
the decoder
loop for the -server mode for encoding. decode0 also get 10% +boost 
with simply

moving in the lookup table.

The latest version is at (decode0() vs decode1())

--> encode0() vs encode1())

encode0 is the latest/faster one. encode1 is the one in webrev.

-Sherman



http://cr.openjdk.java.net/~sherman/4235519/Base64.java

The latest scores is at

http://cr.openjdk.java.net/~sherman/4235519/score2

(compared to http://cr.openjdk.java.net/~sherman/4235519/score1)

-Sherman


On 10/11/12 11:38 PM, Xueming Shen wrote:



On 10/11/2012 02:43 AM, Stephen Colebourne wrote:

There are lots of other public base 64 implementations to 
test/check against.
http://commons.apache.org/net/api-3.1/org/apache/commons/net/util/Base64.html 

http://www.cs.berkeley.edu/~jonah/bc/org/bouncycastle/util/encoders/Base64.html 


http://migbase64.sourceforge.net/  (claims to be fast)




I did a quick performance check against migbase64 with the basic 
base64 de/encoding

using this non-scientific benchmark.
http://cr.openjdk.java.net/~sherman/4235519/PermBase64.java

Here is the scores
http://cr.openjdk.java.net/~sherman/4235519/scores

It's a tie, j.u.Base64 wins the decoding, but needs some work on 
encoding side.


java -server PermBase64 20 50
 j.u.Base64.encode() : 1390212
  migBase64.encode() : 720517
 j.u.Base64.decode() : 1200642
  migBase64.decode() : 2070015

java -server PermBase64 20 100
 j.u.Base64.encode() : 1239407
  migBase64.encode() : 740404
 j.u.Base64.decode() : 1257092
  migBase64.decode() : 2012910

java -server PermBase64 20 1000
 j.u.Base64.encode() : 1062212
  migBase64.encode() : 657342
 j.u.Base64.decode() : 1133740
  migBase64.decode() : 1930612

java -client PermBase64 20 50
 j.u.Base64.encode() : 961331
  migBase64.encode() : 875635
 j.u.Base64.decode() : 1528790
  migBase64.decode() : 2188473

java -client PermBase64 20 100
 j.u.Base64.encode() : 966453
  migBase64.encode() : 858082
 j.u.Base64.decode() : 1459159
  migBase64.decode() : 2115045

java -client PermBase64 20 1000
 j.u.Base64.encode() : 954401
  migBase64.encode() : 854903
 j.u.Base64.decode() : 1444603
  migBase64.decode() : 2038865

Re: Review/comment needed for the new public java.util.Base64 class

2012-10-13 Thread Xueming Shen



With moving some code piece around (moved those lookup table into 
En/Decoder)
and some array access code tuning (with the hope of eliminating the 
array boundary
access) in encode0(), it appears the hotspot now can better optimize the 
decoder
loop for the -server mode for encoding. decode0 also get 10% +boost with 
simply

moving in the lookup table.

The latest version is at (decode0() vs decode1())

http://cr.openjdk.java.net/~sherman/4235519/Base64.java

The latest scores is at

http://cr.openjdk.java.net/~sherman/4235519/score2

(compared to http://cr.openjdk.java.net/~sherman/4235519/score1)

-Sherman


On 10/11/12 11:38 PM, Xueming Shen wrote:



On 10/11/2012 02:43 AM, Stephen Colebourne wrote:

There are lots of other public base 64 implementations to test/check 
against.
http://commons.apache.org/net/api-3.1/org/apache/commons/net/util/Base64.html 

http://www.cs.berkeley.edu/~jonah/bc/org/bouncycastle/util/encoders/Base64.html 


http://migbase64.sourceforge.net/  (claims to be fast)




I did a quick performance check against migbase64 with the basic 
base64 de/encoding

using this non-scientific benchmark.
http://cr.openjdk.java.net/~sherman/4235519/PermBase64.java

Here is the scores
http://cr.openjdk.java.net/~sherman/4235519/scores

It's a tie, j.u.Base64 wins the decoding, but needs some work on 
encoding side.


java -server PermBase64 20 50
 j.u.Base64.encode() : 1390212
  migBase64.encode() : 720517
 j.u.Base64.decode() : 1200642
  migBase64.decode() : 2070015

java -server PermBase64 20 100
 j.u.Base64.encode() : 1239407
  migBase64.encode() : 740404
 j.u.Base64.decode() : 1257092
  migBase64.decode() : 2012910

java -server PermBase64 20 1000
 j.u.Base64.encode() : 1062212
  migBase64.encode() : 657342
 j.u.Base64.decode() : 1133740
  migBase64.decode() : 1930612

java -client PermBase64 20 50
 j.u.Base64.encode() : 961331
  migBase64.encode() : 875635
 j.u.Base64.decode() : 1528790
  migBase64.decode() : 2188473

java -client PermBase64 20 100
 j.u.Base64.encode() : 966453
  migBase64.encode() : 858082
 j.u.Base64.decode() : 1459159
  migBase64.decode() : 2115045

java -client PermBase64 20 1000
 j.u.Base64.encode() : 954401
  migBase64.encode() : 854903
 j.u.Base64.decode() : 1444603
  migBase64.decode() : 2038865

Re: Review/comment needed for the new public java.util.Base64 class

2012-10-13 Thread Peter Levart


On 10/13/2012 11:21 AM, Peter Levart wrote:
One way to do it without state in encoder is 1st as suggested to have 
the 'endOfStream' flag and in addition when new-line breaks are 
requested, to only encode "full lines" at each invocation. So If one 
wants to use this line-breakage feature he/she must make sure that 
there is enough space (line_size+2 at worst) left in destination buffer
...and (forgot to add) enough bytes remaining in source buffer or the 
endOfStream==true...
to do any encoding or the encoder will not do anything (return 0 or 
false).


Regards, Peter

On 10/12/2012 10:58 PM, Xueming Shen wrote:


Yes, I'm trying to figure out where to have this  position info 
stored...


On 10/12/2012 01:48 PM, Michael Schierl wrote:

Am 12.10.2012 22:12, schrieb Xueming Shen:

Hi,

It appears to be possible to do something like

boolean de/encode(ByteBuffer src, ByteBuffer dst);

returns true if all remaining bytes in src are en/decoded, false, 
the dst
is not big enough for all output bytes, the src.position() will be 
advanced
to the position of next un-en/decoded byte, dst.position() will be 
updated

accordingly as well.

to avoid the en/decoder to hold an internal state.

If it was unclear, that was what I tried to suggest. I thought you
refered to the automatic adding of newlines when you said you'd need
internal state (although that state is small enough - how many encoded
bytes remaining before adding the next newline - so that it might be
possible to pass it as a parameter, too). I'm not sure how important
that adding of newlines really is (in all cases where I was guilty of
using sun.misc.Base64Encoder, I always replaced them out after
encoding), so maybe the ByteBuffer API could live without it.

Note that without an explicit end of stream flag, it might not be 
easily
possible to encode the last block - if you have a 16 byte buffer 
with 14

bytes filled, you cannot be sure whether you can encode the last two
bytes to XYZ= because it is end of stream or you have to wait for the
next byte. An option without the flag might be to assume when the input
is less than 3 bytes that it is the end of the stream. But I doubt it
will make the API easier to understand or use. :-)


Regards,


Michael

Re: Review/comment needed for the new public java.util.Base64 class

2012-10-13 Thread Peter Levart

One way to do it without state in encoder is 1st as suggested to have 
the 'endOfStream' flag and in addition when new-line breaks are 
requested, to only encode "full lines" at each invocation. So If one 
wants to use this line-breakage feature he/she must make sure that there 
is enough space (line_size+2 at worst) left in destination buffer to do 
any encoding or the encoder will not do anything (return 0 or false).


Regards, Peter

On 10/12/2012 10:58 PM, Xueming Shen wrote:


Yes, I'm trying to figure out where to have this  position info stored...

On 10/12/2012 01:48 PM, Michael Schierl wrote:

Am 12.10.2012 22:12, schrieb Xueming Shen:

Hi,

It appears to be possible to do something like

boolean de/encode(ByteBuffer src, ByteBuffer dst);

returns true if all remaining bytes in src are en/decoded, false, 
the dst
is not big enough for all output bytes, the src.position() will be 
advanced
to the position of next un-en/decoded byte, dst.position() will be 
updated

accordingly as well.

to avoid the en/decoder to hold an internal state.

If it was unclear, that was what I tried to suggest. I thought you
refered to the automatic adding of newlines when you said you'd need
internal state (although that state is small enough - how many encoded
bytes remaining before adding the next newline - so that it might be
possible to pass it as a parameter, too). I'm not sure how important
that adding of newlines really is (in all cases where I was guilty of
using sun.misc.Base64Encoder, I always replaced them out after
encoding), so maybe the ByteBuffer API could live without it.

Note that without an explicit end of stream flag, it might not be easily
possible to encode the last block - if you have a 16 byte buffer with 14
bytes filled, you cannot be sure whether you can encode the last two
bytes to XYZ= because it is end of stream or you have to wait for the
next byte. An option without the flag might be to assume when the input
is less than 3 bytes that it is the end of the stream. But I doubt it
will make the API easier to understand or use. :-)


Regards,


Michael

Re: Review/comment needed for the new public java.util.Base64 class

2012-10-12 Thread Xueming Shen



Yes, I'm trying to figure out where to have this  position info stored...

On 10/12/2012 01:48 PM, Michael Schierl wrote:

Am 12.10.2012 22:12, schrieb Xueming Shen:

Hi,

It appears to be possible to do something like

boolean de/encode(ByteBuffer src, ByteBuffer dst);

returns true if all remaining bytes in src are en/decoded, false, the dst
is not big enough for all output bytes, the src.position() will be advanced
to the position of next un-en/decoded byte, dst.position() will be updated
accordingly as well.

to avoid the en/decoder to hold an internal state.

If it was unclear, that was what I tried to suggest. I thought you
refered to the automatic adding of newlines when you said you'd need
internal state (although that state is small enough - how many encoded
bytes remaining before adding the next newline - so that it might be
possible to pass it as a parameter, too). I'm not sure how important
that adding of newlines really is (in all cases where I was guilty of
using sun.misc.Base64Encoder, I always replaced them out after
encoding), so maybe the ByteBuffer API could live without it.

Note that without an explicit end of stream flag, it might not be easily
possible to encode the last block - if you have a 16 byte buffer with 14
bytes filled, you cannot be sure whether you can encode the last two
bytes to XYZ= because it is end of stream or you have to wait for the
next byte. An option without the flag might be to assume when the input
is less than 3 bytes that it is the end of the stream. But I doubt it
will make the API easier to understand or use. :-)


Regards,


Michael

Re: Review/comment needed for the new public java.util.Base64 class

2012-10-12 Thread Michael Schierl

Am 12.10.2012 22:12, schrieb Xueming Shen:
> Hi,
> 
> It appears to be possible to do something like
> 
> boolean de/encode(ByteBuffer src, ByteBuffer dst);
> 
> returns true if all remaining bytes in src are en/decoded, false, the dst
> is not big enough for all output bytes, the src.position() will be advanced
> to the position of next un-en/decoded byte, dst.position() will be updated
> accordingly as well.
> 
> to avoid the en/decoder to hold an internal state.

If it was unclear, that was what I tried to suggest. I thought you
refered to the automatic adding of newlines when you said you'd need
internal state (although that state is small enough - how many encoded
bytes remaining before adding the next newline - so that it might be
possible to pass it as a parameter, too). I'm not sure how important
that adding of newlines really is (in all cases where I was guilty of
using sun.misc.Base64Encoder, I always replaced them out after
encoding), so maybe the ByteBuffer API could live without it.

Note that without an explicit end of stream flag, it might not be easily
possible to encode the last block - if you have a 16 byte buffer with 14
bytes filled, you cannot be sure whether you can encode the last two
bytes to XYZ= because it is end of stream or you have to wait for the
next byte. An option without the flag might be to assume when the input
is less than 3 bytes that it is the end of the stream. But I doubt it
will make the API easier to understand or use. :-)


Regards,


Michael

Re: Review/comment needed for the new public java.util.Base64 class

2012-10-12 Thread Alan Bateman


On 12/10/2012 21:12, Xueming Shen wrote:

Hi,

It appears to be possible to do something like

boolean de/encode(ByteBuffer src, ByteBuffer dst);

returns true if all remaining bytes in src are en/decoded, false, the dst
is not big enough for all output bytes, the src.position() will be 
advanced
to the position of next un-en/decoded byte, dst.position() will be 
updated

accordingly as well.

to avoid the en/decoder to hold an internal state.
Right, there shouldn't be any need for internal state and 
src.remaining() will indicate if there are any bytes in src that 
couldn't be encoded into dst because of insufficient space. The return 
value should probably be the number of bytes written to dst.


-Alan

Re: Review/comment needed for the new public java.util.Base64 class

2012-10-12 Thread Xueming Shen


Hi,

It appears to be possible to do something like

boolean de/encode(ByteBuffer src, ByteBuffer dst);

returns true if all remaining bytes in src are en/decoded, false, the dst
is not big enough for all output bytes, the src.position() will be advanced
to the position of next un-en/decoded byte, dst.position() will be updated
accordingly as well.

to avoid the en/decoder to hold an internal state.

-Sherman

On 10/12/2012 12:47 PM, Ariel Weisberg wrote:

Hi,

Thanks for doing this BTW.

I think that including ByteBuffer API even if it isn't as efficient as
raw byte arrays is better then not having it in the API at all. If that
means allocating a byte array for the output and then doing a put on the
ByteBuffer that is fine.

Down the line if someone has a particularly powerful itch to scratch WRT
to performance they can add more code to the library to make it more
efficient at handling them and then everyone will benefit or they can do
their own implementation.

Thanks,
Ariel


On Fri, Oct 12, 2012, at 02:56 PM, Xueming Shen wrote:

Hi,

The exactly reason I was trying to skip en/decode(ByteBuffer in,
ByteByuffer out)
for now. I'm struggling with/can't make up my mind on whether or not the
en/decoder
should  have internal state, like the charset en/decoder. It appears the
API is being
pushed going that direction though:-)

-Sherman

On 10/12/2012 11:39 AM, Michael Schierl wrote:

Hello,

(sorry if the threading is broken, but I was not subscribed to the list
and only found the discussion on Twitter and read it in the mailing list
archive)

Ariel Weisberg wrote on Thu Oct 11 11:30:56 PDT 2012:

I know that ByteBuffers are pain, but I did notice that you can't
specify a source/dest pair when using ByteBuffers and that ByteBuffers
without arrays have to be copied. I don't see a simple safe way to
normalize access to them the way you can if everything is a byte array.

Agreed. One of the advantages of using byte buffers is reducing
allocations, resulting in fewer garbage collections.

In addition, in this implementation the ByteBuffers have to contain the
full data.

What I like about most byte buffers APIs is that I can pass in a
ByteBuffer with incomplete data or maybe an output ByteBuffer that is
too small to hold the complete result, and it will just process as much
as it can, and leave the rest for the next round (which should work well
for Base64, too, as it always processes chunks of 3 or 4 bytes).

So, a useful ByteBuffer API in my opinion needs a method like

public boolean encode(ByteBuffer in, ByteBuffer out,
boolean endOfInput);

public boolean decode(ByteBuffer in, ByteBuffer out,
boolean endOfInput);

(similar to CharsetEncoder#encode) that can process partial input and
will return true if all processable input has been processed (i. e. in
has to be refilled) or false if some input could not have been processed
(i. e. out has to be flushed).

Users have to call it again and again until they call it with
endOfInput=true and get true back (Using an enum as result similar to
CoderResult#UNDERFLOW and CoderResult#OVERFLOW might be another option
if the boolean results are too cryptic).

Having a ByteBuffer Base64 API might be useful (although I'm not sure
yet if I ever need it), but as it is now, it is mostly useless for
serious ByteBuffer usage, as if I have to split and copy the data
manually anyway, I can as well use the array APIs.


Just my 0.02 EUR,


Michael

Re: Review/comment needed for the new public java.util.Base64 class

2012-10-12 Thread Ariel Weisberg

Hi,

Thanks for doing this BTW.

I think that including ByteBuffer API even if it isn't as efficient as
raw byte arrays is better then not having it in the API at all. If that
means allocating a byte array for the output and then doing a put on the
ByteBuffer that is fine.

Down the line if someone has a particularly powerful itch to scratch WRT
to performance they can add more code to the library to make it more
efficient at handling them and then everyone will benefit or they can do
their own implementation.

Thanks,
Ariel


On Fri, Oct 12, 2012, at 02:56 PM, Xueming Shen wrote:
> Hi,
> 
> The exactly reason I was trying to skip en/decode(ByteBuffer in, 
> ByteByuffer out)
> for now. I'm struggling with/can't make up my mind on whether or not the 
> en/decoder
> should  have internal state, like the charset en/decoder. It appears the 
> API is being
> pushed going that direction though:-)
> 
> -Sherman
> 
> On 10/12/2012 11:39 AM, Michael Schierl wrote:
> > Hello,
> >
> > (sorry if the threading is broken, but I was not subscribed to the list
> > and only found the discussion on Twitter and read it in the mailing list
> > archive)
> >
> > Ariel Weisberg wrote on Thu Oct 11 11:30:56 PDT 2012:
> >> I know that ByteBuffers are pain, but I did notice that you can't
> >> specify a source/dest pair when using ByteBuffers and that ByteBuffers
> >> without arrays have to be copied. I don't see a simple safe way to
> >> normalize access to them the way you can if everything is a byte array.
> > Agreed. One of the advantages of using byte buffers is reducing
> > allocations, resulting in fewer garbage collections.
> >
> > In addition, in this implementation the ByteBuffers have to contain the
> > full data.
> >
> > What I like about most byte buffers APIs is that I can pass in a
> > ByteBuffer with incomplete data or maybe an output ByteBuffer that is
> > too small to hold the complete result, and it will just process as much
> > as it can, and leave the rest for the next round (which should work well
> > for Base64, too, as it always processes chunks of 3 or 4 bytes).
> >
> > So, a useful ByteBuffer API in my opinion needs a method like
> >
> > public boolean encode(ByteBuffer in, ByteBuffer out,
> >boolean endOfInput);
> >
> > public boolean decode(ByteBuffer in, ByteBuffer out,
> >boolean endOfInput);
> >
> > (similar to CharsetEncoder#encode) that can process partial input and
> > will return true if all processable input has been processed (i. e. in
> > has to be refilled) or false if some input could not have been processed
> > (i. e. out has to be flushed).
> >
> > Users have to call it again and again until they call it with
> > endOfInput=true and get true back (Using an enum as result similar to
> > CoderResult#UNDERFLOW and CoderResult#OVERFLOW might be another option
> > if the boolean results are too cryptic).
> >
> > Having a ByteBuffer Base64 API might be useful (although I'm not sure
> > yet if I ever need it), but as it is now, it is mostly useless for
> > serious ByteBuffer usage, as if I have to split and copy the data
> > manually anyway, I can as well use the array APIs.
> >
> >
> > Just my 0.02 EUR,
> >
> >
> > Michael
>

Re: Review/comment needed for the new public java.util.Base64 class

2012-10-12 Thread Xueming Shen


Hi,

The exactly reason I was trying to skip en/decode(ByteBuffer in, 
ByteByuffer out)
for now. I'm struggling with/can't make up my mind on whether or not the 
en/decoder
should  have internal state, like the charset en/decoder. It appears the 
API is being

pushed going that direction though:-)

-Sherman

On 10/12/2012 11:39 AM, Michael Schierl wrote:

Hello,

(sorry if the threading is broken, but I was not subscribed to the list
and only found the discussion on Twitter and read it in the mailing list
archive)

Ariel Weisberg wrote on Thu Oct 11 11:30:56 PDT 2012:

I know that ByteBuffers are pain, but I did notice that you can't
specify a source/dest pair when using ByteBuffers and that ByteBuffers
without arrays have to be copied. I don't see a simple safe way to
normalize access to them the way you can if everything is a byte array.

Agreed. One of the advantages of using byte buffers is reducing
allocations, resulting in fewer garbage collections.

In addition, in this implementation the ByteBuffers have to contain the
full data.

What I like about most byte buffers APIs is that I can pass in a
ByteBuffer with incomplete data or maybe an output ByteBuffer that is
too small to hold the complete result, and it will just process as much
as it can, and leave the rest for the next round (which should work well
for Base64, too, as it always processes chunks of 3 or 4 bytes).

So, a useful ByteBuffer API in my opinion needs a method like

public boolean encode(ByteBuffer in, ByteBuffer out,
   boolean endOfInput);

public boolean decode(ByteBuffer in, ByteBuffer out,
   boolean endOfInput);

(similar to CharsetEncoder#encode) that can process partial input and
will return true if all processable input has been processed (i. e. in
has to be refilled) or false if some input could not have been processed
(i. e. out has to be flushed).

Users have to call it again and again until they call it with
endOfInput=true and get true back (Using an enum as result similar to
CoderResult#UNDERFLOW and CoderResult#OVERFLOW might be another option
if the boolean results are too cryptic).

Having a ByteBuffer Base64 API might be useful (although I'm not sure
yet if I ever need it), but as it is now, it is mostly useless for
serious ByteBuffer usage, as if I have to split and copy the data
manually anyway, I can as well use the array APIs.


Just my 0.02 EUR,


Michael

Re: Review/comment needed for the new public java.util.Base64 class

2012-10-12 Thread Michael Schierl

Hello,

(sorry if the threading is broken, but I was not subscribed to the list
and only found the discussion on Twitter and read it in the mailing list
archive)

Ariel Weisberg wrote on Thu Oct 11 11:30:56 PDT 2012:
> I know that ByteBuffers are pain, but I did notice that you can't
> specify a source/dest pair when using ByteBuffers and that ByteBuffers
> without arrays have to be copied. I don't see a simple safe way to
> normalize access to them the way you can if everything is a byte array.

Agreed. One of the advantages of using byte buffers is reducing
allocations, resulting in fewer garbage collections.

In addition, in this implementation the ByteBuffers have to contain the
full data.

What I like about most byte buffers APIs is that I can pass in a
ByteBuffer with incomplete data or maybe an output ByteBuffer that is
too small to hold the complete result, and it will just process as much
as it can, and leave the rest for the next round (which should work well
for Base64, too, as it always processes chunks of 3 or 4 bytes).

So, a useful ByteBuffer API in my opinion needs a method like

public boolean encode(ByteBuffer in, ByteBuffer out,
  boolean endOfInput);

public boolean decode(ByteBuffer in, ByteBuffer out,
  boolean endOfInput);

(similar to CharsetEncoder#encode) that can process partial input and
will return true if all processable input has been processed (i. e. in
has to be refilled) or false if some input could not have been processed
(i. e. out has to be flushed).

Users have to call it again and again until they call it with
endOfInput=true and get true back (Using an enum as result similar to
CoderResult#UNDERFLOW and CoderResult#OVERFLOW might be another option
if the boolean results are too cryptic).

Having a ByteBuffer Base64 API might be useful (although I'm not sure
yet if I ever need it), but as it is now, it is mostly useless for
serious ByteBuffer usage, as if I have to split and copy the data
manually anyway, I can as well use the array APIs.


Just my 0.02 EUR,


Michael

Re: Review/comment needed for the new public java.util.Base64 class

2012-10-12 Thread Alan Bateman


On 11/10/2012 19:30, Ariel Weisberg wrote:

:

I know that ByteBuffers are pain, but I did notice that you can't
specify a source/dest pair when using ByteBuffers and that ByteBuffers
without arrays have to be copied. I don't see a simple safe way to
normalize access to them the way you can if everything is a byte array.

I agree, encode/decode methods where the destination is a given 
ByteBuffer would be desirable (and probably more useful than returning a 
new ByteBuffer each time). Byte arrays are so commonly used that it 
probably justifying having both variants as proposed.


-Alan

Re: Review/comment needed for the new public java.util.Base64 class

2012-10-11 Thread Xueming Shen




On 10/11/2012 02:43 AM, Stephen Colebourne wrote:

There are lots of other public base 64 implementations to test/check 
against.
http://commons.apache.org/net/api-3.1/org/apache/commons/net/util/Base64.html 

http://www.cs.berkeley.edu/~jonah/bc/org/bouncycastle/util/encoders/Base64.html 


http://migbase64.sourceforge.net/  (claims to be fast)




I did a quick performance check against migbase64 with the basic base64 
de/encoding

using this non-scientific benchmark.
http://cr.openjdk.java.net/~sherman/4235519/PermBase64.java

Here is the scores
http://cr.openjdk.java.net/~sherman/4235519/scores

It's a tie, j.u.Base64 wins the decoding, but needs some work on 
encoding side.


java -server PermBase64 20 50
 j.u.Base64.encode() : 1390212
  migBase64.encode() : 720517
 j.u.Base64.decode() : 1200642
  migBase64.decode() : 2070015

java -server PermBase64 20 100
 j.u.Base64.encode() : 1239407
  migBase64.encode() : 740404
 j.u.Base64.decode() : 1257092
  migBase64.decode() : 2012910

java -server PermBase64 20 1000
 j.u.Base64.encode() : 1062212
  migBase64.encode() : 657342
 j.u.Base64.decode() : 1133740
  migBase64.decode() : 1930612

java -client PermBase64 20 50
 j.u.Base64.encode() : 961331
  migBase64.encode() : 875635
 j.u.Base64.decode() : 1528790
  migBase64.decode() : 2188473

java -client PermBase64 20 100
 j.u.Base64.encode() : 966453
  migBase64.encode() : 858082
 j.u.Base64.decode() : 1459159
  migBase64.decode() : 2115045

java -client PermBase64 20 1000
 j.u.Base64.encode() : 954401
  migBase64.encode() : 854903
 j.u.Base64.decode() : 1444603
  migBase64.decode() : 2038865

Re: Review/comment needed for the new public java.util.Base64 class

2012-10-11 Thread Ariel Weisberg

Hi,

It looks like it tackles the issue of encoding binary data as text in an
isolated fashion and doesn't seem open to adding other ways of encoding
binary data as text such as hex or yEnc. Having a common interface in
java.util for different encodings would be great. I am not asking for
more implementation, just a more abstract interface for Decoder and
Encoder.

I know that ByteBuffers are pain, but I did notice that you can't
specify a source/dest pair when using ByteBuffers and that ByteBuffers
without arrays have to be copied. I don't see a simple safe way to
normalize access to them the way you can if everything is a byte array.

Thanks,
Ariel

On Thu, Oct 11, 2012, at 12:40 PM, Xueming Shen wrote:
> On 10/11/2012 02:43 AM, Stephen Colebourne wrote:
> > The class level javadoc is quite short. It also has hyperlinks in the
> > first sentence, which means that they are visible in package level
> > javadoc. Consider having no hyperlinks in the first sentence, and
> > expanding a little on what base 64 is.
> >
> 
> Sure, will come up something longer.
> > There are lots of other public base 64 implementations to test/check 
> > against.
> > http://commons.apache.org/net/api-3.1/org/apache/commons/net/util/Base64.html
> > http://www.cs.berkeley.edu/~jonah/bc/org/bouncycastle/util/encoders/Base64.html
> > http://migbase64.sourceforge.net/  (claims to be fast)
> 
> I did compare the result against the apache version, the difference 
> appears to
> be the apache (1)append line feeds at the end of the encoded bytes 
> (2)skip the
> padding '=' characters for URL-safe style.  Will try other
> implementations.
> 
> > The arrays are defined inconsistently within the code (3 styles).
> >private Encoder(byte[] base64, byte[] newline, int linemax)
> >byte [] getBytes(ByteBuffer bb)
> >private static final byte toBase64[] =
> > I'd strongly encourage one style be used, and that it is the first of
> > the three above.
> 
> Good catch, the later two are not intentional, the leftover of old code. 
> webrev has
> been updated according.
> 
> Thanks!
> -Sherman
> 
> 
> > Stephen
> >
> >
> > On 10 October 2012 18:54, Xueming Shen  wrote:
> >> A standard/public API for Base64 encoding and decoding has been long
> >> overdue. JDK8  has a JEP [1] for this particular request.
> >>
> >> Here is the draft proposal to add a public Base64 utility class for JDK8.
> >>
> >> http://cr.openjdk.java.net/~sherman/4235519/webrev
> >>
> >> This class basically provides 4 variants of Base64 encoding scheme,
> >>
> >> (1) the default RFC 4648, which uses standard mapping, no line breaks,
> >> (2) the URL-safe RFE 4648, no line breaks, use "-" and "_" to replace "+"
> >> and
> >>  "/" for the mapping
> >> (3) the default MIME style, as in RFC 2045 (and its earlier versions), 
> >> which
> >> uses
> >>  "standard" base64 mapping, a 76 characters per line limit and uses 
> >> crlf
> >> pair
> >>   \r\n for line break.
> >> (4) an extend-able MIME base64, with the char-per-line and the line
> >> separator(s)
> >>   specified by developer.
> >>
> >> The encoder/decoder now provides encoding and decoding for byte[], String,
> >> ByteBuffer and a pair of "EncoderInputStream" and "DecoderOutputStrream",
> >> which we believe/hope should satisfy most of the common use cases.
> >> Additional
> >> method(s) can be added if strongly desired.
> >>
> >> We tried couple slightly different styles of design for such this "simple"
> >> utility
> >> class [2].  We kinda concluded that the version proposed probably provides
> >> the best balance among readability, usability and extensibility.
> >>
> >> Please help review and comment on the API and implementation.
> >>
> >> Thanks!
> >> -Sherman
> >>
> >> [1] http://openjdk.java.net/jeps/135
> >> [2] http://cr.openjdk.java.net/~sherman/base64/
>

Re: Review/comment needed for the new public java.util.Base64 class

2012-10-11 Thread Xueming Shen


On 10/11/2012 02:43 AM, Stephen Colebourne wrote:

The class level javadoc is quite short. It also has hyperlinks in the
first sentence, which means that they are visible in package level
javadoc. Consider having no hyperlinks in the first sentence, and
expanding a little on what base 64 is.



Sure, will come up something longer.

There are lots of other public base 64 implementations to test/check against.
http://commons.apache.org/net/api-3.1/org/apache/commons/net/util/Base64.html
http://www.cs.berkeley.edu/~jonah/bc/org/bouncycastle/util/encoders/Base64.html
http://migbase64.sourceforge.net/  (claims to be fast)


I did compare the result against the apache version, the difference 
appears to
be the apache (1)append line feeds at the end of the encoded bytes 
(2)skip the

padding '=' characters for URL-safe style.  Will try other implementations.


The arrays are defined inconsistently within the code (3 styles).
   private Encoder(byte[] base64, byte[] newline, int linemax)
   byte [] getBytes(ByteBuffer bb)
   private static final byte toBase64[] =
I'd strongly encourage one style be used, and that it is the first of
the three above.


Good catch, the later two are not intentional, the leftover of old code. 
webrev has

been updated according.

Thanks!
-Sherman



Stephen


On 10 October 2012 18:54, Xueming Shen  wrote:

A standard/public API for Base64 encoding and decoding has been long
overdue. JDK8  has a JEP [1] for this particular request.

Here is the draft proposal to add a public Base64 utility class for JDK8.

http://cr.openjdk.java.net/~sherman/4235519/webrev

This class basically provides 4 variants of Base64 encoding scheme,

(1) the default RFC 4648, which uses standard mapping, no line breaks,
(2) the URL-safe RFE 4648, no line breaks, use "-" and "_" to replace "+"
and
 "/" for the mapping
(3) the default MIME style, as in RFC 2045 (and its earlier versions), which
uses
 "standard" base64 mapping, a 76 characters per line limit and uses crlf
pair
  \r\n for line break.
(4) an extend-able MIME base64, with the char-per-line and the line
separator(s)
  specified by developer.

The encoder/decoder now provides encoding and decoding for byte[], String,
ByteBuffer and a pair of "EncoderInputStream" and "DecoderOutputStrream",
which we believe/hope should satisfy most of the common use cases.
Additional
method(s) can be added if strongly desired.

We tried couple slightly different styles of design for such this "simple"
utility
class [2].  We kinda concluded that the version proposed probably provides
the best balance among readability, usability and extensibility.

Please help review and comment on the API and implementation.

Thanks!
-Sherman

[1] http://openjdk.java.net/jeps/135
[2] http://cr.openjdk.java.net/~sherman/base64/

Re: Review/comment needed for the new public java.util.Base64 class

2012-10-11 Thread Stephen Colebourne

The class level javadoc is quite short. It also has hyperlinks in the
first sentence, which means that they are visible in package level
javadoc. Consider having no hyperlinks in the first sentence, and
expanding a little on what base 64 is.

There are lots of other public base 64 implementations to test/check against.
http://commons.apache.org/net/api-3.1/org/apache/commons/net/util/Base64.html
http://www.cs.berkeley.edu/~jonah/bc/org/bouncycastle/util/encoders/Base64.html
http://migbase64.sourceforge.net/  (claims to be fast)

The arrays are defined inconsistently within the code (3 styles).
  private Encoder(byte[] base64, byte[] newline, int linemax)
  byte [] getBytes(ByteBuffer bb)
  private static final byte toBase64[] =
I'd strongly encourage one style be used, and that it is the first of
the three above.

Stephen

On 10 October 2012 18:54, Xueming Shen  wrote:
>
> A standard/public API for Base64 encoding and decoding has been long
> overdue. JDK8  has a JEP [1] for this particular request.
>
> Here is the draft proposal to add a public Base64 utility class for JDK8.
>
> http://cr.openjdk.java.net/~sherman/4235519/webrev
>
> This class basically provides 4 variants of Base64 encoding scheme,
>
> (1) the default RFC 4648, which uses standard mapping, no line breaks,
> (2) the URL-safe RFE 4648, no line breaks, use "-" and "_" to replace "+"
> and
> "/" for the mapping
> (3) the default MIME style, as in RFC 2045 (and its earlier versions), which
> uses
> "standard" base64 mapping, a 76 characters per line limit and uses crlf
> pair
>  \r\n for line break.
> (4) an extend-able MIME base64, with the char-per-line and the line
> separator(s)
>  specified by developer.
>
> The encoder/decoder now provides encoding and decoding for byte[], String,
> ByteBuffer and a pair of "EncoderInputStream" and "DecoderOutputStrream",
> which we believe/hope should satisfy most of the common use cases.
> Additional
> method(s) can be added if strongly desired.
>
> We tried couple slightly different styles of design for such this "simple"
> utility
> class [2].  We kinda concluded that the version proposed probably provides
> the best balance among readability, usability and extensibility.
>
> Please help review and comment on the API and implementation.
>
> Thanks!
> -Sherman
>
> [1] http://openjdk.java.net/jeps/135
> [2] http://cr.openjdk.java.net/~sherman/base64/

Re: Review/comment needed for the new public java.util.Base64 class

2012-10-11 Thread Alan Bateman


On 11/10/2012 04:15, Xueming Shen wrote:
There is no plan yet. The sun.misc.BASE64En/Decoder should already 
trigger a compiler
warning for it's a sun private API. @Deprecated annotation might be a 
good fit.


-Sherman
Yes, I think we should deprecate them. The other thing suggested in the 
JEP is that we would also look usages in the JDK, also other 
implementations (as there are many) to see if they can be retired. This 
can of course be done later. It also an opportunity for contributions, 
say for example someone might like to look at java.util.prefs to retire 
the implementation there. Another one that might be able to retire is 
the implementation in the com.sun.net.httpserver.BasicAuthenticator 
class and there are others.


-Alan

Re: Review/comment needed for the new public java.util.Base64 class

2012-10-10 Thread Xueming Shen


On 10/10/12 8:39 PM, Weijun Wang wrote:



On 10/11/2012 11:32 AM, Xueming Shen wrote:

On 10/10/12 8:16 PM, Weijun Wang wrote:



On 10/11/2012 11:09 AM, Xueming Shen wrote:

On 10/10/12 6:51 PM, Weijun Wang wrote:

Several questions:

1. In encode0(byte[] src, byte[] dst)

 281 if (linepos == linemax && (atom != 0 || sp <
sl)) {

Maybe atom != 0 is not necessary?


The logic here is that if we reached the last atom (atom == 0), but if
there
is still byte(s) in src (sp < sl), we will need to output the last
special unit,
which has one or two padding charactere '=', in this case, we still
need to
output the line separator(s).


But when atom != 0, it seems sp < sl should always be true.


Yes, the sp < sl part only counts if atom == 0. Means if it's NOT last
atom,
(atom != 0) output the line feeds, if it IS the last atom (atom == 
0), only
output the line feeds if there are leftover byte (sp < sl), which 
means the

last 4-byte unit (with one or two '=') will be after this line feed.


So, the value of sp < sl is always the same as (atom != 0 || sp < sl). 
That's why I said atom != 0 is not necessary.


You're absolutely correct! Somehow I kept thinking you were saying sp < 
sl is not necessary:-)

not the other way around.

webrev will be updated shortly.

Thanks!
-Sherman




-Max



-Sherman



-Max





2. Is it necessary to explicitly mention in the spec that there is no
CrLf at the end of a MIME encoded string?


I'm struggling with which is the appropriate/desired behavior, output
the crlf for the last line
or not. Apache's common coder appears to append the crlf for the last
line, but our sun.misc
version does not (but sun.misc.BASE64 actually appends the line
separator if the last line
happens to have 76 characters, I would assume this is a bug 
though). The

current implement
tries to match the sun.misc. I'm happy to go either way.

But, as you suggested, it might be worth explicitly describing 
whatever

behavior we choose.



3. The test confirms decoding can correctly reverse the encoding but
it says nothing about the correctness of the encoding. Maybe we can
just use "10.  Test Vectors" of RFC 4648?



I do have a version of TestBase64 actually compares encoded results of
j.u.Base64,
sun.misc.BASE64Encoder and org.apache.commons.codec.binary.Base64. 
Maybe

I should
at least plug in the code for comparing with sun.misc.Base64Encoder.

Thanks!
-Sherman





On 10/11/2012 01:54 AM, Xueming Shen wrote:


A standard/public API for Base64 encoding and decoding has been long
overdue. JDK8  has a JEP [1] for this particular request.

Here is the draft proposal to add a public Base64 utility class for
JDK8.

http://cr.openjdk.java.net/~sherman/4235519/webrev

This class basically provides 4 variants of Base64 encoding scheme,

(1) the default RFC 4648, which uses standard mapping, no line 
breaks,
(2) the URL-safe RFE 4648, no line breaks, use "-" and "_" to 
replace

"+" and
 "/" for the mapping
(3) the default MIME style, as in RFC 2045 (and its earlier 
versions),

which uses
 "standard" base64 mapping, a 76 characters per line limit and
uses
crlf pair
  \r\n for line break.
(4) an extend-able MIME base64, with the char-per-line and the line
separator(s)
  specified by developer.

The encoder/decoder now provides encoding and decoding for byte[],
String,
ByteBuffer and a pair of "EncoderInputStream" and
"DecoderOutputStrream",
which we believe/hope should satisfy most of the common use cases.
Additional
method(s) can be added if strongly desired.

We tried couple slightly different styles of design for such this
"simple" utility
class [2].  We kinda concluded that the version proposed probably
provides
the best balance among readability, usability and extensibility.

Please help review and comment on the API and implementation.

Thanks!
-Sherman

[1] http://openjdk.java.net/jeps/135
[2] http://cr.openjdk.java.net/~sherman/base64/

Re: Review/comment needed for the new public java.util.Base64 class

2012-10-10 Thread Weijun Wang




On 10/11/2012 11:32 AM, Xueming Shen wrote:

On 10/10/12 8:16 PM, Weijun Wang wrote:



On 10/11/2012 11:09 AM, Xueming Shen wrote:

On 10/10/12 6:51 PM, Weijun Wang wrote:

Several questions:

1. In encode0(byte[] src, byte[] dst)

 281 if (linepos == linemax && (atom != 0 || sp <
sl)) {

Maybe atom != 0 is not necessary?


The logic here is that if we reached the last atom (atom == 0), but if
there
is still byte(s) in src (sp < sl), we will need to output the last
special unit,
which has one or two padding charactere '=', in this case, we still
need to
output the line separator(s).


But when atom != 0, it seems sp < sl should always be true.


Yes, the sp < sl part only counts if atom == 0. Means if it's NOT last
atom,
(atom != 0) output the line feeds, if it IS the last atom (atom == 0), only
output the line feeds if there are leftover byte (sp < sl), which means the
last 4-byte unit (with one or two '=') will be after this line feed.


So, the value of sp < sl is always the same as (atom != 0 || sp < sl). 
That's why I said atom != 0 is not necessary.


-Max



-Sherman



-Max





2. Is it necessary to explicitly mention in the spec that there is no
CrLf at the end of a MIME encoded string?


I'm struggling with which is the appropriate/desired behavior, output
the crlf for the last line
or not. Apache's common coder appears to append the crlf for the last
line, but our sun.misc
version does not (but sun.misc.BASE64 actually appends the line
separator if the last line
happens to have 76 characters, I would assume this is a bug though). The
current implement
tries to match the sun.misc. I'm happy to go either way.

But, as you suggested, it might be worth explicitly describing whatever
behavior we choose.



3. The test confirms decoding can correctly reverse the encoding but
it says nothing about the correctness of the encoding. Maybe we can
just use "10.  Test Vectors" of RFC 4648?



I do have a version of TestBase64 actually compares encoded results of
j.u.Base64,
sun.misc.BASE64Encoder and org.apache.commons.codec.binary.Base64. Maybe
I should
at least plug in the code for comparing with sun.misc.Base64Encoder.

Thanks!
-Sherman





On 10/11/2012 01:54 AM, Xueming Shen wrote:


A standard/public API for Base64 encoding and decoding has been long
overdue. JDK8  has a JEP [1] for this particular request.

Here is the draft proposal to add a public Base64 utility class for
JDK8.

http://cr.openjdk.java.net/~sherman/4235519/webrev

This class basically provides 4 variants of Base64 encoding scheme,

(1) the default RFC 4648, which uses standard mapping, no line breaks,
(2) the URL-safe RFE 4648, no line breaks, use "-" and "_" to replace
"+" and
 "/" for the mapping
(3) the default MIME style, as in RFC 2045 (and its earlier versions),
which uses
 "standard" base64 mapping, a 76 characters per line limit and
uses
crlf pair
  \r\n for line break.
(4) an extend-able MIME base64, with the char-per-line and the line
separator(s)
  specified by developer.

The encoder/decoder now provides encoding and decoding for byte[],
String,
ByteBuffer and a pair of "EncoderInputStream" and
"DecoderOutputStrream",
which we believe/hope should satisfy most of the common use cases.
Additional
method(s) can be added if strongly desired.

We tried couple slightly different styles of design for such this
"simple" utility
class [2].  We kinda concluded that the version proposed probably
provides
the best balance among readability, usability and extensibility.

Please help review and comment on the API and implementation.

Thanks!
-Sherman

[1] http://openjdk.java.net/jeps/135
[2] http://cr.openjdk.java.net/~sherman/base64/

Re: Review/comment needed for the new public java.util.Base64 class

2012-10-10 Thread Xueming Shen


On 10/10/12 8:16 PM, Weijun Wang wrote:



On 10/11/2012 11:09 AM, Xueming Shen wrote:

On 10/10/12 6:51 PM, Weijun Wang wrote:

Several questions:

1. In encode0(byte[] src, byte[] dst)

 281 if (linepos == linemax && (atom != 0 || sp < 
sl)) {


Maybe atom != 0 is not necessary?


The logic here is that if we reached the last atom (atom == 0), but if
there
is still byte(s) in src (sp < sl), we will need to output the last
special unit,
which has one or two padding charactere '=', in this case, we still 
need to

output the line separator(s).


But when atom != 0, it seems sp < sl should always be true.


Yes, the sp < sl part only counts if atom == 0. Means if it's NOT last atom,
(atom != 0) output the line feeds, if it IS the last atom (atom == 0), only
output the line feeds if there are leftover byte (sp < sl), which means the
last 4-byte unit (with one or two '=') will be after this line feed.

-Sherman



-Max





2. Is it necessary to explicitly mention in the spec that there is no
CrLf at the end of a MIME encoded string?


I'm struggling with which is the appropriate/desired behavior, output
the crlf for the last line
or not. Apache's common coder appears to append the crlf for the last
line, but our sun.misc
version does not (but sun.misc.BASE64 actually appends the line
separator if the last line
happens to have 76 characters, I would assume this is a bug though). The
current implement
tries to match the sun.misc. I'm happy to go either way.

But, as you suggested, it might be worth explicitly describing whatever
behavior we choose.



3. The test confirms decoding can correctly reverse the encoding but
it says nothing about the correctness of the encoding. Maybe we can
just use "10.  Test Vectors" of RFC 4648?



I do have a version of TestBase64 actually compares encoded results of
j.u.Base64,
sun.misc.BASE64Encoder and org.apache.commons.codec.binary.Base64. Maybe
I should
at least plug in the code for comparing with sun.misc.Base64Encoder.

Thanks!
-Sherman





On 10/11/2012 01:54 AM, Xueming Shen wrote:


A standard/public API for Base64 encoding and decoding has been long
overdue. JDK8  has a JEP [1] for this particular request.

Here is the draft proposal to add a public Base64 utility class for
JDK8.

http://cr.openjdk.java.net/~sherman/4235519/webrev

This class basically provides 4 variants of Base64 encoding scheme,

(1) the default RFC 4648, which uses standard mapping, no line breaks,
(2) the URL-safe RFE 4648, no line breaks, use "-" and "_" to replace
"+" and
 "/" for the mapping
(3) the default MIME style, as in RFC 2045 (and its earlier versions),
which uses
 "standard" base64 mapping, a 76 characters per line limit and 
uses

crlf pair
  \r\n for line break.
(4) an extend-able MIME base64, with the char-per-line and the line
separator(s)
  specified by developer.

The encoder/decoder now provides encoding and decoding for byte[],
String,
ByteBuffer and a pair of "EncoderInputStream" and
"DecoderOutputStrream",
which we believe/hope should satisfy most of the common use cases.
Additional
method(s) can be added if strongly desired.

We tried couple slightly different styles of design for such this
"simple" utility
class [2].  We kinda concluded that the version proposed probably
provides
the best balance among readability, usability and extensibility.

Please help review and comment on the API and implementation.

Thanks!
-Sherman

[1] http://openjdk.java.net/jeps/135
[2] http://cr.openjdk.java.net/~sherman/base64/

Re: Review/comment needed for the new public java.util.Base64 class

2012-10-10 Thread Weijun Wang




On 10/11/2012 11:09 AM, Xueming Shen wrote:

On 10/10/12 6:51 PM, Weijun Wang wrote:

Several questions:

1. In encode0(byte[] src, byte[] dst)

 281 if (linepos == linemax && (atom != 0 || sp < sl)) {

Maybe atom != 0 is not necessary?


The logic here is that if we reached the last atom (atom == 0), but if
there
is still byte(s) in src (sp < sl), we will need to output the last
special unit,
which has one or two padding charactere '=', in this case, we still need to
output the line separator(s).


But when atom != 0, it seems sp < sl should always be true.

-Max





2. Is it necessary to explicitly mention in the spec that there is no
CrLf at the end of a MIME encoded string?


I'm struggling with which is the appropriate/desired behavior, output
the crlf for the last line
or not. Apache's common coder appears to append the crlf for the last
line, but our sun.misc
version does not (but sun.misc.BASE64 actually appends the line
separator if the last line
happens to have 76 characters, I would assume this is a bug though). The
current implement
tries to match the sun.misc. I'm happy to go either way.

But, as you suggested, it might be worth explicitly describing whatever
behavior we choose.



3. The test confirms decoding can correctly reverse the encoding but
it says nothing about the correctness of the encoding. Maybe we can
just use "10.  Test Vectors" of RFC 4648?



I do have a version of TestBase64 actually compares encoded results of
j.u.Base64,
sun.misc.BASE64Encoder and org.apache.commons.codec.binary.Base64. Maybe
I should
at least plug in the code for comparing with sun.misc.Base64Encoder.

Thanks!
-Sherman





On 10/11/2012 01:54 AM, Xueming Shen wrote:


A standard/public API for Base64 encoding and decoding has been long
overdue. JDK8  has a JEP [1] for this particular request.

Here is the draft proposal to add a public Base64 utility class for
JDK8.

http://cr.openjdk.java.net/~sherman/4235519/webrev

This class basically provides 4 variants of Base64 encoding scheme,

(1) the default RFC 4648, which uses standard mapping, no line breaks,
(2) the URL-safe RFE 4648, no line breaks, use "-" and "_" to replace
"+" and
 "/" for the mapping
(3) the default MIME style, as in RFC 2045 (and its earlier versions),
which uses
 "standard" base64 mapping, a 76 characters per line limit and uses
crlf pair
  \r\n for line break.
(4) an extend-able MIME base64, with the char-per-line and the line
separator(s)
  specified by developer.

The encoder/decoder now provides encoding and decoding for byte[],
String,
ByteBuffer and a pair of "EncoderInputStream" and
"DecoderOutputStrream",
which we believe/hope should satisfy most of the common use cases.
Additional
method(s) can be added if strongly desired.

We tried couple slightly different styles of design for such this
"simple" utility
class [2].  We kinda concluded that the version proposed probably
provides
the best balance among readability, usability and extensibility.

Please help review and comment on the API and implementation.

Thanks!
-Sherman

[1] http://openjdk.java.net/jeps/135
[2] http://cr.openjdk.java.net/~sherman/base64/

Re: Review/comment needed for the new public java.util.Base64 class

2012-10-10 Thread Xueming Shen

There is no plan yet. The sun.misc.BASE64En/Decoder should already 
trigger a compiler
warning for it's a sun private API. @Deprecated annotation might be a 
good fit.


-Sherman

On 10/10/12 1:40 PM, Joe Darcy wrote:

Hello,

On 10/10/2012 1:03 PM, Iris Clark wrote:

Hi, Sherman.

I'm glad to see this coming in.  As you said, long overdue.

I'm curious.  What are the plans are to encourage migration from the 
JDK private and unsupported sun.misc.BASE64{En,DE}coder classes?  
Compile-time warning?  Documentation?  Something else?


On that point, sun.misc.BASE64{En,DE}coder may present a fine 
opportunity to use the @Deprecated annotation...


-Joe



Thanks,
iris

-Original Message-
From: Xueming Shen
Sent: Wednesday, October 10, 2012 10:55 AM
To: core-libs-dev
Subject: Review/comment needed for the new public java.util.Base64 class


A standard/public API for Base64 encoding and decoding has been long 
overdue. JDK8  has a JEP [1] for this particular request.


Here is the draft proposal to add a public Base64 utility class for 
JDK8.


http://cr.openjdk.java.net/~sherman/4235519/webrev

This class basically provides 4 variants of Base64 encoding scheme,

(1) the default RFC 4648, which uses standard mapping, no line breaks,
(2) the URL-safe RFE 4648, no line breaks, use "-" and "_" to replace 
"+" and

  "/" for the mapping
(3) the default MIME style, as in RFC 2045 (and its earlier 
versions), which uses
  "standard" base64 mapping, a 76 characters per line limit and 
uses crlf pair

   \r\n for line break.
(4) an extend-able MIME base64, with the char-per-line and the line
separator(s)
   specified by developer.

The encoder/decoder now provides encoding and decoding for byte[], 
String, ByteBuffer and a pair of "EncoderInputStream" and 
"DecoderOutputStrream", which we believe/hope should satisfy most of 
the common use cases.

Additional
method(s) can be added if strongly desired.

We tried couple slightly different styles of design for such this 
"simple" utility class [2].  We kinda concluded that the version 
proposed probably provides the best balance among readability, 
usability and extensibility.


Please help review and comment on the API and implementation.

Thanks!
-Sherman

[1] http://openjdk.java.net/jeps/135
[2] http://cr.openjdk.java.net/~sherman/base64/

Re: Review/comment needed for the new public java.util.Base64 class

2012-10-10 Thread Xueming Shen


On 10/10/12 6:51 PM, Weijun Wang wrote:

Several questions:

1. In encode0(byte[] src, byte[] dst)

 281 if (linepos == linemax && (atom != 0 || sp < sl)) {

Maybe atom != 0 is not necessary?


The logic here is that if we reached the last atom (atom == 0), but if there
is still byte(s) in src (sp < sl), we will need to output the last 
special unit,

which has one or two padding charactere '=', in this case, we still need to
output the line separator(s).



2. Is it necessary to explicitly mention in the spec that there is no 
CrLf at the end of a MIME encoded string?


I'm struggling with which is the appropriate/desired behavior, output 
the crlf for the last line
or not. Apache's common coder appears to append the crlf for the last 
line, but our sun.misc
version does not (but sun.misc.BASE64 actually appends the line 
separator if the last line
happens to have 76 characters, I would assume this is a bug though). The 
current implement

tries to match the sun.misc. I'm happy to go either way.

But, as you suggested, it might be worth explicitly describing whatever 
behavior we choose.




3. The test confirms decoding can correctly reverse the encoding but 
it says nothing about the correctness of the encoding. Maybe we can 
just use "10.  Test Vectors" of RFC 4648?




I do have a version of TestBase64 actually compares encoded results of 
j.u.Base64,
sun.misc.BASE64Encoder and org.apache.commons.codec.binary.Base64. Maybe 
I should

at least plug in the code for comparing with sun.misc.Base64Encoder.

Thanks!
-Sherman





On 10/11/2012 01:54 AM, Xueming Shen wrote:


A standard/public API for Base64 encoding and decoding has been long
overdue. JDK8  has a JEP [1] for this particular request.

Here is the draft proposal to add a public Base64 utility class for 
JDK8.


http://cr.openjdk.java.net/~sherman/4235519/webrev

This class basically provides 4 variants of Base64 encoding scheme,

(1) the default RFC 4648, which uses standard mapping, no line breaks,
(2) the URL-safe RFE 4648, no line breaks, use "-" and "_" to replace
"+" and
 "/" for the mapping
(3) the default MIME style, as in RFC 2045 (and its earlier versions),
which uses
 "standard" base64 mapping, a 76 characters per line limit and uses
crlf pair
  \r\n for line break.
(4) an extend-able MIME base64, with the char-per-line and the line
separator(s)
  specified by developer.

The encoder/decoder now provides encoding and decoding for byte[], 
String,
ByteBuffer and a pair of "EncoderInputStream" and 
"DecoderOutputStrream",

which we believe/hope should satisfy most of the common use cases.
Additional
method(s) can be added if strongly desired.

We tried couple slightly different styles of design for such this
"simple" utility
class [2].  We kinda concluded that the version proposed probably 
provides

the best balance among readability, usability and extensibility.

Please help review and comment on the API and implementation.

Thanks!
-Sherman

[1] http://openjdk.java.net/jeps/135
[2] http://cr.openjdk.java.net/~sherman/base64/

Re: Review/comment needed for the new public java.util.Base64 class

2012-10-10 Thread Weijun Wang


Several questions:

1. In encode0(byte[] src, byte[] dst)

 281 if (linepos == linemax && (atom != 0 || sp < sl)) {

Maybe atom != 0 is not necessary?

2. Is it necessary to explicitly mention in the spec that there is no 
CrLf at the end of a MIME encoded string?


3. The test confirms decoding can correctly reverse the encoding but it 
says nothing about the correctness of the encoding. Maybe we can just 
use "10.  Test Vectors" of RFC 4648?


Thanks
Max

On 10/11/2012 01:54 AM, Xueming Shen wrote:


A standard/public API for Base64 encoding and decoding has been long
overdue. JDK8  has a JEP [1] for this particular request.

Here is the draft proposal to add a public Base64 utility class for JDK8.

http://cr.openjdk.java.net/~sherman/4235519/webrev

This class basically provides 4 variants of Base64 encoding scheme,

(1) the default RFC 4648, which uses standard mapping, no line breaks,
(2) the URL-safe RFE 4648, no line breaks, use "-" and "_" to replace
"+" and
 "/" for the mapping
(3) the default MIME style, as in RFC 2045 (and its earlier versions),
which uses
 "standard" base64 mapping, a 76 characters per line limit and uses
crlf pair
  \r\n for line break.
(4) an extend-able MIME base64, with the char-per-line and the line
separator(s)
  specified by developer.

The encoder/decoder now provides encoding and decoding for byte[], String,
ByteBuffer and a pair of "EncoderInputStream" and "DecoderOutputStrream",
which we believe/hope should satisfy most of the common use cases.
Additional
method(s) can be added if strongly desired.

We tried couple slightly different styles of design for such this
"simple" utility
class [2].  We kinda concluded that the version proposed probably provides
the best balance among readability, usability and extensibility.

Please help review and comment on the API and implementation.

Thanks!
-Sherman

[1] http://openjdk.java.net/jeps/135
[2] http://cr.openjdk.java.net/~sherman/base64/

Re: Review/comment needed for the new public java.util.Base64 class

2012-10-10 Thread Joe Darcy


Hello,

On 10/10/2012 1:03 PM, Iris Clark wrote:

Hi, Sherman.

I'm glad to see this coming in.  As you said, long overdue.

I'm curious.  What are the plans are to encourage migration from the JDK 
private and unsupported sun.misc.BASE64{En,DE}coder classes?  Compile-time 
warning?  Documentation?  Something else?


On that point, sun.misc.BASE64{En,DE}coder may present a fine 
opportunity to use the @Deprecated annotation...


-Joe



Thanks,
iris

-Original Message-
From: Xueming Shen
Sent: Wednesday, October 10, 2012 10:55 AM
To: core-libs-dev
Subject: Review/comment needed for the new public java.util.Base64 class


A standard/public API for Base64 encoding and decoding has been long overdue. 
JDK8  has a JEP [1] for this particular request.

Here is the draft proposal to add a public Base64 utility class for JDK8.

http://cr.openjdk.java.net/~sherman/4235519/webrev

This class basically provides 4 variants of Base64 encoding scheme,

(1) the default RFC 4648, which uses standard mapping, no line breaks,
(2) the URL-safe RFE 4648, no line breaks, use "-" and "_" to replace "+" and
  "/" for the mapping
(3) the default MIME style, as in RFC 2045 (and its earlier versions), which 
uses
  "standard" base64 mapping, a 76 characters per line limit and uses crlf 
pair
   \r\n for line break.
(4) an extend-able MIME base64, with the char-per-line and the line
separator(s)
   specified by developer.

The encoder/decoder now provides encoding and decoding for byte[], String, ByteBuffer and a pair of 
"EncoderInputStream" and "DecoderOutputStrream", which we believe/hope should 
satisfy most of the common use cases.
Additional
method(s) can be added if strongly desired.

We tried couple slightly different styles of design for such this "simple" 
utility class [2].  We kinda concluded that the version proposed probably provides the 
best balance among readability, usability and extensibility.

Please help review and comment on the API and implementation.

Thanks!
-Sherman

[1] http://openjdk.java.net/jeps/135
[2] http://cr.openjdk.java.net/~sherman/base64/

RE: Review/comment needed for the new public java.util.Base64 class

2012-10-10 Thread Iris Clark

Hi, Sherman.

I'm glad to see this coming in.  As you said, long overdue.

I'm curious.  What are the plans are to encourage migration from the JDK 
private and unsupported sun.misc.BASE64{En,DE}coder classes?  Compile-time 
warning?  Documentation?  Something else?

Thanks,
iris

-Original Message-
From: Xueming Shen 
Sent: Wednesday, October 10, 2012 10:55 AM
To: core-libs-dev
Subject: Review/comment needed for the new public java.util.Base64 class


A standard/public API for Base64 encoding and decoding has been long overdue. 
JDK8  has a JEP [1] for this particular request.

Here is the draft proposal to add a public Base64 utility class for JDK8.

http://cr.openjdk.java.net/~sherman/4235519/webrev

This class basically provides 4 variants of Base64 encoding scheme,

(1) the default RFC 4648, which uses standard mapping, no line breaks,
(2) the URL-safe RFE 4648, no line breaks, use "-" and "_" to replace "+" and
 "/" for the mapping
(3) the default MIME style, as in RFC 2045 (and its earlier versions), which 
uses
 "standard" base64 mapping, a 76 characters per line limit and uses crlf 
pair
  \r\n for line break.
(4) an extend-able MIME base64, with the char-per-line and the line
separator(s)
  specified by developer.

The encoder/decoder now provides encoding and decoding for byte[], String, 
ByteBuffer and a pair of "EncoderInputStream" and "DecoderOutputStrream", which 
we believe/hope should satisfy most of the common use cases. 
Additional
method(s) can be added if strongly desired.

We tried couple slightly different styles of design for such this "simple" 
utility class [2].  We kinda concluded that the version proposed probably 
provides the best balance among readability, usability and extensibility.

Please help review and comment on the API and implementation.

Thanks!
-Sherman

[1] http://openjdk.java.net/jeps/135
[2] http://cr.openjdk.java.net/~sherman/base64/

Review/comment needed for the new public java.util.Base64 class

2012-10-10 Thread Xueming Shen



A standard/public API for Base64 encoding and decoding has been long
overdue. JDK8  has a JEP [1] for this particular request.

Here is the draft proposal to add a public Base64 utility class for JDK8.

http://cr.openjdk.java.net/~sherman/4235519/webrev

This class basically provides 4 variants of Base64 encoding scheme,

(1) the default RFC 4648, which uses standard mapping, no line breaks,
(2) the URL-safe RFE 4648, no line breaks, use "-" and "_" to replace 
"+" and

"/" for the mapping
(3) the default MIME style, as in RFC 2045 (and its earlier versions), 
which uses
"standard" base64 mapping, a 76 characters per line limit and uses 
crlf pair

 \r\n for line break.
(4) an extend-able MIME base64, with the char-per-line and the line 
separator(s)

 specified by developer.

The encoder/decoder now provides encoding and decoding for byte[], String,
ByteBuffer and a pair of "EncoderInputStream" and "DecoderOutputStrream",
which we believe/hope should satisfy most of the common use cases. 
Additional

method(s) can be added if strongly desired.

We tried couple slightly different styles of design for such this 
"simple" utility

class [2].  We kinda concluded that the version proposed probably provides
the best balance among readability, usability and extensibility.

Please help review and comment on the API and implementation.

Thanks!
-Sherman

[1] http://openjdk.java.net/jeps/135
[2] http://cr.openjdk.java.net/~sherman/base64/

55 matches

Mail list logo