[jira] [Commented] (CODEC-166) Base64 could be faster

2013-04-29 Thread Jochen Wiedmann (JIRA)

[ 
https://issues.apache.org/jira/browse/CODEC-166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13644455#comment-13644455
 ] 

Jochen Wiedmann commented on CODEC-166:
---

I consider this a non-issue. If anyone wants a performant implementation, it's 
reasonable to expect him or her to use the streaming API. And if that is quick, 
where is the issue?


 Base64 could be faster
 --

 Key: CODEC-166
 URL: https://issues.apache.org/jira/browse/CODEC-166
 Project: Commons Codec
  Issue Type: Bug
Affects Versions: 1.7
Reporter: Julius Davies
Assignee: Julius Davies
 Fix For: 2.0

 Attachments: base64bench.zip, CODEC-166.patch, CODEC-166_speed.patch


 Our Base64 is consistently about 3 times slower than MiGBase64 and iHarder 
 in the byte[] and String encode() methods.
 We are pretty good on decode(), though still approximately 33% slower than 
 MiGBase64.
 We always win in the Streaming methods (MiGBase64 doesn't do streaming).  
 Yay!  :-) :-) :-)
 I put together a benchmark.  Here's a typical run:
 {noformat}
   LARGE DATA new byte[12345]
 iHarder...
 encode 486.0 MB/s    decode 158.0 MB/s
 encode 491.0 MB/s    decode 148.0 MB/s
 MiGBase64...
 encode 499.0 MB/s    decode 222.0 MB/s
 encode 493.0 MB/s    decode 226.0 MB/s
 Apache Commons Codec...
 encode 142.0 MB/s    decode 146.0 MB/s
 encode 138.0 MB/s    decode 150.0 MB/s
 {noformat}
 I believe the main approach we can take to improve performance is to avoid 
 array copies at all costs. MiGBase64 even counts the number of valid Base64 
 characters ahead of time on decode() to precalculate the result's size and 
 avoid any array copying!
 I suspect this will mean writing separate execution paths for the String 
 and byte[] methods, and keeping them out of the streaming logic, since the 
 streaming logic is built around array copies.
 Unfortunately this diminishes internal reuse of the streaming 
 implementation, but I think it's the only way to improve performance, if we 
 want to.
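As a rough sketch of the size-precalculation idea described above (illustrative only, not MiGBase64's actual code; the method name and alphabet check are made up for this example):

{noformat}
// Illustrative only: scan once to count real Base64 characters, so the
// output array can be allocated at its exact final size and no copy is
// needed afterwards.
static int decodedLength(byte[] encoded) {
    int dataChars = 0;
    for (int i = 0; i < encoded.length; i++) {
        byte b = encoded[i];
        boolean inAlphabet = (b >= 'A' && b <= 'Z') || (b >= 'a' && b <= 'z')
                || (b >= '0' && b <= '9') || b == '+' || b == '/';
        if (inAlphabet) {
            dataChars++;              // CR, LF, '=' and other separators are skipped
        }
    }
    return (dataChars * 6) / 8;       // each Base64 character carries 6 bits of payload
}
{noformat}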

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CODEC-166) Base64 could be faster

2013-04-28 Thread Emmanuel Bourg (JIRA)

[ 
https://issues.apache.org/jira/browse/CODEC-166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13644118#comment-13644118
 ] 

Emmanuel Bourg commented on CODEC-166:
--

An observation regarding the benchmark: each implementation should be tested 
independently, in a separate run. When benchmarking the performance of Commons 
CSV I noticed that I got very different figures depending on the execution 
order of the tests; the only reliable way to measure the performance was to run 
the tests independently. The performance was also quite different if I ran just 
the benchmark versus something else first (like the unit tests) and then the 
benchmark. And make sure you are running the server VM!



[jira] [Commented] (CODEC-166) Base64 could be faster

2013-04-28 Thread Thomas Neidhart (JIRA)

[ 
https://issues.apache.org/jira/browse/CODEC-166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13644128#comment-13644128
 ] 

Thomas Neidhart commented on CODEC-166:
---

I have experimented with caliper lately (see http://code.google.com/p/caliper/), and 
it looks quite promising.
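For what it's worth, here is a rough sketch of what such a caliper benchmark could look like, assuming the 0.5-era SimpleBenchmark API; the class name, parameters, and data sizes below are illustrative and not taken from the attached benchmark:

{noformat}
import com.google.caliper.Param;
import com.google.caliper.SimpleBenchmark;
import org.apache.commons.codec.binary.Base64;

public class Base64CaliperBenchmark extends SimpleBenchmark {

    @Param({"12", "1234", "12345"}) private int size;   // mirrors the TINY/MEDIUM/LARGE cases above

    private byte[] data;

    @Override protected void setUp() {
        data = new byte[size];
        new java.util.Random(0).nextBytes(data);        // fixed seed for repeatable input
    }

    // caliper drives timeXxx(int reps) itself and takes care of warm-up and measurement.
    public long timeEncode(int reps) {
        long sink = 0;
        for (int i = 0; i < reps; i++) {
            byte[] encoded = Base64.encodeBase64(data);
            sink += encoded[i % encoded.length];        // keep the result live
        }
        return sink;                                    // returning the sink also defeats dead-code elimination
    }

    public long timeDecode(int reps) {
        byte[] encoded = Base64.encodeBase64(data);
        long sink = 0;
        for (int i = 0; i < reps; i++) {
            byte[] decoded = Base64.decodeBase64(encoded);
            sink += decoded[i % decoded.length];
        }
        return sink;
    }
}
{noformat}

Running each implementation as its own benchmark class would also address the isolation concern raised above.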



[jira] [Commented] (CODEC-166) Base64 could be faster

2013-02-20 Thread Thomas Neidhart (JIRA)

[ 
https://issues.apache.org/jira/browse/CODEC-166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582444#comment-13582444
 ] 

Thomas Neidhart commented on CODEC-166:
---

Hi Julius,

The only change I made was the one described on the mailing list: use (part of) the 
result in some kind of calculation to prevent it from being optimized away:

{noformat}
long d = 0;
long start = System.currentTimeMillis();
for (int i = 0; i < FACTOR * REPS; i++) {
    encoded = Base64.encodeBase64(data);
    d += encoded[i % encoded.length];   // fold part of the result into d so the JIT cannot discard the encode
}
printEncodeStat(start, data, d);
{noformat}

The value of d is then passed to the print method and written out. The same change 
was, of course, applied to every occurrence of this pattern.
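For context, here is a hypothetical sketch of what such a print helper could look like; the real printEncodeStat(...) in base64bench.zip may differ, and FACTOR and REPS are assumed to be the benchmark's repetition constants:

{noformat}
// Hypothetical sketch only -- not necessarily the helper used in base64bench.zip.
private static void printEncodeStat(long startMillis, byte[] data, long d) {
    long elapsedMillis = System.currentTimeMillis() - startMillis;
    double totalMegabytes = ((double) data.length * FACTOR * REPS) / (1024.0 * 1024.0);
    double mbPerSecond = totalMegabytes / (elapsedMillis / 1000.0);
    // Printing d makes the folded result observable, so the encode loop cannot be dropped by the JIT.
    System.out.println("encode " + Math.round(mbPerSecond) + ".0 MB/s (d=" + d + ")");
}
{noformat}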



[jira] [Commented] (CODEC-166) Base64 could be faster

2013-02-19 Thread Thomas Neidhart (JIRA)

[ 
https://issues.apache.org/jira/browse/CODEC-166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13581308#comment-13581308
 ] 

Thomas Neidhart commented on CODEC-166:
---

With the latest patch I get these figures:

{noformat}
 MEDIUM DATA new byte[1234]

iHarder...
encode 237.0 MB/s 104 per run=0.0052    decode 83.0 MB/s
encode 237.0 MB/s 104 per run=0.0052    decode 83.0 MB/s

MiGBase64...
encode 320.0 MB/s 77 per run=0.00385    decode 132.0 MB/s
encode 316.0 MB/s 78 per run=0.0039    decode 132.0 MB/s

Apache Commons Codec...
encode 184.0 MB/s 134 per run=0.0067    decode 85.0 MB/s
encode 185.0 MB/s 133 per run=0.0066505    decode 85.0 MB/s

  LARGE DATA new byte[12345]

iHarder...
encode 235.0 MB/s 525 per run=0.0525    decode 83.0 MB/s
encode 234.0 MB/s 526 per run=0.0526    decode 83.0 MB/s

MiGBase64...
encode 314.0 MB/s 393 per run=0.039295    decode 134.0 MB/s
encode 314.0 MB/s 392 per run=0.039206    decode 134.0 MB/s

Apache Commons Codec...
encode 187.0 MB/s 660 per run=0.066    decode 90.0 MB/s
encode 187.0 MB/s 659 per run=0.0659    decode 90.0 MB/s
{noformat}



[jira] [Commented] (CODEC-166) Base64 could be faster

2013-02-19 Thread Mikael Grev (JIRA)

[ 
https://issues.apache.org/jira/browse/CODEC-166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13581320#comment-13581320
 ] 

Mikael Grev commented on CODEC-166:
---

Thomas,

I have to ask, why not use the MiGBase64 version that Julius provided?



[jira] [Commented] (CODEC-166) Base64 could be faster

2013-02-19 Thread Thomas Neidhart (JIRA)

[ 
https://issues.apache.org/jira/browse/CODEC-166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13581336#comment-13581336
 ] 

Thomas Neidhart commented on CODEC-166:
---

Well, it may look stupid to you, but if we can improve the speed of our own 
implementation, we do not have to maintain two different versions, since the same 
code is also used for streaming.

In general I am also fine with including your version in a more compact form than 
before, but I wanted to see what's possible with our existing implementation (there 
are still more weird optimizations, like using a char[] for the encoding table, 
which is slightly faster although it requires a cast).
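For illustration, the kind of table lookup and cast being referred to might look roughly like this (a standalone sketch, not the codec's actual table or encoder):

{noformat}
// Standalone sketch only -- not the codec's actual encoding code.
public class CharTableSketch {

    private static final char[] CHAR_TABLE =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/".toCharArray();

    /** Encodes one 3-byte group; note the narrowing (byte) cast after each char[] lookup. */
    static void encodeGroup(byte[] in, int inPos, byte[] out, int outPos) {
        int bits = ((in[inPos] & 0xff) << 16) | ((in[inPos + 1] & 0xff) << 8) | (in[inPos + 2] & 0xff);
        out[outPos]     = (byte) CHAR_TABLE[(bits >>> 18) & 0x3f];
        out[outPos + 1] = (byte) CHAR_TABLE[(bits >>> 12) & 0x3f];
        out[outPos + 2] = (byte) CHAR_TABLE[(bits >>> 6) & 0x3f];
        out[outPos + 3] = (byte) CHAR_TABLE[bits & 0x3f];
    }

    public static void main(String[] args) throws Exception {
        byte[] out = new byte[4];
        encodeGroup("Man".getBytes("US-ASCII"), 0, out, 0);
        System.out.println(new String(out, "US-ASCII"));   // prints TWFu
    }
}
{noformat}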



[jira] [Commented] (CODEC-166) Base64 could be faster

2013-02-19 Thread Mikael Grev (JIRA)

[ 
https://issues.apache.org/jira/browse/CODEC-166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13581341#comment-13581341
 ] 

Mikael Grev commented on CODEC-166:
---

It doesn't look stupid. I was just wondering, since I saw no discussion as to why. 
That discussion has probably happened in private, which is fine. 
Btw, I have not had any bug reports for MiGBase64 since 2004, so it should be 
pretty solid.

Cheers,
Mikael



[jira] [Commented] (CODEC-166) Base64 could be faster

2013-02-19 Thread Thomas Neidhart (JIRA)

[ 
https://issues.apache.org/jira/browse/CODEC-166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13581363#comment-13581363
 ] 

Thomas Neidhart commented on CODEC-166:
---

Discussion normally happens on the mailing list; see here: 
http://markmail.org/message/xvb6nzfdlthzjcnu



[jira] [Commented] (CODEC-166) Base64 could be faster

2013-02-19 Thread Julius Davies (JIRA)

[ 
https://issues.apache.org/jira/browse/CODEC-166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13581528#comment-13581528
 ] 

Julius Davies commented on CODEC-166:
-

Hi, Mikael,

I don't think there has been any discussion in private between Apache Commons 
committers --- at least none that I was involved in. We try to do everything 
in the open. There was some discussion on the mailing list instead of in this 
bug (a reply thread to my original commit, which generates an automatic email 
to the list).

Your comment here in the bug tracker helps, btw. It is always nice to know that 
the original project team supports a fork.


---

Question for Thomas:

Would you mind attaching your micro-benchmark?   Looks like you made some good 
improvements to it!


---

Comment for everyone,

Thomas's patch will help us in any case (e.g., it will improve streaming 
performance if we take my patch; it will improve all performance if we don't).

So let's take Thomas's patch for sure, and continue discussing the MiGBase64 
fork I've prepared.






[jira] [Commented] (CODEC-166) Base64 could be faster

2013-02-19 Thread Julius Davies (JIRA)

[ 
https://issues.apache.org/jira/browse/CODEC-166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13581834#comment-13581834
 ] 

Julius Davies commented on CODEC-166:
-


Here's a before & after run on my machine. I also threw in my patch for a run. 
It doesn't conflict with TN's patch! (Nice!)


{noformat}
Trunk
-
TINY DATA new byte[12]
encode 18.0 MB/s    decode 25.0 MB/s
encode 18.0 MB/s    decode 24.0 MB/s

MEDIUM DATA new byte[1234]
encode 137.0 MB/s    decode 186.0 MB/s
encode 137.0 MB/s    decode 182.0 MB/s
{noformat}



{noformat}
TN Patch
-
TINY DATA new byte[12]
encode 159.0 MB/s    decode 139.0 MB/s
encode 147.0 MB/s    decode 140.0 MB/s

MEDIUM DATA new byte[1234]
encode 311.0 MB/s    decode 212.0 MB/s
encode 299.0 MB/s    decode 221.0 MB/s
{noformat}


{noformat}
TN Patch + ApacheModifiedMiGBase64 Patch
-
TINY DATA new byte[12]
encode 275.0 MB/s    decode 178.0 MB/s
encode 279.0 MB/s    decode 178.0 MB/s

MEDIUM DATA new byte[1234]
encode 553.0 MB/s    decode 261.0 MB/s
encode 558.0 MB/s    decode 263.0 MB/s
{noformat}


I find it kind of weird that my patch consistently runs about 8% faster than 
MiGBase64 on encode.  Doesn't make sense.   I do worse on decode because I 
handle the AA==AA== situation (padding in the middle), and normally MiGBase64 
doesn't check for that.
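For illustration, the kind of inner-padding check being described could look like this (a sketch of the idea only, not the code in either patch):

{noformat}
// Sketch of the idea only -- not the actual decode path of either implementation.
// Treats any '=' that is followed by further data characters (e.g. "AA==AA==")
// as padding in the middle of the input.
static boolean hasInnerPadding(byte[] encoded) {
    int firstPad = -1;
    for (int i = 0; i < encoded.length; i++) {
        if (encoded[i] == '=') {
            if (firstPad < 0) {
                firstPad = i;                 // remember where padding starts
            }
        } else if (encoded[i] != '\r' && encoded[i] != '\n' && firstPad >= 0) {
            return true;                      // a data character after '=' means inner padding
        }
    }
    return false;
}
{noformat}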


And here's an interesting number to keep in mind. We're all somewhat close to 
this (off by a factor of 5 or 10). Surely this is a reasonable (though 
unattainable) upper bound:

{noformat}
Just counting bytes 1-by-1...
encode 2570.0 MB/s    decode 2328.0 MB/s
encode 2243.0 MB/s    decode 2243.0 MB/s
{noformat}






[jira] [Commented] (CODEC-166) Base64 could be faster

2013-01-30 Thread Julius Davies (JIRA)

[ 
https://issues.apache.org/jira/browse/CODEC-166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13567068#comment-13567068
 ] 

Julius Davies commented on CODEC-166:
-

This could also help conform to user expectations.  If we write pure-static 
implementations just for these cases, that would also solve CODEC-165.
