Author: damjan
Date: Thu Oct 16 04:49:30 2014
New Revision: 1632210
URL: http://svn.apache.org/r1632210
Log:
Format some comments better.
Modified:
commons/proper/imaging/trunk/src/main/java/org/apache/commons/imaging/formats/jpeg/decoder/Dct.java
Modified: commons/proper/imaging/trunk/src/main/java/org/apache/commons/imaging/formats/jpeg/decoder/Dct.java
URL: http://svn.apache.org/viewvc/commons/proper/imaging/trunk/src/main/java/org/apache/commons/imaging/formats/jpeg/decoder/Dct.java?rev=1632210&r1=1632209&r2=1632210&view=diff
==============================================================================
--- commons/proper/imaging/trunk/src/main/java/org/apache/commons/imaging/formats/jpeg/decoder/Dct.java (original)
+++ commons/proper/imaging/trunk/src/main/java/org/apache/commons/imaging/formats/jpeg/decoder/Dct.java Thu Oct 16 04:49:30 2014
@@ -22,10 +22,13 @@ final class Dct {
* Here's the cost, exluding modified (de)quantization, for transforming an
* 8x8 block:
*
- * Algorithm Adds Multiplies RightShifts Total Naive 896 1024 0 1920
- * "Symmetries" 448 224 0 672 Vetterli and 464 208 0 672 Ligtenberg Arai,
- * Agui and 464 80 0 544 Nakajima (AA&N) Feig 8x8 462 54 6 522 Fused mul/add
- * 416 (a pipe dream)
+ * Algorithm                       Adds  Multiplies  RightShifts  Total
+ * Naive                            896        1024            0   1920
+ * "Symmetries"                     448         224            0    672
+ * Vetterli and Ligtenberg          464         208            0    672
+ * Arai, Agui and Nakajima (AA&N)   464          80            0    544
+ * Feig 8x8                         462          54            6    522
+ * Fused mul/add (a pipe dream)                                    416
*
* IJG's libjpeg, FFmpeg, and a number of others use AA&N.
*
@@ -33,21 +36,25 @@ final class Dct {
* are reduced from 80 in AA&N to only 54. But in practice:
*
 * Benchmarks, Intel Core i3 @ 2.93 GHz in long mode, 4 GB RAM Time taken to
- * do 100 million IDCTs (less is better): Rene' Stöckel's Feig, int: 45.07
- * seconds My Feig, floating point: 36.252 seconds AA&N, unrolled loops,
- * double[][] -> double[][]: 25.167 seconds
+ * do 100 million IDCTs (less is better):
+ * Rene' Stöckel's Feig, int: 45.07 seconds
+ * My Feig, floating point: 36.252 seconds
+ * AA&N, unrolled loops, double[][] -> double[][]: 25.167 seconds
*
* Clearly Feig is hopeless. I suspect the performance killer is simply the
* weight of the algorithm: massive number of local variables, large code
* size, and lots of random array accesses.
*
- * Also, AA&N can be optimized a lot: AA&N, rolled loops, double[][] ->
- * double[][]: 21.162 seconds AA&N, rolled loops, float[][] -> float[][]: no
- * improvement, but at some stage Hotspot might start doing SIMD, so let's
+ * Also, AA&N can be optimized a lot:
+ * AA&N, rolled loops, double[][] -> double[][]: 21.162 seconds
+ * AA&N, rolled loops, float[][] -> float[][]: no improvement,
+ * but at some stage Hotspot might start doing SIMD, so let's
* use float AA&N, rolled loops, float[] -> float[][]: 19.979 seconds
- * apparently 2D arrays are slow! AA&N, rolled loops, inlined 1D AA&N
- * transform, float[] transformed in-place: 18.5 seconds AA&N, previous
- * version rewritten in C and compiled with "gcc -O3" takes: 8.5 seconds
+ * apparently 2D arrays are slow!
+ * AA&N, rolled loops, inlined 1D AA&N
+ * transform, float[] transformed in-place: 18.5 seconds
+ * AA&N, previous version rewritten in C and compiled with "gcc -O3"
+ * takes: 8.5 seconds
* (probably due to heavy use of SIMD)
*
* Other brave attempts: AA&N, best float version converted to 16:16 fixed