Siarhei Siamashka <siarhei.siamas...@gmail.com> writes:

>> In the case of 2-pass scaling, typically all the pixels from the
>> source image are participating in calculations (unless M >= 2 * N,
>> which is 2x+ downscaling). So the number of horizontal interpolation
>> operations is M * N (each source scanline gets horizontally scaled
>> to size N and cached). The number of vertical interpolation
>> operations is N * N (we are doing it for every destination pixel
>> on the second pass). So the total number of interpolation operations
>> is "M * N + N * N" for 2-pass bilinear scaling.
>>
>> Now we only need to compare "N * N + N * N" with "M * N + N * N".
>
> In fact the comparison turns into "3 * N * N" vs. "(M + N) * N"
> operations, which is clearly more favourable for 2-pass processing
> in plain C for a wide range of scaling factors.
There may be some confusion here regarding "1-pass" and "2-pass". The
algorithm I'm advocating makes only one pass, but keeps a cache of two
horizontally expanded scanlines. For big downscalings, this does at
most 2 * N horizontal interpolations, since in the worst case,
processing one destination scanline will require expanding two source
scanlines.

So the real comparison is something like

    3 * N * N    vs.    MIN (2 * N, M) * N + N * N

which is never worse than the 1-pass algorithm in terms of arithmetic,
though I agree it has more overhead in other places.

A true 2-pass algorithm that first expands _all_ scanlines
horizontally would indeed always generate M * N horizontal
interpolations, but I don't think such an approach is a good idea (at
least not for a simple bilinear filter).

>> > Single-pass bilinear is *not* a big loss for upscaling, because as
>> > you noticed there is overhead of the intermediate values. It is
>> > pretty much equivalent to stretching the image in one direction,
>> > and then in the other. You need to store this intermediate
>> > stretched image, which is bigger than the original.
>>
>> Increasing the scale factor for upscaling is going to eventually let
>> the 2-pass algorithm outperform its single pass counterpart. There
>> must be a crossover point somewhere, and it can be used to make a
>> decision which algorithm to select for better performance in each
>> particular case.
>
> Still the underlying instruction set does matter a lot. We can't
> simply ignore the existence of wide SIMD instruction sets.

There is no reason the horizontal interpolations couldn't be done with
wide SIMD as well. You would keep two fixed point coordinates around
in a register and then compute the left/right weights based on them in
a format suitable for pmaddubsw.

(The issue with 128/0 weights is actually not as bad as I thought.
When the weights are 128 and 0, pmaddubsw will treat 128 as -128 and
compute -128 * l + 0 * r, which produces a negative number.
Since this is the *only* negative number that can be produced, and
what we really want is a multiplication with 128, a simple pabsw (also
available in SSSE3) will fix up the result).

Søren

_______________________________________________
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman
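[Editor's note: the pmaddubsw/pabsw weight trick above can be checked with a scalar model. This is a sketch for illustration only; `madd_ubsw` and `abs_w` are hypothetical per-lane models of the PMADDUBSW and PABSW instructions, not pixman code.]

```c
#include <stdint.h>

/* Scalar model of one lane pair of pmaddubsw: the first operand is
 * unsigned 8-bit, the second is *signed* 8-bit, and the sum of the
 * two products saturates to 16 bits. */
static int16_t
madd_ubsw (uint8_t l, uint8_t r, uint8_t wl, uint8_t wr)
{
    int32_t v = (int32_t) l * (int8_t) wl + (int32_t) r * (int8_t) wr;

    if (v > 32767)
        v = 32767;
    if (v < -32768)
        v = -32768;

    return (int16_t) v;
}

/* Scalar model of one pabsw lane */
static int16_t
abs_w (int16_t v)
{
    return v < 0 ? (int16_t) -v : v;
}
```

With weights (128, 0), the 128 is read as -128, so the result is -128 * l; since that is the only combination that can go negative (all other weight pairs are in 0..127), the absolute value recovers the intended 128 * l.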
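[Editor's note: the one-pass scaler with a two-scanline cache described above might look roughly like the following scalar sketch. It is a simplified grayscale illustration with hypothetical names, not pixman's actual code; it counts scanline expansions to make the MIN (2 * N, M) bound on horizontal work visible.]

```c
#include <stdint.h>
#include <stdlib.h>

#define FIX_SHIFT 16
#define FIX_ONE   (1 << FIX_SHIFT)

/* number of horizontal scanline expansions performed, to check the
 * MIN (2 * N, M) bound from the discussion */
static int expand_count = 0;

/* horizontally scale one source scanline to dst_w pixels */
static void
expand_scanline (const uint8_t *src, int src_w,
                 uint16_t *dst, int dst_w, int64_t x_step)
{
    int64_t x = 0;
    int     i;

    expand_count++;

    for (i = 0; i < dst_w; i++, x += x_step)
    {
        int xi   = (int) (x >> FIX_SHIFT);
        int frac = (int) (x & (FIX_ONE - 1));
        int l    = src[xi];
        int r    = src[xi + 1 < src_w ? xi + 1 : src_w - 1];

        dst[i] = (uint16_t)
            (((int64_t) l * (FIX_ONE - frac) + (int64_t) r * frac) >> FIX_SHIFT);
    }
}

/* one-pass bilinear scaling, caching the last two horizontally
 * expanded scanlines (slot = source line parity, since the two lines
 * needed for one destination row are adjacent); grayscale only */
static int
scale_bilinear (const uint8_t *src, int src_w, int src_h,
                uint8_t *dst, int dst_w, int dst_h)
{
    int64_t x_step =
        ((int64_t) (src_w - 1) << FIX_SHIFT) / (dst_w > 1 ? dst_w - 1 : 1);
    int64_t y_step =
        ((int64_t) (src_h - 1) << FIX_SHIFT) / (dst_h > 1 ? dst_h - 1 : 1);
    uint16_t *cache[2];
    int       cached[2] = { -1, -1 };
    int64_t   y = 0;
    int       i, j, k;

    cache[0] = malloc (dst_w * sizeof (uint16_t));
    cache[1] = malloc (dst_w * sizeof (uint16_t));
    if (!cache[0] || !cache[1])
        return -1;

    for (j = 0; j < dst_h; j++, y += y_step)
    {
        int yi   = (int) (y >> FIX_SHIFT);
        int frac = (int) (y & (FIX_ONE - 1));
        int top  = yi;
        int bot  = yi + 1 < src_h ? yi + 1 : src_h - 1;
        uint16_t *t, *b;

        /* expand at most the two source scanlines this row needs */
        for (k = 0; k < 2; k++)
        {
            int line = k ? bot : top;
            int slot = line & 1;

            if (cached[slot] != line)
            {
                expand_scanline (src + line * src_w, src_w,
                                 cache[slot], dst_w, x_step);
                cached[slot] = line;
            }
        }

        t = cache[top & 1];
        b = cache[bot & 1];

        for (i = 0; i < dst_w; i++)
            dst[j * dst_w + i] = (uint8_t)
                (((int64_t) t[i] * (FIX_ONE - frac) +
                  (int64_t) b[i] * frac) >> FIX_SHIFT);
    }

    free (cache[0]);
    free (cache[1]);
    return 0;
}
```

Each source scanline is expanded at most once per visit, so upscaling expands each of the M lines once, while heavy downscaling expands at most two lines per destination row.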