Hi, > > For this one, did you try comparing Chris' patch to your on ARMv6? > > Not previously, but I just checked. These are the times for > t-firefox-chalkboard on ARMv6: > > Current HEAD: 35.5 s > Chris's patch: 9.7 s > My patch: 6.6 s > > They're not necessarily mutually exclusive, as mine won't accelerate > operations where the source and mask need to be padded differently. > > Since there haven't been any comments on the way in which I > implemented > fallback to general_composite_rect(), there's just the minor issue of > updating .gitignore before you do push it - I'll reissue this patch > with > that in.
First, sorry to be slow to respond to this. Part of the reason for that is that I don't really like this patch for a couple of reasons, but don't have a really convincing alternative proposal. At a high level, it is pretty clear that non-transformed images are sampled outside their bounds often enough that special-cased support is probably warranted. There is already support for NORMAL repeat, and your patch adds PAD, but images with NONE extension could also be optimized in various cases, either by clipping the border away or implementing it with a solid fill depending on the operator. But on the other hand, both the existing fast_path_composite_tiled() and especially your patch have much more complex behavior than other composite routines. I have some specific complaints below, but even if those were fixed, fast_composite_pad_repeat() would still be a rather complex function. It is tempting to do the support for these operations at the pixman_image_composite32() level by breaking them into smaller composites before entering the fast path system, but that has the significant drawback that it prevents the architecture specific fast paths from doing the whole operation by themselves. Here are some specific issues with this patch: - The flags flags of images should be computed with _pixman_image_validate(), not just overwritten. Also, I'm skeptical about simply reusing the solid function that was looked up. There is nothing in principle that prevents the flags from depending on the color in the image, and as such every time the pixel in the solid fill image changes, a new validate/lookup cycle should really be done. - The fallback to the general code that you have is not how the fast path system is intended to work. When a fast path has been chosen, it is supposed to carry out the operation by itself as fast as possible and not fall back to slower code. There could in principle (and at some point there may) be other fast paths in between this one and the general code. It may be better in this case to add a new flag that indicates that the properties required for the mask are present, or simply require the mask to be NULL. Either of these would sidestep the issue of falling back to general_composite_rectangle(). - You added a simple test program (thanks for that, that already puts you ahead of most people), but it only checks one single composite operation. This fast path is complex enough that I think the test should verify many more alignments of source and destination (and mask if support for that is kept), and more formats and operators. Also, the test should be added in a separate commit so that it is easy to verify that CRC values don't change. - Some minor issues: - Variable declarations should be at the top of blocks (as in C89, not C99). - This: + src_flags = (info->src_flags & ~FAST_PATH_NORMAL_REPEAT) | + FAST_PATH_SAMPLES_COVER_CLIP_NEAREST; seems like it should be ~FAST_PATH_PAD_REPEAT instead? But the biggest objection I have is that there is just way too much code here for an operation that is ultimately not that common. There are nine separate blocks of code that are almost exactly alike. Unfortunately, this is the objection that I can't don't have a very good alternate proposal for. The best I can come up with is something like the pseudo-code below, but I'm not sure how well it will actually work. The basic idea is to break the operation down into nine smaller composites, where the nine source images are subimages of the original source with repeat mode NORMAL. This takes advantage of two existing optimizations: the fast_path_tiled() for 1xn images and the solid fast paths for repeating 1x1 images. struct info_t { int pos; int composite_size; int src_pos; int src_size; }; info_t columns[] = { { dest_x, - src_x, 0, 1 }, { dest_x - src_x, src_width, 0, src_width }, { dest_x - src_x + src_width, width - src_width + src_x, src_width - 1, 1 }, }; info_t columns[] = { { dest_y, - src_y, 0, 1 }, { dest_y - src_y, src_height, 0, src_height }, { dest_y - src_y + src_height, height - src_height + src_y, src_height - 1, 1 }, }; for (i = 0; i < 3; ++i) { for (j = 0; j < 3; ++j) { info.x = columns[j].pos; info.y = rows[i].pos; info.width = columns[j].composite_size; info.height = rows[i].composite_size; create image from source bits at rectangle [ columns[j].src_pos, rows[i].src_pos, columns[j].src_size, rows[i].src_size, ] set NORMAL repeat on this image call _pixman_image_validate(); lookup composite function for image intersect info.{x,y,width,height} with composite rect; call composite function } } An issue here is that if the single pixels in the corners are opaque and the operator is OVER, we'd ideally want to do the 'strength reduction' from optimize_operator() in pixman.c to convert the OVER to SRC instead so that the operation becomes a solid fill (or even a noop) rather than an over_n_8888(). It is tempting here to just call pixman_image_composite32() recursively here instead of doing the composite lookup manually, but it wouldn't surprise me if that causes issues. Possibly optimize_operator() should just be exported instead. > > Also, did we ever find out whether it was > > a bug in Firefox. I'm still somewhat skeptical that it's intended > > for a > > PAD image to be accessed so far out of bounds. > > No idea, though I find it hard to imagine that the single-pixel-wide > images weren't deliberately being extended a long way, as they're not > very useful otherwise. Obviously they could use tile repeat for such > images and it would have the same result, though it would boil down to > src_8888_8888 calls rather than src_n_8888 and so I expect that would > be slower. But the image here was a 128 x 128 image with a big border. That seems much less likely to be deliberately PAD extended. In fact, looking at that image, it looks like a noise texture designed for NORMAL repeating. Soren _______________________________________________ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman