[Freerdp-devel] Update on SSE2 for RemoteFX

S. Erisman Tue, 14 Jun 2011 23:50:41 -0700

I finished adding SSE2 optimizations for the Inverse DWT decoding routines this evening.


Here are the current performance numbers from my Atom D510 test system:


Without SSE:
                                             |-----------------------|
                PROFILER                     |    elapsed seconds    |
|--------------------------------------------|-----------------------|
| code section                  | iterations |     total |      avg. |
|-------------------------------|------------|-----------|-----------|
| rfx_decode_rgb                |      57385 | 54.530000 |  0.000950 |
| rfx_decode_component          |     172155 | 42.120000 |  0.000245 |
| rfx_rlgr_decode               |     172155 | 10.560000 |  0.000061 |
| rfx_differential_decode       |     172155 |  0.240000 |  0.000001 |
| rfx_quantization_decode       |     172155 |  3.980000 |  0.000023 |
| rfx_dwt_2d_decode             |     172155 | 26.250000 |  0.000152 |
| rfx_decode_YCbCr_to_RGB       |      57385 | 10.260000 |  0.000179 |
|--------------------------------------------------------------------|

With SSE:
                                             |-----------------------|
                PROFILER                     |    elapsed seconds    |
|--------------------------------------------|-----------------------|
| code section                  | iterations |     total |      avg. |
|-------------------------------|------------|-----------|-----------|
| rfx_decode_rgb                |      47871 | 20.000000 |  0.000418 |
| rfx_decode_component          |     143613 | 17.010000 |  0.000118 |
| rfx_rlgr_decode               |     143613 | 12.230000 |  0.000085 |
| rfx_differential_decode       |     143613 |  0.150000 |  0.000001 |
| rfx_quantization_decode_SSE2  |     143613 |  0.730000 |  0.000005 |
| rfx_dwt_2d_decode_SSE2        |     143613 |  3.060000 |  0.000021 |
| rfx_decode_YCbCr_to_RGB_SSE2  |      47871 |  1.020000 |  0.000021 |
|--------------------------------------------------------------------|

As you can see, we are currently getting a little more than 100% performance gain by using SSE. It is noticeably faster and more responsive as well. Looking at just the SSE vs. non-SSE methods we are getting > 500% improvement.
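For anyone who wants to check these figures, here is a minimal sketch of the arithmetic, using the per-call averages from the two tables converted to microseconds. The message does not say exactly how the "> 500%" figure was derived, so the combined-stage calculation below is only one plausible reading, and the function names are made up for illustration.

```c
#include <assert.h>

/* Percentage gain when a per-call time drops from before_us to after_us. */
static double gain_percent(double before_us, double after_us)
{
    assert(after_us > 0.0);
    return (before_us / after_us - 1.0) * 100.0;
}

/* Whole rfx_decode_rgb call: 950 us without SSE, 418 us with SSE. */
static double overall_gain(void)
{
    return gain_percent(950.0, 418.0);
}

/* Only the three stages that received SSE2 versions.
 * Assumption: quantization and DWT run 3 times per rfx_decode_rgb call
 * (the iteration counts in the tables are 3x), YCbCr->RGB runs once. */
static double sse_stage_gain(void)
{
    double before_us = 3 * 23.0 + 3 * 152.0 + 179.0; /* 704 us */
    double after_us  = 3 * 5.0  + 3 * 21.0  + 21.0;  /*  99 us */
    return gain_percent(before_us, after_us);
}
```

With these inputs overall_gain() comes out at roughly 127% and sse_stage_gain() at roughly 611%, which is consistent with "a little more than 100%" and "> 500%".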

Running the numbers through a calculation (accounting for some of these methods being called more than others) gives this break-down:

61.00%  rlgr
0.72%   diff
3.59%   quant (sse)
15.07%  dwt (sse)
5.02%   ycbcr (sse)
14.59%  other

So, the one large remaining non-SSE method (rfx_rlgr_decode) is currently accounting for about 61% (85*3 / 418) of the total RemoteFX processing time. This method might be hard to optimize using SSE, as it appears to be more stream/logic based than loop/calculation based. It is still definitely worth taking a further look at, to see if there are other optimizations that can be made.
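The break-down above can be reproduced with a few lines of C. This is only a sketch of the calculation: the struct and function names are hypothetical, the averages are the "With SSE" table values in microseconds, and the per-call multiplier of 3 is inferred from the iteration counts (143613 = 3 * 47871).

```c
#include <assert.h>
#include <stddef.h>

/* One profiled stage: per-call average in microseconds and how many
 * times it runs per rfx_decode_rgb call. */
struct rfx_stage {
    const char *name;
    double avg_us;
    int calls_per_rgb;
};

/* Values taken from the "With SSE" profiler table. */
static const struct rfx_stage sse_stages[] = {
    { "rlgr",  85.0, 3 },
    { "diff",   1.0, 3 },
    { "quant",  5.0, 3 },
    { "dwt",   21.0, 3 },
    { "ycbcr", 21.0, 1 },
};
static const size_t sse_stage_count =
    sizeof(sse_stages) / sizeof(sse_stages[0]);

/* Average time of one rfx_decode_rgb call, in microseconds. */
static const double sse_total_us = 418.0;

/* Share of one rfx_decode_rgb call spent in a stage, in percent. */
static double stage_share(const struct rfx_stage *s, double total_us)
{
    assert(total_us > 0.0);
    return 100.0 * s->avg_us * s->calls_per_rgb / total_us;
}

/* Whatever is not covered by the profiled stages ("other"). */
static double other_share(const struct rfx_stage *stages, size_t n,
                          double total_us)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += stage_share(&stages[i], total_us);
    return 100.0 - sum;
}
```

Calling stage_share() on the rlgr entry gives about 61.0%, and other_share() over all five entries gives about 14.59%, matching the list.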

It might also be worth taking a look at the 'other' category. I assume this includes the final assembly of the RGB data into its output format. This might still be something that can be optimized using SSE.
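To make the idea concrete, here is a rough sketch of what an SSE2 version of such an assembly step could look like. Everything about the data layout is an assumption made for illustration, not a description of the actual FreeRDP code: the input is taken to be three planar arrays of 16-bit R, G and B values, the output is taken to be 4 bytes per pixel in B, G, R, X order with X set to 0xFF, and pack_bgrx is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Clamp a 16-bit sample to the 0..255 range of an output byte. */
static uint8_t clamp_u8(int16_t v)
{
    return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Hypothetical helper: interleave planar 16-bit R, G, B samples into
 * B,G,R,X bytes (X = 0xFF). dst must hold 4 * count bytes. */
static void pack_bgrx(const int16_t *r, const int16_t *g, const int16_t *b,
                      uint8_t *dst, size_t count)
{
    size_t i = 0;
    assert(r && g && b && dst);
#if defined(__SSE2__)
    const __m128i opaque = _mm_set1_epi8((char)0xFF);
    /* 8 pixels per iteration. */
    for (; i + 8 <= count; i += 8) {
        __m128i r16 = _mm_loadu_si128((const __m128i *)(r + i));
        __m128i g16 = _mm_loadu_si128((const __m128i *)(g + i));
        __m128i b16 = _mm_loadu_si128((const __m128i *)(b + i));
        /* packus saturates to 0..255, so the clamp comes for free. */
        __m128i r8 = _mm_packus_epi16(r16, r16);
        __m128i g8 = _mm_packus_epi16(g16, g16);
        __m128i b8 = _mm_packus_epi16(b16, b16);
        /* b0 g0 b1 g1 ... and r0 x0 r1 x1 ... */
        __m128i bg = _mm_unpacklo_epi8(b8, g8);
        __m128i rx = _mm_unpacklo_epi8(r8, opaque);
        /* Interleave 16-bit pairs to get b g r x per pixel. */
        _mm_storeu_si128((__m128i *)(dst + 4 * i),
                         _mm_unpacklo_epi16(bg, rx));
        _mm_storeu_si128((__m128i *)(dst + 4 * i + 16),
                         _mm_unpackhi_epi16(bg, rx));
    }
#endif
    /* Scalar tail (and full fallback when SSE2 is not available). */
    for (; i < count; i++) {
        dst[4 * i + 0] = clamp_u8(b[i]);
        dst[4 * i + 1] = clamp_u8(g[i]);
        dst[4 * i + 2] = clamp_u8(r[i]);
        dst[4 * i + 3] = 0xFF;
    }
}
```

The unaligned loads and stores keep the sketch safe for any buffer. If the buffers are known to be 16-byte aligned, the aligned variants could be used instead, which tends to matter on older cores such as the Atom.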

FYI... I probably won't be able to push updates quite as fast over the next 2 weeks, as we are at the end of a large project at work that is requiring extra effort to get across the finish line. I would still like to see if there is any more performance we can get out of this code though. If someone on the list has SSE optimization experience, I would love a code review... particularly around order of operations and cache usage. We might be able to get another couple % improvement with some very minor changes.

Lastly... I should get my new AMD Zacate based board tomorrow. Over the next couple of weeks, I want to take a stab at an alternate OpenCL accelerated version of this RemoteFX code as well. Any other interest or experience in this type of acceleration?


Thanks,
 Steve


