Marc,
I took your suggestions into account, revised my earlier patch, and
committed my changes to a new fork:
https://github.com/serisman/FreeRDP
... more comments below ...
On 6/7/2011 9:29 PM, Marc-André Moreau wrote:
> Hi Steve,
>
> Well, that was fast :) I had started thinking of the different ways we
> can integrate this SSE acceleration within the rest in a clean way. I
> see two major options:
>
> 1) The SSE code is part of the library, and can be disabled with a
> compile-time option. Methods for SSE and non-SSE have the same
> signature. At initialization, detection of SSE support level is
> performed, and a callback is registered to point to one of the methods
> (SSE or non-SSE).
This is basically what my original patch was doing (minus having the
compile-time option).
> 2) The SSE code is part of a loadable sub-module, just like a plugin.
> detection is done in the "main" code and SSE support functions are
> loaded and registered dynamically. The only issue here is that there
> is not much value to doing it this way, since SSE support level can be
> detected and used dynamically anyway, and SSE is only available on the
> intel architecture. On systems where SSE is not available at all, such
> as ARM, the compile-time option should discard it completely.
I agree with you that this seems like the wrong option.
> I think the part which needs to be abstracted is the "SIMD" system.
> The decoder should perform a check for available SIMD (SSE on intel,
> NEON on ARM) and use it if available, but when compiling for either
> intel or arm, only the relevant implementation should actually be
> compiled in. I think it would make sense to put the SSE code in
> separate files which are only included if FreeRDP is compiled for the
> architecture which makes sense for it. An analogy could be made with
> the way we currently deal with multiple cryptographic libraries. We
> have one file for abstracting each crypto library for our use, and at
> compile time only one of then is compiled, just like SSE would be
> compiled for intel only and NEON for arm only.
>
> What would you think of option 1), done in a similar way to the
> current cryptographic abstraction layer code?
I have most of this done in my new fork (see above). There is a
configure option (--with-sse2) that is enabled by default that controls
adding the sse2 .c/.h files and defining WITH_SSE2. The _RFX_CONTEXT
has a 'delegate' for the 1 method that has been optimized (so far). The
rfx_context_new method will set the delegate to the default non-sse
software routine, and then call rfx_sse2_init if WITH_SSE2 is defined.
the rfx_sse2_init method then will check whether the CPU actually
supports sse2 (future) and changes the delegate(s) to the sse optimized
version(s). Eventually we could probably have the --with-sse2 config
option smart enough to disable itself automatically based on which CPU
arch was being compiled. On the code side I separated all the SSE code
into files in a new 'sse2' directory under libfreerdp-rfx. What was
done with SSE could easily be duplicated for NEON for the ARM
architecture as well.
Is this a good enough structure to be able to continue adding more
optimizations?
Thanks,
Steve
------------------------------------------------------------------------------
EditLive Enterprise is the world's most technically advanced content
authoring tool. Experience the power of Track Changes, Inline Image
Editing and ensure content is compliant with Accessibility Checking.
http://p.sf.net/sfu/ephox-dev2dev
_______________________________________________
Freerdp-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/freerdp-devel