Eric,

See my replies embedded below.

On 10/1/07, Eric Blossom <[EMAIL PROTECTED]> wrote:
>
> On Mon, Oct 01, 2007 at 06:07:51PM -0700, Tim Meehan wrote:
> > Eric,
> >
> > The QA code (qa_gr_fir_ccf.cc) forces a 16 byte alignment. When the
> > malloc16Align is replaced with a regular malloc in the QA code, make check
> > fails.
> >
> > I believe that there is an additional requirement that the data passed to
> > the low-level SSE code have the real sample start on the 0th or 2nd 4 byte
> > float. For example the R / C represents 4 byte floats (Real, Complex), 0
> > represents "forced alignment" from gr_fir_ccf_simd.cc
> > RCRC... OK
> > 00RC... OK
> > 0RCR... Not OK
>
> Hmmm. Does it ever use the 0RCR case? I would expect only the first
> two. It may be reusing the fff simd code which generates all 4
> alignments for the taps, but I wouldn't expect to see the 0RCR or 000R
> input cases.

Yes, I do see the 0RCR or 000R case. For example, when I change the QA code
to use stack allocation for the input (uncommenting a piece of code that was
originally there, lines 110 and 111 in the QA code from trunk), the check
fails. The input is at address 0xbfcd87d4, which gets 16-byte aligned to
address 0xbfcd87d0. This illustrates the 0RCR case.

> > Q: Is my assumption of the additional requirement correct?
> >
> > Q: I don't think it will be easy to force the additional requirement with
> > the same trick used in gr_fir_ccf_simd.cc; do you agree?
>
> I don't see this as an additional constraint.
> gr_complex == std::complex<float> is always laid out (<real>,<imag>).
> sizeof(gr_complex) == 8, so with 16-byte alignment, we still always
> have good alignment. Are you seeing a case where the input has the
> real on a mod 8 == 4 boundary instead of a mod 8 == 0 boundary?

Yes, see the example below.

> If so, (1) where's the input data coming from, (2) what version of the
> compiler are you using?

1) In the example above, the data was allocated on the stack in the QA code
with

    i_type input[INPUT_LEN];  // i_type is gr_complex

which causes the QA code to fail, instead of

    i_type *input = (i_type *)malloc16Align(INPUT_LEN * sizeof(i_type));

which is what the QA code currently uses and which makes it pass.

2) I am using three different compilers / versions of gcc on two different
machines, getting the same results:

    gcc (GCC) 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)
    gcc (GCC) 4.2.1 (Debian 4.2.1-3)
    gcc (GCC) 4.1.2 20070502 (Red Hat 4.1.2-12)

> However, back to your first point, if we are using the 0RCR case, then
> the code is completely wrong, and I don't see how it could ever pass
> the QA tests (which it seems to). On the other hand, there could be
> some problem with how the float taps are mapped across the complex
> input (It's been a long time since I looked at the code...)

The QA tests are passing because they force the 16-byte alignment.

Thanks for looking at this!

>
> Eric
>
> > Tim
> >
> > > Yes, it does get called at "make check" time.
> > >
> > > FWIW, it's run by way of gnuradio-core/src/tests/test_all
> > >
> > > It's possible that there's an alignment requirement that's not being
> > > honored at runtime. The low-level SSE code (fcomplex_dotprod_sse64.S)
> > > requires that its input and taps be 16-byte aligned. gr_fir_ccf_simd
> > > allocates 16-byte aligned buffers for the relevant buffers, so it
> > > should be working OK. Perhaps one of you seeing the problem could
> > > add an assert or two to confirm that the alignment is correct.
> > >
> > > Eric
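
For anyone who wants to reproduce the 0RCR case discussed above, here is a rough sketch of the kind of check Eric suggested. It is not code from GNU Radio: the names sample_t, align_down, float_slot, is_sse_friendly and aligned_stack_buffer_ok are made up for illustration, sample_t stands in for gr_complex, and the sketch assumes the SIMD wrapper rounds the input pointer down to the previous 16-byte boundary. It also uses C++11 alignas, which the gcc versions listed above do not support (they would need __attribute__((aligned(16))) instead).

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for gr_complex (laid out as <real>,<imag>, 8 bytes).
typedef std::complex<float> sample_t;

// Round an address down to the previous multiple of `alignment`.
// `alignment` must be a power of two.
inline std::uintptr_t align_down(std::uintptr_t addr, std::uintptr_t alignment = 16)
{
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  return addr & ~(alignment - 1);
}

// Index (0..3) of the 4-byte float slot that `addr` occupies inside its
// 16-byte block. Slots 0 and 2 correspond to RCRC and 00RC; slots 1 and 3
// correspond to 0RCR and 000R.
inline unsigned float_slot(std::uintptr_t addr)
{
  return static_cast<unsigned>((addr - align_down(addr)) / sizeof(float));
}

// True when the first real sample sits on float slot 0 or 2.
inline bool is_sse_friendly(const sample_t *p)
{
  return float_slot(reinterpret_cast<std::uintptr_t>(p)) % 2 == 0;
}

// A stack buffer declared with an explicit 16-byte alignment should always
// land on slot 0, unlike a plain `sample_t input[N];`.
inline bool aligned_stack_buffer_ok()
{
  alignas(16) sample_t input[8];
  return float_slot(reinterpret_cast<std::uintptr_t>(input)) == 0;
}
```

With the address from the failing run, float_slot(0xbfcd87d4) comes out as 1, i.e. the real part of the first sample lands on the second float of the 16-byte block. Dropping something like assert(is_sse_friendly(input)) into the QA code just before the filter call would be one way to confirm whether the stack-allocated input is what trips the test.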