Eric,

See my replies inline below.

On 10/1/07, Eric Blossom <[EMAIL PROTECTED]> wrote:
>
> On Mon, Oct 01, 2007 at 06:07:51PM -0700, Tim Meehan wrote:
> > Eric,
> >
> >
> > The QA code (qa_gr_fir_ccf.cc) forces a 16-byte alignment.  When the
> > malloc16Align is replaced with a regular malloc in the QA code, make
> > check fails.
> >
> > I believe that there is an additional requirement that the data passed
> > to the low-level SSE code have the real sample start on the 0th or 2nd
> > 4-byte float.  For example, where R / C represent 4-byte floats (Real,
> > Complex) and 0 represents "forced alignment" padding from
> > gr_fir_ccf_simd.cc:
> > RCRC...  OK
> > 00RC...  OK
> > 0RCR...  Not OK
>
> Hmmm.  Does it ever use the 0RCR case?  I would expect only the first
> two.  It may be reusing the fff simd code which generates all 4
> alignments for the taps, but I wouldn't expect to see the 0RCR or 000R
> input cases.


Yes, I do see the 0RCR or 000R case.  For example, when I change the QA code
to use stack allocation for the input (uncommenting a piece of code that
was originally there, lines 110 and 111 in the QA code from trunk), the check
fails.
The input is at address 0xbfcd87d4, which gets 16-byte aligned down to address
0xbfcd87d0.
This illustrates the 0RCR case.
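
To make the arithmetic explicit, here is a minimal sketch of how an input address maps onto the four layouts.  The helper names (align16_down, layout_for) are my own and are not part of GNU Radio, and I am assuming the input address above was 0xbfcd87d4 and that the alignment rounds down to the previous 16-byte boundary, as the two addresses suggest.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper (not GNU Radio code): round an address down to the
// previous 16-byte boundary.
inline std::uintptr_t align16_down(std::uintptr_t addr) {
    return addr & ~static_cast<std::uintptr_t>(15);
}

// Hypothetical helper: describe what the SSE code would see starting at the
// aligned address.  '0' is one 4-byte float of padding, R/C are the real
// and imaginary floats of the first complex samples.
inline std::string layout_for(std::uintptr_t addr) {
    const std::uintptr_t offset = addr - align16_down(addr);  // 0..15
    if (offset % 4 != 0) return "not float-aligned";
    static const char *const layouts[] = {"RCRC", "0RCR", "00RC", "000R"};
    return layouts[offset / 4];
}
```

With these, layout_for(0xbfcd87d4) gives "0RCR", since the address sits 4 bytes (one float) past the 16-byte boundary at 0xbfcd87d0.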





> > Q: Is my assumption of the additional requirement correct?
> >
> > Q: I don't think it will be easy to force the additional requirement
> > with the same trick used in gr_fir_ccf_simd.cc; do you agree?
>
> I don't see this as an additional constraint.
> gr_complex == std::complex<float> is always laid out (<real>,<imag>).
> sizeof(gr_complex) == 8, so with 16-byte alignment, we still always
> have good alignment.  Are you seeing a case where the input has the
> real on a mod 8 == 4 boundary instead of a mod 8 == 0 boundary?



Yes, see the example below.

> If so, (1) where's the input data coming from, (2) what version of the
> compiler are you using?



1)
In the example above, the data was allocated on the stack in the QA code
with

i_type      input[INPUT_LEN];    // (i_type is gr_complex)

which causes the QA code to fail, instead of

i_type       *input = (i_type *)malloc16Align(INPUT_LEN * sizeof(i_type));

which is what the QA code currently uses, and which makes it pass.
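
As a rough illustration of why the stack array can land on a mod 8 == 4 boundary: the compiler only has to honor the type's own alignment, which is usually that of float rather than 8 or 16.  The sketch below uses std::complex<float> as a stand-in for gr_complex and needs a C++11 compiler for alignof/alignas; the helper name and the 4-byte figure are assumptions about common x86 ABIs, not something I checked in the GNU Radio tree.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>

typedef std::complex<float> sample_t;  // stand-in for gr_complex

// Each sample is a (real, imag) pair of 4-byte floats.
static_assert(sizeof(sample_t) == 8, "expected two packed floats");

// Hypothetical helper: how many different 4-byte slots inside a 16-byte
// block a plain (non-over-aligned) array of sample_t is allowed to start
// at.  If alignof(sample_t) is 4 this is 4 (RCRC, 0RCR, 00RC, 000R); if it
// is 8, only RCRC and 00RC are possible.
inline std::size_t possible_start_slots() {
    return 16 / alignof(sample_t);
}

// Requesting the alignment explicitly removes the ambiguity for a stack
// buffer: alignas(16) sample_t input[INPUT_LEN];
```

If possible_start_slots() returns more than 2 on a given platform, the 0RCR and 000R cases are reachable whenever the caller does not force the alignment itself.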

2)
I am using three different compilers / versions of gcc on two different
machines and getting the same results:

gcc (GCC) 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)
gcc (GCC) 4.2.1 (Debian 4.2.1-3)
gcc (GCC) 4.1.2 20070502 (Red Hat 4.1.2-12)

> However, back to your first point, if we are using the 0RCR case, then
> the code is completely wrong, and I don't see how it could ever pass
> the QA tests (which it seems to).  On the other hand, there could be
> some problem with how the float taps are mapped across the complex
> input.  (It's been a long time since I looked at the code...)


The QA tests are passing because they force the 16-byte alignment.
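
For anyone who wants to confirm this at runtime, something along the lines of the assert suggested earlier in the thread could look like the sketch below.  The function name is_aligned16 is hypothetical, and where exactly to place the check (the QA code or the gr_fir_ccf_simd entry point) is left open.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true if p sits on a 16-byte boundary, which is what
// the low-level SSE dot product is said to require for input and taps.
inline bool is_aligned16(const void *p) {
    return (reinterpret_cast<std::uintptr_t>(p) & 15) == 0;
}

// Intended use, just before handing a buffer to the SSE routine:
//   assert(is_aligned16(input));
//   assert(is_aligned16(taps));
```

With the stack-allocated input from the QA code, I would expect such an assert on the caller's pointer to fire in the 0RCR case, though I have not added it to the tree.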


> Thanks for looking at this!
>
> Eric
>
> > Tim
> >
> > >
> > >
> > > Yes, it does get called at "make check" time.
> > >
> > > FWIW, it's run by way of gnuradio-core/src/tests/test_all
> > >
> > > It's possible that there's an alignment requirement that's not being
> > > honored at runtime.  The low-level SSE code (fcomplex_dotprod_sse64.S)
> > > requires that its input and taps be 16-byte aligned.  gr_fir_ccf_simd
> > > allocates 16-byte aligned buffers for the relevant buffers, so it
> > > should be working OK.   Perhaps one of you seeing the problem could
> > > add an assert or two to confirm that the alignment is correct.
> > >
> > > Eric
>
_______________________________________________
Discuss-gnuradio mailing list
Discuss-gnuradio@gnu.org
http://lists.gnu.org/mailman/listinfo/discuss-gnuradio
