Hi,

Romain Beauxis wrote
> Hi!
>
> I think I was hit by a bug trigered by certain frame size ans sse.

That's probably access to non-aligned memory by SSE operations.

> Attached is a minimal patch to that reproduces the segfault.
> The backtrace is:
> #0  scale_uint8_x_4_y_bicubic_sse2 (ctx=0x601290, scanline=<value
> optimized out>, dest_start=0x60c210 "") at scale_y.h:110
> #1  0x00007ffff7697e31 in gavl_video_scale_context_scale (ctx=0x601290,
> src=<value optimized out>, dst=0x6026d0) at scale_context.c:1482
> #2  0x00007ffff7694e6f in gavl_video_scaler_scale (s=0x6011f0,
> src=0x7fffffffe530, dst=<value optimized out>) at scale.c:655
> #3  0x00007ffff76a5b56 in gavl_video_convert (cnv=<value optimized out>,
> input_frame=0x15e, output_frame=0x60c210) at video.c:517
> #4  0x0000000000400869 in main ()
>
> The segfault does not occur when not compiled with SSE.
> It does not occur either when the input frame is 340x340
> or below (I think) or 400x400 and above (I think).
>
> I can reproduce perfectly here in an intel N450. I have not
> had the opportunity to try this code on other archs though..

Ok, I checked test.c and found:

> memset(&in_f,0,sizeof(gavl_video_frame_t));
> in_f.planes[0] = malloc(350*350*4);

Why not use gavl_video_frame_create() ?

> memset(in_f.planes[0],0,350*350*4);
> in_f.strides[0] = 350;

That's wrong, it's 350*4 for GAVL_RGBA_32. This alone would
probably fix the alignment already.

Ok the problem is the following: A proper fix would be to do several
checks (== additional branches) within the processing loop, and this
is something I would like to avoid, since it slows things down
(i.e. messes up branch prediction).
And given the fact that simply using the recommended gavl function will
fix the problem, my motivation to change the code is even lower.

So the only sensible thing to do IMO is to update the documentation
accordingly. In particular:

- *always* use gavl_video_frame_[create|destroy]()

- When using application supplied memory, pass NULL as format
  for gavl_video_frame_create() and zero the pointers before
  gavl_video_frame_destroy().

- When using application supplied memory, make sure that all scanlines are
  aligned at 16 byte boundaries. If this isn't possible, use a temporary
  frame and gavl_video_frame_copy(). gavl_video_frame_copy() is quite
  well optimized and doesn't require alignment (better to fix the
  application though)

- If non-aligned frames are passed to gavl, it might crash without
  warning (and the crash is *very* hard to trace with valgrind or gdb)

- It does so because we don't sacrifice speed for protection against
  mis-usage

If someone wants to make a Doxygen patch, or has a different solution or
suggestion, please speak up.

Burkhard


------------------------------------------------------------------------------
The ultimate all-in-one performance toolkit: Intel(R) Parallel Studio XE:
Pinpoint memory and threading errors before they happen.
Find and fix more than 250 security defects in the development cycle.
Locate bottlenecks in serial and parallel code that limit performance.
http://p.sf.net/sfu/intel-dev2devfeb
_______________________________________________
Gmerlin-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/gmerlin-general

Reply via email to