On 10/01/2012 07:40 PM, Riccardo Magliocchetti wrote:
Il 01/10/2012 18:11, Michael Meeks ha scritto:
On Fri, 2012-09-28 at 20:26 +0200, Riccardo Magliocchetti wrote:
Here you go, you'll find the valgrind output sans all the zlib / python
related errors. Thanks a lot!

update to git 7a3a8a254363801bf6893b23d67f86a7461f8f3b and more or less
same valgrind errors. See attached trace.

    It looks really interesting :-) it's hard to tell (from valgrind)
where
the threads were pre-empted [ actually logging that might be quite a
nice feature to add to for valgrind ;-], but there is already the
SolarMutex guard to try to stop this from happening.

    I append the odd trace with some thoughts.

    It is -possible- that we never get around to taking the solar
mutex in
the first instance in 'main' - such that by the time we come to release
and re-take it in vcl/headless/svpinst.cxx:

void SvpSalInstance::Yield( bool bWait, bool bHandleAllCurrentEvents )

    the count is zero - could you add some fprintf debugging to check
that
- just dump nAcquireCount. If that is the case this patch might help:

--- a/vcl/headless/svpinst.cxx
+++ b/vcl/headless/svpinst.cxx
@@ -171,6 +171,9 @@ bool SvpSalInstance::CheckTimeout( bool
bExecuteTimers )
                  // timed out, update timeout
                  m_aTimeout = aTimeOfDay;
                  m_aTimeout += m_nTimeoutMS;
+
+                osl::SolarGuard aGuard( mpSalYieldMutex );
+
                  // notify
                  ImplSVData* pSVData = ImplGetSVData();
                  if( pSVData->mpSalTimer )

    If it does - it's not the real fix ;-) but it'd be good to know.

    Thanks for posting the trace - rather interesting :-)

nAcquireCount is always 0 and the patch above let me run two unoconv
testsuite runs in a row without valgrind logs.

It indeed looks like Application::Yield shall be routinely called with the solar mutex unlocked, and that timer callbacks called from within Yield shall be called with the solar mutex locked, so e.g. sal_gtk_timeout_dispatch (vcl/unx/gtk/app/gtkdata.cxx; when using the GTK VCL plug-in) explicitly locks the solar mutex before calling the timer callback in a stack like

#7  0x00007ffff32e940e in Timer::ImplTimerCallbackProc () at 
lo/core/vcl/source/app/timer.cxx:133
#8  0x00007fffe64e370b in SalTimer::CallCallback (this=0x11001d0) at 
lo/core/vcl/inc/saltimer.hxx:57
#9  0x00007fffe64e2d54 in sal_gtk_timeout_dispatch (pSource=0x12be2a0) at 
lo/core/vcl/unx/gtk/app/gtkdata.cxx:844
#10 0x0000003d8d647825 in g_main_dispatch (context=0x6597b0) at gmain.c:2539
#11 g_main_context_dispatch (context=context@entry=0x6597b0) at gmain.c:3075
#12 0x0000003d8d647b58 in g_main_context_iterate (context=context@entry=0x6597b0, 
block=block@entry=0, dispatch=dispatch@entry=1, self=<optimized out>) at 
gmain.c:3146
#13 0x0000003d8d647c14 in g_main_context_iteration (context=0x6597b0, 
may_block=0) at gmain.c:3207
#14 0x00007fffe64e207a in GtkData::Yield (this=0x6113a0, bWait=true, 
bHandleAllCurrentEvents=false) at lo/core/vcl/unx/gtk/app/gtkdata.cxx:582
#15 0x00007fffe64e5c38 in GtkInstance::Yield (this=0x611830, bWait=true, 
bHandleAllCurrentEvents=false) at lo/core/vcl/unx/gtk/app/gtkinst.cxx:563
#16 0x00007ffff32debcc in ImplYield (i_bWait=true, i_bAllEvents=false) at 
lo/core/vcl/source/app/svapp.cxx:425
#17 0x00007ffff32dafb7 in Application::Yield (i_bAllEvents=false) at 
lo/core/vcl/source/app/svapp.cxx:459
#18 0x00007ffff32daf58 in Application::Execute () at 
lo/core/vcl/source/app/svapp.cxx:404
#19 0x00007ffff7a8cda4 in desktop::Desktop::Main (this=0x7fffffffe430) at 
lo/core/desktop/source/app/app.cxx:1661
#20 0x00007ffff32e644b in ImplSVMain () at lo/core/vcl/source/app/svmain.cxx:162
#21 0x00007ffff32e6578 in SVMain () at lo/core/vcl/source/app/svmain.cxx:199
#22 0x00007ffff7acddc7 in soffice_main () at 
lo/core/desktop/source/app/sofficemain.cxx:74
#23 0x0000000000400948 in sal_main () at lo/core/desktop/source/app/main.c:48
#24 0x0000000000400929 in main (argc=1, argv=0x7fffffffe618) at 
lo/core/desktop/source/app/main.c:47

That means that <http://cgit.freedesktop.org/libreoffice/core/commit/?id=002f8b272bc343477c649c648c5ceb2ccf0fba89> "vcl/headless: [HACK] workaround solarMutexx splat with headless mode" looks like it is the right fix indeed (rather than a hack).

Now, the interesting thing is that the Linux-specific CreateSalInstance (called from ImplSVMain -> InitVCL before calling the application's Main in ImplSVMain) explicitly locks the solar mutex thanks to <http://cgit.freedesktop.org/libreoffice/core/commit/?id=852574f46f686a936a1b267e5780ca17d0f0d5ab> "INTEGRATION: CWS ause0c2 (1.3.18); FILE MERGED: 2004/03/12 15:25:43 hjs 1.3.18.1: #115868# anti freeze," while all the other variants of CreateSalInstance apparently do not (incl. the one in vcl/headless/headlessinst.cxx that is relevant when configured --enable-headless, as is the case for this mail thread). That means that all of desktop::Desktop::Main -> Application::Execute -> Application::Yield runs with the solar mutex locked on Linux (sans --enable-headless configuration) and with the solar mutex unlocked everywhere else.

The commit message for 852574f46f686a936a1b267e5780ca17d0f0d5ab is unfortunately rather unhelpful, but temporarily reverting it and doing "make check" on a Linux box starts to produce crashes within desktop::Desktop::Main -> Application::Execute -> Application::Yield -> ..., in code that apparently expects the solar mutex to be locked.

Stephan

==28574== Thread 1:
==28574== Invalid read of size 8
==28574==    at 0x739C702: SfxStateCache::SetState_Impl(unsigned
short, SfxPoolItem const*, unsigned char) (statcach.cxx:454)
==28574==    by 0x738042E:
SfxBindings::UpdateControllers_Impl(SfxInterface const*,
SfxFoundCache_Impl const*, SfxPoolItem const*, unsigned short)
(bindings.cxx:1500)
==28574==    by 0x7383C5B: SfxBindings::Update_Impl(SfxStateCache*)
(bindings.cxx:437)
==28574==    by 0x7383FD7: SfxBindings::NextJob_Impl(Timer*)
(bindings.cxx:1646)
==28574==    by 0x92BF4AD: Timer::ImplTimerCallbackProc() (timer.cxx:142)

    The YieldMutex -should- be held at this point excluding this other
thread (?).

==28574==    by 0x95D152D: SvpSalInstance::CheckTimeout(bool)
(saltimer.hxx:57)
==28574==    by 0x95D17D7: SvpSalInstance::Yield(bool, bool)
(svpinst.cxx:288)
==28574==    by 0x92B4530: Application::Yield(bool) (svapp.cxx:434)
==28574==    by 0x92B45E6: Application::Execute() (svapp.cxx:413)
==28574==    by 0x50AED58: desktop::Desktop::Main() (app.cxx:1711)
==28574==    by 0x92BDFB8: ImplSVMain() (svmain.cxx:173)
==28574==    by 0x92BE8E4: SVMain() (svmain.cxx:210)
==28574==    by 0x50E1CF4: soffice_main (sofficemain.cxx:83)
==28574==    by 0x40069A: main (main.c:25)

==28574==  Address 0x1466bdc8 is 56 bytes inside a block of size 72
free'd
==28574==    at 0x4C27FF2: operator delete(void*)
(vg_replace_malloc.c:387)
==28574==    by 0x737F479: SfxBindings::DeleteControllers_Impl()
(bindings.cxx:325)
==28574==    by 0x7380572: SfxBindings::~SfxBindings() (bindings.cxx:264)
==28574==    by 0x7380718: SfxBindings::~SfxBindings() (bindings.cxx:275)
==28574==    by 0x758B91E: SfxFrame::DoClose_Impl() (frame.cxx:178)
==28574==    by 0x75AAFC1: SfxBaseController::dispose()
(sfxbasecontroller.cxx:1041)

sfx2/source/view/sfxbasecontroller.cxx:

    This method has a live:

            SolarMutexGuard aGuard;

==28574==    by 0x16B8E785:
framework::Frame::setComponent(com::sun::star::uno::Reference<com::sun::star::awt::XWindow>
const&,
com::sun::star::uno::Reference<com::sun::star::frame::XController>
const&) (frame.cxx:1380)
==28574==    by 0x16B8E1CF: framework::Frame::close(unsigned char)
(frame.cxx:1633)
==28574==    by 0x758BE87: SfxFrame::DoClose() (frame.cxx:140)
==28574==    by 0x797F51C: SfxBroadcaster::Broadcast(SfxHint const&)
(brdcst.cxx:49)
==28574==    by 0x7511211:
SfxModelListener_Impl::notifyClosing(com::sun::star::lang::EventObject
const&) (objxtor.cxx:173)
==28574==    by 0x7534972: SfxBaseModel::close(unsigned char)
(sfxbasemodel.cxx:1491)
==28574==    by 0x1E776909: SwXTextDocument::close(unsigned char)
(unotxdoc.cxx:578)
==28574==    by 0x7537FB0: SfxBaseModel::dispose() (sfxbasemodel.cxx:788)
==28574==    by 0x1211D6E7: gcc3::callVirtualMethod(void*, unsigned
int, void*, _typelib_TypeDescriptionReference*, bool, unsigned long*,
unsigned int, unsigned long*, unsigned int, double*, unsigned int)
(callvirtualmethod.cxx:128)
==28574==    by 0x12120FEE:
cpp_call(bridges::cpp_uno::shared::UnoInterfaceProxy*,
bridges::cpp_uno::shared::VtableSlot,
_typelib_TypeDescriptionReference*, int, _typelib_MethodParameter*,
void*, void**, _uno_Any**) (uno2cpp.cxx:246)
==28574==    by 0x12121ACD:
bridges::cpp_uno::shared::unoInterfaceProxyDispatch(_uno_Interface*,
_typelib_TypeDescription const*, void*, void**, _uno_Any**)
(uno2cpp.cxx:440)
==28574==    by 0x15D5CE5E:
binaryurp::IncomingRequest::execute_throw(binaryurp::BinaryAny*,
std::vector<binaryurp::BinaryAny,
std::allocator<binaryurp::BinaryAny>  >*) const (incomingrequest.cxx:251)
==28574==    by 0x15D5D8AB: binaryurp::IncomingRequest::execute()
const (incomingrequest.cxx:80)
==28574==    by 0x15D61B4C: request (reader.cxx:89)
==28574==    by 0x663A506: cppu_threadpool::JobQueue::enter(long,
unsigned char) (jobqueue.cxx:115)
==28574==    by 0x663B00D: cppu_threadpool::ORequestThread::run()
(thread.cxx:179)
==28574==    by 0x663B4E9: threadFunc (thread.hxx:196)
==28574==    by 0x4E467AE: osl_thread_start_Impl (thread.c:261)
==28574==    by 0x5E53EFB: start_thread (pthread_create.c:304)
==28574==    by 0x5B8E89C: clone (clone.S:112)
_______________________________________________
LibreOffice mailing list
LibreOffice@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/libreoffice

Reply via email to