[chromium-dev] Re: purecall exceptions and the manbearpig
On Fri, Apr 3, 2009 at 7:19 PM, Tommi to...@chromium.org wrote:

Yes, that's one way of running into purecall. But, just in case my email is being misunderstood, now with italics! :)

purecall is not called *when* an exception occurs. purecall actually *throws the exception - or exits the program*.

purecall is called when attempting to call a virtual method for which there is no implementation. purecall is the default virtual method, if you will.

Yes, that's the low-level description of purecall, and no one is debating that. But it is also misleading: from a high-level perspective, when you look at my code, you see that the developer actually did implement the virtual method explicitly. So, still from a high-level perspective, it can also happen for a virtual method that does have an implementation, if the object has been deleted prior to the call.

[All that because when the derived object is deleted, one of the things it does is revert its vtable to the base class vtable. That part is not obvious or known to the high-level developer.]

Implementing all your virtual functions correctly does not guarantee that your objects won't purecall. But I'm sure you know that; I just wanted to make sure I'm not misunderstood either ;)

Nicolas

When you call _set_purecall_handler, you're giving _purecall a pointer to your function that purecall will delegate to. There's no exception that triggers this. Calling purecall is just a regular function call.
Here's the CRT's implementation of __purecall:

    void __cdecl _purecall()
    {
        _purecall_handler purecall =
            (_purecall_handler) _decode_pointer(__pPurecall);
        if (purecall != NULL) {
            purecall();
            /* shouldn't return, but if it does, we drop back to default behaviour */
        }

        _NMSG_WRITE(_RT_PUREVIRT);
        /* do not write the abort message */
        _set_abort_behavior(0, _WRITE_ABORT_MSG);
        abort();
    }

and here's the implementation of _set_purecall_handler:

    _purecall_handler _set_purecall_handler(_purecall_handler pNew)
    {
        _purecall_handler pOld = NULL;
        pOld = (_purecall_handler) _decode_pointer(__pPurecall);
        __pPurecall = (_purecall_handler) _encode_pointer(pNew);
        return pOld;
    }

On Fri, Apr 3, 2009 at 8:42 PM, Nicolas Sylvain nsylv...@chromium.org wrote:

The code below shows that it's possible to throw a purecall exception by calling a function on a deleted object. I suspect this is what is happening in our code.

Nicolas

    #include <stdlib.h>   // _set_purecall_handler
    #include <tchar.h>    // _tmain, _TCHAR
    #include <windows.h>

    class Derived;

    class Base {
    public:
        Base(Derived* derived) : m_pDerived(derived) {}
        ~Base() {}  // Needed, don't know why.
        virtual void function(void) = 0;
        void bleh();
        Derived* m_pDerived;
    };

    class Derived : public Base {
    public:
        Derived() : Base(this) {}  // C4355
        virtual void function(void) {}
    };

    void Base::bleh() { m_pDerived->function(); }

    void purecall(void) { __debugbreak(); }

    int _tmain(int argc, _TCHAR* argv[])
    {
        _set_purecall_handler(purecall);
        Base* base = NULL;
        {
            Derived myDerived;
            myDerived.function();
            base = &myDerived;
        }
        base->bleh();  // myDerived has already been destroyed here
    }

On Fri, Apr 3, 2009 at 2:17 PM, Tommi to...@chromium.org wrote:

purecall isn't called when an exception occurs. purecall actually throws the exception - or exits the program (by default the crt throws up a dialog and then abort()s).
in addition to cpu's email, raymond chen's article is a good (and short) read :)
http://blogs.msdn.com/oldnewthing/archive/2004/04/28/122037.aspx

On Fri, Apr 3, 2009 at 3:15 PM, Huan Ren hu...@google.com wrote:

Based on what I saw in the bug, it looks like an exception happening during the CALL instruction may lead to PureCall(). For example, an object obj has been freed and later on someone calls obj->func(). Then the assembly code looks like this:

    // ecx: pointer to obj which is in memory
    // [ecx]: supposed to be pointer to vtable, it has invalid value since obj is freed
    // edx: now has pointer to vtable, which is invalid
    mov edx,dword ptr [ecx]
    // deref the vtable and make the call
    call dword ptr [edx+4]

When a (hardware) exception happens during the call instruction, control will eventually be transferred to the routine handling this type of exception, which I *think* is PureCall().

Huan

On Fri, Apr 3, 2009 at 11:26 AM, Ricardo Vargas rvar...@chromium.org wrote:

I certainly don't want to imply that it is the case with this particular bug, but I have seen crashes when the cause of the problem is using an object that was previously deleted (and only end up with this exception when all the planets are properly aligned). I guess that it depends on the actual class hierarchy of the objects in question, but I'd think that simple examples end up on a lot of crashes right after the cl that exposes the problem.

On Fri, Apr 3, 2009 at 12:52 AM, Dean McNamee de...@chromium.org wrote:

You
[chromium-dev] Re: purecall exceptions and the manbearpig
hehe, understood! :)

On Sat, Apr 4, 2009 at 12:43 PM, Nicolas Sylvain nsylv...@chromium.org wrote:

On Fri, Apr 3, 2009 at 7:19 PM, Tommi to...@chromium.org wrote:

Yes, that's one way of running into purecall. But, just in case my email is being misunderstood, now with italics! :)

purecall is not called *when* an exception occurs. purecall actually *throws the exception - or exits the program*.

purecall is called when attempting to call a virtual method for which there is no implementation. purecall is the default virtual method, if you will.

Yes, that's the low-level description of purecall, and no one is debating that. But it is also misleading: from a high-level perspective, when you look at my code, you see that the developer actually did implement the virtual method explicitly. So, still from a high-level perspective, it can also happen for a virtual method that does have an implementation, if the object has been deleted prior to the call.

[All that because when the derived object is deleted, one of the things it does is revert its vtable to the base class vtable. That part is not obvious or known to the high-level developer.]

Implementing all your virtual functions correctly does not guarantee that your objects won't purecall. But I'm sure you know that; I just wanted to make sure I'm not misunderstood either ;)

Nicolas

When you call _set_purecall_handler, you're giving _purecall a pointer to your function that purecall will delegate to. There's no exception that triggers this. Calling purecall is just a regular function call.
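To make the "regular function call" point concrete, here is a minimal portable sketch of the delegate-to-a-handler pattern. It is not the CRT's code: SetHandler, OnPureCall, CountingHandler and the two globals are hypothetical names chosen for illustration, the pointer encoding is left out, and the final abort() is replaced by a return value so the sketch can be run safely.

```cpp
#include <cassert>

// Hypothetical names throughout; this is an illustration, not the CRT.
using PureCallHandler = void (*)();

PureCallHandler g_handler = nullptr;  // plays the role of __pPurecall
int g_calls = 0;                      // only here to make the effect observable

// Installs a new handler and returns the previous one.
PureCallHandler SetHandler(PureCallHandler new_handler) {
  PureCallHandler old_handler = g_handler;
  g_handler = new_handler;
  return old_handler;
}

// An ordinary function: if a handler is installed, call it.
// Returns true if a handler ran. The real _purecall would go on to
// abort() at this point; that step is omitted in this sketch.
bool OnPureCall() {
  if (g_handler != nullptr) {
    g_handler();
    return true;
  }
  return false;
}

// Example handler that just counts how often it was reached.
void CountingHandler() { ++g_calls; }
```

Nothing here involves exception handling: installing a handler is a pointer swap, and reaching it is a plain call through that pointer.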
[chromium-dev] Re: purecall exceptions and the manbearpig
I certainly don't want to imply that it is the case with this particular bug, but I have seen crashes when the cause of the problem is using an object that was previously deleted (and only end up with this exception when all the planets are properly aligned). I guess that it depends on the actual class hierarchy of the objects in question, but I'd think that simple examples end up on a lot of crashes right after the cl that exposes the problem.

On Fri, Apr 3, 2009 at 12:52 AM, Dean McNamee de...@chromium.org wrote:

You could, however, corrupt the vtable pointer (not the vtable). Say somehow 32 was added to it, now the table is misaligned, and you might get a purecall, etc. Not sure that's likely at all though. Since the vtable pointer is the first field, it seems ripe for problems w/ use after free, etc. I kinda doubt that's what's happening here though.

Anyone who is working on one of these can bug me and I'll look at the crash dump.

On Fri, Apr 3, 2009 at 7:24 AM, Tommi to...@chromium.org wrote:

On Thu, Apr 2, 2009 at 7:09 PM, cpu c...@chromium.org wrote:

On Apr 2, 3:53 pm, Nicolas Sylvain nsylv...@chromium.org wrote:

Another simple(r) example:
http://msdn.microsoft.com/en-us/library/t296ys27(VS.80).aspx

But, as discussed in bug 8544, we've seen many purecall crashes happen that we don't think are related to virtual functions. The only thing I can think of is that the vtable is corrupted (overwritten or freed). Does it not make sense?

I don't think you can overwrite a vtable, because vtables should be in the code section of the executable (the pages marked as read-execute); they are known at compile time and it would not make sense to construct them on the fly. But if you know of a case then that would be very interesting.
yes they should be protected with read/execute and besides, you'd have to overwrite entries in the vtable with a pointer to __purecall for that to happen

Nicolas

On Thu, Apr 2, 2009 at 1:54 PM, cpu c...@chromium.org wrote:

After reading some speculation in bugs such as http://code.google.com/p/chromium/issues/detail?id=8544 I felt compelled to dispel some myths and misunderstandings about the origin and meaning of the mythical _purecall_ exception. My hope is that then you can spot the problems in our source code and fix them. Sorry for the long post.

So first of all, what do you see when you get this error? If you are in a debug build and you are not eating the exceptions via some custom handler, you see this dialog:

    ---
    Debug Error!
    R6025 - pure virtual function call
    (Press Retry to debug the application)
    ---
    Abort Retry Ignore
    ---

For chrome/chromium we install a special handler, which forces a crash dump, in which case you'll see in the debugger analysis something like this:

    [chrome_dll_main.cc:100] - `anonymous namespace'::PureCall()
    [purevirt.c:47] - _purecall

Before going into too much detail, let me show you a small program that causes this exception:

    #include <tchar.h>  // _tmain, _TCHAR

    class Base {
    public:
        virtual ~Base() { ThreeFn(); }
        virtual void OneFn() = 0;
        virtual void TwoFn() = 0;
        void ThreeFn() { OneFn(); TwoFn(); }
    };

    class Concrete : public Base {
    public:
        Concrete() : state_(0) { }
        virtual void OneFn() { state_ += 1; }
        virtual void TwoFn() { state_ += 2; }
    private:
        int state_;
    };

    int _tmain(int argc, _TCHAR* argv[]) {
        Concrete* obj = new Concrete();
        obj->OneFn();
        obj->TwoFn();
        obj->ThreeFn();
        delete obj;
        return 0;
    }

Can you spot the problem? Do you know at which line it crashes, do you know why? If so I have wasted your time, apologies. If you are unsure then read on.

This program crashes when trying to call OneFn() with a purecall exception on debug build. On release build it exits with no error, but your mileage might vary depending on what optimizations are active.
The call stack for the crash is:

    msvcr80d.dll!__purecall() + 0x25                         <-- shows the dialog (debug only)
    app.exe!Base::ThreeFn() Line 16 + 0xfc                   <- error here
    app.exe!Base::~Base() Line 10 C++
    app.exe!Concrete::~Concrete() + 0x2b
    app.exe!Concrete::`scalar deleting destructor'() + 0x2b  <- delete obj

So as you have guessed it has to do with calling virtual functions from a destructor. What happens is that during construction an object evolves from the
[chromium-dev] Re: purecall exceptions and the manbearpig
purecall isn't called when an exception occurs. purecall actually throws the exception - or exits the program (by default the crt throws up a dialog and then abort()s).

in addition to cpu's email, raymond chen's article is a good (and short) read :)
http://blogs.msdn.com/oldnewthing/archive/2004/04/28/122037.aspx

On Fri, Apr 3, 2009 at 3:15 PM, Huan Ren hu...@google.com wrote:

Based on what I saw in the bug, it looks like an exception happening during the CALL instruction may lead to PureCall(). For example, an object obj has been freed and later on someone calls obj->func(). Then the assembly code looks like this:

    // ecx: pointer to obj which is in memory
    // [ecx]: supposed to be pointer to vtable, it has invalid value since obj is freed
    // edx: now has pointer to vtable, which is invalid
    mov edx,dword ptr [ecx]
    // deref the vtable and make the call
    call dword ptr [edx+4]

When a (hardware) exception happens during the call instruction, control will eventually be transferred to the routine handling this type of exception, which I *think* is PureCall().

Huan

On Fri, Apr 3, 2009 at 11:26 AM, Ricardo Vargas rvar...@chromium.org wrote:

I certainly don't want to imply that it is the case with this particular bug, but I have seen crashes when the cause of the problem is using an object that was previously deleted (and only end up with this exception when all the planets are properly aligned). I guess that it depends on the actual class hierarchy of the objects in question, but I'd think that simple examples end up on a lot of crashes right after the cl that exposes the problem.

On Fri, Apr 3, 2009 at 12:52 AM, Dean McNamee de...@chromium.org wrote:

You could, however, corrupt the vtable pointer (not the vtable). Say somehow 32 was added to it, now the table is misaligned, and you might get a purecall, etc. Not sure that's likely at all though. Since the vtable pointer is the first field, it seems ripe for problems w/ use after free, etc.
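For readers who want to see the two instructions in Huan's listing in source form, here is a hand-rolled sketch of virtual dispatch. It is a simplification under stated assumptions: Object, Method, kConcreteVtable and CallSlot are hypothetical names, and a real compiler's layout has more to it (for instance a destructor slot), but the "vtable pointer is the first field" idea Dean mentions is the same.

```cpp
#include <cassert>

// Hand-rolled stand-in for compiler-generated dispatch. All names are
// hypothetical and the layout is deliberately simplified.
struct Object;
using Method = int (*)(Object*);

struct Object {
  const Method* vtable;  // first field: pointer to the class's method table
  int state;
};

int OneFn(Object* self) { return self->state += 1; }
int TwoFn(Object* self) { return self->state += 2; }

// One table per class, shared by every instance of that class.
const Method kConcreteVtable[] = {OneFn, TwoFn};

// The two steps from the assembly listing: load the table pointer out
// of the object, then call indirectly through one of its slots.
int CallSlot(Object* obj, int slot) {
  const Method* table = obj->vtable;  // mov edx, dword ptr [ecx]
  return table[slot](obj);            // call dword ptr [edx+4] for slot 1 on 32-bit x86
}
```

If the object's memory has been freed and reused, the first load picks up whatever now sits in that first field, so the indirect call can land anywhere, including on the purecall stub if the stale pointer happens to refer to a base class table.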
[chromium-dev] Re: purecall exceptions and the manbearpig
Another simple(r) example:
http://msdn.microsoft.com/en-us/library/t296ys27(VS.80).aspx

But, as discussed in bug 8544, we've seen many purecall crashes happen that we don't think are related to virtual functions. The only thing I can think of is that the vtable is corrupted (overwritten or freed). Does it not make sense?

Nicolas

On Thu, Apr 2, 2009 at 1:54 PM, cpu c...@chromium.org wrote:

After reading some speculation in bugs such as http://code.google.com/p/chromium/issues/detail?id=8544 I felt compelled to dispel some myths and misunderstandings about the origin and meaning of the mythical _purecall_ exception. My hope is that then you can spot the problems in our source code and fix them. Sorry for the long post.

So first of all, what do you see when you get this error? If you are in a debug build and you are not eating the exceptions via some custom handler, you see this dialog:

    ---
    Debug Error!
    R6025 - pure virtual function call
    (Press Retry to debug the application)
    ---
    Abort Retry Ignore
    ---

For chrome/chromium we install a special handler, which forces a crash dump, in which case you'll see in the debugger analysis something like this:

    [chrome_dll_main.cc:100] - `anonymous namespace'::PureCall()
    [purevirt.c:47] - _purecall

Before going into too much detail, let me show you a small program that causes this exception:

    #include <tchar.h>  // _tmain, _TCHAR

    class Base {
    public:
        virtual ~Base() { ThreeFn(); }
        virtual void OneFn() = 0;
        virtual void TwoFn() = 0;
        void ThreeFn() { OneFn(); TwoFn(); }
    };

    class Concrete : public Base {
    public:
        Concrete() : state_(0) { }
        virtual void OneFn() { state_ += 1; }
        virtual void TwoFn() { state_ += 2; }
    private:
        int state_;
    };

    int _tmain(int argc, _TCHAR* argv[]) {
        Concrete* obj = new Concrete();
        obj->OneFn();
        obj->TwoFn();
        obj->ThreeFn();
        delete obj;
        return 0;
    }

Can you spot the problem? Do you know at which line it crashes, do you know why? If so I have wasted your time, apologies.
If you are unsure then read on.

This program crashes when trying to call OneFn() with a purecall exception on debug build. On release build it exits with no error, but your mileage might vary depending on what optimizations are active.

The call stack for the crash is:

    msvcr80d.dll!__purecall() + 0x25                         <-- shows the dialog (debug only)
    app.exe!Base::ThreeFn() Line 16 + 0xfc                   <- error here
    app.exe!Base::~Base() Line 10 C++
    app.exe!Concrete::~Concrete() + 0x2b
    app.exe!Concrete::`scalar deleting destructor'() + 0x2b  <- delete obj

So as you have guessed it has to do with calling virtual functions from a destructor. What happens is that during construction an object evolves from the earliest base class to the actual type, and during destruction the object devolves (is that a word?) from the actual object to the earliest base class. When we reach the ~Base() body the object is no longer of type Concrete but of type Base, and thus the call to Base::OneFn() is an error because that class does not in fact have any implementation.

What the compiler does is create two vtables. Vtable 1 is the vtable of Concrete and vtable 2 is the one the Base destructor switches to:

    vtable 1:
    [ 0 ] -> Concrete::OneFn()
    [ 1 ] -> Concrete::TwoFn()

    vtable 2:
    [ 0 ] -> msvcr80d.dll!__purecall()
    [ 1 ] -> msvcr80d.dll!__purecall()

The dtor of Concrete is the default dtor, which does nothing except call Base::~Base(), but the dtor of Base does:

    this->vtbl_ptr = vtable2
    call ThreeFn()

Now, why doesn't the release build crash? That's because the compiler does not bother with generating the second vtable, since after all it is not going to be used, and thus it also eliminates the related lines such as this->vtbl_ptr = vtable2. Therefore the object reaches the base dtor with the vtbl_ptr still pointing to vtable1, which makes the call to ThreeFn() just work. But that was just luck.
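The evolve/devolve behaviour can be observed without crashing. The sketch below is an illustration only, not code from the thread: it swaps the pure virtuals for a non-pure Name() so that calling it from the base constructor and destructor is well defined, and g_trace and RunTrace are hypothetical helpers added to record what each call reaches.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Records which override a virtual call reaches at each stage.
std::vector<std::string> g_trace;

class Base {
 public:
  Base() { g_trace.push_back("ctor sees " + Name()); }
  virtual ~Base() { g_trace.push_back("dtor sees " + Name()); }
  // Deliberately NOT pure, so calling it while the object is "only a
  // Base" is well defined instead of ending up in purecall.
  virtual std::string Name() const { return "Base"; }
};

class Concrete : public Base {
 public:
  std::string Name() const override { return "Concrete"; }
};

// Builds and destroys one Concrete, logging the dynamic type seen
// during construction, normal use, and destruction.
void RunTrace() {
  g_trace.clear();
  {
    Concrete obj;
    const Base& ref = obj;  // call through a base reference
    g_trace.push_back("live object is " + ref.Name());
  }  // ~Concrete() runs first, then ~Base()
}
```

Inside Base's constructor and destructor the call resolves to Base::Name(), never to Concrete::Name(). With a pure virtual in that position there is no Base implementation to resolve to, which is the case the purecall stub exists for.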
If you ever modify the base class, such as introducing a new virtual function that is not pure, like this:

    #include <stdio.h>  // wprintf

    class Base {
    public:
        virtual ~Base() { ThreeFn(); }
        virtual void OneFn() = 0;
        virtual void TwoFn() = 0;
        virtual void FourFn() {    // <--- new function, not pure virtual
            wprintf(L"aw snap");
        }
        void ThreeFn() { OneFn(); TwoFn(); }
    };
    // Same program below.
    // ...
    //

Then you are forcing the compiler to generate vtable 2, which looks like this:

    vtable 2:
    [ 0 ] -> msvcr80d.dll!__purecall()
    [ 1 ] -> msvcr80d.dll!__purecall()
    [ 2 ] -> Base::FourFn()

And now the purecall crash magically happens (on the same spot) on release builds, which is quite surprising since the trigger was the introduction of FourFn() which has _nothing_ to do with the crash