Re: Thunderbird crashing when C_SignInit returns other than CKR_OK

2010-12-21 Thread Matej Kurpel

On 19. 12. 2010 9:27, Nelson Bolyard wrote:

On 2010-12-16 19:21 PDT, Marsh Ray wrote:

On 12/16/2010 04:39 PM, Matej Kurpel wrote:

ChildEBP RetAddr  Args to Child
0015f130 5fa0c52b e06d7363 0001 0003
KERNELBASE!RaiseException+0x58 (FPO: [Non-Fpo])
0015f168 5fa14f13 0015f178 5fa7aa24 5fa5c11c
MOZCRT19!_CxxThrowException+0x46 (FPO: [Non-Fpo]) (CONV: stdcall)
[f:\sp\vctools\crt_bld\self_x86\crt\prebuild\eh\throw.cpp @ 161]

So Mozilla builds its own CRT without FPO, cool.

Yes, Mozilla builds its own CRT, which is a modified version of the MSVC
CRT, whose sources come only with the paid (not free) versions of MSVC.
They do this in order to replace MSVC's normal heap code (malloc) with
their own jemalloc.

Mozilla's source repository doesn't include ANY of the MSVC source code,
but only includes an ed script that patches that source without including
any of it.  Sadly, this means that people with the free MSVC cannot build
MOZCRT19, because they lack the sources to be patched.  IMO, this is a
flaw for an open source project, but ...  :(


0015f180 003b474b 0028 0015f290 5f9ad1d9 MOZCRT19!operator new+0x73
(FPO: [1,3,0]) (CONV: cdecl)

The above function must be statically linked from the Microsoft CRT into
the Mozilla CRT. So it's still FPO. Weird.

Right.  IIRC, it's built from the plain old MSVC new.cpp source.
It calls malloc and throws an exception if malloc returns NULL.
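A minimal sketch of that malloc-then-throw behaviour, assuming nothing about the real MOZCRT19 or MSVC sources: `sketch_operator_new`, `use_malloc`, `always_null` and `throws_bad_alloc` are hypothetical names, and the pluggable allocator exists only so that a NULL return from malloc can be simulated on demand.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical allocator hook so a NULL return can be simulated.
using AllocFn = void* (*)(std::size_t);

void* use_malloc(std::size_t n) { return std::malloc(n); }
void* always_null(std::size_t) { return nullptr; }  // stands in for an exhausted or corrupted heap

// Sketch of the behaviour described above: ask the underlying allocator for
// memory and turn a NULL result into std::bad_alloc. This is NOT the MSVC
// new.cpp source (which, for instance, also gives any installed new-handler
// a chance to free memory first); it only shows the malloc-then-throw step.
void* sketch_operator_new(std::size_t size, AllocFn alloc = use_malloc) {
    if (size == 0) size = 1;  // operator new must return a distinct pointer even for 0 bytes
    void* p = alloc(size);
    if (p == nullptr) throw std::bad_alloc();
    return p;
}

// Returns true if the sketch threw, i.e. the situation seen in the trace.
bool throws_bad_alloc(std::size_t size, AllocFn alloc) {
    try {
        void* p = sketch_operator_new(size, alloc);
        std::free(p);
        return false;
    } catch (const std::bad_alloc&) {
        return true;
    }
}
```

With `always_null` plugged in, even the 40-byte request from the trace ends in `std::bad_alloc`, which has the same shape as the `_CxxThrowException` frame sitting above `operator new`.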


[e:\buildbot\win32_build_31\build\objdir-tb\mozilla\memory\jemalloc\crtsrc\new@61]

Looking at
http://mxr.mozilla.org/mozilla-central/source/memory/jemalloc/ I don't
see the source of crtsrc\new.cpp. It must be copied in from Microsoft
source code at build time.

Right.


In any case, 'operator new' is throwing a C++ exception. Ordinarily that
would be due to a bad parameter (e.g., -1) or lack of memory.

Right.  Any NULL return from malloc causes this.


In this case is it maybe asking for 0x0028 = 40 bytes?

I wouldn't bet much money that jemalloc never modifies its input
arguments.  That's always allowed in C (as you know), which always passes
arguments by value.
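Nelson's point, sketched with a hypothetical callee (`round_up_request` is made up and is not jemalloc code): the function receives its own copy of `size` and overwrites it, here by rounding up to a multiple of 16. The caller's variable never changes, but on x86 the callee's copy can live in the same stack slot the debugger shows under "Args to Child", so a dumped argument may already be the modified value.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical callee, NOT taken from jemalloc. `size` is a by-value copy,
// so overwriting it is legal C/C++ and invisible to the caller. In a crash
// dump, however, the stack slot holding this copy may show the rounded
// value instead of what the caller originally passed.
std::size_t round_up_request(std::size_t size) {
    size = (size + 15) & ~static_cast<std::size_t>(15);  // modify the parameter in place
    return size;
}
```

Called with 40 (0x28), it returns 48 while the caller's own variable still holds 40.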


0015f198 003b47db 09385800  003d3b55
thunderbird!nsDOMEvent::nsDOMEvent+0x63 (FPO: [Non-Fpo]) (CONV: thiscall)
[e:\buildbot\win32_build_31\build\mozilla\content\events\src\nsdomevent@136]

http://mxr.mozilla.org/mozilla-central/source/content/events/src/nsDOMEvent.cpp
Line 132 is in the middle of a comment, so clearly I'm not looking at
the right source. Below it is a 'new nsEvent'.

The sources from which Thunderbird is built come from Mozilla's
comm-central repository.  I think that line 136 could be either a
reference to the line on which the new call itself occurs, or the
following line.

The versions of the nsDOMEvent source in which the new call occurs on line
135 are dated 2009-04-02 14:34 -0500 ... 2009-06-30 10:56 +0300,
and those in which it occurs on line 136 are dated 2009-09-11 16:13 -0700 ... 2009-11-30 13:31 -0500,
all of which are over a year old now.
See
http://hg.mozilla.org/mozilla-central/log/90b17476216d/content/events/src/nsDOMEvent.cpp
and
http://hg.mozilla.org/mozilla-central/log/d9267e3d8f8c/content/events/src/nsDOMEvent.cpp
and
http://hg.mozilla.org/mozilla-central/annotate/9e7a2c507c41/content/events/src/nsDOMEvent.cpp#l136


But 'nsEvent' looks like it would take more than 40 bytes.

Yes.


So, skipping down a bit, it looks like something has already gone wrong
before this exception is thrown. The app is attempting to show an alert
box, which fails because of an out-of-memory condition.

Agreed.  Further back on the stack, we see:


nsMsgSendReport::DisplayReport+0x28c [nsmsgsendreport@428]
nsMsgComposeAndSend::Fail+0x73 [nsmsgsend@3812]
nsMsgComposeAndSend::GatherMimeAttachments+0x113d [nsmsgsend@1147]

That suggests that the attempt to generate and attach all the attachments
failed, and I'd guess that is likely due to Matej's intentional
introduction of a failure into C_SignInit.

So, C_SignInit failed, and then the attempt to report that failure in an
alert pop-up dialog fails due to heap allocation failure, perhaps due to
heap exhaustion, or heap corruption.


The details are probably not important.

Well, I think the big question is: why does the heap allocation fail?


You need to track down where the first error occurs.

My first wild guess is that Matej's PKCS#11 module is doing something bad
to the heap.  My second one is that NSS or PSM is trying to free to the
MOZCRT19 heap something that was allocated from another heap.
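One common defence against the second guess is to pair every allocation with a release function exported by the same module, so a pointer is never handed to a different copy of the C runtime for freeing. A minimal sketch with hypothetical names (`token_module`, `alloc_label`, `free_label`); in a single executable both calls reach the same heap, so it only shows the discipline, not the cross-heap failure itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical module API: whoever allocates a buffer also exports the
// function that frees it, so callers never release it through some other
// heap (for example, a different CRT such as MOZCRT19 vs. the MSVC one).
namespace token_module {

int live_allocations = 0;  // bookkeeping so leaks or double frees are visible

char* alloc_label(const char* text) {
    std::size_t n = std::strlen(text) + 1;
    char* p = static_cast<char*>(std::malloc(n));
    if (p != nullptr) {
        std::memcpy(p, text, n);  // n includes the terminating NUL
        ++live_allocations;
    }
    return p;
}

// The only correct way to release memory obtained from alloc_label().
void free_label(char* p) {
    if (p != nullptr) {
        std::free(p);
        --live_allocations;
    }
}

}  // namespace token_module
```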


How can I check whether I am doing something bad to the heap, please? Sadly,
I am not a very skilled C++ programmer (rather a noobish one), and I
mostly don't know about the internals you were talking about here...
Also, the code for C_SignInit is nearly the same as for C_DecryptInit,
which works fine. Plus, when I only return a non-CKR_OK error code from
C_SignInit (and do nothing else in it), it still crashes.
I would like to solve this problem very much. If I can be of more help -
if you need more info (or output from some more debugging programs),
just ask.
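For reference, the "return an error and do nothing else" experiment amounts to a stub like the one below. The typedefs are simplified stand-ins for the ones in the official PKCS#11 headers (a real module should include those headers and use their calling-convention and packing macros); only the values `CKR_OK` (0x0) and `CKR_FUNCTION_FAILED` (0x6) are taken from the standard.

```cpp
#include <cassert>

// Simplified stand-ins for pkcs11t.h definitions, so this compiles alone.
typedef unsigned long CK_ULONG;
typedef CK_ULONG CK_RV;
typedef CK_ULONG CK_SESSION_HANDLE;
typedef CK_ULONG CK_OBJECT_HANDLE;
typedef CK_ULONG CK_MECHANISM_TYPE;

struct CK_MECHANISM {
    CK_MECHANISM_TYPE mechanism;
    void* pParameter;
    CK_ULONG ulParameterLen;
};
typedef CK_MECHANISM* CK_MECHANISM_PTR;

const CK_RV CKR_OK = 0x00000000UL;
const CK_RV CKR_FUNCTION_FAILED = 0x00000006UL;

// The stub: touches no memory at all and simply reports failure.
extern "C" CK_RV C_SignInit(CK_SESSION_HANDLE hSession,
                           CK_MECHANISM_PTR pMechanism,
                           CK_OBJECT_HANDLE hKey) {
    (void)hSession;    // deliberately unused
    (void)pMechanism;
    (void)hKey;
    return CKR_FUNCTION_FAILED;
}
```

If Thunderbird still crashes with a stub like this, C_SignInit's own memory handling cannot be the cause, which points instead at the caller's error path or at heap damage done earlier by other parts of the module.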

Re: Thunderbird crashing when C_SignInit returns other than CKR_OK

2010-12-21 Thread Marsh Ray

On 12/21/2010 06:44 AM, Matej Kurpel wrote:


How can I check whether I am doing something bad to the heap, please? Sadly,
I am not a very skilled C++ programmer (rather a noobish one), and I
mostly don't know about the internals you were talking about here...


It's OK, everybody has to debug this problem occasionally.


Also, the code for C_SignInit is nearly the same as for C_DecryptInit,
which works fine. Plus, when I only return a non-CKR_OK error code from
C_SignInit (and do nothing else in it), it still crashes.


1. Go over all your code again and make sure nothing is writing past the
end of the memory you get from new/malloc, or that someone else gives you.
Search your code for 'memcpy' and friends; a bad parameter to those
functions can easily cause this. Also search for C-style casts of pointers
and for reinterpret_cast.


2. Make sure you don't pass a pointer to some object that remembers it
and then delete/free the pointer while that object is still using it.
Try simply commenting out every place where you manually free memory. It
will leak memory, but you might be able to figure out which free(s) cause
the crash that way.


3. See if you can reproduce the problem on Linux. Run it with Valgrind
and/or Electric Fence. These are similar to PageHeap; oftentimes open
source apps will already have a build configuration for that on Linux.


4. Test it with Microsoft's PageHeap tool. There is plenty of documentation
on it, and probably some forums that can help you with it. If that
doesn't find it right away, try rebuilding with the Release Microsoft C
Runtime library as discussed.
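Points 1 and 2 can be illustrated together for a PKCS#11-style module. In this sketch `SignSession`, `init` and `write_output` are hypothetical names; the in/out length argument is loosely modelled on the way PKCS#11 functions such as C_Sign report output sizes through a length pointer, but the sketch returns a plain bool rather than CK_RV codes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical per-session state of a PKCS#11-style module.
class SignSession {
public:
    // Point 2: the caller owns `param` (think pMechanism->pParameter) and may
    // free it as soon as the call returns, so keep a private copy of the
    // bytes instead of remembering the pointer.
    void init(const void* param, std::size_t len) {
        const unsigned char* p = static_cast<const unsigned char*>(param);
        if (p == nullptr) {
            params_.clear();
            return;
        }
        params_.assign(p, p + len);  // deep copy; nothing refers back to caller memory
    }

    // Point 1: never memcpy more than the caller's buffer can hold.
    // *out_len is the buffer capacity on input and the size written (or
    // needed) on output; false means nothing was written.
    static bool write_output(const unsigned char* data, std::size_t data_len,
                             unsigned char* out, std::size_t* out_len) {
        if (out_len == nullptr) return false;
        if (out == nullptr || *out_len < data_len) {
            *out_len = data_len;  // tell the caller how much room is needed
            return false;
        }
        std::memcpy(out, data, data_len);
        *out_len = data_len;
        return true;
    }

    const std::vector<unsigned char>& params() const { return params_; }

private:
    std::vector<unsigned char> params_;
};
```

Copying the parameter costs a few bytes per session, but it removes any dependence on how long the caller keeps its own buffer alive.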



I would like to solve this problem very much. If I can be of more help -
if you need more info (or output from some more debugging programs),
just ask.


You can do it.

- Marsh
--
dev-tech-crypto mailing list
dev-tech-crypto@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-tech-crypto