[JPR]

Hi Dave, Tim,

This kind of crash is always difficult to track down because it is not easily 
reproducible. From what I see (and as Tim pointed out), there seems to be a 
memory problem that shows up in the LabProjects List process. But a memory 
problem can occur well before the actual crash, because the application may 
have corrupted memory and not be aware of it until the crash.

- Is your application compiled? If yes, be sure that the Range checking option 
is set.
- Is the LabProjects List process a client process on the server, or a worker 
or process running on the server?
- The time of the crash seems irrelevant, but maybe it's linked to a peak in 
activity and server or network stress?
- A client problem causing a server crash is unlikely, but it may help to know 
if there is a correlation between the crash and a particular client doing a 
particular operation.
- Do you know which method is executed when it crashes?
- Do you use interprocess variables, such as arrays?
- How much memory has been given to the server and to the cache?

This is just a short list of points to check, but it may help to reduce the 
problem to a small part of the application.

My very best,

JPR


> On 2 Sep 2018, at 21:00, 4d_tech-requ...@lists.4d.com wrote:
> 
> From: Tim Nevels <timnev...@mac.com>
> To: 4d_tech@lists.4d.com
> Subject: Re: Isolating the Cause of a Server Crash
> Message-ID: <be3bf13d-9f79-4715-aadf-240c4c189...@mac.com>
> Content-Type: text/plain; charset=utf-8
> 
> On Sep 1, 2018, at 2:00 PM, Dave Nasralla wrote:
> 
>> One of our systems is crashing about every 3 days and I can't seem to
>> isolate the cause. Lately these are crashes with a Mac crash report
>> appearing on the screen.
>> Some system details are:
>> - 4D Built Server app with v17.0 HF1 (64 bit Server with 64 bit Mac and
>> 32 bit Windows Clients)
>> - Mac and Windows Clients
>> - Mac OS 10.13.5
>> 
>> What I know so far:
>> - I have the Server Debug file. It ends with a "." and so the last
>> command appears to have executed.
>> - I'm using the Report Info component, logging every 5 minutes. There
>> don't seem to be memory problems or runaway cache issues.
>> - I also know who was on each time it crashed and sent out an email
>> to those users to find patterns (so far I've found none).
>> - The crashes typically happen around 10am to 11am.
>> - The client and server builds match.
>> 
>> I'm debating turning on the client debugger files and then harvesting
>> them afterwards when the user logs back in. I'm open to other
>> debugging techniques.
>> 
>> There are other v17 systems running on the same machine with zero issues.
>> 
>> Below is a snippet of the crash report. It seems to be different each
>> time, but here is the latest. Thread 73 crashed, so I only included
>> that one.
>> 
>> Thanks,
>> 
>> dave nasralla
>> ------------------------------------
>> Process:               Corporate [93958]
>> Path:                  /Users/USER/*/Corporate Server.app/Contents/MacOS/Corporate
>> Identifier:            4d.com.Corporate Server.app
>> Version:               17.0 build 17.226566 (???)
>> Code Type:             X86-64 (Native)
>> Parent Process:        ??? [1]
>> Responsible:           Corporate [93958]
>> User ID:               501
>> 
>> Date/Time:             2018-08-31 11:00:05.952 -0500
>> OS Version:            Mac OS X 10.13.5 (17F77)
>> Report Version:        12
>> Anonymous UUID:        723511FD-4CA0-6E8B-0642-883209248DFC
>> 
>> 
>> Time Awake Since Boot: 3700000 seconds
>> 
>> System Integrity Protection: enabled
>> 
>> Crashed Thread:        73  LabProjects List (id = -114)
>> 
>> Exception Type:        EXC_BAD_ACCESS (SIGSEGV)
>> Exception Codes:       EXC_I386_GPFLT
>> Exception Note:        EXC_CORPSE_NOTIFY
>> 
>> Termination Signal:    Segmentation fault: 11
>> Termination Reason:    Namespace SIGNAL, Code 0xb
>> Terminating Process:   exc handler [0]
>> ----------------------------------------------------------
>> 
>> 
>> Thread 73 Crashed:: LabProjects List (id = -114)
>> 0   4d.com.Corporate Server.app       0x000000010694fdbe V4DConnection::OnPostpone(bool) + 40
>> 1   4d.com.Corporate Server.app       0x0000000106b095f7 V4DServerUser::PostponeServiceConnection() + 35
>> 2   4d.com.Corporate Server.app       0x0000000106b20567 V4DServer::exec_ConnectionPostpone(V4DRequestReply&, V4DTaskConcrete*, short) + 395
>> 3   4d.com.Corporate Server.app       0x0000000106b211ca V4DServer::exec_streamreq(V4DRequestReply&, V4DTaskConcrete*) + 100
> 
> Hi Dave,
> 
> Crashing every 3 days is a real problem and totally unacceptable. So what can 
> be done to try and make this situation better? We need to make changes to 
> make this crashing stop. But what changes? 
> 
> Here is my thinking as I read this crash report. Keep in mind I’m not an 
> expert on this, so I may be wrong in some areas. If I am wrong hopefully 
> those that know more can correct me — and in turn help me and others 
> understand more about how to read these macOS crash reports. (Thinking about 
> Miyako, JPR, Christian Sakowski and Rob Laveaux — they are real experts in 
> this area. Real macOS programmers that know how to read these things 
> properly.)
> 
> The crash report is supposed to provide a programmer with information on 
> exactly where the program crashed and the cause of the crash. If you have the 
> special 4D “debug” version it will contain more “symbols” and thus when 4D 
> crashes you get better names for functions instead of just memory address 
> offsets. I think you even get 4D command names that were involved in the 
> crash. But the basic crash dump info that we have here can help point to the 
> general area of concern. Here is a website that helps explain crash dumps and 
> how to read them: 
> 
> https://www.maketecheasier.com/read-macos-crash-reports-troubleshoot-mac/
> 
> This is 4D v17.0 build 226566 that is running compiled in 64bit mode (Code 
> Type: x86-64). So first thought is that this could be a 4D 64bit issue. 
> That’s important because some of the code is completely different between 
> 32bit 4D and 64bit 4D. The 64bit code could be newly written code, the 32bit 
> code could be legacy code that has been around for years. 
> 
> Thread 73 “LabProjects List” is what crashed. Do you have a table named 
> “LabProjects” or maybe a MODIFY SELECTION or a listbox window that shows 
> records in this table? Or a process that has that name? Makes me think that 
> you do. That’s another pointer to where in your application the crashing 
> problem occurred.
> 
> Exception Type is “EXC_BAD_ACCESS (SIGSEGV)” and that means “the program 
> attempts to access memory incorrectly or with an invalid address”. Could be a 
> C pointer that went bad or something to do with virtual memory or even how 4D 
> allocates its own memory internally. Could be 4D data cache related. 
> Basically 4D tried to access memory it was not allowed to access and macOS 
> killed 4D so that it could not damage other parts of the system and cause 
> them to crash. Thank you macOS for watching out and protecting us from 
> complete system corruption and crashing. Windows does this too.
> 
> The last area shows exactly where in 4D, down to the 4D C or Objective C 
> function name, the code was running when macOS said “enough, this 
> application has gone crazy, I need to kill it before it does damage to other 
> applications.” The functions are listed in reverse chronological order, so 
> the one at the bottom is where the “call chain” started. The one at the top 
> is where it died.
> 
> The function name is “V4DConnection::OnPostpone(bool)” and the code 40 
> bytes from the start of that function is where the offending memory access 
> occurred. The name “V4DConnection” makes me think this is related 
> to networking, 4D Server handling network actions with 4D Client. The 
> “OnPostpone” makes me think this is somehow related to sleeping or a 4D 
> Client connection that has been asleep and needs to now wake up. And lastly 
> it makes me think “this is related to the new network layer code”. Again, this 
> is just my thinking. I could be completely wrong about all of this. 
> 
> So now my brain tries to build a scenario that could most likely happen that 
> could be connected to this situation. Happens during the day between 10am and 
> 11am. It’s a work day with users connected. People came in to work, got 
> connected to 4D Server, then wandered off to a meeting or something and their 
> computer went to sleep. You are using 4D Server compiled 64bit so you MUST be 
> using the new network layer. Legacy is only available in 32bit compiled 4D 
> Server macOS. 
> 
> There is this new network layer feature where if a 4D Client machine goes 
> into sleep mode you don’t lose your 4D Server connection. So that when the 
> user wakes up the 4D Client machine it notifies 4D Server and the old network 
> connection is reenergized and brought back to life. That “OnPostpone” mention 
> above makes me think this also. Maybe something went wrong in that area of 
> 4D. It is a tricky area because sleep could last for hours or days and memory 
> could be moved around and pointers can easily go bad in those types of 
> situations. 
> 
> So there is my analysis. Now what changes could you make to stop these damn 
> crashing situations? Here are some ideas:
> 
> - You say it happens about every 3 days, so just restart 4D Server every 
> single day. Giant PITA I know. But just an idea for what to do now to 
> eliminate the crashing. 
> 
> - Stop all 4D Client machines from sleeping. You’d have to physically go to 
> every machine and turn off system sleep and allow only the display to go to 
> sleep. You can’t rely on users to do this, and do it right. This is what I 
> would do, if I had physical access to all the machines, or at least RDP 
> access, so that I could make sure every machine had system sleep turned off. 
> (Of course you already have App Nap turned off on the 4D Server machine 
> so that’s not part of this issue, right?)
> 
> - Crash dump lists Build Number 226566. v17.0 has build 225365. v17.0 HF1 has 
> build 226237. A quick check of 4D forums “Nightly Builds 4D v17” shows this 
> build is from 8/22/18. So you are running a nightly build. I’m guessing you 
> used v17.0 and had problems, went to v17.0 HF1 and still had problems, so you 
> went to nightly builds to try and find a fix. Maybe you keep doing that. 
> Current nightly build is 226837. You may find they’ve fixed the bug that is 
> biting you. 
> 
> - Stop using the new network layer. You would have to stop using 64bit 4D 
> Server so that may not be a viable option. You would be limited to a 2GB data 
> cache. But maybe if you can stop the crashing now it’s worth that limitation. 
> That means compiling a 32bit version of 4D Server and 4D Client, and 
> replacing all the 64bit 4D Client applications with the 32bit version. I 
> think you could use the auto client update feature to automate this. 
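
The way of reading the crashed thread described in the quoted message (frames 
listed most recent first, with the byte offset into the function after the 
"+") can be sketched in Python. This is only an illustration: the names Frame, 
FRAME_RE and parse_frames are hypothetical, and it assumes each frame sits on 
one unwrapped line, as in the original crash report file rather than the 
email-wrapped copy.

```python
import re
from typing import List, NamedTuple


class Frame(NamedTuple):
    index: int    # 0 is the frame that was executing when the thread crashed
    image: str    # binary image that contains the code
    address: int  # address recorded for this frame
    symbol: str   # function name
    offset: int   # bytes from the start of the function


# frame number, image name (may contain spaces), hex address, symbol, "+ offset"
FRAME_RE = re.compile(
    r"^(\d+)\s+(.+?)\s+(0x[0-9a-fA-F]+)\s+(.+?)\s+\+\s+(\d+)$"
)


def parse_frames(text: str) -> List[Frame]:
    """Return the stack frames found in text, in the order they are listed."""
    frames = []
    for line in text.splitlines():
        m = FRAME_RE.match(line.strip())
        if m:
            frames.append(Frame(int(m.group(1)), m.group(2),
                                int(m.group(3), 16), m.group(4),
                                int(m.group(5))))
    return frames


# The four frames of thread 73 from the report quoted above, unwrapped.
SAMPLE = """\
0   4d.com.Corporate Server.app       0x000000010694fdbe V4DConnection::OnPostpone(bool) + 40
1   4d.com.Corporate Server.app       0x0000000106b095f7 V4DServerUser::PostponeServiceConnection() + 35
2   4d.com.Corporate Server.app       0x0000000106b20567 V4DServer::exec_ConnectionPostpone(V4DRequestReply&, V4DTaskConcrete*, short) + 395
3   4d.com.Corporate Server.app       0x0000000106b211ca V4DServer::exec_streamreq(V4DRequestReply&, V4DTaskConcrete*) + 100
"""

frames = parse_frames(SAMPLE)
top = frames[0]
print(f"Crashed in {top.symbol}, {top.offset} bytes into the function")
print("Call chain, oldest call first:")
for frame in reversed(frames):
    print("  ", frame.symbol)
```

Running the same parser over the crash reports from several days would show 
whether the top frame is always in the same function, which may help to 
narrow the problem down.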
