Re: Segmentation faults in wasm workers

'Sam Clegg' via emscripten-discuss Fri, 26 May 2023 13:47:16 -0700

Can I ask why you chose not to use pthreads to start with?  I'd like to
understand better why folks would choose wasm workers over pthreads.


On Fri, May 26, 2023 at 3:25 AM 'Dieter Weidenbrück' via emscripten-discuss
<emscripten-discuss@googlegroups.com> wrote:

> Hi Sam,
> IIRC, when I started with Emscripten a while ago, the program would abort
> when an allocation failed. As my app is comparable to a desktop app, this
> was not acceptable, so I set ABORTING_MALLOC to 0. I understand that this
> flag has a different meaning today. Here is how all my allocation calls
> work:
>
> /* Allocates size bytes preceded by a _Mem_T header that records the size;
>    *p receives the address just past the header. */
> Error_T allocMemPtr(MemPtr_T *p, uint32_T size, boolean_T clear) {
>     _MemPtr_T mp;
>
>     if (clear)
>         mp = (_MemPtr_T)calloc(1, size + sizeof(_Mem_T));
>     else
>         mp = (_MemPtr_T)malloc(size + sizeof(_Mem_T));
>     if (mp) {
>         mp->size = size;
>         *p = (MemPtr_T)((char_T*)mp + sizeof(_Mem_T));
>         return kErr_NoErr;
>     }
>     return kErr_MemErr;   /* *p is left unchanged on failure */
> }
>
> /* Resizes a block from allocMemPtr; _MP() recovers the header pointer. */
> Error_T setMemPtrSize(MemPtr_T *p, uint32_T size) {
>     _MemPtr_T m = _MP(*p);
>     MemPtr_T newPtr;
>
>     newPtr = realloc(m, size + sizeof(_Mem_T));
>     if (newPtr) {
>         m = (_MemPtr_T)newPtr;
>         m->size = size;
>         *p = (MemPtr_T)((char_T*)m + sizeof(_Mem_T));
>         return kErr_NoErr;
>     }
>     return kErr_MemErr;   /* the old block is still valid on failure */
> }
>
> So I should catch all errors. However, malloc and calloc never report a
> failure (i.e. never return NULL) while the problems occur: I added debug
> lines, but not a single failure was recorded. Removing ABORTING_MALLOC
> did not change the outcome either.
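The size-header technique used by allocMemPtr/setMemPtrSize can be reproduced outside the project with a short, self-contained sketch in standard C. `mem_header`, `hdr_alloc`, `hdr_size`, `hdr_resize` and `hdr_free` are hypothetical stand-ins for `_Mem_T`, `allocMemPtr` and `setMemPtrSize`, whose real type definitions are not shown in the thread. The overflow check and the `max_align_t` union are additions of this sketch, not part of the original code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical header placed in front of every user block (plays the role
   of the thread's _Mem_T). The union with max_align_t keeps the user data
   that follows it aligned as strictly as malloc's own result. */
typedef union {
    uint32_t size;       /* user-visible size in bytes */
    max_align_t align;   /* present only for alignment */
} mem_header;

/* Returns a pointer just past the header, or NULL if the allocation fails
   or size + header would overflow. Callers must check for NULL. */
static void *hdr_alloc(uint32_t size, int clear) {
    if ((size_t)size > SIZE_MAX - sizeof(mem_header))
        return NULL;
    size_t total = sizeof(mem_header) + (size_t)size;
    mem_header *h = clear ? calloc(1, total) : malloc(total);
    if (h == NULL)
        return NULL;
    h->size = size;
    return h + 1;   /* user data starts right after the header */
}

/* Reads the size recorded in the header of a block from hdr_alloc. */
static uint32_t hdr_size(const void *p) {
    assert(p != NULL);
    return ((const mem_header *)p - 1)->size;
}

/* Resizes the block *p. On failure returns -1 and leaves *p pointing at
   the old block, which realloc keeps valid. */
static int hdr_resize(void **p, uint32_t size) {
    if ((size_t)size > SIZE_MAX - sizeof(mem_header))
        return -1;
    mem_header *old = (mem_header *)*p - 1;
    mem_header *h = realloc(old, sizeof(mem_header) + (size_t)size);
    if (h == NULL)
        return -1;
    h->size = size;
    *p = h + 1;
    return 0;
}

/* Frees a block from hdr_alloc; NULL is ignored, like free(NULL). */
static void hdr_free(void *p) {
    if (p != NULL)
        free((mem_header *)p - 1);
}
```

The alignment union matters because a small header would otherwise shift every returned pointer off malloc's alignment; whether that applies to the real `_Mem_T` depends on its definition, which the thread does not show.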
>
> I see two different behaviors now:
> - setting up workers and checking that they are running with
> static void startUpWorker(void) {
> #ifdef __EMSCRIPTEN__
>     int32_T w = emscripten_wasm_worker_self_id();
>     if (!emscripten_current_thread_is_wasm_worker()) {
>         EM_ASM_({
>             console.log("Error: No worker: " + $0);
>         }, w);
>     }
> #endif //__EMSCRIPTEN__
> }
> - then I do my processing and get about 10 "Uncaught RuntimeError:
> memory access out of bounds" errors
> - no malloc/calloc failures are detected
>
> The second behavior is
> - in main() I call this routine:
> static void memtest(void) {
> #define NUM_CHUNKS  15
>     const int CHUNK_SIZE = 100 * 1024 * 1024;
>     /* Start with every slot NULL, so the cleanup loop below never reads
>        an uninitialized pointer after an early break. */
>     void* p[NUM_CHUNKS] = { 0 };
>     Error_T err = kErr_NoErr;
>
>     for (int i = 0; i < NUM_CHUNKS; i++) {
>         err = allocMemPtr(&p[i], CHUNK_SIZE, FALSE); //see function above
>         if (err != kErr_NoErr || p[i] == NULLPTR) {
>             printf("Error chunk %d\n", i);
>             break;
>         }
>     }
>     for (int i = 0; i < NUM_CHUNKS; i++) {
>         if (p[i] == NULLPTR)
>             break;
>         disposeMemPtr(p[i]);
>     }
> }
> - then I start up the workers as described above
> - then I do my processing
> - sometimes this runs without errors, but not always. If an error
> occurs, I only get one "Uncaught RuntimeError" message.
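For scale, the memtest routine requests NUM_CHUNKS × CHUNK_SIZE bytes in total, not counting the 15 `_Mem_T` headers:

$$15 \times 100 \times 2^{20}\ \text{B} = 1\,572\,864\,000\ \text{B} \approx 1.46\ \text{GiB}$$

That is still below the 2 GB figure ($2^{31}\ \text{B} = 2\,147\,483\,648\ \text{B}$) that Sam mentions further down in the quoted thread in connection with MAXIMUM_MEMORY.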
>
> I am pretty confident that I handle memory allocation correctly: I have
> been developing desktop apps in C for 30+ years, and there you had better
> not leak memory, and you keep the app running whenever possible. So I
> must be doing something wrong when dealing with multiple threads.
> I will try out pthreads next, because I have run out of ideas about what
> the cause could be.
>
> Cheers,
> Dieter
> On Thursday, 25 May 2023 at 23:20:33 UTC+2, s...@google.com wrote:
>
>> Is there some reason you added `-sABORTING_MALLOC=0`? That looks a
>> little suspicious, since it means the program can continue after malloc
>> fails, which means that any callsite that doesn't check the return value
>> of malloc can lead to segfaults. If you remove that setting, does the
>> behaviour change?
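The risk Sam describes comes from a NULL result being used without a check. One common way to contain it (an assumption here, not something proposed in the thread) is to keep checked calls like allocMemPtr as they are and route every other call site through a fail-fast wrapper that reports where the allocation failed and aborts. `xmalloc_at` and `XMALLOC` are hypothetical names for this sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical fail-fast allocator: reports the failing call site and
   aborts, instead of returning NULL to a caller that may not check it. */
static void *xmalloc_at(size_t size, const char *file, int line) {
    void *p = malloc(size);
    if (p == NULL && size != 0) {
        fprintf(stderr, "%s:%d: malloc(%zu) failed\n", file, line, size);
        abort();
    }
    return p;
}

/* Records the caller's file and line automatically. */
#define XMALLOC(size) xmalloc_at((size), __FILE__, __LINE__)
```

With this split, a failed allocation in unchecked code stops at the call site with a message, rather than surfacing later as an unrelated out-of-bounds error.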
>>
>>
>>
>> On Thu, May 25, 2023 at 1:27 PM 'Dieter Weidenbrück' via
>> emscripten-discuss <emscripte...@googlegroups.com> wrote:
>>
>>> Hi Sam,
>>>
>>> I can run the code in a single thread without problems, and I have done
>>> that for a while. So I assume that the code is stable.
>>>
>>> Here is the command line I use in a .bat file:
>>> emcc ./src/main.c ^
>>> ...
>>> ./src/w_com.c ^
>>> -I ./include/ ^
>>> -g3 ^
>>> --source-map-base ./ ^
>>> -gsource-map ^
>>> -s ALLOW_MEMORY_GROWTH=1 ^
>>> -s ENVIRONMENT=web,worker ^
>>> --shell-file ./index_template.html ^
>>> -s SUPPORT_ERRNO=0 ^
>>> -s MODULARIZE=1 ^
>>> -s ABORTING_MALLOC=0 ^
>>> -sWASM_WORKERS ^
>>> -s "EXPORT_NAME='wasmMod'" ^
>>> -s EXPORTED_FUNCTIONS="['_malloc','_free','_main']" ^
>>> -s EXPORTED_RUNTIME_METHODS="['cwrap','UTF16ToString','UTF8ToString','stringToUTF8','allocateUTF8']" ^
>>> -o index.html
>>>
>>> I will start familiarizing myself with pthreads to test whether that
>>> would work better.
>>>
>>> BTW, as an old C programmer I am fascinated by emscripten and its
>>> possibilities. Excellent job!
>>>
>>> Cheers,
>>> Dieter
>>>
>>> On Thursday, 25 May 2023 at 20:29:58 UTC+2, s...@google.com wrote:
>>>
>>>> This looks like some kind of memory corruption, most likely due to the
>>>> use of multithreading/wasm_workers. Are you able to build a
>>>> single-threaded version of your program, or one that uses normal
>>>> pthreads rather than wasm workers?
>>>>
>>>> Also, can you share the full link command you are using?
>>>>
>>>> cheers,
>>>> sam
>>>>
>>>> On Thu, May 25, 2023 at 9:20 AM 'Dieter Weidenbrück' via
>>>> emscripten-discuss <emscripte...@googlegroups.com> wrote:
>>>>
>>>>> This is a memory snapshot when using SAFE_HEAP. Here I am well below
>>>>> the browser limits, yet the segfault still occurs, in different places.
>>>>> Ignore the first console line; I think it comes from Norton
>>>>> Utilities.
>>>>>
>>>>> [image: error2.png]
>>>>>
>>>>> On Thursday, 25 May 2023 at 18:06:27 UTC+2, Dieter Weidenbrück wrote:
>>>>>
>>>>>> Hi Sam,
>>>>>> I had already noticed that I am bumping against browser limits,
>>>>>> especially with the sanitizer switched on, so I reduced the
>>>>>> pre-allocation calls.
>>>>>> It turns out that ASan uses so much memory that I can't use it to
>>>>>> analyze this case.
>>>>>>
>>>>>> I use
>>>>>> -s ALLOW_MEMORY_GROWTH=1
>>>>>> but don't specify any MAXIMUM_MEMORY.
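A hedged sketch of how the two memory settings under discussion could be written in the same .bat style. The values are illustrative assumptions, not tested recommendations. INITIAL_MEMORY sizes the heap at startup, which gives an effect similar to the allocate-and-free workaround from the first message (as long as total usage stays below that size, no growth has to happen while workers run). MAXIMUM_MEMORY raises the growth ceiling above Emscripten's 2 GB default. INITIAL_MEMORY must be a multiple of the 64 KiB wasm page size: 2013265920 bytes is 1920 MiB (30720 pages), and 4294967296 bytes is 4 GiB, the wasm32 limit.

```bat
-s INITIAL_MEMORY=2013265920 ^
-s ALLOW_MEMORY_GROWTH=1 ^
-s MAXIMUM_MEMORY=4294967296 ^
```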
>>>>>>
>>>>>> No pthreads version so far. I might try this next.
>>>>>>
>>>>>> Cheers,
>>>>>> Dieter
>>>>>>
>>>>>> On Thursday, 25 May 2023 at 17:55:41 UTC+2, s...@google.com wrote:
>>>>>>
>>>>>>> Firstly, if you are allocating 1.8GB you are likely pushing up
>>>>>>> against browser limits. Are you specifying a MAXIMUM_MEMORY larger
>>>>>>> than 2GB?
>>>>>>>
>>>>>>> Secondly, it looks like you are using wasm workers, which are still
>>>>>>> relatively new. Do you have a version of your code that uses pthreads
>>>>>>> instead? It might tell us if the issue is related to wasm workers.
>>>>>>>
>>>>>>> cheers,
>>>>>>> sam
>>>>>>>
>>>>>>> On Thu, May 25, 2023 at 8:06 AM 'Dieter Weidenbrück' via
>>>>>>> emscripten-discuss <emscripte...@googlegroups.com> wrote:
>>>>>>>
>>>>>>>> The joy was premature: even with a pre-allocated heap, segfaults
>>>>>>>> still occur. :(
>>>>>>>>
>>>>>>>> On Thursday, 25 May 2023 at 16:28:37 UTC+2, Dieter Weidenbrück wrote:
>>>>>>>>
>>>>>>>>> All,
>>>>>>>>> I am experiencing segmentation faults when using wasm workers.
>>>>>>>>> Overview:
>>>>>>>>> I am working on a project with large 3D data sets. The code has been
>>>>>>>>> stable for a while when running in the main thread alone. Then I
>>>>>>>>> started using JS workers (no shared memory), and again all was well.
>>>>>>>>> Now I've switched to SharedArrayBuffers and wasm workers, and I keep
>>>>>>>>> running into random problems.
>>>>>>>>> I have prepared the code so that it can run with anywhere from 0
>>>>>>>>> workers up to navigator.hardwareConcurrency workers. All is well with
>>>>>>>>> 0 workers, but as soon as I use one or more workers, I keep getting
>>>>>>>>> segfaults because of invalid pointers, out-of-bounds accesses and
>>>>>>>>> similar errors.
>>>>>>>>>
>>>>>>>>> What happens in the main thread and what in the wasm workers:
>>>>>>>>> I allocate all objects in the main thread when importing the 3D
>>>>>>>>> file. Then I fire off a function for each object that does some
>>>>>>>>> serious calculations on the data, including allocating and disposing
>>>>>>>>> of memory. The workers allocate approx. 300 to 400 MB in addition to
>>>>>>>>> what the main thread uses. All this happens in the same
>>>>>>>>> SharedArrayBuffer, of course.
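Since pthreads come up repeatedly as the alternative, here is a minimal sketch in portable C of the work split described above, using POSIX threads rather than wasm workers. The main thread owns an array of objects, and each thread claims the next unprocessed object through an atomic counter and allocates and frees its own scratch memory. `object_t`, `process_object`, `worker` and `run_jobs` are hypothetical names, and the calculation is a placeholder sum. Under Emscripten such code is built with `-pthread`, and blocking calls like `pthread_join` on the browser main thread are best avoided; both points are worth checking against the Emscripten pthreads documentation.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_THREADS 16

/* Hypothetical per-object record, allocated by the main thread. */
typedef struct {
    size_t n;             /* number of input values */
    const double *data;   /* input values (read-only while threads run) */
    double result;        /* written only by the thread that claims it */
} object_t;

/* Shared by all threads: the object array and the next unclaimed index. */
typedef struct {
    object_t *objs;
    size_t count;
    atomic_size_t next;
} job_queue;

/* Stand-in for the real calculations: allocates scratch memory, uses it
   and frees it again, as the workers in this thread do. */
static int process_object(object_t *o) {
    double *scratch = malloc(o->n * sizeof *scratch);
    if (scratch == NULL && o->n != 0)
        return -1;
    double sum = 0.0;
    for (size_t i = 0; i < o->n; i++) {
        scratch[i] = 2.0 * o->data[i];
        sum += scratch[i];
    }
    free(scratch);
    o->result = sum;
    return 0;
}

/* Thread body: claims objects one at a time until none are left. */
static void *worker(void *arg) {
    job_queue *q = arg;
    for (;;) {
        size_t i = atomic_fetch_add(&q->next, 1);
        if (i >= q->count)
            break;
        if (process_object(&q->objs[i]) != 0)
            q->objs[i].result = -1.0;   /* mark allocation failure */
    }
    return NULL;
}

/* Processes all objects on nthreads threads (1..MAX_THREADS) and waits
   for them. Returns 0 on success, -1 on a bad count or if a thread could
   not be started. */
static int run_jobs(object_t *objs, size_t count, int nthreads) {
    pthread_t tids[MAX_THREADS];
    job_queue q = { .objs = objs, .count = count };
    atomic_init(&q.next, 0);
    if (nthreads < 1 || nthreads > MAX_THREADS)
        return -1;
    int started = 0;
    while (started < nthreads &&
           pthread_create(&tids[started], NULL, worker, &q) == 0)
        started++;
    for (int t = 0; t < started; t++)
        pthread_join(tids[t], NULL);
    return started == nthreads ? 0 : -1;
}
```

The atomic counter hands out each index exactly once, so no two threads ever write the same `object_t`; apart from that counter, all shared state is read-only while the threads run.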
>>>>>>>>>
>>>>>>>>> Here is what I've tried so far:
>>>>>>>>> - compiling with SAFE_HEAP=1:
>>>>>>>>> not a lot of helpful information
>>>>>>>>> - compiling with -fsanitize=address:
>>>>>>>>> everything works without problems here!
>>>>>>>>> - compiling with ASSERTIONS=2:
>>>>>>>>> gave me this information:
>>>>>>>>> [image: error.png]
>>>>>>>>>
>>>>>>>>> To me it looks as if another resize call is executed while other
>>>>>>>>> workers keep working on the buffer, and then something conflicts.
>>>>>>>>> To test this, I allocated 1.8 GB in the main thread right after
>>>>>>>>> startup and disposed of the memory blocks again, just to trigger the
>>>>>>>>> heap resize. After that, everything works like a charm.
>>>>>>>>>
>>>>>>>>> Is there anything I am doing wrong?
>>>>>>>>> Sorry for not providing a sample, but there is a lot of code
>>>>>>>>> involved, and it is not easy to simulate this behavior. Happy to
>>>>>>>>> answer questions.
>>>>>>>>>
>>>>>>>>> All comments are appreciated.
>>>>>>>>> Thanks,
>>>>>>>>> Dieter
>>>>>>>>>
>>>>>>>> --
>>>>>>>> You received this message because you are subscribed to the Google
>>>>>>>> Groups "emscripten-discuss" group.
>>>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>>>> send an email to emscripten-disc...@googlegroups.com.
>>>>>>>> To view this discussion on the web visit
>>>>>>>> https://groups.google.com/d/msgid/emscripten-discuss/80d56314-59d8-4332-bb2e-ebe00fe52ea3n%40googlegroups.com
>>>>>>>> <https://groups.google.com/d/msgid/emscripten-discuss/80d56314-59d8-4332-bb2e-ebe00fe52ea3n%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>>>>> .
>>>>>>>>
