Re: nsh_fileapps and usage of sched_lock()
Hi, Thanks for the responses, they are very much appreciated. I suspected that the scheduler lock can be removed without too severe side effects (I made a PR of it already). I made a PR of removing it from nsh_fileapps already since I think that is potentially a more critical place to fix due to the file I/O. Loading the process into memory can take a relatively long time. I did notice that the builtin apps do the same but deemed that as not-so-critical because the builtins are already in memory. I did not immediately fix it due to regression mitigation but I can fix the builtin apps as well if you wish. Br, Ville Juven On Wed, Oct 25, 2023 at 5:55 PM Gregory Nutt wrote: > On 10/25/2023 8:48 AM, Gregory Nutt wrote: > > On 10/25/2023 8:18 AM, Alan C. Assis wrote: > >> On 10/25/23, Nathan Hartman wrote: > >>> On Wed, Oct 25, 2023 at 5:16 AM Ville Juven > wrote: > >>> > Hi all, > > I noticed that when spawning a new task/process from a file in > nsh_fileapps, the scheduler is locked prior to calling posix_spawn(), > which > does the file loading into memory etc. > > I noticed one issue with this; when the file size is large (in the > order > of > MB) the scheduler is locked for very long periods at a time, in the > order > of hundreds of milliseconds. > The same logic exists in apps/nshlib/nsh_builtin.c. In fact, it looks > like one was just cloned from the other. Both should behave the same way. >
Re: nsh_fileapps and usage of sched_lock()
On 10/25/2023 8:48 AM, Gregory Nutt wrote: On 10/25/2023 8:18 AM, Alan C. Assis wrote: On 10/25/23, Nathan Hartman wrote: On Wed, Oct 25, 2023 at 5:16 AM Ville Juven wrote: Hi all, I noticed that when spawning a new task/process from a file in nsh_fileapps, the scheduler is locked prior to calling posix_spawn(), which does the file loading into memory etc. I noticed one issue with this; when the file size is large (in the order of MB) the scheduler is locked for very long periods at a time, in the order of hundreds of milliseconds. The same logic exists in apps/nshlib/nsh_builtin.c. In fact, it looks like one was just cloned from the other. Both should behave the same way.
Re: nsh_fileapps and usage of sched_lock()
On 10/25/2023 8:18 AM, Alan C. Assis wrote: On 10/25/23, Nathan Hartman wrote: On Wed, Oct 25, 2023 at 5:16 AM Ville Juven wrote: Hi all, I noticed that when spawning a new task/process from a file in nsh_fileapps, the scheduler is locked prior to calling posix_spawn(), which does the file loading into memory etc. I noticed one issue with this; when the file size is large (in the order of MB) the scheduler is locked for very long periods at a time, in the order of hundreds of milliseconds. This sounds like a bug. The scheduler should not be locked during IO-bound operations, since there is no way to know how long they will take. Loading from flash could take hundreds of milliseconds (which is already terrible) but imagine a different scenario where loading from a network with connection problems outside of the device could lock the device for many seconds! If I understood this comment correctly: /* Lock the scheduler in an attempt to prevent the application from * running until waitpid() has been called. */ then maybe instead of forcing a sched_lock() we could change the task_state to TSTATE_TASK_INACTIVE or some other that prevent the task to be scheduled again before the posix_spawnp() get finished. BR, Alan I think that the sched_lock() is not necessary. Notice that the this is only "an attempt" to keep the application from running and executing. Without the sched_lock(), the task may run and exit before there is a change to call waitpid() which should effect only the user experience. A good test to make sure that still works would be remove the sched_lock/unlock and add a test case like: int main(int argc, char **argv) { return 0; } That is the case that the logic is trying to avoid, but it seems like it should work fine. Subsequent logic handles this case (but provides no use feedback). See comments: /* Wait for the application to exit. We did lock the scheduler * above, but that does not guarantee that the application did not * already run to completion in the case where I/O was redirected. * Here the scheduler will be unlocked while waitpid is waiting * and if the application has not yet run, it will now be able to * do so. * * NOTE: WUNTRACED does nothing in the default case, but in the * case the where CONFIG_SIG_SIGSTOP_ACTION=y, the file app * may also be stopped. In that case WUNTRACED will force * waitpid() to return with ECHILD. */ And /* If the child thread doesn't exist, waitpid() will return the * error ECHLD. Since we know that the task was successfully * started, this must be one of the cases described above; we * have to assume that the task already exit'ed. In this case, * we have no idea if the application ran successfully or not * (because NuttX does not retain exit status of child tasks). * Let's assume that is did run successfully. */ So it looks to me like the case where the program exits is already handled.
Re: nsh_fileapps and usage of sched_lock()
On 10/25/23, Nathan Hartman wrote: > On Wed, Oct 25, 2023 at 5:16 AM Ville Juven wrote: > >> Hi all, >> >> I noticed that when spawning a new task/process from a file in >> nsh_fileapps, the scheduler is locked prior to calling posix_spawn(), >> which >> does the file loading into memory etc. >> >> I noticed one issue with this; when the file size is large (in the order >> of >> MB) the scheduler is locked for very long periods at a time, in the order >> of hundreds of milliseconds. > > > > This sounds like a bug. The scheduler should not be locked during IO-bound > operations, since there is no way to know how long they will take. Loading > from flash could take hundreds of milliseconds (which is already terrible) > but imagine a different scenario where loading from a network with > connection problems outside of the device could lock the device for many > seconds! > If I understood this comment correctly: /* Lock the scheduler in an attempt to prevent the application from * running until waitpid() has been called. */ then maybe instead of forcing a sched_lock() we could change the task_state to TSTATE_TASK_INACTIVE or some other that prevent the task to be scheduled again before the posix_spawnp() get finished. BR, Alan
Re: nsh_fileapps and usage of sched_lock()
On Wed, Oct 25, 2023 at 5:16 AM Ville Juven wrote: > Hi all, > > I noticed that when spawning a new task/process from a file in > nsh_fileapps, the scheduler is locked prior to calling posix_spawn(), which > does the file loading into memory etc. > > I noticed one issue with this; when the file size is large (in the order of > MB) the scheduler is locked for very long periods at a time, in the order > of hundreds of milliseconds. This sounds like a bug. The scheduler should not be locked during IO-bound operations, since there is no way to know how long they will take. Loading from flash could take hundreds of milliseconds (which is already terrible) but imagine a different scenario where loading from a network with connection problems outside of the device could lock the device for many seconds! More below... So my question is, is the scheduler lock strictly necessary in this case ? > The only reason I could surmise from the comment below is that waitpid is > not performed on a stale pid (or even a possibly re-taken pid ?): > > /* Lock the scheduler in an attempt to prevent the application from >* running until waitpid() has been called. >*/ > > sched_lock(); > > A follow-up question obviously is, what happens if the scheduler lock is > removed ? Will something bad happen and if so, is there a way to mitigate > this (other than the sched_lock())? I have not studied the code but conceptually the file should be loaded (or other IO operations) and only when ready to perform scheduler operation the scheduler should be locked for the minimal length of time. More below... My reason for removing the scheduler lock is obviously that I lose all > real-time assurances of the OS when I'm starting a process, e.g. I start > losing sensor data during the process load time. This is exactly why the scheduler should not be locked for extended lengths of time. Nathan
nsh_fileapps and usage of sched_lock()
Hi all, I noticed that when spawning a new task/process from a file in nsh_fileapps, the scheduler is locked prior to calling posix_spawn(), which does the file loading into memory etc. I noticed one issue with this; when the file size is large (in the order of MB) the scheduler is locked for very long periods at a time, in the order of hundreds of milliseconds. So my question is, is the scheduler lock strictly necessary in this case ? The only reason I could surmise from the comment below is that waitpid is not performed on a stale pid (or even a possibly re-taken pid ?): /* Lock the scheduler in an attempt to prevent the application from * running until waitpid() has been called. */ sched_lock(); A follow-up question obviously is, what happens if the scheduler lock is removed ? Will something bad happen and if so, is there a way to mitigate this (other than the sched_lock())? My reason for removing the scheduler lock is obviously that I lose all real-time assurances of the OS when I'm starting a process, e.g. I start losing sensor data during the process load time. Any and all help on this is greatly appreciated. BR, Ville Juven / pussuw on github