On Fri, Mar 31, 2017 at 8:29 AM, Kuntal Ghosh <kuntalghosh.2...@gmail.com> wrote:
> On Fri, Mar 31, 2017 at 2:05 AM, Kuntal Ghosh
> <kuntalghosh.2...@gmail.com> wrote:
> >
> > 1. Put an Assert(0) in ParallelQueryMain(), start server and execute
> > any parallel query.
> > In LaunchParallelWorkers, you can see
> >     nworkers = n nworkers_launched = n (n>0)
> > But, all the workers will crash because of the assert statement.
> > 2. the server restarts automatically, initialize
> > BackgroundWorkerData->parallel_register_count and
> > BackgroundWorkerData->parallel_terminate_count in the shared memory.
> > After that, it calls ForgetBackgroundWorker and it increments
> > parallel_terminate_count. In LaunchParallelWorkers, we have the
> > following condition:
> > if ((BackgroundWorkerData->parallel_register_count -
> >      BackgroundWorkerData->parallel_terminate_count) >=
> >     max_parallel_workers)
> > DO NOT launch any parallel worker.
> > Hence, nworkers = n nworkers_launched = 0.
> parallel_register_count and parallel_terminate_count, both are
> unsigned integer. So, whenever the difference is negative, it'll be a
> well-defined unsigned integer and certainly much larger than
> max_parallel_workers. Hence, no workers will be launched. I've
> attached a patch to fix this.

The current comment describing the number of active parallel workers is:

 * The active
 * number of parallel workers is the number of registered workers minus the
 * terminated ones.

In situations like the one you mentioned above, this formula can yield a
negative number of active parallel workers, which does not make any
sense. I feel it would be better to explain in the code comments in
which situations the formula can produce a negative result and what
that means.

Regards,
Neha
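
For illustration, here is a minimal standalone sketch of the wraparound described in the quoted text. It is not the PostgreSQL source: the struct WorkerCounts and the functions active_workers() and limit_reached() are hypothetical stand-ins, and the counters are assumed to be 32-bit unsigned. Only the two counter names and max_parallel_workers come from the thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two shared-memory counters
 * (assumed here to be 32-bit unsigned integers). */
typedef struct
{
    uint32_t parallel_register_count;
    uint32_t parallel_terminate_count;
} WorkerCounts;

/* "Registered minus terminated". Unsigned subtraction cannot go
 * negative: if terminate > register, the result wraps modulo 2^32. */
static uint32_t
active_workers(const WorkerCounts *c)
{
    return (uint32_t) (c->parallel_register_count -
                       c->parallel_terminate_count);
}

/* Mirrors the shape of the check quoted in the thread: refuse to
 * launch when the active count has reached the limit. */
static bool
limit_reached(const WorkerCounts *c, int max_parallel_workers)
{
    return active_workers(c) >= (uint32_t) max_parallel_workers;
}
```

With both counters reset to 0 and then one terminate recorded, active_workers() wraps to UINT32_MAX, so limit_reached() is true for any realistic max_parallel_workers and no worker would be launched. That is the state a code comment would need to explain.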