On Wed, Jul 5, 2017 at 6:25 PM, Mithun Cy <mithun...@enterprisedb.com> wrote:
> On Mon, Jul 3, 2017 at 11:58 AM, Amit Kapila <amit.kapil...@gmail.com> wrote:
>> On Sun, Jul 2, 2017 at 10:32 PM, Mithun Cy <mithun...@enterprisedb.com> 
>> wrote:
>>> On Tue, Jun 27, 2017 at 11:41 AM, Mithun Cy <mithun...@enterprisedb.com> 
>>> wrote:
>>>> On Fri, Jun 23, 2017 at 5:45 AM, Thom Brown <t...@linux.com> wrote:
>>>>>
>>>>> Also, I find it a bit messy that launch_autoprewarm_dump() doesn't
>>>>> detect an autoprewarm process already running.  I'd want this to
>>>>> return NULL or an error if called for a 2nd time.
>>>>
>>>> We log instead of error as we try to check only after launching the
>>>> worker and inside worker. One solution could be as similar to
>>>> autoprewam_dump_now(), the autoprewarm_start_worker() can init shared
>>>> memory and check if we can launch worker in backend itself. I will try
>>>> to fix same.
>>>
>>> I have fixed it now as follows
>>>
>>> +Datum
>>> +autoprewarm_start_worker(PG_FUNCTION_ARGS)
>>> +{
>>> +   pid_t       pid;
>>> +
>>> +   init_apw_shmem();
>>> +   pid = apw_state->bgworker_pid;
>>> +   if (pid != InvalidPid)
>>> +       ereport(ERROR,
>>> +               (errmsg("autoprewarm worker is already running under PID 
>>> %d",
>>> +                       pid)));
>>> +
>>> +   autoprewarm_dump_launcher();
>>> +   PG_RETURN_VOID();
>>> +}
>>>
>>> In backend itself, we shall check if an autoprewarm worker is running
>>> then only start the server. There is a possibility if this function is
>>> executed concurrently when there is no worker already running (Which I
>>> think is not a normal usage) then both call will say it has
>>> successfully launched the worker even though only one could have
>>> successfully done that (other will log and silently die).
>>
>> Why can't we close this remaining race condition?  Basically, if we
>> just perform all of the actions in this function under the lock and
>> autoprewarm_dump_launcher waits till the autoprewarm worker has
>> initialized the bgworker_pid, then there won't be any remaining race
>> condition.  I think if we don't close this race condition, it will be
>> unpredictable whether the user will get the error or there will be
>> only a server log for the same.
>
> Yes, I can make autoprewarm_dump_launcher to wait until the launched
> bgworker set its pid, but this requires one more synchronization
> variable between launcher and worker. More than that I see
> ShmemInitStruct(), LWLockAcquire can throw ERROR (restarts the
> worker), which needs to be called before setting pid. So I thought it
> won't be harmful let two concurrent calls to launch workers and we
> just log failures. Please let me know if I need to rethink about it.
>

I don't know whether you need to rethink but as presented in the
patch, it seems unclear to me about the specs of API.  As this is an
exposed function to the user, I think the behavior should be well
defined.  If you don't think changing code has many advantages, then
at the very least update the docs to indicate the expectation and
behavior of the API.  Also, I think it is better to add few comments
in the code to tell about the unpredictable behavior in case of race
condition and the reason for same.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to