[SSSD] [sssd PR#107][comment] WATCHDOG: Avoid non async-signal-safe from the signal_handler

2017-01-25 Thread lslebodn
  URL: https://github.com/SSSD/sssd/pull/107
Title: #107: WATCHDOG: Avoid non async-signal-safe from the signal_handler

lslebodn commented:
"""
master:
* e6a5f8c58539fc31fd81fac89cfc85703b4250ea
* 087162b85e191af51637904702813969b35eaadc

sssd-1-14:
* 0606a71b698c4acf954ba7284e62acbd0aa5e52d
* 442985a7af2262fab57f56c7a8cd40af10081610

LS

"""

See the full comment at 
https://github.com/SSSD/sssd/pull/107#issuecomment-275086247
___
sssd-devel mailing list -- sssd-devel@lists.fedorahosted.org
To unsubscribe send an email to sssd-devel-le...@lists.fedorahosted.org


[SSSD] [sssd PR#107][comment] WATCHDOG: Avoid non async-signal-safe from the signal_handler

2017-01-13 Thread jhrozek
  URL: https://github.com/SSSD/sssd/pull/107
Title: #107: WATCHDOG: Avoid non async-signal-safe from the signal_handler

jhrozek commented:
"""
To me as well. I tested again the watchdog restart and the timeshift and both 
cases work fine.
"""

See the full comment at 
https://github.com/SSSD/sssd/pull/107#issuecomment-272409701
___
sssd-devel mailing list -- sssd-devel@lists.fedorahosted.org
To unsubscribe send an email to sssd-devel-le...@lists.fedorahosted.org


[SSSD] [sssd PR#107][comment] WATCHDOG: Avoid non async-signal-safe from the signal_handler

2017-01-10 Thread pbrezina
  URL: https://github.com/SSSD/sssd/pull/107
Title: #107: WATCHDOG: Avoid non async-signal-safe from the signal_handler

pbrezina commented:
"""
Looks good to me.
"""

See the full comment at 
https://github.com/SSSD/sssd/pull/107#issuecomment-271574259
___
sssd-devel mailing list -- sssd-devel@lists.fedorahosted.org
To unsubscribe send an email to sssd-devel-le...@lists.fedorahosted.org


[SSSD] [sssd PR#107][comment] WATCHDOG: Avoid non async-signal-safe from the signal_handler

2017-01-10 Thread fidencio
  URL: https://github.com/SSSD/sssd/pull/107
Title: #107: WATCHDOG: Avoid non async-signal-safe from the signal_handler

fidencio commented:
"""
Patchset updated!
"""

See the full comment at 
https://github.com/SSSD/sssd/pull/107#issuecomment-271558289
___
sssd-devel mailing list -- sssd-devel@lists.fedorahosted.org
To unsubscribe send an email to sssd-devel-le...@lists.fedorahosted.org


[SSSD] [sssd PR#107][comment] WATCHDOG: Avoid non async-signal-safe from the signal_handler

2017-01-10 Thread lslebodn
  URL: https://github.com/SSSD/sssd/pull/107
Title: #107: WATCHDOG: Avoid non async-signal-safe from the signal_handler

lslebodn commented:
"""
Are there any objection to the proposed change?
"""

See the full comment at 
https://github.com/SSSD/sssd/pull/107#issuecomment-271523691
___
sssd-devel mailing list -- sssd-devel@lists.fedorahosted.org
To unsubscribe send an email to sssd-devel-le...@lists.fedorahosted.org


[SSSD] [sssd PR#107][comment] WATCHDOG: Avoid non async-signal-safe from the signal_handler

2017-01-10 Thread lslebodn
  URL: https://github.com/SSSD/sssd/pull/107
Title: #107: WATCHDOG: Avoid non async-signal-safe from the signal_handler

lslebodn commented:
"""
I checked the man page and API on linux and BSD and setpgrp has a different API 
on these platforms.
But there is similar function which do the same and has the same API on both 
platforms. `setpgid()`.

It would be good if you could change it. It will reduce some platform specific 
ifdefs. And we needn't change SELinux policy twice from setpgrp -> setpgid.

ATM; there is a comment that we should fail  after failure in setpgrp after 
fixed SElinux policy.
But it would cause a problems on older distributions where we do not plan to 
rebase sssd and therefore SELinux policy would not be updated. Upstream version 
would not work there.

I think we should just scream to log file and inform users that it can leak 
some child processes in rare cases. And we should use -getpid() in watchdog 
handler. It will kill whole process groups if  setpgrp/setpgid passed and only 
the process if it such process group does not exist.

Here is a diff; feel free to rephrase some comments
```
diff --git a/src/util/server.c b/src/util/server.c
index 60019a7d3..9d1af6a0a 100644
--- a/src/util/server.c
+++ b/src/util/server.c
@@ -460,16 +460,16 @@ int server_setup(const char *name, int flags,
 struct logrotate_ctx *lctx;
 char *locale;
 int watchdog_interval;
+pid_t my_pid;
 
-ret = setpgrp();
+my_pid = getpid();
+ret = setpgid(my_pid, my_pid);
 if (ret != EOK) {
 ret = errno;
 DEBUG(SSSDBG_MINOR_FAILURE,
-  "Failed setting process group: %s[ %d]\n",
+  "Failed setting process group: %s[ %d]. "
+  "We might leak processes in case of failure\n",
   sss_strerror(ret), ret);
-
-/* Do not abort here till SELinux policies are fixed,
- * otherwise SSSD won't start up. */
 }
 
 ret = chown_debug_file(NULL, uid, gid);
diff --git a/src/util/util_watchdog.c b/src/util/util_watchdog.c
index 044e87f6e..68f055f3b 100644
--- a/src/util/util_watchdog.c
+++ b/src/util/util_watchdog.c
@@ -50,7 +50,12 @@ static void watchdog_detect_timeshift(void)
 if (cur_time < prev_time) {
 /* Time shift detected. We need to restart watchdog. */
 if (write(watchdog_ctx.pipefd[1], "1", 1) != 1) {
-kill(-getpgrp(), SIGTERM);
+/* getpid() should return the same value as getpgrp()
+ * If they are not the same then setpgrp()/setpgid() failed
+ * and we should not kill whole process group but only current
+ * process.
+ */
+kill(-getpid(), SIGTERM);
 }
 }
 }
```
"""

See the full comment at 
https://github.com/SSSD/sssd/pull/107#issuecomment-271523487
___
sssd-devel mailing list -- sssd-devel@lists.fedorahosted.org
To unsubscribe send an email to sssd-devel-le...@lists.fedorahosted.org


[SSSD] [sssd PR#107][comment] WATCHDOG: Avoid non async-signal-safe from the signal_handler

2017-01-09 Thread fidencio
  URL: https://github.com/SSSD/sssd/pull/107
Title: #107: WATCHDOG: Avoid non async-signal-safe from the signal_handler

fidencio commented:
"""
New version of the series has been pushed and attend both Pavel and Lukáš 
comments.

As mentioned in the first patch this series will only work with SELinux on 
Permissive mode and a follow up patch will be needed once SELinux policies get 
updated on Fedora/EL.
"""

See the full comment at 
https://github.com/SSSD/sssd/pull/107#issuecomment-271327849
___
sssd-devel mailing list -- sssd-devel@lists.fedorahosted.org
To unsubscribe send an email to sssd-devel-le...@lists.fedorahosted.org


[SSSD] [sssd PR#107][comment] WATCHDOG: Avoid non async-signal-safe from the signal_handler

2017-01-05 Thread lslebodn
  URL: https://github.com/SSSD/sssd/pull/107
Title: #107: WATCHDOG: Avoid non async-signal-safe from the signal_handler

lslebodn commented:
"""
On (05/01/17 04:52), Jakub Hrozek wrote:
>btw now I'm wondering if the setpgrp should be a separate patch also for 
>stable branches because I guess the bug was present in sssd for quite a long 
>time?
>
I kind of misunderstood the question.
You meant 1.13 branch.

We call `kill(-getpgrp(), SIGTERM);` only for final cleanup in
monitor_quit.

But in case of setpgrp() we would be able to
call do following diff and cleanup would be automaticaly done per service
```
-  kret = kill(svc->pid, SIGTERM);
+  kret = kill(-svc->pid, SIGTERM);
```

But patch for 1.13 and for 1.14 might be different.

LS

"""

See the full comment at 
https://github.com/SSSD/sssd/pull/107#issuecomment-270641029
___
sssd-devel mailing list -- sssd-devel@lists.fedorahosted.org
To unsubscribe send an email to sssd-devel-le...@lists.fedorahosted.org


[SSSD] [sssd PR#107][comment] WATCHDOG: Avoid non async-signal-safe from the signal_handler

2017-01-05 Thread fidencio
  URL: https://github.com/SSSD/sssd/pull/107
Title: #107: WATCHDOG: Avoid non async-signal-safe from the signal_handler

fidencio commented:
"""
On Thu, Jan 5, 2017 at 2:03 PM, lslebodn  wrote:

> On (05/01/17 04:52), Jakub Hrozek wrote:
> >btw now I'm wondering if the setpgrp should be a separate patch also for
> stable branches because I guess the bug was present in sssd for quite a
> long time?
> >
> sure; Should Fabiano prepare new PR or it can be in this one.
> Because it's related.
>

As the patch should come before this one I'd prefer to have both in this
very same PR.

Best Regards,
--
Fabiano Fidêncio

"""

See the full comment at 
https://github.com/SSSD/sssd/pull/107#issuecomment-270639914
___
sssd-devel mailing list -- sssd-devel@lists.fedorahosted.org
To unsubscribe send an email to sssd-devel-le...@lists.fedorahosted.org


[SSSD] [sssd PR#107][comment] WATCHDOG: Avoid non async-signal-safe from the signal_handler

2017-01-05 Thread lslebodn
  URL: https://github.com/SSSD/sssd/pull/107
Title: #107: WATCHDOG: Avoid non async-signal-safe from the signal_handler

lslebodn commented:
"""
On (05/01/17 04:52), Jakub Hrozek wrote:
>btw now I'm wondering if the setpgrp should be a separate patch also for 
>stable branches because I guess the bug was present in sssd for quite a long 
>time?
>
sure; Should Fabiano prepare new PR or it can be in this one.
Because it's related.

LS

"""

See the full comment at 
https://github.com/SSSD/sssd/pull/107#issuecomment-270639258
___
sssd-devel mailing list -- sssd-devel@lists.fedorahosted.org
To unsubscribe send an email to sssd-devel-le...@lists.fedorahosted.org


[SSSD] [sssd PR#107][comment] WATCHDOG: Avoid non async-signal-safe from the signal_handler

2017-01-05 Thread jhrozek
  URL: https://github.com/SSSD/sssd/pull/107
Title: #107: WATCHDOG: Avoid non async-signal-safe from the signal_handler

jhrozek commented:
"""
btw now I'm wondering if the setpgrp should be a separate patch also for stable 
branches because I guess the bug was present in sssd for quite a long time?
"""

See the full comment at 
https://github.com/SSSD/sssd/pull/107#issuecomment-270637274
___
sssd-devel mailing list -- sssd-devel@lists.fedorahosted.org
To unsubscribe send an email to sssd-devel-le...@lists.fedorahosted.org


[SSSD] [sssd PR#107][comment] WATCHDOG: Avoid non async-signal-safe from the signal_handler

2017-01-05 Thread lslebodn
  URL: https://github.com/SSSD/sssd/pull/107
Title: #107: WATCHDOG: Avoid non async-signal-safe from the signal_handler

lslebodn commented:
"""
On (03/01/17 06:41), fidencio wrote:
>Argh, and also:
>```
>diff --git a/src/util/util_watchdog.c b/src/util/util_watchdog.c
>index 17954d1..77ba705 100644
>--- a/src/util/util_watchdog.c
>+++ b/src/util/util_watchdog.c
>@@ -50,7 +50,7 @@ static bool watchdog_detect_timeshift(void)
> if (cur_time < prev_time) {
> /* Time shift detected. We need to restart watchdog. */
> if (write(watchdog_ctx.pipefd[1], "1", 1) != 1) {
>-_exit(1);
>+kill(-getpgrp(), SIGTERM);
That would work if we call setpgrp() in server_setup
or just in sssd_be; because other processes does/should not use fork/exec

You can try following example:
```
#include 
#include 
#include 

int test_func(int main_sec_sleep)
{
int ret;
pid_t main_pid = getpid();
setpgrp();

printf(" getpid() %d:\n", getpid());
printf(" getpgrp() %d:\n", getpgrp());

for (int i=0; i<3; i++) {
ret = fork();
if ( ret == 0 ) {
printf("  getpid() %d:\n", getpid());
printf("  getpgrp() %d:\n", getpgrp());
sleep(1000);
break;
}
}

if ( main_pid == getpid()) {
sleep(main_sec_sleep);
kill(-main_pid, 9);
}
}

int main(void)
{
int ret;
pid_t main_pid = getpid();

printf("getpid() %d:\n", getpid());
printf("getpgrp() %d:\n", getpgrp());

for (int i=0; i<3; i++) {
ret = fork();
if ( ret == 0 ) {
test_func(5 + i);
break;
}
}

if ( main_pid == getpid()) {
sleep(15);
kill(-main_pid, 9);
}
}

```

As you can see I could use -main_pid() in kill
becsaue getpid and getpgrp returned the same value

LS


"""

See the full comment at 
https://github.com/SSSD/sssd/pull/107#issuecomment-270635169
___
sssd-devel mailing list -- sssd-devel@lists.fedorahosted.org
To unsubscribe send an email to sssd-devel-le...@lists.fedorahosted.org


[SSSD] [sssd PR#107][comment] WATCHDOG: Avoid non async-signal-safe from the signal_handler

2017-01-05 Thread jhrozek
  URL: https://github.com/SSSD/sssd/pull/107
Title: #107: WATCHDOG: Avoid non async-signal-safe from the signal_handler

jhrozek commented:
"""
ACK
"""

See the full comment at 
https://github.com/SSSD/sssd/pull/107#issuecomment-270630387
___
sssd-devel mailing list -- sssd-devel@lists.fedorahosted.org
To unsubscribe send an email to sssd-devel-le...@lists.fedorahosted.org


[SSSD] [sssd PR#107][comment] WATCHDOG: Avoid non async-signal-safe from the signal_handler

2017-01-05 Thread fidencio
  URL: https://github.com/SSSD/sssd/pull/107
Title: #107: WATCHDOG: Avoid non async-signal-safe from the signal_handler

fidencio commented:
"""
Hmm. I really didn't notice it before but you're right.
It ends up killing all explicitly started responders on my setup. Sorry for not 
having it tested properly before submitting this version.

I'll replace it by _exit(1) as it was before and resubmit the patch.
"""

See the full comment at 
https://github.com/SSSD/sssd/pull/107#issuecomment-270624750
___
sssd-devel mailing list -- sssd-devel@lists.fedorahosted.org
To unsubscribe send an email to sssd-devel-le...@lists.fedorahosted.org


[SSSD] [sssd PR#107][comment] WATCHDOG: Avoid non async-signal-safe from the signal_handler

2017-01-05 Thread jhrozek
  URL: https://github.com/SSSD/sssd/pull/107
Title: #107: WATCHDOG: Avoid non async-signal-safe from the signal_handler

jhrozek commented:
"""
hmm, it seems I was wrong and at least with systemd (is that the difference?) 
when we kill the whole process group also the nss and pam responders (that were 
started explicitly) are killed.

So I was wrong. Can you please retest if you see the same? But in general I 
think we can stick to the previous version, sorry about guiding you down the 
wrong path.
"""

See the full comment at 
https://github.com/SSSD/sssd/pull/107#issuecomment-270622372
___
sssd-devel mailing list -- sssd-devel@lists.fedorahosted.org
To unsubscribe send an email to sssd-devel-le...@lists.fedorahosted.org


[SSSD] [sssd PR#107][comment] WATCHDOG: Avoid non async-signal-safe from the signal_handler

2017-01-03 Thread fidencio
  URL: https://github.com/SSSD/sssd/pull/107
Title: #107: WATCHDOG: Avoid non async-signal-safe from the signal_handler

fidencio commented:
"""
Argh, and also:
```
diff --git a/src/util/util_watchdog.c b/src/util/util_watchdog.c
index 17954d1..77ba705 100644
--- a/src/util/util_watchdog.c
+++ b/src/util/util_watchdog.c
@@ -50,7 +50,7 @@ static bool watchdog_detect_timeshift(void)
 if (cur_time < prev_time) {
 /* Time shift detected. We need to restart watchdog. */
 if (write(watchdog_ctx.pipefd[1], "1", 1) != 1) {
-_exit(1);
+kill(-getpgrp(), SIGTERM);
 }
 
 return true;
```
"""

See the full comment at 
https://github.com/SSSD/sssd/pull/107#issuecomment-270128553
___
sssd-devel mailing list -- sssd-devel@lists.fedorahosted.org
To unsubscribe send an email to sssd-devel-le...@lists.fedorahosted.org


[SSSD] [sssd PR#107][comment] WATCHDOG: Avoid non async-signal-safe from the signal_handler

2017-01-03 Thread fidencio
  URL: https://github.com/SSSD/sssd/pull/107
Title: #107: WATCHDOG: Avoid non async-signal-safe from the signal_handler

fidencio commented:
"""
Here is the diff between the older version and the one about to be pushed:
```
[ffidenci@cat sssd]$ git diff
diff --git a/src/util/util_watchdog.c b/src/util/util_watchdog.c
index b4889a1..17954d1 100644
--- a/src/util/util_watchdog.c
+++ b/src/util/util_watchdog.c
@@ -75,7 +75,7 @@ static void watchdog_handler(int sig)
 
 /* if a pre-defined number of ticks passed by kills itself */
 if (__sync_add_and_fetch(_ctx.ticks, 1) > WATCHDOG_MAX_TICKS) {
-_exit(1);
+kill(-getpgrp(), SIGTERM);
 }
 }
 
@@ -127,8 +127,8 @@ static errno_t watchdog_fd_recv_data(int fd)
 }
 
 static void watchdog_fd_read_handler(struct tevent_context *ev,
-struct tevent_fd *fde,
-uint16_t flags,
+ struct tevent_fd *fde,
+ uint16_t flags,
  void *data)
 {
 errno_t ret;

```
"""

See the full comment at 
https://github.com/SSSD/sssd/pull/107#issuecomment-270127095
___
sssd-devel mailing list -- sssd-devel@lists.fedorahosted.org
To unsubscribe send an email to sssd-devel-le...@lists.fedorahosted.org


[SSSD] [sssd PR#107][comment] WATCHDOG: Avoid non async-signal-safe from the signal_handler

2017-01-02 Thread lslebodn
  URL: https://github.com/SSSD/sssd/pull/107
Title: #107: WATCHDOG: Avoid non async-signal-safe from the signal_handler

lslebodn commented:
"""
On (02/01/17 06:15), fidencio wrote:
>On Mon, Jan 2, 2017 at 8:35 AM, lslebodn  wrote:
>
>> retest this please
>>
>
>Is this comment for the author of the patch or is this comment for the
>reviewer of the patch?
>
It is a command for jenkins plugin to retest a PR.

https://github.com/jenkinsci/ghprb-plugin/blob/master/README.md
https://wiki.jenkins-ci.org/display/JENKINS/Github+pull+request+builder+plugin

There seems to be some issues with epel repository
and therefore installation of packages failed
https://ci.centos.org/job/sssd-CentOS6/260/console

LS

"""

See the full comment at 
https://github.com/SSSD/sssd/pull/107#issuecomment-269985115
___
sssd-devel mailing list -- sssd-devel@lists.fedorahosted.org
To unsubscribe send an email to sssd-devel-le...@lists.fedorahosted.org


[SSSD] [sssd PR#107][comment] WATCHDOG: Avoid non async-signal-safe from the signal_handler

2017-01-01 Thread lslebodn
  URL: https://github.com/SSSD/sssd/pull/107
Title: #107: WATCHDOG: Avoid non async-signal-safe from the signal_handler

lslebodn commented:
"""
retest this please
"""

See the full comment at 
https://github.com/SSSD/sssd/pull/107#issuecomment-269941038
___
sssd-devel mailing list -- sssd-devel@lists.fedorahosted.org
To unsubscribe send an email to sssd-devel-le...@lists.fedorahosted.org


[SSSD] [sssd PR#107][comment] WATCHDOG: Avoid non async-signal-safe from the signal_handler

2016-12-21 Thread fidencio
  URL: https://github.com/SSSD/sssd/pull/107
Title: #107: WATCHDOG: Avoid non async-signal-safe from the signal_handler

fidencio commented:
"""
Not sure why CentOS CI failed.
Internal CI has passed in the most of the supported distros and the failures 
doesn't seem related to this patch: 
http://sssd-ci.duckdns.org/logs/job/59/64/summary.html
"""

See the full comment at 
https://github.com/SSSD/sssd/pull/107#issuecomment-268540889
___
sssd-devel mailing list -- sssd-devel@lists.fedorahosted.org
To unsubscribe send an email to sssd-devel-le...@lists.fedorahosted.org


[SSSD] [sssd PR#107][comment] WATCHDOG: Avoid non async-signal-safe from the signal_handler

2016-12-21 Thread fidencio
  URL: https://github.com/SSSD/sssd/pull/107
Title: #107: WATCHDOG: Avoid non async-signal-safe from the signal_handler

fidencio commented:
"""
Okay, patch has been updated.
"""

See the full comment at 
https://github.com/SSSD/sssd/pull/107#issuecomment-268503583
___
sssd-devel mailing list -- sssd-devel@lists.fedorahosted.org
To unsubscribe send an email to sssd-devel-le...@lists.fedorahosted.org


[SSSD] [sssd PR#107][comment] WATCHDOG: Avoid non async-signal-safe from the signal_handler

2016-12-20 Thread fidencio
  URL: https://github.com/SSSD/sssd/pull/107
Title: #107: WATCHDOG: Avoid non async-signal-safe from the signal_handler

fidencio commented:
"""
CI: http://sssd-ci.duckdns.org/logs/job/59/63/summary.html
Rawhide failure is unrelated.
"""

See the full comment at 
https://github.com/SSSD/sssd/pull/107#issuecomment-268283999
___
sssd-devel mailing list -- sssd-devel@lists.fedorahosted.org
To unsubscribe send an email to sssd-devel-le...@lists.fedorahosted.org


[SSSD] [sssd PR#107][comment] WATCHDOG: Avoid non async-signal-safe from the signal_handler

2016-12-20 Thread fidencio
  URL: https://github.com/SSSD/sssd/pull/107
Title: #107: WATCHDOG: Avoid non async-signal-safe from the signal_handler

fidencio commented:
"""
Patch updated according to Simo's suggestion.
Removing the label "Changes requested" as it's done for review.
"""

See the full comment at 
https://github.com/SSSD/sssd/pull/107#issuecomment-268272046
___
sssd-devel mailing list -- sssd-devel@lists.fedorahosted.org
To unsubscribe send an email to sssd-devel-le...@lists.fedorahosted.org


[SSSD] [sssd PR#107][comment] WATCHDOG: Avoid non async-signal-safe from the signal_handler

2016-12-15 Thread jhrozek
  URL: https://github.com/SSSD/sssd/pull/107
Title: #107: WATCHDOG: Avoid non async-signal-safe from the signal_handler

jhrozek commented:
"""
I just set the Changes Requested label so that it's clear to reviewers new 
patch set is coming up..
"""

See the full comment at 
https://github.com/SSSD/sssd/pull/107#issuecomment-267285051
___
sssd-devel mailing list -- sssd-devel@lists.fedorahosted.org
To unsubscribe send an email to sssd-devel-le...@lists.fedorahosted.org


[SSSD] [sssd PR#107][comment] WATCHDOG: Avoid non async-signal-safe from the signal_handler

2016-12-14 Thread pbrezina
  URL: https://github.com/SSSD/sssd/pull/107
Title: #107: WATCHDOG: Avoid non async-signal-safe from the signal_handler

pbrezina commented:
"""
Sounds good.
"""

See the full comment at 
https://github.com/SSSD/sssd/pull/107#issuecomment-267003707
___
sssd-devel mailing list -- sssd-devel@lists.fedorahosted.org
To unsubscribe send an email to sssd-devel-le...@lists.fedorahosted.org


[SSSD] [sssd PR#107][comment] WATCHDOG: Avoid non async-signal-safe from the signal_handler

2016-12-13 Thread jhrozek
  URL: https://github.com/SSSD/sssd/pull/107
Title: #107: WATCHDOG: Avoid non async-signal-safe from the signal_handler

jhrozek commented:
"""
On Tue, Dec 13, 2016 at 08:16:33AM -0800, Simo Sorce wrote:
> On Tue, 2016-12-13 at 08:02 -0800, Jakub Hrozek wrote:
> > On Tue, Dec 13, 2016 at 07:06:58AM -0800, Simo Sorce wrote:
> > > On Tue, 2016-12-13 at 05:59 -0800, Jakub Hrozek wrote:
> > > > On Tue, Dec 13, 2016 at 02:44:44AM -0800, Simo Sorce wrote:
> > > > > On Tue, 2016-12-13 at 02:25 -0800, fidencio wrote:
> > > > > > Pavel,
> > > > > > 
> > > > > > On Tue, Dec 13, 2016 at 11:20 AM, Pavel Březina 
> > > > > > 
> > > > > > wrote:
> > > > > > 
> > > > > > > There are two scenarios:
> > > > > > >
> > > > > > >1. timeshift during system boot -- it is very common to be 
> > > > > > > several
> > > > > > >hours
> > > > > > >2. timeshift due to an ntp update when booted up -- usually 
> > > > > > > only few
> > > > > > >seconds, not a big deal
> > > > > > >
> > > > > > > The problem with tevent timer is that if we shift backwards the 
> > > > > > > timer
> > > > > > > remains too far in the future. This applies to all timers, not 
> > > > > > > only for the
> > > > > > > watchdog. Forward shift is not a problem, it just executes the 
> > > > > > > timers
> > > > > > > immediately. Resetting the watchdog helps in a way that sssd is 
> > > > > > > not killed,
> > > > > > > we don't have any capability to reschedule all timed event and we 
> > > > > > > actually
> > > > > > > can not tell that sssd will be functioning properly (dyndns, sudo 
> > > > > > > refresh,
> > > > > > > enumeration, domain refresh, even idle timer on socket 
> > > > > > > activation)... all
> > > > > > > those operations that depends on time() would become unreliable.
> > > > > > >
> > > > > > > I think the best thing to do would be restart the process 
> > > > > > > (although the
> > > > > > > question is how would this affect the boot up) and patch tevent 
> > > > > > > to deal
> > > > > > > with timeshift either by using monotonic clock or by detecting 
> > > > > > > them and
> > > > > > > altering timers accordingly.
> > > > > > >
> > > > > > 
> > > > > > In the latest version of patch I've just called _exit(1) when the 
> > > > > > timeshift
> > > > > > is detected.
> > > > > > About patching tevent, I've seen some old discussions happening and 
> > > > > > it
> > > > > > doesn't seem a trivial thing to do. Would the patch, as it is right 
> > > > > > now, be
> > > > > > acceptable and then a work on tevent could be done later (yes, I'd 
> > > > > > add it
> > > > > > to my queue and do it as soon as we have an agreement on doing 
> > > > > > this)?
> > > > > 
> > > > > This is really a blunt tool (calling exit()), but until tevent can be
> > > > > fixed the only other option would be to use some wrapper to keep track
> > > > > of all existing timed events and cancel and restart them all if the
> > > > > clock changes abruptly.
> > > > 
> > > > that's why I suggested signaling self to a tevent-driven signal handler
> > > > from where we can just set up the timer anew. 
> > > > 
> > > > If there is any other way to 'break out' of the POSIX signal handler
> > > > into somewhere where we can call tevent/talloc (or in general unsafe
> > > > calls) I'm all ears.
> > > 
> > > I guess I need to understand better what exactly you want to do to be
> > > able to advice on something. I can think of a coulpe of options, none of
> > > them particularly elegant :)
> > 
> > OK, let me try to explain better.
> > 
> > A machine drifts time. Then an SSSD process receives SIGRT in
> > watchdog_handler() and detects the time has drifted, so it avoids
> > increasing the watchdog ticks counter -- this is done in
> > watchdog_detect_timeshift() at the moment.
> > 
> > At that point, in the current master, we call teardown_watchdog() and
> > setup_watchdog() to set a new watchdog (the part that is based on tevent
> > timers). This is unsafe to do in a signal handler because it involves
> > malloc and free among others called from tevent.
> > 
> > What I'm trying to figure out is how to reset the watchdog when I detect
> > in watchdog_detect_timeshift() the time is out of sync and the tevent
> > timer that resets the ticks will not arrive until the sssd process
> > receives enough SIGRT signals to get itself killed.
> > 
> > Does the question make sense now?
> 
> Yes,
> I see a few ways, a file descriptor (like tevent does for handling
> signals) used to fire a tevent_fd event that will perform these actions.
> Or a global variable, that is checked in the main loop at idle times and
> resets the watchdog, finally a mutex on which a dedicated thread waits,
> the thread only job is to run the reset if the mutex is released (but
> then the mutex needs to be re-acquired, probably on the next tick), but
> then again not sure if the code run in the setup_watchdog() is thread
> safe.

Thank you.

Without doing any coding, 

[SSSD] [sssd PR#107][comment] WATCHDOG: Avoid non async-signal-safe from the signal_handler

2016-12-13 Thread simo5
  URL: https://github.com/SSSD/sssd/pull/107
Title: #107: WATCHDOG: Avoid non async-signal-safe from the signal_handler

simo5 commented:
"""
On Tue, 2016-12-13 at 08:02 -0800, Jakub Hrozek wrote:
> On Tue, Dec 13, 2016 at 07:06:58AM -0800, Simo Sorce wrote:
> > On Tue, 2016-12-13 at 05:59 -0800, Jakub Hrozek wrote:
> > > On Tue, Dec 13, 2016 at 02:44:44AM -0800, Simo Sorce wrote:
> > > > On Tue, 2016-12-13 at 02:25 -0800, fidencio wrote:
> > > > > Pavel,
> > > > > 
> > > > > On Tue, Dec 13, 2016 at 11:20 AM, Pavel Březina 
> > > > > 
> > > > > wrote:
> > > > > 
> > > > > > There are two scenarios:
> > > > > >
> > > > > >1. timeshift during system boot -- it is very common to be 
> > > > > > several
> > > > > >hours
> > > > > >2. timeshift due to an ntp update when booted up -- usually only 
> > > > > > few
> > > > > >seconds, not a big deal
> > > > > >
> > > > > > The problem with tevent timer is that if we shift backwards the 
> > > > > > timer
> > > > > > remains too far in the future. This applies to all timers, not only 
> > > > > > for the
> > > > > > watchdog. Forward shift is not a problem, it just executes the 
> > > > > > timers
> > > > > > immediately. Resetting the watchdog helps in a way that sssd is not 
> > > > > > killed,
> > > > > > we don't have any capability to reschedule all timed event and we 
> > > > > > actually
> > > > > > can not tell that sssd will be functioning properly (dyndns, sudo 
> > > > > > refresh,
> > > > > > enumeration, domain refresh, even idle timer on socket 
> > > > > > activation)... all
> > > > > > those operations that depends on time() would become unreliable.
> > > > > >
> > > > > > I think the best thing to do would be restart the process (although 
> > > > > > the
> > > > > > question is how would this affect the boot up) and patch tevent to 
> > > > > > deal
> > > > > > with timeshift either by using monotonic clock or by detecting them 
> > > > > > and
> > > > > > altering timers accordingly.
> > > > > >
> > > > > 
> > > > > In the latest version of patch I've just called _exit(1) when the 
> > > > > timeshift
> > > > > is detected.
> > > > > About patching tevent, I've seen some old discussions happening and it
> > > > > doesn't seem a trivial thing to do. Would the patch, as it is right 
> > > > > now, be
> > > > > acceptable and then a work on tevent could be done later (yes, I'd 
> > > > > add it
> > > > > to my queue and do it as soon as we have an agreement on doing this)?
> > > > 
> > > > This is really a blunt tool (calling exit()), but until tevent can be
> > > > fixed the only other option would be to use some wrapper to keep track
> > > > of all existing timed events and cancel and restart them all if the
> > > > clock changes abruptly.
> > > 
> > > that's why I suggested signaling self to a tevent-driven signal handler
> > > from where we can just set up the timer anew. 
> > > 
> > > If there is any other way to 'break out' of the POSIX signal handler
> > > into somewhere where we can call tevent/talloc (or in general unsafe
> > > calls) I'm all ears.
> > 
> > I guess I need to understand better what exactly you want to do to be
> > able to advice on something. I can think of a coulpe of options, none of
> > them particularly elegant :)
> 
> OK, let me try to explain better.
> 
> A machine drifts time. Then an SSSD process receives SIGRT in
> watchdog_handler() and detects the time has drifted, so it avoids
> increasing the watchdog ticks counter -- this is done in
> watchdog_detect_timeshift() at the moment.
> 
> At that point, in the current master, we call teardown_watchdog() and
> setup_watchdog() to set a new watchdog (the part that is based on tevent
> timers). This is unsafe to do in a signal handler because it involves
> malloc and free among others called from tevent.
> 
> What I'm trying to figure out is how to reset the watchdog when I detect
> in watchdog_detect_timeshift() the time is out of sync and the tevent
> timer that resets the ticks will not arrive until the sssd process
> receives enough SIGRT signals to get itself killed.
> 
> Does the question make sense now?

Yes,
I see a few ways, a file descriptor (like tevent does for handling
signals) used to fire a tevent_fd event that will perform these actions.
Or a global variable, that is checked in the main loop at idle times and
resets the watchdog, finally a mutex on which a dedicated thread waits,
the thread only job is to run the reset if the mutex is released (but
then the mutex needs to be re-acquired, probably on the next tick), but
then again not sure if the code run in the setup_watchdog() is thread
safe.

Simo.

-- 
Simo Sorce * Red Hat, Inc * New York


"""

See the full comment at 
https://github.com/SSSD/sssd/pull/107#issuecomment-266782930
___
sssd-devel mailing list -- sssd-devel@lists.fedorahosted.org
To unsubscribe send an email to sssd-devel-le...@lists.fedorahosted.org


[SSSD] [sssd PR#107][comment] WATCHDOG: Avoid non async-signal-safe from the signal_handler

2016-12-13 Thread jhrozek
  URL: https://github.com/SSSD/sssd/pull/107
Title: #107: WATCHDOG: Avoid non async-signal-safe from the signal_handler

jhrozek commented:
"""
On Tue, Dec 13, 2016 at 07:06:58AM -0800, Simo Sorce wrote:
> On Tue, 2016-12-13 at 05:59 -0800, Jakub Hrozek wrote:
> > On Tue, Dec 13, 2016 at 02:44:44AM -0800, Simo Sorce wrote:
> > > On Tue, 2016-12-13 at 02:25 -0800, fidencio wrote:
> > > > Pavel,
> > > > 
> > > > On Tue, Dec 13, 2016 at 11:20 AM, Pavel Březina 
> > > > 
> > > > wrote:
> > > > 
> > > > > There are two scenarios:
> > > > >
> > > > >1. timeshift during system boot -- it is very common to be several
> > > > >hours
> > > > >2. timeshift due to an ntp update when booted up -- usually only 
> > > > > few
> > > > >seconds, not a big deal
> > > > >
> > > > > The problem with tevent timer is that if we shift backwards the timer
> > > > > remains too far in the future. This applies to all timers, not only 
> > > > > for the
> > > > > watchdog. Forward shift is not a problem, it just executes the timers
> > > > > immediately. Resetting the watchdog helps in a way that sssd is not 
> > > > > killed,
> > > > > we don't have any capability to reschedule all timed event and we 
> > > > > actually
> > > > > can not tell that sssd will be functioning properly (dyndns, sudo 
> > > > > refresh,
> > > > > enumeration, domain refresh, even idle timer on socket activation)... 
> > > > > all
> > > > > those operations that depends on time() would become unreliable.
> > > > >
> > > > > I think the best thing to do would be restart the process (although 
> > > > > the
> > > > > question is how would this affect the boot up) and patch tevent to 
> > > > > deal
> > > > > with timeshift either by using monotonic clock or by detecting them 
> > > > > and
> > > > > altering timers accordingly.
> > > > >
> > > > 
> > > > In the latest version of patch I've just called _exit(1) when the 
> > > > timeshift
> > > > is detected.
> > > > About patching tevent, I've seen some old discussions happening and it
> > > > doesn't seem a trivial thing to do. Would the patch, as it is right 
> > > > now, be
> > > > acceptable and then a work on tevent could be done later (yes, I'd add 
> > > > it
> > > > to my queue and do it as soon as we have an agreement on doing this)?
> > > 
> > > This is really a blunt tool (calling exit()), but until tevent can be
> > > fixed the only other option would be to use some wrapper to keep track
> > > of all existing timed events and cancel and restart them all if the
> > > clock changes abruptly.
> > 
> > that's why I suggested signaling self to a tevent-driven signal handler
> > from where we can just set up the timer anew. 
> > 
> > If there is any other way to 'break out' of the POSIX signal handler
> > into somewhere where we can call tevent/talloc (or in general unsafe
> > calls) I'm all ears.
> 
> I guess I need to understand better what exactly you want to do to be
> able to advice on something. I can think of a coulpe of options, none of
> them particularly elegant :)

OK, let me try to explain better.

A machine drifts time. Then an SSSD process receives SIGRT in
watchdog_handler() and detects the time has drifted, so it avoids
increasing the watchdog ticks counter -- this is done in
watchdog_detect_timeshift() at the moment.

At that point, in the current master, we call teardown_watchdog() and
setup_watchdog() to set a new watchdog (the part that is based on tevent
timers). This is unsafe to do in a signal handler because it involves
malloc and free among others called from tevent.

What I'm trying to figure out is how to reset the watchdog when I detect
in watchdog_detect_timeshift() the time is out of sync and the tevent
timer that resets the ticks will not arrive until the sssd process
receives enough SIGRT signals to get itself killed.

Does the question make sense now?

"""

See the full comment at 
https://github.com/SSSD/sssd/pull/107#issuecomment-266778681
___
sssd-devel mailing list -- sssd-devel@lists.fedorahosted.org
To unsubscribe send an email to sssd-devel-le...@lists.fedorahosted.org


[SSSD] [sssd PR#107][comment] WATCHDOG: Avoid non async-signal-safe from the signal_handler

2016-12-13 Thread simo5
  URL: https://github.com/SSSD/sssd/pull/107
Title: #107: WATCHDOG: Avoid non async-signal-safe from the signal_handler

simo5 commented:
"""
On Tue, 2016-12-13 at 05:59 -0800, Jakub Hrozek wrote:
> On Tue, Dec 13, 2016 at 02:44:44AM -0800, Simo Sorce wrote:
> > On Tue, 2016-12-13 at 02:25 -0800, fidencio wrote:
> > > Pavel,
> > > 
> > > On Tue, Dec 13, 2016 at 11:20 AM, Pavel Březina 
> > > wrote:
> > > 
> > > > There are two scenarios:
> > > >
> > > >1. timeshift during system boot -- it is very common to be several
> > > >hours
> > > >2. timeshift due to an ntp update when booted up -- usually only few
> > > >seconds, not a big deal
> > > >
> > > > The problem with tevent timer is that if we shift backwards the timer
> > > > remains too far in the future. This applies to all timers, not only for 
> > > > the
> > > > watchdog. Forward shift is not a problem, it just executes the timers
> > > > immediately. Resetting the watchdog helps in a way that sssd is not 
> > > > killed,
> > > > we don't have any capability to reschedule all timed event and we 
> > > > actually
> > > > can not tell that sssd will be functioning properly (dyndns, sudo 
> > > > refresh,
> > > > enumeration, domain refresh, even idle timer on socket activation)... 
> > > > all
> > > > those operations that depends on time() would become unreliable.
> > > >
> > > > I think the best thing to do would be restart the process (although the
> > > > question is how would this affect the boot up) and patch tevent to deal
> > > > with timeshift either by using monotonic clock or by detecting them and
> > > > altering timers accordingly.
> > > >
> > > 
> > > In the latest version of patch I've just called _exit(1) when the 
> > > timeshift
> > > is detected.
> > > About patching tevent, I've seen some old discussions happening and it
> > > doesn't seem a trivial thing to do. Would the patch, as it is right now, 
> > > be
> > > acceptable and then a work on tevent could be done later (yes, I'd add it
> > > to my queue and do it as soon as we have an agreement on doing this)?
> > 
> > This is really a blunt tool (calling exit()), but until tevent can be
> > fixed the only other option would be to use some wrapper to keep track
> > of all existing timed events and cancel and restart them all if the
> > clock changes abruptly.
> 
> that's why I suggested signaling self to a tevent-driven signal handler
> from where we can just set up the timer anew. 
> 
> If there is any other way to 'break out' of the POSIX signal handler
> into somewhere where we can call tevent/talloc (or in general unsafe
> calls) I'm all ears.

I guess I need to understand better what exactly you want to do to be
able to advice on something. I can think of a coulpe of options, none of
them particularly elegant :)

Simo.

-- 
Simo Sorce * Red Hat, Inc * New York


"""

See the full comment at 
https://github.com/SSSD/sssd/pull/107#issuecomment-266762324
___
sssd-devel mailing list -- sssd-devel@lists.fedorahosted.org
To unsubscribe send an email to sssd-devel-le...@lists.fedorahosted.org


[SSSD] [sssd PR#107][comment] WATCHDOG: Avoid non async-signal-safe from the signal_handler

2016-12-13 Thread jhrozek
  URL: https://github.com/SSSD/sssd/pull/107
Title: #107: WATCHDOG: Avoid non async-signal-safe from the signal_handler

jhrozek commented:
"""
On Tue, Dec 13, 2016 at 02:44:44AM -0800, Simo Sorce wrote:
> On Tue, 2016-12-13 at 02:25 -0800, fidencio wrote:
> > Pavel,
> > 
> > On Tue, Dec 13, 2016 at 11:20 AM, Pavel Březina 
> > wrote:
> > 
> > > There are two scenarios:
> > >
> > >1. timeshift during system boot -- it is very common to be several
> > >hours
> > >2. timeshift due to an ntp update when booted up -- usually only few
> > >seconds, not a big deal
> > >
> > > The problem with tevent timer is that if we shift backwards the timer
> > > remains too far in the future. This applies to all timers, not only for 
> > > the
> > > watchdog. Forward shift is not a problem, it just executes the timers
> > > immediately. Resetting the watchdog helps in a way that sssd is not 
> > > killed,
> > > we don't have any capability to reschedule all timed event and we actually
> > > can not tell that sssd will be functioning properly (dyndns, sudo refresh,
> > > enumeration, domain refresh, even idle timer on socket activation)... all
> > > those operations that depends on time() would become unreliable.
> > >
> > > I think the best thing to do would be restart the process (although the
> > > question is how would this affect the boot up) and patch tevent to deal
> > > with timeshift either by using monotonic clock or by detecting them and
> > > altering timers accordingly.
> > >
> > 
> > In the latest version of patch I've just called _exit(1) when the timeshift
> > is detected.
> > About patching tevent, I've seen some old discussions happening and it
> > doesn't seem a trivial thing to do. Would the patch, as it is right now, be
> > acceptable and then a work on tevent could be done later (yes, I'd add it
> > to my queue and do it as soon as we have an agreement on doing this)?
> 
> This is really a blunt tool (calling exit()), but until tevent can be
> fixed the only other option would be to use some wrapper to keep track
> of all existing timed events and cancel and restart them all if the
> clock changes abruptly.

that's why I suggested signaling self to a tevent-driven signal handler
from where we can just set up the timer anew. 

If there is any other way to 'break out' of the POSIX signal handler
into somewhere where we can call tevent/talloc (or in general unsafe
calls) I'm all ears.

> The easiest option is to teach tevent about the existence of monotonic
> clocks and allow to set a default clock type to use as well as overrides
> for specific timers that want to fire on the actual time not just X
> seconds in the future.
> 
> Simo.
> 
> -- 
> Simo Sorce * Red Hat, Inc * New York
> 
> 
> 
> -- 
> You are receiving this because you commented.
> Reply to this email directly or view it on GitHub:
> https://github.com/SSSD/sssd/pull/107#issuecomment-266706120

"""

See the full comment at 
https://github.com/SSSD/sssd/pull/107#issuecomment-266745077
___
sssd-devel mailing list -- sssd-devel@lists.fedorahosted.org
To unsubscribe send an email to sssd-devel-le...@lists.fedorahosted.org


[SSSD] [sssd PR#107][comment] WATCHDOG: Avoid non async-signal-safe from the signal_handler

2016-12-13 Thread simo5
  URL: https://github.com/SSSD/sssd/pull/107
Title: #107: WATCHDOG: Avoid non async-signal-safe from the signal_handler

simo5 commented:
"""
On Tue, 2016-12-13 at 02:25 -0800, fidencio wrote:
> Pavel,
> 
> On Tue, Dec 13, 2016 at 11:20 AM, Pavel Březina 
> wrote:
> 
> > There are two scenarios:
> >
> >1. timeshift during system boot -- it is very common to be several
> >hours
> >2. timeshift due to an ntp update when booted up -- usually only few
> >seconds, not a big deal
> >
> > The problem with tevent timer is that if we shift backwards the timer
> > remains too far in the future. This applies to all timers, not only for the
> > watchdog. Forward shift is not a problem, it just executes the timers
> > immediately. Resetting the watchdog helps in a way that sssd is not killed,
> > we don't have any capability to reschedule all timed event and we actually
> > can not tell that sssd will be functioning properly (dyndns, sudo refresh,
> > enumeration, domain refresh, even idle timer on socket activation)... all
> > those operations that depends on time() would become unreliable.
> >
> > I think the best thing to do would be restart the process (although the
> > question is how would this affect the boot up) and patch tevent to deal
> > with timeshift either by using monotonic clock or by detecting them and
> > altering timers accordingly.
> >
> 
> In the latest version of patch I've just called _exit(1) when the timeshift
> is detected.
> About patching tevent, I've seen some old discussions happening and it
> doesn't seem a trivial thing to do. Would the patch, as it is right now, be
> acceptable and then a work on tevent could be done later (yes, I'd add it
> to my queue and do it as soon as we have an agreement on doing this)?

This is really a blunt tool (calling exit()), but until tevent can be
fixed the only other option would be to use some wrapper to keep track
of all existing timed events and cancel and restart them all if the
clock changes abruptly.
The easiest option is to teach tevent about the existence of monotonic
clocks and allow to set a default clock type to use as well as overrides
for specific timers that want to fire on the actual time not just X
seconds in the future.

Simo.

-- 
Simo Sorce * Red Hat, Inc * New York


"""

See the full comment at 
https://github.com/SSSD/sssd/pull/107#issuecomment-266706120
___
sssd-devel mailing list -- sssd-devel@lists.fedorahosted.org
To unsubscribe send an email to sssd-devel-le...@lists.fedorahosted.org


[SSSD] [sssd PR#107][comment] WATCHDOG: Avoid non async-signal-safe from the signal_handler

2016-12-13 Thread fidencio
  URL: https://github.com/SSSD/sssd/pull/107
Title: #107: WATCHDOG: Avoid non async-signal-safe from the signal_handler

fidencio commented:
"""
Pavel,

On Tue, Dec 13, 2016 at 11:20 AM, Pavel Březina 
wrote:

> There are two scenarios:
>
>1. timeshift during system boot -- it is very common to be several
>hours
>2. timeshift due to an ntp update when booted up -- usually only few
>seconds, not a big deal
>
> The problem with tevent timer is that if we shift backwards the timer
> remains too far in the future. This applies to all timers, not only for the
> watchdog. Forward shift is not a problem, it just executes the timers
> immediately. Resetting the watchdog helps in a way that sssd is not killed,
> we don't have any capability to reschedule all timed event and we actually
> can not tell that sssd will be functioning properly (dyndns, sudo refresh,
> enumeration, domain refresh, even idle timer on socket activation)... all
> those operations that depends on time() would become unreliable.
>
> I think the best thing to do would be restart the process (although the
> question is how would this affect the boot up) and patch tevent to deal
> with timeshift either by using monotonic clock or by detecting them and
> altering timers accordingly.
>

In the latest version of patch I've just called _exit(1) when the timeshift
is detected.
About patching tevent, I've seen some old discussions happening and it
doesn't seem a trivial thing to do. Would the patch, as it is right now, be
acceptable and then a work on tevent could be done later (yes, I'd add it
to my queue and do it as soon as we have an agreement on doing this)?


> —
> You are receiving this because you authored the thread.
> Reply to this email directly, view it on GitHub
> , or mute
> the thread
> 
> .
>

"""

See the full comment at 
https://github.com/SSSD/sssd/pull/107#issuecomment-266702069
___
sssd-devel mailing list -- sssd-devel@lists.fedorahosted.org
To unsubscribe send an email to sssd-devel-le...@lists.fedorahosted.org


[SSSD] [sssd PR#107][comment] WATCHDOG: Avoid non async-signal-safe from the signal_handler

2016-12-13 Thread pbrezina
  URL: https://github.com/SSSD/sssd/pull/107
Title: #107: WATCHDOG: Avoid non async-signal-safe from the signal_handler

pbrezina commented:
"""
There are two scenarios:
1) timeshift during system boot -- it is very common to be several hours
2) timeshift due to an ntp update when booted up -- usually only few seconds, 
not a big deal

The problem with tevent timer is that if we shift backwards the timer remains 
too far in the future. This applies to all timers, not only for the watchdog. 
Forward shift is not a problem, it just executes the timers immediately. 
Resetting the watchdog helps in a way that sssd is not killed, we don't have 
any capability to reschedule all timed event and we actually can not tell that 
sssd will be functioning properly (dyndns, sudo refresh, enumeration, domain 
refresh, even idle timer on socket activation)... all those operations that 
depends on time() would become unreliable.

I think the best thing to do would be restart the process (although the 
question is how would this affect the boot up) and patch tevent to deal with 
timeshift either by using monotonic clock or by detecting them and altering 
timers accordingly.
"""

See the full comment at 
https://github.com/SSSD/sssd/pull/107#issuecomment-266700933
___
sssd-devel mailing list -- sssd-devel@lists.fedorahosted.org
To unsubscribe send an email to sssd-devel-le...@lists.fedorahosted.org


[SSSD] [sssd PR#107][comment] WATCHDOG: Avoid non async-signal-safe from the signal_handler

2016-12-12 Thread jhrozek
  URL: https://github.com/SSSD/sssd/pull/107
Title: #107: WATCHDOG: Avoid non async-signal-safe from the signal_handler

jhrozek commented:
"""
On Mon, Dec 12, 2016 at 07:30:30AM -0800, Simo Sorce wrote:
> well you could have a globalk variable for the watchdog and change it from a 
> custom signal handler, but the point of the watchdog is to go thorugh the 
> tevent handler instead so that we are sure the machinery is working and not 
> stuck somwhere.
> Resetting directly from the singal handler would bypass all processing and 
> therefore render the watchdog useless I guess.

The problem here (as I understand it, Pavel or Fabiano can correct me if
I'm wrong) is that the watchdog increases the counter inside a POSIX
signal handler, but resets the counter in a tevent timer (to make sure
the mainloop is being processed).

Now, if the time drifts, we still are receiving the monotonic SIGRT
signals into the POSIX handlers, but because the tevent timer never
gets invoked (it's set to be invoked in a time in the future, because
the time drifted), we never reset the counter.

We can detect the time has drifted in the POSIX SIGRT handler, the
question I'm trying to answer is how should we restart the tevent timer
when we receive the SIGRT signal, but we because we are in the POSIX
handler, we are quite restriced in what we can do..

"""

See the full comment at 
https://github.com/SSSD/sssd/pull/107#issuecomment-266462611
___
sssd-devel mailing list -- sssd-devel@lists.fedorahosted.org
To unsubscribe send an email to sssd-devel-le...@lists.fedorahosted.org


[SSSD] [sssd PR#107][comment] WATCHDOG: Avoid non async-signal-safe from the signal_handler

2016-12-12 Thread simo5
  URL: https://github.com/SSSD/sssd/pull/107
Title: #107: WATCHDOG: Avoid non async-signal-safe from the signal_handler

simo5 commented:
"""
well you could have a globalk variable for the watchdog and change it from a 
custom signal handler, but the point of the watchdog is to go thorugh the 
tevent handler instead so that we are sure the machinery is working and not 
stuck somwhere.
Resetting directly from the singal handler would bypass all processing and 
therefore render the watchdog useless I guess.
"""

See the full comment at 
https://github.com/SSSD/sssd/pull/107#issuecomment-266460639
___
sssd-devel mailing list -- sssd-devel@lists.fedorahosted.org
To unsubscribe send an email to sssd-devel-le...@lists.fedorahosted.org


[SSSD] [sssd PR#107][comment] WATCHDOG: Avoid non async-signal-safe from the signal_handler

2016-12-12 Thread jhrozek
  URL: https://github.com/SSSD/sssd/pull/107
Title: #107: WATCHDOG: Avoid non async-signal-safe from the signal_handler

jhrozek commented:
"""
On Mon, Dec 12, 2016 at 06:55:13AM -0800, Simo Sorce wrote:
> Yes we should ask, I think we really need to try to use monotonic clocks for 
> most tasks.

Thank you, @simo5, do you also have a preference for how to reset the
watchdog from a 'plain' POSIX signal handler where we can't use
signal-unsafe functions? My suggestion was to signal 'self' to get into
a tevent-managed signal handler, Fabiano suggested to play it safe and
just exist and let monitor restart sssd_be.

"""

See the full comment at 
https://github.com/SSSD/sssd/pull/107#issuecomment-266458115
___
sssd-devel mailing list -- sssd-devel@lists.fedorahosted.org
To unsubscribe send an email to sssd-devel-le...@lists.fedorahosted.org


[SSSD] [sssd PR#107][comment] WATCHDOG: Avoid non async-signal-safe from the signal_handler

2016-12-12 Thread simo5
  URL: https://github.com/SSSD/sssd/pull/107
Title: #107: WATCHDOG: Avoid non async-signal-safe from the signal_handler

simo5 commented:
"""
Yes we should ask, I think we really need to try to use monotonic clocks for 
most tasks.
"""

See the full comment at 
https://github.com/SSSD/sssd/pull/107#issuecomment-266450977
___
sssd-devel mailing list -- sssd-devel@lists.fedorahosted.org
To unsubscribe send an email to sssd-devel-le...@lists.fedorahosted.org


[SSSD] [sssd PR#107][comment] WATCHDOG: Avoid non async-signal-safe from the signal_handler

2016-12-12 Thread jhrozek
  URL: https://github.com/SSSD/sssd/pull/107
Title: #107: WATCHDOG: Avoid non async-signal-safe from the signal_handler

jhrozek commented:
"""
I'm not sure I like the idea of killing a service because the admin runs 
ntpdate, even if the monitor should restart the service. @simo5 do you have a 
preference here?

I really wonder why doesn't tevent use monotonic clock itself, should we ask on 
the samba list?
"""

See the full comment at 
https://github.com/SSSD/sssd/pull/107#issuecomment-266449652
___
sssd-devel mailing list -- sssd-devel@lists.fedorahosted.org
To unsubscribe send an email to sssd-devel-le...@lists.fedorahosted.org


[SSSD] [sssd PR#107][comment] WATCHDOG: Avoid non async-signal-safe from the signal_handler

2016-12-12 Thread fidencio
  URL: https://github.com/SSSD/sssd/pull/107
Title: #107: WATCHDOG: Avoid non async-signal-safe from the signal_handler

fidencio commented:
"""
On Mon, Dec 12, 2016 at 3:37 PM, Jakub Hrozek 
wrote:

> So..I wonder how much of a hack this is but if the intent here is to reset
> the watchdog from a signal handler, could we send a signal to self 
> (kill(getpid(),
> SIGHUP)) with a signal that is handled by tevent and in that
> tevent-driven signal handler, reset the watchdog?
>

Things are getting more and more hacky.
I'm wondering here, what would be the problem to simply `_exit(1)` in case
of a timeshift is detected?


> —
> You are receiving this because you authored the thread.
> Reply to this email directly, view it on GitHub
> , or mute
> the thread
> 
> .
>

"""

See the full comment at 
https://github.com/SSSD/sssd/pull/107#issuecomment-266447493
___
sssd-devel mailing list -- sssd-devel@lists.fedorahosted.org
To unsubscribe send an email to sssd-devel-le...@lists.fedorahosted.org


[SSSD] [sssd PR#107][comment] WATCHDOG: Avoid non async-signal-safe from the signal_handler

2016-12-12 Thread jhrozek
  URL: https://github.com/SSSD/sssd/pull/107
Title: #107: WATCHDOG: Avoid non async-signal-safe from the signal_handler

jhrozek commented:
"""
So..I wonder how much of a hack this is but if the intent here is to reset the 
watchdog from a signal handler, could we send a signal to self (`kill(getpid(), 
SIGHUP)`) with a signal that is handled by tevent and in that tevent-driven 
signal handler, reset the watchdog?
"""

See the full comment at 
https://github.com/SSSD/sssd/pull/107#issuecomment-266446340
___
sssd-devel mailing list -- sssd-devel@lists.fedorahosted.org
To unsubscribe send an email to sssd-devel-le...@lists.fedorahosted.org


[SSSD] [sssd PR#107][comment] WATCHDOG: Avoid non async-signal-safe from the signal_handler

2016-12-12 Thread fidencio
  URL: https://github.com/SSSD/sssd/pull/107
Title: #107: WATCHDOG: Avoid non async-signal-safe from the signal_handler

fidencio commented:
"""
On Mon, Dec 12, 2016 at 1:50 PM, Pavel Březina 
wrote:

> *@pbrezina* commented on this pull request.
> --
>
> In src/util/util_watchdog.c
> :
>
> > + *
> + * As restarting the watchdog from the signal_handler may
> + * lead sssd_be to a deadlock due to the use of non async-
> + * signal-safe function. let's just (re)set the timer's
> + * time.
> + */
> +
> +/* As done when setting up the watchdog handler,
> + * we give 1 second head start to the watchdog event */
> +its.it_value.tv_sec = watchdog_ctx.interval.tv_sec + 1;
> +its.it_value.tv_nsec = 0;
> +its.it_interval.tv_sec = watchdog_ctx.interval.tv_sec;
> +its.it_interval.tv_nsec = 0;
> +ret = timer_settime(watchdog_ctx.timerid, 0, , NULL);
> +if (ret == -1) {
> +_exit(1);
>
> This is not sufficient because we need to reset the tevent timer. I think
> watchdog itself is actually fired in the right time because the timer clock
> is adjusted. The problematic timer is the tevent one, which needs to be
> rescheduled.
>

Okay. I guess I see what you mean.
Even with this timeout being reset the watchdog would end up killing the
process due to the tevent timeout, is it?
Considering that's the case I'd say there's nothing much that can be done
from our side apart from ignoring the ticks on those cases. At least till
we have it changed/fixed on tevent's side. And suggestions are welcome!


> —
> You are receiving this because you authored the thread.
> Reply to this email directly, view it on GitHub
> , or mute
> the thread
> 
> .
>
Thanks for the review!
--
Fabiano Fidêncio

"""

See the full comment at 
https://github.com/SSSD/sssd/pull/107#issuecomment-266431337
___
sssd-devel mailing list -- sssd-devel@lists.fedorahosted.org
To unsubscribe send an email to sssd-devel-le...@lists.fedorahosted.org


[SSSD] [sssd PR#107][comment] WATCHDOG: Avoid non async-signal-safe from the signal_handler

2016-12-12 Thread fidencio
  URL: https://github.com/SSSD/sssd/pull/107
Title: #107: WATCHDOG: Avoid non async-signal-safe from the signal_handler

fidencio commented:
"""
On Mon, Dec 12, 2016 at 1:22 PM, lslebodn  wrote:

> On (12/12/16 01:20), fidencio wrote:
> >While debugging rhbz#1396912 some deadlock on sssd_be was noticed[0] and
> >it's been caused by the use of non async-signal-safe functions from the
> >signal_handler (please, see man 7 signal for more info about which are
> >the async-signal-safe functions that can be used).
> >
> >The removal of those functions we lose some debug messages,
> If these messages are important we can write them(static strings) to stderr
> or other file descriptor directly with write which can be safely
> called inside signal handler.
>

I would say it doesn't worth the trouble, to be honest.
But, well, other's opinions are welcome!


>
> >which is the
> >least of problems now. Also instead of re-setting the watchdog timeout
> >in case of time-shift (please, see b8ceaeb for more info), let's just
> >re-set the timer's time.
> >
> >Along with this commit, the function to teardown the watchdog has been
> >removed as it's not used anymore.
> >
> Must not be removed.
>
> One of the main usesaces is to disable watchdog for debugging with gdb.
> It is very usefull. You might add comment there why it should not be
> removed.
>

Okay.


>
> NOTE: I didn't read the patch. Just commenting commit message.
>
> LS
>
> —
> You are receiving this because you authored the thread.
> Reply to this email directly, view it on GitHub
> , or mute
> the thread
> 
> .
>

"""

See the full comment at 
https://github.com/SSSD/sssd/pull/107#issuecomment-266430843
___
sssd-devel mailing list -- sssd-devel@lists.fedorahosted.org
To unsubscribe send an email to sssd-devel-le...@lists.fedorahosted.org


[SSSD] [sssd PR#107][comment] WATCHDOG: Avoid non async-signal-safe from the signal_handler

2016-12-12 Thread lslebodn
  URL: https://github.com/SSSD/sssd/pull/107
Title: #107: WATCHDOG: Avoid non async-signal-safe from the signal_handler

lslebodn commented:
"""
On (12/12/16 01:20), fidencio wrote:
>While debugging rhbz#1396912 some deadlock on sssd_be was noticed[0] and
>it's been caused by the use of non async-signal-safe functions from the
>signal_handler (please, see man 7 signal for more info about which are
>the async-signal-safe functions that can be used).
>
>The removal of those functions we lose some debug messages,
If these messages are important we can write them(static strings) to stderr
or other file descriptor directly with write which can be safely
called inside signal handler.

>which is the
>least of problems now. Also instead of re-setting the watchdog timeout
>in case of time-shift (please, see b8ceaeb for more info), let's just
>re-set the timer's time.
>
>Along with this commit, the function to teardown the watchdog has been
>removed as it's not used anymore.
>
Must not be removed.

One of the main usesaces is to disable watchdog for debugging with gdb.
It is very usefull. You might add comment there why it should not be removed.

NOTE: I didn't read the patch. Just commenting commit message.

LS

"""

See the full comment at 
https://github.com/SSSD/sssd/pull/107#issuecomment-266418517
___
sssd-devel mailing list -- sssd-devel@lists.fedorahosted.org
To unsubscribe send an email to sssd-devel-le...@lists.fedorahosted.org