Re: Possible bug in or around posix_fadvise after r292326

2016-01-05 Thread Peter Holm
On Tue, Jan 05, 2016 at 01:07:40PM +0200, Konstantin Belousov wrote:
> On Mon, Jan 04, 2016 at 10:05:21PM -0800, Benno Rice wrote:
> > Hi Konstantin,
> > 
> > I recently updated my dev box to r292962. After doing this I attempted to 
> > set up PostgreSQL 9.4. When I ran initdb the last phase hung. Using 
> > procstat -kk I found it appeared to be stuck in a loop inside a 
> > posix_fadvise syscall. I could not ^C or ^Z the initdb process. I could 
> > kill it but a subsequent attempt to rm -rf the /usr/local/pgsql/data 
> > directory also got stuck and was unkillable by any means. Rebooting allowed 
> > me to remove the directory but the initdb process still hung when I re-ran 
> > it.
> > 
> > I tried PostgreSQL 9.3 with similar results.
> > 
> > Looking at the source code for initdb I found that it calls posix_fadvise 
> > like so[1]:
> > 
> >  /*
> >   * We do what pg_flush_data() would do in the backend: prefer to use
> >   * sync_file_range, but fall back to posix_fadvise.  We ignore errors
> >   * because this is only a hint.
> >   */
> >  #if defined(HAVE_SYNC_FILE_RANGE)
> >  (void) sync_file_range(fd, 0, 0, SYNC_FILE_RANGE_WRITE);
> >  #elif defined(USE_POSIX_FADVISE) && defined(POSIX_FADV_DONTNEED)
> >  (void) posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
> >  #else
> >  #error PG_FLUSH_DATA_WORKS should not have been defined
> >  #endif
> > 
> > Looking for recent commits involving POSIX_FADV_DONTNEED I found r292326:
> > 
> > https://svnweb.freebsd.org/changeset/base/292326 
> > 
> > 
> > Backing this revision out allowed the initdb process to complete.
> > 
> > My current theory is that some how we???re getting ENOLCK or EAGAIN from 
> > the BUF_TIMELOCK call in bnoreuselist:
> > 
> > https://svnweb.freebsd.org/base/head/sys/kern/vfs_subr.c?view=annotate#l1676
> >  
> > 
> > 
> > Leading to an infinite loop in vop_stdadvise:
> > 
> > https://svnweb.freebsd.org/base/head/sys/kern/vfs_default.c?annotate=292373#l1083
> >  
> > 
> > 
> > I haven???t managed to dig any deeper than that yet.
> > 
> > Is there any other information I could give you to help narrow this down?
> 
> I do not see this issue locally.
> 

I do:

(kgdb) f 9
#9  0x80ac7956 in vop_stdadvise (ap=0xfe081dc6d930) at 
../../../kern/vfs_default.c:1087
1087error = bnoreuselist(&bo->bo_dirty, bo, startn, 
endn);
(kgdb) l
1082endn = ap->a_end / bsize;
1083for (;;) {
1084error = bnoreuselist(&bo->bo_clean, bo, startn, 
endn);
1085if (error == EAGAIN)
1086continue;
1087error = bnoreuselist(&bo->bo_dirty, bo, startn, 
endn);
1088if (error == EAGAIN)
1089continue;
1090break;
1091}
(kgdb) info loc
vp = (struct vnode *) 0xf8008bdaa9c0
bo = (struct bufobj *) 0xf8008bdaab28
startn = 0x0
endn = 0x
start = 0x0
end = 0x8000
bsize = 0x8000
error = 0x0
(kgdb)

https://people.freebsd.org/~pho/stress/log/kostik855.txt

- Peter
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: Possible bug in or around posix_fadvise after r292326

2016-01-05 Thread Konstantin Belousov
On Tue, Jan 05, 2016 at 01:07:40PM +0200, Konstantin Belousov wrote:
> On Mon, Jan 04, 2016 at 10:05:21PM -0800, Benno Rice wrote:
> > Hi Konstantin,
> > 
> > I recently updated my dev box to r292962. After doing this I attempted to 
> > set up PostgreSQL 9.4. When I ran initdb the last phase hung. Using 
> > procstat -kk I found it appeared to be stuck in a loop inside a 
> > posix_fadvise syscall. I could not ^C or ^Z the initdb process. I could 
> > kill it but a subsequent attempt to rm -rf the /usr/local/pgsql/data 
> > directory also got stuck and was unkillable by any means. Rebooting allowed 
> > me to remove the directory but the initdb process still hung when I re-ran 
> > it.
> > 
> > I tried PostgreSQL 9.3 with similar results.
> > 
> > Looking at the source code for initdb I found that it calls posix_fadvise 
> > like so[1]:
> > 
> >  /*
> >   * We do what pg_flush_data() would do in the backend: prefer to use
> >   * sync_file_range, but fall back to posix_fadvise.  We ignore errors
> >   * because this is only a hint.
> >   */
> >  #if defined(HAVE_SYNC_FILE_RANGE)
> >  (void) sync_file_range(fd, 0, 0, SYNC_FILE_RANGE_WRITE);
> >  #elif defined(USE_POSIX_FADVISE) && defined(POSIX_FADV_DONTNEED)
> >  (void) posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
> >  #else
> >  #error PG_FLUSH_DATA_WORKS should not have been defined
> >  #endif
> > 
> > Looking for recent commits involving POSIX_FADV_DONTNEED I found r292326:
> > 
> > https://svnweb.freebsd.org/changeset/base/292326 
> > 
> > 
> > Backing this revision out allowed the initdb process to complete.
> > 
> > My current theory is that some how we???re getting ENOLCK or EAGAIN from 
> > the BUF_TIMELOCK call in bnoreuselist:
> > 
> > https://svnweb.freebsd.org/base/head/sys/kern/vfs_subr.c?view=annotate#l1676
> >  
> > 
> > 
> > Leading to an infinite loop in vop_stdadvise:
> > 
> > https://svnweb.freebsd.org/base/head/sys/kern/vfs_default.c?annotate=292373#l1083
> >  
> > 
> > 
> > I haven???t managed to dig any deeper than that yet.
> > 
> > Is there any other information I could give you to help narrow this down?
> 
> I do not see this issue locally.
> 
> When the hang in initdb occur, what is the state of the initdb thread
> which performs advise() ?  Is it "brlsfl" sleep, or is the thread running ?
> 
> If buffer lock is not available, and this is the cause of the ENOLCK/EAGAIN,
> then the question is who is the owner of the corresponding buffer lock.
> You could overview the state of the system with 'ps' command in ddb, and
> 'show alllocks' would list owner, unless buffer was async.
> 
> Also, I do not quite understand the behaviour of SIGINT/SIGKILL.  Could
> it be that the process was not killed by SIGKILL as well ?  It would be
> consistent with the vnode lock still owned and preventing the accesses.

Just in case, if this is due to the quadratic loop behaviour, there
is no need to restart from the very start in the new stdadvise(DONTNEED)
implementation.  So regardless of the answers to the questions above,
you might also try this patch.

diff --git a/sys/kern/vfs_default.c b/sys/kern/vfs_default.c
index fd83f87..3da8618 100644
--- a/sys/kern/vfs_default.c
+++ b/sys/kern/vfs_default.c
@@ -1080,15 +1080,9 @@ vop_stdadvise(struct vop_advise_args *ap)
bsize = vp->v_bufobj.bo_bsize;
startn = ap->a_start / bsize;
endn = ap->a_end / bsize;
-   for (;;) {
-   error = bnoreuselist(&bo->bo_clean, bo, startn, endn);
-   if (error == EAGAIN)
-   continue;
+   error = bnoreuselist(&bo->bo_clean, bo, startn, endn);
+   if (error == 0)
error = bnoreuselist(&bo->bo_dirty, bo, startn, endn);
-   if (error == EAGAIN)
-   continue;
-   break;
-   }
BO_RUNLOCK(bo);
VOP_UNLOCK(vp, 0);
break;
diff --git a/sys/kern/vfs_subr.c b/sys/kern/vfs_subr.c
index ace97e8..8cac32f 100644
--- a/sys/kern/vfs_subr.c
+++ b/sys/kern/vfs_subr.c
@@ -1670,6 +1670,7 @@ bnoreuselist(struct bufv *bufv, struct bufobj *bo, 
daddr_t startn, daddr_t endn)
ASSERT_BO_LOCKED(bo);
 
for (lblkno = startn;; lblkno++) {
+again:
bp = BUF_PCTRIE_LOOKUP_GE(&bufv->bv_root, lblkno);
if (bp == NULL || bp->b_lblkno >= endn)
break;
@@ -1677,7 +1678,9 @@ bnoreuselist(struct bufv *bufv, struct bufobj *bo, 
daddr_t startn, daddr_t endn)
LK_INTERLOCK, BO_LOCKPTR(bo), "brlsfl", 0, 0);
if (error != 0) {
BO_RLOCK(bo);
-  

Re: Possible bug in or around posix_fadvise after r292326

2016-01-05 Thread Konstantin Belousov
On Mon, Jan 04, 2016 at 10:05:21PM -0800, Benno Rice wrote:
> Hi Konstantin,
> 
> I recently updated my dev box to r292962. After doing this I attempted to set 
> up PostgreSQL 9.4. When I ran initdb the last phase hung. Using procstat -kk 
> I found it appeared to be stuck in a loop inside a posix_fadvise syscall. I 
> could not ^C or ^Z the initdb process. I could kill it but a subsequent 
> attempt to rm -rf the /usr/local/pgsql/data directory also got stuck and was 
> unkillable by any means. Rebooting allowed me to remove the directory but the 
> initdb process still hung when I re-ran it.
> 
> I tried PostgreSQL 9.3 with similar results.
> 
> Looking at the source code for initdb I found that it calls posix_fadvise 
> like so[1]:
> 
>  /*
>   * We do what pg_flush_data() would do in the backend: prefer to use
>   * sync_file_range, but fall back to posix_fadvise.  We ignore errors
>   * because this is only a hint.
>   */
>  #if defined(HAVE_SYNC_FILE_RANGE)
>  (void) sync_file_range(fd, 0, 0, SYNC_FILE_RANGE_WRITE);
>  #elif defined(USE_POSIX_FADVISE) && defined(POSIX_FADV_DONTNEED)
>  (void) posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
>  #else
>  #error PG_FLUSH_DATA_WORKS should not have been defined
>  #endif
> 
> Looking for recent commits involving POSIX_FADV_DONTNEED I found r292326:
> 
> https://svnweb.freebsd.org/changeset/base/292326 
> 
> 
> Backing this revision out allowed the initdb process to complete.
> 
> My current theory is that some how we???re getting ENOLCK or EAGAIN from the 
> BUF_TIMELOCK call in bnoreuselist:
> 
> https://svnweb.freebsd.org/base/head/sys/kern/vfs_subr.c?view=annotate#l1676 
> 
> 
> Leading to an infinite loop in vop_stdadvise:
> 
> https://svnweb.freebsd.org/base/head/sys/kern/vfs_default.c?annotate=292373#l1083
>  
> 
> 
> I haven???t managed to dig any deeper than that yet.
> 
> Is there any other information I could give you to help narrow this down?

I do not see this issue locally.

When the hang in initdb occur, what is the state of the initdb thread
which performs advise() ?  Is it "brlsfl" sleep, or is the thread running ?

If buffer lock is not available, and this is the cause of the ENOLCK/EAGAIN,
then the question is who is the owner of the corresponding buffer lock.
You could overview the state of the system with 'ps' command in ddb, and
'show alllocks' would list owner, unless buffer was async.

Also, I do not quite understand the behaviour of SIGINT/SIGKILL.  Could
it be that the process was not killed by SIGKILL as well ?  It would be
consistent with the vnode lock still owned and preventing the accesses.
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: Possible bug in or around posix_fadvise after r292326

2016-01-04 Thread Adrian Chadd
+1, saw this locally on my up to date amd64 laptop. :(


-a
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: Possible bug in or around posix_fadvise after r292326

2016-01-04 Thread Benno Rice

> On Jan 4, 2016, at 22:05, Benno Rice  wrote:
> 
> Hi Konstantin,
> 
> I recently updated my dev box to r292962. After doing this I attempted to set 
> up PostgreSQL 9.4. When I ran initdb the last phase hung. Using procstat -kk 
> I found it appeared to be stuck in a loop inside a posix_fadvise syscall. I 
> could not ^C or ^Z the initdb process. I could kill it but a subsequent 
> attempt to rm -rf the /usr/local/pgsql/data directory also got stuck and was 
> unkillable by any means. Rebooting allowed me to remove the directory but the 
> initdb process still hung when I re-ran it.

[snip]

> 
> I haven’t managed to dig any deeper than that yet.
> 
> Is there any other information I could give you to help narrow this down?

Rebooted with a WITNESS kernel and got the following LOR after initdb started:

https://gist.github.com/jeamland/69a07c4523f0dea4c26c 



___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"

Possible bug in or around posix_fadvise after r292326

2016-01-04 Thread Benno Rice
Hi Konstantin,

I recently updated my dev box to r292962. After doing this I attempted to set 
up PostgreSQL 9.4. When I ran initdb the last phase hung. Using procstat -kk I 
found it appeared to be stuck in a loop inside a posix_fadvise syscall. I could 
not ^C or ^Z the initdb process. I could kill it but a subsequent attempt to rm 
-rf the /usr/local/pgsql/data directory also got stuck and was unkillable by 
any means. Rebooting allowed me to remove the directory but the initdb process 
still hung when I re-ran it.

I tried PostgreSQL 9.3 with similar results.

Looking at the source code for initdb I found that it calls posix_fadvise like 
so[1]:

 /*
  * We do what pg_flush_data() would do in the backend: prefer to use
  * sync_file_range, but fall back to posix_fadvise.  We ignore errors
  * because this is only a hint.
  */
 #if defined(HAVE_SYNC_FILE_RANGE)
 (void) sync_file_range(fd, 0, 0, SYNC_FILE_RANGE_WRITE);
 #elif defined(USE_POSIX_FADVISE) && defined(POSIX_FADV_DONTNEED)
 (void) posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
 #else
 #error PG_FLUSH_DATA_WORKS should not have been defined
 #endif

Looking for recent commits involving POSIX_FADV_DONTNEED I found r292326:

https://svnweb.freebsd.org/changeset/base/292326 


Backing this revision out allowed the initdb process to complete.

My current theory is that some how we’re getting ENOLCK or EAGAIN from the 
BUF_TIMELOCK call in bnoreuselist:

https://svnweb.freebsd.org/base/head/sys/kern/vfs_subr.c?view=annotate#l1676 


Leading to an infinite loop in vop_stdadvise:

https://svnweb.freebsd.org/base/head/sys/kern/vfs_default.c?annotate=292373#l1083
 


I haven’t managed to dig any deeper than that yet.

Is there any other information I could give you to help narrow this down?

Thanks,
Benno.

[1] 
http://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/bin/initdb/initdb.c;h=35e39ce4b31b2f437d6e28eaf90500a22d229c6a;hb=HEAD#l631
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"