Re: [HACKERS] Stuck spins in current
> > > BTW, I've got ~320tps with 50 clients inserting (int4, text[1-256]) > > > records into 50 tables (-B 16384, wal_buffers = 256) on Ultra10 > > > with 512Mb RAM, IDE (clients run on the same host as server). > > > > Not bad. What were you getting before these recent changes? > > As I already reported - with O_DSYNC this test shows 30% better > performance than with fsync. > > (BTW, seems in all my tests I was using -O0 flag...) Good data point. I could never understand why we would ever use the normal sync if we had a data-only sync option available. I can imagine the data-only being the same, but never slower. -- Bruce Momjian| http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup.| Drexel Hill, Pennsylvania 19026 ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [HACKERS] Stuck spins in current
> > BTW, I've got ~320tps with 50 clients inserting (int4, text[1-256]) > > records into 50 tables (-B 16384, wal_buffers = 256) on Ultra10 > > with 512Mb RAM, IDE (clients run on the same host as server). > > Not bad. What were you getting before these recent changes? As I already reported - with O_DSYNC this test shows 30% better performance than with fsync. (BTW, seems in all my tests I was using -O0 flag...) Vadim ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] Stuck spins in current
"Mikheev, Vadim" <[EMAIL PROTECTED]> writes: >> "Vadim Mikheev" <[EMAIL PROTECTED]> writes: > Anyway, deadlock in my tests are very correlated with new log file > creation - something probably is still wrong... >> >> Well, if you can reproduce it easily, seems like you could >> get in there and verify or disprove my theory about where >> the deadlock is. > You were right - deadlock disappeared. Okay, good. I'll bet the correlation to new-log-file was just because the WAL insert_lck gets held for a longer time than usual if XLogInsert is forced to call XLogWrite and that in turn is forced to make a new log file. Were you running with wal_files = 0? The problem would likely not have shown up at all if logfiles were created in advance... > BTW, I've got ~320tps with 50 clients inserting (int4, text[1-256]) > records into 50 tables (-B 16384, wal_buffers = 256) on Ultra10 > with 512Mb RAM, IDE (clients run on the same host as server). Not bad. What were you getting before these recent changes? regards, tom lane ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://www.postgresql.org/search.mpl
RE: [HACKERS] Stuck spins in current
> "Vadim Mikheev" <[EMAIL PROTECTED]> writes: > > Anyway, deadlock in my tests are very correlated with new log file > > creation - something probably is still wrong... > > Well, if you can reproduce it easily, seems like you could > get in there and verify or disprove my theory about where > the deadlock is. You were right - deadlock disappeared. BTW, I've got ~320tps with 50 clients inserting (int4, text[1-256]) records into 50 tables (-B 16384, wal_buffers = 256) on Ultra10 with 512Mb RAM, IDE (clients run on the same host as server). Vadim ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [HACKERS] Stuck spins in current
"Vadim Mikheev" <[EMAIL PROTECTED]> writes: > Anyway, deadlock in my tests are very correlated with new log file > creation - something probably is still wrong... Well, if you can reproduce it easily, seems like you could get in there and verify or disprove my theory about where the deadlock is. Or send the testbed and I'll try ... regards, tom lane ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/users-lounge/docs/faq.html
Re: [HACKERS] Stuck spins in current
> At this point I must humbly say "yes, you told me so", because if I No, I didn't - I must humbly say that I didn't foresee this deadlock, so "I didn't tell you so" -:) Anyway, deadlock in my tests are very correlated with new log file creation - something probably is still wrong... Vadim ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://www.postgresql.org/search.mpl
Re: [HACKERS] Stuck spins in current
"Mikheev, Vadim" <[EMAIL PROTECTED]> writes: > And you know - I've run same tests on ~ Mar 9 snapshot > without any problems. Oh, I see it: Process A is doing GetSnapShotData. It holds SInvalLock and calls ReadNewTransactionId, which wants XidGenLockId. Process B is doing GetNewTransactionId. It holds XidGenLockId and has run out of XIDs, so it needs to write a NEXTXID log record. Therefore, it calls XLogInsert which wants the insert_lck. Process C is inside XLogInsert on its first xlog entry of a transaction. It holds the insert_lck and wants to put its XID into MyProc->logRec, for which it needs SInvalLock. Ooops. At this point I must humbly say "yes, you told me so", because if I hadn't insisted that we needed NEXTXID records then we wouldn't have this deadlock. It looks to me like the simplest answer is to take NEXTXID records out again. (Fortunately, there doesn't seem to be any comparable cycle involving OidGenLock, or we'd need to think of a better answer.) I shall retire to lick my wounds, and make the changes tomorrow ... regards, tom lane ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/users-lounge/docs/faq.html
Re: [HACKERS] Stuck spins in current
"Mikheev, Vadim" <[EMAIL PROTECTED]> writes: > xlog.c revision 1.55 from Feb 26 already had log file > zero-filling, so ... >> >> Oh, you're right, I didn't study the CVS log carefully enough. Hmm, >> maybe the control file lock isn't the problem. The abort() in >> s_lock_stuck should have left a core file --- what is the backtrace? > After 10 times increasing DEFAULT_TIMEOUT in s_lock.c > I got abort in xlog.c:626 - waiting for insert_lck. > But problem is near new log file creation code: system > goes sleep just after new one is created. Have you learned any more about this? Or can you send your test program so other people can try it? In the meantime, even if it turns out there's a different problem here, it seems clear to me that it's a bad idea to use a plain spinlock to interlock xlog segment creation. The spinlock timeouts are not set high enough to be safe for something that could take several seconds. Unless someone objects, I will go ahead and work on the change I suggested yesterday to not hold the ControlFileLockId spinlock while we are zero-filling the new segment. regards, tom lane ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://www.postgresql.org/search.mpl
RE: [HACKERS] Stuck spins in current
> > And you know - I've run same tests on ~ Mar 9 snapshot > > without any problems. > >> > >> That was before I changed the code to pre-fill the file --- > >> now it takes longer to init a log segment. And we're only > >> using a plain SpinAcquire, not the flavor with a longer timeout. > > > xlog.c revision 1.55 from Feb 26 already had log file > > zero-filling, so ... > > Oh, you're right, I didn't study the CVS log carefully enough. Hmm, > maybe the control file lock isn't the problem. The abort() in > s_lock_stuck should have left a core file --- what is the backtrace? After 10 times increasing DEFAULT_TIMEOUT in s_lock.c I got abort in xlog.c:626 - waiting for insert_lck. But problem is near new log file creation code: system goes sleep just after new one is created. Vadim ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [HACKERS] Stuck spins in current
"Mikheev, Vadim" <[EMAIL PROTECTED]> writes: > And you know - I've run same tests on ~ Mar 9 snapshot > without any problems. >> >> That was before I changed the code to pre-fill the file --- >> now it takes longer to init a log segment. And we're only >> using a plain SpinAcquire, not the flavor with a longer timeout. > xlog.c revision 1.55 from Feb 26 already had log file > zero-filling, so ... Oh, you're right, I didn't study the CVS log carefully enough. Hmm, maybe the control file lock isn't the problem. The abort() in s_lock_stuck should have left a core file --- what is the backtrace? regards, tom lane ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://www.postgresql.org/search.mpl
RE: [HACKERS] Stuck spins in current
> >> Got it at spin.c:156 with 50 clients doing inserts into > >> 50 tables (int4, text[1-256 bytes]). > >> -B 16384, -wal_buffers=256 (with default others wal params). > > > SpinAcquire() ... but on which lock? > > After a little bit of thought I'll bet it's ControlFileLockId. I see "XLogWrite: new log file created..." in postmaster' log - backend writes this after releasing ControlFileLockId. > Likely we shouldn't be using a spinlock at all for that, but the > short-term solution might be a longer timeout for this > particular lock. > Alternatively, could we avoid holding that lock while initializing a > new log segment? How to synchronize with checkpoint-er if wal_files > 0? And you know - I've run same tests on ~ Mar 9 snapshot without any problems. Vadim ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [HACKERS] Stuck spins in current
"Mikheev, Vadim" <[EMAIL PROTECTED]> writes: >> Alternatively, could we avoid holding that lock while initializing a >> new log segment? > How to synchronize with checkpoint-er if wal_files > 0? I was sort of visualizing assigning the created xlog files dynamically: create a temp file of a PID-dependent name fill it with zeroes and fsync it acquire ControlFileLockId rename temp file into place as next uncreated segment update pg_control release ControlFileLockId Since the things are just filled with 0's, there's no need to know which segment it will be while you're filling it. This would leave you sometimes with more advance files than you really needed, but so what ... > And you know - I've run same tests on ~ Mar 9 snapshot > without any problems. That was before I changed the code to pre-fill the file --- now it takes longer to init a log segment. And we're only using a plain SpinAcquire, not the flavor with a longer timeout. regards, tom lane ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
RE: [HACKERS] Stuck spins in current
> > How to synchronize with checkpoint-er if wal_files > 0? > > I was sort of visualizing assigning the created xlog files > dynamically: > > create a temp file of a PID-dependent name > fill it with zeroes and fsync it > acquire ControlFileLockId > rename temp file into place as next uncreated segment > update pg_control > release ControlFileLockId > > Since the things are just filled with 0's, there's no need to > know which segment it will be while you're filling it. > > This would leave you sometimes with more advance files than you really > needed, but so what ... Yes, it has sence, but: > > And you know - I've run same tests on ~ Mar 9 snapshot > > without any problems. > > That was before I changed the code to pre-fill the file --- > now it takes longer to init a log segment. And we're only > using a plain SpinAcquire, not the flavor with a longer timeout. xlog.c revision 1.55 from Feb 26 already had log file zero-filling, so ... Vadim ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [HACKERS] Stuck spins in current
"Mikheev, Vadim" <[EMAIL PROTECTED]> writes: >> Got it at spin.c:156 with 50 clients doing inserts into >> 50 tables (int4, text[1-256 bytes]). >> -B 16384, -wal_buffers=256 (with default others wal params). > SpinAcquire() ... but on which lock? After a little bit of thought I'll bet it's ControlFileLockId. Likely we shouldn't be using a spinlock at all for that, but the short-term solution might be a longer timeout for this particular lock. Alternatively, could we avoid holding that lock while initializing a new log segment? regards, tom lane ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Stuck spins in current
"Mikheev, Vadim" <[EMAIL PROTECTED]> writes: > Got it at spin.c:156 with 50 clients doing inserts into > 50 tables (int4, text[1-256 bytes]). > -B 16384, -wal_buffers=256 (with default others wal params). SpinAcquire() ... but on which lock? regards, tom lane ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])