Re: [PATCHES] Re: [HACKERS] Re: Tuple-valued datums on Alpha (was Re: 7.1 on DEC/Alpha)
Tom Lane wrote:
> "Oliver Elphick" [EMAIL PROTECTED] writes:
>>> FATAL 2: Checkpoint lock is busy while data base is shutting down
>
>> It's not just on Alpha; I've seen that on my i386 Linux system.
>
> FWIW, I do *not* see this behavior on HPUX. It seems perfectly
> reproducible on the Debian Alpha box. Is it reproducible on your
> i386 box, or only sometimes?

Hmm. I'm just waking up a bit more. Now I'm thinking slightly more clearly, I saw the problem yesterday when I was doing an Alpha build on faure.debian.org; so I think it was actually on Alpha, not i386 after all. Sorry for the red herring.

-- 
Oliver Elphick [EMAIL PROTECTED]
Isle of Wight http://www.lfix.co.uk/oliver
PGP: 1024R/32B8FAA1: 97 EA 1D 47 72 3F 28 47 6B 7E 39 CC 56 E4 C1 47
GPG: 1024D/3E1D0C1C: CA12 09E0 E8D5 8870 5839 932A 614D 4C34 3E1D 0C1C

"For God shall bring every work into judgment, with every secret thing, whether it be good, or whether it be evil." Ecclesiastes 12:14
Re: [HACKERS] Re: Tuple-valued datums on Alpha (was Re: 7.1 on DEC/Alpha)
On 26 Dec 2000 at 23:41 (-0500), Tom Lane wrote:
| Brent Verner [EMAIL PROTECTED] writes:
| >> Please apply it locally and let me know what you find.
|
| > what I'm seeing now is much the same.
|
| Drat. More to do, then.

after hours in the gdb-hole, I see this... maybe a clue? :)

src/include/access/common/heaptuple.c (lines 450-461):

    {

        /*
         * Fix me when going to a machine with more than a four-byte
         * word!
         */
        off = att_align(off, att[j]->attlen, att[j]->attalign);

        att[j]->attcacheoff = off;

        off = att_addlength(off, att[j]->attlen, tp + off);
    }

I'm pretty sure I don't know best how to fix this, but I've got some randomly entered code compiling now :) If it passes the regression tests I'll send it along.

brent 'glad the coffee shop in the backyard is open now :)'
Re: [HACKERS] Re: Tuple-valued datums on Alpha (was Re: 7.1 on DEC/Alpha)
Brent Verner [EMAIL PROTECTED] writes:
> after hours in the gdb-hole, I see this... maybe a clue? :)

I don't think that comment means anything. Possibly it's a leftover from a time when there was something unportable there. But if att_align were broken on Alphas, you'd have a lot worse problems than what you're seeing.

regards, tom lane
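As an illustration of why the alignment step itself is unlikely to depend on word size, here is a minimal sketch of the align-then-add-length loop quoted from heaptuple.c. The names align_up and layout_offsets are hypothetical stand-ins, not the PostgreSQL macros, and the sketch handles only fixed-length attributes.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not the PostgreSQL att_align macro): round "off"
 * up to the next multiple of "alignval", which must be a power of two.
 */
static size_t
align_up(size_t off, size_t alignval)
{
    assert(alignval != 0 && (alignval & (alignval - 1)) == 0);
    return (off + alignval - 1) & ~(alignval - 1);
}

/*
 * Compute the starting offset of each fixed-length attribute, following
 * the same pattern as the quoted loop: align, record offset, add length.
 * Returns the total length of the laid-out data.
 */
static size_t
layout_offsets(const size_t *attlen, const size_t *attalign,
               size_t natts, size_t *offsets)
{
    size_t off = 0;

    for (size_t j = 0; j < natts; j++)
    {
        off = align_up(off, attalign[j]);
        offsets[j] = off;       /* plays the role of attcacheoff */
        off += attlen[j];       /* plays the role of att_addlength */
    }
    return off;
}
```

With lengths and alignments of 1, 4 and 8 bytes, this places the attributes at offsets 0, 4 and 8. The arithmetic uses only the per-attribute length and alignment, so a wrong result would more plausibly come from a wrong attlen than from the rounding itself.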
Re: [HACKERS] Re: Tuple-valued datums on Alpha (was Re: 7.1 on DEC/Alpha)
Brent Verner [EMAIL PROTECTED] writes:
> these are the steps leading up to the assignment of the fscked
> fcache->fcinfo.arg[i] at execQual.c:603, which is what will
> eventually blow up ExecEvalFieldSelect.

That looks OK as far as it goes. Inside ExecEvalVar, you need to look at the tuple_type data structure in more detail, specifically

    p *tuple_type->attrs[0]
    p *tuple_type->attrs[1]

(I think the leading * is correct here, try omitting it if gdb gets unhappy.)

> (gdb) print *variable
> $57 = {type = T_Var, varno = 65001, varattno = 1, vartype = 21220,
>   vartypmod = 8, varlevelsup = 0, varnoold = 1, varoattno = 0}

That part looks promising --- vartypmod is sizeof(Pointer) not -1, so the front-end part of my patch seems to be working. What I suspect we'll find is that the tupledesc doesn't show sizeof the first field to be 8 the way we want. Which would imply that I missed a place (or multiple places :-() that needs to know about the convention for typmod of a tuple datatype.

regards, tom lane
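To make the suspicion about field width concrete, here is a small sketch of what happens when a pointer-sized value passes through a field recorded as 4 bytes wide instead of 8. The function roundtrip_datum and the sample value are hypothetical; this models the effect only and is not the executor code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model: pass a 64-bit pointer-like value through a field
 * whose recorded width is attlen bytes and return what comes back out.
 */
static uint64_t
roundtrip_datum(uint64_t datum, size_t attlen)
{
    assert(attlen == 4 || attlen == 8);

    if (attlen == 4)
        return (uint32_t) datum;    /* high 32 bits are lost */

    return datum;                   /* full width: value survives */
}
```

With a sample value of 0x0000000120001a24, a width of 8 returns the value unchanged, while a width of 4 returns 0x20001a24. On a machine with 4-byte pointers the two cases coincide, which would be consistent with a problem that shows up only on a 64-bit platform such as Alpha.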
Re: [PATCHES] Re: [HACKERS] Re: Tuple-valued datums on Alpha (was Re: 7.1 on DEC/Alpha)
Brent Verner [EMAIL PROTECTED] writes:
> | Hm. I thought I'd fixed that. Are you up to date on
> | src/backend/utils/adt/oid.c ? Current CVS has rev 1.42.
>
> yup. got that version -- 1.42 2000/12/22 21:36:09 tgl

You're right, it was still broken :-(. I think I've got it now, though.

Oliver Elphick was kind enough to arrange access to an Alpha running Debian Linux, and I find that current-as-of-this-moment sources pass all regression tests in either serial or parallel test mode on that system. Curiously, however, the system fails when you try to shut it down:

    Smart Shutdown request at Thu Dec 28 02:41:49 2000
    DEBUG: shutting down
    FATAL 2: Checkpoint lock is busy while data base is shutting down
    Shutdown failed - abort

I have no idea why this should be. Evidently there's something wrong with the TAS() macro --- yet it seems to work fine elsewhere. Ideas anyone?

regards, tom lane
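For readers unfamiliar with test-and-set locking, the following is a minimal sketch of a TAS-style try-lock written with C11 atomics. It is not the PostgreSQL TAS() macro or the Alpha assembly discussed in this thread; demo_lock, demo_tas, demo_try_lock and demo_unlock are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

typedef atomic_flag demo_lock;  /* hypothetical lock type */

/*
 * Test-and-set: atomically set the flag and report its previous state.
 * Returns true if the lock was already held by someone else.
 */
static bool
demo_tas(demo_lock *lock)
{
    return atomic_flag_test_and_set_explicit(lock, memory_order_acquire);
}

/* Release the lock so that a later demo_tas() can succeed. */
static void
demo_unlock(demo_lock *lock)
{
    atomic_flag_clear_explicit(lock, memory_order_release);
}

/*
 * Try to take the lock a bounded number of times.  Returns true on
 * success and false if the lock stayed busy for every attempt.
 */
static bool
demo_try_lock(demo_lock *lock, int max_tries)
{
    assert(max_tries > 0);

    for (int i = 0; i < max_tries; i++)
    {
        if (!demo_tas(lock))
            return true;
    }
    return false;
}
```

In this model a "busy" result can come from two directions: the test-and-set primitive wrongly reports the lock as held, or the calling code really did leave the lock held. Which of these applies to the checkpoint lock is the open question in the thread, and the sketch does not settle it.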
Re: [PATCHES] Re: [HACKERS] Re: Tuple-valued datums on Alpha (was Re: 7.1 on DEC/Alpha)
On 27 Dec 2000 at 21:45 (-0500), Tom Lane wrote:
| Brent Verner [EMAIL PROTECTED] writes:
| > | Hm. I thought I'd fixed that. Are you up to date on
| > | src/backend/utils/adt/oid.c ? Current CVS has rev 1.42.
| >
| > yup. got that version -- 1.42 2000/12/22 21:36:09 tgl
|
| You're right, it was still broken :-(. I think I've got it now, though.

i'll check it tomorrow.

| Oliver Elphick was kind enough to arrange access to an Alpha running
| Debian Linux, and I find that current-as-of-this-moment sources pass
| all regression tests in either serial or parallel test mode on that
| system. Curiously, however, the system fails when you try to shut
| it down:

good. I'm glad you guys linked up :)

| Smart Shutdown request at Thu Dec 28 02:41:49 2000
| DEBUG: shutting down
| FATAL 2: Checkpoint lock is busy while data base is shutting down
| Shutdown failed - abort

I'm not seeing this with my latest revision of the TAS() asm.

Smart Shutdown request at Wed Dec 27 19:25:45 2000
DEBUG: shutting down
DEBUG: MoveOfflineLogs: remove
DEBUG: database system is shut down

| I have no idea why this should be. Evidently there's something wrong
| with the TAS() macro --- yet it seems to work fine elsewhere. Ideas
| anyone?

re-evaluating the asm stuff now.

thanks.
brent
Re: [PATCHES] Re: [HACKERS] Re: Tuple-valued datums on Alpha (was Re: 7.1 on DEC/Alpha)
Tom Lane wrote:
> ...
> system. Curiously, however, the system fails when you try to shut
> it down:
>
> Smart Shutdown request at Thu Dec 28 02:41:49 2000
> DEBUG: shutting down
> FATAL 2: Checkpoint lock is busy while data base is shutting down
> Shutdown failed - abort
>
> I have no idea why this should be. Evidently there's something wrong
> with the TAS() macro --- yet it seems to work fine elsewhere. Ideas
> anyone?

It's not just on Alpha; I've seen that on my i386 Linux system.

-- 
Oliver Elphick [EMAIL PROTECTED]
Re: [PATCHES] Re: [HACKERS] Re: Tuple-valued datums on Alpha (was Re: 7.1 on DEC/Alpha)
"Oliver Elphick" [EMAIL PROTECTED] writes:
>> Smart Shutdown request at Thu Dec 28 02:41:49 2000
>> DEBUG: shutting down
>> FATAL 2: Checkpoint lock is busy while data base is shutting down
>> Shutdown failed - abort
>
> It's not just on Alpha; I've seen that on my i386 Linux system.

Oooh, that's interesting. I was just blindly assuming that it was a problem with the Alpha spinlock code (we've sure heard plenty of discussion of same). But maybe there's an actual logic bug in the checkpoint code. I don't see one in a quick scan though.

FWIW, I do *not* see this behavior on HPUX. It seems perfectly reproducible on the Debian Alpha box. Is it reproducible on your i386 box, or only sometimes?

Vadim, any ideas?

regards, tom lane
Re: [HACKERS] Re: Tuple-valued datums on Alpha (was Re: 7.1 on DEC/Alpha)
Brent Verner [EMAIL PROTECTED] writes:
> | Please apply it locally and let me know what you find.
>
> what I'm seeing now is much the same.

Drat. More to do, then.

> i've been in circles trying to figure out where fcinfo->arg is filled.
> can you point me toward that?

See src/backend/utils/fmgr/README and src/backend/utils/fmgr/fmgr.c. But fmgr is probably only the carrier of disease, not the source...

regards, tom lane
Re: [HACKERS] Re: Tuple-valued datums on Alpha (was Re: 7.1 on DEC/Alpha)
On 26 Dec 2000 at 23:41 (-0500), Tom Lane wrote:
| Brent Verner [EMAIL PROTECTED] writes:
| >> Please apply it locally and let me know what you find.
|
| > what I'm seeing now is much the same.

sorry, I sent the previous email w/o the details of the different behavior. Inside ExecEvalFieldSelect(), result is now 303, instead of 110599844 (...or whatever it was). I'm not sure if this gives you any additional clues.

thanks.
brent