Re: [HACKERS] Coding style question
I think Tom stated it pretty well: When the variable is going to be set anyway in straight-line code at the top of the function, then it's mostly a matter of taste whether you set it with an initializer or an assignment. the key phrase is: "set anyway in straigh-tline code at the top of the function" (I don't go so far as to introduce artificial scopes just for the sake of nesting variable declarations). I don't introduce artificial scopes either. However, I do try to declare variables in the most-tightly-enclosing scope. For example, if a variable is only used in one branch of an if statement, declare the variable inside that block, not in the enclosing scope. good... This may not inform the current conversation at all, but a while back I went on a cross-compiler compatibility binge for all of my active projects, and I found that some compilers (*cough* Borland *cough) had some very strange compiler/run time errors unless all variables were declared at the top of the function, before any other code gets executed. For better or for worse, I started strictly declaring all variables in this manner, with initialization happening afterward, and the behavior has stuck with me. I don't know whether any compilers used for postgres builds still have this issue - it's been a few years. I also find that if you're declaring a lot of variables in a single block, that's usually a sign that the block is too large and should be refactored (e.g. by moving some code into separate functions). If you keep your functions manageably small (which is not always the case in the Postgres code, unfortunately), the declarations are usually pretty clearly visible. I couldn't agree more. Insert emphatic agreement here. Refactoring into smaller functions or doing a bit of object orientation almost always solves that readability problem for me. -- Nolan Cafferky Software Developer IT Department RBS Interactive [EMAIL PROTECTED]
Re: [HACKERS] Coding style question
Nolan Cafferky wrote: This may not inform the current conversation at all, but a while back I went on a cross-compiler compatibility binge for all of my active projects, and I found that some compilers (*cough* Borland *cough) had some very strange compiler/run time errors unless all variables were declared at the top of the function, before any other code gets executed. For better or for worse, I started strictly declaring all variables in this manner, with initialization happening afterward, and the behavior has stuck with me. I don't know whether any compilers used for postgres builds still have this issue - it's been a few years. We expect the compiler to be C89 compliant at a minimum. If it rejects simple initialisation in the declarations or variables declared in an inner scope then it's hopeless for our purposes, surely. We have lots of such code. cheers andrew ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Coding style question
Shouldn't we turn on warnings by the compiler on uninitialized variables? This can also be helpful. --Imad www.EnterpriseDB.com On 11/2/06, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote: I've noticed a trend in the PostgreSQL code base - for some reason, we tend to avoid initializing automatic variables (actually, the code base is pretty mixed on this point). For example in _bt_check_unique() we have: static TransactionId _bt_check_unique(Relation rel, IndexTuple itup, Relation heapRel, Buffer buf, ScanKey itup_scankey) { TupleDescitupdesc = RelationGetDescr(rel); int natts = rel-rd_rel-relnatts; OffsetNumber offset, maxoff; Page page; BTPageOpaque opaque; Buffer nbuf = InvalidBuffer; page = BufferGetPage(buf); opaque = (BTPageOpaque) PageGetSpecialPointer(page); maxoff = PageGetMaxOffsetNumber(page); offset = _bt_binsrch(rel, buf, natts, itup_scankey, false); ... Notice that four variables are not initialized; instead we assign values to them immediately after declaring them. I would probably write that as: static TransactionId _bt_check_unique(Relation rel, IndexTuple itup, Relation heapRel, Buffer buf, ScanKey itup_scankey) { TupleDescitupdesc = RelationGetDescr(rel); int natts= rel-rd_rel-relnatts; Page page = BufferGetPage(buf); OffsetNumber maxoff = PageGetMaxOffsetNumber(page); BTPageOpaque opaque = (BTPageOpaque) PageGetSpecialPointer(page); OffsetNumber offset = _bt_binsrch(rel, buf, natts, itup_scankey, false); Buffer nbuf = InvalidBuffer; ... I'm not trying to be pedantic (it just comes naturally), but is there some reason that the first form would be better? I know that there is no difference in the resulting code, so this is purely a style/maintainability question. I guess the first form let's you intersperse comments (which is good). On the other hand, the second form makes it easy to see which variables are un-initialized. The second form also prevents you from adding any code between declaring the variable and assigning a value to it (which is good in complicated code; you might introduce a reference to an unitialized variable). Just curious. -- Korry ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Coding style question
[EMAIL PROTECTED] writes: I would probably write that as: static TransactionId _bt_check_unique(Relation rel, IndexTuple itup, Relation heapRel, Buffer buf, ScanKey itup_scankey) { TupleDescitupdesc = RelationGetDescr(rel); int natts= rel-rd_rel-relnatts; Page page = BufferGetPage(buf); OffsetNumber maxoff = PageGetMaxOffsetNumber(page); BTPageOpaque opaque = (BTPageOpaque) PageGetSpecialPointer(page); OffsetNumber offset = _bt_binsrch(rel, buf, natts, itup_scankey, false); Buffer nbuf = InvalidBuffer; The disadvantage of using initializers is that you end up contorting the code to allow you to squeeze things into the initializers and it limits what you can do later to the code without undoing them. For example, if later you find out you have to, say, lock a table before the itupdesc initializer then all of the sudden you have to rip out all the initializers and rewrite them as assignments after the statement acquiring the table lock. I admit to a certain affinity to lisp-style programming and that does lead to me tending to try to use initializers doing lots of work in expressions. But I find I usually end up undoing them before I'm finished because I need to include a statement or because too much work needs to be done with one variable before some other variable can be initialized. But including complex expensive function calls in initializers will probably only confuse people trying to follow the logic of the code. Including _bt_binsrch() as an initializer for example hides the bulk of the work this function does. People expect initializers to be simple expressions, macro calls, accessor functions, and so on. Not to call out to complex functions that implement key bits of the function behaviour. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Coding style question
On Thu, 2006-11-02 at 23:53 +0500, imad wrote: Shouldn't we turn on warnings by the compiler on uninitialized variables? This can also be helpful. Those warnings should already be enabled, at least with GCC. -Neil ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Coding style question
Gregory Stark wrote: [EMAIL PROTECTED] writes: I would probably write that as: static TransactionId _bt_check_unique(Relation rel, IndexTuple itup, Relation heapRel, Buffer buf, ScanKey itup_scankey) { TupleDescitupdesc = RelationGetDescr(rel); int natts= rel-rd_rel-relnatts; Page page = BufferGetPage(buf); OffsetNumber maxoff = PageGetMaxOffsetNumber(page); BTPageOpaque opaque = (BTPageOpaque) PageGetSpecialPointer(page); OffsetNumber offset = _bt_binsrch(rel, buf, natts, itup_scankey, false); Buffer nbuf = InvalidBuffer; The disadvantage of using initializers is that you end up contorting the code to allow you to squeeze things into the initializers and it limits what you can do later to the code without undoing them. True. And in any case, we tend not to be terrribly anal about style preferences, especially if they are not documented. I am sure I have done lots of things in ways other people would not dream of, and I have certainly seen code done in a style I would never use. This is a not atypical situation for open source projects, unlike commercial situations where it is easier to enforce a corporate style. cheers andrew ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Coding style question
On 11/2/06, Gregory Stark [EMAIL PROTECTED] wrote: [EMAIL PROTECTED] writes: I would probably write that as: static TransactionId _bt_check_unique(Relation rel, IndexTuple itup, Relation heapRel, Buffer buf, ScanKey itup_scankey) { TupleDescitupdesc = RelationGetDescr(rel); int natts= rel-rd_rel-relnatts; Page page = BufferGetPage(buf); OffsetNumber maxoff = PageGetMaxOffsetNumber(page); BTPageOpaque opaque = (BTPageOpaque) PageGetSpecialPointer(page); OffsetNumber offset = _bt_binsrch(rel, buf, natts, itup_scankey, false); Buffer nbuf = InvalidBuffer; The disadvantage of using initializers is that you end up contorting the code to allow you to squeeze things into the initializers and it limits what you can do later to the code without undoing them. For example, if later you find out you have to, say, lock a table before the itupdesc initializer then all of the sudden you have to rip out all the initializers and rewrite them as assignments after the statement acquiring the table lock. Well, its about the coding style. And I doubt there exists a data type which may not have an initializer. A NULL / Zero is an option in all cases and you can do whatever you want to assign it a value immediately after the initialization section. My two cents! --Imad www.EnterpriseDB.com ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Coding style question
Gregory Stark [EMAIL PROTECTED] writes: People expect initializers to be simple expressions, macro calls, accessor functions, and so on. Not to call out to complex functions that implement key bits of the function behaviour. Yeah, I agree with that. But as Andrew noted, we don't really have any hard and fast coding rules --- the only guideline is to do your best to make your code readable, because other people *will* have to read it. In the particular example here I find Korry's proposed coding less readable than what's there, but I can't entirely put my finger on why. Maybe it's just the knowledge that it's less easily modifiable. In general, I'd say initializers with side effects or nonobvious ordering dependencies are definitely bad style, because someone might innocently rearrange them, eg to group all the variables of the same datatype together. You can get away with ordering dependencies like TupleDescitupdesc = RelationGetDescr(rel); int natts = itupdesc-natts; because the dependency is obvious (even to the compiler). Anything more complex than this, I'd write as a statement not an initializer. regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Coding style question
The disadvantage of using initializers is that you end up contorting the code to allow you to squeeze things into the initializers and it limits what you can do later to the code without undoing them. For example, if later you find out you have to, say, lock a table before the itupdesc initializer then all of the sudden you have to rip out all the initializers and rewrite them as assignments after the statement acquiring the table lock. Good point (and I can't argue with your example). But, I think initializers also force you to declare variables in the scope where they are needed. Instead of declaring every variable at the start of the function, it's better to declare them as nested as practical (not as nested as possible, but as nested as practical). That means, you might write the code like this: static TransactionId _bt_check_unique(Relation rel, IndexTuple itup, Relation heapRel, Buffer buf, ScanKey itup_scankey) { if( lockTable( ... )) { TupleDesc itupdesc = RelationGetDescr(rel); int natts = rel-rd_rel-relnatts; Page page = BufferGetPage(buf); OffsetNumber maxoff = PageGetMaxOffsetNumber(page); ... The biggest advantage to that style of coding is that you know when each variable goes out of scope. If you declare every variable at the start of the function (and you don't initialize it), it can be very difficult to tell at which points in the code the variable holds a (meaningful) value. If you declare local variables in nested scopes, you know that the variable disappears as soon as you see the closing '}'. I admit to a certain affinity to lisp-style programming and that does lead to me tending to try to use initializers doing lots of work in expressions. But I find I usually end up undoing them before I'm finished because I need to include a statement or because too much work needs to be done with one variable before some other variable can be initialized. But including complex expensive function calls in initializers will probably only confuse people trying to follow the logic of the code. Including _bt_binsrch() as an initializer for example hides the bulk of the work this function does. People expect initializers to be simple expressions, macro calls, accessor functions, and so on. Not to call out to complex functions that implement key bits of the function behaviour. Agreed - you can certainly take initialization way too far, I just think we don't take it far enough in many cases and I'm wondering if there's some advantage that I'm not aware of. -- Korry
Re: [HACKERS] Coding style question
Shouldn't we turn on warnings by the compiler on uninitialized variables? This can also be helpful. Those warnings should already be enabled, at least with GCC. Yes, the compiler can detect unitialized variables, But, that introduces a new problem. There are a lot of tools out there (like GCC) that can find unitialized variables (or more specifically, variables which are used before initialization). I've seen too many less-scarred developers add an = NULL to quiet down the tool. But that's (arguably) worse than leaving the variable uninitialized - if you simply initialize a variable to a known (but not correct) value, you've disabled the tool; it can't help you find improperly initialized variables anymore. The variable has a value, but you still don't know at which point in time it has a correct value. That's another reason I prefer initializers (and nested variable declarations) - I can put off creating the variable until I can assign a meaningful value to it. (I don't go so far as to introduce artificial scopes just for the sake of nesting variable declarations). Too many scars... -- Korry
Re: [HACKERS] Coding style question
imad [EMAIL PROTECTED] writes: Well, its about the coding style. And I doubt there exists a data type which may not have an initializer. A NULL / Zero is an option in all cases and you can do whatever you want to assign it a value immediately after the initialization section. My two cents! Actually, there are a lot of situations where putting an initializer is definitely *bad* style in my opinion, because it can hide errors of omission that the compiler would otherwise find for you. The most common example you'll see in the Postgres code is variables that should be set in each branch of an if or switch construct: int val; switch (foo) { case A: val = 42; break; case B: val = 52; break; ... default: elog(ERROR, ...); val = 0; /* keep compiler quiet */ break; } return val; Someone might think it better to initialize val to 0 and get rid of the useless (unreachable) assignment in the default case, but I say not. With this structure, you'll get an uninitialized-variable warning if you forget to set val in any one of the cases of the switch. That's a check worth having, especially if the code is at all complicated within the cases. When the variable is going to be set anyway in straight-line code at the top of the function, then it's mostly a matter of taste whether you set it with an initializer or an assignment. But whenever there are multiple places that might need to set it, I try to structure the code to exploit the compiler's ability to catch uninitialized variables, and that means not using an initializer. regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Coding style question
Yeah, I agree with that. But as Andrew noted, we don't really have any hard and fast coding rules --- the only guideline is to do your best to make your code readable, because other people *will* have to read it. I'm not really looking for hard/fast rules. Just picking brains. In the particular example here I find Korry's proposed coding less readable than what's there, but I can't entirely put my finger on why. Maybe it's just the knowledge that it's less easily modifiable. In general, I'd say initializers with side effects or nonobvious ordering dependencies are definitely bad style, because someone might innocently rearrange them, eg to group all the variables of the same datatype together. You can get away with ordering dependencies like TupleDescitupdesc = RelationGetDescr(rel); int natts = itupdesc-natts; because the dependency is obvious (even to the compiler). Anything more complex than this, I'd write as a statement not an initializer. Agreed - I contrived an example (I just happened to be reading _bt_check_unique()). In fact, I would not initialize 'offset' as I suggested, but I probably would initialize just about everything else. (In fact, in the actual code, there's a code-comment that describes the initialization of offset - I would certainly leave that in place). -- Korry
Re: [HACKERS] Coding style question
[EMAIL PROTECTED] writes: initializers also force you to declare variables in the scope where they are needed. Instead of declaring every variable at the start of the function, it's better to declare them as nested as practical (not as nested as possible, but as nested as practical). I agree that in many places it'd be better style to declare variables in smaller scopes ... but that's not the point you started the thread with. In any case, the initializer-vs-assignment decision is the same no matter what scope you're talking about --- I don't see how that forces you to do it either way. regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Coding style question
On 11/3/06, Tom Lane [EMAIL PROTECTED] wrote: imad [EMAIL PROTECTED] writes: Well, its about the coding style. And I doubt there exists a data type which may not have an initializer. A NULL / Zero is an option in all cases and you can do whatever you want to assign it a value immediately after the initialization section. My two cents! Actually, there are a lot of situations where putting an initializer is definitely *bad* style in my opinion, because it can hide errors of omission that the compiler would otherwise find for you. The most common example you'll see in the Postgres code is variables that should be set in each branch of an if or switch construct: int val; switch (foo) { case A: val = 42; break; case B: val = 52; break; ... default: elog(ERROR, ...); val = 0; /* keep compiler quiet */ break; } return val; Someone might think it better to initialize val to 0 and get rid of the useless (unreachable) assignment in the default case, but I say not. With this structure, you'll get an uninitialized-variable warning if you forget to set val in any one of the cases of the switch. That's a check worth having, especially if the code is at all complicated within the cases. When the variable is going to be set anyway in straight-line code at the top of the function, then it's mostly a matter of taste whether you set it with an initializer or an assignment. But whenever there are multiple places that might need to set it, I try to structure the code to exploit the compiler's ability to catch uninitialized variables, and that means not using an initializer. regards, tom lane Thats right! The next thing is that, should this be left out on the compiler? I mean, there may still be some scenarios where compiler won't be able to help us. For instance, passing an uninitialized variable to a function by reference. Regards --Imad ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Coding style question
On Thu, 2006-11-02 at 14:23 -0500, [EMAIL PROTECTED] wrote: Yes, the compiler can detect unitialized variables, In most situations, anyway. I've seen too many less-scarred developers add an = NULL to quiet down the tool. But that's (arguably) worse than leaving the variable uninitialized - if you simply initialize a variable to a known (but not correct) value, you've disabled the tool; it can't help you find improperly initialized variables anymore. I definitely agree that this is not good style[1] -- if you see any Postgres code that does this, I think it should be fixed. (This is probably something you could teach a compiler to warn you about, as well.) That's another reason I prefer initializers (and nested variable declarations) - I can put off creating the variable until I can assign a meaningful value to it. Well, clearly you should only assign meaningful values to variables, but I don't see anything wrong with omitting an initializer, initializing the variable before using it, and letting the compiler warn you if you forget to do this correctly. I agree with Greg's rationale on when to include an initializer in a variable declaration (when the initializer is trivial: e.g. casting a void * to a more specific pointer type, or using one of the PG_GETARG_XXX() macros in a UDF). (I don't go so far as to introduce artificial scopes just for the sake of nesting variable declarations). I don't introduce artificial scopes either. However, I do try to declare variables in the most-tightly-enclosing scope. For example, if a variable is only used in one branch of an if statement, declare the variable inside that block, not in the enclosing scope. I also find that if you're declaring a lot of variables in a single block, that's usually a sign that the block is too large and should be refactored (e.g. by moving some code into separate functions). If you keep your functions manageably small (which is not always the case in the Postgres code, unfortunately), the declarations are usually pretty clearly visible. -Neil [1] http://advogato.org/person/nconway/diary.html?start=24 ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Coding style question
[EMAIL PROTECTED] writes: initializers also force you to declare variables in the scope where they are needed. Instead of declaring every variable at the start of the function, it's better to declare them as nested as practical (not as nested as possible, but as nested as practical). I agree that in many places it'd be better style to declare variables in smaller scopes ... but that's not the point you started the thread with. In any case, the initializer-vs-assignment decision is the same no matter what scope you're talking about --- I don't see how that forces you to do it either way. Right - I should have said that proper initialization encourages you to declare variables in nested scopes (proper meaning that the initializer puts a meaningful value into the variable, not just a default NULL or 0) - if the initializer depends on a computed value, you can't initialize until that value has been computed. I guess the two issues are not all that related - you can initialize without nesting (in many cases) and you can nest without initializing. They are both readability and maintainability issues to me. Thanks for the feedback. -- Korry
Re: [HACKERS] Coding style question
Well, clearly you should only assign meaningful values to variables, but I don't see anything wrong with omitting an initializer, initializing the variable before using it, and letting the compiler warn you if you forget to do this correctly. The problem that that introduces is that you have to unravel the code if you want to maintain it, especially if you're changing complex code (many code paths) and some variable is initialized long after it's declared. You have to search the code to figure out at which point the variable gains a meaningful value. In the example I cited, each variable was assigned a value immediately after being declared. That wasn't a good example - in some places, we declare all variables at the start of the function, but we don't assign a value to some of the variables until 20 or 30 lines of code later (and if there are multiple code paths, you have to follow each one to find out when the variable gains a meaningful value). I agree with Greg's rationale on when to include an initializer in a variable declaration (when the initializer is trivial: e.g. casting a void * to a more specific pointer type, or using one of the PG_GETARG_XXX() macros in a UDF). I agree too. I wasn't trying to suggest that every variable must be initialized. I think Tom stated it pretty well: When the variable is going to be set anyway in straight-line code at the top of the function, then it's mostly a matter of taste whether you set it with an initializer or an assignment. the key phrase is: set anyway in straigh-tline code at the top of the function (I don't go so far as to introduce artificial scopes just for the sake of nesting variable declarations). I don't introduce artificial scopes either. However, I do try to declare variables in the most-tightly-enclosing scope. For example, if a variable is only used in one branch of an if statement, declare the variable inside that block, not in the enclosing scope. good... I also find that if you're declaring a lot of variables in a single block, that's usually a sign that the block is too large and should be refactored (e.g. by moving some code into separate functions). If you keep your functions manageably small (which is not always the case in the Postgres code, unfortunately), the declarations are usually pretty clearly visible. I couldn't agree more. Thanks for your comments. -- Korry