[forwarded submission from a non-member address -- rjk]
From: Sean Quinlan <[EMAIL PROTECTED]>
Date: Wed, 13 Mar 2002 10:24:09 -0500
Subject: Re: [Boston.pm] maintenance of large perl code bases
To: [EMAIL PROTECTED]

First off, WOW! Thanks to everyone for all the responses. I'll be trying
to catch up all day, but please forgive me if I don't get another email
out until this evening.

At 01:45 AM 3/13/02 -0500, Charles Reitzel wrote:
>Thanks, Sean, for raising these issues. I am fairly new to Perl, but have
>worked for years on that other "write once, read never" language C and,
>worse, C++. The fact is, for all its well intentioned support for sound SW
>engineering practice, I have seen horrid tangles of Java. I am forced to
>conclude that the issues and practices related to maintaining large code
>bases are not language specific. To the contrary, I found the discipline
>learned coding in C has served me very well in Perl and Java.

I have heard this opinion before, and have to agree (from my own limited
experience).

>
>First things first, may I suggest that, before storyboarding a
>presentation, you write a coding standards document. If you want
>training to go with the standards, more power to you. But a coding
>standards document is a handy reference for developers at work. Consider
>this email as sort of a rough draft, if you will.

Funny you should say that. Part of what I'm doing at my 'new' job is
writing a Perl coding standard. Of course I'm mostly just
updating/stealing from existing docs, including Scuds perlmodstyle doc in
bleedperl (thanks Shane!). One of the things I plan on emphasizing in it
is code reuse and writing modules. If anyone is interested, I can make
this document available somewhere, and I suppose reference it (as well as
other standard sources) in any presentation. Currently it is more just a
style guide.

P.S.: I started commenting below, but will not have time to really get
into this as it deserves right now. I will try to return to this in more
depth this week!
I _really_ appreciate your taking the time to put this together!

>
>What are the goals of coding standards?
>
>In order:
>1) Quality
>2) Clarity
>3) Performance
>
>Now, to achieve these ends, developers have worked out some common
>practices that minimize unintended side effects from changes, minimize
>naming conflicts, maximize future code re-use and maximize future
>implementation flexibility. I think of Quality, Clarity and Performance as
>the strategic goals, while these others are the tactical objectives. They
>also introduce issues related to larger code bases with multiple developers
>working simultaneously. So, to cut to the chase, here are a few practices
>that will go a long way to keep you out of the soup! I make no claims of
>authorship for any of these things. They are just ideas that I have
>learned from other talented and knowledgeable developers along the way. If
>I had to boil it down to one word, it is Modularity.
>
>Dependencies Management
>
>Early in your coding phase, draw a block diagram that shows runtime
>dependencies vertically. Each block represents a module. Modules appearing
>directly above another module will call into the lower module and are said
>to "depend" on that module. Here's the thing: if you can't draw this
>diagram, you need to rethink your design. Although it is easy enough to
>code bi-directional dependencies, it is a bad idea and should only be done
>as a last resort. Leave 3rd party modules out of the diagram; they don't
>call you (well, callbacks are possible, but these are not actually a
>violation and are beyond the scope of an email).
>
>In a typical scenario, let's say a CGI script, you have a top-level
>module. It will "use" or "require" other modules, perhaps
>dynamically. These modules, in turn, will use and require other modules,
>and so on, until all necessary modules are loaded. Almost every shop has a
>utility module or two or three, which depend on nothing else. I.e. any
>module can safely depend on these.
>In fact, these modules may depend on
>different 3rd party modules, which may be the driver for separating them in
>the first place. E.g. local HTML widget routines will depend on different
>things than date manipulation logic, and so on. In theory, both could be
>used within either a mod_perl page or a CGI generated page.
>
>Testing
>
>Now think about the usual development cycle. As coding nears completion,
>testing and bug fixing take up more and more of everyone's time. Now the
>module loader will not have a problem with circular references, but your
>testing procedures will.
>
>So, here's the deal. Modules at the bottom "freeze" first. Any changes to
>these modules will require a complete regression test of all "dependent"
>modules. Thus, modules at the bottom also tend to have the most thorough
>testing harnesses. That way, all of their public interfaces can be regression
>tested to weed out as many errors as possible before double checking all
>the dependent modules. The rigor of these procedures is proportional to
>the cost of a release (in downtime, manufacturing costs, training costs,
>etc., etc.).
>
>While we're on the subject, I have found the convention of writing a simple
>test script for each Perl package (.pm file) to be very effective. If
>nothing more, it helps to debug library-type code. Once it exists, it can
>help to spot problems early by acting as a basic regression test. To this
>end, I have written a simple little assert routine which, like the C
>macro, evaluates an expression and, if false or an exception is caught, it
>prints the results.

I agree! This is 'easy' (though a small up front time investment), and
something I commonly practice and encourage as well. Being lazy, I just
use the Test:: modules and some testing scripts for every function/method
in every module, not only running make test on install, but after, and
frequently during, any coding session. Once I started doing this, I
rapidly became addicted.
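
A minimal sketch of the test-script-per-package idea discussed above,
assuming the core Test::More module (one of the Test:: modules). The
package Acme::Date and its is_leap_year routine are hypothetical and are
inlined only so the sketch runs on its own; this is not the attached
assert.pl.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Test::More tests => 4;

# Hypothetical library code, inlined so the sketch is self-contained.
# In practice this would live in Acme/Date.pm and be loaded with "use".
package Acme::Date;

# Return 1 if the year is a Gregorian leap year, 0 otherwise.
sub is_leap_year {
    my ($year) = @_;
    die "year required\n" unless defined $year;
    return ( $year % 4 == 0 && $year % 100 != 0 ) || $year % 400 == 0 ? 1 : 0;
}

package main;

# One assertion per behavior of the public interface.
is( Acme::Date::is_leap_year(2000), 1, '2000 is a leap year' );
is( Acme::Date::is_leap_year(1900), 0, '1900 is not a leap year' );
is( Acme::Date::is_leap_year(2004), 1, '2004 is a leap year' );

# An exception is a result too: trap it with eval and inspect $@.
eval { Acme::Date::is_leap_year() };
like( $@, qr/year required/, 'missing argument dies' );
```

Kept next to the module and rerun after every change, a script like this
doubles as the basic regression test described above.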
>
>I have attached the file assert.pl and a sample test script. Note this
>needs to be "required" into each test script so that any variables declared
>in the module will be defined inside the eval(). Comments, corrections and
>suggestions are appreciated. Thanks!
>
>Encapsulation
>
>This is object orientation 101. If you get encapsulation, by comparison,
>everything else is gravy. Perhaps the simplest way to define encapsulation
>is to say, in Perl terms, modules that depend on a module Acme::XYZ should
>not be broken if it changes from blessed array references to blessed hash
>references. The only way to make this happen is to not access the data
>structures directly from other modules. Instead, write "methods" (member
>functions, messages, what have you) to access or modify the state of the
>object.
>
>What does this mean for you? You have to take a bit of time, up front, to
>define what the interface should be. What does this buy you? First, it
>lets two different developers (or teams) work in parallel with a high
>degree of confidence that their stuff will work together at the end. This
>can be a lifesaver on a compressed schedule. Second, it gives each team
>the ability to make improvements to their implementation without having to
>even confer with the other team(s). They know, if they keep the "contract"
>intact, that everything will be OK. Now, contracts depend on more than
>syntax. But syntax is a necessary, if not sufficient, part of compatibility.
>
>Naming
>
>Naming really matters. You just can't name all your variables tmp1, tmp2
>and expect anyone, even yourself, to understand what a routine is
>doing. This goes triple for type names (i.e. Perl packages, C++ or Java
>classes). Think very, very carefully - for 15 minutes - about module
>names. Be ready, in the first week, to rename a package if you didn't get
>it right. After that, it's set in concrete. You're stuck with it. Oh
>well.
>The name should describe, in terms of the application, what data or
>behavior the variable represents. Be wordy as hell for module names and
>global variables. Be concise but not obscure for local variables. 4-8
>chars is OK, this isn't COBOL. Never, ever use 1 character variables. Not
>even "i". Try "ix".
>
>For in-house code, pick a top level package qualifier. If you work for
>Acme, Inc., use names like Acme::Util, Acme::Date, Acme::Form, etc. This
>way your module names won't collide with modules off of CPAN. Perl makes
>this easy. Along these same lines, observe good Perl etiquette and do not
>export more than you need to (if anything) from your modules. This was a
>mistake I made often when first learning Perl.
>
>Pick a capitalization scheme and stick to it. For myself, I like
>module/class names starting with caps, whereas variables begin with lower
>case - with all of these using camel case to join multiple words. E.g. my
>$formGen = Acme::FormGen->new( $cgi ); This way I don't have to switch
>conventions when I switch to Java or C++!
>
>Performance
>
>Without getting into details, suffice it to say that it is much, much
>easier to find and fix performance problems in well organized, maintainable
>code. For example, encapsulation allows you to optimize performance of a
>shared module without breaking anything else. This stuff really works!
>
>Well, that's the basics. I'm sure there are more things I could mention,
>but you don't want a book here. I'll wager most of these look familiar to
>most folks coming from the systems side. But I'll bet there is a scientist
>or two out there new to Perl that will benefit. Hope this helps.
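
The encapsulation and naming points above can be sketched roughly as
follows. The class names Acme::HashCounter and Acme::ArrayCounter and the
countTo routine are made up for illustration; the only assumption is that
callers go through methods and never touch the underlying data structure.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical in-house class under an "Acme" top-level qualifier.
# State lives in a blessed hash reference; nothing is exported.
package Acme::HashCounter;

sub new {
    my ( $class, $start ) = @_;
    return bless { count => $start || 0 }, $class;
}
sub increment { my ($self) = @_; return ++$self->{count}; }
sub count     { my ($self) = @_; return $self->{count}; }

# Same public interface, but the state is a blessed array reference.
package Acme::ArrayCounter;

sub new {
    my ( $class, $start ) = @_;
    return bless [ $start || 0 ], $class;
}
sub increment { my ($self) = @_; return ++$self->[0]; }
sub count     { my ($self) = @_; return $self->[0]; }

package main;

# The caller uses only methods, so it cannot tell the two apart.
sub countTo {
    my ( $counter, $limit ) = @_;
    $counter->increment() while $counter->count() < $limit;
    return $counter->count();
}

for my $class (qw(Acme::HashCounter Acme::ArrayCounter)) {
    my $counter = $class->new(2);
    print "$class reached ", countTo( $counter, 5 ), "\n";
}
```

Both classes should report reaching 5. Had countTo reached into
$counter->{count} directly, swapping in the array-based version would
have broken it.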
>
>Enjoy,
>Charlie
>
>
>At 11:01 PM 3/12/2002 -0500, Sean Quinlan wrote:
>
>>[forwarded submission from a non-member address -- rjk]
>>
>>
>>From: Sean Quinlan <[EMAIL PROTECTED]>
>>Date: Tue, 12 Mar 2002 20:43:52 -0500
>>Subject: maintenance of large perl code bases
>>To: [EMAIL PROTECTED]
>>
>>I had hoped to bring up this question at tomorrow's meeting, but
>>Wednesdays are hard, and tomorrow looks impossible. So maybe someone can
>>toss this up for discussion, and hopefully let the list know the key points.
>>
>>I know there are sites out there, such as Boston.com it appears, and I've
>>heard about some large financial institutions, that rely on substantial
>>amounts of Perl code. Obviously for a successful business, having that
>>code be maintainable is (or should be!) of significant importance. But I
>>regularly hear complaints, largely from non-Perl (or Perl primary anyway)
>>people from other industries coming into bioinformatics, about these large,
>>unmaintainable Perl code bases.
>>
>>Now, in my experience, I have to admit this is largely more true than
>>not. Usually because most of the software was written by people who were
>>biologists/engineers/physicists/whatever first, and programmers (sometimes
>>distant) second, often without thought or concern for its long term
>>usability. So I've heard of a few places now moving away from Perl,
>>frequently apparently forcing a large ground up recode in some other
>>(usually Java, and I've heard some interesting 'rumors' as to why) language.
>>
>>I see little point in arguing with this from the standpoint of simply Perl
>>first. I know others better than I have done talks and presentations on
>>writing maintainable Perl code, and probably on the problems with porting
>>old code to a more maintainable format. I want to steal from those
>>people... blatantly (with credits of course).
>>
>>What I would like to do is to collaborate with a few people who have:
>>1) Done presentations related to the subject of code maintenance (and a
>>little QA thrown in might be good).
>>
>>2) Have been involved with or responsible for large installations of Perl
>>code that was well maintained.
>>
>>3) Others involved with bioinformatics interested in or having experience
>>with this problem.
>>
>>What I would like to end up with are sources for presentations (preferably
>>a couple already canned of varied lengths) on the subject of maintaining
>>large Perl code bases written specifically as it applies to
>>bioinformatics. If you don't want/have time to collaborate, but have
>>pointers to good sources of information/inspiration, please also pipe up.
>>
>>Thanks everyone!!!
>>
>>--------------------------------------------------------------
>>Sean P. Quinlan
>>http://people.ne.mediaone.net/squinlan/index.html
>>mailto:[EMAIL PROTECTED]
>>"You can discover more about a person in an hour of play than in a year of
>>conversation" - Plato
>
>Attachment Converted: "C:\INTERNET\EUDORA\Attach\assert.pl"
>
>Attachment Converted: "C:\INTERNET\EUDORA\Attach\testvaluemap.pl"
>
--------------------------------------------------------------------------
Sean P. Quinlan
http://people.ne.mediaone.net/squinlan/index.html
mailto:[EMAIL PROTECTED]
"You can discover more about a person in an hour of play than in a year of
conversation" - Plato
