Re: [Boston.pm] maintenance of large perl code bases

Sean Quinlan Thu, 14 Mar 2002 07:02:37 -0800


[forwarded submission from a non-member address -- rjk]



From: Sean Quinlan <[EMAIL PROTECTED]>
Date: Wed, 13 Mar 2002 10:24:09 -0500
Subject: Re: [Boston.pm] maintenance of large perl code bases
To: [EMAIL PROTECTED]

First off, WOW! Thanks to everyone for all the responses. I'll be trying to
catch up all day, but please forgive me if I don't get another email out
until this evening.

At 01:45 AM 3/13/02 -0500, Charles Reitzel wrote:
>Thanks, Sean, for raising these issues.  I am fairly new to Perl, but have 
>worked for years on that other "write once, read never" language C and, 
>worse, C++.  The fact is, for all its well intentioned support for sound SW 
>engineering practice, I have seen horrid tangles of Java.  I am forced to 
>conclude that the issues and practices related to maintaining large code 
>bases are not language specific.  To the contrary, I found the discipline 
>learned coding in C has served me very well in Perl and Java.

I have heard this opinion before, and have to agree (from my own limited
experience).

>
>First things first, may I suggest that, before storyboarding a 
>presentation, that you write a coding standards document.  If you want 
>training to go with the standards, more power to you.  But a coding 
>standards document is a handy reference for developers at work. Consider 
>this email as sort of a rough draft, if you will.

Funny you should say that. Part of what I'm doing at my 'new' job is
writing a Perl coding standard. Of course I'm mostly just updating/stealing
from existing docs, including Scud's perlmodstyle doc in bleedperl (thanks
Shane!). One of the things I plan on emphasizing in it is code reuse and
writing modules. If anyone is interested, I can make this document
available somewhere, and I suppose reference it (as well as other standard
sources) in any presentation. Currently it is more of a style guide.

P.S.: I started commenting below, but will not have time to really get into
this as it deserves right now. I will try to return to this in more depth
this week! I _really_ appreciate your taking the time to put this together!

>
>What are the goals of coding standards?
>
>In order:
>1) Quality
>2) Clarity
>3) Performance
>
>Now, to achieve these ends, developers have worked out some common 
>practices that minimize unintended side effects from changes, minimize 
>naming conflicts, maximize future code re-use and maximize future 
>implementation flexibility.  I think of Quality, Clarity and Performance as 
>the strategic goals, while these others are the tactical objectives.  They 
>also introduce issues related to larger code bases with multiple developers 
>working simultaneously.  So, to cut to the chase, here are a few practices 
>that will go a long way to keep you out of the soup!  I make no claims of 
>authorship for any of these things.  They are just ideas that I have 
>learned from other talented and knowledgeable developers along the way.  If 
>I had to boil it down to one word, it is Modularity.
>
>Dependencies Management
>
>Early in your coding phase, draw a block diagram that shows runtime 
>dependencies vertically.  Each block represents a module. Modules appearing 
>directly above another module will call into the lower module and are said 
>to "depend" on that module.  Here's the thing: if you can't draw this 
>diagram, you need to rethink your design.  Although it is easy enough to 
>code bi-directional dependencies, it is a bad idea and should only be done 
>as a last resort.  Leave 3rd party modules out of the diagram; they don't 
>call you (well, callbacks are possible, but they are not actually a 
>violation and are beyond the scope of an email).
>
>In a typical scenario, let's say a CGI script, you have a top-level 
>module.  It will "use" or "require" other modules, perhaps 
>dynamically.  These modules, in turn, will use and require other modules, 
>and so on, until all necessary modules are loaded. Almost every shop has a 
>utility module or two or three, which depend on nothing else.  I.e. any 
>module can safely depend on these.  In fact, these modules may depend on 
>different 3rd party modules, which may be the driver for separating them in 
>the first place.  E.g. local HTML widget routines will depend on different 
>things than date manipulation logic, and so on.  In theory, both could be 
>used within either a mod_perl page or a CGI generated page.
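
The layering rule above can be checked mechanically. Here is a minimal
sketch, assuming you keep a hand-written table of which in-house modules
use which; the module names and the find_cycle routine are hypothetical.
Any cycle it reports is a bi-directional dependency, i.e. a set of modules
that cannot be drawn as strict layers.

```perl
use strict;
use warnings;

# Hypothetical in-house modules; each key lists the modules it uses.
my %uses = (
    'Acme::OrderPage' => [ 'Acme::Form', 'Acme::Date' ],
    'Acme::Form'      => [ 'Acme::Util' ],
    'Acme::Date'      => [ 'Acme::Util' ],
    'Acme::Util'      => [],
);

# Depth-first walk. $state marks modules as 'active' (on the current
# path) or 'done'; meeting an 'active' module again means a cycle.
sub _visit {
    my ($graph, $state, $path, $mod) = @_;
    my $seen = $state->{$mod} || '';
    return () if $seen eq 'done';
    if ($seen eq 'active') {
        # Report the part of the path from the first visit of $mod onward.
        my ($start) = grep { $path->[$_] eq $mod } 0 .. $#$path;
        return (@{$path}[$start .. $#$path], $mod);
    }
    $state->{$mod} = 'active';
    push @$path, $mod;
    for my $dep (@{ $graph->{$mod} || [] }) {
        my @cycle = _visit($graph, $state, $path, $dep);
        return @cycle if @cycle;
    }
    pop @$path;
    $state->{$mod} = 'done';
    return ();
}

# Returns the first cycle found as a list of module names, or an empty list.
sub find_cycle {
    my ($graph) = @_;
    my (%state, @path);
    for my $mod (sort keys %$graph) {
        my @cycle = _visit($graph, \%state, \@path, $mod);
        return @cycle if @cycle;
    }
    return ();
}

my @cycle  = find_cycle(\%uses);
my $report = @cycle ? 'cycle: ' . join(' -> ', @cycle) : 'no cycles';
print "$report\n";
```

If Acme::Util were changed to use Acme::Form, the same call would return
a list that starts and ends with the same module name.
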
>
>Testing
>
>Now think about the usual development cycle.  As coding nears completion, 
>testing and bug fixing take up more and more of everyone's time.  Now the 
>module loader will not have a problem with circular references, but your 
>testing procedures will.
>
>So, here's the deal.  Modules at the bottom "freeze" first.  Any changes to 
>these modules will require a complete regression test of all "dependent" 
>modules.  Thus, modules at the bottom also tend to have the most thorough 
>testing harnesses, so that all of their public interfaces can be regression 
>tested to weed out as many errors as possible before double checking all 
>the dependent modules.  The rigor of these procedures is proportional to 
>the cost of a release (in downtime, manufacturing costs, training costs, 
>etc., etc.).
>
>While we're on the subject, I have found the convention of writing a simple 
>test script for each Perl package (.pm file) to be very effective.  If 
>nothing more, it helps to debug library-type code.  Once it exists, it can 
>help to spot problems early by acting as a basic regression test.  To this 
>end, I have written a simple little  assert routine which, like the C 
>macro, evaluates an expression and, if false or an exception is caught, it 
>prints the results.

I agree! This is 'easy' (though a small up front time investment), and
something I commonly practice and encourage as well. Being lazy, I just use
the Test:: modules and some testing scripts for every function/method in
every module, not only running make test on install, but after, and
frequently during, any coding session. Once I started doing this, I rapidly
became addicted.
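
A minimal sketch of that kind of per-module test script, using Test::More
from the Test:: family. The module and its functions (Acme::Date,
days_in_month) are hypothetical, and the package is inlined only so the
example runs as a single file; normally it would live in its own .pm and
the tests in a t/ script run by make test.

```perl
use strict;
use warnings;

# Hypothetical module under test, inlined to keep the sketch self-contained.
package Acme::Date;

sub is_leap_year {
    my ($year) = @_;
    return 1 if $year % 400 == 0;
    return 0 if $year % 100 == 0;
    return $year % 4 == 0 ? 1 : 0;
}

sub days_in_month {
    my ($year, $month) = @_;
    die "month out of range: $month\n" if $month < 1 || $month > 12;
    my @days = (31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31);
    return 29 if $month == 2 && is_leap_year($year);
    return $days[$month - 1];
}

package main;

use Test::More tests => 5;

# One check per behavior of the public interface.
is(Acme::Date::days_in_month(2002, 3), 31, 'March has 31 days');
is(Acme::Date::days_in_month(2002, 2), 28, 'February in a common year');
is(Acme::Date::days_in_month(2000, 2), 29, 'February in a leap year');
is(Acme::Date::days_in_month(1900, 2), 28, 'century years are not leap years');

# Error handling is part of the interface too: a bad month should die.
eval { Acme::Date::days_in_month(2002, 13) };
like($@, qr/month out of range/, 'invalid month is rejected');
```
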

>
>I have attached the file assert.pl and a sample test script.  Note this 
>needs to be "required" into each test script so that any variables declared 
>in the module will be defined inside the eval().  Comments, corrections and 
>suggestions are appreciated.

Thanks!
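
For readers without the attachment, here is a rough sketch of what such an
assert routine might look like. It is not the attached assert.pl, which is
not reproduced in this message; the names assert, $failures and @stack are
all hypothetical. The expression is passed as a string and compiled by a
string eval inside assert(), so it can only see variables visible from
there, which is presumably why the original has to be pulled into each
test script with require.

```perl
use strict;
use warnings;

# Package variables declared before assert(), so the string eval can see them.
our @stack    = (1, 2, 3);
our $failures = 0;

# Hypothetical stand-in for assert.pl: evaluate an expression given as a
# string, and report it if it is false or if it throws an exception.
sub assert {
    my ($expr) = @_;
    my $result = eval $expr;    # string eval: compile and run $expr
    if ($@) {
        chomp(my $err = $@);
        print "EXCEPTION in: $expr\n  $err\n";
        $failures++;
        return 0;
    }
    if (!$result) {
        print "FAILED: $expr\n";
        $failures++;
        return 0;
    }
    return 1;
}

assert('scalar(@stack) == 3');        # true: prints nothing
assert('$stack[0] + $stack[2] == 4'); # true: prints nothing
assert('scalar(@stack) == 4');        # false: reported
assert('die "boom\n"');               # exception: reported

print "$failures assertion(s) failed\n";
```

Unlike the C macro, this version keeps going after a failure and just
counts it, which suits a test script that should report everything in one
run.
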

>
>Encapsulation
>
>This is object orientation 101.  If you get encapsulation, by comparison, 
>everything else is gravy.  Perhaps the simplest way to define encapsulation 
>is to say, in Perl terms, modules that depend on a module Acme::XYZ should 
>not be broken if it changes from blessed array references to blessed hash 
>references.  The only way to make this happen, is to not access the data 
>structures directly from other modules.  Instead, write "methods" (member 
>functions, messages, what have you) to access or modify the state of the 
>object.
>
>What does this mean for you?  You have to take a bit of time, up front, to 
>define what the interface should be.  What does this buy you?  First, it 
>lets two different developers (or teams) work in parallel with a high 
>degree of confidence that their stuff will work together at the end.  This 
>can be a lifesaver on a compressed schedule.  Second, it gives each team 
>the ability to make improvements to their implementation without having to 
>even confer with the other team(s).  They know, if they keep the "contract" 
>intact, that everything will be OK.  Now, contracts depend on more than 
>syntax.  But syntax is a necessary, if not sufficient, part of compatibility.
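
A small sketch of the blessed-array versus blessed-hash point, assuming a
made-up interface of new, name and set_name. The two package names below
are hypothetical stand-ins for two versions of Acme::XYZ. The calling code
uses only methods, so it cannot tell which representation it was given.

```perl
use strict;
use warnings;

# Version 1 of the hypothetical class: state kept in a blessed hash.
package Acme::XYZ::AsHash;

sub new {
    my ($class, %args) = @_;
    return bless { name => $args{name} }, $class;
}
sub name     { my ($self) = @_; return $self->{name} }
sub set_name { my ($self, $name) = @_; $self->{name} = $name; return $self }

# Version 2: same interface, state kept in a blessed array.
package Acme::XYZ::AsArray;

use constant NAME => 0;    # slot index, private to this package

sub new {
    my ($class, %args) = @_;
    return bless [ $args{name} ], $class;
}
sub name     { my ($self) = @_; return $self->[NAME] }
sub set_name { my ($self, $name) = @_; $self->[NAME] = $name; return $self }

package main;

# Caller code: touches the object only through its methods.
sub describe {
    my ($obj) = @_;
    return 'Object named ' . $obj->name;
}

for my $class ('Acme::XYZ::AsHash', 'Acme::XYZ::AsArray') {
    my $obj = $class->new(name => 'anvil');
    $obj->set_name('rocket skates');
    print describe($obj), "\n";
}
```

Had describe() reached in with $obj->{name}, the switch to the array
version would have broken it.
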
>
>Naming
>
>Naming really matters.  You just can't name all your variables tmp1, tmp2 
>and expect anyone, even yourself, to understand what a routine is 
>doing.  This goes triple for type names (i.e. Perl packages, C++ or Java 
>classes).  Think very, very carefully - for 15 minutes - about module 
>names.  Be ready, in the first week, to rename a package if you didn't get 
>it right.  After that, it's set in concrete.  You're stuck with it.  Oh 
>well.  The name should describe, in terms of the application, what data or 
>behavior the variable represents.  Be wordy as hell for module names and 
>global variables.  Be concise but not obscure for local variables.  4-8 
>chars is OK, this isn't COBOL.  Never, ever use 1 character variables.  Not 
>even "i".  Try "ix".
>
>For in-house code, pick a top level package qualifier.  If you work for 
>Acme, Inc., use names like Acme::Util, Acme::Date, Acme::Form, etc.  This 
>way your module names won't collide with modules off of CPAN.  Perl makes 
>this easy.  Along these same lines, observe good Perl etiquette and do not 
>export more than you need to (if anything) from your modules.  This was a 
>mistake I made often when first learning Perl.
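
As a sketch of the export etiquette above, assuming a hypothetical
Acme::Util with one public helper: leave @EXPORT empty and list public
functions in @EXPORT_OK, so nothing lands in the caller's namespace unless
it is asked for by name. The package is inlined here only so the example
runs as one file.

```perl
use strict;
use warnings;

package Acme::Util;

use Exporter 'import';          # borrow Exporter's import method
our @EXPORT    = ();            # export nothing by default
our @EXPORT_OK = qw(trim);      # available only on request

# Public helper: strip leading and trailing whitespace.
sub trim {
    my ($text) = @_;
    $text =~ s/^\s+|\s+$//g;
    return $text;
}

# Internal helper: not listed in @EXPORT_OK, so it cannot be imported.
sub _squeeze {
    my ($text) = @_;
    $text =~ s/\s+/ /g;
    return $text;
}

package main;

# With Acme::Util in its own file this would be: use Acme::Util qw(trim);
Acme::Util->import('trim');

print '[', trim("   hello world   "), "]\n";
```
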
>
>Pick a capitalization scheme and stick to it.  For myself, I like 
>module/class names starting with caps, whereas variables begin with lower 
>case - with all of these using camel case to join multiple words.  E.g. my 
>$formGen = Acme::FormGen->new( $cgi );  This way I don't have to switch 
>conventions when I switch to Java or C++!
>
>Performance
>
>Without getting into details, suffice it to say that it is much, much 
>easier to find and fix performance problems in well organized, maintainable 
>code.  For example, encapsulation allows you to optimize performance of a 
>shared module without breaking anything else.  This stuff really works!
>
>Well, that's the basics.  I'm sure there are more things I could mention, 
>but you don't want a book here.  I'll wager most of these look familiar to 
>most folks coming from the systems side.  But I'll bet there is a scientist 
>or two out there new to Perl that will benefit.  Hope this helps.
>
>Enjoy,
>Charlie
>
>
>At 11:01 PM 3/12/2002 -0500, Sean Quinlan wrote:
>
>>[forwarded submission from a non-member address -- rjk]
>>
>>
>>From: Sean Quinlan <[EMAIL PROTECTED]>
>>Date: Tue, 12 Mar 2002 20:43:52 -0500
>>Subject: maintenance of large perl code bases
>>To: [EMAIL PROTECTED]
>>
>>I had hoped to bring up this question at tomorrow's meeting, but 
>>Wednesdays are hard, and tomorrow looks impossible. So maybe someone can 
>>toss this up for discussion, and hopefully let the list know the key points.
>>
>>I know there are sites out there, such as Boston.com it appears, and I've 
>>heard about some large financial institutions, that rely on substantial 
>>amounts of Perl code. Obviously for a successful business, having that 
>>code be maintainable is (or should be!) of significant importance. But I 
>>regularly hear complaints, largely from non-Perl (or Perl primary anyway) 
>>people from other industries coming into bioinformatics, about these large,
>>unmaintainable Perl code bases.
>>
>>Now, in my experience, I have to admit this is largely more true than 
>>not.  Usually because most of the software was written by people who were 
>>biologists/engineers/physicists/whatever first, and programmers (sometimes 
>>distant) second, often without thought or concern for its long term 
>>usability. So I've heard of a few places now moving away from Perl, 
>>apparently often forcing a large ground-up recode in some other language 
>>(usually Java, and I've heard some interesting 'rumors' as to why).
>>
>>I see little point in arguing with this from the standpoint of simply Perl 
>>first. I know others better than I have done talks and presentations on 
>>writing maintainable Perl code, and probably on the problems with porting 
>>old code to a more maintainable format. I want to steal from those 
>>people... blatantly (with credits of course).
>>
>>What I would like to do is to collaborate with a few people who have:
>>1) Done presentations related to the subject of code maintenance (and a 
>>little QA thrown in might be good).
>>
>>2) Have been involved with or responsible for large installations of Perl 
>>code that was well maintained.
>>
>>3) Others involved with bioinformatics interested in or having experience 
>>with this problem.
>>
>>What I would like to end up with are sources for presentations (preferably 
>>a couple already canned of varied lengths) on the subject of maintaining 
>>large Perl code bases written specifically as it applies to 
>>bioinformatics. If you don't want/have time to collaborate, but have 
>>pointers to good sources of information/inspiration, please also pipe up.
>>
>>Thanks everyone!!!
>>
>>--------------------------------------------------------------
>>Sean P. Quinlan
>>http://people.ne.mediaone.net/squinlan/index.html
>>mailto:[EMAIL PROTECTED]
>>"You can discover more about a person in an hour of play than in a year of 
>>conversation" - Plato
>
>Attachment Converted: "C:\INTERNET\EUDORA\Attach\assert.pl"
>
>Attachment Converted: "C:\INTERNET\EUDORA\Attach\testvaluemap.pl"
>
--------------------------------------------------------------------------
Sean P. Quinlan
http://people.ne.mediaone.net/squinlan/index.html
mailto:[EMAIL PROTECTED]
"You can discover more about a person in an hour of play than in a year of
conversation" - Plato
