[OT] Request for Extreme Programming with Perl Experiences
Apologies for the off-topic post... I'm looking for stories, anecdotes, comparisons, etc. from people who are doing Extreme Programming with Perl. Even if you are using only some of the practices, such as testing, coding, and refactoring, your input will be useful. The good, the bad, the ugly. Anything that is about a real XP experience using Perl in real-world situations. I would especially like to hear about large application development in a commercial environment. The stories are for my book: Extreme Programming with Perl. If you want to keep your name and company anonymous, please let me know. If I decide to include your story, you'll get to review the text before I submit it to my editor. If you want to reply to a list, send it to the extremeperl Yahoo group. Or, you can send it directly to me (nagler) at bivio.biz, and I'll keep it confidential. Thanks, Rob -- Reporting bugs: http://perl.apache.org/bugs/ Mail list info: http://perl.apache.org/maillist/modperl.html
Re: submit input truncation
Bill Marrs writes: But, I'm getting an intermittent problem with POSTs where the input data is being truncated. This causes havoc, especially in my forum system. [snip] Has anyone else seen this? Is there some fix for it? We have seen this on mp1. We read $r->header_in('Content-length') worth of data. If the data is less, we assume it is a double click and toss the request (more or less). Here's the exact code: http://petshop.bivio.biz/src?s=Bivio::Agent::HTTP::Form Search for Content-length. Rob
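P.S. A minimal sketch of that check under mod_perl 1 (illustrative names only; the real code is in Bivio::Agent::HTTP::Form at the URL above):

    sub read_post_data {
        my($r) = @_;   # the Apache request object
        my $expected = $r->header_in('Content-length') || 0;
        my $buf = '';
        $r->read($buf, $expected) if $expected > 0;
        # A short read usually means an aborted or double-clicked submit,
        # so the caller just tosses the request.
        return undef if length($buf) < $expected;
        return \$buf;
    }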
Re: OSCON ideas - MVC talk
Andy Wardley writes: I like the sound of it, but I should warn you that I have a personal crusade against inappropriate use of the phrase MVC in relation to web development. So how about a panel discussion? I would gladly represent the MVC camp. :-) (see http://www.bivio.biz/hm/why-bOP for my position.) I am thinking about giving a talk about subject matter oriented programming (SMOP). SMOP separates the programming concerns to allow you to concentrate on the subject matter with minimal distractions. If you are familiar with patterns, it's the interpreter pattern taken to the extreme. The example would be to compare Sun's Pet Store with our own http://petshop.bivio.biz. The 3 major SMOP languages in bOP's PetShop allow you to focus on the subject matter in the models, views, and controllers without getting bogged down in syntax and unnecessary repetition. This, from J2EE's Pet Store[1], is not a SMOP:

    <tr>
      <td class="petstore_form" align="right"><b>First Name</b></td>
      <td align="left" colspan="2">
        <waf:input cssClass="petstore_form" name="given_name_a" type="text"
            size="30" maxlength="30" validation="validation">
          <waf:value><c:out value="${customer.account.contactInfo.givenName}"/></waf:value>
        </waf:input>
      </td>
    </tr>
    <tr>
      <td class="petstore_form" align="right"><b>Last Name</b></td>
      <td align="left" colspan="2">
        <waf:input cssClass="petstore_form" type="text" name="family_name_a"
            size="30" maxlength="30">
          <waf:value><c:out value="${customer.account.contactInfo.familyName}"/></waf:value>
        </waf:input>
      </td>
    </tr>

And, this is a SMOP in bOP[2]:

    [
        vs_form_field('UserAccountForm.User.first_name'),
    ],
    [
        vs_form_field('UserAccountForm.User.last_name'),
    ],

The intent is to demonstrate the power of Perl to distill the essence of the subject matter. Interest? Rob [1] http://java.sun.com/blueprints/code/index.html#java_pet_store_demo [2] http://petshop.bivio.biz/src?s=View.account
Re: development techniques
mpm writes: Debugging of the applications now looks like: $ced->log('warn', "No price for this product") Here's an alternative that we've evolved from Modula-2 to C to Java to Perl :-) Firstly, I try to distinguish between stuff I always want to see and debugging messages. The former we call logging, and wrap it in a class Bivio::IO::Alert which also outputs the source line of the caller, time, etc. configurably. This is very handy for figuring out what's complaining. The latter we call trace messages, which are also output via Bivio::IO::Alert, but are written as follows:

    _trace('No price for this product') if $_TRACE;

The "if $_TRACE" is an optimization, which can be left out but avoids the overhead of argument evaluation. The _trace() subroutine and $_TRACE variable are dynamically generated by our Trace module, which any package can register with as follows:

    use vars ('$_TRACE');
    Bivio::IO::Trace->register;

You can then configure tracing with two configuration values, which also can be passed on the command line. Here's an example:

    'Bivio::IO::Trace' => {
        package_filter => '/Bivio/ !/PDF/',
        call_filter => '$sub ne "Bivio::Die::_new"',
    },

Here I want to see tracing from all packages with the word Bivio in their names but not PDF, and I want to ignore individual calls from the subroutine Bivio::Die::_new. In practice, we rarely use the call_filter, so from any bOP command line utility, you can say, e.g.,

    b-release install my-package --TRACE=/Release/

which translates to:

    'Bivio::IO::Trace' => {
        package_filter => '/Release/',
    },

You can set the call filter or any other configuration value from the command line with --Bivio::IO::Trace.call_filter='$sub ne "foo"' We use LWP for testing. For things like cookies and argument parsing, LWP is great for regression testing. For content, it is much harder to come up with a pass/fail situation since the content can change, but it is still possible. You might want to check out Bivio::Test::Language::HTTP. It parses the incoming HTML, and allows you to write scripts like:

    test_setup('PetShop');
    home_page();
    follow_link('Dogs');
    follow_link('Corgi');
    follow_link('Female Puppy Corgi');
    add_to_cart();
    checkout_as_demo();

This particular code does a number of things including validating that animals are getting in the cart. Additional script language is defined in Bivio::PetShop::Test::PetShop, which subclasses Bivio::Test::Language::HTTP, which provides follow_link and home_page generically. I haven't found a better way to do web development testing during development. Possibly writing the test first would provide some improvement since you know when you have completed the change (see XP docs). I agree. A very important practice is unit testing, especially with large applications.
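Circling back to the trace mechanism for a moment, here is a stripped-down sketch of how a register() like that can conjure up _trace() and $_TRACE in the calling package (illustrative only; this is not Bivio::IO::Trace, which also handles the package_filter/call_filter configuration):

    package My::Trace;
    use strict;

    sub register {
        my($pkg) = caller;   # the package that called My::Trace->register
        no strict 'refs';
        # Alias the caller's $_TRACE to a flag this module controls, and
        # generate a _trace() that prints only when the flag is set.
        my $trace_flag = 0;  # a real module would set this from config
        *{$pkg . '::_TRACE'} = \$trace_flag;
        *{$pkg . '::_trace'} = sub {
            print STDERR "$pkg: ", join('', @_), "\n" if $trace_flag;
        };
        return;
    }

    1;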
For an alternative to Test::More and xUnit, have a look at Bivio::Test, which allows you to write tests that look like:

    Bivio::Test->unit([
        'Bivio::Type::DateTime' => [
            from_literal => [
                [undef] => [undef],
                ['2378497 9'] => ['2378497 9'],
                ['-9'] => [undef, Bivio::TypeError->DATE_TIME],
                ['Feb 29 0:0:0 MST 1972'] => ['2441377 0'],
                ['Feb 29 13:13:13 XXX 2000'] => ['2451604 47593'],
                ['1972/2/29 0:0:0'] => ['2441377 0'],
                ['2000/2/29 13:13:13'] => ['2451604 47593'],
                ['Sun Dec 16 13:47:35 GMT 2001'] => ['2452260 49655'],
            ],
            from_local_literal => [
                [undef] => [undef, undef],
                ['2378497 9'] => ['2378497 7209'],
                ['-9'] => [undef, Bivio::TypeError->DATE_TIME],
                ['Feb 29 0:0:0 MST 1972'] => ['2441377 7200'],
                ['Feb 29 13:13:13 XXX 2000'] => ['2451604 54793'],
                ['1972/2/29 0:0:0'] => ['2441377 7200'],
                ['2000/2/29 13:13:13'] => ['2451604 54793'],
            ],
        ],
    ]);

We can write a lot of tests very quickly with this module. We don't always do this, but every time we don't, we regret it and end up writing a test anyway after figuring out that we still aren't perfect coders. :-) Yet another trick we use is executing a task from within emacs or on the command line. A task in bOP is what the controller executes when a URI is requested. For example,

    b-test task login

There are two advantages to this: 1) you don't have to restart Apache and go to another program (browser or crawler) and 2) you get the stack trace when something goes wrong and you can type C-c C-e (in emacs) to go right to the error. We added this facility recently, because we got tired of the internal server error restart loops. They slow things down tremendously, and anyway, you often want to look at the HTML to see if something has changed. The output of 'b-test task' is the resultant HTML and any mail messages that would be sent, which you can then search immediately in emacs without first having to say Tools -> View Source and get
Re: General interest question: PDF contents handling in PostgreSQL.
Fabián R. Breschi writes: I wonder if using ModPerl and PostgreSQL there's any possibility to resemble what in Oracle is called 'Intermedia', in this particular case parsing/indexing content of PDF files inside PostgreSQL as a LOB or alternatively as a flat OS file with metadata parsed/indexed from it into the RDBMS. We use Intermedia and Postgres on separate projects. Oracle's PDF parsing can be emulated with pdftotext. You'll need a search engine. Frankly, I'm not totally pleased with Intermedia. Its indexer is slow, and you have to re-optimize often. This affects a bunch of stuff related to the database, e.g., redo logs, which makes db management more difficult. If I had the time, I'd probably drop it. Rob
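P.S. For what it's worth, a minimal sketch of the pdftotext step driven from Perl (assumes the xpdf/poppler pdftotext binary is installed; the indexing itself is left to whatever search engine you choose):

    use strict;

    sub pdf_to_text {
        my($pdf_file) = @_;
        # '-' tells pdftotext to write the extracted text to stdout.
        open(my $fh, '-|', 'pdftotext', $pdf_file, '-')
            or die "cannot run pdftotext on $pdf_file: $!";
        local $/;            # slurp the whole output
        my $text = <$fh>;
        close($fh)
            or warn "pdftotext exited with an error for $pdf_file";
        return $text;
    }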
Re: How can I tell if a request was proxy-passed from an SSL server?
John Siracusa writes: and that does the trick. The full code for the module is at the end of this message. But I still think this is an ugly hack, and I'd like to be able to do this using standard apache modules or config parameters... Our hack is to forward 443 to port 81 on the middle tier:

    <VirtualHost 1.2.3.4:443>
    ...
    ProxyVia on
    ...
    RewriteRule ^(.*) http://middle.tier.host:81$1 [proxy]

We set a value (is_secure = 1) on our internal request object when it is initialized if the incoming port is 81. We also set remote_ip with:

    $r->connection->remote_ip($1)
        if ($r->header_in('x-forwarded-for') || '') =~ /((?:\d+\.){3}\d+)/;

This makes the log entries useful. There might be an easier way to do this. Rob
Re: [O] Re: Yahoo is moving to PHP ??
Perrin Harkins writes: Correct Perl style is probably not something that any two people will ever agree on. If you use Extreme Programming, the whole team has to agree. Collective ownership, pair programming, and refactoring all suffer if you don't have a common coding style. The use of map, unless, closures, eval, etc. needs to be discussed and agreed on. It's a sign of a weak team if you can't agree on these details. I've seen some hideous Java code at this place that really takes the wind out of the Java is more maintainable argument. I thought all Java code is hideous. ;-) Rob
Re: [OTish] Version Control?
Michael Schout writes: example, one time we upgraded Apache::Filter between releases. Unfortunately, the old version was not compatible with the new version, so a single machine could run either the current release branch, or the development branch, but not both simultaneously (because Apache::Filter was incompatible and was installed under /usr/lib/perl5). We are transitioning (slowly) between perl 5.005 and 5.6.1. Our trick is to have separate 5.005 and 5.6.1 build/test (and sometimes dev) machines. I'm not sure this solves your problem. 1) some Makefile.PL's refuse to generate a Makefile if PREREQ_PM's are not satisfied (if we haven't built them yet) If we have to bootstrap, we do a regular CPAN install on the build machine and then install over it with the RPM build. Also, we use Red Hat which has many CPAN modules already installed (see uninstall instructions below), so bootstrapping is rarely an issue. 2) some Makefile.PL's are INTERACTIVE, and you can't turn it off (e.g.: Apache::Filter requires you to hit Return a number of times at a MINIMUM. "perl Makefile.PL < /dev/null" works for us. We encapsulate it in a macro (see below). So we resorted to a set of overly-complicated GNUmakefiles that would generate Makefile's from Makefile.PL's, and these would set PERL5LIB to find the dependencies (e.g.: DBD-Pg would put ../DBI/blib into PERL5LIB). Here's our spec file:

    Name: perl.modules
    Summary: Perl Modules not in stock RH7
    Group: Perl/Modules
    Provides: perl.modules perl-libwww-perl
    Requires: perl
    Version: 5.6
    BuildRoot: install

    %define modules BSD-Resource IO-stringy Digest-MD5 Digest-HMAC Digest-SHA1 MD5 Crypt-IDEA Crypt-DES Crypt-Blowfish Crypt-CBC DBI DBD-Pg DBD-Oracle DBD-Sybase DBD-mysql TimeDate MailTools MIME-tools Devel-Symdump Image-Size Compress-Zlib Archive-Zip File-MMagic TermReadKey Crypt-SSLeay libwww-perl Parse-RecDescent Mail-Field-Received POP3Client Mail-2IMAPClient Test-Simple Time-HiRes Digest-Nilsimsa razor-agents Mail-Audit Mail-SpamAssassin XML-XPath httpmail

    %description
    Perl Modules not in stock RH7 or newer CPAN versions.

    To remove RedHat standard installs, do:
        rpm -e --nodeps $(rpm -qa | egrep 'perl-(DBD|DBI|libwww)')

    If you want to use Sybase (SQL Server), you need:
        b-release install freetds-0.53-1.i386.rpm
    And to compile this, you need:
        b-release install freetds-devel-0.53-1.i386.rpm

    %prep
    %{cvs} external/perl-modules-5.6

    %build
    cd external/perl-modules-5.6
    unset PERL_MM_OPT
    for f in %{modules}; do
        (
        if test $f = 'Crypt-IDEA'; then
            export PERL_MM_OPT='POLLUTE=1'
        elif test $f = 'DBD-Sybase'; then
            export SYBASE=/usr
        elif test $f = 'DBD-Pg'; then
            export POSTGRES_LIB=/usr/lib POSTGRES_INCLUDE=/usr/include/pgsql
        fi
        cd $f
        %{perl_make}
        )
    done

    %install
    cd external/perl-modules-5.6
    for f in %{modules}; do
        (set -e; cd $f; %{perl_make_install})
    done
    cd $RPM_BUILD_ROOT
    %{allfiles} > ../files

    %files -f files

    %clean
    [ $RPM_BUILD_ROOT != / ] && rm -rf $RPM_BUILD_ROOT

    %pre
    # Perl must be setup properly
    perl -e 'require "syscall.ph"' 2>/dev/null || (
        umask 022
        cd /usr/include
        h2ph -r -l . >/dev/null
    )

The macros perl_make_install and perl_make are defined below. We run a program (Bivio::Util::Release mentioned in another post) which generates the actual spec file and calls rpm. (%{allfiles} and %{cvs} are trivial and defined there, too.) This program also builds a separate directory and defines topdir, etc. correctly so you can build everything as any user.

    sub _perl_make {
        return '%define perl_make umask 022 && perl Makefile.PL < /dev/null '
            . "&& make POD2MAN=true\n"
            . '%define perl_make_install umask 022; make '
            . join(' ',
                map {
                    uc($_) . '=$RPM_BUILD_ROOT' . $Config::Config{$_};
                } grep($_ =~ /^install(?!style)/ && $Config::Config{$_}
                    && $Config::Config{$_} =~ m!^/!,
                    sort(keys(%Config::Config))))
            . ' POD2MAN=true pure_install '
            . '&& find $RPM_BUILD_ROOT%{_libdir}/perl? -name "*.bs"'
            . " -o -name .packlist -o -name perllocal.pod | xargs rm -f\n";
    }

[Uh oh, there's that nasty map function. ;-] Note that we don't install man pages. This slows down the build/install, and perldoc is just as easy to type as man. :-) We use this same function for all our perl apps. Indeed, to build a new app, our specfile looks like:

    Copyright: Logistics R Us, Inc.
    Requires: Bivio-bOP apache ImageMagick-perl
    %define perl_root LogisticalNightmare
    %define perl_exe_prefix ln
    _b_release_include('perl-app.include');

perl-app.include knows how to read our tree structure, which is consistent across projects, and it installs all perl, programs, images, views, etc. How does everyone else cope with this (managing trees of CPAN modules / CPAN module tree build environments)? Maybe we are sort of unique in that we use so many 3rd
Re: [OTish] Version Control?
Dominic Mitchell writes: How do you cope with the problem that perl has of running different versions of modules? Actually, we have two problems. One problem is developing with multiple versions and the other is what you mention, running production systems. Sometimes I might be in the middle of some big refactoring, and a customer calls with a problem. I then do:

    cd
    mkdir src_bla
    cd src_bla
    cvs checkout perl/Bla perl/Bivio

where Bivio is our shared code. Then I set PERLLIB to ~/src_bla. We've got a bash command that allows me to switch the configuration and PERLLIB as well. It's very easy to do. Oh, and we *never* (almost :-) put code in programs. The programs invoke a *.pm file's main so we can say bla-some-command and always get the right version. We solve the second problem by buying cheap machines which run Linux just fine. (I just bought 4 x Dell 2300, 2 x Dell 1300, and 2 x white box for $1800. $-) It just isn't worth my time trying to make two sites work on the same machine, although we do this in a couple of cases (e.g. www.bivio.biz and petshop.bivio.biz). When two or more sites do share the same machine, we always run the same version of the infrastructure. This avoids many problems, e.g. running into defects twice and managing multiple versions. We don't tag our CVS. We can back out changes with RPM. We do several releases a week on active applications, and one release a week on applications in maintenance mode. One final reason to avoid multiple versions is schema changes. The more different database versions you have, the more confusing it gets. On bivio.com we upgraded the schema about 250 times in about two years. It would have been impossible to keep the development, test, and production databases in sync if these three had diverged too much. Rob
Re: [OTish] Version Control?
Another approach which allows easy sharing between projects is:

    ~/src/perl/
        + Project1/
        + Project2/
        + Project3/

where Project[123] are root package names. Set PERLLIB=~/src/perl and you can get access to any *.pm in the system; each has a globally unique name. This makes it easy to implement cross-project refactorings. We use CVS for source management, and we use RPMs for deployment. RPM allows you to ask what release of ProjectN is on the system. RPM also allows you to manage permissions and ownership easily. Our RPM spec builder/installer can be found at: http://petshop.bivio.biz/src?s=Bivio::Util::Release Rob
Re: Yahoo is moving to PHP ??
Perrin Harkins writes: The real application stuff is built in other languages. (At least this is the impression I get from the paper and from talking to people there.) I think Yahoo Stores is written in Lisp. I also believe it handles the front and back end. Would be interesting to know why this was left out of the discussion. Rob
Re: Yahoo is moving to PHP ??
Tagore Smith writes: I think it would be harder to hire people to work on his system (of course you'd probably also get more experienced people, so that might not be such a bad thing). This raises the $64 question: If you could hire 10 PHP programmers at $50/hour or 4 Perl programmers at $125/hour, which team would deliver more business value over the life of the site? Graham's system uses macros extensively, and from other code of his that I've read (Graham wrote a couple of books about Lisp), I'd bet that he uses recursion and mapping functions a lot as well. His On Lisp book is a classic on macros--which are similar to closures in Perl. You can download it for free: http://www.paulgraham.com/onlisp.html My guess is that Graham's answer to the above question would be: Hire two Lisp programmers at $250/hour. :-) Rob
Re: cobranding strategies?
Kirk Rogers writes: I'm looking to build cobranding capabilities into a mod_perl site and am looking for some documentation or guidelines to follow. Anyone know of documentation that I can find. We've had some pretty stringent requirements that led us to indirecting all fonts, tagged text, colors, icons, and URIs. It also turned out to be handy for building up two independent sites in one mod_perl server and handling skins for the same site. This may be overkill if you just want the custom-logo-in-the-corner cobrand, but I suspect you are looking for more. Have a look at http://petshop.bivio.biz/src?s=Bivio::UI::Facade for the interface, and for an example facade http://petshop.bivio.biz/src?s=Bivio::PetShop::Facade::PetShop If you have any questions, mail me directly. Rob
Re: [OT] - Mailing List Servers/mods .. etc
Jim Morrison [Mailinglists] writes: I'm wondering if there is any point in looking for a piece of third party software/module etc, that will handle the sending of the mail or should I work directly with sendmail? (Is sendmail the best mailserver for this kind of thing?) sendmail has its problems, but I can send about 10K msgs/hour on a low-end server (500MHz). It's good enough for most low-end mailing list problems. I'd be happy to write something along the line of formail.pl on my own, so I kinda know what I'm doing, but I'm gonna have to take things like Return to sender errors and such into account.. Tough problem in general, which companies like experian, doubleclick and returnpath.net spend lots of money on. You need to know how to parse this information without false positives. My question I guess is: - Is it ok to send 100's or 1000's of mails to sendmail in one go, or is there a better way of doing bulk mail? I don't think you should worry about it right now. sendmail can handle the load. You can always use an internal relay if you need to distribute the load. Hardware is cheap. - Are there any mods to help with dealing with returned mail etc..? bOP has a C program called b-sendmail-http.[1] It's a gateway from sendmail to http. We handle all mail through mod_perl. You can use b-sendmail-http with any HTTP implementation, because it simply wraps the e-mail, client IP, and envelope-to into multipart/form-data.[2] - Is there a good list of people doing this sort of thing? (Or do you mind the thread being a little off-topic!) I like it, but then my current project is in this space. :-) I don't think I'm trying to reinvent the wheel.. Just that I think there is so much of my own coding involved, I'm not sure if I'm going to be able to get away with anything less than writing it from scratch.. The code isn't complicated, but the detailed knowledge is. There are a number of mailing list packages out there including ultimate bbs, which is used by quite a number of sites. We rolled our own, because email is integrated with other apps (e.g. search, file sharing, and group join). Rob [1] http://www.bivio.biz/f/bOP/bin/b-sendmail-http.c [2] http://petshop.bivio.biz/src?s=Bivio::Biz::Model::MailReceiveBaseForm
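P.S. b-sendmail-http itself is a small C program, but a rough Perl sketch of the same idea looks like this (the URI and field names are made up; see the MailReceiveBaseForm source above for the real interface):

    use LWP::UserAgent ();
    use HTTP::Request::Common qw(POST);

    # Wrap the message plus envelope data into multipart/form-data and
    # hand it to the web tier, which does all the real mail handling.
    sub deliver_to_http {
        my($envelope_to, $client_addr, $message_text) = @_;
        my $ua = LWP::UserAgent->new;
        my $res = $ua->request(POST 'http://localhost/_mail_receive',
            Content_Type => 'form-data',
            Content      => [
                recipient   => $envelope_to,
                client_addr => $client_addr,
                message     => $message_text,
            ],
        );
        die('mail gateway POST failed: ' . $res->status_line)
            unless $res->is_success;
        return;
    }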
RE: Linux + Apache Worm exploiting pre 0.9.6g OpenSSL vulnerabilities on the loose
Christian Gilmore writes: I believe the virus only affects systems pre-0.9.6e: http://www.openssl.org/news/secadv_20020730. Also note that vendors may have retrofitted older versions with the patch. For example, Red Hat is still at 0.9.5a and 0.9.6b (see http://rhn.redhat.com/errata/RHSA-2002-160.html for more info) Rob
Re: bivio and mod_perl
zt.zamosc.tpsa.pl writes: Do many mod_perl programmers use bOP by bivio.biz in their large projects? At least 3. :-) We have a few downloads, but I doubt anybody is using it for anything serious besides us. (Others, please correct me if I'm wrong.) Could you share your experience of working with it? What is unique to bOP, which also is its weakness, is that we exploit Perl to the max. We avoid special syntaxes, such as XML, except for input and output. This means we get all the power of Perl in view languages, acceptance tests, unit tests, etc. This makes it hard for anybody who is not a Perl application developer to build applications in bOP, i.e. we function as designers and programmers--sometimes we get help from graphic artists or writers. The documentation looks very very... promising. It works. There is no design documentation for a variety of reasons, so you have to be prepared to look at code and examples to figure out how it works and how to use it. bOP has been commercially deployed for years and evolves on demand, e.g. the View language itself was only added relatively recently and we just released our e-commerce component. What we like is that we don't have to program very much to get a lot done, but when we need to write ordinary Perl code, bOP helps us instead of hindering us. To me, there are two ways to use bOP: as an example or as a platform. I think many people have looked at it, and rolled their own. Infrastructure in mod_perl is *easy*. It's the applications that are the hard part (in any platform). Any infrastructure has to match your style or you have to be willing to adapt. If you like learning or already understand declarative programming, you may find bOP suits your needs out of the box. Rob
Re: [ANNOUNCE] Petal 0.1
Jean-Michel Hiver writes: My only problem deals with template caching. Currently Petal does the following:

    * Generate events to build a 'canonical' template file
    * Convert that template file to Perl code
      ** Cache the Perl code onto disk
    * Compiles the Perl code as a subroutine
      ** Caches the subroutine in memory

I wonder how much code you would save if you wrote the templates in Perl and let the Perl interpreter do the above. Sorry, I know this doesn't help you answer your question, but by eliminating XML from the design, the debate about SAX vs XML::Parser would be irrelevant. Your code would run faster, and you would need fewer 3rd party APIs. Rob
Re: [ANNOUNCE] Petal 0.1
Jean-Michel Hiver writes: I wonder how much code you would save if you wrote the templates in Perl and let the Perl interpreter do the above. I recommend that you read this Page: http://www.perl.com/pub/a/2001/08/21/templating.html?page=2 Please read the Application Servers section of: http://www.bivio.biz/hm/why-bOP I'm an OO-advocate, I believe in proper separation of logic, content and presentation Moi aussi. What does this have to do with using Perl for business logic and presentation logic? and on top of that I want people to be able to edit templates easily in dreamweaver, frontpage, etc and send templates thru HTML tidy to be able to always output valid XHTML. If you are an OO-advocate, you would hide the presentation format in objects, e.g. Table, String, and Link. This ensures the output is valid through the (re)use of independently tested objects. Objects also provide a mechanism for overriding behavior. Petal lets me do that. If that's not of any use to you, fine. The world is full of excellent 'inline style' modules such as HTML::Mason, HTML::Embperl and other Apache::ASP. These all work on the assumption that the template is written in HTML. If you start with OO Perl, you do not inline anything, not even the HTML. Here is an example page: http://petshop.bivio.biz/items?p=RP-LI-02 And here is the HTML-less source: http://petshop.bivio.biz/src?s=View.items Apologies to those who are tired of the *ML vs. Perl debate. Rob
Re: [OT] Better Linux server platform: Redhat or SuSe?
David Dyer-Bennet writes: Obviously hardware RAID will save CPU cycles somewhat, and SCSI disks of the right type will increase IO bandwidth somewhat, but if you're not short of those things and still want the added security of mirroring, I think the software RAID is a viable option. Hardware RAID is usually hot-swappable, which is quite nice. Rob
Re: Is mod_perl the right solution for my GUI dev?
Fran Fabrizio writes: - Real-time data updates. HTTP is stateless: it serves up the page then closes the connection. Any updating involves a round-trip back to the server. In traditional GUI, you just hold a db connection and repaint the areas that are updated. Solved with refresh? JavaScript and Java can also help here. For interactivity, check out: http://www.cs.brown.edu/people/dla/polytope/tetra.html - State maintenance. Since it is stateless, you have to jump through a lot of hoops to realize that two requests are coming from the same person, since they could be handled by two different child processes or even two different servers. This has all sorts of ramifications on user login, user preferences, where the user was in the application, etc... you have to do a lot of work on the server side to realize that it's the same client that keeps talking to you. Cookies work fine. - Fancy interface widgets/layouts. HTML/CSS/JavaScript/DHTML can only get you so far. If you need fancy menu types, forms, layouts, etc... it quickly becomes tedious/impossible on the web. Tedious is questionable. Impossible, I seriously doubt. Remember, you can always delegate part of your screen to a Java applet, although I strongly recommend you avoid this. This is just the tip of the iceberg. Let's talk about the positives:

+ You update the server and instantly all clients are up-to-date.
+ You can detect incorrect usage, bugs, etc. by parsing a single log file, in real-time.
+ The system is immune to operating system upgrades. And DLL hell on Windows boxes.
+ You access the system from anywhere reliably and securely. You don't have to open up a database connection to anybody but the Web server(s).
+ There is only one version of the software.
+ Support people can view the output sent to the client exactly as the client received it. Including following a series of actions.
+ The use of a Web browser is familiar to most users.
+ The user can keep multiple views of the pages she wants, not what the application decides to offer.
+ Bookmarks allow users to structure their view of the application. Advanced users can create new organizations (shortcut pages) for themselves and their co-users.
+ Users can share information easily (send page by email, mail bookmarks, print page, save to disk, save picture, etc.)

I'm sure others will add to the list. Rob
RE: mod_perl/passing session information (MVC related, maybe...)
Vuillemot, Ward W writes: I log into your web-site as memberA. You kindly leave me a delicious cookie with my username stored in it. Maybe even my password (I hope not!). Now, I know that another member, memberB, has special rights to your site. What is stopping me from editing the cookie to memberB's username and hijacking their account? If you can crack Blowfish, IDEA, etc., you are in. Then again you can probably just sniff the network for memberB's username and everybody else's passwords for that matter, even via SSL. Part of bOP is a multi-tiered security architecture including something I call data gateways to help protect against programmer mistakes. And if you do store the password information in the cookie...you are letting each user be compromised either as the cookie is flung through the Internet ether, or minimally on their own computer where someone else can easily access the cookies. If you have access to someone's cookie file, you probably can log their keystrokes. Contact your local spy agency for more information on how to do this. With sessionID, you have an ID and information that is checksum'd. Sessions and user IDs are equivalent. They are called credentials, which allow access to a system. There's no fundamental difference between hijacking a session and stealing a user id/password. If I wanted to delete a user and ensure they immediately lost all access, it is rather trivial to go through all active sessions in the db, see if the user I am deleting matches the username in the session information, and if so delete the session record. Denormalization is the root of all evil. The extra step involves more code, more bugs, and more system resources. Other than that, you're right. You can do this, but the question I ask: Do you need to? Rob
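P.S. A minimal sketch of the "opaque, encrypted credential in the cookie" idea using Crypt::CBC with Blowfish (key management and the extra checks bOP layers on top are out of scope here):

    use Crypt::CBC ();

    my $cipher = Crypt::CBC->new(
        -key    => 'a-long-random-server-side-secret',  # never leaves the server
        -cipher => 'Blowfish',
    );

    # Outgoing: the client only ever sees an opaque hex string.
    my $cookie_value = $cipher->encrypt_hex('user_id=42&login_time=' . time);

    # Incoming: decrypt and then validate the fields; a cookie edited by
    # the user decrypts to garbage and fails validation.
    my $plain = $cipher->decrypt_hex($cookie_value);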
Re: separating C from V in MVC
Dave Rolsky writes: Trying to jam a thick layer of OO-goodness over relational data is asking for a mess. Most OLTP applications share a lot in common. The user inputs data in forms. The fields they edit often correspond one-to-one with database fields, and certainly their types. The user wants reports which are usually closely mapped to a table/view/join, i.e. an ordered list of tuples. A reasonable O/R mapping can solve this problem easily. Like Perl, it makes the easy things easy and the hard things possible. The bOP Pet Shop demonstrates how you can build a simple application with only a couple of custom SQL queries. The rest are simple joins and CRUD. If you need more complex queries, there are escapes. You still probably end up with a list of tuples for your reports. The key we have found is avoiding indirection by naming fields and models the same in SQL and Perl objects. This allows you to seamlessly switch between the two. We've found the O/R mapping to be an indispensable part of the system. Since all data is contained in objects, the views/widgets don't need to know how the data is populated. They access all data through a single interface. Rob
Re: mod_perl/passing session information (MVC related, maybe...)
Perrin Harkins writes: My preferred design for this is to set one cookie that lasts forever and serves as a browser ID. I like this. It's clean and simple. In this sense, a browser is not really a session. The only thing I don't like is garbage collection. unique browser ID (or session ID, if you prefer to give out a new one each time someone comes to the site) lets you track this for unregistered users. We call this a visitor id. In the PetShop we have a cart id, but we're not too happy with the abstraction. I don't see that as a big deal. You'd have to delete lots of other data associated with a user too. Actually deleting a user is something I've never seen happen anywhere. We do. Especially when we went from free to fee. :-( The big issue I have with session data is that it is often a BLOB which you can't query. Well, eToys handled more than 2.5 million pages per hour, but caching can be important for much smaller sites in some situations. I'd like numbers on smaller and some. :) Here's a situation where a small site could need caching: We cache, too. An interesting query is the club count on bivio.com's home page. The count of clubs is a fast query, but the count of the members is not (about 4 seconds). We compute a ratio when the server starts of the members to clubs. We then run the club count query and use the ratio to compute the member count. We restart the servers nightly, so the ratio is computed once a day. Maybe I just have bad luck, but I always seem to end up at companies where they give me requirements like these. It's the real world. Denormalization is necessary, but only after you test the normal case. One of the reasons I got involved in this discussion is that I saw a lot of messages about solutions and very few with numbers identifying the problem. Rob
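P.S. A sketch of that ratio trick with DBI (connection details and table names here are hypothetical, not bivio's schema):

    use DBI ();

    my $dbh = DBI->connect('dbi:Pg:dbname=example', 'user', 'password',
        {RaiseError => 1});

    # At server start: pay for the slow query once to establish the ratio.
    my($club_total)   = $dbh->selectrow_array('SELECT COUNT(*) FROM club');
    my($member_total) = $dbh->selectrow_array('SELECT COUNT(*) FROM club_member');
    my $ratio = $club_total ? $member_total / $club_total : 0;

    # On each home page request: only the fast query runs; the member
    # count is an estimate, refreshed when the server restarts nightly.
    my($club_count)  = $dbh->selectrow_array('SELECT COUNT(*) FROM club');
    my $member_count = int($club_count * $ratio);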
Re: separating C from V in MVC
Dave Rolsky writes: The Pet Shop has a grand total of 13 tables. How well does this approach work with 90 tables? Works fine with bivio.com, which has 50 tables. How does it handle arbitrary queries that may join 1-6 tables, with conditionals and sorting of arbitrary complexity? The ListModel can override or augment its query. You can load a ListModel from an arbitrary data source as a result. After the load, it can fix up rows, e.g. computing percent portfolio is not done in SQL but in Perl in internal_post_load_row(). The automatic sorting is handy for simple joins. For complex queries, there's no fully automatic solution for sorting. Here's a simple query: http://petshop.bivio.biz/pub/products?p=DOGS The ListModel declares which columns are sortable:

    order_by => [
        'Product.name',
        'Product.product_id',
    ],

The view doesn't need to say anything, because the Table widget queries the ListModel meta-data. The SQL query is dynamically constructed from the 'o' HTTP query value. For complex queries, you may be able to take advantage of the sort infrastructure. There are no guarantees, but you have the rope. The software is designed for the 80% solution. As we see patterns develop in our code, we add general cases to the infrastructure. I'm not a big fan of O/R. I prefer R/O. But to each their own. I guess we do R/O in the sense that we design the database relationally and then map PropertyModels one-to-one with the tables. Is that what you mean by R/O? Rob
RE: mod_perl/passing session information (MVC related, maybe...)
Jeff AA writes: An advantage of the session/id is that you end up with stateful query instances, Stateful instances are also problematic. You have essentially two paths through the code: first time and subsequent times. If you write the code statelessly, there is only one path. Fewer bugs, smaller code, less development. Sessions are caches. Add them only when you know you need them. and can remember [at least for a short period!] the total number of items, so that you can say 'Results 1 to 10 of 34,566' without having to count all results every time. Maybe this is just because we are using Oracle, but if you do a query:

    SELECT count(*) FROM bla, bla...

followed up by:

    SELECT field1, field2, ... FROM bla, bla...

Oracle will cache the query compilation and results so it is very fast (basically a round-trip to the database server) for the second query. We execute these two queries on every paged list on every request. One of the advantages of a declarative O/R mapping is that you can do things like sort the select fields and order the queries consistently. Oracle takes advantage of this. I don't know if MySQL or Postgres do, too, but they probably will someday. It's a bit slow (seconds) with Oracle's Context engine, which we've been considering replacing. Most of our queries are not text searches, in which case Oracle queries take less than 20ms per query. We're not a large site (peak 50K views/day), and we have enough hardware (two front ends, two middle tier, one db). Our smaller sites (e.g. bivio.biz) run on minimal hardware and use Postgres. They use the same code, and it seems to work fine. Rob
Re: mod_perl/passing session information (MVC related, maybe...)
Perrin Harkins writes: I find you can tie this cache stuff up inside of your data access objects and make it all transparent to the other code. Absolutely. A session is useful for very limited things, like remembering if this user is logged in and linking him to a user_id. We store this information in the cookie. I don't see how it could be otherwise. It's the browser that maintains the login state. Consider the following scenario:

* User logs in.
* Site Admin decides to delete the user.
* In our stateless servers, the user_id is invalidated immediately.
* Next request from User, he's implicitly logged out, because the user_id is verified on every request.

In the case of a session-based server, you have to delete the user and invalidate any sessions which the user owns. Although Oracle can be fast, some data models and application requirements make it hard to do live queries every time and still have decent performance. This is especially true as traffic starts to climb. I've tried to put numbers on some of this. I've never worked on a 1M/day site, so I don't know if this is the point where you need sessions. What sites other than eToys need this type of session caching? Rob
Re: separating C from V in MVC
Matt Sergeant writes: There's quite a few things that are a lot harder to do with XML in plain perl (especially in SAX) than they are in XSLT. This assumes you need XML in the first place. It's trivial to manipulate Perl data structures in Perl. It's also easy to manipulate XML in Perl. However, it's impossible(?) to manipulate Perl data structures in XSLT. Rob
Re: [OT] MVC soup (was: separating C from V in MVC)
Bill Moseley writes: Anyone have links to examples of MVC Perl code (mostly controller code) that does a good job of M and C separation, and good ways to propagate errors back to the C? I humbly (do believe that ;-) submit http://petshop.bivio.biz Every page contains the control logic which is dynamically parsed from the Task configuration. Here's an example: http://petshop.bivio.biz/pub/products?p=BIRDS The configuration for this task is:

    [qw(
        PRODUCTS
        500
        GENERAL
        ANYBODY
        Model.ProductList->execute_load_all_with_query
        View.products
    )],

The name of the task, which is used for all internal linkages, is PRODUCTS. The number is a convenience for FormContext, i.e. our closure mechanism for holding state between HTTP forms. The realm is GENERAL, i.e. there is no particular owner. You might have a USER realm or CLUB (group) realm, which have owners. The permission bit is ANYBODY. You can have multiple permission bits, e.g. DATA_WRITE&DATA_READ. The rest of the list are items which are executed serially. The syntax is ClassMap.Class. A class map allows you to configure where your models are loaded from. Here's another example:

    [qw(
        LOGIN
        517
        GENERAL
        ANYBODY
        Action.UserLogout
        Model.UserLoginForm
        View.login
        next=CART
        MISSING_COOKIES=MISSING_COOKIES
    )],

The '=' elements (which is not strictly perl, but hey, we all have our inconsistencies ;-) map events to other tasks. For example, if you get a MISSING_COOKIES exception you go to the MISSING_COOKIES task. next=CART says that the next task on an OK on the form is the CART task. All tasks can be found in http://petshop.bivio.biz/src?s=Bivio::PetShop::Agent::TaskId This is all you need to know about the controller if you use bOP. You list your tasks and bOP's Agent does the rest. BTW, the tasks might be executed via e-mail or HTTP or the command line. The controller abstracts this away, too. (We actually removed our Bivio::Agent::Mail implementation, because it made more sense to implement everything via Apache instead of custom servers.) The interface for Views, Actions, and Models is called execute. You'll be passed a Bivio::Agent::Request object which holds the context for the transaction. Rob
Re: separating C from V in MVC
Andy Wardley writes: Because Perl is a general purpose programming language. TT implements a general purpose presentation language. A different kettle of fish altogether. These are the reserved words of TT:

    GET CALL SET DEFAULT INSERT INCLUDE PROCESS WRAPPER IF UNLESS
    ELSE ELSIF FOR FOREACH WHILE SWITCH CASE USE PLUGIN FILTER
    MACRO PERL RAWPERL BLOCK META TRY THROW CATCH FINAL NEXT
    LAST BREAK RETURN STOP CLEAR TO STEP AND OR NOT MOD DIV END

Looks an awful lot like the same keywords in any general-purpose programming language. It's like asking why XML has different syntax and semantics from Perl. Well, if you read the XSLT spec and then look at an XSLT program, you'll see a lot of verbosity and a lot of general purpose constructs like variables, conditionals, and loops. I haven't done much with XSLT, but I do know you can get it in an infinite loop. That seems pretty general purpose to me. I think the rule is: if you can solve Towers of Hanoi in the language, it's general purpose enough. True formatting languages, such as Scribe, do not contain general-purpose constructs, so you couldn't solve the Towers of Hanoi. HTML is another good example (ignoring script). I find it easier to have a little language which is tailored to the task at hand. Let's separate syntax from semantics. You can use Perl syntax very easily without adopting the semantics for the little language constructs. For example, here's a bOP configuration file:

    {
        'Bivio::Ext::DBI' => {
            database => 'petdb',
            user => 'petuser',
            password => 'petpass',
            connection => 'Bivio::SQL::Connection::Postgres',
        },
        'Bivio::IO::ClassLoader' => {
            delegates => {
                'Bivio::Agent::TaskId' => 'Bivio::PetShop::Agent::TaskId',
                'Bivio::Agent::HTTP::Cookie' => 'Bivio::Delegate::PersistentCookie',
                'Bivio::UI::HTML::FormErrors' => 'Bivio::PetShop::UI::FormErrors',
                'Bivio::TypeError' => 'Bivio::PetShop::TypeError',
                'Bivio::Auth::Support' => 'Bivio::Delegate::SimpleAuthSupport',
            },
            maps => {
                Model => ['Bivio::PetShop::Model', 'Bivio::Biz::Model'],
                Type => ['Bivio::PetShop::Type', 'Bivio::Type'],
                HTMLWidget => ['Bivio::PetShop::Widget', 'Bivio::UI::HTML::Widget',
                    'Bivio::UI::Widget'],
                Facade => ['Bivio::PetShop::Facade'],
                Action => ['Bivio::PetShop::Action', 'Bivio::Biz::Action'],
                TestLanguage => ['Bivio::PetShop::Test'],
            },
        },
        'Bivio::UI::Facade' => {
            default => 'PetShop',
        },
        'Bivio::UI::Text' => {
            http_host => 'petshop.bivio.biz',
            mail_host => 'bivio.biz',
        },
    };

You could use XML, Lisp, or some other syntax for this. Since the implementation of the configuration parser is in Perl, we use eval as the config parser. When I program in Lisp, I use Lisp syntax for config and eval for the parser again. The syntax is different, but the semantics probably are the same. Perrin Harkins writes: The thing that worries me about a widget approach is that I would have the same problem I had with CGI.pm's HTML widgets way back: the designers can't change the HTML easily. Getting perl developers out of the HTML business is my main reason for using templating. I think this is where our experience diverges. I have hired designers before and every time we had to recode the HTML and the JavaScript anyway. My approach is to apply the Once And Only Once principle, which simplifies design changes (discussed more below). Andy Wardley writes: This is abstraction. Not to be confused with MVC which is one particular architecture well suited to GUI applications. Blindly applying MVC without understanding the real issues (abstraction of front/back ends, separation of concerns, don't repeat yourself, etc.) is likely to build a system which is highly fragmented. Maintenance becomes harder because everything is split up into many different pieces and it becomes difficult to see the wood for the trees. If you apply Once And Only Once extremely, you'll find that MVC is a nice fit for just about any information system. Despite our best intentions, this web site doesn't neatly fall into clearly defined chunks of model, application and view. Well, actually, those parts do split down quite nicely. But then you look at localisation, for example, and we find there is localisation required in the data backend, localisation required in the applications and localisation required in the templates. Thus, localisation is an aspect which cuts across the system. By building a strict MVC we've fragmented localisation and have to trawl through hundreds of different files to localise the site. To solve this problem, we added a letter. bOP is MVCF, where F stands for Facade. A Facade allows you to control icons, files, colors, fonts, text, and tasks. You can
Re: separating C from V in MVC
Perrin Harkins writes: You can actually do that pretty comfortably with Template Toolkit. You could use a filter for example, which might look like this: [% FILTER font('my_first_name_font') %] ... some text, possibly with other template directives in it... [% END %] One of the reasons Perl is popular is its idioms. Having to say something in three lines is not as idiomatic as one line. It takes a lot of discipline to use it everywhere. In other words, I don't think the above is more comfortable than: String(['User.first_name'], 'my_first_name_font'); Note also the accessor for User.first_name in Template Toolkit is probably nontrivial. Rob
Re: separating C from V in MVC
Perrin Harkins writes: The advantage is that my example can contain other templating code:

    [% FILTER font('basic_info_font') %]
    Hello [% User.first_name %]!<BR>
    [% IF User.accounts %]
      You have these accounts:<BR>
      [% FOREACH User.accounts %]
        [% name %]: [% balance %]<BR>
      [% END %]
    [% END %]
    [% END %]

Unless I'm missing something about your example, the FILTER concept seems more powerful. [Skirting on the edge of YATW. :-] I think they are equivalent as far as power. I go back to why people use Perl, because it makes the easy jobs easy and the hard jobs possible. All programming languages are Turing Complete, but we don't like programming Turing Machines. Here's your expanded example in widgets:

    String(Prose(<<'EOF'), 'basic_info_font');
    Hello String(['Model.User', 'first_name']);!<br>
    If(['Model.AccountList', '->get_result_set_size'],
        Join([
            "You have these accounts:<br>",
            Table('Model.AccountList', [
                'name',
                'balance',
            ]),
        ]),
    );
    EOF

The Table widget will print a table with headings defined by the Facade (our term for skin). The widgets for name and balance are looked up dynamically. balance will be right adjusted. Unless I'm missing something, the template example won't align properly in HTML. This is a significant semantic difference between FOREACH and Table. Would you expand on the example so that name and balance are columnar? Rob
RE: separating C from V in MVC
Jeff AA writes: space and that column 5 which contains a possibly long name should use the remaining available space, whilst column 1 which contains a name should not be wrapped? We call this a Grid widget in our framework (bOP). There are many options: http://petshop.bivio.biz/src?s=Bivio::UI::HTML::Widget::Grid and here's an example use: http://petshop.bivio.biz/src?s=View.menu Rob
Re: separating C from V in MVC
Barry Hoggard writes: Do you have a favorite approach for writing the Model objects? One solution is to create an interface for accessors, i.e. get, which the views call on objects they need to access. Our controller and model objects share this same accessor interface, which allows the views to access control and database values the same way. For example,

    vs_form_field('UserAccountForm.RealmOwner.name', {},
        [['->get_request'], 'task_id', '->equals_by_name', 'USER_ACCOUNT_CREATE'])

The first parameter to vs_form_field identifies the RealmOwner.name field of FormModel UserAccountForm. The second parameter contains optional attributes. The third param defines a conditional which says: only display this row if the view is being rendered in the USER_ACCOUNT_CREATE task. We use the same view in the user account register and edit tasks. The view doesn't allow you to change your User ID in edit mode. The business logic doesn't allow edits either, but you still have to control the visible state. You could do that with a model, but that's denormalization. Rather than copying state, we go directly to the source, the request object. The Request is not a model, but an ordinary Perl object, which implements the WidgetValueSource interface. Originally, we didn't have this clear separation of WidgetValueSource and Model. That change really helped us in the view code. There are other WidgetValueSource objects (formatters, icons, etc.) and the views access the data in the same way. Perhaps you can accomplish this with hash references, but I find that involves a lot of copying. Having a method call to an object allows the object to control the behavior, e.g. dynamically computing values. Not coupling it to a heavier Model interface gives you a lot of flexibility. For the most part, all the views want is the values. Rob P.S. Nice to meet you, Barry.
Re: separating C from V in MVC
Perrin Harkins writes: That's exactly what I'm saying, except that I don't see what your layout manager is for. You should just pass some data to a template (or maybe to something fancier for certain types of output like Excel) and then the template makes all the decisions about how to show that data. The layout manager is an important tool, which doesn't fit in with the template model. It comes from widget-based GUI toolkits like Java, Tk, etc. Layout managers accept a declaration ('this cell is northwest', 'this other one expands across the bottom', etc.). They interpret the decl at run-time. It's like HTML, but more declarative. Some attributes of our Grid manager are:

    cell_nowrap - don't wrap the text in the cell
    cell_align - gens valign and align from a single compass direction
    cell_expand - this cell eats up the rest of the columns in the row
    row_control - a conditional value to control whether row is displayed
    cell_width - contains the HTML width= value (may be dynamic)

With Java's GridBag and other layout managers, you relate the cells in some way and the layout manager does the right thing. Since this particular layout manager is HTML, we relate the cells in a row-major matrix. Since it's Perl, it's compact. Here's a simple example:

    Grid({
        string_font => 'page_text',
        pad => 5,
        values => [
            [
                String(Join([
                    'Please confirm that the following data is correct '
                    .'and press the <b>Continue</b> button to ship the '
                    .'order',
                ])),
            ],
            [
                String('Billing Address', 'page_heading'),
            ],
            [
                $address_widget->('bill_to', 1),
            ],
            [
                String('Shipping Address', 'page_heading'),
            ],
            [
                $address_widget->('ship_to', 2),
            ],
            [
                ImageFormButton({
                    image => 'continue',
                    field => 'ok_button',
                    alt => 'Continue',
                }),
            ],
        ],
    }),

Rob
Re: separating C from V in MVC
Perrin Harkins writes: The same template? How does the layout manager help with that? Does it modify the template? It would make more sense to me if this were a sort of abstraction to factor out common layout ideas from multiple templates. I think we're miscommunicating. I'm talking widgets, and you're talking templates. A layout manager is a bit of a red herring in mod_perl. I was simply trying to explain how they came to be and why they make sense. In GUIs, the layout manager is responsible for placement when the window is resized. In mod_perl, it plays a lesser role, because the browser does most of the work (thank goodness). Templates and widgets are pretty much the same thing (see discussion at end). It's how you use them that makes a difference. We have a String widget. You could just as well make a string template. It's not natural in template languages to wrap every piece of text in a string template, however. A String widget/template allows you to control the rendering of all fonts dynamically. If the String widget/template sees the incoming request is from IE5+, it doesn't render the font if the font is the same as the default font. The Style widget/template renders the default font in a style if the browser is IE5+. This avoids the stylesheet bugs in all other browsers and gives 90% of your users who are running IE5+ a much lighter weight page. It's cumbersome to wrap all text in string templates, because the calling mechanism is verbose. Most template languages I've looked at only support named parameters. Widgets can have named parameters, e.g.

    String({
        value => ['User.first_name'],
        string_font => 'my_first_name_font',
    });

but it is much more convenient to use positional notation:

    String(['User.first_name'], 'my_first_name_font');

The way I like to think of this is that HTML corresponds to machine language. Templates correspond to assembly language. Widgets correspond to structured programming. You can program everything in assembly language, it's just more cumbersome. This is why people invented macro assemblers, but there is still a significant difference between building a system in C or Macro-11. This is why a layout manager is a natural concept to me. It's a widget which does something with the results of other widgets. What's cool about HTML is that you can do this post draw, i.e., after the widget renders a child, it can look at the result to determine its next action. For example, the String widget can escape the output of its child. I haven't seen this in template languages and rarely in GUI toolkits. Rob
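P.S. To make the "post draw" idea concrete, here is a toy widget (not bOP's Widget API; names are illustrative) that renders its child into a buffer and then transforms the result, in this case by HTML-escaping it:

    package Toy::Widget::Escaped;
    use strict;
    use HTML::Entities ();

    sub new {
        my($proto, $child) = @_;
        return bless({child => $child}, $proto);
    }

    sub render {
        my($self, $buffer_ref) = @_;
        my $child_output = '';
        # Post draw: let the child render first, then operate on its output.
        $self->{child}->render(\$child_output);
        $$buffer_ref .= HTML::Entities::encode_entities($child_output);
        return;
    }

    1;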
Re: schedule server possible?
But I will need a thread that processes the backend stuff, such as maintaining the database and message queue (more like a cron). Is this configuration possible? You can do this now. We rely on cron to kick off the job, but all the business logic is in Apache/mod_perl. The advantage of using cron is that it has rich support for scheduling. Rob
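P.S. A minimal sketch of that split, assuming the job is exposed as an ordinary URI on the mod_perl server (the URI is made up): cron supplies the schedule, and the handler behind the URI does the database and queue maintenance.

    #!/usr/bin/perl -w
    # Invoked from a crontab entry; all the real work stays in Apache/mod_perl.
    use strict;
    use LWP::Simple ();

    my $status = LWP::Simple::getprint('http://localhost/_nightly_maintenance');
    exit($status == 200 ? 0 : 1);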
@DB::args not working on 5.6.1 and 1.26
It seems that DB::args is empty on mod_perl 1.26 and perl 5.6.1. This is stock Red Hat 7.2 (apache 1.3.22). The code which references DB::args works in perl 5.6.1. It also appears that the failure only occurs after the perl restarts. The first time Apache loads mod_perl, DB::args is being set correctly. I assume that DB::args isn't empty running under PERLDB, but I haven't tried this. The use of DB::args is not for debugging, so I can't use Apache::DB. Anybody else seeing this? Thanks, Rob
Re: [OT] Encrypting Embedded URLs
Nigel Hamilton writes: http://www.foo.com?params=aJHKJHKJHKJHHGHFTDTDGDFDFGDGHDHG879879 A built-in checksum would be a bonus ... any ideas? You can use any of the Crypt::CBC ciphers. We then use a modified MIME::Base64 encoding which is more compact than encrypt_hex and doesn't require a subsequent escaping for URI specials. See http://petshop.bivio.biz/src?s=Bivio::MIME::Base64 for the simple algorithm (the error checking hack on MIME::Base64::decode may no longer be necessary with newer versions of MIME::Base64). Rob
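P.S. A sketch of the same idea with stock modules (Bivio::MIME::Base64 has its own variant; this one just swaps the two URI-unfriendly Base64 characters and drops the padding):

    use Crypt::CBC ();
    use MIME::Base64 ();

    my $cipher = Crypt::CBC->new(
        -key    => 'a-server-side-secret',
        -cipher => 'Blowfish',
    );

    sub encode_query_value {
        my($plain) = @_;
        my $b64 = MIME::Base64::encode_base64($cipher->encrypt($plain), '');
        $b64 =~ tr{+/}{-_};   # now safe to drop into a URI unescaped
        $b64 =~ s/=+$//;      # padding is redundant
        return $b64;
    }

    sub decode_query_value {
        my($encoded) = @_;
        $encoded =~ tr{-_}{+/};
        $encoded .= '=' x ((4 - length($encoded) % 4) % 4);
        return $cipher->decrypt(MIME::Base64::decode_base64($encoded));
    }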
Re: Apache::File correction
    undef $/; # enable slurp mode

I think the local is pretty important, especially in mod_perl:

    local $/;

This has the same effect (the undef is unnecessary). It's also a good idea to enclose the code in a subroutine with error checking:

    sub read_file {
        my($file) = @_;
        open(FH, $file) || die("error opening $file: $!");
        local($/);
        my($content) = <FH>;
        close(FH) && defined($content) || die("error reading $file: $!");
        return \$content;
    }

Rob
Re: [OT-ish] Session refresh philosophy
Hans Juergen von Lengerke writes: Why not put everything in one field? Are there restrictions? Does it make a difference when using POST? That's what we do. There doesn't appear to be a restriction with POST. For a while, we were encoding entire forms in URLs, but the limits got to us for really large forms. Rob
Re: [OT-ish] Session refresh philosophy
[EMAIL PROTECTED] writes: Looking at CGI::EncryptForm that Perrin mentioned, it appears that that module would address this concern by storing client-side in a single encrypted string that gets put in one hidden form variable. That also avoids having to verify more than once. It is always good to validate the data even if it was encrypted. It is also generally a good idea not to give the user any secrets, even if they are encrypted. In other words, avoid trusting the user. [EMAIL PROTECTED] writes: No, this just means that input must be validated once again when the last «really, really sure ?» button is depressed. Conceptually, this divides the pages of your site into two categories (not unlike the view vs. controller distinction in Model-View-Controller paradigm for GUIs): those that just interact with the user and do the navigation, and those that actually have side effects such as writing data into your database, sending e-mails, placing orders etc. It is MVC. However, instead of thinking of pages, I like to think in terms of tasks. The same task that renders the form also validates and executes it. In the case of execution, the result is a redirect described by the site's state machine. A form in our world has four states: execute_empty (fill in defaults), execute_ok, execute_other (e.g., cancel or sub form), and execute_unwind (coming back from a sub form). All of these paths go through the same task. Rob
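P.S. A toy sketch of that four-state dispatch (method and accessor names here are made up for illustration; this is not bOP's FormModel interface):

    # Hypothetical names throughout; the real form code decides these
    # states from the parsed form input and the stacked form context.
    sub process_form {
        my($self, $req) = @_;
        return $self->execute_empty($req)   # first rendering: fill in defaults
            unless $req->has_form_input;
        return $self->execute_unwind($req)  # coming back from a sub form
            if $req->is_unwinding;
        return $self->execute_other($req)   # cancel or a sub-form button
            if $req->other_button_pressed;
        return $self->execute_ok($req)      # validated OK: do the work, then redirect
            if $self->validate($req);
        return;                             # validation errors: re-render the form
    }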
Re: [OT] MVC and web design
___cliff rayman___ writes: please take this as interested and not critical. i was viewing the source: http://petshop.bivio.biz/src?s=View.items Criticism welcome. I hope you don't mind the rant below. and i noticed these lines: -- snip -- a ])->put( cellpadding => 2, cellspacing => 2, ), -- snip -- this looks like the presentation layer peeking through. The view components are all presentation. I didn't mention that the framework is actually MVCF, where the F stands for Facade. The server that runs http://petshop.bivio.biz also runs http://www.bivio.biz. The pet shop facade is: http://petshop.bivio.biz/src?s=Bivio::PetShop::Facade::PetShop and the www facade is something different, and not visible from the petshop facade. A facade in bOP controls the entire look and feel. In the case you pointed out, it might be a good idea to put the cellspacing and cellpadding in the facade, too. It was just laziness. the petshop site is obviously a demo, and therefore does not have the polished look of a professional site, which is very understandable. what i wonder is, could a professional web design team make a polished website without involving the programmers? Well, I guess it depends on what you mean by WebSite and programmers. I think of the pet shop as an application, not a WebSite. The same argument would apply for GUI desktop applications. Are you a programmer if you use JBuilder or PowerBuilder? I think so. Are you a programmer if you build a WebSite with ColdFusion or PHP? Again, I think so. If you are a programmer, then you need to know how to program. I don't see anything hard about programming Perl in a constrained environment if you are a website designer/programmer. Structure is important in most WebSites, and all web-delivered applications imho. If you just want to do layout, there are many tools which are much better than an HTML editor, e.g., Photoshop. Once the layout is complete, you give it to coders who encode it in whatever language is best for the application delivery mechanism. what happens when a cell padding of 3 is more desirable for the design? The designer modifies the source in CVS, tests it, and checks it in. it seems to me, that in all of the technologies i have looked at thus far, that attempt to separate the presentation layer from the model/view, the precision and flexibility needed to graphically communicate to the user is more difficult than the standard page design approaches (dreamweaver and a little embperl or other embedded language thrown into the mix). phrased another way, how does bivio or other mvc technology, let web artists design sites as beautiful as http://www.marthastewart.com or the even more beautiful http://www.genwax.com (cheap plug)? <rant> Ah, that is the question. The answer is: beauty is in the eye of the user. I work with a lot of sites at the technical level, and I'm continually amazed at the low quality of the sites from a user perspective. Let's take Martha Stewart (please ;-) and visit your account. For your info, the link is: http://www.marthastewart.com/page.jhtml;jsessionid=4HVBOQCWGUVEHWCKUUXCIIWYJKSS0JO0?type=page&catid=cat688 This is a good example of the business logic creeping into the UI. What do I care if Martha programs in Java? What happens to her users' bookmarks if she switches to C#, or heaven forbid Perl? In bOP, you can have any link you want associated with a task on a per Facade basis. In fact you can have multiple links pointing to the same task. Look at the links in http://www.bivio.com/demo and see if they make sense to you.
We have some pretty advanced users, who take our links and embed them in custom home pages in their files area (which is browsable unlike most groupware sites). We can maintain backward compatibility forever. Now when I come to the page on Martha's site which asks me to log in, it's very pretty and weighs in at 45KB without counting the rose (32KB). I can't log in here, because there are no form fields. The rose is very pretty though (did I say that already?). Many of our users still connect to us with AOL at 26kbps with 60MHz/32MB boxes. They definitely appreciate the fact that most of our pages are under 20KB. Only because our pages are programmed that way. In summary, I buy into the minimalist approach of Nielsen. Visit http://useit.com for more info. Usability is designed, and it takes a lot of time to design and test it. The actual coding part is minuscule in comparison. </rant> Rob
Re: Session refresh philosophy
Milo Hyson writes: shopping-cart-style application (whereby someone selects/configures multiple items before they're ultimately committed to a database) how else would you do it? There has to be some semi-persistent (i.e. inter-request) data where selections are stored before they're confirmed. As I understand it, the session data is state which is committed to the database on each request (possibly). It would seem to me that instead of denormalizing the state into a separate session table, you should just store it in a normal table. If the data needs to be expired, then it can be time stamped when it is written. The point is that it's always simpler to use the existing tables directly rather than making a copy and storing it in the database somewhere else. This usually reduces the code by half or more, because you don't have to worry about making the copy in the first place. Simpler code is more reliable and usually runs faster. To me, sessions are negativist. My expectation is that users will end up clicking OK (making the purchase). If that is the case, you are much better off putting the data where it belongs right from the start. You may bind it to an ephemeral entity, such as a shopping cart, but when the order is complete the only thing you have to do is free the cart and replace it with an order. The items, amounts, and special considerations have already been stored. If most of your users are filling shopping baskets and walking away from them, it may be a problem with the software. Check out http://www.useit.com for some ideas on how to improve the ratio. Often you can avoid any server side persistence by using hidden fields in the forms. We use this technique extensively, and we have encapsulated it so that it is easy to use. For example, you might have a sub form which asks the user to fill in an address. When the user clicks on the "fill in address" button, the server squirrels away the context of the current form in the hidden fields of the address form. When the user clicks OK on the address form, the fields are stuffed back into the original form including the new address. If you have a performance problem, solve it when you can measure it. Sessions can mitigate performance problems, but so can intelligent caching, which avoids statefulness in the client-server protocol. Rob P.S. For sample sessionless sites, visit http://www.bivio.com and http://petshop.bivio.biz (which runs on a 3 year old 300MHz box running Apache and Postgres).
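Here is an illustrative sketch of the hidden-field trick described above, using Storable and MIME::Base64 (assumptions for the example; bOP's encapsulation differs, and a real version should also sign or encrypt the blob):

    use strict;
    use Storable qw(freeze thaw);
    use MIME::Base64 qw(encode_base64 decode_base64);

    # serialize the calling form's state for a hidden field
    sub context_to_hidden {
        my($form_values) = @_;
        return encode_base64(freeze($form_values), '');
    }

    # restore it when the sub form submits OK
    sub hidden_to_context {
        my($hidden) = @_;
        return thaw(decode_base64($hidden));
    }

    my $hidden = context_to_hidden({item => 'Female Puppy Corgi', qty => 1});
    # ... render: <input type="hidden" name="ctx" value="$hidden"> ...
    my $context = hidden_to_context($hidden);
    $context->{ship_to} = '123 Main St';    # merge the sub form's result back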
Re: Session refresh philosophy
Perrin Harkins writes: Actually, even this stuff could be put into a normalized sessions table rather than serialized to a blob with Storable. It just means more work if you ever change what's stored in the session. This is a tough question. If you store it in a blob, you can't query it with an ad hoc SQL query. If you store it in a table, you have to deal with data evolution. On the whole, I vote for tables over blobs. My reasoning is that you have to deal with data evolution anyway. We have had about 200 schema changes in the last two years, and very few of them have had anything to do with user/visitor state. Rob
Re: Session refresh philosophy
Milo Hyson writes: 1) A fix-up handler is called to extract the session ID from a cookie. [snip] 1a) If for some reason no session was found (e.g. no cookie) a new one is [snip] 2) During content-generation, the application obtains the session reference [snip] 3) A clean-up handler is called to re-serialize the session and stick it back I may be asking the wrong question: is there a need for sessions? This seems like a lot of work when, for most applications, sessions are unnecessary. Rob
Re: Mistaken identity problem with cookie
small operations. I'm pretty convinced that the problem is on their end. My theory is that these proxies may have cached the cookie with an IP address which they provide their clients. Have you tried capturing all Ethernet packets and seeing if the raw data supports this conclusion? Check out: http://www.ethereal.com/ We have found that it is the bigger ISPs which have faulty caches. Usually it is a DNS problem, not an HTTP caching problem. Another trick is throwing a time stamp in every cookie. This is useful for other reasons, e.g. cookie expiration and validation. Cheers, Rob
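A small sketch of the time stamp trick under mod_perl 1; the cookie name, format, and $MAX_AGE are made up for illustration:

    my $MAX_AGE = 30 * 60;    # 30 minutes

    sub send_cookie {
        my($r, $value) = @_;
        $r->header_out('Set-Cookie', sprintf('V=%d,%s; path=/', time, $value));
    }

    sub cookie_is_fresh {
        my($r) = @_;
        my($cookie) = ($r->header_in('Cookie') || '') =~ /\bV=([^;]+)/;
        return 0 unless defined($cookie);
        my($stamp) = split(/,/, $cookie);
        return $stamp =~ /^\d+$/ && time - $stamp <= $MAX_AGE ? 1 : 0;
    }

A stale or missing stamp tells you the cookie came from a cache (or a different user) rather than a live client.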
extremeperl@yahoogroups.com
It seems there are a number of people interested in Extreme Programming in Perl, so there's now a Yahoo group at: http://groups.yahoo.com/group/extremeperl/ Cheers, Rob
Re: UI Regression Testing
Hi Craig, Have you ever heard of the hw verification tool Specman Elite by Verisity (www.verisity.com)? No, but it looks interesting. It would be good to have something like this for unit tests. I haven't had very good experience with automated acceptance testing, however. The software should be robust against garbage in, but the main problem we have is making sure the numbers add up, and that we generate the correct tax forms! It's pretty tricky stuff. FWIW, we are very happy with our unit test structure. It has evolved over many years, and many different languages. I've appended a simple example, because it is quite different than most of the unit testing frameworks out there. It uses the XP philosophy of once and only once as well as test what is likely to break. Rob -- #!perl -w # $Id: Integer.t,v 1.7 2001/11/24 04:30:19 nagler Exp $ # use strict; use Bivio::Test; use Bivio::Type::Integer; use Bivio::TypeError; Bivio::Test-unit([ 'Bivio::Type::Integer' = [ get_min = -9, get_max = 9, get_precision = 9, get_width = 10, get_decimals = 0, can_be_zero = 1, can_be_positive = 1, can_be_negative = 1, from_literal = [ ['9'] = [9], ['+9'] = [9], ['-9'] = [-9], ['x'] = [undef, Bivio::TypeError-INTEGER], [undef] = [undef], [''] = [undef], [' '] = [undef], ['-99'] = [undef, Bivio::TypeError-NUMBER_RANGE], ['-09'] = [-9], ['+09'] = [9], ['-9'] = [-9], ['+9'] = [9], ['+10'] = [undef, Bivio::TypeError-NUMBER_RANGE], ['-10'] = [undef, Bivio::TypeError-NUMBER_RANGE], ], ], Bivio::Type::Integer-new(1,10) = [ get_min = 1, get_max = 10, get_precision = 2, get_width = 2, get_decimals = 0, can_be_zero = 0, can_be_positive = 1, can_be_negative = 0, from_literal = [ ['1'] = [1], ['+1'] = [1], ['0'] = [undef, Bivio::TypeError-NUMBER_RANGE], ['11'] = [undef, Bivio::TypeError-NUMBER_RANGE], ['-1'] = [undef, Bivio::TypeError-NUMBER_RANGE], [undef] = [undef], ['-09'] = [undef, Bivio::TypeError-NUMBER_RANGE], ['+09'] = [9], ], ], ]);
Re: UI Regression Testing
Perrin Harkins writes: But what about the actual data? In order to test my $product-name() method, I need to know what the product name is in the database. That's the hard part: writing the big test data script to run every time you want to run a test (and probably losing whatever data you had in that database at the time). There are several issues here. I have answers for some but not all. We don't do complex unit tests. We save those for the acceptance test suite. The unit tests do simple things. I've attached a basic unit test for our DBI abstraction layer. It runs on Oracle and Postgres. Acceptance tests take over an hour to run. We have a program which sets up some basic users and clubs. This is run once. It could be run before each test suite run, but we don't. We have tests which test creating users, entering subscription payments, twiddling files and email. By far the biggest piece is testing our accounting. As I said, we used student labor to write the tests. They aren't perfect, but they catch lots of errors that we miss. Have a look at: http://petshop.bivio.biz/src?s=Bivio::PetShop::Util This program populates the database for our petshop demo. It builds the entire schema, too. The test suite for the petshop will assume this data. The amount of data need not be large. This isn't the point of acceptance testing imo. What you want is enough data to exercise features such as paging, form submits, etc. Our production database is multi-GB. We do have a particularly nasty problem of our quote database. We update all of our quote databases nightly using the same software which talks to our quote provider. This tests the software in real-time on all systems. We run our acceptance test suite in the morning after all the nightly stuff is done. It takes hours to re-import our quote database. You need a test system distinct from your production and development systems. It should be as close in configuration to the production system as possible. It can be very cheap. Our test system consists of a refurb Compaq Presario and a Dell 1300 with 4 disks. We use hardware RAID on production and software RAID on test. Differences like these don't matter. The database source needs to be configurable. Disk is cheap. You can have multiple users (schemata) using the same database host. Our database abstraction allows us to specify the target database vendor, instance, user, and password. Our command line utility software allows us to switch instances easily, and the config module does, too. I often test against my development database at the same time as I compare the same results against the test database. I can do this, e.g. b-petshop -db test create_db All utilities have a '-db' argument. Alternatively, I can specify the user in long hand for the Connection test below: perl -w Connection.t --Bivio::Ext::DBI.database=test All config parameters can be specified this way, or in a dynamically selectable file. This has been by far the biggest obstacle for me in testing, and from Gunther's post it sounds like I'm not alone. If you have any ideas about how to make this less painful, I'd be eager to hear them. It isn't easy. We don't write a unit test per class. Indeed we're far from this. OTOH, we reuse heavily. For example, we don't need to test our product list: http://petshop.bivio.biz/src?s=Bivio::PetShop::Model::ProductList It contains no code, only declarations. All the SQL is generated by the object-relational mapping layer which handles paging, column sorting, and so on. 
The view is as simple: http://petshop.bivio.biz/src?s=View.products Neither of these modules is likely to break, so we feel confident about not writing unit tests for them. Rob -- #!/usr/bin/perl -w use strict; use Bivio::Test; use Bivio::SQL::Connection; my($_TABLE) = 't_connection_t'; Bivio::Test-unit([ Bivio::SQL::Connection-create = [ execute = [ # Drop the table first, we don't care about the result [drop table $_TABLE] = undef, ], commit = undef, { method = 'execute', result_ok = \_expect_statement, } = [ # We expect to get a statement back. [EOF] = [], create table $_TABLE ( f1 numeric(8), f2 numeric(8), unique(f1, f2) ) EOF [insert into $_TABLE (f1, f2) values (1, 1)] = [], ], commit = undef, execute = [ [insert into $_TABLE (f1, f2) values (1, 1)] = Bivio::DieCode-DB_CONSTRAINT, ], { method = 'execute', result_ok = \_expect_one_row, } = [ [update $_TABLE set f2 = 13 where f2 = 1] = [], ], execute_one_row = [ [select f2 from $_TABLE where f2 = 13] =
Re: UI Regression Testing
Gunther Birznieks writes: From the description of your scenario, it sounds like you have a long product life cycle etc. We release weekly. We release to test multiple times a day. We code freeze the test system over the weekend. We run all weekly jobs on test during the day on Sat, and then release to production Sat night. The job testing change was introduced recently. On production, we have a large job which runs on Tues. It also ran on Tues on test. We changed something later in the week in one release, which broke the job, but it wasn't tested. Now, we get that extra assurance of having the weeklies run just before the release. Having an efficient release mechanism is critical. Also, we get paged when something goes wrong on production. With Perl, we can and do patch individual files midweek in critical emergencies. For example, our e-commerce code broke soon after our site went to a pure subscription model. It was fun, because it broke from too many paying customers. $-) Needless to say, we patched the system asap! I think your testing, especially regression testing and the amount of effort you put into it makes a lot of sense because your software is a long-term investment, possibly even a product. Yes, that's an important point. We run the accounting for over 8,000 investment clubs. We have a responsibility to make sure the software is reliable. We released our Petshop demo without any tests. :) To each his own I guess. Agreed. Rob
Re: UI Regression Testing
Have you considered talking about Testing at OSC this summer? Michael Schwern's talk was a great success last summer. Thanks for the suggestion. I'll think about it, and see what I can do. Also writing things down as a doc explaining how things work, with some light examples, to add to our knowledge base would be really cool! Absolutely. If there are other Extreme Perl programmers out there, send me a private email. Rob
Re: performance coding project? (was: Re: When to cache)
This project's idea is to give straight numbers for some definitely bad coding practices (e.g. map() in void context), and things which vary a lot depending on the context, but are interesting to think about (e.g. the last example of caching the result of ref() or a method call). I think this would be handy. I spend a fair bit of time wondering/testing myself. It would be nice to have a repository of the tradeoffs. OTOH, I spend too much time mulling over unimportant performance optimizations. The foreach/map comparison is a good example of this. It only starts to matter (read: milliseconds) at the +100KB and up range, I find. If a site is returning 100KB pages for typical responses, it has a problem at a completely different level than map vs foreach. Rob Premature optimization is the root of all evil -- C.A.R. Hoare
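For example, a repository entry for the void-context map question might be as small as this (a sketch; the numbers will vary by perl version and data size):

    use strict;
    use Benchmark qw(timethese);

    my @items = (1 .. 1000);
    my $sum;

    timethese(-3, {
        # map in void context still builds and discards a result list
        void_map => sub { $sum = 0; map { $sum += $_ } @items },
        foreach  => sub { $sum = 0; $sum += $_ foreach @items },
    });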
Re: UI Regression Testing
Is anyone familiar with how to go about setting up a test suite for a web UI -- without spending an arm and a leg? (Remember, Bricolage is an OSS effort!). Yes, it's very easy. We did this using student labor, because it is an excellent project for students and it's probably cheaper. It's very important. We run our test suite nightly. I'm an extreme programming (XP) advocate. Testing is one of the most important practices in XP. I'm working on packaging what we did so it is fit for public consumption. Expect something in a month or so. It'll come with a rudimentary test suite for our demo petshop app. There are many web testers out there. To put it bluntly, they don't let you write maintainable test suites. The key to maintainability is being able to define your own domain specific language. Just like writing maintainable code, you have to encapsulate commonality and behavior. The scripts should be short and only contain the details pertinent to the particular test. Perl is ideal for this, because you can easily create domain specific languages. Rob
Re: UI Regression Testing
Have you tried webchat? You can find webchatpp on CPAN. Just had a look. It appears to be a rehash of chat (expect) for the web. Great stuff, which is really needed and demonstrates the power of Perl for test scripting. But... This is a bit hard to explain. There are two types of XP testing: unit and acceptance. Unit testing is pretty clear in Perl circles (ok, I have a thing or two to say about it, but not now :-). Acceptance testing (aka functional testing) is traditionally handled by a third party testing organization. The test group writes scripts. If they are testing GUIs, they click in scripts via a session recorder. They don't program anything. There's almost no reuse, and very little abstraction. XP flips testing on its head. It says that the programmers are responsible for testing, not some 3rd party org. The problem I have found is that instead of programming the test suite, XPers script it, using the same technology that a testing organization would use. With the advent of the web, this is a real shame. HTTP and HTML are middleware. You have full programmatic control to test your application. You can't control the web browser, so you still need to do some ad hoc "how does it look" testing, but this isn't the hard part. The acceptance test suite is testing the system from the user's point of view. In XP, the user is the customer, and the customer writes tests. In my opinion, this means the customer writes tests in a pair with a programmer. The programmer's job is to create a language which the user understands. Here's an example from our test suite: Accounting->setup_investment('AAPL'); The user knows what an investment is. She also knows that AAPL is a stock ticker. This statement sets up the environment (using LWP to the app) to execute tests such as entering dividends, buys, sells, etc. The test infrastructure must support the ability to create new language elements with the ability to build elements using the other elements. This requires modularization, and today this means classes and instances. There's also a need for state management, just like the request object in your web application. Part of the packaging process we're going through is making it even easier to create domain specific languages. You actually want to create lots of dialects, e.g. in our case this means investments, cash accounts, member accounts, and message boards. These dialects use building blocks such as logging in, creating a club, and so on. At the bottom you use LWP or webchat. However, the user doesn't care if the interface is HTTP or Windows. Your job as a test suite programmer is meeting her domain knowledge, and abstracting away details like webchat's CLICK and EXPECT OK. In the end, your test suite is a domain knowledge repository. It contains hundreds of concise scenarios comprised of statements, or facts, in knowledge base parlance. The execution of the test suite asserts all the facts are true about your application. The more concise the test language, the more easily the user-tester can verify that she has encoded her expertise correctly. Rob
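As a sketch of what such a dialect can look like under the hood, here is a tiny, hypothetical layer over LWP (the package, URI, and field names are invented; this is not the bivio test framework):

    package My::Test::Accounting;
    use strict;
    use LWP::UserAgent;

    sub new {
        my($proto, $base_uri) = @_;
        return bless({
            ua   => LWP::UserAgent->new(cookie_jar => {}),
            base => $base_uri,
        }, ref($proto) || $proto);
    }

    # a domain-level statement the customer understands
    sub setup_investment {
        my($self, $ticker) = @_;
        $self->submit_form('/accounting/investments/add', {ticker => $ticker});
        return $self;
    }

    # a lower-level building block shared by many dialect elements
    sub submit_form {
        my($self, $uri, $fields) = @_;
        my $res = $self->{ua}->post($self->{base} . $uri, $fields);
        die('unexpected failure: ' . $res->status_line)
            unless $res->is_success || $res->is_redirect;
        return $res;
    }

    1;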
Re: UI Regression Testing
Gunther Birznieks writes: the database to perform a test suite, this can get time consuming and entails a lot of infrastructural overhead. We haven't found this to be the case. All our database operations are programmed. We install the database software with an RPM, run a program to build the database, and program all schema upgrades. We've had 194 schema upgrades in about two years. unit testing being done on the basis of writing a test class for every class you write. Ugh! That means that any time you refactor you throw away 2x the coding you did. By definition, refactoring doesn't change observable behavior. You validate refactorings with unit tests. See http://www.refactoring.com To some degree, there should be intelligent rules of thumb as to which interfaces tests should be written to because the extreme of writing tests for everything is quite bad. Again, we haven't seen this. Every time I don't have unit tests, I get nervous. How do I know if I broke something with my change? Finally, unit tests do not guarantee an understanding of the specs because the business people generally do not read test code. So all the time spent writing the test AND then writing the program AND ONLY THEN showing it to the users, then you discover it wasn't what the user actually wanted. So 2x the coding time has been invalidated when if the user was shown a prototype BEFORE the testing coding commenced, then the user could have confirmed or denied the basic logic. Unit tests aren't about specs. They are about APIs. Acceptance tests need to be written by the user or written so the user can understand them. You need both kinds of testing. See http://www.xprogramming.com/xpmag/Reliability.htm Rob
Re: When to cache
1) The old cache entry is overwritten with the new. 2) The old cache entry is expired, thus forcing a database hit (and subsequent cache load) on the next request. 3) Cache only stuff which doesn't expire (except on server restarts). We don't cache any mutable data, and there are no sessions. We let the database do the caching. We use Oracle, which has a pretty good cache. We do cache some stuff that doesn't change, e.g. default permissions, and we release weekly, which involves a server restart and a refresh of the cache. If you hit http://www.bivio.com , you'll get a page back in under 300ms. There are probably 10 database queries involved if you are logged in. This page is complex, but far from our most complex. For example, this page http://www.bivio.com/demo_club/accounting/investments sums up all the holdings of a portfolio from the individual transactions (buys, sells, splits, etc.). It also comes back in under 300ms. Sorry if this wasn't the answer you were looking for. :) Rob
Re: When to cache
Perrin Harkins writes: To fix this, we moved to not generating anything until it was requested. We would fetch the data the first time it was asked for, and then cache it for future requests. (I think this corresponds to your option 2.) Of course then you have to decide on a cache consistency approach for keeping that data fresh. We used a simple TTL approach because it was fast and easy to implement (good enough). I'd be curious to know the cache hit stats. BTW, this case seems to be an example of immutable data, which is definitely worth caching if performance dictates. However, for many of us caching is a necessity for decent performance. I agree with the latter clause, but take issue with the former. Typical sites get a few hits a second at peak times. If a site isn't returning typical pages in under a second using mod_perl, it probably has some type of basic problem imo. A common problem is a missing database index. Another is too much memory allocation, e.g. passing around a large scalar instead of a reference or overuse of objects (classical Java problem). It isn't always the case that you can fix the problem, but caching doesn't fix it either. At least understand the performance problem(s) thoroughly before adding the cache. Here's a fun example of a design flaw. It is a performance test sent to another list. The author happened to work for one of our competitors. :-) That may well be the problem. Building giant strings using .= can be incredibly slow; Perl has to reallocate and copy the string for each append operation. Performance would likely improve in most situations if an array were used as a buffer, instead. Push new strings onto the array instead of appending them to a string.

    #!/usr/bin/perl -w
    ### Append.bench ###
    use Benchmark;
    sub R () { 50 }
    sub Q () { 100 }
    @array = (" " x R) x Q;
    sub Append {
        my $str = "";
        map { $str .= $_ } @array;
    }
    sub Push {
        my @temp;
        map { push @temp, $_ } @array;
        my $str = join "", @temp;
    }
    timethese($ARGV[0], { append => \&Append, push => \&Push });

Such a simple piece of code, yet the conclusion is incorrect. The problem is in the use of map instead of foreach for the performance test iterations. The result of Append is an array whose length is Q and whose elements grow from R to R * Q. Change the map to a foreach and you'll see that push/join is much slower than .=. Return a string reference from Append. It saves a copy. If this is the page, you'll see a significant improvement in performance. Interestingly, this couldn't be the problem, because the hypothesis is incorrect. The incorrect test just validated something that was faulty to begin with. This brings up "you can't talk about it unless you can measure it." Use a profiler on the actual code. Add performance stats in your code. For example, we encapsulate all DBI accesses and accumulate the time spent in DBI on any request. We also track the time we spend processing the entire request. Adding a cache is piling more code onto a solution. It sometimes is like adding lots of salt to bad cooking. You do it when you have to, but you end up paying for it later. Sorry if my post seems pedantic or obvious. I haven't seen this type of stuff discussed much in this particular context. Besides I'm a contrarian. ;-) Rob
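The DBI timing mentioned above can be as simple as wrapping execute(); a sketch, assuming Time::HiRes and a per-request reset of $DB_TIME (not the bivio encapsulation layer):

    use strict;
    use Time::HiRes qw(gettimeofday tv_interval);

    our $DB_TIME = 0;    # reset this at the start of each request

    sub timed_execute {
        my($sth, @params) = @_;
        my $start = [gettimeofday];
        my $rv = $sth->execute(@params);
        $DB_TIME += tv_interval($start);
        return $rv;
    }

Log $DB_TIME next to the total request time and the caching question becomes a measurement instead of a guess.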
Re: When to cache
When you dig into it, most sites have a lot of data that can be out of sync for some period. Agreed. We run an accounting application which just happens to be delivered via the web. This definitely colors (distorts?) my view. heavy SQL. Some people would say to denormalize the database at that point, but that's really just another form of caching. Absolutely. Denormalization is the root of all evil. ;-) No need to do that yourself. Just use DBIx::Profile to find the hairy queries. History. Also, another good trick is to make sure your select statements are as similar as possible. It is often better to bundle a couple of similar queries into a single one. The query compiler caches queries. Ironically, I am quoted in Philip Greenspun's book on web publishing saying just what you are saying: that databases should be fast enough without middle-tier caching. Sadly, sometimes they just aren't. Every system design decision often has an equally valid converse. The art is knowing when to buy and when to sell. And Greenspun's book is a great resource btw. Rob
RE: Forking another process in Apache?
Chris Hutchinson writes: Avoids much work in httpd, and allows user to hang up web connection and return later to continue viewing status. We used to do this, but found it more complex (more protocols and server types) than simply letting Apache/mod_perl handle the job. I guess this depends on the frequency of long requests, but in our case the mix is handled nicely with a single common server using http as the only protocol. The idea is that all the work is handled by the middle tier. This includes processing incoming mail messages, long running jobs, and credit card processing. There's a lot of common code between all these tasks, so memory is shared efficiently. One trick for long running jobs started by an http request is to reply to the user as normal and do the long part in a PerlCleanupHandler. This avoids a fork of a large process, which keeps the memory usage relatively constant. This simplifies resource allocation. Just another way to do it. Rob
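A minimal sketch of the PerlCleanupHandler trick under mod_perl 1; _generate_big_report() is a hypothetical long-running job:

    package My::LongJob;
    use strict;
    use Apache::Constants qw(OK);

    sub handler {
        my($r) = @_;
        $r->send_http_header('text/html');
        $r->print('<html><body>Your report is being generated.</body></html>');
        # runs after the response has been sent, in the same child process
        $r->push_handlers(PerlCleanupHandler => sub {
            _generate_big_report($r);
            return OK;
        });
        return OK;
    }

    sub _generate_big_report {
        my($r) = @_;
        # hypothetical: build the report and mail it to the user
        return;
    }

    1;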
Re: RFC: Exception::Handler
I'm afraid I don't get it - isn't it what the finally functionality in Error.pm (CPAN) does ? try { stuffThatMayThrow(); } finally { releaseResources(); }; One reason for exceptions is to separate error handling code from the normal control flow. This makes the normal control flow easier to read. If releaseResources() is to be called whenever an exception occurs, then it is advantageous to eliminate the extra syntax in the class's methods and just have releaseResources() called whenever an exception occurs and the object is on the stack. Our exception handling class searches down the stack looking for objects which implement handle_die(). It then calls $object->handle_die($die), where $die is the exception instance. This increases the cost and complexity of exception handling, while decreasing the cost and complexity of normal control flow. It also ensures that whenever the object is involved in an exception, handle_die() is called giving it an opportunity to examine the exception and clean up global state if necessary. This eliminates a lot of explicit try/catches. Well, destructors are of some help too in that issue. Not if the object is a class or if the object is still live, e.g. the request context. We don't do a lot of instance creation/destruction in our code. For example, our Task instances are created at start up. They are executed repeatedly. Tasks decide whether to commit/rollback on every execution, independent of the path through the Task class. I agree with the need for try/catch. That's often the best way to handle exceptions. There are cases where a global view is needed, however. Like Aspects, it ensures that you don't forget or have to put in code where it is absolutely needed. Rob
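A rough sketch of the stack search, for flavor only (Bivio::Die's real code does more); it reuses the @DB::args trick mentioned in an earlier post, and Scalar::Util is an assumption here:

    use strict;
    use Scalar::Util ();

    sub _notify_handlers {
        my($die) = @_;    # the exception instance
        my($i) = 0;
        while (1) {
            # calling caller() from package DB fills in @DB::args for that frame
            my(@frame) = do { package DB; CORE::caller($i++) };
            last unless @frame;
            foreach my $arg (@DB::args) {
                next unless Scalar::Util::blessed($arg) && $arg->can('handle_die');
                $arg->handle_die($die);
            }
        }
        return;
    }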
Re: RFC: Exception::Handler
Matt Sergeant writes: I don't like this for the same reason I don't like $SIG{__DIE__} - it promotes action at a distance. In a 1000 line .pm file I *want* to have my exception catching mechanism next to my eval{} block. You need this flexibility, but Perl allows you to do more, for good reasons. One of the things I don't like about traditional try/catch handling is that it doesn't allow for class level programming. You need to allow any subroutine to try/catch exceptions (die). It's also nice to notify any object in the stack that there is an unhandled exception passing through its code. This eliminates a lot of explicit try/catches. This allows reuse without clutter. If you're familiar with Aspects, it's basically the same concept. Rob
Re: Tips tricks needed :)
By the way, is there a perl module to do calculations with money? We use Math::BigInt to do fixed point. We couldn't get the other math modules to work a few years back. Our wrapper (Bivio::Type::Number) normalizes the rounding and allows subclasses to specify precision, decimals, min, max, etc. It's not fast, but fast enough. :-) It's part of bOP, which is available under the Artistic license from http://www.bivio.biz/hm/download-bOP We don't do too much math in the database, i.e. with PL/SQL and such. One thing we have done which has really helped is to define the sign of all amounts/quantities so that we can use SQL's SUM() function. Our database is normalized, which speeds development and reduces bugs. Using SUM() keeps queries fast (100MS) even processing ~1K rows to produce a portfolio. Cheers, Rob
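A minimal fixed-point sketch along these lines, assuming two decimals; it is not Bivio::Type::Number, just the idea of keeping amounts as integer "cents" in Math::BigInt and formatting only at the edges:

    use strict;
    use Math::BigInt;

    sub from_literal {    # '19.99' -> Math::BigInt 1999
        my($amount) = @_;
        my($sign, $int, $frac) = $amount =~ /^(-?)(\d+)(?:\.(\d{0,2}))?$/
            or die("bad amount: $amount");
        $frac = substr(($frac || '') . '00', 0, 2);
        return Math::BigInt->new("$sign$int$frac");
    }

    sub to_literal {      # Math::BigInt 1999 -> '19.99'
        my($cents) = @_;
        my($s) = $cents->copy->babs->bstr;
        $s = sprintf('%03d', $s) if length($s) < 3;
        return ($cents < 0 ? '-' : '')
            . substr($s, 0, -2) . '.' . substr($s, -2);
    }

    print(to_literal(from_literal('19.99')->badd(from_literal('0.03'))), "\n");   # prints 20.02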
Re: Tips tricks needed :)
Perrin Harkins writes: Okay, wishful thinking. I don't use Class::Singleton, but I have written my own versions of Object::Registrar a few times to accomplish the same goal. Ditto. We use a registry mechanism, too. One thing I don't quite understand is the need to clear out a singleton. Why would a singleton need to hold transient state? Rob
RE: mod_perl vs. C for high performance Apache modules
I spoke to the technical lead at Yahoo who said mod_perl will not scale as well as c++ when you get to their level of traffic, but for a large ecommerce site mod_perl is fine. Scalability has less to do with language/execution environment than which database you are using. Path length is affected by language, but that's usually not the major factor in scalability. You want short path lengths to get more efficiency out of your machines. Rob
Re: form upload limit
There is no such limit in Apache and probably most browsers. By default, LimitRequestBody is 0 (unlimited) in Apache. We limit incoming requests with this directive, so server resources aren't consumed by excessively large requests. I think POST_MAX happens after the request is already read into memory. LimitXMLRequestBody has a default limit of 1000000 bytes. There are other LimitRequest* directives which limit various aspects of the header. Rob
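For example, the relevant httpd.conf lines might look like this (the numbers are illustrative, not recommendations):

    # cap request bodies at 1MB so a runaway upload can't tie up a child
    LimitRequestBody 1048576
    # and keep header abuse in check
    LimitRequestFields    50
    LimitRequestFieldsize 4094
    LimitRequestLine      4094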
Re: ASP.NET Linux equivalent?
Dave Hodgkinson writes: I did an auto-form generator-from-schema thing once. Too many exceptions and meta-data involved to actually make it really worthwhile. Check out the mapping for, e.g. http://petshop.bivio.biz/pub/products?p=FISH and click on Model.ProductList and View.products to see how we handle an automated mapping. We find it extremely convenient. Rob
Re: Persistent HTTP Connections ala Apache::DBI
Has anyone done such a thing before? No doubt. Can someone point me to docs or modules which could help doing this? Perhaps raw sockets might be a consideration. However, Apache is great middleware, so I tend to use it in cases like this. You might want to use a session-based approach between the db-Apache and the app-Apache. The db-Apache would cache the connections to the legacy DB, returning sessions to the app-Apache which would cache them as well. You'd get the performance of cached DB connections without having to ensure the HTTP connections remain alive across app-Apache queries. When a session times out on the db-Apache tier, just rollback (assuming your DB is transactional) and put it in the free pool for new sessions. Or is this whole idea maybe just plain stupid? I don't think so. I assume the DB connection cost is high (on the order of seconds), in which case you need some way to cache connections. Are there obvious caveats I haven't thought of? Garbage collection is an issue. How do you know when to time out (roll back) queries on db-Apache? Are the queries atomic to app-Apache, i.e. within a single end-user HTTP request or do they span multiple end-user requests? (This latter is a good idea, imo.) mfg, Rob
RE: Cookie authentication
If you happen to type in a URL, they can revive your session from the cookie. Pretty nifty trick. This would seem to be a security hole to me. URLs appear in the logs of the server as well as any proxy servers along the way. If the URL contains reusable auth info, anybody accessing any of the logs could gain access to customer accounts. to prevent proxy caches from caching personalized pages and serving them to the wrong end-user. If you want to ensure privacy, use: $r->header_out('Cache-Control' => 'private'); If you want to turn off caching altogether, use: $r->header_out(Pragma => 'no-cache'); Rob
Re: [Maybe OT] Modular design - calling pages like a subroutine with a twist.
When PageA calls PageB, as soon as PageB finishes presenting the form it doesn't stop but drops out the bottom and returns immediately to PageA. In bOP http://www.bivio.net/hm/download-bOP we use FormContext to solve this problem. PageB requires context and bOP knows how to return to PageA through the saved context. We call this unwinding. You can nest the stack as deep as you like. The context is saved in the URL if PageA isn't a form, or in the called form's hidden fields, if it is. The entire form state is saved in the latter case. PageB and PageA are FormModels in bOP. If you visit our Pet Shop demo http://petshop.bivio.net, you'll see form context used in the LoginForm, OrderConfirmationForm, and ShippingAddressForm. Here's all the business logic in our ShippingAddressForm:

    sub execute_ok {
        my($self) = @_;
        # copy the current values into the OrderForm context
        $self->put_context_fields(%{$self->internal_get});
        return;
    }

    sub internal_initialize {
        my($self) = @_;
        my($info) = {
            require_context => 1,
            version => 1,
            visible => [
                'Order.ship_to_first_name',
                'Order.ship_to_last_name',
                'EntityAddress_2.addr1',
                'EntityAddress_2.addr2',
                'EntityAddress_2.city',
                'EntityAddress_2.state',
                'EntityAddress_2.zip',
                'EntityAddress_2.country',
                'EntityPhone_2.phone',
            ],
        };
        return $self->merge_initialize_info(
            $self->SUPER::internal_initialize, $info);
    }

In this case, we get the shipping address from the user, and execute_ok is called, which stuffs the form's values into the calling form's context. The infrastructure automatically unwinds to the OrderForm with the newly filled in values. The OrderForm doesn't know about the ShippingAddressForm. Technically, the ShippingAddressForm doesn't know about the OrderForm. It only requires the calling form to have fields with the same name. The relationship between the pages (tasks in bOP) is not specified by the forms. That's handled by the control logic. If a task has a form, it can specify the next and cancel tasks. This way you can reuse the business logic quite easily. Tasks can control the use of context. FormModels specify whether they can accept it or not. Hope this helps. Rob
Re: [Maybe OT] Modular design - calling pages like a subroutine with a twist.
In my opinion, trying to abstract that stuff away in a web application causes to more problems than it solves, especially where back buttons and bookmarks are concerned. We haven't found this to be the case. Our servers are sessionless, so bookmarks work fine. Back buttons aren't any more or less of a problem. I actually haven't heard of any problems with our sub-forms and back buttons. People do bookmark URLs with form context, but that's a good thing. It usually is the login page and they login and it automatically restores the page which they thought they bookmarked (which redirected to login in the first place). I think it's easier to take a state machine approach, the way CGI::MxScreen or Apache::PageKit do. I don't think this works. The state machine can manage states going forward, but not backward. Consider the problem of a Symbol Lookup on our site (www.bivio.com). We come into it from just about any accounting page having to do with a stock transaction. It's a single task, which looks up the ticker and fills it in in the Calling form. You need to stack the state or you have to introduce N new states (for entry from forms A, B, C, D, ...). It did take about two years to come up with a decent implementation of FormContext. It's a non-trivial problem, but it can be generalized and it solves the problem we had. Rob
Re: [Maybe OT] Modular design - calling pages like a subroutine with a twist.
Perrin Harkins writes: breaks caused by the request model of HTTP, and that's what I was commenting on. You're talking about a way to preserve data across multiple page requests. FormContext maintains an HTTP call stack, which holds the parameters (form, query, path_info) and return address (calling Task). Tasks are work units (server subroutines). URIs are UI elements, which is why we don't store them in the FormContext. If I understand your FormContext approach correctly, you are storing the state of the current application in URLs or hidden fields. This is what we used at eToys as well, and I think it's a pretty common solution. FormContext is a formal stack architecture. The callee can reach into the stack to get or to modify caller's form data as in the ShippingAddressForm case. It also handles the case of a call from a non-form Task, e.g. if you bookmark your private home page on a site, the LoginForm requires context so it knows where to return to after successful authentication. The Login task needs no knowledge of who called it; it just returns to the Task specified in its FormContext. If there is no FormContext, it returns to its next task specified by the state machine. The reason I brought up sessions is that the above mechanism wouldn't work if there were sessions. Sessions might time out or go away for bookmarked pages. FormContext survives server restarts and renaming of the calling page's URI. Rob
Re: http or https in URL?
But how do I get the protocol, http or https? You can check the port on $c->local_addr. 443 is https. Rob
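Under mod_perl 1, $c->local_addr returns a packed sockaddr, so a sketch looks like this (Socket::sockaddr_in does the unpacking):

    use Socket qw(sockaddr_in);

    sub request_protocol {
        my($r) = @_;
        my($port) = sockaddr_in($r->connection->local_addr);
        return $port == 443 ? 'https' : 'http';
    }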
Re: Neo-Classical Transaction Processing
Perrin Harkins writes: The trouble here should be obvious: sooner or later it becomes hard to scale the database. You can cache the read-only data, but the read/write data isn't so simple. Good point. Fortunately, the problem isn't new. Theoretically, the big players like Oracle and DB2 offer clustering solutions to deal with this, but they don't seem to get used very often. Oracle was built on an SMP assumption. They added clustering later. It doesn't scale well, which is probably why you haven't heard of people using their parallel server solutions. I don't know much about DB2, but I'm pretty sure it assumes shared memory. Tandem's Non-Stop SQL is a shared nothing architecture. It scales well, but isn't cheap to walk in the door. Other sites find ways to divide their traffic up (users 1 - n go to this database, n - m go to that one, etc.) Partitioning is a great way to get scalability, if you can do it. However, you can usually scale up enough just by getting a bigger box to run your database on until you reach the realm of Yahoo and Amazon, so this doesn't become an issue for most sites. I agree. This is why I think Apache/mod_perl is a great solution for the majority of web apps. The scaling issues supposedly being solved by J2EE don't exist. On another note, one of the ways to make sure your database scales better is to keep the database as simple as possible. I've seen a lot of solutions which rely on stored procedures to get performance. All this does is make the database slower and more of a bottleneck. But how can you actually make a shared nothing system for a commerce web site? They may not be sharing local memory, but you'll need read/write access to the same data, which means shared locking and waiting somewhere along the line. I meant "shared nothing" in the sense of multiprocessor architectures. SMP (symmetric multiprocessing) relies on shared memory. This is the J2EE/E10K model. "Shared nothing" is the Neo-Classical model. Really these are NUMAs (non-uniform memory architecture), because most servers are SMPs. Here's a classic from Stonebraker on the subject: http://db.cs.berkeley.edu/papers/hpts85-nothing.pdf DeWitt has a lot of papers on parallelism and distributed db design: http://www.cs.wisc.edu/~dewitt/ Cheers, Rob
Neo-Classical Transaction Processing (was Re: Excellent article...)
Joe Schaefer writes: experience, the only way to build large scale systems is with stateless, single-threaded servers. ^^ Could you say some more about what you mean by this? Do you mean something like use a functional language (like Haskell or Scheme), rather than an imperative language (like C, Java, Perl ...), Not exactly, but this is an interesting topic. or are you talking more about the application's platform and design (e.g. http://www.kegel.com/c10k.html )? This article addresses path length, which is the single-threaded part. Scalability is not addressed. Both parts are important to understand when you build enterprise systems. I changed the subject to Neo-Classical Transaction Processing, which is the way I look at web applications. If you'll bear with me, I can explain the Neo and Classical parts with a picture. Here's a classical transaction processing system:

    Terminals --> Transaction Monitor --> Custom Servers --> DB

Now here's a typical (large) Apache/mod_perl setup (Neo-Classical):

    Browsers --> Apache mod_proxy --> Apache mod_perl --> DB

The browsers are connected to a fast IP router(s), equivalent to yesteryear's I/O processor. The mod_proxy servers are simple switches, just like the TM. (Unlike the TM, front-ends don't manage the transactions.) It's usually a given that the front-ends are stateless. Their job is dynamic routing for load-balancing and reliability. They also serve icons and other stateless files. If a front-end crashes, the IP router ignores it and goes to another front-end. No harm done. The IP router also balances the load, something that isn't provided by classical I/O processors. The mod_perl servers are the work horses, just like the custom servers. In a classical OLTP system, the custom servers are stateless, that is, if a server goes down, the TM/mod_proxy server routes around it. (The TM rolls back any transactions and restarts the entire request, which is interesting but irrelevant for this discussion.) If the work servers are fully loaded, you simply add more hardware. If all the servers are stateless, the system scales linearly, i.e. the number of servers is directly proportional to the number of users that can be served. That's the stateless part. Threading is the other issue. Should the servers (mod_proxy or mod_perl) be threaded? In classical OLTPs, the work servers are single threaded (as in one request at a time) and the TM handles multiple simultaneous requests, but isn't multi-threaded (in the Java sense). The work server can be thought of as a resource unit. Usually it represents a fair bit of code and takes up a chunk of memory. It can only process so many requests per unit time. If the work server is multi-threaded, it is harder to manage resources and configure for peak load. In the single threaded model, each work server (process) is a reservation for the resources it needs for one request. In a multi-threaded model, the resource reservations are less clear. It might have a shared database connection pool or it might have two simultaneous requests which need more memory. The meaning of capacity becomes fuzzy. If the whole multi-threaded server ever has to wait on a single shared
Re: Excellent article on Apache/mod_perl at eToys
is easier and more standardized, and well documented. But I feel like coding front-end web applications is much easier in Perl where the workflow bits change all the time. For this, I like using SOAP on the backend Java server and SOAP on the front-end Perl. I don't quite understand the difference between workflow in the front-end and workflow in the back-end. They both change. The danger of making one part of the system easier to change is that people tend to cheat. They won't put the business logic in the back-end if it takes twice as long. To me, the major issue in Perl vs Java is dynamic vs static typing. Building large scale systems in Perl is much like building them in Smalltalk or Lisp. It takes a certain mindset. The lack of compiled interfaces means you need much more discipline (e.g. unit testing). The payoff is big with Perl, because you can refactor more easily and quickly than in Java. The libraries aren't much of an issue. A good example is SOAP. SOAP is middleware. It is standardized, documented, and the rest of it. You like it for connecting Perl to Java, but why can't it be the other way around? If it can be the other way around, why aren't Perl and Java equally adapted to building middleware applications? Rob
Re: Excellent article on Apache/mod_perl at eToys
Gunther wrote: If you do not have a strongly typed system, then when you break apart and rebuild another part of the system, Perl may very well not complain when a subtle bug comes up because of the fact that it is not strongly typed. Whereas Java will complain quite often and usually early with compile time checking. I don't think there's an objective view about this. I also think the it compiles, so it works attitude is dangerous. You don't know it works until your unit and acceptance tests pass. I've been in too many shops where the nightly build was the extent of the quality assurance program. Compile time checking can definitely be a friend of yours especially when dealing with large systems. But it's also a friend that's judgemental (strongly typed) so he's a pain to drag along to a party To me, strongly vs weakly typed is less descriptive than statically vs dynamically typed. For example, Java is missing undef. It has NULL for pointers, but not undef for ints, chars, booleans, etc. Large systems often have unexpected initialization order problems which are not handled well by Java due to this missing feature. Java's support for multi-threading makes writing servers feel fairly trivial with no jumping through IPC::Shared memory stuff hoops to get shared memory caches and the like.. you just synchronize on global data structures. It's important to define the problem space for this discussion. I think Perl is really good for information systems, be they enterprise or not. I probably wouldn't program a real-time system in Perl. I might program it in Java. Here's a strong statement: Threads have no place in information systems. The NYSE is run on Tandem boxes. Tandem's OS does not have threads. The NYSE can process over a billion stock transactions a day. The EJB spec says you can't fire off threads in a bean. I think there's a reason for the way these systems have been architected. Threads are a false economy for systems which have to scale. As some people have joked, Java is Sun's way of moving E10K servers. SMP doesn't scale. As soon as you outgrow your box, you are hosed. A shared memory cache doesn't work well over the wire. In my experience, the only way to build large scale systems is with stateless, single-threaded servers. Rob
Re: Selectively writing to the access log
I only see methods for writing to the error log. I don't think you can change the access log format, but you can modify the values. For example, you can set $c->user and $c->remote_ip. Rob
Re: Selectively writing to the access log
Usage: Apache::the_request(r) This means the sub Apache::the_request takes a single parameter, i.e. you can't modify the_request. You can modify the method and uri. You can't modify the protocol (HTTP/1.0). If you change method or uri, it doesn't change the_request. You can change your LogFormat to get these values--see http://httpd.apache.org/docs/mod/mod_log_config.html Rob
Re: apache::dbi vs mysql relay
What I don't understand is why they separate the listener and database connection daemons if you always need one of each to do anything. Probably for scalability. The database engines are doing the work and the sooner they can free themselves up (due to a slow client, for example), the better. Rob
Re: Mod_perl component based architecture
As for the remainder of the question, I've been wondering for myself if there is an MVC (model-view-controller) framework for WWW publishing in Perl? I gather there exist quite a few for Java, but I couldn't find anything significant under Perl. Check out http://www.bivio.net/hm/why-bOP and http://petshop.bivio.net The former motivates the MVC architecture. The latter URL is a demo of Sun's J2EE blueprint Pet Store implemented using bOP, a Perl application framework. It's freeware and we use it to run a large commercial website. When you visit petshop.bivio.net, at the bottom of the page, you'll see "Control Logic for This Page". This is what the bOP agent (controller) uses to determine if the incoming user can access the page, what the page actually does (models and views), and any state transitions (form next or cancel). The links at the bottom of the page go to the source of this application. bOP also allows you to change the look-and-feel quite easily. Compare these two pages: http://www.bivio.com/club_cafe/mail-msg?t=1934163 http://ic.bivio.com/club_cafe/mail-msg?t=1934163 They render the same content, but in two entirely different contexts. Each look-and-feel is described in a single file, which contains color, font, URL, text, and view mappings. bOP is about 250 classes including the Pet Shop demo. It uses Oracle or Postgres, but it should be easy to port to other databases. You can also build a static site, e.g. http://www.bivio.net which doesn't require a database. <SOAPBOX> The J2EE architecture implements MV, not MVC imho. Here's one of my favorite quotes from Sun's site: "It is important to understand that Model, View, and Controller are usually not represented by individual classes; instead, they are conceptual subdivisions of the application." This is true for J2EE, but not true for MVC frameworks. J2EE's control flow is not a distinct element. JSPs are usually full of business logic. The whole MVC concept passed J2EE by. Even when you look at the Model 2 Methodology (promoted by Apache Jakarta Turbine), the code is a mess. Here's a snippet from the reference article on Model 2:

    public void doPost (HttpServletRequest req, HttpServletResponse res)
            throws ServletException, IOException {
        HttpSession session = req.getSession(false);
        if (session == null) {
            res.sendRedirect("http://localhost:8080/error.html");
        }
        Vector buylist = (Vector) session.getValue("shopping.shoppingcart");
        [...]
        if (!action.equals("CHECKOUT")) {
            if (action.equals("DELETE")) {
                [...]
                String url = "/jsp/shopping/EShop.jsp";
                [...]
                String url = "/jsp/shopping/Checkout.jsp";

The excerpt is from a single method in which they mix sessions, port numbers, hosts, error pages, URLs, button values, etc. The JSP is no better and contains lines like:

    <option>Yuan | The Guo Brothers | China | $14.95</option>
    <b>Quantity: </b><input type=text name=qty SIZE=3 value=1>
    <input type=submit value=Delete>
    <input type=hidden name=action value=DELETE>

Note that "DELETE" in the JSP must be the same as "DELETE" in the Java. Nothing is checking that. You only know that the code doesn't work when someone hits the page. In this particular example, if you misspell DELETE in either place, the code does something, and doesn't issue an error. So much for Model 2. I wonder what Model 3 will be like. ;-) Sorry, had to get that off my chest... </SOAPBOX> Cheers, Rob