No problem.

BTW, it has nothing to do with the IO:: modules - I was just using that as an example illustrating the degrees of seriousness when considering namespace requests. As in, if I *were* creating an IO:: module, namespace usage would be more closely scrutinized than my Parallel::Simple module likely will be. That is, I don't expect to have to defend my module's name this time, but while we're on the topic of best practices, I thought I'd ask for future reference. This is actually very relevant to me, since I'm also working on another module right now for which I would like a new top-level namespace, but I won't get into that now.

Here's the POD for my new Parallel::Simple module:

NAME
      Parallel::Simple - the simplest way to run code blocks in parallel

SYNOPSIS
       use Parallel::Simple qw( prun );

       # Style 1 - Simple List of Code Blocks
       prun(
           sub { print "$$ foo\n" },
           sub { print "$$ bar\n" },
       ) or die( Parallel::Simple::errplus() );

       # Style 1 with options
       prun(
           { use_return => 1 },
           sub { print "$$ foo\n" },
           sub { print "$$ bar\n" },
       ) or die( Parallel::Simple::errplus() );

       # Style 2 - Named Code Blocks (like the Benchmark module)
       prun(
           foo => sub { print "$$ foo\n" },
           bar => sub { print "$$ bar\n" },
       ) or die( Parallel::Simple::errplus() );

DESCRIPTION
I generally write my scripts in a linear fashion. Do A, then B, then
C. However, I often have parts that don't depend on each other, and
therefore don't have to run in any particular order, or even linearly.
I could save time by running them in parallel - but I'm too lazy to
deal with forking, and reaping zombie processes, and other nastiness.


The goal of this module is to make it so mind-numbingly simple to run
blocks of code in parallel that there is no longer any excuse not to do
it, and in the process, drastically cut down the runtimes of many of
our applications, especially when running on multi-processor servers
(which are pretty darn common these days).


Parallel code execution is now as simple as calling prun and passing it
a list of code blocks to run, followed by testing the return value for
truth using the common "or die" perl idiom.


EXPORTS
      By default, Parallel::Simple does not export any symbols.  You
      generally want to import the prun method to save time:

          use Parallel::Simple qw(prun);
          prun( ... );

      But you don't have to:

          use Parallel::Simple;
          Parallel::Simple::run( ... );

The run method is a synonym for prun, and exists for people who don't
want to export any symbols, because Parallel::Simple::run() looks nicer
than Parallel::Simple::prun().


METHODS
      All of the following may be called as class methods:

          Parallel::Simple->run()

      Or as normal subroutines:

          Parallel::Simple::run()

      run Synonym for prun.

prun
Runs multiple code blocks in parallel by forking a process for each
one and returns when all processes have exited.


          Style 1 - Simple List of Code Blocks

In its simplest form (which is what we're all about here), prun
takes a list of code blocks and then forks a process to run
each block. It returns true if all processes exited with exit
value 0, false otherwise. Example:


                  prun(
                      sub { print "$$ foo\n" },
                      sub { print "$$ bar\n" },
                  ) or die( Parallel::Simple::errplus() );

By default, the exit value will be 255 if the code block dies
or throws any exceptions, or 0 if it doesn't. You can exercise
more control over this by using the use_return option (documented
below) and returning values from your code block.


If prun returns false and you want to see what went wrong, try
the err, errplus, and rv methods documented below - especially
rv which will tell you the exit values of the processes that
ran the code blocks.
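
To make the behaviour described above concrete, here is a rough sketch of the fork-per-block idea using only core Perl. It is an illustration under stated assumptions, not the module's actual code: prun_sketch is a hypothetical name, it handles only Style 1, and it returns the exit values directly instead of storing them for rv.

```perl
use strict;
use warnings;

# Hypothetical stand-in for prun (Style 1 only); not the module's real code.
sub prun_sketch {
    my @blocks = @_;
    my %index_of;       # child pid => position of its code block
    my @exit_values;    # exit value of each block, in argument order

    for my $i ( 0 .. $#blocks ) {
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ( $pid == 0 ) {
            # Child: run the block; exit 0 on success, 255 if it died.
            my $ok = eval { $blocks[$i]->(); 1 };
            exit( $ok ? 0 : 255 );
        }
        $index_of{$pid} = $i;    # parent remembers which block this is
    }

    # Parent: reap every child so no zombies are left behind.
    while (%index_of) {
        my $pid = wait();
        last if $pid == -1;                    # no children left
        next unless exists $index_of{$pid};    # not one of ours
        $exit_values[ delete $index_of{$pid} ] = $? >> 8;
    }

    # True only if every block has a recorded exit value of 0.
    my $failed = grep { !defined $_ || $_ != 0 } @exit_values;
    return ( $failed == 0, \@exit_values );
}

my ( $ok, $rv ) = prun_sketch(
    sub { print "$$ foo\n" },
    sub { die "boom\n" },
);
print "all succeeded? ", ( $ok ? "yes" : "no" ), "; exit values: @$rv\n";
```

The second block dies, so the sketch reports failure and records 255 for it, which mirrors the default behaviour described above.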


          Style 2 - Named Code Blocks

Alternatively, you can specify names for all of your code
blocks by using the common "named params" perl idiom. The only
benefit you get from this currently is an improved lookup
method for code block return values (see rv for more details).
Example:


                  prun(
                      foo => sub { print "$$ foo\n" },
                      bar => sub { print "$$ bar\n" },
                  ) or die( Parallel::Simple::errplus() );

Other than looking nicer, this behaves identically to the Style
1 example.


          Options

You can optionally pass a reference to a hash containing additional
options as the first argument. Example:


                  prun(
                      { use_return => 1 },
                      sub { print "$$ foo\n" },
                      sub { print "$$ bar\n" },
                  ) or die( Parallel::Simple::errplus() );

              There is currently only one option:

use_return
By default, the return values for the code blocks, which
are retrieved using the rv method, will be 0 if the code
block executed normally or 255 if the code block died or
threw any exceptions. By default, any value the code block
returns is ignored.


If you use the use_return option, then the return value of
the code block is used as the return value (unless the code
block dies or throws an exception, in which case the return
value will still be 255). This value is passed to the exit
function, so please please please use only numbers between 0
and 255!
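
The reason for the 0 to 255 limit is that, on typical Unix systems, only the low 8 bits of the value passed to exit reach the parent process. The core-Perl sketch below shows the effect. observed_exit_value is a hypothetical helper written for this illustration and is not part of Parallel::Simple.

```perl
use strict;
use warnings;

# Hypothetical helper: fork a child that exits with $value and report
# the exit value the parent actually observes.
sub observed_exit_value {
    my ($value) = @_;
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    exit($value) if $pid == 0;    # child: exit immediately
    waitpid( $pid, 0 );           # parent: wait for that child
    return $? >> 8;               # high byte of $? holds the exit value
}

# Values above 255 wrap around, so 256 is indistinguishable from success.
printf "%3d -> %d\n", $_, observed_exit_value($_) for 0, 1, 255, 256, 257;
```

A block that returns 256 under use_return would therefore look like a success, which is why out-of-range values are worth avoiding.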


      err Returns a string describing the last error that occurred, or undef
          if there have not yet been any errors.

          Currently, only two error messages are possible:

          *   if the call to fork fails, err returns the contents of $!

          *   if any blocks fail, err returns a message describing how many
              blocks failed out of the total

rv Returns different value types depending on whether or not you used
named code blocks:


Style 1 (not using named code blocks)
returns a reference to an array containing the return values of
the code blocks in the order they were passed to run


Style 2 (using named code blocks)
returns a reference to a hash, where keys are the code block
names, and values are the return values of the respective code
block


          See the use_return option for the run method for more details on
          how to control return values.

      errplus
          Returns a string containing the return value of err plus a nicely
          formatted version of the return value of rv.

PLATFORM SUPPORT
This module was developed and tested on Red Hat Linux 9.0, kernel
2.6.11, and perl v5.8.4 built for i686-linux-thread-multi. I have not
tested it anywhere else. This module is obviously limited to platforms
that have a working fork implementation.


      I would appreciate any feedback from people using this module in
      different environments, whether it worked or not, so I can note it here.

FUTURE
The world could probably use a thread-based version of the prun method.
I'm currently accepting applications from volunteer coders. :)


SEE ALSO
Parallel::ForkControl, Parallel::ForkManager, and Parallel::Jobs are
all similarly themed, and offer different interfaces and advanced
features. I suggest you skim the docs on all three (in addition to mine)
before choosing the right one for you. Or you can foolishly trust the
executive summaries below:


      Parallel::ForkControl

Only takes one subroutine reference to run, but provides wonderful
ways to control how many children are forked and to keep activity
below certain thresholds. Arguments to the run() method will be
passed on to the subroutine you specified during construction, so
there's some run-time flexibility. It is not yet possible to learn
anything about what happened to the forked children, such as
inspecting return or exit values.


Conclusion: Best for repetitive, looped tasks, such as fetching web
pages or running a command across a cluster of machines in parallel.


          Incidentally, Parallel::ForkControl would be far more useful with
          the following two changes:

          o   Support some kind of feedback - return/exit values at a
              minimum, or even a single value summary, like the return
              value of my prun method.

          o   Allow the user to specify the Code value in the run method
              instead of during construction.  Then Parallel::ForkControl
              could do everything this module does and more, albeit with a
              more sophisticated interface.

      Parallel::ForkManager

Unique in the Parallel::* world in that it keeps the user somewhat
involved in the forking process. Rather than taking a code reference,
you call the start() method to fork and test the return value
to determine whether you are now the parent or child... almost like
just calling fork yourself. :)


Provides control over how many child processes to allow, and blocks
new forks until some previous children have exited. Lets the child
determine the process exit value. Provides a trigger mechanism to
run callbacks when certain events happen (child start, child exit,
and start blocking). You must supply a callback for the child exit
event to inspect the exit value of the child.


Conclusion: While also designed for repetitive, looped tasks, it is
far more flexible, being a thin wrapper around fork rather than
taking over child creation and management entirely. Useful mostly
if you want to limit child processes to a certain number at a time
and/or if the native system calls scare you.


      Parallel::Jobs

Different in that it executes shell commands as opposed to
subroutines or code blocks. Provides all the features of the open3
function, including explicit control over STDIN, STDOUT, and STDERR
on all 'jobs'. Lets you monitor jobs for output and exit events (with
associated details for each event).


Conclusion: Great for shell commands! Not great for non-shell
commands.


AUTHORS
      Written by Ofer Nave <[EMAIL PROTECTED]>.  Sponsored by Shopzilla,
      Inc. (formerly BizRate.com).

COPYRIGHT
      Copyright 2005 by Shopzilla, Inc.

This program is free software; you can redistribute it and/or modify it
under the same terms as Perl itself.


      See http://www.perl.com/perl/misc/Artistic.html



-ofer

Lincoln A. Baxter wrote:

Hello Ofer,

Motivate us!

Tell the list why we should look at it.  What does it do? How does it
solve a problem that is not already solved, or solve it better?

I get the sense from the brief comment you made about IO:: that it has
to do with some mechanism for implementing parallel IO?


Pasting the POD is often a good way to do this.

Lincoln




On Sun, 2005-02-27 at 16:28 -0800, Ofer Nave wrote:


Hello everyone.

I just subscribed to this list, I just recently received my PAUSE account, and I just finished writing/documenting/testing the first perl module that I've written for CPAN. I'd like to know what is considered a good set of practices for new modules with regards to:

1) naming
2) requesting feedback on design/implementation
3) announcing the module
4) informing authors of similar modules

In this case, I've already named the module Parallel::Simple, but I imagine things would be trickier if I wanted to call it IO:: something or create a new top-level namespace.

Here's the module, if anyone is interested:

   http://ofernave.com/pm/

I've already sent copies to my local perl mongers group and posted it on comp.lang.perl.modules, and I've gotten a little bit of feedback on it. I'm comfortable with the design as it stands right now, and only need to write some formal tests before it will be ready for its first upload.

-ofer








