Before joining in the previous CPAN threads, here are some personal wish
lists regarding what the perl6 version of CPAN should do. But in order
to get some distance from CPAN, I want to call it the Module Library
system. Some of the debate threads impact on the internal software
environment, and here I think the internal and external environments
must be designed in harmony.
A library paradigm is different from an archive paradigm, although a
library is a form of archive. However, a library is a single place to go
to obtain information. It is set up to enhance searching and finding. An
archive is a place where things are stored; the users of an archive know
what and where they are storing. Hence the difference between an archive
and a library is a set of assumptions about what a user knows. Archives
are essential for large libraries (often called stacks), but there must
be standardised metadata about what is in each archive for it to be
useful for a library.
So my wish list. I would like:
1) a single place to go for modules that will help me solve programming
problems;
2) a variety of ways to look through modules and the information on them
- reviews, documentation, dependencies - so that I can choose between
them and know what a choice entails;
3) a variety of classifications of modules and module contents that can
lead me to modules which may achieve what I want but are named in a
manner I did not expect, or contain functionality I could use in another
context;
4) a means to extract from the library the module(s) I want, (better
still only the parts of the module) and all their dependencies;
5) the ability to access existing CPAN perl software.
Extending the library paradigm, there are generalist libraries, which
only contain commonly requested books, and specialist libraries and also
libraries in other languages. However, good libraries share systems for
requesting books from other libraries, even if they are not stored
locally. The difference between libraries then boils down to the
metadata available to the user, eg., more accessible user catalogs,
librarians who know about subjects and can point to related books, etc.
The library paradigm breaks down somewhat for software because there is
an intimate link between "my" environment and the software I choose.
Hence, it is important to consider the internal environment ("my"
computer) as well as the external environment (the module libraries).
Not only is there the problem of different operating systems (*ix,
windows*, mac, etc), there is also the possibility of outdated software
installed, eg., different versions of languages, such as perl 5.8, perl
5.10.
Within "my" environment, there are multiple possible locations for
different aspects of software, eg. binaries, configuration data (which
can be global to all users and local for a single user), documentation,
data, results. In a corporate environment, these locations may exist
across a network. For example, documentation may be on a server,
binaries and local configuration data on the local computer, results
posted to a web site, input data taken from a database.
My wish list for the internal environment; I would like:
1) a single system to manage all my software modules;
2) a method to see where different aspects of software are located, and
to change them;
3) an ability to manage in parallel different dependency hierarchies
(such as might result from a need to use modules from different
"generations", eg. perl 5.8 and 5.10);
4) a comprehensive method to determine what my local environment is, so
that I can see whether it "matches" the environment required by software
that I want to download, and if not, what I need in addition to be able
to run the software.
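To make wish 4 a little more concrete, here is a minimal sketch, in Python purely for illustration, of comparing a description of "my" environment with the environment a module says it requires. The record fields (os, language, version, min_version, modules) and the function names are my own assumptions, not part of any existing CPAN or perl6 tool.

```python
def dotted(version):
    """Render a version tuple such as (5, 10) as '5.10'."""
    return ".".join(str(part) for part in version) if version else "unknown"


def missing_requirements(local, required):
    """Return a list of mismatches; an empty list means the environment matches.

    Both arguments are plain dicts with hypothetical keys:
    'os', 'language', 'version' / 'min_version' and 'modules'.
    """
    problems = []
    # Exact-match fields: operating system and language.
    for key in ("os", "language"):
        wanted = required.get(key)
        if wanted is not None and local.get(key) != wanted:
            problems.append(f"{key}: need {wanted}, have {local.get(key)}")
    # Minimum language version, compared as tuples so 5.10 > 5.8.
    minimum = required.get("min_version")
    have = local.get("version")
    if minimum is not None and (have is None or tuple(have) < tuple(minimum)):
        problems.append(f"version: need >= {dotted(minimum)}, have {dotted(have)}")
    # Modules that must already be installed.
    installed = set(local.get("modules", ()))
    for module in required.get("modules", ()):
        if module not in installed:
            problems.append(f"module: need {module}")
    return problems


local = {"os": "linux", "language": "perl", "version": (5, 8),
         "modules": {"Text::CSV"}}
required = {"os": "linux", "language": "perl", "min_version": (5, 10),
            "modules": ["Text::CSV", "DBI"]}
for problem in missing_requirements(local, required):
    print(problem)
```

The point is only that the answer to "does it match?" can also say what is needed in addition, which is what I would want to see before downloading anything.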
Some commentary on the above:
I use the CPAN archive, CPANPLUS, and synaptic (a GUI frontend to the
apt tools) for perl since I use Ubuntu/Gnome. In some cases I have to
load binaries using synaptic because I can't get the sources to compile
properly from CPAN. I do not have the time or inclination to discover
why. The result is a mishmash of dependencies. Moreover, I cannot get
updated software without breaking something. This occurred when one
Ubuntu release was distributed with perl5.8 even though perl5.10 was
out, and I really wanted to use some of the new software.
The result was that I had both 5.8 and 5.10 modules. Although most
things worked, perldoc wasn't recognised by some Configure scripts. I am
not interested now in discovering the reason. I use this simply as an
example about how multiple distribution channels that make different
assumptions can lead to problems. But because there is no single method
for monitoring the perl environment, it is difficult to solve some of
the problems.
The current setup works extremely well, except when it doesn't. My
difficulty is finding out how to solve the problems that occur. So I
think there should be a system that normally works "out-of-the-box" with
no user interference, but tools that can be used to identify and change
a configuration when needed.
The source vs binary debate:
I would suggest that most users would prefer to have precompiled
binaries that work out of the box for common environments. The problem
of matching the software to an environment is a bit like getting a spare
part for a car, or a replacement for something broken at home. The
possible infinity of choices given multiple manufacturers and systems is
solved by having standards and selling parts that will match standards,
eg. tyres or bolts with standard diameters.
Once a module has been decided on, you look to see if there is a binary
that matches your internal environment. If not, you have to roll your
own from source.
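The selection step just described could look something like the following sketch (Python, for illustration only). The artifact records with 'kind' and 'target' fields are hypothetical; a real archive would need an agreed standard for them.

```python
def choose_artifact(artifacts, environment):
    """Prefer a binary whose declared target matches the environment.

    Falls back to the first source artifact, or None if there is neither.
    Each artifact is a dict with a 'kind' ('binary' or 'source') and, for
    binaries, a 'target' dict of environment keys that must match exactly.
    """
    for artifact in artifacts:
        if artifact["kind"] != "binary":
            continue
        target = artifact.get("target", {})
        if all(environment.get(key) == value for key, value in target.items()):
            return artifact
    # No matching binary: roll your own from source, if source is offered.
    for artifact in artifacts:
        if artifact["kind"] == "source":
            return artifact
    return None


artifacts = [
    {"kind": "binary", "name": "mod-win32", "target": {"os": "windows"}},
    {"kind": "binary", "name": "mod-linux", "target": {"os": "linux"}},
    {"kind": "source", "name": "mod-src"},
]
print(choose_artifact(artifacts, {"os": "linux"})["name"])
print(choose_artifact(artifacts, {"os": "mac"})["name"])
```

The matching rule here is deliberately naive (exact equality on every declared key); the standards discussed below would decide what a binary actually has to declare.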
It would be a function of the library administration to set rules as to
the standards that a binary needs to meet to be filed in some category.
Photographs vs software:
The more flexible the design, the longer lasting. Libraries can contain
books on religion and pornographic magazines. The design of a library
search or storage system does not define what can be put in it.
There is a place for strong views about what should be in a library (eg.
no pictures, only pure perl6). In terms of library design, the question
should be "how can the user discover, in general, what a library
contains". If you don't like the contents of a library, use another.
I believe it would be useful to have multiple libraries, each of which
has a "mission statement" that would define what sort of information it
contains. Hence there could be libraries dedicated to windows-only
binaries, to perl6 modules, or to perl6/Ruby/Python interacting software.
If the system proves to be flexible, then the same library design and
management software could be applied to photographs, food recipes, etc.
The worldwide web was designed for scientific papers with embedded
links, yet the design was so flexible that it has changed the structure
of our informational world.
Names and implicit assumptions:
There was a thread about CPAN6, 6PAN, Pause6, etc. It seems to me that
the conflicting ideas underlying these names concern the user paradigm,
that is, how someone uses the entire system.
I believe that we might get more traction if we start by looking first
at the user paradigm, then different ways to provide the functionality.
I know that many of my wishes are fulfilled by the more detailed proposals.
However, the proposals seem to assume what the whole system should do.
The idea behind this post is to make the user paradigm clearer, so that
the strengths and weaknesses of different approaches can be compared.
One way of breaking out parts of the whole system might be:
a) Internal module management
(for example)
- provides to a perl program the physical locations for settings in %*ENV
eg. 'use MyModule::Submodule;'
might normally lead to (in unix)
'/usr/local/share/lib/perl6/site/MyModule/Submodule.pm'
but it could also be
'/home/project_10/version_15/assistant_junior_hacker/Submodule.test'
or 'http://www.project_10.design.library/MyModule/Submodule.pm'
or 'select "MyModule_SubModule" from "standard_modules" where
language="perl6" '
- allows the user to set an arbitrary environment variable, eg.
%*ENV<April_Dataset> so that inside the program it is possible to do
my $data = slurp(%*ENV<April_Dataset>);
without worrying where the binary is running, eg., in PADRE, or called
from the directory where the file is present. Also, the program could be
agnostic about whether it is being run on Windows and the path
definition is 'MyDocuments\UserData\April_Dataset.csv' or unix
'/home/hacker/perl6/April-09.DataSet.current'
- defines how software should be distributed in the system, eg. binaries
in /usr/local/lib/perl6/site, documentation in
/usr/local/doc/perl6/site, config files in /usr/local/etc/perl6/site,
source files in /usr/local/src/perl6/site
- handles downloads from archives
- knows how to build binaries from src and install and test them.
- knows how to define the local environment taking into account modules
that may be at a physical location not on the local computer.
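As a rough illustration of the first two points under a), the sketch below (Python, illustrative only) turns a module name into candidate physical locations under a list of configured roots, and looks up a named resource per platform. The function names and the shape of the settings table are assumptions of mine; the paths are the ones used as examples above.

```python
def candidate_locations(module_name, roots, extension=".pm"):
    """Map 'MyModule::Submodule' to one candidate location per root.

    A root may be a directory or a URL prefix; nothing is fetched or
    checked here, the function only builds the names to try in order.
    """
    relative = "/".join(module_name.split("::")) + extension
    return [root.rstrip("/") + "/" + relative for root in roots]


def resolve_resource(name, settings, platform_name):
    """Return the location configured for a named resource on a platform."""
    return settings[name][platform_name]


roots = [
    "/usr/local/share/lib/perl6/site",
    "http://www.project_10.design.library/",
]
for location in candidate_locations("MyModule::Submodule", roots):
    print(location)

settings = {
    "April_Dataset": {
        "windows": r"MyDocuments\UserData\April_Dataset.csv",
        "unix": "/home/hacker/perl6/April-09.DataSet.current",
    }
}
print(resolve_resource("April_Dataset", settings, "unix"))
```

The program itself only ever mentions 'MyModule::Submodule' or 'April_Dataset'; where those live is a matter for the module management system and its configuration.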
b) Library sites
- presents information about modules, reviews, documentation, similar
categories,
- exposes documentation information to "grok" about individual functions
in modules
- mission statement about the library
- accesses meta data from different archives
- generates the dependency information for a selected module.
- accepts user feedback about software modules, so that incomplete
dependencies can be noted etc
- is able to extract information from existing CPAN archives and
wrap/preprocess CPAN packages for perl6
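To show what "generates the dependency information for a selected module" might amount to, here is a small sketch (Python, illustrative only) that walks a dependency table depth-first and returns the modules in an order in which they could be installed. The table format and the module names are invented for the example.

```python
def install_order(module, dependencies):
    """Return the module and all its dependencies, dependencies first.

    'dependencies' maps a module name to the list of modules it needs
    directly. A circular dependency raises ValueError.
    """
    order = []
    state = {}  # name -> "visiting" while in progress, "done" when finished

    def visit(name):
        if state.get(name) == "done":
            return
        if state.get(name) == "visiting":
            raise ValueError(f"circular dependency involving {name}")
        state[name] = "visiting"
        for dependency in dependencies.get(name, []):
            visit(dependency)
        state[name] = "done"
        order.append(name)

    visit(module)
    return order


dependencies = {
    "App": ["Web", "DB"],
    "Web": ["Util"],
    "DB": ["Util"],
    "Util": [],
}
print(install_order("App", dependencies))
```

User feedback about incomplete dependencies, as in the next-but-one point above, would then simply be corrections to the table that the library site holds.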
c) Archive
Standards on:
- source, dependencies, binaries, documentation, reviews, classifications
- file storage
- standard definitions for environments in which a binary should be
guaranteed to run (if stated dependencies are also available)
Hope this helps.