Before joining the earlier CPAN threads, here is my personal wish list for what the perl6 version of CPAN should do. To get some distance from CPAN, I will call it the Module Library system. Some of the debate threads touch on the internal software environment, and I think the internal and external environments must be designed in harmony.

A library paradigm is different from an archive paradigm, although a library is a form of archive. However, a library is a single place to go to obtain information. It is set up to enhance searching and finding. An archive is a place where things are stored; the users of an archive know what and where they are storing. Hence the difference between an archive and a library is a set of assumptions about what a user knows. Archives are essential for large libraries (often called stacks), but there must be standardised metadata about what is in each archive for it to be useful for a library.

So my wish list. I would like:

1) a single place to go for modules that will help me solve programming problems;
2) a variety of ways to look through modules and the information on them - reviews, documentation, dependencies - so that I can choose between them and know what a choice entails;
3) a variety of classifications of modules and module contents that can lead me to modules which may achieve what I want but are named in a manner I did not expect, or contain functionality I could use in another context;
4) a means to extract from the library the module(s) I want (better still, only the parts of the module) and all their dependencies;
5) the ability to access existing CPAN perl software.

Extending the library paradigm, there are generalist libraries, which only contain commonly requested books, and specialist libraries and also libraries in other languages. However, good libraries share systems for requesting books from other libraries, even if they are not stored locally. The difference between libraries then boils down to the metadata available to the user, eg., more accessible user catalogs, librarians who know about subjects and can point to related books, etc.

The library paradigm breaks down somewhat for software because there is an intimate link between "my" environment and the software I choose. Hence, it is important to consider the internal environment ("my" computer) as well as the external environment (the module libraries).

Not only is there the problem of different operating systems (*ix, windows*, mac, etc), there is also the possibility of outdated software installed, eg., different versions of languages, such as perl 5.8, perl 5.10.

Within "my" environment, there are multiple possible locations for different aspects of software, eg. binaries, configuration data (which can be global to all users and local for a single user), documentation, data, results. In a corporate environment, these locations may exist across a network. For example, documentation may be on a server, binaries and local configuration data on the local computer, results posted to a web site, input data taken from a database.

My wish list for the internal environment; I would like:

1) a single system to manage all my software modules;
2) a method to see where different aspects of software are located, and to change them;
3) an ability to manage in parallel different dependency hierarchies, such as might result from a need to use modules from different "generations" (eg. perl 5.8 and 5.10);
4) a comprehensive method to determine what my local environment is, so that I can see whether it "matches" the environment required by software that I want to download, and if not, what I need in addition to be able to run the software.
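
To make the last point concrete, here is a minimal sketch of what "matching" an environment could look like. It is written in Python only because that is easy to run anywhere; the Environment class, its fields and the Example:: module names are all invented for illustration and are not part of any existing tool.

```python
from dataclasses import dataclass, field


@dataclass
class Environment:
    """Hypothetical description of a local or required environment."""
    os_name: str                                  # e.g. "linux", "windows"
    perl_version: tuple                           # e.g. (5, 10)
    modules: dict = field(default_factory=dict)   # module name -> version tuple


def fmt(version):
    """Render a version tuple such as (5, 10) as '5.10'."""
    return ".".join(str(part) for part in version)


def missing_requirements(local, required):
    """Return human-readable reasons why `local` does not satisfy `required`."""
    problems = []
    if local.os_name != required.os_name:
        problems.append(f"operating system: need {required.os_name}, have {local.os_name}")
    if local.perl_version < required.perl_version:
        problems.append(f"perl: need >= {fmt(required.perl_version)}, have {fmt(local.perl_version)}")
    for name, wanted in sorted(required.modules.items()):
        have = local.modules.get(name)
        if have is None:
            problems.append(f"module {name}: not installed")
        elif have < wanted:
            problems.append(f"module {name}: need >= {fmt(wanted)}, have {fmt(have)}")
    return problems


# Invented example data: a perl 5.8 machine asked to run 5.10 software.
local = Environment("linux", (5, 8), {"Example::Alpha": (0, 80)})
required = Environment("linux", (5, 10), {"Example::Alpha": (0, 72), "Example::Beta": (0, 70)})
for line in missing_requirements(local, required):
    print(line)
```

An empty list would mean the software should run as-is; a non-empty one is exactly the "what I need in addition" answer I am asking for.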

Some commentary on the above:
I use the CPAN archive, CPANPLUS, and synaptic (a GUI frontend to the apt tools) for perl since I use Ubuntu/Gnome. In some cases I have to load binaries using synaptic because I can't get the sources to compile properly from CPAN. I do not have the time or inclination to discover why. The result is a mishmash of dependencies. Moreover, I cannot get updated software without breaking something. This occurred when one Ubuntu generation was distributed with perl5.8 even though perl5.10 was out, and I really wanted to use some of the new software.

The result was that I had both 5.8 and 5.10 modules. Although most things worked, perldoc wasn't recognised by some Configure scripts. I am not interested now in discovering the reason. I use this simply as an example of how multiple distribution channels that make different assumptions can lead to problems. And because there is no single method for monitoring the perl environment, it is difficult to solve some of these problems.

The current setup works extremely well, except when it doesn't. My difficulty is finding out how to solve the problems that occur. So I think there should be a system that normally works "out-of-the-box" with no user interference, together with tools that can be used to identify and change a configuration when needed.

The source vs binary debate:

I would suggest that most users would prefer to have precompiled binaries that work out of the box for common environments. The problem of matching the software to an environment is a bit like getting a spare part for a car, or a replacement for something broken at home. The possible infinity of choices given multiple manufacturers and systems is solved by having standards and selling parts that match those standards, eg. tyres or bolts with standard diameters.

Once a module has been decided on, you look to see if there is a binary that matches your internal environment. If not, you have to roll your own from source.
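
As a rough sketch of that decision (Python, for illustration only): each binary declares the environment it was built for, and the first one whose declaration matches the local environment wins. The dictionary keys and file names below are invented; a real standard would have to define them.

```python
def choose_package(local_env, binaries):
    """Return the first prebuilt binary whose declared environment matches
    `local_env`, or None to signal that the module must be built from source."""
    for binary in binaries:
        target = binary["environment"]
        # Every property the binary declares must agree with the local value.
        if all(local_env.get(key) == value for key, value in target.items()):
            return binary
    return None


# Invented example data.
local_env = {"os": "linux", "arch": "x86_64", "perl": "5.10"}
binaries = [
    {"file": "Example-Module-win32.zip",
     "environment": {"os": "windows", "arch": "x86"}},
    {"file": "Example-Module-linux-x86_64.tar.gz",
     "environment": {"os": "linux", "arch": "x86_64"}},
]

match = choose_package(local_env, binaries)
print(match["file"] if match else "no matching binary: build from source")
```

The matching here is plain equality; anything smarter (version ranges, compatible architectures) is where the standards for binaries would come in.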

It would be a function of the library administration to set rules as to the standards that a binary needs to meet to be filed in some category.

Photographs vs software:

The more flexible the design, the longer lasting. Libraries can contain books on religion and pornographic magazines. The design of a library search or storage system does not define what can be put in it. There is a place for strong views about what should be in a library (eg. no pictures, only pure perl6). In terms of library design, the question should be "how can the user discover, in general, what a library contains". If you don't like the contents of a library, use another.

I believe it would be useful to have multiple libraries, each of which has a "mission statement" that would define what sort of information it contains. Hence there could be libraries dedicated to windows-only binaries, to perl6 modules, or to perl6/Ruby/Python interacting software.

If the system proves to be flexible, then the same library design and management software could be applied to photographs, food recipes, etc. The worldwide web is a precedent: it was designed for scientific papers with embedded links, but the design is so flexible that it has changed the structure of our informational world.

Names and implicit assumptions:

There was a thread about CPAN6, 6PAN, Pause6, etc. It seems to me that the conflicting ideas underlying these names concern the user paradigm, that is, how someone uses the entire system. I believe that we might get more traction if we look first at the user paradigm, and then at different ways to provide the functionality.

I know that many of my wishes are fulfilled by the more detailed proposals.

However, the proposals seem to assume what the whole system should do. The idea behind this post is to make the user paradigm clearer, so that the strengths and weaknesses of different approaches can be compared.

One way of breaking out parts of the whole system might be:
a) Internal module management
(for example)

- provides to a perl program the physical locations for settings in %*ENV
eg. 'use MyModule::Submodule;'
might normally lead to (in unix) '/usr/local/share/lib/perl6/site/MyModule/Submodule.pm'
but it could also be '/home/project_10/version_15/assistant_junior_hacker/Submodule.test'
or 'http://www.project_10.design.library/MyModule/Submodule.pm'
or 'select "MyModule_SubModule" from "standard_modules" where language="perl6" '

- allows the user to set an arbitrary environment variable, eg. %*ENV<April_Dataset>, so that inside the program it is possible to do
my $data = slurp(%*ENV<April_Dataset>);
without worrying where the binary is running, eg., in PADRE, or called from the directory where the file is present. Also, the program could be agnostic about whether it is being run on Windows and the path definition is 'MyDocuments\UserData\April_Dataset.csv' or unix '/home/hacker/perl6/April-09.DataSet.current'

- defines how software should be distributed in the system, eg. binaries in /usr/local/lib/perl6/site, documentation in /usr/local/doc/perl6/site, config files in /usr/local/etc/perl6/site, source files in /usr/local/src/perl6/site

- handles downloads from archives

- knows how to build binaries from src and install and test them.

- knows how to define the local environment taking into account modules that may be at a physical location not on the local computer.
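
To illustrate the first point in this list, here is a minimal Python sketch of a module locator that tries an ordered list of places and returns the first hit. It only handles directories on a file system; the URL and database cases mentioned above would each need their own lookup. The function names are invented for this example.

```python
import os
import tempfile


def module_relpath(name, suffix=".pm"):
    """Turn 'MyModule::Submodule' into 'MyModule/Submodule.pm'."""
    return os.path.join(*name.split("::")) + suffix


def locate(name, search_roots):
    """Return the first existing file for `name` under the ordered roots, or None."""
    for root in search_roots:
        candidate = os.path.join(root, module_relpath(name))
        if os.path.isfile(candidate):
            return candidate
    return None


# Throwaway directories stand in for the site-wide library and a
# project-specific override.
with tempfile.TemporaryDirectory() as site, tempfile.TemporaryDirectory() as project:
    for root in (site, project):
        path = os.path.join(root, module_relpath("MyModule::Submodule"))
        os.makedirs(os.path.dirname(path))
        open(path, "w").close()
    # The project directory is listed first, so it shadows the site copy.
    found = locate("MyModule::Submodule", [project, site])
    print(found.startswith(project))
```

The point of the ordered list is that the user can see, and change, which location wins without touching the program that says 'use MyModule::Submodule;'.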

b) Library sites
- presents information about modules, reviews, documentation, similar categories
- exposes documentation information to "grok" about individual functions in modules
- mission statement about the library
- accesses meta data from different archives
- generates the dependency information for a selected module.
- accepts user feedback about software modules, so that incomplete dependencies can be noted etc
- is able to extract information from existing CPAN archives and wrap/preprocess CPAN packages for perl6
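
As an illustration of "generates the dependency information for a selected module", here is a minimal Python sketch that walks the direct dependencies recorded for each module and returns everything needed, dependencies first. The module names and the shape of the metadata are invented; a real library site would read them from the archives.

```python
def install_order(module, dependencies):
    """Return `module` and everything it needs, with dependencies listed first."""
    order, done, in_progress = [], set(), set()

    def visit(name):
        if name in done:
            return
        if name in in_progress:
            raise ValueError(f"circular dependency involving {name}")
        in_progress.add(name)
        for dep in dependencies.get(name, []):
            visit(dep)
        in_progress.discard(name)
        done.add(name)
        order.append(name)

    visit(module)
    return order


# Invented metadata: module name -> direct dependencies.
dependencies = {
    "My::Report": ["My::Table", "My::Template"],
    "My::Table": ["My::CSV"],
    "My::Template": [],
    "My::CSV": [],
}
print(install_order("My::Report", dependencies))
```

User feedback about incomplete dependencies would simply add entries to this mapping, which is one reason to keep it at the library site rather than buried in each package.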

c) Archive
Standards on:
- source, dependencies, binaries, documentation, reviews, classifications
- file storage
- standard definitions for environments in which a binary should be guaranteed to run (if stated dependencies are also available)
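
To show what such a standard might mean in practice, here is a minimal Python sketch of an archive record and a check that it carries the metadata a library would need. The field names and the example record are assumptions made up for this post, not a proposed format.

```python
# Hypothetical list of fields every archive entry must carry.
REQUIRED_FIELDS = ("name", "version", "source", "dependencies",
                   "documentation", "classifications")


def missing_fields(record):
    """Return the required fields that are absent from `record`."""
    return [name for name in REQUIRED_FIELDS if name not in record]


# Invented example entry; note that it has no classifications.
record = {
    "name": "Example::Module",
    "version": "0.1",
    "source": "Example-Module-0.1.tar.gz",
    "dependencies": [],
    "documentation": "Example-Module-0.1.pod",
    "binaries": [
        {"file": "Example-Module-0.1-linux-x86_64.tar.gz",
         "environment": {"os": "linux", "arch": "x86_64"}},
    ],
}
print(missing_fields(record))
```

An archive that rejects, or at least flags, incomplete records is what lets a library trust the metadata without inspecting every package itself.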

Hope this helps.
