RFC 17 (v2) Organization and Rationalization of Perl

Perl6 RFC Librarian Mon, 07 Aug 2000 13:48:30 -0700
This and other RFCs are available on the web at
  http://dev.perl.org/rfc/

=head1 TITLE

Organization and Rationalization of Perl State Variables

=head1 VERSION

  Maintainer: Steve Simmons <[EMAIL PROTECTED]>
  Date: 3 Aug 2000
  Version: 2
  Mailing List: [EMAIL PROTECTED]
  Number: 17
  CVSID: $Id: RFC17.pod,v 1.3 2000/08/07 18:32:04 scs Exp $

=head1 ABSTRACT

Perl currently contains a large number of state and status variables
with names that resemble line noise (henceforth called C<$[linenoise]>
variables).
Some of these are (or should be) deprecated, and the naming methods for
the rest are obscure.  Since we are (potentially) adding, removing, and
changing the functionality of these variables with Perl6, we should seize the
opportunity to rationalize the names and organization of these variables
as well.  These variables need to be made available with mnemonic names,
categorized by module, be capable of introspective use, and proper
preparation made for deprecation and translation.

=head1 DESCRIPTION

C<Perl> allows for the runtime examination and setting of a number of
variables.  These existing C<$[linenoise]> variables are horrible names and
need cleanup, re-organization, and syntactic sugar.  Different variables
have different problems, usually one or more of the following:

=over

=item *

unused (as far as we know)

=item *

useless, but still appear in older programs

=item *

duplicated by other features

=item *

standing in the way of advance because of backwards compatibility

=back

In the pre-RFC discussion of this issue, it was also pointed out
that these variables are hard to deprecate without nagging the crap
out of users running the programs.  The proposed solution was
broadly applicable, and has been spun off into RFC 3.

The use of the C<English> module is an attempt to solve the anti-mnemonic
features of these variables.  A better solution is to do it right in
the first place, with a number of attendant wins:

=over

=item *

no need for Yet Another Module to be loaded

=item *

promotes code readability

=item *

probably promotes core Perl maintainability

=item *

grouping sets of variables together by function gives
the code writer strong hints as to variables that might
affect each other

=item *

should we move formats (or any similar current core functionality)
to a loadable module, this neatly encapsulates all the data with
a single referenced name

=item *

solves problems with C<$[linenoise]> variable proliferation

=item *

potentially promotes introspection - see below.

=back

In addition, many features which are now (re)set by other calls should
set appropriate state variables as well.  Thus a C<perl> script which
contains:

    use strict foo;

might set a var C<$PERL_STRICT{foo}>, and so forth (this is probably
a poor example).

Credit where it is due: the idea of putting related values together into
an appropriately tagged hash is shamelessly ripped off from common C<tcl> usage.

=head3 Caveat - Global State Variables Are Dangerous

Having the setting of simple variables modify the function of a
broad set of things is an inherently dangerous and inelegant way of
doing things.  Mark-Jason Dominus <[EMAIL PROTECTED]> has proposed that
they simply be removed whenever possible.  I tend to agree with his
arguement, and join in urging that the problems he addresses in message
C<[EMAIL PROTECTED]> be addressed by the core teams
working in the areas that use such variables.  The thread started my
Mark-Jason in C<perl6-language> has a good discussion of the issues.

Notwithstanding, should the core team continue to allow global
variables for some purposes, the names and categorization should
be improved.

=head2 Advantages and Non-Loses

=head3 Clean Backwards Compatibility

To promote backward compatibility, one could write a C<use antiEnglish>
module which would alias the old names to the new ones.  That Would Be
Wrong, but someone will probably do it - so it may as well be us.
Obviously we cannot provide backwards compatibility for a variable
whose meaning has changed or which has vanished, but most of the
rest can be captured cleanly.

=head3 Promoting Removable Core Modules

It has been strongly proposed that `core C<perl>' be broken down internally
into a number of modules such that one could build smaller or larger
custom perls.  This feature would ease that work, and possibly set a
standard for how well-behaved non-core modules should implement such
things.

=head3 Provide Possible Guidelines To Core-able Modules

The discussion of removable core modules has strongly implied (sometimes
explicitly stated) that sites could take modules which are not currently
in the core and move them there.  Having a standard which those modules
could follow for variable setting and exposure would be a major win.

=head2 Disadvantages

=head3 Backwards Compatibility

Existing C<perl> code which makes use of C<$[linenoise]> variables
which had not changed meaning in C<perl6> will fail if (as proposed)
the variables are completely replaced by the names proposed here.
This is, at best, a minor objection because

=head3 Increased Typing Considered Painful

A number of people have expressed great joy at the brevity of
the current names.

=head3 Loss of Distinctiveness

Currently when one sees a C<$[linenoise]> variable,
they notice its specialness and have a visual clue to take care.
C<$[linenoise]> stands out.
Since some of these variables have wide-reaching side-effects,
any solution should try to maintain that distinctiveness.

=over 4

=item *

Few if any scripts match this description

=item *

The proposed C<perl5> to C<perl6> translation tool should catch this
and do the appropriate renaming.

=back
be modified to use the C<antiEnglish> modules described above in
I<Clean Backwards Compatibility>.

=head2 Other Possible Features

It has been suggested by Alan Burlison <[EMAIL PROTECTED]>
that this allow for localizable settings.  Thus module A might
turns warnings off when its features are in use, while module
B is unaffected by the setting done by module A.  This is a change
in the functionality of such settings, and has the potential for
broadly changing what happens in C<perl>.  I suggest that this
issue is actually independent of how the variables are named,
and should be taken to another RFC if anyone is interested.


=head1 IMPLEMENTATION

=head2 Internal Implementation

The internal representation of these is largely irrelevant(!).
This RFC prescribes what the I<external> interface looks like;
the implementation team should select whatever mechanism they
prefer for internal use.  It's probably a maintenance win if
the same is used, but the proposers don't intend to dictate
to the developers.

=head2 External Implementation - Proposals

=head3 Overall

Variables should be sorted into functional areas which may or
may not have a one-to-one correspondence with internal (core)
There have been four suggestions for how the variables should
be named.

In the examples below, I have capitalized the higher-level
portions of the names.  This capitalization is not a requirement
of the RFC, and is done purely to make the new names stand out
in this document.

=head3 Well-Named Global Hashes And Keys

For each collection of variables, a well-named pseudo-hash with
well-named keys:

  $PERL_CORE{warnings}          vs      $^W
  $PERL_CORE{version}           vs      $^V
  $PERL_FORMATS{name}           vs      $^
  $PERL_FORMATS{lines_left}     vs      $-
  $PSEUDO_HASHES{strict}        vs      (none)

An additional variable should be provided,

  $PERL_CORE{variables}

which contains a list of all the settable hash names (eg,

  $PERL_CORE{variables} = qw(PERL_CORE PERL_FORMATS PSEUDO_HASHES ...)

Advantages: Only a single point in the namespace is used for each
variable.  The hashes can be handed around en masse efficiently
via references.  Pseudo-hash member access is efficient.
Names are free from module dependency.  Introspection with hashes
is more powerful (see below).

Disadvantages: Pseudo-hashes are new to most programmers.

=head3 Well-Named Module Variable Sets

This has an almost one-to-one naming match to the first suggestion,
but used module naming:

  $PERL::CORE::Warnings         vs      $^W
  $PERL::CORE::Version          vs      $^V
  $PERL::FORMATS::Name          vs      $^
  $PERL::FORMATS::LinesLeft     vs      $-
  $PSEUDOHASH::CONTROL::Strict       vs      (none)

Advantages: Looks like individual variables again.  Modules could
be moved (compiled) in and out of the perl core and, so long as
a script did the appropriate C<use/require> statement, the script
would require no other changes.

Disadvantages: Locks variables into given modules.  The second-level
names may become difficult to do sensibly.

=head3 Well-Named Per-Module Hashes And Keys

This is a hybrid of the first two:

  $PERL::CORE{warnings}          vs      $^W
  $PERL::CORE{version}           vs      $^V
  $PERL::FORMATS{name}           vs      $^
  $PERL::FORMATS{lines_left}     vs      $-
  $PSEUDOHASH::CONTROL{Strict}   (none)

Advantages: Looks like individual variables again.  Modules could
be moved (compiled) in and out of the perl core and, so long as
a script did the appropriate C<use/require> statement, the script
would require no other changes.  Introspection and variable
discovery via C<keys %PERL::CORE> is still a win.

Disadvantages: Locks variables into given modules.  The second-level
names may become difficult to do sensibly.  Introspection becomes
more difficult because one must find the second-level name(s) for
each class of item (eg, PERL::CORE and PERL::FORMATS).  Sometimes
there should not be More Than One Way To Do It - a single mechanism
makes it easier to notice, find, use, and understand such var.

=head3 Introspection With Hashes

The use of hashes and pseudo-hashes leads to a straightforward and
`natural' mechanism by which programmers can discover all the
relevant variables for a given item:

   foreach my $key ( %PERL_CORE ) {
      print "\%PERL_CORE{$key} is $PERL_CORE{$key}\n";
   }

=head3 Summary

The RFC maintainer thinks the first suggestion, I<Well-named Global
Hashes and Keys>, is the best choice.  While it has potential problems
for module removal should a hash be shared between several modules,
these are (IMHO) worth the flexibility of I<not> locking a given hash to
a given module and providing a single, consistent mechanism programmers
would use to obtain value settings.

The community has not (as of version 1.1) expressed a strong preference
in either direction.

=head2 Value Protection

A value can be set and reset, but it's existence should be protected.
Thus one could set the variables, but not undefine or delete them.
If hashes are chosen, the user should not be allowed to add or
remove key/value pairs from the hash.

Further, values such as $C<PERL_CORE{version}> should be read-only.

=head2 Backwards Compatibility Issues and Suggestions

Backwards compatibility is an issue in two ways:

=over4

=item *

Moving old scripts to C<perl6>

=item *

Moving old coders to C<perl6>

=back

There are two solution sets for this, both of which should be implemented
as part of C<perl6>

=head3 Variable Translation

The proposed script translator should catch all uses of pre-perl6
C<$[linenoise]> variables and translate them to the new names according
to the guidelines below.

=head3 Variable Remapping

An C<antiEnglish> module could be provided.
Like the current C<English> module, this would attempt to map the new names
into the old C<$[linenoise]> variables for those programmers absolutely
addicted to the old names.
It should handle the variables according to the guidelines below.

=head3 Permanent Aliasing?

Several comments have been made (including one claimed to be second-hand
from Larry Wall) that they just don't like typing long variable names.
If this is a sufficently strong problem, the contributors to this RFC
would have no objection to having both variable namesets present
simultaneously.

This would have the advantage that the variables would still be
reorganized and grouped, and some additional introspection possible,
but the folks who are really married to the ultra-short names would
still have them available with no typing overhead.

=head3 Guidelines for variable translation and remapping

The C<perl6> conversion script should silently translate old
variable names to the new names when the meaning of the variable
is essentially unchanged.

The C<antiEnglish> module should provide the old name as an
alias or map to the new name.

=head4 Variables with changed meaning

The C<perl6> conversion script should translate old
variable names to the new names when there seems to be a reasonable
but not perfect mapping between the meanings of variables.
A one-time warning should be issued at compile time for each
use of such variables, similar to the manner in which the current
C<perl -w> warns of first use of undeclared variables.
A comment should be inserted into the code above the use of the
variable saying something like:

    # Comment inserted by PERL6_trans on <date>
    # Variable <foo> does not exist in perl6.  We have translated
    # it to <bar> below, but <foo> and <bar> are not completely
    # equivalent.  Please check your usage and make the appropriate
    # changes.

In the best of all worlds, the comment might give hints as to
what to think about when changing that var.

The C<antiEnglish> module should provide no mapping for these
variables.

=head4 Variables which have been removed

The translator should leave removed variables which have been removed
in place, but a well-tagged C<warn> statement should be placed above

The C<antiEnglish> module should provide no mapping for these
variables.

=head4 New variables

Should the translator notice use of an existing variable with the
same name, it should ?print a warning and stop? ?print a warning and
go one? ?automaticly rename to prior var to l_?  TBD.

The C<antiEnglish> module should provide no mapping for these
variables.


=head2 Deprecation Warnings In Usage

Programmers using hand-translated scripts or who attempt to run
untranslated scripts with C<perl6> should get compile-time warnings
and errors, depending on severity of mis-use.

At a minimum, C<use strict> and C<perl -w> should warn of the
use of deprecated variable names.  That deprecation warning should
be I<specific>, not simply C<var I<$foo> is deprecated>.  Messages
should identify the old var, new var, and at least imply an action.
Some possible samples include: 

  changing var $foo no longer has any effect, see <doc module>

  the value of var $bar no longer has any meaning, see <doc module>

  var $baz has been replaced by $FOO_BAR{baz}

and so forth.

In addition, it should be possible to have the programmer detect
and control these messages on a case by case basis.  RFC 3 proposes
a mechanism by which programmers could take more direct control of
this; should that RFC be accepted then that mechanism should be used.
Should it fail, we will make a more specific recommendation here.

=head1 CONTRIBUTIONS

Contributions ranging from voluminous commentary to pretty
decent wisecracks have been received from:

    Ted Ashton <[EMAIL PROTECTED]>
    "J. David Blackstone" <[EMAIL PROTECTED]>
    Corwin Brust <[EMAIL PROTECTED]>
    Alan Burlison <[EMAIL PROTECTED]>
    Piers Cawley <[EMAIL PROTECTED]>
    Tom Christiansen <[EMAIL PROTECTED]>
    Mark-Jason Dominus <[EMAIL PROTECTED]>
    Ken Fox <[EMAIL PROTECTED]>
    Chaim Frenkel <[EMAIL PROTECTED]>
    Jarkko Hietaniemi <[EMAIL PROTECTED]>
    Tim Jenness <[EMAIL PROTECTED]>
    Bart Lateur <[EMAIL PROTECTED]>
    Peter Scott <[EMAIL PROTECTED]>
    William Setzer <[EMAIL PROTECTED]>
    Dan Sugalski <[EMAIL PROTECTED]>
    "Bryan C. Warnock" <[EMAIL PROTECTED]>
    Nathan Wiger <[EMAIL PROTECTED]>

and others who I probably missed.  In addition, it's been pointed
out that this suggestion was raised in the past; unfortunately the
fingerprints have long since worn away.


=head1 REFERENCES

RFC 3 on Run-Time Error Message Control
RFC 17 (v2) Organization and Rationalization of Perl

Reply via email to