Switching over to the 'nom' branch of Rakudo introduced
a large number of regressions and changes that affected
many users of Rakudo.  I know all of the core developers
agree that this was really not a good thing, and we want
to work very hard to avoid such instabilities in the future.

Ideally a commitment of this sort should probably be
documented as a policy or framework somewhere that we can
point to and use for guidance whenever we encounter
potential breakages (which are bound to occur, since Perl 6
is a living language).

In practice what this means is that we want to minimize the
number and impact of any "breakages" that people encounter
when using existing code on subsequent releases of Rakudo Star.
Here I'm focusing mostly on the Star distribution, because
that's where our stability commitments are strongest.
We'll also need to manage breakages within the compiler,
but part of the reason for separating compiler releases
from distribution releases is exactly to make it possible
for version management issues to be handled at different time
scales.

Rather than talk in generalities though, I have a few
specific "use cases" that demonstrate breakages that
we're going to have to manage very soon, and I'd like ideas 
and suggestions about how to handle them.  From the steps we
choose to handle these specific cases I think we can develop
some broader guidelines for future ones.

(If others know of any cases beyond these that need discussing, 
feel free to contribute them to this conversation!)

----

The first category of breakage covers places where the Perl 6
specification has changed from what Rakudo currently implements.

1.  ? quantifier in regexes

The C<?> quantifier used to be specified to capture matches
in the same manner as C<*> and C<+> -- that is, it produced
a List of Match objects in its capture slot (either named
or positional).

The current version of the specification says that a ?-quantified
capture fills the slot with either a single Match object or Nil.

The difference can be seen in this code example:

    / <digit>? y z /

In the specification used to implement the current regex engine, 
the returned match object would always have a List in the 
$<digit> slot; that list would contain a single Match object 
if a digit was found, or the list would be empty if no digit
was found.  That is, the match acted the same as a
C< / <digit> ** 0,1 y z / > regex, thus the match for
the digit would be found at $<digit>[0].

In the current specification (which Rakudo must now migrate
to), the regex will capture any digit directly into the
$<digit> slot of the returned Match object, and $<digit>
will be Nil if no digit is found.  A program looking for
a capture result at $<digit>[0] will always get an undefined
value.
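
As a rough sketch of the difference, assuming an implementation that
already follows the current specification (the variable name $m and
the sample strings are only illustrative):

    my $m = 'yz' ~~ / <digit>? y z /;
    say $m<digit>.defined;       # no digit matched, so the slot is undefined

    $m = '7yz' ~~ / <digit>? y z /;
    say ~$m<digit>;              # the matched digit, read straight from the slot
    say $m<digit>[0].defined;    # old-style access no longer finds the digit

Under the older semantics the last line would have been the correct way
to reach the digit, which is why existing code silently stops working.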

How do we inform users of this change, and when should it be
made (in Rakudo and in Rakudo Star)?


2.  Leading whitespace in rules and :sigspace

A previous version of :sigspace (and hence 'rule') caused 
_all_ whitespace in a regex to be treated as significant; i.e.,
a rule declaration like

    rule xyz { x y z }

would be identical to

    token xyz { <.ws> x <.ws> y <.ws> z <.ws> }

In other words, the space before the 'x' in the rule declaration
would invoke <.ws> to consume any whitespace prior to the 'x'.

The current regex syntax definition changes this such that
whitespace following certain constructs is no longer significant
(in this case, the space following the opening brace).  Thus
the current spec has the xyz rule above translating to

    token xyz { x <.ws> y <.ws> z <.ws> }

with no <.ws> consumption prior to the 'x'.  This will break any 
existing grammars or rules that have been relying on the previous 
rule / :sigspace definition.  

Updating existing code to mimic the old behavior of 'rule' is
fairly simple -- just add a <?> and a space where <.ws> is expected.

    rule xyz { <?> x y z }
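
A small sketch of how the two forms should differ once the current
specification is implemented; the grammar names and the test string
are hypothetical, chosen only for illustration:

    grammar WithLeadingWs    { rule TOP { <?> x y z } }   # <.ws> before the 'x'
    grammar WithoutLeadingWs { rule TOP { x y z } }       # no <.ws> before the 'x'

    # The input starts with a space, so only a rule that consumes
    # leading whitespace is expected to parse it.
    for WithLeadingWs, WithoutLeadingWs -> $g {
        say $g.^name, ': ', ?$g.parse(' x y z');
    }

Under the previous :sigspace definition both grammars would accept the
input, so a test like this is one way to detect which behavior a given
release provides.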

Again, how do we inform users of this change, and when should it be made?

----

Another category is where things are outright removed from the specification.

3.  Str.bytes

The C<.bytes> method on C<Str> has always been somewhat problematic; in Perl 6
we typically think of strings in terms of characters, codepoints, graphemes, 
or units other than bytes.  The C<.bytes> method really makes more sense for
something like C<Buf>, but not C<Str>.  Thus it was decided to remove C<.bytes>
completely from the C<Str> specification.

How long should Rakudo keep Str.bytes available for programs that may be
using it?  How do we let people know that it's going away, and what to
potentially use instead?
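
One possible replacement, sketched here on the assumption that
C<Str.encode> returns a C<Buf>-like object that provides C<.bytes>
(the sample string is illustrative):

    my $s = 'café';
    say $s.chars;                    # length in characters
    say $s.encode('UTF-8').bytes;    # length in bytes of the UTF-8 encoding

This makes the encoding explicit, which is the information that
C<Str.bytes> had to guess at.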

(For Str.bytes, I've introduced an experimental "is DEPRECATED" trait
into Rakudo, thus Str.bytes is actually marked as "DEPRECATED" in the
source.  We could potentially extend this trait so that an option or
pragma causes any uses of DEPRECATED routines to generate warnings
or exceptions.)
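
For illustration only, here is how such a trait might look if it were
applied to an ordinary routine.  The routine names are hypothetical,
and whether the experimental trait can be used outside the Rakudo
source is an assumption:

    # Hypothetical routines; only the trait name comes from Rakudo.
    sub new-greeting() { 'hello' }
    sub old-greeting() is DEPRECATED { new-greeting() }

    say old-greeting();    # still works; a warning mode could flag this call

The point of the sketch is that callers keep working while the trait
gives the compiler a place to hang a warning or exception.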

----

Another category is where parts of the Perl 6 specification are known to
be fairly slushy, in that what is written is not at all what we expect Perl 6
to ultimately look like, nor what Rakudo implements.  The IO library is
the current poster child for this; we all agree that what is documented in
S16 and other places is almost certainly not what we want, and changes are
being introduced to Rakudo to explore better options.

The most recent example is the C<dir> function: a new implementation
of C<dir> was committed that completely invalidated existing code.
This has since been rectified so that older programs still work, but
we know there are other IO-related changes that really need to be made,
and they need exploration before we can determine what they will be.

Past examples of this include Lists and Iterators, and regexes before
that.  The IO library is the immediate issue (including things like
sockets and non-blocking I/O), and macros may also end up changing
somewhat as they're implemented.  In the future I expect that S09 and
parallel processing will have fairly slushy specs as people explore
the implementations.

How should we manage exploration of new(ish) Perl 6 features and libraries
while preserving some sense of stability for people who are actively using
those features?

One suggestion in this case has been to completely freeze the existing 
IO implementation for stability purposes, while simultaneously prototyping
and testing new IO features in other namespaces.  Then as those newer
IO features stabilize, deprecate and phase out the existing IO library
in favor of the newer one.

----

Ultimately the Perl 6 specification says that version numbers are 
supposed to be able to manage these sorts of issues for us; i.e., if
a program says C<use v6.0.2>, then it gets all of the semantics of
exactly version 6.0.2, regardless of any deprecations or changes that
may have happened since then.

However, I don't believe Rakudo is yet at a place where we can provide
this level of compatibility, so we need some other management policies in
place until we do get there.

Thanks in advance for any comments or suggestions.

Pm
