Re: explicitly declare closures???

2001-09-04 Thread Mark-Jason Dominus


Says Dave Mitchell:

> Closures ... can also be dangerous and counter-intuitive, especially to
> the uninitiated. For example, how many people could say what the
> following should output, with and without $x commented out, and why:
> 
> {
>     my $x = "bar";
>     sub foo {
>         # $x  # <- uncommenting this line changes the outcome
>         return sub {$x};
>     }
> }
> print foo()->();
> 

That is confusing, but it is not because closures are confusing.  It
is confusing because it is a BUG.  In Perl 5, named subroutines are
not properly closed.

If the bug were fixed, the result would be 'bar' regardless of whether
or not $x was commented.

This would solve the problems with mod_perl also.

The right way to fix this is not to eliminate closures, or to require
declarations.  The right way to fix this is to FIX THE BUG.
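
For comparison, here is a minimal sketch of the case that is closed
properly: when the outer subroutine is anonymous rather than named,
the inner closure sees $x as expected.  The name make_foo is
hypothetical and is used only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: the outer sub is anonymous, so it captures $x
# at the moment it is created, and the inner sub captures it in turn.
sub make_foo {
    my $x = "bar";
    return sub {
        return sub { $x };
    };
}

my $foo = make_foo();
print $foo->()->(), "\n";
```

This is only a workaround for the example above, not a fix for the
behavior of named subroutines.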




Re: Please make "last" work in "grep"

2001-05-10 Thread Mark-Jason Dominus



On (03 May 2001 10:23:15 +0300) you wrote:

> Michael Schwern:
> > 
> > Would be neat if:  my($first) = grep {...} @list;  knew to stop itself, yes.
> > 
> > It also reminds me of mjd's mention of:  my($first) = sort {...} @list;
> > being O(n) if Perl were really Lazy.
> 
> But it would need a completely different algorithm.  

Not precisely.  If you have lazy evaluation, then quicksort is exactly
what is wanted here.  For example, if you implement qsort in the
straightforward way in Haskell, and write

min = first quicksort list;

then it *does* run in O(n) time; in this case quicksort reduces to
Hoare's algorithm for min.
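
Perl is not lazy, so the effect has to be written out by hand.  The
following is a rough sketch of the strict equivalent: partition as
quicksort does, but continue only into the side that contains the
wanted position.  The name select_kth and its argument order are
assumptions made for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: return the element that would land at index $k
# (0-based) if @items were sorted with $cmp, without sorting everything.
sub select_kth {
    my ($k, $cmp, @items) = @_;
    die "index out of range" if $k < 0 || $k >= @items;
    while (1) {
        my $pivot = shift @items;
        my (@less, @rest);
        for my $item (@items) {
            if ($cmp->($item, $pivot) < 0) { push @less, $item }
            else                           { push @rest, $item }
        }
        if    ($k < @less)  { @items = @less }     # answer is among the smaller items
        elsif ($k == @less) { return $pivot }      # pivot sits exactly at index $k
        else                { $k -= @less + 1; @items = @rest }
    }
}

my $by_number = sub { $_[0] <=> $_[1] };
print select_kth(0, $by_number, 5, 3, 9, 1, 7), "\n";
```

With $k set to 0 this plays the role of "first element of the sorted
list"; the partitions that a full sort would also have to process are
simply never visited.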

> my ($first, $second, $third) = sort {...} @list;

The Haskell version of this also runs in O(n) time.

> is kind-of plausible.  So we'd definitely want
> 
>   ((undef)x((@list+1)/2), $median) = sort {...} @list;

The Haskell equivalent of this (still using quicksort) runs in O(n log
n) time, which I believe is optimal for finding the median.




Re: Schwartzian Transform

2001-03-28 Thread Mark-Jason Dominus


> So you can say
> 
>   use Memoize;
>   # ...
>   memoize 'f';
>   @sorted = sort { my_compare(f($a),f($b)) } @unsorted
> 
> to get a lot of the effect of the S word.

Yes, and of course the inline version of this technique is also
common:

   @sorted = sort { my $ac = $cache{$a} ||= f($a);
                    my $bc = $cache{$b} ||= f($b);
                    my_compare($ac,$bc);
                  } @unsorted;

Joseph Hall calls this the 'Orcish Maneuver'.
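
A self-contained sketch of the same idea follows.  Here f() is a
hypothetical stand-in for an expensive key function (it just returns
the string length), and cached_sort is a name invented for the
example.

```perl
use strict;
use warnings;

my $calls = 0;

# Hypothetical key function standing in for an expensive f().
sub f { $calls++; return length $_[0] }

# Sort with a per-call cache so f() runs once per distinct element.
sub cached_sort {
    my @unsorted = @_;
    my %cache;
    return sort {
        my $ac = $cache{$a} ||= f($a);
        my $bc = $cache{$b} ||= f($b);
        $ac <=> $bc or $a cmp $b;
    } @unsorted;
}

my @sorted = cached_sort(qw(pear fig banana kiwi));
print "@sorted ($calls calls to f)\n";
```

One caveat with ||= is that a key for which f() returns a false value
(0 or the empty string) is never cached and is recomputed on every
comparison.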

However (I don't know who suggested this, but:)

> > > > >I'd think /perl/ should complain if your comparison function isn't
> > > > >idempotent (if warnings on, of course).  If nothing else, it's probably an
> > > > >indicator that you should be using that schwartz thang.

I have to agree with whoever followed up that this is a really dumb
idea.  It reminds me of the time I was teaching the regex class at
TPC3, and I explained how the /o in

/$foo/o

represents a promise to Perl that $foo will never change, so Perl can
skip the operation of checking to see if it has changed every time the
match is performed.  Then there was a question from someone in the
audience, asking if Perl would emit a warning if $foo changed.

On the other side of the argument, however, I should mention that I've
planned for a long time to write a Sort::Test module which *would*
check to make sure the comparator function behaved properly, and would
report problems.   When you use the module, it would make all your
sorts run really slowly, but you would get a warning if your
comparator was bad. 

Idempotency is not the important thing here.  The *important* property
that the comparator needs, and the one that bad comparators usually
lack is 

   if my_compare(a,b) < 0, and
      my_compare(b,c) < 0, then it should also be the case that
      my_compare(a,c) < 0

for all keys a, b, and c.

Sort::Test would run a quadratic sort such as a bubble sort, and make
sure that this essential condition held true.  Note in particular that
if the comparator has the form { my_compare(f(a),f(b)) }, then it does
not matter if f() is idempotent; what really matters is that
my_compare should have the property above.
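
Since Sort::Test was never written, the following is only a sketch of
what such a check might look like.  It is a plain brute-force test
over all triples (cubic, not the quadratic sort described above), and
the name transitivity_violations is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical checker: return a list of [x, y, z] triples for which
# cmp(x,y) < 0 and cmp(y,z) < 0 but cmp(x,z) is not < 0.
sub transitivity_violations {
    my ($cmp, @keys) = @_;
    my @bad;
    for my $x (@keys) {
        for my $y (@keys) {
            next unless $cmp->($x, $y) < 0;
            for my $z (@keys) {
                next unless $cmp->($y, $z) < 0;
                push @bad, [$x, $y, $z] unless $cmp->($x, $z) < 0;
            }
        }
    }
    return @bad;
}

# A deliberately inconsistent comparator: rock < paper < scissors < rock.
my %beats = (rock => 'paper', paper => 'scissors', scissors => 'rock');
my $rps = sub { $_[0] eq $_[1] ? 0 : $beats{$_[0]} eq $_[1] ? -1 : 1 };

printf "%d violations\n", scalar transitivity_violations($rps, keys %beats);
```

A consistent comparator such as numeric <=> should produce an empty
list of violations.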

I had also planned to have optional checks:

use Sort::Test 'self';

(Make sure that my_compare(a,a) == 0 for all a)

use Sort::Test 'twice';

(Make sure that my_compare(a,b) == my_compare(a,b) for all a,b)

This last is essentially the idempotency restriction again.  The
reason I've never implemented this module is that in perl 5, sort()
cannot be overridden, so the usefulness seemed low; you would have to
rewrite your source code to use it.  I hope this limitation is fixed
in perl 6, because it would be a cool hack.

Finally, another argument in the opposite direction yet again.  It has
always seemed to me that this 'inconsistent sort comparator' thing is
a tempest in a teapot.  In the past it has gotten a lot of attention
because some system libraries have a qsort() function that dumps core
if the comparator is inconsistent.  

To me, this obviously indicates a defective implementation of
qsort().  If the sort function dumps core or otherwise detects an
inconsistent comparator, it is obviously functioning suboptimally.  An
optimal sort will not notice that the comparator is inconsistent,
because the only way you can find out that the comparator is returning
inconsistent results is if you call it in a situation where you
already know what the result should be, and it returns a different
result.  An optimal sort function will not call the comparator if it
already knows what the result should be!

For example, consider the property from above:
   if my_compare(a,b) < 0, and
      my_compare(b,c) < 0, then
      my_compare(a,c) < 0.

If the qsort() already knows that a


Re: RFC 208 (v2) crypt() default salt

2000-09-21 Thread Mark-Jason Dominus


Bart Lateur:
> >If there are no objections, I will freeze this in twenty-four hours.
> 
> Oh, I have a small one: I feel that this pseudo-random salt should NOT
> affect the standard random generator. I'll clarify: by default, if you
> feed the pseudo-random generator with a certain number, you'll get the
> same sequence of output numbers, every single time. There are
> applications for this. I think that any call to crypt() should NEVER
> change this sequence of numbers, in particular, it should not skip a
> number every time crypt() is called with one parameter.
>
> Therefore, crypt() should have its own pseudo-random generator. A
> simple task, really: same code, but a different seed variable.

I had considered this for the original RFC, but I decided against it.

To implement it, Perl would have to have its own built-in random
number generator, because there is no way to save and restore the old
state of rand() (for example).  It would substantially complicate the
code.

And the problem you describe is not really a problem.  There has never
been any guarantee that a program would produce the same sequence of
random numbers after a change to the Perl binary.  More recent
versions of Perl use random() or drand48() if they are available,
instead of rand().  A program run under an old version of Perl and
then a newer version that used random() instead of rand() would
generate a different sequence of random numbers depending on which
version of Perl was running it, even if the seed was the same.  This
has never been an issue in the past, so I did not consider it
important.
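
To make the objection concrete, here is a small sketch of what
happens when a salt generator shares rand() with the rest of the
program.  The helper names random_salt and draw_three are
hypothetical, and the salt alphabet is the traditional 64-character
set; the RFC itself does not specify this code.

```perl
use strict;
use warnings;

# Hypothetical helper: pick a two-character salt using the shared rand().
sub random_salt {
    my @salt_chars = ('.', '/', 0 .. 9, 'A' .. 'Z', 'a' .. 'z');
    return join '', map { $salt_chars[rand @salt_chars] } 1 .. 2;
}

# Draw three numbers after seeding, optionally generating a salt
# between the first and second draw.
sub draw_three {
    my ($with_salt) = @_;
    srand(42);
    my @numbers = (int rand 1000);
    random_salt() if $with_salt;
    push @numbers, int rand 1000 for 1 .. 2;
    return @numbers;
}

print "plain:     @{[ draw_three(0) ]}\n";
print "with salt: @{[ draw_three(1) ]}\n";
```

Within one Perl binary the "plain" line is the same on every run,
while the salt call consumes values from the same sequence.  The
actual numbers depend on which underlying generator that binary uses,
which is the point made above.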

I will add a note about this to the RFC.  If there are no other
comments, I will freeze it in 24 hours.




RFC255: Fix iteration of nested hashes

2000-09-18 Thread Mark-Jason Dominus


> This RFC proposes that the internal cursor iterated by the C<each> function 
> be attached to the instance of C<each> (i.e. its op-tree node),

In the past, this has been a mistake, because it breaks the identity
of closures.  For example, with your proposal, the following code,
which works now, will no longer work at all:

%a = ...;
%b = ...;

sub make_iterator {
  my $hashref = shift;
  return sub { each %$hashref }
}

my $a_iterator = make_iterator(\%a);
my $b_iterator = make_iterator(\%b);

for (1 .. 100) { 
  push @a, $a_iterator->();
  push @b, $b_iterator->();  
}

We want to get the data from %a into @a, and the data from %b into @b.
With your proposal, this code must fail.  The most likely failure mode
is that you get 100 copies of %a's first key and value in @a, and 100
copies of %b's first key and value in @b.
  
The code fails because you said to attach the iterator state to the op
node, and there is only a single op node here.  Unless that op node
has room for an arbitrarily large number of states, the call to
$b_iterator->() is going to destroy the iterator information that was
saved during the call to $a_iterator->().

The solution to this is that the iterator state should be stored in
the pad for the block in which the each() appears.  The op node can
hold the index of this pad element.  Since the two closures do not
share pads, the code will continue to work.
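
The per-closure behavior can be imitated today at user level.  This
is a sketch only, not the internals change described above: it
replaces each() with a cursor held in a lexical that the closure
captures, so every iterator has private state.  The sorted key
snapshot is an assumption made to keep the example deterministic.

```perl
use strict;
use warnings;

# Hypothetical variant of make_iterator: the cursor ($i) is a lexical
# captured by the closure, so every iterator has private state.
sub make_iterator {
    my $hashref = shift;
    my @keys = sort keys %$hashref;   # snapshot of the keys, in a fixed order
    my $i = 0;
    return sub {
        return if $i >= @keys;        # exhausted: return an empty list
        my $key = $keys[$i++];
        return ($key, $hashref->{$key});
    };
}

my %h = (one => 1, two => 2, three => 3);
my $outer = make_iterator(\%h);
while (my ($k1) = $outer->()) {
    my $inner = make_iterator(\%h);   # nested iteration over the same hash
    while (my ($k2) = $inner->()) {
        print "$k1-$k2\n";
    }
}
```

Because the two iterators do not share a cursor, the nested loop
visits every pair of keys.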

So your proposal can be saved, but it needs to be fixed.

Mark-Jason Dominus   [EMAIL PROTECTED]
I am boycotting Amazon. See http://www.plover.com/~mjd/amazon.html for details.





Re: 'eval' odd thought

2000-09-18 Thread Mark-Jason Dominus


Bart Lateur:
> If your P5->P6 translator is slow, i.e. written
> in Perl, this would imply a pretty big performance hit. 

It is better for translated programs to do the right thing slowly than
to do the wrong thing as quickly as possible.

> What would help is a debugging mode that prints out the
> converted code string,

This is such a debugging mode.  

The translated script will run properly and do the right thing without
intervention, although it may run slowly.  The programmer can modify
the perl5_eval code so that it issues a warning or calls 'die' or
whatever.

The programmer can also examine the occurrences of 'perl5_eval' in the
source code to decide whether they can be converted immediately to
plain 'eval'.

Once all the appearances of perl5_eval have been replaced, the
translator is no longer needed.
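
A rough sketch of the shape such a wrapper might take is below.  It
is written in Perl 5, and translate_5_to_6 is a placeholder that
returns its input unchanged, since no real translator exists here.
The $PERL5_EVAL_WARN switch is likewise invented for illustration.

```perl
use strict;
use warnings;

# Placeholder for the real translator, which does not exist here;
# this stub returns its input unchanged.
sub translate_5_to_6 {
    my ($source) = @_;
    return $source;
}

# Hypothetical switch: set to a true value to get a warning per call.
our $PERL5_EVAL_WARN = 0;

# Hypothetical wrapper: translate the string, optionally warn, then eval it.
sub perl5_eval {
    my ($source) = @_;
    warn "perl5_eval called with: $source\n" if $PERL5_EVAL_WARN;
    return eval(translate_5_to_6($source));
}

print perl5_eval('2 * 21'), "\n";
```

One limitation of doing this as an ordinary subroutine is that the
string is evaluated in the lexical scope of perl5_eval rather than
the caller's, so a real version would need some other mechanism to
see the caller's lexical variables.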






Re: 'eval' odd thought

2000-09-15 Thread Mark-Jason Dominus


> eval should stay eval.

Yes, and this is the way to do that.  

When you translate a script, the translator should translate things so
that they have the same meanings as they did before.  If it doesn't
also translate eval, then your Perl 5 scripts will be using the Perl 6
eval, which isn't what you wanted.




Re: RFC 226 (v1) Selective interpolation in single quotish context.

2000-09-14 Thread Mark-Jason Dominus


> > One could for example have a pragma to *really* tag variables
> > lexically to be expanded within singlequotes.  

Or a pragma that simply changes the semantics of q{...} so that it has
the proposed feature for the rest of the scope of the current block.




'eval' odd thought

2000-09-14 Thread Mark-Jason Dominus


The perl 5 -> perl 6 translator should replace calls to 'eval' with
calls to 'perl5_eval', which will recursively call the 5->6 translator
to translate the eval'ed string into perl 6, and will then eval the
result.





Re: RFC 226 (v1) Selective interpolation in single quotish context.

2000-09-14 Thread Mark-Jason Dominus


> seconded by Mark-Jason Dominus <[EMAIL PROTECTED]>

Except that I don't think adding this feature to the existing q{...}
is a good idea.  If I had to vote on your proposal, I would instantly
vote against it.  I think you should have invented a new operator or a
pragma or something.



Re: RFC 208 (v2) crypt() default salt

2000-09-14 Thread Mark-Jason Dominus


> =head1 TITLE
> 
> crypt() default salt
> 
> =head1 VERSION
> 
>   Maintainer: Mark Dominus <[EMAIL PROTECTED]>
>   Date: 11 Sep 2000
>   Last Modified: 13 Sep 2000
>   Mailing List: [EMAIL PROTECTED]
>   Number: 208
>   Version: 2
>   Status: Developing

If there are no objections, I will freeze this in twenty-four hours.



Re: Conversion of undef() to string user overridable for easy debugging

2000-09-13 Thread Mark-Jason Dominus


> This reminds me of a related but rather opposite desire I have had
> more than once: a quotish context that would be otherwise like q() but
> with some minimal extra typing I could mark a scalar or an array to be
> expanded as in qq(). 

I have wanted that also, although I don't remember why just now.  (I
think I have some notes somewhere about it.)  I will RFC it if you want.

Note that there's prior art here: It's like Lisp's backquote operator.




Re: types that fail to suck

2000-09-12 Thread Mark-Jason Dominus


> You talked about Good Typing at YAPC, but I missed it.  There's a
> discussion of typing on perl6-language.  Do you have notes or a
> redux of your talk available to inform this debate?

http://www.plover.com/~mjd/perl/yak/typing/TABLE_OF_CONTENTS.html
http://www.plover.com/~mjd/perl/yak/typing/typing.html

Executive summary of the talk:

1. Type checking in C and Pascal sucks.

2. Just because static type checking is a failure in C and Pascal
   doesn't mean you have to give up on the idea.

3. Languages like ML have powerful compile-time type checking that is
   successful beyond the wildest imaginings of people who suffered
   from Pascal.

4. It is probably impossible to get static, ML-like type checking into
   Perl without altering it beyond recognition.

5. However, Perl does have some type checking mechanisms, and more are
   coming up.


Maybe I should also mention that last week I had a dream in which I
had a brilliant idea for adding strong compile-time type checking to
Perl, but when I woke up I realized it wasn't going to work.





Re: RFC 105 (v1) Downgrade or remove "In string @ must be \@" error

2000-08-16 Thread Mark-Jason Dominus


This has already been done for Perl 5.6.1.  Here is what perldelta.pod
has to say.



=head2 Arrays now Always Interpolate Into Double-Quoted Strings

In double-quoted strings, arrays now interpolate, no matter what.  The
behavior in perl 5 was that arrays would interpolate into strings if
the array had been mentioned before the string was compiled, and
otherwise Perl would raise a fatal compile-time error.  In versions
5.000 through 5.003, the error was

Literal @example now requires backslash

In versions 5.004_01 through 5.6.0, the error was

In string, @example now must be written as \@example

The idea here was to get people into the habit of writing
C<"fred\@example.com"> when they wanted a literal C<@> sign, just as
they have always written C<"Give me back my \$5"> when they wanted a
literal C<$> sign.

Starting with 5.6.1, when Perl now sees an C<@> sign in a
double-quoted string, it I<always> attempts to interpolate an array,
regardless of whether or not the array has been used or declared
already.  The fatal error has been downgraded to an optional warning:

Array @example will be interpolated in string

This warns you that C<"fred@example.com"> is going to turn into
C<fred.com> if you don't backslash the C<@>.

See L<http://www.plover.com/~mjd/perl/at-error.html> for more details
about the history here.








Re: Deep copy

2000-08-07 Thread Mark-Jason Dominus


Lisp, which you might expect would have a 'deep copy' operator,
doesn't have one.  The Lisp folks have apparently thought about this
very carefully, and decided that the semantics are unclear, and that
the obvious options are all wrong; I've read a number of articles
about this in the past.  

I don't remember all the details, unfortunately.  But I think it's
worth paying attention to prior art where there is some.  This article

http://world.std.com/~pitman/PS/EQUAL.html

discusses this in some detail.  I haven't thought about this in the
context of Perl yet, so I'm not sure if all the reasons apply.   Also
if you do a Deja search in comp.lang.lisp for the phrase "deep copy",
you'll find an extensive discussion of why it doesn't make sense, at
least in Lisp.
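
A small Perl sketch of why the semantics are not obvious: a one-level
copy and a recursive copy behave differently as soon as the structure
contains references, and a structure whose slots share data raises
the further question of whether the copy should share too.  The
helper name copies_of is hypothetical; dclone comes from the Storable
module.

```perl
use strict;
use warnings;
use Storable qw(dclone);

# Return two copies for comparison: a one-level copy and a recursive copy.
sub copies_of {
    my ($original) = @_;
    my %shallow = %$original;          # copies top-level slots only
    my $deep    = dclone($original);   # recursively copies nested data
    return (\%shallow, $deep);
}

my $inner = [1, 2, 3];
my %data  = (left => $inner, right => $inner);   # both slots share one array

my ($shallow, $deep) = copies_of(\%data);
push @{ $data{left} }, 4;

print scalar @{ $shallow->{left} }, "\n";   # still points at the original's array
print scalar @{ $deep->{left} },    "\n";   # has its own copy of the array
print $deep->{left} == $deep->{right} ? "shared\n" : "separate\n";
```

The last line reports what one particular implementation chose for
the shared slots; a different application might reasonably want the
other answer.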

I'll also note that the same problem comes up when comparing for
equality; if you want a deep copy operator, you should also want a
deep compare operator.  But Lisp has not one but *five* equality
comparison operators, and part of the proliferation is for the reason
that the 'deepness' of the desired comparison varies from application
to application.  Perl has two equality comparison operators and people
already complain that that is too many.





RFC17

2000-08-05 Thread Mark-Jason Dominus


I don't want to join the discussion in general, and I'm not on the
language list.  So this is a one-shot manifesto.

I agree with the goal of RFC17:

Organization and Rationalization of Perl State Variables

but I think the implementation ideas are making a terrible mistake.
Specifically:

> =head1 IMPLEMENTATION
> =head3 Well-Named Global Hashes And Keys

I think if there's one thing we have learned (or should have learned)
from Perl 5, it's that this sort of global state variable is a
terrible idea regardless of what its name is.

Why is $* deprecated?  Because it's dangerous.  

Why is $* dangerous?

Because some function eleven calls down in some module you never heard
of that was loaded by some other module might do this:

$* = 1;

which suddenly changes the semantics of *every* regex match in your
*entire* program.

Conclusion:  The real problem with $* isn't the name (although the
name is nasty.)  The real problem is that it's a global variable with
a global effect, and it changes the meaning of code that is far away.

RFC17 fixes the little problem and leaves the big problem gaping and
festering.

But OK, $* is deprecated, so we can assume that it won't be in Perl 6.
Maybe the real problem has gone away?  No.  RFC17 specifically
mentions $^W, which has exactly the same problem.  Some function
eleven calls down in some module that was loaded by some other module
might do

$^W = 1;

(or $PERL::CORE{warnings} = 1 if you prefer) and suddenly change the
warning behavior of *every* part of your *entire* program.  If $* were
not a global, it would be at worst an odd wart, and possibly even a
convenience.  Because it is a global, it is a dangerous hazard.  $^W
is similar.

$/ is a necessary evil that must be carefully used because it is a
global.  If you set $/ and forget to localize the change, the rest of
the program blows up in a bizarre way because *every* filehandle read
operation in *every* part of the program changes behavior.  If $/ were
per-filehandle, or if it were lexically scoped, it would be an
unmitigated advantage.
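
For reference, the careful usage looks something like the sketch
below: the change to $/ is confined with local, so it is undone when
the subroutine returns.  The name slurp and the in-memory filehandle
are just conveniences for the example.

```perl
use strict;
use warnings;

# Read the rest of a filehandle in one go; the change to $/ is undone
# automatically when the sub returns.
sub slurp {
    my ($fh) = @_;
    local $/;            # undef means "no record separator" within this scope
    return scalar <$fh>;
}

my $data = "alpha\nbeta\n";
open my $fh, '<', \$data or die "cannot open in-memory file: $!";
my $text = slurp($fh);
print length($text), " characters read; \$/ is still a newline\n" if $/ eq "\n";
```

Note that local is dynamic, not lexical: any function called from
inside slurp would still see the changed $/, which is the hazard
described above.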

Similarly for each of $\ $, $" $; $# $* $= $^L $^A $@ $^I $^P $^R $^W
and especially the putrid $[.

$| $^ $~ are less problematic because they are per-filehandle.

$. $% $` $& $' $- $+ @+ @- $? $! $^E $$ $[ $^O $^R $^T $^V are less
problematic because they are read-only.  Some, like $., are still
problematic, because, for example:

$line = <FH>;
subr();
print "Line $. is $line";

might work, or it might not.  
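
Here is a sketch of one way it can fail.  If the called subroutine
reads from some other filehandle, $. afterwards refers to that
handle.  The function name line_counter_demo and the in-memory
filehandles are assumptions for the example; input_line_number is the
per-handle method from IO::Handle.

```perl
use strict;
use warnings;
use IO::Handle;

# Returns ($. as seen after the call, the line number recorded on $in itself).
sub line_counter_demo {
    my $main_data  = "first\nsecond\n";
    my $other_data = "x\ny\nz\n";
    open my $in,    '<', \$main_data  or die "open failed: $!";
    open my $other, '<', \$other_data or die "open failed: $!";

    my $line = <$in>;                        # $. now refers to $in
    my $subr = sub { my @all = <$other> };   # stand-in for subr()
    $subr->();                               # $. now refers to $other

    my $seen = $.;                           # copy before asking the handle
    return ($seen, $in->input_line_number);
}

my ($global, $per_handle) = line_counter_demo();
print "\$. says $global, but the handle says $per_handle\n";
```

The per-handle counter is unaffected, which is the kind of behavior
argued for above.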

$0 $< $> $( $) $^C $^D $^F $^H $^M @ARGV are not problems because they
really are global.  Each process has one real UID, and only one, so a
global variable for $< is perfectly OK.

$_ is in a class by itself.  

Summary of manifesto: Global variables must be expunged.

Replacing the old rotten global variables with new rotten global
variables is not enough of an improvement.


