Re: RFC 283 (v1) tr/// in array context should return a histogram

2000-09-26 Thread Paris Sinclair

On Mon, 25 Sep 2000, Simon Cozens wrote:

 On Mon, Sep 25, 2000 at 09:55:38AM +0100, Richard Proctor wrote:
  While this may be a fun thing to do - why?  what is the application?
 
 I think I said in the RFC, didn't I? It's extending the counting use of tr///
 to allow you to count several different letters at once. For instance, letter
 frequencies in text is an important metric for linguists, codebreakers and
 others; think about how you'd get letter frequency from a string:
 
 $as = $string =~ tr/a//;
 $bs = $string =~ tr/b//;
 $cs = $string =~ tr/c//;
 ...
 $zs = $string =~ tr/z//;
 
 Ugh.
 
 (%alphabet) = $string =~ tr/a-z//;
 
 Yum.

also a little more concise (and certainly more efficient...) than

%alphabet = map { $_ => eval "\$string =~ tr/$_//" } ('a'..'z');
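
As a rough illustration of the comparison above, here is a minimal Perl 5 sketch of the eval-based workaround next to a single-pass loop that builds the same hash. The sub names histogram_by_eval and histogram_single_pass are hypothetical, and neither is the RFC's proposed syntax; the eval version also assumes every character passed in is safe to interpolate into tr///.

```perl
use strict;
use warnings;

# Workaround from the message: one string-eval'd tr/// per character.
# Assumes each character is safe to interpolate into tr/// (e.g. a..z).
sub histogram_by_eval {
    my ($string, @chars) = @_;
    return map { $_ => eval "\$string =~ tr/$_//" } @chars;
}

# Single pass over the string, counting only the requested characters.
sub histogram_single_pass {
    my ($string, @chars) = @_;
    my %count = map { $_ => 0 } @chars;
    for my $char (split //, $string) {
        $count{$char}++ if exists $count{$char};
    }
    return %count;
}

my %alphabet = histogram_single_pass("banana bread", 'a' .. 'z');
print "$_: $alphabet{$_}\n" for grep { $alphabet{$_} } sort keys %alphabet;
```

Both subs return a flat key/value list, so either can be assigned straight to a hash.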

Context is beautiful; it's at least 50% of the reason I love Perl. I would
love to see it extended here.

Paris Sinclair|4a75737420416e6f74686572
[EMAIL PROTECTED]|205065726c204861636b6572
www.sinclairinternetwork.com




Re: RFC 283 (v1) tr/// in array context should return a histogram

2000-09-26 Thread Paris Sinclair

On Tue, 26 Sep 2000, Bennett Todd wrote:

 2000-09-26-05:18:57 Paris Sinclair:
   (%alphabet) = $string =~ tr/a-z//;
  
  also a little more concise (and certainly more efficient...) than
  
  %alphabet = map { $_ => eval "\$string =~ tr/$_//" } ('a'..'z');
 
 However, compared to say
 
   $hist[ord($_)]++ for split //, $string;
 
 the performance edge might not be quite so dramatic. Then again,
 maybe it would be, I dunno.

But would that technique work with Unicode? What if I am just counting some 
Bulgarian characters? Most encodings put these in the extended ASCII
range. Making an array of 250 items for a count of 5 items isn't going to
be more efficient. Also, it requires jumping through more hoops, and doing
more conversions, to figure out which index is which letter. A table could
be built, but if it maps to an array index, based on ord(), then I
couldn't support both KOI-8 and windows cyrillic encodings in the same
@hist structure. Using a hash, the only limits are the more general
language supports in Perl, and I can still convert and store KOI8 and
cp1251, and store the results without needing to know which coding it
originated in; only needing to have a symbol for the character.
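
To make the hash argument concrete, here is a small sketch assuming a later Perl 5 (5.8 or newer) with the core Encode module, which postdates this thread. The sub name add_to_histogram is hypothetical. Bytes from KOI8-R and cp1251 are decoded to characters first, so both inputs land in the same hash under the same keys.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Decode a byte string from the named encoding and count its characters
# into the shared hash. Keys are characters, not byte values.
sub add_to_histogram {
    my ($hist, $encoding, $bytes) = @_;
    $hist->{$_}++ for split //, decode($encoding, $bytes);
    return $hist;
}

my %hist;
# U+0430, U+0431, U+0430 as KOI8-R bytes
add_to_histogram(\%hist, 'koi8-r', "\xC1\xC2\xC1");
# U+0430, U+0431 as Windows-1251 bytes
add_to_histogram(\%hist, 'cp1251', "\xE0\xE1");

printf "U+%04X: %d\n", ord($_), $hist{$_} for sort keys %hist;
```

An array indexed by the raw byte value would have filed the same letter under two different indices here (0xC1 and 0xE0).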

There seem to be lots of beneficial side effects of extending context,
that allow for general solutions that are much more powerful than any of
the specific solutions.

Paris Sinclair|4a75737420416e6f74686572
[EMAIL PROTECTED]|205065726c204861636b6572
www.sinclairinternetwork.com




Re: RFC 283 (v1) tr/// in array context should return a histogram

2000-09-26 Thread Paris Sinclair

On Tue, 26 Sep 2000, Bennett Todd wrote:

 Yup, I'm a sick little monkey who truly doesn't care about anything
 other than US-ASCII

Please keep your fetishes and/or geocentricism to yourself. There is no
need to propose that others should share them. If Perl is going to exist
into the future, if Perl is going to be a great programming language for
Humans, then it needs to support the different ways that Humans
communicate.

It's doing a better job at it all the time. Extending the context of
tr/// is an excellent general solution to many problems, in many
languages. While it has been suggested that tr/// isn't for
counting... well, the p5 manual says it IS for counting, among other
things. If it is a general language tool that makes counting easy as a
side effect, this is wonderful. And if making it a more general tool by
extending its context makes it even better for counting, who does this
hurt? There are certainly those of us it would help.

And yes, a list of 250 items to store 5 items is HUGE. There is no way to
know how many items I will have. O(N*50) is never going to make me
happy. Which is why right now I would have to use a funky map
and eval. Or a map and a match and an index, but that's a lot of
frivolous temp variables.

Paris Sinclair|4a75737420416e6f74686572
[EMAIL PROTECTED]|205065726c204861636b6572
www.sinclairinternetwork.com




Re: RFC 283 (v1) tr/// in array context should return a histogram

2000-09-26 Thread Paris Sinclair

On Tue, 26 Sep 2000, Bennett Todd wrote:
 That sounds positively noble when you put it that way. I can
 actually hear choirs of cherubim providing atmosphere.

I heard them also, but I thought it was the radio.

  And yes, a list of 250 items to store 5 items is HUGE. There is no way to
  know how many items I will have.
 
 Yup, but as long as you're working with 8-bit encodings the array
 will never get bigger than 256.

Who says I'm working with 8-bit encodings?!
Perl5 already has rudimentary support for multibyte encodings. So far I
haven't used them, but this is only because I'm dealing with my multibyte
input as binary data, and just passing it along. Presumably I will want
to make some sanity checks once I learn enough to know what I'm checking
for.

When the Martians come out of hiding, we're going to have to add 13bit
fonts, so maybe we should keep our arbitrary character restrictions in the
core, in just one place, to make it easier to accommodate this inevitable
circumstance. If we make everything else general enough, we will be able
to meet their demands quicker, saving the world and bringing a new age of
prosperity to humankind.
 
  O(N*50) is never going to make me happy.
 
 O(1) should make you happy. It's got a small fixed upper bound.
 Unless, of course, split// and ord get interesting in the face of
 UTF-32 or something and the data is no longer bounded, in which case
 (as I said) your only hope is to change the [] to {}, at which point
 it's probably as fast as the hyper-sexy hash-building-tr///.

A "small" fixed upper bound? It is N that is bounded, that doesn't stop it
from using N*50 variables to represent N, or N*150 variables if I'm only
matching vs 2 characters. Perhaps instead of using O() I should have just
said, "it is 0 to 150 times slower." The overall algorithm, that is, I am
assuming that this list is going to be iterated over. Making this monster
list would add inefficiencies to each step in the algorithm. In any case,
that solution doesn't seem to work, because of its reliance on an
arbitrary set of conditions that are smaller than the conditions in the
problem domain. What's the upper bound in a 16bit language? Or does that
case just have to break? "Sorry, you're not European. Please be
assimilated before using this tool. Resistance is futile."
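
A quick way to see the size difference being argued about is the sketch below. It assumes a Perl 5 with Unicode string support (5.8 or newer), counts the same three-character Cyrillic string both ways, and compares how many slots each structure ends up holding.

```perl
use strict;
use warnings;

# Three characters, two distinct: U+0430, U+0431, U+0430.
my $string = "\x{430}\x{431}\x{430}";

my (@hist, %hist);
for my $char (split //, $string) {
    $hist[ ord $char ]++;   # array indexed by code point
    $hist{$char}++;         # hash keyed by the character itself
}

# The array must extend to the highest code point seen;
# the hash holds one entry per distinct character.
my $array_slots = scalar @hist;
my $hash_keys   = scalar keys %hist;

print "array slots: $array_slots, hash keys: $hash_keys\n";
```

Any later loop over @hist has to skip all the empty slots, while a loop over %hist visits only the characters that actually occurred.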

 What's all this about eval?

That was in reference to my previous map example, which is the best
general way I've seen proposed to handle the specified counting in p5.
Ugly as it is, there is hopefully a better way, but not one that is
obvious (to me). But given the changes proposed in RFC 283, it would not
only be easier, it would be more efficient, and fully compatible with
whatever character encodings Perl supports, now and into the future.

Paris Sinclair|4a75737420416e6f74686572
[EMAIL PROTECTED]|205065726c204861636b6572
www.sinclairinternetwork.com




Re: RFC 283 (v1) tr/// in array context should return a histogram

2000-09-26 Thread Paris Sinclair

On Tue, 26 Sep 2000, Bennett Todd wrote:

  What's the upper bound in a 16bit language? Or does that case just
  have to break? "Sorry, you're not European. Please be assimilated
  before using this tool. Resistance is futile."
 
 Lordie lordie lordie, you're one of the persecuted minority, and
 a brand-waving rioter too. I've clearly stepped on a corn, not to
 mention picked the wrong person to persecute. I'll go speak english
 to other bigots who only speak english, and leave the future of the
 civilized universe in your responsible hands.

That's really ridiculous. How do you know if I'm a minority? Mandarin is
the majority language, and it doesn't use 8bits. Not to say I speak
Mandarin. But, if you have to make assumptions about me to disagree with
my points, then it proves that your argument is flawed. And, if my being
or not being a minority is something that would affect the value of my
position, then you are even more dangerous than I had suspected.

As for a rioter, that is funny. I am not rioting, I am giving arguments in
support of an RFC. Am I "rioting" because I disagree with you?

Paris Sinclair|4a75737420416e6f74686572
[EMAIL PROTECTED]|205065726c204861636b6572
www.sinclairinternetwork.com




Re: RFC 283 (v1) tr/// in array context should return a histogram

2000-09-26 Thread Paris Sinclair

Could you please start from the assumption that we're all interested in
supporting the full Unicode space to the greatest degree possible?  None
of us are trying to force an ASCII-only alphabet on anyone (although some
of us are interested in keeping ASCII-only operations fast and efficient
since that's most of what we do).

I will start with no assumption. If my claim that what was said wasn't
compatible with Unicode and other encodings is false, pointing that out
would be more constructive than telling me to start making assumptions.

  And, if my being or not being a minority is something that would affect
  the value of my position, then you are even more dangerous than I had
  suspected.
 
 Comments like this don't help the discussion any.

Oh, I see, the problem isn't

   you're one of the persecuted minority

after all.

What a bunch of hogwash.

You don't like my comments? That is fine with me. I am only a user, and
you are something-or-other, and so you have the market cornered on the
right to be offended.

But as soon as a person labels me a minority, and implies that because I
have been labeled such that I am a rioter, and that my opinions are based
upon this label, then your choices are to filter me, or to listen to me
protest.

Yes, my aggressiveness is probably annoying to some people, just as
passive-aggressive sarcasm is annoying to me. I am sorry that this is the
case.

Anyhow, I will not bother you anymore.




RE: PERL6STORM - tchrist's brainstorm list for perl6

2000-09-22 Thread Paris Sinclair

   while (<FH>) {
   s/^M$//;
   # Process $_
   }

Cute pseudocode.

I don't like CRLF at all, it makes me feel like I'm dealing with a
typewriter. But, giving multiple values to $/ seems more painful to me
than to just

tr/\r//d; 

on any suspected M$ strings. I guess not always M$... the chess server I
have to deal with likes to spit out that trash, and it's unix based...
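
For what it's worth, the by-hand approach can be sketched like this. The sub name read_lines_without_cr is hypothetical, and the in-memory filehandle is only there to keep the example self-contained (it needs Perl 5.8 or newer).

```perl
use strict;
use warnings;

# Read lines with the default $/ ("\n"), then delete any carriage
# returns by hand, as described above.
sub read_lines_without_cr {
    my ($fh) = @_;
    my @lines;
    while (my $line = <$fh>) {
        chomp $line;         # drop the trailing "\n"
        $line =~ tr/\r//d;   # delete every "\r" in the line
        push @lines, $line;
    }
    return @lines;
}

# Mixed line endings, as a CRLF-happy server might send them.
my $data = "first\r\nsecond\nthird\r\n";
open my $fh, '<', \$data or die "cannot open in-memory handle: $!";
my @lines = read_lines_without_cr($fh);
close $fh;

print "[$_]\n" for @lines;
```

Note that tr/\r//d removes carriage returns anywhere in the line, not only at the end; s/\r$// is the narrower choice if a mid-line "\r" should survive.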

but of course this is covered by RFC 69.

what frightens me is the potential here to make things a lot worse. It's
bad enough I have to translate out the "carriage returns" by hand, but I
don't want to have to start worrying about when I need to add them back in
when I didn't want to take them out in the first place.

Paris Sinclair|4a75737420416e6f74686572
[EMAIL PROTECTED]|205065726c204861636b6572
www.sinclairinternetwork.com




Re: RFC 263 (v1) Add null() keyword and fundamental data type

2000-09-21 Thread Paris Sinclair

On Thu, 21 Sep 2000, Tom Christiansen wrote:

 A null is a null byte, or a null character.  Period.  You are
 completely out of your mind if you expect to co-opt an extant term
 for this screwed up notion of yours.  I place my faith in Larry 
 not to fuck up the language with your insanity.
 
 --tom

I've got your null right here...

:0
* ^From: Tom Christiansen
/dev/null

Can't we all just play nice?

Paris Sinclair|4a75737420416e6f74686572
[EMAIL PROTECTED]|205065726c204861636b6572
www.sinclairinternetwork.com




Re: RFC 263 (v1) Add null() keyword and fundamental data type

2000-09-20 Thread Paris Sinclair

All this talk about adding another undef, called null, that is just a
different logical and semantical version of "not defined," or "not
known," or however you want to say it, strikes me as very odd.

I admit I am new enough to Perl that 5 was my first version, but still...
it seems better to make the new things we add consistent with the Perlish
ways, than to make the new behaviors mimic other languages. If you are
saying that it is needed to help give clarity to users of that language,
in their early stages of migrating to Perl, that is one thing. But, there
are lots of changes to basic behavior that would assist the serial drivers
I've written in Perl, but they would be awful things to add into the core.
Better is to make a module around the special cases that I want to
simplify. undef is wonderful; undef is great! All hail the great undef! If
you need semantics beyond those provided by undef, why not make a
module? CPAN is the biggest strength of Perl, I don't think it would be
good use to start dumping our special cases into the core. Can't we make a
tool to make tools, instead of just making another SQL?

use MyModule qw( null some_odd_combination_of_behaviors );
my $name = null();
print "Hello world!" if some_odd_combination_of_behaviors();

or,

use MyModule;
my $obj = new MyModule;
print "Hello world!" if $obj->unknown();

I understand the differences between SQL NULL and Perl undef, I just don't
understand what de facto general problem is solved by adding it.
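
Just to show the module route is workable, here is one possible sketch and not a real CPAN module. MyModule, null(), is_null() and null_eq() are all hypothetical names, and the "comparison with null yields null" rule is only an assumption about what SQL-style semantics someone might want.

```perl
package MyModule;
use strict;
use warnings;
use Scalar::Util qw(blessed);

# A single shared object stands in for "null", distinct from undef.
my $NULL = bless \(my $placeholder = ''), 'MyModule::Null';

sub null { return $NULL }

# True only for the null object; undef and ordinary values are not null.
sub is_null {
    my ($value) = @_;
    return blessed($value) && $value->isa('MyModule::Null') ? 1 : 0;
}

# SQL-like equality: comparing anything with null yields null.
sub null_eq {
    my ($left, $right) = @_;
    return null() if is_null($left) || is_null($right);
    return $left eq $right ? 1 : 0;
}

package main;

my $name = MyModule::null();
print "name is unknown\n" if MyModule::is_null($name);
```

Anything beyond this (overloaded operators, three-valued logic) can live in the same module without touching undef or the core.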

Paris Sinclair|4a75737420416e6f74686572
[EMAIL PROTECTED]|205065726c204861636b6572
www.sinclairinternetwork.com