Re: UTF-8 and Unicode FAQ, demos

2002-11-04 Thread Ken Fox
Damian Conway wrote:

Larry Wall wrote:

That suggests to me that the circumlocution could be <<*>>.

A five character multiple symbol??? I guess that's the penalty for not
upgrading to something that can handle unicode.


Unless this is subtle humor, the Huffman encoding idea is getting
seriously out of hand. That 5 char ASCII sequence is *identically*
encoded when read by the human eye. Humans can probably type the 5
char sequence faster too. How does Unicode win here?

I know I'm just another sample point in a sea of samples, but
my embedded symbol parser seems optimized for alphabetic symbols.
The cool non-alphabetic Unicode symbols are beautiful to look at,
but they don't help me read or write faster. There are rare
exceptions (like grouping) where I strongly prefer non-alphabetics,
but otherwise alphabetics help me get past the "what is this code?"
phase and into the "what does this code do?" phase as quickly as
possible.

(I just noticed that all the non-alphabetic symbols (except '?')
in the previous paragraph are used for grouping. Weird.)

- Ken




RE: UTF-8 and Unicode FAQ, demos

2002-11-04 Thread Garrett Goebel
Ken Fox wrote:
 Damian Conway wrote:
  Larry Wall wrote:
  That suggests to me that the circumlocution could be <<*>>.
  
  A five character multiple symbol??? I guess that's the 
  penalty for not upgrading to something that can handle
  unicode.
 
 Unless this is subtle humor, the Huffman encoding idea is
 getting seriously out of hand. That 5 char ASCII sequence
 is *identically* encoded when read by the human eye. Humans
 can probably type the 5 char sequence faster too. How does
 Unicode win here?
 
 I know I'm just another sample point in a sea of samples

Can't we have our cake and eat it too? Give ASCII digraph or trigraph
alternatives for the incoming tide of Perl6 Unicode?

Allow both <<*>> and »*«?

Or something similar '*', [*], etc...
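As a rough illustration of the digraph idea, a tokenizer could rewrite an ASCII spelling such as `<<*>>` into `»*«` before parsing. This Python sketch is hypothetical (`normalize_hyper` and its operator-character set are invented for the example) and ignores every context where `<<` already means something else, such as Perl 5 heredocs:

```python
import re

# Hypothetical set of operator characters allowed between << and >>.
OP_CHARS = r"[-+*/%.~^&|]+"

# Matches an ASCII spelling such as <<*>> or <<+>> and captures the operator.
ASCII_HYPER = re.compile(r"<<(" + OP_CHARS + r")>>")

def normalize_hyper(src: str) -> str:
    """Rewrite every <<op>> into the Latin-1 form »op«."""
    return ASCII_HYPER.sub(lambda m: "\u00bb" + m.group(1) + "\u00ab", src)

print(normalize_hyper("@a <<*>> @b"))  # @a »*« @b
```

The substitution is the easy part. Deciding when `<<` is a hyper marker and when it is not requires far more context than a regex like this has.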

--
Garrett Goebel
IS Development Specialist

ScriptPro   Direct: 913.403.5261
5828 Reeds Road   Main: 913.384.1008
Mission, KS 66202  Fax: 913.384.2180
www.scriptpro.com  [EMAIL PROTECTED]



RE: UTF-8 and Unicode FAQ, demos

2002-11-04 Thread Brent Dax
Garrett Goebel:
# Ken Fox wrote:
#  Unless this is subtle humor, the Huffman encoding idea is getting 
#  seriously out of hand. That 5 char ASCII sequence is *identically* 
#  encoded when read by the human eye. Humans can probably type the 5 
#  char sequence faster too. How does Unicode win here?
# 
# Can't we have our cake and eat it too? Give ASCII digraph or 
# trigraph alternatives for the incoming tide of Perl6 Unicode?

The Unicode version is more typing than the non-Unicode version, so
what's the advantage?  It's prettier?

--Brent Dax [EMAIL PROTECTED]
@roles=map {Parrot $_} qw(embedding regexen Configure)

Wire telegraph is a kind of a very, very long cat. You pull his tail in
New York and his head is meowing in Los Angeles. And radio operates
exactly the same way. The only difference is that there is no cat.
--Albert Einstein (explaining radio)




How to set your Windows keyboard to ¶erl-mode

2002-11-04 Thread Austin Hastings
This ¶ is a pilcrow, which shows up for me as one of those
paragraph-sign looking backwards P's with two vertical bars. Sorry if
it doesn't come out for you.

--- Brent Dax [EMAIL PROTECTED] wrote:
 
 The Unicode version is more typing than the non-Unicode version, so
 what's the advantage?  It's prettier?

If you're a Mac user, you already know from the myriad responses
pointing at option-whatever how to generate the sequences.

If you're a PC/DOS user, you deserve whatever you get. But your
text-editor may support macros. I used Multi-Edit when I was DOSsing,
and it did. Ask your manufacturer.

If you're using OS/2, I'm sorry. The 850 codepage supports the
characters « and », but I don't know how to help you generate them
other than alt-123.

If you're using X, and you don't already know how to generate these
characters, RTFM: xmodmap. If you can't make it work, your license to
use Linux will be revoked. Nobody who can't use xmodmap should be
allowed to own a keyboard.

If you're using Windows, specific reference is made to the following
URL:

http://support.microsoft.com/default.aspx?scid=kb;en-us;Q306560

I quote (using alt-[ and alt-], which now do « and » for me,
respectively):

««
Adding the United States-International Keyboard Layout

To add the United States-International keyboard layout, follow these
steps: 

Click Start, and then click Control Panel.

Under Pick a category, click Date, Time, Language, and Regional
Options.

The Regional and Language Options dialog box appears.

On the Languages tab, click Details.

The Text Services and Input Languages dialog box appears.

Under Installed services, click Add.

The Add Input language dialog box appears.

In the Input language list, click the language that you want. For
example, English (United States).

NOTE: When you use the United States-International keyboard layout, you
should also use an English language setting.

In the Keyboard layout/IME list, click United States-International, and
then click OK.

In the Select one of the installed input languages to use when you
start your computer list, click Language name - United
States-International (where Language name is the language that you
selected in step 6), and then click OK.

In the Regional and Language Options dialog box, click OK.

Notice that the Language bar appears on the taskbar. When you position
the mouse pointer over it, a ToolTip appears that describes the active
keyboard layout. For example, United States-International.

Click the Language bar, and then click United States-International on
the shortcut menu that appears.

The United States-International keyboard layout is selected.
»»

NOW:

At this point, Meestaire ISO-phobic Amairecain Programmaire, you have
achieved keyboard parity with the average Swiss six-year-old child.
Don't let this happen again. We can't afford a keyboard-gap!

Yes, living below the earth in salt mines, with a properly chosen ratio
of nubile females to fertile males, say ten for every one, it will be
possible to ...

erk. sorry. (*)

The URL goes on to list all the cool keys you can generate, but two
things:

1- You've probably got a little [EN] icon in your taskbar. Make sure to
switch to the international flavor (either using leftALT-Shift in a
window, or by mousing the [EN] taskbar icon) before pounding away at
your new ¶erlified keyboard.

2- The IME only uses rightALT for key composition. Those of you whose
right thumbs have been amputated in freak mine accidents will have to
pursue more accessibility features.

=áµßþíñ (International Man of Mystery)

* -- Speaking of Dr. Strangelove, happy 50th birthday to Mike.

November 1, 1952, saw the detonation of Mike, the world's first
hydrogen bomb. Elugelab, the Pacific island on which Mike exploded,
was erased by the blast. When told that Elugelab was 'missing',
America's president-elect, Dwight Eisenhower, visibly paled.
(Economist)

Here's to 50 years of unemployment. So far.




__
Do you Yahoo!?
HotJobs - Search new jobs daily now
http://hotjobs.yahoo.com/



Re: UTF-8 and Unicode FAQ, demos

2002-11-04 Thread Michael Lazzaro

On Monday, November 4, 2002, at 08:55  AM, Brent Dax wrote:

# Can't we have our cake and eat it too? Give ASCII digraph or
# trigraph alternatives for the incoming tide of Perl6 Unicode?

The Unicode version is more typing than the non-Unicode version, so
what's the advantage?  It's prettier?


Well, yes!  :-)... but also because they are unique characters compared 
to all the other existing prefix/postfix/binary/quotelike operators, so 
there's pretty much zero chance of ambiguity.  Using just a few Unicode 
symbols would seriously open up the range of possible sensible 
operators, without causing the kind of mind-numbing ambiguities and 
subtle no-not-this-I-mean-that we've seen in the whole xor/hyper 
discussions.

UTF-8 «op» representations have the advantage of trivially not 
conflicting with _any_ existing operators, and being visually distinct 
from all of them.  There may be a few other things in 
easy-to-find-and-type Latin1, like one or two of these:

• ≈ ∫ ∆ ® © § ∑ Ω ∆ ¶ ‡ ± ˇ ¿

That could maybe fill in for ';' in the cases where ';' has been given 
a sneaky meaning, or represent some infrequent but terrifically useful 
unary or binary op, etc.
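Not every character in that list is actually Latin-1. This quick Python check, offered as an illustration rather than as part of the original post, splits the list according to whether each character fits in ISO 8859-1, which covers only U+0000 through U+00FF:

```python
# Symbols from the list above; Latin-1 (ISO 8859-1) covers U+0000..U+00FF.
symbols = "• ≈ ∫ ∆ ® © § ∑ Ω ∆ ¶ ‡ ± ˇ ¿".split()

def in_latin1(ch: str) -> bool:
    """True if the character can be encoded as a single ISO 8859-1 byte."""
    try:
        ch.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False

latin1 = [s for s in symbols if in_latin1(s)]
other = [s for s in symbols if not in_latin1(s)]
print("Latin-1:", " ".join(latin1))  # Latin-1: ® © § ¶ ± ¿
print("Outside Latin-1:", " ".join(other))
```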

C'mon, everybody's doing it!  First one's free, kid...  ;-)

MikeL



Re: UTF-8 and Unicode FAQ, demos

2002-11-04 Thread Matthew Zimmerman
On Sun, Nov 03, 2002 at 09:41:44AM -, Rafael Garcia-Suarez wrote:
 Matthew Zimmerman wrote in perl.perl6.language :
 
  So let me make my original question a little more
  general: are Perl 6 source files encoded in Latin-1,
  UTF-8, or will Perl 6 provide some sort of translation
  mechanism, like specifying the charset on the command
  line?

 I expect probably something similar to Perl 5's encoding
 pragma. (But hopefully lexically scoped.)

Okay, but what will the default be? UTF-8? iso-8859-1? My
current locale? Am I going to have put

use encoding 'utf8';   # or whatever the P6 syntax will be

at the beginning of every program that might get distributed
outside of my home country to make sure it'll run? Are we 
going to tell newbies to make sure they have '-w' and 'use
strict' *and* 'use encoding' at the beginning of their programs?

I'm just worried about the possibility of writing Perl 6
programs and then sending them to friends in other parts
of the world and having them fail in subtle ways because
my Perl 6 expects 0xAB and theirs expects 0xC2AB (or vice
versa). Or if I post a code sample to CLPM that runs on
my machine that doesn't compile from the posting because
my news client automatically converts charsets. 
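The mismatch is easy to reproduce. In ISO 8859-1, the left guillemet « is the single byte 0xAB. In UTF-8, it is the two bytes 0xC2 0xAB. This Python sketch shows both encodings and what happens when the reader guesses wrong in either direction:

```python
left = "\u00ab"  # « LEFT-POINTING DOUBLE ANGLE QUOTATION MARK

latin1_bytes = left.encode("latin-1")  # b'\xab'
utf8_bytes = left.encode("utf-8")      # b'\xc2\xab'

# A UTF-8 file read as Latin-1: the lead byte 0xC2 shows up as an extra 'Â'.
misread = utf8_bytes.decode("latin-1")  # 'Â«'

# A Latin-1 file read as UTF-8: 0xAB is a continuation byte with no
# lead byte, so a strict decoder rejects it outright.
try:
    latin1_bytes.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

print(latin1_bytes, utf8_bytes, misread, strict_ok)
```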

Undoubtedly the Perl 6 parser will be smart enough to
figure out all of this, and I'm making a mountain out of a
molehill. But I just want to make sure that one of the
people in authority here either is or will be thinking
about this.
-- 
  Matt

  Matthew Zimmerman
  Interdisciplinary Biophysics, University of Virginia
  http://www.people.virginia.edu/~mdz4c/



Re: vectorization (union and intersection operators)

2002-11-04 Thread Ed Peschko
 I'm probably opening up a whole new can of worms here, but if we said
 that the following were both vector operators:

   ^ == intersection operator
   v == union operator

 then these could have potentially useful meanings on their *own* as set
 operators, as well as modifying other operators. For example:

 a = (1,2,3);
 b = (4,1,3);

 a = a ^ b; # a = (1,3);
 a = a v b; # a = (1,2,3,4);

Or is a = (1,2,3,4,1,3) ?

No... I'm assuming two things:

1) that v (union) uniquifies the elements in its array, as does ^.
2) that the v (union) modifier includes elements that exist in either
   the first set or the second set, and that ^ includes elements that exist
   in both sets.

These are pragmatic operators; I've done this type of thing *exceedingly* often.
The behaviour you describe is the concatenation of the two sets. I can get that
for free by saying push(a, b), or a = (a, b);
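Ed's two operators are only a proposal, and he does not specify the order in which results come out. This Python sketch assumes first-seen order, plus the duplicate removal he describes. It shows `^` (intersection), `v` (union), and the plain concatenation he contrasts them with. `union` and `intersection` are illustrative names:

```python
def union(a, b):
    """Elements in either list, duplicates removed, first-seen order kept."""
    seen = set()
    out = []
    for x in list(a) + list(b):
        if x not in seen:
            seen.add(x)
            out.append(x)
    return out

def intersection(a, b):
    """Elements of a that also occur in b, duplicates removed, a's order kept."""
    in_b = set(b)
    seen = set()
    out = []
    for x in a:
        if x in in_b and x not in seen:
            seen.add(x)
            out.append(x)
    return out

a = [1, 2, 3]
b = [4, 1, 3]
print(intersection(a, b))  # [1, 3]
print(union(a, b))         # [1, 2, 3, 4]
print(a + b)               # [1, 2, 3, 4, 1, 3]  (plain concatenation)
```

Under this reading, `a v= b` would simply mean `a = union(a, b)`.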

And since admittedly they are pragmatic, people are welcome to overload them.
But right now as it stands 'a v= b' is nonsense; it might as well be put 
to work.

Ed

( 
  ps - as an aside, are the apocalypses going to be backdated as changes to the
  design come up? Or are the apocalypses just a first draft for more enduring
  documentation?
)



Re: vectorization (union and intersection operators)

2002-11-04 Thread Damian Conway
Ed Peschko asked:


  ps - as an aside, are the apocalypses going to be backdated as changes to the
  design come up? 

Yes.



  Or are the apocalypses just a first draft for more enduring
  documentation?


Yes.

;-)

Damian




Re: UTF-8 and Unicode FAQ, demos

2002-11-04 Thread Larry Wall
On Mon, Nov 04, 2002 at 10:19:55AM -0800, Michael Lazzaro wrote:
 UTF-8 «op» representations have the advantage of trivially not 
 conflicting with _any_ existing operators, and being visually distinct 
 from all of them.  There may be a few other things in 
 easy-to-find-and-type Latin1, like one or two of these:
 
 • ≈ ∫ ∆ ® © § ∑ Ω ∆ ¶ ‡ ± ˇ ¿

I've actually got my eye on ≈ (U+2248 ALMOST EQUAL TO) as a
replacement for ~~ someday in the distant future.

I suppose it could be argued that we should use ≅ (U+2245
APPROXIMATELY EQUAL TO) instead.  That's what =~ was supposed to
represent, after all...
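The names and code points quoted above can be checked against the Unicode character database using Python's standard `unicodedata` module. Neither character is in Latin-1, and each takes three bytes in UTF-8:

```python
import unicodedata

# The two candidates mentioned above as a possible future replacement for ~~.
for ch in ("\u2248", "\u2245"):
    code = f"U+{ord(ch):04X}"
    print(code, unicodedata.name(ch), len(ch.encode("utf-8")), "bytes in UTF-8")
```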

 That could maybe fill in for ';' in the cases where ';' has been given 
 a sneaky meaning, or represent some infrequent but terrifically useful 
 unary or binary op, etc.

You know, separate streams in a for loop are not going to be that
common in practice, so maybe we should look around a little harder for
a supercomma that isn't a semicolon.  Now *that* would be a big step
in reducing ambiguity...

Even if we limit ourselves to Latin1 for now, there's things like
the broken pipe ¦ and logical not ¬ and such that look useful.
I'd avoid using standard signs like multiply × and divide ÷ for
non-standard purposes though.  (Not that we can exactly use multiply
even for its standard purpose--there's an awfully heavy resemblance
between × and x, at least in the typical sans serif font.)

It would be really funny to use cent ¢, pound £, or yen ¥ as a sigil, though...

 C'mon, everybody's doing it!  First one's free, kid...  ;-)

People who believe slippery slope arguments should never go skiing.

On the other hand, even the useful slippery slopes have beginner
slopes.  I think one advantage of using Unicode for advanced features
is that it *looks* scary.  So in general we should try to keep the
basic features in ASCII, and only use Unicode where there be dragons.

It will certainly be possible to write APL in Perl, but if you do,
you'll get what you deserve.

In fact, the problem with APL is not that it's possible to write APL
in it, but that it is impossible not to...  :-)

Larry



What is the order of evaluation for separate streams in a loop?

2002-11-04 Thread Austin Hastings
Something from [EMAIL PROTECTED] about the relative frequency made me
wonder:

What's the order of evaluation or nestedness for separate streams
in a for loop?

That is, can I meaningfully say:

for my $i; $j -> 0 .. @array.length - 1; $i + 1 .. @array.length
{
 ..
}

And get the equivalent of two nested loops?

=Austin




Re: UTF-8 and Unicode FAQ, demos

2002-11-04 Thread Larry Wall
On Mon, Nov 04, 2002 at 11:27:16AM -0800, Austin Hastings wrote:
 --- Matthew Zimmerman [EMAIL PROTECTED] wrote:
  On Sun, Nov 03, 2002 at 09:41:44AM -, Rafael Garcia-Suarez wrote:
   Matthew Zimmerman wrote in perl.perl6.language :
   
So let me make my original question a little more
general: are Perl 6 source files encoded in Latin-1,
UTF-8, or will Perl 6 provide some sort of translation
mechanism, like specifying the charset on the command
line?
  
   I expect probably something similar to Perl 5's encoding
   pragma. (But hopefully lexically scoped.)
  
  Okay, but what will the default be? UTF-8? iso-8859-1? My
  current locale? Am I going to have put
  
  use encoding 'utf8';   # or whatever the P6 syntax will be
  
  at the beginning of every program that might get distributed
  outside of my home country to make sure it'll run?
 
 8859-1 will be the default.

Actually, Unicode will be the default.  8859-1 can probably also be
handled without declaration.
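One way to get "Unicode by default, with Latin-1 accepted without a declaration" is to try strict UTF-8 first and fall back to ISO 8859-1, which accepts any byte sequence. This is a hypothetical Python sketch (`decode_source` is an invented name). It is not a description of how Perl 6 will actually work:

```python
def decode_source(raw: bytes) -> str:
    """Hypothetical source decoder: UTF-8 if valid, else ISO 8859-1."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Every byte 0x00-0xFF maps to a Latin-1 character, so this cannot fail.
        return raw.decode("latin-1")

print(decode_source(b"@a \xc2\xbb+\xc2\xab @b"))  # UTF-8 input
print(decode_source(b"@a \xbb+\xab @b"))          # Latin-1 input
```

The guess can be wrong. The Latin-1 text "Â«" uses exactly the same bytes as UTF-8 «, so a Latin-1 file that happens to be valid UTF-8 would be silently read as UTF-8. That is one argument for keeping an explicit declaration available.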

 If you want trigraph support, you'll have to put 
 
 use encoding 'ugly-american';
 
 at the top of your files. ;-) ;-) ;-)
 
 Otherwise, it'll be one-character ?fancyops? all the way.

Mmm, I view one-character Unicode operators as more of an escape hatch
for the future, not as something to be made mandatory.  But then,
I'm one of those ugly Americans.

Of course, I also think I'm allowed to be a little inconsistent in
forcing things like »op« on people.  After all, there's gotta be
some advantage to being the Fearless Leader...

Larry



Re: What is the order of evaluation for separate streams in a loop?

2002-11-04 Thread Luke Palmer
 Date: Mon, 4 Nov 2002 12:09:12 -0800 (PST)
 From: Austin Hastings [EMAIL PROTECTED]
 
 Something from [EMAIL PROTECTED] about the relative frequency made me
 wonder:
 
 What's the order of evaluation or nestedness for separate streams
 in a for loop?
 
 That is, can I meaningfully say:
 
 for my $i; $j -> 0 .. @array.length - 1; $i + 1 .. @array.length
 {
  ..
 }

I don't know where to correct you first... :)

I'll start by saying your variables are on the wrong side of the
pointy sub.  Also, presuming you switched the order, that C<my>
shouldn't be there and would be an error.  Also, you don't need those
C<.length>s, but I guess they don't hurt.

Finally, multi-stream C<for> iterates I<in parallel>.  So:

  for 0..@array-1; $i+1..@array -> $i; $j {
  ...
  }

Would.. um..  I think that's an error.  Or, it is if C<use strict> is
on.   Otherwise it would use the undefined C<$i> (unless, of course,
it was defined in an enclosing lexical scope).
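Luke's point, that separate streams advance together rather than nesting, can be illustrated with Python's `zip` as a stand-in for a multi-stream `for`. This is an analogy only: `zip` stops at the shorter input, and this thread does not settle what Perl 6 does with streams of unequal length. The nested loop Austin was after is shown for contrast:

```python
def parallel(xs, ys):
    """Multi-stream style: one element from each stream per iteration."""
    return [(x, y) for x, y in zip(xs, ys)]

def nested_pairs(n):
    """Two nested loops: every index pair (i, j) with i < j < n."""
    pairs = []
    for i in range(n):
        for j in range(i + 1, n):
            pairs.append((i, j))
    return pairs

print(parallel(range(3), range(1, 4)))  # [(0, 1), (1, 2), (2, 3)]
print(nested_pairs(3))                  # [(0, 1), (0, 2), (1, 2)]
```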

Luke



Re: UTF-8 and Unicode FAQ, demos

2002-11-04 Thread Austin Hastings
--- [EMAIL PROTECTED], UNEXPECTED_DATA_AFTER_ADDRESS@.SYNTAX-ERROR.
wrote:

 Mmm, I view one-character Unicode operators as more of an escape
 hatch
 for the future, not as something to be made mandatory.  But then,
 I'm one of those ugly Americans.

EBCDIC didn't support brackets, originally, so ANSI included trigraphs
called ??( and ??) for [ and ], respectively.
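For reference, ANSI C defines nine trigraphs in all, not just the bracket pair. Here is a minimal Python sketch of the substitution: a left-to-right scan, roughly what translation phase 1 of a C compiler does. `replace_trigraphs` is an illustrative name:

```python
# The nine ANSI C trigraphs and the characters they stand for.
TRIGRAPHS = {
    "??=": "#", "??(": "[", "??/": "\\",
    "??)": "]", "??'": "^", "??<": "{",
    "??!": "|", "??>": "}", "??-": "~",
}

def replace_trigraphs(src: str) -> str:
    """Expand trigraphs in a single left-to-right pass."""
    out = []
    i = 0
    while i < len(src):
        tri = src[i:i + 3]
        if tri in TRIGRAPHS:
            out.append(TRIGRAPHS[tri])
            i += 3
        else:
            out.append(src[i])
            i += 1
    return "".join(out)

print(replace_trigraphs("a??(i??) = 0;"))  # a[i] = 0;
```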

But the fact of the matter is that about epsilon (which is to say,
really close to zero) people wrote trigraphs.

So, yeah, include trigraph sequences if it will make happy the people
on the list who can't be bothered to read the documentation for their
own keyboard IO system.

But don't expect the rest of us to use them.

In short:

1- « and » are really useful in my context.
2- I can make my work environment generate them in one (modified)
keystroke.
3- I can make my home environment do likewise.
4- The ascii-only version isn't faster and easier, nor more morally
pure.
5- There is no "differently keyboard abled" market out there which has
engaged my sympathy, ascii-operator wise.

Ergo,

6- my @a = @b «+» @c;

 Of course, I also think I'm allowed to be a little inconsistent in
 forcing things like »op« on people.  After all, there's gotta be
 some advantage to being the Fearless Leader...

Which kind of begs the question: Who are you? And can you authenticate
that which you just implicitly claimed? (See quote header, above, if
you don't understand my question)

 
 Larry

=Austin





Re: What is the order of evaluation for separate streams in a loop?

2002-11-04 Thread Austin Hastings

--- Luke Palmer [EMAIL PROTECTED] wrote:
 I don't know where to correct you first... :)

Is that a dunce-hat? Is there an ISO version I could use instead? :-

 I'll start by saying your variables are on the wrong side of the
 pointy sub.  Also, presuming you switched the order, that C<my>
 shouldn't be there and would be an error.  Also, you don't need those
 C<.length>s, but I guess they don't hurt.

Cost of switching languages in midstream. My bad.

 Finally, multi-stream C<for> iterates I<in parallel>.  So:
 
   for 0..@array-1; $i+1..@array -> $i; $j {
   ...
   }
 

Wow. I knew that. Duh. Nevermind.

=Austin





Re: UTF-8 and Unicode FAQ, demos

2002-11-04 Thread Mark J. Reed
On 2002-11-04 at 12:26:56, Austin Hastings wrote:
 1- ? and ? are really useful in my context.
Okay.  Now can you get your mailer to send them properly? :)




Re: How to set your Windows keyboard to ¶erl-mode

2002-11-04 Thread Ken Fox
Austin Hastings wrote:


At this point, Meestaire ISO-phobic Amairecain Programmaire, you have
achieved keyboard parity with the average Swiss six-year-old child.


The question is not about being ISO-phobic or pro-English. **

The question is whether we want a pictographic language. I like
the size of the English alphabet. It produces fairly short words,
but the words are very robust (people can read words in all
orientations, backwards, upside down, in crazy fonts, hand-written,
etc.) This is the opposite of Huffman encoding, but just
as useful IMHO.

I've had the unpleasant job of turning math into software. Hand
written formulae can be very difficult to read because mathematics
worships Huffman encoding. Multiplication is specified by *nothing*.
Exponents are just written a bit smaller and a bit raised. Is this
what we want in the core?

Does anyone have any references for reading and comprehension
rates for different types of languages? I'm ignorant on the subject
and this seems like something a Perl programmer should know.

- Ken

** I'm probably both. ISO-phobic because I actually represented my
company on an ISO standard committee. Pro-English because it's what
I use -- being pro-English doesn't make me against everything else.
A language would have to be pretty bad to have its native speakers
advocate something else!




Re: UTF-8 and Unicode FAQ, demos

2002-11-04 Thread Me
 After all, there's gotta be some advantage to
 being the Fearless Leader...
 
 Larry

Thousands will cry for the blood of the Perl 6
design team. As Leader, you can draw their ire.
Because you are Fearless, you won't mind...

--
ralph



Re: UTF-8 and Unicode FAQ, demos

2002-11-04 Thread Damian Conway
Ken Fox wrote:


I know I'm just another sample point in a sea of samples, but
my embedded symbol parser seems optimized for alphabetic symbols.
The cool non-alphabetic Unicode symbols are beautiful to look at,
but they don't help me read or write faster.


Once again: we're only talking about « and ».



There are rare exceptions (like grouping) 

E.g. « and »

;-)



where I strongly prefer non-alphabetics,
but otherwise alphabetics help me get past the "what is this code?"
phase and into the "what does this code do?" phase as quickly as
possible.


Interestingly, I find it just the opposite. The use of symbolic
operators makes it easier for me to differentiate the nouns,
verbs, and punctuation of a piece of code.

Damian




Re: How to set your Windows keyboard to ¶erl-mode

2002-11-04 Thread Austin Hastings

--- Ken Fox [EMAIL PROTECTED] wrote:
 Austin Hastings wrote:
 
 The question is not about being ISO-phobic or pro-English. **

The two gripes I've heard have been:

1- It's hard to type.
2- I don't know how to type it on platform X.

With the combo gripe "It'll be hard to remember how to type it across
multiple platforms X, Y, Z, etc." coming in third.

So I solved that problem. I know it's easy to type on Mac, I know how
to MAKE it easy to type on WinPC, and I know how to MAKE it easy to
type on an X terminal. In all cases, [OPTION] or [ALT] plus some
matching set of punctuation [(slashes) or (brackets)].

Now it's easy to type (easier, for me at least, than typing two
backticks, since the modifier level is the same and the hand-contortion
on a PC type keyboard [with ` and ~ in the top left corner] is much
lower), and not too difficult to remember, even across N platforms.

So I'll treat your objection, below, as a new one.
 
 The question is whether we want a pictographic language. I like
 the size of the English alphabet. It produces fairly short words,
 but the words are very robust (people can read words in all
 orientations, backwards, upside down, in crazy fonts, hand-written,
 etc.) This is the opposite of Huffman encoding, but just
 as useful IMHO.

The << and >> (rendered thus for Mr. Reed) are just as pictographic (or
not) as [ and ]. They look the same from top or bottom, and are
unmistakable in direction when looked at from either side. Likewise,
they are probably MORE clear, as has been mentioned, than the
difference between ' (apostrophe) and ` (tick) in many standard fonts,
especially the variable-width variety sometimes invoked for 8-bit
messages.

But in this context, we've got a pair of balanced, unmistakable
characters which have no other uses (compare, say, %hash and $a %= $b;
same character '%', different usages) being proposed to serve as the
marker for a new class of operation.

 ...
 Exponents are just written a bit smaller and a bit raised. Is this
 what we want in the core?

If every keyboard and operating system had the ability to simply
generate arbitrary expressions of the form (expr-a) ** (expr-b), ad
infinitum (a ** b ** c ** d ** e) then we'd be remiss not to use it.
But they can't, so we don't.
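For what it's worth, the ASCII `**` does preserve the grouping of stacked superscripts. `a ** b ** c` means `a ** (b ** c)`, just as a tower of exponents is read from the top down. Perl 5's `**` is right-associative, and so is Python's, which makes for a quick check:

```python
# ** is right-associative: a ** b ** c means a ** (b ** c),
# the same grouping as a stacked superscript.
a, b, c = 2, 3, 2
right = a ** b ** c      # 2 ** 9
left = (a ** b) ** c     # 8 ** 2
print(right, left)       # 512 64
```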

 ...
 ** I'm probably both. ISO-phobic because I actually represented my
 company on an ISO standard committee. 

You have my sympathy.

=Austin





Re: UTF-8 and Unicode FAQ, demos

2002-11-04 Thread Damian Conway
Garrett Goebel wrote:


Can't we have our cake and eat it too? Give ASCII digraph or trigraph 
alternatives for the incoming tide of Perl6 Unicode?

Allow both <<*>> and »*«?

I'd really prefer we didn't. I'd much rather keep << and >> for other
things.



Or something similar '*', [*], etc...


Much as I hate the notion of di- and trigraphs, this is a possibility.

Though I'd much rather we just allowed POD escapes (e.g. E<laquo> and E<raquo>)
in code.

And, yes, I'm aware that makes E<laquo>*E<raquo> incredibly ugly.
I'm rather *counting* on it, in fact ;-)
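Allowing POD escapes in code would amount to a substitution pass over the source before tokenizing. Here is a minimal Python sketch. It assumes the escape names are the HTML entity names accepted by POD's `E<...>` codes, which the standard `html.entities.name2codepoint` table supplies. `expand_pod_escapes` is a hypothetical helper, and numeric forms such as `E<171>` are deliberately ignored:

```python
import re
from html.entities import name2codepoint

# Matches POD-style named escapes such as E<laquo> or E<raquo>.
POD_ESCAPE = re.compile(r"E<([A-Za-z]+)>")

def expand_pod_escapes(src: str) -> str:
    """Replace E<name> with the character of the same HTML entity name."""
    def repl(m):
        name = m.group(1)
        if name in name2codepoint:
            return chr(name2codepoint[name])
        return m.group(0)  # leave unknown names untouched
    return POD_ESCAPE.sub(repl, src)

print(expand_pod_escapes("@a E<laquo>*E<raquo> @b"))  # @a «*» @b
```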

Damian




Re: UTF-8 and Unicode FAQ, demos

2002-11-04 Thread Me
 people on the list who can't be bothered to read
 the documentation for their own keyboard IO system.

Most of this discussion seems to focus on keyboarding.
But that's of little consequence. This will always be
spotted before it does much harm and will affect just
one person and their software at a time.

Errors in encoding during transmission is a whole lot
more problematic. This will almost always be spotted
after the fact, and may affect many people at a time
and require fixes to multiple systems not controlled
by the sender or receiver.

--
ralph



utf and ebcdic

2002-11-04 Thread Brian Ingerson
On 04/11/02 12:12 -0800, [EMAIL PROTECTED] 
aka Larry Wall wrote:
  If you want trigraph support, you'll have to put 
  
  use encoding 'ugly-american';
  
  at the top of your files. ;-) ;-) ;-)
  
  Otherwise, it'll be one-character ?fancyops? all the way.
 
 Mmm, I view one-character Unicode operators as more of an escape hatch
 for the future, not as something to be made mandatory.  But then,
 I'm one of those ugly Americans.
 
 Of course, I also think I'm allowed to be a little inconsistent in
 forcing things like »op« on people.  After all, there's gotta be
 some advantage to being the Fearless Leader...

On one hand I really respect your fearlessness to go where no language
author has gone before. No matter what happens, I'm pretty sure you'll be
remembered for it. ;)

On the other hand I'm wondering what happens to the ebcdic platforms and
the like. Will it even work to have core modules written in non ascii
and expect them to translate to ebcdic? I suppose you'll have to convert
them to trigraphs as part of the installation. Just wondering if you've
thought through the support issues for platforms that by their
definition won't be using utf ever.

FWIW, ebcdic *does* have the cent sign!

Cheers, Brian



Re: UTF-8 and Unicode FAQ, demos

2002-11-04 Thread Austin Hastings

--- Me [EMAIL PROTECTED] wrote:
  people on the list who can't be bothered to read
  the documentation for their own keyboard IO system.
 
 Most of this discussion seems to focus on keyboarding.
 But that's of little consequence. This will always be
 spotted before it does much harm and will affect just
 one person and their software at a time.

Good. Counting Damian, that makes three of us. Welcome aboard, ralph.
:-)

 Errors in encoding during transmission is a whole lot
 more problematic. This will almost always be spotted
 after the fact, and may affect many people at a time
 and require fixes to multiple systems not controlled
 by the sender or receiver.

I disagree (slightly). I get emailed powerpoint files, jpeg images, and
tens of other binary formats every day, and they consistently come
through correctly.

The transmission network is working fine. 

What we've got is an encoding problem at the MUA level. Mark Reed says
my mailer (Yahoo!) tagged a message containing high-bit characters as
US-ASCII. Several people the other day reported on the differences in
UTF8 vs. Latin-1 handling among pine, elm, and other mailers.
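The "? and ?" Mark Reed saw is what happens when a program is told a message is US-ASCII and has to squeeze non-ASCII characters into it. Python's `errors="replace"` mode for the ascii codec shows the effect. This is an analogy for what a mailer might do, not a claim about Yahoo!'s code:

```python
text = "1- \u00ab and \u00bb are really useful in my context."

# Forcing the text into US-ASCII replaces each unencodable character with '?'.
as_ascii = text.encode("ascii", errors="replace").decode("ascii")
print(as_ascii)  # 1- ? and ? are really useful in my context.

# Declaring the real charset keeps the characters intact.
as_utf8 = text.encode("utf-8").decode("utf-8")
print(as_utf8 == text)  # True
```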

There are problems, and this kind of change will create a demand to get
them fixed. Those products that satisfy the demand will survive. The
others won't. Up until now, though, everyone's been lax about making
the encoding stuff strack. But this is a language widely regarded as a
huge player, and when a huge player says "You need to take care of
(something)," then it gets done. 

Perl6 will do more to address the real technical issues of electronic
communication between Americans and French-speakers than anything else.
(Primarily because Perl hackers want to talk to each other, but no
French-speaker wants to talk to an American ;-)

=Austin







Re: utf and ebcdic

2002-11-04 Thread Austin Hastings

--- Brian Ingerson [EMAIL PROTECTED] wrote:
 FWIW, ebcdic *does* have the cent sign!

And the not sign. Damian may force us to abandon ASCII entirely...

=Austin


__
Yahoo! - We Remember
9-11: A tribute to the more than 3,000 lives lost
http://dir.remember.yahoo.com/tribute



Re: UTF-8 and Unicode FAQ, demos

2002-11-04 Thread Adam D. Lopresto
I'm having trouble believing this is even being considered.  At all.  And especially for
these operators...

 So, yeah, include trigraph sequences if it will make happy the people
 on the list who can't be bothered to read the documentation for their
 own keyboard IO system.
 
 But don't expect the rest of us to use them.

So you're one of the very few people who bothered to set up unicode, and now
you want to force the rest of us into your own little leet group.  Given the
choice between learning how to reconfigure their keyboard, editor, terminal,
fonts, and everything else, or just not learning perl6, I bet you'd have a LOT
of people who get scared away.  Face it, too many people think perl is
linenoise heavy and random already.

Which brings me to my real question: why these operators?  It's not as if
they're even particularly intuitive for this context.  They're quotes.  They
don't mean vector anything, and never have.  I could almost see if the
characters in question just screamed the function in question (sqrt, not
equals, not, sum, almost anything like that), but these are just sort of
random.

Given how crazy this is all getting, is it absolutely certain that we're better
off not just making vector operations work without modifiers?  I reread the
apocalypse just now, and I don't really see the problem.  The main argument
against seems to be "perl5 people expect it to be scalar," but perl5 people
will have to get used to a lot.  I think the operators should just be list
based, and if you want otherwise you can specify scalar:op or convert both
sides to scalars manually (preferably with .length, so it's absolutely clear
what's meant).
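The two readings being contrasted can be made concrete. In Perl 5, `@a + @b` puts both arrays in scalar context and adds their lengths. A list-based `+` would instead add corresponding elements. Here is a Python sketch of both interpretations; the function names are invented for illustration:

```python
def scalar_plus(a, b):
    """Perl 5 reading of @a + @b: each array in scalar context is its length."""
    return len(a) + len(b)

def list_plus(a, b):
    """List-based reading: add corresponding elements."""
    return [x + y for x, y in zip(a, b)]

a = [1, 2, 3]
b = [10, 20, 30]
print(scalar_plus(a, b))  # 6
print(list_plus(a, b))    # [11, 22, 33]
```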
-- 
Adam Lopresto ([EMAIL PROTECTED])
http://cec.wustl.edu/~adam/

Who are you and what have you done with reality?
--Jamin Gray



[ANNOUNCE] Perl6 Docs, an initial Chapter.

2002-11-04 Thread Michael Lazzaro
There is a (partial) book-style chapter describing Perl6 values, 
variables, and primitive/promoted types at:

	http://cog.cognitivity.com/perl6/val.html

The entire thing is one page, for easy printing.  It works out to about 
15-20 pages, depending on your printer.  There is *much* more coming 
soon.

This is the expanded version of what I posted earlier: it is much 
more accurate and detailed.  It represents the beginnings of a Chapter 
1 of detailed (tho unofficial) Perl6 documentation.  My working 
approach is to document Perl6 as if Perl5 never existed: in other 
words, don't focus on what's new or changed: just document all 
aspects of Perl6 completely, from the ground up, written for people who 
may not have worked with Perl5 at all.

There are some assumptions in the more detailed descriptions:  
specifically, I have documented explicit radix, bool, and a few other 
things as if they will exist, though they haven't been decided yet.  
That's why I'm writing it -- to get at all the nooks and crannies, and 
make sure we're not including, missing, or assuming anything we 
shouldn't.

There is much more coming (another 100 pages or so to finish up, edit, and
post for this chapter and the next): I'll post the rest as I get it
polished, but I need some initial feedback on this first part, 
especially:

-- Is the writing style acceptable?  Should it be less formal?  More 
formal?  Geared to dumber people?  Smarter people?  Just right?

-- Are things being presented in the correct order?  Are there any 
obvious gaps where I need more explanation?

-- Syntax and accuracy goofs, obviously.

I realize the first page or two are pretty dull, because they're more 
abstract, and stuff so basic that it's hard to explain properly at 
all.  But it assuredly needs some edits, so I will be indebted to 
anyone who slogs through it.  :-)  The farther you go, the more 
interesting/fun/scary it gets.

I can't wait to show the beginnings of an Operators chapter.  :-)

MikeL



Re: UTF-8 and Unicode FAQ, demos

2002-11-04 Thread Paul Johnson
On Mon, Nov 04, 2002 at 12:26:56PM -0800, Austin Hastings wrote:

 In short:
 
 1- « and » are really useful in my context.
 2- I can make my work environment generate them in one (modified)
 keystroke.
 3- I can make my home environment do likewise.
 4- The ascii-only version isn't faster and easier, nor more morally
 pure.
 5- There is no differently keyboard abled market out there which has
 engaged my sympathy, ascii-operator wise.
 
 Ergo,
 
 6- my @a = @b »+« @c;

It's a great argument.  I know how to type funny characters too.  I
can even read some of the ones some people send.  Just don't expect me
to be able to understand any Perl 6 you mail me.  Whether the problem is
at your end, my end or somewhere in the middle is moot.

On the other hand, maybe all these issues will be sorted out before we
can start writing Perl 6 in earnest.  In one way I hope that is true.
In another I hope it isn't ;-)

-- 
Paul Johnson - [EMAIL PROTECTED]
http://www.pjcj.net



Re: Unifying invocant and topic naming syntax

2002-11-04 Thread Allison Randal
On Sun, Nov 03, 2002 at 11:17:32PM -0600, Me wrote:
 
 I started with a simple thought:
 
 is given($foo)
 
 seems to jar with
 
 given $foo { ... }
 
 One pulls in the topic from outside and
 calls it $foo, the other does the reverse --
 it pulls in $foo from the outside and makes
 it the topic.

It comes from using 'given' as a noun (meaning the same thing as
'topic') instead of as a verb. So, this:

  given $foo { ... }

means "make $foo become the given" or "Given-ify $foo". While:

  is given($foo)

means "$foo takes the value of the given".

There may be room for a better parameter name. We considered quite a few
before picking this one, though, and I'm pretty happy with it for now.

 On its own this was no big deal, but it got
 me thinking.
 
 The key thing I realized was that (naming)
 the invocant of a method involves something
 very like (naming) the topic of a method,
 and ultimately a sub and other constructs.

The similarity is that both are implicit parameters, i.e. they're
accessible to the sub/method but aren't explicitly passed when it's
called. They're not quite the same, though, as the C<is given> parameter
is entirely out-of-band (in P5 terms, it wouldn't appear in @_).

Also, both may be the topic under certain circumstances. But then, any
variable can be the topic.

Generally, there's no conceptual link between the invocant of a method
and the topic in the caller's scope. 

 Thus it seems that whatever syntax you pick
 for the former is likely to work well for
 the latter.
 
 Afaik, the syntax for invocant naming is:
 
 method f ($self : $a, $b) { ... }
 
 But whatever it is, I think one can build
 on it for topic transfer / naming too in a
 wide range of contexts.
 
 With apologies for talking about Larry's
 colon, something that really does sound
 like it is taboo for good reason, I'll
 assume the above invocant naming syntax
 for the rest of this email.
 
 So, perhaps:
 
 sub f ($a, $b) is given($c) { ... }
 sub f ($a, $b) is given($c is topic) { ... }
 sub f ($a, $b) is given($_) { ... }
 
 could be something like:
 
 sub f ($c : $a is topic, $b) { ... }
 sub f ($c : $a, $b) { ... }
 sub f ($_ : $a, $b) { ... }
 
 where the first arg to be mentioned is the
 topic unless otherwise specified.
 
 (The first line of the alternates is not
 semantically the same as the line it is a
 suggested replacement for, in that the
 current scheme would not set the topic --
 its value would be the value of $_ in
 the lexical block surrounding the sub
 definition. It's not obvious to me why
 the current scheme has it that way and
 what would best be done about it in the
 new scheme I suggest, so I'll just move on.)

Even though both features have something to do with topic, they're
really independent. There are times when it's useful to access the
caller's topic without setting the current topic and times when it's
useful to just set the current topic.

  sub f ($a, $b) { ... } # use neither feature
  sub f ($a is topic, $b) { ... } # topic setting
  sub f ($a, $b) is given($c) { ... } # access caller's topic 
  sub f ($a, $b) is given($c is topic) { ... } # combine

When you have a system with two independent but interacting features,
it's far more efficient to define two independent flags than to define 4
flags to represent the 4 combinations. It's also easier to learn.

 The obvious (to me) thing to do for methods
 is to have /two/ colon separated prefixes of
 the arg list. So one ends up with either one,
 two, or three sections of the arg list:
 
 # $_ is invocant:
 method f ($a, $b) { ... }
 
 # $_ and $self are both invocant:
 method f ($self : $a, $b) { ... }
 
 # $_/$self are invocant, $c caller's topic
 method f ($self : $c : $a, $b) { ... }

Any two constructs with the same syntax but an entirely different
meaning exponentially increase the chance of confusion. Confusion
increases the likelihood of bugs, not to mention frustrating the
programmer. 

 One question is what happens if one writes:
 
 method f (: $c : $a, $b) { ... }
 
 Is the invocant the topic, or $c, ie what
 does a missing invocant field signify?

The invocant would be the topic still. It is now with:

method f (: $a, $b) { ... }

 Jumping to a different topic for one moment,
 I think it would be nice to provide some
 punctuation instead of (or as an alternate
 to) a property for setting 'is topic'. Maybe:
 
 method f ($self : $c : $a*, $b) { ... }
 
 or maybe something like:
 
 method f ($self : $c : $a™, $b) { ... }
 
 (Unicode TM for Topic Marker? Apologies if I
 screwed up and the TM character comes through
 as something else.)

Huffman encoding comes to mind. Is this really common enough to merit a
single punctuation character?

 Anyhow, a further plausible tweak that builds
 on the above colon stuff is that one could
 plausibly do:
 
 sub f ($bar : $a, $b) {
 ...
 }
 
 and then call f() like so:
 
 f (20 : 30, 40)
 
 as a 

Re: How to set your Windows keyboard to ¶erl-mode

2002-11-04 Thread Simon Cozens
[EMAIL PROTECTED] (Ken Fox) writes:
 The question is whether we want a pictographic language. 

So far we've managed to avoid turning Perl into APL.  :-)
 -- Larry Wall in [EMAIL PROTECTED]

Although that was some time ago... :)

-- 
The FSF is not overly concerned about security.  - FSF



Re: UTF-8 and Unicode FAQ, demos

2002-11-04 Thread Rafael Garcia-Suarez
Austin Hastings wrote in perl.perl6.language :
 
 What we've got is an encoding problem at the MUA level. Mark Reed says
 my mailer (Yahoo!) tagged a message containing high-bit characters as
 US-ASCII. Several people the other day reported on the differences in
 UTF8 vs. Latin-1 handling among pine, elm, and other mailers.

Not only the MUA level. Usually source code is written in a lowest
common denominator of ascii, even for languages that allow unicode
identifiers (Java) or markup. That's because source code is handled by
parsers, documentation extractors, pretty printers, diff(1), patch(1),
version control software, and (you said it) various internet clients.

That's why some people may still prefer to continue using pure ascii
even though they think that unicode operators are cool. (Esp. if they
are under the influence of FUD : use PHP ! it's ascii compliant !)

 Perl6 will do more to address the real technical issues of electronic
 communication between Americans and French-speakers than anything else.
 (Primarily because Perl hackers want to talk to each other, but no
 French-speaker wants to talk to an American ;-)

You're Italian, aren't you ?



Re: UTF-8 and Unicode FAQ, demos

2002-11-04 Thread Simon Cozens
[EMAIL PROTECTED] (Damian Conway) writes:
  Or something similar '*', [*], etc...
 
 Much as I hate the notion of di- and trigraphs, this is a possibility.

I do like this too, because it reminds me of C trigraphs, which had precisely
the same purpose - allow people with old-fashioned sub-standard character
sets to come and play with the big boys. And eventually, the old trigraphs
died out because everyone caught up with the decent (for the era) character
sets.

That's assuming we have to have Unicode operators.

I would, however, like to hear a passionate argument in favour of
this, because we've seen plenty of arguments against (encoding, transmission,
keyboarding, etc.) but not all that many in favour, so a nice definitive one
would be helpful.

-- 
evilPetey I often think I'd get better throughput yelling at the modem.



Re: UTF-8 and Unicode FAQ, demos

2002-11-04 Thread Austin Hastings

--- Rafael Garcia-Suarez [EMAIL PROTECTED] wrote:
 Austin Hastings wrote in perl.perl6.language :
  
  What we've got is an encoding problem at the MUA level. Mark Reed says
  my mailer (Yahoo!) tagged a message containing high-bit characters as
  US-ASCII. Several people the other day reported on the differences in
  UTF8 vs. Latin-1 handling among pine, elm, and other mailers.
 
 Not only the MUA level. Usually source code is written in a lowest
 common denominator of ascii, even for languages that allow unicode
 identifiers (Java) or markup. That's because source code is handled by
 parsers, documentation extractors, pretty printers, diff(1), patch(1),
 version control software, and (you said it) various internet clients.

 That's why some people may still prefer to continue using pure ascii
 even though then think that unicode operators are cool. (Esp. if they
 are under the influence of FUD : use PHP ! it's ascii compliant !)

Yeah, but ActiveState does Perl, and Microsoft owns ActiveState, so
we've got the kings of FUD on our side for a change. Joy.

  Perl6 will do more to address the real technical issues of electronic
  communication between Americans and French-speakers than anything else.
  (Primarily because Perl hackers want to talk to each other, but no
  French-speaker wants to talk to an American ;-)
 
 You're Italian, aren't you ?

Actually, an American who's been ignored in many places. :-)

=Austin


__
Do you Yahoo!?
HotJobs - Search new jobs daily now
http://hotjobs.yahoo.com/



Re: UTF-8 and Unicode FAQ, demos

2002-11-04 Thread Simon Cozens
[EMAIL PROTECTED] (Austin Hastings) writes:
 Yeah, but ActiveState does Perl, and Microsoft owns ActiveState

To what extent are *either* of those statements true? :)

-- 
All the good ones are taken.



Re: How to set your Windows keyboard to ¶erl-mode

2002-11-04 Thread Ken Fox
Austin Hastings wrote:


The « and » ... are just as pictographic (or
not) as [ and ].


I'm not particularly fond of « or » either. ;) Damian just
wrote that he prefers non-alphabetic operators to help
differentiate nouns and verbs. I find it helpful when people
explain their biases like that. What's yours?


 They look the same from top or bottom, and are
unmistakable in direction when looked at from either side.


Well, anything can look like itself, that wasn't the point. The
goal is to not look like anything else in any orientation. The
chars O and 0 fail badly, but A and T are excellent. I'm not
sure where « and » fall because I don't have any experience
with them.

Programming languages probably get away with more because
most programmers don't spray paint algorithms on the side of
a bridge. (Well, Lisp programmers maybe. ;) My three points
against arbitrary punctuation as symbols are
 (1) it's impossible to identify symbol boundaries when
 reading punctuation -- you just have to guess,
 (2) it's harder to work with punctuation in non-digital
 communication, and
 (3) my memory doesn't work well on punctuation symbols!

Perl has some nice features like sigils that clue people in on
how to read a sentence. But...


difference between ' (apostrophe) and ` (tick)


is a horrible abomination. ;)


If every keyboard and operating system had the ability to simply
generate arbitrary expressions of the form (expr-a) ** (expr-b), ad
infinitum (a ** b ** c ** d ** e) then we'd be remiss not to use it.
But they can't, so we don't.


Non sequitur. Written language prior to the printing press had
no technological reason to limit alphabet size. Some languages
developed very large pictographic representations, others
developed small alphabets with word formation rules. I have no
idea what the design pressures were that caused these different
solutions. Do you? What are the strengths and weaknesses of the
approaches? Why should we select one over the other?

- Ken




Supercomma! (was Re: UTF-8 and Unicode FAQ, demos)

2002-11-04 Thread Michael Lazzaro
On Monday, November 4, 2002, at 11:58  AM, Larry Wall wrote:

You know, separate streams in a for loop are not going to be that
common in practice, so maybe we should look around a little harder for
a supercomma that isn't a semicolon.  Now *that* would be a big step
in reducing ambiguity...


Or more than one type of supercomma, e.g:

   for @x ∫ @y ∫ @z -> $x ∫ $y ∫ $z { ... }

to mean:
   for @x ; @y ; @z -> $x ; $y ; $z { ... }

- vs -

   for @x § @y § @z -> $x § $y § $z { ... }

to mean:

   for @x -> $x {
       for @y -> $y {
           for @z -> $z {
               ...
           }
       }
   }

;-)

MikeL
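The two joke supercommas describe two genuinely different iteration orders. A minimal Python sketch (Python as neutral pseudocode; the list names and contents are hypothetical) shows the difference. The first form walks the lists in lockstep. The second visits every combination, exactly as the three nested loops do.

```python
from itertools import product

x = [1, 2]
y = ["a", "b"]
z = [True, False]

# First supercomma: one element from each list per iteration (lockstep).
parallel = list(zip(x, y, z))

# Second supercomma: every combination, like three nested for loops.
nested = list(product(x, y, z))

print(parallel)     # [(1, 'a', True), (2, 'b', False)]
print(len(nested))  # 8
```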




Re: UTF-8 and Unicode FAQ, demos

2002-11-04 Thread Austin Hastings

--- Simon Cozens [EMAIL PROTECTED] wrote:
 [EMAIL PROTECTED] (Austin Hastings) writes:
  Yeah, but ActiveState does Perl, and Microsoft owns ActiveState
 
 To what extent are *either* of those statements true? :)

Hmm. Well, last time I checked you could still download a perl binary
from ActiveState.com.

And, in fact, check out the motivation behind the agreement:

http://www.dnjonline.com/newsreel/index.html

Microsoft buys into Perl

Aug 6 - Microsoft has hired ActiveState Tool Corporation to improve the
Windows functionality of the Open Source scripting language Perl. This
agreement reinforces a long-term relationship between Microsoft and
Perl, stemming from 1993 when Microsoft funded the first port of Perl 5
to the Windows platform.

  ActiveState develops and distributes the popular Windows version,
called ActivePerl. "Our mission is to make Perl as popular as
possible," said Dick Hardt, chief executive of ActiveState.

  The monetary details of the deal weren't revealed - in fact there
was no mention of it anywhere on the Microsoft web site. Instead the
impetus seems to have come from Microsoft India, where the main aim is
to improve Perl's support for non-Roman character-sets through Unicode.

  As part of the agreement, ActiveState will add features
previously missing from Windows ports of Perl, as well as full support
for Unicode - a key feature to users dealing with Asian character sets.

 blah blah blah ...

=Austin





Re: UTF-8 and Unicode FAQ, demos

2002-11-04 Thread Austin Hastings

--- Adam D. Lopresto [EMAIL PROTECTED] wrote:
 I'm having trouble this is even being considered.  At all.  And
 especially for these operators.

Today vectors, tomorrow the world!

Uniperl, Uniperl above all,
Above all in the world!
With hyper-states through choose and true();
Masterfully golf scorin' script,
From the  to the all(),
From the any() to the  -
Uniperl, Uniperl above all,
Above all in the world!

 So you're one of the very few people who bothered to set up unicode,
 and now you want to force the rest of us into your own little 
 leet group. 

Nerp. Hadn't given it a second thought until the whining started about
It's so hard... I had actually figured that I'd be able to set a
keystroke in my editor and that would be the end of it. But then, for
no good reason that I can think of, I tried microsoft's help site and
found it in about thirty seconds. No need to set up a keyboard macro --
it's part of the OS.

I did BBS, though not as a warez d00d. It's L33t.

 Given the choice between learning how to reconfigure their keyboard,
 editor, terminal, fonts, and everything else, or just not learning 
 perl6, I bet you'd have a LOT of people who get scared away. 

That sounds a lot like what I said (and to a certain extent still fear)
back when -> was first going away.

It didn't work then, either.

 Face it, too many people think perl is linenoise heavy and random 
 already.

Which is why adding a single character with a single meaning that can
be covered in chapter 14 instead of chapter 3 is a workable idea, and
why creating an operator called Jesus, it looks like an ASCII-art
version of a dancing penguin in high heels isn't. Bow-tie operator,
indeed! 

If @a [*=] @b; doesn't scan like rats chewing their way into your
cable, what does?

 Which brings me to my real question: why these operators?  It's not
 as if they're even particularly intuitive for this context.  They're
 quotes.  They don't mean vector anything, and never have.  I could 
 almost see if the characters in question just screamed the function
 in question (sqrt, not equals, not, sum, almost anything like that), 
 but these are just sort of random.

Simple answer: Larry suggested them. And was willing to sacrifice qw
functionality to this.

Also, I suppose, because of the map() suggestion a while back -- this
operation is going to wind up taking a huge range of parameters in
some not-too-distant future.

And @a = @b sub @c; will read a lot better, when sub is 8 lines
long.

 Given how crazy this is all getting, is it absolutely certain that we're
 better off not just making vector operations work without modifiers?
 I reread the apocalypse just now, and I don't really see the problem.
 The main argument against seems to be perl5 people expect it to be
 scalar, but perl5 people will have to get used to a lot.  I think the
 operators should just be list based, and if you want otherwise you can
 specify scalar:op or convert both sides to scalars manually (preferably
 with .length, so it's absolutely clear what's meant).

It's not absolutely certain. But this discussion was destined to
happen, since we're just about out of line noise, but we're nowhere
close to being out of clever ideas.

=Austin





Re: UTF-8 and Unicode FAQ, demos

2002-11-04 Thread Brian Ingerson
On 04/11/02 14:09 -0800, Austin Hastings wrote:
 
 --- Rafael Garcia-Suarez [EMAIL PROTECTED] wrote:
  Austin Hastings wrote in perl.perl6.language :
   
   What we've got is an encoding problem at the MUA level. Mark Reed says
   my mailer (Yahoo!) tagged a message containing high-bit characters as
   US-ASCII. Several people the other day reported on the differences in
   UTF8 vs. Latin-1 handling among pine, elm, and other mailers.
  
  Not only the MUA level. Usually source code is written in a lowest
  common denominator of ascii, even for languages that allow unicode
  identifiers (Java) or markup. That's because source code is handled by
  parsers, documentation extractors, pretty printers, diff(1), patch(1),
  version control software, and (you said it) various internet clients.
 
  That's why some people may still prefer to continue using pure ascii
  even though then think that unicode operators are cool. (Esp. if they
  are under the influence of FUD : use PHP ! it's ascii compliant !)
 
 Yeah, but ActiveState does Perl, and Microsoft owns ActiveState, so
 we've got the kings of FUD on our side for a change. Joy.

Speaking of FUD, that's simply not true, nor tasteful IMO.

AS has done a handful of short-term contracts for MS, and that's the
extent of their relationship.

FWIW, AS also does as much or more Unix development as Windows
development. They also employ some people who have individually
advanced Perl more than you'll ever know.

   Perl6 will do more to address the real technical issues of electronic
   communication between Americans and French-speakers than anything else.
   (Primarily because Perl hackers want to talk to each other, but no
   French-speaker wants to talk to an American ;-)
  
  You're Italian, aren't you ?
 
 Actually, an American who's been ignored in many places. :-)
 
 =Austin
 
 



Re: UTF-8 and Unicode FAQ, demos

2002-11-04 Thread Simon Cozens
[EMAIL PROTECTED] (Austin Hastings) writes:
 If @a [*=] @b; doesn't scan like rats chewing their way into your
 cable, what does?

This is why God gave us functions as well as operators.

-- 
I _am_ pragmatic. That which works, works, and theory can go screw
itself.
- Linus Torvalds



Possible Vector Operator Notations

2002-11-04 Thread Smylers
The many recent suggestions for denoting vector operators all seem to
have problems, with some having significant impact elsewhere in the
language.  After reading a few hundred mails on the subject I'm no
longer sure what I prefer, but thought I'd be in a better position to
have an opinion if I at least knew what the options were.

In case anybody else feels similarly, I've listed the main contenders
below together with salient points pertaining to them gleaned from this
list.  (Credit to those who originally made them, and apologies for
omitting attributions from this compendium.)

The suggested syntax is demonstrated with the hypothetical C<op>
operator (presumably making it a hyper-hypo op):

  * the as we were option: ^op

» Tilde is used for xor, underscore remains string concatenation.
» Consistent with previously published Perl 6 code samples.
» No character left for eating whitespace.
» No way of distinguishing precedence of vector assignment op.
» Slightly embarrassing to have so much discussion, go round in
  circles a couple of times, and end up right where we started.

  * the xor is a double-sided not option: ^op

» Exclamation mark is used for xor (in its various forms), leaving
  tilde for concatenation and underscore for eating whitespace.
» Vector ops remain consistent with previous code samples.
» No way of distinguishing precedence of vector assignment op.

  * the looks like an array option: [op]

» Seemed a nice idea, but doesn't work with other use of square
  brackets.

  * the guillemot option

» Inconvenient for those who don't live near the sea, and messy even
  for those who do.

  * the guillemet option: «op»

» Looks nice.
» Awkward to type for some people.
» May not be transmitted correctly in mailing lists and similar.
» May not be in the character set used by some people in the world.
» Has distinct characters for vector ops that have no other meaning
  in Perl.
» Uses 'special' looking symbols for 'special' ops.
» Looks really nice.
» Looks really likely to cause confusion in mailing lists.

  * the mélange option: ^[op]

» Xor continues to use caret.
» Overcomes problems with using just caret or square brackets.
» Easy enough to type.
» Looks ugly and/or unbalanced.
» Potential for confusion with overloading symbols used individually
  elsewhere in Perl.

  * the either of the above option: «op» and ^[op]

» Can choose the elegance of the guillemet or the type-ability and
  encoding-neutral convenience of the mélange.
» The non-Ascii characters still cause hassle when they are
  transmitted.
» The two variants don't look anything like each other.
» Everybody will have to learn both versions to be able to cope with
  others' code.

  * the generic extensibility option: «op» and `op`

» Backticks and two-character mnemonics from RFC1345 are used to
  provide an Ascii-only alternative for some symbols.
» The Perl 5 use of backticks has to be provided some other way.
» It's neat to have a consistent way of typing special symbols.
» Possibly overkill if Perl doesn't introduce any standard non-Ascii
  operators other than vector.
» Having more power in cryptic symbols rather than words may lead to
  increased claims of Perl being unreadable.
» Still suffers from needing to be able to view and transmit
  non-Ascii characters.

  * the temelliug option: »op«

» Guillemets the conventional way round are used for qw().
» The same, unusual, symbols are used for two very different
  purposes.
» Still the non-Ascii problems.

  * the how many characters? option: <<op>>

» Bitwise shift operators are preceded with a plus (like other
  bitwise ops are).
» Here-docs require the tag to be quoted.
» Vector operators take up five or six characters.
» Probably faster for many people to type than guillemets are.
  (The second of each doubled character is very fast to type.)
» Vector bitwise shifts may look a little odd.

  * the single chevrons option: <op>

» The Perl 5 'read a chunk' use of angled brackets needs to be
  provided some other way.
» Not much to type.
» Potential (human, if not parser) confusion from same characters
  being used elsewhere in Perl for comparisons and bitwise shifts.

  * the suggested but not much discussed options:

  ~op
  ~~op
  `op
  `op`
  *[op]
  .[op]
  =[op]
  ![op]
  _[op]
  :[op]
  '[op]
  ~[op]
  (op)
  )op(
  )op(
  [op]
  [)op(]

Phew!  I'm slightly concerned at this list making Piers's job too easy,
but have tried to minimize that effect by posting on a Monday (meaning
that this mail is ineligible for inclusion in the next summary and is
likely to be out of date by the time of the following one).

Smylers



Re: Possible Vector Operator Notations

2002-11-04 Thread Simon Cozens
[EMAIL PROTECTED] (Smylers) writes:

Thank you very, very much for this; this is supremely helpful.

 » No character left for eating whitespace.

That's a feature, not a bug! The space-eater alternately worries, confuses
and scares me.

-- 
I want you to know that I create nice things like this because it
pleases the Author of my story.  If this bothers you, then your notion
of Authorship needs some revision.  But you can use perl anyway. :-)
- Larry Wall



Re: Unifying invocant and topic naming syntax

2002-11-04 Thread Me
  (naming) the invocant of a method involves
  something very like (naming) the topic
 
 Generally, there's no conceptual link...

other than 

 The similarity is that both are implicit
 parameters

which was my point.

Almost the entirety of what I see as relevant
in the context of deciding syntax for naming
the invocant is that it's an especially
important implicit argument to methods that
may also be the topic.

Almost the exact same thing is true of the
topic, with the exception being that it
applies to subs as well as methods.

It's clear you could have come up with
something like one of these:

method f ($a, $b) is invoked_by($self)
method f ($a, $b) is invoked_by($self is topic)
method f ($a, $b) is invoked_by($_)

but you didn't. Any idea why not?


 There are times when it's useful to access
 the caller's topic without setting the current
 topic and times when it's useful to just set
 the current topic.
 ...
 When you have a system with two independent
 but interacting features, it's far more
 efficient to define two independent flags
 than to define 4 flags

Of course. The general scheme I suggested
doesn't impinge on this one way or the other.

But the specifics I suggested made a different
choice for the default situation.

As you said of the current scheme:

The following example will either print
nothing, or else print a stray $_ that is
in lexical scope wherever the sub is defined.

sub eddy ($space, $time) {
print;
}

In the scheme I suggest the first arg is the
topic by default (just as is the case for the
other topicalizers such as pointy sub, for,
etc in the current scheme). I think this
choice makes sense, but then, as I implied,
maybe I'm missing something.


  method f ($self : $c : $a*, $b) { ... }

(where * is short for 'is topic'.)

 Is this really common enough to merit a single
 punctuation character?

If I didn't think so I wouldn't have suggested
the shortcut. ;


  Anyhow, a further plausible tweak that builds
  on the above colon stuff is that one could
  plausibly do:
 
 So, in this system, the colon is used for both
 explicit parameters and implicit parameters.
 ...
 I'd prefer more consistency.

If by system you mean the scheme I suggested
prior to this plausible tweak, then no. The
colon would only be used in this way if one
introduced the tweak. Rejecting this tweak has
no impact on the value of the general scheme
I suggested.

The colon in my scheme (not the tweak) can
optionally be used to be explicit in sub
*defs* about otherwise implicit args. The
tweak I am suggesting is that one could,
optionally, use the exact same syntax to
be explicit in sub *calls* about otherwise
implicit args. I think this has a clean
consistency.


  given $c : $foo {
  # $c = outer $_
  }
 
 It would be much more transparent to simply
 name the outer topic.

How is this so different to

method f ($c : $foo) {
# $c = invocant
}

?


 
 Allison

--
ralph



Re: Supercomma! (was Re: UTF-8 and Unicode FAQ, demos)

2002-11-04 Thread Larry Wall
[Note to all: yes, this is me, despite the weirdities of the quoting
and headers.  This is how it looks when I'm using mutt out of the box,
because I haven't yet customized it like I have pine.  But I do like
being able to see my own Unicode characters, not to mention everyone
else's.  If you don't believe this is me, well, I'll just tell you that
I live on a tropical island near Antarctica, my social security number
is 987-65-4321, and my mother's maiden name was the same as my maternal
grandfather's maiden name.  Or something like that...  --Ed]

On Mon, Nov 04, 2002 at 02:25:08PM -0800, Michael Lazzaro wrote:
 On Monday, November 4, 2002, at 11:58  AM, Larry Wall wrote:
 You know, separate streams in a for loop are not going to be that
 common in practice, so maybe we should look around a little harder for
 a supercomma that isn't a semicolon.  Now *that* would be a big step
 in reducing ambiguity...
 
 Or more than one type of supercomma, e.g:
 
    for @x ∫ @y ∫ @z -> $x ∫ $y ∫ $z { ... }
 
 to mean:
    for @x ; @y ; @z -> $x ; $y ; $z { ... }

That almost works visually.

 - vs -
 
    for @x § @y § @z -> $x § $y § $z { ... }
 
 to mean:
 
    for @x -> $x {
        for @y -> $y {
            for @z -> $z {
                ...
            }
        }
    }
 
 ;-)

Glad you put the smiley.  I think the latter is much clearer.

But at the moment I'm thinking there's something wrong about any
approach that requires a special character on the signature side.
I'm starting to think that all the convolving should be specified
on the left.   So in this:

for parallel(@x, @y, @z) -> $x, $y, $z { ... }

the signature specifies that we are expecting 3 scalars to the sub,
and conveys no information as to whether they are generated in parallel
or serially.  That's entirely specified on the left.  The natural
processing of lists says that serial is specified like this:

for @a, @b, @c -> $x, $y, $z { ... }

Of course, parallel() is a rotten thing to have to say unless you're
into readability.  So we could still have some kind of parallelizing
supercomma, maybe even ∥ (U+2225 PARALLEL TO).  But let's keep it
out of the signature, I think.  In other words, if something like

for @x ∥ @y ∥ @z -> $x, $y, $z { ... }

is to work, then

@result = @x ∥ @y ∥ @z;

has to interleave @x, @y, and @z.  It's not special to the C<for>.
In the case of C<for>, of course, the compiler should feel free to
optimize out the actual construction of an interleaved array.
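A minimal Python sketch of the interleaving Larry describes, assuming the operator takes one element from each list in turn and stops at the shortest list (the post does not say what happens with unequal lengths). The helper name interleave is hypothetical. Grouping the flat result three at a time shows why the loop signature only needs to know that it takes 3 scalars.

```python
from itertools import chain


def interleave(*lists):
    """Hypothetical n-ary interleave: first elements, then second elements, ...
    Stops at the shortest list, as zip() does (an assumption)."""
    return list(chain.from_iterable(zip(*lists)))


x = [1, 2, 3]
y = ["a", "b", "c"]
z = [10, 20, 30]

result = interleave(x, y, z)
print(result)  # [1, 'a', 10, 2, 'b', 20, 3, 'c', 30]

# A loop that simply takes 3 scalars per iteration from the flat list
# recovers the parallel triples; it needs no knowledge of how they were made.
groups = list(zip(*[iter(result)] * 3))
for a, b, c in groups:
    print(a, b, c)
```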

I suppose it could be argued that ∥ is really spelled »,« or some such.
However,

@result = @x »,« @y »,« @z;

just doesn't read quite as well for some reason.  A slightly better
case could be made for

@result = @x `|| @y `|| @z;

The reason we originally munged with the signature was so that we
could do weird things with differing numbers of streams on the left
and the right.  But if you really want a way to take 3 from @x, then
3 from @y, then 3 from @z, there should be something equivalent to:

for round_robin_by_3s(@x, @y, @z) -> $x, $y, $z { ... }

Fooling around with signature syntax for that rare case is not worth it.
This way, the C<for> won't have to know anything about the signature other
than that it expects 3 scalar arguments.  And Simon will be happ(y|ier)
that we've removed an exception.
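The rare round_robin_by_3s case (3 from @x, then 3 from @y, then 3 from @z) can indeed live entirely on the left-hand side. Here is a Python sketch; round_robin_by is a hypothetical generalization of the post's round_robin_by_3s, and its behaviour for lists of unequal length (short lists simply contribute fewer items, which would misalign the triples) is an assumption.

```python
def round_robin_by(n, *lists):
    """Hypothetical helper: n items from each list in turn, cycling until
    every list is exhausted."""
    out = []
    offset = 0
    while any(offset < len(lst) for lst in lists):
        for lst in lists:
            out.extend(lst[offset:offset + n])
        offset += n
    return out


x = [1, 2, 3, 4, 5, 6]
y = ["a", "b", "c", "d", "e", "f"]
z = [10, 20, 30, 40, 50, 60]

flat = round_robin_by(3, x, y, z)

# The loop body still just binds 3 scalars at a time, as in the post.
groups = list(zip(*[iter(flat)] * 3))
print(groups[:4])  # [(1, 2, 3), ('a', 'b', 'c'), (10, 20, 30), (4, 5, 6)]
```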

Ed, er, Larry



Re: Unifying invocant and topic naming syntax

2002-11-04 Thread Damian Conway
ralph wrote:


It's clear you could have come up with
something like one of these:

method f ($a, $b) is invoked_by($self)
method f ($a, $b) is invoked_by($self is topic)
method f ($a, $b) is invoked_by($_)

but you didn't. Any idea why not?


Because most methods need some kind of access to their own invocant,
but relatively few subroutines need access to the upscope topic.
So the syntax for invocant specification should be compact,
and the syntax for external topic access should be less so.

Moreover, because the latter mechanism compromises the lexical
scoping of the upscope topic, it ought to be marked very clearly
(i.e. with a keyword, rather than a single colon).



   method f ($self : $c : $a*, $b) { ... }




(where * is short for 'is topic'.)


Too short, IMHO.

For such a non-standard method definition, I'd *very* much rather
maintain a syntax like:

	method f ($self : $a is topic, $b) is given($c) { ... }

which clearly marks all the irregularities.

Not to mention that this more explicit syntax doesn't inject a spurious
pseudoparameter into the parameter list.

Damian




RE: Supercomma! (was Re: UTF-8 and Unicode FAQ, demos)

2002-11-04 Thread Brent Dax
Larry Wall:
# for @x ∥ @y ∥ @z -> $x, $y, $z { ... }

Even if you decide to use UTF-8 operators (which I am Officially
Recommending Against), *please* don't use this one.  This shows up as a
box in the Outlook UTF-8 font.

--Brent Dax [EMAIL PROTECTED]
@roles=map {"Parrot $_"} qw(embedding regexen Configure)

Wire telegraph is a kind of a very, very long cat. You pull his tail in
New York and his head is meowing in Los Angeles. And radio operates
exactly the same way. The only difference is that there is no cat.
--Albert Einstein (explaining radio)



Re: Supercomma! (was Re: UTF-8 and Unicode FAQ, demos)

2002-11-04 Thread Brian Ingerson
On 04/11/02 17:52 -0800, [EMAIL PROTECTED] wrote:
 [Note to all: yes, this is me, despite the weirdities of the quoting
 and headers.  This is how it looks when I'm using mutt out of the box,
 because I haven't yet customized it like I have pine.  But I do like
 being able to see my own Unicode characters, not to mention everyone
 else's.  If you don't believe this is me, well, I'll just tell you that
 I live on a tropical island near Antarctica, my social security number
 is 987-65-4321, and my mother's maiden name was the same as my maternal
 grandfather's maiden name.  Or something like that...  --Ed]

Mutt?

I'm using mutt and I still haven't had the privilege of correctly viewing one
of these unicode characters yet. I'm gonna be really mad if you say you're
also using an OS X terminal. I suspect that it's my horrific OS X termcap
that's misbehaving here.

Aargh!

Brian

 



Re: UTF-8 and Unicode FAQ, demos

2002-11-04 Thread Damian Conway
Larry wrote:


I've actually got my eye on ≈ (U+2248 ALMOST EQUAL TO) as a
replacement for ~~ someday in the distant future.

I suppose it could be argued that we should use ≅ (U+2245
APPROXIMATELY EQUAL TO) instead.  That's what =~ was supposed to
represent, after all...


Yeah, either of those work. But neither is entirely satisfactory, since
there's nothing almost or approximate about the matching the operator
does. We obviously need a unicode IS LIKE UNTO codepoint. ;-)



You know, separate streams in a for loop are not going to be that
common in practice, so maybe we should look around a little harder for
a supercomma that isn't a semicolon.  Now *that* would be a big step
in reducing ambiguity...


Amen.



Even if we limit ourselves to Latin1 for now,


Which I suspect we should seriously consider. Maybe leave 9+ bit
operators to Perl 7. ;-)



I'd avoid using standard signs like multiply × and divide ÷ for
non-standard purposes though.  (Not that we can exactly use multiply
even for its standard purpose--there's an awfully heavy resemblance
between × and x, at least in the typical sans serif font.)


That's why I semi-seriously suggested replacing C<x> by C<×>.
For some reason alphabetic operators (at least, those that are
pretending to be symbols) really bug me.



It would be really funny to use cent ¢, pound £, or yen ¥ as a sigil, though...


H. Given that a pound is worth more than a dollar, maybe £ is the sigil
for pairs.

;-)


Damian




Re: Possible Vector Operator Notations

2002-11-04 Thread Damian Conway
Smylers summarized (beautifully, thank-you):


  * the looks like an array option: [op]

» Seemed a nice idea, but doesn't work with other uses of square
  brackets.


Could be made to work. Suppose that every operator definition (explicit or
implicit) automagically also defined a variant with square brackets around it.
No ambiguity for any defined operator.

Of course you lose the ability to apply an arbitrary alphabetic function
across a vector of arguments, but maybe that's not such a terrible price.
Especially if we allow rvalue C<for> multi-stream loops, which give the
same functionality.
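Damian's idea can be modelled as an operator table in which registering an operator also registers its bracketed, element-wise form. The Python below is only a sketch of that idea: OPS and define_op are hypothetical names, and truncating to the shorter vector (via zip) is an assumption.

```python
import operator

# Hypothetical operator table standing in for the compiler's.
OPS = {}


def define_op(symbol, fn):
    """Register a binary operator and, automatically, its [op] vector form."""
    OPS[symbol] = fn
    OPS["[" + symbol + "]"] = lambda xs, ys: [fn(a, b) for a, b in zip(xs, ys)]


define_op("+", operator.add)
define_op("*", operator.mul)

print(OPS["+"](2, 3))                       # 5
print(OPS["[+]"]([1, 2, 3], [10, 20, 30]))  # [11, 22, 33]
```

Only symbols that go through define_op get a bracketed form, which mirrors the trade-off Damian notes: an arbitrary alphabetic function has no vector variant unless something defines one.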

Damian




Re: Supercomma! (was Re: UTF-8 and Unicode FAQ, demos)

2002-11-04 Thread Damian Conway
Larry wrote:


But at the moment I'm thinking there's something wrong about any
approach that requires a special character on the signature side.
I'm starting to think that all the convolving should be specified
on the left.   So in this:

for parallel(@x, @y, @z) -> $x, $y, $z { ... }

the signature specifies that we are expecting 3 scalars to the sub,
and conveys no information as to whether they are generated in parallel
or serially.  That's entirely specified on the left.  The natural
processing of lists says that serial is specified like this:

for @a, @b, @c -> $x, $y, $z { ... }

Of course, parallel() is a rotten thing to have to say unless you're
into readability.  So we could still have some kind of parallelizing
supercomma, maybe even ∥ (U+2225 PARALLEL TO).


I'd rather we not use that. I found it surprisingly hard to
distinguish ∥ from ||. May I suggest that this might be the opportunity
to deploy ¦ (i.e. E<brvbar>).



But let's keep it
out of the signature, I think.  In other words, if something like

for @x ∥ @y ∥ @z -> $x, $y, $z { ... }

is to work, then

@result = @x ∥ @y ∥ @z;

has to interleave @x, @y, and @z.  It's not special to the C<for>.


Very nice. The n-ary zip operator.




I suppose it could be argued that ∥ is really spelled »,« or some such.
However,

@result = @x »,« @y »,« @z;

just doesn't read quite as well for some reason.


Agreed.



A slightly better case could be made for

@result = @x `|| @y `|| @z;


Except by those who suffer FIABCB (font-induced apostrophe/backtick
character blindness).



The reason we originally munged with the signature was so that we
could do weird things with differing numbers of streams on the left
and the right.  But if you really want a way to take 3 from @x, then
3 from @y, then 3 from @z, there should be something equivalent to:

for round_robin_by_3s(@x, @y, @z) -> $x, $y, $z { ... }


Or perhaps just:

	sub take(int $n, *@from) {
	    yield splice @from, 0, $n while @from > $n;
	    return ( @from, undef xx ($n-@from) )
	}

	&three = &take.assuming(n=>3);

	for three(@x), three(@y), three(@z) -> $x, $y, $z { ... }

???
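What Damian's take appears to compute can be sketched in Python, with a generator standing in for yield, None standing in for undef, and functools.partial playing the role of .assuming. This is an approximation of the proposed semantics under those assumptions, not a definition of them; like the original, it pads the final chunk, and it returns a single all-None chunk for an empty list.

```python
from functools import partial


def take(n, items):
    """Yield successive n-item chunks; pad the last chunk with None."""
    rest = list(items)
    while len(rest) > n:
        yield rest[:n]
        rest = rest[n:]
    # Mirrors: return ( @from, undef xx ($n-@from) )
    yield rest + [None] * (n - len(rest))


three = partial(take, 3)  # roughly &take.assuming(n => 3)

print(list(three([1, 2, 3, 4, 5])))  # [[1, 2, 3], [4, 5, None]]
```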



Fooling around with signature syntax for that rare case is not worth it.
This way, the C<for> won't have to know anything about the signature other
than that it expects 3 scalar arguments.  And Simon will be happ(y|ier)
that we've removed an exception.


and reinstituted the previous exception that a semicolon in a parameter
list marks the start of optional parameters! :-)


Damian