Re: Files, Directories, Resources, Operating Systems

2008-11-26 Thread Mark Overmeer
* Tom Christiansen ([EMAIL PROTECTED]) [081126 23:55]:
> On "Wed, 26 Nov 2008 11:18:01 PST."--or, for backwards compatibility,
> at 7:18:01 p.m. hora Romae on a.d. VI Kal. Dec. MMDCCLXI AUC,
> Larry Wall <[EMAIL PROTECTED]> wrote:
> 
> SUMMARY: I've been looking into this sort of thing lately (see p5p),
> and there may not even *be* **a** "right" answer.  The reasons
> why take us into an area we've traditionally avoided.

What a long message...

> Mark>> We should focus on OS abstraction.
> Mark>> [...] the design of this needs to be free from historical mistakes.

>  ... It cannot be
> done in an automated fashion, since you can't know a filesystem that knew
> *locale* each filename was created under, and  without that, you have to
> guess--almost always wrongly.

Exactly.  This is a historical mistake, understandable given the need for
a path of growth from the current system open() interface.  Only users
who have the same locale see the names the same way.  If you change
your locale, your filenames break!  Say you change from Cyrillic to
English.

In my suggestion, the programmer (who is often local on the system) can
at least say what the locale was when the filenames were created.  On
some OSes, the OS can tell you.  What I would like is an object model
which does allow us at least to abstract these problems away... whether
it can be resolved automatically or only with help is for later.

> There is ABSOLUTELY NO WAY I've found to tell whether these utf-8
> strings should test equal, and when, nor how to order them, without
> knowing the locale:
> 
> "RESUME",
> "Resume"
> "resume"
> "Resum\x{e9}"
> "r\x{E9}sum\x{E9}"
> "r\x{E9}sume\x{301}"
> "Re\x{301}sume\x{301}"

This is done by the locale of the user of the script, as usual for
ls(1).  So, I do not see your problem here.

I don't mind if problems with unicode are not solved or solvable.
Could we discuss a built-in File::Spec/Path::Class?  And allow
ourselves the same limitations as these have, for the moment.
-- 
Regards,

   MarkOv


   Mark Overmeer MSc                                MARKOV Solutions
   [EMAIL PROTECTED]  [EMAIL PROTECTED]
http://Mark.Overmeer.net   http://solutions.overmeer.net


r24080 - docs/Perl6/Spec src/perl6

2008-11-26 Thread pugs-commits
Author: lwall
Date: 2008-11-27 08:21:32 +0100 (Thu, 27 Nov 2008)
New Revision: 24080

Modified:
   docs/Perl6/Spec/S03-operators.pod
   src/perl6/STD.pm
Log:
[STD] not() etc. is a function call
[S03] prefix:<^> no longer tries to get fancy with lists


Modified: docs/Perl6/Spec/S03-operators.pod
===
--- docs/Perl6/Spec/S03-operators.pod   2008-11-27 06:14:03 UTC (rev 24079)
+++ docs/Perl6/Spec/S03-operators.pod   2008-11-27 07:21:32 UTC (rev 24080)
@@ -12,9 +12,9 @@
 
   Maintainer: Larry Wall <[EMAIL PROTECTED]>
   Date: 8 Mar 2004
-  Last Modified: 7 Nov 2008
+  Last Modified: 26 Nov 2008
   Number: 3
-  Version: 146
+  Version: 147
 
 =head1 Overview
 
@@ -245,6 +245,14 @@
 
 a(1)
 
+In term position, any identifier followed immediately by a
+parenthesized expression is always parsed as a term representing
+a function call even if that identifier also has a prefix meaning,
+so you never have to worry about precedence in that case.  Hence:
+
+not($x) + 1 # means (not $x) + 1
+abs($x) + 1 # means (abs $x) + 1
+
 =item *
 
 Pair composers
@@ -2890,10 +2898,6 @@
 
 for ^4 { say $_ } # 0, 1, 2, 3
 
-If applied to a list, it generates a multidimensional set of subscripts.
-
-for ^(3,3) { ... } # (0,0)(0,1)(0,2)(1,0)(1,1)(1,2)(2,0)(2,1)(2,2)
-
 If applied to a type name, it indicates the metaclass instance instead,
 so C<^Moose> is short for C<HOW(Moose)> or C<Moose.HOW>.  It still kinda
 means "what is this thing's domain" in an abstract sort of way.

Modified: src/perl6/STD.pm
===
--- src/perl6/STD.pm   2008-11-27 06:14:03 UTC (rev 24079)
+++ src/perl6/STD.pm   2008-11-27 07:21:32 UTC (rev 24080)
@@ -3271,7 +3271,7 @@
 token term:identifier ( --> Term )
 {
 :my $t;
-
+ 
 { $t = $.text; }
 
 {{



Re: [perl #60828] [BUG] [EMAIL PROTECTED] returns ridicously long lists

2008-11-26 Thread Patrick R. Michaud
On Wed, Nov 26, 2008 at 06:21:22PM -0800, Larry Wall wrote:
> On Wed, Nov 26, 2008 at 08:54:50AM +0100, Moritz Lenz wrote:
> : Patrick R. Michaud wrote:
> : > Currently Rakudo is treating [EMAIL PROTECTED] as though it's
> : > prefix:<^> on a List, which S03 says 
> : > for ^(3,3) { ... } # (0,0)(0,1)(0,2)(1,0)(1,1)(1,2)(2,0)(2,1)(2,2)
> : 
> : I see how the specced makes sense for a List of Ints, but not for any
> : other list - any ideas from the design team?
> 
> My guess is that the list overloading will simply vanish into thin air,
> and you'll have to say something like
> 
> ^«(3,3)
> 
> to get the current Parrot meaning, and
> 
> [X] ^«(3,3)
> 
> or
> 
> ^3 X ^3
> 
> to get the specced list meaning.  But other viewpoints are welcome...

+1 to the idea that [EMAIL PROTECTED] is the same as [EMAIL PROTECTED].  Now
implemented as such in Rakudo, and unfudged the corresponding tests.

Closing ticket -- thanks!

Pm


Re: Files, Directories, Resources, Operating Systems

2008-11-26 Thread Darren Duncan

Tom Christiansen wrote:

> I believe database folks have been doing the same with character data, but
> I'm not up-to-date on the DB world, so maybe we have some metainfo about
> the locale to draw on there.  Tim?


AFAIK, modern databases are all strongly typed at least to the point that the 
values you store in and fetch from them are each explicitly character data or 
binary data or numbers or what-have-you; and so, when you are dealing with a 
DBMS in terms of character data, it is explicitly specified somewhere (either 
locally for the data or globally/hardcoded for the DBMS) that each value of 
character data belongs to a particular character repertoire and text encoding, 
and so the DBMS knows what encoding etc the character data is in, or at least it 
treats it consistently based on what the user said it was when it input the 
data.  The only time this information isn't really remembered is if the data is 
supplied in terms of being binary data.


Maybe some older or unusual DBMSs aren't this way, and of course technically a 
filesystem etc *is* a database ... I think that example mentioned about filename 
storage being locale dependent, probably meant that at the actual filesystem 
level it was just dealing with the names as binary data.



> There is ABSOLUTELY NO WAY I've found to tell whether these utf-8
> strings should test equal, and when, nor how to order them, without
> knowing the locale:
>
> "RESUME",
> "Resume"
> "resume"
> "Resum\x{e9}"
> "r\x{E9}sum\x{E9}"
> "r\x{E9}sume\x{301}"
> "Re\x{301}sume\x{301}"
>
> Case insensitively, in Spanish they should be identical in all
> regards.  In French, they should be identical but for ties,
> in which case you work your way right to left on the diacriticals.


This leads me to talk about my main point about sensitivity etc.

I believe that the most important issues here, those having to do with identity, 
can be discussed and solved without unduly worrying about matters of collation; 
identity is a lot more important than collation, as well as a precondition for 
collation, and collation is a lot more difficult and can be put off.  With 
respect to dealing with a file system, generally it is just identity that 
matters and collation is a concern that can typically be just tacked on after 
identity is solved.


That is, with a file system you need to know whether or not a file name you hold 
will or won't match a file in the system, and matching or not-matching is the 
main function of an identity.  Similarly, the file system has to make sure that 
no 2 distinct files in it have the same file name, that is the same public 
identity.  In contrast, the order that you order or sort a list of files by 
their names usually isn't so important; while all work with a file system 
requires working with identities, most work does not need to deal with 
collation.  In practice several parties can agree on a single means of 
identifying files, while still having their own favorite collations, so the same 
list can be ordered in different ways.


Collation criteria are something that can be naturally applied externally to a 
file system, such as by a user program, and only identity criteria need to be 
built-in to the file system.


So collation doesn't need to be considered in Perl's file-system interface, 
while identity does; collation can be a layer on top of the core interface that 
just cares about identity.


One maxim I apply in my database work, and that I believe applies to this 
discussion, is "any logical difference is a big difference".  If you have 2 
distinct value literals such that you consider the difference in each literal's 
spelling to be significant, such that you can't for all use cases substitute one 
literal for the other, then the 2 literals denote 2 distinct values; in the 
other case, where you can always substitute one for the other harmlessly, then 
they denote the same value.  The concepts of 'value' and 'identity' are the same, 
and any value is its own identity.


And so, with your 7 'resume' literals, I would say that if there is a reason for 
any of the spellings to exist that couldn't be handled by one of the other 
spellings, then all 7 literals are distinct/non-identical taken as-is.


If you *know* that the 7 strings are all UTF-8, then locale doesn't have to be 
considered for equality; just your unicode abstraction level matters, such as if 
you're defining the values in terms of graphemes vs codepoints vs bytes.
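
To make the "abstraction level" point concrete, here is a minimal Python sketch (Python is used only so the example runs as written; it is not part of the proposal). It compares the seven literals first as raw code point sequences and then after canonical composition (NFC), which is one plausible stand-in for a grapheme-level view. The helper names are hypothetical.

```python
import unicodedata

# The seven literals from the message, written as Python strings.
names = [
    "RESUME",
    "Resume",
    "resume",
    "Resum\u00e9",
    "r\u00e9sum\u00e9",
    "r\u00e9sume\u0301",
    "Re\u0301sume\u0301",
]

def codepoint_equal(a, b):
    """Identity at the code point level: the sequences must match exactly."""
    return a == b

def canonical_equal(a, b):
    """Identity after canonical composition (NFC); no locale is consulted."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

precomposed, combining = names[4], names[5]
print(codepoint_equal(precomposed, combining))
print(canonical_equal(precomposed, combining))

# How many distinct values remain once every literal is NFC-normalized.
distinct_after_nfc = {unicodedata.normalize("NFC", n) for n in names}
print(len(distinct_after_nfc))
```

Neither comparison needs a locale; the two differ only in which level of the Unicode abstraction they treat as the value.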


When talking about identity, there is no such thing as case-insensitivity or 
accent insensitivity or whitespace insensitivity or what have you.  If you have 
any reason to not replace every "E" with an "e" or vice-versa in your character 
string, then you consider those 2 non-identical and so they wouldn't match; by 
contrast, true case-insensitivity means you can replace every "e" with an "E" 
(for example) and forget that an "e" ever existed; the actual equality test is 
then the same since all comparands would 

Re: [perl #60828] [BUG] [EMAIL PROTECTED] returns ridicously long lists

2008-11-26 Thread Larry Wall
On Wed, Nov 26, 2008 at 08:54:50AM +0100, Moritz Lenz wrote:
: 
: 
: Patrick R. Michaud wrote:
: > Currently Rakudo is treating [EMAIL PROTECTED] as though it's
: > prefix:<^> on a List, which S03 says 
: > 
: > If [prefix:<^> is] applied to a list, it generates a
: > multidimensional set of subscripts.
: > 
: > for ^(3,3) { ... } # (0,0)(0,1)(0,2)(1,0)(1,1)(1,2)(2,0)(2,1)(2,2)
: > 
: > So, Rakudo is currently seeing [EMAIL PROTECTED] as following this definition,
: > and trying to generate the subscripts (perhaps wrongly).
: 
: Yes, wrongly:
: 08:48 < moritz_> rakudo: say (^(3,3)).perl
: 08:48 < p6eval> rakudo 33212: OUTPUT[[0, 1, 2, 0, 1, 2]␤]
: 08:51 < moritz_> rakudo: say (^(10,3)).perl
: 08:51 < p6eval> rakudo 33212: OUTPUT[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2]␤]
: 
: It counts up first the first index, then the second.
: 
: 
: I see how the specced makes sense for a List of Ints, but not for any
: other list - any ideas from the design team?

My guess is that the list overloading will simply vanish into thin air,
and you'll have to say something like

^«(3,3)

to get the current Parrot meaning, and

[X] ^«(3,3)

or

^3 X ^3

to get the specced list meaning.  But other viewpoints are welcome...
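
For readers who want to see the three meanings side by side, here is a rough Python analogy. It is only an approximation of the Perl 6 operators, and the function names upto, upto_each and cross_all are made up for this sketch.

```python
from itertools import product

def upto(n):
    """Rough analogue of prefix:<^> on one number: 0 .. n-1."""
    return list(range(n))

def upto_each(dims):
    """Rough analogue of the hyper form: apply upto to each element in turn."""
    return [upto(n) for n in dims]

def cross_all(dims):
    """Rough analogue of reducing with X: cross product of the per-element ranges."""
    return list(product(*upto_each(dims)))

# Flattening the per-element ranges gives the list p6eval printed for (3,3).
flat = [i for r in upto_each((3, 3)) for i in r]
print(flat)

# The cross product gives the multidimensional subscripts from the old S03 text.
print(cross_all((3, 3)))
```

Seen this way, the two readings are different compositions of the same single-number operation, which is why dropping the list overloading loses no expressive power.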

Larry


Re: Files, Directories, Resources, Operating Systems

2008-11-26 Thread Timothy S. Nelson
	Can I just remind everyone that (IMO) we shouldn't just be considering 
filesystems here?  I think it would be a pretty useful feature to have a 
general tree manipulation interface, and then this could be applied to 
filesystems, or XML, or LDAP, or SQL (although this doesn't map so well), or 
whatever.


I guess the way I see it, you'd have something like this:

role Tree::Node {...}
role Filesystem::Node does Tree::Node {...}
role Filesystem::Directory does Filesystem::Node {...}
class Filesystem::File does Filesystem::Node { # Interface, like DBI
    has $implementation handles *;

    $implementation = Filesystem::File::XML.new();
}
class Filesystem::File::XML is Filesystem::File::Base {...}

	In the case of Filesystem::Node, you would define some standard 
attribute names (eg. "owner", "is_readable"), but then they would be 
accessible through the standard Tree::Node.get_attribute() interface.  And the 
standard Tree::Node.get_children() would be implemented by Filesystem::File as 
something to fetch the contents of the file; in the case of 
Filesystem::File::XML, it would turn the contents into a tree of XML nodes.


	I agree about the different levels of abstractions, but just wanted to 
put in a plug for this one as one that I like.
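
A minimal Python sketch of the idea follows, written under several assumptions: the class names mirror the roles above but are hypothetical, the attribute names "owner" and "is_readable" are taken from the paragraph above, and a file's children are simply its lines (a stand-in for a format-specific parser such as the XML one).

```python
import os

class TreeNode:
    """Generic interface: every node has attributes and children."""
    def get_attribute(self, name):
        raise NotImplementedError
    def get_children(self):
        raise NotImplementedError

class FilesystemNode(TreeNode):
    def __init__(self, path):
        self.path = path

    def get_attribute(self, name):
        # Standard attribute names, served through the generic interface.
        if name == "owner":
            return os.stat(self.path).st_uid
        if name == "is_readable":
            return os.access(self.path, os.R_OK)
        raise KeyError(name)

class Directory(FilesystemNode):
    def get_children(self):
        entries = sorted(os.listdir(self.path))
        return [make_node(os.path.join(self.path, e)) for e in entries]

class File(FilesystemNode):
    def get_children(self):
        # Placeholder content model: one child per line of the file.
        with open(self.path, "rb") as handle:
            return handle.read().splitlines()

def make_node(path):
    """Pick the node type from what is actually on disk."""
    return Directory(path) if os.path.isdir(path) else File(path)
```

Code that walks get_children() never needs to know whether it crossed from a directory into a file's contents, which is the appeal of the single tree interface.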


:)


-
| Name: Tim Nelson | Because the Creator is,|
| E-mail: [EMAIL PROTECTED]| I am   |
-

BEGIN GEEK CODE BLOCK
Version 3.12
GCS d+++ s+: a- C++$ U+++$ P+++$ L+++ E- W+ N+ w--- V- 
PE(+) Y+>++ PGP->+++ R(+) !tv b++ DI D G+ e++> h! y-

-END GEEK CODE BLOCK-



Re: [perl #60828] [BUG] [EMAIL PROTECTED] returns ridicously long lists

2008-11-26 Thread Moritz Lenz


Patrick R. Michaud wrote:
> Currently Rakudo is treating [EMAIL PROTECTED] as though it's
> prefix:<^> on a List, which S03 says 
> 
> If [prefix:<^> is] applied to a list, it generates a
> multidimensional set of subscripts.
> 
> for ^(3,3) { ... } # (0,0)(0,1)(0,2)(1,0)(1,1)(1,2)(2,0)(2,1)(2,2)
> 
> So, Rakudo is currently seeing [EMAIL PROTECTED] as following this definition,
> and trying to generate the subscripts (perhaps wrongly).

Yes, wrongly:
08:48 < moritz_> rakudo: say (^(3,3)).perl
08:48 < p6eval> rakudo 33212: OUTPUT[[0, 1, 2, 0, 1, 2]␤]
08:51 < moritz_> rakudo: say (^(10,3)).perl
08:51 < p6eval> rakudo 33212: OUTPUT[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2]␤]

It counts up first the first index, then the second.


I see how the specced makes sense for a List of Ints, but not for any
other list - any ideas from the design team?

Moritz

> There's still some ambiguity in how (or if) we should support
> both interpretations [1], so we'll want to get that resolved before
> we can fix prefix:<^> here.
>
> [1]  http://irclog.perlgeek.de/perl6/2008-11-26#i_720703


Synopses moved to pugs svn repository

2008-11-26 Thread Patrick R. Michaud
On Wed, Nov 26, 2008 at 11:18:01AM -0800, Larry Wall wrote:
> Anyway, feel free to coordinate this here and/or on #perl6.  (Note
> that Patrick is in the process of moving all the Synopses to the pugs
> repo at some point soon, so the current S16 in pugs/docs/Perl6/Spec
> is likely to have its name/location changed soon.)  If you need
> a pugs commit bit, please ask in #perl6 on irc.freenode.net.

...and the move is now done.  The synopses currently live
in docs/Perl6/Spec/ of the pugs svn repository, but we may
be moving these to a different location in the repository
and/or renaming the files soon.

Pm


Re: Files, Directories, Resources, Operating Systems

2008-11-26 Thread Leon Timmermans
On Wed, Nov 26, 2008 at 5:15 PM, Mark Overmeer <[EMAIL PROTECTED]> wrote:
> Yes, you are right on this.  ASCII does not suffer from UTF-8, so my
> example was flawed.  The second 128 does cause problems.  How can glob()
> sort filenames, for instance?

That's a matter of collation, not (just) character set. TIMTOWTDI.
There is no right way to do it as it depends on the circumstances, but
a simple binary sort is not a bad default.
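
As an illustration of that default, here is a small Python sketch; the function names and the sample file names are invented for the example. It sorts by the encoded bytes, and shows a case-folded ordering as one alternative that a caller could layer on top.

```python
names = ["resume", "Zebra", "apple", "r\u00e9sum\u00e9", "RESUME"]

def binary_sort(names, encoding="utf-8"):
    """Default ordering: compare the encoded bytes, no locale involved."""
    return sorted(names, key=lambda n: n.encode(encoding))

def folded_sort(names):
    """One alternative collation: case-folded first, bytes as tie-breaker."""
    return sorted(names, key=lambda n: (n.casefold(), n.encode("utf-8")))

print(binary_sort(names))
print(folded_sort(names))
```

The binary order is stable across users and locales, which is what makes it a reasonable default even though it is rarely the order a human would choose.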

Leon Timmermans


Re: S16: chown, chmod

2008-11-26 Thread Aristotle Pagaltzis
* Brandon S. Allbery KF8NH <[EMAIL PROTECTED]> [2008-11-25 07:25]:
> OTOH Perl has historically not said much about doing that kind
> of thing.

And I’m not in favour of it starting now. All I am saying is that
APIs should be designed to encourage correct designs; arguably
this is the spirit of Perl 6, which says TMTOWTDI yet tries to
provide one good default way of doing any particular thing.

Regards,
-- 
Aristotle Pagaltzis // 


Re: Files, Directories, Resources, Operating Systems

2008-11-26 Thread Geoffrey Broadwell
On Wed, 2008-11-26 at 11:34 -0800, Darren Duncan wrote:
> I agree with the idea of making Perl 6's filesystem/etc interface more
> abstract, as previously discussed, and also that users should be able to
> choose between different levels of abstraction where that makes sense,
> either picking a more portable interface versus a more platform-specific
> one.

Agreed on both counts.

> Following up on Tim Bunce's comment about looking at prior art, I also
> recommend looking at the SQLite DBMS, specifically its virtual file system
> layer; this one is designed to give you deterministic behaviour and
> introspection over a wide range of storage systems and attributes, both on
> PCs and on embedded devices, or hard disks versus flash or write once vs
> write many etc, where a lot of otherwise-assumptions are spelled out.  One
> relevant url is http://sqlite.org/c3ref/vfs.html and for the moment I
> forget where other good urls are.

There are also higher-level VFS systems, such as Icculus.org PhysicsFS,
which goes farther than just abstracting the OS operations.  It also
abstracts away the differences between archives and "real" directories,
unions multiple directory trees on top of each other, and transparently
redirects writes to a different "trunk" than reads:

http://icculus.org/physfs/

I want to be able to support that functionality in a way that still
allows me to open and close PhysicsFS "files" and "directories" the way
I would normally.  I want to be able to layer it *under* the standard
Perl IO ops, rather than above them.
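
To sketch what "layering under the standard ops" could look like, here is a small Python example. It is not the PhysicsFS API; UnionDir and its methods are hypothetical names. Reads search several roots in order, writes always go to a separate root, and open() hands back an ordinary file object so the caller reads and closes it the usual way.

```python
import os

class UnionDir:
    """Overlay several read-only roots and redirect all writes to one root."""

    def __init__(self, read_roots, write_root):
        self.read_roots = list(read_roots)
        self.write_root = write_root

    def _search_order(self):
        # Files already written take precedence over the read-only layers.
        return [self.write_root] + self.read_roots

    def _resolve(self, name):
        for root in self._search_order():
            candidate = os.path.join(root, name)
            if os.path.exists(candidate):
                return candidate
        raise FileNotFoundError(name)

    def open(self, name, mode="r", **kwargs):
        # Any mode that can modify the file is sent to the write root.
        # (No copy-up is attempted here; a real layer would need one.)
        if any(flag in mode for flag in "wax+"):
            os.makedirs(self.write_root, exist_ok=True)
            path = os.path.join(self.write_root, name)
        else:
            path = self._resolve(name)
        return open(path, mode, **kwargs)

    def listdir(self):
        seen = set()
        for root in self._search_order():
            if os.path.isdir(root):
                seen.update(os.listdir(root))
        return sorted(seen)
```

Because the object returned is a plain file handle, every layer above it (buffering, encoding, line handling) keeps working unchanged.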

The following is all obvious, but just to keep it in people's minds and
frame the discussion:

Being able to layer IO abstractions is at least as important as the
basic OS abstraction itself -- as well as the ability to use the high
level abstraction most of the time, but reach down the stack when
needed.  This implies making best effort to minimize the ways in which
upper layers will be hopelessly confused by low-level operations, and
documenting the heck out of the problem areas.

These layers should be mix-and-match as much as possible, with
abstractions designed with common interfaces.  Certainly Perl 5's IO
layers, as well as any networking or library stack, are prior art here.

> To summarize, what we really want is something more generic than 
> case-sensitivity, which is text normalization and text folding in general, as 
> well as distinctly dealing with distinctness for representation versus 
> distinctness for mutual exclusivity.

Yes, definitely.

> [This] implies that 
> sensitivity is special whereas sensitivity should be considered normal, and 
> rather insensitivity should be considered special.

If only that were true in other areas of life.  :-)


-'f




Re: Files, Directories, Resources, Operating Systems

2008-11-26 Thread Darren Duncan
I agree with the idea of making Perl 6's filesystem/etc interface more abstract, 
as previously discussed, and also that users should be able to choose between 
different levels of abstraction where that makes sense, either picking a more 
portable interface versus a more platform-specific one.


Following up on Tim Bunce's comment about looking at prior art, I also recommend 
looking at the SQLite DBMS, specifically its virtual file system layer; this one 
is designed to give you deterministic behaviour and introspection over a wide 
range of storage systems and attributes, both on PCs and on embedded devices, or 
hard disks versus flash or write once vs write many etc, where a lot of 
otherwise-assumptions are spelled out.  One relevant url is 
http://sqlite.org/c3ref/vfs.html and for the moment I forget where other good 
urls are.


Mark Overmeer wrote:

   $dir.case_sensitive(0);

   $*OS.filesystem('/home', type => 'xfs', name_encoding => 'latin1'
, text_content_encoding => 'utf-8,bom', illegal_chars => "/\x0"
, case_sensitive => 1, max_path => 1024);


I understand that the above, concerning case-sensitivity, is just meant to be an 
example, but I want to explore that in more detail for a moment, as it reflects 
a common perception that only scratches the surface and needs to be fleshed out 
more.


To summarize, what we really want is something more generic than 
case-sensitivity, which is text normalization and text folding in general, as 
well as distinctly dealing with distinctness for representation versus 
distinctness for mutual exclusivity.


For example, one file system will represent your chosen case for a filename but 
it won't allow 2 files in the same directory whose filenames are non-distinct 
when uppercased; another file system in contrast would also represent a filename 
uppercased.  For another example, one file system will not distinguish between 
accents on letters while another would, and this is orthogonal to 
case-sensitivity.  Or for another, one might treat a run of whitespace as being 
equivalent to a single whitespace character, or whitespace characters are 
ignored entirely.


Also, the paradigm that is the most distinguishing (case-sensitive, 
accent-sensitive, whitespace-sensitive, etc) should be the default, and any 
boolean option to change an aspect of this should be named so that a false value is 
more distinguishing and a true value is less distinguishing.  For example, a 
flag should be named "ignores_case" rather than "case_sensitive"; this also 
assumes that if named arguments are optional, then the common default value of a 
boolean-typed argument is false.  Naming something "case_sensitive" implies that 
sensitivity is special whereas sensitivity should be considered normal, and 
rather insensitivity should be considered special.
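
A small Python sketch of this naming convention, assuming hypothetical flag names in the style suggested above (ignores_case, ignores_accents, collapses_whitespace). Every flag defaults to false, so the default comparison is the most distinguishing one.

```python
import unicodedata

def fold_name(name, ignores_case=False, ignores_accents=False,
              collapses_whitespace=False):
    """Fold a name; with all flags false only NFC normalization is applied."""
    name = unicodedata.normalize("NFC", name)
    if ignores_accents:
        # Decompose, then drop the combining marks.
        decomposed = unicodedata.normalize("NFD", name)
        name = "".join(c for c in decomposed if not unicodedata.combining(c))
    if ignores_case:
        name = name.casefold()
    if collapses_whitespace:
        # Treat any run of whitespace as a single space.
        name = " ".join(name.split())
    return name

def same_name(a, b, **flags):
    """Two names are identical if they fold to the same string."""
    return fold_name(a, **flags) == fold_name(b, **flags)

print(same_name("R\u00e9sum\u00e9", "r\u00e9sum\u00e9"))
print(same_name("R\u00e9sum\u00e9", "r\u00e9sum\u00e9", ignores_case=True))
```

Each flag is independent of the others, which matches the point that accent handling is orthogonal to case handling.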


-- Darren Duncan


Re: Files, Directories, Resources, Operating Systems

2008-11-26 Thread Larry Wall
On Wed, Nov 26, 2008 at 11:21:58AM +0300, Richard Hainsworth wrote:
> The S16: chown, chmod thread seems to be too unix-focussed.

Indeed, what you are currently reading in S16 is mostly just lightly
edited copy-paste from P5 docs.  But the S16 draft is out in the pugs
repo for a reason--anyone and everyone on this thread should consider
it perfectly okay to take S16 in hand and refactor it mercilessly.
Any shortcuts we wish to install into the final Perl 6 can easily
be done at the last moment by the prelude aliasing common operations
into the core language.

Anyway, feel free to coordinate this here and/or on #perl6.  (Note
that Patrick is in the process of moving all the Synopses to the pugs
repo at some point soon, so the current S16 in pugs/docs/Perl6/Spec
is likely to have its name/location changed soon.)  If you need
a pugs commit bit, please ask in #perl6 on irc.freenode.net.

Larry


Re: Files, Directories, Resources, Operating Systems

2008-11-26 Thread Mark Overmeer
* Leon Timmermans ([EMAIL PROTECTED]) [081126 15:43]:
> On Wed, Nov 26, 2008 at 12:40 PM, Mark Overmeer <[EMAIL PROTECTED]> wrote:
> That is a task for the operating system, not Perl. You're trying to
> solve the problem at the wrong end here IMHO.

In my (and your) case, the operating system is not helping at all
and there is no chance of having that changed.  So...
My remark was just one example, and I can give many more, where I
would like to see more abstraction in the OS interface to avoid the
need for each user to re-invent the wheel of interoperability.

> > For instance, you do not know in which character-set the filename is;
> > that is file-system dependent.  So, we treat filenames as raw bytes.
> 
> On native file-system types (like ext3fs), character-set is not
> file-system dependent but non-existent. It really is raw bytes.

Not on the presentation level to the user.  This makes it even more
horrifying.  It depends on the setting of an environment variable
of the actual user how the bytes of the filename are interpreted.

On the OS filesystem implementation you are probably correct (in
the UNIX/Linux case), but programs are used for end-user results.
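
A short Python sketch of that presentation problem: the same stored bytes are decoded under three different encodings, standing in for three users with different locale settings. The byte string is an invented example.

```python
# The bytes as the filesystem stores them; the filesystem records no encoding.
raw = b"r\xe9sum\xe9"

# A user whose locale implies Latin-1 sees the accented name.
as_latin1 = raw.decode("latin-1")

# A user whose locale implies KOI8-R sees Cyrillic letters in the same places.
as_koi8r = raw.decode("koi8-r")

# A user with a UTF-8 locale cannot decode the name at all.
try:
    as_utf8 = raw.decode("utf-8")
except UnicodeDecodeError:
    as_utf8 = None

print(repr(as_latin1), repr(as_koi8r), repr(as_utf8))
```

Nothing on disk changed between the three views; only the assumed encoding did.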

> > This does cause dangers (a UTF-8 codepoint in the filename with
> > a \x2F ('/') byte in it, for instance)
> A \x2F always means a '/'. UTF-8 was designed to be backwards
> compatible like that.

Yes, you are right on this.  ASCII does not suffer from UTF-8, so my
example was flawed.  The second 128 does cause problems.  How can glob()
sort filenames, for instance?  UTF-16 names should not enter the Perl
program unless you are aware of it, because those can hurt badly.

Please comment on the big picture in the debate: there are all kinds
of OS dependent things I really would like to see hidden in a (large)
abstraction layer to simplify the development of portable scripts.
I don't say I know all the answers, but I do feel a lot of pain in
solving the same thing again in each module for CPAN.
-- 
Regards,
   MarkOv


   Mark Overmeer MSc                                MARKOV Solutions
   [EMAIL PROTECTED]  [EMAIL PROTECTED]
http://Mark.Overmeer.net   http://solutions.overmeer.net



Re: Files, Directories, Resources, Operating Systems

2008-11-26 Thread Leon Timmermans
On Wed, Nov 26, 2008 at 12:40 PM, Mark Overmeer <[EMAIL PROTECTED]> wrote:
> Also, I get data from a CD which was written case-insensitive and then
> copied to my Linux box.  It would be nice to be able to say: "treat this
> directory case insensitive" (even when the implementation is slow)
> Shared with Windows default behavioral interface.
>

That is a task for the operating system, not Perl. You're trying to
solve the problem at the wrong end here IMHO.

> For instance, you do not know in which character-set the filename is;
> that is file-system dependent.  So, we treat filenames as raw bytes.

On native file-system types (like ext3fs), character-set is not
file-system dependent but non-existent. It really is raw bytes.

> This does cause dangers (a UTF-8 codepoint in the filename with a \x2F
> ('/') byte in it, for instance)

A \x2F always means a '/'. UTF-8 was designed to be backwards
compatible like that.
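
This property can be checked exhaustively. The Python sketch below (function name invented) encodes every non-ASCII code point as UTF-8 and looks for a 0x2F byte, then shows one code point whose UTF-16 encoding does contain that byte.

```python
def utf8_hides_slash():
    """Return True if any non-ASCII code point has a 0x2F byte in its UTF-8 form."""
    for cp in range(0x80, 0x110000):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # surrogates are not encodable on their own
        if b"/" in chr(cp).encode("utf-8"):
            return True
    return False

print(utf8_hides_slash())

# UTF-16 gives no such guarantee: U+012F is stored as the bytes 2F 01
# in little-endian order, so a byte-level scan would see a '/'.
utf16_bytes = "\u012f".encode("utf-16-le")
print(utf16_bytes)
```

This is why raw-byte handling is safe for path separators under UTF-8 but not under UTF-16.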

Regards,

Leon Timmermans


Re: Files, Directories, Resources, Operating Systems

2008-11-26 Thread Tim Bunce
On Wed, Nov 26, 2008 at 12:40:41PM +0100, Mark Overmeer wrote:

> We should focus on OS abstraction.

> [...] the design of this needs to be free from historical mistakes.

And avoid making too many new ones. There must be useful prior art around.

Java, for example, has a FileSystem abstraction java.nio.file.FileSystem
http://openjdk.java.net/projects/nio/javadoc/java/nio/file/FileSystem.html

which has been extended, based on lessons learnt, in the NIO.2 project
("JSR 203: More New I/O APIs for the Java(TM) Platform ("NIO.2")
APIs for filesystem access, scalable asynchronous I/O operations,
socket-channel binding and configuration, and multicast datagrams.")
which enables things like being able to transparently treat a zip file
as a filesystem:
http://blogs.sun.com/rajendrag/entry/zip_file_system_provider_implementation

See http://javanio.info/filearea/nioserver/WhatsNewNIO2.pdf
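
The same idea can be sketched in Python rather than Java, using the standard library's zipfile.Path (available since Python 3.8), which exposes an archive through a path-like interface. The archive contents here are invented for the example.

```python
import io
import zipfile

# Build a small archive in memory so the example needs no files on disk.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("docs/readme.txt", "hello from inside the archive\n")
    archive.writestr("docs/notes.txt", "second file\n")

# Treat the archive as a directory tree.
root = zipfile.Path(zipfile.ZipFile(buffer))
docs = root / "docs"

names = sorted(entry.name for entry in docs.iterdir())
text = (docs / "readme.txt").read_text(encoding="utf-8")

print(docs.is_dir(), names)
print(text)
```

The calling code uses directory and file operations only; that the backing store is a zip archive is visible in one constructor call.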

Tim.

p.s. I didn't know any of that when I started to write this "look for
prior art" email, but a little searching turned up these examples.
I'm sure there are more in other realms, but NIO.2 certainly looks like a
rich source of good ideas derived from a wide range of experience.


Re: Files, Directories, Resources, Operating Systems

2008-11-26 Thread Mark Overmeer
* Richard Hainsworth ([EMAIL PROTECTED]) [081126 08:21]:
> The S16: chown, chmod thread seems to be too unix-focussed.
> 
> To be portable, the minimum assumptions need to be made about the 
> environment in which a program operates. Alternatively, the software 
> needs to be able to determine whether the environment it is operating in 
> meets a minimum set of conditions.
> ...
> Thus I would suggest that the perl6 specifications should be written in 
> an abstract way, one not related to a specific operating system and in a 
> way that can be adapted by an implementor to specific systems.

I fully agree with you: the design is heading toward making the same
"mistakes" as Perl5 again.  Just as we were able to let go of the Perl5
syntax more and more as the design of Perl6 progressed, so should we
let go of the way we use modules.  S16 is not doing that.

Also Rafael's suggestion to focus on POSIX is not the way a nice
interface should work.  POSIX calls (and non-POSIX means) are
ways to implement the interface to the Operating System, which can
be different from the most practical interface on implementation level.
We should focus on OS abstraction.

For instance, if a file is represented in an object, then the most
friendly interface names would be like:
  $file->owner($user);
  my $user = $file->owner;
under the hood, we use chown and stat.
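
A minimal Python sketch of such an accessor, assuming a Unix-like system and a hypothetical File class. The owner is represented as a numeric uid to keep the example short; reading it calls stat and assigning it calls chown.

```python
import os

class File:
    """Object view of a file; 'owner' hides the stat/chown system calls."""

    def __init__(self, path):
        self.path = path

    @property
    def owner(self):
        # Reading the attribute is a stat call under the hood.
        return os.stat(self.path).st_uid

    @owner.setter
    def owner(self, uid):
        # Assigning the attribute is a chown call; -1 leaves the group as is.
        os.chown(self.path, uid, -1)
```

A non-Unix backend could implement the same property with whatever ownership concept the platform has, without changing the calling code.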

I really would like to see a "standard" object oriented view on the
system, which mainly autodetects the environment.  I am really
fed-up using File::Spec and Path::Class explicitly all the time.

Also, I get data from a CD which was written case-insensitive and then
copied to my Linux box.  It would be nice to be able to say: "treat this
directory case insensitive" (even when the implementation is slow)
Shared with Windows default behavioral interface.

So, I would like a radical change... trying to be as general
(non UNIX specific) as possible:
   (sorry, my Perl6 syntax is still very sloppy)

   some global $*OS
   # may be different per parallel instance of the program
   # Maybe an OS function which returns $*OS

   my $dir = $*OS.dir($*PROGRAM.arg[0]);
   # above maybe hidden with a functional wrappers: dir $argv[0]

   $dir.case_sensitive(0);
   if $dir.entry('xyz').is_file {}

   my $f   = $dir.file('xyz');
   $f.owner($*OS.user);

   $*OS.system('ls | lpr');

   print $*OS.family;
   print $*OS.kernel_version;

   my $pid = $*OS.process.label;
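
The case_sensitive(0) part of this can be sketched in Python as follows. The Dir class and its methods are hypothetical names modelled on the calls above; the case-insensitive lookup scans the directory, which is the slow implementation the message says would be acceptable.

```python
import os

class Dir:
    """Directory object whose name matching can be made case insensitive."""

    def __init__(self, path, case_sensitive=True):
        self.path = path
        self.case_sensitive = case_sensitive

    def entry(self, name):
        """Return the real path of the matching entry, or None."""
        if self.case_sensitive:
            candidate = os.path.join(self.path, name)
            return candidate if os.path.lexists(candidate) else None
        # Slow path: compare the folded form of every entry.
        wanted = name.casefold()
        matches = [n for n in os.listdir(self.path) if n.casefold() == wanted]
        if len(matches) > 1:
            raise LookupError(f"ambiguous name {name!r}: {sorted(matches)}")
        return os.path.join(self.path, matches[0]) if matches else None

    def is_file(self, name):
        path = self.entry(name)
        return path is not None and os.path.isfile(path)
```

Note that a case-insensitive view of a case-sensitive directory can be ambiguous; this sketch reports that as an error rather than guessing.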
   
We should also be aware that we design Perl6 for parallelism.  Do we
require all nodes to run the same OS (~version)?

Besides, I would really like to get an incremental growth path to do
things we cannot do yet.  Some things are currently difficult to realize
under UNIX/Linux because there is no kernel interface defined for it.
For instance, you do not know in which character-set the filename is;
that is file-system dependent.  So, we treat filenames as raw bytes.
This does cause dangers (a UTF-8 codepoint in the filename with a \x2F
('/') byte in it, for instance)  But as long as the OS cannot provide
the user with this information, we should still give the author a way
to specify it.

   $*OS.filesystem('/home', type => 'xfs', name_encoding => 'latin1'
, text_content_encoding => 'utf-8,bom', illegal_chars => "/\x0"
, case_sensitive => 1, max_path => 1024);
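
A Python sketch of how such a declaration might be used once the author has supplied it. FilesystemInfo and its methods are hypothetical; only three of the options from the call above are modelled, and the max_path check is a loose upper bound on a single name.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FilesystemInfo:
    """Author-supplied facts about a mounted filesystem."""
    mount_point: str
    name_encoding: str = "utf-8"
    illegal_chars: str = "/\0"
    max_path: int = 1024

    def decode_name(self, raw: bytes) -> str:
        """Turn the stored bytes into text using the declared encoding."""
        return raw.decode(self.name_encoding)

    def encode_name(self, name: str) -> bytes:
        """Validate a single name and turn it into the bytes to store."""
        bad = sorted(set(name) & set(self.illegal_chars))
        if bad:
            raise ValueError(f"illegal characters in name: {bad!r}")
        raw = name.encode(self.name_encoding)
        if len(raw) > self.max_path:
            raise ValueError("name is longer than max_path bytes")
        return raw

home = FilesystemInfo("/home", name_encoding="latin-1")
print(home.decode_name(b"r\xe9sum\xe9"))
```

With the encoding declared once per filesystem, the rest of the program can work with text names and never touch raw bytes.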

I have been working on such a module for Perl5 (which has a much wider
field than Path::Class) but (as many other of my projects) did not
complete it to a usable/publishable level (yet).

It is all NOT too difficult to implement (we do share this knowledge),
but the design of this needs to be free from historical mistakes.  That's
a challenge.
-- 
Regards,
   MarkOv


   Mark Overmeer MSc                                MARKOV Solutions
   [EMAIL PROTECTED]  [EMAIL PROTECTED]
http://Mark.Overmeer.net   http://solutions.overmeer.net


Re: Files, Directories, Resources, Operating Systems

2008-11-26 Thread Rafael Garcia-Suarez
Richard Hainsworth wrote in perl.perl6.language :
> The S16: chown, chmod thread seems to be too unix-focussed.

I was more or less thinking that the syscall-related primitives,
like chown or chmod, could go in a POSIX namespace. Even in UNIX
land nowadays the situation can be much more complex than traditional
ownership and modes (a situation not entirely satisfactorily addressed
by Perl 5's filetest pragma).

> Following the general perl6 philosophy, perhaps too there should be an 
> abstract definition for the language that is "core" and additional 
> modules that are specific to operating systems. Thus when generic 
> software is distributed, it comes with an installer that determines the 
> operating system chooses whether to use IO::Unix, IO::Unix::Gnome, 
> IO::MS::WindowsXP, IO::MS::Vista, IO::Apple, etc.
> Maybe also IO::Internet::Http, IO::Internet::Ftp?

IO (streams) and rights are not naturally related. Maybe you're thinking
about filesystems and other content addressing schemes (like URLs). The
subject is more complex than it seems at first glance, because you can
have, for example, per-volume current working directories. It's quite
hard to design something that is abstract enough, but at the same time
not totally useless.


Files, Directories, Resources, Operating Systems

2008-11-26 Thread Richard Hainsworth

The S16: chown, chmod thread seems to be too unix-focussed.

Perl6 is being born in a world dominated by the internet. Whilst perl 
was the glue for the internet when the internet was born, it was a unix 
child. I learned perl from a Windows perspective and I found the 
discussion of ownership and file tests odd. Moreover, there were things 
I wanted to do with perl that seemed unduly cumbersome in Windows 
because the paradigm perl used was a unix one, but adapted to Windows.


To be portable, the minimum assumptions need to be made about the 
environment in which a program operates. Alternatively, the software 
needs to be able to determine whether the environment it is operating in 
meets a minimum set of conditions.
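
As an illustration of the second alternative, here is a Python sketch in which a program probes its environment and then states the minimum it needs. The capability names and the two functions are invented for the example; the probes use only standard os and sys facilities.

```python
import os
import sys

def environment_capabilities():
    """Probe the running environment instead of assuming a Unix layout."""
    return {
        "platform": sys.platform,
        "has_chown": hasattr(os, "chown"),
        "has_symlink": hasattr(os, "symlink"),
        "path_separator": os.sep,
        "unicode_filenames": os.path.supports_unicode_filenames,
    }

def require(capabilities, *needed):
    """Fail early, with a clear message, if a needed capability is absent."""
    missing = [name for name in needed if not capabilities.get(name)]
    if missing:
        raise RuntimeError("environment lacks: " + ", ".join(missing))

caps = environment_capabilities()
require(caps, "path_separator")
print(caps)
```

A portable program would call require() once at start-up; a program written for one specific system can skip the probe because its author already knows the answers.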


Where software is written for a specific environment, the developer 
already knows more about the environment.


Thus I would suggest that the perl6 specifications should be written in 
an abstract way, one not related to a specific operating system and in a 
way that can be adapted by an implementor to specific systems.


Following the general perl6 philosophy, perhaps too there should be an 
abstract definition for the language that is "core" and additional 
modules that are specific to operating systems. Thus when generic 
software is distributed, it comes with an installer that determines the 
operating system and chooses whether to use IO::Unix, IO::Unix::Gnome, 
IO::MS::WindowsXP, IO::MS::Vista, IO::Apple, etc.

Maybe also IO::Internet::Http, IO::Internet::Ftp?

Thus, questions concerning ownership, chown, etc would be left to 
modules. Modules for specific systems would make it easy to deal 
naturally with those systems.  Software written specifically for one OS 
would explicitly "use" the appropriate module, and hence would be able 
to rely on constructs natural for that system.


The interesting questions then would be what abstract concepts should be 
core to perl6 and which ones should be left to


Do we continue to talk about a file, or do we talk about a resource or 
datastream?


When considering internet resources, the concept of trust is more 
important than the concept of ownership.


It would seem to me that a program should be aware of the continuing 
ability of the datastream to operate, viz., supply / accept data, so 
that a program doesn't hang waiting for data from a source that has been 
disconnected.
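
One way to picture this is a read that always comes back with a status. The Python sketch below uses a socket with a timeout; the function name and the three status strings are invented for the example.

```python
import socket

def read_or_give_up(sock, timeout=0.2, size=4096):
    """Read once, but never block longer than 'timeout' seconds."""
    sock.settimeout(timeout)
    try:
        data = sock.recv(size)
    except socket.timeout:
        # Nothing arrived in time; the source may be stalled.
        return "stalled", b""
    if data == b"":
        # An empty read means the other side closed the stream.
        return "closed", b""
    return "data", data
```

The caller can then decide whether to retry, report, or move on, rather than hanging on a source that will never answer.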