Re: Should a dirhandle be a filehandle-like iterator?

2007-05-01 Thread Smylers
John Macdonald writes:

> open(:file), open(:dir), open(:url), ... could be the non-dwimmy
> versions.  If you don't specify an explicit non-dwimmy base variant,
> the dwim magic makes a (preferably appropriate) choice.

That'll make it easy for people porting PHP scripts to Perl 6 -- in
particular for those wanting to port the security hole where a CGI
parameter is used to form part of a filename opened by a script but a
malicious user can supply a URL instead and cause the program to do
things very different from what it intended.

What are the situations in which a programmer really needs to open
something but doesn't know whether that thing is a file, a directory, or
a URL?  I'm still unpersuaded this is sensible default behaviour.

Smylers

[Apologies for the delay on this; I first tried to send it on April
15th, and only just spotted it failed to get through.]


Re: Should a dirhandle be a filehandle-like iterator?

2007-05-01 Thread Luke Palmer

On 5/1/07, Smylers [EMAIL PROTECTED] wrote:

> What are the situations in which a programmer really needs to open
> something but doesn't know whether that thing is a file, a directory, or
> a URL?  I'm still unpersuaded this is sensible default behaviour.


Lots of times.  It's an agnosticism, meaning that you can write a
module which opens things and it doesn't have to know what it's
opening; as a matter of fact, it doesn't even have to know what kinds
of things it's *capable* of opening.  That's powerful.

The point is, even though opening a file and opening a URL are
reasonably different things, they are both the same logical operation,
so they can be abstracted.  Not abstracting them will either cause
modules to have to take a "type of file" parameter whenever they take
a file, or will lead to code like this:

   my $fh = do given $file {
       when /url/ { openurl($file) }
       default    { open($file) }
   }

And the programmer of this module may not have been aware that he
could operate on directories, even though it's just some sort of line
processing module.

Anyway, my point is that the concept of opening something is
abstractable, and not abstracting it means that everyone has to
abstract it separately.
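
A rough Perl 5 sketch of that kind of abstraction, for illustration only. The registry @openers and the function open_any are hypothetical names, not part of any proposed Perl 6 interface. Only plain files and directories are handled here; a URL opener would simply be one more entry in the list.

```perl
use strict;
use warnings;

# Hypothetical registry: each entry pairs a predicate with an opener.
# A module that knows about another kind of resource (a URL, say)
# could push its own pair here without open_any() changing.
my @openers = (
    [ sub { -d $_[0] },
      sub { opendir(my $dh, $_[0]) or die "opendir '$_[0]': $!\n"; $dh } ],
    [ sub { -f $_[0] },
      sub { open(my $fh, '<', $_[0]) or die "open '$_[0]': $!\n"; $fh } ],
);

# The caller only names a thing; the first matching opener decides
# how it is actually opened.
sub open_any {
    my ($name) = @_;
    for my $pair (@openers) {
        my ($matches, $opener) = @$pair;
        return $opener->($name) if $matches->($name);
    }
    die "don't know how to open '$name'\n";
}
```

A line-processing module would call open_any($name) and never mention files or directories itself, which is the agnosticism described above.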

And please don't argue from the standpoint of security holes. Security
holes are possible in every language which talks to the outside world.
Taint mode is a pretty good way to keep security holes out of a
complex, rich language like Perl.  But if you really want to be free
of security holes, you either have to make a language which doesn't do
anything, or make a language in which everyone can be an expert within
an hour.

Luke


Re: Should a dirhandle be a filehandle-like iterator?

2007-05-01 Thread Larry Wall
On Tue, May 01, 2007 at 10:00:00AM +0100, Smylers wrote:
: That'll make it easy for people porting PHP scripts to Perl 6 -- in
: particular for those wanting to port the security hole where a CGI
: parameter is used to form part of a filename opened by a script but a
: malicious user can supply a URL instead and cause the program to do
: things very different from what it intended.

PHP's security hole is that it treats tainting as NIH.  Putting http:
on the front of a filename is only one of several ways to attack open,
and open is far from the only spot vulnerable to injection attacks.

Larry


Re: Should a dirhandle be a filehandle-like iterator?

2007-04-16 Thread TSa

HaloO,

Uri Guttman wrote:

> [..] if
> dirs mapped well onto file handles they would have been mapped that way
> long ago in the OS. in 30+ years that hasn't happened afaik.


Hans Reiser is promoting exactly this unification of files and
directories in his Reiser4 filesystem. In particular, it
supports opening a file as a directory in order to access the
file's metadata.

Also think of files that have internal structure like tar
files and libraries. With the matching IO plugin one can
access these files with the same API that is used for unix
style files/directories. In that sense it is a good idea to
provide a fatter interface that has got explicit opendir and
openfile methods in addition to plain open that dwims.

Regards, TSa.
--


Re: Should a dirhandle be a filehandle-like iterator?

2007-04-16 Thread TSa

HaloO,

Jonathan Lang wrote:

> But then, a file handle doesn't behave exactly
> like standard in or standard out, either (last I checked, Perl 5
> won't do anything useful if you say seek STDIN, 0, SEEK_END).


How should Perl 6 behave? I guess it's possible to return a
lazy list that captures STDIN from the time of call onwards.

Regards, TSa.
--


Re: Should a dirhandle be a filehandle-like iterator?

2007-04-15 Thread John Macdonald
On Fri, Apr 13, 2007 at 08:14:42PM -0700, Geoffrey Broadwell wrote:
> [...] -- so non-dwimmy open
> variants are a good idea to keep around.
>
> This could be as simple as 'open(:!dwim)' I guess, or whatever the
> negated boolean adverb syntax is these days

open(:file), open(:dir), open(:url), ... could be the non-dwimmy
versions.  If you don't specify an explicit non-dwimmy base
variant, the dwim magic makes a (preferably appropriate) choice.

-- 


Re: Should a dirhandle be a filehandle-like iterator?

2007-04-15 Thread Larry Wall
On Sun, Apr 15, 2007 at 01:16:32PM -0400, John Macdonald wrote:
: On Fri, Apr 13, 2007 at 08:14:42PM -0700, Geoffrey Broadwell wrote:
: > [...] -- so non-dwimmy open
: > variants are a good idea to keep around.
: >
: > This could be as simple as 'open(:!dwim)' I guess, or whatever the
: > negated boolean adverb syntax is these days
:
: open(:file), open(:dir), open(:url), ... could be the non-dwimmy
: versions.  If you don't specify an explicit non-dwimmy base
: variant, the dwim magic makes a (preferably appropriate) choice.

I suspect that since we have a type system now we should probably use it
for the non-dwimmy versions.

my $io = IO::File.open($str)
my $io = IO::Pipe.open($str)
my $io = IO::Socket.open($str)
my $io = IO::Dir.open($str)
my $io = IO::URI.open($str)

etc.  And of course different kinds of objects can have different
defaults.  I'd guess the default for directories is to take a snapshot
and sort the entries, for instance.  Certainly the open is the only
place we have to distinguish the type, and $io.close will close
any of them.
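
Perl 5's core IO::File and IO::Dir classes already have roughly this shape, so a Perl 5 sketch may help picture it. The helpers first_line and sorted_snapshot are hypothetical names; sorted_snapshot only illustrates the "snapshot and sort" default guessed at above and is not an existing or proposed API.

```perl
use strict;
use warnings;
use IO::File;
use IO::Dir;

# The class chosen at open time fixes the kind of handle;
# ->close is spelled the same way on both.
sub first_line {
    my ($path) = @_;
    my $io = IO::File->new($path, 'r')
        or die "cannot open file '$path': $!\n";
    my $line = $io->getline;
    $io->close;
    return $line;
}

# Hypothetical "snapshot and sort" default for directories: read
# every entry up front, drop '.' and '..', and return them sorted.
sub sorted_snapshot {
    my ($path) = @_;
    my $io = IO::Dir->new($path)
        or die "cannot open directory '$path': $!\n";
    my @entries = sort grep { $_ ne '.' && $_ ne '..' } $io->read;
    $io->close;
    return @entries;
}
```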

To me the interesting question is, when do we assume that a string
is a filename or uri?  I can argue that for historical and security
reasons bare open() should always assume the provided string is a
normal filename.  However, given that the existence of feed operators
removes most of my objections to Ingy's io() interface, we could make
that default to uri processing:

io('http://www.wall.org/~larry') ==> my @homepage;

Though we have a bit of a semantic problem insofar as

@source ==> io('file:foo')

is going to want to supply more arguments to io() rather than send
the feed to some method of the IO object, unless io() is some kind of
a context-sensitive macro, or at least has a signature that doesn't
allow slurpies.  And currently feed ops are considered statement
terminators, which makes it odd to think about overloading them.
More of a problem is that multiple dispatch based on argument type
depends on eager evaluation of dynamic types, while feeds are basically
lazy.  We don't know how hard to call io() without recognizing it
as special, or specifying the actual method:

@source ==> io('file:foo').print

Maybe that's good enough, but it seems like we could do a little
better.  Hmm, type coercions tend to be unary, or at least not
listy, so maybe we can just recognize types as returning source
and sink objects, which feeds automatically call with an appropriate
variadic method (.lines, .print, .tap) depending on pointiness:

IO('http://www.wall.org/~larry') ==> my @homepage;  # implicit .lines
@source ==> IO('file:foo')                          # implicit .print
@source ==> IO($debuglog) ==> @sink                 # implicit .tap

Larry


Re: Should a dirhandle be a filehandle-like iterator?

2007-04-14 Thread Yuval Kogman
Why bother, actually, when it can just be a lazy list... Opendir and
closedir are very oldschool, and can be retained for whatever
technical detail they are needed, but in most modern code I think
that:

for readdir($dir_name) { .say }

should work as well.

The act of opening a directory is something I never quite got...
Even a directory with millions of entries is still peanuts in today's
memory sizes, and if it does need to be iterated very carefully the
old variants can still be around. readdir() returning a list doesn't
have to be inefficient but it's easier to screw up with it and make
it bloat.
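
Perl 5 has no lazy lists, but a closure gives a rough picture of the idea. In this sketch dir_iterator is a hypothetical name: the directory is opened on the first call, one entry is handed back per call, and the handle is closed once the entries run out, so the caller never sees opendir or closedir.

```perl
use strict;
use warnings;

# Return a closure that yields one directory entry per call and
# undef when the directory is exhausted. '.' and '..' are skipped.
sub dir_iterator {
    my ($dir_name) = @_;
    my $dh;
    my $done = 0;
    return sub {
        return if $done;
        unless ($dh) {
            opendir($dh, $dir_name)
                or die "cannot open directory '$dir_name': $!\n";
        }
        while (defined(my $entry = readdir $dh)) {
            next if $entry eq '.' or $entry eq '..';
            return $entry;
        }
        closedir $dh;
        $done = 1;
        return;
    };
}

# Usage:
#   my $next = dir_iterator($dir_name);
#   while (defined(my $entry = $next->())) { print "$entry\n" }
```

Only one entry is held in memory at a time, which covers the "iterated very carefully" case without a separate open step.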

-- 
  Yuval Kogman [EMAIL PROTECTED]
http://nothingmuch.woobling.org  0xEBD27418





Re: Should a dirhandle be a filehandle-like iterator?

2007-04-14 Thread Steve Peters
On Fri, Apr 13, 2007 at 07:43:23PM -0500, brian d foy wrote:
> As I was playing around with dirhandles, I thought "What if..." (which
> is actually sorta fun to do in Pugs, where Perl 5 has everything
> documented somewhere even if nobody has read it).
>
> My goal is modest: explain fewer things in the Llama. If dirhandles
> were like filehandles, there's a couple of pages of explanation I don't
> need to go through.
>
> Witness:
>
> I can iterate through the elements of a named array with =@a:
>
>    my @a = < 1 2 3 4 5 >;
>    for =@a { .say }   # but not =< 1 2 3 4 5 > :(
>
> and I can read lines from a file:
>
>    for =$fh { .say }
>
> Should I be able to go through a directory handle that way too? A yes
> answer would be very pleasing :)
>
>    my $dh = "doc".opendir;
>    for =$dh { .say }   # doesn't work in pugs
>
> And, since we're using objects now, .closedir can really just be
> .close, right?
>
> And, maybe this has been already done, but wrapping a lazy filter
> around anything that can return items. I'm not proposing this as a
> language feature, but if many things shared the same way of getting the
> next item, perhaps I could wrap it in a lazy map-ish thingy:
>
>    my $general_iterator = lazy_mappish_thingy( "doc".opendir );
>
>    for =$general_iterator { .say }
>
>    $general_iterator.close;  # or .end, or .whatever
>
> That last part is definitely not Llama material, but maybe I'll at
> least hit the haystack.

One of the things done for Perl 5.10 is to make dirhandles be a little
bit more like filehandles.  On OS's that allow it, things like

stat DIRHANDLE
-X DIRHANDLE
chdir DIRHANDLE

all make sense and do what you'd think they'd do.  
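
A small Perl 5 sketch of the first two of those, for illustration. It assumes Perl 5.10 or later and a platform that can map a dirhandle to a file descriptor; dirhandle_info is a hypothetical helper name.

```perl
use strict;
use warnings;

# Open a directory, then ask questions of the handle itself rather
# than of the path. Returns the number of fields stat() produced
# and whether -d considers the handle a directory.
sub dirhandle_info {
    my ($path) = @_;
    opendir(my $dh, $path) or die "opendir '$path': $!\n";
    my @st     = stat $dh;            # stat DIRHANDLE
    my $is_dir = (-d $dh) ? 1 : 0;    # -X DIRHANDLE
    closedir $dh;
    return (scalar(@st), $is_dir);
}
```

On platforms without that support the stat list comes back empty, so callers should be prepared for failure.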

Steve Peters
[EMAIL PROTECTED]


Re: Should a dirhandle be a filehandle-like iterator?

2007-04-14 Thread Smylers
Jonathan Lang writes:

> Also: why distinguish between open and opendir?  If the string is
> the name of a file, 'open' means open the file; if it is the name of
> a directory, 'open' means open the directory.

Many programs open a file from a name specified by the user.  Even if
C<openfile> existed, many programmers would surely continue to use
C<open> for this.

Users being able to trick such programs into opening a directory rather
than a file could be unpleasant.
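
For comparison, this is roughly what a program has to do in Perl 5 today to defend against that. The name open_plain_file is hypothetical. The check is made on the opened handle rather than on the name, so the thing tested is the thing that was opened; on some systems open() on a directory succeeds and only a later read fails.

```perl
use strict;
use warnings;

# Open $name for reading, but refuse anything that is not a plain
# file (directories, devices, FIFOs and so on).
sub open_plain_file {
    my ($name) = @_;
    open(my $fh, '<', $name)
        or die "cannot open '$name': $!\n";
    -f $fh
        or die "'$name' is not a plain file\n";
    return $fh;
}
```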

Smylers


Should a dirhandle be a filehandle-like iterator?

2007-04-13 Thread brian d foy
As I was playing around with dirhandles, I thought "What if..." (which
is actually sorta fun to do in Pugs, where Perl 5 has everything
documented somewhere even if nobody has read it).

My goal is modest: explain fewer things in the Llama. If dirhandles
were like filehandles, there's a couple of pages of explanation I don't
need to go through.

Witness:

I can iterate through the elements of a named array with =@a:

   my @a = < 1 2 3 4 5 >;
   for =@a { .say }   # but not =< 1 2 3 4 5 > :(

and I can read lines from a file:

   for =$fh { .say }

Should I be able to go through a directory handle that way too? A yes
answer would be very pleasing :)

   my $dh = "doc".opendir;
   for =$dh { .say }   # doesn't work in pugs

And, since we're using objects now, .closedir can really just be
.close, right? 

And, maybe this has been already done, but wrapping a lazy filter
around anything that can return items. I'm not proposing this as a
language feature, but if many things shared the same way of getting the
next item, perhaps I could wrap it in a lazy map-ish thingy:

   my $general_iterator = lazy_mappish_thingy( "doc".opendir );

   for =$general_iterator { .say }

   $general_iterator.close;  # or .end, or .whatever

That last part is definitely not Llama material, but maybe I'll at
least hit the haystack.


Re: Should a dirhandle be a filehandle-like iterator?

2007-04-13 Thread Jonathan Lang

brian d foy wrote:

> As I was playing around with dirhandles, I thought "What if..." (which
> is actually sorta fun to do in Pugs, where Perl 5 has everything
> documented somewhere even if nobody has read it).
>
> My goal is modest: explain fewer things in the Llama. If dirhandles
> were like filehandles, there's a couple of pages of explanation I don't
> need to go through.
>
> Witness:
>
> I can iterate through the elements of a named array with =@a:
>
>    my @a = < 1 2 3 4 5 >;
>    for =@a { .say }   # but not =< 1 2 3 4 5 > :(
>
> and I can read lines from a file:
>
>    for =$fh { .say }
>
> Should I be able to go through a directory handle that way too? A yes
> answer would be very pleasing :)
>
>    my $dh = "doc".opendir;
>    for =$dh { .say }   # doesn't work in pugs
>
> And, since we're using objects now, .closedir can really just be
> .close, right?


Please.  I've always found the opendir ... readdir ... closedir set
to be clunky.

Also: why distinguish between open and opendir?  If the string is
the name of a file, 'open' means open the file; if it is the name of
a directory, 'open' means open the directory.  If it's the name of a
pipe, it opens the pipe.  And so on.

Note that the above could be further shorthanded, as long as you don't
need the directory handle after the loop:

   for ="doc".open { .say }

--
Jonathan "Dataweaver" Lang


Re: Should a dirhandle be a filehandle-like iterator?

2007-04-13 Thread Geoffrey Broadwell
On Fri, 2007-04-13 at 19:00 -0700, Jonathan Lang wrote:
> Please.  I've always found the opendir ... readdir ... closedir set
> to be clunky.
>
> Also: why distinguish between open and opendir?  If the string is
> the name of a file, 'open' means open the file; if it is the name of
> a directory, 'open' means open the directory.  If it's the name of a
> pipe, it opens the pipe.  And so on.

As long as you still have some way to reach the low-level opens --
though it's an odd thing to do (except perhaps in a disk integrity
checker), there's no fundamental reason why you shouldn't be able to
actually look at the bytes that happen to represent a directory
structure on disk.

Also, for security or correctness reasons you may want to make sure that
you don't clobber things you don't mean to -- so non-dwimmy open
variants are a good idea to keep around.

This could be as simple as 'open(:!dwim)' I guess, or whatever the
negated boolean adverb syntax is these days 


-'f




Re: Should a dirhandle be a filehandle-like iterator?

2007-04-13 Thread Jonathan Lang

Geoffrey Broadwell wrote:

> Jonathan Lang wrote:
> > Also: why distinguish between open and opendir?  If the string is
> > the name of a file, 'open' means open the file; if it is the name of
> > a directory, 'open' means open the directory.  If it's the name of a
> > pipe, it opens the pipe.  And so on.
>
> As long as you still have some way to reach the low-level opens --
> though it's an odd thing to do (except perhaps in a disk integrity
> checker), there's no fundamental reason why you shouldn't be able to
> actually look at the bytes that happen to represent a directory
> structure on disk.


It wouldn't be hard to allow .openfile, .opendir, and .openpipe as
well as .open.

--
Jonathan "Dataweaver" Lang


Re: Should a dirhandle be a filehandle-like iterator?

2007-04-13 Thread Uri Guttman
>>>>> "JL" == Jonathan Lang [EMAIL PROTECTED] writes:

  JL> Please.  I've always found the opendir ... readdir ... closedir set
  JL> to be clunky.

  JL> Also: why distinguish between open and opendir?  If the string is
  JL> the name of a file, 'open' means open the file; if it is the name of
  JL> a directory, 'open' means open the directory.  If it's the name of a
  JL> pipe, it opens the pipe.  And so on.

maybe this won't help you but if you did open on a dir in perl5 you can
read the raw directory data which is pretty useless in most cases. so
with open working as opendir on directories, what is the op/method to
get the next directory entry? that isn't the same as reading a
line. there won't be any trailing newlines to chomp. marking a location
is not the same with tell and telldir (one is a byte offset, the other a
directory entry index). and since dirs can reorder their entries
(especially hash based dirs) the ordering and seek points may move. not
gonna happen on text files. there are many differences and the only one
you seem to see is a linear scan of them (which is just the most common
access style).
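
The tell/telldir difference can be seen in a short Perl 5 sketch. The two helper names are hypothetical. tell gives a byte offset that means something on its own; telldir gives a token whose only defined use is being handed back to seekdir on the same open handle.

```perl
use strict;
use warnings;

# For a file, the position is a byte count: after reading one
# line it equals the length of that line.
sub offset_after_first_line {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "open '$path': $!\n";
    my $line = <$fh>;
    my $pos  = tell($fh);
    close $fh;
    return $pos;
}

# For a directory, the position is an opaque token. All it is
# good for is returning to the same spot with seekdir().
sub reread_entry {
    my ($path) = @_;
    opendir(my $dh, $path) or die "opendir '$path': $!\n";
    my $token = telldir($dh);
    my $first = readdir($dh);
    seekdir($dh, $token);
    my $again = readdir($dh);
    closedir $dh;
    return ($first, $again);
}
```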

the operations you can do on the handles are very different as well. you
can't write to a dir. dirs have no random access (you can lookup by a
name with open but you can't go to the nth entry). and on OS with extra
stuff like version numbers, then all bets are off.

yes, you can tell the dir is such by doing a stat and then open can dwim
but i don't see the overlap as you do. dirs generally are ordered lists
of strings and have many different underlying formats based on their
file systems. mapping that to a text file of lines doesn't work for me.

this may all be obvious stuff but i think it deserves mentioning. if
dirs mapped well onto file handles they would have been mapped that way
long ago in the OS. in 30+ years that hasn't happened afaik.

uri

-- 
Uri Guttman  --  [EMAIL PROTECTED]   http://www.stemsystems.com
--Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
Search or Offer Perl Jobs    http://jobs.perl.org


Re: Should a dirhandle be a filehandle-like iterator?

2007-04-13 Thread Jonathan Lang

Uri Guttman wrote:

> >>>>> "JL" == Jonathan Lang [EMAIL PROTECTED] writes:
>
>   JL> Please.  I've always found the opendir ... readdir ... closedir set
>   JL> to be clunky.
>
>   JL> Also: why distinguish between open and opendir?  If the string is
>   JL> the name of a file, 'open' means open the file; if it is the name of
>   JL> a directory, 'open' means open the directory.  If it's the name of a
>   JL> pipe, it opens the pipe.  And so on.
>
> maybe this won't help you but if you did open on a dir in perl5 you can
> read the raw directory data which is pretty useless in most cases. so
> with open working as opendir on directories, what is the op/method to
> get the next directory entry? that isn't the same as reading a
> line. there won't be any trailing newlines to chomp. marking a location
> is not the same with tell and telldir (one is a byte offset, the other a
> directory entry index). and since dirs can reorder their entries
> (especially hash based dirs) the ordering and seek points may move. not
> gonna happen on text files. there are many differences and the only one
> you seem to see is a linear scan of them (which is just the most common
> access style).


Well, I did suggest that openfile and opendir exist alongside
open, with openfile being more akin to Perl 5's open or
sysopen, and open being a bit more dwimmy.

But in general, most of the differences that you mention are things
that ought to be addressed in the resulting iterators, not in the
creating statement.  No, a directory handle will not behave exactly
like a file handle.  But then, a file handle doesn't behave exactly
like standard in or standard out, either (last I checked, Perl 5
won't do anything useful if you say seek STDIN, 0, SEEK_END).

--
Jonathan "Dataweaver" Lang


Re: Should a dirhandle be a filehandle-like iterator?

2007-04-13 Thread Uri Guttman
>>>>> "JL" == Jonathan Lang [EMAIL PROTECTED] writes:

  JL> Well, I did suggest that openfile and opendir exist alongside
  JL> open, with openfile being more akin to Perl 5's open or
  JL> sysopen, and open being a bit more dwimmy.

  JL> But in general, most of the differences that you mention are things
  JL> that ought to be addressed in the resulting iterators, not in the
  JL> creating statement.  No, a directory handle will not behave exactly
  JL> like a file handle.  But then, a file handle doesn't behave exactly
  JL> like standard in or standard out, either (last I checked, Perl 5
  JL> won't do anything useful if you say seek STDIN, 0, SEEK_END).

well, that seek failure is a result of the stream nature of stdin and
not a failure of perl. remember that open and much of the i/o layers
(regardless of perl I/O's rewrite) are just wrappers around the OS and
libc calls. i don't see how to dwim them all together (but IO::All does
that in a wacky dwim way). i have never felt the need for super smart
iterators so i can change looping over lines to looping over a
dir. maybe you might have a set of filenames in file vs a dir of
names. but i just don't run into that need. sometimes mappings like that
are just overkill IMO.

enough from me on this. as with the rest of p6 i will work with whatever
is decided by @larry.

uri

-- 
Uri Guttman  --  [EMAIL PROTECTED]   http://www.stemsystems.com
--Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
Search or Offer Perl Jobs    http://jobs.perl.org