Re: Working with files wish list
On Tue, Dec 16, 2008 at 7:04 AM, jason switzer wrote:
> I hadn't seen a Nameable role mentioned yet, so I wasn't able to understand any such concept.

The list was not meant to be exhaustive. There are a lot more roles that have something to do with IO but were missing: asynchronous IO, datagram sockets, and more (especially when you add in platform-specific stuff). The whole idea is that almost any operation you can do on one kind of handle can be done on some other kind too. There is a lot of overlap between seemingly unrelated types of handles. In this case, files and Unix domain sockets are created and used in rather different ways, but both have a place in the filesystem, so it would be appropriate that they share the interface for retrieving that location.

> That is a good idea, but the idea is so general that anything can be nameable and thus the specificity of the role could quickly become lost. I was suggesting specific naming functionalities be added to the File role. If you want to abstract that, that's fine, but beware that something like Nameable can be too broad of a role (maybe just IONameable?).

I agree Nameable isn't the best name. How about Locatable? Having said that, I suppose a generic Nameable role providing a human-readable designation for a handle (as opposed to one for computers) would be useful too, if only for error messages.

> I haven't spent the time to understand mix-ins yet, but this does look like a feasible (clean) idea. However, how do you specify one/more filters? For example, say you want to :rw a file. How can you provide an input filter and an output filter (or multiples of either)? Can you layer filters if done with mix-ins?

Stacking can be done:

    $fstab does WhitespaceTrim does TranslateKlingon;

I'm not sure if this is the best way to approach the problem though. This is a power drill; a screwdriver will also do in many cases.

> If so, how do you specify direction?

That would be a problem with this approach. You'd probably need separate roles for readers and writers.

Regards,
Leon Timmermans
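The point about direction can be approximated in Perl 5 without roles. The following is only a sketch under stated assumptions: `make_filtered` and the three filter names are hypothetical, and plain code refs stand in for mix-ins. It keeps one filter chain for reading and another for writing, so that stacking and direction are both explicit.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of any Perl 5 or Perl 6 API: build a
# handle-like object with one filter chain per direction.
sub make_filtered {
    my (%args) = @_;
    my @in  = @{ $args{in}  || [] };   # filters applied to data being read
    my @out = @{ $args{out} || [] };   # filters applied to data being written

    # Run a string through a chain of filters, in order.
    my $apply = sub {
        my ($text, @chain) = @_;
        $text = $_->($text) for @chain;
        return $text;
    };

    return {
        read  => sub { $apply->($_[0], @in)  },
        write => sub { $apply->($_[0], @out) },
    };
}

# Illustrative filters; the first name echoes WhitespaceTrim from the thread.
my $whitespace_trim = sub { my ($s) = @_; $s =~ s/^\s+|\s+$//g; return $s };
my $squash_spaces   = sub { my ($s) = @_; $s =~ s/\s+/ /g;      return $s };
my $add_eol         = sub { my ($s) = @_; return "$s\n" };

my $fstab = make_filtered(
    in  => [ $whitespace_trim, $squash_spaces ],   # stacked input filters
    out => [ $add_eol ],                           # a single output filter
);

# Pass one raw line through the read chain, then through the write chain.
print $fstab->{write}->( $fstab->{read}->("  /dev/sda1   /   ext3  \n") );
```

Keeping the two chains separate is one way to answer "how do you specify direction"; whether Perl 6 roles should be split the same way is left open in the thread.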
Re: Working with files wish list
On Tue, 16 Dec 2008, jason switzer wrote:
> > You can already easily mix it in using 'does':
> >
> >     $fstab = open('/etc/fstab', :r);
> >     $fstab does WhitespaceTrim;
> >
> > I don't think it's really necessary to include that into open(), though it might be useful syntactic sugar.
>
> I haven't spent the time to understand mix-ins yet, but this does look like a feasible (clean) idea. However, how do you specify one/more filters? For example, say you want to :rw a file. How can you provide an input filter and an output filter (or multiples of either)? Can you layer filters if done with mix-ins? If so, how do you specify direction?

I've had somewhat related ideas in the past, but they always involved feed operators somehow. I don't think I've got enough of a grip on things to suggest specifics, but I thought I'd throw the idea into the mix.

Also, if the wishlist covers filesystems as well as files (ie. opendir-type stuff), my wishlist involves having general tree-manipulation stuff that works with filesystems just as well as with XML; we could use XPath-like globs on filesystems, which would completely replace eg. File::Find, among other things. So you could ask for "/path/to/directory/child::*", which would get you the children, but not the parent ".." or the self ".".

I've actually started code in this direction (ie. of a role, not the IO side of things), but stopped while waiting on the further development of Rakudo (ie. operator overloading).

Anyway, just some thoughts thrown into the mix.

:)

Tim Nelson
wayl...@wayland.id.au
Re: Working with files wish list
On Mon, Dec 15, 2008 at 6:59 PM, Leon Timmermans wrote:
> On Mon, Dec 15, 2008 at 6:42 PM, jason switzer wrote:
> > It's lazy and kinda cheating, but for small simple tasks, it gets the job done. I'm not up to speed with the IO spec, but a sort of auto-slurp functionality would be nice. Something to the effect:
> >
> >     @data = :slurp("mydatafile.txt");
>
> A slurp() function has been specced to slurp a file into a string, as well as a lines() function that does the same into an array of lines.

Okay, that's good to know.

> You didn't get the point of my Roles idea. It should not be added to a role File, but to the Role Nameable, which would be composed into whatever implements file filehandles, but for example also into Unix sockets. IMNSHO interfaces and implementation should be kept separate to maintain a proper abstraction level.

I hadn't seen a Nameable role mentioned yet, so I wasn't able to understand any such concept. That is a good idea, but the idea is so general that anything can be nameable and thus the specificity of the role could quickly become lost. I was suggesting specific naming functionalities be added to the File role. If you want to abstract that, that's fine, but beware that something like Nameable can be too broad of a role (maybe just IONameable?).

> You can already easily mix it in using 'does':
>
>     $fstab = open('/etc/fstab', :r);
>     $fstab does WhitespaceTrim;
>
> I don't think it's really necessary to include that into open(), though it might be useful syntactic sugar.

I haven't spent the time to understand mix-ins yet, but this does look like a feasible (clean) idea. However, how do you specify one/more filters? For example, say you want to :rw a file. How can you provide an input filter and an output filter (or multiples of either)? Can you layer filters if done with mix-ins? If so, how do you specify direction?

-Jason "s1n" Switzer
Re: Working with files wish list
On Mon, Dec 15, 2008 at 6:42 PM, jason switzer wrote:
> It's lazy and kinda cheating, but for small simple tasks, it gets the job done. I'm not up to speed with the IO spec, but a sort of auto-slurp functionality would be nice. Something to the effect:
>
>     @data = :slurp("mydatafile.txt");

A slurp() function has been specced to slurp a file into a string, as well as a lines() function that does the same into an array of lines.

> I think File::Path::canonpath and File::Path::path would be nice attributes to add to the File role.

You didn't get the point of my Roles idea. It should not be added to a role File, but to the Role Nameable, which would be composed into whatever implements file filehandles, but for example also into Unix sockets. IMNSHO interfaces and implementation should be kept separate to maintain a proper abstraction level.

> I would imagine a filter role would be useful. If they're roles, it allows people to build layers of functionality on them to do various different kinds of filters, turn them on and off, etc. With filters as roles, I would love to imagine something like this:
>
>     my File $fstab = new(:name, :filter)

You can already easily mix it in using 'does':

    $fstab = open('/etc/fstab', :r);
    $fstab does WhitespaceTrim;

I don't think it's really necessary to include that into open(), though it might be useful syntactic sugar.

Regards,
Leon Timmermans
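For readers more at home in Perl 5, the behaviour described for slurp() and lines() can be approximated as below. This is a sketch only: the function names come from the message above, but the bodies are assumptions written for illustration and say nothing about how Perl 6 implements them.

```perl
use strict;
use warnings;

# Perl 5 approximation of slurp(): return the whole file as one string.
sub slurp {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Could not open $path: $!\n";
    local $/;                 # no record separator: read everything at once
    my $content = <$fh>;
    close($fh);
    return $content;
}

# Perl 5 approximation of lines(): return the file as a list of lines.
sub lines {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Could not open $path: $!\n";
    my @lines = <$fh>;
    close($fh);
    chomp @lines;             # drop the trailing newline from each line
    return @lines;
}
```

With these in place, `my @data = lines("mydatafile.txt");` covers the quick read-only case asked for earlier in the thread.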
Re: Working with files wish list
Leon Timmermans wrote:
> On Mon, Dec 15, 2008 at 5:43 PM, Richard Hainsworth wrote:
> > a) I am fed up with writing something like
> >
> >     open(FP, ">${fname}_out.txt") or die "Cant open ${fname}_out.txt for writing\n";
> >
> > The complex definition of the filename is only to show that it has to be restated identically twice.
>
>     my $fh = open '>', $filename, :errorstring("Could not open %file: %error");
>
> It doesn't repeat itself, but still gives the programmer the chance to add a helpful message.

I assume that the return value of C<open> will be an unthrown exception (via C<fail>) if the file can't be opened. If your failure mode doesn't cause an immediate failure then it would die when used as an IO. The stringification of that failure object would presumably print a useful error message -- and you could override it by handling it yourself.
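The "state the filename once" idea can be tried out in Perl 5 today. The sketch below is an assumption-laden illustration, not the proposed Perl 6 API: `open_or_die` is a hypothetical helper, and the `%file` and `%error` placeholders are borrowed from the :errorstring example above. It dies immediately rather than returning an unthrown failure object, since Perl 5 has no direct equivalent of that.

```perl
use strict;
use warnings;

# Hypothetical helper: open a file, or die with a templated message in which
# %file and %error are replaced by the filename and the OS error text.
sub open_or_die {
    my ($mode, $filename, $template) = @_;
    $template = "Could not open %file: %error" unless defined $template;

    my $fh;
    return $fh if open($fh, $mode, $filename);

    my $error = "$!";                          # capture the error text now
    (my $msg = $template) =~ s/%file/$filename/g;
    $msg =~ s/%error/$error/g;
    die "$msg\n";
}
```

A call such as `my $fh = open_or_die('>', "${fname}_out.txt");` mentions the filename once, and a third argument overrides the default message.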
Re: Working with files wish list
> "LT" == Leon Timmermans writes:

>> e) When dealing with files in directories in perl5 under linux, I need
>>
>>     opendir(DIR,'./path/') or die "cant open ./path/\n";
>>
>>     my @filelist = grep { /^.+\.txt/ } readdir(DIR);
>>
>> I would prefer something like
>>
>>     my Location $dir .= new(:OSpath<'./data'>);
>>
>> and without any further code $dir contains an Array ($dir.@elems) or Hash ($dir.%elems) (I dont know which, maybe both?) of File objects. If a Hash, then the keys would be the stringified .name attribute of the files.
>>
>> No need to opendir or readdir. Lazy evaluation could handle most situations,

LT> I agree there should be a single function that combines opendir, readdir and closedir. Scalar readdir can be useful in some context, but in my experience it's the less common usage of it. From a programmers point of view lazy operation would be convenient, but from a resource management point of view that may be a bit complicated.

as another responder mentioned File::Slurp, i want to say it contains a read_dir function (note the _ in the name). it does a slurp of a dir and always filters out . and .. . i have plans to have it take an optional qr and grep the dir list for you. something like this:

    my @dirs = read_dir( $dirpath, qr/\.txt$/ );

again, i agree these functions should not be named new() as open and readdir have more meaning. how could you tell you were opening a file vs a dir with just open? many coders may not even know that open in p5 will work on a dir! you just open the file and can read the raw dir data which will likely look like garbage unless you have the correct filesystem c structures to decode it. so you must have some way to designate to the open/new that this is a dir.

the whole issue of portable paths is another problem but i can't address that.

thanx,

uri

-- 
Uri Guttman -- u...@stemsystems.com http://www.sysarch.com
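A single function that combines opendir, readdir and closedir, with the optional pattern described above, might look like the following in Perl 5. This is a sketch under assumptions: `list_dir` is a hypothetical name chosen so as not to suggest it is File::Slurp's actual read_dir, and sorting the result is a choice made here for predictable output.

```perl
use strict;
use warnings;

# Hypothetical helper: return the entries of a directory, always excluding
# "." and "..", optionally keeping only names that match a compiled regex.
sub list_dir {
    my ($dirpath, $filter) = @_;

    opendir(my $dh, $dirpath)
        or die "Could not open directory $dirpath: $!\n";
    my @entries = grep { $_ ne '.' && $_ ne '..' } readdir($dh);
    closedir($dh);

    # Apply the optional qr// filter to the bare entry names.
    @entries = grep { $_ =~ $filter } @entries if defined $filter;

    my @sorted = sort @entries;
    return @sorted;
}
```

Usage mirrors the call shown in the message: `my @txt = list_dir($dirpath, qr/\.txt$/);`. Because the directory handle is closed before returning, this version is eager rather than lazy, which sidesteps the resource-management concern raised in the quoted text.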
Re: Working with files wish list
On Mon, Dec 15, 2008 at 5:43 PM, Richard Hainsworth wrote:
> Following the request for ideas on IO, this is my wish list for working with files. I am not a perl guru and so I do not claim to be able to write specifications. But I do know what I would like.
>
> The organisation of the IO as roles seems to be a great idea. I think that what is suggested here would fall in naturally with that idea.
>
> Suggestions:
>
> a) I am fed up with writing something like
>
>     open(FP, ">${fname}_out.txt") or die "Cant open ${fname}_out.txt for writing\n";
>
> The complex definition of the filename is only to show that it has to be restated identically twice.
>
> Since the error code I write (die "blaa") is always the same, surely it can be made into a default that reports on what caused the die and hidden away as a default pointer to code that can be overridden if the programmer wants to.

You could think along the lines of

    my $fh = open '>', $filename, :errorstring("Could not open %file: %error");

It doesn't repeat itself, but still gives the programmer the chance to add a helpful message.

> b) Why do I have to 'open' anything? Surely when software first identifies a File object (eg., names it) that should be sufficient signal to do all the IO things. So, I would love to write
>
>     my File $file .= new(:name);
>
>     my File $output .=new(:name, :mode);
>
> and then:
>
>     while $file.read {…};
>
> or:
>
>     say "Hello world" :to<$output>;
>
> The defaults would include error routines that die if errors are encountered, read as the default mode, and a text file with EndOfLine markers as the file type. Obviously, other behaviours, such as not dying, but handling the lack of a file with a request to choose another file, could be accommodated by overridding the appropriate role attribute.
>
> The suggestion here is that the method "say" on a File object is provided in a role and has some attributes, eg., $.error_code, that can be assigned to provide a different behaviour.

open() is an idiom, and not an inappropriate one at that IMHO; it carries a meaning with it. Even someone who programs in another language will understand what's going on when you say open. When you say new, that isn't necessarily the case. IMHO the word new focuses too much on the object, while the resource it holds is far more important.

> c) I want the simplest file names for simple scripts. As Damian Conway has pointed out, naming a resource is a can of worms. I work with Cyrillic texts and filenames and still have problems with the varieties of char sets. Unicode has done a lot, but humans just keep pushing the envelop of what is possible. I don't think there will ever be a resolution until humanity has a single language and single script.
>
> It seems far better to me for standard resource names to be constrained to the simplest possible for 'vanilla' perl scripts, but also to let the programmer access the underlying bit/byte string so they can do what they want if they understand the environment.
>
> The idea of 'stringification', that is providing to the programmer for use inside the program a predictable representation of a complex object, also seems to me to be something to exploit. In the case of a resource name, the one most easily available to the programmer would be a 'stringified' version of the underlying stream of bytes used by the operating system.
>
> Eg. if a File object located in some directory under some OS would have both $file.name as a unicode representation and a $file.underlying_name with some arbitrary sequence of bits with a semantics known only to the OS (and the perl implementation).

We talked about such issues before. Fact is, many Unices don't use Unicode for filenames, but blobs. This means that you can't assume that filenames will be valid Unicode. I'm not sure how to solve that cleanly and portably. I suspect there is no way to do it that is both clean and portable, and we'll have to choose :-/.

> d) It would be nice to specify filters on the incoming and outgoing data. I find I do the following all the time in perl5:
>
>     while () {chop; …};
>
> So my example above, viz.,
>
>     while $file.read { … };
>
> would automatically provide $_ with a line of text with the EOL chopped off.
>
> Note that the reverse (adding an EOL on output) is so common that perl6 now has 'say', which does this.
>
> Could this behaviour (filtering off and on the EOL) be made a part of the standard "read" and "say" functions?

Autochomping is already in the language. It's very underspecified though.

> e) When dealing with files in directories in perl5 under linux, I need
>
>     opendir(DIR,'./path/') or die "cant open ./path/\n";
>
>     my @filelist = grep { /^.+\.txt/ } readdir(DIR);
>
> I would prefer something like
>
>     my Location $dir .= new(:OSpath<'./data'>);
>
> and without any further code $dir contains an Array ($dir.@elems) or Hash ($dir.%elems) (I dont know which, maybe b
Re: Working with files wish list
On Mon, Dec 15, 2008 at 10:43 AM, Richard Hainsworth wrote:
> a) I am fed up with writing something like
>
>     open(FP, ">${fname}_out.txt") or die "Cant open ${fname}_out.txt for writing\n";
>
> The complex definition of the filename is only to show that it has to be restated identically twice.
>
> Since the error code I write (die "blaa") is always the same, surely it can be made into a default that reports on what caused the die and hidden away as a default pointer to code that can be overridden if the programmer wants to.
>
> b) Why do I have to 'open' anything? Surely when software first identifies a File object (eg., names it) that should be sufficient signal to do all the IO things. So, I would love to write
>
>     my File $file .= new(:name);
>
>     my File $output .=new(:name, :mode);

You've essentially replaced the klunky mode parameters in perl5 (">file") with a cleaner role constructor. I would argue that an autodie built-in feature would be nice with perl, but it may not be a good idea to always force a die. Maybe it could be a feature/macro that could be turned on (like 'use autodie;').

> and then:
>
>     while $file.read {…};
>
> or:
>
>     say "Hello world" :to<$output>;

I usually do something more like:

    @ARGV = ("file.txt");
    @data = <>;

It's lazy and kinda cheating, but for small simple tasks, it gets the job done. I'm not up to speed with the IO spec, but a sort of auto-slurp functionality would be nice. Something to the effect:

    @data = :slurp("mydatafile.txt");

That's just a crude example and possibly not even valid perl6, but it would be nice to have a quick read-only file slurping functionality. This is usually the first thing I hack into larger scripts so that I can forget about doing IO (means to an end). In fact, File::Slurp does this right now in perl5.

> c) I want the simplest file names for simple scripts. As Damian Conway has pointed out, naming a resource is a can of worms. I work with Cyrillic texts and filenames and still have problems with the varieties of char sets. Unicode has done a lot, but humans just keep pushing the envelop of what is possible. I don't think there will ever be a resolution until humanity has a single language and single script.
>
> It seems far better to me for standard resource names to be constrained to the simplest possible for 'vanilla' perl scripts, but also to let the programmer access the underlying bit/byte string so they can do what they want if they understand the environment.
>
> The idea of 'stringification', that is providing to the programmer for use inside the program a predictable representation of a complex object, also seems to me to be something to exploit. In the case of a resource name, the one most easily available to the programmer would be a 'stringified' version of the underlying stream of bytes used by the operating system.
>
> Eg. if a File object located in some directory under some OS would have both $file.name as a unicode representation and a $file.underlying_name with some arbitrary sequence of bits with a semantics known only to the OS (and the perl implementation).

It would actually be nicer if the filename defaulted to my platform and there were naming-convention converters provided. I don't know how that should look, and it actually sounds like something that should probably be provided by modules. Some of what File::Spec provides now would be nice built in, but how much is up for debate. I think File::Path::canonpath and File::Path::path would be nice attributes to add to the File role.

> Allowing access to the filter function (allowing a programmer the ability to override an attribute) could be quite useful. For example, suppose the role providing getline includes an attribute with default
>
>     $.infilter = { s/\n// }; # a good implementation would have different rules for different OS's
>
> and this can be overridden with
>
>     $.infilter = { .trans ( /\s+/ => ' ' ) }; # squash all white space to a single space
>
> or
>
>     $.infilter = { s/\n//; split /\t/ };

I would imagine a filter role would be useful. If they're roles, it allows people to build layers of functionality on them to do various different kinds of filters, turn them on and off, etc. With filters as roles, I would love to imagine something like this:

    my File $fstab = new(:name, :filter)

Yet another crude example, but imagine that once the whitespace cleaner above trimmed things down, an output filter could then realign them. I see more utility if the filter were a role than some $.infilter scalar that can be clobbered by multi-threaded applications.

> Perhaps, too a module for a specific environment, eg., Windows, would provide the syntatic sugar that makes specifying a location look like specifying a directory natively, eg.
>
>     use IO::Windows;
>     my Location $x .= new(:OSpath);
>
> whilst for linux it would be
>
>     use IO::Linux;
>     my Location $x .=new(:OSpath);

This looks like a good start to the w
Working with files wish list
Following the request for ideas on IO, this is my wish list for working with files. I am not a perl guru and so I do not claim to be able to write specifications. But I do know what I would like.

The organisation of the IO as roles seems to be a great idea. I think that what is suggested here would fall in naturally with that idea.

Suggestions:

a) I am fed up with writing something like

    open(FP, ">${fname}_out.txt") or die "Cant open ${fname}_out.txt for writing\n";

The complex definition of the filename is only to show that it has to be restated identically twice.

Since the error code I write (die "blaa") is always the same, surely it can be made into a default that reports on what caused the die and hidden away as a default pointer to code that can be overridden if the programmer wants to.

b) Why do I have to 'open' anything? Surely when software first identifies a File object (eg., names it) that should be sufficient signal to do all the IO things. So, I would love to write

    my File $file .= new(:name);

    my File $output .=new(:name, :mode);

and then:

    while $file.read {…};

or:

    say "Hello world" :to<$output>;

The defaults would include error routines that die if errors are encountered, read as the default mode, and a text file with EndOfLine markers as the file type. Obviously, other behaviours, such as not dying, but handling the lack of a file with a request to choose another file, could be accommodated by overriding the appropriate role attribute.

The suggestion here is that the method "say" on a File object is provided in a role and has some attributes, eg., $.error_code, that can be assigned to provide a different behaviour.

c) I want the simplest file names for simple scripts. As Damian Conway has pointed out, naming a resource is a can of worms. I work with Cyrillic texts and filenames and still have problems with the varieties of char sets. Unicode has done a lot, but humans just keep pushing the envelope of what is possible. I don't think there will ever be a resolution until humanity has a single language and single script.

It seems far better to me for standard resource names to be constrained to the simplest possible for 'vanilla' perl scripts, but also to let the programmer access the underlying bit/byte string so they can do what they want if they understand the environment.

The idea of 'stringification', that is providing to the programmer for use inside the program a predictable representation of a complex object, also seems to me to be something to exploit. In the case of a resource name, the one most easily available to the programmer would be a 'stringified' version of the underlying stream of bytes used by the operating system.

Eg. a File object located in some directory under some OS would have both $file.name as a unicode representation and a $file.underlying_name with some arbitrary sequence of bits with a semantics known only to the OS (and the perl implementation).

d) It would be nice to specify filters on the incoming and outgoing data. I find I do the following all the time in perl5:

    while () {chop; …};

So my example above, viz.,

    while $file.read { … };

would automatically provide $_ with a line of text with the EOL chopped off.

Note that the reverse (adding an EOL on output) is so common that perl6 now has 'say', which does this.

Could this behaviour (filtering off and on the EOL) be made a part of the standard "read" and "say" functions?

Allowing access to the filter function (allowing a programmer the ability to override an attribute) could be quite useful. For example, suppose the role providing getline includes an attribute with default

    $.infilter = { s/\n// }; # a good implementation would have different rules for different OS's

and this can be overridden with

    $.infilter = { .trans ( /\s+/ => ' ' ) }; # squash all white space to a single space

or

    $.infilter = { s/\n//; split /\t/ };

then a call to $file.read would assign an array to $_ ( or would it be @_ ?)

Filtering the outgoing data would be similar to using a format string with printf, but associating it with the IO object rather than with a specific printf statement. Thus suppose instead of a file, the IO object is a stream associated with the internet and the role that provides "say" as a method on a stream object has $.outfilter as an attribute, then overriding

    $.outfilter = { s[(.*)] = "$1\n" };

with

    $.outfilter = { s[(.*)] = "$1" }

would mean (I think) that

    say "hello world" :to<$stream>;

would generate the http stream

    Hello World

(Yes I know, the space should be coded, but hopefully the idea is clear.)

e) When dealing with files in directories in perl5 under linux, I need

    opendir(DIR,'./path/') or die "cant open ./path/\n";

    my @filelist = grep { /^.+\.txt/ } readdir(DIR);

I would prefer something like

    my Location $dir .= new(:OSpath<'./data'>);

and without any further code $dir contains an Array ($d...@ele