On 2009-Aug-14, at 4:34 am, David Green wrote:
There's a lot of scope for a nice, friendly, high-level IO view;
perhaps we need an IO working group to assemble knowledge about what
is and isn't possible with different filesystems and to design an
interface around it all.
It won't be possible to unify every different FS function, but we can
keep the basics (like "open") as consistent as possible across
platforms, and provide idiomatic Perl expressions so everything at
least tries to feel "perlish". Some things may simply be impossible
to manage transparently (like filenames with no identifiable
encoding), but even having an organised way to identify the possible
problems will be helpful. (A lot of similar issues apply to
databases, despite which -- or because of which -- it's so nice to
have the DBI project.)
All the low-level features have to be there, of course, to open files
and seek through them, etc., etc., but Perl is not a low-level
language. In P5 we have the magical <>, and in the same spirit the
"easy" way to deal with files should always apply (that is, be
available anywhere you don't absolutely need to manipulate low-level
details).
The idea is to treat files as data, because that's what we're really
interested in: their contents, not all their surrounding metadata. If
you're writing a file-manager, then you need lots of FS-specific
detail; in fact, you probably don't care about the contents of files
but only their metadata. However, it's more usual to care only about
the contents and use a bit of metadata (like the name) merely as a way
to get the contents. The high-level interface shouldn't be concerned
with file "handles" or pointers, but simply with "files", and not even
as concerned with files as with *data*.
In other words, the focus should be on a data-structure, not on the
file itself. A text file is really just a way to store a string; an
XML file is a way to encode and store a structured tree object.
    my Str $poem = io file://jabberwocky.txt;
    my Image::JPEG $camel = io file://dromedary.jpg;
    my XHTML::Doc $x is readonly = io http://foo.org/a%20doc;
Then you simply work with your string, or tree, or image-object. An
XHTML::Doc object, for instance, would really contain nodes, but it
would know how to encode or decode itself as a series of plain-text
tags for reading or writing to disk. The old-fashioned way to handle
this is to open the file, get a filehandle, read the filehandle into a
string, take the string[s] and XHTML::Doc->parse() them. The new way
would be for the parse(Str) method to be how XHTML::Doc objects do
the IO::input() role, while the IO class takes care of the opening and
reading behind the scenes.
Anything could mix in the basic IO role so that an object can load or
save itself. Different data structures would override the necessary
methods to format the data the right way. The type of IO object would
determine how metadata works (e.g. whether it's a file, an HTTP
stream, a directory, etc.).
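To make the role idea concrete, here is a minimal Raku sketch. It assumes a role named Loadable and a toy class ToyDoc; both are hypothetical names, not an existing Perl 6/Raku API. The class only says how to turn text into an object (parse) and back into text (encode), while the role's load and save methods do the opening, reading and writing using the real IO::Path methods slurp and spurt. A real XHTML::Doc would build a node tree rather than a list of lines.

```raku
# Hypothetical role (not a real Raku API): a type that can load and save
# itself, given only a way to convert between its data and text.
role Loadable {
    # Stubs: every consuming class must say how to decode and encode itself.
    method parse(Str $text) { ... }
    method encode(--> Str) { ... }

    # The role, not the caller, does the opening and reading...
    method load(IO::Path $path) { self.parse($path.slurp) }

    # ...and the opening and writing.
    method save(IO::Path $path) { $path.spurt(self.encode) }
}

# Toy stand-in for a structured document (hypothetical): its "nodes"
# are simply the non-blank lines of the file.
class ToyDoc does Loadable {
    has @.nodes;

    method parse(Str $text) {
        self.new(nodes => $text.lines.grep(*.trim.chars))
    }

    method encode(--> Str) {
        @!nodes.join("\n") ~ "\n"
    }
}

my $in  = $*TMPDIR.add('toydoc-in.txt');
my $out = $*TMPDIR.add('toydoc-out.txt');
$in.spurt("'Twas brillig\n\nand the slithy toves\n");

my $doc = ToyDoc.load($in);   # no filehandle in sight
say $doc.nodes.elems;         # 2
$doc.save($out);
```

Because parse and encode are declared as stubs (with ...) in the role, a class that mixes in Loadable without supplying them fails at compile time, which is one way to enforce "override the necessary methods".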
Objects would handle other functionality in whatever ways make sense
for that object. For example, iterating a plain-text file would split
it into lines. (This suggests that maybe ordinary Strings could split
into lines in @-context? Hm.) A Spreadsheet::OpenOffice object might
instead provide a list of worksheets when acting like an array (or
hash, with sheet names as the keys). A raw binary file of
uninterpreted data would just be a simple Blob object.
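A rough Raku sketch of per-type iteration: for plain text, the built-in Str.lines (and likewise IO::Path.lines) already gives the line-by-line view. The ToyWorkbook class is a hypothetical, illustrative name, not the real OpenOffice format; it shows how a spreadsheet-like object could act as a list of sheet names or as a hash keyed by sheet name, simply by supplying its own list and AT-KEY methods.

```raku
# Plain text: Raku's built-in Str.lines gives the "split into lines" view.
my Str $poem = "One, two! One, two!\nAnd through and through\n";
for $poem.lines -> $line { say $line }

# Hypothetical spreadsheet-like type: acts like a list of worksheets,
# or like a hash keyed by sheet name.
class ToyWorkbook does Associative {
    has %.sheets;   # sheet name => rows (each row an Array of cells)

    # Hash-like access: $book<Staff>
    method AT-KEY($name)     { %!sheets{$name} }
    method EXISTS-KEY($name) { %!sheets{$name}:exists }

    # List-like view: the sheet names, in a stable order.
    method list() { %!sheets.keys.sort.list }
}

my $book = ToyWorkbook.new(sheets => {
    Staff  => [['Ann', 'Sales'], ['Bob', 'IT']],
    Totals => [['Sales', 1], ['IT', 1]],
});

say $book.list;            # (Staff Totals)
say $book<Staff>[1][0];    # Bob
```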
Assuming that there existed a consistent definition of Table objects
that could be coerced one to another, we could do things like:
    my Table::SQL $input is readonly := io dbi:Pg:employees/active;
    my Table::Spreadsheet::Excel $output := io file://Active-Emp.xls;
    $output<Sheet #1> = $input; # read from input, write to output
    say "Data from $input.io.name() saved to $output.io.name()";
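A minimal sketch of what "a consistent definition of Table objects" might mean, assuming a hypothetical TableLike role that every table type agrees on (a header plus rows). ToySQLTable and ToySheet are stand-ins for Table::SQL and Table::Spreadsheet::Excel; no database or Excel code is involved. The point is that ToySheet.from-table accepts anything that does TableLike, so the copying code never needs to know which backend the data came from.

```raku
# Hypothetical shared role: every table-like type exposes a header and rows.
role TableLike {
    method header(--> List) { ... }
    method rows() { ... }       # each row: a list of cells
}

# Stand-in for rows fetched by a database query (hypothetical).
class ToySQLTable does TableLike {
    has @.columns;
    has @.data;
    method header(--> List) { @!columns.List }
    method rows() { @!data }
}

# Stand-in for a writable spreadsheet sheet (hypothetical).
class ToySheet does TableLike {
    has @.cells;   # first entry is the header row
    method header(--> List) { @!cells ?? @!cells[0].List !! () }
    method rows() { @!cells.skip(1) }

    # "Coercion": copy from any other TableLike, whatever its backend.
    method from-table(TableLike $t) {
        self.new(cells => [$t.header, |$t.rows])
    }
}

my $input  = ToySQLTable.new(columns => <name dept>,
                             data    => [<Ann Sales>, <Bob IT>]);
my $output = ToySheet.from-table($input);

say $output.header;        # (name dept)
say $output.rows.elems;    # 2
```

Under the same assumptions, the proposed `$output<Sheet #1> = $input` form could be routed to such a method through a custom ASSIGN-KEY; that part is not shown here.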
Now of course there are lots of things that such a broad interface
isn't prepared to handle, but that's what the lower levels are for.
If .io provides access to the intermediate interface, then you can
start with coarse-grained control, and delve into the particulars if
or when you need finer control (access to OS- or FS-specific calls).
Above that, there is room for very high-level IO functionality that
glosses over a lot of details, because an awful lot of code just
doesn't need all the specifics.
-David