Files and IO and all

David Green Fri, 14 Aug 2009 04:33:04 -0700

On 2009-Aug-14, at 4:34 am, David Green wrote:

There's a lot of scope for a nice, friendly, high-level IO view; perhaps we need an IO working group to assemble knowledge about what is and isn't possible with different filesystems and design an interface around it all.

It won't be possible to unify every different FS function, but it can keep the basics (like "open") as consistent as possible across platforms, and provide idiomatic Perl expressions so everything at least tries to feel "perlish". Some things may simply be impossible to manage transparently (like filenames with no identifiable encoding), but even having an organised way to identify the possible problems will be helpful. (A lot of similar issues apply to databases, despite which -- or because of which -- it's so nice to have the DBI project.)

All the low-level features have to be there, of course, to open files and seek through them, etc., etc., but Perl is not a low-level language; we have (in P5) the magical <>, but the "easy" way to deal with files should always apply (that is, be available anywhere you don't absolutely need to manipulate low-level details).

The idea is to treat files as data, because that's what we're really interested in: their contents, not all their surrounding metadata. If you're writing a file-manager, then you need lots of FS-specific detail; in fact, you probably don't care about the contents of files but only their metadata. However, it's more usual to care only about the contents and use a bit of metadata (like the name) merely as a way to get the contents. The high-level interface shouldn't be concerned with file "handles" or pointers, but simply with "files", and not even as concerned with files as with *data*.

In other words, the focus should be on a data-structure, not on the file itself. A text file is really just a way to store a string; an XML file is a way to encode and store a structured tree object.


    my Str $poem = io file://jabberwocky.txt;
    my Image::JPEG $camel = io file://dromedary.jpg;
    my XHTML::Doc $x is readonly = io http://foo.org/a%20doc;

Then you simply work with your string, or tree, or image-object. An XHTML::Doc object, for instance, would really contain nodes, but it would know how to encode or decode itself as a series of plain-text tags for reading or writing to disk. The old-fashioned way to handle this is to open the file, get a filehandle, read the filehandle into a string, take the string[s] and XHTML::Doc->parse() them. The new way would be to have the parse(Str) method be how XHTML::Doc objects do the IO::input() role, and the IO class takes care of the opening and reading behind the scenes.

Anything could mix in the basic IO role so that an object can load or save itself. Different data structures would override the necessary methods to format the data the right way. The type of IO object would determine how metadata works (i.e. whether it's a file or an HTTP stream or a directory, etc.).
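
To make the mix-in idea concrete, here is a rough sketch in Python (the Perl 6 syntax above is speculative, so it can't be run as-is). Every name in it (FileBacked, KeyValueDoc, load, save, serialize) is made up for illustration and is not part of any proposed IO spec; it only shows the shape: the data class supplies parse/serialize, and the shared role does the opening, reading and writing.

```python
import os
import tempfile


class FileBacked:
    """Hypothetical mix-in: an object loads or saves itself via parse()/serialize()."""

    @classmethod
    def parse(cls, text):
        raise NotImplementedError  # each data structure overrides this

    def serialize(self):
        raise NotImplementedError  # each data structure overrides this

    @classmethod
    def load(cls, path):
        # Opening and reading happen here, out of the caller's sight.
        with open(path, encoding="utf-8") as fh:
            return cls.parse(fh.read())

    def save(self, path):
        with open(path, "w", encoding="utf-8") as fh:
            fh.write(self.serialize())


class KeyValueDoc(FileBacked):
    """Toy structured format (an assumption): one "key=value" pair per line."""

    def __init__(self, pairs):
        self.pairs = dict(pairs)

    @classmethod
    def parse(cls, text):
        return cls(line.split("=", 1) for line in text.splitlines() if line)

    def serialize(self):
        return "".join(f"{key}={value}\n" for key, value in self.pairs.items())


def demo():
    # Round trip: save a structure, then get it back without touching a handle.
    path = os.path.join(tempfile.mkdtemp(), "settings.txt")
    KeyValueDoc({"name": "camel", "humps": "1"}).save(path)
    return KeyValueDoc.load(path).pairs
```

The caller only ever sees KeyValueDoc.load(path) and doc.save(path); swapping in a different source (HTTP, say) would mean a different role implementation, not a different data class.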

Objects would handle other functionality in whatever ways make sense for that object. For example, iterating a plain-text file would split it into lines. (This suggests that maybe ordinary Strings could split into lines in @-context? Hm.) A Spreadsheet::OpenOffice object might instead provide a list of worksheets when acting like an array (or hash, with sheet names as the keys). A raw binary file of uninterpreted data would just be a simple Blob object.
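
Again as a Python sketch only, with invented class names (TextFile, Workbook) standing in for whatever the real types would be: each object decides for itself what "acting like an array" or "acting like a hash" means.

```python
class TextFile:
    """Hypothetical plain-text object: iterating yields lines."""

    def __init__(self, text):
        self.text = text

    def __iter__(self):
        return iter(self.text.splitlines())


class Workbook:
    """Hypothetical spreadsheet object: iterating yields worksheets,
    and indexing by name looks a sheet up like a hash."""

    def __init__(self, sheets):
        self.sheets = dict(sheets)  # sheet name -> rows

    def __iter__(self):
        return iter(self.sheets.values())

    def __getitem__(self, name):
        return self.sheets[name]


poem = TextFile("'Twas brillig\nand the slithy toves")
book = Workbook({"Active": [["Alice", 1]], "Retired": [["Bob", 2]]})

lines = list(poem)          # list-context view of the text: its lines
sheets = list(book)         # list-context view of the workbook: its sheets
active = book["Active"]     # hash-style access by sheet name
```

A raw Blob would be the degenerate case: no interpretation at all, just the bytes.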

Assuming that there existed a consistent definition of Table objects that could be coerced one to another, we could do things like:


    my Table::SQL $input is readonly := io dbi:Pg:employees/active;
    my Table::Spreadsheet::Excel $output := io file://Active-Emp.xls;

    $output<Sheet #1> = $input;    # read from input, write to output
    say "Data from $input.io.name() saved to $output.io.name()";

Now of course there are lots of things that such a broad interface isn't prepared to handle, but that's what the lower levels are for. If .io provides access to the intermediate interface, then you can start with coarse-grained control, and delve into the particulars if or when you need finer control (access to OS- or FS-specific calls). Above that, there is room for a very high-level IO functionality that glosses over a lot of details, because an awful lot of code just doesn't need all the specifics.



-David
