Re: Input / Output encoding filters.

Martin D Kealey Tue, 21 Feb 2006 15:02:02 -0800

I'm a bit slow coming back to this, sorry.

It seems that "seek" is used in two ways:


    * returning to some previously identified point (including the start or
      end of the file)
    * moving a given number of characters relative to a known location

Clearly you can always do the first, just by using the underlying byte
offset without regard for the encoding. If you have a fixed number of bytes
per character then you can trivially do the second as well.

But if you have a variable-length encoding then you have to read through the
byte stream to get to the position you want; this might or might not be
desirable depending on the characteristics of the underlying stream.

Furthermore it makes (some) sense always to be able to seek *forwards* --
even on a tty device -- but not backwards.
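
To make the last two points concrete, here is a minimal sketch in Python (used only for illustration; it is not Parrot code) of a forward seek by character count over UTF-8. It uses nothing but reads, which is why it would work even on a stream that cannot seek. The helper names `utf8_char_len` and `skip_chars` are hypothetical, and the input is assumed to be well-formed UTF-8.

```python
import io

def utf8_char_len(lead: int) -> int:
    """Byte length of a UTF-8 sequence, judged from its lead byte."""
    if lead < 0x80:
        return 1
    if lead >> 5 == 0b110:
        return 2
    if lead >> 4 == 0b1110:
        return 3
    if lead >> 3 == 0b11110:
        return 4
    raise ValueError(f"not a UTF-8 lead byte: {lead:#x}")

def skip_chars(stream, count: int) -> int:
    """Advance `stream` by up to `count` characters using reads only.

    Returns the number of characters actually skipped, which is
    smaller than `count` if end of input is reached first.
    """
    skipped = 0
    while skipped < count:
        lead = stream.read(1)
        if not lead:
            break  # end of input
        rest = utf8_char_len(lead[0]) - 1
        if len(stream.read(rest)) != rest:
            raise ValueError("truncated UTF-8 sequence")
        skipped += 1
    return skipped

# 'a' (1 byte), e-acute (2), euro sign (3), an emoji (4), 'z' (1)
data = "a\u00e9\u20ac\U0001F600z".encode("utf-8")
stream = io.BytesIO(data)
skipped = skip_chars(stream, 4)
byte_offset = stream.tell()          # bytes consumed for 4 characters
remainder = stream.read().decode("utf-8")
short = skip_chars(io.BytesIO(b"ab"), 5)
```

Four characters cost ten bytes here, which is the mismatch between character offsets and byte offsets that the rest of this message is about.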

So my suggestion is that we change the interface to "seek", and have
separate parameters for the "previously known position" and the
"character offset". The latter is obviously just an integer, but the
first is a black-box token -- maybe a PMC, but more likely a mangled
integer -- to ensure that the two args are distinguishable.

(Please excuse me as I discuss this in terms of a HLL rather than
Parrot...)

In other words, change this:

 $fpos = $io.tell();
 $io.seek(SEEK_SET, $fpos)

to this:

 ...
 $io.seek($fpos, 0)

or for brevity, just this:

 ...
 $io.seek($fpos)

Now SEEK_SET, SEEK_CUR and SEEK_END just become special cases of
"previously known positions". And I'm tempted to say that they should be
spelt "0", "undef" and "-1" respectively.
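
As a rough illustration of the shape of this interface, here is a sketch in Python rather than an HLL on Parrot. Everything in it is an assumption made for the example: the class names `Position` and `CharStream`, the choice of UTF-8, the use of `None` for "undef", and the decision to refuse negative character offsets. The point is only that the position token and the character offset are separate arguments of different types.

```python
import io
from dataclasses import dataclass

@dataclass(frozen=True)
class Position:
    """Opaque token returned by tell(); callers should not look inside."""
    _byte_offset: int

class CharStream:
    """Hypothetical UTF-8 character view over a seekable byte stream."""

    def __init__(self, raw):
        self._raw = raw

    def tell(self) -> Position:
        return Position(self._raw.tell())

    def read_char(self) -> str:
        """Read one character; returns '' at end of input."""
        lead = self._raw.read(1)
        if not lead:
            return ""
        b = lead[0]
        # Assumes well-formed UTF-8: the lead byte gives the length.
        extra = 0 if b < 0x80 else 1 if b < 0xE0 else 2 if b < 0xF0 else 3
        return (lead + self._raw.read(extra)).decode("utf-8")

    def seek(self, pos=None, offset: int = 0) -> None:
        """Go to a known position, then move `offset` characters forward.

        pos: a Position token, 0 (start), None (current) or -1 (end).
        """
        if pos is None:
            pass  # stay at the current position
        elif isinstance(pos, Position):
            self._raw.seek(pos._byte_offset)
        elif pos == 0:
            self._raw.seek(0)
        elif pos == -1:
            self._raw.seek(0, io.SEEK_END)
        else:
            raise ValueError("pos must be a Position, 0, None or -1")
        if offset < 0:
            # This sketch does not attempt backward character seeks.
            raise NotImplementedError("backward character seek")
        for _ in range(offset):
            if not self.read_char():
                break  # hit end of input

stream = CharStream(io.BytesIO("naïve café".encode("utf-8")))
stream.seek(0, 3)            # start of file, then 3 characters forward
mark = stream.tell()
first = stream.read_char()
stream.seek(None, 2)         # 2 characters forward from here
second = stream.read_char()
stream.seek(mark)            # return to a previously known position
again = stream.read_char()
stream.seek(-1)              # end of file
at_end = stream.read_char()
```

Because `Position` is its own type, `seek` can never confuse a saved position with a character count, which is the role the "mangled integer" plays above.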

Thence it's fairly straightforward for the units of "seek" to be
whatever you find convenient: counting whole records, or lines of text,
or whatever.
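
One way to picture this, again as a Python sketch with hypothetical names (`skip_units`, `read_line`, `read_record`): the forward-seek loop stays the same and only the function that consumes one unit changes.

```python
import io

def skip_units(stream, count, read_unit):
    """Advance by up to `count` units.

    `read_unit(stream)` must consume one unit and return it, or return
    an empty value at end of input. Returns the number of units skipped.
    """
    skipped = 0
    while skipped < count and read_unit(stream):
        skipped += 1
    return skipped

def read_line(stream):
    """One unit = one line of text."""
    return stream.readline()

def read_record(stream, size=4):
    """One unit = one fixed-size record (size chosen for the example)."""
    return stream.read(size)

text = io.BytesIO(b"one\ntwo\nthree\n")
lines_skipped = skip_units(text, 2, read_line)
next_line = text.readline()

records = io.BytesIO(b"AAAABBBBCCCC")
records_skipped = skip_units(records, 1, read_record)
next_record = records.read(4)
```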

Clearly this needs to be discussed in p6-lang, but having separated the
two parameter types, the filter can decide which it can implement, and
how.

-Martin
