>     Malcolm> In fact, Binary allows you to process the data directly
>     Malcolm> from the disk rather than hauling it all into memory.
>     Malcolm> This eliminates start-up time altogether.  What's more,
>     Malcolm> provided the file is used read-only, you can use pure
>     Malcolm> lazy functions to read the data, rather than having to
>     Malcolm> sequence everything through the IO monad.
> 
> Is the fact that the processing is "directly from disk" transparent to
> the programmer

Absolutely.

> or does the processing function have to be written
> differently to avoid startup time ?

No - the program can be identical but for one line, which opens the file.

> Does 'get' return its result lazily ?

There are two functions for reading, with different semantics - 'getAt'
returns its result eagerly in the IO monad, while 'getFAt' is a pure
function which returns its result lazily.

  class Binary a where
    ...
    getAt  :: BinHandle -> BinPtr a -> IO a
    getFAt :: BinHandle -> BinPtr a -> a

The latter obviously requires some guarantee that the file cannot be
modified between the building of the thunk and the demand which
evaluates it.  So the 'getFAt' function only gives a result when the
file is in RO (read-only) mode.  (After writing to a file, it can be
"frozen" to RO mode using the 'freezeBin' operation.)

On the other hand, 'getAt' has to return its result eagerly, because
there is always the possibility that a subsequent 'putAt' operation
will overwrite that part of the file before the value read earlier
is needed.
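To make the contrast concrete, here is a small sketch of the two access
styles.  It assumes an nhc-style Binary interface with 'getAt', 'getFAt',
and 'freezeBin' roughly as discussed above; the helper names 'eagerRead'
and 'lazyRead' are hypothetical, and the exact signatures should be
treated as assumptions rather than the library's definitive API.

```haskell
-- Illustrative sketch only; assumes the nhc Binary module exports
-- BinHandle, BinPtr, getAt, getFAt, and freezeBin as described above.
import Binary (BinHandle, BinPtr, getAt, getFAt, freezeBin)

-- Hypothetical helper: the value is read from disk *now*, so it is
-- safe even if a later putAt overwrites that region of the file.
eagerRead :: Binary a => BinHandle -> BinPtr a -> IO a
eagerRead bh p = getAt bh p

-- Hypothetical helper: the file is first frozen to read-only mode,
-- after which getFAt may return a thunk whose disk access is
-- deferred until the value is actually demanded.
lazyRead :: Binary a => BinHandle -> BinPtr a -> IO a
lazyRead bh p = do
  freezeBin bh
  return (getFAt bh p)
```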

> I've been looking at the html doc and the nhc Binary src, but I can't
> quite see a transparent way of processing data on disk. There is the
> putAt operation, but that needs a file pointer; and there is the
> BinArray, which can only be written sequentially.

Sorry I wasn't clear.  The original question was from someone who
wished only to *read* a huge quantity of data from a file in order to
do some processing on it - but not to write any data back.  I was
claiming that with the Binary library s/he could process the data
transparently direct from disk without having to load everything into
memory at once.  As I understand it, the quantity of data was so large
that having enough memory for everything was a real bottleneck.
Provided the processing function is in some sense incremental - does
not require *all* the data at once - then Binary will be a win.
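As a sketch of what "incremental" means here: a fold over a lazily-read
list only demands one cons cell at a time, so the data streams off the
disk as it is consumed.  The names 'rootPtr' and 'summarise' below are
hypothetical, and the example assumes the Binary instance for lists
reads its elements lazily via 'getFAt'.

```haskell
-- Sketch: summarising a huge on-disk list without loading it whole.
-- Assumes a read-only BinHandle and getFAt as discussed above.
summarise :: BinHandle -> BinPtr [Double] -> Double
summarise bh rootPtr = sum (getFAt bh rootPtr)
  -- 'sum' consumes the list one element at a time, so only a small
  -- window of the file's data need be resident in memory at once.
```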

As you say however, more general processing which involves update to
a structure is a rather different beast.  The Binary library does not
support transparent update in the way you describe, where only the
modified part of the value is written back.  As you note, the whole
(large) structure has to be written back.

Regards,
    Malcolm

Dr Malcolm Wallace (functional programming research)   +44 1904 434756
Department of Computer Science, University of York, YORK YO1 5DD, U.K.
------------------------------------http://www.cs.york.ac.uk/~malcolm/

