i thought the discussion had actually progressed a little further than
might be obvious from
http://www.haskell.org/haskellwiki/Top_level_mutable_state
here is my summary of what i thought was the state of the discussion,
followed by a hopefully simpler proposal.
first, i'd like to distinguish between two aspects of the problem:
(a) shared identifiers
(b) shared initialisation
by (a), i mean identifiers shared between several functions, referring
to the same item, such as 'System.IO.stdin' referring to _the_ standard
input handle. by (b), i mean initialisation code that needs to be run once
before a set of functions may be used, such as
'Network.Socket.withSocketsDo'.
second, note that neither (a) nor (b) represents a problem for non-IO
code: let-binding takes care of (a) and (b).
the problem arises when (a) and (b) are combined with IO-based code,
in particular, if the shared identifiers of (a) stand for IO-based items,
such as IO-based initialisation in (b). the source of trouble lies in
the interaction of let-bindings with IO-based code.
we know how to share descriptions of IO actions:
let a = putStr "hi ho" -- 1
in a >> a
we also know how share the results of IO actions:
do r <- getLine -- 2
return (r,r)
in (1), the action description is shared, but the action itself is
executed, possibly repeatedly, after substitution, in (2), the action is
executed before substitution and before continuation (through monadic
bind), and its result is shared.
what we do not know is how to share IO actions themselves in a
demand-driven way, ie how to describe an IO action that is executed at
most once, only on demand, with shared result. i thought this had become
clear through Adrian's oneShot examples, but it seems to be mentioned
only as a sideline on the Top_level_mutable_state page.
if we had the ability to specify shared IO actions, this would directly
address the issue of (a) in the case of IO-based things, and that
IO-based (a) could then be used to make IO-based (b) more convenient.
aside: even IO-based (b) is not in itself impossible, it is just
inconvenient and error-prone. a module could provide an initialisation
action and require that to be executed before any of its other actions
may be called. this is inconvenient, especially if initialisation
generates results that need to be passed to several other actions, and
it is error-prone, because Haskell does not directly support protocol
types (guaranteeing that initialisation is always called before any of
the other actions, and is only called once). we can address the
'before' typing by having all other actions take a parameter of a type
that can only be produced by the initialisation code, but we cannot
guarantee that initialisation will be called at most once, without
relying on the user, or on a solution to (a). the latter brings us
back to the sharing of IO actions.
that is exactly what "the unsafePerformIO hack" tries to achieve:
myGlobalVar :: IORef Int
{-# NOINLINE myGlobalVar #-}
myGlobalVar = unsafePerformIO (newIORef 17)
we specify an IO action ('newIORef'), we specify a shared name for it
('myGlobalVar'), with a monomorphic type ('IORef Int'), we specify that
we do not want the action description to be substituted before execution
('NOINLINE', usually also '-fno-cse'), to avoid the repeated execution
shown in (1) above, and we do _not_ specify that the action should be
executed before continuation, as it would be in (2) above, by _not_
using monadic bind. what the 'unsafePerformIO' does is to ensure that
the action is executed before substitution. taken together, we get a
by-need sharing of IO action execution similar to by-need sharing of
expression evaluation.
the aspects that make this a somewhat brittle and unsafe 'hack' are the
use of a pragma to ensure semantics, the use of the 'unsafePerformIO'
hook to extend the evaluator, the type that is no longer IO-based (it is
'IORef Int', not 'IO (IORef Int)'), and the implicit constraint to a
monomorphic type. quite a complex interaction of features. it is great
that these features allow us to experiment with possibilities not
originally planned for in language or evaluator. but now that
experimentation has settled down to a common pattern of using these
extension hooks for a particular class of problems, it seems sensible to
integrate that pattern into the language and evaluator proper,
addressing the safety issues at the same time. which is what all this
discussion has been about.
among the approaches suggested, we have seen first-class modules (do the
initialisation on import or export, then pass the result to the whole
module, rather than to each function in it), top-level "mdo" (collect
commuting IO actions, and execute them at a sensible point in time
out-of-line), and type-based indexing (use types as shared identifiers).
i like first-class modules, but they would be a rather substantial
change to Haskell; i almost like the top