Re: unique identifiers as a separate library
Isaac Dupree wrote: Iavor Diatchki wrote: - It uses unsafeDupableInterleaveIO to avoid double locking, in particular, gen r = unsafeDupableInterleaveIO $ do v - unsafeDupableInterleaveIO (genSym r) ls - gen r rs - gen r return (Node v ls rs) where is the double locking? We want referential transparency... e.g. suppose you use newNumSupply to create a thunk for a Gen; when evaluated, it will run unsafeDupableInterleaveIO. You send that thunk off to two different other threads. Then those threads decide to evaluate it (say, enough to get the first genSym'd value) and happen to make a race condition, so it becomes two separate IO computations. Therefore one of them runs atomicModifyIORef, and the other one runs atomicModifyIORef, and so they get two different values out. Node 0 (...) (...) Node 1 (...) (...) when it's suppose to be the very same Gen data structure. so, am I right or wrong about what the perils of unsafeDupableInterleaveIO are? Oops! Yes, you're quite right. Cheers, Simon ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: unique identifiers as a separate library
Hi, Thanks for the feedback! Suggestions implemented in your daily value-supply release :) Happy Holidays! -Iavor On Wed, Dec 24, 2008 at 7:46 AM, Isaac Dupree isaacdup...@charter.net wrote: Iavor Diatchki wrote: Also, your implementation of it could be more efficient: it doesn't need to do locking, so I suggest modifyIORef rather than atomicModifyIORef (Actually you'll have to use readIORef = writeIORef return, instead, because modifyIORef has a different type than atomicModifyIORef). I don't think that that's quite right. I was thinking that it should be OK to use different supplies that share a reference in different threads. So, for example, split the supply and pass each version to a new thread. This should be OK with the dupable primitives because the thunks would be evaluated in different threads. However, we still need the synchronize access to the reference, or we'll get incorrect values. That's true (if you're careful enough about which thread is evaluating the splitting -- the existing split* functions are actually *not* strict enough for this, and they probably shouldn't be either, so an additional warning that the user may have to add some `seq`s or `evaluate`s might be needed). Anyway, if atomicModifyIORef isn't that big of an overhead, then no problem :-) in fact, for lists (as you get a incomplete-pattern-match warning there, but you know the list is always infinite, because you made it with iterate), you could instead use an infinite-list type, Data.Stream from package Stream[*]; as Stream is not a sum type (it only has one possible constructor: Cons), it might even be a bit more efficient! You are right but the benefit is small enough that I don't think it warrants a dependency on another package. I'm going to try to argue to convince you that it's entirely appropriate to use Data.Stream :-) From my somewhat mathematical point of view, that is. 1. Data.Stream is a small module that implements a very well-known, practically standard (though under-used), and simple data type. All you really need from it is import Data.Stream(Stream(Cons)), plus (iterate, streamToList) if you're keeping your current interface. It's nothing you should be afraid to depend on, if you're using its concept (which you are). It's much smaller than the 'containers' package, which similarly you don't use but if you needed a Map or something you obviously should. 2. The more compelling argument, that it took me a good night's sleep to think of: Supply is an infinite binary tree. Stream is an infinite unary tree (er, more commonly called infinite list). They're both codata. They're both comonads. (although they don't go so far as to depend on category-extras to provide an instance.) The primary function/purpose of Supply is an *efficient* way to turn a Stream into a Supply. I daresay it would, even, be more fundamental to provide interface newSupply :: Stream a - IO (Supply a) than newSupply :: a - (a - a) - IO (Supply a) (although it might be worth keeping both interfaces? mainly for compatibility, since one's just a Data.Stream.iterate away from the other and your haddocks could say so) In fact, it's terribly annoying to turn a (Stream a) into a (a, a - a) -- in fact I don't think it can be done in general (you can turn it into a (b, b - (a, b)) though a-la unfold, with b = Stream a) -- so you should definitely provide the above (Stream a - IO (Supply a)) interface. (Although if you're cowardly enough about the extra dependency, I guess you could make it the riskier ([a] - IO (Supply a)), risk being if the user provides a finite list... Sorry for conditionally insulting you; it seems a horribly underhanded thing for me to do :-) Likewise, it would be nice for split :: Supply a - Stream (Supply a) ...and then you would not even depend on Data.List anymore! (I don't happen to think the arguments that Data.List is better than Stream for definitely-infinite lists are very convincing; except possibly that List will be more up-to-date with respect to stream-fusion optimizations, and even then, value-supply doesn't actually rely on any of those optimizations; it actually does keep around the Stream (or List), or in the case of Num/Enum, it doesn't use one in the first place.) On the other hand, I still want the Stream-based interfaces, but my initial argument isn't even that necessary: consider implementing the current signature: newSupply :: a - (a - a) - IO (Supply a) currently by: newSupply x f = genericNewSupply (iterate f x) listGenSym but we don't need to use a list at all, it could be like: newSupply x f = genericNewSupply x (\a - (f a, a)) (with atomicallyModifyIORef added as appropriate depending on your refactorings) P.S. more code cleanup, if you didn't notice it: 0.4: -- XXX: Is the atomic necessary? import Data.IORef(IORef,newIORef,atomicModifyIORef) yes the
Re: unique identifiers as a separate library
Iavor Diatchki wrote: - It uses unsafeDupableInterleaveIO to avoid double locking, in particular, gen r = unsafeDupableInterleaveIO $ do v - unsafeDupableInterleaveIO (genSym r) ls - gen r rs - gen r return (Node v ls rs) where is the double locking? We want referential transparency... e.g. suppose you use newNumSupply to create a thunk for a Gen; when evaluated, it will run unsafeDupableInterleaveIO. You send that thunk off to two different other threads. Then those threads decide to evaluate it (say, enough to get the first genSym'd value) and happen to make a race condition, so it becomes two separate IO computations. Therefore one of them runs atomicModifyIORef, and the other one runs atomicModifyIORef, and so they get two different values out. Node 0 (...) (...) Node 1 (...) (...) when it's suppose to be the very same Gen data structure. so, am I right or wrong about what the perils of unsafeDupableInterleaveIO are? I could see changing (unsafe[Dupable]InterleaveIO (genSym r)) to (genSym r), to halve the number of unsafeInterleaveIOs needed if we assume that most of the time a node is evaluated in order to get a value... but it's hard to see a good way to make *fewer* InterleaveIOs than there are genSym'd values. (possible, but hard, and really depends on the relative expenses/risks of locking, of computing the next number, and of using up the address space of all possible Ints for example). Maybe the outer InterleaveIO could strictly make a few levels of Nodes (with lazy genSym'd values this time) before interleaving again, to reduce the amount of interleaving from the non-semantics-changing side. -Isaac ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: unique identifiers as a separate library
Hi, Sigh. I think that Isaac is quite right. Even though I think that it would be quite rare for multiple threads to share the same name supply in practise, I would really rather have safe code and not have to think about it. So... I have reverted to the non-dupable versions by default. I also added an unsafeNewIntSupply that uses that dupable primitives, for those who like living on the edge :) Thanks to Sebastian for the bench marking program! The performance numbers for the current version on hackage (0.4) are as follows: With safe primitives value-supply is about 7 times slower, with the unsafe ones it is about 2 times slower (a bit less actually). Even though this seems like a big difference, the actual time it takes to generate names is very small: I have to generate about 10 million names to get reliable measurements, so I am not sure if the difference matters in practise. If anyone has further ideas, please chime in. -Iavor On Tue, Dec 23, 2008 at 8:30 AM, Isaac Dupree isaacdup...@charter.net wrote: Iavor Diatchki wrote: - It uses unsafeDupableInterleaveIO to avoid double locking, in particular, gen r = unsafeDupableInterleaveIO $ do v - unsafeDupableInterleaveIO (genSym r) ls - gen r rs - gen r return (Node v ls rs) where is the double locking? We want referential transparency... e.g. suppose you use newNumSupply to create a thunk for a Gen; when evaluated, it will run unsafeDupableInterleaveIO. You send that thunk off to two different other threads. Then those threads decide to evaluate it (say, enough to get the first genSym'd value) and happen to make a race condition, so it becomes two separate IO computations. Therefore one of them runs atomicModifyIORef, and the other one runs atomicModifyIORef, and so they get two different values out. Node 0 (...) (...) Node 1 (...) (...) when it's suppose to be the very same Gen data structure. so, am I right or wrong about what the perils of unsafeDupableInterleaveIO are? I could see changing (unsafe[Dupable]InterleaveIO (genSym r)) to (genSym r), to halve the number of unsafeInterleaveIOs needed if we assume that most of the time a node is evaluated in order to get a value... but it's hard to see a good way to make *fewer* InterleaveIOs than there are genSym'd values. (possible, but hard, and really depends on the relative expenses/risks of locking, of computing the next number, and of using up the address space of all possible Ints for example). Maybe the outer InterleaveIO could strictly make a few levels of Nodes (with lazy genSym'd values this time) before interleaving again, to reduce the amount of interleaving from the non-semantics-changing side. -Isaac ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: unique identifiers as a separate library
Hi again Iavor, A couple performance ideas if you want to test them: unsafeInterleaveIO is cheap until you need to evaluate its result. So how about this, I think it makes there be 1/3 as many structural unsafeInterleaveIO's, so if it took 2 amount of time on unsafeInterleaveIO:ing before, it should take 1.33 time on it after this: and just a bit more time/memory to construct Nodes that might not be used. gen r = unsafeInterleaveIO $ do v - unsafeInterleaveIO (genSym r) n1 - gen r; n2 - gen r; n3 - gen r; n4 - gen r return (Node v1 (Node v2 n1 n2) (Node v3 n3 n4)) I also feel tempted to apply the static-argument-transformation manually, where gen r = gen' where gen' = unsafeInterleaveIO $ do v - unsafeInterleaveIO (genSym r) n1 - gen'; n2 - gen' --etc. return (Node ...) or similar which I guess is safe because this is only unsafeInterleaveIO, not unsafePerformIO? Dunno if it'd be speed-beneficial though. version 0.4: genericNewSupply :: b - (IORef b - IO a) - IO (Supply a) genericNewSupply start genSym = gen = newIORef start where gen r = unsafeInterleaveIO $ do ls - gen r rs - gen r return (Node (unsafePerformIO (genSym r)) ls rs) Why unsafePerformIO, was it faster?(i'd guess slower actually, as unsafePerformIO is NOINLINE..) It's considerably less safe than unsafeInterleaveIO! For example, do the static-argument-transformation above, then float out the unsafePerformIO because it's the same expression each time through gen', and suddenly the all the unique values are all the same! we can make this value-supply very good ultimately :-) also, I might call unsafeNewIntSupply something more specific, like unthreadsafeNew..., or the more obscure but conventional dupable description-word. Did it help specializing that to Int, i.e. why not unsafeGenericNewSupply? because I can imagine a simple data that's not an Int, where you'd still want to avoid the thread-safety overhead. Also, your implementation of it could be more efficient: it doesn't need to do locking, so I suggest modifyIORef rather than atomicModifyIORef (Actually you'll have to use readIORef = writeIORef return, instead, because modifyIORef has a different type than atomicModifyIORef). Possible refactor: All the functions ***GenSym r = atomicModifyIORef r (some expression that doesn't mention r); doing the [atomic]ModifyIORef r could be the caller's responsibility instead, and e.g. listGenSym (a:as) = (as,a). in fact, for lists (as you get a incomplete-pattern-match warning there, but you know the list is always infinite, because you made it with iterate), you could instead use an infinite-list type, Data.Stream from package Stream[*]; as Stream is not a sum type (it only has one possible constructor: Cons), it might even be a bit more efficient! [*] http://hackage.haskell.org/packages/archive/Stream/0.2.6/doc/html/Data-Stream.html thanks for your effort! and especially for measuring the performance timing! -Isaac ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: unique identifiers as a separate library
Hello, Thanks for your comments! On Tue, Dec 23, 2008 at 3:22 PM, Isaac Dupree isaacdup...@charter.net wrote: unsafeInterleaveIO is cheap until you need to evaluate its result. So how about this, I think it makes there be 1/3 as many structural unsafeInterleaveIO's, so if it took 2 amount of time on unsafeInterleaveIO:ing before, it should take 1.33 time on it after this: and just a bit more time/memory to construct Nodes that might not be used. gen r = unsafeInterleaveIO $ do v - unsafeInterleaveIO (genSym r) n1 - gen r; n2 - gen r; n3 - gen r; n4 - gen r return (Node v1 (Node v2 n1 n2) (Node v3 n3 n4)) I played around with that but, for some reason, it did not seem to speed things up, at least for the benchmark that I run. Will have to experiment some more. version 0.4: genericNewSupply :: b - (IORef b - IO a) - IO (Supply a) genericNewSupply start genSym = gen = newIORef start where gen r = unsafeInterleaveIO $ do ls - gen r rs - gen r return (Node (unsafePerformIO (genSym r)) ls rs) Why unsafePerformIO, was it faster?(i'd guess slower actually, as unsafePerformIO is NOINLINE..) I just tested to see what would happen and forgot to change it back (now I have switched it back). There does not seem to be a significant difference between the two, speed-wise. Did it help specializing that to Int, i.e. why not unsafeGenericNewSupply? because I can imagine a simple data that's not an Int, where you'd still want to avoid the thread-safety overhead. Nope, it was just the simplest thing to do. I can add more general unsafe versions. Also, your implementation of it could be more efficient: it doesn't need to do locking, so I suggest modifyIORef rather than atomicModifyIORef (Actually you'll have to use readIORef = writeIORef return, instead, because modifyIORef has a different type than atomicModifyIORef). I don't think that that's quite right. I was thinking that it should be OK to use different supplies that share a reference in different threads. So, for example, split the supply and pass each version to a new thread. This should be OK with the dupable primitives because the thunks would be evaluated in different threads. However, we still need the synchronize access to the reference, or we'll get incorrect values. Possible refactor: All the functions ***GenSym r = atomicModifyIORef r (some expression that doesn't mention r); doing the [atomic]ModifyIORef r could be the caller's responsibility instead, and e.g. listGenSym (a:as) = (as,a). This is a good idea, the atomicUpdates are duplicated everywhere. I did this, and for some reason it seems to have made things a little slower, but I like the code better, so I think that I will keep it. I am surprised that it affected the speed of execution, perhaps it is just my benchmark that is being unreliable. in fact, for lists (as you get a incomplete-pattern-match warning there, but you know the list is always infinite, because you made it with iterate), you could instead use an infinite-list type, Data.Stream from package Stream[*]; as Stream is not a sum type (it only has one possible constructor: Cons), it might even be a bit more efficient! You are right but the benefit is small enough that I don't think it warrants a dependency on another package. bye, -Iavor ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: unique identifiers as a separate library
Iavor Diatchki wrote: Hello, I have made the two changes that Simon suggested and uploaded a new version of the library. By the way, GHC seemed to work correctly even without the extra boolean parameter, perhaps it treats unsafePerformIO specially somehow? A somewhat related question: I ended up using three calls to unsafeInterleaveIO which seems like a bit much. Could I have done it in a different way somehow? This is what I did: gen r = do v - unsafeInterleaveIO (genSym r) ls - unsafeInterleaveIO (gen r) rs - unsafeInterleaveIO (gen r) return (Node v ls rs) I'd probably do it like this: gen r = unsafeDupableInterleaveIO $ do v - unsafeDupableInterleaveIO (genSym r) ls - gen r rs - gen r return (Node v ls rs) which is close to the way GHC does it, except that we do indeed call genSym for every node. Calling genSym is cheaper than building the thunk for unsafeDupableInterleaveIO, although if there's an atomicModifyIORef involved that will probably tip the balance the other way. Cheers, Simon ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: unique identifiers as a separate library
Thanks for pointing me at Iavors package. We should not have two different libraries for the same purpose. value-supply-0.2 is about 6-7 times slower than GHC's UniqSupply, but porting Simons suggestion (to use unsafeDupableInterleaveIO) into Iavors code, GHC is only about twice as fast. Can we do better? I tried to use SPECIALIZE INLINE pragmas (at new[Enum|Num]Supply for Int) and/or strictness annotations (for the unique value) without significant benefit. I like Iavors more general interface (polymorphic values) and that only demanded identifiers are created. The latter saves half of the names when evaluating map supplyValue . split I will replace my dependency to ghc with one to value-supply if unsafeDupableInterleaveIO is used instead of unsafeInterleaveIO where it is available. Cheers, Sebastian ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: unique identifiers as a separate library
Hi, Thanks for the feedback! I have uploaded a new version of value supply with the following changes that should improve performance. - It uses unsafeDupableInterleaveIO to avoid double locking, - Do not create an intermediate list for Enum and Num supples - Specialize code for Int, as this is the most commonly used type for new name generation. These together should make things much faster but I have not had a chance to test that. I looked at the core and it seemed that the specializations were kicking in, so hopefully all is OK :) It would be great if you could let me know if things are still slower than the GHC version, and if so by how much. Also, I made a git-hub repo for the code, in case anyone is interested in contributing: http://github.com/yav/value-supply/tree/master -Iavor On Mon, Dec 22, 2008 at 3:27 AM, Sebastian Fischer s...@informatik.uni-kiel.de wrote: Thanks for pointing me at Iavors package. We should not have two different libraries for the same purpose. value-supply-0.2 is about 6-7 times slower than GHC's UniqSupply, but porting Simons suggestion (to use unsafeDupableInterleaveIO) into Iavors code, GHC is only about twice as fast. Can we do better? I tried to use SPECIALIZE INLINE pragmas (at new[Enum|Num]Supply for Int) and/or strictness annotations (for the unique value) without significant benefit. I like Iavors more general interface (polymorphic values) and that only demanded identifiers are created. The latter saves half of the names when evaluating map supplyValue . split I will replace my dependency to ghc with one to value-supply if unsafeDupableInterleaveIO is used instead of unsafeInterleaveIO where it is available. Cheers, Sebastian ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: unique identifiers as a separate library
Hello, I have made the two changes that Simon suggested and uploaded a new version of the library. By the way, GHC seemed to work correctly even without the extra boolean parameter, perhaps it treats unsafePerformIO specially somehow? A somewhat related question: I ended up using three calls to unsafeInterleaveIO which seems like a bit much. Could I have done it in a different way somehow? This is what I did: gen r = do v - unsafeInterleaveIO (genSym r) ls - unsafeInterleaveIO (gen r) rs - unsafeInterleaveIO (gen r) return (Node v ls rs) Note that a single unsafeInterleaveIO around the whole do block is not quite right (this is what the code in the other package does) because this will increment the name as soon as the generator object is forced, and we want the name to be increment when the name is forced. -Iavor On Fri, Dec 19, 2008 at 1:24 AM, Simon Marlow marlo...@gmail.com wrote: Why not depend on this instead? http://hackage.haskell.org/cgi-bin/hackage-scripts/package/value-supply Looking at the code for this, I'm somewhat suspicious that it actually works with GHC: -- The extra argument to ``gen'' is passed because without -- it Hugs spots that the recursive calls are the same but does -- not know that unsafePerformIO is unsafe. where gen _ r = Node { supplyValue = unsafePerformIO (genSym r), supplyLeft = gen False r, supplyRight = gen True r } even if that extra Bool argument is enough to fool Hugs, I wouldn't count on it being enough to fool GHC -O2! You probably want to use unsafeInterleaveIO like we do in GHC's UniqSupply library. Also, I'd replace the MVar with an IORef and use atomicModifyIORef for speed. Cheers, Simon ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: unique identifiers as a separate library
a...@spamcop.net wrote: G'day all. Quoting Sebastian Fischer s...@informatik.uni-kiel.de: I have wrapped up (a tiny subset of) GHC's uniques into the package `uniqueid` and put it on Hackage: http://hackage.haskell.org/cgi-bin/hackage-scripts/package/uniqueid First off, thanks for this. The main difference is due to my fear of depending on the foreign function `genSymZh` which I replaced by a global counting IORef. Why not depend on this instead? http://hackage.haskell.org/cgi-bin/hackage-scripts/package/value-supply Looking at the code for this, I'm somewhat suspicious that it actually works with GHC: -- The extra argument to ``gen'' is passed because without -- it Hugs spots that the recursive calls are the same but does -- not know that unsafePerformIO is unsafe. where gen _ r = Node { supplyValue = unsafePerformIO (genSym r), supplyLeft = gen False r, supplyRight = gen True r } even if that extra Bool argument is enough to fool Hugs, I wouldn't count on it being enough to fool GHC -O2! You probably want to use unsafeInterleaveIO like we do in GHC's UniqSupply library. Also, I'd replace the MVar with an IORef and use atomicModifyIORef for speed. Cheers, Simon ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: unique identifiers as a separate library
Isaac Dupree wrote: Sebastian Fischer wrote: On Dec 17, 2008, at 10:54 AM, Sebastian Fischer wrote: Would it be possible to put everything concerned with unique identifiers in GHC into a separate package on Hackage? I have wrapped up (a tiny subset of) GHC's uniques into the package `uniqueid` and put it on Hackage: thanks! The main difference is due to my fear of depending on the foreign function `genSymZh` which I replaced by a global counting IORef. which is its own risk. maybe you should NOINLINE it? Potential code criticisms / suggestions for it as a library: Unboxed: so it only works on GHC, even though others have unsafe IO too. In theory, strictness annotations should be able to achieve the same efficiency. Char is supposed to represent a Unicode character -- but this code behaves oddly: For 64-bit Int#, it does so. For 32-bit Int#, it assumes Char is within the first 8 bits (ASCII and a little more). If Int# (or Int) can be 30-bit (like Haskell98 permission), its correctness suffers even worse. Is it really even a necessary part of the design? The only way you provide to extract it or depend on its value is indirectly via the Show instance. Its presence there is, in any case, at the cost of max. 2^24 (16 million) IDs before problems happen, whereas billions is still not a great limit but at least is somewhat larger. (applications that are long-running or deal with huge amounts of data could be affected) unsafeDupableInterleaveIO: this Dupable was safe for GHC to use because GHC is single-threaded. Is it safe in a library setting? It would be safe if the genSym was atomic - so I recommend using atomicModifyIORef. Cheers, Simon ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: unique identifiers as a separate library
Sebastian Fischer wrote: for a project I am using the libraries Unique, UniqSupply, and UniqFM from the package ghc. It seems odd to have a dependency on a whole compiler only in order to use its (highly efficient and, hence, preferred) implementation of unique identifiers. Would it be possible to put everything concerned with unique identifiers in GHC into a separate package on Hackage? This may also allow me to get a fix for http://hackage.haskell.org/trac/ghc/ticket/2880 without reinstalling GHC. Sure, that would be a useful chunk to separate out from GHC. However, looking at the code I see that our unique supply monad is really not a lazy monad at all: thenUs :: UniqSM a - (a - UniqSM b) - UniqSM b thenUs expr cont us = case (expr us) of { (result, us') - cont result us' } which is strict, and even the lazy version: lazyThenUs :: UniqSM a - (a - UniqSM b) - UniqSM b lazyThenUs (USM expr) cont = USM (\us - let (result, us') = expr us in unUSM (cont result) us') doesn't really split the supply, because it will force the left side as soon as the unique on the right side is demanded. Given that our monad is strict, there's no need for it to use mkSplitUniqueSupply, it could just call genSym to create new uniques. I notice there are other parts of the compiler that do make use of the lazy splittable unique supply in their own monads, but I'm not sure if they really need it. Cheers, Simon ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: unique identifiers as a separate library
On Dec 17, 2008, at 10:54 AM, Sebastian Fischer wrote: Would it be possible to put everything concerned with unique identifiers in GHC into a separate package on Hackage? I have wrapped up (a tiny subset of) GHC's uniques into the package `uniqueid` and put it on Hackage: http://hackage.haskell.org/cgi-bin/hackage-scripts/package/uniqueid It only provides type Id hashedId :: Id - Int type IdSupply initIdSupply :: Char - IO IdSupply splitIdSupply :: IdSupply - (IdSupply,IdSupply) idFromSupply :: IdSupply - Id instance Eq Id instance Ord Id instance Show Id The main difference is due to my fear of depending on the foreign function `genSymZh` which I replaced by a global counting IORef. The other difference is that the Show instance does not rely on GHC's static flags and can hence be used outside of GHC sessions. The code is on github: http://github.com/sebfisch/uniqueid Extensions welcome! Cheers, Sebastian ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: unique identifiers as a separate library
Sebastian Fischer wrote: On Dec 17, 2008, at 10:54 AM, Sebastian Fischer wrote: Would it be possible to put everything concerned with unique identifiers in GHC into a separate package on Hackage? I have wrapped up (a tiny subset of) GHC's uniques into the package `uniqueid` and put it on Hackage: thanks! The main difference is due to my fear of depending on the foreign function `genSymZh` which I replaced by a global counting IORef. which is its own risk. maybe you should NOINLINE it? Potential code criticisms / suggestions for it as a library: Unboxed: so it only works on GHC, even though others have unsafe IO too. In theory, strictness annotations should be able to achieve the same efficiency. Char is supposed to represent a Unicode character -- but this code behaves oddly: For 64-bit Int#, it does so. For 32-bit Int#, it assumes Char is within the first 8 bits (ASCII and a little more). If Int# (or Int) can be 30-bit (like Haskell98 permission), its correctness suffers even worse. Is it really even a necessary part of the design? The only way you provide to extract it or depend on its value is indirectly via the Show instance. Its presence there is, in any case, at the cost of max. 2^24 (16 million) IDs before problems happen, whereas billions is still not a great limit but at least is somewhat larger. (applications that are long-running or deal with huge amounts of data could be affected) unsafeDupableInterleaveIO: this Dupable was safe for GHC to use because GHC is single-threaded. Is it safe in a library setting? I guess likewise, the IORef global variable wouldn't be thread-safe... but this one isn't even safe between separate runs of initIdSupply. On the other hand, thread-safety probably makes it much less efficient (if you can find a way to use atomic int CPU instructions, it might not be too bad, or else per-thread counters... or just declare how unsafe it is) unsafePerformIO: it's not totally necessary here. Its only function is to make IDs generated by different runs of initIdSupply be distinct. So it could, anyway, probably be refactored to only use unsafePerformIO global-ness once per initIdSupply and just use unsafeInterleaveIO within (where currently nextInt is called). -Isaac ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: unique identifiers as a separate library
G'day all. Quoting Sebastian Fischer s...@informatik.uni-kiel.de: I have wrapped up (a tiny subset of) GHC's uniques into the package `uniqueid` and put it on Hackage: http://hackage.haskell.org/cgi-bin/hackage-scripts/package/uniqueid First off, thanks for this. The main difference is due to my fear of depending on the foreign function `genSymZh` which I replaced by a global counting IORef. Why not depend on this instead? http://hackage.haskell.org/cgi-bin/hackage-scripts/package/value-supply Cheers, Andrew Bromage ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
unique identifiers as a separate library
Hello, for a project I am using the libraries Unique, UniqSupply, and UniqFM from the package ghc. It seems odd to have a dependency on a whole compiler only in order to use its (highly efficient and, hence, preferred) implementation of unique identifiers. Would it be possible to put everything concerned with unique identifiers in GHC into a separate package on Hackage? This may also allow me to get a fix for http://hackage.haskell.org/trac/ghc/ticket/2880 without reinstalling GHC. Cheers, Sebastian ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users