Re: unique identifiers as a separate library

2009-01-06 Thread Simon Marlow

Isaac Dupree wrote:

Iavor Diatchki wrote:

  - It uses unsafeDupableInterleaveIO to avoid double locking,


in particular,


gen r = unsafeDupableInterleaveIO
  $ do v - unsafeDupableInterleaveIO (genSym r)
   ls - gen r
   rs - gen r
   return (Node v ls rs)


where is the double locking?  We want referential transparency...

e.g. suppose you use newNumSupply to create a thunk for a Gen; when 
evaluated, it will run unsafeDupableInterleaveIO.  You send that thunk 
off to two different other threads. Then those threads decide to 
evaluate it (say, enough to get the first genSym'd value) and happen to 
make a race condition, so it becomes two separate IO computations. 
Therefore one of them runs atomicModifyIORef, and the other one runs 
atomicModifyIORef, and so they get two different values out.


Node 0 (...) (...)
Node 1 (...) (...)

when it's suppose to be the very same Gen data structure.

so, am I right or wrong about what the perils of 
unsafeDupableInterleaveIO are?


Oops! Yes, you're quite right.

Cheers,
Simon
___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: unique identifiers as a separate library

2008-12-24 Thread Iavor Diatchki
Hi,
Thanks for the feedback!  Suggestions implemented in your daily
value-supply release :)
Happy Holidays!
-Iavor

On Wed, Dec 24, 2008 at 7:46 AM, Isaac Dupree isaacdup...@charter.net wrote:
 Iavor Diatchki wrote:

  Also, your
 implementation of it could be more efficient: it doesn't need to do
 locking,
 so I suggest modifyIORef rather than atomicModifyIORef (Actually you'll
 have
 to use readIORef = writeIORef  return, instead, because modifyIORef
 has
 a different type than atomicModifyIORef).

 I don't think that that's quite right.  I was thinking that it should
 be OK to use different supplies that share a reference in different
 threads.  So, for example, split the supply and pass each version to a
 new thread.  This should be OK with the dupable primitives because the
 thunks would be evaluated in different threads.  However, we still
 need the synchronize access to the reference, or we'll get incorrect
 values.

 That's true (if you're careful enough about which thread is evaluating the
 splitting -- the existing split* functions are actually *not* strict enough
 for this, and they probably shouldn't be either, so an additional warning
 that the user may have to add some `seq`s or `evaluate`s might be needed).
  Anyway, if atomicModifyIORef isn't that big of an overhead, then no problem
 :-)

 in fact, for lists (as you get a incomplete-pattern-match warning there,
 but
 you know the list is always infinite, because you made it with
 iterate),
 you could instead use an infinite-list type, Data.Stream from package
 Stream[*]; as Stream is not a sum type (it only has one possible
 constructor: Cons), it might even be a bit more efficient!

 You are right but the benefit is small enough that I don't think it
 warrants a dependency on another package.

 I'm going to try to argue to convince you that it's entirely appropriate to
 use Data.Stream :-)

 From my somewhat mathematical point of view, that is.

 1. Data.Stream is a small module that implements a very well-known,
 practically standard (though under-used), and simple data type.  All you
 really need from it is import Data.Stream(Stream(Cons)), plus (iterate,
 streamToList) if you're keeping your current interface.  It's nothing you
 should be afraid to depend on, if you're using its concept (which you are).
  It's much smaller than the 'containers' package, which similarly you don't
 use but if you needed a Map or something you obviously should.

 2. The more compelling argument, that it took me a good night's sleep to
 think of:

 Supply is an infinite binary tree.  Stream is an infinite unary tree (er,
 more commonly called infinite list). They're both codata.  They're both
 comonads.  (although they don't go so far as to depend on category-extras to
 provide an instance.)  The primary function/purpose of Supply is an
 *efficient* way to turn a Stream into a Supply.  I daresay it would, even,
 be more fundamental to provide interface
 newSupply  :: Stream a - IO (Supply a)
 than
 newSupply  :: a - (a - a) - IO (Supply a)
 (although it might be worth keeping both interfaces? mainly for
 compatibility, since one's just a Data.Stream.iterate away from the other
 and your haddocks could say so)
 In fact, it's terribly annoying to turn a (Stream a) into a (a, a - a) --
 in fact I don't think it can be done in general (you can turn it into a (b,
 b - (a, b)) though a-la unfold, with b = Stream a) -- so you should
 definitely provide the above (Stream a - IO (Supply a)) interface.
 (Although if you're cowardly enough about the extra dependency, I guess you
 could make it the riskier ([a] - IO (Supply a)), risk being if the user
 provides a finite list... Sorry for conditionally insulting you; it seems a
 horribly underhanded thing for me to do :-)

 Likewise, it would be nice for
 split  :: Supply a - Stream (Supply a)
 ...and then you would not even depend on Data.List anymore!

 (I don't happen to think the arguments that Data.List is better than Stream
 for definitely-infinite lists are very convincing; except possibly that List
 will be more up-to-date with respect to stream-fusion optimizations, and
 even then, value-supply doesn't actually rely on any of those optimizations;
 it actually does keep around the Stream (or List), or in the case of
 Num/Enum, it doesn't use one in the first place.)

 On the other hand, I still want the Stream-based interfaces, but my initial
 argument isn't even that necessary: consider implementing the current
 signature:
 newSupply  :: a - (a - a) - IO (Supply a)
 currently by:
 newSupply x f   = genericNewSupply (iterate f x) listGenSym
 but we don't need to use a list at all, it could be like:
 newSupply x f   = genericNewSupply x (\a - (f a, a))
 (with atomicallyModifyIORef added as appropriate depending on your
 refactorings)


 P.S. more code cleanup, if you didn't notice it:

 0.4:

 -- XXX: Is the atomic necessary?
 import Data.IORef(IORef,newIORef,atomicModifyIORef)

 yes the 

Re: unique identifiers as a separate library

2008-12-23 Thread Isaac Dupree

Iavor Diatchki wrote:

  - It uses unsafeDupableInterleaveIO to avoid double locking,


in particular,


gen r = unsafeDupableInterleaveIO
  $ do v - unsafeDupableInterleaveIO (genSym r)
   ls - gen r
   rs - gen r
   return (Node v ls rs)


where is the double locking?  We want referential 
transparency...


e.g. suppose you use newNumSupply to create a thunk for a 
Gen; when evaluated, it will run unsafeDupableInterleaveIO. 
 You send that thunk off to two different other threads. 
Then those threads decide to evaluate it (say, enough to get 
the first genSym'd value) and happen to make a race 
condition, so it becomes two separate IO computations. 
Therefore one of them runs atomicModifyIORef, and the other 
one runs atomicModifyIORef, and so they get two different 
values out.


Node 0 (...) (...)
Node 1 (...) (...)

when it's suppose to be the very same Gen data structure.

so, am I right or wrong about what the perils of 
unsafeDupableInterleaveIO are?


I could see changing (unsafe[Dupable]InterleaveIO (genSym 
r)) to (genSym r), to halve the number of 
unsafeInterleaveIOs needed if we assume that most of the 
time a node is evaluated in order to get a value... but it's 
hard to see a good way to make *fewer* InterleaveIOs than 
there are genSym'd values.  (possible, but hard, and really 
depends on the relative expenses/risks of locking, of 
computing the next number, and of using up the address 
space of all possible Ints for example).  Maybe the outer 
InterleaveIO could strictly make a few levels of Nodes 
(with lazy genSym'd values this time) before interleaving 
again, to reduce the amount of interleaving from the 
non-semantics-changing side.


-Isaac
___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: unique identifiers as a separate library

2008-12-23 Thread Iavor Diatchki
Hi,
Sigh.  I think that Isaac is quite right.  Even though I think that it
would be quite rare for multiple threads to share the same name supply
in practise, I would really rather have safe code and not have to
think about it.  So... I have reverted to the non-dupable versions by
default.  I also added an unsafeNewIntSupply that uses that dupable
primitives, for those who like living on the edge :)   Thanks to
Sebastian for the bench marking program!   The performance numbers for
the current version on hackage (0.4) are as follows:

With safe primitives value-supply is about 7 times slower, with the
unsafe ones it is about 2 times slower (a bit less actually).  Even
though this seems like a big difference, the actual time it takes to
generate names is very small:  I have to generate about 10 million
names to get reliable measurements, so I am not sure if the difference
matters in practise.

If anyone has further ideas, please chime in.

-Iavor


On Tue, Dec 23, 2008 at 8:30 AM, Isaac Dupree isaacdup...@charter.net wrote:
 Iavor Diatchki wrote:

  - It uses unsafeDupableInterleaveIO to avoid double locking,

 in particular,

 gen r = unsafeDupableInterleaveIO
  $ do v - unsafeDupableInterleaveIO (genSym r)
   ls - gen r
   rs - gen r
   return (Node v ls rs)

 where is the double locking?  We want referential transparency...

 e.g. suppose you use newNumSupply to create a thunk for a Gen; when
 evaluated, it will run unsafeDupableInterleaveIO.  You send that thunk off
 to two different other threads. Then those threads decide to evaluate it
 (say, enough to get the first genSym'd value) and happen to make a race
 condition, so it becomes two separate IO computations. Therefore one of them
 runs atomicModifyIORef, and the other one runs atomicModifyIORef, and so
 they get two different values out.

 Node 0 (...) (...)
 Node 1 (...) (...)

 when it's suppose to be the very same Gen data structure.

 so, am I right or wrong about what the perils of unsafeDupableInterleaveIO
 are?

 I could see changing (unsafe[Dupable]InterleaveIO (genSym r)) to (genSym r),
 to halve the number of unsafeInterleaveIOs needed if we assume that most of
 the time a node is evaluated in order to get a value... but it's hard to see
 a good way to make *fewer* InterleaveIOs than there are genSym'd values.
  (possible, but hard, and really depends on the relative expenses/risks of
 locking, of computing the next number, and of using up the address space
 of all possible Ints for example).  Maybe the outer InterleaveIO could
 strictly make a few levels of Nodes (with lazy genSym'd values this time)
 before interleaving again, to reduce the amount of interleaving from the
 non-semantics-changing side.

 -Isaac

___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: unique identifiers as a separate library

2008-12-23 Thread Isaac Dupree

Hi again Iavor,

A couple performance ideas if you want to test them:

unsafeInterleaveIO is cheap until you need to evaluate its 
result.  So how about this, I think it makes there be 1/3 as 
many structural unsafeInterleaveIO's, so if it took 2 
amount of time on unsafeInterleaveIO:ing before, it should 
take 1.33 time on it after this: and just a bit more 
time/memory to construct Nodes that might not be used.


gen r = unsafeInterleaveIO $ do
  v - unsafeInterleaveIO (genSym r)
  n1 - gen r; n2 - gen r; n3 - gen r; n4 
- gen r
  return (Node v1 (Node v2 n1 n2) (Node v3 
n3 n4))


I also feel tempted to apply the 
static-argument-transformation manually,

where
   gen r = gen'
 where
   gen' = unsafeInterleaveIO $ do
  v - unsafeInterleaveIO (genSym r)
  n1 - gen'; n2 - gen' --etc.
  return (Node ...)

or similar

which I guess is safe because this is only 
unsafeInterleaveIO, not unsafePerformIO?  Dunno if it'd be 
speed-beneficial though.


version 0.4:

genericNewSupply :: b - (IORef b - IO a) - IO (Supply a)
genericNewSupply start genSym = gen = newIORef start
  where gen r = unsafeInterleaveIO
  $ do ls - gen r
   rs - gen r
   return (Node (unsafePerformIO (genSym r)) ls rs)


Why unsafePerformIO, was it faster?(i'd guess slower 
actually, as unsafePerformIO is NOINLINE..)  It's 
considerably less safe than unsafeInterleaveIO!  For 
example, do the static-argument-transformation above, then 
float out the unsafePerformIO because it's the same 
expression each time through gen', and suddenly the all the 
unique values are all the same!


we can make this value-supply very good ultimately :-)

also, I might call unsafeNewIntSupply something more 
specific, like unthreadsafeNew..., or the more obscure but 
conventional dupable description-word.  Did it help 
specializing that to Int, i.e. why not 
unsafeGenericNewSupply? because I can imagine a simple 
data that's not an Int, where you'd still want to avoid the 
thread-safety overhead.  Also, your implementation of it 
could be more efficient: it doesn't need to do locking, so I 
suggest modifyIORef rather than atomicModifyIORef (Actually 
you'll have to use readIORef = writeIORef  return, 
instead, because modifyIORef has a different type than 
atomicModifyIORef).  Possible refactor: All the functions 
***GenSym r = atomicModifyIORef r (some expression that 
doesn't mention r); doing the [atomic]ModifyIORef r could 
be the caller's responsibility instead, and e.g. listGenSym 
(a:as) = (as,a).


in fact, for lists (as you get a incomplete-pattern-match 
warning there, but you know the list is always infinite, 
because you made it with iterate), you could instead use 
an infinite-list type, Data.Stream from package Stream[*]; 
as Stream is not a sum type (it only has one possible 
constructor: Cons), it might even be a bit more efficient!
[*] 
http://hackage.haskell.org/packages/archive/Stream/0.2.6/doc/html/Data-Stream.html


thanks for your effort! and especially for measuring the 
performance timing!

-Isaac
___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: unique identifiers as a separate library

2008-12-23 Thread Iavor Diatchki
Hello,

Thanks for your comments!


On Tue, Dec 23, 2008 at 3:22 PM, Isaac Dupree isaacdup...@charter.net wrote:
 unsafeInterleaveIO is cheap until you need to evaluate its result.  So how
 about this, I think it makes there be 1/3 as many structural
 unsafeInterleaveIO's, so if it took 2 amount of time on
 unsafeInterleaveIO:ing before, it should take 1.33 time on it after this:
 and just a bit more time/memory to construct Nodes that might not be used.

 gen r = unsafeInterleaveIO $ do
  v - unsafeInterleaveIO (genSym r)
  n1 - gen r; n2 - gen r; n3 - gen r; n4 - gen r
  return (Node v1 (Node v2 n1 n2) (Node v3 n3 n4))

I played around with that but, for some reason, it did not seem to
speed things up, at least for the benchmark that I run.  Will have to
experiment some more.


 version 0.4:

 genericNewSupply :: b - (IORef b - IO a) - IO (Supply a)
 genericNewSupply start genSym = gen = newIORef start
  where gen r = unsafeInterleaveIO
  $ do ls - gen r
   rs - gen r
   return (Node (unsafePerformIO (genSym r)) ls rs)

 Why unsafePerformIO, was it faster?(i'd guess slower actually, as
 unsafePerformIO is NOINLINE..)

I just tested to see what would happen and forgot to change it back
(now I have switched it back).  There does not seem to be a
significant difference between the two, speed-wise.

 Did it help specializing that to Int, i.e. why not
 unsafeGenericNewSupply? because I can imagine a simple data that's not an
 Int, where you'd still want to avoid the thread-safety overhead.

Nope, it was just the simplest thing to do.  I can add more general
unsafe versions.

  Also, your
 implementation of it could be more efficient: it doesn't need to do locking,
 so I suggest modifyIORef rather than atomicModifyIORef (Actually you'll have
 to use readIORef = writeIORef  return, instead, because modifyIORef has
 a different type than atomicModifyIORef).

I don't think that that's quite right.  I was thinking that it should
be OK to use different supplies that share a reference in different
threads.  So, for example, split the supply and pass each version to a
new thread.  This should be OK with the dupable primitives because the
thunks would be evaluated in different threads.  However, we still
need the synchronize access to the reference, or we'll get incorrect
values.

Possible refactor: All the
 functions ***GenSym r = atomicModifyIORef r (some expression that doesn't
 mention r); doing the [atomic]ModifyIORef r could be the caller's
 responsibility instead, and e.g. listGenSym (a:as) = (as,a).

This is a good idea, the atomicUpdates are duplicated everywhere.  I
did this, and for some reason it seems to have made things a little
slower, but I like the code better, so I think that I will keep it.  I
am surprised that it affected the speed of execution, perhaps it is
just my benchmark that is being unreliable.

 in fact, for lists (as you get a incomplete-pattern-match warning there, but
 you know the list is always infinite, because you made it with iterate),
 you could instead use an infinite-list type, Data.Stream from package
 Stream[*]; as Stream is not a sum type (it only has one possible
 constructor: Cons), it might even be a bit more efficient!

You are right but the benefit is small enough that I don't think it
warrants a dependency on another package.

bye,
-Iavor
___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: unique identifiers as a separate library

2008-12-22 Thread Simon Marlow

Iavor Diatchki wrote:

Hello,
I have made the two changes that Simon suggested and uploaded a new
version of the library.   By the way, GHC seemed to work correctly
even without the extra boolean parameter, perhaps it treats
unsafePerformIO specially somehow?  A somewhat related question:  I
ended up using three calls to unsafeInterleaveIO which seems like a
bit much.  Could I have done it in a different way somehow?  This is
what I did:

gen r = do v - unsafeInterleaveIO (genSym r)
   ls - unsafeInterleaveIO (gen r)
   rs - unsafeInterleaveIO (gen r)
   return (Node v ls rs)


I'd probably do it like this:

 gen r = unsafeDupableInterleaveIO $ do
v - unsafeDupableInterleaveIO (genSym r)
ls - gen r
rs - gen r
return (Node v ls rs)

which is close to the way GHC does it, except that we do indeed call genSym 
for every node.  Calling genSym is cheaper than building the thunk for 
unsafeDupableInterleaveIO, although if there's an atomicModifyIORef 
involved that will probably tip the balance the other way.


Cheers,
Simon
___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: unique identifiers as a separate library

2008-12-22 Thread Sebastian Fischer
Thanks for pointing me at Iavors package. We should not have two  
different libraries for the same purpose.


value-supply-0.2 is about 6-7 times slower than GHC's UniqSupply, but  
porting Simons suggestion (to use unsafeDupableInterleaveIO) into  
Iavors code, GHC is only about twice as fast.


Can we do better? I tried to use SPECIALIZE INLINE pragmas (at  
new[Enum|Num]Supply for Int) and/or strictness annotations (for the  
unique value) without significant benefit.


I like Iavors more general interface (polymorphic values) and that  
only demanded identifiers are created. The latter saves half of the  
names when evaluating


 map supplyValue . split

I will replace my dependency to ghc with one to value-supply if  
unsafeDupableInterleaveIO is used instead of unsafeInterleaveIO where  
it is available.


Cheers,
Sebastian
___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: unique identifiers as a separate library

2008-12-22 Thread Iavor Diatchki
Hi,
Thanks for the feedback!  I have uploaded a new version of value
supply with the following changes that should improve performance.
  - It uses unsafeDupableInterleaveIO to avoid double locking,
  - Do not create an intermediate list for Enum and Num supples
  - Specialize code for Int, as this is the most commonly used type
for new name generation.

These together should make things much faster but I have not had a
chance to test that.  I looked at the core and it seemed that the
specializations were kicking in, so hopefully all is OK :)
It would be great if you could let me know if things are still slower
than the GHC version, and if so by how much.
Also, I made a git-hub repo for the code, in case anyone is interested
in contributing:
http://github.com/yav/value-supply/tree/master
-Iavor


On Mon, Dec 22, 2008 at 3:27 AM, Sebastian Fischer
s...@informatik.uni-kiel.de wrote:
 Thanks for pointing me at Iavors package. We should not have two different
 libraries for the same purpose.

 value-supply-0.2 is about 6-7 times slower than GHC's UniqSupply, but
 porting Simons suggestion (to use unsafeDupableInterleaveIO) into Iavors
 code, GHC is only about twice as fast.

 Can we do better? I tried to use SPECIALIZE INLINE pragmas (at
 new[Enum|Num]Supply for Int) and/or strictness annotations (for the unique
 value) without significant benefit.

 I like Iavors more general interface (polymorphic values) and that only
 demanded identifiers are created. The latter saves half of the names when
 evaluating

 map supplyValue . split

 I will replace my dependency to ghc with one to value-supply if
 unsafeDupableInterleaveIO is used instead of unsafeInterleaveIO where it is
 available.

 Cheers,
 Sebastian
 ___
 Glasgow-haskell-users mailing list
 Glasgow-haskell-users@haskell.org
 http://www.haskell.org/mailman/listinfo/glasgow-haskell-users

___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: unique identifiers as a separate library

2008-12-21 Thread Iavor Diatchki
Hello,
I have made the two changes that Simon suggested and uploaded a new
version of the library.   By the way, GHC seemed to work correctly
even without the extra boolean parameter, perhaps it treats
unsafePerformIO specially somehow?  A somewhat related question:  I
ended up using three calls to unsafeInterleaveIO which seems like a
bit much.  Could I have done it in a different way somehow?  This is
what I did:

gen r = do v - unsafeInterleaveIO (genSym r)
   ls - unsafeInterleaveIO (gen r)
   rs - unsafeInterleaveIO (gen r)
   return (Node v ls rs)

Note that a single unsafeInterleaveIO around the whole do block is not
quite right (this is what the code in the other package does) because
this will increment the name as soon as the generator object is
forced, and we want the name to be increment when the name is forced.

-Iavor




On Fri, Dec 19, 2008 at 1:24 AM, Simon Marlow marlo...@gmail.com wrote:
 Why not depend on this instead?

 http://hackage.haskell.org/cgi-bin/hackage-scripts/package/value-supply

 Looking at the code for this, I'm somewhat suspicious that it actually works
 with GHC:

  -- The extra argument to ``gen'' is passed because without
  -- it Hugs spots that the recursive calls are the same but does
  -- not know that unsafePerformIO is unsafe.
  where gen _ r = Node { supplyValue  = unsafePerformIO (genSym r),
 supplyLeft   = gen False r,
 supplyRight  = gen True r }

 even if that extra Bool argument is enough to fool Hugs, I wouldn't count on
 it being enough to fool GHC -O2!  You probably want to use
 unsafeInterleaveIO like we do in GHC's UniqSupply library.

 Also, I'd replace the MVar with an IORef and use atomicModifyIORef for
 speed.

 Cheers,
Simon

___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: unique identifiers as a separate library

2008-12-19 Thread Simon Marlow

a...@spamcop.net wrote:

G'day all.

Quoting Sebastian Fischer s...@informatik.uni-kiel.de:


I have wrapped up (a tiny subset of) GHC's uniques into the package
`uniqueid` and put it on Hackage:

http://hackage.haskell.org/cgi-bin/hackage-scripts/package/uniqueid


First off, thanks for this.


The main difference is due to my fear of depending on the foreign
function `genSymZh` which I replaced by a global counting IORef.


Why not depend on this instead?

http://hackage.haskell.org/cgi-bin/hackage-scripts/package/value-supply


Looking at the code for this, I'm somewhat suspicious that it actually 
works with GHC:


  -- The extra argument to ``gen'' is passed because without
  -- it Hugs spots that the recursive calls are the same but does
  -- not know that unsafePerformIO is unsafe.
  where gen _ r = Node { supplyValue  = unsafePerformIO (genSym r),
 supplyLeft   = gen False r,
 supplyRight  = gen True r }

even if that extra Bool argument is enough to fool Hugs, I wouldn't count 
on it being enough to fool GHC -O2!  You probably want to use 
unsafeInterleaveIO like we do in GHC's UniqSupply library.


Also, I'd replace the MVar with an IORef and use atomicModifyIORef for speed.

Cheers,
Simon
___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: unique identifiers as a separate library

2008-12-19 Thread Simon Marlow

Isaac Dupree wrote:

Sebastian Fischer wrote:

On Dec 17, 2008, at 10:54 AM, Sebastian Fischer wrote:

Would it be possible to put everything concerned with unique 
identifiers in GHC into a separate package on Hackage?



I have wrapped up (a tiny subset of) GHC's uniques into the package 
`uniqueid` and put it on Hackage:


thanks!

The main difference is due to my fear of depending on the foreign 
function `genSymZh` which I replaced by a global counting IORef.


which is its own risk.  maybe you should NOINLINE it?

Potential code criticisms / suggestions for it as a library:

Unboxed: so it only works on GHC, even though others have unsafe IO 
too.  In theory, strictness annotations should be able to achieve the 
same efficiency.


Char is supposed to represent a Unicode character -- but this code 
behaves oddly:

For 64-bit Int#, it does so.
For 32-bit Int#, it assumes Char is within the first 8 bits (ASCII and a 
little more).
If Int# (or Int) can be 30-bit (like Haskell98 permission), its 
correctness suffers even worse.
Is it really even a necessary part of the design?  The only way you 
provide to extract it or depend on its value is indirectly via the 
Show instance.  Its presence there is, in any case, at the cost of 
max. 2^24 (16 million) IDs before problems happen, whereas billions is 
still not a great limit but at least is somewhat larger. (applications 
that are long-running or deal with huge amounts of data could be affected)


unsafeDupableInterleaveIO: this Dupable was safe for GHC to use 
because GHC is single-threaded.  Is it safe in a library setting?


It would be safe if the genSym was atomic - so I recommend using 
atomicModifyIORef.


Cheers,
Simon

___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: unique identifiers as a separate library

2008-12-18 Thread Simon Marlow

Sebastian Fischer wrote:

for a project I am using the libraries Unique, UniqSupply, and UniqFM 
from the package ghc. It seems odd to have a dependency on a whole 
compiler only in order to use its (highly efficient and, hence, 
preferred) implementation of unique identifiers.


Would it be possible to put everything concerned with unique identifiers 
in GHC into a separate package on Hackage?


This may also allow me to get a fix for 
http://hackage.haskell.org/trac/ghc/ticket/2880 without reinstalling GHC.


Sure, that would be a useful chunk to separate out from GHC.  However, 
looking at the code I see that our unique supply monad is really not a lazy 
 monad at all:


thenUs :: UniqSM a - (a - UniqSM b) - UniqSM b
thenUs expr cont us
  = case (expr us) of { (result, us') - cont result us' }

which is strict, and even the lazy version:

lazyThenUs :: UniqSM a - (a - UniqSM b) - UniqSM b
lazyThenUs (USM expr) cont
  = USM (\us - let (result, us') = expr us in unUSM (cont result) us')

doesn't really split the supply, because it will force the left side as 
soon as the unique on the right side is demanded.


Given that our monad is strict, there's no need for it to use 
mkSplitUniqueSupply, it could just call genSym to create new uniques.  I 
notice there are other parts of the compiler that do make use of the lazy 
splittable unique supply in their own monads, but I'm not sure if they 
really need it.


Cheers,
Simon
___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: unique identifiers as a separate library

2008-12-18 Thread Sebastian Fischer

On Dec 17, 2008, at 10:54 AM, Sebastian Fischer wrote:

Would it be possible to put everything concerned with unique  
identifiers in GHC into a separate package on Hackage?



I have wrapped up (a tiny subset of) GHC's uniques into the package  
`uniqueid` and put it on Hackage:


http://hackage.haskell.org/cgi-bin/hackage-scripts/package/uniqueid

It only provides

type Id
hashedId :: Id - Int

type IdSupply
initIdSupply  :: Char - IO IdSupply
splitIdSupply :: IdSupply - (IdSupply,IdSupply)
idFromSupply  :: IdSupply - Id

instance Eq Id
instance Ord Id
instance Show Id

The main difference is due to my fear of depending on the foreign  
function `genSymZh` which I replaced by a global counting IORef.


The other difference is that the Show instance does not rely on GHC's  
static flags and can hence be used outside of GHC sessions.


The code is on github:

http://github.com/sebfisch/uniqueid

Extensions welcome!

Cheers,
Sebastian
___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: unique identifiers as a separate library

2008-12-18 Thread Isaac Dupree

Sebastian Fischer wrote:

On Dec 17, 2008, at 10:54 AM, Sebastian Fischer wrote:

Would it be possible to put everything concerned with unique 
identifiers in GHC into a separate package on Hackage?



I have wrapped up (a tiny subset of) GHC's uniques into the package 
`uniqueid` and put it on Hackage:


thanks!

The main difference is due to my fear of depending on the foreign 
function `genSymZh` which I replaced by a global counting IORef.


which is its own risk.  maybe you should NOINLINE it?

Potential code criticisms / suggestions for it as a library:

Unboxed: so it only works on GHC, even though others have 
unsafe IO too.  In theory, strictness annotations should be 
able to achieve the same efficiency.


Char is supposed to represent a Unicode character -- but 
this code behaves oddly:

For 64-bit Int#, it does so.
For 32-bit Int#, it assumes Char is within the first 8 bits 
(ASCII and a little more).
If Int# (or Int) can be 30-bit (like Haskell98 permission), 
its correctness suffers even worse.
Is it really even a necessary part of the design?  The only 
way you provide to extract it or depend on its value is 
indirectly via the Show instance.  Its presence there is, 
in any case, at the cost of max. 2^24 (16 million) IDs 
before problems happen, whereas billions is still not a 
great limit but at least is somewhat larger. (applications 
that are long-running or deal with huge amounts of data 
could be affected)


unsafeDupableInterleaveIO: this Dupable was safe for GHC 
to use because GHC is single-threaded.  Is it safe in a 
library setting?  I guess likewise, the IORef global 
variable wouldn't be thread-safe... but this one isn't even 
safe between separate runs of initIdSupply.  On the other 
hand, thread-safety probably makes it much less efficient 
(if you can find a way to use atomic int CPU instructions, 
it might not be too bad, or else per-thread counters... or 
just declare how unsafe it is)


unsafePerformIO: it's not totally necessary here.  Its only 
function is to make IDs generated by different runs of 
initIdSupply be distinct.  So it could, anyway, probably be 
refactored to only use unsafePerformIO global-ness once per 
initIdSupply and just use unsafeInterleaveIO within (where 
currently nextInt is called).


-Isaac
___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: unique identifiers as a separate library

2008-12-18 Thread ajb

G'day all.

Quoting Sebastian Fischer s...@informatik.uni-kiel.de:


I have wrapped up (a tiny subset of) GHC's uniques into the package
`uniqueid` and put it on Hackage:

http://hackage.haskell.org/cgi-bin/hackage-scripts/package/uniqueid


First off, thanks for this.


The main difference is due to my fear of depending on the foreign
function `genSymZh` which I replaced by a global counting IORef.


Why not depend on this instead?

http://hackage.haskell.org/cgi-bin/hackage-scripts/package/value-supply

Cheers,
Andrew Bromage
___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


unique identifiers as a separate library

2008-12-17 Thread Sebastian Fischer

Hello,

for a project I am using the libraries Unique, UniqSupply, and UniqFM  
from the package ghc. It seems odd to have a dependency on a whole  
compiler only in order to use its (highly efficient and, hence,  
preferred) implementation of unique identifiers.


Would it be possible to put everything concerned with unique  
identifiers in GHC into a separate package on Hackage?


This may also allow me to get a fix for http://hackage.haskell.org/trac/ghc/ticket/2880 
 without reinstalling GHC.


Cheers,
Sebastian

___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users