Re: [GHC] #3480: Easily make Typeable keys pure, so that Typeable can be handled efficiently across communications

GHC Wed, 04 Nov 2009 10:58:01 -0800

#3480: Easily make Typeable keys  pure, so that Typeable can be handled
efficiently across communications
-------------------------------------+--------------------------------------
    Reporter:  guest                 |        Owner:                  
        Type:  task                  |       Status:  new             
    Priority:  normal                |    Milestone:  6.14.1          
   Component:  libraries/base        |      Version:                  
    Severity:  trivial               |   Resolution:                  
    Keywords:  Typeable, efficiency  |   Difficulty:  Unknown         
    Testcase:                        |           Os:  Unknown/Multiple
Architecture:  Unknown/Multiple      |  
-------------------------------------+--------------------------------------
Old description:


> Data.Typeable:  Easily make Typeable keys  pure(used in Eq), so that
> Typeable keys don´t vary from run to run. This permits an efficient
> storage of the keys in files and to be transmitted trough communications
> as well as processed without loss of efficiency. Actually gaining
> efficiency probably, since the keys caches are not necessary.
>
> Currently, whenever the user needs to communicate types, he must transmit
> the full string name for each type. Moreover, in the reception, the
> programmer  is forced to handle these full string keys for mapping types
> to handlers, in equality checks etc. if the type keys are pure, the
> efficiency of key handling can be keept across communications.
>
> short description of task:
> Istead of using a Hash( stringType, generatedKey) use  hashString
> (string-of-type)
>
> Long description:
>
> 1) drop the cache
> drop newKey
>
> 2) use instead  the expression:
>
>  Key $ hashString str
>
> whenever a new key is needed
>
> from the package Data.HashTable:
> hashString :: String -> Int
>
> the key obtained is pure so:
>
> 3) drop the "IO" in typeRepKey signature

New description:

 Data.Typeable:  Easily make Typeable keys  pure(used in Eq), so that
 Typeable keys don´t vary from run to run. This permits an efficient
 storage of the keys in files and to be transmitted trough communications
 as well as processed without loss of efficiency. Actually gaining
 efficiency probably, since the keys caches are not necessary.

 Currently, whenever the user needs to communicate types, he must transmit
 the full string name for each type. Moreover, in the reception, the
 programmer  is forced to handle these full string keys for mapping types
 to handlers, in equality checks etc. if the type keys are pure, the
 efficiency of key handling can be keept across communications.

 short description of task:
 Istead of using a Hash( stringType, generatedKey) use  hashString (string-
 of-type)

 Long description:

 1) drop the cache; drop newKey

 2) use instead  the expression:
 {{{
  Key $ hashString str
 }}}
 whenever a new key is needed from the package `Data.HashTable`:
 {{{
 hashString :: String -> Int
 }}}
 the key obtained is pure so:

 3) drop the "IO" in `typeRepKey` signature

Comment (by simonpj):

 Something that does a better job of `TypeRep` would certainly be welcome:

   1. Those `unsafePerformIO`s are embarrassing.  And indeed they lead to
 trouble if (as can happen in GHCi or Template Haskell) you get two
 instances of that hash table.

   2. If you were (naughtily) to serialise one of those keys into a file,
 they'd be wrong when you read them back in.  But this never happens (see
 the `Show` instance for `TypeRep`: what you serialise is the `TypeRep`
 data structure ''without'' the keys.  When it's read back in a new set of
 keys is constructed.  (Actually there is no `Read` instance but that is
 what would happen if there were.)

 The keys are ''solely'' used to speed up comparison.  Comparing the data
 structures and strings is the underlying ground truth.  So it's perfectly
 fine for keys to vary from run to run.

 However, it's vital that if two `TypeRep`s with the same key really are
 equal.  Hashing the string to an `Int` does not guarantee that.  To be
 sure, you'd need a lot more bits, something like the fingerprints we store
 in interface files.  See `compiler/utils/Fingerprint.hsc`.

 I'm not sure about the best approach.  The baseline is:
   * Compare `TypeRep` structures as one would any data structure.
   * At a `TyCon` compare the `String` representation of the type
 constructor.

 An alternative, that would do the fingerprint stuff once and for all at
 compile time, is:
   * Compare `TypeRep` structures as one would any data structure.
   * At a `TyCon` compare the fingerprint of the string representation of
 the `TyCon`

 Neither of these give constant-time comparison of `TypeRep`s, but I doubt
 that such comparisons are going to be in the inner loop of any program.
 And both fix (1) above

 Other things to think about:
   * Surely `TypeRep` should be in `Ord`!

   * We should check that the `String` used to make a `Typeable.TyCon`
 includes the full, versioned, package Id.

   * Even then I'm slightly worried.  The expection is that `foo-3.2:Foo.T`
 uniquely defines the entire structure of `T`. But what if you forget to
 bump the package version?

   * More subtly  what if `T` is defined thus:
 {{{
 data T = MkT S   -- S comes from package bar
 }}}
     Then if S's definition in package `bar` changes, but package `foo` is
 unchanged, the `TyCon` for `T` should really change too.  Otherwise you
 might serialise a value of type `T` into a file along with its `TypeRep`,
 and try to read it back in with the wrong deserialiser.  I'm not sure
 whether the package versioning rules force `foo`'s version to be bumped.
 Maybe we want `foo`'s new `PackageId`?

   * But in fact the `PackageId` is perhaps ''too'' fine-grained, becuase
 it changes whenever anything in package `foo` changes.  We really only
 want the `Typeable.TyCon` for `T` to change when the structure of `T`
 (transitively) changes.  Aha!  We already have fingerprints for that, used
 for versioning in interface files.  So perhpas, once we've calculated
 `T`'s fingerprint we can use that in the `Typeable.TyCon` for `T`.  That
 seems to me to be best story.

 We could do with someone who'd like to follow these things up.  Dynamic
 typing is important.

-- 
Ticket URL: <http://hackage.haskell.org/trac/ghc/ticket/3480#comment:2>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler

_______________________________________________
Glasgow-haskell-bugs mailing list
Glasgow-haskell-bugs@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-bugs

Re: [GHC] #3480: Easily make Typeable keys pure, so that Typeable can be handled efficiently across communications

Reply via email to