Re: [Biohaskell] enumeration types and performance

Christian Hoener zu Siederdissen Thu, 14 Jul 2011 10:44:26 -0700

Hi,

just so that everybody knows the background of my use of newtypes: in
RNA folding I used to represent nucleotides using nullary constructors
and got a very nice speedup by switching to newtypes. But, of course,
I had billions of accesses (n^3, n ~= 1000).
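
A minimal sketch of the two representations, for illustration only.
All names here (NucADT, nucA, toNuc, canPair, ...) are made up for this
example and the encoding 0..3 is an assumption, not the code from the
folding program:

```haskell
-- Enumeration with nullary constructors: readable, but each value is
-- a pointer to a constructor closure.
data NucADT = A | C | G | U
  deriving (Eq, Ord, Show, Enum, Bounded)

-- Newtype over Int: the wrapper does not exist at runtime, so GHC can
-- work with the underlying Int (and unbox it in strict code).
newtype Nuc = Nuc Int
  deriving (Eq, Ord, Show)

-- Hypothetical encoding: A=0, C=1, G=2, U=3.
nucA, nucC, nucG, nucU :: Nuc
nucA = Nuc 0
nucC = Nuc 1
nucG = Nuc 2
nucU = Nuc 3

-- Conversions for the boundary of the program, where readability
-- matters more than speed.
toNuc :: NucADT -> Nuc
toNuc = Nuc . fromEnum

fromNuc :: Nuc -> Maybe NucADT
fromNuc (Nuc i)
  | i >= 0 && i <= fromEnum (maxBound :: NucADT) = Just (toEnum i)
  | otherwise                                    = Nothing

-- The kind of test that sits in the inner loop: with the encoding
-- above, A-U and C-G sum to 3 and the G-U wobble pair sums to 5.
canPair :: Nuc -> Nuc -> Bool
canPair (Nuc a) (Nuc b) = s == 3 || s == 5
  where s = a + b
```

Whether this is actually faster than the plain enumeration depends on
the GHC version and the program, so it should be measured.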

For representing large sequences (laziness, etc.), one can use any
representation for the strand information.

Ketil's basic typedefs should suit us well for sequence things. I will
have a type class MkPrimary with (mkPrimary :: SeqData -> VU.Vector
Nuc), for example.
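
A rough sketch of how such a class could look. Assumptions: SeqData is
taken to be a lazy ByteString (a stand-in alias is defined here), Nuc
is a newtype over Int, charToNuc and its encoding are hypothetical, and
the Unbox instance is derived the way the vector documentation
describes, which needs vector 0.13 or later on a recent GHC:

```haskell
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
{-# LANGUAGE MultiParamTypeClasses      #-}
{-# LANGUAGE StandaloneDeriving         #-}
{-# LANGUAGE TypeFamilies               #-}

import           Data.Char                   (toUpper)
import qualified Data.ByteString.Lazy.Char8  as BL
import qualified Data.Vector.Generic         as VG
import qualified Data.Vector.Generic.Mutable as VGM
import qualified Data.Vector.Unboxed         as VU

-- Stand-in for the sequence type of the basic typedefs (assumption).
type SeqData = BL.ByteString

newtype Nuc = Nuc Int
  deriving (Eq, Ord, Show)

-- Unboxed vectors of Nuc reuse the representation of vectors of Int.
newtype instance VU.MVector s Nuc = MV_Nuc (VU.MVector s Int)
newtype instance VU.Vector    Nuc = V_Nuc  (VU.Vector    Int)
deriving instance VGM.MVector VU.MVector Nuc
deriving instance VG.Vector   VU.Vector  Nuc
instance VU.Unbox Nuc

-- Anything that can be turned into a primary structure.
class MkPrimary a where
  mkPrimary :: a -> VU.Vector Nuc

-- Instance for SeqData (written on the underlying type, since plain
-- Haskell 2010 does not allow type synonyms in instance heads).
instance MkPrimary BL.ByteString where
  mkPrimary = VU.fromList . map charToNuc . BL.unpack

-- Hypothetical encoding: A=0, C=1, G=2, U/T=3, anything else=4.
charToNuc :: Char -> Nuc
charToNuc c = case toUpper c of
  'A' -> Nuc 0
  'C' -> Nuc 1
  'G' -> Nuc 2
  'U' -> Nuc 3
  'T' -> Nuc 3
  _   -> Nuc 4
```

The lazy input is consumed once and the result is a strict, unboxed
vector, so the folding code never touches the lazy representation.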

And maybe this is where one can say the following: if it is
performance critical (nucleotides in RNA folding), one should be able
to define a 'Prim' instance using rl's primitive package.
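
As a sketch only: for a newtype over Int, the 'Prim' instance does not
have to be written by hand, it can be derived from the one for Int with
GeneralizedNewtypeDeriving. This assumes a reasonably recent primitive
(one that ships Data.Primitive.PrimArray); Nuc and countNuc are
hypothetical names for the example:

```haskell
{-# LANGUAGE GeneralizedNewtypeDeriving #-}

import Data.Primitive.PrimArray
  (PrimArray, indexPrimArray, primArrayFromList, sizeofPrimArray)
import Data.Primitive.Types (Prim)

-- Eq, Ord and Show are derived as usual; Prim is taken over from the
-- underlying Int via newtype deriving.
newtype Nuc = Nuc Int
  deriving (Eq, Ord, Show, Prim)

-- Count occurrences of one nucleotide in an unboxed primitive array.
countNuc :: Nuc -> PrimArray Nuc -> Int
countNuc x arr =
  length [ () | i <- [0 .. sizeofPrimArray arr - 1]
              , indexPrimArray arr i == x ]

-- A small example sequence, stored as raw Ints without any boxing.
example :: PrimArray Nuc
example = primArrayFromList [Nuc 0, Nuc 2, Nuc 2, Nuc 3]
```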

If performance is not that critical, use whatever is most convenient?!

Btw., if you need performance, consider staying away from some types
like Word. There are fun open bugs where Int is 2.5x faster than
Word. ;-)

Gruss,
Christian

>On Thu, Jul 14, 2011 at 11:29 AM, Christian Höner zu Siederdissen
><[email protected]> wrote:
>> Hi,
>>
>> newtype Strand = Strand Int
>>
>> uses a single-constructor datatype "Int" as strand repr.
>>
>> While "Bool" is algebraic with two constructors. This is not
>> optimized completely. Or maybe "was", I don't know the current
>> status of GHC regarding this but I think it is still open.
>
>Hmmm, using Int may or may not be better.  Or maybe Word8, since we
>only need one bit.
>
>> So "True" is a pointer toward a global "True" with an indirection,
>> while "true = Strand 0" would be an actual "0 :: Int#".
>
>If your data type has something like
>
>  data X = X {..., strand :: !Strand, ...}
>
>then, although not unpacked, the strand will always be a pointer
>without indirections (i.e. not a thunk), right?
>
>> And at least 1 year ago, I had much better performance using
>> newtypes of Ints instead of "data Nuc = A | C | G | U"
>
>I've seen this as well, but with an enumeration of more than 20
>constructors.  But it's not like Strand is the bottleneck of some
>application, I think.  My concern is about losing readability without
>gaining anything measurable in real tests.  ADTs are really nice =).
>
>Cheers, =D
>
>-- 
>Felipe.
_______________________________________________
Biohaskell mailing list
[email protected]
http://malde.org/cgi-bin/mailman/listinfo/biohaskell
