[Haskell-cafe] ByteString missing rewrite RULES (zipWith' f a = pack . zipWith f a)

2010-10-05 Thread Thomas DuBuisson
All,

(I notice ByteString still isn't under l...@h.o ownership, which is good
because this way I can avoid the bureaucracy and e-mail the
maintainers directly)

The following is a Data.ByteString comment for the (non-exported)
function zipWith'
--
-- | (...) Rewrite rules
-- are used to automatically convert zipWith into zipWith' when a pack is
-- performed on the result of zipWith.
--

This implies there should be a rule:
{-# RULES
"ByteString specialise zipWith'" forall (f :: Word8 -> Word8 -> Word8) p q .
    zipWith' f p q = pack (zipWith f p q)
  #-}

But no such rule exists in the ByteString source (the inverse rule
using 'unpack' does exist).
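
For concreteness, the pattern the comment describes is a pack applied to the
result of zipWith; here is a made-up example in the style crypto-api needs
(byte-wise XOR, written the naive way the rule is supposed to fuse):

import Data.Bits (xor)
import qualified Data.ByteString as B

-- Hypothetical call site: XOR two ByteStrings byte-for-byte using the
-- pack-of-zipWith pattern that the documented rewrite should turn into
-- the fused, list-free zipWith'.
xorBytes :: B.ByteString -> B.ByteString -> B.ByteString
xorBytes a b = B.pack (B.zipWith xor a b)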

1) Is this an omission?  Can we fix it?  It's a rather important rule
for crypto-api.
2) Can we export zipWith' so people can be explicit?  If not, can we
get the comment about the rule placed somewhere so it will make its
way to the generated Haddock documentation for general users?

3) A very different issue:
Could Data.ByteString.Lazy export hGetN, or have defaultChunkSize
configurable via a CPP/compile-time macro?

If not, perhaps we could make chunkOverhead = max 16 (2 * sizeOf
(undefined :: Int)) so it will be the same on 64- and 32-bit systems (a
128-bit boundary, which is nice and fast for most modern cipher
algorithms; sadly, asking for it to match hash block sizes is a bit
much).
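
For reference, this is roughly what the relevant definitions in
Data.ByteString.Lazy.Internal look like today, next to the proposed tweak
(paraphrased from the bytestring source; chunkOverhead' is just my name for
the proposal):

import Foreign.Storable (sizeOf)

-- Roughly the current definitions:
defaultChunkSize :: Int
defaultChunkSize = 32 * 1024 - chunkOverhead   -- a bit under 32K

chunkOverhead :: Int
chunkOverhead = 2 * sizeOf (undefined :: Int)  -- 8 bytes on 32-bit, 16 on 64-bit

-- The proposal: never less than a 128-bit (16-byte) boundary.
chunkOverhead' :: Int
chunkOverhead' = max 16 (2 * sizeOf (undefined :: Int))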

Cheers,
Thomas


Re: [Haskell-cafe] ByteString missing rewrite RULES (zipWith' f a = pack . zipWith f a)

2010-10-05 Thread Jason Dusek
On Tue, Oct 5, 2010 at 18:07, Thomas DuBuisson
<thomas.dubuis...@gmail.com> wrote:
> If not, perhaps we could make chunkOverhead = max 16 (2 *
> sizeOf (undefined ::Int)) so it will be the same on 64 and 32
> bit systems (a 128 bit boundary, nice and fast for most modern
> cipher algorithms, sadly asking for it to match hash block
> sizes is a bit much).

  I don't have a horse in this race; but I am curious as to why
  you wouldn't ask for `chunkOverhead = 16' as that seems to be
  your intent as well as what the expression works out to on any
  machine in common use.

--
Jason Dusek
Linux User #510144 | http://counter.li.org/


Re: [Haskell-cafe] ByteString missing rewrite RULES (zipWith' f a = pack . zipWith f a)

2010-10-05 Thread Thomas DuBuisson
>  I don't have a horse in this race; but I am curious as to why
>  you wouldn't ask for `chunkOverhead = 16' as that seems to be
>  your intent as well as what the expression works out to on any
>  machine in common use.

To avoid copying data when performing FFI calls to common cipher routines
(such operations usually work on 128-bit blocks).

If you have a Haskell program performing full disk encryption (FDE),
it's reasonable to expect large amounts of data to need encrypting and
decrypting.  Reading lazy ByteStrings, you get strict chunks of 32K -
chunkOverhead bytes, which on a 32-bit machine is a multiple of 64 bits
but not of 128.  In other words, for an operation like cbc key iv lazyBS
you will: 1) encrypt 32K - 16 bytes, 2) copy the remainder (8 bytes) and
the next chunk (32K - 8 bytes) into a new strict ByteString, 3) encrypt
that full 32K, and 4) repeat.
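
Roughly, the dance looks like this (a simplified sketch: encryptFullBlocks
stands in for the stateful FFI call, and IV/chaining and final-block padding
are ignored):

import qualified Data.ByteString as B
import qualified Data.ByteString.Lazy as L

blockSize :: Int
blockSize = 16  -- 128-bit cipher block

-- Walk the lazy chunks, carrying the unaligned tail of each chunk into
-- the next one so every call to the cipher sees whole blocks only.
encryptChunked :: (B.ByteString -> B.ByteString) -> L.ByteString -> [B.ByteString]
encryptChunked encryptFullBlocks = go B.empty . L.toChunks
  where
    go leftover [] =
      -- Real code would pad a trailing partial block; here we assume the
      -- total length is a multiple of blockSize.
      [encryptFullBlocks leftover | not (B.null leftover)]
    go leftover (c:cs) =
      let buf         = leftover `B.append` c  -- the copy we'd like to avoid
          usable      = (B.length buf `div` blockSize) * blockSize
          (now, rest) = B.splitAt usable buf
      in encryptFullBlocks now : go rest cs

With a 16-byte chunkOverhead the leftover stays empty and the B.append copy
never happens; with the current 8-byte overhead on 32-bit systems, every
other chunk gets recopied in full.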

There are other ways to do it, but the fastest ones involve either making
your delicate and extremely security-sensitive cipher algorithm work on
partial blocks, or building the notion of a linked list of buffers (lazy
ByteStrings) into the implementation (which is often in C).

Unfortunately, this problem only gets worse as you expand the scope.
Hash algorithms have a much wider array of block sizes (512 to 1024
bits are very common), and we don't want to waste 1024 - 64 bits of
overhead per 32KB chunk, so I didn't request that.  In situations where
people know they'll be hashing large files and are explicitly using
lazy ByteStrings, they could use hGetN to set the chunk size to
something agreeable.
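
For example, here is a rough, eager stand-in for what the (currently
non-exported) hGetN does, assuming its argument order is chunk size, handle,
byte count, with the chunk size picked as an exact multiple of a 1024-bit
(128-byte) hash block:

import System.IO (Handle)
import qualified Data.ByteString as B
import qualified Data.ByteString.Lazy as L

-- Stand-in sketch for hGetN: read up to n bytes from h as a lazy
-- ByteString built from chunks of size k.
hGetN :: Int -> Handle -> Int -> IO L.ByteString
hGetN k h n
  | n <= 0    = return L.empty
  | otherwise = do
      c <- B.hGet h (min k n)
      if B.null c
        then return L.empty
        else do rest <- hGetN k h (n - B.length c)
                return (L.fromChunks [c] `L.append` rest)

-- Chunk size chosen as a multiple of 128 bytes, friendly to hash blocks.
readForHashing :: Handle -> Int -> IO L.ByteString
readForHashing = hGetN (64 * 1024)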

A less programmer-intensive solution would be to make chunks a full
32K.  I'm not sure how much of a performance problem this would
introduce (for all users of bytestring) due to caching (or other
issues).  Did anyone measure this when the initial implementation
settled on the current scheme?

Cheers,
Thomas


Re: [Haskell-cafe] ByteString missing rewrite RULES (zipWith' f a = pack . zipWith f a)

2010-10-05 Thread Thomas DuBuisson
>  I don't have a horse in this race; but I am curious as to why
>  you wouldn't ask for `chunkOverhead = 16' as that seems to be
>  your intent as well as what the expression works out to on any
>  machine in common use.

Sorry, only after sending my long explanation do I see what you are
really asking.  I was going on the assumption that someone really did
measure and find that keeping the length and pointer information in the
same page as the ByteString data is a significant win.  While saying
chunkOverhead = 16 would still work today, it's simply false for
hypothetical 128-bit Haskell machines (Cell SPEs?), and I don't like
betting against commercial changes in computing.
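
That is, the formula and the constant agree on today's machines but diverge
on a wider one (the 128-bit case is hypothetical):

import Foreign.Storable (sizeOf)

chunkOverhead :: Int
chunkOverhead = max 16 (2 * sizeOf (undefined :: Int))
-- 32-bit Int:               max 16 (2 * 4)  == 16
-- 64-bit Int:               max 16 (2 * 8)  == 16
-- hypothetical 128-bit Int: max 16 (2 * 16) == 32  (a hard-coded 16 would be wrong)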

Cheers,
Thomas