Re: [Haskell-cafe] two problems with Data.Binary and Data.ByteString

2008-11-05 Thread Don Stewart
kr.angelov:
> On Wed, Aug 13, 2008 at 1:18 AM, Don Stewart <[EMAIL PROTECTED]> wrote:
> >instance Binary a => Binary [a] where
> >put l  = put (length l) >> mapM_ put l
> >get= do n <- get :: Get Int
> >replicateM n get
> 
> Of course I changed this as well. Now it is:
> 
> instance (Ord k, Binary k, Binary e) => Binary (Map.Map k e) where
> put m = put (Map.size m) >> mapM_ put (Map.toAscList m)
> get   = liftM Map.fromDistinctAscList get
> 
> You don't have to convert the map to list just to compute its size.
> The Map.size is a O(1) function.

If you have a more efficient instance Binary Map, please send a patch.

Collaborate!

-- Don
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] two problems with Data.Binary and Data.ByteString

2008-11-05 Thread Don Stewart
kr.angelov:
> I had the same problem (stack overflow). The solution was to change
> the >>= operator in the Get monad. Currently it is:
> 
>   m >>= k   = Get (\s -> let (a, s') = unGet m s
>in unGet (k a) s')
> 
> but I changed it to:
> 
> m >>= k   = Get (\s -> case unGet m s of
>  (a, s') -> unGet (k a) s')
> 
> It seems that the bind operator is lazy and this caused the stack overflow.

Hmm. That's interesting. I'm not sure that doesn't change other things
we rely on though.
  
> I have also another problem. Every Int and Word is stored as 64-bit
> value and this expands the output file a lot. I have a lot of integers
> and most of them are < 128 but not all of them. I changed the
> serialization so that the Int and Word are serialized in a variable
> number of bytes. Without this change the binary serialization was even
> worse than the textual serialization that we had before. The file was
> almost full with zeros.

The motivation for this is to use zlib compress / decompress.
E.g.

writeFile "f" . compress . encode $ foo
  
> I just haven't time to prepare a patch and to send it for review but
> if other people have the same problem I will do it.
> 

Patches welcome. You shouldn't need to patch a library like this, it
should be able to do what you need.

-- Don
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] two problems with Data.Binary and Data.ByteString

2008-11-05 Thread Krasimir Angelov
I had the same problem (stack overflow). The solution was to change
the >>= operator in the Get monad. Currently it is:

  m >>= k   = Get (\s -> let (a, s') = unGet m s
   in unGet (k a) s')

but I changed it to:

m >>= k   = Get (\s -> case unGet m s of
 (a, s') -> unGet (k a) s')

It seems that the bind operator is lazy and this caused the stack overflow.

I have also another problem. Every Int and Word is stored as 64-bit
value and this expands the output file a lot. I have a lot of integers
and most of them are < 128 but not all of them. I changed the
serialization so that the Int and Word are serialized in a variable
number of bytes. Without this change the binary serialization was even
worse than the textual serialization that we had before. The file was
almost full with zeros.

I just haven't time to prepare a patch and to send it for review but
if other people have the same problem I will do it.


Best Regars,
   Krasimir




On Wed, Aug 13, 2008 at 1:13 AM, Tim Newsham <[EMAIL PROTECTED]> wrote:
> I have a program that read in and populated a large data structure and
> then saved it out with Data.Binary and Data.ByteString.Lazy.Char8:
>
>   saveState db = B.writeFile stateFile =<<
>   encode <$> atomically (readTVar db)
>
> when I go to read this in later I get a stack overflow:
>
> loadState db = do
>d <- decode <$> B.readFile stateFile
>atomically $ writeTVar db d
>
>  Stack space overflow: current size 8388608 bytes.
>  Use `+RTS -Ksize' to increase it.
>
> or from ghci:
>
>d <- liftM decode
>  (Data.ByteString.Lazy.Char8.readFile
> "savedState.bin") :: IO InstrsDb
>
>fromList *** Exception: stack overflow
>
> The data type I'm storing is a Map (of maps):
>
>   type DailyDb = M.Map Date Daily
>   type InstrsDb = M.Map String DailyDb
>
> What's going on here?  Why is the system capable of building and saving
> the data but not in reading and umarhsalling it?  What is the proper way
> to track down where the exception is happening?  Any debugging tips?
>
> I also noticed another issue while testing.  If my program loads
> the data at startup by calling loadState then all later calls to
> saveState give an error:
>
>  Log: savedState.bin: openFile: resource busy (file is locked)
>
> this does not occur if the program wasnt loaded.  My best guess here
> is that B.readFile isnt completing and closing the file for some
> reason.  Is there a good way to force this?
>
> Tim Newsham
> http://www.thenewsh.com/~newsham/
> ___
> Haskell-Cafe mailing list
> Haskell-Cafe@haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe
>
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] two problems with Data.Binary and Data.ByteString

2008-11-05 Thread Krasimir Angelov
On Wed, Aug 13, 2008 at 1:18 AM, Don Stewart <[EMAIL PROTECTED]> wrote:
>instance Binary a => Binary [a] where
>put l  = put (length l) >> mapM_ put l
>get= do n <- get :: Get Int
>replicateM n get

Of course I changed this as well. Now it is:

instance (Ord k, Binary k, Binary e) => Binary (Map.Map k e) where
put m = put (Map.size m) >> mapM_ put (Map.toAscList m)
get   = liftM Map.fromDistinctAscList get

You don't have to convert the map to list just to compute its size.
The Map.size is a O(1) function.
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] two problems with Data.Binary and Data.ByteString

2008-11-05 Thread Ketil Malde
Ketil Malde <[EMAIL PROTECTED]> writes:

> Doing 'x <- decodeFile "/dev/zero"

Well, it turns out 'decodeFile' needs to -- or does, anyway -- check
whether the file is empty.  Replacing it with a combination of
'decode' and 'readFile' solved the problem.

Thanks to Saizan and the other people hanging around on #haskell.

-k
-- 
If I haven't seen further, it is by standing in the footprints of giants
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] two problems with Data.Binary and Data.ByteString

2008-11-05 Thread Ketil Malde

Old threads never die:

Tim Newsham <[EMAIL PROTECTED]> writes:

>> Chunk = {
>>length :: Word8
>>elems :: [Elem]  --  0..255 repetitions
>>  }
>> Chunks = [Chunk] -- terminated with the first 0 length Chunk

> I tried my hand at the encoding above:
>
> http://www.thenewsh.com/%7Enewsham/store/test10.hs
>
> it seems to work, although it doesn't seem to be very efficient.
> I'm getting very large memory growth when I was hoping it
> would be lazy and memory efficient...  What's leaking?

Did you ever get to the bottom of this?

I have a similar problem with Data.Binary that I don't know how to
work around yet.  It boils down to reading a large list.  This
exhibits the problem:

  newtype Foo = Foo [Word8]
  instance Binary Foo where
  get = do 
xs <- replicateM 1000 get
return (Foo xs)

Doing 'x <- decodeFile "/dev/zero" and "case x of Foo y -> take 10 y"
blows the heap.  I thought Data.Binary was lazy?

My actual program looks something like this:

  instance Binary MyData where
 get = do
 header <- get
 data   <- replicateM (data_length header) $ do 
  stuff to read a data item
 return (MyData header data)

This blows the stack as soon as I try to access anything, even if it's
just the contents of the header.  Why?

My understanding of how Data.Binary works must be sorely lacking.
Could some kind soul please disperse some enlightenment in my
direction? 

-k
-- 
If I haven't seen further, it is by standing in the footprints of giants
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] two problems with Data.Binary and Data.ByteString

2008-08-26 Thread Duncan Coutts
On Tue, 2008-08-26 at 15:31 -0700, Bryan O'Sullivan wrote:
> On Tue, Aug 26, 2008 at 3:04 PM, Don Stewart <[EMAIL PROTECTED]> wrote:
> 
> > No, since I can get whnf with `seq`. However, that does sound like a
> > good idea (a patch to the parallel library? )
> 
> I suspect that patching parallel doesn't scale. It doesn't have a
> maintainer, so it will be slow, and the package will end up dragging
> in everything under the sun if we centralise instances in there. I
> think that the instance belongs in bytestring instead. I know that
> this would make everything depend on parallel, but that doesn't seem
> as bad a problem.

This is a general problem we have with packages and instances. Perhaps
in this specific case it wouldn't cause many problems to make bytestring
depend on parallel (though it means bytestring cannot be a boot lib and
cannot be used to implement basic IO) but in general it can be a
problem. I can't see any obvious solutions. We don't want lots of tiny
packages that just depend on two other packages and define a instance.

Duncan

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] two problems with Data.Binary and Data.ByteString

2008-08-26 Thread Bryan O'Sullivan
On Tue, Aug 26, 2008 at 3:04 PM, Don Stewart <[EMAIL PROTECTED]> wrote:

> No, since I can get whnf with `seq`. However, that does sound like a
> good idea (a patch to the parallel library? )

I suspect that patching parallel doesn't scale. It doesn't have a
maintainer, so it will be slow, and the package will end up dragging
in everything under the sun if we centralise instances in there. I
think that the instance belongs in bytestring instead. I know that
this would make everything depend on parallel, but that doesn't seem
as bad a problem.
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] two problems with Data.Binary and Data.ByteString

2008-08-26 Thread Don Stewart
bos:
> On Mon, Aug 25, 2008 at 2:28 PM, Don Stewart <[EMAIL PROTECTED]> wrote:
> 
> > I've pushed a decodeFile that does a whnf on the tail after decoding.
> 
> Does this mean that there are now NFData instances for bytestrings?
> That would be handy.

No, since I can get whnf with `seq`. However, that does sound like a
good idea (a patch to the parallel library? )
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] two problems with Data.Binary and Data.ByteString

2008-08-26 Thread Bryan O'Sullivan
On Mon, Aug 25, 2008 at 2:28 PM, Don Stewart <[EMAIL PROTECTED]> wrote:

> I've pushed a decodeFile that does a whnf on the tail after decoding.

Does this mean that there are now NFData instances for bytestrings?
That would be handy.
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] two problems with Data.Binary and Data.ByteString

2008-08-25 Thread Don Stewart
duncan.coutts:
> On Thu, 2008-08-14 at 10:21 -0700, Don Stewart wrote:
> 
> > > I think you're right. The Binary instances cannot and must not read more
> > > than they need to, so that gives us the behaviour that we read exactly
> > > the length of the file, but no more, and thus we never hit EOF, so we
> > > don't close the file. So yes, decode should force the tail so that it
> > > can indeed hit EOF.
> > 
> > Duncan, 
> > 
> > You're suggesting that decode and decodeFile should whnf the next cell?
> 
> At least decodeFile should, since it doesn't give you any other access
> to the file handle otherwise.
> 
> Does decode return the tail? I don't remember. If not it should also
> whnf it. If it does then the user can choose (they might want to do
> something else with the trailing data).

I've pushed a decodeFile that does a whnf on the tail after decoding.
If you're at the end of the file, that's sufficient to close the Handle.

You'll also need bytestring >= 0.9.1.0 (note, not the one that comes by
default with ghc 6.8.x)

-- Don
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] two problems with Data.Binary and Data.ByteString

2008-08-14 Thread Tim Newsham

Maybe it makes sense to have the streamble list instance in Binary as
well, with some examples?



Chunk = {
   length :: Word8
   elems :: [Elem]  --  0..255 repetitions
 }
Chunks = [Chunk] -- terminated with the first 0 length Chunk


I tried my hand at the encoding above:

http://www.thenewsh.com/%7Enewsham/store/test10.hs

it seems to work, although it doesn't seem to be very efficient.
I'm getting very large memory growth when I was hoping it
would be lazy and memory efficient...  What's leaking?


-- Don


Tim Newsham
http://www.thenewsh.com/~newsham/
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] two problems with Data.Binary and Data.ByteString

2008-08-14 Thread Duncan Coutts
On Thu, 2008-08-14 at 10:21 -0700, Don Stewart wrote:

> > I think you're right. The Binary instances cannot and must not read more
> > than they need to, so that gives us the behaviour that we read exactly
> > the length of the file, but no more, and thus we never hit EOF, so we
> > don't close the file. So yes, decode should force the tail so that it
> > can indeed hit EOF.
> 
> Duncan, 
> 
> You're suggesting that decode and decodeFile should whnf the next cell?

At least decodeFile should, since it doesn't give you any other access
to the file handle otherwise.

Does decode return the tail? I don't remember. If not it should also
whnf it. If it does then the user can choose (they might want to do
something else with the trailing data).

Duncan

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] two problems with Data.Binary and Data.ByteString

2008-08-14 Thread Don Stewart
duncan.coutts:
> On Wed, 2008-08-13 at 12:02 -1000, Tim Newsham wrote:
> 
> > However, I think probably the real blame here should probably go
> > to Data.Binary which doesn't attempt to check that it has consumed
> > all of its input after doing a "decode".  If "decode" completes
> > and there is unconsumed data, it should probably raise an error
> > (it already raises errors for premature EOF).  There's no reason
> > for it not to, since it does not provide the unconsumed data to
> > the caller when its done, anyway...
> > 
> > Thoughts?
> 
> I think you're right. The Binary instances cannot and must not read more
> than they need to, so that gives us the behaviour that we read exactly
> the length of the file, but no more, and thus we never hit EOF, so we
> don't close the file. So yes, decode should force the tail so that it
> can indeed hit EOF.

Duncan, 

You're suggesting that decode and decodeFile should whnf the next cell?

-- Don
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] two problems with Data.Binary and Data.ByteString

2008-08-14 Thread Bryan O'Sullivan
On Wed, Aug 13, 2008 at 3:02 PM, Tim Newsham <[EMAIL PROTECTED]> wrote:

> However, I think probably the real blame here should probably go
> to Data.Binary which doesn't attempt to check that it has consumed
> all of its input after doing a "decode".  If "decode" completes
> and there is unconsumed data, it should probably raise an error
> (it already raises errors for premature EOF).  There's no reason
> for it not to, since it does not provide the unconsumed data to
> the caller when its done, anyway...

You missed runGetState in Data.Binary.Get, which I added. It's
definitely not an error in the abstract to have excess input after
you're done decoding. In your specific application, it might be, but
then you should write a combinator that checks for this state.
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] two problems with Data.Binary and Data.ByteString

2008-08-14 Thread Bertram Felgenhauer
Tim Newsham wrote:
[snip]
> I would have expected this to fix my problems:
>
>   binEof :: Get ()
>   binEof = do
>   more <- not <$> isEmpty
>   when more $ error "expected EOF"
>
>   decodeFully :: Binary b => B.ByteString -> b
>   decodeFully = runGet (get << binEof)
> where a << b = a >>= (\x -> b >> return x)

Remember that the Get monad is lazy; the result of binEof is never
used, so the action is not performed.

decodeFully :: Binary b => B.ByteString -> b
decodeFully = runGet (get << binEof)
  where a << b = a >>= (\x -> b >>= \y -> y `seq` return x)
works, for example, and also
  where a << b = a >>= (\x -> b >>= \y -> return (y `seq` x))
and
  where (<<) = liftM2 (flip seq)

HTH,

Bertram
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] two problems with Data.Binary and Data.ByteString

2008-08-14 Thread Bryan O'Sullivan
On Wed, Aug 13, 2008 at 5:39 PM, Tim Newsham <[EMAIL PROTECTED]> wrote:
> So am I understanding you correctly that you believe this is not
> a bug?  That the use Data.Binary.decodeFile function leaks a file
> descriptor and this is proper behavior?

I think he might be saying that decodeFile is not the place for
checking this condition. I will put words in his mouth and say that
checking for EOF after a decode is the responsibility of the
application code, because the lower level cannot possibly know whether
it makes sense for there to be residual data in the ByteString. There
are plenty of file formats that consist of back-to-back concatenated
chunks of data, in which reading just one chunk does not by any means
require that a file can only contain one.
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] two problems with Data.Binary and Data.ByteString

2008-08-14 Thread Tim Newsham

I think he might be saying that decodeFile is not the place for
application code, because the lower level cannot possibly know whether
it makes sense for there to be residual data in the ByteString. There
are plenty of file formats that consist of back-to-back concatenated
chunks of data, in which reading just one chunk does not by any means
require that a file can only contain one.


Right, but because of the way decodeFile works, whenever you do
not have a data type that explicitely checks for EOF in it's
Get definition, decodeFile will leak a file handle.  There is no
way to check that there is residual data, to access it, or to
close the file handle.  Since this is the normal state of affairs
(are there any Get definitions in the current library which check
for EOF when done?) I would suggest that this is an API bug.

I would suggest that "decodeFile" should check for EOF when done.
A second wrapper function "decodePartialFile" could return the
unconsumed data, perhaps, for situations when the EOF behavior is
not desired, or return some other way for the file to be closed.

Additionally, I would suggest that the Data.Binary library provide
a combinator for consuming data fully (ie. checking for EOF).  ie:

   fully :: Get a -> Get a
   fully a = do
  x <- a
  e <- isEmpty
  return $ case e of
  False -> error "expected EOF"
  True  -> x

   decodeFully = runGet $ fully get
   decodeFile fn = decodeFully <$> B.readFile fn

to make it easy for developers who do not use the decodeFile
primitive to add EOF checking to their marshalling functions.

As it currently stands, the most obvious application of the Data.Binary
API leads to subtly confusing errors that may go unnoticed for a
while.  (This would be a fine point for the documentation to address
to prevent others from falling in the same hole).

I'm currently using definitions like these and (`using` rnf) and
have a server that is able to repeatedly read and write the state
file.  Many thanks to Dons, Brian, Duncan and everyone else who
helped me out...

Tim Newsham
http://www.thenewsh.com/~newsham/
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] two problems with Data.Binary and Data.ByteString

2008-08-14 Thread Don Stewart
newsham:
> So am I understanding you correctly that you believe this is not
> a bug?  That the use Data.Binary.decodeFile function leaks a file
> descriptor and this is proper behavior?

It's not a bug. It's lazy IO. If you want the Handle closed, demand all the
input. isEmpty will do this for you, if you're at the end of the file already.

> I still don't understand your explanation of how isEmpty can
> return True without having read to EOF.  The ByteString continues
> to contain more data until an EOF is reached.  Doesn't one of
> 
>  return (B.null s && L.null ss)

isEmpty is perfectly fine. You're just not demanding its result.

Consider,

{-# LANGUAGE BangPatterns #-}

import Data.Word
import Data.Binary
import Data.Binary.Get
import qualified Data.ByteString.Lazy as L
import System.IO

main = do
encodeFile stateFile (42 :: Word32)
d <- strictDecodeFile stateFile :: IO Word32
encodeFile stateFile d
print d
  where
stateFile = "1word32.bin"

strictDecodeFile :: Binary a => FilePath -> IO a
strictDecodeFile f = do
ss <- L.readFile f
return $! runGet (do v  <- get
 !m <- isEmpty -- if we're at the end, this will 
close it
 return v) ss

Look at strictDecodeFile. It's pretty much identical to the normal decodeFile,
but it assumes 'get' will consume the entire file. It then checks for null,
which will trigger an EOF and close if you are actually at the end.

So we just decode the file, and check if the buffer's empty at the end.

$ ghc --make A.hs
[1 of 1] Compiling Main ( A.hs, A.o )
Linking A ...
$ ./A
42

But if we leave out that bang pattern on isEmpty, the test won't run, and we'll 
get,

$ ./A
A: 1word32.bin: openBinaryFile: resource busy (file is locked)

So were you just confused about how to use isEmpty?

You could also explicit close in strictDecodeFile,

strictDecodeFile :: Binary a => FilePath -> IO a
strictDecodeFile f = do
h  <- openFile f ReadMode
ss <- L.hGetContents h f
let !v = runGet (do v  <- get
return v) ss
hClose h
return v

Whatever works best for you.

-- Don
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] two problems with Data.Binary and Data.ByteString

2008-08-14 Thread Don Stewart
bos:
> On Wed, Aug 13, 2008 at 5:39 PM, Tim Newsham <[EMAIL PROTECTED]> wrote:
> > So am I understanding you correctly that you believe this is not
> > a bug?  That the use Data.Binary.decodeFile function leaks a file
> > descriptor and this is proper behavior?
> 
> I think he might be saying that decodeFile is not the place for
> checking this condition. I will put words in his mouth and say that
> checking for EOF after a decode is the responsibility of the
> application code, because the lower level cannot possibly know whether
> it makes sense for there to be residual data in the ByteString. There
> are plenty of file formats that consist of back-to-back concatenated
> chunks of data, in which reading just one chunk does not by any means
> require that a file can only contain one.

Exactly.

This particular condition -- that encode should consume exactly the
amount of data in the input file, and be sitting on EOF at the end -- is
application dependent.

That said, there's an argument to be made that the wrapper, decodeFile,
could reasonable assume this is the most common case.

-- Don
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] two problems with Data.Binary and Data.ByteString

2008-08-14 Thread Duncan Coutts
On Wed, 2008-08-13 at 12:02 -1000, Tim Newsham wrote:

> However, I think probably the real blame here should probably go
> to Data.Binary which doesn't attempt to check that it has consumed
> all of its input after doing a "decode".  If "decode" completes
> and there is unconsumed data, it should probably raise an error
> (it already raises errors for premature EOF).  There's no reason
> for it not to, since it does not provide the unconsumed data to
> the caller when its done, anyway...
> 
> Thoughts?

I think you're right. The Binary instances cannot and must not read more
than they need to, so that gives us the behaviour that we read exactly
the length of the file, but no more, and thus we never hit EOF, so we
don't close the file. So yes, decode should force the tail so that it
can indeed hit EOF.

Duncan

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] two problems with Data.Binary and Data.ByteString

2008-08-13 Thread Tim Newsham

So am I understanding you correctly that you believe this is not
a bug?  That the use Data.Binary.decodeFile function leaks a file
descriptor and this is proper behavior?

I still don't understand your explanation of how isEmpty can
return True without having read to EOF.  The ByteString continues
to contain more data until an EOF is reached.  Doesn't one of

 return (B.null s && L.null ss)

force getContents to read until EOF?

On Wed, 13 Aug 2008, Don Stewart wrote:

newsham:

Ok, surely at least everyone must agree that this is a bug:

  force :: Word8 -> IO Word8
  force x = print x >> return x
  -- force = return . (`using` rnf)

  main = do
  d <- force =<< decodeFile stateFile
  encodeFile stateFile d
  where stateFile = "1word32.bin"

test8.hs: 1word32.bin: openBinaryFile: resource busy (file is locked)



Remember that

   decodeFile f = liftM decode (L.readFile f)

and

   readFile :: FilePath -> IO ByteString
   readFile f = openBinaryFile f ReadMode >>= hGetContents

where hGetContents sits in a loop, reading chunks,

   loop = do
   c <- S.hGetNonBlocking h k
   if S.null c
 then do eof <- hIsEOF h
 if eof then hClose h >> return Empty
else hWaitForInput h (-1)
  >> loop
 else do cs <- lazyRead
 return (Chunk c cs)

while isEmpty is just,

   isEmpty :: Get Bool
   isEmpty = do
   S s ss _ <- get
   return (B.null s && L.null ss)

That is, it checks the parsed chunk, it doesn't demand any more reading be done.

So the only way you're going to get that Handle closed by readFile is to ensure
you read till EOF is hit. After you decode, just ask keep asking for bytes till 
EOF,
or close it yourself,

   decodeFile f = do
   h  <- openFile f ReadMode
   ss <- L.hGetContents h
   let e = decode ss
   rnf e `seq` hClose h

or some such, where you can confirm the decoding as taken place.



Tim Newsham
http://www.thenewsh.com/~newsham/
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] two problems with Data.Binary and Data.ByteString

2008-08-13 Thread Don Stewart
newsham:
> Ok, surely at least everyone must agree that this is a bug:
> 
>   force :: Word8 -> IO Word8
>   force x = print x >> return x
>   -- force = return . (`using` rnf)
> 
>   main = do
>   d <- force =<< decodeFile stateFile
>   encodeFile stateFile d
>   where stateFile = "1word32.bin"
> 
> test8.hs: 1word32.bin: openBinaryFile: resource busy (file is locked)
> 

Remember that

decodeFile f = liftM decode (L.readFile f)

and 

readFile :: FilePath -> IO ByteString
readFile f = openBinaryFile f ReadMode >>= hGetContents

where hGetContents sits in a loop, reading chunks,

loop = do
c <- S.hGetNonBlocking h k
if S.null c
  then do eof <- hIsEOF h
  if eof then hClose h >> return Empty
 else hWaitForInput h (-1)
   >> loop
  else do cs <- lazyRead
  return (Chunk c cs)

while isEmpty is just,

isEmpty :: Get Bool
isEmpty = do
S s ss _ <- get
return (B.null s && L.null ss)

That is, it checks the parsed chunk, it doesn't demand any more reading be done.

So the only way you're going to get that Handle closed by readFile is to ensure
you read till EOF is hit. After you decode, just ask keep asking for bytes till 
EOF, 
or close it yourself,

decodeFile f = do
h  <- openFile f ReadMode
ss <- L.hGetContents h
let e = decode ss
rnf e `seq` hClose h

or some such, where you can confirm the decoding as taken place.
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] two problems with Data.Binary and Data.ByteString

2008-08-13 Thread Tim Newsham

Ok, surely at least everyone must agree that this is a bug:

  force :: Word8 -> IO Word8
  force x = print x >> return x
  -- force = return . (`using` rnf)

  main = do
  d <- force =<< decodeFile stateFile
  encodeFile stateFile d
  where stateFile = "1word32.bin"

test8.hs: 1word32.bin: openBinaryFile: resource busy (file is locked)

the built-in Data.Binary.decodeFile function doesn't close
its handle when it is done (same reason as my earlier examples).


However, I think probably the real blame here should probably go
to Data.Binary which doesn't attempt to check that it has consumed
all of its input after doing a "decode".  If "decode" completes
and there is unconsumed data, it should probably raise an error
(it already raises errors for premature EOF).  There's no reason
for it not to, since it does not provide the unconsumed data to
the caller when its done, anyway...


I would have expected this to fix my problems:

  binEof :: Get ()
  binEof = do
  more <- not <$> isEmpty
  when more $ error "expected EOF"

  decodeFully :: Binary b => B.ByteString -> b
  decodeFully = runGet (get << binEof)
where a << b = a >>= (\x -> b >> return x)

but even when using decodeFully, it still doesn't close the handle.
Shouldn't Data.Binary.Get.isEmpty force a file handle close in
the case that it returns True?

Tim Newsham
http://www.thenewsh.com/~newsham/
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] two problems with Data.Binary and Data.ByteString

2008-08-13 Thread Antoine Latter
On Wed, Aug 13, 2008 at 5:02 PM, Tim Newsham <[EMAIL PROTECTED]> wrote:
>
> However, I think probably the real blame here should probably go
> to Data.Binary which doesn't attempt to check that it has consumed
> all of its input after doing a "decode".  If "decode" completes
> and there is unconsumed data, it should probably raise an error
> (it already raises errors for premature EOF).  There's no reason
> for it not to, since it does not provide the unconsumed data to
> the caller when its done, anyway...
>

Would the error be raised in 'decode' or in 'runGet'?  On a project in
progress I rely on 'runGet' to toss out padding bytes for me.

-Antoine
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] two problems with Data.Binary and Data.ByteString

2008-08-13 Thread Don Stewart
newsham:
> >Should the file be closed when the last byte is read (in this
> >case its definitely reading all four bytes) or when the first
> >byte after that is read (in this case it probably doesn't
> >attempt to read more than 4 bytes)?
> 
> I'll answer my own question.  Both Prelude.readFile and
> Data.ByteString.Lazy.Char8.readFile will keep the file open
> after reading the last byte and close it when trying to
> read further.  Proof:
> 
>   module Main where
>   import Control.Applicative
>   -- import qualified Data.ByteString.Lazy.Char8 as B
>   import Prelude as B
> 
>   stateFile = "1word32.bin"
>   main = do
>   x <- B.take 4 <$> B.readFile stateFile
>   -- x <- B.take 5 <$> B.readFile stateFile
>   print x
>   B.writeFile stateFile x
> 
> This works for Prelude and ByteString when taking 5 (there are
> exactly 4 bytes in "1word32.bin") and fail when taking 4.
> 
> I'm not sure that this behavior is so bad..  there might be some
> advantages...  but it might be nice to have it close after the last
> byte is read...
> 
> However, I think probably the real blame here should probably go
> to Data.Binary which doesn't attempt to check that it has consumed
> all of its input after doing a "decode".  If "decode" completes
> and there is unconsumed data, it should probably raise an error
> (it already raises errors for premature EOF).  There's no reason
> for it not to, since it does not provide the unconsumed data to
> the caller when its done, anyway...
> 

This is perhaps a use case for Data.Binary.Strict then.

-- Don
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] two problems with Data.Binary and Data.ByteString

2008-08-13 Thread Tim Newsham

Should the file be closed when the last byte is read (in this
case its definitely reading all four bytes) or when the first
byte after that is read (in this case it probably doesn't
attempt to read more than 4 bytes)?


I'll answer my own question.  Both Prelude.readFile and
Data.ByteString.Lazy.Char8.readFile will keep the file open
after reading the last byte and close it when trying to
read further.  Proof:

  module Main where
  import Control.Applicative
  -- import qualified Data.ByteString.Lazy.Char8 as B
  import Prelude as B

  stateFile = "1word32.bin"
  main = do
  x <- B.take 4 <$> B.readFile stateFile
  -- x <- B.take 5 <$> B.readFile stateFile
  print x
  B.writeFile stateFile x

This works for Prelude and ByteString when taking 5 (there are
exactly 4 bytes in "1word32.bin") and fail when taking 4.

I'm not sure that this behavior is so bad..  there might be some
advantages...  but it might be nice to have it close after the last
byte is read...

However, I think probably the real blame here should probably go
to Data.Binary which doesn't attempt to check that it has consumed
all of its input after doing a "decode".  If "decode" completes
and there is unconsumed data, it should probably raise an error
(it already raises errors for premature EOF).  There's no reason
for it not to, since it does not provide the unconsumed data to
the caller when its done, anyway...

Thoughts?

Tim Newsham
http://www.thenewsh.com/~newsham/
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] two problems with Data.Binary and Data.ByteString

2008-08-13 Thread Tim Newsham

Ah, that would be a bug in older ByteString implementations, that were a
bit incautious about closing handles. This example works for me with

   bytestring-0.9.1.0


Yup, thank you Don and Duncan for pointing this out.  I updated
my bytestring library and the test case no longer fails.  However,
I'm still having problems and not sure why.  I was able to
distill the problem down to this:

  $ od -x 1word32.bin
  000  05002ca4

  $ runhaskell test6.hs
  loading...
  saving...
  test6.hs: 1word32.bin: openFile: resource busy (file is locked)

  $ cat test6.hs
  module Main where
  import Control.Applicative
  import Control.Parallel.Strategies (rnf, NFData, using)
  import Data.Binary
  import qualified Data.ByteString.Lazy.Char8 as B
  import Data.Word

  stateFile = "1word32.bin"

  loadState :: IO Word32
  loadState = decode <$> B.readFile stateFile

  saveState :: Word32 -> IO ()
  saveState db = B.writeFile stateFile $ encode db

  {-
  -- Works!
  loadState = B.readFile stateFile
  saveState = B.writeFile stateFile
  -}

  -- force x = print x >> return x
  force = return . (`using` rnf)

  main = do
  putStrLn "loading..."
  d <- force =<< loadState
  putStrLn "saving..."
  saveState d


I tried this both with "print" and "rnf" to the same effect.
It looks like there still might be some situations where the
file isn't being closed?

Should the file be closed when the last byte is read (in this
case its definitely reading all four bytes) or when the first
byte after that is read (in this case it probably doesn't
attempt to read more than 4 bytes)?

Tim Newsham
http://www.thenewsh.com/~newsham/
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] two problems with Data.Binary and Data.ByteString

2008-08-13 Thread Duncan Coutts
On Tue, 2008-08-12 at 14:13 -1000, Tim Newsham wrote:

> I also noticed another issue while testing.  If my program loads
> the data at startup by calling loadState then all later calls to
> saveState give an error:
> 
>Log: savedState.bin: openFile: resource busy (file is locked)

You're not using an old version of bytestring are you, anything older
than 0.9.0.4? We had a bug where the handle was not closed as soon as we
got to the end of the stream, so even forcing the whole input didn't
help.

Duncan

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] two problems with Data.Binary and Data.ByteString

2008-08-13 Thread Don Stewart
ketil:
> Don Stewart <[EMAIL PROTECTED]> writes:
> 
> >> You really, really want to be using rnf for this job, instead of
> >> turning your brain into a pretzel shape.
> 
> > The Pretzel being one of the lesser-known lazy, cyclic, functional data 
> > structures.
> 
> So "pretzel-brain" is actually a honorific, rather than derogative term?
> /me makes mental note.

Yes, bestowed upon those who've read Okasaki, and can tie-the-knot.

-- Don
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] two problems with Data.Binary and Data.ByteString

2008-08-12 Thread Ketil Malde
Don Stewart <[EMAIL PROTECTED]> writes:

>> You really, really want to be using rnf for this job, instead of
>> turning your brain into a pretzel shape.

> The Pretzel being one of the lesser-known lazy, cyclic, functional data 
> structures.

So "pretzel-brain" is actually a honorific, rather than derogative term?
/me makes mental note.

-k
-- 
If I haven't seen further, it is by standing in the footprints of giants
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] two problems with Data.Binary and Data.ByteString

2008-08-12 Thread Don Stewart
newsham:
> >I'm starting to wonder if this isn't an issue with
> >Data.ByteString.Lazy.Char8.{read,write}File.
> 
> This simple test case fails:
> 
>   module Main where
>   import qualified Data.ByteString.Lazy.Char8 as B
>   main = do
>   print =<< B.readFile "xxx"
>   B.writeFile "xxx" =<< B.readFile "test.hs"
> 
> If you replace B.readFile with readFile and B.writeFile with writeFile
> it works properly.  ByteString bug?

Ah, that would be a bug in older ByteString implementations, that were a
bit incautious about closing handles. This example works for me with 

bytestring-0.9.1.0

You're looking for a post-Dec 19, 2007 release, after the patch,

  Wed Dec 19 22:06:13 PST 2007  Don Stewart
* For lazy IO operations, be sure to hClose the resource on EOF

-- Don
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] two problems with Data.Binary and Data.ByteString

2008-08-12 Thread Don Stewart
bos:
> On Tue, Aug 12, 2008 at 6:01 PM, Tim Newsham <[EMAIL PROTECTED]> wrote:
> 
> > (my keys are dates, which are Enum).  This should look at
> > every key in every inner map.  Shouldn't that be sufficient to
> > force the entire data set (or do I have to touch the fields in the
> > data elements too?)
> 
> You might have to force the last value of the alist that the map gets
> flattened into, since otherwise there's no guarantee that it will be
> read.
> 
> You really, really want to be using rnf for this job, instead of
> turning your brain into a pretzel shape.

The Pretzel being one of the lesser-known lazy, cyclic, functional data 
structures.

-- Don
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] two problems with Data.Binary and Data.ByteString

2008-08-12 Thread Tim Newsham

I'm starting to wonder if this isn't an issue with
Data.ByteString.Lazy.Char8.{read,write}File.


This simple test case fails:

  module Main where
  import qualified Data.ByteString.Lazy.Char8 as B
  main = do
  print =<< B.readFile "xxx"
  B.writeFile "xxx" =<< B.readFile "test.hs"

If you replace B.readFile with readFile and B.writeFile with writeFile
it works properly.  ByteString bug?

Tim Newsham
http://www.thenewsh.com/~newsham/
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] two problems with Data.Binary and Data.ByteString

2008-08-12 Thread Tim Newsham

You might have to force the last value of the alist that the map gets
flattened into, since otherwise there's no guarantee that it will be
read.

You really, really want to be using rnf for this job, instead of
turning your brain into a pretzel shape.


*nod* that's my eventual goal but I'd like to make sure I understand
what is going on here first and rule out any bugs before I go using
some class I'm not that familiar with.

I'm starting to wonder if this isn't an issue with
Data.ByteString.Lazy.Char8.{read,write}File.  I am now printing
out the data entirely:

d <- decode <$> B.readFile stateFile
print d

and I still get the same error when I go to writeFile later.
There should be no data items in any of my structures that
the print statement does not force.

Tim Newsham
http://www.thenewsh.com/~newsham/
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] two problems with Data.Binary and Data.ByteString

2008-08-12 Thread Bryan O'Sullivan
On Tue, Aug 12, 2008 at 6:01 PM, Tim Newsham <[EMAIL PROTECTED]> wrote:

> (my keys are dates, which are Enum).  This should look at
> every key in every inner map.  Shouldn't that be sufficient to
> force the entire data set (or do I have to touch the fields in the
> data elements too?)

You might have to force the last value of the alist that the map gets
flattened into, since otherwise there's no guarantee that it will be
read.

You really, really want to be using rnf for this job, instead of
turning your brain into a pretzel shape.
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] two problems with Data.Binary and Data.ByteString

2008-08-12 Thread Tim Newsham

Doesn't Data.Map.size run in O(1) time? Maybe something like using
different encodings for big maps in the default implementation would
help?


ugh, of course.  Ok, so I fixed it to:

loadState db = do
d <- decode <$> B.readFile stateFile
let force = sum $ map (sum . map fromEnum . M.keys) $ M.elems d
print force
force `seq` atomically $ writeTVar db d

(my keys are dates, which are Enum).  This should look at
every key in every inner map.  Shouldn't that be sufficient to
force the entire data set (or do I have to touch the fields in the
data elements too?).  I still get the same error condition.


Felipe.


Tim Newsham
http://www.thenewsh.com/~newsham/
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] two problems with Data.Binary and Data.ByteString

2008-08-12 Thread Don Stewart
bos:
> On Tue, Aug 12, 2008 at 5:13 PM, Tim Newsham <[EMAIL PROTECTED]> wrote:
> 
> > The data type I'm storing is a Map (of maps):
> >
> >   type DailyDb = M.Map Date Daily
> >   type InstrsDb = M.Map String DailyDb
> >
> > What's going on here?
> 
> The default marshalling scheme that Binary uses for lists and maps
> (which are flattened to lists before writing out) is not streamable.
> Instead of writing out data in chunks, it computes the length of the
> list and writes that, followed by the elements. Presumably on the read
> side, a huge thunk is being built up before any actual Map creation
> starts.

Maybe it makes sense to have the streamble list instance in Binary as
well, with some examples?

-- Don
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] two problems with Data.Binary and Data.ByteString

2008-08-12 Thread Tim Newsham

Maybe it makes sense to have the streamble list instance in Binary as
well, with some examples?


A flexible format that doesn't sacrifice too much space efficiency
would be to encode in chunks of up to 255 elements:

 Chunk = {
length :: Word8
elems :: [Elem]  --  0..255 repetitions
  }
 Chunks = [Chunk] -- terminated with the first 0 length Chunk

streamable, amortized cost is about 1/256th of the length, and
the encoding is much more efficient than the current scheme
for short lists (like most strings).  Currently a string "foobar"
is 8 bytes for length and 7 for the actual string.


-- Don


Tim Newsham
http://www.thenewsh.com/~newsham/
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] two problems with Data.Binary and Data.ByteString

2008-08-12 Thread Tim Newsham

On Tue, 12 Aug 2008, Bryan O'Sullivan wrote:


On Tue, Aug 12, 2008 at 5:34 PM, Tim Newsham <[EMAIL PROTECTED]> wrote:


I tried to force the data with:

   loadState db = do
   d <- decode <$> B.readFile stateFile
   let force = sum $ M.elems $ M.size `fmap` d
   force `seq` atomically $ writeTVar db d

and I get the same error when trying to writeFile after doing
a loadState.


What happens if you simply print the number of elements in the map?
Forcing its spine is all you should need.


This is what the sum above does.  Its a Map of Maps, so this maps
M.size over the outter Map and sums the resulting elements.
That should touch every map element, at least.

Tim Newsham
http://www.thenewsh.com/~newsham/
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] two problems with Data.Binary and Data.ByteString

2008-08-12 Thread Felipe Lessa
On Tue, Aug 12, 2008 at 9:32 PM, Don Stewart <[EMAIL PROTECTED]> wrote:
> Not hackery, just a different encoding. The default Binary encodings
> don't work cover all use cases and all scales. To hit other sweet spots,
> use your own instances.

Doesn't Data.Map.size run in O(1) time? Maybe something like using
different encodings for big maps in the default implementation would
help?

-- 
Felipe.
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] two problems with Data.Binary and Data.ByteString

2008-08-12 Thread Bryan O'Sullivan
On Tue, Aug 12, 2008 at 5:34 PM, Tim Newsham <[EMAIL PROTECTED]> wrote:

> I tried to force the data with:
>
>loadState db = do
>d <- decode <$> B.readFile stateFile
>let force = sum $ M.elems $ M.size `fmap` d
>force `seq` atomically $ writeTVar db d
>
> and I get the same error when trying to writeFile after doing
> a loadState.

What happens if you simply print the number of elements in the map?
Forcing its spine is all you should need.
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] two problems with Data.Binary and Data.ByteString

2008-08-12 Thread Bryan O'Sullivan
On Tue, Aug 12, 2008 at 5:13 PM, Tim Newsham <[EMAIL PROTECTED]> wrote:

> The data type I'm storing is a Map (of maps):
>
>   type DailyDb = M.Map Date Daily
>   type InstrsDb = M.Map String DailyDb
>
> What's going on here?

The default marshalling scheme that Binary uses for lists and maps
(which are flattened to lists before writing out) is not streamable.
Instead of writing out data in chunks, it computes the length of the
list and writes that, followed by the elements. Presumably on the read
side, a huge thunk is being built up before any actual Map creation
starts.

> I also noticed another issue while testing.  If my program loads
> the data at startup by calling loadState then all later calls to
> saveState give an error:
>
>  Log: savedState.bin: openFile: resource busy (file is locked)
>
> this does not occur if the program wasnt loaded.

Your loading of state isn't being forced to complete, so the file
handle is still open when you try to save to the same file. The H98
standard requires that file handles be locked for exclusive access
during writes.

To force the read to finish, use rnf. You can find a description of
how to use it, and the typeclasses involved, here:
http://book.realworldhaskell.org/beta/concurrent.html
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] two problems with Data.Binary and Data.ByteString

2008-08-12 Thread Don Stewart
newsham:
> >so that fromAscList's the result of parsing the map as a list, via,
> >
> >   instance Binary a => Binary [a] where
> >   put l  = put (length l) >> mapM_ put l
> >   get= do n <- get :: Get Int
> >   replicateM n get
> >
> >so that's a length-prefixed list, strictly. Which is possibly where the
> >stack's being consumed. Does just bumping the stack size a bit help?
> 
> ugh.. length prefix..
> I could bump the stack size to fix my immediate situation, but my goal
> is to have a server with a huge in-memory data set, and my test data
> so far is quite small.
> 
> >Alternatively, you could consider serialising the Map in some other
> >format (i.e. newtype the list, and serialise that say, in a lazy/chunked
> >encoding).
> 
> hackery :(
> but that sounds like the obvious fix.

Not hackery, just a different encoding. The default Binary encodings
don't work cover all use cases and all scales. To hit other sweet spots,
use your own instances.
  
> >>  Log: savedState.bin: openFile: resource busy (file is locked)
> >>
> >>this does not occur if the program wasnt loaded.  My best guess here
> >>is that B.readFile isnt completing and closing the file for some
> >>reason.  Is there a good way to force this?
> >
> >Lazy IO. So force the result to be evaluated, and then close the handle,
> >or use strict bytestring reading.
> 
> There is no visible handle.  It's all hidden in readFile.
> I will try forcing the data.

So you can decode using openFile, hGet and hClose on strict bytetrings,
or force the data.

-- Don
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] two problems with Data.Binary and Data.ByteString

2008-08-12 Thread Tim Newsham

  Log: savedState.bin: openFile: resource busy (file is locked)

this does not occur if the program wasnt loaded.  My best guess here
is that B.readFile isnt completing and closing the file for some
reason.  Is there a good way to force this?


Lazy IO. So force the result to be evaluated, and then close the handle,
or use strict bytestring reading.


There is no visible handle.  It's all hidden in readFile.
I will try forcing the data.


I tried to force the data with:

loadState db = do
d <- decode <$> B.readFile stateFile
let force = sum $ M.elems $ M.size `fmap` d
force `seq` atomically $ writeTVar db d

and I get the same error when trying to writeFile after doing
a loadState.


-- Don


Tim Newsham
http://www.thenewsh.com/~newsham/
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] two problems with Data.Binary and Data.ByteString

2008-08-12 Thread Tim Newsham

so that fromAscList's the result of parsing the map as a list, via,

   instance Binary a => Binary [a] where
   put l  = put (length l) >> mapM_ put l
   get= do n <- get :: Get Int
   replicateM n get

so that's a length-prefixed list, strictly. Which is possibly where the
stack's being consumed. Does just bumping the stack size a bit help?


ugh.. length prefix..
I could bump the stack size to fix my immediate situation, but my goal
is to have a server with a huge in-memory data set, and my test data
so far is quite small.


Alternatively, you could consider serialising the Map in some other
format (i.e. newtype the list, and serialise that say, in a lazy/chunked
encoding).


hackery :(
but that sounds like the obvious fix.


  Log: savedState.bin: openFile: resource busy (file is locked)

this does not occur if the program wasnt loaded.  My best guess here
is that B.readFile isnt completing and closing the file for some
reason.  Is there a good way to force this?


Lazy IO. So force the result to be evaluated, and then close the handle,
or use strict bytestring reading.


There is no visible handle.  It's all hidden in readFile.
I will try forcing the data.


-- Don


Tim Newsham
http://www.thenewsh.com/~newsham/
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] two problems with Data.Binary and Data.ByteString

2008-08-12 Thread Don Stewart
newsham:
> I have a program that read in and populated a large data structure and
> then saved it out with Data.Binary and Data.ByteString.Lazy.Char8:
> 
>saveState db = B.writeFile stateFile =<<
>encode <$> atomically (readTVar db)
> 
> when I go to read this in later I get a stack overflow:
> 
> loadState db = do
> d <- decode <$> B.readFile stateFile
> atomically $ writeTVar db d
> 
>   Stack space overflow: current size 8388608 bytes.
>   Use `+RTS -Ksize' to increase it.
> 
> or from ghci:
> 
> d <- liftM decode
>   (Data.ByteString.Lazy.Char8.readFile
>  "savedState.bin") :: IO InstrsDb
> 
> fromList *** Exception: stack overflow
> 
> The data type I'm storing is a Map (of maps):
> 
>type DailyDb = M.Map Date Daily
>type InstrsDb = M.Map String DailyDb
> 
> What's going on here?  Why is the system capable of building and saving
> the data but not in reading and umarhsalling it?  What is the proper way
> to track down where the exception is happening?  Any debugging tips?

So a big Map is serialised as a huge list.

instance (Ord k, Binary k, Binary e) => Binary (Map.Map k e) where
put = put . Map.toAscList
get = liftM Map.fromDistinctAscList get

so that fromAscList's the result of parsing the map as a list, via,

instance Binary a => Binary [a] where
put l  = put (length l) >> mapM_ put l
get= do n <- get :: Get Int
replicateM n get

so that's a length-prefixed list, strictly. Which is possibly where the
stack's being consumed. Does just bumping the stack size a bit help?

Alternatively, you could consider serialising the Map in some other
format (i.e. newtype the list, and serialise that say, in a lazy/chunked
encoding).
  
> I also noticed another issue while testing.  If my program loads
> the data at startup by calling loadState then all later calls to
> saveState give an error:
> 
>   Log: savedState.bin: openFile: resource busy (file is locked)
> 
> this does not occur if the program wasnt loaded.  My best guess here
> is that B.readFile isnt completing and closing the file for some
> reason.  Is there a good way to force this?

Lazy IO. So force the result to be evaluated, and then close the handle,
or use strict bytestring reading.

-- Don
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe