Send Beginners mailing list submissions to
[email protected]
To subscribe or unsubscribe via the World Wide Web, visit
http://www.haskell.org/mailman/listinfo/beginners
or, via email, send a message with subject or body 'help' to
[email protected]
You can reach the person managing the list at
[email protected]
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Beginners digest..."
Today's Topics:
1. Re: filesystem verification utility (Stephen Tetley)
2. Re: filesystem verification utility (Anand Mitra)
3. Re: filesystem verification utility (Stephen Tetley)
----------------------------------------------------------------------
Message: 1
Date: Mon, 10 Jan 2011 23:24:29 +0000
From: Stephen Tetley <[email protected]>
Subject: Re: [Haskell-beginners] filesystem verification utility
Cc: [email protected]
Message-ID:
<[email protected]>
Content-Type: text/plain; charset=ISO-8859-1
On 10 January 2011 22:26, Stephen Tetley <[email protected]> wrote:
> And append (++) inside the genPattern function...
>
Changing to use "functional strings" which have the ShowS type is the
obvious way to avoid the cost of (++) append. There is a library on
Hackage that provides many utility functions for a wrapped ShowS type
and an IsString instance so you can still use string literals.
http://hackage.haskell.org/package/dstring
However you might be better using one of the alternative String
packages such as Text or ByteString.
------------------------------
Message: 2
Date: Tue, 11 Jan 2011 09:49:47 +0530
From: Anand Mitra <[email protected]>
Subject: Re: [Haskell-beginners] filesystem verification utility
To: [email protected], [email protected]
Message-ID:
<[email protected]>
Content-Type: text/plain; charset=ISO-8859-1
Based on feedback I inferred that the huge memory usage was mostly
because of the String handling in the patern generation. To make it
more efficient I have used Data.ByteString.Lazy.Char8
but now I get the following error when I execute
stress: tmp/asdf-1: hPutBuf: illegal operation (handle is closed)
stress: tmp/asdf-3: hPutBuf: illegal operation (handle is closed)
stress: tmp/asdf-5: hPutBuf: illegal operation (handle is closed)
stress: tmp/asdf-2: hPutBuf: illegal operation (handle is closed)
stress: tmp/asdf-7: hPutBuf: illegal operation (handle is closed)
stress: tmp/asdf-10: hPutBuf: illegal operation (handle is closed)
I have explicitly remove all calls to close despite this I get this
error. I tried both version of seek the fdSeek as well as hSeek. The
documentation on hSeek had a confusing comment that "The offset i is
given in terms of 8-bit bytes"
At this point I am unable to understand how the handle is getting
explicitly closed.
{-# OPTIONS -fglasgow-exts #-}
import Data.Int
import qualified Data.ByteString.Lazy.Char8 as L
import System.Random
import Data.List
import Monad
import System.Posix.IO
import System.Posix.Types
import Data.Time.Clock
import System.Posix.Files
import Data.Maybe
import GHC.IO.Device (SeekMode(..))
import Control.Exception
import Data.Typeable
import System
import Control.Concurrent
import System.IO
--myrandomRlist :: (Num t, Random t) => t -> IO [t]
--myrandomRlist x = liftM (randomRs (0,x)) newStdGen
myrandomRlist :: (Num t, Random t) => t ->t -> StdGen -> [t]
myrandomRlist min max seed = randomRs (min,max) seed
data IoLoc = IoLoc {offset ::FileOffset, size::Int, num::Int }
deriving (Show, Typeable)
instance Exception IoLoc
data Corrupt = Corrupt String IoLoc
deriving (Show, Typeable)
instance Exception Corrupt
data FileHdr = FileHdr {fileName::FilePath, seed::StdGen,
minIoSize::Int, maxIoSize::Int, ioCount::Int}
deriving (Show)
data FileIO = FileIO {fd::Handle, params::FileHdr, fileData::[IoLoc]}
check hdl = do
closed <- hIsClosed hdl
if closed
then do
putStrLn $ "file handle was closed" ++ (show hdl)
else do
return ()
genPattern :: FilePath -> IoLoc -> L.ByteString
genPattern f l =
L.take (fromIntegral (size l)) (L.cycle $ L.pack ("(" ++ f ++
")" ++ "[" ++ (show (offset l)) ++ ":" ++ (show (size l)) ++ (show
(num l)) ++ "]"))
doHdrRead :: Handle -> IO [Char]
doHdrRead x = do
fd <- handleToFd x
(str, count) <- fdRead fd 100
return (takeWhile (\x-> not (x == '\n')) str)
doGetHdr :: Handle -> IO FileHdr
doGetHdr fd = do
file <- doHdrRead fd
seed <- doHdrRead fd
minIoSize <- doHdrRead fd
maxIoSize <- doHdrRead fd
ioCount <- doHdrRead fd
return (FileHdr file (read seed) (read minIoSize) (read maxIoSize)
(read ioCount))
doHdrFromFile :: Handle -> IO FileIO
doHdrFromFile fd = do
-- fd <- openBinaryFile name ReadWriteMode -- (Just ownerModes)
(OpenFileFlags {append=False, exclusive=False, noctty=True,
nonBlock=False, trunc=False})
hSeek fd AbsoluteSeek 0
hdr <- doGetHdr fd
return (FileIO fd hdr [])
doHdrWrite :: Handle -> [Char] -> IO ()
doHdrWrite fd str = do
hPutStr fd (take 100 (str ++ "\n" ++ (cycle ['\0'])))
doWriteHdr :: Handle -> FileHdr -> IO ()
doWriteHdr fd hdr = do
doHdrWrite fd (show (fileName hdr))
doHdrWrite fd (show (seed hdr))
doHdrWrite fd (show (minIoSize hdr))
doHdrWrite fd (show (maxIoSize hdr))
doHdrWrite fd (show (ioCount hdr))
doWriteFile :: FilePath -> IO FileIO
doWriteFile name = do
fd <- openBinaryFile name ReadWriteMode -- (Just ownerModes)
(OpenFileFlags {append=False, exclusive=False, noctty=True,
nonBlock=False, trunc=True})
seed <- newStdGen
hdr <- return (FileHdr name seed 4096 (2*8096) 200)
doWriteHdr fd hdr
return (FileIO fd hdr [])
overLap (IoLoc off1' sz1' num1) (IoLoc off2' sz2' num2) =
((off1 > off2) && (off1 < off2 +sz2)) ||((off1+sz1 >off2)
&&(off1+sz1 < off2 +sz2))
where
off1 = read (show off1')
sz1 = read (show sz1')
off2 = read (show off2')
sz2 = read (show sz2')
genIoList1 :: [IoLoc] -> [Int] -> [Int] -> Int -> Int -> [IoLoc]
genIoList1 list offset size _ 0 =
list
genIoList1 list offset size 0 _ =
list
genIoList1 list (offset:os) (size:ss) count bound =
if isNothing $ find (overLap x) list
then genIoList1 ([x] ++ list) os ss (count - 1) (bound -1)
else genIoList1 list os ss count (bound -1)
where
x = IoLoc (read (show offset)) (read(show size)) count
-- offset = (read (show offset1))
-- size = (read(show size1))
genIoList :: FileHdr -> [IoLoc]
genIoList (FileHdr name seed min max count) =
genIoList1 [] (myrandomRlist 4000 1099511627776 seed)
(myrandomRlist min max seed) count (count*2)
doActualIo :: Handle -> IoLoc -> L.ByteString -> IO ()
doActualIo fd (IoLoc off sz num) str = do
-- hSeek fd AbsoluteSeek off
rfd <- (handleToFd fd)
fdSeek rfd AbsoluteSeek off
L.hPut fd str
return ()
doVerifyIo :: Handle -> IoLoc -> L.ByteString -> IO ()
doVerifyIo fd (IoLoc off sz num) str = do
-- hSeek fd AbsoluteSeek off
rfd <- (handleToFd fd)
fdSeek rfd AbsoluteSeek off
filedata <- L.hGet fd sz
if str == filedata
then return ()
else throwIO (Corrupt ("Data corruption in #" ++ (take 200
(L.unpack str)) ++ "#" ++ (take 200 (L.unpack filedata)))(IoLoc off
sz num))
mainWrite file = do
(FileIO hd _ _) <- doWriteFile file
hdrIO <- doHdrFromFile hd
check hd
(FileIO _ hdr _) <- return hdrIO
check hd
iolst <- return $ genIoList hdr
app <- return $ zip iolst (map (genPattern (fileName hdr)) iolst)
putStrLn (fileName hdr)
mapM (\(x,y)-> doActualIo (fd hdrIO) x y) app
mapM (\(x,y)-> doVerifyIo (fd hdrIO) x y) app
-- hClose hd
return hdr
mainVerify file = do
hd <- openBinaryFile file ReadWriteMode
hdrIO <- doHdrFromFile hd
hdr <- return (params hdrIO)
iolst <- return $ genIoList hdr
file <- return $ filter (\x->not ( x=='"')) (fileName hdr)
app <- return $ zip iolst (map (genPattern file) iolst)
putStrLn $ filter (\x->not ( x=='"')) (fileName hdr)
mapM (\(x,y)-> doVerifyIo (fd hdrIO) x y) app
-- hClose hd
return hdr
main = do
x <- getArgs
if (length x) == 3
then do
main1
else do
putStrLn "USAGE:\nfile-io write/verify <full-path file name>
<number of threads> \
\\n\n\
\Simple IO load generator with write verification. This utility is designed\n\
\to generate multi-threaded IO load which will write a pattern to the file. \n\
\When this is invoked with the same parameters with the verify option \n\
\the data written will be verified.\n\
\ "
main1 = do
[op, name, numProcs] <- getArgs
m <- newEmptyMVar
n <- return $read numProcs
case op of
"write" -> do mapM forkIO [(fillfile (name ++ (show i)) ) m|i<-[1..n]]
"verify" -> do mapM forkIO [(verifyfile (name ++ (show i)) ) m|i<-[1..n]]
x <- mapM takeMVar $ take (read numProcs) $ repeat m
putStrLn $ show $ and x
fillfile filename m = do
mainWrite filename
putMVar m True
verifyfile filename m = do
mainVerify filename
putMVar m True
On Tue, Jan 11, 2011 at 1:02 AM, Anand Mitra <[email protected]> wrote:
> Hi,
>
> I had a requirement to generate various kinds of I/O patterns on a
> filesystem and subsequently verify this. The initial version of the
> program below implements a random I/O pattern with multiple threads.
> Even when the number of I/O is as small as 200 and 10 concurrent
> theads, the amount of memory used is huge. When I run the program it
> consumes close to 1to2GB memory. Moreover the rate at which it
> generates the I/O is very low which is not good for testing a
> filesystem. I have used System.POSIX.IO but I tried System.IO and did
> not see much difference either. I would appreciate help in identifying
> ways to improve this.
>
>
------------------------------
Message: 3
Date: Tue, 11 Jan 2011 10:08:07 +0000
From: Stephen Tetley <[email protected]>
Subject: Re: [Haskell-beginners] filesystem verification utility
Cc: [email protected]
Message-ID:
<[email protected]>
Content-Type: text/plain; charset=ISO-8859-1
Hi Anand
Firstly apologies - my advice from yesterday was trivial advice,
changing to a better representation of Strings and avoiding costly
operations (++) is valuable and should improve the performance of the
program, but it might be a small overall improvement and it doesn't
get to the heart of things.
Really you need to do two things - one is consider what you are doing
and evaluate whether it is appropriate for a performance sensitive
app, the other is to profile and find the bits that are too slow.
I rarely use Control.Concurrent so I can't offer any real experience
but I'm concerned that it is adding overhead for no benefit. Looking
at the code and what the comments say it does - I don't think your
situation benefits from concurrency. A thread in your program could do
all is work in one go, its not that you need to be servicing many
clients (cf. a web server that needs to service many clients without
individual long waits so it makes sense to schedule them) or that you
are waiting on other processes making resources available. So for your
program, any changes to execution caused by scheduling / de-scheduling
threads (probably) just add to the total time.
If you have a multi-core machine you could potentially benefit from
parallelism - splitting the work amongst available cores. But in GHC
forkIO spawns "green threads" which run in the same OS thread so you
won't be getting any automatic parallelism from the code even if you
have multi-core.
However don't take my word for this - I could easily be wrong. If you
want performance you really do need to see what the profiler tells
you.
Best wishes
Stephen
------------------------------
_______________________________________________
Beginners mailing list
[email protected]
http://www.haskell.org/mailman/listinfo/beginners
End of Beginners Digest, Vol 31, Issue 9
****************************************