Patch: sync ByteString with FPS head

Donald Bruce Stewart Wed, 03 May 2006 04:14:36 -0700

Attached.

-- Don

New patches:


[Sync with FPS head
[EMAIL PROTECTED]
 
 This patch brings Data.ByteString into sync with the FPS head.
 The most significant of which is the new Haskell counting sort.
 
 Changes:
 
 Sun Apr 30 18:16:29 EST 2006  [EMAIL PROTECTED]
   * Fix foldr1 in Data.ByteString and Data.ByteString.Char8
 
 Mon May  1 11:51:16 EST 2006  Don Stewart <[EMAIL PROTECTED]>
   * Add group and groupBy. Suggested by conversation between sjanssen and 
petekaz on #haskell
 
 Mon May  1 16:42:04 EST 2006  [EMAIL PROTECTED]
   * Fix groupBy to match Data.List.groupBy.
 
 Wed May  3 15:01:07 EST 2006  [EMAIL PROTECTED]
   * Migrate to counting sort.
   
   Data.ByteString.sort used C's qsort(), which is O(n log n).  The new 
algorithm 
   is O(n), and is faster for strings larger than approximately thirty bytes.  
We
   also reduce our dependency on cbits!
 
] {
hunk ./Data/ByteString/Char8.hs 1
-{-# OPTIONS_GHC -cpp -fffi #-}
+{-# OPTIONS_GHC -cpp -fffi -fglasgow-exts #-}
hunk ./Data/ByteString/Char8.hs 101
+        spanChar,           -- :: Char -> ByteString -> (ByteString, 
ByteString)
hunk ./Data/ByteString/Char8.hs 112
+        group,                  -- :: ByteString -> [ByteString]
+        groupBy,                -- :: (Word8 -> Word8 -> Bool) -> ByteString 
-> [ByteString]
hunk ./Data/ByteString/Char8.hs 238
-                       ,findSubstrings,unsafeTail,copy
+                       ,findSubstrings,unsafeTail,copy,group
hunk ./Data/ByteString/Char8.hs 371
-foldr1 f ps = w2c (B.foldl1 (\x y -> c2w (f (w2c x) (w2c y))) ps)
+foldr1 f ps = w2c (B.foldr1 (\x y -> c2w (f (w2c x) (w2c y))) ps)
hunk ./Data/ByteString/Char8.hs 492
+-- | 'spanChar' breaks its ByteString argument at the first
+-- occurence of a Char other than its argument. It is more efficient
+-- than 'span (==)'
+--
+-- > span  (=='c') "abcd" == spanByte 'c' "abcd"
+--
+spanChar :: Char -> ByteString -> (ByteString, ByteString)
+spanChar = B.spanByte . c2w
+{-# INLINE spanChar #-}
+
hunk ./Data/ByteString/Char8.hs 573
+-- | The 'groupBy' function is the non-overloaded version of 'group'.
+groupBy :: (Char -> Char -> Bool) -> ByteString -> [ByteString]
+groupBy k = B.groupBy (\a b -> k (w2c a) (w2c b))
+
hunk ./Data/ByteString/Char8.hs 628
+-- 
+-- Also
+--  
+-- > count '\n' == length . lines
hunk ./Data/ByteString/Char8.hs 809
+--
hunk ./Data/ByteString.hs 1
-{-# OPTIONS_GHC -cpp -fffi #-}
+{-# OPTIONS_GHC -cpp -fffi -fglasgow-exts #-}
hunk ./Data/ByteString.hs 102
+        spanByte,               -- :: Word8 -> ByteString -> (ByteString, 
ByteString)
hunk ./Data/ByteString.hs 110
+        group,                  -- :: ByteString -> [ByteString]
+        groupBy,                -- :: (Word8 -> Word8 -> Bool) -> ByteString 
-> [ByteString]
hunk ./Data/ByteString.hs 246
+import Control.Monad            (when)
hunk ./Data/ByteString.hs 654
-    | otherwise      = f (unsafeHead ps) (foldr1 f (unsafeTail ps))
+    | otherwise      = foldr f (last ps) (init ps)
hunk ./Data/ByteString.hs 880
+-- | 'spanByte' breaks its ByteString argument at the first
+-- occurence of a byte other than its argument. It is more efficient
+-- than 'span (==)'
+--
+-- > span  (=='c') "abcd" == spanByte 'c' "abcd"
+--
+spanByte :: Word8 -> ByteString -> (ByteString, ByteString)
+spanByte c ps@(PS x s l) = inlinePerformIO $ withForeignPtr x $ \p ->
+    go (p `plusPtr` s) 0
+  where
+    STRICT2(go)
+    go p i | i >= l    = return (ps, empty)
+           | otherwise = do c' <- peekByteOff p i
+                            if c /= c'
+                                then return (take i ps, drop i ps)
+                                else go p (i+1)
+{-# INLINE spanByte #-}
+
hunk ./Data/ByteString.hs 935
-span  p ps = break (not . p) ps
+span p ps = break (not . p) ps
hunk ./Data/ByteString.hs 1062
+-- | The 'group' function takes a ByteString and returns a list of
+-- ByteStrings such that the concatenation of the result is equal to the
+-- argument.  Moreover, each sublist in the result contains only equal
+-- elements.  For example,
+--
+-- > group "Mississippi" = ["M","i","ss","i","ss","i","pp","i"]
+--
+-- It is a special case of 'groupBy', which allows the programmer to
+-- supply their own equality test. It is about 40% faster than 
+-- /groupBy (==)/
+group :: ByteString -> [ByteString]
+group xs
+    | null xs   = []
+    | otherwise = ys : group zs
+    where
+        (ys, zs) = spanByte (unsafeHead xs) xs
+
+-- | The 'groupBy' function is the non-overloaded version of 'group'.
+groupBy :: (Word8 -> Word8 -> Bool) -> ByteString -> [ByteString]
+groupBy k xs
+    | null xs   = []
+    | otherwise = take n xs : groupBy k (drop n xs)
+    where
+        n = 1 + findIndexOrEnd (not . k (unsafeHead xs)) (unsafeTail xs)
+
hunk ./Data/ByteString.hs 1440
--- | /O(n log(n))/ Sort a ByteString efficiently, using qsort(3).
+-- | /O(n)/ Sort a ByteString efficiently, using counting sort.
+sort :: ByteString -> ByteString
+sort (PS input s l) = create l $ \p -> allocaArray 256 $ \arr -> do
+
+    memset (castPtr arr) 0 (256 * fromIntegral (sizeOf (undefined :: CSize)))
+    withForeignPtr input (\x -> countEach arr (x `plusPtr` s) l)
+
+    let STRICT2(go)
+        go 256 _   = return ()
+        go i   ptr = do n <- peekElemOff arr i
+                        when (n /= 0) $ memset ptr (fromIntegral i) n >> 
return ()
+                        go (i + 1) (ptr `plusPtr` (fromIntegral n))
+    go 0 p
+
+-- "countEach counts str l" counts the number of occurences of each Word8 in
+-- str, and stores the result in counts.
+countEach :: Ptr CSize -> Ptr Word8 -> Int -> IO ()
+STRICT3(countEach)
+countEach counts str l = go 0
+ where
+    STRICT1(go)
+    go i | i == l    = return ()
+         | otherwise = do k <- fromIntegral `fmap` peekElemOff str i
+                          x <- peekElemOff counts k
+                          pokeElemOff counts k (x + 1)
+                          go (i + 1)
+
+{-
hunk ./Data/ByteString.hs 1472
+-}
hunk ./Data/ByteString.hs 1478
+-- | The 'sortBy' function is the non-overloaded version of 'sort'.
+--
+-- Try some linear sorts: radix, counting
+-- Or mergesort.
+--
+-- sortBy :: (Word8 -> Word8 -> Ordering) -> ByteString -> ByteString
+-- sortBy f ps = undefined
+
hunk ./Data/ByteString.hs 1949
--- the ByteString is no reallocated if the final size is less than the
--- estimated size.
+-- the ByteString is not reallocated if the final size is less than the
+-- estimated size. Also, unlike 'generate' ByteString's created this way
+-- are managed on the Haskell heap.
hunk ./Data/ByteString.hs 2058
-foreign import ccall unsafe "static fpstring.h my_qsort" c_qsort
-    :: Ptr Word8 -> Int -> IO ()
-
hunk ./cbits/fpstring.c 43
-
-/* compare bytes ascii-wise */
-static int cmp(const void *p, const void *q) {
-    return (*(unsigned char *)p - *(unsigned char *)q);
-}
-
-/* quicksort wrapper */
-void my_qsort(unsigned char *base, size_t size) {
-    qsort(base, size, sizeof(char), cmp);
-}
hunk ./include/fpstring.h 1
-#include <stdlib.h>
hunk ./include/fpstring.h 3
-void my_qsort(unsigned char *base, size_t size);
}

Context:

[Fix string truncating in hGetLine -- it was a pasto from Simon's code
Simon Marlow <[EMAIL PROTECTED]>**20060503103504
 (from Don Stewart)
] 
[Merge in Data.ByteString head. Fixes ByteString+cbits in hugs
Don Stewart <[EMAIL PROTECTED]>**20060429040733] 
[Import Data.ByteString from fps 0.5.
Don Stewart <[EMAIL PROTECTED]>**20060428130718
 Fast, packed byte vectors, providing a better PackedString.
 
] 
[fix previous patch
Ross Paterson <[EMAIL PROTECTED]>**20060501154847] 
[fixes for non-GHC
Ross Paterson <[EMAIL PROTECTED]>**20060501144322] 
[fix imports for mingw32 && !GHC
Ross Paterson <[EMAIL PROTECTED]>**20060427163248] 
[RequireOrder: do not collect unrecognised options after a non-opt
Simon Marlow <[EMAIL PROTECTED]>**20060426121110
 The documentation for RequireOrder says "no option processing after
 first non-option", so it doesn't seem right that we should process the
 rest of the arguments to collect the unrecognised ones.  Presumably
 the client wants to know about the unrecognised options up to the
 first non-option, and will be using a different option parser for the
 rest of the command line.
 
 eg. before:
 
 Prelude System.Console.GetOpt> getOpt' RequireOrder [] ["bar","--foo"]
 ([],["bar","--foo"],["--foo"],[])
 
 after:
 
 Prelude System.Console.GetOpt> getOpt' RequireOrder [] ["bar","--foo"]
 ([],["bar","--foo"],[],[])
] 
[fix for Haddock 0.7
Ashley Yakeley <[EMAIL PROTECTED]>**20060426072521] 
[add Data.Fixed module
Ashley Yakeley <[EMAIL PROTECTED]>**20060425071853] 
[add instances
Ross Paterson <[EMAIL PROTECTED]>**20060424102146] 
[add superclasses to Applicative and Traversable
Ross Paterson <[EMAIL PROTECTED]>**20060411144734
 
 Functor is now a superclass of Applicative, and Functor and Foldable
 are now superclasses of Traversable.  The new hierarchy makes clear the
 inclusions between the classes, but means more work in defining instances.
 Default definitions are provided to help.
] 
[add Functor and Monad instances for Prelude types
Ross Paterson <[EMAIL PROTECTED]>**20060410111443] 
[GHC.Base.breakpoint
Lemmih <[EMAIL PROTECTED]>**20060407125827] 
[Track the GHC source tree reorganisation
Simon Marlow <[EMAIL PROTECTED]>**20060407041631] 
[in the show instance for Exception, print the type of dynamic exceptions
Simon Marlow <[EMAIL PROTECTED]>**20060406112444
 Unfortunately this requires some recursve module hackery to get at 
 the show instance for Typeable.
] 
[implement ForeignEnvPtr, newForeignPtrEnv, addForeignPtrEnv for GHC
Simon Marlow <[EMAIL PROTECTED]>**20060405155448] 
[add  forkOnIO :: Int -> IO () -> IO ThreadId
Simon Marlow <[EMAIL PROTECTED]>**20060327135018] 
[Rework previous: not a gcc bug after all
Simon Marlow <[EMAIL PROTECTED]>**20060323161229
 It turns out that we were relying on behaviour that is undefined in C,
 and undefined behaviour in C means "the compiler can do whatever the
 hell it likes with your entire program".  So avoid that.
] 
[work around a gcc 4.1.0 codegen bug in -O2 by forcing -O1 for GHC.Show
Simon Marlow <[EMAIL PROTECTED]>**20060323134514
 See http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26824
] 
[commit mysteriously missing parts of "runIOFastExit" patch
Simon Marlow <[EMAIL PROTECTED]>**20060321101535] 
[add runIOFastExit :: IO a -> IO a
Simon Marlow <[EMAIL PROTECTED]>**20060320124333
 Similar to runIO, but calls stg_exit() directly to exit, rather than
 shutdownHaskellAndExit().  Needed for running GHCi in the test suite.
] 
[Fix a broken invariant
Simon Marlow <[EMAIL PROTECTED]>**20060316134151
 Patch from #694,  for the problem "empty is an identity for <> and $$" is
 currently broken by eg. isEmpty (empty<>empty)"
] 
[Add unsafeSTToIO :: ST s a -> IO a
Simon Marlow <[EMAIL PROTECTED]>**20060315160232
 Implementation for Hugs is missing, but should be easy.  We need this
 for the forthcoming nested data parallelism implementation.
] 
[Added 'alter'
[EMAIL PROTECTED]
 Added 'alter :: (Maybe a -> Maybe a) -> k -> Map k a -> Map k a' to IntMap and 
Map
 This addresses ticket #665
] 
[deprecate FunctorM in favour of Foldable and Traversable
Ross Paterson <[EMAIL PROTECTED]>**20060315092942
 as discussed on the libraries list.
] 
[Simplify Eq, Ord, and Show instances for UArray
Simon Marlow <[EMAIL PROTECTED]>**20060313142701
 The Eq, Ord, and Show instances of UArray were written out longhand
 with one instance per element type.  It is possible to condense these
 into a single instance for each class, at the expense of using more
 extensions (non-std context on instance declaration).
 
 Suggestion by: Frederik Eaton <[EMAIL PROTECTED]>
 
] 
[Oops typo in intSet notMember 
[EMAIL PROTECTED] 
[IntMap lookup now returns monad instead of Maybe.
[EMAIL PROTECTED] 
[Added notMember to Data.IntSet and Data.IntMap
[EMAIL PROTECTED] 
[add Data.Set.notMember and Data.Map.notMember
John Meacham <[EMAIL PROTECTED]>**20060309191806] 
[addToClockTime: handle picoseconds properly
Simon Marlow <[EMAIL PROTECTED]>**20060310114532
 fixes #588
] 
[make head/build rule apply to all types, not just Bool.
John Meacham <[EMAIL PROTECTED]>**20060303045753] 
[Avoid overflow when normalising clock times
Ian Lynagh <[EMAIL PROTECTED]>**20060210144638] 
[Years have 365 days, not 30*365
Ian Lynagh <[EMAIL PROTECTED]>**20060210142853] 
[declare blkcmp() static
Simon Marlow <[EMAIL PROTECTED]>**20060223134317] 
[typo in comment in Foldable class
Ross Paterson <[EMAIL PROTECTED]>**20060209004901] 
[simplify fmap
Ross Paterson <[EMAIL PROTECTED]>**20060206095048] 
[update ref in comment
Ross Paterson <[EMAIL PROTECTED]>**20060206095139] 
[Give -foverlapping-instances to Data.Typeable
[EMAIL PROTECTED]
 
 For some time, GHC has made -fallow-overlapping-instances "sticky": 
 any instance in a module compiled with -fallow-overlapping-instances
 can overlap when imported, regardless of whether the importing module
 allows overlap.  (If there is an overlap, both instances must come from
 modules thus compiled.)
 
 Instances in Data.Typeable might well want to be overlapped, so this
 commit adds the flag to Data.Typeable (with an explanatory comment)
 
 
] 
[Add -fno-bang-patterns to modules using both bang and glasgow-exts
[EMAIL PROTECTED] 
[When splitting a bucket, keep the contents in the same order
Simon Marlow <[EMAIL PROTECTED]>**20060201130427
 To retain the property that multiple inserts shadow each other
 (see ticket #661, test hash001)
] 
[add foldr/build optimisation for take and replicate
Simon Marlow <[EMAIL PROTECTED]>**20060126164603
 This allows take to be deforested, and improves performance of
 replicate and replicateM/replicateM_.  We have a separate problem that
 means expressions involving [n..m] aren't being completely optimised
 because eftIntFB isn't being inlined but otherwise the results look
 good.  
 
 Sadly this has invalidated a number of the nofib benchmarks which were
 erroneously using take to duplicate work in a misguided attempt to
 lengthen their runtimes (ToDo).
] 
[Generate PrimopWrappers.hs with Haddock docs
Simon Marlow <[EMAIL PROTECTED]>**20060124131121
 Patch originally from Dinko Tenev <[EMAIL PROTECTED]>, modified
 to add log message by me.
] 
[[project @ 2006-01-19 14:47:15 by ross]
ross**20060119144715
 backport warning avoidance from Haddock
] 
[[project @ 2006-01-18 11:45:47 by malcolm]
malcolm**20060118114547
 Fix import of Ix for nhc98.
] 
[[project @ 2006-01-17 09:38:38 by ross]
ross**20060117093838
 add Ix instance for GeneralCategory.
] 
[TAG Initial conversion from CVS complete
John Goerzen <[EMAIL PROTECTED]>**20060112154126] 
Patch bundle hash:
1f39760666c5d088ade4fae63e54ab368114f3e4

_______________________________________________
Cvs-ghc mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/cvs-ghc

Patch: sync ByteString with FPS head

Reply via email to