Re: [Haskell-cafe] bytestring vs. uvector
On Mon, 2009-03-09 at 18:29 -0700, Alexander Dunlap wrote: > Thanks for all of the responses! > > So let me see if my summary is accurate here: > > - ByteString is for just that: strings of bytes, generally read off of > a disk. The Char8 version just interprets the Word8s as Chars but > doesn't do anything special with that. Right. So it's only suitable for binary or ASCII (or mixed) formats. > - Data.Text/text library is a higher-level library that deals with > "text," abstracting over Unicode details and treating each element as > a potentially-multibye "character." If you're writing about this on the wiki for people, it's best not to confuse the issue by talking about multibyte anything. We do not describe [Char] as a multibyte encoding of Unicode. We say it is a Unicode string. The abstraction is at the level of Unicode code points. The String type *is* a sequence of Unicode code points. This is exactly the same for Data.Text. It is a sequence of Unicode code points. It is not an encoding. It is not UTF-anything. It does not abstract over Unicode. The Text type is just like the String type but with different strictness and performance characteristics. Both are just sequences of Unicode code points. There is a reasonably close correspondence between Unicode code points and what people normally think of as characters. > - utf8-string is a wrapper over ByteString that interprets the bytes > in the bytestring as potentially-multibye unicode "characters." This on the other hand is an encoding. ByteString is a sequence of bytes and when we interpret that as UTF-8 then we are looking at an encoding of a sequence of Unicode code points. Clear as mud? :-) Duncan ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] bytestring vs. uvector
On Mon, 9 Mar 2009, Alexander Dunlap wrote: - uvector, storablevector and vector are all designed for dealing with arrays. They *can* be used for characters/word8s but are not specialized for that purpose, do not deal with Unicode at all, and are probably worse at it. They are better for dealing with things that you would generally use arrays for. Since the Storable instance of Char stores 32 bit values, I expect they store full Unicode codes and thus StorableVector can be used for Unicode strings without loss. They will require more memory than UTF8 strings, but whether they are faster or slower will depend on how expensive UTF8 decoding/encoding is. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] bytestring vs. uvector
manlio_perillo: > Bryan O'Sullivan ha scritto: >> [...] >> text is not mature, and is based on the same modern fusion framework as >> uvector and vector. It uses unpinned arrays, but provides functions for >> dealing with foreign code. > > What is the reason why you have decided to use unpinned arrays > (ByteArray#) instead of pinned arrays (Foreign.Ptr)? They prevent heap fragmentation (and in general are faster). -- Don ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] bytestring vs. uvector
Bryan O'Sullivan ha scritto: [...] text is not mature, and is based on the same modern fusion framework as uvector and vector. It uses unpinned arrays, but provides functions for dealing with foreign code. What is the reason why you have decided to use unpinned arrays (ByteArray#) instead of pinned arrays (Foreign.Ptr)? > [...] Thanks Manlio ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] bytestring vs. uvector
Don Stewart ha scritto: [...] You also have to add instance for UIO: instance (RealFloat a, UIO a) => UIO (Complex a) where hPutU h (UAComplex arr) = hPutU h arr hGetU h = do arr <- hGetU h return (UAComplex arr) With Storable, this should not be required; you just have to write an instance for the Storable class. Though you get no IO operations with Storable... UIO is entirely separate. Yes, but using Storable and Foreign.Ptr, IO is rather simple. From storablevector package: -- | Outputs a 'Vector' to the specified 'Handle'. hPut :: (Storable a) => Handle -> Vector a -> IO () hPut h v = if null v then return () else let (fptr, s, l) = toForeignPtr v in withForeignPtr fptr $ \ ptr -> let ptrS = advancePtr ptr s ptrE = advancePtr ptrS l -- use advancePtr and minusPtr in order to respect -- alignment in hPutBuf h ptrS (minusPtr ptrE ptrS) -- | Read a 'Vector' directly from the specified 'Handle'. This -- is far more efficient than reading the characters into a list -- and then using 'pack'. -- hGet :: (Storable a) => Handle -> Int -> IO (Vector a) hGet _ 0 = return empty hGet h i = createAndTrim i $ \p -> let elemType :: Ptr a -> a elemType _ = undefined sizeOfElem = sizeOf (elemType p) in fmap (flip div sizeOfElem) $ hGetBuf h p (i * sizeOfElem) Regards Manlio ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] bytestring vs. uvector
manlio_perillo: > Don Stewart ha scritto: >> bulat.ziganshin: >>> Hello Don, >>> >>> Wednesday, March 11, 2009, 12:12:07 AM, you wrote: >>> Right, so my point stands: there's no difference now. If you can write a Storable instance, you can write a UA et al instance. >>> yes, if there is some class provided for this and not just hard-coded >>> 4 or so base types >> >> That's right. For example (supporting even pairs): >> >> instance (RealFloat a, UA a) => UA (Complex a) where >> >> newtype UArr (Complex a) = UAComplex (UArr (a :*: a)) >> newtype MUArr (Complex a) s = MUAComplex (MUArr (a :*: a) s) >> > > You also have to add instance for UIO: > > instance (RealFloat a, UIO a) => UIO (Complex a) where > hPutU h (UAComplex arr) = hPutU h arr > hGetU h = do arr <- hGetU h > return (UAComplex arr) > > > With Storable, this should not be required; you just have to write an > instance for the Storable class. > Though you get no IO operations with Storable... UIO is entirely separate. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] bytestring vs. uvector
Manlio Perillo ha scritto: [...] uvector package also suppors Complex and Rational, however the support for these type is "hard written", using a UAProd class, and requires some boiler plate code (IMHO). Correction: UAProd is not a class, sorry. It is the UA constructor overloaded for a:*:b, and Complex and Rational just reuse this specialization. > [...] Manlio ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] bytestring vs. uvector
Don Stewart ha scritto: bulat.ziganshin: Hello Don, Wednesday, March 11, 2009, 12:12:07 AM, you wrote: Right, so my point stands: there's no difference now. If you can write a Storable instance, you can write a UA et al instance. yes, if there is some class provided for this and not just hard-coded 4 or so base types That's right. For example (supporting even pairs): instance (RealFloat a, UA a) => UA (Complex a) where newtype UArr (Complex a) = UAComplex (UArr (a :*: a)) newtype MUArr (Complex a) s = MUAComplex (MUArr (a :*: a) s) You also have to add instance for UIO: instance (RealFloat a, UIO a) => UIO (Complex a) where hPutU h (UAComplex arr) = hPutU h arr hGetU h = do arr <- hGetU h return (UAComplex arr) With Storable, this should not be required; you just have to write an instance for the Storable class. Regards Manlio ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] bytestring vs. uvector
Don Stewart ha scritto: [...] I think uvector only works with certain types that can be unboxed, while storablevector works with all types that instantiate Foreign.Storable.Storable. I don't know about vector. From the description of vector, I have the One of the nice feature of uvector is the support for UArr (a :*: b). An UArr (a :*: b) can be easily (with fstU and sndU) transformed in UArr a and UArr b. uvector package also suppors Complex and Rational, however the support for these type is "hard written", using a UAProd class, and requires some boiler plate code (IMHO). I find StorableVector implementation much more simple; I would like to see it in the Haskell Platform. As for Data.Parallel, uvector and vector, it seems there is some code duplication. Both Data.Parallel and uvector, make us of a strict pair type. Such a type is also implemented in the strict package [1]. The authors are the same, so I don't understand the reason of code replication. There is also replication in the definition of the Stream data type. [1] there seems to be an error in the documentation: http://hackage.haskell.org/packages/archive/strict/0.3.2/doc/html/Data-Strict-Tuple.html In the description, there is: "Same as regular Haskell pairs, but (x :*: _|_) = (_|_ :*: y) = _|_" but in the synopsis, the data constructor is :!:, not :*:. Regards Manlio Perillo ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] bytestring vs. uvector
Hi, thanks for the hint. I'll see what I can do with it. Xiao-Yong Bulat Ziganshin writes: > Hello Xiao-Yong, > > Wednesday, March 11, 2009, 12:28:45 AM, you wrote: > >> It goes beyond my current knowledge, now. How do you define >> a custom data type as an instance of UA or Storable? > > just look at existing instances. basically, for complex data type, you > just use instances for its basic types, plus you need to calculate > offset of second and following fields (using sizeOf in Storable class) -- c/*__o/* <\ * (__ */\ < ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] bytestring vs. uvector
bulat.ziganshin: > Hello Don, > > Wednesday, March 11, 2009, 12:48:35 AM, you wrote: > > >> unfortunately, Array library unboxed arrays still aren't based on any > >> Unboxable *class* > > > Hmm. Aren't all the array library types based on MArray and IArray? > > > So I can define my own say, new STUArray element type by writing an > > instance of > > MArray for it. Like so: > > yes, you can, just this definition duplicates too much code Ah yes. Then in that case that is indeed true. Array / UArray aren't as extensible as the newer systems modelled on Storable (like UA in vector/uvector). ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] bytestring vs. uvector
bulat.ziganshin: > Hello Don, > > Wednesday, March 11, 2009, 12:12:07 AM, you wrote: > > > Right, so my point stands: there's no difference now. If you can write a > > Storable instance, you can write a UA et al instance. > > yes, if there is some class provided for this and not just hard-coded > 4 or so base types That's right. For example (supporting even pairs): instance (RealFloat a, UA a) => UA (Complex a) where newtype UArr (Complex a) = UAComplex (UArr (a :*: a)) newtype MUArr (Complex a) s = MUAComplex (MUArr (a :*: a) s) > > And GHC 6.6 was released what, 11 October 2006? So this has been the > > case for a long time. > > unfortunately, Array library unboxed arrays still aren't based on any > Unboxable *class* Hmm. Aren't all the array library types based on MArray and IArray? So I can define my own say, new STUArray element type by writing an instance of MArray for it. Like so: {-# LANGUAGE MultiParamTypeClasses #-} {-# LANGUAGE MagicHash #-} {-# LANGUAGE UnboxedTuples #-} {-# LANGUAGE CPP #-} -- get at low level representation stuff import Data.Array.Base import GHC.IOBase import GHC.ST ( ST(..), runST ) import GHC.Prim import GHC.Base import GHC.Word import GHC.Ptr import GHC.Float import GHC.Stable import GHC.Int import GHC.Word import Data.Array.Unboxed -- helpers import Data.Bits import Text.Printf import System.Environment import Control.Monad -- portable to 32 bit or 64 bit #include -- define a new data type we wish to store in unboxed arrays data Boolean = IsTrue | IsFalse deriving (Eq, Ord, Enum, Show, Bounded) -- write a program using an unboxed array of these things main = do n <- getArgs >>= readIO . head :: IO Int mapM_ (\i -> sieve (1 `shiftL` (n-i))) [0, 1, 2] -- Nsieve with bit packing of a custom MyBool type. sieve n = do let r = runST (do a <- newArray (2,n) IsTrue :: ST s (STUArray s Int Boolean) go a n 2 0) printf "Primes up to %8d %8d\n" (n::Int) (r::Int) :: IO () go !a !}m !n !c | n == m= return c | otherwise = do e <- unsafeRead a n if e == IsTrue then let loop j | j < m = do x <- unsafeRead a j when (x == IsTrue) $ unsafeWrite a j IsFalse loop (j+n) | otherwise = go a m (n+1) (c+1) in loop (n `shiftL` 1) else go a m (n+1) c -- -- Create a new unboxed representation for MyBool -- We choose to use bit packing, storing them in a W# -- instance MArray (STUArray s) Boolean (ST s) where getBounds (STUArray l u _ _) = return (l,u) getNumElements (STUArray _ _ n _) = return n newArray (l,u) initialValue = ST $ \s1# -> case safeRangeSize (l,u)of { n@(I# n#) -> case newByteArray# (bOOL_SCALE n#) s1# of { (# s2#, marr# #) -> case bOOL_WORD_SCALE n# of { n'# -> let loop i# s3# | i# ==# n'# = s3# | otherwise = case writeWordArray# marr# i# e# s3# of { s4# -> loop (i# +# 1#) s4# } in case loop 0# s2#of { s3# -> (# s3#, STUArray l u n marr# #) where W# e# = if initialValue == IsTrue then maxBound else 0 unsafeNewArray_ (l,u) = unsafeNewArraySTUArray_ (l,u) bOOL_SCALE newArray_ arrBounds = newArray arrBounds IsFalse unsafeRead (STUArray _ _ _ marr#) (I# i#) = ST $ \s1# -> case readWordArray# marr# (bOOL_INDEX i#) s1# of { (# s2#, e# #) -> (# s2#, case (e# `and#` bOOL_BIT i#) `neWord#` int2Word# 0# of True -> IsTrue ; _ -> IsFalse #) } unsafeWrite (STUArray _ _ _ marr#) (I# i#) e = ST $ \s1# -> case bOOL_INDEX i# of { j# -> case readWordArray# marr# j# s1# of { (# s2#, old# #) -> case if e == IsTrue then old# `or#` bOOL_BIT i# else old# `and#` bOOL_NOT_BIT i# of { e# -> case writeWordArray# marr# j# e# s2# of { s3# -> (# s3#, () #) ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] bytestring vs. uvector
Don Stewart writes: >> instance UA UserDefinedDataType >> >> I'm not sure how to do that. Can you give me some >> clarification? > > Yes, you can do that. This is the case for most of the new array > libraries. It goes beyond my current knowledge, now. How do you define a custom data type as an instance of UA or Storable? Xiao-Yong -- c/*__o/* <\ * (__ */\ < ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] bytestring vs. uvector
xj2106: > Don Stewart writes: > > > And what is Storable limited to? > > > > Ultimately they're all limited to the primops for reading and writing, > > and to what types we can encode in those. So: > > > > primop ReadOffAddrOp_Char "readCharOffAddr#" GenPrimOp > > ... > > {- > > instance Storable Double > > instance Storable Bool > > instance Storable Char > > instance Storable Int > > instance Storable Float > > ... > > -} > > > > {- > > > > instance UA () > > instance (UA a, UA b) => UA (a :*: b) > > instance UA Bool > > instance UA Char > > instance UA Int > > instance UA Float > > instance UA Double > > ... > > -} > > > > So what's a type that's Storable, but not writable in UA (or UArray or ..) > > So it's me who understand it wrong. If I want some high > performance array with elements of custom data type, I'm > stuck with Array, anyway? > > Is it possible to make > > instance UA UserDefinedDataType > > I'm not sure how to do that. Can you give me some > clarification? Yes, you can do that. This is the case for most of the new array libraries. -- Don ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] bytestring vs. uvector
bulat.ziganshin: > Hello Don, > > Tuesday, March 10, 2009, 11:01:31 PM, you wrote: > > >> if uavector use ghc's built-in unboxed array operations (as > >> Data.Array.Unboxed does) then it's necessarily bounded to types > >> supported by those operations > > > And what is Storable limited to? > > > Ultimately they're all limited to the primops for reading and writing, > > the full story: > > ghc up to 6.6 has slow access to ForeignArrays, as you may recall > > therefore, those primitives was added. ByteArra# plus those primitives > was the only way to have unboxed arrays with fast access > > starting with 6.6, ForeignArray access is no-op, so we can just use > obvious Ptr operations (via Storable class) to get unboxed arrays fast > access. so, no more need for those special ByteArray# access operations > > but Array library still old, so any effort based on its spources, got > the same restrictions > > also, ByteArray# may be unpinned, but afaik, this isn't really > important - it can be coerced to Ptr for the period of one operation Right, so my point stands: there's no difference now. If you can write a Storable instance, you can write a UA et al instance. And GHC 6.6 was released what, 11 October 2006? So this has been the case for a long time. -- Don ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] bytestring vs. uvector
Don Stewart writes: > And what is Storable limited to? > > Ultimately they're all limited to the primops for reading and writing, > and to what types we can encode in those. So: > > primop ReadOffAddrOp_Char "readCharOffAddr#" GenPrimOp > ... > {- > instance Storable Double > instance Storable Bool > instance Storable Char > instance Storable Int > instance Storable Float > ... > -} > > {- > > instance UA () > instance (UA a, UA b) => UA (a :*: b) > instance UA Bool > instance UA Char > instance UA Int > instance UA Float > instance UA Double > ... > -} > > So what's a type that's Storable, but not writable in UA (or UArray or ..) So it's me who understand it wrong. If I want some high performance array with elements of custom data type, I'm stuck with Array, anyway? Is it possible to make instance UA UserDefinedDataType I'm not sure how to do that. Can you give me some clarification? Thanks, Xiao-Yong -- c/*__o/* <\ * (__ */\ < ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] bytestring vs. uvector
bulat.ziganshin: > Hello Don, > > Tuesday, March 10, 2009, 10:40:30 PM, you wrote: > > >> I think uvector only works with certain types that can be > >> unboxed, while storablevector works with all types that > >> instantiate Foreign.Storable.Storable. I don't know about > >> vector. From the description of vector, I have the > > > That's interesting. I'd expect Storable and UA to have the same set of > > inhabitants. Is there any difference? > > if uavector use ghc's built-in unboxed array operations (as > Data.Array.Unboxed does) then it's necessarily bounded to types > supported by those operations And what is Storable limited to? Ultimately they're all limited to the primops for reading and writing, and to what types we can encode in those. So: primop ReadOffAddrOp_Char "readCharOffAddr#" GenPrimOp primop ReadOffAddrOp_WideChar "readWideCharOffAddr#" GenPrimOp primop ReadOffAddrOp_Int "readIntOffAddr#" GenPrimOp primop ReadOffAddrOp_Word "readWordOffAddr#" GenPrimOp primop ReadOffAddrOp_Addr "readAddrOffAddr#" GenPrimOp primop ReadOffAddrOp_Float "readFloatOffAddr#" GenPrimOp primop ReadOffAddrOp_Double "readDoubleOffAddr#" GenPrimOp primop ReadOffAddrOp_StablePtr "readStablePtrOffAddr#" GenPrimOp primop ReadOffAddrOp_Int8 "readInt8OffAddr#" GenPrimOp primop ReadOffAddrOp_Int16 "readInt16OffAddr#" GenPrimOp primop ReadOffAddrOp_Int32 "readInt32OffAddr#" GenPrimOp primop ReadOffAddrOp_Int64 "readInt64OffAddr#" GenPrimOp primop ReadOffAddrOp_Word8 "readWord8OffAddr#" GenPrimOp primop ReadOffAddrOp_Word16 "readWord16OffAddr#" GenPrimOp primop ReadOffAddrOp_Word32 "readWord32OffAddr#" GenPrimOp primop ReadOffAddrOp_Word64 "readWord64OffAddr#" GenPrimOp {- instance Storable Double instance Storable Bool instance Storable Char instance Storable Int instance Storable Float ... -} {- instance UA () instance (UA a, UA b) => UA (a :*: b) instance UA Bool instance UA Char instance UA Int instance UA Float instance UA Double ... -} So what's a type that's Storable, but not writable in UA (or UArray or ..) -- Don ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] bytestring vs. uvector
xj2106: > Alexander Dunlap writes: > > > - uvector, storablevector and vector are all designed for dealing with > > arrays. They *can* be used for characters/word8s but are not > > specialized for that purpose, do not deal with Unicode at all, and are > > probably worse at it. They are better for dealing with things that you > > would generally use arrays for. > > I think uvector only works with certain types that can be > unboxed, while storablevector works with all types that > instantiate Foreign.Storable.Storable. I don't know about > vector. From the description of vector, I have the That's interesting. I'd expect Storable and UA to have the same set of inhabitants. Is there any difference? > impression that it is quite unstable. How is it compared to > uvector and storablevector? I need one of those to work > with my code to possibly improve the efficiency. But I > can't use uvector, because I can't use unboxed types. Hmm? If you can write a Storable instance, you can write a UA instance. -- Don ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] bytestring vs. uvector
Alexander Dunlap writes: > - uvector, storablevector and vector are all designed for dealing with > arrays. They *can* be used for characters/word8s but are not > specialized for that purpose, do not deal with Unicode at all, and are > probably worse at it. They are better for dealing with things that you > would generally use arrays for. I think uvector only works with certain types that can be unboxed, while storablevector works with all types that instantiate Foreign.Storable.Storable. I don't know about vector. From the description of vector, I have the impression that it is quite unstable. How is it compared to uvector and storablevector? I need one of those to work with my code to possibly improve the efficiency. But I can't use uvector, because I can't use unboxed types. Should I use storablevector? Or vector? Thanks, Xiao-Yong -- c/*__o/* <\ * (__ */\ < ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] bytestring vs. uvector
On Mon, Mar 9, 2009 at 3:12 AM, Henning Thielemann wrote: > > On Mon, 9 Mar 2009, Claus Reinke wrote: > >> Given the close relationship between uvector and vector, it would >> be very helpful if both package descriptions on hackage could point to a >> common haskell wiki page, starting out with the text >> and link above, plus a link to the stream fusion paper (I hadn't been >> aware that vector incorporates the recycling work, and had often wondered >> about the precise relationship between those >> two packages). Apart from saving others from similar confusion, >> that would also provide a place to record experience with those two >> alternatives. > > I have at least started a page which mentions the existing alternatives: > http://www.haskell.org/haskellwiki/Storable_Vector > ___ > Haskell-Cafe mailing list > Haskell-Cafe@haskell.org > http://www.haskell.org/mailman/listinfo/haskell-cafe > Thanks for all of the responses! So let me see if my summary is accurate here: - ByteString is for just that: strings of bytes, generally read off of a disk. The Char8 version just interprets the Word8s as Chars but doesn't do anything special with that. - Data.Text/text library is a higher-level library that deals with "text," abstracting over Unicode details and treating each element as a potentially-multibye "character." - utf8-string is a wrapper over ByteString that interprets the bytes in the bytestring as potentially-multibye unicode "characters." - uvector, storablevector and vector are all designed for dealing with arrays. They *can* be used for characters/word8s but are not specialized for that purpose, do not deal with Unicode at all, and are probably worse at it. They are better for dealing with things that you would generally use arrays for. If that seems accurate, I'll put it on the wiki. Alex ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] bytestring vs. uvector
On Mon, 9 Mar 2009, Claus Reinke wrote: Given the close relationship between uvector and vector, it would be very helpful if both package descriptions on hackage could point to a common haskell wiki page, starting out with the text and link above, plus a link to the stream fusion paper (I hadn't been aware that vector incorporates the recycling work, and had often wondered about the precise relationship between those two packages). Apart from saving others from similar confusion, that would also provide a place to record experience with those two alternatives. I have at least started a page which mentions the existing alternatives: http://www.haskell.org/haskellwiki/Storable_Vector ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] bytestring vs. uvector
On Sat, 7 Mar 2009, Bryan O'Sullivan wrote: On Sat, Mar 7, 2009 at 10:23 PM, Alexander Dunlap wrote: Hi all, For a while now, we have had Data.ByteString[.Lazy][.Char8] for our fast strings. Now we also have Data.Text, which does the same for Unicode. These seem to be the standard for dealing with lists of bytes and characters. Now we also have the storablevector, uvector, and vector packages. These seem to be also useful for unpacked data, *including* Char and Word8 values. What is the difference between bytestring and these new "fast array" libraries? Are the latter just generalizations of the former? storablevector is not mature (I'm not even sure if it's actually used) and is a derivative of an old version of the bytestring library, and so has similar characteristics for interacting with foreign code. It contains some old fusion code that is sketchy in nature and somewhat likely to be broken. I'm not sure I would recommend using this library. As maintainer of storablevector I can tell that I use it for realtime audio signal processing. Indeed, I expected more of the fusion mechanism than it can do. It is hard to get correct fusion on lazy storablevectors at all, because of non-matching chunk sizes. Thus I didn't follow that path anymore. For audio signal processing I use a list type, like that of the Streams approach. When you convert those streams into storable vectors you get efficient inner loops without any fusion. It would certainly be worth to fuse with those Stream lists, but I haven't tried that so far. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] bytestring vs. uvector
On 09/03/2009, at 11:47, Claus Reinke wrote: Btw, have any of the Haskell array optimization researchers considered fixpoints yet? This, for instance, is a very nice paper: http://www.pllab.riec.tohoku.ac.jp/~ohori/research/OhoriSasanoPOPL07.pdf However, in the context of high-performance array programming explicit recursion is bad because it is very hard if not impossible to parallelise automatically except in fairly trivial cases. And if your array program is not parallelisable then you don't really care about performance all that much :-) Roman ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] bytestring vs. uvector
uvector is, if my memory serves me correctly, a fork of the vector library. It uses modern stream fusion, but is under active development and is a little scary. I'm a little unclear on the exact difference between uvector and vector. Both use arrays that are not pinned, so they can't be readily used with foreign code. If you want to use either library, understand that you're embarking on a bracing adventure. vector and uvector are roughly based on the same technology; uvector is - as far as I remember - a fork of some of the old DPH code which uses stream fusion which Don cleaned up and worked on (and it's proven pretty useful, and people are still hacking on it.) vector however, has the notion of 'recycling arrays' when it does array operations. The technique is in fact quite similar to stream fusion. Roman L. built this from scratch I think, so it's quite a bit more unused and less stable than even uvector is maybe, but I suppose you could say it's kind of a superset of uvector. Hopefully though it should mature a little, and the plans are to have the technology from both of these folded into the Data Parallel Haskell project so we get fast array operations+automatic parallelisation. For info, see Roman's paper, 'Recycle your arrays!' http://www.cse.unsw.edu.au/~rl/publications/recycling.html Given the close relationship between uvector and vector, it would be very helpful if both package descriptions on hackage could point to a common haskell wiki page, starting out with the text and link above, plus a link to the stream fusion paper (I hadn't been aware that vector incorporates the recycling work, and had often wondered about the precise relationship between those two packages). Apart from saving others from similar confusion, that would also provide a place to record experience with those two alternatives. Btw, have any of the Haskell array optimization researchers considered fixpoints yet? Both fusion and recycling are based on rewrite rules of the kind "in . out --> id". Now, given a loop like this: loop a = if c a then loop (out (action (in a))) else a loop a these rules don't apply. Unrolling the loop a fixed number of times would enable some rule applications, but still some would remain in the loop body. But with a little rewriting loop a = if c a then loop (out (action (in a))) else out (id (in a)) loop a loop a = if c a then loop (out (action (in a))) else out (id (in a)) (if c a then loop (out (action (in a))) else out (id (in a))) we can now push the out into the next iteration of the loop or, if there is no next iteration, into the loop epilogue loop a = if c (out a) then loop (action (in (out a))) else id (in (out a)) out (if c a then loop (action (in a)) else a) making the rewrite rule applicable loop a = if c (out a) then loop (action a) else id a out (if c a then loop (action (in a)) else a) leading (modulo bugs, omissions, and oversights;-) to a fused/ recycled loop body, with potentially substantial benefit. Claus ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] bytestring vs. uvector
bos: > On Sat, Mar 7, 2009 at 10:23 PM, Alexander Dunlap > wrote: > > Hi all, > > For a while now, we have had Data.ByteString[.Lazy][.Char8] for our > fast strings. Now we also have Data.Text, which does the same for > Unicode. These seem to be the standard for dealing with lists of bytes > and characters. > > Now we also have the storablevector, uvector, and vector packages. > These seem to be also useful for unpacked data, *including* Char and > Word8 values. > > What is the difference between bytestring and these new "fast array" > libraries? Are the latter just generalizations of the former? > > > There are quite a few overlaps and differences among them. > > bytestring is mature and useful for low-level byte buffer manipulations, and > also for efficient I/O. This is in part because it uses pinned pointers that > can interoperate easily with foreign code. It used to have an early fusion > rewriting framework, but that was abandoned. So it will not fuse multiple > ByteString traversals into single loops. This library is widely used, and also > somewhat abused for text I/O. > > storablevector is not mature (I'm not even sure if it's actually used) and is > a > derivative of an old version of the bytestring library, and so has similar > characteristics for interacting with foreign code. It contains some old fusion > code that is sketchy in nature and somewhat likely to be broken. I'm not sure > I > would recommend using this library. > > uvector is, if my memory serves me correctly, a fork of the vector library. It > uses modern stream fusion, but is under active development and is a little > scary. I'm a little unclear on the exact difference between uvector and > vector. > Both use arrays that are not pinned, so they can't be readily used with > foreign > code. If you want to use either library, understand that you're embarking on a > bracing adventure. > > text is not mature, and is based on the same modern fusion framework as > uvector > and vector. It uses unpinned arrays, but provides functions for dealing with > foreign code. It uses a denser encoding than uvector for text, and provides > text-oriented functions like splitting on word and line boundaries. Although > it's intended for use with Unicode text, it does not yet provide proper > Unicode-aware functions for things like case conversion. It interacts with > bytestring to perform conversion to and from standard representations like > UTF-8, and (via the text-icu package) ICU for others (SJIS, KOI-8, etc). If > you > want to use this library, understand that you're embarking on a bracing > adventure. I endorse this message. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] bytestring vs. uvector
Excerpts from Bryan O'Sullivan's message of Sun Mar 08 00:45:03 -0600 2009: > uvector is, if my memory serves me correctly, a fork of the vector library. > It uses modern stream fusion, but is under active development and is a > little scary. I'm a little unclear on the exact difference between uvector > and vector. Both use arrays that are not pinned, so they can't be readily > used with foreign code. If you want to use either library, understand that > you're embarking on a bracing adventure. vector and uvector are roughly based on the same technology; uvector is - as far as I remember - a fork of some of the old DPH code which uses stream fusion which Don cleaned up and worked on (and it's proven pretty useful, and people are still hacking on it.) vector however, has the notion of 'recycling arrays' when it does array operations. The technique is in fact quite similar to stream fusion. Roman L. built this from scratch I think, so it's quite a bit more unused and less stable than even uvector is maybe, but I suppose you could say it's kind of a superset of uvector. Hopefully though it should mature a little, and the plans are to have the technology from both of these folded into the Data Parallel Haskell project so we get fast array operations+automatic parallelisation. For info, see Roman's paper, 'Recycle your arrays!' http://www.cse.unsw.edu.au/~rl/publications/recycling.html Austin ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] bytestring vs. uvector
Excerpts from Alexander Dunlap's message of Sun Mar 08 00:23:01 -0600 2009: > For a while now, we have had Data.ByteString[.Lazy][.Char8] for our > fast strings. Now we also have Data.Text, which does the same for > Unicode. These seem to be the standard for dealing with lists of bytes > and characters. > > Now we also have the storablevector, uvector, and vector packages. > These seem to be also useful for unpacked data, *including* Char and > Word8 values. > > What is the difference between bytestring and these new "fast array" > libraries? Are the latter just generalizations of the former? > > Thanks for any insight anyone can give on this. > > Alex Data.Text provides functions for unicode over bytestrings, with several encoding/decoding methods. So, I think that bytestring+text now solves the general problem with the slow String type - we get various international encodings, and fast, efficient packed strings. (It's also worth mentioning utf8-string, which gives you utf8 over bytestrings. text gives you more encodings and is probably still quite efficient, however.) But this is pretty much a separate effort to that of packages like uvector/vector etc. etc.. To clarify, uvector and vector are likely to be merged in the future I think - vector is based on the idea of 'recycling arrays' so that array operations are still very efficient, while uvector only has the tested stream fusion technique behind it. Actually, I think the inevitable plan is to merge the technology behind both vector and uvector into the Data Parallel Haskell project. Array recylcing and stream fusion goes into creating extremely efficient sequential code, while the vectorisation pass turns that into efficient multicore code at the same time. In any case, I suppose that hypothetically if someone wanted to use a package like uvector to create an efficient string type, they could, but if they want that, why not just use bytestring? It's already optimized, battle tested and in extremely wide use. I think some library proliferation is good; in this case, the libraries mentioned here are really for some different purposes, and that's great, because they all lead to some nice, fast code with low conceptual overhead when put together (hopefully...) But I'm not even going to begin examining/comparing the different array interfaces or anything, because that's been done many times here, so you best check the archives if you want the 'in-depth' on the matter. Austin ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] bytestring vs. uvector
On Sat, Mar 7, 2009 at 10:23 PM, Alexander Dunlap < alexander.dun...@gmail.com> wrote: > Hi all, > > For a while now, we have had Data.ByteString[.Lazy][.Char8] for our > fast strings. Now we also have Data.Text, which does the same for > Unicode. These seem to be the standard for dealing with lists of bytes > and characters. > > Now we also have the storablevector, uvector, and vector packages. > These seem to be also useful for unpacked data, *including* Char and > Word8 values. > > What is the difference between bytestring and these new "fast array" > libraries? Are the latter just generalizations of the former? There are quite a few overlaps and differences among them. bytestring is mature and useful for low-level byte buffer manipulations, and also for efficient I/O. This is in part because it uses pinned pointers that can interoperate easily with foreign code. It used to have an early fusion rewriting framework, but that was abandoned. So it will not fuse multiple ByteString traversals into single loops. This library is widely used, and also somewhat abused for text I/O. storablevector is not mature (I'm not even sure if it's actually used) and is a derivative of an old version of the bytestring library, and so has similar characteristics for interacting with foreign code. It contains some old fusion code that is sketchy in nature and somewhat likely to be broken. I'm not sure I would recommend using this library. uvector is, if my memory serves me correctly, a fork of the vector library. It uses modern stream fusion, but is under active development and is a little scary. I'm a little unclear on the exact difference between uvector and vector. Both use arrays that are not pinned, so they can't be readily used with foreign code. If you want to use either library, understand that you're embarking on a bracing adventure. text is not mature, and is based on the same modern fusion framework as uvector and vector. It uses unpinned arrays, but provides functions for dealing with foreign code. It uses a denser encoding than uvector for text, and provides text-oriented functions like splitting on word and line boundaries. Although it's intended for use with Unicode text, it does not yet provide proper Unicode-aware functions for things like case conversion. It interacts with bytestring to perform conversion to and from standard representations like UTF-8, and (via the text-icu package) ICU for others (SJIS, KOI-8, etc). If you want to use this library, understand that you're embarking on a bracing adventure. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe