Re: [Haskell-cafe] Encoding of Haskell source files

2011-04-04 Thread Tako Schotanus
On Mon, Apr 4, 2011 at 17:51, Yitzchak Gale  wrote:

> malcolm.wallace wrote:
> >> BOM is not part of UTF8, because UTF8 is byte-oriented.  But
> >> applications should be prepared to read and discard it, because
> >> some applications erroneously generate it.
>
> For maximum portability, the standard should require compilers
> to accept and discard an optional BOM as the first character of a
> source code file.
>
> Tako Schotanus wrote:
> > That's not what the official unicode site says in its FAQ:
> > http://unicode.org/faq/utf_bom.html#bom4 and
> > http://unicode.org/faq/utf_bom.html#bom5
>
> That FAQ clearly states that BOM is part of some "protocols".
> It carefully avoids stating whether it is part of the encoding.
>
> It is certainly not erroneous to include the BOM
> if it is part of the protocol for the applications being used.
> Applications can include whatever characters they'd like, and
> they can use whatever handshake mechanism they'd like to
> agree upon an encoding. The BOM mechanism is common
> on the Windows platform. It has since appeared in other
> places as well, but it is certainly not universally adopted.
>
> Python supports a pseudo-encoding called "utf-8-sig" that
> automatically generates and discards the BOM in support
> of that handshake mechanism. But it isn't really an encoding,
> it's a convenience.
>
> Part of the source of all this confusion is some documentation
> that appeared in the past on Microsoft's site which was unclear
> about the fact that the BOM handshake is a protocol adopted
> by Microsoft, not a part of the encoding itself. Some people
> claim that this was intentional, part of the "embrace and extend"
> tactic Microsoft allegedly employed in those days in an effort
> to expand its monopoly.
>
> The wording of the Unicode FAQ is obviously trying to tip-toe
> diplomatically around this issue without arousing the ire of
> either pro-Microsoft or anti-Microsoft developers.
>
>
Some reliable sources for all this would be entertaining (although
irrelevant for the rest of this discussion).
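
For what it's worth, "accept and discard an optional BOM" is only a few
lines on the reading side. A minimal sketch, assuming the source file
arrives as a strict ByteString; utf8Bom and stripBom are names made up
here for illustration and are not taken from any compiler:

```haskell
import qualified Data.ByteString as B

-- | The three bytes that encode U+FEFF in UTF-8.
utf8Bom :: B.ByteString
utf8Bom = B.pack [0xEF, 0xBB, 0xBF]

-- | Drop a single leading UTF-8 BOM if one is present; otherwise
-- return the input unchanged. (Hypothetical helper, not a GHC API.)
stripBom :: B.ByteString -> B.ByteString
stripBom bs
  | utf8Bom `B.isPrefixOf` bs = B.drop (B.length utf8Bom) bs
  | otherwise                 = bs
```

Only the very start of the input is inspected, so a U+FEFF that shows up
later in the file is left alone.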

Cheers,
 -Tako
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Encoding of Haskell source files

2011-04-04 Thread Tako Schotanus
That's not what the official unicode site says in its FAQ:
http://unicode.org/faq/utf_bom.html#bom4 and
http://unicode.org/faq/utf_bom.html#bom5

Cheers,
-Tako


On Mon, Apr 4, 2011 at 15:18, malcolm.wallace wrote:

> BOM is not part of UTF8, because UTF8 is byte-oriented.  But applications
> should be prepared to read and discard it, because some applications
> erroneously generate it.
>
>
> Regards,
> Malcolm
>
>
> On 04 Apr, 2011, at 02:09 PM, Antoine Latter wrote:
>
> On Mon, Apr 4, 2011 at 7:30 AM, Max Bolingbroke wrote:
> > On 4 April 2011 11:34, Daniel Fischer wrote:
> >> If there's only a single encoding recognised, UTF-8 surely should be the
> >> one (though perhaps Windows users might disagree, iirc, Windows uses
> >> UCS2 as standard encoding).
> >
> > Windows APIs use UTF-16, but the encoding of files (which is the
> > relevant point here) is almost uniformly UTF-8 - though of course you
> > can find legacy apps making other choices.
> >
>
> Would we need to specifically allow for a Windows-style leading BOM in
> UTF-8 documents? I can never remember if it is truly a part of UTF-8
> or not.
>
> > Cheers,
> > Max


Re: [Haskell-cafe] Encoding-aware System.Directory functions

2011-03-30 Thread Tako Schotanus
On Wed, Mar 30, 2011 at 11:01, Alistair Bayley  wrote:

> On 30 March 2011 20:53, Max Bolingbroke wrote:
>
>> On 30 March 2011 07:52, Michael Snoyman  wrote:
>> > I could
>> > manually do something like (utf8Decode . S8.pack), but that presumes
>> > that the character encoding on the system in question is UTF8. So two
>> > questions:
>>
>> Funnily enough I have been thinking about this quite hard recently,
>> and the situation is kind of a mess and short of implementing PEP383
>> (http://www.python.org/dev/peps/pep-0383/) in GHC I can't see how to
>> make it easier on the programmer. As Jason points out the best you can
>> really do is probably:
>>
>>  1. Treat Strings that represent filenames as raw byte sequences, even
>> though they claim to be strings
>>
>>  2. When presenting such Strings to the user, re-decode them by using
>> the current locale encoding (which will typically be UTF-8). You
>> probably want to have some means of avoiding decoding errors here too
>> -- ignoring or replacing undecodable bytes -- but presently this is
>> not so straightforward. If you happen to be on a system with GNU Iconv
>> you can use its "C//TRANSLIT//IGNORE" encoding to achieve this,
>> however.
>>
>
>
> http://www.haskell.org/pipermail/libraries/2009-August/012493.html
>
> I took from this discussion that FilePath really should be a pair of the
> actual filename ByteString, and the printable String (decoded from the
> ByteString, with encoding specified by the user's locale). The conversion
> from ByteString to String (and vice versa) is not guaranteed to be lossless,
> so you need to remember both.
>
>
I'm not sure that I agree with that. Why does it have to be lossless?
The problem, more likely, is the fact that FilePath is just a simple string.
Maybe we should go the way of Java, where cross-platform file access is based
upon a File (or the new Path) type? That way the internal representation
could use whatever is necessary to ensure a unique reference to a file or
directory while at the same time providing a way to get a human-readable
representation.
Going from strings to file/path types would need the correct encodings to
work.
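
To make the idea a bit more concrete, here is a rough sketch of such a
type. Everything in it (RawPath, toBytes, display) is hypothetical and
not an existing library API; it simply keeps the bytes untouched for the
OS and offers a separate rendering for humans, here by escaping anything
that is not printable ASCII:

```haskell
import Data.Char (chr)
import Data.Word (Word8)
import Numeric (showHex)

-- | Hypothetical path type: the bytes the OS gave us, kept untouched.
newtype RawPath = RawPath [Word8]
  deriving (Eq, Show)

-- | The exact bytes to hand back to the OS (lossless by construction).
toBytes :: RawPath -> [Word8]
toBytes (RawPath bs) = bs

-- | A rendering for humans. Printable ASCII is shown as-is and every
-- other byte is escaped as \xNN, so this never fails, but it is meant
-- for display only and not for parsing back into a path.
display :: RawPath -> String
display (RawPath bs) = concatMap render bs
  where
    render b
      | b >= 0x20 && b < 0x7F = [chr (fromIntegral b)]
      | otherwise             = "\\x" ++ pad (showHex b "")
    pad s = replicate (2 - length s) '0' ++ s
```

A real version would presumably decode with the locale's encoding in
display instead of escaping, but the split between "what the OS needs"
and "what the user sees" would stay the same.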

Cheers,
 -Tako

PS: Just lurking here most of the time because I'm still a total Haskell
noob, you can ignore me without risk.


Re: [Haskell-cafe] Encoding-aware System.Directory functions

2011-03-30 Thread Tako Schotanus
On Wed, Mar 30, 2011 at 09:26, Jason Dagit  wrote:

>
>
> On Tue, Mar 29, 2011 at 11:52 PM, Michael Snoyman wrote:
>
>> Hi all,
>>
>> I think this is a well-known issue: it seems that there is no
>> character decoding performed on the values returned from the functions
>> in System.Directory (getDirectoryContents specifically). I could
>> manually do something like (utf8Decode . S8.pack), but that presumes
>> that the character encoding on the system in question is UTF8. So two
>> questions:
>>
>> * Is there a package out there that handles all the gory details for
>> me automatically, and simply returns a properly decoded String (or
>> Text)?
>> * If not, is there a standard way to determine the character encoding
>> used by the filesystem, short of hard-coding in character encodings
>> used by the major ones?
>>
>
> I started to write a thoughtful reply, but I found that the answers here
> sum up everything I was going to say:
>
> http://unix.stackexchange.com/questions/2089/what-charset-encoding-is-used-for-filenames-and-paths-on-linux
>
> This same issue comes up from time to time for darcs and, if I recall
> correctly, the solution has been to treat unix file paths as arbitrary bytes
> whenever possible and to escape non-ascii compatible bytes when they occur.
>  Otherwise it can be hard to encode them in textual patch descriptions or
> xml (where an encoding is required and I believe utf8 is a standard
> default).
>
> I wish you luck.  It's not an easy problem, at least on unix.  I've heard
> that windows has a much easier time here as MS has provided a standard for
> it.
>

All the more reason, it seems, to make this available in the standard package,
so people don't have to figure out how to do the conversions each time (for all
the different OSes with which they might not have any experience, etc.).

All modern Linuxes use UTF8 by default anyway, so in the beginning one could
assume UTF8 and later change the system to be able to make more intelligent
decisions (like checking environment variables for per-user settings). A way
to override the assumptions made would be necessary too, I guess.
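
Something along these lines is what I have in mind. It is only a sketch
under my own assumptions: chooseCharset and charsetOf are invented
names, the variable list is the usual POSIX trio, and a value without a
".charset" part (such as plain "C") is simply skipped, which is a
simplification of the real locale rules:

```haskell
import Data.Char (toUpper)
import Data.Maybe (catMaybes, fromMaybe, listToMaybe)

-- | Extract the charset part of a locale value such as
-- "en_US.UTF-8" or "nl_NL.utf8@euro". Values without a '.' give Nothing.
charsetOf :: String -> Maybe String
charsetOf val =
  case break (== '.') val of
    (_, '.' : rest) ->
      let cs = takeWhile (/= '@') rest
      in if null cs then Nothing else Just (map toUpper cs)
    _ -> Nothing

-- | Pick a charset name for file names: an explicit override wins,
-- then LC_ALL, LC_CTYPE and LANG in that order, then the UTF-8 default.
chooseCharset :: Maybe String -> [(String, String)] -> String
chooseCharset override env =
  fromMaybe "UTF-8" (listToMaybe (catMaybes (override : map fromVar vars)))
  where
    vars      = ["LC_ALL", "LC_CTYPE", "LANG"]
    fromVar v = lookup v env >>= charsetOf
```

The environment is passed in as a plain association list so the function
stays pure; in a program it could be fed from
System.Environment.getEnvironment.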

-Tako


Re: [Haskell-cafe] object oriented technique

2011-03-29 Thread Tako Schotanus
Sorry, the following line got lost in the copy & paste:

   {-# LANGUAGE ExistentialQuantification #-}

-Tako


Re: [Haskell-cafe] object oriented technique

2011-03-29 Thread Tako Schotanus
Hi,

just so you know that I have almost no idea what I'm doing, I'm a complete
Haskell noob, but trying a bit I came up with this before getting stuck:

   class Drawable a where
      draw :: a -> String

   data Rectangle = Rectangle { rx, ry, rw, rh :: Double }
      deriving (Eq, Show)
   instance Drawable Rectangle where
      draw (Rectangle rx ry rw rh) = "Rect"
   data Circle = Circle { cx, cy, cr :: Double }
      deriving (Eq, Show)
   instance Drawable Circle where
      draw (Circle cx cy cr) = "Circle"

   data Shape = ???

Until I read about existential types here:
http://www.haskell.org/haskellwiki/Existential_type

And was able to complete the definition:

   data Shape = forall a. Drawable a => Shape a

Testing it with a silly example:

   main :: IO ()
   main =  do putStr (test shapes)

   test :: [Shape] -> String
   test [] = ""
   test ((Shape x):xs) = draw x ++ test xs

   shapes :: [Shape]
   shapes = [ Shape (Rectangle 1 1 4 4) , Shape (Circle 2 2 5) ]


Don't know if this helps...

Cheers,
-Tako


On Tue, Mar 29, 2011 at 07:49, Tad Doxsee  wrote:

> I've been trying to learn Haskell for a while now, and recently
> wanted to do something that's very common in the object oriented
> world, subtype polymorphism with a heterogeneous collection.
> It took me a while, but I found a solution that meets
> my needs. It's a combination of solutions that I saw on the
> web, but I've never seen it presented in a way that combines both
> in a short note. (I'm sure it's out there somewhere, but it's off the
> beaten path that I've been struggling along.)  The related solutions
> are
>
> 1. section 3.6 of http://homepages.cwi.nl/~ralf/OOHaskell/paper.pdf
>
> 2. The GADT comment at the end of section 4 of
>    http://www.haskell.org/haskellwiki/Heterogenous_collections
>
> I'm looking for comments on the practicality of the solution,
> and references to better explanations of, extensions to, or simpler
> alternatives for what I'm trying to achieve.
>
> Using the standard example, here's the code:
>
>
> data Rectangle = Rectangle { rx, ry, rw, rh :: Double }
>     deriving (Eq, Show)
>
> drawRect :: Rectangle -> String
> drawRect r = "Rect (" ++ show (rx r) ++ ", "  ++ show (ry r) ++ ") -- "
>     ++ show (rw r) ++ " x " ++ show (rh r)
>
>
> data Circle = Circle {cx, cy, cr :: Double}
>     deriving (Eq, Show)
>
> drawCirc :: Circle -> String
> drawCirc c = "Circ (" ++ show (cx c) ++ ", " ++ show (cy c)++ ") -- "
>     ++ show (cr c)
>
> r1 = Rectangle 0 0 3 2
> r2 = Rectangle 1 1 4 5
> c1 = Circle 0 0 5
> c2 = Circle 2 0 7
>
>
> rs = [r1, r2]
> cs = [c1, c2]
>
> rDrawing = map drawRect rs
> cDrawing = map drawCirc cs
>
> -- shapes = rs ++ cs
>
> Of course, the last line won't compile because the standard Haskell list
> may contain only homogeneous types.  What I wanted to do is create a list
> of circles and rectangles, put them in a list, and draw them.  It was easy
> for me to find on the web and in books how to do that if I controlled
> all of the code. What wasn't immediately obvious to me was how to do that
> in a library that could be extended by others.  The references noted
> previously suggest this solution:
>
>
> class ShapeC s where
>  draw :: s -> String
>  copyTo :: s -> Double -> Double -> s
>
> -- needs {-# LANGUAGE GADTs #-}
> data ShapeD  where
>  ShapeD :: ShapeC s => s -> ShapeD
>
> instance ShapeC ShapeD where
>  draw (ShapeD s) = draw s
>  copyTo (ShapeD s) x y = ShapeD (copyTo s x y)
>
> mkShape :: ShapeC s => s -> ShapeD
> mkShape s = ShapeD s
>
>
>
> instance ShapeC Rectangle where
>  draw = drawRect
>  copyTo (Rectangle _ _ rw rh) x y = Rectangle x y rw rh
>
> instance ShapeC Circle where
>  draw = drawCirc
>  copyTo (Circle _ _ r) x y = Circle x y r
>
>
> r1s = ShapeD r1
> r2s = ShapeD r2
> c1s = ShapeD c1
> c2s = ShapeD c2
>
> shapes1 = [r1s, r2s, c1s, c2s]
> drawing1 = map draw shapes1
>
> shapes2 = map mkShape rs ++ map mkShape cs
> drawing2 = map draw shapes2
>
> -- copy the shapes to the origin then draw them
> shapes3 = map (\s -> copyTo s 0 0) shapes2
> drawing3 = map draw shapes3
>
>
> Another user could create a list of shapes that included triangles by
> creating a ShapeC instance for his triangle and using mkShape to add it
> to a list of ShapeDs.
>
> Is the above the standard method in Haskell for creating an extensible
> heterogeneous list of "objects" that share a common interface?  Are there
> better approaches?  (I ran into a possible limitation to this approach that
> I plan to ask about later if I can't figure it out myself.)
>
> - Tad


Re: [Haskell-cafe] Noob question about list comprehensions

2011-02-16 Thread Tako Schotanus
Ok, thanks all, that was what I was looking for :)
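
For the archives, this is roughly what the let-binding version looks
like. The definition of chain is my assumption of what the tutorial
uses (the usual Collatz sequence); longChains and numLongChains are
names I made up:

```haskell
-- | Collatz-style chain, assumed to match the tutorial's definition.
-- Only meant for arguments >= 1.
chain :: Integer -> [Integer]
chain 1 = [1]
chain n
  | even n    = n : chain (n `div` 2)
  | otherwise = n : chain (3 * n + 1)

-- | Each chain is computed once, named with let, and the same name is
-- used both in the guard and as the produced item.
longChains :: [[Integer]]
longChains = [c | x <- [1..100], let c = chain x, length c > 15]

-- | The number the original one-liner computes.
numLongChains :: Int
numLongChains = length longChains
```

So the guard still cannot refer to the produced item directly, but
naming it with let before the guard gives the same effect.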

-Tako


On Wed, Feb 16, 2011 at 10:46, Ozgur Akgun  wrote:

> On 16 February 2011 09:19, Tako Schotanus  wrote:
>
>> I wondered if there was a way for a guard in a list comprehension to refer
>> to the item being produced?
>>
>
>
>> I'm just wondering about this very specific case
>>
>
> Then, the answer is no.
>
> As others have noted, let binding is the way to go.
>
> Ozgur


[Haskell-cafe] Noob question about list comprehensions

2011-02-16 Thread Tako Schotanus
Hello,

I was going through some of the tutorials and trying out different
(syntactic) alternatives to the given solutions and I got to this line:

   length [chain x | x <- [1..100] , length (chain x) > 15]

Now, there's nothing wrong with it, it works of course. But the application
of chain x is repeated twice and I wondered if there was a way for a guard
in a list comprehension to refer to the item being produced?

Like this for example (invented syntax):

   length [@c(chain x) | x <- [1..100] , length c > 15]

NB: Just to make clear, I'm not asking if there is an alternative way of
preventing the repetition, of course there is, I'm just wondering about this
very specific case within list comprehensions.

-Tako


Re: [Haskell-cafe] windows network programming

2011-01-22 Thread Tako Schotanus
You're probably right, I just tried it again at home and here it works.

Thanks,
-Tako


On Fri, Jan 21, 2011 at 23:23, Albert Y. C. Lai  wrote:

> On 11-01-21 03:13 AM, Tako Schotanus wrote:
>
>> Just starting out here so I don't know what I'm doing yet, but this one
>> doesn't compile for me.
>>
>>   ping.hs:19:13: parse error on input `<-'
>>
>
> The original code contains no parse error.
>
> When you transfer it to your editor, you may accidentally change
> indentation, which causes such parse errors.
>
> Some editors add tabs behind your back to confound indentation further.


Re: [Haskell-cafe] windows network programming

2011-01-21 Thread Tako Schotanus
Just starting out here so I don't know what I'm doing yet, but this one
doesn't compile for me.

  ping.hs:19:13: parse error on input `<-'

So am I doing something stupid here or is there something wrong with the
code?

-Tako


On Fri, Jan 21, 2011 at 04:48, Michael Litchard  wrote:

> freenode figured this out. Pasting here for future reference.
>
>
> import Control.Concurrent
> import Network
> import System.IO
>
> main :: IO ()
> main = withSocketsDo $ do
>     m <- newEmptyMVar
>
>     forkIO (waitAndPong m)
>     ping m
>
> -- The basic server
> waitAndPong :: MVar () -> IO ()
> waitAndPong m = do
>
>     socket <- listenOn (PortNumber 8000)
>     putMVar m ()
>
>     (handle,_,_) <- accept socket
>     hSetBuffering handle LineBuffering
>
>     incoming <- hGetLine handle
>     putStrLn ("> " ++ incoming)
>     hPutStrLn handle "pong"
>
> -- The basic client
> ping :: MVar () -> IO ()
> ping m = do
>     _ <- takeMVar m
>     handle <- connectTo "localhost" (PortNumber 8000)
>
>
>     hSetBuffering handle LineBuffering
>     hPutStrLn handle "ping"
>     incoming <- hGetLine handle
>     putStrLn ("< " ++ incoming)
>
>
> On Thu, Jan 20, 2011 at 6:17 PM, Michael Litchard wrote:
>
>> I tried this as an example and got the following error when running.
>>
>> net.exe: connect: failed (Connection refused (WSAECONNREFUSED))
>>
>> Firewall is off, running as administrator
>>
>> Windows is Windows 7 Enterprise.
>>
>> Advice on what to do next is appreciated
>>
>>
>> On Tue, Nov 2, 2010 at 1:24 PM, Nils Schweinsberg  wrote:
>>
>>> Am 02.11.2010 19:57, schrieb Michael Litchard:
>>>
>>>  got any urls with examples?
>>>
>>> Sure, see this short server-client-ping-pong application.
>>>
>>> By the way, I noticed that you don't need withSocketsDo on windows 7, but
>>> I guess it's there for a reason for older windows versions. :)
>>>
>>>
>>>
>>>    import Control.Concurrent
>>>    import Network
>>>    import System.IO
>>>
>>>    main :: IO ()
>>>    main = withSocketsDo $ do
>>>        forkIO waitAndPong
>>>        ping
>>>
>>>    -- The basic server
>>>    waitAndPong :: IO ()
>>>    waitAndPong = do
>>>        socket <- listenOn (PortNumber 1234)
>>>        (handle,_,_) <- accept socket
>>>        hSetBuffering handle LineBuffering
>>>        incoming <- hGetLine handle
>>>        putStrLn ("> " ++ incoming)
>>>        hPutStrLn handle "pong"
>>>
>>>    -- The basic client
>>>    ping :: IO ()
>>>    ping = do
>>>        handle <- connectTo "localhost" (PortNumber 1234)
>>>        hSetBuffering handle LineBuffering
>>>        hPutStrLn handle "ping"
>>>        incoming <- hGetLine handle
>>>        putStrLn ("< " ++ incoming)


Re: [Haskell-cafe] Re: ANNOUNCE: text 0.8.0.0, fast Unicode text support

2010-09-01 Thread Tako Schotanus
Hi Kevin,

thanks for the pointer, although I was aware of the thread and had followed
it quite closely, it was quite interesting.
But it never explained if and why String should be avoided, all I read is
"test and decide depending on the circumstances", which in itself is good
advice, but I'd like to have an idea of the reasons so I can form an idea
before actually having to code any benchmarks :)

Knowing that String literally is a linked list of Char makes it a lot
clearer. I figured that maybe Haskell could be using some more efficient
mechanism for Strings internally, only treating it outwardly as a [Char].
But I guess that in a lot of circumstances where you're just working with
small pieces of text in non-performance critical code it's perfectly okay to
use String.
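
Just to convince myself that String really is nothing more than [Char],
a tiny sketch (firstWord and capitalise are names I made up): ordinary
list functions and list patterns work on it directly.

```haskell
import Data.Char (toUpper)

-- | String is a synonym for [Char], so list functions apply as-is.
firstWord :: String -> String
firstWord = takeWhile (/= ' ')

-- | Pattern matching on the cons cells of the underlying list.
capitalise :: String -> String
capitalise []       = []
capitalise (c : cs) = toUpper c : cs
```

As far as I understand it, every character then sits in its own list
cell, so operations like length have to walk the whole list, which fits
the "fine for small texts, costly at scale" advice.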

Cheers,
-Tako


On Wed, Sep 1, 2010 at 08:31, Kevin Jardine  wrote:

> Hi Tako,
>
> The issues involved with String, ByteString, Text and a few related
> libraries were discussed at great length recently in this thread:
>
>
> http://groups.google.com/group/haskell-cafe/browse_thread/thread/52a21cf61ffb21b0/
>
> Basically, Chars are 32 bit integers and Strings are represented as a
> list of Chars.
>
> This is very convenient for small computations but often very
> inefficient for anything large scale.
>
> The String API  is also missing various encoding related features.
>
> Because of the limitations of String, various alternative libraries
> have been proposed. Text is one important option.
>
> You'll find much more detail on the above referenced thread.
>
> Kevin


Re: [Haskell-cafe] ANNOUNCE: text 0.8.0.0, fast Unicode text support

2010-08-31 Thread Tako Schotanus
On Wed, Sep 1, 2010 at 07:14, John Millikin  wrote:

>
> > Don't forget, you can always improve the text library yourself. I love to
> > receive patches, requests for improvement, and bug reports.
>
> Are there any areas in particular you'd like help with, for either
> library? I'm happy to assist any effort which will help reduce use of
> String.
>

As a Haskell noob I'm curious about this statement, is there something
intrinsically wrong with String?
Or is it more a performance/resource problem when dealing with large amounts
of text for example?
(Like having to use StringBuilder in Java if you want to avoid the penalty
of repeated String allocations when simply concatenating for example)

Cheers,
 -Tako


Re: [Haskell-cafe] Re: String vs ByteString

2010-08-17 Thread Tako Schotanus
On Tue, Aug 17, 2010 at 13:40, Ketil Malde  wrote:

> Michael Snoyman  writes:
>
> > As far as space usage, you are correct that CJK data will take up more
> > memory in UTF-8 than UTF-16.
>
> With the danger of sounding ... alphabetist? as well as belaboring a
> point I agree is irrelevant (the storage format):
>
> I'd point out that it seems at least as unfair to optimize for CJK at
> the cost of Western languages.
>

Thing is that here you're only talking about size optimizations, for
somebody having to handle a lot of international texts (and I'm not
necessarily talking about Chinese or Japanese here) it would be important
that this is handled in the most efficient way possible, because in the end
storing and retrieving you only do once each while maybe doing a lot of
processing in between. And the on-disk storage or the over-the-wire format
might very well be different than the in-memory format. Each can be selected
for what it's best at.

I'll repeat here that in my opinion a Text package should be good at
handling text, human text, from whatever country. If I need to handle large
streams of ASCII I'll use something else.

:)

Cheers,
 -Tako


Re: [Haskell-cafe] Re: String vs ByteString

2010-08-17 Thread Tako Schotanus
On Tue, Aug 17, 2010 at 13:29, Ketil Malde  wrote:

> Tako Schotanus  writes:
>
> >> Just like Char is capable of encoding any valid Unicode codepoint.
>
> > Unless a Char in Haskell is 32 bits (or at least more than 16 bits) it
> > can NOT encode all Unicode points.
>
> And since it can encode (or rather, represent) any valid Unicode
> codepoint, it follows that it is 32 bits (and at least more than 16
> bits).
>
> :-)
>
> (Char is basically a 32bit value, limited to valid Unicode code points, so
> it corresponds to UCS-4/UTF-32.)
>
>
Yeah, I tried looking it up but I couldn't find the technical definition for
Char, but in the end I found that "maxBound" was "0x10FFFF", making it
basically 21 bits (so it fits comfortably in 24) :)

I know for example that Java uses only 16 bits for its Chars and therefore
can NOT give you all Unicode code points with a single Char, with Strings
you can because of surrogate pairs.
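
A small sketch of both points, checking the upper bound of Char and
showing how a code point above 0xFFFF turns into two 16-bit units. The
names maxCodePoint and toUtf16 are my own, and this is not how any
particular library does it:

```haskell
import Data.Bits (shiftR, (.&.))
import Data.Char (ord)
import Data.Word (Word16)

-- | The largest code point a Haskell Char can hold.
maxCodePoint :: Int
maxCodePoint = ord (maxBound :: Char)

-- | Encode one Char as UTF-16 code units: one unit for code points
-- below 0x10000, a high/low surrogate pair otherwise. Lone surrogate
-- code points in the input are passed through unchecked.
toUtf16 :: Char -> [Word16]
toUtf16 c
  | cp < 0x10000 = [fromIntegral cp]
  | otherwise    = [hi, lo]
  where
    cp = ord c
    v  = cp - 0x10000
    hi = fromIntegral (0xD800 + (v `shiftR` 10))
    lo = fromIntegral (0xDC00 + (v .&. 0x3FF))
```

With this, toUtf16 '\x1F600' yields the pair 0xD83D, 0xDE00, which is
the situation a single 16-bit Java char cannot represent.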

-Tako


Re: [Haskell-cafe] Re: String vs ByteString

2010-08-17 Thread Tako Schotanus
On Tue, Aug 17, 2010 at 13:00, Michael Snoyman  wrote:

>
>
> On Tue, Aug 17, 2010 at 1:50 PM, Yitzchak Gale  wrote:
>
>> Ketil Malde wrote:
>> > I haven't benchmarked it, but I'm fairly sure that, if you try to fit a
>> > 3Gbyte file (the Human genome, say¹), into a computer with 4Gbytes of
>> > RAM, UTF-16 will be slower than UTF-8...
>>
>> I don't think the genome is typical text. And
>> I doubt that is true if that text is in a CJK language.
>>
>
> As far as space usage, you are correct that CJK data will take up more
> memory in UTF-8 than UTF-16. The question still remains whether the overall
> document size will be larger: I'd be interested in taking a random sampling
> of CJK-encoded pages and comparing their UTF-8 and UTF-16 file sizes. I
> think simply talking about this in the vacuum of data is pointless. If
> anyone can recommend a CJK website which would be considered representative
> (or a few), I'll do the test myself.
>
>
Regardless of the outcome of that investigation (which in itself is
interesting) I have to agree with Yitzchak that the human genome (or any
other ASCII based data that is not necessarily a representation of written
human language) is not a good fit for the Text package.

A package like this should IMHO be good at handling human languages, as many
of them as possible, and support the common operations as efficiently as
possible: sorting, upper/lowercase (where those exist), finding word
boundaries, whatever.

Parsing some kind of file containing the human genome and the like I
think would be much better served by a package focusing on handling large
streams of bytes. No encodings to worry about, no parsing of the stream to
determine code points, no calculations to determine string lengths. If you need
to convert things to upper/lower case or do sorting you can just fall back
on simple ASCII processing, no need to depend on a package dedicated to
human text processing.

I do think that in-memory processing of Unicode is better served with UTF16
than UTF8 because except in very rare circumstances you can just treat the
text as an array of Char. You can't do that for UTF8 so the efficiency of
the algorithms would suffer.

I also think that the memory problem is much easier worked around (for
example by dividing the problem in smaller parts) than sub-optimal string
processing because of increased complexity.

-Tako


Re: [Haskell-cafe] Re: String vs ByteString

2010-08-17 Thread Tako Schotanus
On Tue, Aug 17, 2010 at 12:54, Ivan Lazar Miljenovic <
ivan.miljeno...@gmail.com> wrote:

> Tom Harper  writes:
>
> > 2010/8/17 Bulat Ziganshin :
> >> Hello Tom,
> >
> > 
> >
> >> i don't understand what you mean. are you support all 2^20 codepoints
> >> in Data.Text package?
> >
> > Bulat,
> >
> > Yes, its internal representation is UTF-16, which is capable of
> > encoding *any* valid Unicode codepoint.
>
> Just like Char is capable of encoding any valid Unicode codepoint.
>
>
Unless a Char in Haskell is 32 bits (or at least more than 16 bits) it can
NOT encode all Unicode code points.
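
Whether that condition holds is easy to check from the language itself. A
small sketch, using only Data.Char from base (maxCodePoint and fitsIn16Bits
are names made up for this example):

```haskell
import Data.Char (ord)

-- The largest value a Char can hold, as a number.
-- Char is a Unicode code point, so this is 0x10FFFF, well beyond 16 bits.
maxCodePoint :: Int
maxCodePoint = ord (maxBound :: Char)

-- True when a code point would fit in a single 16-bit unit.
fitsIn16Bits :: Char -> Bool
fitsIn16Bits c = ord c <= 0xFFFF
```

So a Char does cover the whole Unicode range, which means it cannot be a
16-bit type.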

-Tako


Re: Re[2]: [Haskell-cafe] Re: String vs ByteString

2010-08-17 Thread Tako Schotanus
On Tue, Aug 17, 2010 at 10:34, Bulat Ziganshin wrote:

> Hello Johan,
>
> Tuesday, August 17, 2010, 12:20:37 PM, you wrote:
>
> >  I agree, Data.Text is great.  Unfortunately, its internal use of UTF-16
> >  makes it inefficient for many purposes.
>
> > It's not clear to me that using UTF-16 internally does make
> > Data.Text noticeably slower.
>
> not slower but require 2x more memory. speed is the same since
> Unicode contains 2^20 codepoints
>
>
This is not entirely correct because it all depends on your data.
For western languages it normally holds true that UTF16 occupies twice the
memory of UTF8, but for other languages code points might take up to 3 bytes
in UTF8 (I thought even 4, but the wikipedia page only mentions 3:
http://en.wikipedia.org/wiki/UTF-8).
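
To make the "it depends on your data" point concrete, here is a rough
sketch that computes the encoded size of a String in both encodings. The
thresholds follow the UTF-8 and UTF-16 definitions: code points up to U+FFFF
need at most 3 bytes in UTF-8, and those above it need 4 bytes in both
encodings. The function names are invented for this example.

```haskell
import Data.Char (ord)

-- Number of bytes one code point occupies in UTF-8.
utf8Bytes :: Char -> Int
utf8Bytes c
  | n < 0x80    = 1
  | n < 0x800   = 2
  | n < 0x10000 = 3
  | otherwise   = 4
  where
    n = ord c

-- Number of bytes one code point occupies in UTF-16
-- (one 16-bit unit, or a surrogate pair above U+FFFF).
utf16Bytes :: Char -> Int
utf16Bytes c
  | ord c < 0x10000 = 2
  | otherwise       = 4

-- (UTF-8 size, UTF-16 size) of a whole string, in bytes.
sizes :: String -> (Int, Int)
sizes s = (sum (map utf8Bytes s), sum (map utf16Bytes s))
```

With this, sizes "hello" gives (5, 10), so UTF-16 is twice as large, while
a three-character CJK string such as "\x65E5\x672C\x8A9E" gives (9, 6), so
there UTF-8 is the larger one.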

That wikipedia page is a nice read anyway; it mentions some of the
advantages and disadvantages of the different encodings.
(The complexity of the code that determines the length of a UTF string
depends on the encoding, for example.)
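
As a sketch of that difference, the two functions below count code points
in already-encoded data. Both assume well-formed input and use plain lists
of Word8 / Word16 as stand-ins for real buffers; the names are made up for
this example.

```haskell
import Data.Bits ((.&.))
import Data.Word (Word16, Word8)

-- UTF-8: every code point has exactly one byte that is NOT a
-- continuation byte (continuation bytes look like 10xxxxxx).
lengthUtf8 :: [Word8] -> Int
lengthUtf8 = length . filter (\b -> b .&. 0xC0 /= 0x80)

-- UTF-16: every code point has exactly one unit that is NOT a
-- low surrogate (0xDC00..0xDFFF).
lengthUtf16 :: [Word16] -> Int
lengthUtf16 = length . filter (\u -> u < 0xDC00 || u > 0xDFFF)
```

Both need a pass over the data to be exact. The difference is that in UTF-16
the number of units already equals the number of code points whenever no
surrogate pairs are present, which is the common case.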

Cheers,
 -Tako


Re: [Haskell-cafe] Question about memory usage

2010-08-14 Thread Tako Schotanus
First of all, thanks to the people who responded :)

On Sat, Aug 14, 2010 at 17:49, Christopher Lane Hinson <
l...@downstairspeople.org> wrote:

>
> On Sat, 14 Aug 2010, Tako Schotanus wrote:
>
>  I was reading this article:
>>
>>
>> http://scienceblogs.com/goodmath/2009/11/writing_basic_functions_in_has.php
>>
>> And came to the part where it shows:
>>
>>
>> > fiblist = 0 : 1 : (zipWith (+) fiblist (tail fiblist))
>>
>> But then I read that "Once it's been referenced, then the list up to where
>> you looked is concrete - the
>> computations won't be repeated."
>>
>
> It is so implemented.  If you *really* wanted a weak reference that could
> be
> garbage collected and rebuilt (you don't), it could be made to happen.


I understand: if you don't want to keep the memory tied up, you either define
the reference locally or you use another algorithm.
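
A small sketch of the "another algorithm" option, next to the list from the
article. fibAt is a name invented here, and it assumes a non-negative
index. Note that merely moving the list into a where clause is not
guaranteed to help, since an optimising compiler may share it again, which
is why the sketch avoids building a list at all.

```haskell
-- Top-level list as in the article: once part of it has been forced,
-- that evaluated prefix stays reachable for as long as fiblist is.
fiblist :: [Integer]
fiblist = 0 : 1 : zipWith (+) fiblist (tail fiblist)

-- Alternative that keeps only the last two values alive.
-- Assumes n >= 0.
fibAt :: Int -> Integer
fibAt n = go n 0 1
  where
    go :: Int -> Integer -> Integer -> Integer
    go 0 a _ = a
    go k a b = a `seq` go (k - 1) b (a + b)  -- force a so thunks don't pile up
```

Both fiblist !! 10 and fibAt 10 give 55; the difference is only in what
stays in memory afterwards.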


>
>
>  and I started wondering how that works.
>> Because this seems to mean that functions could have unknown (to the
>> caller) memory requirements.
>>
>
> This is true in any programming language or runtime.  Nothing special
> has happened: you could implement the same thing in C/C++/Java/Python,
> but it would take 10-100 lines of code.
>
>
Sure, although the memory use would normally be more obvious and it would
actually be more work to make the result globally permanent.
(the difference between the 10 lines and the 100 lines you refer to
probably)


>
>  How does one, programming in Haskell, keep that in check?
>> And when does that memory get reclaimed?
>>
>
> Haskell is garbage collected, as soon as the fiblist is not longer
> reachable,
> and the runtime wants to reclaim the memory, it will.
>
> If fiblist is a top-level declaration it will always be reachable.
>

Ok, makes sense.

Just to make this clear, I'm not complaining nor suggesting there is
anything wrong with the way Haskell does things (minds immeasurably superior
to mine working on this stuff, I'm not going to pretend to know better),
it's just one of those "surprises" for somebody who has no experience yet
with Haskell.

Thanks,
 -Tako


[Haskell-cafe] Question about memory usage

2010-08-14 Thread Tako Schotanus
I was reading this article:

http://scienceblogs.com/goodmath/2009/11/writing_basic_functions_in_has.php

And came to the part where it shows:


> fiblist = 0 : 1 : (zipWith (+) fiblist (tail fiblist))


Very interesting stuff for somebody who comes from an imperative world of
course.
But then I read that "Once it's been referenced, then the list up to where
you looked is concrete - the computations *won't* be repeated."
and I started wondering how that works.
Because this seems to mean that functions could have unknown (to the caller)
memory requirements.
How does one, programming in Haskell, keep that in check?
And when does that memory get reclaimed?

Cheers,
-Tako