Re: [Haskell-cafe] Difference between Lazy ByteStrings and Strings in alex

2007-02-13 Thread Jefferson Heard
Yes, that was a typo :-)

On Tuesday 13 February 2007 22:54, Stefan O'Rear wrote:
> On Tue, Feb 13, 2007 at 10:43:11PM -0500, Jefferson Heard wrote:
> > I am running GHC 2.6 now, and am using -O3 as my optimization parameter. 
> > I'm
>
> I think you will get much better performance with GHC 6.6.  The optimizer
> has been improved a *lot* in the last 10 years.
>
> (I hope that was a typo!!)
>
> > Non-lazy version
> >
> > {
> > module Main
> > where
> >
> > import qualified FileReader
> >
> > }
> >
> > %wrapper "basic"
> >
> > $letter = [a-zA-Z]
> > $digit = 0-9
> > $alphanum = [a-zA-Z0-9]
> > $punct = [\! \@ \# \$ \% \^ \& \* \( \) \_ \- \+ \= \{ \[ \} \] \\ \| \;
> > \: \' \" \, \. \? \/ \` \~]
> > $dec = \.
> > $posneg = [\- \+]
> >
> > @date1 = jan($punct|uary)?\ $digit{1,2}(\,\ $digit{2,4})?
> >
> >| feb($punct|ruary)?\ $digit{1,2}(\,\ $digit{2,4})?
> >| mar($punct|ch)?\ $digit{1,2}(\,\ $digit{2,4})?
> >| apr($punct|il)?\ $digit{1,2}(\,\ $digit{2,4})?
> >| may?\ $digit{1,2}(\,\ $digit{2,4})?
> >| jun($punct|e)?\ $digit{1,2}(\,\ $digit{2,4})?
> >| jul($punct|y)?\ $digit{1,2}(\,\ $digit{2,4})?
> >| aug($punct|ust)?\ $digit{1,2}(\,\ $digit{2,4})?
> >| sep($punct|tember)?\ $digit{1,2}(\,\ $digit{2,4})?
> >| sept($punct)?\ $digit{1,2}(\,\ $digit{2,4})?
> >| oct($punct|ober)?\ $digit{1,2}(\,\ $digit{2,4})?
> >| nov($punct|ember)?\ $digit{1,2}(\,\ $digit{2,4})?
> >| dec($punct|ember)?\ $digit{1,2}(\,\ $digit{2,4})?
> >
> > @date2 = $digit{1,2} $punct $digit{1,2} $punct $digit{2,4}
> >
> > @time = $digit{1,2} \: $digit{2} (am|pm)?
> >
> > @word = $alphanum+
> >
> > @number = $posneg? $digit+
> >
> > | $posneg? $digit+ $dec $digit+
> > | $posneg? $digit+ (\,$digit{3})+
> > | $posneg? $digit? (\,$digit{3})+ $dec $digit+
> >
> > $white = [\t\r\n\v\f\ ]
> >
> > @doc = \< DOC \>
> > @tag = \< $alphanum+ \>
> >
> >  | \<\/ $alphanum+ \>
> >
> > tokens :-
> >   @doc{ \s -> "" }
> >   @tag;
> >   $white+ ;
> >   @time   { \s -> s }
> >   @number { \s -> s }
> >   @word   { \s -> s }
> >   $punct  ;
> >   .   ;
> >
> > {
> >
> > printCount c [] = print c
> > printCount c (l:ls) = if l == "" then printCount (c+1) ls else printCount
> > c ls
> >
> > main = do
> > file <- readFile "trecfile1"
> > printCount 0 (alexScanTokens file)
> >
> > }
>
> FTR, regular strings are lazy - too lazy, which is where the performance
> problems come from.
>
> > --
> > -
> >--- Version depending on ByteString.Lazy
> > -- note that the grammar is the same, so it has been omitted
> > --
> > -
> >---
> >
> > ... grammar ...
> >
> > {
> > type AlexInput = (Char, -- previous char
> >   B.ByteString)   -- current input string
> >
> > takebytes :: Int -> B.ByteString -> String
> > takebytes (0) _ =  ""
> > takebytes n s = c : takebytes (n-1) cs
> > where c = B.index s 0
> >   cs = B.drop 1 s
> >
> > alexGetChar :: AlexInput -> Maybe (Char,AlexInput)
> > alexGetChar (_, bytestring)
> >
> > | bytestring == B.empty = Nothing
> > | otherwise = Just (c , (c,cs))
> >
> > where c = B.index bytestring 0
> >   cs = B.drop 1 bytestring
>
> Hm, you might do better with more specialized functions.
>
> > alexGetChar :: AlexInput -> Maybe (Char,AlexInput)
> > alexGetChar (_, bytestring)
> >
> > | B.null bytestring = Nothing
> > | otherwise = Just (c , (c,cs))
> >
> > where c = B.head bytestring
> >   cs = B.tail bytestring
>
> or even:
> > alexGetChar :: AlexInput -> Maybe (Char,AlexInput)
> > alexGetChar (_, bytestring)
> >
> > | B.null bytestring = Nothing
> > | otherwise = Just (c , (c,cs))
> >
> > where c = B.unsafeHead bytestring
> >   cs = B.unsafeTail bytestring
> >
> > alexInputPrevChar :: AlexInput -> Char
> > alexInputPrevChar (c,_) = c
>
> If you are certian this isn't the first character, you might do better
> using B.unsafeIndex (-1).
>
> > alexScanTokens :: B.ByteString -> [String]
> > alexScanTokens str = go ('\n',str)
> >   where go inp@(_,str) =
> >   case alexScan inp 0 of
> > AlexToken inp' len act -> act (takebytes len str) : go
> > inp' AlexSkip  inp' len -> go inp'
> > AlexEOF -> []
> > AlexError _ -> error "lexical error"
> >
> >
> >
> >
> > printCount :: Int -> [String] -> IO ()
> > printCount c [] = print c
> > printCount c (l:ls) = if l == "" then printCount (c+1) ls else printCount
> > c ls
> >
> > main = do
> > file <- B.readFile "trecfile1"
> > printCount 0 (alexScanTokens file)
> >
> > }
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.ha

Re: [Haskell-cafe] Difference between Lazy ByteStrings and Strings in alex

2007-02-13 Thread Stefan O'Rear
On Tue, Feb 13, 2007 at 10:43:11PM -0500, Jefferson Heard wrote:
> I am running GHC 2.6 now, and am using -O3 as my optimization parameter.  I'm 

I think you will get much better performance with GHC 6.6.  The optimizer has 
been
improved a *lot* in the last 10 years.

(I hope that was a typo!!)

> Non-lazy version
> 
> {
> module Main
> where
> 
> import qualified FileReader
> 
> }
> 
> %wrapper "basic"
> 
> $letter = [a-zA-Z]
> $digit = 0-9
> $alphanum = [a-zA-Z0-9]
> $punct = [\! \@ \# \$ \% \^ \& \* \( \) \_ \- \+ \= \{ \[ \} \] \\ \| \; \: 
> \' 
> \" \, \. \? \/ \` \~]
> $dec = \.
> $posneg = [\- \+]
> 
> @date1 = jan($punct|uary)?\ $digit{1,2}(\,\ $digit{2,4})?
>| feb($punct|ruary)?\ $digit{1,2}(\,\ $digit{2,4})?
>| mar($punct|ch)?\ $digit{1,2}(\,\ $digit{2,4})?
>| apr($punct|il)?\ $digit{1,2}(\,\ $digit{2,4})?
>| may?\ $digit{1,2}(\,\ $digit{2,4})?
>| jun($punct|e)?\ $digit{1,2}(\,\ $digit{2,4})?
>| jul($punct|y)?\ $digit{1,2}(\,\ $digit{2,4})?
>| aug($punct|ust)?\ $digit{1,2}(\,\ $digit{2,4})?
>| sep($punct|tember)?\ $digit{1,2}(\,\ $digit{2,4})?
>| sept($punct)?\ $digit{1,2}(\,\ $digit{2,4})?
>| oct($punct|ober)?\ $digit{1,2}(\,\ $digit{2,4})?
>| nov($punct|ember)?\ $digit{1,2}(\,\ $digit{2,4})?
>| dec($punct|ember)?\ $digit{1,2}(\,\ $digit{2,4})?
> 
> @date2 = $digit{1,2} $punct $digit{1,2} $punct $digit{2,4}
> 
> @time = $digit{1,2} \: $digit{2} (am|pm)?
> 
> @word = $alphanum+
> 
> @number = $posneg? $digit+ 
> | $posneg? $digit+ $dec $digit+
> | $posneg? $digit+ (\,$digit{3})+
> | $posneg? $digit? (\,$digit{3})+ $dec $digit+
> 
> $white = [\t\r\n\v\f\ ]
> 
> @doc = \< DOC \>
> @tag = \< $alphanum+ \>
>  | \<\/ $alphanum+ \>
> 
> tokens :- 
>   @doc{ \s -> "" }
>   @tag;
>   $white+ ; 
>   @time   { \s -> s }
>   @number { \s -> s } 
>   @word   { \s -> s }
>   $punct  ; 
>   .   ;
> 
> {
> 
> printCount c [] = print c
> printCount c (l:ls) = if l == "" then printCount (c+1) ls else printCount c ls
> 
> main = do
> file <- readFile "trecfile1" 
> printCount 0 (alexScanTokens file) 
>  
> }

FTR, regular strings are lazy - too lazy, which is where the performance 
problems come from.
 
> -- 
> 
> Version depending on ByteString.Lazy -- note that the grammar is the same, so 
> it has been omitted
> -- 
> 
> 
> ... grammar ...
> 
> {
> type AlexInput = (Char, -- previous char
>   B.ByteString)   -- current input string
> 
> takebytes :: Int -> B.ByteString -> String
> takebytes (0) _ =  ""
> takebytes n s = c : takebytes (n-1) cs
> where c = B.index s 0
>   cs = B.drop 1 s
>
> alexGetChar :: AlexInput -> Maybe (Char,AlexInput)
> alexGetChar (_, bytestring) 
> | bytestring == B.empty = Nothing
> | otherwise = Just (c , (c,cs))
> where c = B.index bytestring 0
>   cs = B.drop 1 bytestring

Hm, you might do better with more specialized functions.

> alexGetChar :: AlexInput -> Maybe (Char,AlexInput)
> alexGetChar (_, bytestring) 
> | B.null bytestring = Nothing
> | otherwise = Just (c , (c,cs))
> where c = B.head bytestring
>   cs = B.tail bytestring

or even:

> alexGetChar :: AlexInput -> Maybe (Char,AlexInput)
> alexGetChar (_, bytestring) 
> | B.null bytestring = Nothing
> | otherwise = Just (c , (c,cs))
> where c = B.unsafeHead bytestring
>   cs = B.unsafeTail bytestring

> alexInputPrevChar :: AlexInput -> Char
> alexInputPrevChar (c,_) = c

If you are certian this isn't the first character, you might do better using 
B.unsafeIndex (-1).

> alexScanTokens :: B.ByteString -> [String]
> alexScanTokens str = go ('\n',str)
>   where go inp@(_,str) =
>   case alexScan inp 0 of
> AlexToken inp' len act -> act (takebytes len str) : go inp'
> AlexSkip  inp' len -> go inp'
> AlexEOF -> []
> AlexError _ -> error "lexical error"
> 
> 
> 
> 
> printCount :: Int -> [String] -> IO ()
> printCount c [] = print c
> printCount c (l:ls) = if l == "" then printCount (c+1) ls else printCount c ls
> 
> main = do
> file <- B.readFile "trecfile1" 
> printCount 0 (alexScanTokens file) 
>  
> }
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Difference between Lazy ByteStrings and Strings in alex

2007-02-13 Thread Donald Bruce Stewart
jeff:
> It was suggested that I might derive some performance benefit from using lazy 
> bytestrings in my tokenizer instead of regular strings.  Here's the code that 
> I've tried.  Note that I've hacked the "basic" wrapper code in the Lazy 
> version, so the code should be all but the same.  The only thing I had to do 
> out of the ordinary was write my own 'take' function instead of using the 
> substring function provided by Data.Lazy.ByteString.Char8.  The take function 
> I used was derived from the one GHC uses in GHC.List and produces about the 
> same code.  
> 
> The non-lazy version runs in 38 seconds on a 211MB file versus the lazy 
> versions 41 seconds.  That of course doesn't seem like that much, and in the 
> non-lazy case, I have to break the input up into multiple files, whereas I 
> don't have to in the lazy version -- this does not take any extra time.  The 
> seconds do add up to a couple of hours for me, though once I'm done, and so 
> I'd like to understand why, when the consensus was that Data.ByteString.Lazy 
> might give me better performance in the end, it doesn't do so here.  
> 
> I am running GHC 2.6 now, and am using -O3 as my optimization parameter.  I'm 
> profiling the code now, but was wondering if there was any insight...

GHC 6.6 you mean?

Can you post a complete example, including FileReader, so that I can
compile the code, with some example input and output, to work out what's
going on?

By the way, if you're able to break the file into chunks already, we
should able to do even better with a strict ByteString.

Cheers,
  Don
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Difference between Lazy ByteStrings and Strings in alex

2007-02-13 Thread Jefferson Heard
It was suggested that I might derive some performance benefit from using lazy 
bytestrings in my tokenizer instead of regular strings.  Here's the code that 
I've tried.  Note that I've hacked the "basic" wrapper code in the Lazy 
version, so the code should be all but the same.  The only thing I had to do 
out of the ordinary was write my own 'take' function instead of using the 
substring function provided by Data.Lazy.ByteString.Char8.  The take function 
I used was derived from the one GHC uses in GHC.List and produces about the 
same code.  

The non-lazy version runs in 38 seconds on a 211MB file versus the lazy 
versions 41 seconds.  That of course doesn't seem like that much, and in the 
non-lazy case, I have to break the input up into multiple files, whereas I 
don't have to in the lazy version -- this does not take any extra time.  The 
seconds do add up to a couple of hours for me, though once I'm done, and so 
I'd like to understand why, when the consensus was that Data.ByteString.Lazy 
might give me better performance in the end, it doesn't do so here.  

I am running GHC 2.6 now, and am using -O3 as my optimization parameter.  I'm 
profiling the code now, but was wondering if there was any insight...

-- Jeff 

Non-lazy version

{
module Main
where

import qualified FileReader

}

%wrapper "basic"

$letter = [a-zA-Z]
$digit = 0-9
$alphanum = [a-zA-Z0-9]
$punct = [\! \@ \# \$ \% \^ \& \* \( \) \_ \- \+ \= \{ \[ \} \] \\ \| \; \: \' 
\" \, \. \? \/ \` \~]
$dec = \.
$posneg = [\- \+]

@date1 = jan($punct|uary)?\ $digit{1,2}(\,\ $digit{2,4})?
   | feb($punct|ruary)?\ $digit{1,2}(\,\ $digit{2,4})?
   | mar($punct|ch)?\ $digit{1,2}(\,\ $digit{2,4})?
   | apr($punct|il)?\ $digit{1,2}(\,\ $digit{2,4})?
   | may?\ $digit{1,2}(\,\ $digit{2,4})?
   | jun($punct|e)?\ $digit{1,2}(\,\ $digit{2,4})?
   | jul($punct|y)?\ $digit{1,2}(\,\ $digit{2,4})?
   | aug($punct|ust)?\ $digit{1,2}(\,\ $digit{2,4})?
   | sep($punct|tember)?\ $digit{1,2}(\,\ $digit{2,4})?
   | sept($punct)?\ $digit{1,2}(\,\ $digit{2,4})?
   | oct($punct|ober)?\ $digit{1,2}(\,\ $digit{2,4})?
   | nov($punct|ember)?\ $digit{1,2}(\,\ $digit{2,4})?
   | dec($punct|ember)?\ $digit{1,2}(\,\ $digit{2,4})?

@date2 = $digit{1,2} $punct $digit{1,2} $punct $digit{2,4}

@time = $digit{1,2} \: $digit{2} (am|pm)?

@word = $alphanum+

@number = $posneg? $digit+ 
| $posneg? $digit+ $dec $digit+
| $posneg? $digit+ (\,$digit{3})+
| $posneg? $digit? (\,$digit{3})+ $dec $digit+

$white = [\t\r\n\v\f\ ]

@doc = \< DOC \>
@tag = \< $alphanum+ \>
 | \<\/ $alphanum+ \>

tokens :- 
  @doc{ \s -> "" }
  @tag;
  $white+ ; 
  @time   { \s -> s }
  @number { \s -> s } 
  @word   { \s -> s }
  $punct  ; 
  .   ;

{

printCount c [] = print c
printCount c (l:ls) = if l == "" then printCount (c+1) ls else printCount c ls

main = do
file <- readFile "trecfile1" 
printCount 0 (alexScanTokens file) 
 
}

-- 

Version depending on ByteString.Lazy -- note that the grammar is the same, so 
it has been omitted
-- 


... grammar ...

{
type AlexInput = (Char, -- previous char
  B.ByteString)   -- current input string

takebytes :: Int -> B.ByteString -> String
takebytes (0) _ =  ""
takebytes n s = c : takebytes (n-1) cs
where c = B.index s 0
  cs = B.drop 1 s

alexGetChar :: AlexInput -> Maybe (Char,AlexInput)
alexGetChar (_, bytestring) 
| bytestring == B.empty = Nothing
| otherwise = Just (c , (c,cs))
where c = B.index bytestring 0
  cs = B.drop 1 bytestring

alexInputPrevChar :: AlexInput -> Char
alexInputPrevChar (c,_) = c

alexScanTokens :: B.ByteString -> [String]
alexScanTokens str = go ('\n',str)
  where go inp@(_,str) =
  case alexScan inp 0 of
AlexToken inp' len act -> act (takebytes len str) : go inp'
AlexSkip  inp' len -> go inp'
AlexEOF -> []
AlexError _ -> error "lexical error"




printCount :: Int -> [String] -> IO ()
printCount c [] = print c
printCount c (l:ls) = if l == "" then printCount (c+1) ls else printCount c ls

main = do
file <- B.readFile "trecfile1" 
printCount 0 (alexScanTokens file) 
 
}
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] How to solve this problem?It's quite easy in PHP.

2007-02-13 Thread Donald Bruce Stewart
keepbal:
> 
>The code and template are separated in the PHP example,so
>designers can design with out too much PHP knowledge.This is
>actually what I want to solve.

I'd use one of the Html/XML pretty printing libraries then,

xhtml: 
http://hackage.haskell.org/cgi-bin/hackage-scripts/package/xhtml-3000.0.1
haxml: 
http://hackage.haskell.org/cgi-bin/hackage-scripts/package/HaXml-1.13.2
hxt:   http://hackage.haskell.org/cgi-bin/hackage-scripts/package/hxt-7.1  

All available on hackage.haskell.org.  There's few others too, here,

http://www.haskell.org/haskellwiki/Libraries_and_tools/Web_programming

It should be perfectly possible to write clean code, with the
presentation separated from the content.

-- Don
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] How to solve this problem?It's quite easy in PHP.

2007-02-13 Thread Donald Bruce Stewart
keepbal:
> 
>For example,
>---
>//test.php
>require ("tiny.php");//Tiny is a small template engine.
>$tn=new Tiny();
>$arr=new Array();
>$arr['a']='1';
>$arr['b']='2';
>$arr['c']='3';
>$tn->set('arr',$arr);
>$tn->show('_test.php');
>?>
>---
>
>foreach($arr as $key => $val){
>echo "$key = $val ";
>}
>?>
>
>---
>a = 1
>b = 2
>c = 3
>---

Doesn't look that easy. I guess its not too bad though.

Anyway, here's Data.Map for you:


import Data.Map
import Text.Printf

m = fromList (zip "abc" [1..])

main = mapM_ draw (toList m)

draw (k,v) = printf "%c = %d\n" k (v :: Int)

And if you want to run this:

$ runhaskell A.hs
a = 1
b = 2
c = 3
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] How to solve this problem?It's quite easy in PHP.

2007-02-13 Thread keepbal

For example,

---
set('arr',$arr);
$tn->show('_test.php');
?>
---


 $val){
   echo "$key = $val ";
}
?>


---
a = 1
b = 2
c = 3
---
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Summer of Code

2007-02-13 Thread Donald Bruce Stewart
bulat.ziganshin:
> Hello Bryan,
> 
> Tuesday, February 13, 2007, 2:24:21 AM, you wrote:
> 
> > I am wondering if there are any Summer of Code projects that I would
> > be able to do for the Haskell community.
> 
> of 9 projects started last year, only 1 or 2 was successful. so i
> think that retaking one of projects selected last year may be a good
> idea

7 were successful, 2 were unsuccessful. This was above average for
Google SoC (they originally planned for more than 50% failure) and we
can be considered very succesful! Some other .orgs had dramatically
worse rates.

> to Donald: afair, you was written some report about this SoC. can you
> please explain status quo of projects started last year? is it
> possible to restart unsuccessful ones again?

Certainly. 

> 
> to all: i think, now it's the good time to start thinking about
> projects we need. last year one-week schedule was too fast
> 
> i'm completely agree with Donald in that last year some participants was
> Haskell novices and this paid project was the first program they ever
> tried to write in Haskell :) 

As far as I remember, all of the participants had previous haskell
knowledge. The point of the SoC though is to bring *new* hackers into the
community, who hopefully become core contributors.

We had several SoC guys at the recent hackathon, who wouldn't have been
there  if not for the SoC. You see SoC participants helping out in all
sorts of ways in the community now. So it really worked rather well.

I'm looking forward to another crop of hackers for this years projects!

-- Don
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Suggestion for hackage

2007-02-13 Thread Donald Bruce Stewart
benjamin.franksen:
> Hi
> 
> It would be a nice feature if one could look online at the documentation of
> a package, i.e. w/o downloading and building the package first. Fr
> instance, haddock generated API docs can give you a much better idea what
> you can expect from a library package than the mere package description.

Check the todo list,

http://hackage.haskell.org/trac/hackage/wiki/HackageToDo

Haddock is probably the main next task to get done.

-- Don
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Strange memory consumption problems in something that should be tail-recursive

2007-02-13 Thread Duncan Coutts
On Tue, 2007-02-13 at 15:12 -0600, Creighton Hogg wrote:
> 
> 
> On 2/13/07, Duncan Coutts <[EMAIL PROTECTED]> wrote:
> On Tue, 2007-02-13 at 15:27 -0500, Jefferson Heard wrote:
> > Hi, I am running the following code against a 210 MB file in
> an attempt to
> > determine whether I should use alex or whether, since my
> needs are very
> > performance oriented, I should write a lexer of my own.  I
> thought that
> > everything I'd written here was tail-recursive
> 
> Isn't that exactly the problem - that it's tail recursive? You
> do not
> want it to be tail recursive since then it must consume the
> whole input
> before producing any output. You want it to be as lazy as
> possible so
> that it can start producing tokens as soon as possible without
> having to
> consume everything.
> 
> This may be silly of me, but I feel like this is an important point:
> so you're saying that tail recursion, without strictness, doesn't run
> in constant space?

There are two kinds of space use that you have to consider here. One is
the stack space and the other is the space required by whatever it is
that your recursive function is doing (in particular if your recursive
function constructs a list then you need space for that list).

> So for example in the case of, 
> facTail 1 n' = n'
> facTail n n' = facTail (n-1) (n*n')
> You'll just be building a bunch of unevaluated thunks until you hit
> the termination condition?

Actually yes, though with a slight modification we can fix that and make
it run in constant space:

facTail !1 !n' = n'
facTail !n !n' = facTail (n-1) (n*n')

however the original example, even if we did something like the above it
still has major problems. Yes it is tail recursive and so it's not
taking any stack space, it is a true loop, but it's a loop that's
allocating a massive list! Let's look at the code again:

pass1 :: String -> String -> String
pass1 left [] = left
pass1 left ('<':right) = pass1 left (stripTagOrComment right)
pass1 left (' ':right) = pass1 left right
pass1 left (c:right) 
| Set.member c punct = pass1 (' ':c:' ':left) right
| otherwise  = pass1 (c:left) right

This may well be a perfect tail recursive loop but each iteration it's
allocating a cons cell. It doesn't return until it has consumed the
entire input and built the entire output. So if you run it on a 2TB file
then it's going to pull the whole lot into memory before returning
anything.

So as I said originally, this is a case where it pays to be lazy.

Duncan

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Strange memory consumption problems in something that should be tail-recursive

2007-02-13 Thread Donald Bruce Stewart
duncan.coutts:
> On Tue, 2007-02-13 at 15:27 -0500, Jefferson Heard wrote:
> > Hi, I am running the following code against a 210 MB file in an attempt to 
> > determine whether I should use alex or whether, since my needs are very 
> > performance oriented, I should write a lexer of my own.  I thought that 
> > everything I'd written here was tail-recursive
> 
> Isn't that exactly the problem - that it's tail recursive? You do not
> want it to be tail recursive since then it must consume the whole input
> before producing any output. You want it to be as lazy as possible so
> that it can start producing tokens as soon as possible without having to
> consume everything.
> 
> If performance is really important to you then you may also want to
> investigate lexing from a lazy ByteString. Alex can now do that (darcs
> version) or you can do it by hand as you're trying now.
> 

I'd reenforce this point: the only chance for C like performance for
this kind of problem is to use lazy bytestrings, or a combined strict
head, lazy tail approach. Then you can reasonably expect to compete with
C.

Examples on the shootout.

-- Don
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Re: Implementation of scaled integers

2007-02-13 Thread Benjamin Franksen
Stefan Heinzmann wrote:
> is there a library for Haskell that implements scaled integers, i.e.
> integers with a fixed scale factor so that the scale factor does not
> need to be stored, but is part of the type?

I dimly remember that there has been some work done on this in connection
with (and by the creator of) the new time package. Can't remember any
specifics, though.

Ben

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Implementation of scaled integers

2007-02-13 Thread Twan van Laarhoven

Stefan Heinzmann wrote:


Hi all,

is there a library for Haskell that implements scaled integers, i.e.
integers with a fixed scale factor so that the scale factor does not
need to be stored, but is part of the type?


Data.Fixed [1] does exactly that, only it is based on Integer. Using 
fixed point with finite sized integers is more tricky, because you have 
to be careful not to get overflows in intermediate results.


Twan


[1] http://haskell.org/ghc/docs/latest/html/libraries/base/Data-Fixed.html

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Suggestion for hackage

2007-02-13 Thread Benjamin Franksen
Hi

It would be a nice feature if one could look online at the documentation of
a package, i.e. w/o downloading and building the package first. Fr
instance, haddock generated API docs can give you a much better idea what
you can expect from a library package than the mere package description.

Cheers
Ben

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Strange memory consumption problems in something that should be tail-recursive

2007-02-13 Thread Jefferson Heard
Ha!  You're right!  I didn't think about the laziness aspect of it.  Anyway, 
the non tail-recursive version fixed the problem.  Thanks!

On Tuesday 13 February 2007 16:32, Bernie Pope wrote:
> Creighton Hogg wrote:
> > On 2/13/07, *Duncan Coutts* <[EMAIL PROTECTED]
> > > wrote:
> >
> > On Tue, 2007-02-13 at 15:27 -0500, Jefferson Heard wrote:
> > > Hi, I am running the following code against a 210 MB file in an
> >
> > attempt to
> >
> > > determine whether I should use alex or whether, since my needs
> >
> > are very
> >
> > > performance oriented, I should write a lexer of my own.  I
> >
> > thought that
> >
> > > everything I'd written here was tail-recursive
> >
> > Isn't that exactly the problem - that it's tail recursive? You do not
> > want it to be tail recursive since then it must consume the whole
> > input
> > before producing any output. You want it to be as lazy as possible so
> > that it can start producing tokens as soon as possible without
> > having to
> > consume everything.
> >
> >
> > This may be silly of me, but I feel like this is an important point:
> > so you're saying that tail recursion, without strictness, doesn't run
> > in constant space?
>
> It is an important point, and a classic space bug (see foldl in the
> Prelude).
>
> It it not the fault of tail recursion per se, in fact tail recursion is
> often important in Haskell too.
>
> > So for example in the case of,
> > facTail 1 n' = n'
> > facTail n n' = facTail (n-1) (n*n')
>
> The problem with this example is that it will build up an expression of
> the form:
>
>(n1 * n2 * n3 .)
>
> in the second argument. It's size will be proportional to the number of
> recursive calls made (n).
>
> > You'll just be building a bunch of unevaluated thunks until you hit
> > the termination condition?
>
> To fix it you will want the function to evaluate its second argument
> eagerly:
>
> facTail n n' = facTail (n-1) $! (n*n')
> Cheers,
> Bernie.
>
>
> ___
> Haskell-Cafe mailing list
> Haskell-Cafe@haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: Strange memory consumption problems in something that should be tail-recursive

2007-02-13 Thread Jefferson Heard
Didn't think it was overly slow, just that I could do better :-).


On Tuesday 13 February 2007 16:30, [EMAIL PROTECTED] wrote:
> Jefferson Heard wrote:
> > Argh, bitten by the scheme bug! Right -- NO tail recursion...  So that
> > leaves me with some rather non-intuitive strategies for achieving
> > execution time efficiency.  Anyone care to point me in the direction of a
> > document on efficiency in Haskell?
>
> Besides, proper tail recursion in Haskell needs strictness annotations,
> but the best way is to forget the two words "tail recursive" altogether :)
>
> It always helps to do a rough calculation of how much time you have to
> expect it to run. Processing 1TB with a 1GHz processor and 16=2^4
> machine instruction in the inner loop (must be quite short, the loop) takes
>
>  2^40 / (2^30 / 16) = 2^14 seconds ~ 4.5 hours
>
> Of course, these 4.5 hours are quite sensitive to the 2^4 factor and
> might well be 3 or 9 hours. Assuming that you ran alex on a String, the
> reported 36 hours are entirely reasonable, in the sense of alex not
> being overly slow.
>
> Regards,
> apfelmus
>
> ___
> Haskell-Cafe mailing list
> Haskell-Cafe@haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Strange memory consumption problems in something that should be tail-recursive

2007-02-13 Thread Creighton Hogg

On 2/13/07, Bernie Pope <[EMAIL PROTECTED]> wrote:


Creighton Hogg wrote:
> This may be silly of me, but I feel like this is an important point:
> so you're saying that tail recursion, without strictness, doesn't run
> in constant space?

It is an important point, and a classic space bug (see foldl in the
Prelude).

It it not the fault of tail recursion per se, in fact tail recursion is
often important in Haskell too.

> So for example in the case of,
> facTail 1 n' = n'
> facTail n n' = facTail (n-1) (n*n')

The problem with this example is that it will build up an expression of
the form:

   (n1 * n2 * n3 .)

in the second argument. It's size will be proportional to the number of
recursive calls made (n).
> You'll just be building a bunch of unevaluated thunks until you hit
> the termination condition?
>

To fix it you will want the function to evaluate its second argument
eagerly:

facTail n n' = facTail (n-1) $! (n*n')



Awesome.
For a long time now I've been interested in Haskell, and studied it from the
math side, but haven't actually really written anything.  This mailing list,
the wiki, and #haskell are proving to be a great resource.
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Re: Strange memory consumption problems in something that should be tail-recursive

2007-02-13 Thread apfelmus
Jefferson Heard wrote:
> Argh, bitten by the scheme bug! Right -- NO tail recursion...  So that leaves 
> me with some rather non-intuitive strategies for achieving execution time 
> efficiency.  Anyone care to point me in the direction of a document on 
> efficiency in Haskell?

Besides, proper tail recursion in Haskell needs strictness annotations,
but the best way is to forget the two words "tail recursive" altogether :)

It always helps to do a rough calculation of how much time you have to
expect it to run. Processing 1TB with a 1GHz processor and 16=2^4
machine instruction in the inner loop (must be quite short, the loop) takes

 2^40 / (2^30 / 16) = 2^14 seconds ~ 4.5 hours

Of course, these 4.5 hours are quite sensitive to the 2^4 factor and
might well be 3 or 9 hours. Assuming that you ran alex on a String, the
reported 36 hours are entirely reasonable, in the sense of alex not
being overly slow.

Regards,
apfelmus

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Strange memory consumption problems in something that should be tail-recursive

2007-02-13 Thread Bernie Pope

Creighton Hogg wrote:



On 2/13/07, *Duncan Coutts* <[EMAIL PROTECTED] 
> wrote:


On Tue, 2007-02-13 at 15:27 -0500, Jefferson Heard wrote:
> Hi, I am running the following code against a 210 MB file in an
attempt to
> determine whether I should use alex or whether, since my needs
are very
> performance oriented, I should write a lexer of my own.  I
thought that
> everything I'd written here was tail-recursive

Isn't that exactly the problem - that it's tail recursive? You do not
want it to be tail recursive since then it must consume the whole
input
before producing any output. You want it to be as lazy as possible so
that it can start producing tokens as soon as possible without
having to
consume everything.


This may be silly of me, but I feel like this is an important point:  
so you're saying that tail recursion, without strictness, doesn't run 
in constant space?


It is an important point, and a classic space bug (see foldl in the 
Prelude).


It it not the fault of tail recursion per se, in fact tail recursion is 
often important in Haskell too.



So for example in the case of,
facTail 1 n' = n'
facTail n n' = facTail (n-1) (n*n')


The problem with this example is that it will build up an expression of 
the form:


  (n1 * n2 * n3 .)

in the second argument. It's size will be proportional to the number of 
recursive calls made (n).
You'll just be building a bunch of unevaluated thunks until you hit 
the termination condition?




To fix it you will want the function to evaluate its second argument 
eagerly:


facTail n n' = facTail (n-1) $! (n*n')
Cheers,
Bernie.


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Strange memory consumption problems in something that should be tail-recursive

2007-02-13 Thread Seth Gordon
Jefferson Heard wrote:
> 
> Argh, bitten by the scheme bug! Right -- NO tail recursion...  So that leaves 
> me with some rather non-intuitive strategies for achieving execution time 
> efficiency.  Anyone care to point me in the direction of a document on 
> efficiency in Haskell?

I found this page to be helpful:

http://haskell.org/haskellwiki/Stack_overflow
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Strange memory consumption problems in something that should be tail-recursive

2007-02-13 Thread Bernie Pope

Duncan Coutts wrote:

On Tue, 2007-02-13 at 15:27 -0500, Jefferson Heard wrote:
  
Hi, I am running the following code against a 210 MB file in an attempt to 
determine whether I should use alex or whether, since my needs are very 
performance oriented, I should write a lexer of my own.  I thought that 
everything I'd written here was tail-recursive



Isn't that exactly the problem - that it's tail recursive? You do not
want it to be tail recursive since then it must consume the whole input
before producing any output. You want it to be as lazy as possible so
that it can start producing tokens as soon as possible without having to
consume everything.
  


Duncan is right, and I will just elaborate a little bit.

Consider the pass1 function:

  pass1 :: String -> String -> String 
  pass1 left [] = left

  pass1 left ('<':right) = pass1 left (stripTagOrComment right)
  pass1 left (' ':right) = pass1 left right
  pass1 left (c:right)
  | Set.member c punct = pass1 (' ':c:' ':left) right
  | otherwise  = pass1 (c:left) right

It accumulates its result in the "left" parameter. So it chomps down the 
"right" string building up a bigger and bigger solution until it reaches 
the base case, and hands the solution over to the calling function.


The calling function gets nothing back from pass1 until pass1 has 
processed the whole input. And that accumulated solution in "left" could 
grow quite big.


A much better approach would be:

  pass1 :: String -> String 
  pass1 [] =  []

  pass1 ('<':right) = pass1 (stripTagOrComment right)
  pass1 (' ':right) = pass1 right
  pass1 (c:right)
  | Set.member c punct = ' ':c:' ': pass1 right
  | otherwise  = c : pass1 right

This way, pass1 will be producing output as early as possible, which can 
be consumed earlier by the calling function. Lazy evaluation gives you a 
kind of co-routining between producers and consumers, but you have to 
write "good" producers and "good" consumers.


You should also write the pass2 in this style as well. Your memory 
consumption should drop to something very small.


Cheers,
Bernie.
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Strange memory consumption problems in something that should be tail-recursive

2007-02-13 Thread Creighton Hogg

On 2/13/07, Duncan Coutts <[EMAIL PROTECTED]> wrote:


On Tue, 2007-02-13 at 15:27 -0500, Jefferson Heard wrote:
> Hi, I am running the following code against a 210 MB file in an attempt
to
> determine whether I should use alex or whether, since my needs are very
> performance oriented, I should write a lexer of my own.  I thought that
> everything I'd written here was tail-recursive

Isn't that exactly the problem - that it's tail recursive? You do not
want it to be tail recursive since then it must consume the whole input
before producing any output. You want it to be as lazy as possible so
that it can start producing tokens as soon as possible without having to
consume everything.



This may be silly of me, but I feel like this is an important point:  so
you're saying that tail recursion, without strictness, doesn't run in
constant space?

So for example in the case of,
facTail 1 n' = n'
facTail n n' = facTail (n-1) (n*n')
You'll just be building a bunch of unevaluated thunks until you hit the
termination condition?
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Strange memory consumption problems in something that should be tail-recursive

2007-02-13 Thread Brandon S. Allbery KF8NH


On Feb 13, 2007, at 16:07 , Kirsten Chevalier wrote:


On 2/13/07, Jefferson Heard <[EMAIL PROTECTED]> wrote:


Argh, bitten by the scheme bug! Right -- NO tail recursion...  So  
that leaves
me with some rather non-intuitive strategies for achieving  
execution time
efficiency.  Anyone care to point me in the direction of a  
document on

efficiency in Haskell?


There really should be one! (Although there may be something on the  
wiki

already.) Profiling can help, though.


http://haskell.org/haskellwiki/Category:Performance ?

--
brandon s. allbery[linux,solaris,freebsd,perl] [EMAIL PROTECTED]
system administrator [openafs,heimdal,too many hats] [EMAIL PROTECTED]
electrical and computer engineering, carnegie mellon universityKF8NH



___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Strange memory consumption problems in something that should be tail-recursive

2007-02-13 Thread Kirsten Chevalier

On 2/13/07, Jefferson Heard <[EMAIL PROTECTED]> wrote:


Argh, bitten by the scheme bug! Right -- NO tail recursion...  So that leaves
me with some rather non-intuitive strategies for achieving execution time
efficiency.  Anyone care to point me in the direction of a document on
efficiency in Haskell?


There really should be one! (Although there may be something on the wiki
already.) Profiling can help, though.

Cheers,
Kirsten

--
Kirsten Chevalier* [EMAIL PROTECTED] *Often in error, never in doubt
"Are you aware that rushing toward a goal is a sublimated death wish? It's no
coincidence we call them 'deadlines'." -- Tom Robbins
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Strange memory consumption problems in something that should be tail-recursive

2007-02-13 Thread Jefferson Heard
On Tuesday 13 February 2007 15:59, Duncan Coutts wrote:
> On Tue, 2007-02-13 at 15:27 -0500, Jefferson Heard wrote:
> > Hi, I am running the following code against a 210 MB file in an attempt
> > to determine whether I should use alex or whether, since my needs are
> > very performance oriented, I should write a lexer of my own.  I thought
> > that everything I'd written here was tail-recursive
>
> Isn't that exactly the problem - that it's tail recursive? You do not
> want it to be tail recursive since then it must consume the whole input
> before producing any output. You want it to be as lazy as possible so
> that it can start producing tokens as soon as possible without having to
> consume everything.
>
> If performance is really important to you then you may also want to
> investigate lexing from a lazy ByteString. Alex can now do that (darcs
> version) or you can do it by hand as you're trying now.
>
> Duncan

Argh, bitten by the scheme bug! Right -- NO tail recursion...  So that leaves 
me with some rather non-intuitive strategies for achieving execution time 
efficiency.  Anyone care to point me in the direction of a document on 
efficiency in Haskell?
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Strange memory consumption problems in something that should be tail-recursive

2007-02-13 Thread Duncan Coutts
On Tue, 2007-02-13 at 15:27 -0500, Jefferson Heard wrote:
> Hi, I am running the following code against a 210 MB file in an attempt to 
> determine whether I should use alex or whether, since my needs are very 
> performance oriented, I should write a lexer of my own.  I thought that 
> everything I'd written here was tail-recursive

Isn't that exactly the problem - that it's tail recursive? You do not
want it to be tail recursive since then it must consume the whole input
before producing any output. You want it to be as lazy as possible so
that it can start producing tokens as soon as possible without having to
consume everything.

If performance is really important to you then you may also want to
investigate lexing from a lazy ByteString. Alex can now do that (darcs
version) or you can do it by hand as you're trying now.

Duncan

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Implementation of scaled integers

2007-02-13 Thread Sebastian Sylvan

On 2/13/07, Stefan Heinzmann <[EMAIL PROTECTED]> wrote:

Hi all,

is there a library for Haskell that implements scaled integers, i.e.
integers with a fixed scale factor so that the scale factor does not
need to be stored, but is part of the type?

In particular it would be useful (i.e. for signal processing) to have
numbers based on Int scaled such that they fall into the range [-1.0 ..
1.0). Or other scale factors which are powers of 2. Addition and
subtraction would then map to the ordinary operations for Int, while
Multiplication and Division would have to apply the scale factor to
correct the result of normal Int operations (which would be a shift
operation).

If it doesn't exist yet, have you got ideas how to implement that for
maximum efficiency?



What you're looking for is usually referred to as "fixed point
arithmetic", sadly the term "fixed point" when talking about
functional languages means something entirely different so it's a bit
tricky to google for it! :-)

As for implementation tips, I would recommend that you take a look at
Data.Bits for shifting (assuming that the scaling factor is a power of
two). The only caveat is that when you perform the "(a<< s)/b" and
"(a*b)>>s" operations that you need for multiplication and division,
the inner multiplication/shift needs to have enough precision in its
result value to avoid overflow (i.e. for Int32, you'll need to use a
multiplication which has a 64 bit result type -- probably easiest to
just convert both operands to Int64 before multiplying).

--
Sebastian Sylvan
+46(0)736-818655
UIN: 44640862
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Implementation of scaled integers

2007-02-13 Thread Lennart Augustsson

The tricky part to get efficient is multiply and divide.
Say you pick Int32 as the underlying type, when multiplying
you really want the 64 bit result and then scale that.

AFAIK, there are no such primitives exposed to the user.
What you can do is cast to 64 bit, multiply, shift, and cast
back again.  It shouldn't be too bad.

As for having the scale factor be part of the type, that shouldn't
be a problem.

-- Lennart

On Feb 13, 2007, at 19:15 , Stefan Heinzmann wrote:


Hi all,

is there a library for Haskell that implements scaled integers, i.e.
integers with a fixed scale factor so that the scale factor does not
need to be stored, but is part of the type?

In particular it would be useful (i.e. for signal processing) to have
numbers based on Int scaled such that they fall into the range  
[-1.0 ..

1.0). Or other scale factors which are powers of 2. Addition and
subtraction would then map to the ordinary operations for Int, while
Multiplication and Division would have to apply the scale factor to
correct the result of normal Int operations (which would be a shift
operation).

If it doesn't exist yet, have you got ideas how to implement that for
maximum efficiency?

Cheers
Stefan

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Strange memory consumption problems in something that should be tail-recursive

2007-02-13 Thread Jefferson Heard
Hi, I am running the following code against a 210 MB file in an attempt to 
determine whether I should use alex or whether, since my needs are very 
performance oriented, I should write a lexer of my own.  I thought that 
everything I'd written here was tail-recursive, but after compiling this with 
GHC 2.4.6, and running it, I eat up 2GB of RAM in less than a second.  So 
far, I have tried token and character oriented Parsec parsers and alex and 
alex is winning by a factor of 2.  I would like to be able to tokenize the 
entirety of a 1TB collection in less than 36 hours on my current machine, 
which is where alex has gotten me so far.  Thanks in advance!

 -- Jeff

---

module Main 
where


import qualified FileReader
import qualified Data.Set as Set

punct = foldl (flip Set.insert) Set.empty "<,>.?/:;\"'{[}]|\\_-+=)
(*&[EMAIL PROTECTED]"

stripTagOrComment [] = []
stripTagOrComment ('>':rest) = rest
stripTagOrCOmment (c:rest) = stripTagOrComment rest

pass1 :: String -> String -> String
pass1 left [] = left
pass1 left ('<':right) = pass1 left (stripTagOrComment right)
pass1 left (' ':right) = pass1 left right
pass1 left (c:right) 
| Set.member c punct = pass1 (' ':c:' ':left) right
| otherwise  = pass1 (c:left) right


pass2 :: [String] -> String -> Char -> String -> [String]
pass2 left word ' ' [] = word:left
pass2 left word c [] = (c:word):left
pass2 left word ' ' (' ':right) = pass2 left word ' ' right
pass2 left word ' ' (c:right) = pass2 (word:left) "" c right
pass2 left word l (c:right) = pass2 left (l:word) c right

tokenize = (pass2 [] "" ' ') . (pass1 [])

main = do
  file <- do FileReader.trecReadFile "trecfile"
  print (tokenize (head (tail file))) 


--  print (length (map (runParser tokenizeDoc [] "") file))
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Implementation of scaled integers

2007-02-13 Thread Stefan Heinzmann
Hi all,

is there a library for Haskell that implements scaled integers, i.e.
integers with a fixed scale factor so that the scale factor does not
need to be stored, but is part of the type?

In particular it would be useful (i.e. for signal processing) to have
numbers based on Int scaled such that they fall into the range [-1.0 ..
1.0). Or other scale factors which are powers of 2. Addition and
subtraction would then map to the ordinary operations for Int, while
Multiplication and Division would have to apply the scale factor to
correct the result of normal Int operations (which would be a shift
operation).

If it doesn't exist yet, have you got ideas how to implement that for
maximum efficiency?

Cheers
Stefan

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Calendar Dates before the epoch

2007-02-13 Thread Björn Bringert

jim burton wrote:

It seems that CalendarTime is for dates since the epoch...what do I use to
handle dates before that? Sorry if this is an FAQ, I looked on the wiki and
tried to find MissingH since I thought it might be in there, but don't know
where to find it. I also found this from 2003 -
http://www.arcknowledge.com/gmane.comp.lang.haskell.libraries/2003-11/msg00019.html
- is the code in a library somewhere?

Thanks.


Use the time package (Data.Time.*). time-1.0 is in GHC 6.6 extralibs, 
and available from Hackage 
(http://hackage.haskell.org/cgi-bin/hackage-scripts/package/time-1.0) 
and the development version lives at http://darcs.haskell.org/packages/time/


/Björn
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Calendar Dates before the epoch

2007-02-13 Thread jim burton

It seems that CalendarTime is for dates since the epoch...what do I use to
handle dates before that? Sorry if this is an FAQ, I looked on the wiki and
tried to find MissingH since I thought it might be in there, but don't know
where to find it. I also found this from 2003 -
http://www.arcknowledge.com/gmane.comp.lang.haskell.libraries/2003-11/msg00019.html
- is the code in a library somewhere?

Thanks.
-- 
View this message in context: 
http://www.nabble.com/Calendar-Dates-before-the-epoch-tf3221685.html#a8947718
Sent from the Haskell - Haskell-Cafe mailing list archive at Nabble.com.

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: Haskell vs Ruby as a scripting language

2007-02-13 Thread Bulat Ziganshin
Hello Neil,

Tuesday, February 13, 2007, 4:43:46 PM, you wrote:

>> Benchmarks please!  Let's see some comparisons on the nofib suite.  If 
>> there's a
>> factor of 2 or less between GHC -O2 and YHC for any of the nofib programs,  
>> I'll
>> eat my keyboard for lunch :-)

> I will try and get some Yhc vs GHC benchmarks at some point. We are
> also working on an optimiser for Yhc, and a C backend, so does the
> keyboard eating offer still encompass that version of Yhc? :)

Simon, how about proposing limited time offer? :)

-- 
Best regards,
 Bulatmailto:[EMAIL PROTECTED]

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Re: Haskell vs Ruby as a scripting language

2007-02-13 Thread Neil Mitchell

Hi Simon,


> Yhc can happily compile code and run it. You'll probably pay a factor
> of 2-8 times slower than GHC, depending on what the code does.

Benchmarks please!  Let's see some comparisons on the nofib suite.  If there's a
factor of 2 or less between GHC -O2 and YHC for any of the nofib programs,  I'll
eat my keyboard for lunch :-)


I remember having some benchmark where Yhc was only twice as slow as
GHC - possibly without -O2 (or even -O). Can't remember what, but I
was slightly surprised to see it.

I will try and get some Yhc vs GHC benchmarks at some point. We are
also working on an optimiser for Yhc, and a C backend, so does the
keyboard eating offer still encompass that version of Yhc? :)

Thanks

Neil
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re[2]: [Haskell-cafe] Network.CGI.Compat.pwrapper

2007-02-13 Thread Bulat Ziganshin
Hello Albert,

Tuesday, February 13, 2007, 1:27:29 AM, you wrote:

> * Or, nevermind performance or privilege. I am a cheapo, and I use a
> cheapo hosting provider, which only provides me with 3MB of storage. My
> program weighs 17MB (recall that it links in the whole GHC :) ).

may be hugs or yhc/nhc can provide you with cheap solution? hugs
interpreter should fit into this storage together with libraries you
need


-- 
Best regards,
 Bulatmailto:[EMAIL PROTECTED]

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Re: GHC throws IOError on Win32 when there is no console

2007-02-13 Thread Simon Marlow

Krasimir Angelov wrote:

On 2/13/07, Simon Marlow <[EMAIL PROTECTED]> wrote:
Sounds like a good idea.  You need to look at rts/RtsMessages.c, in 
particular
rtsErrorMsgFn(), which currently has cases for GUI and non-GUI.  I 
guess it

really should have 3 cases: GUI, console, and non-GUI.


The trick here is how to find whether the current application is GUI,
console or non-GUI. The distinction between GUI and Console
applications is easy because you can check subsystem OS type. This
doesn't work with Windows services because they can be either GUI or
Console. The OS is preventing them from showing any windows.


So then a program which is running as a service should explicitly set a 
different message handler.  Maybe we could provide a handler that sends messages 
to the system log.


Cheers,
Simon

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Re: Haskell vs Ruby as a scripting language

2007-02-13 Thread Simon Marlow

Neil Mitchell wrote:

Hi


> Also, I recommend looking into embedding YHC. I have not had a
> chance to use
> it yet, but it looks like it is a better fit to an "interpreter-only"
> embedding situation than GHC--with GHC, you are getting a lot more
> than you
> seem to be asking for.

I would want to compile code as well. Compile bits of code <100 lines
at a time and load them back into my app for execution. Does YHC
compile and how efficiently?


Yhc can happily compile code and run it. You'll probably pay a factor
of 2-8 times slower than GHC, depending on what the code does.


Benchmarks please!  Let's see some comparisons on the nofib suite.  If there's a 
factor of 2 or less between GHC -O2 and YHC for any of the nofib programs,  I'll 
eat my keyboard for lunch :-)


Cheers,
Simon

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: GHC throws IOError on Win32 when there is no console

2007-02-13 Thread Krasimir Angelov

On 2/13/07, Simon Marlow <[EMAIL PROTECTED]> wrote:

Sounds like a good idea.  You need to look at rts/RtsMessages.c, in particular
rtsErrorMsgFn(), which currently has cases for GUI and non-GUI.  I guess it
really should have 3 cases: GUI, console, and non-GUI.


The trick here is how to find whether the current application is GUI,
console or non-GUI. The distinction between GUI and Console
applications is easy because you can check subsystem OS type. This
doesn't work with Windows services because they can be either GUI or
Console. The OS is preventing them from showing any windows.

Cheers,
  Krasimir
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Re: GHC throws IOError on Win32 when there is no console

2007-02-13 Thread Simon Marlow

Duncan Coutts wrote:

On Sat, 2007-02-10 at 23:46 +1100, John Ky wrote:

Hi Duncan,

Thanks for your comments.  In the context of a haskell process running
as a Windows service, a message box is useless, because Haskell
services do not have a GUI and cannot interact with the desktop.


Good point. Perhaps you can persuade the people who look after GHC on
win32 to have it use the Windows debug log service for exception
messages like that when there's no GUI available. Of course if you can
code up and submit such a patch yourself then even better.


Sounds like a good idea.  You need to look at rts/RtsMessages.c, in particular 
rtsErrorMsgFn(), which currently has cases for GUI and non-GUI.  I guess it 
really should have 3 cases: GUI, console, and non-GUI.


Cheers,
Simon
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re[4]: [Haskell-cafe] Re: Optimization fun

2007-02-13 Thread Bulat Ziganshin
Hello Matthew,

Tuesday, February 13, 2007, 5:57:22 AM, you wrote:

>> qoutRem# is primitive operation in GHC

> I can see quotRemInteger# and divModInteger#, as well as quotInt#,
> remInt#, divInt# and modInt#, but not quotRemInt# nor divModInt#. For
> example:

you are right. btw, full list of primitive operations stored in
primops.txt.pp inside ghc sources archive

-- 
Best regards,
 Bulatmailto:[EMAIL PROTECTED]

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Summer of Code

2007-02-13 Thread Bulat Ziganshin
Hello Bryan,

Tuesday, February 13, 2007, 2:24:21 AM, you wrote:

> I am wondering if there are any Summer of Code projects that I would
> be able to do for the Haskell community.

of 9 projects started last year, only 1 or 2 was successful. so i
think that retaking one of projects selected last year may be a good
idea

to Donald: afair, you was written some report about this SoC. can you
please explain status quo of projects started last year? is it
possible to restart unsuccessful ones again?

to all: i think, now it's the good time to start thinking about
projects we need. last year one-week schedule was too fast

i'm completely agree with Donald in that last year some participants was
Haskell novices and this paid project was the first program they ever
tried to write in Haskell :) 


-- 
Best regards,
 Bulatmailto:[EMAIL PROTECTED]

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] SIP & SDP parsers

2007-02-13 Thread Neil Davies

Hi

Has anyone out there done any work on parsers for SIP (Session
Initiation Protocol) and/or SDP (Session Description Protocol)?

Thought that I would ask before I embarked on it myself.

Neil
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Re: Network.CGI.Compat.pwrapper

2007-02-13 Thread Gracjan Polak
Bjorn Bringert  cs.chalmers.se> writes:
> >
> > Is there a description what is a *CGI* protocol?
> 
> Here you go: http://hoohoo.ncsa.uiuc.edu/cgi/interface.html
> 

I should be more clear: what kind of data does pwrapper expect? Somewhere in the
middle it needs two handles: one to write and one to read which seem to be
equivalent to stdin/stdout. But what about environment? How is it transfered, as
someone ale pointed out pwrapper runs on different machine?

-- 
Gracjan




___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Re: Foldr tutorial, Inspired by Getting a Fix from a Fold

2007-02-13 Thread apfelmus
Lennart Augustsson wrote:
 para f e xs = snd $ foldr (\ x ~(xs, y) -> (x:xs, f x xs y)) ([], e) xs
>
> I thought solution one was missing the ~ ?

Yes, that's irrefutably right ;) I mean solution one modulo the laziness
bug.

Regards,
apfelmus

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Network.CGI.Compat.pwrapper

2007-02-13 Thread Bjorn Bringert

On Feb 12, 2007, at 23:27 , Albert Y. C. Lai wrote:


Bjorn Bringert wrote:
pwrapper is not an HTTP server, though the Haddock comment can  
make you think so. pwrapper allows you to talk *CGI* over a TCP  
port, but I have no idea why anyone would like to do that.


Here is a scenerio. I want a basic web application: someone makes a  
request, and my program computes a response.


* For one reason or another, I settle with CGI.

* The program is huge and slow to load. (Let's say it statically  
links in the whole GHC API and therefore is basically GHC  
itself. :) ) It would suck to re-load this program at every request.


By the way, here's an example application which does just that using  
FastCGI: http://csmisc14.cs.chalmers.se/~bjorn/dynhs/examples/wiki/ 
wiki.hs


It uses a dynamically started FastCGI application, which means that  
the web server starts up new processes when needed and keeps a bunch  
of them around to serve future requests.



...


/Björn___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Network.CGI.Compat.pwrapper

2007-02-13 Thread Bjorn Bringert

On Feb 12, 2007, at 23:27 , Albert Y. C. Lai wrote:


Bjorn Bringert wrote:
pwrapper is not an HTTP server, though the Haddock comment can  
make you think so. pwrapper allows you to talk *CGI* over a TCP  
port, but I have no idea why anyone would like to do that.


Here is a scenerio. I want a basic web application: someone makes a  
request, and my program computes a response.


* For one reason or another, I settle with CGI.

* The program is huge and slow to load. (Let's say it statically  
links in the whole GHC API and therefore is basically GHC  
itself. :) ) It would suck to re-load this program at every request.


* Or, the program performs work that requires more file-system  
privilege than the admin of the web server grants. You know, a good  
admin sets up a web server and all CGI scripts to run with nobody's  
privilege.


* Or, nevermind performance or privilege. I am a cheapo, and I use  
a cheapo hosting provider, which only provides me with 3MB of  
storage. My program weighs 17MB (recall that it links in the whole  
GHC :) ).


Here is a solution. The program runs as a daemon and never quits;  
it can run somewhere with sufficient privilege and storage. It  
talks CGI over TCP. At the web server, which is super-slow, super- 
paranoid, and super-cheapo, the CGI script is a lightweight C  
program that redirects everything over TCP to my daemon.


(Here is a counter-solution. The program still runs as a daemon  
somewhere, but it talks my own protocol over TCP. The CGI script is  
a lightweight C program that parses CGI into my own protocol.  
Besides having to design my own protocol carefully, here is a  
problem: C is a great language for writing parsers that are  
incomplete, inconsistent, and insecure. :) )


OK, that sounds reasonable, but I think that there are better  
solutions for those problems. Besides, it's niche stuff that I don't  
think belongs in the main CGI package. If anything wants it, it's  
easy to implement.


* If you have slow start-up, you can use FastCGI instead of CGI.  
There is already  a Haskell library for that.


* If the program needs additional privileges, you can use an external  
FastCGI program which is started independently of the server.


* If you have cheapo hosting, at least Apache mod_fastcgi allows you  
to run your FastCGI app on a different machine. Another solution  
would be mod_rewrite + mod_proxy. But if you have another server  
where you can run your application, why not put your web server there  
instead of using cheapo hosting?


/Björn___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: Network.CGI.Compat.pwrapper

2007-02-13 Thread Bjorn Bringert


On Feb 13, 2007, at 9:14 , Gracjan Polak wrote:


Bjorn Bringert  cs.chalmers.se> writes:



Another question is: how do I do equivalent functionality without
pwrapper?


You can roll you own web server if you want something very simple. If
you don't want to do that, there is a version of Simon Marlow's
Haskell Web Server with CGI support [1]. You could also get the
original HWS [2] and merge it with your program. You might also be
interested In HAppS [3].


Haskell Web Server seems to be the closest match. I don't want fully
functional web server. I need more low level thing, as I need to set
this up as a testing environment for some other (browser like)  
application.
So I need a way to trigger (atrificial) errors, like protocol  
errors, garbage

and broken connections.

Thanks for the response.

Is there a description what is a *CGI* protocol?


Here you go: http://hoohoo.ncsa.uiuc.edu/cgi/interface.html

/Björn___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Re: Network.CGI.Compat.pwrapper

2007-02-13 Thread Gracjan Polak
Bjorn Bringert  cs.chalmers.se> writes:
> 
> > Another question is: how do I do equivalent functionality without  
> > pwrapper?
> 
> You can roll you own web server if you want something very simple. If  
> you don't want to do that, there is a version of Simon Marlow's  
> Haskell Web Server with CGI support [1]. You could also get the  
> original HWS [2] and merge it with your program. You might also be  
> interested In HAppS [3].

Haskell Web Server seems to be the closest match. I don't want fully 
functional web server. I need more low level thing, as I need to set 
this up as a testing environment for some other (browser like) application. 
So I need a way to trigger (atrificial) errors, like protocol errors, garbage 
and broken connections.

Thanks for the response.

Is there a description what is a *CGI* protocol? 

-- 
Gracjan


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe