Beginners Digest, Vol 94, Issue 27

beginners-request Tue, 26 Apr 2016 20:50:44 -0700

Send Beginners mailing list submissions to
        beginners@haskell.org

To subscribe or unsubscribe via the World Wide Web, visit
        http://mail.haskell.org/cgi-bin/mailman/listinfo/beginners
or, via email, send a message with subject or body 'help' to
        beginners-requ...@haskell.org

You can reach the person managing the list at
        beginners-ow...@haskell.org

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Beginners digest..."

Today's Topics:

   1. Re:  {Disarmed} Re: 'No such file' - ghc Parsec Package Error
      (Oliver Dean)
   2.  approximate string search (Mason Lai)
   3.  Array update and `seq` (Chul-Woong Yang)
   4.  Feedback on Maybe+State+IO code (shane pearman)
   5. Re:  approximate string search (Sylvain Henry)

----------------------------------------------------------------------

Message: 1
Date: Tue, 26 Apr 2016 12:32:14 +0000
From: Oliver Dean <o...@st-andrews.ac.uk>
To: "The Haskell-Beginners Mailing List - Discussion of primarily
        beginner-level topics related to Haskell" <beginners@haskell.org>,
        "Imants Cekusins" <ima...@gmail.com>
Subject: Re: [Haskell-beginners] {Disarmed} Re: 'No such file' - ghc
        Parsec Package Error
Message-ID: <dd81d5a5-0661-4f2d-9d09-dcca57cc5...@st-andrews.ac.uk>
Content-Type: text/plain; charset="utf-8"

Thank you for the information. Will give this a go. Should do the trick 
hopefully.

Ollie

From: Beginners on behalf of Joseph Melfi
Reply-To: The Haskell-Beginners Mailing List - Discussion of primarily 
beginner-level topics related to Haskell
Date: Monday, 25 April 2016 20:29
To: Imants Cekusins, The Haskell-Beginners Mailing List - Discussion of 
primarily beginner-level topics related to Haskell
Subject: {Disarmed} Re: [Haskell-beginners] 'No such file' - ghc Parsec Package 
Error

cabal build can be used to build the binary in bin.

stack can also do the build command to do this as well.

Joseph Melfi

On Mon, Apr 25, 2016 at 11:45 AM Imants Cekusins <Imants Cekusins 
<mailto:Imants%20Cekusins%20<ima...@gmail.com>> > wrote:

Can cabal be used to compile?

_______________________________________________
Beginners mailing list
Beginners@haskell.org<mailto:Beginners@haskell.org>
http://mail.haskell.org/cgi-bin/mailman/listinfo/beginners

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
<http://mail.haskell.org/pipermail/beginners/attachments/20160426/9f9bd5b4/attachment-0001.html>

------------------------------

Message: 2
Date: Tue, 26 Apr 2016 16:17:02 -0700
From: Mason Lai <masonm...@gmail.com>
To: The Haskell-Beginners Mailing List - Discussion of primarily
        beginner-level topics related to Haskell <beginners@haskell.org>
Subject: [Haskell-beginners] approximate string search
Message-ID:
        <CAH1iVpf4s23LGyVv-xYkyDJU4abMgos=6yfB=sfxqmw3ko_...@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Hi,

I'm writing a Haskell program which is looking for whether or not a string
(needle) exists within a larger string (haystack) given a Levenshtein
distance. Note that it shouldn't calculate the Levenshtein distance between
the needle and haystack. For example,

needle: AAA
haystack: TTTTTAATTTTT

The needle would be "found" if the Levenshtein distance is set at 1,
because dist("AAT", "AAA") == 1.

The following almost works:

distance(needle, haystack) - (len(haystack) - len(needle))

But it does not handle deletions correctly.

I've previously written a program which does this approximate search in
Java with the Wu-Manber extension of the Bitap, or shift-or, algorithm. The
same strategy seems difficult to code up in Haskell because it's very
"stateful" and involves a lot of bit-fiddling. (Maybe it's actually quite
simple, but I'm not sure how I would go about doing this.)

As a further complication, I'd prefer to keep the data packed as
ByteStrings, as I'll be dealing with around 200 GiBs of data (split and
parallelized over a cluster, but still a lot). I don't really know how to
deal with ByteStrings without using the functions that come along in the
ByteString module or unpacking the ByteString.

I've been using the language-spelling package, which has a module which can
calculate the Levenshtein distance between two ByteStrings. I see that it
uses ListLike. I'm not really sure how it works, but I assume it makes
ByteString an instance of ListLike, and then you have access to all of the
ListLike methods?

Does anyone have any advice on how I should proceed with this? I don't mind
learning new things, but I don't really know what the best strategy is or
where to look.

-- Mason
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
<http://mail.haskell.org/pipermail/beginners/attachments/20160426/5bda2698/attachment-0001.html>

------------------------------

Message: 3
Date: Wed, 27 Apr 2016 08:53:34 +0900
From: Chul-Woong Yang <cwy...@aranetworks.com>
To: Haskell-Beginners <beginners@haskell.org>
Subject: [Haskell-beginners] Array update and `seq`
Message-ID:
        <CALmycjpcmO_KQLHe1QL5qjc6t=qrns_jiyllzbi30g-rxpw...@mail.gmail.com>
Content-Type: text/plain; charset=UTF-8

Hi, all

When I fold a list to update Data.Array,
memory usage is very high.
Consider following source, which counts occurence of
each element in a list (1):

import Data.Array
import Data.List
import Control.DeepSeq
go :: [Int] -> [Int]
go = elems . foldl' update (array (0,99) [(i,0) | i <- [0..99]])
  where update acc x = acc // [(x, acc ! x + 1)]
main = putStrLn . unwords . map show . go . concat .
       replicate 5000 $ [0..99]

Above program uses about 350MB at heap.
Memory usage is same if  I try to force strictness in array update
with `seq` (2) :

  where update acc x = let v = acc ! x + 1
                           a' = acc // [(x,v `seq` v)]
                       in a' `seq` a'

However, when I use `deepseq`, memory usage is gone
(around 35Kbyte) (3):

  where update acc x = let v = acc ! x + 1
                           a' = acc // [(x,v `seq` v)]
                       in a' `deepseq` a'

What's the missing part in (2)? At (2), evaluation of
updated array a' is forced and the value of array cell
is also evaluated forcefully with `seq`.

Any help would be appreciated deeply.
--
Regards,
Chul-Woong Yang

------------------------------

Message: 4
Date: Tue, 26 Apr 2016 17:57:43 -0700
From: shane pearman <pearman...@gmail.com>
To: beginners@haskell.org
Subject: [Haskell-beginners] Feedback on Maybe+State+IO code
Message-ID:
        <cacyb-gjmvyd3lqizs5q5tzqtx-_n9v0p+zrfbf8kq3fgk39...@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

I just read the M.P. Jones paper on the topic of monad transformers and
have been playing a bit with some simple examples combining `Maybe` and
`State` with `IO`: http://lpaste.net/6702944292505124864

I have the same "stuff" function defined 3 ways: two recursive methods, one
of which takes the "maybe" function as an argument which is used to break
the recursion, and a non-recursive method which is 'looped' using
`iterateM_`.

The functionality is that an integer is given as initial `State` and it is
decremented until 0 is reached, printing each iteration and breaking by the
result of the "maybe" function. I also wanted to ensure that no negative
integer could enter into the `State` so the same "maybe" function is used
in main to restrict entry into the "stuff" function, either using `for_`
over the result or injecting the `Maybe` before the "stuff" function.

The iterative calls are a little bit harder to read but the second (2)
recursive call is fairly concise and readable.

Basically I'm just looking for any suggestions if anything looks out of
place or can be refined before I go on to do more involved error handling
or logging.

* There are also a couple helper functions at the bottom below "main" that
I ended up not using but was wondering if they are defined somewhere.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
<http://mail.haskell.org/pipermail/beginners/attachments/20160426/4d1da797/attachment-0001.html>

------------------------------

Message: 5
Date: Wed, 27 Apr 2016 05:50:04 +0200
From: Sylvain Henry <sylv...@haskus.fr>
To: beginners@haskell.org
Subject: Re: [Haskell-beginners] approximate string search
Message-ID: <5adfa2d6-a00b-ddf9-aa0a-9517bcd8b...@haskus.fr>
Content-Type: text/plain; charset="windows-1252"; Format="flowed"

Hi,

The ListLike instance for ByteString is defined here: 
https://hackage.haskell.org/package/ListLike-3.1.7.1/docs/src/Data-ListLike-Instances.html

The implementation of Levenshtein distance in language-spelling is here: 
https://hackage.haskell.org/package/language-spelling-0.3.2/docs/src/Language-Distance.html#levenshtein

It seems to use only O(1) operations on the Bytestring: length and index 
(!!).
It uses an unboxed mutable vector in the ST monad.
It is specialized for ByteString.

So I think it should be quite efficient already. If you want you can use 
the same approach to implement the other algorithm, with Data.Bits for 
the bit-fiddling: 
https://hackage.haskell.org/package/base-4.8.2.0/docs/Data-Bits.html

Cheers,
Sylvain

On 27/04/2016 01:17, Mason Lai wrote:
> Hi,
>
> I'm writing a Haskell program which is looking for whether or not a 
> string (needle) exists within a larger string (haystack) given a 
> Levenshtein distance. Note that it shouldn't calculate the Levenshtein 
> distance between the needle and haystack. For example,
>
> needle: AAA
> haystack: TTTTTAATTTTT
>
> The needle would be "found" if the Levenshtein distance is set at 1, 
> because dist("AAT", "AAA") == 1.
>
> The following almost works:
>
> distance(needle, haystack) - (len(haystack) - len(needle))
>
> But it does not handle deletions correctly.
>
> I've previously written a program which does this approximate search 
> in Java with the Wu-Manber extension of the Bitap, or shift-or, 
> algorithm. The same strategy seems difficult to code up in Haskell 
> because it's very "stateful" and involves a lot of bit-fiddling. 
> (Maybe it's actually quite simple, but I'm not sure how I would go 
> about doing this.)
>
> As a further complication, I'd prefer to keep the data packed as 
> ByteStrings, as I'll be dealing with around 200 GiBs of data (split 
> and parallelized over a cluster, but still a lot). I don't really know 
> how to deal with ByteStrings without using the functions that come 
> along in the ByteString module or unpacking the ByteString.
>
> I've been using the language-spelling package, which has a module 
> which can calculate the Levenshtein distance between two ByteStrings. 
> I see that it uses ListLike. I'm not really sure how it works, but I 
> assume it makes ByteString an instance of ListLike, and then you have 
> access to all of the ListLike methods?
>
> Does anyone have any advice on how I should proceed with this? I don't 
> mind learning new things, but I don't really know what the best 
> strategy is or where to look.
>
> -- Mason
>
>
> _______________________________________________
> Beginners mailing list
> Beginners@haskell.org
> http://mail.haskell.org/cgi-bin/mailman/listinfo/beginners

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
<http://mail.haskell.org/pipermail/beginners/attachments/20160427/aec95706/attachment.html>

------------------------------

Subject: Digest Footer

_______________________________________________
Beginners mailing list
Beginners@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/beginners

------------------------------

End of Beginners Digest, Vol 94, Issue 27
*****************************************

Beginners Digest, Vol 94, Issue 27

Reply via email to