subject:"\[Haskell\-cafe\] Function to detect duplicates"

Re: [Haskell-cafe] Function to detect duplicates

2010-02-24 Thread Casey Hawthorne

On Tue, 23 Feb 2010 08:30:18 -0300, you wrote:

>Hi folks,
>
>While solving a puzzle, I was posed the problem of finding if there was no
>duplicates on a list.

Must it be a list data structure(DS) or list ADT?

Mergesort can be parallelized.

>
>Best regards,
>
>Rafael

If space is at a premium you might want to look at a Bloom Filter.

http://en.wikipedia.org/wiki/Bloom_filter

The Bloom filter, conceived by Burton Howard Bloom in 1970,[1] is a
space-efficient probabilistic data structure that is used to test
whether an element is a member of a set. False positives are possible,
but false negatives are not. Elements can be added to the set, but not
removed (though this can be addressed with a counting filter). The
more elements that are added to the set, the larger the probability of
false positives.

The book "Real World Haskell" has an implementation.
--
Regards,
Casey
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Function to detect duplicates

2010-02-24 Thread Casey Hawthorne

On Tue, 23 Feb 2010 08:30:18 -0300, you wrote:

>Hi folks,
>
>While solving a puzzle, I was posed the problem of finding if there was no
>duplicates on a list.

Must it be a list data structure(DS) or list ADT?

Mergesort can be parallelized.

>
>Best regards,
>
>Rafael

--
Regards,
Casey
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Function to detect duplicates

2010-02-24 Thread Daniel Fischer

Am Mittwoch 24 Februar 2010 21:25:04 schrieb Gwern Branwen:
> 2010/2/23 Jonas Almström Duregård :
> > Hi Rafael,
> >
> > I assume you will perform this operation on some very large lists, or
> > performance would not be an issue. Have you tested if your optimized
> > version is better than your initial one?
> >
> > You should compare your implementation against something like this:
> >
> > import qualified Data.Set as Set
> > noneRepeated :: (Ord a) => [a] -> Bool
> > noneRepeated = accum Set.empty where
> >  accum _ [] = True
> >  accum s (x:xs)
> >    | Set.member x s = False
> >    | otherwise      = accum (Set.insert x s) xs
> >
> > Also there is some discussion about the nub function that relates to
> > this topic, e.g. http://buffered.io/2008/07/28/a-better-nub/.
> >
> > /Jonas
>
> Or better yet,
> http://www.haskell.org/pipermail/libraries/2008-October/010778.html Much
> more thorough and practical w/r/t to actually getting faster nubs in the
> libraries.

Umm,

using the nubOrd' code to nub a 1 million long list of pseudo random 
numbers takes (here) about 2.5 times the time and twice space as the Set-
based ordNub. It does slightly better for 100,000 elements, but still takes 
more than twice the time (and 1.6 x the space).

In my book, that's a compelling reason to go with the set-based 
implementation - unless we're talking about code to include directly in 
Data.List, but then I'd still _use_ the set-based one.
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Function to detect duplicates

2010-02-24 Thread Gwern Branwen

2010/2/23 Jonas Almström Duregård :
> Hi Rafael,
>
> I assume you will perform this operation on some very large lists, or
> performance would not be an issue. Have you tested if your optimized
> version is better than your initial one?
>
> You should compare your implementation against something like this:
>
> import qualified Data.Set as Set
> noneRepeated :: (Ord a) => [a] -> Bool
> noneRepeated = accum Set.empty where
>  accum _ [] = True
>  accum s (x:xs)
>    | Set.member x s = False
>    | otherwise      = accum (Set.insert x s) xs
>
> Also there is some discussion about the nub function that relates to
> this topic, e.g. http://buffered.io/2008/07/28/a-better-nub/.
>
> /Jonas

Or better yet, 
http://www.haskell.org/pipermail/libraries/2008-October/010778.html
Much more thorough and practical w/r/t to actually getting faster nubs
in the libraries.

-- 
gwern
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Function to detect duplicates

2010-02-23 Thread Ketil Malde

Rafael Gustavo da Cunha Pereira Pinto 
writes:

> First I used:
>
> noneRepeated=null.(filter (>1)).(map length).group.sort

> But this seemed very unneficient, so I thought that I could detect the
> duplicates while sorting, and devised this:
   [...]
> 1) Is there any improvement I can make?

Well - it's a bit long, don't you think?  Long enough that from a
cursory glance, I'd say it's in the "no obvious errors" category.  How
about (inspired by quicksort, as you no doubt can tell):

   noneRepeated [] = True
   noneRepeated (x:xs) = noneRepeated lt && singleton eq && noneRepeated gt
where lt = filter (x) xs
  eq = x:filter (==x) xs
  singleton [_] = True
  singleton _   = False

> 2) Can it be parallelized (par, pseq)?

You could force each of the sublists in parallel, but you might lose
some laziness properties, so I'd carefully benchmark it.

-k
-- 
If I haven't seen further, it is by standing in the footprints of giants
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Function to detect duplicates

2010-02-23 Thread Jonas Almström Duregård

Hi Rafael,

I assume you will perform this operation on some very large lists, or
performance would not be an issue. Have you tested if your optimized
version is better than your initial one?

You should compare your implementation against something like this:

import qualified Data.Set as Set
noneRepeated :: (Ord a) => [a] -> Bool
noneRepeated = accum Set.empty where
  accum _ [] = True
  accum s (x:xs)
| Set.member x s = False
| otherwise  = accum (Set.insert x s) xs

Also there is some discussion about the nub function that relates to
this topic, e.g. http://buffered.io/2008/07/28/a-better-nub/.

/Jonas

On 23 February 2010 12:30, Rafael Gustavo da Cunha Pereira Pinto
 wrote:
>
> Hi folks,
>
> While solving a puzzle, I was posed the problem of finding if there was no
> duplicates on a list.
>
> First I used:
>
> noneRepeated=null.(filter (>1)).(map length).group.sort
>
>
> But this seemed very unneficient, so I thought that I could detect the
> duplicates while sorting, and devised this:
>
> import Control.Monad
> import Data.Maybe
>
> noneRepeated=isNothing . (foldl merge (Just [])) . (map sort) . pairs
>
> pairs []=[]
> pairs [x]=[[x]]
> pairs (x:y:xs)=[x,y]:pairs xs
>
> sort []=Just []
> sort [x]=Just [x]
> sort [x,y] | x>y=Just [y,x]
>    | y>x=Just [x,y]
>    | x==y=Nothing
>
> merge::(Eq a, Ord a) => Maybe [a]->Maybe [a]->Maybe[a]
> merge _ Nothing = Nothing
> merge Nothing _ = Nothing
> merge (Just []) (Just xs)=Just xs
> merge (Just xs) (Just [])=Just xs
> merge (Just (x:xs)) (Just (y:ys)) | x==y = Nothing
>   | x>y  = (Just y) +? (merge (Just (x:xs))
> (Just ys))
>   | x (Just (y:ys)))
>
> (+?) = liftM2 (:)
>
>
>
> My version of the merge sort returns Nothing whenever it finds two equal
> entries, aborting all subsequent comparisons.
>
> I have a few questions for the friendly people at this cafe:
>
> 1) Is there any improvement I can make?
> 2) Can it be parallelized (par, pseq)?
>
>
> Best regards,
>
> Rafael
>
> ___
> Haskell-Cafe mailing list
> Haskell-Cafe@haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe
>
>
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

[Haskell-cafe] Function to detect duplicates

2010-02-23 Thread Rafael Gustavo da Cunha Pereira Pinto

Hi folks,

While solving a puzzle, I was posed the problem of finding if there was no
duplicates on a list.

First I used:

noneRepeated=null.(filter (>1)).(map length).group.sort


But this seemed very unneficient, so I thought that I could detect the
duplicates while sorting, and devised this:

import Control.Monad
import Data.Maybe

noneRepeated=isNothing . (foldl merge (Just [])) . (map sort) . pairs

pairs []=[]
pairs [x]=[[x]]
pairs (x:y:xs)=[x,y]:pairs xs

sort []=Just []
sort [x]=Just [x]
sort [x,y] | x>y=Just [y,x]
   | y>x=Just [x,y]
   | x==y=Nothing

merge::(Eq a, Ord a) => Maybe [a]->Maybe [a]->Maybe[a]
merge _ Nothing = Nothing
merge Nothing _ = Nothing
merge (Just []) (Just xs)=Just xs
merge (Just xs) (Just [])=Just xs
merge (Just (x:xs)) (Just (y:ys)) | x==y = Nothing
  | x>y  = (Just y) +? (merge (Just (x:xs))
(Just ys))
  | x___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Function to detect duplicates

Re: [Haskell-cafe] Function to detect duplicates

Re: [Haskell-cafe] Function to detect duplicates

Re: [Haskell-cafe] Function to detect duplicates

Re: [Haskell-cafe] Function to detect duplicates

Re: [Haskell-cafe] Function to detect duplicates

[Haskell-cafe] Function to detect duplicates

7 matches

Site Navigation

Mail list logo

Footer information