Re: [Haskell-cafe] Function to detect duplicates
On Tue, 23 Feb 2010 08:30:18 -0300, you wrote: >Hi folks, > >While solving a puzzle, I was posed the problem of finding if there was no >duplicates on a list. Must it be a list data structure(DS) or list ADT? Mergesort can be parallelized. > >Best regards, > >Rafael If space is at a premium you might want to look at a Bloom Filter. http://en.wikipedia.org/wiki/Bloom_filter The Bloom filter, conceived by Burton Howard Bloom in 1970,[1] is a space-efficient probabilistic data structure that is used to test whether an element is a member of a set. False positives are possible, but false negatives are not. Elements can be added to the set, but not removed (though this can be addressed with a counting filter). The more elements that are added to the set, the larger the probability of false positives. The book "Real World Haskell" has an implementation. -- Regards, Casey ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Function to detect duplicates
On Tue, 23 Feb 2010 08:30:18 -0300, you wrote: >Hi folks, > >While solving a puzzle, I was posed the problem of finding if there was no >duplicates on a list. Must it be a list data structure(DS) or list ADT? Mergesort can be parallelized. > >Best regards, > >Rafael -- Regards, Casey ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Function to detect duplicates
Am Mittwoch 24 Februar 2010 21:25:04 schrieb Gwern Branwen: > 2010/2/23 Jonas Almström Duregård : > > Hi Rafael, > > > > I assume you will perform this operation on some very large lists, or > > performance would not be an issue. Have you tested if your optimized > > version is better than your initial one? > > > > You should compare your implementation against something like this: > > > > import qualified Data.Set as Set > > noneRepeated :: (Ord a) => [a] -> Bool > > noneRepeated = accum Set.empty where > > accum _ [] = True > > accum s (x:xs) > > | Set.member x s = False > > | otherwise = accum (Set.insert x s) xs > > > > Also there is some discussion about the nub function that relates to > > this topic, e.g. http://buffered.io/2008/07/28/a-better-nub/. > > > > /Jonas > > Or better yet, > http://www.haskell.org/pipermail/libraries/2008-October/010778.html Much > more thorough and practical w/r/t to actually getting faster nubs in the > libraries. Umm, using the nubOrd' code to nub a 1 million long list of pseudo random numbers takes (here) about 2.5 times the time and twice space as the Set- based ordNub. It does slightly better for 100,000 elements, but still takes more than twice the time (and 1.6 x the space). In my book, that's a compelling reason to go with the set-based implementation - unless we're talking about code to include directly in Data.List, but then I'd still _use_ the set-based one. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Function to detect duplicates
2010/2/23 Jonas Almström Duregård : > Hi Rafael, > > I assume you will perform this operation on some very large lists, or > performance would not be an issue. Have you tested if your optimized > version is better than your initial one? > > You should compare your implementation against something like this: > > import qualified Data.Set as Set > noneRepeated :: (Ord a) => [a] -> Bool > noneRepeated = accum Set.empty where > accum _ [] = True > accum s (x:xs) > | Set.member x s = False > | otherwise = accum (Set.insert x s) xs > > Also there is some discussion about the nub function that relates to > this topic, e.g. http://buffered.io/2008/07/28/a-better-nub/. > > /Jonas Or better yet, http://www.haskell.org/pipermail/libraries/2008-October/010778.html Much more thorough and practical w/r/t to actually getting faster nubs in the libraries. -- gwern ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Function to detect duplicates
Rafael Gustavo da Cunha Pereira Pinto writes: > First I used: > > noneRepeated=null.(filter (>1)).(map length).group.sort > But this seemed very unneficient, so I thought that I could detect the > duplicates while sorting, and devised this: [...] > 1) Is there any improvement I can make? Well - it's a bit long, don't you think? Long enough that from a cursory glance, I'd say it's in the "no obvious errors" category. How about (inspired by quicksort, as you no doubt can tell): noneRepeated [] = True noneRepeated (x:xs) = noneRepeated lt && singleton eq && noneRepeated gt where lt = filter (x) xs eq = x:filter (==x) xs singleton [_] = True singleton _ = False > 2) Can it be parallelized (par, pseq)? You could force each of the sublists in parallel, but you might lose some laziness properties, so I'd carefully benchmark it. -k -- If I haven't seen further, it is by standing in the footprints of giants ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Function to detect duplicates
Hi Rafael, I assume you will perform this operation on some very large lists, or performance would not be an issue. Have you tested if your optimized version is better than your initial one? You should compare your implementation against something like this: import qualified Data.Set as Set noneRepeated :: (Ord a) => [a] -> Bool noneRepeated = accum Set.empty where accum _ [] = True accum s (x:xs) | Set.member x s = False | otherwise = accum (Set.insert x s) xs Also there is some discussion about the nub function that relates to this topic, e.g. http://buffered.io/2008/07/28/a-better-nub/. /Jonas On 23 February 2010 12:30, Rafael Gustavo da Cunha Pereira Pinto wrote: > > Hi folks, > > While solving a puzzle, I was posed the problem of finding if there was no > duplicates on a list. > > First I used: > > noneRepeated=null.(filter (>1)).(map length).group.sort > > > But this seemed very unneficient, so I thought that I could detect the > duplicates while sorting, and devised this: > > import Control.Monad > import Data.Maybe > > noneRepeated=isNothing . (foldl merge (Just [])) . (map sort) . pairs > > pairs []=[] > pairs [x]=[[x]] > pairs (x:y:xs)=[x,y]:pairs xs > > sort []=Just [] > sort [x]=Just [x] > sort [x,y] | x>y=Just [y,x] > | y>x=Just [x,y] > | x==y=Nothing > > merge::(Eq a, Ord a) => Maybe [a]->Maybe [a]->Maybe[a] > merge _ Nothing = Nothing > merge Nothing _ = Nothing > merge (Just []) (Just xs)=Just xs > merge (Just xs) (Just [])=Just xs > merge (Just (x:xs)) (Just (y:ys)) | x==y = Nothing > | x>y = (Just y) +? (merge (Just (x:xs)) > (Just ys)) > | x (Just (y:ys))) > > (+?) = liftM2 (:) > > > > My version of the merge sort returns Nothing whenever it finds two equal > entries, aborting all subsequent comparisons. > > I have a few questions for the friendly people at this cafe: > > 1) Is there any improvement I can make? > 2) Can it be parallelized (par, pseq)? > > > Best regards, > > Rafael > > ___ > Haskell-Cafe mailing list > Haskell-Cafe@haskell.org > http://www.haskell.org/mailman/listinfo/haskell-cafe > > ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Function to detect duplicates
Hi folks, While solving a puzzle, I was posed the problem of finding if there was no duplicates on a list. First I used: noneRepeated=null.(filter (>1)).(map length).group.sort But this seemed very unneficient, so I thought that I could detect the duplicates while sorting, and devised this: import Control.Monad import Data.Maybe noneRepeated=isNothing . (foldl merge (Just [])) . (map sort) . pairs pairs []=[] pairs [x]=[[x]] pairs (x:y:xs)=[x,y]:pairs xs sort []=Just [] sort [x]=Just [x] sort [x,y] | x>y=Just [y,x] | y>x=Just [x,y] | x==y=Nothing merge::(Eq a, Ord a) => Maybe [a]->Maybe [a]->Maybe[a] merge _ Nothing = Nothing merge Nothing _ = Nothing merge (Just []) (Just xs)=Just xs merge (Just xs) (Just [])=Just xs merge (Just (x:xs)) (Just (y:ys)) | x==y = Nothing | x>y = (Just y) +? (merge (Just (x:xs)) (Just ys)) | x___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe