Re: [darcs-users] darcs patch: configuring author spelling variations w... (and 3 more)

Eric Kow Sun, 08 Feb 2009 14:43:48 -0800

Thanks, Simon,

I've long been hoping that somebody would work on this.


Just some quick superficial comments...

Also, I'm assuming the goal is for list_authors to either disappear
or offload the bulk of its work to darcs show authors?  Why not just
go that route directly?

Thanks!

configuring author spelling variations was complicated, now easier
------------------------------------------------------------------
I see some entries with only one possible spelling.  Would you have to
update this (or the corresponding spellings file) every time somebody
new comes into the project?  If so, I think what would be nicer is if we
just passed along any variant that is not matched by anything.

canonical authors may be defined in an .authorspellings file
------------------------------------------------------------
> Simon Michael <[email protected]>**20090207221321
>  Example:
>  
>  Joe Blogg <[email protected]>
>  -- authors containing [email protected] or [email protected] or matching just "sue" are Sue Bragg
>  Sue Bragg <[email protected]>, [email protected], ^sue$
>  
> ] hunk ./src/list_authors.hs 26
> -import Data.List ( sort, group, isInfixOf )
> -import Data.Char ( toLower )
> +import Data.List ( sort, group, isInfixOf, isPrefixOf )
> +import Data.Char ( toLower, isSpace )
> hunk ./src/list_authors.hs 36
> -          mapM_ putStrLn $ sort_authors use_statistics $ mapRL (pi_author.info)
> -                         $ concatRL darcs_history
> +          spellings <- compiled_spellings
> +          mapM_ putStrLn $ sort_authors use_statistics spellings
> +                         $ mapRL (pi_author.info) $ concatRL darcs_history
> hunk ./src/list_authors.hs 54
> -sort_authors :: Bool -> [String] -> [String]
> -sort_authors use_stats as = reverse $ map shownames $ sort $
> -                            map (\s -> (length s,canonize_author $ head s)) $
> -                            group $ sort as
> +sort_authors :: Bool -> [(String,[Regex])] -> [String] -> [String]
> +sort_authors use_stats spellings as = 
> +    reverse $ map shownames $ sort $
> +    {- group and count again after canonizing -}
> +    map (\s -> (length s,head s)) $ group $ sort $ concat $ map (\(n,a) ->  replicate n a) $
> +    map (\s -> (length s,canonize_author spellings $ head s)) $ group $ sort as
> hunk ./src/list_authors.hs 64
> -canonize_author :: String -> String
> -canonize_author a | null author_spellings = a
> -canonize_author a = safehead a $ canonicalsfor a
> +canonize_author :: [(String,[Regex])] -> String -> String
> +canonize_author [] a = a
> +canonize_author spellings a = safehead a $ canonicalsfor a
> hunk ./src/list_authors.hs 69
> -      canonicalsfor s = map fst $ filter (ismatch s) $ compiled_spellings
> +      canonicalsfor s = map fst $ filter (ismatch s) spellings
> hunk ./src/list_authors.hs 72
> -          where email = takeWhile (/= '>') $ tail $ dropWhile (/= '<') canonical
> +          where email = takeWhile (/= '>') $ drop 1 $ dropWhile (/= '<') canonical
> hunk ./src/list_authors.hs 82
> -compiled_spellings :: [(String,[Regex])]
> -compiled_spellings = map compile author_spellings
> +compiled_spellings :: IO [(String,[Regex])]
> +compiled_spellings = do
> +  fs <- author_spellings_from_file
> +  return $ map compile $ fs ++ author_spellings
> hunk ./src/list_authors.hs 92
> --- containing the canonical name and email address optionally followed
> --- by additional regular expression patterns. An author string which
> --- contains the canonical email address or any of the patterns will be
> --- replaced by the canonical form.  All matching is case-insensitive,
> --- to match the whole author string use ^ and $.
> +-- containing the canonical name and email address in angle brackets,
> +-- optionally followed by additional regular expression patterns. An
> +-- author string which contains the canonical email address or any of
> +-- the patterns will be replaced by the canonical form.  All matching
> +-- is case-insensitive. To match the whole author string use ^ and $.
> hunk ./src/list_authors.hs 192
> +-- Canonical author spellings may also be defined in this file, one
> +-- per line. Fields are as above, comma-separated. Blank lines and
> +-- lines beginning with -- are ignored. The file takes precedence over
> +-- the built-in spellings.
> +authorspellingsfile = ".authorspellings"
> +
> +author_spellings_from_file :: IO [[String]]
> +author_spellings_from_file = do
> +  s <- readFile authorspellingsfile `catch` (\e -> return "")
> +  let noncomments = filter (not . ("--" `isPrefixOf`)) $ 
> +                    filter (not . null) $ map strip $ lines s
> +  return $ map (map strip . split_on ',') noncomments
> +
> +split_on :: Eq a => a -> [a] -> [[a]]
> +split_on e l = 
> +    case dropWhile (e==) l of
> +      [] -> []
> +      l' -> first : split_on e rest
> +        where
> +          (first,rest) = break (e==) l'
> +
> +strip :: String -> String
> +strip = dropWhile isSpace . reverse . dropWhile isSpace . reverse
> +
> +
> 

add list_authors-style canonicalizing to the show authors command
-----------------------------------------------------------------
> -           map (\s -> (length s,head s)) $ group $ sort authors
> +           map (\s -> (length s,head s)) $ group $ sort $ concat $ map (\(n,a) ->  replicate n a) $

concatMap and uncurry might be nice here
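
To make that concrete, here is a minimal sketch of the expansion step written with concatMap and uncurry. The names expandLambda, expandPointFree and recount are made up for illustration and are not part of the patch, and the sample data is invented.

```haskell
import Data.List (group, sort)

-- The expansion as written in the patch: turn (count, author) pairs
-- back into a flat list with each author repeated count times.
expandLambda :: [(Int, String)] -> [String]
expandLambda = concat . map (\(n, a) -> replicate n a)

-- The same expansion with concatMap and uncurry.
expandPointFree :: [(Int, String)] -> [String]
expandPointFree = concatMap (uncurry replicate)

-- Hypothetical helper: regroup and recount after canonizing, so that
-- several spellings mapped to one canonical name are merged.
recount :: [(Int, String)] -> [(Int, String)]
recount pairs =
    [ (length s, a) | s@(a:_) <- group (sort (expandPointFree pairs)) ]
```

For example, recount [(2,"sue"),(1,"joe"),(3,"sue")] should give [(1,"joe"),(5,"sue")].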

> +           map (\s -> (length s,canonize_author spellings $ head s)) $ group $ sort authors

> hunk ./src/Darcs/Commands/ShowAuthors.lhs 72

> +      ismatch s (canonical,regexps) =
> +          (not (null email) && (s `contains` email)) || (any (s `contains_regex`) regexps)

Superfluous parentheses

> +          where email = takeWhile (/= '>') $ drop 1 $ dropWhile (/= '<') canonical

> +contains :: String -> String -> Bool
> +a `contains` b = lower b `isInfixOf` (lower a) where lower = map toLower

Superfluous parentheses
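
To spell out the two parentheses remarks: function application binds tighter than any operator, backticked functions without a fixity declaration default to infixl 9, and && (infixr 3) binds tighter than || (infixr 2). So both definitions should parse the same way without the extra parentheses. The sketch below is a simplification, not the patch itself: plain substring patterns stand in for the compiled regexes, isMatch is an illustrative name, and the addresses are invented.

```haskell
import Data.Char (toLower)
import Data.List (isInfixOf)

-- Case-insensitive substring test, without the parentheses around
-- (lower a): application already binds tighter than `isInfixOf`.
contains :: String -> String -> Bool
a `contains` b = lower b `isInfixOf` lower a
  where lower = map toLower

-- Assumption: patterns are plain substrings here, standing in for
-- the [Regex] of the patch. No parentheses are needed around the
-- && clause or around the any clause.
isMatch :: String -> (String, [String]) -> Bool
isMatch s (canonical, pats) =
    not (null email) && s `contains` email || any (s `contains`) pats
  where email = takeWhile (/= '>') $ drop 1 $ dropWhile (/= '<') canonical
```

Whether the version without parentheses is easier to read is a matter of taste; the null check on email still matters, because every string contains the empty string.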

> +contains_regex :: String -> Regex -> Bool
> +a `contains_regex` r = case matchRegex r a of
> +                         Just _ -> True
> +                         _ -> False

I think this could be written as:
  a `contains_regex` r = maybe False (const True) (matchRegex r a)

(Up to you to decide if it's wise to do so)
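
For comparison, here is a small sketch of three equivalent ways to turn a Maybe match result into a Bool: the case expression, maybe False (const True), and isJust from Data.Maybe. To keep it free of the regex dependency, matchStub is a stand-in I made up (it is just stripPrefix, taking the pattern first and the string second, like matchRegex); with Text.Regex the last form would presumably read isJust (matchRegex r a).

```haskell
import Data.List (stripPrefix)
import Data.Maybe (isJust)

-- Hypothetical stand-in for matchRegex: "matches" when the pattern
-- is a prefix of the string. Only the Maybe result matters here.
matchStub :: String -> String -> Maybe String
matchStub = stripPrefix

-- The case expression, in the style of the patch.
containsCase :: String -> String -> Bool
a `containsCase` r = case matchStub r a of
                       Just _ -> True
                       _      -> False

-- The maybe formulation; note that the match result is parenthesized.
containsMaybe :: String -> String -> Bool
a `containsMaybe` r = maybe False (const True) (matchStub r a)

-- The isJust formulation.
containsIsJust :: String -> String -> Bool
a `containsIsJust` r = isJust (matchStub r a)
```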

> +compiled_author_spellings = do
> +  ss <- author_spellings_from_file
> +  return $ map compile $ ss

Perhaps a better formulation:
  map compile `fmap` author_spellings_from_file
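
A sketch of why the two are interchangeable: binding an IO result and immediately returning a pure function of it is what fmap does. Everything below is a stand-in with made-up data; spellingsFromFile replaces the real file reader, and compile keeps the patterns as plain strings instead of building regexes.

```haskell
-- Hypothetical stand-in for author_spellings_from_file.
spellingsFromFile :: IO [[String]]
spellingsFromFile = return
    [ ["Sue Bragg <sue@example.org>", "^sue$"]
    , ["Joe Blogg <joe@example.org>"]
    ]

-- Simplified compile: the first field is the canonical form, the
-- rest are patterns (left uncompiled in this sketch).
compile :: [String] -> (String, [String])
compile [] = error "each author spelling should contain at least the canonical form"
compile (canonical:pats) = (canonical, pats)

-- The do-block style used in the patch.
compiledDo :: IO [(String, [String])]
compiledDo = do
  ss <- spellingsFromFile
  return $ map compile ss

-- The fmap formulation.
compiledFmap :: IO [(String, [String])]
compiledFmap = map compile `fmap` spellingsFromFile

-- The same with the <$> operator.
compiledOp :: IO [(String, [String])]
compiledOp = map compile <$> spellingsFromFile
```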

> +      compile [] = error "each author spelling should contain at least the canonical form"
> +      compile (canonical:pats) = (canonical, map mkregex pats)
> +      mkregex pat = mkRegexWithOpts pat True False

> +-- Canonical author spellings can be defined in this file, to clean up

Sounds like this should be a Haddock comment

> +split_on :: Eq a => a -> [a] -> [[a]]
> +split_on e l =
> +    case dropWhile (e==) l of
> +      [] -> []
> +      l' -> first : split_on e rest
> +        where
> +          (first,rest) = break (e==) l'

We could consider using the split package on Hackage (I don't know if
it's wise to introduce a dependency just for that)
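
Before swapping in a library function, it may help to pin down what split_on does at the edges, since a replacement would need the same behaviour. The sketch below copies the definition from the patch and adds made-up sample inputs (the addresses are invented). It drops empty fields rather than keeping them. If I remember the split package correctly, wordsBy (== ',') from Data.List.Split behaves this way while splitOn "," keeps empty fields, but that is an assumption to check against its documentation.

```haskell
-- split_on as in the patch: split on a separator element, skipping
-- runs of separators so that no empty fields are produced.
split_on :: Eq a => a -> [a] -> [[a]]
split_on e l =
    case dropWhile (e==) l of
      [] -> []
      l' -> first : split_on e rest
        where
          (first,rest) = break (e==) l'

-- Sample inputs (invented) showing the edge cases.
sampleFields :: [String]
sampleFields = split_on ',' "Sue Bragg <sue@example.org>, sue@example.com, ^sue$"

sampleEmptyFields :: [String]
sampleEmptyFields = split_on ',' "a,,b,"
```

Here sampleEmptyFields is ["a","b"], and the fields in sampleFields keep their leading spaces until strip is applied. One consequence of splitting on every comma is that a pattern containing a comma cannot be written in the file.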

-- 
Eric Kow <http://www.nltg.brighton.ac.uk/home/Eric.Kow>
PGP Key ID: 08AC04F9


_______________________________________________
darcs-users mailing list
[email protected]
http://lists.osuosl.org/mailman/listinfo/darcs-users
