Re: [Haskell-cafe] Parsing cabal files to calculate average number of dependencies
On 02.07.2011 01:26, Gwern Branwen wrote: Another thing you can do along the same lines is generate a script to download all the repos from packages which declare repos. Some ugly code: If 'script' also includes Haskell code, then the 'tar' package could be of help to walk throught the TAR archive. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Parsing cabal files to calculate average number of dependencies
On Fri, Jul 1, 2011 at 5:37 PM, Gwern Branwen wrote: > Looking at it, the index tarball contains the .cabal files for all > versions known to Hackage, which isn't necessarily the interesting set > of cabal files - I'm usually more interested in just the cabal files > of the latest version of every package. No doubt there's a scripting > solution (loop over the untarred directory of packages, and take the > lexically last cabal file?), but it was easier to just exploit cabal > fetch's behavior of fetching only the latest version and work with > those tarballs. The version using just the index tarball is kind of ugly; the filtering and extracting doesn't seem terribly easy, so the best script I could come up with was: cd ~/.cabal/packages/hackage.haskell.org && for DIR in */; do (for CABAL in `tar --wildcards "$DIR" -tf 00-index.tar|head -1`; do (tar -Oxf 00-index.tar $CABAL | runhaskell ~/deps.hs); done); done (Parentheses aren't necessary but make it more readable.) -- gwern http://www.gwern.net ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Parsing cabal files to calculate average number of dependencies
Another thing you can do along the same lines is generate a script to download all the repos from packages which declare repos. Some ugly code: import Data.Maybe (fromJust) import Distribution.PackageDescription import Distribution.PackageDescription.Parse import Control.Monad (unless) main :: IO () main = do cbl <- getContents let desc = parsePackageDescription cbl case desc of ParseFailed _ -> return () ParseOk _ d -> do let repos = repoPair $ extractHead $ extractRepos d let cmd = concatMap shellify repos unless (null cmd) $ putStrLn cmd shellify :: (RepoType, String) -> String shellify (rt,url) = case rt of Darcs -> "darcs get " ++ url Git -> "git clone " ++ url SVN -> "svn clone " ++ url CVS -> "cvs co " ++ url Mercurial -> "hg clone " ++ url _ -> "" repoPair :: [SourceRepo] -> [(RepoType, String)] repoPair = map (\x -> (fromJust $ repoType x, fromJust $ repoLocation x)) extractHead :: [SourceRepo] -> [SourceRepo] extractHead rs = filter (\x -> isnothing x && ishead x) rs where ishead sr = case repoKind sr of RepoHead -> True _ -> False isnothing ss = case repoType ss of Nothing -> False Just _ -> case repoLocation ss of Nothing -> False Just _ -> True extractRepos :: GenericPackageDescription -> [SourceRepo] extractRepos = sourceRepos . packageDescription This generates results (with the same find command and setup as previously) like: ... git clone git://gitorious.org/maximus/mandulia.git darcs get http://darcs.cielonegro.org/HsOpenSSL/ darcs get http://darcs.cielonegro.org/HsOpenSSL/ hg clone https://bitbucket.org/bos/text-icugit clone https://github.com/bos/text-icu darcs get http://code.haskell.org/Graphalyze darcs get http://code.haskell.org/~roelvandijk/code/base-unicode-symbols git clone git://github.com/roelvandijk/base-unicode-symbols.git darcs get http://code.haskell.org/~basvandijk/code/regions git clone https://github.com/skogsbaer/xmlgen git clone git://github.com/tanakh/HongoDB.git darcs get http://repos.mornfall.net/shellish darcs get http://patch-tag.com/r/Saizan/syb-with-class/ git clone git://github.com/ekmett/eq.git git clone git://github.com/ekmett/data-lens-fd.git git clone git://github.com/ekmett/streams.git git clone git://github.com/alanz/hjsmin.git darcs get http://patch-tag.com/r/byorgey/diagrams-lib ... -- gwern http://www.gwern.net/haskell/Archiving%20GitHub ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Parsing cabal files to calculate average number of dependencies
On Fri, Jul 1, 2011 at 5:23 PM, Rogan Creswick wrote: > > I think the index tarball has all the info you need, and would be > faster to retrieve / process, if you or anyone else needs to get the > .cabal files again: > > http://hackage.haskell.org/packages/archive/00-index.tar.gz (2.2mb) Looking at it, the index tarball contains the .cabal files for all versions known to Hackage, which isn't necessarily the interesting set of cabal files - I'm usually more interested in just the cabal files of the latest version of every package. No doubt there's a scripting solution (loop over the untarred directory of packages, and take the lexically last cabal file?), but it was easier to just exploit cabal fetch's behavior of fetching only the latest version and work with those tarballs. -- gwern http://www.gwern.net ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Parsing cabal files to calculate average number of dependencies
On Fri, Jul 1, 2011 at 1:43 PM, Gwern Branwen wrote: > Athas on #haskell wondered how many dependencies the average Haskell > package had. I commented that it seemed like some fairly simple > scripting to find out, and as these things tend to go, I wound up > doing a complete solution myself. > > First, we get most/all of Hackage locally to examine, as tarballs: > > for package in `cabal list | grep '\*' | tr -d '\*'`; do cabal > fetch $package; done I think the index tarball has all the info you need, and would be faster to retrieve / process, if you or anyone else needs to get the .cabal files again: http://hackage.haskell.org/packages/archive/00-index.tar.gz (2.2mb) The set of the latest package sdists is also available: http://hackage.haskell.org/cgi-bin/hackage-scripts/archive.tar (~150mb) --Rogan > Then we cd .cabal/packages/hackage.haskell.org > > Now we can run a command which extracts the .cabal file from each > tarball to standard output: > > find . -name "*.tar.gz" -exec tar --wildcards "*.cabal" -Oxf {} \; > > We could grep for 'build-depends' or something, but that gives > unreliable dirty results. (>80k items, resulting in a hard to believe > 87k total deps and an average of 27 deps.) So instead, we use the > Cabal library and write a program to parse Cabal files & spit out the > dependencies, and we feed each .cabal into that: > > find . -name "*.tar.gz" -exec sh -c 'tar --wildcards "*.cabal" > -Oxf {} | runhaskell ~/deps.hs' \; > > And what is deps.hs? Turns out to be surprisingly easy to parse a > String, extract the Library and Executable AST, and grab the > [Dependency] field, and then print it out (code is not particularly > clean): > > import Distribution.Package > import Distribution.PackageDescription > import Distribution.PackageDescription.Parse > main :: IO () > main = do cbl <- getContents > let desc = parsePackageDescription cbl > case desc of > ParseFailed _ -> return () > ParseOk _ d -> putStr $ unlines $ map show $ map > (\(Dependency x _) -> x) $ extractDeps d > extractDeps :: GenericPackageDescription -> [Dependency] > extractDeps d = ldeps ++ edeps > where ldeps = case (condLibrary d) of > Nothing -> [] > Just c -> condTreeConstraints c > edeps = concat $ map (condTreeConstraints . snd) $ condExecutables d > > So what are the results? (The output of one run is attached.) I get > 18,134 dependencies, having run on 3,137 files, or 5.8 dependencies > per package. > > -- > gwern > http://www.gwern.net > > ___ > Haskell-Cafe mailing list > Haskell-Cafe@haskell.org > http://www.haskell.org/mailman/listinfo/haskell-cafe > > ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Parsing cabal files to calculate average number of dependencies
On Fri, Jul 1, 2011 at 4:49 PM, L Corbijn wrote: > Is this including or exluding 'or'-ed dependency lists like > http://hackage.haskell.org/package/hugs2yc ? Excluding, it seems. When I run the script on that tarball: $ tar --wildcards "*.cabal" -Oxf `find . -name "*.tar.gz" | g hugs2yc` | runhaskell /home/gwern/deps.hs PackageName "mtl" PackageName "uniplate" PackageName "yhccore" PackageName "ycextra" PackageName "parsec" PackageName "directory" PackageName "filepath" No version of base or containers appears. (mtl appears in both branches and also the general build-depends list.) -- gwern http://www.gwern.net ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Parsing cabal files to calculate average number of dependencies
Athas on #haskell wondered how many dependencies the average Haskell package had. I commented that it seemed like some fairly simple scripting to find out, and as these things tend to go, I wound up doing a complete solution myself. First, we get most/all of Hackage locally to examine, as tarballs: for package in `cabal list | grep '\*' | tr -d '\*'`; do cabal fetch $package; done Then we cd .cabal/packages/hackage.haskell.org Now we can run a command which extracts the .cabal file from each tarball to standard output: find . -name "*.tar.gz" -exec tar --wildcards "*.cabal" -Oxf {} \; We could grep for 'build-depends' or something, but that gives unreliable dirty results. (>80k items, resulting in a hard to believe 87k total deps and an average of 27 deps.) So instead, we use the Cabal library and write a program to parse Cabal files & spit out the dependencies, and we feed each .cabal into that: find . -name "*.tar.gz" -exec sh -c 'tar --wildcards "*.cabal" -Oxf {} | runhaskell ~/deps.hs' \; And what is deps.hs? Turns out to be surprisingly easy to parse a String, extract the Library and Executable AST, and grab the [Dependency] field, and then print it out (code is not particularly clean): import Distribution.Package import Distribution.PackageDescription import Distribution.PackageDescription.Parse main :: IO () main = do cbl <- getContents let desc = parsePackageDescription cbl case desc of ParseFailed _ -> return () ParseOk _ d -> putStr $ unlines $ map show $ map (\(Dependency x _) -> x) $ extractDeps d extractDeps :: GenericPackageDescription -> [Dependency] extractDeps d = ldeps ++ edeps where ldeps = case (condLibrary d) of Nothing -> [] Just c -> condTreeConstraints c edeps = concat $ map (condTreeConstraints . snd) $ condExecutables d So what are the results? (The output of one run is attached.) I get 18,134 dependencies, having run on 3,137 files, or 5.8 dependencies per package. -- gwern http://www.gwern.net deps.txt.gz Description: GNU Zip compressed data ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe