Re: [Haskell-cafe] Parsing cabal files to calculate average number of dependencies

2011-07-19 Thread Henning Thielemann

On 02.07.2011 01:26, Gwern Branwen wrote:


Another thing you can do along the same lines is generate a script to
download all the repos from packages which declare repos. Some ugly
code:


If 'script' also includes Haskell code, then the 'tar' package could be 
of help to walk throught the TAR archive.


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Parsing cabal files to calculate average number of dependencies

2011-07-09 Thread Gwern Branwen
On Fri, Jul 1, 2011 at 5:37 PM, Gwern Branwen  wrote:
> Looking at it, the index tarball contains the .cabal files for all
> versions known to Hackage, which isn't necessarily the interesting set
> of cabal files - I'm usually more interested in just the cabal files
> of the latest version of every package. No doubt there's a scripting
> solution (loop over the untarred directory of packages, and take the
> lexically last cabal file?), but it was easier to just exploit cabal
> fetch's behavior of fetching only the latest version and work with
> those tarballs.

The version using just the index tarball is kind of ugly; the
filtering and extracting doesn't seem terribly easy, so the best
script I could come up with was:

cd ~/.cabal/packages/hackage.haskell.org && for DIR in */; do (for
CABAL in `tar --wildcards "$DIR" -tf 00-index.tar|head -1`; do (tar
-Oxf 00-index.tar $CABAL | runhaskell ~/deps.hs); done); done

(Parentheses aren't necessary but make it more readable.)

-- 
gwern
http://www.gwern.net

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Parsing cabal files to calculate average number of dependencies

2011-07-01 Thread Gwern Branwen
Another thing you can do along the same lines is generate a script to
download all the repos from packages which declare repos. Some ugly
code:

import Data.Maybe (fromJust)
import Distribution.PackageDescription
import Distribution.PackageDescription.Parse
import Control.Monad (unless)

main :: IO ()
main = do cbl <- getContents
  let desc = parsePackageDescription cbl
  case desc of
ParseFailed _ -> return ()
ParseOk _ d -> do let repos = repoPair $ extractHead $
extractRepos d
  let cmd = concatMap shellify repos
  unless (null cmd) $ putStrLn cmd

shellify :: (RepoType, String) -> String
shellify (rt,url) = case rt of
   Darcs -> "darcs get " ++ url
   Git -> "git clone " ++ url
   SVN -> "svn clone " ++ url
   CVS -> "cvs co " ++ url
   Mercurial -> "hg clone " ++ url
   _ -> ""

repoPair :: [SourceRepo] -> [(RepoType, String)]
repoPair = map (\x -> (fromJust $ repoType x, fromJust $ repoLocation x))

extractHead :: [SourceRepo] -> [SourceRepo]
extractHead rs = filter (\x -> isnothing x && ishead x) rs
where ishead sr = case repoKind sr of
RepoHead -> True
_ -> False
  isnothing ss = case repoType ss of
   Nothing -> False
   Just _ -> case repoLocation ss of
 Nothing -> False
 Just _ -> True

extractRepos :: GenericPackageDescription -> [SourceRepo]
extractRepos = sourceRepos . packageDescription

This generates results (with the same find command and setup as
previously) like:

...
git clone git://gitorious.org/maximus/mandulia.git
darcs get http://darcs.cielonegro.org/HsOpenSSL/
darcs get http://darcs.cielonegro.org/HsOpenSSL/
hg clone https://bitbucket.org/bos/text-icugit clone
https://github.com/bos/text-icu
darcs get http://code.haskell.org/Graphalyze
darcs get http://code.haskell.org/~roelvandijk/code/base-unicode-symbols
git clone git://github.com/roelvandijk/base-unicode-symbols.git
darcs get http://code.haskell.org/~basvandijk/code/regions
git clone https://github.com/skogsbaer/xmlgen
git clone git://github.com/tanakh/HongoDB.git
darcs get http://repos.mornfall.net/shellish
darcs get http://patch-tag.com/r/Saizan/syb-with-class/
git clone git://github.com/ekmett/eq.git
git clone git://github.com/ekmett/data-lens-fd.git
git clone git://github.com/ekmett/streams.git
git clone git://github.com/alanz/hjsmin.git
darcs get http://patch-tag.com/r/byorgey/diagrams-lib
...

--
gwern
http://www.gwern.net/haskell/Archiving%20GitHub

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Parsing cabal files to calculate average number of dependencies

2011-07-01 Thread Gwern Branwen
On Fri, Jul 1, 2011 at 5:23 PM, Rogan Creswick  wrote:
>
> I think the index tarball has all the info you need, and would be
> faster to retrieve / process, if you or anyone else needs to get the
> .cabal files again:
>
> http://hackage.haskell.org/packages/archive/00-index.tar.gz (2.2mb)

Looking at it, the index tarball contains the .cabal files for all
versions known to Hackage, which isn't necessarily the interesting set
of cabal files - I'm usually more interested in just the cabal files
of the latest version of every package. No doubt there's a scripting
solution (loop over the untarred directory of packages, and take the
lexically last cabal file?), but it was easier to just exploit cabal
fetch's behavior of fetching only the latest version and work with
those tarballs.

-- 
gwern
http://www.gwern.net

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Parsing cabal files to calculate average number of dependencies

2011-07-01 Thread Rogan Creswick
On Fri, Jul 1, 2011 at 1:43 PM, Gwern Branwen  wrote:
> Athas on #haskell wondered how many dependencies the average Haskell
> package had. I commented that it seemed like some fairly simple
> scripting to find out, and as these things tend to go, I wound up
> doing a complete solution myself.
>
> First, we get most/all of Hackage locally to examine, as tarballs:
>
>    for package in `cabal list | grep '\*' | tr -d '\*'`; do cabal
> fetch $package; done

I think the index tarball has all the info you need, and would be
faster to retrieve / process, if you or anyone else needs to get the
.cabal files again:

http://hackage.haskell.org/packages/archive/00-index.tar.gz (2.2mb)

The set of the latest package sdists is also available:

http://hackage.haskell.org/cgi-bin/hackage-scripts/archive.tar (~150mb)

--Rogan

> Then we cd .cabal/packages/hackage.haskell.org
>
> Now we can run a command which extracts the .cabal file from each
> tarball to standard output:
>
>    find . -name "*.tar.gz" -exec tar --wildcards "*.cabal" -Oxf {} \;
>
> We could grep for 'build-depends' or something, but that gives
> unreliable dirty results. (>80k items, resulting in a hard to believe
> 87k total deps and an average of 27 deps.) So instead, we use the
> Cabal library and write a program to parse Cabal files & spit out the
> dependencies, and we feed each .cabal into that:
>
>    find . -name "*.tar.gz" -exec sh -c 'tar --wildcards "*.cabal"
> -Oxf {} | runhaskell ~/deps.hs' \;
>
> And what is deps.hs? Turns out to be surprisingly easy to parse a
> String, extract the Library and Executable AST, and grab the
> [Dependency] field, and then print it out (code is not particularly
> clean):
>
> import Distribution.Package
> import Distribution.PackageDescription
> import Distribution.PackageDescription.Parse
> main :: IO ()
> main = do cbl <- getContents
>          let desc = parsePackageDescription cbl
>          case desc of
>            ParseFailed _ -> return ()
>            ParseOk _ d -> putStr $ unlines $ map show $ map
> (\(Dependency x _) -> x) $ extractDeps d
> extractDeps :: GenericPackageDescription -> [Dependency]
> extractDeps d = ldeps ++ edeps
>  where ldeps = case (condLibrary d) of
>                Nothing -> []
>                Just c -> condTreeConstraints c
>        edeps = concat $ map (condTreeConstraints . snd) $ condExecutables d
>
> So what are the results? (The output of one run is attached.) I get
> 18,134 dependencies, having run on 3,137 files, or 5.8 dependencies
> per package.
>
> --
> gwern
> http://www.gwern.net
>
> ___
> Haskell-Cafe mailing list
> Haskell-Cafe@haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe
>
>

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Parsing cabal files to calculate average number of dependencies

2011-07-01 Thread Gwern Branwen
On Fri, Jul 1, 2011 at 4:49 PM, L Corbijn  wrote:
> Is this including or exluding 'or'-ed dependency lists like
> http://hackage.haskell.org/package/hugs2yc ?

Excluding, it seems. When I run the script on that tarball:

$ tar --wildcards "*.cabal" -Oxf `find . -name "*.tar.gz" | g hugs2yc`
| runhaskell /home/gwern/deps.hs
PackageName "mtl"
PackageName "uniplate"
PackageName "yhccore"
PackageName "ycextra"
PackageName "parsec"
PackageName "directory"
PackageName "filepath"

No version of base or containers appears. (mtl appears in both
branches and also the general build-depends list.)

-- 
gwern
http://www.gwern.net

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Parsing cabal files to calculate average number of dependencies

2011-07-01 Thread Gwern Branwen
Athas on #haskell wondered how many dependencies the average Haskell
package had. I commented that it seemed like some fairly simple
scripting to find out, and as these things tend to go, I wound up
doing a complete solution myself.

First, we get most/all of Hackage locally to examine, as tarballs:

for package in `cabal list | grep '\*' | tr -d '\*'`; do cabal
fetch $package; done

Then we cd .cabal/packages/hackage.haskell.org

Now we can run a command which extracts the .cabal file from each
tarball to standard output:

find . -name "*.tar.gz" -exec tar --wildcards "*.cabal" -Oxf {} \;

We could grep for 'build-depends' or something, but that gives
unreliable dirty results. (>80k items, resulting in a hard to believe
87k total deps and an average of 27 deps.) So instead, we use the
Cabal library and write a program to parse Cabal files & spit out the
dependencies, and we feed each .cabal into that:

find . -name "*.tar.gz" -exec sh -c 'tar --wildcards "*.cabal"
-Oxf {} | runhaskell ~/deps.hs' \;

And what is deps.hs? Turns out to be surprisingly easy to parse a
String, extract the Library and Executable AST, and grab the
[Dependency] field, and then print it out (code is not particularly
clean):

import Distribution.Package
import Distribution.PackageDescription
import Distribution.PackageDescription.Parse
main :: IO ()
main = do cbl <- getContents
  let desc = parsePackageDescription cbl
  case desc of
ParseFailed _ -> return ()
ParseOk _ d -> putStr $ unlines $ map show $ map
(\(Dependency x _) -> x) $ extractDeps d
extractDeps :: GenericPackageDescription -> [Dependency]
extractDeps d = ldeps ++ edeps
  where ldeps = case (condLibrary d) of
Nothing -> []
Just c -> condTreeConstraints c
edeps = concat $ map (condTreeConstraints . snd) $ condExecutables d

So what are the results? (The output of one run is attached.) I get
18,134 dependencies, having run on 3,137 files, or 5.8 dependencies
per package.

-- 
gwern
http://www.gwern.net


deps.txt.gz
Description: GNU Zip compressed data
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe