On Thu, May 16, 2013 at 3:46 PM, David Wilson <[email protected]> wrote:
> Would something like http://pypi.h1.botanicus.net/static/dump.txt.gz be
> useful to you? (warning: 57mb expanding to 540mb). Each line is a
> JSON-encoded dict containing a single package release.
>
> for line in gzip.open('dump.txt.gz'):
>     dct = json.loads(line)
>     ....
>
> etc
>
> The code for it is very simple, would be willing to clean it up and turn it
> into a cron job if people found it useful.
>
> Note the dump above is outdated, I only made it as a test.
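
For anyone who wants to poke at a dump in that layout, here is a minimal sketch. It assumes only what is described above (a gzip file with one JSON-encoded dict per line). The function name count_keys is made up for illustration, and nothing is assumed about which keys the records actually contain; it just tallies whatever top-level keys show up.

```python
import gzip
import json
from collections import Counter


def count_keys(path):
    """Tally how often each top-level key appears across all records.

    Assumes `path` is a gzip file holding one JSON object per line.
    """
    counts = Counter()
    # "rt" opens the gzip stream in text mode, so each line is a str.
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            counts.update(record.keys())
    return counts
```

Something like count_keys('dump.txt.gz').most_common(10) would then list the ten most frequently used keys. The file is streamed line by line, so the 540mb of expanded data never has to sit in memory at once.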

Seems like a useful format.

https://bitbucket.org/dholth/pypi_stats is a prototype that parses
requires.txt and other metadata out of all the sdists in a folder,
putting them into a sqlite3 database. It may be interesting for
experimentation. For example, I can easily tell you how many different
version numbers there are and which are the most popular, or I can tell
you which metadata keys and version numbers have been used.

The database winds up being 1.6 GB, or about 200 MB if you delete the
unparsed files.

_______________________________________________
Distutils-SIG maillist  -  [email protected]
http://mail.python.org/mailman/listinfo/distutils-sig
