I had some fun parsing and plotting the data (very simple, just the top packages for now). See here: https://github.com/lkraider/requirements-dataset/blob/master/index.ipynb
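
For anyone who wants a quick look without opening the notebook, here is a minimal sketch of that kind of "top packages" count. It is not the notebook's code: the `content` key is an assumption about how the dataset names the field holding the file contents, `package_names` and `top_packages` are hypothetical helpers, and the parsing is deliberately rough (it skips pip options and URL-style requirements instead of handling them).

```python
import json
import re
from collections import Counter

# Assumption: each JSON line stores the requirements file text under "content".
CONTENT_KEY = "content"

# Leading package name of a requirement line, e.g. "Django" in "Django>=1.10".
NAME_RE = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def package_names(requirements_text):
    """Yield normalized package names from the text of one requirements file."""
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        # Skip blanks, pip options ("-r base.txt", "-e ...") and URL requirements.
        if not line or line.startswith("-") or "://" in line:
            continue
        match = NAME_RE.match(line)
        if match:
            # Normalize as PEP 503 does: lowercase, runs of "-", "_", "." become "-".
            yield re.sub(r"[-_.]+", "-", match.group(1)).lower()


def top_packages(json_lines, n=10):
    """Return the n packages that appear in the most requirements files."""
    counts = Counter()
    for raw in json_lines:
        if not raw.strip():
            continue
        record = json.loads(raw)
        # set() so a package listed twice in one file is counted once.
        counts.update(set(package_names(record.get(CONTENT_KEY) or "")))
    return counts.most_common(n)
```

With the dataset downloaded, something like `top_packages(open('data.json'))` should give a list of `(name, file_count)` pairs, provided the field name assumption holds.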
Let me know if you would accept a pull request so others can use the notebook as a starting point.

Regards,
--
Paul Eipper

On Wed, Mar 8, 2017 at 1:36 PM, Nick Timkovich <[email protected]> wrote:
> Looks like a fun chunk of data, what's the query you used? Can you add a
> README to the repo with some description if others want to iterate on it
> (maybe look into setup.py's?)
>
> Nick
>
> On Tue, Mar 7, 2017 at 5:06 AM, Jannis Gebauer <[email protected]> wrote:
>
>> Hi,
>>
>> I ran a couple of queries against GitHub's public BigQuery dataset [0]
>> last week. I’m interested in requirements files in particular, so I ran a
>> query extracting all available requirements files.
>>
>> Since queries against this dataset are rather expensive ($7 on all
>> repos), I thought I’d share the raw data here [1]. The data contains the
>> repo name, the requirements file path and the contents of the file. Every
>> line represents a JSON blob, read it with:
>>
>>     import json
>>
>>     with open('data.json') as f:
>>         for line in f:
>>             data = json.loads(line)
>>
>> Maybe that’s of interest to some of you.
>>
>> If you have any ideas on what to do with the data, please let me know.
>>
>> --
>>
>> Jannis Gebauer
>>
>> [0]: https://cloud.google.com/bigquery/public-data/github
>> [1]: https://github.com/jayfk/requirements-dataset
>>
>> _______________________________________________
>> Distutils-SIG maillist - [email protected]
>> https://mail.python.org/mailman/listinfo/distutils-sig
