Dear all,

Allin and I are working on automated data download from dbnomics (if you don't know what I'm talking about: https://db.nomics.world). This led both of us to use JSON objects more systematically than we used to, and I have to say JSON is very nice.

During my experiments, I wrote a little function for automatically downloading datasets from DataHub (https://datahub.io). A small example script follows:

<hansl>
set verbose off

function string datahub_get(string URL, string filename[null])
    # fetch the datapackage.json descriptor
    s = readfile(URL)
    printf "reading %s...\n", URL
    flush
    printf "Dataset name: %s\n", jsonget(s, "$.title")
    resources = jsonget(s, "$.resources..datahub.type")
    got = 0

    # look for the first resource of type "derived/csv"
    loop i = 1 .. nelem(resources) --quiet
        string r = strstrip(jsonget(s, sprintf("$.resources[%d].datahub.type", i-1)))
        if r == "derived/csv"
            hit = i-1
            got = 1
            break
        endif
    endloop

    if got
        if exists(filename)
            ret = filename
        else
            # no name supplied: use a random one in gretl's "dot" directory
            fname = sprintf("%04d.csv", randgen1(i,0,9999))
            if $windows
                ret = sprintf("@dotdir\\%s", fname)
            else
                ret = sprintf("@dotdir/%s", fname)
            endif
        endif
        printf "data URL found, reading...\n"
        flush
        # download the CSV data and write it to the chosen file
        path = jsonget(s, sprintf("$.resources[%d].path", hit))
        c = readfile(path)
        outfile "@ret" --quiet --write
        print c
        outfile --close
    else
        printf "No data found\n"
        ret = ""
    endif

    return ret
end function

URL = "https://pkgstore.datahub.io/core/pharmaceutical-drug-spending/19/datapackage.json"
fname = datahub_get(URL)
print fname
open "@fname"
</hansl>

Of course the function above can be enhanced in many ways, but the gist is: once you have the URL to the dataset, the function will download the data to a CSV file (you can supply a name if you want, otherwise a random one will be generated for you). At that point, you may just open the CSV file.

How do you get the URLs for the datasets you want? Easy:

* Navigate to the page of the dataset you want (for example, https://datahub.io/core/co2-ppm)
* scroll to the bottom and copy the URL corresponding to the "Datapackage.json" button.
  In most cases, it's just the page URL with "/datapackage.json" appended (for example, "https://datahub.io/core/co2-ppm/datapackage.json")
* run the function datahub_get() with the URL string as first parameter; you may supply an optional second parameter if you want to save the CSV file to a location of your choice.

Enjoy!

-------------------------------------------------------
  Riccardo (Jack) Lucchetti
  Dipartimento di Scienze Economiche e Sociali (DiSES)

  Università Politecnica delle Marche
  (formerly known as Università di Ancona)

  r.lucchetti(a)univpm.it
  http://www2.econ.univpm.it/servizi/hpp/lucchetti
-------------------------------------------------------
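
P.S. The three steps above, with the optional second parameter, could look roughly like the sketch below. It assumes that datahub_get() as defined in the script above has already been run, and that the co2-ppm datapackage actually lists a resource of type "derived/csv" (an assumption, not something verified here). The file name "co2-ppm.csv" is only a made-up example.

<hansl>
# assumes datahub_get() from the script above is already defined
URL = "https://datahub.io/core/co2-ppm/datapackage.json"

# hypothetical output name: any writable path should do
fname = datahub_get(URL, "co2-ppm.csv")

# datahub_get() returns an empty string if no CSV resource was found
if strlen(fname) > 0
    open "@fname"
endif
</hansl>

Checking the returned string before calling "open" avoids an error in the case where the function prints "No data found".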
