Dear all,

Allin and I are working on automated data download from dbnomics (if you 
don't know what I'm talking about: https://db.nomics.world). This led both 
of us to using JSON objects more systematically than we used to, and I have 
to say JSON is very nice.

During my experiments, I wrote a little function for automatically 
downloading datasets from DataHub (https://datahub.io). A small example 
script follows:

<hansl>
set verbose off

function string datahub_get(string URL, string filename[null])
     s = readfile(URL)
     printf "reading %s...\n", URL
     flush
     printf "Dataset name: %s\n", jsonget(s, "$.title")
     resources = jsonget(s, "$.resources..datahub.type")
     got = 0
     loop i = 1 .. nelem(resources) --quiet
         string r = strstrip(jsonget(s, sprintf("$.resources[%d].datahub.type", i-1)))
         if r == "derived/csv"
             hit = i-1
             got = 1
             break
         endif
     endloop

     if got
         if exists(filename)
             ret = filename
         else
             # draw a random 4-digit integer for the filename
             fname = sprintf("%04d.csv", randgen1(i, 0, 9999))
             if $windows
                 ret = sprintf("@dotdir\\%s", fname)
             else
                 ret = sprintf("@dotdir/%s", fname)
             endif
         endif

         printf "data URL found, reading...\n"
         flush
         path = jsonget(s, sprintf("$.resources[%d].path", hit))
         c = readfile(path)
         outfile "@ret" --quiet --write
         print c
         outfile --close
     else
         printf "No data found\n"
         ret = ""
     endif

     return ret
end function

URL = "https://pkgstore.datahub.io/core/pharmaceutical-drug-spending/19/datapackage.json"
fname = datahub_get(URL)
print fname

open "@fname"
</hansl>

Of course the function above can be enhanced in many ways, but the gist 
is: once you have the URL to the dataset, the function will download the 
data to a CSV file (you can supply a name if you want, otherwise a random 
one will be generated for you). At that point, you may just open the CSV 
file.
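For readers who want to see the selection logic outside hansl, here is a 
small Python sketch of the same idea: scan the "resources" array of a 
datapackage.json document and pick the first entry whose datahub.type is 
"derived/csv". The function name and the sample document are made up for 
illustration; this is not part of the hansl function above.

```python
import json

def find_csv_path(datapackage_json):
    """Return the 'path' of the first resource whose datahub.type is
    'derived/csv', mirroring the loop in datahub_get(); None if absent."""
    pkg = json.loads(datapackage_json)
    for res in pkg.get("resources", []):
        if res.get("datahub", {}).get("type") == "derived/csv":
            return res.get("path")
    return None

# a minimal stand-in for a real datapackage.json document
sample = json.dumps({
    "title": "Example dataset",
    "resources": [
        {"datahub": {"type": "original"}, "path": "data/raw.xlsx"},
        {"datahub": {"type": "derived/csv"}, "path": "data/clean.csv"},
    ],
})

print(find_csv_path(sample))  # -> data/clean.csv
```

The hansl version does the same thing via JSONPath queries instead of a 
native loop over parsed objects.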

How do you get the URLs for the datasets you want? Easy:

* Navigate to the page of the dataset you want (for example, 
https://datahub.io/core/co2-ppm)

* scroll to the bottom and copy the URL corresponding to the 
"Datapackage.json" button. In most cases, it's just the page URL with 
"/datapackage.json" appended at the end (for example, 
"https://datahub.io/core/co2-ppm/datapackage.json";)

* run the function datahub_get() with the URL string as first parameter; 
you may supply an optional second parameter if you want to save the CSV 
file to a location of your choice.
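Since the second bullet says the datapackage URL is usually just the page 
URL with a suffix, that construction is a one-liner. A tiny Python helper 
(hypothetical, for the common case only) makes the point:

```python
def datapackage_url(page_url):
    """Build the datapackage.json URL from a DataHub dataset page URL.
    Assumes the common case where it is the page URL plus a suffix."""
    return page_url.rstrip("/") + "/datapackage.json"

print(datapackage_url("https://datahub.io/core/co2-ppm"))
# -> https://datahub.io/core/co2-ppm/datapackage.json
```

When in doubt, copying the URL from the "Datapackage.json" button on the 
dataset page is the reliable route.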

Enjoy!

-------------------------------------------------------
   Riccardo (Jack) Lucchetti
   Dipartimento di Scienze Economiche e Sociali (DiSES)

   Università Politecnica delle Marche
   (formerly known as Università di Ancona)

   r.lucchetti(a)univpm.it
   http://www2.econ.univpm.it/servizi/hpp/lucchetti
-------------------------------------------------------
