Jorrit pointed out this thread to us at iDigBio. Downloading and importing data 
into a relational database will work great, especially if, as Jan said, you can 
cut the data size down to a reasonable amount.


Another approach we've been working on, in a collaboration called GUODA [1], is 
to build an Apache Spark environment with pre-formatted data frames containing 
common data sets for researchers to use. This approach would offer a remote 
service where you could write arbitrary Spark code, probably in Jupyter 
notebooks, to iterate over the data. Spark does a lot of cool stuff, including 
GraphX, which might be of interest. This is definitely pre-alpha at this point, 
and if anyone is interested, I'd like to hear your thoughts. I'll also be at 
SPNHC talking about this.
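
For a rough idea of what that looks like from the researcher's side, here is a 
sketch (the data path, column names, and schema below are made up for 
illustration; they are not the actual GUODA setup):

    # Sketch only: assumes occurrence records are already available to the
    # Spark cluster as Parquet; the path and column names are invented.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("guoda-sketch").getOrCreate()
    occurrences = spark.read.parquet("/guoda/data/idigbio-occurrence.parquet")

    # Iterate over the data with ordinary DataFrame operations, e.g. count
    # records per genus within one family.
    (occurrences
        .filter(occurrences["family"] == "Formicidae")
        .groupBy("genus")
        .count()
        .show())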


One thing we've found in working on this is that importing data into a 
structured data format isn't always easy. If you only want a few columns, it'll 
be fine, but getting the data typing, format standardization, and column name 
syntax right across the whole width of an iDigBio record requires some code. I 
looked to see whether EcoData Retriever [2] had a GBIF data source; they have an 
eBird one that you might find useful as a starting point if you want to use 
someone else's code to download and import data.
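
To illustrate the kind of cleanup I mean, here is a toy sketch (the column 
names and rules are invented for illustration, not iDigBio's actual mapping):

    # Toy normalization pass: harmonize column names, coerce types, and
    # standardize formats before loading into anything structured.
    import pandas as pd

    RENAMES = {
        "dwc:scientificName": "scientific_name",
        "dwc:eventDate": "event_date",
        "dwc:decimalLatitude": "decimal_latitude",
        "dwc:decimalLongitude": "decimal_longitude",
    }

    df = pd.read_csv("occurrence.csv", dtype=str)   # read everything as text first
    df = df.rename(columns=RENAMES)[list(RENAMES.values())]

    df["event_date"] = pd.to_datetime(df["event_date"], errors="coerce")
    for col in ("decimal_latitude", "decimal_longitude"):
        df[col] = pd.to_numeric(df[col], errors="coerce")   # bad values become NaN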


For other data structures like BHL, we're somewhat making things up as we go, 
since we're packaging a relational structure and not something nearly as flat 
as the GBIF and DwC data.


[1] http://guoda.bio/

[2] http://www.ecodataretriever.org/


Matthew Collins
Technical Operations Manager
Advanced Computing and Information Systems Lab, ECE
University of Florida
352-392-5414
________________________________
From: jorrit poelen <jhpoe...@xs4all.nl>
Sent: Monday, May 30, 2016 11:16 AM
To: Collins, Matthew; Thompson, Alexander M; Hammock, Jennifer
Subject: Fwd: [API-users] Is there any NEO4J or graph-based driver for this API ?

Hey y'all:

Interesting request below on the GBIF mailing list - sounds like a perfect fit 
for the GUODA use cases.

Would it be too early to jump onto this thread and share our efforts/vision?

thx,
-jorrit

Begin forwarded message:

From: Jan Legind <jlegind at gbif.org>
Subject: Re: [API-users] Is there any NEO4J or graph-based driver for this API ?
Date: May 30, 2016 at 5:48:51 AM PDT
To: Mauro Cavalcanti <maurobio at gmail.com>, "Juan M. Escamilla Molgora" <j.escamillamolgora at lancaster.ac.uk>
Cc: api-users at lists.gbif.org

Dear Juan,

Unfortunately, we have no tool for creating this kind of SQL-like query against 
the portal. I am sure you are aware that the filters on the occurrence search 
pages can be applied in combination in numerous ways. The API can go even 
further in this regard [1], but it is not well suited for retrieving occurrence 
records, since there is a 200,000-record ceiling that makes it unfit for species 
exceeding this number.
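
For example, paging through the search API with the limit and offset 
parameters looks roughly like this (an untested sketch; the taxonKey is only 
an example, and the loop still stops at the ceiling mentioned above):

    # Page through occurrence search results; limit is capped at 300 per page.
    import requests

    url = "http://api.gbif.org/v1/occurrence/search"
    params = {"taxonKey": 212, "limit": 300, "offset": 0}   # 212 = Aves, as an example

    records = []
    while True:
        page = requests.get(url, params=params).json()
        records.extend(page["results"])
        if page["endOfRecords"] or not page["results"]:
            break
        params["offset"] += params["limit"]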

There are updates coming to the pygbif package [2] in the near future that will 
enable you to launch user downloads programmatically, where a whole list of 
different species can be used as a query parameter, as well as adding 
polygons [3].
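
Once those updates are released, launching a download programmatically might 
look roughly like this (a hypothetical sketch; the function name and argument 
style are assumptions, not a released interface):

    # Hypothetical: request a user download with a set of predicates.
    from pygbif import occurrences

    occurrences.download(
        ["taxonKey = 2435099", "hasCoordinate = TRUE"],   # example predicates, ANDed
        user="your_gbif_username",
        pwd="your_gbif_password",
        email="you@example.org",
    )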

In the meantime, Mauro's suggestion is excellent. If you can narrow your search 
down until it returns a manageable download (say, less than 100 million 
records), importing it into a database should be doable. From there, you can 
refine using SQL queries.

Best,
Jan K. Legind, GBIF Data manager

[1] http://www.gbif.org/developer/occurrence#search
[2] https://github.com/sckott/pygbif
[3] https://github.com/jlegind/GBIF-downloads

From: API-users [mailto:api-users-boun...@lists.gbif.org] On Behalf Of Mauro 
Cavalcanti
Sent: 30 May 2016 14:06
To: Juan M. Escamilla Molgora
Cc: api-users at lists.gbif.org
Subject: Re: [API-users] Is there any NEO4J or graph-based driver for this API ?

Hi,
One solution I have successfully adopted for this is to download the records 
(either "manually" via the browser or, better yet, with a Python script using 
the fine pygbif library), store them in a MySQL or SQLite database, and then 
perform the relational queries. I can provide examples if you are interested.
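
For instance, roughly (an untested sketch; the species, columns, and query are 
only illustrative):

    # Pull a page of records with pygbif, load them into SQLite, then query.
    import sqlite3
    from pygbif import occurrences

    page = occurrences.search(scientificName="Panthera onca", limit=300)

    con = sqlite3.connect("gbif.db")
    con.execute("""CREATE TABLE IF NOT EXISTS occ
                   (species TEXT, country TEXT, year INTEGER,
                    lat REAL, lon REAL)""")
    con.executemany(
        "INSERT INTO occ VALUES (?, ?, ?, ?, ?)",
        [(r.get("species"), r.get("country"), r.get("year"),
          r.get("decimalLatitude"), r.get("decimalLongitude"))
         for r in page["results"]])
    con.commit()

    # Relational refinement, e.g. records per country since 2000.
    for row in con.execute("SELECT country, COUNT(*) FROM occ "
                           "WHERE year >= 2000 GROUP BY country"):
        print(row)
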
Best regards,

2016-05-30 8:59 GMT-03:00 Juan M. Escamilla Molgora <j.escamillamolgora at lancaster.ac.uk>:
Hello,

Is there any API for making relational queries on things like taxonomy, 
location, or timestamp?

Thank you and best wishes

Juan
_______________________________________________
API-users mailing list
API-users at lists.gbif.org
http://lists.gbif.org/mailman/listinfo/api-users



--
Dr. Mauro J. Cavalcanti
E-mail: maurobio at gmail.com
Web: http://sites.google.com/site/maurobio
_______________________________________________
API-users mailing list
API-users at lists.gbif.org
http://lists.gbif.org/mailman/listinfo/api-users

