Hi Juan et al.,

Thanks a lot for triggering this discussion. I am currently working on a Web
Processing Service (http://birdhouse.readthedocs.io/en/latest/) that includes
a species distribution model based on GBIF data (and climate model data). A
good connection to the GBIF database is still missing, so all of these hints
were very useful!
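For anyone wiring up a similar service, this is roughly how the occurrence
fetching can look -- a minimal pygbif sketch, not the actual flyingpigeon
code; the species name and page size are illustrative:

    # Minimal sketch: page through georeferenced occurrences with pygbif.
    # The species name and the 300-record page size are illustrative.
    # Note the search API has a hard ceiling (see Jan's note further down
    # on the 200,000-record limit); large harvests should use downloads.
    from pygbif import occurrences, species

    # Resolve a name against the GBIF backbone to get a taxonKey.
    key = species.name_backbone(name="Quercus robur")["usageKey"]

    points, offset = [], 0
    while True:
        page = occurrences.search(taxonKey=key, hasCoordinate=True,
                                  limit=300, offset=offset)
        points.extend((rec["decimalLatitude"], rec["decimalLongitude"])
                      for rec in page["results"])
        if page["endOfRecords"]:
            break
        offset += len(page["results"])

    print(len(points), "georeferenced records")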
If you want to share code:
https://github.com/bird-house/flyingpigeon/blob/master/flyingpigeon/processes/wps_sdm.py

Merci,
Nils

On 31/05/2016 22:08, Juan M. Escamilla Molgora wrote:
>
> Hi Tim,
>
> Thank you! Especially for the DwC-A hint.
>
> The cells are in decimal degrees by default (WGS84), but the functions
> for generating them are general enough to use any projection supported
> by GDAL through PostGIS. It can be done on the fly or stored on the
> server side.
>
> I was thinking (daydreaming) of a standard way to encode unique but
> universal grids (similar to geohash or Open Location Code), but didn't
> find anything fast and ready. Maybe later :)
>
> I only use open source software: Python, Django, GDAL, NumPy, PostGIS,
> Conda, Py2neo and ete2, among others.
>
> Currently I don't have an official release. The project is quite
> immature and unstable, and the installation can be non-trivial. I'm
> fixing all these issues, but it will take some time; sorry for this.
>
> The GitHub repository is:
>
> https://github.com/molgor/biospytial.git
>
> And there's some very old documentation here:
>
> http://test.holobio.me/modules/gbif_taxonomy_class.html
>
> Please feel free to follow!
>
> Best wishes,
>
> Juan
>
> P.S. The functions for generating the grid are in:
> biospytial/SQL_functions
>
> On 31/05/16 19:47, Tim Robertson wrote:
>> Thanks Juan
>>
>> You're quite right - you need the DwC-A download format to get those
>> IDs.
>>
>> Are the cells decimal degrees, and then partitioned into smaller
>> units, or equal-area cells, or maybe UTM grids, or something else
>> perhaps? I am just curious.
>>
>> Are you developing this as OSS? I'd like to follow progress if
>> possible.
>>
>> Thanks,
>> Tim
>>
>> On 31 May 2016, at 20:31, Juan M. Escamilla Molgora
>> <j.escamillamolgora at lancaster.ac.uk> wrote:
>>
>>> Hi Tim,
>>>
>>> The grid is made by selecting a square area and dividing it into
>>> n x n subsquares, which form a partition of the bigger square.
>>>
>>> Each grid is a table in PostGIS, and there's a mapping between this
>>> table and a Django model (class).
>>>
>>> The class constructor has the attributes: id, cell and neighbours
>>> (next release).
>>>
>>> The cell is a polygon (square) and, through GeoDjango, inherits the
>>> properties of the osgeo module for polygons.
>>>
>>> I've tried to use the CSV data (downloaded as a CSV request), but I
>>> couldn't find a way to obtain the global IDs for each taxonomic
>>> level (idspecies, idgenus, idfamily, etc.).
>>>
>>> Do you know a way of obtaining these fields?
>>>
>>> Thank you for your email, and best wishes,
>>>
>>> Juan
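As an aside, the n x n partition Juan describes is simple to sketch. Below
is a pure-Python illustration of the idea -- not his actual PostGIS/SQL
functions (those live in biospytial/SQL_functions); the extent and n are
made-up values:

    # A pure-Python sketch of an n x n partition of a square extent.
    # In biospytial this is done with SQL functions in PostGIS; the
    # extent and n below are made-up values for illustration.
    def make_grid(xmin, ymin, xmax, ymax, n):
        """Return {cell_id: (x0, y0, x1, y1)} covering the extent."""
        dx = (xmax - xmin) / float(n)
        dy = (ymax - ymin) / float(n)
        return {i * n + j: (xmin + i * dx, ymin + j * dy,
                            xmin + (i + 1) * dx, ymin + (j + 1) * dy)
                for i in range(n) for j in range(n)}

    # e.g. a 10-degree square in decimal degrees (WGS84), split 4 x 4:
    cells = make_grid(-105.0, 15.0, -95.0, 25.0, 4)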
>>>
>>> On 31/05/16 19:03, Tim Robertson wrote:
>>>> Hi Juan
>>>>
>>>> That sounds like a fun project!
>>>>
>>>> Can you please describe your grid / cells?
>>>>
>>>> Most likely your best bet will be to use the download API (as CSV
>>>> data) and ingest that. The other APIs will likely hit limits (e.g.
>>>> you can't page through indefinitely).
>>>>
>>>> Thanks,
>>>> Tim
>>>>
>>>> On 31 May 2016, at 18:55, Juan M. Escamilla Molgora
>>>> <j.escamillamolgora at lancaster.ac.uk> wrote:
>>>>
>>>>> Dear all,
>>>>>
>>>>> Thank you very much for your valuable feedback!
>>>>>
>>>>> I'll explain a bit of what I'm doing, just to clarify; sorry if
>>>>> this is spam to some.
>>>>>
>>>>> I want to build a model for species assemblages based on
>>>>> co-occurrence of taxa within an arbitrary area. I'm building a 2D
>>>>> lattice in which, for each cell, I collapse the occurrence data
>>>>> into a taxonomic tree. To do this I first need to obtain the data
>>>>> from the GBIF API and then, based on the IDs (or names) of each
>>>>> taxonomic level (from kingdom down to occurrence), build a tree
>>>>> coupled to each cell.
>>>>>
>>>>> The implementation uses PostgreSQL (PostGIS) for storing the raw
>>>>> GBIF data and Neo4j for storing the relation
>>>>>
>>>>> "is a member of the [species, genus, family, ...] [name/id]".
>>>>>
>>>>> The idea is to include data from different sources, similar to the
>>>>> project Matthew and Jennifer mentioned (which I'm very interested
>>>>> in and would like to hear more about), and to traverse the network
>>>>> looking for significant merged information.
>>>>>
>>>>> One of the immediate problems I've found is importing big chunks
>>>>> of the GBIF data into my specification. Thanks to this thread I've
>>>>> found the tools most used by the community (pygbif, rgbif and
>>>>> python-dwca-reader); I was using urllib2 and things like that.
>>>>>
>>>>> I'll be happy to share any code or ideas with anyone interested.
>>>>>
>>>>> By the way, I've checked the TinkerPop project, which uses the
>>>>> Gremlin traversal language independently of the DBMS. Perhaps it's
>>>>> possible to use it with Spark and GUODA as well?
>>>>>
>>>>> Is GUODA working now?
>>>>>
>>>>> Best wishes,
>>>>>
>>>>> Juan
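For the Neo4j side of what Juan describes, a minimal py2neo sketch of the
"is a member of" relation -- not biospytial's actual code; the Taxon label,
the MEMBER_OF relationship type, the property names and the lineage keys
are all illustrative assumptions:

    # Minimal sketch (py2neo v3) of storing a taxonomic lineage as
    # "is a member of" relationships in Neo4j. Labels, property names,
    # the MEMBER_OF type and the GBIF keys are illustrative assumptions.
    from py2neo import Graph, Node, Relationship

    graph = Graph()  # assumes a local Neo4j on the default port

    # One occurrence, resolved to its lineage (keys are placeholders):
    lineage = [("kingdom", "Plantae", 6),
               ("family", "Fagaceae", 4689),
               ("genus", "Quercus", 2877951),
               ("species", "Quercus robur", 2878688)]

    parent = None
    for rank, name, gbif_key in lineage:
        node = Node("Taxon", rank=rank, name=name, gbif_key=gbif_key)
        graph.merge(node, "Taxon", "gbif_key")  # create once, reuse after
        if parent is not None:
            graph.merge(Relationship(node, "MEMBER_OF", parent),
                        "Taxon", "gbif_key")
        parent = node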
>>>>>
>>>>> On 31/05/16 17:02, Collins, Matthew wrote:
>>>>>>
>>>>>> Jorrit pointed out this thread to us at iDigBio. Downloading and
>>>>>> importing data into a relational database will work great,
>>>>>> especially if, as Jan said, you can cut the data size down to a
>>>>>> reasonable amount.
>>>>>>
>>>>>> Another approach we've been working on, in a collaboration called
>>>>>> GUODA [1], is to build an Apache Spark environment with
>>>>>> pre-formatted data frames of common data sets for researchers to
>>>>>> use. This approach would offer a remote service where you could
>>>>>> write arbitrary Spark code, probably in Jupyter notebooks, to
>>>>>> iterate over data. Spark does a lot of cool stuff, including
>>>>>> GraphX, which might be of interest. This is definitely pre-alpha
>>>>>> at this point, and if anyone is interested, I'd like to hear your
>>>>>> thoughts. I'll also be at SPNHC talking about this.
>>>>>>
>>>>>> One thing we've found in working on this is that importing data
>>>>>> into a structured data format isn't always easy. If you only want
>>>>>> a few columns, it'll be fine. But getting the data typing, format
>>>>>> standardization and column-name syntax of the whole width of an
>>>>>> iDigBio record right requires some code. I looked to see whether
>>>>>> EcoData Retriever [2] had a GBIF data source; they have an eBird
>>>>>> one that you might find useful as a starting point if you want to
>>>>>> use someone else's code to download and import data.
>>>>>>
>>>>>> For other data structures like BHL, we're somewhat making things
>>>>>> up, since we're packaging a relational structure and not something
>>>>>> nearly as flat as the GBIF and DwC material.
>>>>>>
>>>>>> [1] http://guoda.bio/
>>>>>> [2] http://www.ecodataretriever.org/
>>>>>>
>>>>>> Matthew Collins
>>>>>> Technical Operations Manager
>>>>>> Advanced Computing and Information Systems Lab, ECE
>>>>>> University of Florida
>>>>>> 352-392-5414
>>>>>> ------------------------------------------------------------------------
>>>>>> *From:* jorrit poelen <jhpoelen at xs4all.nl>
>>>>>> *Sent:* Monday, May 30, 2016 11:16 AM
>>>>>> *To:* Collins, Matthew; Thompson, Alexander M; Hammock, Jennifer
>>>>>> *Subject:* Fwd: [API-users] Is there any NEO4J or graph-based
>>>>>> driver for this API ?
>>>>>>
>>>>>> Hey y'all:
>>>>>>
>>>>>> Interesting request below on the GBIF mailing list - sounds like
>>>>>> a perfect fit for the GUODA use cases.
>>>>>>
>>>>>> Would it be too early to jump onto this thread and share our
>>>>>> efforts/vision?
>>>>>>
>>>>>> thx,
>>>>>> -jorrit
>>>>>>
>>>>>>> Begin forwarded message:
>>>>>>>
>>>>>>> *From:* Jan Legind <jlegind at gbif.org>
>>>>>>> *Subject:* Re: [API-users] Is there any NEO4J or graph-based
>>>>>>> driver for this API ?
>>>>>>> *Date:* May 30, 2016 at 5:48:51 AM PDT
>>>>>>> *To:* Mauro Cavalcanti <maurobio at gmail.com>, "Juan M. Escamilla
>>>>>>> Molgora" <j.escamillamolgora at lancaster.ac.uk>
>>>>>>> *Cc:* api-users at lists.gbif.org
>>>>>>>
>>>>>>> Dear Juan,
>>>>>>>
>>>>>>> Unfortunately we have no tool for making this kind of SQL-like
>>>>>>> query against the portal. I am sure you are aware that the
>>>>>>> filters in the occurrence search pages can be applied in
>>>>>>> combination in numerous ways. The API can go even further in this
>>>>>>> regard [1], but it is not well suited for retrieving occurrence
>>>>>>> records, since there is a 200,000-record ceiling, making it unfit
>>>>>>> for species exceeding this number.
>>>>>>>
>>>>>>> There are updates to the pygbif package [2] coming in the near
>>>>>>> future that will enable you to launch user downloads
>>>>>>> programmatically, where a whole list of different species can be
>>>>>>> used as a query parameter, as well as adding polygons [3].
>>>>>>>
>>>>>>> In the meantime, Mauro's suggestion is excellent. If you can
>>>>>>> narrow your search down until it returns a manageable download
>>>>>>> (say, less than 100 million records), importing it into a
>>>>>>> database should be doable. From there, you can refine using SQL
>>>>>>> queries.
>>>>>>>
>>>>>>> Best,
>>>>>>> Jan K. Legind, GBIF Data Manager
>>>>>>>
>>>>>>> [1] http://www.gbif.org/developer/occurrence#search
>>>>>>> [2] https://github.com/sckott/pygbif
>>>>>>> [3] https://github.com/jlegind/GBIF-downloads
>>>>>>>
>>>>>>> *From:* API-users [mailto:api-users-bounces at lists.gbif.org]
>>>>>>> *On Behalf Of* Mauro Cavalcanti
>>>>>>> *Sent:* 30 May 2016 14:06
>>>>>>> *To:* Juan M. Escamilla Molgora
>>>>>>> *Cc:* api-users at lists.gbif.org
>>>>>>> *Subject:* Re: [API-users] Is there any NEO4J or graph-based
>>>>>>> driver for this API ?
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> One solution I have successfully adopted for this is to download
>>>>>>> the records (either "manually" via the browser or, better yet,
>>>>>>> with a Python script using the fine pygbif library), store them
>>>>>>> in a MySQL or SQLite database, and then perform the relational
>>>>>>> queries. I can provide examples if you are interested.
>>>>>>>
>>>>>>> Best regards,
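Roughly what Mauro's approach could look like with only the standard
library -- a sketch under the assumption of a tab-delimited GBIF CSV
download; the file name, the column subset and the example query are
illustrative:

    # Sketch: load a (tab-delimited) GBIF CSV download into SQLite and
    # query it relationally. File name and columns are illustrative.
    import csv
    import sqlite3

    conn = sqlite3.connect("gbif.db")
    conn.execute("""CREATE TABLE IF NOT EXISTS occurrence
                    (taxonkey INTEGER, species TEXT,
                     decimallatitude REAL, decimallongitude REAL,
                     eventdate TEXT)""")

    with open("occurrence.txt") as f:  # the file inside the download zip
        reader = csv.DictReader(f, delimiter="\t")
        rows = ((r["taxonKey"], r["species"], r["decimalLatitude"],
                 r["decimalLongitude"], r["eventDate"]) for r in reader)
        conn.executemany("INSERT INTO occurrence VALUES (?, ?, ?, ?, ?)",
                         rows)
    conn.commit()

    # e.g. a relational query mixing taxonomy, location and time:
    query = """SELECT species, COUNT(*) AS n FROM occurrence
               WHERE decimallatitude BETWEEN 14 AND 33
                 AND eventdate >= '2010-01-01'
               GROUP BY species ORDER BY n DESC LIMIT 10"""
    for row in conn.execute(query):
        print(row)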
>>>>>>> 2016-05-30 8:59 GMT-03:00 Juan M. Escamilla Molgora
>>>>>>> <j.escamillamolgora at lancaster.ac.uk>:
>>>>>>>
>>>>>>> Hola,
>>>>>>>
>>>>>>> Is there any API for making relational queries, like taxonomy,
>>>>>>> location or timestamp?
>>>>>>>
>>>>>>> Thank you and best wishes,
>>>>>>>
>>>>>>> Juan
>>>>>>>
>>>>>>> --
>>>>>>> Dr. Mauro J. Cavalcanti
>>>>>>> E-mail: maurobio at gmail.com
>>>>>>> Web: http://sites.google.com/site/maurobio

_______________________________________________
API-users mailing list
API-users at lists.gbif.org
http://lists.gbif.org/mailman/listinfo/api-users