Hi all,

I am looking for a mentor to work with me on adding MongoDB support to GDAL.
Below is the proposal I wrote; I hope you find it worth a try.

Thanks,
shuai


Title: OGR Driver for MongoDB

Short description:
MongoDB, a document database that provides high performance, high availability,
and easy scalability, can be a good platform for storing extremely large
spatial datasets and for supporting high-performance geo-computation and
real-time spatial analysis at large scale. This project aims to develop an OGR
driver for MongoDB so that applications built on GDAL, such as QGIS,
GeoServer, MapServer, and so on, can read and write the spatial data stored in
it, bringing this NoSQL database to the open source GIS ecosystem.

Describe your idea
1. Introduction
MongoDB, a document database that provides high performance, high
availability, and easy scalability, can be a good platform for storing
extremely large spatial datasets and for supporting high-performance
geo-computation and real-time spatial analysis at large scale. Yet the GIS
community has so far paid little attention to making the most of its
strengths. This project aims to develop an OGR driver for MongoDB so that
applications built on GDAL can read and write the spatial data stored in it,
bringing this NoSQL database to the open source GIS ecosystem.

2. Background
We are living in the era of big data, and today's tools and equipment for
capturing spatial data, at both the mega-scale and the milli-scale, produce
data in overwhelming volumes. The magnitude of this data is well beyond the
capability of any mainstream geographic information system. Yet the GIS
community has no off-the-shelf solution for managing such massive spatial
data. Relational spatial databases have been the standard for decades, but the
situation now seems to be changing.

A shift in computing patterns can be seen throughout the IT industry in recent
years, and GIS is no exception. In particular, data analytics may not be
achievable within a reasonable amount of time without resorting to
high-performance computing strategies. However, relational spatial databases
are often too slow to support these high-performance computing scenarios, and
often lack the flexible scalability needed to handle a growing workload.

Fortunately, several groups are trying to address the problem, and MongoDB is
an apparent leader in this direction. MongoDB uses a document-oriented model
and has native support for geospatial data. It ranks fifth in the DB-Engines
Ranking of database management systems by popularity, and it is the
highest-rated non-relational system there. Since version 2.4 (released on
March 19, 2013), MongoDB has supported a subset of GeoJSON geometries,
including basic shapes such as points, linestrings, and polygons. Quite a
number of partners working on big data, NoSQL, cloud, mobile, and
high-performance computing have joined the MongoDB ecosystem. Foursquare is
one featured example: it benefits from MongoDB’s support for geospatial
indexing, which allows it to query large location-based datasets easily.

3. The idea
MongoDB stores spatial data as GeoJSON, and GDAL already supports access to
features encoded in GeoJSON, so that existing code can be reused. This project
will implement a MongoDB driver that follows the OGR format driver interfaces,
with subclasses of OGRSFDriver, OGRDataSource and OGRLayer that are registered
with the OGRSFDriverRegistrar at runtime, so that GDAL can use MongoDB as a
datasource for accessing large-scale spatial data.

4. Project plan (detailed timeline: how do you plan to spend your summer?)
The first task is to design how spatial data is structured inside a MongoDB
database. The OGR data model has Datasource, Layer and Feature, so every
database in MongoDB is regarded as a Datasource, the Collections within it are
treated as Layers, and every Document is a Feature. PostGIS and other spatial
databases often use system tables to maintain metadata, but since MongoDB is
schema-free, metadata such as the spatial reference can be stored within the
Layer itself, in this case a Collection.
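
The mapping described above can be sketched as follows. This is only an
illustration in Python using plain dictionaries (the driver itself would be
written in C++ against a real MongoDB server), and the names "_ogr_metadata",
"srs", "geometry_type", "geometry" and "properties" are assumptions made for
this example, not a settled design.

```python
# Hypothetical layout: one collection per OGR layer, one document per feature,
# plus a single metadata document stored in the same collection.
METADATA_ID = "_ogr_metadata"  # assumed reserved _id, illustrative only


def make_layer_metadata(srs, geometry_type):
    """Build the metadata document kept inside a layer's collection."""
    return {"_id": METADATA_ID, "srs": srs, "geometry_type": geometry_type}


def make_feature_document(fid, geometry, properties):
    """Build one feature document: a GeoJSON geometry plus attributes."""
    return {"_id": fid, "geometry": geometry, "properties": dict(properties)}


def split_layer(documents):
    """Separate a collection's documents into (metadata, features)."""
    metadata = None
    features = []
    for doc in documents:
        if doc["_id"] == METADATA_ID:
            metadata = doc
        else:
            features.append(doc)
    return metadata, features


# A dict stands in for a database (Datasource): collection name -> documents.
datasource = {
    "cities": [
        make_layer_metadata("EPSG:4326", "Point"),
        make_feature_document(
            1,
            {"type": "Point", "coordinates": [118.78, 32.06]},
            {"name": "Nanjing"},
        ),
    ]
}

layer_metadata, layer_features = split_layer(datasource["cities"])
```

Keeping the metadata next to the features means a Layer can be opened by
reading a single Collection, at the cost of having to skip the metadata
document when iterating over features.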

The most important part of a format driver is defining how the data is read
and written, especially the Open and Create methods of the Datasource class.
As MongoDB organizes its spatial data according to the GeoJSON model, the
GeoJSON driver already present in GDAL can be reused to encode and decode the
GeoJSON fetched from a MongoDB database. In total there would be four files to
implement: ogr_mongo.h, ogrmongodriver.cpp, ogrmongodatasource.cpp, and
ogrmongolayer.cpp.
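
To make the GeoJSON reuse concrete, here is a rough sketch of the conversion
step, written in Python for brevity. It assumes a hypothetical document layout
with a "geometry" field holding a GeoJSON geometry and a "properties" field
holding the attributes; the real driver would hand the resulting GeoJSON text
to GDAL's existing GeoJSON reading and writing code.

```python
import json


def document_to_geojson(doc):
    """Turn a stored document into GeoJSON Feature text.

    Assumes the hypothetical layout {"_id", "geometry", "properties"}.
    """
    feature = {
        "type": "Feature",
        "id": str(doc["_id"]),  # MongoDB ids are not always JSON-serializable
        "geometry": doc.get("geometry"),
        "properties": doc.get("properties") or {},
    }
    return json.dumps(feature)


def geojson_to_document(text):
    """Turn GeoJSON Feature text back into a document ready for insertion."""
    feature = json.loads(text)
    if feature.get("type") != "Feature":
        raise ValueError("expected a GeoJSON Feature")
    return {
        "geometry": feature.get("geometry"),
        "properties": feature.get("properties") or {},
    }


stored = {
    "_id": 7,
    "geometry": {"type": "Point", "coordinates": [118.78, 32.06]},
    "properties": {"name": "Nanjing"},
}
round_tripped = geojson_to_document(document_to_geojson(stored))
```

The "_id" is left out when converting back, so that MongoDB can assign one on
insertion; whether OGR feature ids should be stored there is a design question
this sketch does not answer.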

Test Plan
[1] After the MongoDB driver is compiled into the OGR framework, the ogr2ogr
utility can be used as the test program to import and export spatial data
between shapefiles and MongoDB.
[2] Run a parallel conversion experiment to measure how fast the MongoDB
driver is compared with the file system and PostGIS.

Time Line

May 19 - June 8 (Coding - Phase 1 - 3 weeks)
Prepare the development environment and bring GDAL and the MongoDB C++ driver
together. Implement OGRMongoDriver, OGRMongoDataSource and OGRMongoLayer
according to the interfaces defined by OGRSFDriver, OGRDataSource and OGRLayer.
June 9 - June 23 (Coding - Phase 2 - 2 weeks)
Build the MongoDB driver into the OGR framework, and first support exchanging
small amounts of spatial data with MongoDB. Fix bugs along the way.
June 24 - July 13 (Coding - Phase 3 - 3 weeks)
Pass the query string (a JSON-style document) for both spatial and attribute
filters to MongoDB to select the requested features. Compile all the code,
conduct several tests, fix bugs and make it faster.
July 14 - July 27 (Testing - Phase 1 - 2 weeks)
Transfer large-scale spatial data to and from MongoDB using ogr2ogr to measure
the driver's efficiency. Improve its efficiency and fix bugs.
July 28 - August 10 (Testing - Phase 2 - 2 weeks)
Conduct a parallel conversion experiment to measure how fast the MongoDB
driver is compared with the file system and PostGIS, and fix bugs.
August 11 - August 18 (pencils down)
Write code documentation, including doxygen comments and techbase/userbase
articles.
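
As an illustration of the Phase 3 query document, the sketch below builds a
MongoDB filter from a bounding box and a set of attribute values. It is
written in Python for brevity. $geoIntersects and $geometry are MongoDB query
operators; the function names and the "geometry" / "properties" field names
are assumptions made for this example.

```python
def bbox_polygon(min_x, min_y, max_x, max_y):
    """Return a GeoJSON Polygon whose closed ring covers the bounding box."""
    ring = [
        [min_x, min_y],
        [max_x, min_y],
        [max_x, max_y],
        [min_x, max_y],
        [min_x, min_y],  # a GeoJSON ring must end where it starts
    ]
    return {"type": "Polygon", "coordinates": [ring]}


def build_query(bbox=None, attributes=None, geometry_field="geometry"):
    """Combine a spatial filter and attribute filters into one query document."""
    query = {}
    if bbox is not None:
        query[geometry_field] = {
            "$geoIntersects": {"$geometry": bbox_polygon(*bbox)}
        }
    for name, value in (attributes or {}).items():
        # Dot notation reaches into the nested "properties" sub-document.
        query["properties." + name] = value
    return query


example_query = build_query(
    bbox=(118.0, 31.0, 119.0, 33.0),
    attributes={"name": "Nanjing"},
)
```

An empty query document, which is what build_query() returns when no filter
is set, matches every document in a collection.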

5. Future ideas / How can your idea be expanded?
MongoDB is also an ideal platform for storing massive geo-raster data, so the
next step would be to write a MongoDB driver for raster datasets.

Explain how your SoC task would benefit the OSGeo member project and more 
generally the OSGeo Foundation as a whole:
MongoDB can serve as a distributed and parallel NoSQL spatial database with
high performance, high availability, and easy scalability, which makes it
well suited to large-scale data-intensive computing. With a MongoDB driver
implemented in the OGR framework, the whole OSGeo ecosystem based on GDAL/OGR
will benefit from it and be powered by MongoDB.

Please provide details of general computing experience: (operating systems you 
use on a day-to-day basis, languages you could write a program in, hardware, 
networking experience, etc.)
During college, I mainly used .NET languages such as C# and VB.NET to build
GIS software running on the Windows platform. Since then, and since starting
my PhD program, most of my work has been done in standard C++ on Linux.

Please provide details of previous GIS experience:
I have been a GIS student ever since I entered college. Right now I'm a Ph.D.
candidate in Cartography and Geographic Information System, School of
Geographic and Oceanographic Sciences, Nanjing University, China, and a
visiting scholar at Geography & GIScience and NCSA (The National Center for
Supercomputing Applications), UIUC, IL, USA.

Please provide details of any previous involvement with GIS programming and 
other software programming:
[1] Climate Information Management System of Shanxi Province: Outstanding Award
in ESRI Chinese College Student Software Development Contest, 2009.
[2] Forest Fire Simulation Model based on Geographic Cellular Automata: Third
Prize in ESRI Chinese College Student Software Development Contest, 2009.
[3] High Performance Geospatial Computing System: HiGIS (2011-2013). Supported
by the National High Technology Research and Development Program of China (863
project), in progress.
[4] NoSQL Expression of Massive Geospatial Information in the era of Big Data
(2013-2015). Supported by the Scientific Research Foundation of Graduate School
of Nanjing University, in progress.

Please tell us why you are interested in GIS and open source software:
They are powerful and beautiful treasures of humankind, and I want to be part
of them.

Please tell us why you are interested in working for OSGeo and the software 
project you have selected:
It’s part of my research: I have been trying to harness MongoDB to support
high-performance geo-computing.

Please tell us why you are interested in your specific coding project:
I have spent a lot of time over the past three years learning how GDAL works
and how to use it in high-performance computing applications. So I believe a
new GDAL with MongoDB support will do much good for my current research.

Would your application contribute to your ongoing studies/ degree? If so, how?
Yes. A MongoDB cluster is a good way to handle large quantities of spatial
data, and if OGR provides a MongoDB driver, many of the GDAL-based tools we
have developed can be reused and, powered by MongoDB, run much faster.

Please explain how you intend to continue being an active member of your 
project and/or OSGeo AFTER the summer is over:
I’ll try my best to keep following this thread to make the MongoDB driver
stable and efficient.

Do you understand this is a serious commitment, equivalent to a full-time paid 
summer internship or summer job?
Yes, I understand. I’ll give my best.

Do you have any known time conflicts during the official coding period? (May 19 
to August 19)
No, I don't.
_______________________________________________
gdal-dev mailing list
gdal-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/gdal-dev
