Re: A GIS contains() for Hive?

2012-03-16 Thread Mattmann, Chris A (388J)
Hi Tim,

Over in the SIS community [1], writing a driver for Hive or HBase to
provide spatial support a la PostGIS is something we've wanted to get
around to, but haven't yet. The goal of SIS is to be an ALv2-licensed
spatial toolkit, with no surprises [2]. If you are interested in
contributing to the SIS community and helping out, I'd certainly
appreciate it, as I would help from anyone in the Hive community who has
time to work with us on the Hive driver for SIS. We currently support
point/radius and bbox QuadTree-based searches, and loading GeoRSS data
into the QuadTree index.
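
For readers unfamiliar with the kind of search described above, here is
a minimal sketch of a point quadtree with bbox and point/radius queries.
It is not the SIS API: the class name, method names, capacity, and depth
limit are all hypothetical, and the radius test assumes plain planar
(Euclidean) distance rather than distance on the globe.

```python
from dataclasses import dataclass, field
from math import hypot
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y); for geographic data, (lng, lat)
MAX_DEPTH = 16  # stop splitting here so duplicate points cannot recurse forever


@dataclass
class QuadTree:
    """Hypothetical point quadtree; not the SIS implementation."""
    min_x: float
    min_y: float
    max_x: float
    max_y: float
    capacity: int = 4
    depth: int = 0
    points: List[Point] = field(default_factory=list)
    children: List["QuadTree"] = field(default_factory=list)

    def covers(self, p: Point) -> bool:
        return self.min_x <= p[0] <= self.max_x and self.min_y <= p[1] <= self.max_y

    def insert(self, p: Point) -> bool:
        if not self.covers(p):
            return False
        if not self.children:
            if len(self.points) < self.capacity or self.depth >= MAX_DEPTH:
                self.points.append(p)
                return True
            self._split()
        # A point on a shared edge is stored in the first child that accepts it.
        return any(child.insert(p) for child in self.children)

    def _split(self) -> None:
        mid_x = (self.min_x + self.max_x) / 2
        mid_y = (self.min_y + self.max_y) / 2
        d, c = self.depth + 1, self.capacity
        self.children = [
            QuadTree(self.min_x, self.min_y, mid_x, mid_y, c, d),
            QuadTree(mid_x, self.min_y, self.max_x, mid_y, c, d),
            QuadTree(self.min_x, mid_y, mid_x, self.max_y, c, d),
            QuadTree(mid_x, mid_y, self.max_x, self.max_y, c, d),
        ]
        # Push the points held by this node down into the new children.
        old, self.points = self.points, []
        for q in old:
            self.insert(q)

    def query_bbox(self, min_x: float, min_y: float,
                   max_x: float, max_y: float) -> List[Point]:
        # Prune any node whose bounds do not intersect the query box.
        if (self.max_x < min_x or self.min_x > max_x
                or self.max_y < min_y or self.min_y > max_y):
            return []
        hits = [p for p in self.points
                if min_x <= p[0] <= max_x and min_y <= p[1] <= max_y]
        for child in self.children:
            hits.extend(child.query_bbox(min_x, min_y, max_x, max_y))
        return hits

    def query_radius(self, cx: float, cy: float, r: float) -> List[Point]:
        # Use the circle's bounding box as a coarse filter, then test distance.
        candidates = self.query_bbox(cx - r, cy - r, cx + r, cy + r)
        return [p for p in candidates if hypot(p[0] - cx, p[1] - cy) <= r]
```

The radius query reuses the bbox query as a coarse filter, so only
candidates inside the circle's bounding box get an exact distance check.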

Cheers,
Chris

[1] http://incubator.apache.org/sis/
[2] http://wiki.apache.org/incubator/SpatialProposal/

On Mar 16, 2012, at 2:21 AM, Tim Robertson wrote:

> Hi all,
> 
> I need to perform a lot of "point in polygon" checks and want to use Hive 
> (currently I mix Hive, Sqoop and PostGIS in an Oozie workflow to do this).
> 
> In an ideal world, I would like to create a Hive table from a Shapefile 
> containing polygons, and then do the likes of the following:
> 
>   SELECT p.id, pp.id FROM points p, polygons pp
>   WHERE pp.contains(geom, toPoint(p.lat,p.lng))
> 
> Has anyone done anything along these lines?
> 
> Alternatively I am capable of doing a UDF that would read the shape file into 
> memory and basically do a map side join using something like a slab 
> decomposition technique.  It is more limited but would meet my needs allowing 
> e.g.:
> 
>   SELECT contains(p.lat,p.lng, '/data/shapefiles/countries.shp') FROM points p;
> 
> Before I start I thought I'd ask folks as I suspect people are doing this 
> kind of thing on Hive by now (thinking FB and user profiling by political 
> boundaries etc)
> 
> I'd love to hear from anyone who's investigated this or could provide any 
> advice.
> 
> Thanks!
> Tim
> 


++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattm...@nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++



A GIS contains() for Hive?

2012-03-16 Thread Tim Robertson
Hi all,

I need to perform a lot of "point in polygon" checks and want to use Hive
(currently I mix Hive, Sqoop and PostGIS in an Oozie workflow to do this).

In an ideal world, I would like to create a Hive table from a Shapefile
containing polygons, and then do the likes of the following:

  SELECT p.id, pp.id FROM points p, polygons pp
  WHERE pp.contains(geom, toPoint(p.lat,p.lng))

Has anyone done anything along these lines?

Alternatively I am capable of doing a UDF that would read the shape file
into memory and basically do a map side join using something like a slab
decomposition technique.  It is more limited but would meet my needs
allowing e.g.:

  SELECT contains(p.lat,p.lng, '/data/shapefiles/countries.shp')
  FROM points p;
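
As a rough illustration of the in-memory check such a function would
perform, here is a minimal sketch. It uses the simpler ray-casting
(even-odd) test with a bounding-box pre-filter, not slab decomposition.
The polygons are assumed to be already loaded as plain vertex lists;
reading the Shapefile is left out, and all function names are
hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Ring = List[Tuple[float, float]]  # polygon outline as (lng, lat) vertices
BBox = Tuple[float, float, float, float]  # min_lng, min_lat, max_lng, max_lat


def point_in_ring(lng: float, lat: float, ring: Ring) -> bool:
    """Even-odd ray casting: count edge crossings of a ray going east."""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        # Only edges that straddle the point's latitude can be crossed.
        if (yi > lat) != (yj > lat):
            x_cross = xi + (lat - yi) * (xj - xi) / (yj - yi)
            if lng < x_cross:
                inside = not inside
        j = i
    return inside


def build_index(polygons: Dict[str, Ring]) -> List[Tuple[str, BBox, Ring]]:
    """Precompute one bounding box per polygon (done once per task)."""
    index = []
    for name, ring in polygons.items():
        xs = [x for x, _ in ring]
        ys = [y for _, y in ring]
        index.append((name, (min(xs), min(ys), max(xs), max(ys)), ring))
    return index


def contains(lat: float, lng: float,
             index: List[Tuple[str, BBox, Ring]]) -> Optional[str]:
    """Return the name of the first polygon containing the point, else None."""
    for name, (min_x, min_y, max_x, max_y), ring in index:
        # Cheap bounding-box rejection before the exact test.
        if min_x <= lng <= max_x and min_y <= lat <= max_y:
            if point_in_ring(lng, lat, ring):
                return name
    return None
```

The argument order of contains() mirrors the query above (lat, then
lng). Holes, multipolygons, and points exactly on an edge are not
handled. The linear scan over polygons is only reasonable for a small
set; a spatial index would be needed beyond that.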

Before I start I thought I'd ask folks as I suspect people are doing this
kind of thing on Hive by now (thinking FB and user profiling by political
boundaries etc)

I'd love to hear from anyone who's investigated this or could provide any
advice.

Thanks!
Tim