Hi all,

In general, a database may contain geographical location data. For instance,
telecom operators need to perform analytics based on a particular region,
on cell tower IDs (within a region), and/or on geographical locations for a
particular period of time. At present, Carbon does not have native support
for storing geographical locations/coordinates or for running filter
queries on them. The longitude and latitude of a coordinate can, however,
be treated as independent columns, sorted hierarchically, and stored.

But when longitude and latitude are treated independently, the 2D space is
linearized, i.e., points in the two-dimensional domain are ordered by
sorting first on longitude and then on latitude. Thus, data is not ordered
by geospatial proximity. Hence range queries require a lot of IO
operations and query performance is degraded.
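
To make the linearization problem concrete, here is a small sketch on a toy
4x4 integer grid (the grid and the helper name lexicographic_rank are
purely illustrative and are not Carbon code). It sorts points on longitude
first and then on latitude, and measures how far apart two neighbouring
cells end up in the sorted order.

```python
def lexicographic_rank(points):
    """Return each point's position after sorting on longitude, then latitude."""
    ordered = sorted(points)  # tuples compare on the first field, then the second
    return {point: index for index, point in enumerate(ordered)}


# Toy grid: (longitude, latitude) cells with integer coordinates 0..3.
grid = [(lon, lat) for lon in range(4) for lat in range(4)]
rank = lexicographic_rank(grid)

# Neighbours along latitude stay adjacent in the sorted order...
gap_same_longitude = rank[(0, 1)] - rank[(0, 0)]
# ...but neighbours along longitude are separated by a whole column of cells.
gap_next_longitude = rank[(1, 0)] - rank[(0, 0)]
```

On this grid the gap along longitude equals the number of latitude values,
so on real data two nearby points can land many blocks apart.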

To alleviate this, we can use a z-order curve to store geospatial data
points. This keeps geographically nearby points mostly in the same
block/blocklet, which reduces the IO operations for range queries and
improves query performance. It also makes it possible to support polygon
queries on geodata.
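
As a rough illustration of the z-order idea, the sketch below quantizes
longitude and latitude onto an integer grid and interleaves their bits into
a single sortable key. This is only a minimal sketch and not the approach
from the design document: the function names (quantize, interleave_bits,
z_order_key), the 16 bits per axis, and the linear quantization over the
full longitude/latitude range are all assumptions made for the example.

```python
def quantize(value: float, lo: float, hi: float, bits: int = 16) -> int:
    """Map a value in [lo, hi] to an integer grid cell in [0, 2**bits - 1]."""
    cells = (1 << bits) - 1
    return int((value - lo) / (hi - lo) * cells)


def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x and y into one z-order (Morton) code."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)       # x bits go to even positions
        code |= ((y >> i) & 1) << (2 * i + 1)   # y bits go to odd positions
    return code


def z_order_key(longitude: float, latitude: float, bits: int = 16) -> int:
    """Hypothetical sort key: quantize both axes, then interleave the bits."""
    x = quantize(longitude, -180.0, 180.0, bits)
    y = quantize(latitude, -90.0, 90.0, bits)
    return interleave_bits(x, y, bits)
```

Sorting rows on such a key places the four cells of every aligned 2x2
square next to each other, then the four squares of every aligned 4x4
square, and so on, which is why nearby points tend to share a
block/blocklet.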

I have raised a JIRA, https://issues.apache.org/jira/browse/CARBONDATA-3548,
and attached a design document to it. Please have a look; your opinions
and suggestions are welcome.

Thanks,
