[GitHub] drill issue #258: DRILL-4091: Support for additional gis operations in gis c...
Github user k255 commented on the issue: https://github.com/apache/drill/pull/258

It's good that there is now a committer who is aware of the GIS context!

The functions added in this PR are: ST_Buffer, ST_Contains, ST_Crosses, ST_Difference, ST_Disjoint, ST_Distance, ST_Envelope, ST_Equals, ST_Intersects, ST_Overlaps, ST_Relate, ST_Touches, ST_Transform, ST_Union, ST_UnionAggregate, ST_X, ST_Y, ST_XMin, ST_XMax, ST_YMin, ST_YMax

Regarding the documentation, I would rather not duplicate it, because I followed what is available in PostGIS (which uses the GEOS library, much as drill-gis uses the relevant Java libraries: esri, proj4j), and these functions are defined in the Open Geospatial Consortium (OGC) specs. Of course we have only a subset of what PostGIS is capable of here, but I think it's a valuable subset. For example, the docs for the ST_X function are at http://www.postgis.net/docs/ST_X.html

For example usage, please refer to the examples in the readme at: https://github.com/k255/drill-gis

I'll also eventually need to think about a blog post or presentation on this extension, though most probably not in the coming days. ---
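As a rough illustration of what the ST_XMin, ST_XMax, ST_YMin and ST_YMax family reports, the following sketch computes the bounding-box extents of a set of coordinates in plain Java. It does not use the drill-gis or ESRI code; the class and method names are hypothetical and exist only for this example.

```java
// Hypothetical illustration only; not part of drill-gis or the ESRI geometry library.
final class EnvelopeSketch {

    /** Returns {xMin, yMin, xMax, yMax} for an array of {x, y} coordinate pairs. */
    static double[] envelope(double[][] points) {
        if (points.length == 0) {
            throw new IllegalArgumentException("empty geometry has no envelope");
        }
        double xMin = Double.POSITIVE_INFINITY;
        double yMin = Double.POSITIVE_INFINITY;
        double xMax = Double.NEGATIVE_INFINITY;
        double yMax = Double.NEGATIVE_INFINITY;
        for (double[] p : points) {
            // Widen the box so that it covers the current vertex.
            xMin = Math.min(xMin, p[0]);
            yMin = Math.min(yMin, p[1]);
            xMax = Math.max(xMax, p[0]);
            yMax = Math.max(yMax, p[1]);
        }
        return new double[] {xMin, yMin, xMax, yMax};
    }
}
```

For the vertices (1, 5), (-2, 3) and (4, -1), the sketch returns the extents -2, -1, 4 and 5, which are the values the four min/max functions would be expected to report for a geometry with those vertices.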
[GitHub] drill issue #258: DRILL-4091: Support for additional gis operations in gis c...
Github user k255 commented on the issue: https://github.com/apache/drill/pull/258

@amansinha100 better late than never! The PR is updated now. @cgivre offered to help review this.

@joeauty in further development we can probably consider adding GeoJSON support. I'm happy that you like it! ---
[GitHub] drill pull request: DRILL-4091: Support for additional gis operati...
Github user k255 commented on the pull request: https://github.com/apache/drill/pull/258#issuecomment-182857803

The aggregate version of st_union allows merging geometries using 'group by'.

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] drill pull request: DRILL-4303: ESRI Shapefile (shp) format plugin
GitHub user k255 opened a pull request: https://github.com/apache/drill/pull/335

DRILL-4303: ESRI Shapefile (shp) format plugin

Shp format plugin. The main idea is to read shapefiles so that they can be joined with other sources or converted to, e.g., a Parquet file, which can store geometry data in binary format (WKB) on HDFS. The implementation is based on the ESRI Java library, which can parse a single geometry definition; custom code (ShapefileByteBufferCursor) reads the whole file. The plugin also reads the accompanying data file (dbf) and the SRID information.

Sample usage:

- reading shp

```
select *, ST_AsText(geom) from cp.`sample-data/CA-cities.shp`;
```

- conversion to parquet

```
alter session set `store.format`='parquet';
create table dfs.tmp.`/CA-cities-par` as select * from cp.`sample-data/CA-cities.shp`;
```

There is also a sample parquet file in cp.`sample-data/CA-cities.parquet`

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/k255/drill drill-gis-shp

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/drill/pull/335.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #335

commit ecaa6ff5303cd179cc0c0f96518b1ee69ff40955
Author: potocki
Date: 2016-01-22T11:21:04Z
ESRI Shapefile (shp) reader implemented as drill format plugin

commit 91ccd1ccf0d06802dcf0da2ee1ef83c903c248af
Author: potocki
Date: 2016-01-22T12:19:00Z
added sample file in parquet format ---
[GitHub] drill pull request: DRILL-4091: Support for additional gis operati...
Github user k255 commented on the pull request: https://github.com/apache/drill/pull/258#issuecomment-172858385

Added new functionality, based on Proj4J, to transform the spatial reference (SRID) of geometries.

Usage: ST_Transform(geom, srcSRID, tgtSRID)

This lets you transform the SRID in Drill without using external tools! ---
[GitHub] drill pull request: DRILL-4091: Support for additional gis operati...
Github user k255 commented on the pull request: https://github.com/apache/drill/pull/258#issuecomment-157026760 this extends DRILL-3914 functionality ---
[GitHub] drill pull request: DRILL-4091: Support for additional gis operati...
GitHub user k255 opened a pull request: https://github.com/apache/drill/pull/258

DRILL-4091: Support for additional gis operations in gis contrib module

Support for commonly used GIS functions in the gis contrib module: relate, contains, crosses, intersects, touches, difference, disjoint, equals, overlaps, buffer, union, get the x coordinate of a point, get the y coordinate of a point.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/k255/drill drill-gis-ext

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/drill/pull/258.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #258

commit 6081158304ab646b1c022f4ec047df0f6cdc5d1c
Author: potocki
Date: 2015-11-16T13:05:18Z
Support for additional gis operations (relate, contains, touches, union, get x y of a point and more) ---
[GitHub] drill pull request: DRILL-3747: basic similarity search with simme...
GitHub user k255 opened a pull request: https://github.com/apache/drill/pull/224

DRILL-3747: basic similarity search with simmetric

Helps handle, e.g., typos in search queries using popular algorithms such as Levenshtein.

Sample query:

```
select levenshtein('foo', 'boo') from (VALUES(1)); -- gives 0.67
```

and

```
select levenshtein('foo', 'bar') from (VALUES(1)); -- not similar, gives 0
```

More:
https://github.com/k255/drill-fuzzy-search
https://en.wikipedia.org/wiki/Levenshtein_distance

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/k255/drill drill-fuzzysearch

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/drill/pull/224.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #224

commit 51248358adf7ee71a744cccb7a22b45850f192a8
Author: potocki
Date: 2015-10-30T18:54:41Z
basic similarity search with simmetric ---
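The two sample results (0.67 for 'foo'/'boo', 0 for 'foo'/'bar') are consistent with a Levenshtein distance normalized by the longer string's length, i.e. 1 - distance / max(len). The sketch below shows that calculation in plain Java. It is an assumption about how the score is derived, not the code from the PR or from the underlying library, and the class and method names are hypothetical.

```java
// Hypothetical illustration only; not the PR's implementation.
final class LevenshteinSketch {

    /** Classic dynamic-programming edit distance: insert, delete and substitute each cost 1. */
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        // Distance from the empty prefix of a to each prefix of b.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertOrDelete = Math.min(curr[j - 1] + 1, prev[j] + 1);
                curr[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
            }
            // Reuse the two rows instead of allocating a full matrix.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Assumed normalization: 1.0 means identical, 0.0 means nothing in common. */
    static double similarity(String a, String b) {
        int maxLen = Math.max(a.length(), b.length());
        if (maxLen == 0) {
            return 1.0; // two empty strings are treated as identical
        }
        return 1.0 - (double) distance(a, b) / maxLen;
    }
}
```

Under this assumption, 'foo' and 'boo' differ by one substitution out of three characters, giving 1 - 1/3, which rounds to 0.67; 'foo' and 'bar' need three substitutions, giving 0.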
[GitHub] drill pull request: [DRILL-3914]: support for geospatial query fun...
Github user k255 commented on a diff in the pull request: https://github.com/apache/drill/pull/191#discussion_r42217546

--- Diff: contrib/gis/src/main/java/org/apache/drill/exec/expr/fn/impl/gis/STAsText.java ---
@@ -0,0 +1,58 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.expr.fn.impl.gis;
+
+import javax.inject.Inject;
+
+import org.apache.drill.exec.expr.DrillSimpleFunc;
+import org.apache.drill.exec.expr.annotations.FunctionTemplate;
+import org.apache.drill.exec.expr.annotations.Output;
+import org.apache.drill.exec.expr.annotations.Param;
+import org.apache.drill.exec.expr.holders.VarBinaryHolder;
+import org.apache.drill.exec.expr.holders.VarCharHolder;
+
+import io.netty.buffer.DrillBuf;
+
+@FunctionTemplate(name = "st_astext", scope = FunctionTemplate.FunctionScope.SIMPLE,
+  nulls = FunctionTemplate.NullHandling.NULL_IF_NULL)
+public class STAsText implements DrillSimpleFunc {
+  @Param
+  VarBinaryHolder geom1Param;
+
+  @Output
+  VarCharHolder out;
+
+  @Inject
+  DrillBuf buffer;
+
+  public void setup() {
+  }
+
+  public void eval() {
+    com.esri.core.geometry.ogc.OGCGeometry geom1 = com.esri.core.geometry.ogc.OGCGeometry
+        .fromBinary(geom1Param.buffer.nioBuffer(geom1Param.start, geom1Param.end));
+
+    String geomWKT = geom1.asText();
+
+    int outputSize = geomWKT.getBytes().length;
+    out.buffer = buffer.reallocIfNeeded(outputSize);
--- End diff --

Thanks, this was helpful. You're right: subsequent executions failed with "Tried to remove unmanaged buffer.". It is fixed now. Would it also be valid to use BufferManager.getManagedBuffer(size) somehow (maybe instead of injecting the DrillBuf)? ---
[GitHub] drill pull request: [DRILL-3914]: support for geospatial query fun...
Github user k255 commented on the pull request: https://github.com/apache/drill/pull/191#issuecomment-148369490

Fixed a bug with complex geometries that was caused by a buffer that was too small. It is now possible to build more complex geometries. It would be nice if somebody could check the way I handled it (buffer reallocation).

I have also made some progress on integration with GIS tools, as shown here: http://bit.ly/1Rcvrjd ---
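The general idea behind the fix (size the output buffer from the encoded byte length, and keep using whichever buffer comes back from the grow step) can be sketched with the JDK's `java.nio.ByteBuffer`. This is a stand-in for illustration only: it does not use Drill's `DrillBuf`, and the class and method names below are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; Drill's DrillBuf and buffer management are not used here.
final class GrowableOutputSketch {

    // Deliberately small initial capacity so that larger geometries force a reallocation.
    private ByteBuffer buffer = ByteBuffer.allocate(16);

    /** Returns a cleared buffer with capacity >= size, replacing the held one if it is too small. */
    ByteBuffer ensureCapacity(int size) {
        if (buffer.capacity() < size) {
            // Keep the reference to the new buffer so later calls reuse it, not the old one.
            buffer = ByteBuffer.allocate(size);
        }
        buffer.clear();
        return buffer;
    }

    /** Encodes the WKT text as UTF-8 and writes it into a buffer that is large enough. */
    ByteBuffer writeText(String wkt) {
        byte[] bytes = wkt.getBytes(StandardCharsets.UTF_8);
        ByteBuffer out = ensureCapacity(bytes.length);
        out.put(bytes);
        out.flip(); // switch from writing to reading: position 0, limit = bytes written
        return out;
    }
}
```

Calling `writeText` with a WKT string longer than the initial 16 bytes triggers one reallocation, after which the larger buffer is reused for subsequent values of the same size or smaller.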
[GitHub] drill pull request: [DRILL-3914]: support for geospatial query fun...
Github user k255 commented on the pull request: https://github.com/apache/drill/pull/191#issuecomment-146679625

I added some general tests to check that the geometry functions work as expected. I'm happy that you like it. It is currently quite simple, but it can grow. One direction is to address the limited size of varbinary (by introducing a new type or extending the size of the existing one), because it limits geometries to simple shapes. ---
[GitHub] drill pull request: [DRILL-3914]: support for geospatial query fun...
GitHub user k255 opened a pull request: https://github.com/apache/drill/pull/191

[DRILL-3914]: support for geospatial query functionality

A sample dataset is provided on the classpath. After building from the fork repository, you can query it like this:

select * from cp.`sample-data/CA-cities.csv` limit 5;

For details on the current geospatial functionality, please see: https://github.com/k255/drill-gis

Currently the solution works for common use cases, but it is based on the varbinary data type, which has a size limit that restricts more complex geometries.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/k255/drill master

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/drill/pull/191.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #191

commit dc19cb732645b2d168f04eec521848992807cf07
Author: potocki
Date: 2015-10-08T06:10:33Z
gis contrib module with basic spatial queries functionality ---