Hi all,
Nick pointed out to me an alternative GIS package that can replace JTS. ESRI has recently released a GIS package <https://github.com/Esri/geometry-api-java> under the Apache license. I changed Pigeon to work with that new package. I think it could now be easier to integrate this work with the main branch of Apache Pig. I will continue with the current project and add more spatial functionality. We can then add a new datatype to Apache Pig and link it to those functions.
The ESRI package contains a class OGCGeometry <http://esri.github.io/geometry-api-java/javadoc/com/esri/core/geometry/ogc/OGCGeometry.html> which can be linked to a new datatype 'Geometry'. Do you think we can rely on the new package and integrate the work with Apache Pig?

On May 23, 2013 11:40 PM, "Ahmed Eldawy" <aseld...@gmail.com> wrote:
> Hi all,
> Thanks for your help. I've started the project with minimal functionality as a start. It's currently hosted on GitHub. It is licensed under the Apache license to make it easier to merge with Pig. Currently it has only a few functions. I implemented one function from each of several types of functions (e.g., aggregate and create). I'll keep adding functions, and any contributions to the project are welcome. To begin with, I need an ANT build file that runs the tests, compiles, and generates a jar file. I'm not familiar with ANT, so any help with this is appreciated.
> Here's the project home page
> https://github.com/aseldawy/pigeon
>
> If you have any comments or suggestions please contact me.
>
> Best regards,
> Ahmed Eldawy
>
> On Mon, May 6, 2013 at 3:09 PM, Jonathan Coveney <jcove...@gmail.com> wrote:
>
>> Nick: the only issue is that the way types are implemented in Pig doesn't allow us to easily "plug in" types externally. Adding support for that would be cool, but a fair bit of work.
>>
>> 2013/5/6 Nick Dimiduk <ndimi...@gmail.com>
>>
>> > I'm not a lawyer, but I see no reason why this cannot be an external extension to Pig. It would behave the same way PostGIS is an external extension to Postgres. Any Apache issues would be toward general purpose enhancements, not specific to your project.
>> >
>> > Good on you!
>> > -n
>> >
>> > On Mon, May 6, 2013 at 10:12 AM, Ahmed Eldawy <aseld...@gmail.com> wrote:
>> >
>> > > I contacted the Solr developers to see how JTS can be included in an Apache project.
>> > > See
>> > > http://mail-archives.apache.org/mod_mbox/lucene-dev/201305.mbox/raw/%3C1367815102914-4060969.post%40n3.nabble.com%3E/
>> > > As far as I understand, they did not include it in the main Solr project; rather, they created a separate project (Spatial4j) which is still licensed under the Apache license and refers to JTS. Users will have to download the JTS libraries separately to make it run. That's pretty much the same plan that Jonathan mentioned. We will still have the overhead of serializing/deserializing the shapes each time a function is called. Also, we will have to use the ugly bytearray data type for spatial data instead of creating its own data type (e.g., Geometry).
>> > > I think using Spatial4j instead of JTS will not be sufficient for our case, as we need to provide access to all spatial functions of JTS such as Union, Intersection, Difference, etc. This way we can claim conformity with OGC standards, which gives visibility and appreciation in the spatial community.
>> > > I also think that this means I will not add any issues to JIRA, as it is now a separate project. I'm planning to host it on GitHub and have all the issues there.
>> > > Let me know if you have any suggestions or comments.
>> > >
>> > > Thanks
>> > > Ahmed
>> > >
>> > > Best regards,
>> > > Ahmed Eldawy
>> > >
>> > > On Mon, May 6, 2013 at 9:53 AM, Jonathan Coveney <jcove...@gmail.com> wrote:
>> > >
>> > > > You can give them all the same label or tag and filter on that later on.
>> > > >
>> > > > 2013/5/6 Ahmed Eldawy <aseld...@gmail.com>
>> > > >
>> > > > > Thanks all for taking the time to respond. Daniel, I didn't know that Solr uses JTS. This is a good finding and we can definitely ask them to see if there is a workaround we can use.
>> > > > > Jonathan, I thought of the same idea of serializing/deserializing a bytearray each time a UDF is called. The deserialization part is good for letting Pig auto-detect spatial types if not set explicitly in the schema. What is the best way to start this? I want to add an initial set of JIRA issues and start working on them, but I also need to keep the work grouped in some sense just for organization.
>> > > > >
>> > > > > Thanks
>> > > > > Ahmed
>> > > > >
>> > > > > Best regards,
>> > > > > Ahmed Eldawy
>> > > > >
>> > > > > On Sat, May 4, 2013 at 4:47 PM, Jonathan Coveney <jcove...@gmail.com> wrote:
>> > > > >
>> > > > > > I agree that this is cool, and if other projects are using JTS it is worth talking to them to see how. I also agree that licensing is very frustrating.
>> > > > > >
>> > > > > > In the short term, however, while it is annoying to have to manage the serialization and deserialization yourself, you can have the geometry type be passed around as a bytearray type. Your UDFs will have to know this and treat it accordingly, but if you did this then all of the tools could be in an external project on GitHub instead of a branch in Pig. Then, if we can get the licensing done, we could add the Geometry type to Pig. Adding types, honestly, is kind of tedious but not super difficult, so once the rest is done, that shouldn't be too difficult.
>> > > > > >
>> > > > > > 2013/5/4 Russell Jurney <russell.jur...@gmail.com>
>> > > > > >
>> > > > > > > If a way could be found, this would be an awesome addition to Pig.
>> > > > > > >
>> > > > > > > Russell Jurney http://datasyndrome.com
>> > > > > > >
>> > > > > > > On May 3, 2013, at 4:09 PM, Daniel Dai <da...@hortonworks.com> wrote:
>> > > > > > >
>> > > > > > > > I am not sure how other Apache projects are dealing with it. It seems Solr also has some connector to JTS?
>> > > > > > > >
>> > > > > > > > Thanks,
>> > > > > > > > Daniel
>> > > > > > > >
>> > > > > > > > On Thu, May 2, 2013 at 11:59 AM, Ahmed Eldawy <aseld...@gmail.com> wrote:
>> > > > > > > >
>> > > > > > > >> Thanks Alan for your interest. It's too bad that an open source licensing issue is holding me back from doing some open source work. I understand the issue and your workarounds make sense. However, as I mentioned in the beginning, I don't want to have my own branch of Pig because it makes my extension less portable. I'll think of another way to do it. I'll ask Vivid Solutions if they can dual license their code, although I think the answer will be no. I'll also think of a way to ship my extension as a set of jar files without the need to change the core of Pig. This way, it can be easily ported to newer versions of Pig.
>> > > > > > > >>
>> > > > > > > >> Thanks
>> > > > > > > >> Ahmed
>> > > > > > > >>
>> > > > > > > >> Best regards,
>> > > > > > > >> Ahmed Eldawy
>> > > > > > > >>
>> > > > > > > >> On Thu, May 2, 2013 at 12:33 PM, Alan Gates <ga...@hortonworks.com> wrote:
>> > > > > > > >>
>> > > > > > > >>> I know this is frustrating, but the different licenses do have different requirements that make it so that Apache can't ship GPL code. A legal explanation is at http://www.apache.org/licenses/GPL-compatibility.html For additional info on the LGPL specific questions see http://www.apache.org/legal/3party.html
>> > > > > > > >>>
>> > > > > > > >>> As far as pulling it in via ivy, the issue isn't so much where the code lives as much as what code we are requiring to make Pig work. If something that is [L]GPL is required for Pig it violates Apache rules as outlined above. It also would be a show stopper for a lot of companies that redistribute Pig and that are allergic to GPL software.
>> > > > > > > >>>
>> > > > > > > >>> So, as I said before, if you wanted to continue with that library and they are not willing to relicense it then it would have to be bolted on after Apache Pig is built. Nothing stops you from doing this by downloading Apache Pig, adding this library and your code, and redistributing, though it wouldn't then be open to all Pig users.
>> > > > > > > >>>
>> > > > > > > >>> Alan.
>> > > > > > > >>>
>> > > > > > > >>> On May 1, 2013, at 6:08 PM, Ahmed Eldawy wrote:
>> > > > > > > >>>
>> > > > > > > >>>> Thanks for your response. I was never good at differentiating all those open source licenses. I mean, what is the point of making open source licenses if it blocks me from using a library in an open source project? Anyway, I'm not going into debate here. Just one question: if we use JTS as a library (jar file) without adding the code in Pig, is it still a violation? We'll use ivy, for example, to download the jar file when compiling.
>> > > > > > > >>>> On May 1, 2013 7:50 PM, "Alan Gates" <ga...@hortonworks.com> wrote:
>> > > > > > > >>>>
>> > > > > > > >>>>> Passing on the technical details for a moment, I see a licensing issue. JTS is licensed under LGPL. Apache projects cannot contain or ship [L]GPL. Apache does not meet the requirements of GPL and thus we cannot repackage their code. If you wanted to go forward using that class this would have to be packaged as an add on that was downloaded separately and not from Apache. Another option is to work with the JTS community and see if they are willing to dual license their code under BSD or Apache license so that Pig could include it.
>> > > > > > > >>>>> If neither of those is an option you would need to come up with a new class to contain your spatial data.
>> > > > > > > >>>>>
>> > > > > > > >>>>> Alan.
>> > > > > > > >>>>>
>> > > > > > > >>>>> On May 1, 2013, at 5:40 PM, Ahmed Eldawy wrote:
>> > > > > > > >>>>>
>> > > > > > > >>>>>> Hi all,
>> > > > > > > >>>>>> First, sorry for the long email. I wanted to put all my thoughts here and get your feedback.
>> > > > > > > >>>>>> I'm proposing a major addition to Pig that will greatly increase its functionality and user base. It is simply to add spatial support to the language and the framework. I've already started working on that but I don't want it to be just another branch. I want it, eventually, to be merged with the trunk of Apache Pig. So, I'm sending this email mainly to reach out to the main contributors of Pig to see the feasibility of this.
>> > > > > > > >>>>>> This addition is a part of a big project we have been working on at the University of Minnesota; the project is called Spatial Hadoop. http://spatialhadoop.cs.umn.edu. It's about building a MapReduce framework (Hadoop) that is capable of maintaining and analyzing spatial data efficiently.
>> > > > > > > >>>>>> I'm the main guy behind that project and since we released its first version, we received very encouraging responses from different groups in the research and industrial community. I'm sure the addition we want to make to Pig Latin will be widely accepted by the people in the spatial community.
>> > > > > > > >>>>>> I'm proposing a plan here while we're still in the early phases of this task to be able to discuss it with the main contributors and see its feasibility. First of all, I think that we need to change the core of Pig to be able to support spatial data. Providing a set of UDFs only is not enough. The main reason is that Pig Latin does not provide a way to create a new data type which is needed for spatial data. Once we have the spatial data types we need, the functionality can be expanded using more UDFs.
>> > > > > > > >>>>>>
>> > > > > > > >>>>>> Here's the plan as I see it.
>> > > > > > > >>>>>> 1- Introduce a new primitive data type Geometry which represents all spatial data types. In the underlying system, this will map to com.vividsolutions.jts.geom.Geometry.
>> > > > > > > >>>>>> This is a class from Java Topology Suite (JTS) [http://www.vividsolutions.com/jts/JTSHome.htm], a stable and efficient open source Java library for spatial data types and algorithms. It is very popular in the spatial community and a C++ port of it is used in PostGIS [http://postgis.net/] (a spatial library for Postgres). JTS also conforms to the standards of the Open Geospatial Consortium (OGC) [http://www.opengeospatial.org/], which are open standards for spatial data types. The Geometry data type is read from and written to text files using the Well Known Text (WKT) format. There is also a way to convert it to/from binary so that it can work with binary files and streams.
>> > > > > > > >>>>>> 2- Add functions that manipulate spatial data types. These will be added as UDFs and we will not need to mess with the internals of Pig. Most probably, there will be one new class for each operation (e.g., union or intersection). I think it will be good to put these new operations inside the core of Pig so that users can use them without having to write the fully qualified class name.
>> > > > > > > >>>>>> Also, since there is no way to implicitly cast a spatial data type to a non-spatial data type, there will not be any conflicts in existing operations or new operations. All new operations, and only the new operations, will be working on spatial data types. Here is an initial list of operations that can be added. All those operations are already implemented in JTS and the UDFs added to Pig will be just wrappers around them.
>> > > > > > > >>>>>> **Predicates (used for spatial filtering)
>> > > > > > > >>>>>> Equals
>> > > > > > > >>>>>> Disjoint
>> > > > > > > >>>>>> Intersects
>> > > > > > > >>>>>> Touches
>> > > > > > > >>>>>> Crosses
>> > > > > > > >>>>>> Within
>> > > > > > > >>>>>> Contains
>> > > > > > > >>>>>> Overlaps
>> > > > > > > >>>>>>
>> > > > > > > >>>>>> **Operations
>> > > > > > > >>>>>> Envelope
>> > > > > > > >>>>>> Area
>> > > > > > > >>>>>> Length
>> > > > > > > >>>>>> Buffer
>> > > > > > > >>>>>> ConvexHull
>> > > > > > > >>>>>> Intersection
>> > > > > > > >>>>>> Union
>> > > > > > > >>>>>> Difference
>> > > > > > > >>>>>> SymDifference
>> > > > > > > >>>>>>
>> > > > > > > >>>>>> **Aggregate functions
>> > > > > > > >>>>>> Accum
>> > > > > > > >>>>>> ConvexHull
>> > > > > > > >>>>>> Union
>> > > > > > > >>>>>>
>> > > > > > > >>>>>> 3- The third step is to implement spatial indexes (e.g., Grid or R-tree). A Pig loader and Pig output classes will be created for those indexes.
>> > > > > > > >>>>>> Note that currently we have SpatialOutputFormat and SpatialInputFormat for those indexes inside the Spatial Hadoop project, but we need to tweak them to work with Pig.
>> > > > > > > >>>>>>
>> > > > > > > >>>>>> 4- (Advanced) Implement more sophisticated algorithms for spatial operations that utilize the indexes. For example, we can have a specific algorithm for spatial range query or spatial join. Again, we already have algorithms built for different operations implemented in Spatial Hadoop as MapReduce programs, but they will need to be modified to work in the Pig environment and to work with other operations.
>> > > > > > > >>>>>>
>> > > > > > > >>>>>> This is my whole plan for the spatial extension to Pig. I've already started with the first step but as I mentioned earlier, I don't want to do the work for our project and then the work gets forgotten. I want to contribute to Pig and do my research at the same time. If you think the plan is plausible, I'll open JIRA issues for the above tasks and start shipping patches. I'll conform to the standards of the project, such as adding tests and commenting the code well.
>> > > > > > > >>>>>> Sorry for the long email and hope to hear back from you.
>> > > > > > > >>>>>>
>> > > > > > > >>>>>> Best regards,
>> > > > > > > >>>>>> Ahmed Eldawy
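
For reference, here is a minimal sketch of the bytearray workaround discussed in this thread, where a geometry travels between UDF calls as raw bytes and is decoded again on every call. It assumes only a 2D point encoded as OGC Well Known Binary (WKB). The class and method names (PointWkb, encode, decode) are hypothetical and are not part of Pig, JTS, Pigeon, or the ESRI library; a real extension would presumably delegate to one of those libraries for full geometry support.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper (not from Pig, JTS, or the ESRI library): encodes a 2D
// point as OGC Well Known Binary (WKB) so it can be carried as a bytearray.
public class PointWkb {
    private static final int WKB_POINT = 1;       // WKB geometry type code for Point
    private static final byte LITTLE_ENDIAN = 1;  // WKB byte-order flag (NDR)

    // Serialize: 1-byte order flag + 4-byte type code + two 8-byte doubles = 21 bytes.
    public static byte[] encode(double x, double y) {
        ByteBuffer buf = ByteBuffer.allocate(21).order(ByteOrder.LITTLE_ENDIAN);
        buf.put(LITTLE_ENDIAN).putInt(WKB_POINT).putDouble(x).putDouble(y);
        return buf.array();
    }

    // Deserialize: this is the per-call overhead mentioned in the thread.
    public static double[] decode(byte[] wkb) {
        ByteBuffer buf = ByteBuffer.wrap(wkb);
        // The first byte says which byte order the rest of the record uses.
        buf.order(buf.get() == LITTLE_ENDIAN ? ByteOrder.LITTLE_ENDIAN : ByteOrder.BIG_ENDIAN);
        int type = buf.getInt();
        if (type != WKB_POINT) {
            throw new IllegalArgumentException("Not a WKB point, type code: " + type);
        }
        return new double[] { buf.getDouble(), buf.getDouble() };
    }

    public static void main(String[] args) {
        byte[] bytes = encode(1.5, -2.0);
        double[] xy = decode(bytes);
        System.out.println(bytes.length + " bytes -> POINT (" + xy[0] + " " + xy[1] + ")");
    }
}
```

Each UDF in such a design would call something like decode on its input and encode on its output, which is the serialization cost that a native Geometry type in Pig would avoid.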