If a way could be found, this would be an awesome addition to Pig.

Russell Jurney http://datasyndrome.com
On May 3, 2013, at 4:09 PM, Daniel Dai <[email protected]> wrote:
> I am not sure how other Apache projects dealing with it? Seems Solr also
> has some connector to JTS?
>
> Thanks,
> Daniel
>
>
> On Thu, May 2, 2013 at 11:59 AM, Ahmed Eldawy <[email protected]> wrote:
>
>> Thanks Alan for your interest. It's too bad that an open source licensing
>> issue is holding me back from doing some open source work. I understand the
>> issue and your workarounds make sense. However, as I mentioned in the
>> beginning, I don't want to have my own branch of Pig because it makes my
>> extension less portable. I'll think of another way to do it. I'll ask vivid
>> solutions if they can double license their code although I think the answer
>> will be no. I'll also think of a way to ship my extension as a set of jar
>> files without the need to change the core of Pig. This way, it can be
>> easily ported to newer versions of Pig.
>>
>> Thanks
>> Ahmed
>>
>> Best regards,
>> Ahmed Eldawy
>>
>>
>> On Thu, May 2, 2013 at 12:33 PM, Alan Gates <[email protected]> wrote:
>>
>>> I know this is frustrating, but the different licenses do have different
>>> requirements that make it so that Apache can't ship GPL code. A legal
>>> explanation is at http://www.apache.org/licenses/GPL-compatibility.html
>>> For additional info on the LGPL specific questions see
>>> http://www.apache.org/legal/3party.html
>>>
>>> As far as pulling it in via ivy, the issue isn't so much where the code
>>> lives as much as what code we are requiring to make Pig work. If something
>>> that is [L]GPL is required for Pig it violates Apache rules as outlined
>>> above. It also would be a show stopper for a lot of companies that
>>> redistribute Pig and that are allergic to GPL software.
>>>
>>> So, as I said before, if you wanted to continue with that library and they
>>> are not willing to relicense it then it would have to be bolted on after
>>> Apache Pig is built.
>>> Nothing stops you from doing this by downloading
>>> Apache Pig, adding this library and your code, and redistributing, though
>>> it wouldn't then be open to all Pig users.
>>>
>>> Alan.
>>>
>>> On May 1, 2013, at 6:08 PM, Ahmed Eldawy wrote:
>>>
>>>> Thanks for your response. I was never good at differentiating all those
>>>> open source licenses. I mean what is the point making open source licenses
>>>> if it blocks me from using a library in an open source project. Any way,
>>>> I'm not going into debate here. Just one question, if we use JTS as a
>>>> library (jar file) without adding the code in Pig, is it still a violation?
>>>> We'll use ivy, for example, to download the jar file when compiling.
>>>> On May 1, 2013 7:50 PM, "Alan Gates" <[email protected]> wrote:
>>>>
>>>>> Passing on the technical details for a moment, I see a licensing issue.
>>>>> JTS is licensed under LGPL. Apache projects cannot contain or ship
>>>>> [L]GPL. Apache does not meet the requirements of GPL and thus we cannot
>>>>> repackage their code. If you wanted to go forward using that class this
>>>>> would have to be packaged as an add on that was downloaded separately and
>>>>> not from Apache. Another option is to work with the JTS community and see
>>>>> if they are willing to dual license their code under BSD or Apache license
>>>>> so that Pig could include it. If neither of those are an option you would
>>>>> need to come up with a new class to contain your spatial data.
>>>>>
>>>>> Alan.
>>>>>
>>>>> On May 1, 2013, at 5:40 PM, Ahmed Eldawy wrote:
>>>>>
>>>>>> Hi all,
>>>>>> First, sorry for the long email. I wanted to put all my thoughts here and
>>>>>> get your feedback.
>>>>>> I'm proposing a major addition to Pig that will greatly increase its
>>>>>> functionality and user base. It is simply to add spatial support to the
>>>>>> language and the framework.
>>>>>> I've already started working on that but I
>>>>>> don't want it to be just another branch. I want it, eventually, to be
>>>>>> merged with the trunk of Apache Pig. So, I'm sending this email mainly to
>>>>>> reach out the main contributors of Pig to see the feasibility of this.
>>>>>> This addition is a part of a big project we have been working on in
>>>>>> University of Minnesota; the project is called Spatial Hadoop.
>>>>>> http://spatialhadoop.cs.umn.edu. It's about building a MapReduce framework
>>>>>> (Hadoop) that is capable of maintaining and analyzing spatial data
>>>>>> efficiently. I'm the main guy behind that project and since we released its
>>>>>> first version, we received very encouraging responses from different groups
>>>>>> in the research and industrial community. I'm sure the addition we want to
>>>>>> make to Pig Latin will be widely accepted by the people in the spatial
>>>>>> community.
>>>>>> I'm proposing a plan here while we're still in the early phases of this
>>>>>> task to be able to discuss it with the main contributors and see its
>>>>>> feasibility. First of all, I think that we need to change the core of Pig
>>>>>> to be able to support spatial data. Providing a set of UDFs only is not
>>>>>> enough. The main reason is that Pig Latin does not provide a way to create
>>>>>> a new data type which is needed for spatial data. Once we have the spatial
>>>>>> data types we need, the functionality can be expanded using more UDFs.
>>>>>>
>>>>>> Here's the plan as I see it.
>>>>>> 1- Introduce a new primitive data type Geometry which represents all
>>>>>> spatial data types. In the underlying system, this will map to
>>>>>> com.vividsolutions.jts.geom.Geometry. This is a class from Java Topology
>>>>>> Suite (JTS) [http://www.vividsolutions.com/jts/JTSHome.htm], a stable and
>>>>>> efficient open source Java library for spatial data types and algorithms.
>>>>>> It is very popular in the spatial community and a C++ port of it is used in
>>>>>> PostGIS [http://postgis.net/] (a spatial library for Postgres). JTS also
>>>>>> conforms with Open Geospatial Consortium (OGC)
>>>>>> [http://www.opengeospatial.org/] which is an open standard for the spatial
>>>>>> data types. The Geometry data type is read from and written to text files
>>>>>> using the Well Known Text (WKT) format. There is also a way to convert it
>>>>>> to/from binary so that it can work with binary files and streams.
>>>>>> 2- Add functions that manipulate spatial data types. These will be added as
>>>>>> UDFs and we will not need to mess with the internals of Pig. Most probably,
>>>>>> there will be one new class for each operation (e.g., union or
>>>>>> intersection). I think it will be good to put these new operations inside
>>>>>> the core of Pig so that users can use it without having to write the fully
>>>>>> qualified class name. Also, since there is no way to implicitly cast a
>>>>>> spatial data type to a non-spatial data types, there will not be any
>>>>>> conflicts in existing operations or new operations. All new operations, and
>>>>>> only the new operations, will be working on spatial data types. Here is an
>>>>>> initial list of operations that can be added. All those operations are
>>>>>> already implemented in JTS and the UDFs added to Pig will be just wrappers
>>>>>> around them.
>>>>>> **Predicates (used for spatial filtering)
>>>>>> Equals
>>>>>> Disjoint
>>>>>> Intersects
>>>>>> Touches
>>>>>> Crosses
>>>>>> Within
>>>>>> Contains
>>>>>> Overlaps
>>>>>>
>>>>>> **Operations
>>>>>> Envelope
>>>>>> Area
>>>>>> Length
>>>>>> Buffer
>>>>>> ConvexHull
>>>>>> Intersection
>>>>>> Union
>>>>>> Difference
>>>>>> SymDifference
>>>>>>
>>>>>> **Aggregate functions
>>>>>> Accum
>>>>>> ConvexHull
>>>>>> Union
>>>>>>
>>>>>> 3- The third step is to implement spatial indexes (e.g., Grid or R-tree). A
>>>>>> Pig loader and Pig output classes will be created for those indexes. Note
>>>>>> that currently we have SpatialOutputFormat and SpatialInputFormat for those
>>>>>> indexes inside the Spatial Hadoop project, but we need to tweak them to
>>>>>> work with Pig.
>>>>>>
>>>>>> 4- (Advanced) Implement more sophisticated algorithms for spatial
>>>>>> operations that utilize the indexes. For example, we can have a specific
>>>>>> algorithm for spatial range query or spatial join. Again, we already have
>>>>>> algorithms built for different operations implemented in Spatial Hadoop as
>>>>>> MapReduce programs, but they will need to be modified to work in Pig
>>>>>> environment and get to work with other operations.
>>>>>>
>>>>>> This is my whole plan for the spatial extension to Pig. I've already
>>>>>> started with the first step but as I mentioned earlier, I don't want to do
>>>>>> the work for our project and then the work gets forgotten. I want to
>>>>>> contribute to Pig and do my research at the same time. If you think the
>>>>>> plan is plausible, I'll open JIRA issues for the above tasks and start
>>>>>> shipping patches to do the stuff. I'll conform with the standards of the
>>>>>> project such as adding tests and well commenting the code.
>>>>>> Sorry for the long email and hope to hear back from you.
>>>>>>
>>>>>>
>>>>>> Best regards,
>>>>>> Ahmed Eldawy
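
Steps 3 and 4 of the plan above mention a grid index and range queries that use it. The idea can be sketched in plain Java, with no Pig, Hadoop, or JTS dependency. This is only an illustration under stated assumptions: the names GridSketch, Box, insert, and rangeQuery are hypothetical and do not come from Spatial Hadoop, and Box is a simple bounding-box stand-in rather than a JTS geometry.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical uniform grid over a fixed extent; each cell lists the boxes overlapping it. */
final class GridSketch {

    /** Hypothetical axis-aligned bounding box; a stand-in, not a JTS class. */
    static final class Box {
        final double minX, minY, maxX, maxY;

        Box(double minX, double minY, double maxX, double maxY) {
            this.minX = minX; this.minY = minY; this.maxX = maxX; this.maxY = maxY;
        }

        /** True when the boxes share at least one point (touching edges count). */
        boolean intersects(Box o) {
            return minX <= o.maxX && o.minX <= maxX && minY <= o.maxY && o.minY <= maxY;
        }
    }

    private final Box extent;
    private final int cols, rows;
    private final Map<Integer, List<Box>> cells = new HashMap<>();

    GridSketch(Box extent, int cols, int rows) {
        this.extent = extent; this.cols = cols; this.rows = rows;
    }

    /** Column index of x, clamped into [0, cols - 1]. */
    int col(double x) {
        int c = (int) Math.floor((x - extent.minX) * cols / (extent.maxX - extent.minX));
        return Math.max(0, Math.min(cols - 1, c));
    }

    /** Row index of y, clamped into [0, rows - 1]. */
    int row(double y) {
        int r = (int) Math.floor((y - extent.minY) * rows / (extent.maxY - extent.minY));
        return Math.max(0, Math.min(rows - 1, r));
    }

    /** Registers the box in every grid cell its extent overlaps. */
    void insert(Box b) {
        for (int r = row(b.minY); r <= row(b.maxY); r++)
            for (int c = col(b.minX); c <= col(b.maxX); c++)
                cells.computeIfAbsent(r * cols + c, k -> new ArrayList<>()).add(b);
    }

    /** Visits only the cells the query overlaps, then filters candidates exactly. */
    List<Box> rangeQuery(Box q) {
        Set<Box> hits = new LinkedHashSet<>(); // a box stored in several cells is reported once
        for (int r = row(q.minY); r <= row(q.maxY); r++)
            for (int c = col(q.minX); c <= col(q.maxX); c++)
                for (Box b : cells.getOrDefault(r * cols + c, Collections.<Box>emptyList()))
                    if (b.intersects(q)) hits.add(b);
        return new ArrayList<>(hits);
    }

    public static void main(String[] args) {
        GridSketch grid = new GridSketch(new Box(0, 0, 100, 100), 10, 10);
        grid.insert(new Box(5, 5, 15, 15));
        grid.insert(new Box(80, 80, 90, 90));
        System.out.println(grid.rangeQuery(new Box(0, 0, 20, 20)).size()); // prints 1
    }
}
```

The query touches only the nine cells in the lower-left corner of the grid, so the second box is never examined. In a MapReduce setting the same cell assignment could serve as the partitioning key, though how Spatial Hadoop actually does this is not described in the thread.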
