Thanks for your response. I was never good at telling all those open source licenses apart. I mean, what is the point of making open source licenses if they block me from using a library in an open source project? Anyway, I'm not going to start a debate here. Just one question: if we use JTS as a library (a jar file) without adding its code to Pig, is it still a violation? We would use Ivy, for example, to download the jar file at compile time.

On May 1, 2013 7:50 PM, "Alan Gates" <ga...@hortonworks.com> wrote:
> Passing on the technical details for a moment, I see a licensing issue. JTS is licensed under the LGPL. Apache projects cannot contain or ship [L]GPL code. Apache does not meet the requirements of the GPL, and thus we cannot repackage their code. If you wanted to go forward using that class, it would have to be packaged as an add-on that is downloaded separately and not from Apache. Another option is to work with the JTS community and see if they are willing to dual-license their code under a BSD or Apache license so that Pig could include it. If neither of those is an option, you would need to come up with a new class to contain your spatial data.
>
> Alan.
>
> On May 1, 2013, at 5:40 PM, Ahmed Eldawy wrote:
>
> > Hi all,
> > First, sorry for the long email. I wanted to put all my thoughts here and get your feedback.
> > I'm proposing a major addition to Pig that will greatly increase its functionality and user base: adding spatial support to the language and the framework. I've already started working on it, but I don't want it to be just another branch. I want it, eventually, to be merged into the trunk of Apache Pig. So I'm sending this email mainly to reach out to the main contributors of Pig to see whether this is feasible.
> > This addition is part of a big project we have been working on at the University of Minnesota; the project is called Spatial Hadoop (http://spatialhadoop.cs.umn.edu). It's about building a MapReduce framework (Hadoop) that is capable of maintaining and analyzing spatial data efficiently. I'm the main person behind that project, and since we released its first version we have received very encouraging responses from different groups in the research and industrial communities. I'm sure the addition we want to make to Pig Latin will be widely accepted by people in the spatial community.
> > I'm proposing a plan here while we're still in the early phases of this task, so that we can discuss it with the main contributors and assess its feasibility. First of all, I think we need to change the core of Pig to support spatial data; providing a set of UDFs alone is not enough. The main reason is that Pig Latin does not provide a way to create a new data type, which spatial data needs. Once we have the spatial data types we need, the functionality can be expanded with more UDFs.
> >
> > Here's the plan as I see it.
> > 1- Introduce a new primitive data type, Geometry, which represents all spatial data types. In the underlying system, this will map to com.vividsolutions.jts.geom.Geometry. This is a class from the Java Topology Suite (JTS) [http://www.vividsolutions.com/jts/JTSHome.htm], a stable and efficient open source Java library for spatial data types and algorithms. It is very popular in the spatial community, and a C++ port of it is used in PostGIS [http://postgis.net/] (a spatial extension for Postgres). JTS also conforms to the Simple Features specification of the Open Geospatial Consortium (OGC) [http://www.opengeospatial.org/], which publishes open standards for spatial data types. The Geometry data type is read from and written to text files using the Well Known Text (WKT) format. There is also a way to convert it to and from binary so that it can work with binary files and streams.
> > 2- Add functions that manipulate spatial data types. These will be added as UDFs, so we will not need to modify the internals of Pig. Most probably, there will be one new class for each operation (e.g., union or intersection). I think it would be good to put these new operations in the core of Pig so that users can use them without having to write the fully qualified class name.
> > Also, since there is no way to implicitly cast a spatial data type to a non-spatial data type, there will not be any conflicts between existing operations and new ones. All new operations, and only the new operations, will work on spatial data types. Here is an initial list of operations that can be added. All of these operations are already implemented in JTS, and the UDFs added to Pig will just be wrappers around them.
> > **Predicates (used for spatial filtering)
> > Equals
> > Disjoint
> > Intersects
> > Touches
> > Crosses
> > Within
> > Contains
> > Overlaps
> >
> > **Operations
> > Envelope
> > Area
> > Length
> > Buffer
> > ConvexHull
> > Intersection
> > Union
> > Difference
> > SymDifference
> >
> > **Aggregate functions
> > Accum
> > ConvexHull
> > Union
> >
> > 3- The third step is to implement spatial indexes (e.g., Grid or R-tree). Pig load and store functions will be created for those indexes. Note that we currently have SpatialOutputFormat and SpatialInputFormat for those indexes in the Spatial Hadoop project, but we need to tweak them to work with Pig.
> >
> > 4- (Advanced) Implement more sophisticated algorithms for spatial operations that use the indexes. For example, we can have a specific algorithm for a spatial range query or a spatial join. Again, we already have algorithms for different operations implemented in Spatial Hadoop as MapReduce programs, but they will need to be modified to work in the Pig environment and to interoperate with other operations.
> >
> > This is my whole plan for the spatial extension to Pig. I've already started on the first step, but as I mentioned earlier, I don't want to do the work for our project only to have it forgotten. I want to contribute to Pig and do my research at the same time.
> > If you think the plan is plausible, I'll open JIRA issues for the above tasks and start submitting patches. I'll follow the project's standards, such as adding tests and commenting the code well.
> > Sorry for the long email; I hope to hear back from you.
> >
> >
> > Best regards,
> > Ahmed Eldawy
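Step 1 of the proposal says Geometry values would be read and written as WKT and converted to and from binary. As a rough illustration of what that conversion involves, here is a minimal Java sketch that round-trips a single 2D point between WKT and OGC Well Known Binary (WKB) using only java.nio. It is a toy under stated assumptions: it handles only POINT, and the names PointCodec, Point, fromWkt, toWkt, toWkb and fromWkb are hypothetical. In practice the JTS readers and writers in com.vividsolutions.jts.io (WKTReader, WKTWriter, WKBReader, WKBWriter) already cover every geometry type and would be what a Pig Geometry type wraps.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Locale;

// Hypothetical sketch: WKT <-> WKB for a 2D POINT only.
// A real Pig Geometry type would delegate to the JTS readers/writers instead.
final class PointCodec {
    // Minimal stand-in for a geometry value; only 2D points are modeled.
    static final class Point {
        final double x;
        final double y;
        Point(double x, double y) { this.x = x; this.y = y; }
    }

    private static final int WKB_POINT = 1;       // WKB geometry type code for Point
    private static final int WKB_POINT_SIZE = 21; // 1 (byte order) + 4 (type) + 8 (x) + 8 (y)

    // Parses text of the form "POINT (x y)"; other geometry types are rejected.
    static Point fromWkt(String wkt) {
        String s = wkt.trim();
        if (!s.toUpperCase(Locale.ROOT).startsWith("POINT")) {
            throw new IllegalArgumentException("only POINT is supported: " + wkt);
        }
        int open = s.indexOf('(');
        int close = s.lastIndexOf(')');
        if (open < 0 || close < open) {
            throw new IllegalArgumentException("malformed WKT: " + wkt);
        }
        String[] coords = s.substring(open + 1, close).trim().split("\\s+");
        if (coords.length != 2) {
            throw new IllegalArgumentException("expected two coordinates: " + wkt);
        }
        return new Point(Double.parseDouble(coords[0]), Double.parseDouble(coords[1]));
    }

    static String toWkt(Point p) {
        return "POINT (" + p.x + " " + p.y + ")";
    }

    // Encodes the point as little-endian (NDR) WKB.
    static byte[] toWkb(Point p) {
        ByteBuffer buf = ByteBuffer.allocate(WKB_POINT_SIZE).order(ByteOrder.LITTLE_ENDIAN);
        buf.put((byte) 1); // byte-order flag: 1 = little-endian, 0 = big-endian
        buf.putInt(WKB_POINT);
        buf.putDouble(p.x);
        buf.putDouble(p.y);
        return buf.array();
    }

    // Decodes a WKB point written in either byte order.
    static Point fromWkb(byte[] wkb) {
        ByteBuffer buf = ByteBuffer.wrap(wkb);
        byte flag = buf.get(); // a single byte reads the same in either order
        if (flag != 0 && flag != 1) {
            throw new IllegalArgumentException("bad byte-order flag: " + flag);
        }
        buf.order(flag == 1 ? ByteOrder.LITTLE_ENDIAN : ByteOrder.BIG_ENDIAN);
        int type = buf.getInt();
        if (type != WKB_POINT) {
            throw new IllegalArgumentException("only POINT is supported, got type " + type);
        }
        double x = buf.getDouble();
        double y = buf.getDouble();
        return new Point(x, y);
    }

    public static void main(String[] args) {
        Point p = fromWkt("POINT (30.5 -10)");
        byte[] wkb = toWkb(p);
        // prints: POINT (30.5 -10.0) (21 bytes)
        System.out.println(toWkt(fromWkb(wkb)) + " (" + wkb.length + " bytes)");
    }
}
```

The 21-byte layout (a one-byte byte-order flag, a 4-byte type code, then two 8-byte doubles) is the standard WKB encoding of a 2D point; the decoder accepts either byte order because WKB lets the writer choose.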
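Steps 3 and 4 of the proposal describe spatial indexes such as a Grid and algorithms such as a range query that use them. The sketch below shows one simple way a uniform grid could work, under stated assumptions: the bounds of the whole dataset are known in advance, each record is indexed by its axis-aligned bounding rectangle (for a JTS Geometry, the envelope returned by getEnvelopeInternal()), and the index lives in memory. GridIndex, Rect, insert and rangeQuery are hypothetical names; this is not Spatial Hadoop's SpatialInputFormat/SpatialOutputFormat code, and in a MapReduce setting each cell would plausibly correspond to a separate partition of the data rather than an in-memory list.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of a uniform grid index over bounding rectangles.
final class GridIndex {
    // Axis-aligned rectangle, e.g. the envelope (MBR) of a geometry.
    static final class Rect {
        final double minX, minY, maxX, maxY;
        Rect(double minX, double minY, double maxX, double maxY) {
            this.minX = minX; this.minY = minY; this.maxX = maxX; this.maxY = maxY;
        }
        boolean intersects(Rect o) {
            return minX <= o.maxX && o.minX <= maxX && minY <= o.maxY && o.minY <= maxY;
        }
    }

    private final Rect space; // bounds of the whole dataset, assumed known in advance
    private final int cols;
    private final int rows;
    private final List<Rect> records = new ArrayList<>();              // record id -> rectangle
    private final Map<Integer, List<Integer>> cells = new HashMap<>(); // cell id -> record ids

    GridIndex(Rect space, int cols, int rows) {
        this.space = space;
        this.cols = cols;
        this.rows = rows;
    }

    // Maps a coordinate to a column/row, clamping values outside the space to border cells.
    private int col(double x) {
        int c = (int) Math.floor((x - space.minX) / (space.maxX - space.minX) * cols);
        return Math.max(0, Math.min(cols - 1, c));
    }

    private int row(double y) {
        int r = (int) Math.floor((y - space.minY) / (space.maxY - space.minY) * rows);
        return Math.max(0, Math.min(rows - 1, r));
    }

    // Stores the record in every cell its rectangle overlaps; returns the new record id.
    int insert(Rect r) {
        int id = records.size();
        records.add(r);
        for (int c = col(r.minX); c <= col(r.maxX); c++) {
            for (int rw = row(r.minY); rw <= row(r.maxY); rw++) {
                cells.computeIfAbsent(rw * cols + c, k -> new ArrayList<>()).add(id);
            }
        }
        return id;
    }

    // Filter: visit only cells overlapping the query. Refine: exact rectangle test.
    Set<Integer> rangeQuery(Rect q) {
        Set<Integer> result = new LinkedHashSet<>(); // a set drops records replicated across cells
        for (int c = col(q.minX); c <= col(q.maxX); c++) {
            for (int rw = row(q.minY); rw <= row(q.maxY); rw++) {
                for (int id : cells.getOrDefault(rw * cols + c, Collections.<Integer>emptyList())) {
                    if (records.get(id).intersects(q)) {
                        result.add(id);
                    }
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        GridIndex idx = new GridIndex(new Rect(0, 0, 100, 100), 4, 4);
        idx.insert(new Rect(10, 10, 20, 20));
        idx.insert(new Rect(20, 20, 60, 60)); // spans several cells
        idx.insert(new Rect(80, 80, 90, 90));
        // prints: [0, 1]
        System.out.println(idx.rangeQuery(new Rect(15, 15, 30, 30)));
    }
}
```

A query first selects only the cells that overlap the query window (the filter step) and then tests each candidate exactly (the refine step); for real geometries that exact test would be a JTS predicate such as intersects rather than a rectangle check. Because a rectangle that spans several cells is stored in each of them, results are collected in a set so that each record is reported once.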