Thanks for your response. I was never good at telling all those open source
licenses apart. I mean, what is the point of an open source license if it
blocks me from using a library in an open source project? Anyway, I'm not
going to start a debate here. Just one question: if we use JTS as a library
(jar file) without adding its code to Pig, is it still a violation? We'll
use Ivy, for example, to download the jar file when compiling.
 On May 1, 2013 7:50 PM, "Alan Gates" <ga...@hortonworks.com> wrote:

> Passing on the technical details for a moment, I see a licensing issue.
> JTS is licensed under LGPL.  Apache projects cannot contain or ship
> [L]GPL code.  Apache does not meet the requirements of GPL and thus we
> cannot repackage their code.  If you wanted to go forward using that
> class, this would have to be packaged as an add-on that was downloaded
> separately and not from Apache.  Another option is to work with the JTS
> community and see if they are willing to dual license their code under a
> BSD or Apache license so that Pig could include it.  If neither of those
> is an option, you would need to come up with a new class to contain your
> spatial data.
>
> Alan.
>
> On May 1, 2013, at 5:40 PM, Ahmed Eldawy wrote:
>
> > Hi all,
> >  First, sorry for the long email. I wanted to put all my thoughts here
> > and get your feedback.
> >  I'm proposing a major addition to Pig that will greatly increase its
> > functionality and user base: adding spatial support to the language and
> > the framework. I've already started working on it, but I don't want it
> > to be just another branch. I want it, eventually, to be merged into the
> > trunk of Apache Pig. So I'm sending this email mainly to reach out to
> > the main contributors of Pig and gauge the feasibility of this.
> > This addition is part of a big project we have been working on at the
> > University of Minnesota, called Spatial Hadoop
> > (http://spatialhadoop.cs.umn.edu). It's about building a MapReduce
> > framework (Hadoop) that is capable of maintaining and analyzing spatial
> > data efficiently. I'm the main developer behind that project, and since
> > we released its first version we have received very encouraging
> > responses from different groups in the research and industrial
> > communities. I'm sure the addition we want to make to Pig Latin will be
> > widely accepted by people in the spatial community.
> > I'm proposing a plan here while we're still in the early phases of this
> > task so that I can discuss it with the main contributors and assess its
> > feasibility. First of all, I think that we need to change the core of
> > Pig to be able to support spatial data. Providing a set of UDFs alone is
> > not enough. The main reason is that Pig Latin does not provide a way to
> > create a new data type, which is needed for spatial data. Once we have
> > the spatial data types we need, the functionality can be expanded using
> > more UDFs.
> >
> > Here's the plan as I see it.
> > 1- Introduce a new primitive data type, Geometry, which represents all
> > spatial data types. In the underlying system, this will map to
> > com.vividsolutions.jts.geom.Geometry. This is a class from Java Topology
> > Suite (JTS) [http://www.vividsolutions.com/jts/JTSHome.htm], a stable
> > and efficient open source Java library for spatial data types and
> > algorithms. It is very popular in the spatial community and a C++ port
> > of it is used in PostGIS [http://postgis.net/] (a spatial library for
> > Postgres). JTS also conforms to the open standards for spatial data
> > types published by the Open Geospatial Consortium (OGC)
> > [http://www.opengeospatial.org/]. The Geometry data type is read from
> > and written to text files using the Well Known Text (WKT) format. There
> > is also a way to convert it to/from binary so that it can work with
> > binary files and streams.
> > 2- Add functions that manipulate spatial data types. These will be added
> > as UDFs and we will not need to touch the internals of Pig. Most
> > probably, there will be one new class for each operation (e.g., union or
> > intersection). I think it would be good to put these new operations
> > inside the core of Pig so that users can use them without having to
> > write the fully qualified class name. Also, since there is no way to
> > implicitly cast a spatial data type to a non-spatial data type, there
> > will not be any conflicts between existing operations and new
> > operations. All new operations, and only the new operations, will work
> > on spatial data types. Here is an initial list of operations that can be
> > added. All of these operations are already implemented in JTS, and the
> > UDFs added to Pig will be just wrappers around them.
> > **Predicates (used for spatial filtering)
> > Equals
> > Disjoint
> > Intersects
> > Touches
> > Crosses
> > Within
> > Contains
> > Overlaps
> >
> > **Operations
> > Envelope
> > Area
> > Length
> > Buffer
> > ConvexHull
> > Intersection
> > Union
> > Difference
> > SymDifference
> >
> > **Aggregate functions
> > Accum
> > ConvexHull
> > Union
> >
> > 3- The third step is to implement spatial indexes (e.g., Grid or
> > R-tree). A Pig loader and Pig output classes will be created for those
> > indexes. Note that we currently have SpatialOutputFormat and
> > SpatialInputFormat for those indexes inside the Spatial Hadoop project,
> > but we need to tweak them to work with Pig.
> >
> > 4- (Advanced) Implement more sophisticated algorithms for spatial
> > operations that utilize the indexes. For example, we can have a specific
> > algorithm for spatial range query or spatial join. Again, we already
> > have algorithms for different operations implemented in Spatial Hadoop
> > as MapReduce programs, but they will need to be modified to run in the
> > Pig environment and to work with other operations.
> >
> > This is my whole plan for the spatial extension to Pig. I've already
> > started with the first step but, as I mentioned earlier, I don't want to
> > do the work for our project and then have it forgotten. I want to
> > contribute to Pig and do my research at the same time. If you think the
> > plan is plausible, I'll open JIRA issues for the above tasks and start
> > submitting patches. I'll conform to the standards of the project, such
> > as adding tests and commenting the code well.
> > Sorry for the long email and hope to hear back from you.
> >
> >
> > Best regards,
> > Ahmed Eldawy
>
>
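
To make steps 1 and 2 of the quoted plan a bit more concrete, here is a rough sketch of what "geometry read from WKT, predicate as a thin wrapper" could look like. It is only an illustration under loud assumptions: the class and method names (SpatialSketch, Box, parseCoordinates, envelope, envelopesIntersect) are hypothetical, it uses neither JTS nor Pig's UDF API, it handles only WKT POINT and single-ring POLYGON text, and it swaps the exact Intersects predicate for a much coarser bounding-box test built on the Envelope operation from the list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Illustrative only: Box and these helpers are hypothetical stand-ins,
// not JTS or Pig classes.
final class SpatialSketch {

    /** Axis-aligned bounding box, i.e. what an Envelope operation returns. */
    static final class Box {
        final double minX, minY, maxX, maxY;

        Box(double minX, double minY, double maxX, double maxY) {
            this.minX = minX;
            this.minY = minY;
            this.maxX = maxX;
            this.maxY = maxY;
        }

        /** True when the boxes share at least one point; touching edges count. */
        boolean intersects(Box other) {
            return minX <= other.maxX && other.minX <= maxX
                && minY <= other.maxY && other.minY <= maxY;
        }
    }

    /** Reads coordinates from a WKT POINT or single-ring POLYGON; rejects anything else. */
    static List<double[]> parseCoordinates(String wkt) {
        String s = wkt.trim().toUpperCase(Locale.ROOT);
        if (!s.startsWith("POINT") && !s.startsWith("POLYGON")) {
            throw new IllegalArgumentException("Unsupported WKT: " + wkt);
        }
        int open = s.lastIndexOf('(');
        int close = s.indexOf(')');
        if (open < 0 || close < open) {
            throw new IllegalArgumentException("Unsupported WKT: " + wkt);
        }
        List<double[]> coords = new ArrayList<>();
        for (String pair : s.substring(open + 1, close).split(",")) {
            String[] xy = pair.trim().split("\\s+");
            if (xy.length != 2) {
                throw new IllegalArgumentException("Bad coordinate: " + pair);
            }
            coords.add(new double[] {Double.parseDouble(xy[0]), Double.parseDouble(xy[1])});
        }
        return coords;
    }

    /** Envelope operation: the bounding box of all coordinates. */
    static Box envelope(String wkt) {
        double minX = Double.POSITIVE_INFINITY, minY = Double.POSITIVE_INFINITY;
        double maxX = Double.NEGATIVE_INFINITY, maxY = Double.NEGATIVE_INFINITY;
        for (double[] c : parseCoordinates(wkt)) {
            minX = Math.min(minX, c[0]);
            minY = Math.min(minY, c[1]);
            maxX = Math.max(maxX, c[0]);
            maxY = Math.max(maxY, c[1]);
        }
        return new Box(minX, minY, maxX, maxY);
    }

    /** Predicate wrapper: parses both inputs and delegates to the geometry code. */
    static boolean envelopesIntersect(String wktA, String wktB) {
        return envelope(wktA).intersects(envelope(wktB));
    }

    public static void main(String[] args) {
        String square = "POLYGON ((0 0, 4 0, 4 4, 0 4, 0 0))";
        System.out.println(envelopesIntersect(square, "POINT (2 2)")); // prints true
        System.out.println(envelopesIntersect(square, "POINT (5 5)")); // prints false
    }
}
```

In the actual proposal the parsing and the predicate would both be delegated to JTS, and the wrapper would be a Pig UDF class rather than a static method, so the only part of this sketch that carries over is its shape: one small class per operation that converts its inputs to geometries and calls the library.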
