If a way could be found, this would be an awesome addition to Pig.

Russell Jurney http://datasyndrome.com

On May 3, 2013, at 4:09 PM, Daniel Dai <[email protected]> wrote:

> I am not sure how other Apache projects deal with this. It seems Solr also
> has some connector to JTS?
>
> Thanks,
> Daniel
>
>
> On Thu, May 2, 2013 at 11:59 AM, Ahmed Eldawy <[email protected]> wrote:
>
>> Thanks Alan for your interest. It's too bad that an open source licensing
>> issue is holding me back from doing some open source work. I understand the
>> issue and your workarounds make sense. However, as I mentioned in the
>> beginning, I don't want to have my own branch of Pig because it makes my
>> extension less portable. I'll think of another way to do it. I'll ask Vivid
>> Solutions if they can dual license their code, although I think the answer
>> will be no. I'll also think of a way to ship my extension as a set of jar
>> files without the need to change the core of Pig. This way, it can be
>> easily ported to newer versions of Pig.
>>
>> Thanks
>> Ahmed
>>
>> Best regards,
>> Ahmed Eldawy
>>
>>
>> On Thu, May 2, 2013 at 12:33 PM, Alan Gates <[email protected]> wrote:
>>
>>> I know this is frustrating, but the different licenses do have different
>>> requirements that make it so that Apache can't ship GPL code.  A legal
>>> explanation is at http://www.apache.org/licenses/GPL-compatibility.html
>>> For additional info on the LGPL specific questions see
>>> http://www.apache.org/legal/3party.html
>>>
>>> As far as pulling it in via ivy, the issue isn't so much where the code
>>> lives as much as what code we are requiring to make Pig work.  If
>>> something that is [L]GPL is required for Pig it violates Apache rules as
>>> outlined above.  It also would be a show stopper for a lot of companies
>>> that redistribute Pig and that are allergic to GPL software.
>>>
>>> So, as I said before, if you wanted to continue with that library and
>>> they are not willing to relicense it then it would have to be bolted on
>>> after Apache Pig is built.  Nothing stops you from doing this by
>>> downloading Apache Pig, adding this library and your code, and
>>> redistributing, though it wouldn't then be open to all Pig users.
>>>
>>> Alan.
>>>
>>> On May 1, 2013, at 6:08 PM, Ahmed Eldawy wrote:
>>>
>>>> Thanks for your response. I was never good at differentiating all those
>>>> open source licenses. I mean, what is the point of making open source
>>>> licenses if they block me from using a library in an open source project?
>>>> Anyway, I'm not going to debate that here. Just one question: if we use
>>>> JTS as a library (jar file) without adding the code to Pig, is it still a
>>>> violation? We'll use ivy, for example, to download the jar file when
>>>> compiling.
>>>> On May 1, 2013 7:50 PM, "Alan Gates" <[email protected]> wrote:
>>>>
>>>>> Passing on the technical details for a moment, I see a licensing issue.
>>>>> JTS is licensed under LGPL.  Apache projects cannot contain or ship
>>>>> [L]GPL.  Apache does not meet the requirements of GPL and thus we cannot
>>>>> repackage their code. If you wanted to go forward using that class this
>>>>> would have to be packaged as an add on that was downloaded separately
>>>>> and not from Apache.  Another option is to work with the JTS community
>>>>> and see if they are willing to dual license their code under BSD or
>>>>> Apache license so that Pig could include it.  If neither of those is an
>>>>> option you would need to come up with a new class to contain your
>>>>> spatial data.
>>>>>
>>>>> Alan.
>>>>>
>>>>> On May 1, 2013, at 5:40 PM, Ahmed Eldawy wrote:
>>>>>
>>>>>> Hi all,
>>>>>> First, sorry for the long email. I wanted to put all my thoughts here and
>>>>>> get your feedback.
>>>>>> I'm proposing a major addition to Pig that will greatly increase its
>>>>>> functionality and user base. It is simply to add spatial support to the
>>>>>> language and the framework. I've already started working on that but I
>>>>>> don't want it to be just another branch. I want it, eventually, to be
>>>>>> merged with the trunk of Apache Pig. So, I'm sending this email mainly to
>>>>>> reach out to the main contributors of Pig to see the feasibility of this.
>>>>>> This addition is a part of a big project we have been working on at the
>>>>>> University of Minnesota; the project is called Spatial Hadoop
>>>>>> (http://spatialhadoop.cs.umn.edu). It's about building a MapReduce
>>>>>> framework (Hadoop) that is capable of maintaining and analyzing spatial
>>>>>> data efficiently. I'm the main guy behind that project and since we
>>>>>> released its first version, we have received very encouraging responses
>>>>>> from different groups in the research and industrial community. I'm sure
>>>>>> the addition we want to make to Pig Latin will be widely accepted by the
>>>>>> people in the spatial community.
>>>>>> I'm proposing a plan here while we're still in the early phases of this
>>>>>> task to be able to discuss it with the main contributors and see its
>>>>>> feasibility. First of all, I think that we need to change the core of Pig
>>>>>> to be able to support spatial data. Providing a set of UDFs only is not
>>>>>> enough. The main reason is that Pig Latin does not provide a way to
>>>>>> create a new data type, which is needed for spatial data. Once we have
>>>>>> the spatial data types we need, the functionality can be expanded using
>>>>>> more UDFs.
>>>>>>
>>>>>> Here's the plan as I see it.
>>>>>> 1- Introduce a new primitive data type Geometry which represents all
>>>>>> spatial data types. In the underlying system, this will map to
>>>>>> com.vividsolutions.jts.geom.Geometry. This is a class from Java Topology
>>>>>> Suite (JTS) [http://www.vividsolutions.com/jts/JTSHome.htm], a stable and
>>>>>> efficient open source Java library for spatial data types and algorithms.
>>>>>> It is very popular in the spatial community and a C++ port of it is used
>>>>>> in PostGIS [http://postgis.net/] (a spatial extension for Postgres). JTS
>>>>>> also conforms to the standards of the Open Geospatial Consortium (OGC)
>>>>>> [http://www.opengeospatial.org/], which publishes open standards for
>>>>>> spatial data types. The Geometry data type is read from and written to
>>>>>> text files using the Well Known Text (WKT) format. There is also a way to
>>>>>> convert it to/from binary so that it can work with binary files and
>>>>>> streams.
>>>>>> 2- Add functions that manipulate spatial data types. These will be added
>>>>>> as UDFs and we will not need to mess with the internals of Pig. Most
>>>>>> probably, there will be one new class for each operation (e.g., union or
>>>>>> intersection). I think it will be good to put these new operations inside
>>>>>> the core of Pig so that users can use them without having to write the
>>>>>> fully qualified class name. Also, since there is no way to implicitly
>>>>>> cast a spatial data type to a non-spatial data type, there will not be
>>>>>> any conflicts in existing operations or new operations. All new
>>>>>> operations, and only the new operations, will be working on spatial data
>>>>>> types. Here is an initial list of operations that can be added. All those
>>>>>> operations are already implemented in JTS and the UDFs added to Pig will
>>>>>> be just wrappers around them.
>>>>>> **Predicates (used for spatial filtering)
>>>>>> Equals
>>>>>> Disjoint
>>>>>> Intersects
>>>>>> Touches
>>>>>> Crosses
>>>>>> Within
>>>>>> Contains
>>>>>> Overlaps
>>>>>>
>>>>>> **Operations
>>>>>> Envelope
>>>>>> Area
>>>>>> Length
>>>>>> Buffer
>>>>>> ConvexHull
>>>>>> Intersection
>>>>>> Union
>>>>>> Difference
>>>>>> SymDifference
>>>>>>
>>>>>> **Aggregate functions
>>>>>> Accum
>>>>>> ConvexHull
>>>>>> Union
>>>>>>
>>>>>> 3- The third step is to implement spatial indexes (e.g., Grid or R-tree).
>>>>>> A Pig loader and Pig output classes will be created for those indexes.
>>>>>> Note that currently we have SpatialOutputFormat and SpatialInputFormat
>>>>>> for those indexes inside the Spatial Hadoop project, but we need to tweak
>>>>>> them to work with Pig.
>>>>>>
>>>>>> 4- (Advanced) Implement more sophisticated algorithms for spatial
>>>>>> operations that utilize the indexes. For example, we can have a specific
>>>>>> algorithm for spatial range query or spatial join. Again, we already have
>>>>>> algorithms built for different operations implemented in Spatial Hadoop
>>>>>> as MapReduce programs, but they will need to be modified to work in the
>>>>>> Pig environment and to work with other operations.
>>>>>>
>>>>>> This is my whole plan for the spatial extension to Pig. I've already
>>>>>> started with the first step but as I mentioned earlier, I don't want to
>>>>>> do the work for our project and then have it forgotten. I want to
>>>>>> contribute to Pig and do my research at the same time. If you think the
>>>>>> plan is plausible, I'll open JIRA issues for the above tasks and start
>>>>>> submitting patches for them. I'll conform to the standards of the
>>>>>> project, such as adding tests and commenting the code well.
>>>>>> Sorry for the long email and hope to hear back from you.
>>>>>>
>>>>>>
>>>>>> Best regards,
>>>>>> Ahmed Eldawy
>>>>>
>>>>>
>>>
>>>
>>
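
Steps 1 and 2 of the quoted proposal (a geometry value read from WKT, plus
predicate and aggregate wrappers around it) can be illustrated without any
LGPL dependency, which is roughly the "new class to contain your spatial
data" option Alan mentions. What follows is only a minimal sketch under
stated assumptions, not the proposed implementation: the Envelope class and
all of its method names are hypothetical, it uses neither JTS nor Pig
classes, it assumes 2D coordinates, and it reduces every geometry to its
bounding box, so intersects and contains act as coarse filters rather than
exact OGC predicates.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical stand-in for a spatial value: an axis-aligned bounding box. */
final class Envelope {
    // Matches plain decimal numbers such as 4, -1.5, .25 or 2e3.
    private static final Pattern NUMBER =
        Pattern.compile("[-+]?(?:\\d+\\.?\\d*|\\.\\d+)(?:[eE][-+]?\\d+)?");

    final double minX, minY, maxX, maxY;

    Envelope(double minX, double minY, double maxX, double maxY) {
        this.minX = minX;
        this.minY = minY;
        this.maxX = maxX;
        this.maxY = maxY;
    }

    /**
     * Builds the bounding box of a 2D WKT string by collecting every x/y pair.
     * Assumption: coordinates are strictly 2D; this is not a full WKT parser.
     */
    static Envelope fromWkt(String wkt) {
        List<Double> values = new ArrayList<>();
        Matcher m = NUMBER.matcher(wkt);
        while (m.find()) {
            values.add(Double.parseDouble(m.group()));
        }
        if (values.isEmpty() || values.size() % 2 != 0) {
            throw new IllegalArgumentException("expected 2D WKT coordinates: " + wkt);
        }
        double minX = Double.POSITIVE_INFINITY, minY = Double.POSITIVE_INFINITY;
        double maxX = Double.NEGATIVE_INFINITY, maxY = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < values.size(); i += 2) {
            double x = values.get(i), y = values.get(i + 1);
            minX = Math.min(minX, x);
            maxX = Math.max(maxX, x);
            minY = Math.min(minY, y);
            maxY = Math.max(maxY, y);
        }
        return new Envelope(minX, minY, maxX, maxY);
    }

    /** Predicate: true when the two boxes share at least one point. */
    boolean intersects(Envelope o) {
        return minX <= o.maxX && o.minX <= maxX && minY <= o.maxY && o.minY <= maxY;
    }

    /** Predicate: true when the other box lies entirely inside this one. */
    boolean contains(Envelope o) {
        return minX <= o.minX && o.maxX <= maxX && minY <= o.minY && o.maxY <= maxY;
    }

    /** Aggregate-style operation: the smallest box covering both inputs. */
    Envelope union(Envelope o) {
        return new Envelope(Math.min(minX, o.minX), Math.min(minY, o.minY),
                            Math.max(maxX, o.maxX), Math.max(maxY, o.maxY));
    }

    double area() {
        return (maxX - minX) * (maxY - minY);
    }

    public static void main(String[] args) {
        Envelope square = Envelope.fromWkt("POLYGON ((0 0, 4 0, 4 4, 0 4, 0 0))");
        Envelope point = Envelope.fromWkt("POINT (2 3)");
        System.out.println(square.contains(point));
        System.out.println(square.intersects(Envelope.fromWkt("LINESTRING (5 5, 6 7)")));
    }
}
```

In an actual extension each predicate would presumably live in its own UDF
class (Pig UDFs extend org.apache.pig.EvalFunc) and delegate to a full
geometry library; the sketch only shows the shape such wrappers could take.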
