Hi all,

Nick has pointed me to an alternative GIS package that can replace JTS.
ESRI has recently released a GIS package
<https://github.com/Esri/geometry-api-java> under the Apache license. I
changed Pigeon to work with that new package. I think it could now be
easier to integrate this work with the main branch of Apache Pig. I will
go on with the current project and add more spatial functionality. We can
then add a new datatype to Apache Pig and link it to those functions.

The ESRI package contains a class OGCGeometry
<http://esri.github.io/geometry-api-java/javadoc/com/esri/core/geometry/ogc/OGCGeometry.html>
which can be linked to a new datatype 'Geometry'. Do you think we can rely
on the new package and integrate the work with Apache Pig?
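
To make the serialization question concrete, here is a rough sketch of the
round trip a UDF has to do when geometries travel through Pig as a
bytearray (the workaround discussed further down this thread). It uses only
the JDK and the standard OGC Well-Known Binary layout for a 2D point: a
byte-order flag, a 32-bit type code, then two doubles. The class name
WkbPoint and its methods are made up for illustration; they are not part of
Pigeon, Pig or the ESRI package, which ships its own readers and writers.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical helper for illustration only; not a Pigeon, Pig or ESRI class. */
final class WkbPoint {
    private static final byte NDR_FLAG = 1;  // WKB byte-order flag for little-endian
    private static final int POINT_TYPE = 1; // WKB geometry type code for Point

    /** Serialize a 2D point to Well-Known Binary: 1 + 4 + 8 + 8 = 21 bytes. */
    static byte[] encode(double x, double y) {
        ByteBuffer buf = ByteBuffer.allocate(21).order(ByteOrder.LITTLE_ENDIAN);
        buf.put(NDR_FLAG).putInt(POINT_TYPE).putDouble(x).putDouble(y);
        return buf.array();
    }

    /** Deserialize a WKB point back to {x, y}; a bytearray-based UDF pays this on every call. */
    static double[] decode(byte[] wkb) {
        ByteBuffer buf = ByteBuffer.wrap(wkb);
        // The first byte tells us how the remaining fields are ordered.
        byte flag = buf.get();
        buf.order(flag == NDR_FLAG ? ByteOrder.LITTLE_ENDIAN : ByteOrder.BIG_ENDIAN);
        int type = buf.getInt();
        if (type != POINT_TYPE) {
            throw new IllegalArgumentException("Not a WKB point, type code " + type);
        }
        return new double[] { buf.getDouble(), buf.getDouble() };
    }

    public static void main(String[] args) {
        byte[] bytes = encode(-93.23, 44.97);
        double[] xy = decode(bytes);
        System.out.println(bytes.length + " bytes -> (" + xy[0] + ", " + xy[1] + ")");
    }
}
```

With a first-class Geometry datatype the decode step on every call would
presumably go away, which is the overhead mentioned later in this thread.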

On May 23, 2013 11:40 PM, "Ahmed Eldawy" <aseld...@gmail.com> wrote:

> Hi all,
>   Thanks for your help. I've started the project with minimal
> functionality. It's currently hosted on GitHub and licensed under the
> Apache license to make it easier to merge with Pig. It has only a few
> functions so far: I implemented one function of each kind (e.g.,
> aggregate and create). I'll keep adding functions, and any contributions
> to the project are welcome. To begin with, I need an Ant build file that
> compiles the code, runs the tests and generates a jar file. I'm not
> familiar with Ant, so any help with this is appreciated.
> Here's the project home page
> https://github.com/aseldawy/pigeon
>
>
> If you have any comments or suggestions please contact me.
>
>
> Best regards,
> Ahmed Eldawy
>
>
> On Mon, May 6, 2013 at 3:09 PM, Jonathan Coveney <jcove...@gmail.com> wrote:
>
>> Nick: the only issue is that the way types are implemented in Pig doesn't
>> allow us to easily "plug-in" types externally. Adding support for that
>> would be cool, but a fair bit of work.
>>
>>
>> 2013/5/6 Nick Dimiduk <ndimi...@gmail.com>
>>
>> > I'm not a lawyer, but I see no reason why this cannot be an external
>> > extension to Pig. It would behave the same way PostGIS is an external
>> > extension to Postgres. Any Apache issues would be toward general
>> > purpose enhancements, not specific to your project.
>> >
>> > Good on you!
>> > -n
>> >
>> > On Mon, May 6, 2013 at 10:12 AM, Ahmed Eldawy <aseld...@gmail.com> wrote:
>> >
>> > > I contacted the Solr developers to see how JTS can be included in an
>> > > Apache project. See
>> > > http://mail-archives.apache.org/mod_mbox/lucene-dev/201305.mbox/raw/%3C1367815102914-4060969.post%40n3.nabble.com%3E/
>> > > As far as I understand, they did not include it in the main Solr
>> > > project; rather, they created a separate project (Spatial4j) which is
>> > > still licensed under the Apache license and refers to JTS. Users will
>> > > have to download the JTS libraries separately to make it run. That's
>> > > pretty much the same plan that Jonathan mentioned. We will still have
>> > > the overhead of serializing/deserializing the shapes each time a
>> > > function is called. Also, we will have to use the ugly bytearray data
>> > > type for spatial data instead of creating its own data type (e.g.,
>> > > Geometry).
>> > > I think using Spatial4j instead of JTS will not be sufficient for our
>> > > case, as we need to provide access to all spatial functions of JTS,
>> > > such as Union, Intersection, Difference, etc. This way we can claim
>> > > conformity with OGC standards, which gives visibility and appreciation
>> > > in the spatial community.
>> > > I think this also means I will not add any issues to JIRA, as it is
>> > > now a separate project. I'm planning to host it on GitHub and have all
>> > > the issues there.
>> > > Let me know if you have any suggestions or comments.
>> > >
>> > > Thanks
>> > > Ahmed
>> > >
>> > >
>> > > Best regards,
>> > > Ahmed Eldawy
>> > >
>> > >
>> > > On Mon, May 6, 2013 at 9:53 AM, Jonathan Coveney <jcove...@gmail.com>
>> > > wrote:
>> > >
>> > > > You can give them all the same label or tag and filter on that later
>> > > > on.
>> > > >
>> > > >
>> > > > 2013/5/6 Ahmed Eldawy <aseld...@gmail.com>
>> > > >
>> > > > > Thanks all for taking the time to respond. Daniel, I didn't know
>> > > > > that Solr uses JTS. This is a good finding and we can definitely
>> > > > > ask them to see if there is a workaround we can use. Jonathan, I
>> > > > > thought of the same idea of serializing/deserializing a bytearray
>> > > > > each time a UDF is called. The deserialization part is good for
>> > > > > letting Pig auto-detect spatial types if not set explicitly in the
>> > > > > schema. What is the best way to start this? I want to add an
>> > > > > initial set of JIRA issues and start working on them, but I also
>> > > > > need to keep the work grouped in some sense, just for organization.
>> > > > >
>> > > > > Thanks
>> > > > > Ahmed
>> > > > >
>> > > > > Best regards,
>> > > > > Ahmed Eldawy
>> > > > >
>> > > > >
>> > > > > On Sat, May 4, 2013 at 4:47 PM, Jonathan Coveney <jcove...@gmail.com> wrote:
>> > > > >
>> > > > > > I agree that this is cool, and if other projects are using JTS
>> > > > > > it is worth talking to them to see how. I also agree that
>> > > > > > licensing is very frustrating.
>> > > > > >
>> > > > > > In the short term, however, while it is annoying to have to
>> > > > > > manage the serialization and deserialization yourself, you can
>> > > > > > have the geometry type be passed around as a bytearray type. Your
>> > > > > > UDFs will have to know this and treat it accordingly, but if you
>> > > > > > did this then all of the tools could be in an external project on
>> > > > > > github instead of a branch in Pig. Then, if we can get the
>> > > > > > licensing done, we could add the Geometry type to Pig. Adding
>> > > > > > types, honestly, is kind of tedious but not super difficult, so
>> > > > > > once the rest is done, that shouldn't be too difficult.
>> > > > > >
>> > > > > >
>> > > > > > 2013/5/4 Russell Jurney <russell.jur...@gmail.com>
>> > > > > >
>> > > > > > > If a way could be found, this would be an awesome addition to
>> > > > > > > Pig.
>> > > > > > >
>> > > > > > > Russell Jurney http://datasyndrome.com
>> > > > > > >
>> > > > > > > On May 3, 2013, at 4:09 PM, Daniel Dai <da...@hortonworks.com> wrote:
>> > > > > > >
>> > > > > > > > I am not sure how other Apache projects are dealing with it.
>> > > > > > > > It seems Solr also has some connector to JTS?
>> > > > > > > >
>> > > > > > > > Thanks,
>> > > > > > > > Daniel
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > On Thu, May 2, 2013 at 11:59 AM, Ahmed Eldawy <aseld...@gmail.com> wrote:
>> > > > > > > >
>> > > > > > > >> Thanks Alan for your interest. It's too bad that an open
>> > > > > > > >> source licensing issue is holding me back from doing some
>> > > > > > > >> open source work. I understand the issue and your
>> > > > > > > >> workarounds make sense. However, as I mentioned in the
>> > > > > > > >> beginning, I don't want to have my own branch of Pig because
>> > > > > > > >> it makes my extension less portable. I'll think of another
>> > > > > > > >> way to do it. I'll ask Vivid Solutions if they can dual
>> > > > > > > >> license their code, although I think the answer will be no.
>> > > > > > > >> I'll also think of a way to ship my extension as a set of
>> > > > > > > >> jar files without the need to change the core of Pig. This
>> > > > > > > >> way, it can be easily ported to newer versions of Pig.
>> > > > > > > >>
>> > > > > > > >> Thanks
>> > > > > > > >> Ahmed
>> > > > > > > >>
>> > > > > > > >> Best regards,
>> > > > > > > >> Ahmed Eldawy
>> > > > > > > >>
>> > > > > > > >>
>> > > > > > > >> On Thu, May 2, 2013 at 12:33 PM, Alan Gates <ga...@hortonworks.com> wrote:
>> > > > > > > >>
>> > > > > > > >>> I know this is frustrating, but the different licenses do
>> > > > > > > >>> have different requirements that make it so that Apache
>> > > > > > > >>> can't ship GPL code. A legal explanation is at
>> > > > > > > >>> http://www.apache.org/licenses/GPL-compatibility.html
>> > > > > > > >>> For additional info on the LGPL specific questions see
>> > > > > > > >>> http://www.apache.org/legal/3party.html
>> > > > > > > >>>
>> > > > > > > >>> As far as pulling it in via ivy, the issue isn't so much
>> > > > > > > >>> where the code lives as much as what code we are requiring
>> > > > > > > >>> to make Pig work. If something that is [L]GPL is required
>> > > > > > > >>> for Pig it violates Apache rules as outlined above. It also
>> > > > > > > >>> would be a show stopper for a lot of companies that
>> > > > > > > >>> redistribute Pig and that are allergic to GPL software.
>> > > > > > > >>>
>> > > > > > > >>> So, as I said before, if you wanted to continue with that
>> > > > > > > >>> library and they are not willing to relicense it then it
>> > > > > > > >>> would have to be bolted on after Apache Pig is built.
>> > > > > > > >>> Nothing stops you from doing this by downloading Apache
>> > > > > > > >>> Pig, adding this library and your code, and redistributing,
>> > > > > > > >>> though it wouldn't then be open to all Pig users.
>> > > > > > > >>>
>> > > > > > > >>> Alan.
>> > > > > > > >>>
>> > > > > > > >>> On May 1, 2013, at 6:08 PM, Ahmed Eldawy wrote:
>> > > > > > > >>>
>> > > > > > > >>>> Thanks for your response. I was never good at
>> > > > > > > >>>> differentiating all those open source licenses. I mean,
>> > > > > > > >>>> what is the point of making open source licenses if it
>> > > > > > > >>>> blocks me from using a library in an open source project?
>> > > > > > > >>>> Anyway, I'm not going into a debate here. Just one
>> > > > > > > >>>> question: if we use JTS as a library (jar file) without
>> > > > > > > >>>> adding the code in Pig, is it still a violation? We'll use
>> > > > > > > >>>> ivy, for example, to download the jar file when compiling.
>> > > > > > > >>>> On May 1, 2013 7:50 PM, "Alan Gates" <ga...@hortonworks.com> wrote:
>> > > > > > > >>>>
>> > > > > > > >>>>> Passing on the technical details for a moment, I see a
>> > > > > > > >>>>> licensing issue. JTS is licensed under LGPL. Apache
>> > > > > > > >>>>> projects cannot contain or ship [L]GPL. Apache does not
>> > > > > > > >>>>> meet the requirements of GPL and thus we cannot repackage
>> > > > > > > >>>>> their code. If you wanted to go forward using that class
>> > > > > > > >>>>> this would have to be packaged as an add on that was
>> > > > > > > >>>>> downloaded separately and not from Apache. Another option
>> > > > > > > >>>>> is to work with the JTS community and see if they are
>> > > > > > > >>>>> willing to dual license their code under BSD or Apache
>> > > > > > > >>>>> license so that Pig could include it. If neither of those
>> > > > > > > >>>>> are an option you would need to come up with a new class
>> > > > > > > >>>>> to contain your spatial data.
>> > > > > > > >>>>>
>> > > > > > > >>>>> Alan.
>> > > > > > > >>>>>
>> > > > > > > >>>>> On May 1, 2013, at 5:40 PM, Ahmed Eldawy wrote:
>> > > > > > > >>>>>
>> > > > > > > >>>>>> Hi all,
>> > > > > > > >>>>>> First, sorry for the long email. I wanted to put all my
>> > > > > > > >>>>>> thoughts here and get your feedback.
>> > > > > > > >>>>>> I'm proposing a major addition to Pig that will greatly
>> > > > > > > >>>>>> increase its functionality and user base. It is simply to
>> > > > > > > >>>>>> add spatial support to the language and the framework.
>> > > > > > > >>>>>> I've already started working on that but I don't want it
>> > > > > > > >>>>>> to be just another branch. I want it, eventually, to be
>> > > > > > > >>>>>> merged with the trunk of Apache Pig. So, I'm sending this
>> > > > > > > >>>>>> email mainly to reach out to the main contributors of Pig
>> > > > > > > >>>>>> to see the feasibility of this.
>> > > > > > > >>>>>> This addition is a part of a big project we have been
>> > > > > > > >>>>>> working on at the University of Minnesota; the project is
>> > > > > > > >>>>>> called Spatial Hadoop (http://spatialhadoop.cs.umn.edu).
>> > > > > > > >>>>>> It's about building a MapReduce framework (Hadoop) that is
>> > > > > > > >>>>>> capable of maintaining and analyzing spatial data
>> > > > > > > >>>>>> efficiently. I'm the main guy behind that project and
>> > > > > > > >>>>>> since we released its first version, we have received very
>> > > > > > > >>>>>> encouraging responses from different groups in the
>> > > > > > > >>>>>> research and industrial community. I'm sure the addition
>> > > > > > > >>>>>> we want to make to Pig Latin will be widely accepted by
>> > > > > > > >>>>>> the people in the spatial community.
>> > > > > > > >>>>>> I'm proposing a plan here while we're still in the early
>> > > > > > > >>>>>> phases of this task to be able to discuss it with the main
>> > > > > > > >>>>>> contributors and see its feasibility. First of all, I
>> > > > > > > >>>>>> think that we need to change the core of Pig to be able to
>> > > > > > > >>>>>> support spatial data. Providing a set of UDFs only is not
>> > > > > > > >>>>>> enough. The main reason is that Pig Latin does not provide
>> > > > > > > >>>>>> a way to create a new data type, which is needed for
>> > > > > > > >>>>>> spatial data. Once we have the spatial data types we need,
>> > > > > > > >>>>>> the functionality can be expanded using more UDFs.
>> > > > > > > >>>>>>
>> > > > > > > >>>>>> Here's the plan as I see it.
>> > > > > > > >>>>>> 1- Introduce a new primitive data type Geometry which
>> > > > > > > >>>>>> represents all spatial data types. In the underlying
>> > > > > > > >>>>>> system, this will map to
>> > > > > > > >>>>>> com.vividsolutions.jts.geom.Geometry. This is a class from
>> > > > > > > >>>>>> Java Topology Suite (JTS)
>> > > > > > > >>>>>> [http://www.vividsolutions.com/jts/JTSHome.htm], a stable
>> > > > > > > >>>>>> and efficient open source Java library for spatial data
>> > > > > > > >>>>>> types and algorithms. It is very popular in the spatial
>> > > > > > > >>>>>> community and a C++ port of it is used in PostGIS
>> > > > > > > >>>>>> [http://postgis.net/] (a spatial library for Postgres).
>> > > > > > > >>>>>> JTS also conforms with the Open Geospatial Consortium
>> > > > > > > >>>>>> (OGC) [http://www.opengeospatial.org/] specifications,
>> > > > > > > >>>>>> which are open standards for spatial data types. The
>> > > > > > > >>>>>> Geometry data type is read from and written to text files
>> > > > > > > >>>>>> using the Well Known Text (WKT) format. There is also a
>> > > > > > > >>>>>> way to convert it to/from binary so that it can work with
>> > > > > > > >>>>>> binary files and streams.
>> > > > > > > >>>>>> 2- Add functions that manipulate spatial data types. These
>> > > > > > > >>>>>> will be added as UDFs and we will not need to mess with
>> > > > > > > >>>>>> the internals of Pig. Most probably, there will be one new
>> > > > > > > >>>>>> class for each operation (e.g., union or intersection). I
>> > > > > > > >>>>>> think it will be good to put these new operations inside
>> > > > > > > >>>>>> the core of Pig so that users can use them without having
>> > > > > > > >>>>>> to write the fully qualified class name. Also, since there
>> > > > > > > >>>>>> is no way to implicitly cast a spatial data type to a
>> > > > > > > >>>>>> non-spatial data type, there will not be any conflicts
>> > > > > > > >>>>>> between existing operations and new operations. All new
>> > > > > > > >>>>>> operations, and only the new operations, will be working
>> > > > > > > >>>>>> on spatial data types. Here is an initial list of
>> > > > > > > >>>>>> operations that can be added. All those operations are
>> > > > > > > >>>>>> already implemented in JTS and the UDFs added to Pig will
>> > > > > > > >>>>>> be just wrappers around them.
>> > > > > > > >>>>>> **Predicates (used for spatial filtering)
>> > > > > > > >>>>>> Equals
>> > > > > > > >>>>>> Disjoint
>> > > > > > > >>>>>> Intersects
>> > > > > > > >>>>>> Touches
>> > > > > > > >>>>>> Crosses
>> > > > > > > >>>>>> Within
>> > > > > > > >>>>>> Contains
>> > > > > > > >>>>>> Overlaps
>> > > > > > > >>>>>>
>> > > > > > > >>>>>> **Operations
>> > > > > > > >>>>>> Envelope
>> > > > > > > >>>>>> Area
>> > > > > > > >>>>>> Length
>> > > > > > > >>>>>> Buffer
>> > > > > > > >>>>>> ConvexHull
>> > > > > > > >>>>>> Intersection
>> > > > > > > >>>>>> Union
>> > > > > > > >>>>>> Difference
>> > > > > > > >>>>>> SymDifference
>> > > > > > > >>>>>>
>> > > > > > > >>>>>> **Aggregate functions
>> > > > > > > >>>>>> Accum
>> > > > > > > >>>>>> ConvexHull
>> > > > > > > >>>>>> Union
>> > > > > > > >>>>>>
>> > > > > > > >>>>>> 3- The third step is to implement spatial indexes (e.g.,
>> > > > > > > >>>>>> Grid or R-tree). A Pig loader and Pig output classes will
>> > > > > > > >>>>>> be created for those indexes. Note that currently we have
>> > > > > > > >>>>>> SpatialOutputFormat and SpatialInputFormat for those
>> > > > > > > >>>>>> indexes inside the Spatial Hadoop project, but we need to
>> > > > > > > >>>>>> tweak them to work with Pig.
>> > > > > > > >>>>>>
>> > > > > > > >>>>>> 4- (Advanced) Implement more sophisticated algorithms for
>> > > > > > > >>>>>> spatial operations that utilize the indexes. For example,
>> > > > > > > >>>>>> we can have a specific algorithm for spatial range query
>> > > > > > > >>>>>> or spatial join. Again, we already have algorithms built
>> > > > > > > >>>>>> for different operations implemented in Spatial Hadoop as
>> > > > > > > >>>>>> MapReduce programs, but they will need to be modified to
>> > > > > > > >>>>>> work in the Pig environment and to work with other
>> > > > > > > >>>>>> operations.
>> > > > > > > >>>>>>
>> > > > > > > >>>>>> This is my whole plan for the spatial extension to Pig.
>> > > > > > > >>>>>> I've already started with the first step but, as I
>> > > > > > > >>>>>> mentioned earlier, I don't want to do the work for our
>> > > > > > > >>>>>> project and then have the work forgotten. I want to
>> > > > > > > >>>>>> contribute to Pig and do my research at the same time. If
>> > > > > > > >>>>>> you think the plan is plausible, I'll open JIRA issues for
>> > > > > > > >>>>>> the above tasks and start shipping patches for them. I'll
>> > > > > > > >>>>>> conform with the standards of the project, such as adding
>> > > > > > > >>>>>> tests and commenting the code well.
>> > > > > > > >>>>>> Sorry for the long email and hope to hear back from you.
>> > > > > > > >>>>>>
>> > > > > > > >>>>>>
>> > > > > > > >>>>>> Best regards,
>> > > > > > > >>>>>> Ahmed Eldawy
>> > > > > > > >>>>>
>> > > > > > > >>>>>
>> > > > > > > >>>
>> > > > > > > >>>
>> > > > > > > >>
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>
>
