Nick: the only issue is that the way types are implemented in Pig doesn't
allow us to easily plug in types externally. Adding support for that
would be cool, but it would be a fair bit of work.


2013/5/6 Nick Dimiduk <ndimi...@gmail.com>

> I'm not a lawyer, but I see no reason why this cannot be an external
> extension to Pig. It would behave the same way PostGIS is an external
> extension to Postgres. Any Apache issues would be toward general
> purpose enhancements, not specific to your project.
>
> Good on you!
> -n
>
> On Mon, May 6, 2013 at 10:12 AM, Ahmed Eldawy <aseld...@gmail.com> wrote:
>
> > I contacted solr developers to see how JTS can be included in an Apache
> > project. See
> >
> >
> http://mail-archives.apache.org/mod_mbox/lucene-dev/201305.mbox/raw/%3C1367815102914-4060969.post%40n3.nabble.com%3E/
> > As far as I understand, they did not include it in the main Solr project;
> > rather, they created a separate project (Spatial4j) which is still
> > licensed under the Apache license and refers to JTS. Users will have to
> > download the JTS libraries separately to make it run. That's pretty much
> > the same plan that Jonathan mentioned. We will still have the overhead of
> > serializing/deserializing the shapes each time a function is called.
> > Also, we will have to use the ugly bytearray data type for spatial data
> > instead of creating its own data type (e.g., Geometry).
> > I think using Spatial4j instead of JTS will not be sufficient for our
> > case as we need to provide access to all spatial functions of JTS such as
> > Union, Intersection, Difference, etc. This way we can claim conformity
> > with OGC standards, which gives visibility and appreciation in the spatial
> > community.
> > I think also that this means I will not add any issues to JIRA as it is
> now
> > a separate project. I'm planning to host it on github and have all the
> > issues there.
> > Let me know if you have any suggestions or comments.
> >
> > Thanks
> > Ahmed
> >
> >
> > Best regards,
> > Ahmed Eldawy
> >
> >
> > On Mon, May 6, 2013 at 9:53 AM, Jonathan Coveney <jcove...@gmail.com>
> > wrote:
> >
> > > You can give them all the same label or tag and filter on that later
> on.
> > >
> > >
> > > 2013/5/6 Ahmed Eldawy <aseld...@gmail.com>
> > >
> > > > > Thanks all for taking the time to respond. Daniel, I didn't know that
> > > Solr
> > > > uses JTS. This is a good finding and we can definitely ask them to
> see
> > if
> > > > there is a work around we can do. Jonathan, I thought of the same
> idea
> > of
> > > > serializing/deserializing a bytearray each time a UDF is called. The
> > > > deserialization part is good for letting Pig auto detect spatial
> types
> > if
> > > > not set explicitly in the schema. What is the best way to start
> this? I
> > > > want to add an initial set of JIRA issues and start working on them
> > but I
> > > > also need to keep the work grouped in some sense just for
> organization.
> > > >
> > > > Thanks
> > > > Ahmed
> > > >
> > > > Best regards,
> > > > Ahmed Eldawy
> > > >
> > > >
> > > > On Sat, May 4, 2013 at 4:47 PM, Jonathan Coveney <jcove...@gmail.com
> >
> > > > wrote:
> > > >
> > > > > > I agree that this is cool, and if other projects are using JTS it is
> > > > > > worth talking to them to see how. I also agree that licensing is very
> > > > > > frustrating.
> > > > >
> > > > > In the short term, however, while it is annoying to have to manage
> > the
> > > > > serialization and deserialization yourself, you can have the
> geometry
> > > > type
> > > > > be passed around as a bytearray type. Your UDF's will have to know
> > this
> > > > and
> > > > > treat it accordingly, but if you did this then all of the tools
> could
> > > be
> > > > in
> > > > > an external project on github instead of a branch in Pig. Then, if
> we
> > > can
> > > > > get the licensing done, we could add the Geometry type to Pig.
> Adding
> > > > > types, honestly, is kind of tedious but not super difficult, so
> once
> > > the
> > > > > rest is done, that shouldn't be too difficult.
> > > > >
> > > > >
> > > > > 2013/5/4 Russell Jurney <russell.jur...@gmail.com>
> > > > >
> > > > > > If a way could be found, this would be an awesome addition to
> Pig.
> > > > > >
> > > > > > Russell Jurney http://datasyndrome.com
> > > > > >
> > > > > > On May 3, 2013, at 4:09 PM, Daniel Dai <da...@hortonworks.com>
> > > wrote:
> > > > > >
> > > > > > > > I am not sure how other Apache projects are dealing with it. It seems
> > > > > > > > Solr also has some connector to JTS?
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Daniel
> > > > > > >
> > > > > > >
> > > > > > > On Thu, May 2, 2013 at 11:59 AM, Ahmed Eldawy <
> > aseld...@gmail.com>
> > > > > > wrote:
> > > > > > >
> > > > > > >> Thanks Alan for your interest. It's too bad that an open
> source
> > > > > > licensing
> > > > > > >> issue is holding me back from doing some open source work. I
> > > > > understand
> > > > > > the
> > > > > > >> issue and your workarounds make sense. However, as I mentioned
> > in
> > > > the
> > > > > > >> beginning, I don't want to have my own branch of Pig because
> it
> > > > makes
> > > > > my
> > > > > > >> extension less portable. I'll think of another way to do it.
> > I'll
> > > > ask
> > > > > > vivid
> > > > > > >> solutions if they can double license their code although I
> think
> > > the
> > > > > > answer
> > > > > > >> will be no. I'll also think of a way to ship my extension as a
> > set
> > > > of
> > > > > > jar
> > > > > > >> files without the need to change the core of Pig. This way, it
> > can
> > > > be
> > > > > > >> easily ported to newer versions of Pig.
> > > > > > >>
> > > > > > >> Thanks
> > > > > > >> Ahmed
> > > > > > >>
> > > > > > >> Best regards,
> > > > > > >> Ahmed Eldawy
> > > > > > >>
> > > > > > >>
> > > > > > >> On Thu, May 2, 2013 at 12:33 PM, Alan Gates <
> > > ga...@hortonworks.com>
> > > > > > wrote:
> > > > > > >>
> > > > > > > >>> I know this is frustrating, but the different licenses do have
> > > > > > > >>> different requirements that make it so that Apache can't ship
> > > > > > > >>> GPL code. A legal explanation is at
> > > > > > > >>> http://www.apache.org/licenses/GPL-compatibility.html
> > > > > > > >>> For additional info on the LGPL-specific questions see
> > > > > > > >>> http://www.apache.org/legal/3party.html
> > > > > > >>>
> > > > > > >>> As far as pulling it in via ivy, the issue isn't so much
> where
> > > the
> > > > > code
> > > > > > >>> lives as much as what code we are requiring to make Pig work.
> >  If
> > > > > > >> something
> > > > > > >>> that is [L]GPL is required for Pig it violates Apache rules
> as
> > > > > outlined
> > > > > > >>> above.  It also would be a show stopper for a lot of
> companies
> > > that
> > > > > > >>> redistribute Pig and that are allergic to GPL software.
> > > > > > >>>
> > > > > > >>> So, as I said before, if you wanted to continue with that
> > library
> > > > and
> > > > > > >> they
> > > > > > >>> are not willing to relicense it then it would have to be
> bolted
> > > on
> > > > > > after
> > > > > > >>> Apache Pig is built.  Nothing stops you from doing this by
> > > > > downloading
> > > > > > >>> Apache Pig, adding this library and your code, and
> > > redistributing,
> > > > > > though
> > > > > > >>> it wouldn't then be open to all Pig users.
> > > > > > >>>
> > > > > > >>> Alan.
> > > > > > >>>
> > > > > > >>> On May 1, 2013, at 6:08 PM, Ahmed Eldawy wrote:
> > > > > > >>>
> > > > > > > >>>> Thanks for your response. I was never good at differentiating all
> > > > > > > >>>> those open source licenses. I mean, what is the point of making
> > > > > > > >>>> open source licenses if they block me from using a library in an
> > > > > > > >>>> open source project? Anyway, I'm not going into a debate here.
> > > > > > > >>>> Just one question: if we use JTS as a library (jar file) without
> > > > > > > >>>> adding the code to Pig, is it still a violation? We'll use ivy,
> > > > > > > >>>> for example, to download the jar file when compiling.
> > > > > > >>>> On May 1, 2013 7:50 PM, "Alan Gates" <ga...@hortonworks.com
> >
> > > > wrote:
> > > > > > >>>>
> > > > > > >>>>> Passing on the technical details for a moment, I see a
> > > licensing
> > > > > > >> issue.
> > > > > > >>>>> JTS is licensed under LGPL.  Apache projects cannot contain
> > or
> > > > ship
> > > > > > >>>>> [L]GPL.  Apache does not meet the requirements of GPL and
> > thus
> > > we
> > > > > > >> cannot
> > > > > > >>>>> repackage their code. If you wanted to go forward using
> that
> > > > class
> > > > > > >> this
> > > > > > >>>>> would have to be packaged as an add on that was downloaded
> > > > > separately
> > > > > > >>> and
> > > > > > >>>>> not from Apache.  Another option is to work with the JTS
> > > > community
> > > > > > and
> > > > > > >>> see
> > > > > > >>>>> if they are willing to dual license their code under BSD or
> > > > Apache
> > > > > > >>> license
> > > > > > >>>>> so that Pig could include it.  If neither of those are an
> > > option
> > > > > you
> > > > > > >>> would
> > > > > > >>>>> need to come up with a new class to contain your spatial
> > data.
> > > > > > >>>>>
> > > > > > >>>>> Alan.
> > > > > > >>>>>
> > > > > > >>>>> On May 1, 2013, at 5:40 PM, Ahmed Eldawy wrote:
> > > > > > >>>>>
> > > > > > >>>>>> Hi all,
> > > > > > >>>>>> First, sorry for the long email. I wanted to put all my
> > > thoughts
> > > > > > here
> > > > > > >>>>> and
> > > > > > >>>>>> get your feedback.
> > > > > > >>>>>> I'm proposing a major addition to Pig that will greatly
> > > increase
> > > > > its
> > > > > > >>>>>> functionality and user base. It is simply to add spatial
> > > support
> > > > > to
> > > > > > >> the
> > > > > > >>>>>> language and the framework. I've already started working
> on
> > > that
> > > > > but
> > > > > > >> I
> > > > > > >>>>>> don't want it to be just another branch. I want it,
> > > eventually,
> > > > to
> > > > > > be
> > > > > > > >>>>>> merged with the trunk of Apache Pig. So, I'm sending this email
> > > > > > > >>>>>> mainly to reach out to the main contributors of Pig to see the
> > > > > > > >>>>>> feasibility of this.
> > > > > > >>>>>> This addition is a part of a big project we have been
> > working
> > > on
> > > > > in
> > > > > > >>>>>> University of Minnesota; the project is called Spatial
> > Hadoop.
> > > > > > >>>>>> http://spatialhadoop.cs.umn.edu. It's about building a
> > > > MapReduce
> > > > > > >>>>> framework
> > > > > > >>>>>> (Hadoop) that is capable of maintaining and analyzing
> > spatial
> > > > data
> > > > > > >>>>>> efficiently. I'm the main guy behind that project and
> since
> > we
> > > > > > >> released
> > > > > > >>>>> its
> > > > > > >>>>>> first version, we received very encouraging responses from
> > > > > different
> > > > > > >>>>> groups
> > > > > > >>>>>> in the research and industrial community. I'm sure the
> > > addition
> > > > we
> > > > > > >> want
> > > > > > >>>>> to
> > > > > > >>>>>> make to Pig Latin will be widely accepted by the people in
> > the
> > > > > > >> spatial
> > > > > > >>>>>> community.
> > > > > > >>>>>> I'm proposing a plan here while we're still in the early
> > > phases
> > > > of
> > > > > > >> this
> > > > > > >>>>>> task to be able to discuss it with the main contributors
> and
> > > see
> > > > > its
> > > > > > >>>>>> feasibility. First of all, I think that we need to change
> > the
> > > > core
> > > > > > of
> > > > > > >>> Pig
> > > > > > >>>>>> to be able to support spatial data. Providing a set of
> UDFs
> > > only
> > > > > is
> > > > > > >> not
> > > > > > >>>>>> enough. The main reason is that Pig Latin does not
> provide a
> > > way
> > > > > to
> > > > > > >>>>> create
> > > > > > >>>>>> a new data type which is needed for spatial data. Once we
> > have
> > > > the
> > > > > > >>>>> spatial
> > > > > > >>>>>> data types we need, the functionality can be expanded
> using
> > > more
> > > > > > >> UDFs.
> > > > > > >>>>>>
> > > > > > >>>>>> Here's the plan as I see it.
> > > > > > >>>>>> 1- Introduce a new primitive data type Geometry which
> > > represents
> > > > > all
> > > > > > >>>>>> spatial data types. In the underlying system, this will
> map
> > to
> > > > > > >>>>>> com.vividsolutions.jts.geom.Geometry. This is a class from
> > > Java
> > > > > > >>> Topology
> > > > > > >>>>>> Suite (JTS) [
> http://www.vividsolutions.com/jts/JTSHome.htm
> > ],
> > > a
> > > > > > >> stable
> > > > > > >>>>> and
> > > > > > >>>>>> efficient open source Java library for spatial data types
> > and
> > > > > > >>> algorithms.
> > > > > > >>>>>> It is very popular in the spatial community and a C++ port
> > of
> > > it
> > > > > is
> > > > > > >>> used
> > > > > > >>>>> in
> > > > > > > >>>>>> PostGIS [http://postgis.net/] (a spatial library for Postgres). JTS
> > > > > > > >>>>>> also conforms with the Open Geospatial Consortium (OGC)
> > > > > > > >>>>>> [http://www.opengeospatial.org/] open standards for spatial
> > > > > > > >>>>>> data types. The Geometry data type is read from and
> written
> > to
> > > > > text
> > > > > > >>> files
> > > > > > >>>>>> using the Well Known Text (WKT) format. There is also a
> way
> > to
> > > > > > >> convert
> > > > > > >>> it
> > > > > > >>>>>> to/from binary so that it can work with binary files and
> > > > streams.
> > > > > > >>>>>> 2- Add functions that manipulate spatial data types. These
> > > will
> > > > be
> > > > > > >>> added
> > > > > > >>>>> as
> > > > > > >>>>>> UDFs and we will not need to mess with the internals of
> Pig.
> > > > Most
> > > > > > >>>>> probably,
> > > > > > >>>>>> there will be one new class for each operation (e.g.,
> union
> > or
> > > > > > >>>>>> intersection). I think it will be good to put these new
> > > > operations
> > > > > > >>> inside
> > > > > > >>>>>> the core of Pig so that users can use it without having to
> > > write
> > > > > the
> > > > > > >>>>> fully
> > > > > > >>>>>> qualified class name. Also, since there is no way to
> > > implicitly
> > > > > cast
> > > > > > >> a
> > > > > > >>>>>> spatial data type to a non-spatial data types, there will
> > not
> > > be
> > > > > any
> > > > > > >>>>>> conflicts in existing operations or new operations. All
> new
> > > > > > >> operations,
> > > > > > >>>>> and
> > > > > > >>>>>> only the new operations, will be working on spatial data
> > > types.
> > > > > Here
> > > > > > >> is
> > > > > > >>>>> an
> > > > > > >>>>>> initial list of operations that can be added. All those
> > > > operations
> > > > > > >> are
> > > > > > >>>>>> already implemented in JTS and the UDFs added to Pig will
> be
> > > > just
> > > > > > >>>>> wrappers
> > > > > > >>>>>> around them.
> > > > > > >>>>>> **Predicates (used for spatial filtering)
> > > > > > >>>>>> Equals
> > > > > > >>>>>> Disjoint
> > > > > > >>>>>> Intersects
> > > > > > >>>>>> Touches
> > > > > > >>>>>> Crosses
> > > > > > >>>>>> Within
> > > > > > >>>>>> Contains
> > > > > > >>>>>> Overlaps
> > > > > > >>>>>>
> > > > > > >>>>>> **Operations
> > > > > > >>>>>> Envelope
> > > > > > >>>>>> Area
> > > > > > >>>>>> Length
> > > > > > >>>>>> Buffer
> > > > > > >>>>>> ConvexHull
> > > > > > >>>>>> Intersection
> > > > > > >>>>>> Union
> > > > > > >>>>>> Difference
> > > > > > >>>>>> SymDifference
> > > > > > >>>>>>
> > > > > > >>>>>> **Aggregate functions
> > > > > > >>>>>> Accum
> > > > > > >>>>>> ConvexHull
> > > > > > >>>>>> Union
> > > > > > >>>>>>
> > > > > > >>>>>> 3- The third step is to implement spatial indexes (e.g.,
> > Grid
> > > or
> > > > > > >>>>> R-tree). A
> > > > > > >>>>>> Pig loader and Pig output classes will be created for
> those
> > > > > indexes.
> > > > > > >>> Note
> > > > > > >>>>>> that currently we have SpatialOutputFormat and
> > > > SpatialInputFormat
> > > > > > for
> > > > > > >>>>> those
> > > > > > >>>>>> indexes inside the Spatial Hadoop project, but we need to
> > > tweak
> > > > > them
> > > > > > >> to
> > > > > > >>>>>> work with Pig.
> > > > > > >>>>>>
> > > > > > >>>>>> 4- (Advanced) Implement more sophisticated algorithms for
> > > > spatial
> > > > > > >>>>>> operations that utilize the indexes. For example, we can
> > have
> > > a
> > > > > > >>> specific
> > > > > > >>>>>> algorithm for spatial range query or spatial join. Again,
> we
> > > > > already
> > > > > > >>> have
> > > > > > >>>>>> algorithms built for different operations implemented in
> > > Spatial
> > > > > > >> Hadoop
> > > > > > >>>>> as
> > > > > > >>>>>> MapReduce programs, but they will need to be modified to
> > work
> > > in
> > > > > Pig
> > > > > > >>>>>> environment and get to work with other operations.
> > > > > > >>>>>>
> > > > > > >>>>>> This is my whole plan for the spatial extension to Pig.
> I've
> > > > > already
> > > > > > >>>>>> started with the first step but as I mentioned earlier, I
> > > don't
> > > > > want
> > > > > > >> to
> > > > > > >>>>> do
> > > > > > >>>>>> the work for our project and then the work gets
> forgotten. I
> > > > want
> > > > > to
> > > > > > >>>>>> contribute to Pig and do my research at the same time. If
> > you
> > > > > think
> > > > > > >> the
> > > > > > >>>>>> plan is plausible, I'll open JIRA issues for the above
> tasks
> > > and
> > > > > > >> start
> > > > > > >>>>>> shipping patches to do the stuff. I'll conform with the
> > > > standards
> > > > > of
> > > > > > >>> the
> > > > > > >>>>>> project such as adding tests and well commenting the code.
> > > > > > >>>>>> Sorry for the long email and hope to hear back from you.
> > > > > > >>>>>>
> > > > > > >>>>>>
> > > > > > >>>>>> Best regards,
> > > > > > >>>>>> Ahmed Eldawy
> > > > > > >>>>>
> > > > > > >>>>>
> > > > > > >>>
> > > > > > >>>
> > > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
>
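
The bytearray workaround discussed in this thread (geometries travel through Pig as plain byte arrays and every function decodes them on each call) can be sketched roughly as follows. This is only an illustration under stated assumptions: the class and method names are hypothetical, it uses neither the Pig UDF API nor JTS, and it handles only a single 2D point encoded in the standard Well Known Binary (WKB) layout. A real extension would presumably delegate the encoding and decoding to JTS.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper for illustration only; not part of Pig or JTS.
final class WkbPointCodec {
    private static final byte LITTLE_ENDIAN_FLAG = 1; // WKB byte-order marker
    private static final int WKB_POINT = 1;           // WKB geometry type code for Point

    // Serialize a 2D point: 1 byte order flag, 4 byte type, two 8 byte doubles.
    static byte[] encodePoint(double x, double y) {
        ByteBuffer buf = ByteBuffer.allocate(1 + 4 + 8 + 8).order(ByteOrder.LITTLE_ENDIAN);
        buf.put(LITTLE_ENDIAN_FLAG).putInt(WKB_POINT).putDouble(x).putDouble(y);
        return buf.array();
    }

    // Deserialize a WKB point back into {x, y}, honoring the byte-order flag.
    static double[] decodePoint(byte[] wkb) {
        ByteBuffer buf = ByteBuffer.wrap(wkb);
        byte order = buf.get();
        buf.order(order == LITTLE_ENDIAN_FLAG ? ByteOrder.LITTLE_ENDIAN : ByteOrder.BIG_ENDIAN);
        int type = buf.getInt();
        if (type != WKB_POINT) {
            throw new IllegalArgumentException("not a WKB point, type code " + type);
        }
        double x = buf.getDouble();
        double y = buf.getDouble();
        return new double[] { x, y };
    }

    // Stand-in for the body of a UDF: both arguments arrive as raw bytes
    // and must be decoded on every call, which is the overhead noted above.
    static double distance(byte[] a, byte[] b) {
        double[] p = decodePoint(a);
        double[] q = decodePoint(b);
        return Math.hypot(p[0] - q[0], p[1] - q[1]);
    }
}
```

Every function written this way pays the decode cost per call and has to trust that the bytes it receives really are a geometry, which is the trade-off against having a first-class Geometry type in Pig.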
