Those JIRAs do best that are completed by one person driving them.

On Mon, Jun 3, 2013 at 10:26 AM, Ahmed Eldawy <aseld...@gmail.com> wrote:

I've just created a new JIRA issue for the spatial functionality.
https://issues.apache.org/jira/browse/PIG-3344
This issue is all about the new datatype, which is the only thing that
needs to be changed internally in Pig in this phase. Pigeon is already
working with the ESRI library, but it converts back and forth between the
binary representation and the Geometry class. Once the new datatype is
added, we can change Pigeon to work with this datatype too. We can still
keep the current conversion functionality, since it lets the system convert
from the bytearray datatype automatically, which gives us autodetection
when a column is not given a type in the schema.

I don't know if I should provide a patch for this issue myself or if there
is someone else who can work on it. I can of course do it, but I think it
will take me some time to finish as I'm not yet familiar with the internals
of Pig. Someone who is familiar with the parser would definitely do a better
job here. I can focus on Pigeon and add more spatial functions there so
that we have plenty of functions once the new datatype is added. I'm open
to both solutions; I'm just checking with you.

Thanks
Ahmed

Best regards,
Ahmed Eldawy

On Wed, May 29, 2013 at 12:17 PM, Russell Jurney
<russell.jur...@gmail.com> wrote:

Awesome. This would be a great addition to Pig. Please create a JIRA.

Russell Jurney http://datasyndrome.com

On May 29, 2013, at 8:51 AM, Ahmed Eldawy <aseld...@gmail.com> wrote:

Hi all,

Nick has pointed out to me an alternative GIS package that can replace JTS.
ESRI has recently released a GIS package
<https://github.com/Esri/geometry-api-java> under the Apache license.
I changed Pigeon to work with that new package.
I think it could be easier now to integrate this work with the main branch
of Apache Pig. I will go on with the current project and add more spatial
functionality. We can then add a new datatype to Apache and link it to
those functions.

The ESRI package contains a class OGCGeometry
<http://esri.github.io/geometry-api-java/javadoc/com/esri/core/geometry/ogc/OGCGeometry.html>
which can be linked to a new datatype 'Geometry'. Do you think we can rely
on the new package and integrate the work with Apache Pig?

On May 23, 2013 11:40 PM, "Ahmed Eldawy" <aseld...@gmail.com> wrote:

Hi all,
Thanks for your help. I've started the project with minimal functionality.
It's currently hosted on github. It is licensed under the Apache public
license to make it easier to merge with Pig. Currently it has only a very
few functions. I implemented one function of each of a few different types
(e.g., aggregate and create). I'll keep adding functions, and any
contributions to the project are welcome. To begin with, I need an ANT
build file that runs the tests, compiles and generates a jar file. I'm not
familiar with ANT, so any help with this is appreciated.
Here's the project home page
https://github.com/aseldawy/pigeon

If you have any comments or suggestions please contact me.

Best regards,
Ahmed Eldawy

On Mon, May 6, 2013 at 3:09 PM, Jonathan Coveney <jcove...@gmail.com> wrote:

Nick: the only issue is that the way types are implemented in Pig doesn't
allow us to easily "plug in" types externally. Adding support for that
would be cool, but a fair bit of work.

2013/5/6 Nick Dimiduk <ndimi...@gmail.com>

I'm not a lawyer, but I see no reason why this cannot be an external
extension to Pig. It would behave the same way PostGIS is an external
extension to Postgres. Any Apache issues would be toward general
purpose enhancements, not specific to your project.

Good on you!
-n

On Mon, May 6, 2013 at 10:12 AM, Ahmed Eldawy <aseld...@gmail.com> wrote:

I contacted the Solr developers to see how JTS can be included in an Apache
project. See
http://mail-archives.apache.org/mod_mbox/lucene-dev/201305.mbox/raw/%3C1367815102914-4060969.post%40n3.nabble.com%3E/
As far as I understand, they did not include it in the main Solr project;
rather, they created a separate project (Spatial4j) which is still
licensed under the Apache license and refers to JTS. Users have to
download the JTS libraries separately to make it run. That's pretty much
the same plan that Jonathan mentioned. We will still have the overhead of
serializing/deserializing the shapes each time a function is called. Also,
we will have to use the ugly bytearray data type for spatial data instead
of creating its own data type (e.g., Geometry).
I think using Spatial4j instead of JTS will not be sufficient for our case,
as we need to provide access to all the spatial functions of JTS such as
Union, Intersection, Difference, etc. This way we can claim conformity
with OGC standards, which gives visibility and the appreciation of the
spatial community.
I also think this means I will not add any issues to JIRA, as it is now
a separate project.
I'm planning to host it on github and have all the issues there.
Let me know if you have any suggestions or comments.

Thanks
Ahmed

Best regards,
Ahmed Eldawy

On Mon, May 6, 2013 at 9:53 AM, Jonathan Coveney <jcove...@gmail.com> wrote:

You can give them all the same label or tag and filter on that later on.

2013/5/6 Ahmed Eldawy <aseld...@gmail.com>

Thanks all for taking the time to respond. Daniel, I didn't know that Solr
uses JTS. This is a good finding and we can definitely ask them to see if
there is a workaround we can use. Jonathan, I thought of the same idea of
serializing/deserializing a bytearray each time a UDF is called. The
deserialization part is good for letting Pig autodetect spatial types if
they are not set explicitly in the schema. What is the best way to start
this? I want to add an initial set of JIRA issues and start working on
them, but I also need to keep the work grouped in some sense, just for
organization.

Thanks
Ahmed

Best regards,
Ahmed Eldawy

On Sat, May 4, 2013 at 4:47 PM, Jonathan Coveney <jcove...@gmail.com> wrote:

I agree that this is cool, and if other projects are using JTS it is worth
talking to them to see how. I also agree that licensing is very frustrating.

In the short term, however, while it is annoying to have to manage the
serialization and deserialization yourself, you can have the geometry type
be passed around as a bytearray type. Your UDFs will have to know this and
treat it accordingly, but if you did this then all of the tools could be in
an external project on github instead of a branch in Pig. Then, if we can
get the licensing done, we could add the Geometry type to Pig. Adding
types, honestly, is kind of tedious but not super difficult, so once the
rest is done, that shouldn't be too difficult.

2013/5/4 Russell Jurney <russell.jur...@gmail.com>

If a way could be found, this would be an awesome addition to Pig.

Russell Jurney http://datasyndrome.com

On May 3, 2013, at 4:09 PM, Daniel Dai <da...@hortonworks.com> wrote:

I am not sure how other Apache projects are dealing with it. It seems Solr
also has some connector to JTS?

Thanks,
Daniel

On Thu, May 2, 2013 at 11:59 AM, Ahmed Eldawy <aseld...@gmail.com> wrote:

Thanks Alan for your interest. It's too bad that an open source licensing
issue is holding me back from doing some open source work. I understand the
issue and your workarounds make sense.
However, as I mentioned in the beginning, I don't want to have my own
branch of Pig because it makes my extension less portable. I'll think of
another way to do it. I'll ask Vivid Solutions if they can dual license
their code, although I think the answer will be no. I'll also think of a
way to ship my extension as a set of jar files without the need to change
the core of Pig. This way, it can be easily ported to newer versions of Pig.

Thanks
Ahmed

Best regards,
Ahmed Eldawy

On Thu, May 2, 2013 at 12:33 PM, Alan Gates <ga...@hortonworks.com> wrote:

I know this is frustrating, but the different licenses do have different
requirements that make it so that Apache can't ship GPL code. A legal
explanation is at http://www.apache.org/licenses/GPL-compatibility.html
For additional info on the LGPL specific questions see
http://www.apache.org/legal/3party.html

As far as pulling it in via ivy, the issue isn't so much where the code
lives as what code we are requiring to make Pig work. If something
that is [L]GPL is required for Pig, it violates Apache rules as outlined
above.
It also would be a show stopper for a lot of companies that redistribute
Pig and that are allergic to GPL software.

So, as I said before, if you wanted to continue with that library and they
are not willing to relicense it, then it would have to be bolted on after
Apache Pig is built. Nothing stops you from doing this by downloading
Apache Pig, adding this library and your code, and redistributing, though
it wouldn't then be open to all Pig users.

Alan.

On May 1, 2013, at 6:08 PM, Ahmed Eldawy wrote:

Thanks for your response. I was never good at differentiating all those
open source licenses. I mean, what is the point of making open source
licenses if they block me from using a library in an open source project?
Anyway, I'm not going to debate that here. Just one question: if we use JTS
as a library (jar file) without adding the code to Pig, is it still a
violation? We'll use ivy, for example, to download the jar file when
compiling.

On May 1, 2013 7:50 PM, "Alan Gates" <ga...@hortonworks.com> wrote:

Passing on the technical details for a moment, I see a licensing issue.
JTS is licensed under LGPL. Apache projects cannot contain or ship [L]GPL.
Apache does not meet the requirements of GPL and thus we cannot repackage
their code. If you wanted to go forward using that class, this would have
to be packaged as an add-on that was downloaded separately and not from
Apache. Another option is to work with the JTS community and see if they
are willing to dual license their code under a BSD or Apache license so
that Pig could include it. If neither of those is an option, you would need
to come up with a new class to contain your spatial data.

Alan.

On May 1, 2013, at 5:40 PM, Ahmed Eldawy wrote:

Hi all,
First, sorry for the long email. I wanted to put all my thoughts here and
get your feedback.
I'm proposing a major addition to Pig that will greatly increase its
functionality and user base. It is simply to add spatial support to the
language and the framework. I've already started working on that, but I
don't want it to be just another branch. I want it, eventually, to be
merged with the trunk of Apache Pig.
So, I'm sending this email mainly to reach out to the main contributors of
Pig to see the feasibility of this.
This addition is part of a big project we have been working on at the
University of Minnesota; the project is called Spatial Hadoop.
http://spatialhadoop.cs.umn.edu. It's about building a MapReduce framework
(Hadoop) that is capable of maintaining and analyzing spatial data
efficiently. I'm the main guy behind that project, and since we released
its first version we have received very encouraging responses from
different groups in the research and industrial community. I'm sure the
addition we want to make to Pig Latin will be widely accepted by people in
the spatial community.
I'm proposing a plan here while we're still in the early phases of this
task, so that I can discuss it with the main contributors and see its
feasibility. First of all, I think that we need to change the core of Pig
to be able to support spatial data. Providing a set of UDFs only is not
enough.
The main reason is that Pig Latin does not provide a way to create a new
data type, which is needed for spatial data. Once we have the spatial data
types we need, the functionality can be expanded using more UDFs.

Here's the plan as I see it.
1- Introduce a new primitive data type Geometry which represents all
spatial data types. In the underlying system, this will map to
com.vividsolutions.jts.geom.Geometry. This is a class from Java Topology
Suite (JTS) [http://www.vividsolutions.com/jts/JTSHome.htm], a stable and
efficient open source Java library for spatial data types and algorithms.
It is very popular in the spatial community and a C++ port of it is used in
PostGIS [http://postgis.net/] (a spatial library for Postgres). JTS also
conforms to the standards of the Open Geospatial Consortium (OGC)
[http://www.opengeospatial.org/], which publishes open standards for
spatial data types. The Geometry data type is read from and written to text
files using the Well Known Text (WKT) format.
There is also a way to convert it to/from binary so that it can work with
binary files and streams.
2- Add functions that manipulate spatial data types. These will be added as
UDFs and we will not need to mess with the internals of Pig. Most probably,
there will be one new class for each operation (e.g., union or
intersection). I think it will be good to put these new operations inside
the core of Pig so that users can use them without having to write the
fully qualified class name. Also, since there is no way to implicitly cast
a spatial data type to a non-spatial data type, there will not be any
conflicts between existing operations and new operations. All new
operations, and only the new operations, will be working on spatial data
types. Here is an initial list of operations that can be added. All those
operations are already implemented in JTS and the UDFs added to Pig will be
just wrappers around them.

**Predicates (used for spatial filtering)
Equals
Disjoint
Intersects
Touches
Crosses
Within
Contains
Overlaps

**Operations
Envelope
Area
Length
Buffer
ConvexHull
Intersection
Union
Difference
SymDifference

**Aggregate functions
Accum
ConvexHull
Union

3- The third step is to implement spatial indexes (e.g., Grid or R-tree). A
Pig loader and Pig output classes will be created for those indexes. Note
that currently we have SpatialOutputFormat and SpatialInputFormat for those
indexes inside the Spatial Hadoop project, but we need to tweak them to
work with Pig.

4- (Advanced) Implement more sophisticated algorithms for spatial
operations that utilize the indexes. For example, we can have a specific
algorithm for spatial range query or spatial join.
Again, we already have algorithms built for different operations
implemented in Spatial Hadoop as MapReduce programs, but they will need to
be modified to work in the Pig environment and to work with other
operations.

This is my whole plan for the spatial extension to Pig. I've already
started with the first step, but as I mentioned earlier, I don't want to do
the work for our project and then have the work get forgotten. I want to
contribute to Pig and do my research at the same time. If you think the
plan is plausible, I'll open JIRA issues for the above tasks and start
shipping patches. I'll conform to the standards of the project, such as
adding tests and commenting the code well.
Sorry for the long email and hope to hear back from you.

Best regards,
Ahmed Eldawy

--
Russell Jurney twitter.com/rjurney russell.jur...@gmail.com datasyndrome.com
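
The bytearray workaround discussed in this thread (keep the geometry
serialized, pass it between UDFs as a bytearray, and deserialize it on
every call) can be sketched roughly as follows. This is only an
illustration under stated assumptions, not code from Pig, Pigeon, JTS or
the ESRI library: the class and method names are hypothetical, it handles
2D points only, and it uses the standard Well Known Binary (WKB) point
layout (one byte-order byte, a 4-byte type code of 1, then x and y as
8-byte doubles). A real UDF would delegate the parsing to the geometry
library instead.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper for illustration; not part of Pig, Pigeon, JTS or the ESRI library.
final class WkbPointSketch {
    private static final byte LITTLE_ENDIAN_FLAG = 1; // WKB byte-order marker (1 = little endian)
    private static final int WKB_POINT_TYPE = 1;      // WKB geometry type code for Point
    private static final int WKB_POINT_SIZE = 1 + 4 + 8 + 8;

    private WkbPointSketch() {}

    /** Serializes a 2D point as 21 bytes of WKB: byte-order flag, type code, x, y. */
    static byte[] pointToWkb(double x, double y) {
        ByteBuffer buf = ByteBuffer.allocate(WKB_POINT_SIZE).order(ByteOrder.LITTLE_ENDIAN);
        buf.put(LITTLE_ENDIAN_FLAG).putInt(WKB_POINT_TYPE).putDouble(x).putDouble(y);
        return buf.array();
    }

    /** Deserializes a WKB point into {x, y}; rejects anything that is not a 2D point. */
    static double[] wkbToPoint(byte[] wkb) {
        if (wkb == null || wkb.length != WKB_POINT_SIZE) {
            throw new IllegalArgumentException("expected " + WKB_POINT_SIZE + " bytes of WKB");
        }
        ByteBuffer buf = ByteBuffer.wrap(wkb);
        // The first byte says how the remaining fields are encoded.
        buf.order(buf.get() == LITTLE_ENDIAN_FLAG ? ByteOrder.LITTLE_ENDIAN : ByteOrder.BIG_ENDIAN);
        int type = buf.getInt();
        if (type != WKB_POINT_TYPE) {
            throw new IllegalArgumentException("not a WKB point, type code " + type);
        }
        double x = buf.getDouble();
        double y = buf.getDouble();
        return new double[] { x, y };
    }

    /** The per-call cost mentioned in the thread: both inputs are decoded on every invocation. */
    static double distance(byte[] wkbA, byte[] wkbB) {
        double[] a = wkbToPoint(wkbA);
        double[] b = wkbToPoint(wkbB);
        return Math.hypot(a[0] - b[0], a[1] - b[1]);
    }
}
```

For example, WkbPointSketch.distance(WkbPointSketch.pointToWkb(0, 0),
WkbPointSketch.pointToWkb(3, 4)) decodes both byte arrays before computing
the distance. That repeated decoding is the overhead a native Geometry
datatype would avoid.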