Hey Travis,

I would strongly urge you to do development on Apache SIS on Apache hardware. GitHub is great and convenient, but when you commit there, we don't get email notifications and so forth here, and the community loses out (and we lose out) on having email records, archives, and other things here that show work is going on in SIS.
I have a simple proposal :) You guys are definitely more Git fans now than SVN fans. Martin D, when he originally came onto the project, wanted to use Git and was more familiar with it, but took great effort to adopt SVN b/c ASF support for Git at that time was quite limited. However, with you here now, with Adam, with Martin, and with a number of other folks contributing (Joe W., are you a Git guy?) who are Git fans, it's worth revisiting this discussion.

However, *after* 0.3 :) Let's release that using SVN so we don't hold it off anymore. After 0.3 maybe we can move to Git if this discussion is favorable.

Apache now supports writeable Git repos (see http://git.apache.org/) and the project's canonical repository can be Git. We can still mirror to GitHub, etc., but the bits (and really the work) ought to be happening here at the ASF.

So, discuss please :) FWIW, I'm +1 to move to Git (after 0.3).

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: [email protected]
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

-----Original Message-----
From: Travis L Pinney <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Thursday, June 20, 2013 7:31 AM
To: dev <[email protected]>
Subject: Re: shapefile branch

>Good to know about the OGC/ISO interfaces.
>
>It would make sense to apply processing to NetCDF, Shapefile, Mbtiles files
>etc. I can set that up in another code repo on GitHub. The reason I want to
>work on that concurrently is to stress test the existing library with lots
>of data to find bugs that may not appear with simple unit tests.
>
>Thanks,
>Travis
>
>On Thu, Jun 20, 2013 at 7:42 AM, Martin Desruisseaux
><[email protected]> wrote:
>
>> On 20/06/13 12:47, Travis L Pinney wrote:
>>
>>> The java.util.Map is fairly basic now. An improvement could be a feature
>>> class that has a map of <String, DataType>, where DataType corresponds
>>> to the appropriate DataType
>>> (http://www.clicketyclick.dk/databases/xbase/format/data_types.html).
>>> Currently I am converting everything to strings.
>>
>> Actually Feature, FeatureType and related interfaces derived from OGC/ISO
>> standards (in particular GML - Geography Markup Language - schemas) are
>> already provided in GeoAPI:
>>
>> http://www.geoapi.org/snapshot/pending/org/opengis/feature/package-summary.html
>>
>> This is in the "pending" part of GeoAPI, so we have room for revising
>> them, in particular to make sure that they are still in agreement with
>> the latest OGC/ISO standards. Then we would need to provide an
>> implementation in SIS, porting Geotk classes when possible or
>> appropriate. However there is a somewhat long road before we reach that
>> point, so it seems to me that your current approach (String in
>> java.util.Map) is good in the meantime.
>>
>>> The bulk ingests would be an API where you can call a jar file from
>>> Hadoop, give it the appropriate directory to pull shapefiles from in
>>> HDFS, and it would process each shapefile per mapper. The first ingest I
>>> am working on is a transformation of points to a 2D-histogram to get an
>>> idea of the density of features of all the shapefiles. This could be
>>> extended to have different types of outputs (store in a database or a
>>> more efficient format on HDFS).
>>
>> I would suggest separating the two tasks. I think that the above is what
>> we call a "processing", which is the subject of (yet another) OGC
>> standard. Processing and DataStore should be independent, i.e. someone
>> may want to apply the above processing on NetCDF files too... Maybe we
>> can focus on ShapefileStore first, and revisit processing later?
>> Processings will need DataStores first in order to perform their work
>> anyway...
>>
>> Martin
