+1 on the functionalities. The java.util.Map is fairly basic now. An improvement could be a Feature class that holds a map of <String, DataType>, where DataType corresponds to the appropriate xBase data type (http://www.clicketyclick.dk/databases/xbase/format/data_types.html). Currently I am converting everything to strings.
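As a rough illustration of what I mean (the names below, DataType and ShapefileFeature, are just hypothetical, not existing SIS classes):

    import java.util.HashMap;
    import java.util.Map;

    /** Subset of the xBase field types from the page linked above. */
    enum DataType {
        CHARACTER,   // 'C' - text
        NUMERIC,     // 'N' - number, stored as text in the .dbf
        FLOAT,       // 'F'
        LOGICAL,     // 'L' - boolean
        DATE         // 'D' - YYYYMMDD
    }

    /** One .dbf record, with each value decoded according to its declared type. */
    class ShapefileFeature {
        private final Map<String, DataType> types  = new HashMap<>();
        private final Map<String, Object>   values = new HashMap<>();

        void add(String field, DataType type, Object value) {
            types.put(field, type);
            values.put(field, value);
        }

        DataType getType(String field)  { return types.get(field); }
        Object   getValue(String field) { return values.get(field); }
    }

That way a consumer of the feature would get back, say, an Integer or a Date instead of the raw string.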
Another improvement may be to give an ordering to the fields, because fields have an intrinsic order. Maybe use something like this?
https://commons.apache.org/proper/commons-collections/apidocs/org/apache/commons/collections/map/ListOrderedMap.html

The bulk ingest would be an API where you call a jar from Hadoop, give it the appropriate directory of shapefiles in HDFS, and it processes each shapefile in its own mapper. The first ingest I am working on is a transformation of points into a 2D histogram, to get an idea of the density of features across all the shapefiles. This could be extended to support different types of output (store in a database, or in a more efficient format on HDFS). A rough mapper sketch is appended after the quoted text below.

Thanks,
Travis

On Thu, Jun 20, 2013 at 6:11 AM, Martin Desruisseaux <[email protected]> wrote:

> Hello Travis
>
> On 20/06/13 11:13, Travis L Pinney wrote:
>
>> Could the sis-storage be a "module" as well as have the ability to be
>> compiled to a sis-shapefile.jar that has fewer dependencies, for people who
>> only want to use the shapefile functionality? Maybe it can have two outputs
>> and generate a standalone artifact as well as be included in the larger
>> package.
>>
> I think that it depends on what we mean by using only the Shapefile
> functionality. Some functionalities that would probably require other SIS
> modules are:
>
>  * Allow people to know what is inside the shapefiles without relying
>    on a ShapefileStore-specific API (requires the sis-metadata module).
>  * Parse the map projection definition (will require the sis-referencing
>    module, after its completion).
>  * Leverage the index for faster access (may require sis-utility).
>
> Maybe more importantly, the current ShapefileStore exposes the features as
> a java.util.Map. I think that it is okay as a temporary solution, since SIS
> does not yet implement the Feature interface. But once a real Feature
> framework is provided in SIS, we should probably leverage it in the
> ShapefileStore class if we want a consistent API for the whole project...
>
> Furthermore, in a future SIS version we will start to implement Filters
> (i.e. allow the ShapefileStore to read only the data having some
> characteristics, for example only the data in some area of interest). Some
> filtering can be applied on-the-fly at reading time by the ShapefileStore,
> especially the filtering that can leverage the index. So ShapefileStore
> would depend on the filter classes (some filters may imply map projection,
> thus depending on sis-referencing, etc.).
>
> For all those reasons, it seems to me that a ShapefileStore without SIS
> dependencies would have very limited functionality in the medium/long term...
>
>> I want to write a shapefile input format for Hadoop for doing bulk ingests
>> of shapefiles. Where would be the best place to add this functionality?
>>
> Could you give more details about what the bulk ingests would perform
> exactly?
>
>     Martin
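Here is the rough mapper sketch mentioned above. Everything in it is hypothetical: it assumes some input format has already decoded the shapefile into one "x,y" point per record (in the one-shapefile-per-mapper variant, the mapper itself would loop over the records of its file instead). Each point is snapped to a coarse grid cell and a count of 1 is emitted per cell; a combiner/reducer then sums the counts into the histogram.

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    /** Hypothetical sketch: bins point coordinates into grid cells for a 2D histogram. */
    public class PointHistogramMapper extends Mapper<LongWritable, Text, Text, LongWritable> {

        private static final double CELL_SIZE = 1.0;          // bin size, e.g. 1 degree
        private static final LongWritable ONE = new LongWritable(1);
        private final Text cell = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Assumed record layout: "x,y" produced by a (hypothetical) shapefile RecordReader.
            String[] xy = value.toString().split(",");
            double x = Double.parseDouble(xy[0]);
            double y = Double.parseDouble(xy[1]);

            // Snap the point to its grid cell; the cell id becomes the key.
            long col = (long) Math.floor(x / CELL_SIZE);
            long row = (long) Math.floor(y / CELL_SIZE);
            cell.set(col + ":" + row);
            context.write(cell, ONE);                          // reducer sums counts per cell
        }
    }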
