Hi Johann,

I am not all that familiar with Jackrabbit, but after a little bit of reading, it looks like a good approach for maintaining a common environment for content management. You're right about the Tika/GDAL implementation living at the file access level. If the JCR APIs can use (or reuse) the aforementioned libs to gain access to a LOT of file formats, I'm pretty sure it will be good to go. How does the rest of the community feel about this?
Thanks,
Adam

On Jan 21, 2013, at 5:13 AM, johann sorel wrote:

> Hello everyone,
>
> Sorry for the late answer, I wasn't yet registered on this mailing list.
> Here is a quick introduction, since Martin already talked about me:
> I'm Johann Sorel from the same company and working on the geotoolkit project
> too. I mainly work on data readers/writers, rendering engines and Swing user
> interfaces, but also a bit on everything: metadata, coverage, security, web
> services.
>
> I have been looking at the Tika project. I have never used it, so correct me
> if I say something wrong.
> From what I see, it is limited to metadata reading only and reduced to file
> types.
> Writing is also something the Apache SIS project should provide, so I believe
> SIS should have a higher-level API that Tika could implement.
>
> About data sources, I propose a different approach: the Java Content
> Repository version 2 (JCR) specification (JSR 170 and 283).
> A possible implementation is Apache Jackrabbit: http://jackrabbit.apache.org
> While Tika might be interesting for metadata, the JCR specification defines
> APIs for reading, writing and queries.
> Besides, the community using JCR is far larger than that of Tika or GDAL; to
> name some of them: Liferay, eXo Platform, Oracle Beehive, Hippo CMS, ...
> Reusing the same or a similar model would simplify the integration of the SIS
> model in existing applications,
> and we would benefit from the expertise already invested in this
> specification.
> The JCR model is very similar to features: it has Nodes and NodeTypes, which I
> believe might be usable for metadata too.
>
> Filter would be placed just before data source, since it should have a query
> API which uses filters.
>
> If I can give a global view of the solution we have so far:
> (I won't talk about referencing; Martin has much more knowledge than me on
> this topic)
>
> 1) we have 3 base storage atoms: Metadata, Feature (and underneath, Geometry),
>    Coverage
>    --> defined by several OGC/ISO specifications
> 2) to interrogate them we can use: Filter, Expression, Query
>    --> defined by OGC (exist in geoapi-pending); Query --> defined in JCR
> 3) to manage/query/analyze them: Repository/DataSource/DataStore
>    --> can be based on JCR, GDAL, Tika models or a mix
> 4) to render the data: style model, map model
>    --> can be OGC SLD/SE (exist in geoapi-pending), could also be some kind of
>        CSS,
>    --> the map model could be OGC WMC, but this spec is limited to web; it
>        would require some improvements.
>
> Some of those solutions are already implemented and have been properly
> separated into interfaces (geoapi-pending) and implementations
> (geotoolkit-pending), so they could be used as a starting point.
>
>
> Johann Sorel
> Geomatys
>
> -------------------------------------------------------------------------------
> Hey Martin,
>
> On 1/18/13 12:12 PM, "Martin Desruisseaux"
> <[email protected]> wrote:
>
> >On 18/01/13 11:31, Adam Estrada wrote:
> >> Spot on with Tika being an SIS dependency, Martin! The idea is to be able
> >> to extract content from as many file formats as possible based on their
> >> MIME types. GDAL provides the interface to a lot more geospatial formats.
> >
> >We have the notion of a "data source" interface (not yet committed), and
> >Tika or GDAL can be one of them. GeoTIFF, NetCDF, etc. are other data
> >sources (we have some extra flexibility if we read NetCDF files directly
> >rather than through GDAL for instance, but we would do that only for the
> >most important formats rather than duplicating the totality of GDAL).
> >However, "data sources" appear downstream relative to metadata and other
> >basic modules.
> >A list of modules in approximate dependency order can be:
> >
> >  - utility
> >  - metadata
> >  - referencing
> >  - geometry
> >  - feature
> >  - coverage
> >  - data source    <-- Tika/GDAL can be plugged here
> >  - styles
> >  - renderer
>
> +1 that makes sense to me.
>
> Note I also believe there is another dependency from Tika to SIS
> (especially for the WKT parsing).
>
> >I'm not sure if "filter" would be before or after "data source" - Johann
> >Sorel would know better (I think he is watching this list, even if he
> >hasn't sent emails yet).
>
> Come on Johann, come out and say hi! :)
>
> >Actually the "sis-metadata" module being built is not about arbitrary
> >metadata, but rather about the "lingua franca" to be used in SIS for
> >metadata. Many metadata models could be chosen for this purpose, but the
> >proposed SIS approach is to select ISO standards as the lingua franca.
> >All other sources of metadata would need to be converted to ISO 19115
> >before being used in a source-independent way by all SIS modules. This
> >is the purpose, for instance, of the NetCDF - ISO mapping mentioned in a
> >previous email. This explains why "data source", which is where
> >input/output happens, is so far away from metadata in the above
> >dependency chain; all preceding modules define the models which will
> >represent the data read by the data sources.
>
> It would be great to use Tika to convert *insert format here* to ISO 19115
> if possible.
>
> >Obviously the XML (un)marshalling is an exception to what I just said,
> >since it is defined straight in the core metadata module rather than as
> >a data source. But we should have (I hope) few such exceptions.
> >This exception exists for two reasons: 1) as a side effect of the way JAXB
> >works (annotations straight in the source code), and 2) because while
> >ISO 19115 would be the "lingua franca" for the conceptual model, XML is
> >the "lingua franca" for the file format at least at OGC/ISO/INSPIRE, so
> >maybe it deserves that special treatment...
>
> +1.
>
> Cheers,
> Chris
>
> >    Martin
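
For what it's worth, the "data source" plug-in point from Martin's module list could be sketched roughly as below, with a dedicated reader (e.g. NetCDF read directly) tried before a general-purpose backend such as GDAL or Tika. None of this is existing SIS, Tika or GDAL API: DataSource, SimpleDataSource, DataSourceRegistry and the MIME type strings are made-up names for illustration only, and the backends are placeholders that do no real I/O.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

/** Hypothetical plug-in contract; not an actual SIS, Tika or GDAL interface. */
interface DataSource {
    /** Short label for the backend, e.g. "netcdf-direct". */
    String name();

    /** Whether this backend claims to handle the given MIME type. */
    boolean canRead(String mimeType);
}

/** Placeholder backend that only records which MIME types it claims to handle. */
final class SimpleDataSource implements DataSource {
    private final String name;
    private final List<String> mimeTypes;

    SimpleDataSource(String name, List<String> mimeTypes) {
        this.name = name;
        this.mimeTypes = new ArrayList<>(mimeTypes);
    }

    @Override public String name() { return name; }

    @Override public boolean canRead(String mimeType) { return mimeTypes.contains(mimeType); }
}

/** Keeps data sources in registration order; the first one that matches wins. */
final class DataSourceRegistry {
    private final List<DataSource> sources = new ArrayList<>();

    void register(DataSource source) { sources.add(source); }

    Optional<DataSource> find(String mimeType) {
        for (DataSource source : sources) {
            if (source.canRead(mimeType)) {
                return Optional.of(source);
            }
        }
        return Optional.empty();
    }
}

final class DataSourceDemo {
    public static void main(String[] args) {
        DataSourceRegistry registry = new DataSourceRegistry();
        // Dedicated reader first, general fallback second (illustrative MIME types).
        registry.register(new SimpleDataSource("netcdf-direct",
                Arrays.asList("application/x-netcdf")));
        registry.register(new SimpleDataSource("gdal-fallback",
                Arrays.asList("application/x-netcdf", "image/tiff")));

        for (String type : Arrays.asList("application/x-netcdf", "image/tiff", "text/plain")) {
            String chosen = registry.find(type).map(DataSource::name).orElse("(no data source)");
            System.out.println(type + " -> " + chosen);
        }
    }
}
```

Registration order is the only priority mechanism here, which is just the simplest way to express "read the most important formats directly, fall back to GDAL or Tika for the rest"; a real design would presumably sit downstream of the metadata and feature models and return those types rather than a name.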
