Hi Johann,

I am not all that familiar with Jackrabbit but after a little bit of reading, 
it looks like a good approach for maintaining a common environment for content 
management. You're right about the Tika/GDAL implementation living at the file 
access level. If the JCR APIs can use (or reuse) the aforementioned libs to 
gain access to a LOT of file formats, I'm pretty sure it will be good to go. 
How does the rest of the community feel about this?

Thanks,
Adam

On Jan 21, 2013, at 5:13 AM, johann sorel wrote:

> Hello everyone,
> 
> Sorry for the late answer, I wasn't yet registered on this mailing list.
> Here is a quick introduction, since Martin already talked about me:
> I'm Johann Sorel, from the same company and also working on the geotoolkit 
> project. I mainly work on data readers/writers, rendering engines and Swing 
> user interfaces, but also a bit on everything: metadata, coverage, security, 
> web services.
> 
> I have been looking at the Tika project. I have never used it, so correct me 
> if I say something wrong.
> From what I see, it is limited to metadata reading only and restricted to file 
> types.
> Writing is also something the Apache SIS project should provide, so I believe 
> SIS should have a higher-level API that Tika could implement.
> 
> About data source, I propose a different approach: the Java Content Repository 
> version 2 (JCR) specification (JSR 170 and 283).
> A possible implementation is Apache Jackrabbit: http://jackrabbit.apache.org
> While Tika might be interesting for metadata, the JCR specification defines 
> APIs for reading, writing and queries.
> Besides, the community using JCR is far larger than that of Tika or GDAL; to 
> name some of its users: Liferay, eXo Platform, Oracle Beehive, Hippo CMS, ...
> Reusing the same or a similar model would simplify the integration of the SIS 
> model in existing applications,
> and we would benefit from the expertise already invested in this specification.
> The JCR model is very similar to features; it has Nodes and NodeTypes, which I 
> believe might be usable for metadata too.
> 
> Filter would be placed just before data source, since the data source should 
> have a query API which uses filters.
> 
> If I can give a global view of the solution we have so far:
> (I won't talk about referencing; Martin has much more knowledge than me on 
> this topic)
> 
> 1) we have 3 base storage atoms: Metadata, Feature (and underneath, Geometry), 
> Coverage
>  --> defined by several OGC/ISO specifications
> 2) to interrogate them we can use: Filter, Expression, Query
>  --> defined by OGC (exist in geoapi-pending); Query --> defined in JCR
> 3) to manage/query/analyze them: Repository/DataSource/DataStore
>  --> can be based on the JCR, GDAL or Tika models, or a mix
> 4) to render the data: style model, map model
>  --> can be OGC SLD/SE (exist in geoapi-pending), could also be some kind of 
> CSS,
>  --> the map model could be OGC WMC, but this spec is limited to the web; it 
> would require some improvements.
> 
> Some of those solutions are already implemented and have been properly 
> separated
> into interfaces (geoapi-pending) and implementations (geotoolkit-pending), so 
> they could be used as a starting point.
> 
> 
> Johann Sorel
> Geomatys
> 
> 
> 
> 
> 
> -------------------------------------------------------------------------------
> Hey Martin,
> 
> On 1/18/13 12:12 PM, "Martin Desruisseaux"
> <[email protected]> wrote:
> 
> >Le 18/01/13 11:31, Adam Estrada a écrit :
> >> Spot on with Tika being an SIS dependency, Martin! The idea is to be able
> >> to extract content from as many file formats as possible based on their
> >> MIME types. GDAL provides the interface to a lot more geospatial formats.
> >
> >We have the notion of "data source" interface (not yet committed), and
> >Tika or GDAL can be one of them. GeoTIFF, NetCDF, etc. are other data
> >sources (we have some extra flexibility if we read NetCDF files directly
> >rather than through GDAL for instance, but we would do that only for the
> >most important formats instead of duplicating the totality of GDAL).
> >However "data sources" appear downstream relative to metadata and other
> >basic modules. A list of modules in approximate dependency order can be:
> >
> >  - utility
> >  - metadata
> >  - referencing
> >  - geometry
> >  - feature
> >  - coverage
> >  - data source   <-- Tika/GDAL can be plugged here
> >  - styles
> >  - renderer
> 
> +1 that makes sense to me.
> 
> Note I also believe there is another dependency from Tika to SIS
> (especially for the WKT parsing).
> 
> >
> >I'm not sure if "filter" would be before or after "data source" - Johann
> >Sorel would know better (I think he is watching this list, even if he
> >hasn't sent emails yet).
> 
> Come on Johann, come out and say hi! :)
> 
> >
> >Actually the "sis-metadata" module being built is not about arbitrary
> >metadata, but rather about the "lingua franca" to be used in SIS for
> >metadata. Many metadata models could be chosen for this purpose, but the
> >proposed SIS approach is to select ISO standards as the lingua franca.
> >All other sources of metadata would need to be converted to ISO 19115
> >before being used in a source-independent way by all SIS modules. This
> >is the purpose for instance of the NetCDF - ISO mapping mentioned in
> >previous email. This explains why "data source", which is where
> >input/output happens, is so far away from metadata in the above
> >dependency chain; all preceding modules define the models which will
> >represent the data read by the data sources.
> 
> It would be great to use Tika to convert *insert format here* to ISO 19115
> if possible.
> 
> >
> >Obviously the XML (un)marshalling is an exception to what I just said,
> >since it is defined straight in the core metadata module rather than as
> >a data source. But we should have (I hope) few such exceptions. This
> >exception exists for two reasons: 1) as a side effect of the way JAXB
> >works (annotations straight in the source code), and 2) because while
> >ISO 19115 would be the "lingua franca" for the conceptual model, XML is
> >the "lingua franca" for the file format at least at OGC/ISO/INSPIRE, so
> >maybe it deserves that special treatment...
> 
> +1.
> 
> Cheers,
> Chris
> 
> >
> >     Martin
> >
> 
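
The "data source" plug-in point discussed in this thread (Tika, GDAL or a native reader selected by MIME type, each returning metadata in one common model) could be sketched roughly as follows. This is only an illustration under stated assumptions: every type and method name below (DataSource, Registry, CommonMetadata, fake, demoRegistry) is hypothetical and does not come from Apache SIS, Tika, GDAL or JCR, the backends are toy stand-ins that read nothing, and the MIME strings are illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: none of these types exist in Apache SIS, Tika, GDAL or JCR.
class DataSourceSketch {

    /** Stand-in for the common "lingua franca" metadata model (ISO 19115 in the thread). */
    static final class CommonMetadata {
        final String title;
        final Map<String, String> properties;

        CommonMetadata(String title, Map<String, String> properties) {
            this.title = title;
            this.properties = properties;
        }
    }

    /** The plug-in point: each backend (Tika, GDAL, a native reader...) would implement this. */
    interface DataSource {
        String name();
        boolean canRead(String mimeType);
        CommonMetadata readMetadata(String location);
    }

    /** Returns the first registered data source that accepts a given MIME type. */
    static final class Registry {
        private final List<DataSource> sources = new ArrayList<>();

        void register(DataSource source) {
            sources.add(source);
        }

        Optional<DataSource> find(String mimeType) {
            return sources.stream().filter(s -> s.canRead(mimeType)).findFirst();
        }
    }

    /** Toy backend used only to exercise the registry; it reads nothing from disk. */
    static DataSource fake(final String label, String... mimeTypes) {
        final List<String> accepted = Arrays.asList(mimeTypes);
        return new DataSource() {
            @Override public String name() { return label; }
            @Override public boolean canRead(String mimeType) { return accepted.contains(mimeType); }
            @Override public CommonMetadata readMetadata(String location) {
                // A real backend would convert its native metadata to the common model here.
                return new CommonMetadata(location, Collections.singletonMap("source", label));
            }
        };
    }

    /** Registers a native reader first, then a generic fallback (the order is an assumption). */
    static Registry demoRegistry() {
        Registry registry = new Registry();
        registry.register(fake("native-netcdf", "application/x-netcdf"));
        registry.register(fake("gdal-like", "image/tiff", "application/x-netcdf"));
        return registry;
    }

    public static void main(String[] args) {
        Registry registry = demoRegistry();
        for (String mime : Arrays.asList("application/x-netcdf", "image/tiff", "text/plain")) {
            String chosen = registry.find(mime).map(DataSource::name).orElse("(none)");
            System.out.println(mime + " -> " + chosen);
        }
    }
}
```

Registering the native reader ahead of the generic one mirrors the idea of reading the most important formats directly while leaving the rest to a broader library; a real design would also need writing and query support, which this sketch leaves out.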
