Hello Chris!
On 26/02/14 07:59, Mattmann, Chris A (3980) wrote:
> How about making DefaultFeature leverage Apache Tika's Metadata [1]
> class? It's a key->multi-value structure, and uses Adobe XMP properties
> to represent the value distribution.
If I understand correctly, Tika runs before Lucene in order to
represent data from various sources (PDF, Office, TIFF, etc.) in a
uniform way that Lucene can index. Is that right?
Actually, one of our team members experimented with Tika, and we have
the feeling that it is a good match for the 'org.apache.sis.metadata'
package. The SIS metadata classes were necessary for ISO 19115 / 19139
support, but we should probably provide an adapter to Tika metadata for
Lucene indexing. It could be done in a "sis-tika" module in order to
keep the dependency in its dedicated module, as we did for "sis-netcdf"
for instance.
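To make the adapter idea more concrete, here is a minimal sketch of the
kind of flattening it would have to do: turning a nested metadata tree
into a key->multi-value structure. This is only an illustration under
assumptions. The class name MetadataFlattener, the dotted key
convention and the use of plain java.util maps are all hypothetical;
the maps merely stand in for the SIS metadata objects on one side and
the Tika Metadata class on the other. It uses neither the SIS nor the
Tika API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: flattens a nested metadata tree into a
 * key -> multi-value map. Plain collections stand in for both the
 * SIS metadata objects and the Tika Metadata class.
 */
public class MetadataFlattener {

    /** Returns "parent.child" keys, each associated with one or more values. */
    public static Map<String, List<String>> flatten(Map<String, Object> tree) {
        Map<String, List<String>> flat = new LinkedHashMap<>();
        collect("", tree, flat);
        return flat;
    }

    private static void collect(String prefix, Object node,
                                Map<String, List<String>> flat) {
        if (node instanceof Map<?, ?>) {
            // Nested metadata object: descend, extending the key path.
            for (Map.Entry<?, ?> e : ((Map<?, ?>) node).entrySet()) {
                String key = prefix.isEmpty()
                        ? String.valueOf(e.getKey())
                        : prefix + "." + e.getKey();
                collect(key, e.getValue(), flat);
            }
        } else if (node instanceof Iterable<?>) {
            // Collection property: every element is stored under the same key.
            for (Object item : (Iterable<?>) node) {
                collect(prefix, item, flat);
            }
        } else if (node != null) {
            // Leaf value: append its string form to the list for this key.
            flat.computeIfAbsent(prefix, k -> new ArrayList<>())
                .add(node.toString());
        }
    }
}
```

With such a helper, a tree holding a "citation" object with a "title"
property would produce a "citation.title" key, and a collection-valued
property would produce several values under a single key. A real
"sis-tika" module would of course need to decide how ISO 19115
property names map to Tika property names.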
The match may be less direct for Feature, since a Feature instance is
not really like metadata but rather like a single row in a database
table. In particular, we will later need to introduce FeatureType,
AttributeType and PropertyType for describing the "feature schema"
(similar to declaring the columns of a database table). There is also a
wish to follow the ISO 19109:2013 standard (which defines the
above-cited types), and have classes that we can map to GML (Geographic
Markup Language).
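To illustrate the row/schema analogy, here is a rough sketch of how a
feature type could play the role of the table declaration and a
feature the role of a row. All class and method names below are
hypothetical and heavily simplified. This is not the SIS API, it does
not attempt to follow ISO 19109, and PropertyType is left out
entirely.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of the schema/row analogy; not the SIS API. */
public class FeatureSketch {

    /** One "column": an attribute name and the class of its values. */
    public static final class AttributeType {
        private final String name;
        private final Class<?> valueClass;

        public AttributeType(String name, Class<?> valueClass) {
            this.name = name;
            this.valueClass = valueClass;
        }
    }

    /** The "table declaration": a named, ordered set of attribute types. */
    public static final class FeatureType {
        private final String name;
        private final Map<String, AttributeType> attributes = new LinkedHashMap<>();

        public FeatureType(String name, AttributeType... types) {
            this.name = name;
            for (AttributeType t : types) {
                attributes.put(t.name, t);
            }
        }

        public String getName() {
            return name;
        }

        /** Creates an empty "row" bound to this schema. */
        public Feature newInstance() {
            return new Feature(this);
        }
    }

    /** One "row": values are checked against the owning feature type. */
    public static final class Feature {
        private final FeatureType type;
        private final Map<String, Object> values = new LinkedHashMap<>();

        Feature(FeatureType type) {
            this.type = type;
        }

        public void set(String name, Object value) {
            AttributeType attribute = type.attributes.get(name);
            if (attribute == null) {
                throw new IllegalArgumentException("No attribute named " + name);
            }
            if (value != null && !attribute.valueClass.isInstance(value)) {
                throw new ClassCastException(name + " expects "
                        + attribute.valueClass.getName());
            }
            values.put(name, value);
        }

        public Object get(String name) {
            return values.get(name);
        }
    }
}
```

The point of the sketch is that the schema constrains which keys exist
and what type each value has, which is where a Feature differs from a
free-form key->multi-value structure.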
I think we should create a JIRA task for a "sis-tika" module mapping
metadata. Do you want me to do so? Mapping Feature could also be
investigated, but this seems less obvious to me than metadata. Do you
wish to elaborate on Feature-Tika mapping, or do we focus on metadata
for now?
Cheers,
Martin