+1 to making it work for vector formats too -- geospatial imagery was just the first notch to tackle... :)
Cheers,
Chris

On Feb 26, 2012, at 11:09 AM, Joe White wrote:

> Chris,
> One other thing occurred to me while looking at this. All of the discussion
> I've seen thus far revolves around geospatial imagery. Has there been any
> discussion about using Tika on any of the geospatial vector formats? I would
> think they would go hand in hand, and OGR recognizes many of them.
>
> Joe
>
> On Feb 26, 2012, at 1:10 PM, Mattmann, Chris A (388J) wrote:
>
>> Hi Joe,
>>
>> Awesome! Thanks for picking this up and getting interested in this work.
>> Right now, the only use case we've had so far
>> is to represent lats and lons (WGS84). It would be great to extract more
>> information and come up with a policy for representing
>> more WKTs and so forth. We should probably start by coming up with a scheme
>> for encoding the extracted information in the
>> Tika metadata object and in its output XHTML. Do you have any ideas about
>> how to do that? Right now in the existing patch
>> on TIKA-605, I simply intended to use the met object and its
>> key-multi-value structure to represent the extracted information,
>> but to take advantage of streaming and of content handlers, we ought to
>> encode this information in the output XHTML.
>>
>> Thoughts?
>>
>> Cheers,
>> Chris
>>
>> On Feb 26, 2012, at 9:39 AM, Joe White wrote:
>>
>>> Hi,
>>> I'm looking into implementing a bridge/link between Tika and GDAL so that
>>> geospatial information can be saved from georeferenced images and vector
>>> types. One thing that I have noticed while going through the code is that
>>> the code only defines geographic coordinate types, using latitudes and
>>> longitudes. Is this by design? If GDAL is wrapped into Tika, and a
>>> projected image is imported, are the geospatial extents meant to be held in
>>> the metadata as geographic points, possibly as WGS 84?
>>>
>>> Thanks
>>>
>>> Joe White
>>
>>
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> Chris Mattmann, Ph.D.
>> Senior Computer Scientist
>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>> Office: 171-266B, Mailstop: 171-246
>> Email: chris.a.mattm...@nasa.gov
>> WWW: http://sunset.usc.edu/~mattmann/
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> Adjunct Assistant Professor, Computer Science Department
>> University of Southern California, Los Angeles, CA 90089 USA
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>