Hi Joe,

On Feb 26, 2012, at 11:06 AM, Joe White wrote:

> Hi, Chris,
> I would agree that we probably should come up with a more comprehensive 
> solution for this wrt the metadata object and the resulting XHTML.  That 
> would make this feel a little more like the geospatial stuff is more of a 
> first class citizen in the metadata hierarchy.

+1.

> 
> We will probably need to support more coordinate systems than just WGS 84, as 
> there are a number of systems that have no transformation to WGS 84.  

+1, agreed, WGS84 was just the first one that came to mind.


> The encoding of the WKT is also pretty important.  Would you rather break it 
> down to its component parts, probably datum and projection for starters, or 
> leave it whole?  Obviously, the more metadata we have, the more powerful Tika 
> becomes, but there is a point where you have too much data that is not as 
> useful.

Let's start out with its component parts, datum and projection, and encode 
those as metadata fields. So we'd likely update the existing Geographic 
metadata interface with these new keys as a starter.
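
As a rough illustration of what "datum and projection as metadata fields" 
could look like, here is a minimal sketch. It does not use Tika itself: the 
class is a plain-Java stand-in for a key-multi-value metadata object, and the 
key names "geo:datum" and "geo:projection" are hypothetical, chosen only for 
illustration rather than taken from the existing Geographic interface.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GeoMetadataSketch {
    // Hypothetical key names for the WKT component parts (assumption).
    public static final String DATUM = "geo:datum";
    public static final String PROJECTION = "geo:projection";

    // Stand-in for a key-multi-value metadata object: one key, many values.
    private final Map<String, List<String>> values = new LinkedHashMap<>();

    /** Appends a value under the given key, keeping earlier values. */
    public void add(String key, String value) {
        values.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    /** Returns all values stored under the key, or an empty list. */
    public List<String> getValues(String key) {
        return values.getOrDefault(key, Collections.<String>emptyList());
    }

    public static void main(String[] args) {
        GeoMetadataSketch met = new GeoMetadataSketch();
        // Example values as they might appear in a WKT string.
        met.add(DATUM, "WGS_1984");
        met.add(PROJECTION, "Transverse_Mercator");
        System.out.println(DATUM + " = " + met.getValues(DATUM));
        System.out.println(PROJECTION + " = " + met.getValues(PROJECTION));
    }
}
```

Keeping datum and projection under separate keys lets a consumer filter on 
either one without having to parse the full WKT string.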

> 
> On another note, I took a look at the code for your 605 patch, and I have a 
> suggestion. Reading the notes on the checkins for the patch, I noticed that 
> no one had suggested using the in-memory Dataset as the default type.  There 
> is no reason why the stream used to open the Tika parser could not be used to 
> fill a buffer with the file data, and then use that to create a dataset.

Hmm, so your suggestion is to use the in-memory Dataset API, and that would be 
streamable via Tika? That would be great. I just wasn't familiar enough with 
GDAL to know how to do that, so a coding example in Java, if you have one, 
would help me wrap my head around it.
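
For reference, the buffering half of that idea can be sketched in plain Java. 
This is only a sketch under assumptions: the class and method names are 
hypothetical, and the GDAL step is left as a comment because it depends on the 
GDAL Java bindings being installed. The calls named there (FileFromMemBuffer, 
Open, and Unlink on a "/vsimem/" path) reflect my understanding of those 
bindings and should be checked against the installed version.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class StreamBufferSketch {
    /** Reads the whole stream into a byte array, as a parser could do with its input stream. */
    public static byte[] readFully(InputStream stream) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        // read() returns -1 once the stream is exhausted.
        while ((n = stream.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the stream handed to the parser.
        InputStream stream = new ByteArrayInputStream(new byte[] {1, 2, 3});
        byte[] data = readFully(stream);
        System.out.println(data.length + " bytes buffered");

        // Assumed next step with the GDAL Java bindings (not executed here):
        //   gdal.FileFromMemBuffer("/vsimem/tika-input", data);
        //   Dataset ds = gdal.Open("/vsimem/tika-input");
        //   ... read projection and extents from ds ...
        //   ds.delete();
        //   gdal.Unlink("/vsimem/tika-input");
    }
}
```

The trade-off is that the whole file sits in memory, so very large rasters 
might still call for a temporary file instead.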

> 
> As it is, I'm trying to get GDAL to cooperate with me on my Mac.  Being a 
> newcomer to Mac seems to be a drawback when trying to be productive.  It just 
> takes a little more fight to get the bits to do what I really want.
> 

Heh, yeah, I was trying to do this too. At one point I had it running, but a few 
OS upgrades have nixed that. Let's see if I can get it up and running again 
too so we can co-develop this.

> In any case, once I get GDAL whipped into shape, I'll see if I can't get a 
> test file to recognize any geospatial data, and then we will be off and 
> running.

Great!

Cheers,
Chris

> On Feb 26, 2012, at 1:10 PM, Mattmann, Chris A (388J) wrote:
> 
>> Hi Joe,
>> 
>> Awesome! Thanks for picking this up and getting interested in this work. 
>> Right now, the only use case we've had so far
>> is to represent lats and lons (WGS84). It would be great to extract more 
>> information and come up with a policy for representing
>> more WKTs and so forth. We should probably start by coming up with a scheme 
>> for encoding the extracted information in the 
>> Tika metadata object and in its output XHTML. Do you have any ideas about 
>> how to do that? Right now in the existing patch
>> on TIKA-605, I simply intended to use the met object and its 
>> key-multi-value structure to represent the extracted information
>> but to take advantage of streaming and of content handlers, we ought to 
>> encode this information in the output XHTML.
>> 
>> Thoughts?
>> 
>> Cheers,
>> Chris
>> 
>> On Feb 26, 2012, at 9:39 AM, Joe White wrote:
>> 
>>> Hi,
>>> I'm looking into implementing a bridge/link between Tika and GDAL so that 
>>> geospatial information can be saved from georeferenced images and vector 
>>> types.  One thing that I have noticed while going through the code is that 
>>> the code only defines geographic coordinate types, using latitudes and 
>>> longitudes.  Is this by design?  If GDAL is wrapped into Tika, and a 
>>> projected image is imported, are the geospatial extents meant to be held in 
>>> the metadata as geographic points, possibly as WGS 84?  
>>> 
>>> Thanks
>>> 
>>> Joe White
>> 
>> 
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> Chris Mattmann, Ph.D.
>> Senior Computer Scientist
>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>> Office: 171-266B, Mailstop: 171-246
>> Email: chris.a.mattm...@nasa.gov
>> WWW:   http://sunset.usc.edu/~mattmann/
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> Adjunct Assistant Professor, Computer Science Department
>> University of Southern California, Los Angeles, CA 90089 USA
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> 
> 


