Hi Chris,

Thanks for the ODL reference. I'm working through it now.

Steve

On Sat, May 28, 2011 at 1:55 AM, Mattmann, Chris A (388J) <[email protected]> wrote:

> Hey Steve,
>
> > Chris, do you have a good reference for ODL files?
>
> The NASA Planetary Data System (PDS) Standards Reference and Chapter 12 on
> ODL is the best one I know:
>
> http://pds.nasa.gov/tools/standards-reference.shtml
>
> > It sounds like MinODL parser will allow you to traverse from Group to
> > Group Data Fields to dimensions and the variables in an HDF-EOS file
>
> +1, yep.
>
> > and to dimensions and variables in netCDF land, true?
>
> +1, yep.
>
> That's the goal!
>
> Cheers,
> Chris
>
> > On Fri, May 27, 2011 at 12:31 PM, Mattmann, Chris A (388J) <[email protected]> wrote:
> >
> >> Hey Steve!
> >>
> >> Nice to see you show up on the list :-) Yep, I totally agree, I have a
> >> couple of useful additions I'm going to create issues for and contribute
> >> back to Tika:
> >>
> >> 1. MinODL parser for ODL files themselves and also used in 2 below;
> >> 2. ParseContext properties identifying:
> >>    - groups that are in fact ODL values, that need to be parsed with the
> >>      MinODL parser (useful for NetCDF and for HDF)
> >>    - what groups to select out (e.g., in HDF, by Path
> >>      /Group1/SubGroup1/Property, and in NetCDF just by name)
> >>
> >> I think the combination of those will help the HDF and NetCDF parsers to
> >> become more robust, and configurable. Also, GDAL is high on my priority
> >> list. I've already built the Java bindings, but am working through some
> >> trickery with GDAL since it doesn't like the fact that Tika isn't file
> >> based, and when we use TikaInputStream, it creates a file of arbitrary
> >> extension (which ticks off GDAL as it's looking for something specific). I
> >> have a work-around though in the works...
> >>
> >> Cheers,
> >> Chris
> >>
> >> On May 26, 2011, at 4:20 AM, Steve Aulenbach wrote:
> >>
> >>> Hi Chris,
> >>>
> >>> I think your plan to improve the netCDF and HDF parsing is a great one.
> >>> The richness of a full ncdump of netCDF metadata and a full ncdump of
> >>> HDF-EOS metadata would be an excellent addition to the 1.0 release of
> >>> Tika. I have discussed Tika with several science data users and they
> >>> usually ask about netCDF and HDF-EOS metadata capabilities. A GDAL parser
> >>> is also a great idea.
> >>>
> >>> Thanks,
> >>> Steve
> >>>
> >>> On Fri, May 20, 2011 at 12:22 PM, Mattmann, Chris A (388J) <[email protected]> wrote:
> >>>
> >>>> Hey Jukka et al.,
> >>>>
> >>>>> It's a few months since 0.9 and our Tika in Action book is soon ready
> >>>>> for print, so I think it's a good time to start planning for the 1.0
> >>>>> release.
> >>>>
> >>>> Looking forward to not writing anything for a while :-) I doubt it'll
> >>>> happen knowing how things go, but also really really happy with where the
> >>>> book is (and banging on those last revisions! :-) ).
> >>>>
> >>>>> There are a few odds and ends that I'd still like to sort out in the
> >>>>> trunk, but overall I think we're pretty much ready for the switch
> >>>>> from 0.x to 1.x.
> >>>>
> >>>> +1.
> >>>>
> >>>>> One major issue to be decided is whether we want to follow up with the
> >>>>> earlier intention of dropping deprecated functionality (like the
> >>>>> three-argument parse() method) before the 1.0 release.
> >>>>
> >>>> +1, I'd be fine with this. I'm a fan of following through on things that we
> >>>> say we're going to do if for no other good reason than we said we're going
> >>>> to do it.
> >>>>
> >>>> +1 to dropping the 3 arg parse method.
> >>>>
> >>>>> I think we
> >>>>> should do that and also make some other backwards-incompatible
> >>>>> cleanups while we're at it. That way we'll have less old baggage to
> >>>>> carry as we evolve through the 1.x release cycle.
> >>>>
> >>>> +1, my biggest thing to work on is improving the NetCDF and HDF parsing,
> >>>> adding an ODL parser (I'll create an issue for this), adding some spatial
> >>>> parsers (working on the GDAL one right now), and maybe some documentation on
> >>>> how to use the science data file formats. I should have time over the next
> >>>> month or so to complete these.
> >>>>
> >>>>> Another thing to think about is whether we want to do a formal Apache
> >>>>> press release about Tika reaching 1.0 status.
> >>>>
> >>>> +1. I'd be happy to work with Jukka, as Nick suggested, to draft this, and
> >>>> then from there to work with Sally to make it happen.
> >>>>
> >>>> Thanks!
> >>>>
> >>>> Cheers,
> >>>> Chris
> >>>>
> >>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>>> Chris Mattmann, Ph.D.
> >>>> Senior Computer Scientist
> >>>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> >>>> Office: 171-266B, Mailstop: 171-246
> >>>> Email: [email protected]
> >>>> WWW: http://sunset.usc.edu/~mattmann/
> >>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>>> Adjunct Assistant Professor, Computer Science Department
> >>>> University of Southern California, Los Angeles, CA 90089 USA
> >>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
