Hi Chris,

Thanks for the ODL reference. I'm working through it now.

Steve

On Sat, May 28, 2011 at 1:55 AM, Mattmann, Chris A (388J) <
[email protected]> wrote:

> Hey Steve,
>
>
> > Chris, do you have a good reference for ODL files?
>
> Chapter 12 (on ODL) of the NASA Planetary Data System (PDS) Standards
> Reference is the best one I know:
>
> http://pds.nasa.gov/tools/standards-reference.shtml
>
> > It sounds like the MinODL parser will allow you to traverse from Group to
> > Group Data Fields to dimensions and variables in an HDF-EOS file
>
> +1, yep.
>
> >
> > and to dimensions and variables in netCDF land, true?
>
> +1, yep.
>
> That's the goal!
>
> Cheers,
> Chris
>
> >
> > On Fri, May 27, 2011 at 12:31 PM, Mattmann, Chris A (388J) <
> > [email protected]> wrote:
> >
> >> Hey Steve!
> >>
> >> Nice to see you show up on the list :-) Yep, I totally agree, I have a
> >> couple of useful additions I'm going to create issues for and contribute
> >> back to Tika:
> >>
> >> 1. MinODL parser for ODL files themselves and also used in 2 below;
> >> 2. ParseContext properties identifying:
> >>  - groups that are in fact ODL values, that need to be parsed with the
> >> MinODL parser (useful for NetCDF and for HDF)
> >>  - what groups to select out (e.g., in HDF, by Path
> >> /Group1/SubGroup1/Property, and in NetCDF just by name)
> >>
> >> I think the combination of those will help the HDF and NetCDF parsers
> >> become more robust and configurable. Also, GDAL is high on my priority
> >> list. I've already built the Java bindings, but am working through some
> >> trickery with GDAL since it doesn't like the fact that Tika isn't file
> >> based, and when we use TikaInputStream, it creates a file of arbitrary
> >> extension (which ticks off GDAL as it's looking for something specific).
> >> I have a work-around though in the works...
> >>
> >> Cheers,
> >> Chris
> >>
> >>
> >> On May 26, 2011, at 4:20 AM, Steve Aulenbach wrote:
> >>
> >>> Hi Chris,
> >>>
> >>> I think your plan to improve the netCDF and HDF parsing is a great one.
> >>> The richness of a full ncdump of netCDF metadata and a full ncdump of
> >>> HDF-EOS metadata would be an excellent addition to the 1.0 release of
> >>> Tika. I have discussed Tika with several science data users, and they
> >>> usually ask about netCDF and HDF-EOS metadata capabilities. A GDAL parser
> >>> is also a great idea.
> >>>
> >>> Thanks,
> >>> Steve
> >>>
> >>> On Fri, May 20, 2011 at 12:22 PM, Mattmann, Chris A (388J) <
> >>> [email protected]> wrote:
> >>>
> >>>> Hey Jukka et al.,
> >>>>
> >>>>> It's been a few months since 0.9 and our Tika in Action book is almost
> >>>>> ready for print, so I think it's a good time to start planning for the
> >>>>> 1.0 release.
> >>>>
> >>>> Looking forward to not writing anything for a while :-) I doubt it'll
> >>>> happen knowing how things go, but also really really happy with where
> >>>> the book is (and banging on those last revisions! :-) ).
> >>>>
> >>>>>
> >>>>> There are a few odds and ends that I'd still like to sort out in the
> >>>>> trunk, but overall I think we're pretty much ready for the switch
> >>>>> from 0.x to 1.x.
> >>>>
> >>>> +1.
> >>>>
> >>>>>
> >>>>> One major issue to be decided is whether we want to follow up with the
> >>>>> earlier intention of dropping deprecated functionality (like the
> >>>>> three-argument parse() method) before the 1.0 release.
> >>>>
> >>>> +1, I'd be fine with this. I'm a fan of following through on things that
> >>>> we say we're going to do if for no other good reason than we said we're
> >>>> going to do it.
> >>>>
> >>>> +1 to dropping the 3 arg parse method.
> >>>>
> >>>>> I think we
> >>>>> should do that and also make some other backwards-incompatible
> >>>>> cleanups while we're at it. That way we'll have less old baggage to
> >>>>> carry as we evolve through the 1.x release cycle.
> >>>>
> >>>> +1, my biggest thing to work on is improving the NetCDF and HDF parsing,
> >>>> adding an ODL parser (I'll create an issue for this), adding some spatial
> >>>> parsers (working on the GDAL one right now), and maybe some documentation
> >>>> on how to use the science data file formats. I should have time over the
> >>>> next month or so to complete these.
> >>>>
> >>>>>
> >>>>> Another thing to think about is whether we want to do a formal Apache
> >>>>> press release about Tika reaching 1.0 status.
> >>>>
> >>>> +1. I'd be happy to work with Jukka, as Nick suggested, to draft this, and
> >>>> then from there to work with Sally to make it happen.
> >>>>
> >>>> Thanks!
> >>>>
> >>>> Cheers,
> >>>> Chris
> >>>>
> >>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>>> Chris Mattmann, Ph.D.
> >>>> Senior Computer Scientist
> >>>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> >>>> Office: 171-266B, Mailstop: 171-246
> >>>> Email: [email protected]
> >>>> WWW:   http://sunset.usc.edu/~mattmann/
> >>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>>> Adjunct Assistant Professor, Computer Science Department
> >>>> University of Southern California, Los Angeles, CA 90089 USA
> >>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>>>
> >>>>
> >>
> >>
> >>
> >>
>
>
>
>
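For anyone following the thread, here is a rough sketch of the kind of traversal discussed above: reading ODL GROUP/OBJECT blocks into nested maps and then selecting a value by a path such as /Group1/SubGroup1/Property. This is only an illustration under assumptions. The class MinOdlSketch and its parse, unquote, and lookup methods are hypothetical names, not the MinODL parser proposed for Tika and not part of any Tika API. It handles only simple KEY=VALUE lines and ignores multi-line values, comments, and sequences.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not the MinODL parser proposed for Tika.
final class MinOdlSketch {

    /** Parses GROUP/OBJECT blocks and simple KEY=VALUE lines into nested maps. */
    static Map<String, Object> parse(String odl) {
        Map<String, Object> root = new LinkedHashMap<>();
        Deque<Map<String, Object>> stack = new ArrayDeque<>();
        stack.push(root);
        for (String raw : odl.split("\\R")) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue;
            }
            if (line.equals("END")) {
                break; // a bare END terminates the ODL text
            }
            int eq = line.indexOf('=');
            if (eq < 0) {
                continue; // skip anything this sketch does not understand
            }
            String key = line.substring(0, eq).trim();
            String value = line.substring(eq + 1).trim();
            if (key.equals("GROUP") || key.equals("OBJECT")) {
                // Open a nested block named by the value.
                Map<String, Object> child = new LinkedHashMap<>();
                stack.peek().put(value, child);
                stack.push(child);
            } else if (key.equals("END_GROUP") || key.equals("END_OBJECT")) {
                // Close the current block, but never pop the root.
                if (stack.size() > 1) {
                    stack.pop();
                }
            } else {
                stack.peek().put(key, unquote(value));
            }
        }
        return root;
    }

    /** Strips one pair of surrounding double quotes, if present. */
    static String unquote(String value) {
        if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
            return value.substring(1, value.length() - 1);
        }
        return value;
    }

    /** Resolves a slash-separated path; returns null if any step is missing. */
    static Object lookup(Map<String, Object> root, String path) {
        Object current = root;
        for (String part : path.split("/")) {
            if (part.isEmpty()) {
                continue;
            }
            if (!(current instanceof Map)) {
                return null;
            }
            current = ((Map<?, ?>) current).get(part);
        }
        return current;
    }

    public static void main(String[] args) {
        String odl = "GROUP=SwathStructure\n"
                + "  GROUP=Dimension\n"
                + "    OBJECT=Dimension_1\n"
                + "      DimensionName=\"GeoTrack\"\n"
                + "      Size=1200\n"
                + "    END_OBJECT=Dimension_1\n"
                + "  END_GROUP=Dimension\n"
                + "END_GROUP=SwathStructure\n"
                + "END\n";
        Map<String, Object> root = parse(odl);
        System.out.println(lookup(root, "/SwathStructure/Dimension/Dimension_1/Size"));
    }
}
```

All values are kept as strings here. A real parser would presumably need typed values and proper error handling for unbalanced blocks.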
