[ https://issues.apache.org/jira/browse/TIKA-1423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143631#comment-14143631 ]
Ann Burgess commented on TIKA-1423:
-----------------------------------

Hi all,

I've written a parser for GRIB files that works well on my computer; however, I've run into some roadblocks in the implementation, particularly with the updated NetCDF-All.jar that is needed to open .grib2 files. I've communicated with the folks at UCAR about this issue but have yet to resolve it. See their reply below:

"It turns out that this is an issue with Tika. tika-app-1.6-SNAPSHOT.jar actually includes netcdf-4.2.20, so you have 2 different versions of the same library on the classpath, which is always bad. And unfortunately, using Tika's bundled netcdf-4.2.20 alone won't work, because support for reading GRIB files was only added to NetCDF-Java in 4.3+.

This may be a problem that only the Tika developers know how to fix. I suggest opening a ticket with them.

Good luck!
Christian Ward-Garrison"

I've attached a test file (gdas1.forecmwf.2014062612.grib2) and my .grib2 parser (GribParser.java). I call the parser from my computer as:

Annies-MacBook-Pro:tika $ java -classpath .:netcdfAll-4.3.jar:tika-app/target/tika-app-1.6-SNAPSHOT.jar:annie-parsers.jar org.apache.tika.cli.TikaCLI --text /Users/IGSWAHWSWBURGESS/Development/tikadev/tika/tika-parsers/src/test/resources/test-documents/gdas1.forecmwf.2014062612.grib2

**NOTE: Because both netcdfAll-4.3.jar and tika-app-1.6-SNAPSHOT.jar contain NetCDF classes, netcdfAll-4.3.jar must be listed first on the classpath for this parser to work, as that is the .jar file that can open .grib2 files.

OUTPUT:
----------------------------------------------------------------------
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/Users/IGSWAHWSWBURGESS/Development/tikadev/tika/netcdfAll-4.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/Users/IGSWAHWSWBURGESS/Development/tikadev/tika/tika-app/target/tika-app-1.6-SNAPSHOT.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.JDK14LoggerFactory]
Sep 22, 2014 10:36:04 AM ucar.nc2.util.DiskCache2 setRootDirectory
INFO: DiskCache2 create directory /Users/IGSWAHWSWBURGESS/.unidata/cache/
dimensions:
  lon=360;
  lat=181;
  isobaric=11;
  isobaric1=15;
  time=1;
variables:
  floatlat(lat=181);
    :units = "degrees_north";
  floatlon(lon=360);
    :units = "degrees_east";
  floatisobaric(isobaric=11);
    :units = "Pa";
    :long_name = "Isobaric surface";
    :positive = "down";
    :Grib2_level_type = 100;
  floatisobaric1(isobaric1=15);
    :units = "Pa";
    :long_name = "Isobaric surface";
    :positive = "down";
    :Grib2_level_type = 100;
  inttime(time=1);
    :units = "Hour since 2014-06-26T12:00:00Z";
    :standard_name = "time";
  floatTemperature_isobaric(time=1, isobaric1=15, lat=181, lon=360);
    :long_name = "Temperature @ Isobaric surface";
    :units = "K";
    :missing_value = NaNf;
    :abbreviation = "TMP";
    :Grib_Variable_Id = "VAR_0-0-0_L100";
    :Grib2_Parameter = 0, 0, 0;
    :Grib2_Parameter_Discipline = "Meteorological products";
    :Grib2_Parameter_Category = "Temperature";
    :Grib2_Parameter_Name = "Temperature";
    :Grib2_Level_Type = 100;
    :Grib2_Generating_Process_Type = "Forecast";
  floatRelative_humidity_isobaric(time=1, isobaric=11, lat=181, lon=360);
    :long_name = "Relative humidity @ Isobaric surface";
    :units = "%";
    :missing_value = NaNf;
    :abbreviation = "RH";
    :Grib_Variable_Id = "VAR_0-1-1_L100";
    :Grib2_Parameter = 0, 1, 1;
    :Grib2_Parameter_Discipline = "Meteorological products";
    :Grib2_Parameter_Category = "Moisture";
    :Grib2_Parameter_Name = "Relative humidity";
    :Grib2_Level_Type = 100;
    :Grib2_Generating_Process_Type = "Forecast";
  floatu-component_of_wind_isobaric(time=1, isobaric1=15, lat=181, lon=360);
    :long_name = "u-component of wind @ Isobaric surface";
    :units = "m/s";
    :missing_value = NaNf;
    :abbreviation = "UGRD";
    :Grib_Variable_Id = "VAR_0-2-2_L100";
    :Grib2_Parameter = 0, 2, 2;
    :Grib2_Parameter_Discipline = "Meteorological products";
    :Grib2_Parameter_Category = "Momentum";
    :Grib2_Parameter_Name = "u-component of wind";
    :Grib2_Level_Type = 100;
    :Grib2_Generating_Process_Type = "Forecast";
  floatv-component_of_wind_isobaric(time=1, isobaric1=15, lat=181, lon=360);
    :long_name = "v-component of wind @ Isobaric surface";
    :units = "m/s";
    :missing_value = NaNf;
    :abbreviation = "VGRD";
    :Grib_Variable_Id = "VAR_0-2-3_L100";
    :Grib2_Parameter = 0, 2, 3;
    :Grib2_Parameter_Discipline = "Meteorological products";
    :Grib2_Parameter_Category = "Momentum";
    :Grib2_Parameter_Name = "v-component of wind";
    :Grib2_Level_Type = 100;
    :Grib2_Generating_Process_Type = "Forecast";
----------------------------------------------------------------------

As you can see, the parser is able to extract the appropriate information from the .grib2 file, ONLY after the warnings about SLF4J issues.

I'll be available the rest of the day if you have any questions, but will only be available quite sporadically after this.

GOOD LUCK with the parser - I hope we can make a commit soon!!!

Annie

> Build a parser to extract data from GRIB formats
> ------------------------------------------------
>
> Key: TIKA-1423
> URL: https://issues.apache.org/jira/browse/TIKA-1423
> Project: Tika
> Issue Type: New Feature
> Components: metadata, mime, parser
> Affects Versions: 1.6
> Reporter: Vineet Ghatge
> Priority: Critical
> Labels: features, newbie
> Fix For: 1.7
>
> Attachments: GribParser.java, gdas1.forecmwf.2014062612.grib2
>
>
> Arctic dataset contains a MIME format called GRIB - General
> Regularly-distributed Information in Binary form
> http://en.wikipedia.org/wiki/GRIB .
> GRIB is a well-known, concise data format used in meteorology to store
> historical and forecast weather data. There are two editions of the format
> in common use, GRIB 1 and GRIB 2.
> The focus will be on GRIB 2, which is the most prevalent. Each GRIB record
> intended for either transmission or storage contains a single parameter with
> values located at an array of grid points, or represented as a set of
> spectral coefficients, for a single level (or layer), encoded as a continuous
> bit stream. Logical divisions of the record are designated as "sections",
> each of which provides control information and/or data. A GRIB record
> consists of six sections, two of which are optional:
>
> (0) Indicator Section
> (1) Product Definition Section (PDS)
> (2) Grid Description Section (GDS) - optional
> (3) Bit Map Section (BMS) - optional
> (4) Binary Data Section (BDS)
> (5) '7777' (ASCII Characters)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
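
The record layout quoted in the issue description suggests a cheap sanity check that could run before handing a file to a full parser. The following is only a minimal sketch and is not taken from the attached GribParser.java. It assumes the byte array holds exactly one GRIB message, that the Indicator Section begins with the ASCII characters 'GRIB', that the edition number is stored in octet 8 of that section (as in the WMO layouts for editions 1 and 2), and that the message ends with '7777'. The class and method names (GribSniffer, edition) are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper: checks the first and last sections of a GRIB message
// without any NetCDF-Java dependency.
final class GribSniffer {
    private static final byte[] MAGIC = "GRIB".getBytes(StandardCharsets.US_ASCII);
    private static final byte[] END = "7777".getBytes(StandardCharsets.US_ASCII);

    // Assumed minimum: 8 octets of Indicator Section plus the 4-octet end marker.
    private static final int MIN_LENGTH = 12;

    /** Returns the GRIB edition number, or -1 if the bytes do not look like one GRIB message. */
    static int edition(byte[] message) {
        if (message == null || message.length < MIN_LENGTH) {
            return -1;
        }
        // Section 0 must start with the ASCII characters "GRIB".
        if (!Arrays.equals(Arrays.copyOfRange(message, 0, 4), MAGIC)) {
            return -1;
        }
        // The final section is the ASCII end marker "7777".
        int n = message.length;
        if (!Arrays.equals(Arrays.copyOfRange(message, n - 4, n), END)) {
            return -1;
        }
        // Octet 8 (index 7) holds the edition number; mask to read it as unsigned.
        return message[7] & 0xFF;
    }

    public static void main(String[] args) {
        // Synthetic 12-byte message for illustration only, not real GRIB data.
        byte[] fake = {'G', 'R', 'I', 'B', 0, 0, 0, 2, '7', '7', '7', '7'};
        System.out.println(edition(fake));
    }
}
```

Run on the synthetic message in main, this prints 2. A check like this only confirms the framing of a message; decoding the sections in between still needs a library such as NetCDF-Java 4.3+.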