[ https://issues.apache.org/jira/browse/TIKA-1423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143631#comment-14143631 ]

Ann Burgess commented on TIKA-1423:
-----------------------------------

Hi all, 

I've written a parser for GRIB files that works well on my computer, but I've 
run into some roadblocks in implementing it, particularly with the updated 
netcdfAll jar (netcdfAll-4.3.jar) that is needed to open .grib2 files. I've 
contacted the folks at UCAR about this, but we haven't resolved it yet. Their 
reply is below:
 
"It turns out that this is an issue with Tika. tika-app-1.6-SNAPSHOT.jar 
actually includes netcdf-4.2.20, so you have 2 different versions of the same 
library on the classpath, which is always bad. And unfortunately, using Tika's 
bundled netcdf-4.2.20 alone won't work, because support for reading GRIB files 
was only added to NetCDF-Java in 4.3+. This may be a problem that only the Tika 
developers know how to fix. I suggest opening a ticket with them.
Good luck!  Christian Ward-Garrison"

I've attached a test file (gdas1.forecmwf.2014062612.grib2) and my .grib2 
parser (GribParser.java).

I run the parser on my computer like this:

Annies-MacBook-Pro:tika $ java -classpath \
    .:netcdfAll-4.3.jar:tika-app/target/tika-app-1.6-SNAPSHOT.jar:annie-parsers.jar \
    org.apache.tika.cli.TikaCLI --text \
    /Users/IGSWAHWSWBURGESS/Development/tikadev/tika/tika-parsers/src/test/resources/test-documents/gdas1.forecmwf.2014062612.grib2
 

**NOTE: Because of the conflict between netcdfAll-4.3.jar and 
tika-app-1.6-SNAPSHOT.jar, netcdfAll-4.3.jar must come first on the classpath 
for this parser to work, since it is the jar that can open .grib2 files.
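One way to see which jars are competing is to ask the class loader for every copy of a class file it can find; as far as I know, SLF4J's "multiple bindings" warning in the output below comes from a similar check. The sketch below is a hypothetical helper (ClasspathDuplicates is not part of Tika or NetCDF-Java), and the default class names it probes, ucar.nc2.NetcdfFile and org.slf4j.impl.StaticLoggerBinder, are only examples. The order it prints normally follows the classpath order, which is why the jar listed first is the one whose classes get loaded.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

/** Hypothetical diagnostic (not part of Tika or NetCDF-Java): where does each class come from? */
public final class ClasspathDuplicates {

    /** Turns a binary class name such as "ucar.nc2.NetcdfFile" into its resource path. */
    static String toResourcePath(String className) {
        return className.replace('.', '/') + ".class";
    }

    /** Returns every classpath URL that contains the class file, in the order the loader reports them. */
    static List<URL> findAll(String className, ClassLoader loader) {
        try {
            return Collections.list(loader.getResources(toResourcePath(className)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ClassLoader loader = ClasspathDuplicates.class.getClassLoader();
        // Example class names to probe; pass others on the command line.
        String[] names = args.length > 0 ? args
                : new String[] {"ucar.nc2.NetcdfFile", "org.slf4j.impl.StaticLoggerBinder"};
        for (String name : names) {
            List<URL> hits = findAll(name, loader);
            System.out.println(name + ": " + hits.size() + " location(s)");
            for (URL url : hits) {
                System.out.println("  " + url);
            }
            if (hits.size() > 1) {
                System.out.println("  WARNING: more than one copy; the first one listed is normally the one loaded");
            }
        }
    }
}
```

Run it with the same classpath as the TikaCLI command, for example `java -classpath .:netcdfAll-4.3.jar:tika-app/target/tika-app-1.6-SNAPSHOT.jar ClasspathDuplicates`, and a netcdf class showing up in both jars would confirm the duplicate-library problem UCAR described.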

OUTPUT:

----------------------------------------------------------------------
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in 
[jar:file:/Users/IGSWAHWSWBURGESS/Development/tikadev/tika/netcdfAll-4.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in 
[jar:file:/Users/IGSWAHWSWBURGESS/Development/tikadev/tika/tika-app/target/tika-app-1.6-SNAPSHOT.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.JDK14LoggerFactory]
Sep 22, 2014 10:36:04 AM ucar.nc2.util.DiskCache2 setRootDirectory
INFO: DiskCache2 create directory /Users/IGSWAHWSWBURGESS/.unidata/cache/

dimensions:
        lon=360;
        lat=181;
        isobaric=11;
        isobaric1=15;
        time=1;

variables:

floatlat(lat=181);
         :units = "degrees_north";

floatlon(lon=360);
         :units = "degrees_east";

floatisobaric(isobaric=11);
         :units = "Pa";
         :long_name = "Isobaric surface";
         :positive = "down";
         :Grib2_level_type = 100;

floatisobaric1(isobaric1=15);
         :units = "Pa";
         :long_name = "Isobaric surface";
         :positive = "down";
         :Grib2_level_type = 100;

inttime(time=1);
         :units = "Hour since 2014-06-26T12:00:00Z";
         :standard_name = "time";

floatTemperature_isobaric(time=1, isobaric1=15, lat=181, lon=360);
         :long_name = "Temperature @ Isobaric surface";
         :units = "K";
         :missing_value = NaNf;
         :abbreviation = "TMP";
         :Grib_Variable_Id = "VAR_0-0-0_L100";
         :Grib2_Parameter = 0, 0, 0;
         :Grib2_Parameter_Discipline = "Meteorological products";
         :Grib2_Parameter_Category = "Temperature";
         :Grib2_Parameter_Name = "Temperature";
         :Grib2_Level_Type = 100;
         :Grib2_Generating_Process_Type = "Forecast";

floatRelative_humidity_isobaric(time=1, isobaric=11, lat=181, lon=360);
         :long_name = "Relative humidity @ Isobaric surface";
         :units = "%";
         :missing_value = NaNf;
         :abbreviation = "RH";
         :Grib_Variable_Id = "VAR_0-1-1_L100";
         :Grib2_Parameter = 0, 1, 1;
         :Grib2_Parameter_Discipline = "Meteorological products";
         :Grib2_Parameter_Category = "Moisture";
         :Grib2_Parameter_Name = "Relative humidity";
         :Grib2_Level_Type = 100;
         :Grib2_Generating_Process_Type = "Forecast";

floatu-component_of_wind_isobaric(time=1, isobaric1=15, lat=181, lon=360);
         :long_name = "u-component of wind @ Isobaric surface";
         :units = "m/s";
         :missing_value = NaNf;
         :abbreviation = "UGRD";
         :Grib_Variable_Id = "VAR_0-2-2_L100";
         :Grib2_Parameter = 0, 2, 2;
         :Grib2_Parameter_Discipline = "Meteorological products";
         :Grib2_Parameter_Category = "Momentum";
         :Grib2_Parameter_Name = "u-component of wind";
         :Grib2_Level_Type = 100;
         :Grib2_Generating_Process_Type = "Forecast";

floatv-component_of_wind_isobaric(time=1, isobaric1=15, lat=181, lon=360);
         :long_name = "v-component of wind @ Isobaric surface";
         :units = "m/s";
         :missing_value = NaNf;
         :abbreviation = "VGRD";
         :Grib_Variable_Id = "VAR_0-2-3_L100";
         :Grib2_Parameter = 0, 2, 3;
         :Grib2_Parameter_Discipline = "Meteorological products";
         :Grib2_Parameter_Category = "Momentum";
         :Grib2_Parameter_Name = "v-component of wind";
         :Grib2_Level_Type = 100;
         :Grib2_Generating_Process_Type = "Forecast";

----------------------------------------------------------------------
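The Grib_Variable_Id values in this output appear to encode the same numbers as the Grib2_Parameter and Grib2_Level_Type attributes (for example VAR_0-2-3_L100 alongside 0, 2, 3 and 100). If the parser should expose those as separate metadata fields, a minimal sketch could split the ID with a regular expression. The VAR_<discipline>-<category>-<number>_L<levelType> layout is inferred only from the four variables above; NetCDF-Java may produce longer IDs for other products, which this pattern would reject. GribVariableId is a hypothetical name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper: splits a Grib_Variable_Id such as "VAR_0-2-3_L100".
 * The layout is inferred from sample output, not from NetCDF-Java documentation.
 */
public final class GribVariableId {
    private static final Pattern ID = Pattern.compile("VAR_(\\d+)-(\\d+)-(\\d+)_L(\\d+)");

    final int discipline;
    final int category;
    final int number;
    final int levelType;

    private GribVariableId(int discipline, int category, int number, int levelType) {
        this.discipline = discipline;
        this.category = category;
        this.number = number;
        this.levelType = levelType;
    }

    /** Parses the whole ID or throws IllegalArgumentException if it does not fit the inferred layout. */
    static GribVariableId parse(String id) {
        Matcher m = ID.matcher(id);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognized Grib_Variable_Id: " + id);
        }
        return new GribVariableId(Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2)),
                Integer.parseInt(m.group(3)), Integer.parseInt(m.group(4)));
    }

    @Override
    public String toString() {
        return String.format("discipline=%d category=%d number=%d levelType=%d",
                discipline, category, number, levelType);
    }

    public static void main(String[] args) {
        // The four IDs from the sample output above.
        for (String id : new String[] {"VAR_0-0-0_L100", "VAR_0-1-1_L100", "VAR_0-2-2_L100", "VAR_0-2-3_L100"}) {
            System.out.println(id + " -> " + parse(id));
        }
    }
}
```

Running main prints, for instance, `VAR_0-2-3_L100 -> discipline=0 category=2 number=3 levelType=100`, matching the v-component of wind attributes.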

As you can see, the parser does extract the expected information from the 
.grib2 file, but ONLY after the SLF4J warnings about multiple bindings.

I'll be available for the rest of the day if you have any questions, but only 
sporadically after that. GOOD LUCK with the parser; I hope we can make a commit 
soon!!!

Annie

> Build a parser to extract data from GRIB formats
> ------------------------------------------------
>
>                 Key: TIKA-1423
>                 URL: https://issues.apache.org/jira/browse/TIKA-1423
>             Project: Tika
>          Issue Type: New Feature
>          Components: metadata, mime, parser
>    Affects Versions: 1.6
>            Reporter: Vineet Ghatge
>            Priority: Critical
>              Labels: features, newbie
>             Fix For: 1.7
>
>         Attachments: GribParser.java, gdas1.forecmwf.2014062612.grib2
>
>
> The Arctic dataset contains files in a format called GRIB (General 
> Regularly-distributed Information in Binary form, 
> http://en.wikipedia.org/wiki/GRIB). GRIB is a well-known, concise data format 
> used in meteorology to store historical and forecast weather data. There are 
> two versions of the format in use: GRIB 1 and GRIB 2.
> The focus will be on GRIB 2, which is the most prevalent. Each GRIB record 
> intended for either transmission or storage contains a single parameter with 
> values located at an array of grid points, or represented as a set of 
> spectral coefficients, for a single level (or layer), encoded as a continuous 
> bit stream. Logical divisions of the record are designated as "sections", 
> each of which provides control information and/or data. A GRIB record 
> consists of six sections, two of which are optional: 
>  
> (0) Indicator Section 
> (1) Product Definition Section (PDS) 
> (2) Grid Description Section (GDS), optional 
> (3) Bit Map Section (BMS), optional 
> (4) Binary Data Section (BDS) 
> (5) '7777' (ASCII Characters)
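The section list above is the edition 1 layout; GRIB 2 uses a 16-octet Indicator Section and more sections, but in both editions a message starts with the ASCII bytes "GRIB", carries the edition number in octet 8, and ends with "7777". A minimal sketch of a framing check, assuming the buffer starts at the beginning of a message, is shown below. It reads the total message length from octets 5-7 (edition 1) or octets 9-16 (edition 2) and verifies the trailing "7777"; it decodes none of the other sections, ignores the special length encoding some producers use for very large edition 1 messages, and GribIndicator is a hypothetical name, not Tika's MIME detector.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

/** Hypothetical GRIB framing check (not Tika's detector): looks only at the Indicator Section and "7777". */
public final class GribIndicator {

    /** Returns the edition (1 or 2) of the GRIB message starting at buf[0], or -1 if the framing is not valid. */
    static int edition(byte[] buf) {
        if (buf.length < 8 || !matches(buf, 0, "GRIB")) {
            return -1;
        }
        int edition = buf[7] & 0xFF;                 // octet 8 holds the edition number in both editions
        int headerLength;
        long totalLength;
        if (edition == 1) {
            headerLength = 8;
            totalLength = readUnsigned(buf, 4, 3);   // octets 5-7: total message length
        } else if (edition == 2) {
            if (buf.length < 16) {
                return -1;
            }
            headerLength = 16;
            totalLength = readUnsigned(buf, 8, 8);   // octets 9-16: total message length
        } else {
            return -1;
        }
        // The length covers the whole message, from "GRIB" through "7777".
        if (totalLength < headerLength + 4 || totalLength > buf.length) {
            return -1;
        }
        return matches(buf, (int) totalLength - 4, "7777") ? edition : -1;
    }

    /** Reads len bytes starting at off as a big-endian unsigned integer. */
    static long readUnsigned(byte[] buf, int off, int len) {
        long value = 0;
        for (int i = 0; i < len; i++) {
            value = (value << 8) | (buf[off + i] & 0xFF);
        }
        return value;
    }

    /** True if the ASCII text s appears in buf at offset off. */
    static boolean matches(byte[] buf, int off, String s) {
        byte[] expected = s.getBytes(StandardCharsets.US_ASCII);
        return off >= 0 && off + expected.length <= buf.length
                && Arrays.equals(Arrays.copyOfRange(buf, off, off + expected.length), expected);
    }

    public static void main(String[] args) throws IOException {
        byte[] data;
        if (args.length > 0) {
            data = Files.readAllBytes(Paths.get(args[0]));   // e.g. a .grib2 test file
        } else {
            // Synthetic 20-byte GRIB 2 frame: Indicator Section plus "7777", no real content.
            data = new byte[20];
            System.arraycopy("GRIB".getBytes(StandardCharsets.US_ASCII), 0, data, 0, 4);
            data[7] = 2;                                     // edition number
            data[15] = 20;                                   // total length, low-order byte
            System.arraycopy("7777".getBytes(StandardCharsets.US_ASCII), 0, data, 16, 4);
        }
        System.out.println("edition: " + edition(data));
    }
}
```

With no arguments it prints `edition: 2` for the synthetic frame. Given a real file such as gdas1.forecmwf.2014062612.grib2, it only checks the first message, since a GRIB file is usually a concatenation of many messages.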



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)