[jira] [Commented] (TIKA-1577) NetCDF Data Extraction
[ https://issues.apache.org/jira/browse/TIKA-1577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14534847#comment-14534847 ]

Ann Burgess commented on TIKA-1577:
-----------------------------------

Take it away [~riverma]!

NetCDF Data Extraction
----------------------

Key: TIKA-1577
URL: https://issues.apache.org/jira/browse/TIKA-1577
Project: Tika
Issue Type: Improvement
Components: handler, parser
Affects Versions: 1.7
Reporter: Ann Burgess
Assignee: Ann Burgess
Labels: features, handler
Fix For: 1.9
Original Estimate: 504h
Remaining Estimate: 504h

A netCDF classic or 64-bit offset dataset is stored as a single file comprising two parts:
- a header, containing all the information about dimensions, attributes, and variables except for the variable data;
- a data part, comprising fixed-size data, containing the data for variables that don't have an unlimited dimension; and variable-size data, containing the data for variables that have an unlimited dimension.

The NetCDFParser currently extracts the header part:
-- text extracts file Dimensions and Variables
-- metadata extracts Global Attributes

We want the option to extract the data part of NetCDF files.
Let's use the NetCDF test file for our dev testing:
tika/tika-parsers/src/test/resources/test-documents/sresa1b_ncar_ccsm3_0_run1_21.nc

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (TIKA-1579) Add file type to NetCDFParser
[ https://issues.apache.org/jira/browse/TIKA-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14385383#comment-14385383 ]

Ann Burgess commented on TIKA-1579:
-----------------------------------

Yes!

(In reply to Tyler Palsulich (JIRA), Sat, Mar 28, 2015 at 6:09 AM.)

Add file type to NetCDFParser
-----------------------------

Key: TIKA-1579
URL: https://issues.apache.org/jira/browse/TIKA-1579
Project: Tika
Issue Type: Improvement
Components: parser
Reporter: Ann Burgess
Assignee: Ann Burgess
Attachments: TIKA-1579.abburgess.190315.patch.txt

[~gostep] explains that there are three versions of NetCDF (classic format, 64-bit offset, and netCDF-4/HDF5 format). When opening an existing netCDF file, the netCDF library transparently detects its format, so we do not need to adjust according to the detected format. That said, it would be good to know the file type, as each can have the .nc extension. This issue will add a patch that adds the file type to the metadata.
[jira] [Commented] (TIKA-1577) NetCDF Data Extraction
[ https://issues.apache.org/jira/browse/TIKA-1577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14384323#comment-14384323 ]

Ann Burgess commented on TIKA-1577:
-----------------------------------

This is a great idea. I'm all for not re-creating code if it already exists in good form!

NetCDF Data Extraction
----------------------

Key: TIKA-1577
URL: https://issues.apache.org/jira/browse/TIKA-1577
Project: Tika
Issue Type: Improvement
Components: handler, parser
Affects Versions: 1.7
Reporter: Ann Burgess
Assignee: Ann Burgess
Labels: features, handler
Fix For: 1.8
Original Estimate: 504h
Remaining Estimate: 504h
[jira] [Commented] (TIKA-1578) Add file type description to HDFParsers
[ https://issues.apache.org/jira/browse/TIKA-1578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14369982#comment-14369982 ]

Ann Burgess commented on TIKA-1578:
-----------------------------------

https://reviews.apache.org/r/32255/

Add file type description to HDFParsers
---------------------------------------

Key: TIKA-1578
URL: https://issues.apache.org/jira/browse/TIKA-1578
Project: Tika
Issue Type: Improvement
Components: parser
Reporter: Ann Burgess
Assignee: Ann Burgess
Attachments: TIKA-1578.abburgess.150319.patch.txt
[jira] [Updated] (TIKA-1578) Add file type description to HDFParsers
[ https://issues.apache.org/jira/browse/TIKA-1578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ann Burgess updated TIKA-1578:
------------------------------

Attachment: TIKA-1578.abburgess.150319.patch.txt

File type added to HDFParser

Add file type description to HDFParsers
---------------------------------------

Key: TIKA-1578
URL: https://issues.apache.org/jira/browse/TIKA-1578
Project: Tika
Issue Type: Improvement
Components: parser
Reporter: Ann Burgess
Assignee: Ann Burgess
Attachments: TIKA-1578.abburgess.150319.patch.txt
[jira] [Commented] (TIKA-1579) Add file type to NetCDFParser
[ https://issues.apache.org/jira/browse/TIKA-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370137#comment-14370137 ]

Ann Burgess commented on TIKA-1579:
-----------------------------------

https://reviews.apache.org/r/32260/

Add file type to NetCDFParser
-----------------------------

Key: TIKA-1579
URL: https://issues.apache.org/jira/browse/TIKA-1579
Project: Tika
Issue Type: Improvement
Components: parser
Reporter: Ann Burgess
Assignee: Ann Burgess
[jira] [Updated] (TIKA-1579) Add file type to NetCDFParser
[ https://issues.apache.org/jira/browse/TIKA-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ann Burgess updated TIKA-1579:
------------------------------

Attachment: TIKA-1579.abburgess.190315.patch.txt

Add file type to NetCDFParser
-----------------------------

Key: TIKA-1579
URL: https://issues.apache.org/jira/browse/TIKA-1579
Project: Tika
Issue Type: Improvement
Components: parser
Reporter: Ann Burgess
Assignee: Ann Burgess
Attachments: TIKA-1579.abburgess.190315.patch.txt
[jira] [Commented] (TIKA-1577) NetCDF Data Extraction
[ https://issues.apache.org/jira/browse/TIKA-1577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370177#comment-14370177 ]

Ann Burgess commented on TIKA-1577:
-----------------------------------

[~riverma] this is a good place to start:
http://www.unidata.ucar.edu/software/netcdf/old_docs/really_old/guide_toc.html

NetCDF Data Extraction
----------------------

Key: TIKA-1577
URL: https://issues.apache.org/jira/browse/TIKA-1577
Project: Tika
Issue Type: Improvement
Components: handler, parser
Affects Versions: 1.7
Reporter: Ann Burgess
Assignee: Ann Burgess
Labels: features, handler
Original Estimate: 504h
Remaining Estimate: 504h
[jira] [Created] (TIKA-1578) Add file type description to HDFParsers
Ann Burgess created TIKA-1578:
------------------------------

Summary: Add file type description to HDFParsers
Key: TIKA-1578
URL: https://issues.apache.org/jira/browse/TIKA-1578
Project: Tika
Issue Type: Improvement
Components: parser
Reporter: Ann Burgess
Assignee: Ann Burgess

[~gostep] explains that there are three versions of NetCDF (classic format, 64-bit offset, and netCDF-4/HDF5 format). When opening an existing netCDF file, the netCDF library transparently detects its format, so we do not need to adjust according to the detected format. That said, it would be good to know the file type, as each can have the .nc extension. This issue will add a patch that adds the file type to the metadata.
[jira] [Created] (TIKA-1579) Add file type to NetCDFParser
Ann Burgess created TIKA-1579:
------------------------------

Summary: Add file type to NetCDFParser
Key: TIKA-1579
URL: https://issues.apache.org/jira/browse/TIKA-1579
Project: Tika
Issue Type: Improvement
Components: parser
Reporter: Ann Burgess
Assignee: Ann Burgess

[~gostep] explains that there are three versions of NetCDF (classic format, 64-bit offset, and netCDF-4/HDF5 format). When opening an existing netCDF file, the netCDF library transparently detects its format, so we do not need to adjust according to the detected format. That said, it would be good to know the file type, as each can have the .nc extension. This issue will add a patch that adds the file type to the metadata.
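The three variants named in the description can also be told apart from their leading bytes alone. The following is a minimal sketch of that idea, not the approach taken in the patch attached to this issue (the patch itself is not shown here): classic files begin with the ASCII characters CDF followed by the byte 0x01, 64-bit offset files with CDF followed by 0x02, and netCDF-4 files carry the 8-byte HDF5 signature. The class name NetcdfFormatSniffer and the strings it returns are hypothetical. The sketch assumes the signature sits at offset 0, and an HDF5 signature alone does not prove that a file was written through the netCDF-4 API.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

/** Hypothetical helper (not part of Tika): guesses the netCDF variant from the leading bytes. */
class NetcdfFormatSniffer {

    // First eight bytes of an HDF5 file; netCDF-4 files are stored as HDF5.
    private static final byte[] HDF5_SIGNATURE =
            {(byte) 0x89, 'H', 'D', 'F', '\r', '\n', 0x1a, '\n'};

    /** Classifies a file from its first bytes; returns "unknown" when nothing matches. */
    static String describe(byte[] head) {
        if (head.length >= 8
                && Arrays.equals(Arrays.copyOf(head, 8), HDF5_SIGNATURE)) {
            return "netCDF-4/HDF5";
        }
        if (head.length >= 4 && head[0] == 'C' && head[1] == 'D' && head[2] == 'F') {
            // The fourth byte is the version byte of the classic family.
            if (head[3] == 1) {
                return "netCDF classic";
            }
            if (head[3] == 2) {
                return "netCDF 64-bit offset";
            }
        }
        return "unknown";
    }

    /** Reads up to eight bytes from the stream and classifies them. */
    static String describe(InputStream in) throws IOException {
        byte[] head = new byte[8];
        int filled = 0;
        while (filled < head.length) {
            int n = in.read(head, filled, head.length - filled);
            if (n < 0) {
                break; // stream ended early; classify what we have
            }
            filled += n;
        }
        return describe(Arrays.copyOf(head, filled));
    }
}
```

A caller could pass the result to whatever metadata key the parser settles on; which key that is remains a design choice for the patch.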
[jira] [Updated] (TIKA-1577) NetCDF Data Extraction
[ https://issues.apache.org/jira/browse/TIKA-1577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ann Burgess updated TIKA-1577:
------------------------------

Description:
A netCDF classic or 64-bit offset dataset is stored as a single file comprising two parts:
- a header, containing all the information about dimensions, attributes, and variables except for the variable data;
- a data part, comprising fixed-size data, containing the data for variables that don't have an unlimited dimension; and variable-size data, containing the data for variables that have an unlimited dimension.

The NetCDFParser currently extracts the header part:
-- text extracts file Dimensions and Variables
-- metadata extracts Global Attributes

We want the option to extract the data part of NetCDF files.
Let's use the NetCDF test file for our dev testing:
tika/tika-parsers/src/test/resources/test-documents/sresa1b_ncar_ccsm3_0_run1_21.nc

was:
We want the option to extract data associated with each NetCDF variable.
For our development testing, let's use the NetCDF file:
tika/tika-parsers/src/test/resources/test-documents/sresa1b_ncar_ccsm3_0_run1_21.nc

NetCDF Data Extraction
----------------------

Key: TIKA-1577
URL: https://issues.apache.org/jira/browse/TIKA-1577
Project: Tika
Issue Type: Improvement
Components: handler, parser
Affects Versions: 1.7
Reporter: Ann Burgess
Assignee: Ann Burgess
Labels: features, handler
Original Estimate: 504h
Remaining Estimate: 504h
[jira] [Created] (TIKA-1577) NetCDF Data Extraction
Ann Burgess created TIKA-1577:
------------------------------

Summary: NetCDF Data Extraction
Key: TIKA-1577
URL: https://issues.apache.org/jira/browse/TIKA-1577
Project: Tika
Issue Type: Improvement
Components: handler, parser
Affects Versions: 1.7
Reporter: Ann Burgess
Assignee: Ann Burgess

We want the option to extract data associated with each NetCDF variable.
For our development testing, let's use the NetCDF file:
tika/tika-parsers/src/test/resources/test-documents/sresa1b_ncar_ccsm3_0_run1_21.nc
[jira] [Updated] (TIKA-1423) Build a parser to extract data from GRIB formats
[ https://issues.apache.org/jira/browse/TIKA-1423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ann Burgess updated TIKA-1423:
------------------------------

Attachment: GribParser.java

Build a parser to extract data from GRIB formats
------------------------------------------------

Key: TIKA-1423
URL: https://issues.apache.org/jira/browse/TIKA-1423
Project: Tika
Issue Type: New Feature
Components: metadata, mime, parser
Affects Versions: 1.6
Reporter: Vineet Ghatge
Priority: Critical
Labels: features, newbie
Fix For: 1.7
Attachments: GribParser.java, gdas1.forecmwf.2014062612.grib2

The Arctic dataset contains a MIME format called GRIB - General Regularly-distributed Information in Binary form (http://en.wikipedia.org/wiki/GRIB). GRIB is a well-known, concise data format used in meteorology to store historical and forecast weather data. There are 2 different types of the format: GRIB 0 and GRIB 2. The focus will be on GRIB 2, which is the most prevalent.

Each GRIB record intended for either transmission or storage contains a single parameter with values located at an array of grid points, or represented as a set of spectral coefficients, for a single level (or layer), encoded as a continuous bit stream. Logical divisions of the record are designated as sections, each of which provides control information and/or data.

A GRIB record consists of six sections, two of which are optional:
(0) Indicator Section
(1) Product Definition Section (PDS)
(2) Grid Description Section (GDS) - optional
(3) Bit Map Section (BMS) - optional
(4) Binary Data Section (BDS)
(5) '7777' (ASCII Characters)
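As an illustration of the Indicator Section listed above, here is a minimal sketch that inspects a raw GRIB message with the JDK only. It is not taken from the attached GribParser.java, whose contents are not shown here. It relies on two properties of the format: a message starts with the ASCII characters GRIB, and in editions 1 and 2 the eighth octet of the Indicator Section holds the edition number. It also checks for the closing ASCII marker 7777. The class name GribIndicator and its method names are hypothetical, and the sketch assumes the message starts at offset 0 of the array, which is not guaranteed for files that carry transmission headers.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper (not from the attached parser): inspects a raw GRIB message. */
class GribIndicator {

    /** Returns the edition number from octet 8, or -1 if the bytes do not start with "GRIB". */
    static int edition(byte[] message) {
        if (message.length < 8) {
            return -1;
        }
        String magic = new String(message, 0, 4, StandardCharsets.US_ASCII);
        // Octet 8 is index 7; mask to read the byte as an unsigned value.
        return "GRIB".equals(magic) ? (message[7] & 0xFF) : -1;
    }

    /** Returns true if the last four bytes are the ASCII end marker "7777". */
    static boolean hasEndMarker(byte[] message) {
        if (message.length < 4) {
            return false;
        }
        String tail = new String(message, message.length - 4, 4, StandardCharsets.US_ASCII);
        return "7777".equals(tail);
    }
}
```

A detector built this way could report the edition as metadata before handing the stream to a full decoder.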
[jira] [Updated] (TIKA-1423) Build a parser to extract data from GRIB formats
[ https://issues.apache.org/jira/browse/TIKA-1423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ann Burgess updated TIKA-1423:
------------------------------

Attachment: gdas1.forecmwf.2014062612.grib2

Build a parser to extract data from GRIB formats
------------------------------------------------

Key: TIKA-1423
URL: https://issues.apache.org/jira/browse/TIKA-1423
Project: Tika
Issue Type: New Feature
Components: metadata, mime, parser
Affects Versions: 1.6
Reporter: Vineet Ghatge
Priority: Critical
Labels: features, newbie
Fix For: 1.7
Attachments: GribParser.java, gdas1.forecmwf.2014062612.grib2
[jira] [Commented] (TIKA-1287) Update NetCDF .jar file on Maven Central
[ https://issues.apache.org/jira/browse/TIKA-1287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14079926#comment-14079926 ]

Ann Burgess commented on TIKA-1287:
-----------------------------------

I am picking this issue back up, as I've just finished writing a parser for GRIB files and GRIB support was not added to netcdf-java until version 4.3+. As stated above, Central currently hosts 4.2-min.

I've just been granted deployer rights on Sonatype to stage an upload of netcdf-4.3+ to Maven Central:
https://issues.sonatype.org/browse/CENTRALSRV-82

I've made the bundle jar from the most recent stable release from Unidata at:
https://artifacts.unidata.ucar.edu/content/repositories/unidata-releases/edu/ucar/netcdf/4.3.22/

I will create a separate JIRA for the new GRIB parser, including updating the Tika .pom with the updated netcdf .jar file. Please let me know if you have any thoughts/insights about updating 3rd party jar files such as this.

Update NetCDF .jar file on Maven Central
----------------------------------------

Key: TIKA-1287
URL: https://issues.apache.org/jira/browse/TIKA-1287
Project: Tika
Issue Type: Improvement
Components: parser
Affects Versions: 1.5
Reporter: Ann Burgess
Labels: jar, maven, netcdf, tika, unit-test, update

I am working to update the NetCDFParser file. When using the most-recent .jar file available from http://www.unidata.ucar.edu/ at the command line, I receive a note about a deprecated API:

javac -classpath ../../../../tika-core/target/tika-core-1.6-SNAPSHOT.jar:../../../../toolsUI-4.3.jar org/apache/tika/parser/netcdf/NetCDFParser.java
Note: org/apache/tika/parser/netcdf/NetCDFParser.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.

After updating the NetCDFParser file with non-deprecated methods (e.g. changing dimension.getName() to dimension.getFullName()), however, I get failed unit tests in Maven, which I assume is because the Maven Central Repo has the lapsed version of the .jar file needed for NetCDF files ( http://search.maven.org/#search%7Cgav%7C1%7Cg%3A%22edu.ucar%22%20AND%20a%3A%22netcdf%22 ).

Can anyone provide insight into how I get the updated .jar file into the Maven Central Repository? Is there an alternative method to update Tika so I can run my unit tests in Maven?

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Commented] (TIKA-1363) .mat files not parsing
[ https://issues.apache.org/jira/browse/TIKA-1363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14061323#comment-14061323 ]

Ann Burgess commented on TIKA-1363:
-----------------------------------

Just pulled the most recent Tika and I'm still not getting text from the Matlab parser:

$ svn co http://svn.apache.org/repos/asf/tika/trunk tika
$ mvn install
$ java -jar tika-app/target/tika-app-1.6-SNAPSHOT.jar --text /Users/IGSWAHWSWBURGESS/Development/tika/tika-parsers/src/test/resources/test-documents/test_mat_text.mat
$

It does seem like the mime-type is recognized:

$ java -classpath annie-parsers.jar:tika-app/target/tika-app-1.6-SNAPSHOT.jar org.apache.tika.cli.TikaCLI --detect /Users/IGSWAHWSWBURGESS/Development/tika/tika-parsers/src/test/resources/test-documents/breidamerkurjokull_radar_profiles_2009.mat
$ application/x-matlab-data

Tyler, did you integrate the patch and get -t and -m output? Want to make sure I'm not missing a step.

(In reply to Chris A. Mattmann (JIRA), Mon, Jul 14, 2014 at 1:16 PM.)

.mat files not parsing
----------------------

Key: TIKA-1363
URL: https://issues.apache.org/jira/browse/TIKA-1363
Project: Tika
Issue Type: Bug
Components: parser
Affects Versions: 1.6
Reporter: Ann Burgess
Labels: metadata, parser, snapshot
Attachments: test_data_1.mat

We recently committed a parser for Matlab .mat files; however, I've just downloaded the most recent Tika and am not getting any parsed --text or --metadata for the .mat file used in the unit test. The steps I've used are below. Am I missing something at the command line? Can anyone else successfully get a text or metadata output for a .mat file?

Steps:
svn co https://svn.apache.org/repos/asf/tika/trunk tika
setenv MAVEN_OPTS "-Xms128m -Xmx256m"
cd tika
mvn install
java -jar tika-app/target/tika-app-1.6-SNAPSHOT.jar --text /Users/IGSWAHWSWBURGESS/Development/tika/tika-parsers/src/test/resources/test-documents/breidamerkurjokull_radar_profiles_2009.mat
[jira] [Updated] (TIKA-1363) .mat files not parsing
[ https://issues.apache.org/jira/browse/TIKA-1363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ann Burgess updated TIKA-1363:
------------------------------

Attachment: test_data_1.mat

.mat files not parsing
----------------------

Key: TIKA-1363
URL: https://issues.apache.org/jira/browse/TIKA-1363
Project: Tika
Issue Type: Bug
Components: parser
Affects Versions: 1.6
Reporter: Ann Burgess
Labels: metadata, parser, snapshot
Attachments: test_data_1.mat
[jira] [Commented] (TIKA-1363) .mat files not parsing
[ https://issues.apache.org/jira/browse/TIKA-1363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14056476#comment-14056476 ]

Ann Burgess commented on TIKA-1363:
-----------------------------------

Hi Tyler,

Attached is a very simple .mat file.

Annie

.mat files not parsing
----------------------

Key: TIKA-1363
URL: https://issues.apache.org/jira/browse/TIKA-1363
Project: Tika
Issue Type: Bug
Components: parser
Affects Versions: 1.6
Reporter: Ann Burgess
Labels: metadata, parser, snapshot
Attachments: test_data_1.mat
[jira] [Commented] (TIKA-1363) .mat files not parsing
[ https://issues.apache.org/jira/browse/TIKA-1363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14056560#comment-14056560 ]

Ann Burgess commented on TIKA-1363:
-----------------------------------

That is it. Very simple, so the text output should be just as you said:

double: [2x2 double array]

(In reply to Tyler Palsulich (JIRA), Wed, Jul 9, 2014 at 9:49 AM.)

.mat files not parsing
----------------------

Key: TIKA-1363
URL: https://issues.apache.org/jira/browse/TIKA-1363
Project: Tika
Issue Type: Bug
Components: parser
Affects Versions: 1.6
Reporter: Ann Burgess
Labels: metadata, parser, snapshot
Attachments: test_data_1.mat
[jira] [Updated] (TIKA-1357) Buffered text in EnviHeaderParser
[ https://issues.apache.org/jira/browse/TIKA-1357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ann Burgess updated TIKA-1357:
------------------------------

Attachment: TIKA-1357.aburgess.140630.patch.txt

Patch to add line-by-line <p> tags to ENVI header output.

Buffered text in EnviHeaderParser
---------------------------------

Key: TIKA-1357
URL: https://issues.apache.org/jira/browse/TIKA-1357
Project: Tika
Issue Type: Improvement
Components: parser
Affects Versions: 1.6
Reporter: Ann Burgess
Priority: Minor
Labels: parser
Attachments: TIKA-1357.aburgess.140630.patch.txt

Use BufferedReader to insert line-by-line <p> tags when parsing ENVI headers, per reviewer comment:
https://reviews.apache.org/r/22892/#comment81964
[jira] [Comment Edited] (TIKA-1357) Buffered text in EnviHeaderParser
[ https://issues.apache.org/jira/browse/TIKA-1357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14047796#comment-14047796 ]

Ann Burgess edited comment on TIKA-1357 at 6/30/14 4:16 PM:
------------------------------------------------------------

Patch to add line-by-line <p> tags to ENVI header output. Unit test remains a success with the added tags.

was (Author: annieburgess):
Patch to add line-by-line <p> tags to ENVI header output.

Buffered text in EnviHeaderParser
---------------------------------

Key: TIKA-1357
URL: https://issues.apache.org/jira/browse/TIKA-1357
Project: Tika
Issue Type: Improvement
Components: parser
Affects Versions: 1.6
Reporter: Ann Burgess
Priority: Minor
Labels: parser
Attachments: TIKA-1357.aburgess.140630.patch.txt
[jira] [Commented] (TIKA-1357) Buffered text in EnviHeaderParser
[ https://issues.apache.org/jira/browse/TIKA-1357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046627#comment-14046627 ]

Ann Burgess commented on TIKA-1357:
-----------------------------------

That bit of code certainly works, Tyler; the new -x output reads:

<head>
<meta name="Content-Length" content="818"/>
<meta name="Content-Encoding" content="ISO-8859-1"/>
<meta name="Content-Type" content="application/envi.hdr"/>
<meta name="resourceName" content="envi_test_header.hdr"/>
<title/>
</head>
<body><p>ENVI</p>
<p>description = {</p>
<p> GEO-TIFF File Imported into ENVI [Fri May 25 14:06:23 2012]}</p>
<p>samples = 2400</p>
<p>lines = 2400</p>
<p>bands = 7</p>
<p>header offset = 0</p>
<p>file type = ENVI Standard</p>
<p>data type = 2</p>
<p>interleave = bip</p>
<p>sensor type = Unknown</p>
<p>byte order = 0</p>
<p>map info = {Sinusoidal, 1.5000, 1.5000, -10007091.3643, 5559289.2856, 4.6331271653e+02, 4.6331271653e+02, , units=Meters}</p>
<p>projection info = {16, 6371007.2, 0.00, 0.0, 0.0, Sinusoidal, units=Meters}</p>
<p>coordinate system string = {PROJCS[Sinusoidal,GEOGCS[GCS_ELLIPSE_BASED_1,DATUM[D_ELLIPSE_BASED_1,SPHEROID[S_ELLIPSE_BASED_1,6371007.181,0.0]],PRIMEM[Greenwich,0.0],UNIT[Degree,0.0174532925199433]],PROJECTION[Sinusoidal],PARAMETER[False_Easting,0.0],PARAMETER[False_Northing,0.0],PARAMETER[Central_Meridian,0.0],UNIT[Meter,1.0]]}</p>
<p>wavelength units = Unknown</p>

Is this what you were aiming for, Nick? If so, I'll create a patch.

(In reply to Tyler Palsulich (JIRA), Fri, Jun 27, 2014 at 8:56 AM.)

Buffered text in EnviHeaderParser
---------------------------------

Key: TIKA-1357
URL: https://issues.apache.org/jira/browse/TIKA-1357
Project: Tika
Issue Type: Improvement
Components: parser
Affects Versions: 1.6
Reporter: Ann Burgess
Priority: Minor
Labels: parser
[jira] [Created] (TIKA-1357) Buffered text in EnviHeaderParser
Ann Burgess created TIKA-1357:
------------------------------

Summary: Buffered text in EnviHeaderParser
Key: TIKA-1357
URL: https://issues.apache.org/jira/browse/TIKA-1357
Project: Tika
Issue Type: Improvement
Components: parser
Affects Versions: 1.6
Reporter: Ann Burgess
Priority: Minor

Use BufferedReader to insert line-by-line <p> tags when parsing ENVI headers, per reviewer comment:
https://reviews.apache.org/r/22892/#comment81964
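The line-by-line approach requested here can be sketched with the JDK alone. This is only an illustration under stated assumptions, not the code from the patch or from the review: the class name ParagraphWrapper and the method wrapLines are hypothetical, the header is passed in as a String for simplicity, and the paragraphs are collected in a StringBuilder, whereas the parser itself would presumably emit them through its XHTML content handler.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

/** Hypothetical illustration (not the patch): wraps each header line in a p element. */
class ParagraphWrapper {

    /** Reads the text line by line and returns one escaped p element per line. */
    static String wrapLines(String headerText) {
        StringBuilder out = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new StringReader(headerText))) {
            String line;
            while ((line = reader.readLine()) != null) {
                out.append("<p>").append(escape(line)).append("</p>\n");
            }
        } catch (IOException e) {
            // Not expected for an in-memory reader; rethrown unchecked to keep the sketch short.
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    /** Escapes the characters that would otherwise break the markup. */
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

Reading with readLine() keeps each "key = value" entry of the header in its own paragraph, at the cost of splitting multi-line values such as a braced description across several paragraphs.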
[jira] [Commented] (TIKA-1327) New parser for Matlab .mat files
[ https://issues.apache.org/jira/browse/TIKA-1327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14026821#comment-14026821 ]

Ann Burgess commented on TIKA-1327:
-----------------------------------

The .mat unit test file is too large for JIRA; it is attached on Review Board here:
https://reviews.apache.org/media/uploaded/files/2014/06/10/43092452-6890-42cc-8254-fcbb1c8e07c6__breidamerkurjokull_radar_profiles_2009.mat

New parser for Matlab .mat files
--------------------------------

Key: TIKA-1327
URL: https://issues.apache.org/jira/browse/TIKA-1327
Project: Tika
Issue Type: Improvement
Components: parser
Affects Versions: 1.5
Reporter: Ann Burgess
Assignee: Chris A. Mattmann
Labels: parser

New parser for Matlab .mat files.
[jira] [Commented] (TIKA-1327) New parser for Matlab .mat files
[ https://issues.apache.org/jira/browse/TIKA-1327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14020129#comment-14020129 ]

Ann Burgess commented on TIKA-1327:
-----------------------------------

Code posted on Review Board at: https://reviews.apache.org/r/22246/

New parser for Matlab .mat files
--------------------------------

Key: TIKA-1327
URL: https://issues.apache.org/jira/browse/TIKA-1327
Project: Tika
Issue Type: Improvement
Components: parser
Affects Versions: 1.5
Reporter: Ann Burgess
Labels: parser

New parser for Matlab .mat files.
[jira] [Commented] (TIKA-1287) Update NetCDF .jar file on Maven Central
[ https://issues.apache.org/jira/browse/TIKA-1287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13992944#comment-13992944 ] Ann Burgess commented on TIKA-1287: --- Message from John Caron at Unidata: Hi Annie: We find it difficult to keep maven central updated, and are maintaining our own maven server here: https://artifacts.unidata.ucar.edu/content/repositories/unidata-releases/edu/ucar/ Is that sufficient for your project? John Update NetCDF .jar file on Maven Central Key: TIKA-1287 URL: https://issues.apache.org/jira/browse/TIKA-1287 Project: Tika Issue Type: Improvement Affects Versions: 1.5 Reporter: Ann Burgess Labels: jar, maven, netcdf, tika, unit-test, update I am working to update the NetCDFParser file. When compiling at the command line against the most recent .jar file available from http://www.unidata.ucar.edu/, I receive a note about a deprecated API: javac -classpath ../../../../tika-core/target/tika-core-1.6-SNAPSHOT.jar:../../../../toolsUI-4.3.jar org/apache/tika/parser/netcdf/NetCDFParser.java Note: org/apache/tika/parser/netcdf/NetCDFParser.java uses or overrides a deprecated API. Note: Recompile with -Xlint:deprecation for details. After updating the NetCDFParser file to use non-deprecated methods (e.g. changing dimension.getName() to dimension.getFullName()), however, the unit tests fail in Maven, which I assume is because the Maven Central Repo has an outdated version of the .jar file needed for NetCDF files ( http://search.maven.org/#search%7Cgav%7C1%7Cg%3A%22edu.ucar%22%20AND%20a%3A%22netcdf%22) . Can anyone provide insight into how I get the updated .jar file into the Maven Central Repository? Is there an alternative method to update Tika so I can run my unit tests in Maven? -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (TIKA-1287) Update NetCDF .jar file on Maven Central
[ https://issues.apache.org/jira/browse/TIKA-1287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13991117#comment-13991117 ] Ann Burgess commented on TIKA-1287: --- I have modified my updates to the NetCDFParser to work with the current version of NetCDF on Maven Central. Once the new NetCDF version is on Maven Central, I will update the code accordingly. I have created a NetCDFParserPatch.patch file for review, but will start a new JIRA issue for that. Update NetCDF .jar file on Maven Central Key: TIKA-1287 URL: https://issues.apache.org/jira/browse/TIKA-1287 Project: Tika Issue Type: Improvement Affects Versions: 1.5 Reporter: Ann Burgess Labels: jar, maven, netcdf, tika, unit-test, update I am working to update the NetCDFParser file. When compiling at the command line against the most recent .jar file available from http://www.unidata.ucar.edu/, I receive a note about a deprecated API: javac -classpath ../../../../tika-core/target/tika-core-1.6-SNAPSHOT.jar:../../../../toolsUI-4.3.jar org/apache/tika/parser/netcdf/NetCDFParser.java Note: org/apache/tika/parser/netcdf/NetCDFParser.java uses or overrides a deprecated API. Note: Recompile with -Xlint:deprecation for details. After updating the NetCDFParser file to use non-deprecated methods (e.g. changing dimension.getName() to dimension.getFullName()), however, the unit tests fail in Maven, which I assume is because the Maven Central Repo has an outdated version of the .jar file needed for NetCDF files ( http://search.maven.org/#search%7Cgav%7C1%7Cg%3A%22edu.ucar%22%20AND%20a%3A%22netcdf%22) . Can anyone provide insight into how I get the updated .jar file into the Maven Central Repository? Is there an alternative method to update Tika so I can run my unit tests in Maven? -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (TIKA-1265) Text parsing support for NetCDF
[ https://issues.apache.org/jira/browse/TIKA-1265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ann Burgess updated TIKA-1265: -- Attachment: NetCDFParserPatch.patch Text parsing support for NetCDF --- Key: TIKA-1265 URL: https://issues.apache.org/jira/browse/TIKA-1265 Project: Tika Issue Type: Improvement Components: parser Reporter: Ann Burgess Labels: patch Attachments: NetCDFParserPatch.patch Original Estimate: 672h Remaining Estimate: 672h Currently Tika extracts -metadata information from NetCDF files. We are working on a patch that will enable -text extraction, thus providing the 'Dimension' and 'Variable' information. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (TIKA-1265) [patch] Text output for NetCDF
[ https://issues.apache.org/jira/browse/TIKA-1265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ann Burgess updated TIKA-1265: -- Summary: [patch] Text output for NetCDF (was: Text parsing support for NetCDF) [patch] Text output for NetCDF -- Key: TIKA-1265 URL: https://issues.apache.org/jira/browse/TIKA-1265 Project: Tika Issue Type: Improvement Components: parser Reporter: Ann Burgess Labels: patch Attachments: NetCDFParserPatch.patch Original Estimate: 672h Remaining Estimate: 672h Currently Tika extracts -metadata information from NetCDF files. We are working on a patch that will enable -text extraction, thus providing the 'Dimension' and 'Variable' information. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (TIKA-1265) [patch] Text output for NetCDF
[ https://issues.apache.org/jira/browse/TIKA-1265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13991162#comment-13991162 ] Ann Burgess commented on TIKA-1265: --- This patch updates the NetCDFParser to provide 'Dimension' and 'Variable' information as --text output for NetCDF files. Additionally, the patch updates NetCDFParserTest to test the new text output. To test the new parser and create the patch, I followed the steps at: https://wiki.apache.org/nutch/HowToContribute . Please let me know if I've missed any steps along the way in the process to get this committed. The .patch file is attached. [patch] Text output for NetCDF -- Key: TIKA-1265 URL: https://issues.apache.org/jira/browse/TIKA-1265 Project: Tika Issue Type: Improvement Components: parser Reporter: Ann Burgess Labels: patch Attachments: NetCDFParserPatch.patch Original Estimate: 672h Remaining Estimate: 672h Currently Tika extracts -metadata information from NetCDF files. We are working on a patch that will enable -text extraction, thus providing the 'Dimension' and 'Variable' information. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (TIKA-1287) Update NetCDF .jar file on Maven Central
Ann Burgess created TIKA-1287: - Summary: Update NetCDF .jar file on Maven Central Key: TIKA-1287 URL: https://issues.apache.org/jira/browse/TIKA-1287 Project: Tika Issue Type: Bug Affects Versions: 1.5 Reporter: Ann Burgess I am working to update the NetCDFParser file. When compiling at the command line against the most recent .jar file available from http://www.unidata.ucar.edu/, I receive a note about a deprecated API: javac -classpath ../../../../tika-core/target/tika-core-1.6-SNAPSHOT.jar:../../../../toolsUI-4.3.jar org/apache/tika/parser/netcdf/NetCDFParser.java Note: org/apache/tika/parser/netcdf/NetCDFParser.java uses or overrides a deprecated API. Note: Recompile with -Xlint:deprecation for details. After updating the NetCDFParser file to use non-deprecated methods (e.g. changing dimension.getName() to dimension.getFullName()), however, the unit tests fail in Maven, which I assume is because the Maven Central Repo has an outdated version of the .jar file needed for NetCDF files ( http://search.maven.org/#search%7Cgav%7C1%7Cg%3A%22edu.ucar%22%20AND%20a%3A%22netcdf%22) . Can anyone provide insight into how I get the updated .jar file into the Maven Central Repository? Is there an alternative method to update Tika so I can run my unit tests in Maven? -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Issue Comment Deleted] (TIKA-1274) ENVI header parser
[ https://issues.apache.org/jira/browse/TIKA-1274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ann Burgess updated TIKA-1274: -- Comment: was deleted (was: Hey Chris, How is your week looking? Want to set a time to do a chat? I'm actually home sick today, out with a nasty cold that started yesterday. Later in the week might work best, so I'm lucid. AB On Mon, Apr 21, 2014 at 1:39 PM, Chris A. Mattmann (JIRA) -- -- Ann Bryant Burgess, PhD Postdoctoral Fellow Computer Science Department University of Southern California Viterbi School of Engineering Los Angeles, CA Alaska Science Center/USGS Anchorage, AK Cell: (585) 738-7549 Office: (907) 786-7059 Fax: (907) 786-7150 E-mail: anniebryant.burg...@gmail.com Office Address: 4210 University Dr., Anchorage, AK 99508-4626 --- ) ENVI header parser -- Key: TIKA-1274 URL: https://issues.apache.org/jira/browse/TIKA-1274 Project: Tika Issue Type: New Feature Components: parser Affects Versions: 1.5 Reporter: Ann Burgess Assignee: Chris A. 
Mattmann Labels: mime, newbie, parser, patch I have written a parser that extracts text and metadata from ENVI header files, currently called at the command line as: abryant:tika abryant$ java -classpath annie-envi-parser.jar:tika-app/target/tika-app-1.6-SNAPSHOT.jar org.apache.tika.cli.TikaCLI --metadata MOD09GA_test_header.hdr Content-Encoding: ISO-8859-1 Content-Length: 818 Content-Type: application/envi.hdr resourceName: MOD09GA_test_header.hdr abryant:tika abryant$ java -classpath annie-envi-parser.jar:tika-app/target/tika-app-1.6-SNAPSHOT.jar org.apache.tika.cli.TikaCLI --text MOD09GA_test_header.hdr ENVI description = { GEO-TIFF File Imported into ENVI [Fri May 25 14:06:23 2012]} samples = 2400 lines = 2400 bands = 7 header offset = 0 file type = ENVI Standard data type = 2 interleave = bip sensor type = Unknown byte order = 0 map info = {Sinusoidal, 1.5000, 1.5000, -10007091.3643, 5559289.2856, 4.6331271653e+02, 4.6331271653e+02, , units=Meters} projection info = {16, 6371007.2, 0.00, 0.0, 0.0, Sinusoidal, units=Meters} coordinate system string = {PROJCS[Sinusoidal,GEOGCS[GCS_ELLIPSE_BASED_1,DATUM[D_ELLIPSE_BASED_1,SPHEROID[S_ELLIPSE_BASED_1,6371007.181,0.0]],PRIMEM[Greenwich,0.0],UNIT[Degree,0.0174532925199433]],PROJECTION[Sinusoidal],PARAMETER[False_Easting,0.0],PARAMETER[False_Northing,0.0],PARAMETER[Central_Meridian,0.0],UNIT[Meter,1.0]]} wavelength units = Unknown __ As a current non-certified committer, could someone enlighten me to the steps needed to submit this new parser for review. The parser is located in my directory structure as: /users/annbryant/tika/tika/anniedev/src/main/java/edu/usc/sunset/abburgess/tika/EnviFileReader.class My custom mimetypes.xml file is located at: /Users/annbryant/TIKA/tika/anniedev/src/main/resources/org/apache/tika/mime/custom-mimetypes.xml -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (TIKA-1274) ENVI header parser
[ https://issues.apache.org/jira/browse/TIKA-1274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13983500#comment-13983500 ] Ann Burgess commented on TIKA-1274: --- I've got the EnviHeaderParser and EnviHeaderParserTest (unit test) files now on github: https://github.com/abburgess/ENVIJava I've run the unit test successfully in maven. If this looks good, I will create a patch for review. ENVI header parser -- Key: TIKA-1274 URL: https://issues.apache.org/jira/browse/TIKA-1274 Project: Tika Issue Type: New Feature Components: parser Affects Versions: 1.5 Reporter: Ann Burgess Assignee: Chris A. Mattmann Labels: mime, newbie, parser, patch I have written a parser that extracts text and metadata from ENVI header files, currently called at the command line as: abryant:tika abryant$ java -classpath annie-envi-parser.jar:tika-app/target/tika-app-1.6-SNAPSHOT.jar org.apache.tika.cli.TikaCLI --metadata MOD09GA_test_header.hdr Content-Encoding: ISO-8859-1 Content-Length: 818 Content-Type: application/envi.hdr resourceName: MOD09GA_test_header.hdr abryant:tika abryant$ java -classpath annie-envi-parser.jar:tika-app/target/tika-app-1.6-SNAPSHOT.jar org.apache.tika.cli.TikaCLI --text MOD09GA_test_header.hdr ENVI description = { GEO-TIFF File Imported into ENVI [Fri May 25 14:06:23 2012]} samples = 2400 lines = 2400 bands = 7 header offset = 0 file type = ENVI Standard data type = 2 interleave = bip sensor type = Unknown byte order = 0 map info = {Sinusoidal, 1.5000, 1.5000, -10007091.3643, 5559289.2856, 4.6331271653e+02, 4.6331271653e+02, , units=Meters} projection info = {16, 6371007.2, 0.00, 0.0, 0.0, Sinusoidal, units=Meters} coordinate system string = 
{PROJCS[Sinusoidal,GEOGCS[GCS_ELLIPSE_BASED_1,DATUM[D_ELLIPSE_BASED_1,SPHEROID[S_ELLIPSE_BASED_1,6371007.181,0.0]],PRIMEM[Greenwich,0.0],UNIT[Degree,0.0174532925199433]],PROJECTION[Sinusoidal],PARAMETER[False_Easting,0.0],PARAMETER[False_Northing,0.0],PARAMETER[Central_Meridian,0.0],UNIT[Meter,1.0]]} wavelength units = Unknown __ As a current non-certified committer, could someone enlighten me to the steps needed to submit this new parser for review. The parser is located in my directory structure as: /users/annbryant/tika/tika/anniedev/src/main/java/edu/usc/sunset/abburgess/tika/EnviFileReader.class My custom mimetypes.xml file is located at: /Users/annbryant/TIKA/tika/anniedev/src/main/resources/org/apache/tika/mime/custom-mimetypes.xml -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (TIKA-1274) ENVI header parser
[ https://issues.apache.org/jira/browse/TIKA-1274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13983597#comment-13983597 ] Ann Burgess commented on TIKA-1274: --- Hi Nick, Thank you for the git repo tips. I added the 'target' directory and I was mimicking the directory structure of the tika build - consider it removed. On that note, I'd appreciate any documentation on the dos and don'ts of building a git repo for Tika or other Apache projects... if such documentation exists. As for the file contents, ENVI header files (http://www.exelisvis.com/docs/ENVIHeaderFiles.html) are plain text documents. The contents of the ENVI header files are, in fact, metadata for a corresponding data file, i.e. to read a file named some_file.img, it requires the corresponding file some_file.img.hdr. In other words, because the entire contents of a some_file.img.hdr file is metadata for some_file.img, the actual contents of the some_file.img.hdr file do NOT describe the .hdr file itself, rather they describe the .img file. That is why I didn't think it appropriate to move parts of the 'raw content' into metadata. Does that make sense? I'm also very open to how this sort of thing is normally treated or to open a conversation about the topic of how to treat one file type describing another file type. Thanks for the input and any further suggestions. -- -- Ann Bryant Burgess, PhD Postdoctoral Fellow Computer Science Department University of Southern California Viterbi School of Engineering Los Angeles, CA Alaska Science Center/USGS Anchorage, AK Cell: (585) 738-7549 Office: (907) 786-7059 Fax: (907) 786-7150 E-mail: anniebryant.burg...@gmail.com Office Address: 4210 University Dr., Anchorage, AK 99508-4626 --- ENVI header parser -- Key: TIKA-1274 URL: https://issues.apache.org/jira/browse/TIKA-1274 Project: Tika Issue Type: New Feature Components: parser Affects Versions: 1.5 Reporter: Ann Burgess Assignee: Chris A. 
Mattmann Labels: mime, newbie, parser, patch I have written a parser that extracts text and metadata from ENVI header files, currently called at the command line as: abryant:tika abryant$ java -classpath annie-envi-parser.jar:tika-app/target/tika-app-1.6-SNAPSHOT.jar org.apache.tika.cli.TikaCLI --metadata MOD09GA_test_header.hdr Content-Encoding: ISO-8859-1 Content-Length: 818 Content-Type: application/envi.hdr resourceName: MOD09GA_test_header.hdr abryant:tika abryant$ java -classpath annie-envi-parser.jar:tika-app/target/tika-app-1.6-SNAPSHOT.jar org.apache.tika.cli.TikaCLI --text MOD09GA_test_header.hdr ENVI description = { GEO-TIFF File Imported into ENVI [Fri May 25 14:06:23 2012]} samples = 2400 lines = 2400 bands = 7 header offset = 0 file type = ENVI Standard data type = 2 interleave = bip sensor type = Unknown byte order = 0 map info = {Sinusoidal, 1.5000, 1.5000, -10007091.3643, 5559289.2856, 4.6331271653e+02, 4.6331271653e+02, , units=Meters} projection info = {16, 6371007.2, 0.00, 0.0, 0.0, Sinusoidal, units=Meters} coordinate system string = {PROJCS[Sinusoidal,GEOGCS[GCS_ELLIPSE_BASED_1,DATUM[D_ELLIPSE_BASED_1,SPHEROID[S_ELLIPSE_BASED_1,6371007.181,0.0]],PRIMEM[Greenwich,0.0],UNIT[Degree,0.0174532925199433]],PROJECTION[Sinusoidal],PARAMETER[False_Easting,0.0],PARAMETER[False_Northing,0.0],PARAMETER[Central_Meridian,0.0],UNIT[Meter,1.0]]} wavelength units = Unknown __ As a current non-certified committer, could someone enlighten me to the steps needed to submit this new parser for review. The parser is located in my directory structure as: /users/annbryant/tika/tika/anniedev/src/main/java/edu/usc/sunset/abburgess/tika/EnviFileReader.class My custom mimetypes.xml file is located at: /Users/annbryant/TIKA/tika/anniedev/src/main/resources/org/apache/tika/mime/custom-mimetypes.xml -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (TIKA-1274) ENVI header parser
[ https://issues.apache.org/jira/browse/TIKA-1274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13983597#comment-13983597 ] Ann Burgess edited comment on TIKA-1274 at 4/28/14 11:10 PM: - Hi Nick, Thank you for the git repo tips. I added the 'target' directory and I was mimicking the directory structure of the tika build - consider it removed. On that note, I'd appreciate any documentation on the dos and don'ts of building a git repo for Tika or other Apache projects... if such documentation exists. As for the file contents, ENVI header files (http://www.exelisvis.com/docs/ENVIHeaderFiles.html) are plain text documents. The contents of the ENVI header files are, in fact, metadata for a corresponding data file, i.e. to read a file named some_file.img, it requires the corresponding file some_file.img.hdr. In other words, because the entire contents of a some_file.img.hdr file is metadata for some_file.img, the actual contents of the some_file.img.hdr file do NOT describe the .hdr file itself, rather they describe the .img file. That is why I didn't think it appropriate to move parts of the 'raw content' into metadata. Does that make sense? I'm also very open to how this sort of thing is normally treated or to open a conversation about the topic of how to treat one file type describing another file type. Thanks for the input and any further suggestions. was (Author: annieburgess): Hi Nick, Thank you for the git repo tips. I added the 'target' directory and I was mimicking the directory structure of the tika build - consider it removed. On that note, I'd appreciate any documentation on the dos and don'ts of building a git repo for Tika or other Apache projects... if such documentation exists. As for the file contents, ENVI header files (http://www.exelisvis.com/docs/ENVIHeaderFiles.html) are plain text documents. The contents of the ENVI header files are, in fact, metadata for a corresponding data file, i.e. 
to read a file named some_file.img, it requires the corresponding file some_file.img.hdr. In other words, because the entire contents of a some_file.img.hdr file is metadata for some_file.img, the actual contents of the some_file.img.hdr file do NOT describe the .hdr file itself, rather they describe the .img file. That is why I didn't think it appropriate to move parts of the 'raw content' into metadata. Does that make sense? I'm also very open to how this sort of thing is normally treated or to open a conversation about the topic of how to treat one file type describing another file type. Thanks for the input and any further suggestions. -- -- Ann Bryant Burgess, PhD Postdoctoral Fellow Computer Science Department University of Southern California Viterbi School of Engineering Los Angeles, CA Alaska Science Center/USGS Anchorage, AK Cell: (585) 738-7549 Office: (907) 786-7059 Fax: (907) 786-7150 E-mail: anniebryant.burg...@gmail.com Office Address: 4210 University Dr., Anchorage, AK 99508-4626 --- ENVI header parser -- Key: TIKA-1274 URL: https://issues.apache.org/jira/browse/TIKA-1274 Project: Tika Issue Type: New Feature Components: parser Affects Versions: 1.5 Reporter: Ann Burgess Assignee: Chris A. 
Mattmann Labels: mime, newbie, parser, patch I have written a parser that extracts text and metadata from ENVI header files, currently called at the command line as: abryant:tika abryant$ java -classpath annie-envi-parser.jar:tika-app/target/tika-app-1.6-SNAPSHOT.jar org.apache.tika.cli.TikaCLI --metadata MOD09GA_test_header.hdr Content-Encoding: ISO-8859-1 Content-Length: 818 Content-Type: application/envi.hdr resourceName: MOD09GA_test_header.hdr abryant:tika abryant$ java -classpath annie-envi-parser.jar:tika-app/target/tika-app-1.6-SNAPSHOT.jar org.apache.tika.cli.TikaCLI --text MOD09GA_test_header.hdr ENVI description = { GEO-TIFF File Imported into ENVI [Fri May 25 14:06:23 2012]} samples = 2400 lines = 2400 bands = 7 header offset = 0 file type = ENVI Standard data type = 2 interleave = bip sensor type = Unknown byte order = 0 map info = {Sinusoidal, 1.5000, 1.5000, -10007091.3643, 5559289.2856, 4.6331271653e+02, 4.6331271653e+02, , units=Meters} projection info = {16, 6371007.2, 0.00, 0.0, 0.0, Sinusoidal, units=Meters} coordinate system string =
[jira] [Commented] (TIKA-1274) ENVI header parser
[ https://issues.apache.org/jira/browse/TIKA-1274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13976107#comment-13976107 ] Ann Burgess commented on TIKA-1274: --- Hey Chris, How is your week looking? Want to set a time to do a chat? I'm actually home sick today, out with a nasty cold that started yesterday. Later in the week might work best, so I'm lucid. AB On Mon, Apr 21, 2014 at 1:39 PM, Chris A. Mattmann (JIRA) -- -- Ann Bryant Burgess, PhD Postdoctoral Fellow Computer Science Department University of Southern California Viterbi School of Engineering Los Angeles, CA Alaska Science Center/USGS Anchorage, AK Cell: (585) 738-7549 Office: (907) 786-7059 Fax: (907) 786-7150 E-mail: anniebryant.burg...@gmail.com Office Address: 4210 University Dr., Anchorage, AK 99508-4626 --- ENVI header parser -- Key: TIKA-1274 URL: https://issues.apache.org/jira/browse/TIKA-1274 Project: Tika Issue Type: New Feature Components: parser Affects Versions: 1.5 Reporter: Ann Burgess Assignee: Chris A. 
Mattmann Labels: mime, newbie, parser, patch I have written a parser that extracts text and metadata from ENVI header files, currently called at the command line as: abryant:tika abryant$ java -classpath annie-envi-parser.jar:tika-app/target/tika-app-1.6-SNAPSHOT.jar org.apache.tika.cli.TikaCLI --metadata MOD09GA_test_header.hdr Content-Encoding: ISO-8859-1 Content-Length: 818 Content-Type: application/envi.hdr resourceName: MOD09GA_test_header.hdr abryant:tika abryant$ java -classpath annie-envi-parser.jar:tika-app/target/tika-app-1.6-SNAPSHOT.jar org.apache.tika.cli.TikaCLI --text MOD09GA_test_header.hdr ENVI description = { GEO-TIFF File Imported into ENVI [Fri May 25 14:06:23 2012]} samples = 2400 lines = 2400 bands = 7 header offset = 0 file type = ENVI Standard data type = 2 interleave = bip sensor type = Unknown byte order = 0 map info = {Sinusoidal, 1.5000, 1.5000, -10007091.3643, 5559289.2856, 4.6331271653e+02, 4.6331271653e+02, , units=Meters} projection info = {16, 6371007.2, 0.00, 0.0, 0.0, Sinusoidal, units=Meters} coordinate system string = {PROJCS[Sinusoidal,GEOGCS[GCS_ELLIPSE_BASED_1,DATUM[D_ELLIPSE_BASED_1,SPHEROID[S_ELLIPSE_BASED_1,6371007.181,0.0]],PRIMEM[Greenwich,0.0],UNIT[Degree,0.0174532925199433]],PROJECTION[Sinusoidal],PARAMETER[False_Easting,0.0],PARAMETER[False_Northing,0.0],PARAMETER[Central_Meridian,0.0],UNIT[Meter,1.0]]} wavelength units = Unknown __ As a current non-certified committer, could someone enlighten me to the steps needed to submit this new parser for review. The parser is located in my directory structure as: /users/annbryant/tika/tika/anniedev/src/main/java/edu/usc/sunset/abburgess/tika/EnviFileReader.class My custom mimetypes.xml file is located at: /Users/annbryant/TIKA/tika/anniedev/src/main/resources/org/apache/tika/mime/custom-mimetypes.xml -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (TIKA-1274) ENVI header parser
Ann Burgess created TIKA-1274: - Summary: ENVI header parser Key: TIKA-1274 URL: https://issues.apache.org/jira/browse/TIKA-1274 Project: Tika Issue Type: New Feature Components: parser Affects Versions: 1.5 Reporter: Ann Burgess I have written a parser that extracts text and metadata from ENVI header files, currently called at the command line as: abryant:tika abryant$ java -classpath annie-envi-parser.jar:tika-app/target/tika-app-1.6-SNAPSHOT.jar org.apache.tika.cli.TikaCLI --metadata MOD09GA_test_header.hdr Content-Encoding: ISO-8859-1 Content-Length: 818 Content-Type: application/envi.hdr resourceName: MOD09GA_test_header.hdr abryant:tika abryant$ java -classpath annie-envi-parser.jar:tika-app/target/tika-app-1.6-SNAPSHOT.jar org.apache.tika.cli.TikaCLI --text MOD09GA_test_header.hdr ENVI description = { GEO-TIFF File Imported into ENVI [Fri May 25 14:06:23 2012]} samples = 2400 lines = 2400 bands = 7 header offset = 0 file type = ENVI Standard data type = 2 interleave = bip sensor type = Unknown byte order = 0 map info = {Sinusoidal, 1.5000, 1.5000, -10007091.3643, 5559289.2856, 4.6331271653e+02, 4.6331271653e+02, , units=Meters} projection info = {16, 6371007.2, 0.00, 0.0, 0.0, Sinusoidal, units=Meters} coordinate system string = {PROJCS[Sinusoidal,GEOGCS[GCS_ELLIPSE_BASED_1,DATUM[D_ELLIPSE_BASED_1,SPHEROID[S_ELLIPSE_BASED_1,6371007.181,0.0]],PRIMEM[Greenwich,0.0],UNIT[Degree,0.0174532925199433]],PROJECTION[Sinusoidal],PARAMETER[False_Easting,0.0],PARAMETER[False_Northing,0.0],PARAMETER[Central_Meridian,0.0],UNIT[Meter,1.0]]} wavelength units = Unknown __ As a current non-certified committer, could someone enlighten me to the steps needed to submit this new parser for review. 
The parser is located in my directory structure as: /users/annbryant/tika/tika/anniedev/src/main/java/edu/usc/sunset/abburgess/tika/EnviFileReader.class My custom mimetypes.xml file is located at: /Users/annbryant/TIKA/tika/anniedev/src/main/resources/org/apache/tika/mime/custom-mimetypes.xml -- This message was sent by Atlassian JIRA (v6.2#6252)
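Since the --text output above shows the header as a series of "key = value" lines, the fields could be split out roughly as in the sketch below. This is an illustration under stated assumptions, not the EnviFileReader or EnviHeaderParser code: the class name EnviHeaderFieldsSketch and the method parse are hypothetical, and the handling of brace-delimited values that span several lines is a simplification inferred from the sample output.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: split an ENVI header into key/value fields.
class EnviHeaderFieldsSketch {

    static Map<String, String> parse(Reader source) throws IOException {
        // LinkedHashMap preserves the order in which fields appear in the header.
        Map<String, String> fields = new LinkedHashMap<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                int eq = line.indexOf('=');
                if (eq < 0) {
                    continue; // e.g. the leading "ENVI" line, which has no '='
                }
                String key = line.substring(0, eq).trim();
                StringBuilder value = new StringBuilder(line.substring(eq + 1).trim());
                // Assumption: a value opened with '{' continues on the
                // following lines until the closing '}' is seen.
                while (value.indexOf("{") >= 0 && value.indexOf("}") < 0
                        && (line = reader.readLine()) != null) {
                    value.append(' ').append(line.trim());
                }
                fields.put(key, value.toString());
            }
        }
        return fields;
    }

    public static void main(String[] args) throws IOException {
        String header = "ENVI\n"
                + "description = {\n GEO-TIFF File Imported into ENVI}\n"
                + "samples = 2400\nlines = 2400\nbands = 7\ninterleave = bip\n";
        parse(new StringReader(header))
                .forEach((k, v) -> System.out.println(k + " -> " + v));
    }
}
```

Whether such fields belong in the text output or in the metadata is a separate design question for the project; the sketch only shows how the fields could be separated.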
[jira] [Created] (TIKA-1265) Text parsing support for NetCDF
Ann Burgess created TIKA-1265: - Summary: Text parsing support for NetCDF Key: TIKA-1265 URL: https://issues.apache.org/jira/browse/TIKA-1265 Project: Tika Issue Type: Improvement Components: parser Reporter: Ann Burgess Currently Tika extracts -metadata information from NetCDF files. We are working on a patch that will enable -text extraction, thus providing the 'Dimension' and 'Variable' information. -- This message was sent by Atlassian JIRA (v6.2#6252)