[jira] [Commented] (TIKA-1577) NetCDF Data Extraction

2015-05-08 Thread Ann Burgess (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14534847#comment-14534847
 ] 

Ann Burgess commented on TIKA-1577:
---

Take it away [~riverma]! 

 NetCDF Data Extraction
 --

 Key: TIKA-1577
 URL: https://issues.apache.org/jira/browse/TIKA-1577
 Project: Tika
  Issue Type: Improvement
  Components: handler, parser
Affects Versions: 1.7
Reporter: Ann Burgess
Assignee: Ann Burgess
  Labels: features, handler
 Fix For: 1.9

   Original Estimate: 504h
  Remaining Estimate: 504h

 A netCDF classic or 64-bit offset dataset is stored as a single file 
 comprising two parts:
  - a header, containing all the information about dimensions, attributes, and 
 variables except for the variable data;
  - a data part, comprising fixed-size data, containing the data for variables 
 that don't have an unlimited dimension; and variable-size data, containing 
 the data for variables that have an unlimited dimension.
 The NetCDFParser currently extracts the header part.  
  -- text extracts file Dimensions and Variables
  -- metadata extracts Global Attributes
 We want the option to extract the data part of NetCDF files.  
 Let's use the NetCDF test file for our dev testing:  
 tika/tika-parsers/src/test/resources/test-documents/sresa1b_ncar_ccsm3_0_run1_21.nc
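
As a rough illustration of the header/data split described above: in the classic and 64-bit offset layouts the header opens with the bytes 'C', 'D', 'F' and a version byte, followed by a 4-byte big-endian record count (numrecs), which is the current length of the unlimited dimension. The sketch below reads that count using only the Java standard library. The class and method names are hypothetical and are not part of Tika or netcdf-java; a real parser would most likely go through the netCDF library instead of reading bytes by hand.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical helper: peeks at the first 8 bytes of a classic/64-bit offset header. */
final class ClassicHeaderPeek {
    /** Value stored in numrecs while a file is written in streaming mode (length not yet known). */
    static final long STREAMING = 0xFFFFFFFFL;

    /** Returns numrecs, the number of records along the unlimited dimension. */
    static long numRecs(byte[] head) {
        boolean magicOk = head.length >= 8
                && head[0] == 'C' && head[1] == 'D' && head[2] == 'F'
                && (head[3] == 1 || head[3] == 2);
        if (!magicOk) {
            throw new IllegalArgumentException("not a netCDF classic or 64-bit offset header");
        }
        // numrecs occupies bytes 4..7, big-endian; mask to read it as an unsigned value
        ByteBuffer buf = ByteBuffer.wrap(head).order(ByteOrder.BIG_ENDIAN);
        return buf.getInt(4) & 0xFFFFFFFFL;
    }
}
```

A record count of zero would suggest that the variable-size data part is empty, so only fixed-size variables carry data.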
  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TIKA-1579) Add file type to NetCDFParser

2015-03-28 Thread Ann Burgess (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14385383#comment-14385383
 ] 

Ann Burgess commented on TIKA-1579:
---

Yes!

On Sat, Mar 28, 2015 at 6:09 AM, Tyler Palsulich (JIRA) j...@apache.org




-- 
--
Ann Bryant Burgess, PhD

Postdoctoral Fellow
Computer Science Department
Viterbi School of Engineering
University of Southern California

Phone:  (585) 738-7549
--


 Add file type to NetCDFParser
 -

 Key: TIKA-1579
 URL: https://issues.apache.org/jira/browse/TIKA-1579
 Project: Tika
  Issue Type: Improvement
  Components: parser
Reporter: Ann Burgess
Assignee: Ann Burgess
 Attachments: TIKA-1579.abburgess.190315.patch.txt


 [~gostep] explains that there are three versions of NetCDF (classic format, 
 64-bit offset, and netCDF-4/HDF5 format). When opening an existing netCDF 
 file, the netCDF library will transparently detect its format, so we do not 
 need to adjust according to the detected format.
 That said, it would be good to know the file type, as each can have the .nc 
 extension.  This patch will add the file type to the metadata.
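
One way to tell the three formats apart, sketched here with the Java standard library only, is to look at the leading signature bytes: classic files start with 'C', 'D', 'F', 0x01, 64-bit offset files with 'C', 'D', 'F', 0x02, and netCDF-4 files carry the 8-byte HDF5 signature. The class name, method name and returned labels are hypothetical and do not come from the attached patch. The sketch also assumes the HDF5 signature sits at offset 0, and an HDF5 signature alone does not prove the file is netCDF-4 rather than plain HDF5.

```java
import java.util.Arrays;

/** Hypothetical helper: guesses the netCDF flavour from the first bytes of a file. */
final class NetcdfFormatSniffer {
    // HDF5 signature: 0x89 'H' 'D' 'F' '\r' '\n' 0x1A '\n'
    private static final byte[] HDF5_MAGIC =
            {(byte) 0x89, 'H', 'D', 'F', '\r', '\n', 0x1A, '\n'};

    /** Returns a human-readable format label, or "unknown" if no signature matches. */
    static String detect(byte[] head) {
        if (head.length >= 4 && head[0] == 'C' && head[1] == 'D' && head[2] == 'F') {
            if (head[3] == 1) {
                return "netCDF classic";
            }
            if (head[3] == 2) {
                return "netCDF 64-bit offset";
            }
        }
        if (head.length >= 8 && Arrays.equals(Arrays.copyOf(head, 8), HDF5_MAGIC)) {
            return "netCDF-4/HDF5";
        }
        return "unknown";
    }
}
```

The returned label could then be stored as a metadata value, which is the goal stated in the description; the actual metadata key used by the patch is not shown here.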





[jira] [Commented] (TIKA-1577) NetCDF Data Extraction

2015-03-27 Thread Ann Burgess (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14384323#comment-14384323
 ] 

Ann Burgess commented on TIKA-1577:
---

This is a great idea.  I'm all for not re-creating code if it already exists in 
good form!

 NetCDF Data Extraction
 --

 Key: TIKA-1577
 URL: https://issues.apache.org/jira/browse/TIKA-1577
 Project: Tika
  Issue Type: Improvement
  Components: handler, parser
Affects Versions: 1.7
Reporter: Ann Burgess
Assignee: Ann Burgess
  Labels: features, handler
 Fix For: 1.8

   Original Estimate: 504h
  Remaining Estimate: 504h

 A netCDF classic or 64-bit offset dataset is stored as a single file 
 comprising two parts:
  - a header, containing all the information about dimensions, attributes, and 
 variables except for the variable data;
  - a data part, comprising fixed-size data, containing the data for variables 
 that don't have an unlimited dimension; and variable-size data, containing 
 the data for variables that have an unlimited dimension.
 The NetCDFParser currently extracts the header part.  
  -- text extracts file Dimensions and Variables
  -- metadata extracts Global Attributes
 We want the option to extract the data part of NetCDF files.  
 Let's use the NetCDF test file for our dev testing:  
 tika/tika-parsers/src/test/resources/test-documents/sresa1b_ncar_ccsm3_0_run1_21.nc
  





[jira] [Commented] (TIKA-1578) Add file type description to HDFParsers

2015-03-19 Thread Ann Burgess (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14369982#comment-14369982
 ] 

Ann Burgess commented on TIKA-1578:
---

https://reviews.apache.org/r/32255/

 Add file type description to HDFParsers
 ---

 Key: TIKA-1578
 URL: https://issues.apache.org/jira/browse/TIKA-1578
 Project: Tika
  Issue Type: Improvement
  Components: parser
Reporter: Ann Burgess
Assignee: Ann Burgess
 Attachments: TIKA-1578.abburgess.150319.patch.txt


 [~gostep] explains that there are three versions of NetCDF (classic format, 
 64-bit offset, and netCDF-4/HDF5 format). When opening an existing netCDF 
 file, the netCDF library will transparently detect its format, so we do not 
 need to adjust according to the detected format. 
 That said, it would be good to know the file type, as each can have the .nc 
 extension.  This patch will add the file type to the metadata. 





[jira] [Updated] (TIKA-1578) Add file type description to HDFParsers

2015-03-19 Thread Ann Burgess (JIRA)

 [ 
https://issues.apache.org/jira/browse/TIKA-1578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ann Burgess updated TIKA-1578:
--
Attachment: TIKA-1578.abburgess.150319.patch.txt

File type added to HDFParser

 Add file type description to HDFParsers
 ---

 Key: TIKA-1578
 URL: https://issues.apache.org/jira/browse/TIKA-1578
 Project: Tika
  Issue Type: Improvement
  Components: parser
Reporter: Ann Burgess
Assignee: Ann Burgess
 Attachments: TIKA-1578.abburgess.150319.patch.txt


 [~gostep] explains that there are three versions of NetCDF (classic format, 
 64-bit offset, and netCDF-4/HDF5 format). When opening an existing netCDF 
 file, the netCDF library will transparently detect its format, so we do not 
 need to adjust according to the detected format. 
 That said, it would be good to know the file type, as each can have the .nc 
 extension.  This patch will add the file type to the metadata. 





[jira] [Commented] (TIKA-1579) Add file type to NetCDFParser

2015-03-19 Thread Ann Burgess (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14370137#comment-14370137
 ] 

Ann Burgess commented on TIKA-1579:
---

https://reviews.apache.org/r/32260/

 Add file type to NetCDFParser
 -

 Key: TIKA-1579
 URL: https://issues.apache.org/jira/browse/TIKA-1579
 Project: Tika
  Issue Type: Improvement
  Components: parser
Reporter: Ann Burgess
Assignee: Ann Burgess

 [~gostep] explains that there are three versions of NetCDF (classic format, 
 64-bit offset, and netCDF-4/HDF5 format). When opening an existing netCDF 
 file, the netCDF library will transparently detect its format, so we do not 
 need to adjust according to the detected format.
 That said, it would be good to know the file type, as each can have the .nc 
 extension.  This patch will add the file type to the metadata.





[jira] [Updated] (TIKA-1579) Add file type to NetCDFParser

2015-03-19 Thread Ann Burgess (JIRA)

 [ 
https://issues.apache.org/jira/browse/TIKA-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ann Burgess updated TIKA-1579:
--
Attachment: TIKA-1579.abburgess.190315.patch.txt

 Add file type to NetCDFParser
 -

 Key: TIKA-1579
 URL: https://issues.apache.org/jira/browse/TIKA-1579
 Project: Tika
  Issue Type: Improvement
  Components: parser
Reporter: Ann Burgess
Assignee: Ann Burgess
 Attachments: TIKA-1579.abburgess.190315.patch.txt


 [~gostep] explains that there are three versions of NetCDF (classic format, 
 64-bit offset, and netCDF-4/HDF5 format). When opening an existing netCDF 
 file, the netCDF library will transparently detect its format, so we do not 
 need to adjust according to the detected format.
 That said, it would be good to know the file type, as each can have the .nc 
 extension.  This patch will add the file type to the metadata.





[jira] [Commented] (TIKA-1577) NetCDF Data Extraction

2015-03-19 Thread Ann Burgess (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14370177#comment-14370177
 ] 

Ann Burgess commented on TIKA-1577:
---

[~riverma] this is a good place to start: 
http://www.unidata.ucar.edu/software/netcdf/old_docs/really_old/guide_toc.html

 NetCDF Data Extraction
 --

 Key: TIKA-1577
 URL: https://issues.apache.org/jira/browse/TIKA-1577
 Project: Tika
  Issue Type: Improvement
  Components: handler, parser
Affects Versions: 1.7
Reporter: Ann Burgess
Assignee: Ann Burgess
  Labels: features, handler
   Original Estimate: 504h
  Remaining Estimate: 504h

 A netCDF classic or 64-bit offset dataset is stored as a single file 
 comprising two parts:
  - a header, containing all the information about dimensions, attributes, and 
 variables except for the variable data;
  - a data part, comprising fixed-size data, containing the data for variables 
 that don't have an unlimited dimension; and variable-size data, containing 
 the data for variables that have an unlimited dimension.
 The NetCDFParser currently extracts the header part.  
  -- text extracts file Dimensions and Variables
  -- metadata extracts Global Attributes
 We want the option to extract the data part of NetCDF files.  
 Let's use the NetCDF test file for our dev testing:  
 tika/tika-parsers/src/test/resources/test-documents/sresa1b_ncar_ccsm3_0_run1_21.nc
  





[jira] [Created] (TIKA-1578) Add file type description to HDFParsers

2015-03-18 Thread Ann Burgess (JIRA)
Ann Burgess created TIKA-1578:
-

 Summary: Add file type description to HDFParsers
 Key: TIKA-1578
 URL: https://issues.apache.org/jira/browse/TIKA-1578
 Project: Tika
  Issue Type: Improvement
  Components: parser
Reporter: Ann Burgess
Assignee: Ann Burgess


[~gostep] explains that there are three versions of NetCDF (classic format, 
64-bit offset, and netCDF-4/HDF5 format). When opening an existing netCDF file, 
the netCDF library will transparently detect its format, so we do not need to 
adjust according to the detected format. 

That said, it would be good to know the file type, as each can have the .nc 
extension.  This patch will add the file type to the metadata. 








[jira] [Created] (TIKA-1579) Add file type to NetCDFParser

2015-03-18 Thread Ann Burgess (JIRA)
Ann Burgess created TIKA-1579:
-

 Summary: Add file type to NetCDFParser
 Key: TIKA-1579
 URL: https://issues.apache.org/jira/browse/TIKA-1579
 Project: Tika
  Issue Type: Improvement
  Components: parser
Reporter: Ann Burgess
Assignee: Ann Burgess


[~gostep] explains that there are three versions of NetCDF (classic format, 
64-bit offset, and netCDF-4/HDF5 format). When opening an existing netCDF file, 
the netCDF library will transparently detect its format, so we do not need to 
adjust according to the detected format.

That said, it would be good to know the file type, as each can have the .nc 
extension.  This patch will add the file type to the metadata.






[jira] [Updated] (TIKA-1577) NetCDF Data Extraction

2015-03-17 Thread Ann Burgess (JIRA)

 [ 
https://issues.apache.org/jira/browse/TIKA-1577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ann Burgess updated TIKA-1577:
--
Description: 
A netCDF classic or 64-bit offset dataset is stored as a single file comprising 
two parts:
 - a header, containing all the information about dimensions, attributes, and 
variables except for the variable data;
 - a data part, comprising fixed-size data, containing the data for variables 
that don't have an unlimited dimension; and variable-size data, containing the 
data for variables that have an unlimited dimension.

The NetCDFParser currently extracts the header part.  
 -- text extracts file Dimensions and Variables
 -- metadata extracts Global Attributes

We want the option to extract the data part of NetCDF files.  
Let's use the NetCDF test file for our dev testing:  
tika/tika-parsers/src/test/resources/test-documents/sresa1b_ncar_ccsm3_0_run1_21.nc


 



  was:
We want the option to extract data associated with each NetCDF variable.  

For our development testing, let's use the NetCDF test file:  
tika/tika-parsers/src/test/resources/test-documents/sresa1b_ncar_ccsm3_0_run1_21.nc


 




 NetCDF Data Extraction
 --

 Key: TIKA-1577
 URL: https://issues.apache.org/jira/browse/TIKA-1577
 Project: Tika
  Issue Type: Improvement
  Components: handler, parser
Affects Versions: 1.7
Reporter: Ann Burgess
Assignee: Ann Burgess
  Labels: features, handler
   Original Estimate: 504h
  Remaining Estimate: 504h

 A netCDF classic or 64-bit offset dataset is stored as a single file 
 comprising two parts:
  - a header, containing all the information about dimensions, attributes, and 
 variables except for the variable data;
  - a data part, comprising fixed-size data, containing the data for variables 
 that don't have an unlimited dimension; and variable-size data, containing 
 the data for variables that have an unlimited dimension.
 The NetCDFParser currently extracts the header part.  
  -- text extracts file Dimensions and Variables
  -- metadata extracts Global Attributes
 We want the option to extract the data part of NetCDF files.  
 Let's use the NetCDF test file for our dev testing:  
 tika/tika-parsers/src/test/resources/test-documents/sresa1b_ncar_ccsm3_0_run1_21.nc
  





[jira] [Created] (TIKA-1577) NetCDF Data Extraction

2015-03-17 Thread Ann Burgess (JIRA)
Ann Burgess created TIKA-1577:
-

 Summary: NetCDF Data Extraction
 Key: TIKA-1577
 URL: https://issues.apache.org/jira/browse/TIKA-1577
 Project: Tika
  Issue Type: Improvement
  Components: handler, parser
Affects Versions: 1.7
Reporter: Ann Burgess
Assignee: Ann Burgess


We want the option to extract data associated with each NetCDF variable.  

For our development testing, let's use the NetCDF test file:  
tika/tika-parsers/src/test/resources/test-documents/sresa1b_ncar_ccsm3_0_run1_21.nc


 







[jira] [Updated] (TIKA-1423) Build a parser to extract data from GRIB formats

2014-09-22 Thread Ann Burgess (JIRA)

 [ 
https://issues.apache.org/jira/browse/TIKA-1423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ann Burgess updated TIKA-1423:
--
Attachment: GribParser.java

 Build a parser to extract data from GRIB formats
 

 Key: TIKA-1423
 URL: https://issues.apache.org/jira/browse/TIKA-1423
 Project: Tika
  Issue Type: New Feature
  Components: metadata, mime, parser
Affects Versions: 1.6
Reporter: Vineet Ghatge
Priority: Critical
  Labels: features, newbie
 Fix For: 1.7

 Attachments: GribParser.java, gdas1.forecmwf.2014062612.grib2


 Arctic dataset contains a MIME format called GRIB - General 
 Regularly-distributed Information in Binary form 
 http://en.wikipedia.org/wiki/GRIB . GRIB is a well-known, concise data 
 format used in meteorology to store historical and weather data. 
 There are 2 different types of the format: GRIB 0 and GRIB 2.  
 The focus will be on GRIB 2, which is the most prevalent. Each GRIB record 
 intended for either transmission or storage contains a single parameter with 
 values located at an array of grid points, or represented as a set of 
 spectral coefficients, for a single level (or layer), encoded as a continuous 
 bit stream. Logical divisions of the record are designated as sections, 
 each of which provides control information and/or data. A GRIB record 
 consists of six sections, two of which are optional: 
  
 (0) Indicator Section 
 (1) Product Definition Section (PDS) 
 (2) Grid Description Section (GDS) - optional 
 (3) Bit Map Section (BMS) - optional 
 (4) Binary Data Section (BDS) 
 (5) '' (ASCII Characters)
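
The record layout above can be checked at the byte level. The following is a minimal sketch using only the Java standard library; it is not the attached GribParser.java, and the class and method names are hypothetical. It relies on two details taken from the GRIB specification, not from this issue: the Indicator Section begins with the ASCII characters GRIB and holds the edition number in its eighth octet, and the final section consists of the four ASCII characters 7777.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: sanity checks on one GRIB record held in memory. */
final class GribRecordCheck {
    /** Returns the edition number stored in octet 8 of the Indicator Section. */
    static int edition(byte[] record) {
        boolean hasIndicator = record.length >= 8
                && "GRIB".equals(new String(record, 0, 4, StandardCharsets.US_ASCII));
        if (!hasIndicator) {
            throw new IllegalArgumentException("missing GRIB indicator");
        }
        return record[7] & 0xFF; // octet 8 is index 7; read as unsigned
    }

    /** Returns true if the record ends with the ASCII end marker "7777". */
    static boolean hasEndMarker(byte[] record) {
        int n = record.length;
        return n >= 4
                && "7777".equals(new String(record, n - 4, 4, StandardCharsets.US_ASCII));
    }
}
```

A parser would normally delegate this work to netcdf-java; the sketch only shows what the first and last sections look like on disk.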





[jira] [Updated] (TIKA-1423) Build a parser to extract data from GRIB formats

2014-09-22 Thread Ann Burgess (JIRA)

 [ 
https://issues.apache.org/jira/browse/TIKA-1423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ann Burgess updated TIKA-1423:
--
Attachment: gdas1.forecmwf.2014062612.grib2

 Build a parser to extract data from GRIB formats
 

 Key: TIKA-1423
 URL: https://issues.apache.org/jira/browse/TIKA-1423
 Project: Tika
  Issue Type: New Feature
  Components: metadata, mime, parser
Affects Versions: 1.6
Reporter: Vineet Ghatge
Priority: Critical
  Labels: features, newbie
 Fix For: 1.7

 Attachments: GribParser.java, gdas1.forecmwf.2014062612.grib2


 Arctic dataset contains a MIME format called GRIB - General 
 Regularly-distributed Information in Binary form 
 http://en.wikipedia.org/wiki/GRIB . GRIB is a well-known, concise data 
 format used in meteorology to store historical and weather data. 
 There are 2 different types of the format: GRIB 0 and GRIB 2.  
 The focus will be on GRIB 2, which is the most prevalent. Each GRIB record 
 intended for either transmission or storage contains a single parameter with 
 values located at an array of grid points, or represented as a set of 
 spectral coefficients, for a single level (or layer), encoded as a continuous 
 bit stream. Logical divisions of the record are designated as sections, 
 each of which provides control information and/or data. A GRIB record 
 consists of six sections, two of which are optional: 
  
 (0) Indicator Section 
 (1) Product Definition Section (PDS) 
 (2) Grid Description Section (GDS) - optional 
 (3) Bit Map Section (BMS) - optional 
 (4) Binary Data Section (BDS) 
 (5) '' (ASCII Characters)





[jira] [Commented] (TIKA-1287) Update NetCDF .jar file on Maven Central

2014-07-30 Thread Ann Burgess (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14079926#comment-14079926
 ] 

Ann Burgess commented on TIKA-1287:
---

I am picking this issue back up as I've just finished writing a parser for GRIB 
files and GRIB support was not added to netcdf-java until version 4.3+.  As 
stated above, Central currently hosts 4.2-min. 

I've just been granted deployer rights on Sonatype to stage an upload of 
netcdf-4.3+ to Maven Central: https://issues.sonatype.org/browse/CENTRALSRV-82.

I've made the bundle jar from the most recent stable release from Unidata at: 
https://artifacts.unidata.ucar.edu/content/repositories/unidata-releases/edu/ucar/netcdf/4.3.22/.
  

I will create a separate JIRA for the new GRIB parser, including updating the 
Tika .pom with the updated netcdf .jar file.   Please let me know if you have 
any thoughts/insights about updating 3rd-party jar files such as this.  


 Update NetCDF .jar file on Maven Central
 

 Key: TIKA-1287
 URL: https://issues.apache.org/jira/browse/TIKA-1287
 Project: Tika
  Issue Type: Improvement
  Components: parser
Affects Versions: 1.5
Reporter: Ann Burgess
  Labels: jar, maven, netcdf, tika, unit-test, update

 I am working to update the NetCDFParser file.  When using the most recent 
 .jar file available from http://www.unidata.ucar.edu/ at the command line, I 
 receive a note about a deprecated API: 
 javac -classpath 
 ../../../../tika-core/target/tika-core-1.6-SNAPSHOT.jar:../../../../toolsUI-4.3.jar
  org/apache/tika/parser/netcdf/NetCDFParser.java
 Note: org/apache/tika/parser/netcdf/NetCDFParser.java uses or overrides a 
 deprecated API.
 Note: Recompile with -Xlint:deprecation for details.
 After updating the NetCDFParser file with non-deprecated methods (e.g. 
 changing dimension.getName() to dimension.getFullName()), however, I get 
 failed unit tests in Maven, which I assume is because the Maven Central Repo 
 has an outdated version of the .jar file needed for NetCDF files (
 http://search.maven.org/#search%7Cgav%7C1%7Cg%3A%22edu.ucar%22%20AND%20a%3A%22netcdf%22)
  .
 Can anyone provide insight into how I get the updated .jar file into the 
 Maven Central Repository? Is there an alternative method to update Tika so I 
 can run my unit tests in Maven?



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (TIKA-1363) .mat files not parsing

2014-07-14 Thread Ann Burgess (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14061323#comment-14061323
 ] 

Ann Burgess commented on TIKA-1363:
---

Just pulled most recent tika and I'm still not getting text from the Matlab
parser:
$ svn co http://svn.apache.org/repos/asf/tika/trunk tika
$ mvn install
$ java -jar tika-app/target/tika-app-1.6-SNAPSHOT.jar --text
/Users/IGSWAHWSWBURGESS/Development/tika/tika-parsers/src/test/resources/test-documents/test_mat_text.mat
$

It does seem like the mime-type is recognized:

$ java -classpath
annie-parsers.jar:tika-app/target/tika-app-1.6-SNAPSHOT.jar
org.apache.tika.cli.TikaCLI --detect
/Users/IGSWAHWSWBURGESS/Development/tika/tika-parsers/src/test/resources/test-documents/breidamerkurjokull_radar_profiles_2009.mat
$ application/x-matlab-data

Tyler, did you integrate the patch and get -t and -m output?  Want to make
sure I'm not missing a step.


On Mon, Jul 14, 2014 at 1:16 PM, Chris A. Mattmann (JIRA) j...@apache.org




-- 
--
Ann Bryant Burgess, PhD

Postdoctoral Fellow
Computer Science Department
University of Southern California
Viterbi School of Engineering
Los Angeles, CA

Alaska Science Center/USGS
Anchorage, AK

Cell:  (585) 738-7549
Office:  (907) 786-7059
Fax:  (907) 786-7150
E-mail: anniebryant.burg...@gmail.com
Office Address: 4210 University Dr., Anchorage, AK 99508-4626
---


 .mat files not parsing
 --

 Key: TIKA-1363
 URL: https://issues.apache.org/jira/browse/TIKA-1363
 Project: Tika
  Issue Type: Bug
  Components: parser
Affects Versions: 1.6
Reporter: Ann Burgess
  Labels: metadata, parser, snapshot
 Attachments: test_data_1.mat


 We recently committed a parser for Matlab .mat files; however, I've just 
 downloaded the most recent Tika and am not getting any parsed --text or 
 --metadata for the .mat file used in the unit test.  The steps I've used are 
 below.  Am I missing something at the command line?  Can anyone else 
 successfully get a text or metadata output for a .mat file?
 Steps: 
 svn co https://svn.apache.org/repos/asf/tika/trunk tika
 setenv MAVEN_OPTS -Xms128m -Xmx256m
 cd tika
 mvn install
 java -jar tika-app/target/tika-app-1.6-SNAPSHOT.jar --text 
 /Users/IGSWAHWSWBURGESS/Development/tika/tika-parsers/src/test/resources/test-documents/breidamerkurjokull_radar_profiles_2009.mat





[jira] [Updated] (TIKA-1363) .mat files not parsing

2014-07-09 Thread Ann Burgess (JIRA)

 [ 
https://issues.apache.org/jira/browse/TIKA-1363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ann Burgess updated TIKA-1363:
--

Attachment: test_data_1.mat

 .mat files not parsing
 --

 Key: TIKA-1363
 URL: https://issues.apache.org/jira/browse/TIKA-1363
 Project: Tika
  Issue Type: Bug
  Components: parser
Affects Versions: 1.6
Reporter: Ann Burgess
  Labels: metadata, parser, snapshot
 Attachments: test_data_1.mat


 We recently committed a parser for Matlab .mat files; however, I've just 
 downloaded the most recent Tika and am not getting any parsed --text or 
 --metadata for the .mat file used in the unit test.  The steps I've used are 
 below.  Am I missing something at the command line?  Can anyone else 
 successfully get a text or metadata output for a .mat file?
 Steps: 
 svn co https://svn.apache.org/repos/asf/tika/trunk tika
 setenv MAVEN_OPTS -Xms128m -Xmx256m
 cd tika
 mvn install
 java -jar tika-app/target/tika-app-1.6-SNAPSHOT.jar --text 
 /Users/IGSWAHWSWBURGESS/Development/tika/tika-parsers/src/test/resources/test-documents/breidamerkurjokull_radar_profiles_2009.mat





[jira] [Commented] (TIKA-1363) .mat files not parsing

2014-07-09 Thread Ann Burgess (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14056476#comment-14056476
 ] 

Ann Burgess commented on TIKA-1363:
---

Hi Tyler, 

Attached is a very simple .mat file.  

Annie

 .mat files not parsing
 --

 Key: TIKA-1363
 URL: https://issues.apache.org/jira/browse/TIKA-1363
 Project: Tika
  Issue Type: Bug
  Components: parser
Affects Versions: 1.6
Reporter: Ann Burgess
  Labels: metadata, parser, snapshot
 Attachments: test_data_1.mat


 We recently committed a parser for Matlab .mat files; however, I've just 
 downloaded the most recent Tika and am not getting any parsed --text or 
 --metadata for the .mat file used in the unit test.  The steps I've used are 
 below.  Am I missing something at the command line?  Can anyone else 
 successfully get a text or metadata output for a .mat file?
 Steps: 
 svn co https://svn.apache.org/repos/asf/tika/trunk tika
 setenv MAVEN_OPTS -Xms128m -Xmx256m
 cd tika
 mvn install
 java -jar tika-app/target/tika-app-1.6-SNAPSHOT.jar --text 
 /Users/IGSWAHWSWBURGESS/Development/tika/tika-parsers/src/test/resources/test-documents/breidamerkurjokull_radar_profiles_2009.mat





[jira] [Commented] (TIKA-1363) .mat files not parsing

2014-07-09 Thread Ann Burgess (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14056560#comment-14056560
 ] 

Ann Burgess commented on TIKA-1363:
---

That is it. Very simple, so the text output should be just as you said,
double: [2x2 double array]


On Wed, Jul 9, 2014 at 9:49 AM, Tyler Palsulich (JIRA) j...@apache.org






 .mat files not parsing
 --

 Key: TIKA-1363
 URL: https://issues.apache.org/jira/browse/TIKA-1363
 Project: Tika
  Issue Type: Bug
  Components: parser
Affects Versions: 1.6
Reporter: Ann Burgess
  Labels: metadata, parser, snapshot
 Attachments: test_data_1.mat


 We recently committed a parser for Matlab .mat files; however, I've just 
 downloaded the most recent Tika and am not getting any parsed --text or 
 --metadata for the .mat file used in the unit test.  The steps I've used are 
 below.  Am I missing something at the command line?  Can anyone else 
 successfully get a text or metadata output for a .mat file?
 Steps: 
 svn co https://svn.apache.org/repos/asf/tika/trunk tika
 setenv MAVEN_OPTS -Xms128m -Xmx256m
 cd tika
 mvn install
 java -jar tika-app/target/tika-app-1.6-SNAPSHOT.jar --text 
 /Users/IGSWAHWSWBURGESS/Development/tika/tika-parsers/src/test/resources/test-documents/breidamerkurjokull_radar_profiles_2009.mat





[jira] [Updated] (TIKA-1357) Buffered text in EnviHeaderParser

2014-06-30 Thread Ann Burgess (JIRA)

 [ 
https://issues.apache.org/jira/browse/TIKA-1357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ann Burgess updated TIKA-1357:
--

Attachment: TIKA-1357.aburgess.140630.patch.txt

Patch to add line-by-line <p> tags to ENVI header output. 

 Buffered text in EnviHeaderParser
 -

 Key: TIKA-1357
 URL: https://issues.apache.org/jira/browse/TIKA-1357
 Project: Tika
  Issue Type: Improvement
  Components: parser
Affects Versions: 1.6
Reporter: Ann Burgess
Priority: Minor
  Labels: parser
 Attachments: TIKA-1357.aburgess.140630.patch.txt


 Use BufferedReader to insert line-by-line <p> tags when parsing ENVI headers 
 per reviewer comment: https://reviews.apache.org/r/22892/#comment81964
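
The idea can be sketched in a few lines of standard-library Java: read the header text line by line through a BufferedReader and emit one paragraph element per line. This is an illustration only, not the attached patch. The class and method names are hypothetical, and Tika itself would presumably emit the elements through its XHTML content handler instead of building a string.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;

/** Hypothetical helper: wraps every input line in its own <p> element. */
final class LineParagraphs {
    static String wrapLines(Reader in) {
        StringBuilder out = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(in)) {
            String line;
            while ((line = reader.readLine()) != null) {
                out.append("<p>").append(escape(line)).append("</p>\n");
            }
        } catch (IOException e) {
            // Rethrow unchecked so callers of this sketch need no throws clause
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    /** Escapes the characters that would otherwise break the markup. */
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

Reading with readLine() keeps each header entry in its own paragraph, which is what the reviewer comment asks for.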





[jira] [Comment Edited] (TIKA-1357) Buffered text in EnviHeaderParser

2014-06-30 Thread Ann Burgess (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14047796#comment-14047796
 ] 

Ann Burgess edited comment on TIKA-1357 at 6/30/14 4:16 PM:


Patch to add line-by-line <p> tags to ENVI header output.   The unit test still 
passes with the added tags. 


was (Author: annieburgess):
Patch to add line-by-line <p> tags to ENVI header output. 

 Buffered text in EnviHeaderParser
 -

 Key: TIKA-1357
 URL: https://issues.apache.org/jira/browse/TIKA-1357
 Project: Tika
  Issue Type: Improvement
  Components: parser
Affects Versions: 1.6
Reporter: Ann Burgess
Priority: Minor
  Labels: parser
 Attachments: TIKA-1357.aburgess.140630.patch.txt


 Use BufferedReader to insert line-by-line <p> tags when parsing ENVI headers 
 per reviewer comment: https://reviews.apache.org/r/22892/#comment81964





[jira] [Commented] (TIKA-1357) Buffered text in EnviHeaderParser

2014-06-27 Thread Ann Burgess (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14046627#comment-14046627
 ] 

Ann Burgess commented on TIKA-1357:
---

That bit of code certainly works, Tyler. The new -x output reads:

<head>
<meta name="Content-Length" content="818"/>
<meta name="Content-Encoding" content="ISO-8859-1"/>
<meta name="Content-Type" content="application/envi.hdr"/>
<meta name="resourceName" content="envi_test_header.hdr"/>
<title/>
</head>
<body><p>ENVI</p>
<p>description = {</p>
<p>  GEO-TIFF File Imported into ENVI [Fri May 25 14:06:23 2012]}</p>
<p>samples = 2400</p>
<p>lines   = 2400</p>
<p>bands   = 7</p>
<p>header offset = 0</p>
<p>file type = ENVI Standard</p>
<p>data type = 2</p>
<p>interleave = bip</p>
<p>sensor type = Unknown</p>
<p>byte order = 0</p>
<p>map info = {Sinusoidal, 1.5000, 1.5000, -10007091.3643, 5559289.2856,
4.6331271653e+02, 4.6331271653e+02, , units=Meters}</p>
<p>projection info = {16, 6371007.2, 0.00, 0.0, 0.0, Sinusoidal,
units=Meters}</p>
<p>coordinate system string =
{PROJCS[Sinusoidal,GEOGCS[GCS_ELLIPSE_BASED_1,DATUM[D_ELLIPSE_BASED_1,SPHEROID[S_ELLIPSE_BASED_1,6371007.181,0.0]],PRIMEM[Greenwich,0.0],UNIT[Degree,0.0174532925199433]],PROJECTION[Sinusoidal],PARAMETER[False_Easting,0.0],PARAMETER[False_Northing,0.0],PARAMETER[Central_Meridian,0.0],UNIT[Meter,1.0]]}</p>
<p>wavelength units = Unknown</p>


Is this what you were aiming for, Nick?  If so, I'll create a patch.




On Fri, Jun 27, 2014 at 8:56 AM, Tyler Palsulich (JIRA) j...@apache.org






 Buffered text in EnviHeaderParser
 -

 Key: TIKA-1357
 URL: https://issues.apache.org/jira/browse/TIKA-1357
 Project: Tika
  Issue Type: Improvement
  Components: parser
Affects Versions: 1.6
Reporter: Ann Burgess
Priority: Minor
  Labels: parser

 Use BufferedReader to insert line-by-line <p> tags when parsing ENVI headers 
 per reviewer comment: https://reviews.apache.org/r/22892/#comment81964



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (TIKA-1357) Buffered text in EnviHeaderParser

2014-06-25 Thread Ann Burgess (JIRA)
Ann Burgess created TIKA-1357:
-

 Summary: Buffered text in EnviHeaderParser
 Key: TIKA-1357
 URL: https://issues.apache.org/jira/browse/TIKA-1357
 Project: Tika
  Issue Type: Improvement
  Components: parser
Affects Versions: 1.6
Reporter: Ann Burgess
Priority: Minor


Use BufferedReader to insert line-by-line <p> tags when parsing ENVI headers per 
reviewer comment: https://reviews.apache.org/r/22892/#comment81964
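
For illustration only, a minimal sketch of the line-by-line idea, assuming nothing 
about the actual EnviHeaderParser code. The class and method names 
(ParagraphWrapper, wrapLines, escape) are hypothetical, and plain strings stand 
in for the SAX events a real Tika parser would send to its content handler.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not the actual EnviHeaderParser code.
public class ParagraphWrapper {

    // Reads the input one line at a time and wraps each line in <p>...</p>.
    public static List<String> wrapLines(Reader input) throws IOException {
        List<String> paragraphs = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(input)) {
            String line;
            while ((line = reader.readLine()) != null) {
                paragraphs.add("<p>" + escape(line) + "</p>");
            }
        }
        return paragraphs;
    }

    // Escapes the characters that are significant in XHTML text content.
    public static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) throws IOException {
        String header = "ENVI\nsamples = 2400\nlines   = 2400\n";
        for (String paragraph : wrapLines(new StringReader(header))) {
            System.out.println(paragraph);
        }
    }
}
```

Reading with readLine() keeps each header line as its own paragraph instead of 
emitting the whole file as one block of text.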



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (TIKA-1327) New parser for Matlab .mat files

2014-06-10 Thread Ann Burgess (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14026821#comment-14026821
 ] 

Ann Burgess commented on TIKA-1327:
---

The .mat unit test file is too large for JIRA; it is attached on Review Board here: 
https://reviews.apache.org/media/uploaded/files/2014/06/10/43092452-6890-42cc-8254-fcbb1c8e07c6__breidamerkurjokull_radar_profiles_2009.mat

 New parser for Matlab .mat files
 

 Key: TIKA-1327
 URL: https://issues.apache.org/jira/browse/TIKA-1327
 Project: Tika
  Issue Type: Improvement
  Components: parser
Affects Versions: 1.5
Reporter: Ann Burgess
Assignee: Chris A. Mattmann
  Labels: parser

 New parser for Matlab .mat files. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (TIKA-1327) New parser for Matlab .mat files

2014-06-06 Thread Ann Burgess (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14020129#comment-14020129
 ] 

Ann Burgess commented on TIKA-1327:
---

Code posted on Review Board at: https://reviews.apache.org/r/22246/

 New parser for Matlab .mat files
 

 Key: TIKA-1327
 URL: https://issues.apache.org/jira/browse/TIKA-1327
 Project: Tika
  Issue Type: Improvement
  Components: parser
Affects Versions: 1.5
Reporter: Ann Burgess
  Labels: parser

 New parser for Matlab .mat files. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (TIKA-1287) Update NetCDF .jar file on Maven Central

2014-05-11 Thread Ann Burgess (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13992944#comment-13992944
 ] 

Ann Burgess commented on TIKA-1287:
---

Message from John Caron at Unidata: 

Hi Annie:
We find it difficult to keep Maven Central updated, and are maintaining our own 
Maven server here:
https://artifacts.unidata.ucar.edu/content/repositories/unidata-releases/edu/ucar/
Is that sufficient for your project?

John

 Update NetCDF .jar file on Maven Central
 

 Key: TIKA-1287
 URL: https://issues.apache.org/jira/browse/TIKA-1287
 Project: Tika
  Issue Type: Improvement
Affects Versions: 1.5
Reporter: Ann Burgess
  Labels: jar, maven, netcdf, tika, unit-test, update

 I am working to update the NetCDFParser file.  When using the most-recent 
 .jar file available from http://www.unidata.ucar.edu/ at the command line I 
 receive a note about a deprecated API: 
 javac -classpath 
 ../../../../tika-core/target/tika-core-1.6-SNAPSHOT.jar:../../../../toolsUI-4.3.jar
  org/apache/tika/parser/netcdf/NetCDFParser.java
 Note: org/apache/tika/parser/netcdf/NetCDFParser.java uses or overrides a 
 deprecated API.
 Note: Recompile with -Xlint:deprecation for details.
 After updating the NetCDFParser file with non-deprecated methods (e.g. 
 changing dimension.getName() to dimension.getFullName()), however, I get 
 failed unit tests in Maven, which I assume is because the Maven Central Repo 
 has an outdated version of the .jar file needed for NetCDF files (
 http://search.maven.org/#search%7Cgav%7C1%7Cg%3A%22edu.ucar%22%20AND%20a%3A%22netcdf%22)
  .
 Can anyone provide insight into how I get the updated .jar file into the 
 Maven Central Repository? Is there an alternative method to update Tika so I 
 can run my unit tests in Maven?



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (TIKA-1287) Update NetCDF .jar file on Maven Central

2014-05-06 Thread Ann Burgess (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13991117#comment-13991117
 ] 

Ann Burgess commented on TIKA-1287:
---

I have modified my updates to the NetCDFParser to work with the current version 
of NetCDF on Maven Central.  Once the new NetCDF version is on Maven Central, I 
will update the code accordingly.  I have created a NetCDFParserPatch.patch 
file for review, but will start a new JIRA issue for that. 

 Update NetCDF .jar file on Maven Central
 

 Key: TIKA-1287
 URL: https://issues.apache.org/jira/browse/TIKA-1287
 Project: Tika
  Issue Type: Improvement
Affects Versions: 1.5
Reporter: Ann Burgess
  Labels: jar, maven, netcdf, tika, unit-test, update

 I am working to update the NetCDFParser file.  When using the most-recent 
 .jar file available from http://www.unidata.ucar.edu/ at the command line I 
 receive a note about a deprecated API: 
 javac -classpath 
 ../../../../tika-core/target/tika-core-1.6-SNAPSHOT.jar:../../../../toolsUI-4.3.jar
  org/apache/tika/parser/netcdf/NetCDFParser.java
 Note: org/apache/tika/parser/netcdf/NetCDFParser.java uses or overrides a 
 deprecated API.
 Note: Recompile with -Xlint:deprecation for details.
 After updating the NetCDFParser file with non-deprecated methods (e.g. 
 changing dimension.getName() to dimension.getFullName()), however, I get 
 failed unit tests in Maven, which I assume is because the Maven Central Repo 
 has an outdated version of the .jar file needed for NetCDF files (
 http://search.maven.org/#search%7Cgav%7C1%7Cg%3A%22edu.ucar%22%20AND%20a%3A%22netcdf%22)
  .
 Can anyone provide insight into how I get the updated .jar file into the 
 Maven Central Repository? Is there an alternative method to update Tika so I 
 can run my unit tests in Maven?



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (TIKA-1265) Text parsing support for NetCDF

2014-05-06 Thread Ann Burgess (JIRA)

 [ 
https://issues.apache.org/jira/browse/TIKA-1265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ann Burgess updated TIKA-1265:
--

Attachment: NetCDFParserPatch.patch

 Text parsing support for NetCDF
 ---

 Key: TIKA-1265
 URL: https://issues.apache.org/jira/browse/TIKA-1265
 Project: Tika
  Issue Type: Improvement
  Components: parser
Reporter: Ann Burgess
  Labels: patch
 Attachments: NetCDFParserPatch.patch

   Original Estimate: 672h
  Remaining Estimate: 672h

 Currently Tika extracts -metadata information from NetCDF files. We are 
 working on a patch that will enable -text extraction, thus providing the 
 'Dimension' and 'Variable' information.  



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (TIKA-1265) [patch] Text output for NetCDF

2014-05-06 Thread Ann Burgess (JIRA)

 [ 
https://issues.apache.org/jira/browse/TIKA-1265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ann Burgess updated TIKA-1265:
--

Summary: [patch] Text output for NetCDF  (was: Text parsing support for 
NetCDF)

 [patch] Text output for NetCDF
 --

 Key: TIKA-1265
 URL: https://issues.apache.org/jira/browse/TIKA-1265
 Project: Tika
  Issue Type: Improvement
  Components: parser
Reporter: Ann Burgess
  Labels: patch
 Attachments: NetCDFParserPatch.patch

   Original Estimate: 672h
  Remaining Estimate: 672h

 Currently Tika extracts -metadata information from NetCDF files. We are 
 working on a patch that will enable -text extraction, thus providing the 
 'Dimension' and 'Variable' information.  



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (TIKA-1265) [patch] Text output for NetCDF

2014-05-06 Thread Ann Burgess (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13991162#comment-13991162
 ] 

Ann Burgess commented on TIKA-1265:
---

This patch updates the NetCDFParser to provide 'Dimension' and 'Variable' 
information as --text output for NetCDF files.  Additionally, the patch updates 
NetCDFParserTest to test the new text output.

To test the new parser and create the patch, I followed the steps at: 
https://wiki.apache.org/nutch/HowToContribute .

Please let me know if I've missed any steps along the way in the process to get 
this committed. 

The .patch file is attached. 



 [patch] Text output for NetCDF
 --

 Key: TIKA-1265
 URL: https://issues.apache.org/jira/browse/TIKA-1265
 Project: Tika
  Issue Type: Improvement
  Components: parser
Reporter: Ann Burgess
  Labels: patch
 Attachments: NetCDFParserPatch.patch

   Original Estimate: 672h
  Remaining Estimate: 672h

 Currently Tika extracts -metadata information from NetCDF files. We are 
 working on a patch that will enable -text extraction, thus providing the 
 'Dimension' and 'Variable' information.  



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (TIKA-1287) Update NetCDF .jar file on Maven Central

2014-05-01 Thread Ann Burgess (JIRA)
Ann Burgess created TIKA-1287:
-

 Summary: Update NetCDF .jar file on Maven Central
 Key: TIKA-1287
 URL: https://issues.apache.org/jira/browse/TIKA-1287
 Project: Tika
  Issue Type: Bug
Affects Versions: 1.5
Reporter: Ann Burgess


I am working to update the NetCDFParser file.  When using the most-recent .jar 
file available from http://www.unidata.ucar.edu/ at the command line I receive 
a note about a deprecated API: 

javac -classpath 
../../../../tika-core/target/tika-core-1.6-SNAPSHOT.jar:../../../../toolsUI-4.3.jar
 org/apache/tika/parser/netcdf/NetCDFParser.java

Note: org/apache/tika/parser/netcdf/NetCDFParser.java uses or overrides a 
deprecated API.
Note: Recompile with -Xlint:deprecation for details.

After updating the NetCDFParser file with non-deprecated methods (e.g. changing 
dimension.getName() to dimension.getFullName()), however, I get failed unit 
tests in Maven, which I assume is because the Maven Central Repo has an outdated 
version of the .jar file needed for NetCDF files (
http://search.maven.org/#search%7Cgav%7C1%7Cg%3A%22edu.ucar%22%20AND%20a%3A%22netcdf%22)
 .

Can anyone provide insight into how I get the updated .jar file into the Maven 
Central Repository? Is there an alternative method to update Tika so I can run 
my unit tests in Maven?





--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Issue Comment Deleted] (TIKA-1274) ENVI header parser

2014-04-28 Thread Ann Burgess (JIRA)

 [ 
https://issues.apache.org/jira/browse/TIKA-1274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ann Burgess updated TIKA-1274:
--

Comment: was deleted

(was: Hey Chris,
How is your week looking? Want to set a time to do a chat?

I'm actually home sick today, out with a nasty cold that started yesterday.
 Later in the week might work best, so I'm lucid.
AB


On Mon, Apr 21, 2014 at 1:39 PM, Chris A. Mattmann (JIRA)




)

 ENVI header parser
 --

 Key: TIKA-1274
 URL: https://issues.apache.org/jira/browse/TIKA-1274
 Project: Tika
  Issue Type: New Feature
  Components: parser
Affects Versions: 1.5
Reporter: Ann Burgess
Assignee: Chris A. Mattmann
  Labels: mime, newbie, parser, patch

 I have written a parser that extracts text and metadata from ENVI header 
 files, currently called at the command line as: 
 abryant:tika abryant$ java -classpath 
 annie-envi-parser.jar:tika-app/target/tika-app-1.6-SNAPSHOT.jar 
 org.apache.tika.cli.TikaCLI --metadata MOD09GA_test_header.hdr
Content-Encoding: ISO-8859-1
Content-Length: 818
Content-Type: application/envi.hdr
resourceName: MOD09GA_test_header.hdr
 abryant:tika abryant$ java -classpath 
 annie-envi-parser.jar:tika-app/target/tika-app-1.6-SNAPSHOT.jar 
 org.apache.tika.cli.TikaCLI --text MOD09GA_test_header.hdr
 ENVI
 description = {
   GEO-TIFF File Imported into ENVI [Fri May 25 14:06:23 2012]}
 samples = 2400
 lines   = 2400
 bands   = 7
 header offset = 0
 file type = ENVI Standard
 data type = 2
 interleave = bip
 sensor type = Unknown
 byte order = 0
 map info = {Sinusoidal, 1.5000, 1.5000, -10007091.3643, 5559289.2856, 
 4.6331271653e+02, 4.6331271653e+02, , units=Meters}
 projection info = {16, 6371007.2, 0.00, 0.0, 0.0, Sinusoidal, 
 units=Meters}
 coordinate system string = 
 {PROJCS[Sinusoidal,GEOGCS[GCS_ELLIPSE_BASED_1,DATUM[D_ELLIPSE_BASED_1,SPHEROID[S_ELLIPSE_BASED_1,6371007.181,0.0]],PRIMEM[Greenwich,0.0],UNIT[Degree,0.0174532925199433]],PROJECTION[Sinusoidal],PARAMETER[False_Easting,0.0],PARAMETER[False_Northing,0.0],PARAMETER[Central_Meridian,0.0],UNIT[Meter,1.0]]}
 wavelength units = Unknown
 __
 As a current non-certified committer, could someone enlighten me as to the steps 
 needed to submit this new parser for review?  
 The parser is located in my directory structure as: 
 /users/annbryant/tika/tika/anniedev/src/main/java/edu/usc/sunset/abburgess/tika/EnviFileReader.class
 My custom mimetypes.xml file is located at: 
 /Users/annbryant/TIKA/tika/anniedev/src/main/resources/org/apache/tika/mime/custom-mimetypes.xml



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (TIKA-1274) ENVI header parser

2014-04-28 Thread Ann Burgess (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13983500#comment-13983500
 ] 

Ann Burgess commented on TIKA-1274:
---

I've got the EnviHeaderParser and EnviHeaderParserTest (unit test) files now on 
github: https://github.com/abburgess/ENVIJava

I've run the unit test successfully in maven. If this looks good, I will create 
a patch for review.

 ENVI header parser
 --

 Key: TIKA-1274
 URL: https://issues.apache.org/jira/browse/TIKA-1274
 Project: Tika
  Issue Type: New Feature
  Components: parser
Affects Versions: 1.5
Reporter: Ann Burgess
Assignee: Chris A. Mattmann
  Labels: mime, newbie, parser, patch

 I have written a parser that extracts text and metadata from ENVI header 
 files, currently called at the command line as: 
 abryant:tika abryant$ java -classpath 
 annie-envi-parser.jar:tika-app/target/tika-app-1.6-SNAPSHOT.jar 
 org.apache.tika.cli.TikaCLI --metadata MOD09GA_test_header.hdr
Content-Encoding: ISO-8859-1
Content-Length: 818
Content-Type: application/envi.hdr
resourceName: MOD09GA_test_header.hdr
 abryant:tika abryant$ java -classpath 
 annie-envi-parser.jar:tika-app/target/tika-app-1.6-SNAPSHOT.jar 
 org.apache.tika.cli.TikaCLI --text MOD09GA_test_header.hdr
 ENVI
 description = {
   GEO-TIFF File Imported into ENVI [Fri May 25 14:06:23 2012]}
 samples = 2400
 lines   = 2400
 bands   = 7
 header offset = 0
 file type = ENVI Standard
 data type = 2
 interleave = bip
 sensor type = Unknown
 byte order = 0
 map info = {Sinusoidal, 1.5000, 1.5000, -10007091.3643, 5559289.2856, 
 4.6331271653e+02, 4.6331271653e+02, , units=Meters}
 projection info = {16, 6371007.2, 0.00, 0.0, 0.0, Sinusoidal, 
 units=Meters}
 coordinate system string = 
 {PROJCS[Sinusoidal,GEOGCS[GCS_ELLIPSE_BASED_1,DATUM[D_ELLIPSE_BASED_1,SPHEROID[S_ELLIPSE_BASED_1,6371007.181,0.0]],PRIMEM[Greenwich,0.0],UNIT[Degree,0.0174532925199433]],PROJECTION[Sinusoidal],PARAMETER[False_Easting,0.0],PARAMETER[False_Northing,0.0],PARAMETER[Central_Meridian,0.0],UNIT[Meter,1.0]]}
 wavelength units = Unknown
 __
 As a current non-certified committer, could someone enlighten me as to the steps 
 needed to submit this new parser for review?  
 The parser is located in my directory structure as: 
 /users/annbryant/tika/tika/anniedev/src/main/java/edu/usc/sunset/abburgess/tika/EnviFileReader.class
 My custom mimetypes.xml file is located at: 
 /Users/annbryant/TIKA/tika/anniedev/src/main/resources/org/apache/tika/mime/custom-mimetypes.xml



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (TIKA-1274) ENVI header parser

2014-04-28 Thread Ann Burgess (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13983597#comment-13983597
 ] 

Ann Burgess commented on TIKA-1274:
---

Hi Nick,

Thank you for the git repo tips.  I added the 'target' directory because I was
mimicking the directory structure of the Tika build; consider it removed.
On that note, I'd appreciate any documentation on the dos and don'ts of
building a git repo for Tika or other Apache projects, if such
documentation exists.

As for the file contents, ENVI header files
(http://www.exelisvis.com/docs/ENVIHeaderFiles.html) are plain
text documents. The contents of an ENVI header file are, in
fact, metadata for a corresponding data file, i.e. reading a file named
some_file.img requires the corresponding file some_file.img.hdr.  In
other words, because the entire contents of a some_file.img.hdr file
are metadata for some_file.img, the actual contents of the some_file.img.hdr
file do NOT describe the .hdr file itself; rather, they describe the .img
file.  That is why I didn't think it appropriate to move parts of the 'raw
content' into metadata.  Does that make sense?  I'm also very open to hearing
how this sort of thing is normally treated, or to opening a conversation about
how to treat one file type describing another file type.
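
To make the alternative concrete, here is a rough sketch of what splitting the
header into key/value pairs could look like if the fields were ever moved into
metadata. This is not part of the submitted parser: the class and method names
(EnviHeaderFields, parse, describedFile) are hypothetical, and the sketch
assumes each value starts on the same line as its key and that a value opened
with '{' runs until the first '}'.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch for discussion; not part of the submitted parser.
public class EnviHeaderFields {

    // Splits "key = value" lines into an ordered map.
    // Assumption: a value opened with '{' continues on following lines until a '}' appears.
    public static Map<String, String> parse(String header) throws IOException {
        Map<String, String> fields = new LinkedHashMap<>();
        BufferedReader reader = new BufferedReader(new StringReader(header));
        String line;
        while ((line = reader.readLine()) != null) {
            int eq = line.indexOf('=');
            if (eq < 0) {
                continue; // no key on this line, e.g. the leading "ENVI" line
            }
            String key = line.substring(0, eq).trim();
            StringBuilder value = new StringBuilder(line.substring(eq + 1).trim());
            // Keep appending lines while a brace block is still open.
            while (value.indexOf("{") >= 0 && value.indexOf("}") < 0
                    && (line = reader.readLine()) != null) {
                value.append(' ').append(line.trim());
            }
            fields.put(key, value.toString());
        }
        return fields;
    }

    // Name of the data file a header describes: "some_file.img.hdr" -> "some_file.img".
    public static String describedFile(String headerName) {
        return headerName.endsWith(".hdr")
                ? headerName.substring(0, headerName.length() - ".hdr".length())
                : headerName;
    }

    public static void main(String[] args) throws IOException {
        String header = "ENVI\nsamples = 2400\ninterleave = bip\n";
        System.out.println(parse(header));
        System.out.println(describedFile("some_file.img.hdr"));
    }
}
```

Whether such pairs belong in the metadata of the .hdr file or of the .img file
is exactly the open question above; the sketch only shows that the split itself
is mechanical.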

Thanks for the input and any further suggestions.










 ENVI header parser
 --

 Key: TIKA-1274
 URL: https://issues.apache.org/jira/browse/TIKA-1274
 Project: Tika
  Issue Type: New Feature
  Components: parser
Affects Versions: 1.5
Reporter: Ann Burgess
Assignee: Chris A. Mattmann
  Labels: mime, newbie, parser, patch

 I have written a parser that extracts text and metadata from ENVI header 
 files, currently called at the command line as: 
 abryant:tika abryant$ java -classpath 
 annie-envi-parser.jar:tika-app/target/tika-app-1.6-SNAPSHOT.jar 
 org.apache.tika.cli.TikaCLI --metadata MOD09GA_test_header.hdr
Content-Encoding: ISO-8859-1
Content-Length: 818
Content-Type: application/envi.hdr
resourceName: MOD09GA_test_header.hdr
 abryant:tika abryant$ java -classpath 
 annie-envi-parser.jar:tika-app/target/tika-app-1.6-SNAPSHOT.jar 
 org.apache.tika.cli.TikaCLI --text MOD09GA_test_header.hdr
 ENVI
 description = {
   GEO-TIFF File Imported into ENVI [Fri May 25 14:06:23 2012]}
 samples = 2400
 lines   = 2400
 bands   = 7
 header offset = 0
 file type = ENVI Standard
 data type = 2
 interleave = bip
 sensor type = Unknown
 byte order = 0
 map info = {Sinusoidal, 1.5000, 1.5000, -10007091.3643, 5559289.2856, 
 4.6331271653e+02, 4.6331271653e+02, , units=Meters}
 projection info = {16, 6371007.2, 0.00, 0.0, 0.0, Sinusoidal, 
 units=Meters}
 coordinate system string = 
 {PROJCS[Sinusoidal,GEOGCS[GCS_ELLIPSE_BASED_1,DATUM[D_ELLIPSE_BASED_1,SPHEROID[S_ELLIPSE_BASED_1,6371007.181,0.0]],PRIMEM[Greenwich,0.0],UNIT[Degree,0.0174532925199433]],PROJECTION[Sinusoidal],PARAMETER[False_Easting,0.0],PARAMETER[False_Northing,0.0],PARAMETER[Central_Meridian,0.0],UNIT[Meter,1.0]]}
 wavelength units = Unknown
 __
 As a current non-certified committer, could someone enlighten me as to the steps 
 needed to submit this new parser for review?  
 The parser is located in my directory structure as: 
 /users/annbryant/tika/tika/anniedev/src/main/java/edu/usc/sunset/abburgess/tika/EnviFileReader.class
 My custom mimetypes.xml file is located at: 
 /Users/annbryant/TIKA/tika/anniedev/src/main/resources/org/apache/tika/mime/custom-mimetypes.xml



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Comment Edited] (TIKA-1274) ENVI header parser

2014-04-28 Thread Ann Burgess (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13983597#comment-13983597
 ] 

Ann Burgess edited comment on TIKA-1274 at 4/28/14 11:10 PM:
-

Hi Nick,

Thank you for the git repo tips.  I added the 'target' directory because I was
mimicking the directory structure of the Tika build; consider it removed.
On that note, I'd appreciate any documentation on the dos and don'ts of
building a git repo for Tika or other Apache projects, if such
documentation exists.

As for the file contents, ENVI header files
(http://www.exelisvis.com/docs/ENVIHeaderFiles.html) are plain
text documents. The contents of an ENVI header file are, in
fact, metadata for a corresponding data file, i.e. reading a file named
some_file.img requires the corresponding file some_file.img.hdr.  In
other words, because the entire contents of a some_file.img.hdr file
are metadata for some_file.img, the actual contents of the some_file.img.hdr
file do NOT describe the .hdr file itself; rather, they describe the .img
file.  That is why I didn't think it appropriate to move parts of the 'raw
content' into metadata.  Does that make sense?  I'm also very open to hearing
how this sort of thing is normally treated, or to opening a conversation about
how to treat one file type describing another file type.

Thanks for the input and any further suggestions.



was (Author: annieburgess):
Hi Nick,

Thank you for the git repo tips.  I added the 'target' directory because I was
mimicking the directory structure of the Tika build; consider it removed.
On that note, I'd appreciate any documentation on the dos and don'ts of
building a git repo for Tika or other Apache projects, if such
documentation exists.

As for the file contents, ENVI header files
(http://www.exelisvis.com/docs/ENVIHeaderFiles.html) are plain
text documents. The contents of an ENVI header file are, in
fact, metadata for a corresponding data file, i.e. reading a file named
some_file.img requires the corresponding file some_file.img.hdr.  In
other words, because the entire contents of a some_file.img.hdr file
are metadata for some_file.img, the actual contents of the some_file.img.hdr
file do NOT describe the .hdr file itself; rather, they describe the .img
file.  That is why I didn't think it appropriate to move parts of the 'raw
content' into metadata.  Does that make sense?  I'm also very open to hearing
how this sort of thing is normally treated, or to opening a conversation about
how to treat one file type describing another file type.

Thanks for the input and any further suggestions.










 ENVI header parser
 --

 Key: TIKA-1274
 URL: https://issues.apache.org/jira/browse/TIKA-1274
 Project: Tika
  Issue Type: New Feature
  Components: parser
Affects Versions: 1.5
Reporter: Ann Burgess
Assignee: Chris A. Mattmann
  Labels: mime, newbie, parser, patch

 I have written a parser that extracts text and metadata from ENVI header 
 files, currently called at the command line as: 
 abryant:tika abryant$ java -classpath 
 annie-envi-parser.jar:tika-app/target/tika-app-1.6-SNAPSHOT.jar 
 org.apache.tika.cli.TikaCLI --metadata MOD09GA_test_header.hdr
Content-Encoding: ISO-8859-1
Content-Length: 818
Content-Type: application/envi.hdr
resourceName: MOD09GA_test_header.hdr
 abryant:tika abryant$ java -classpath 
 annie-envi-parser.jar:tika-app/target/tika-app-1.6-SNAPSHOT.jar 
 org.apache.tika.cli.TikaCLI --text MOD09GA_test_header.hdr
 ENVI
 description = {
   GEO-TIFF File Imported into ENVI [Fri May 25 14:06:23 2012]}
 samples = 2400
 lines   = 2400
 bands   = 7
 header offset = 0
 file type = ENVI Standard
 data type = 2
 interleave = bip
 sensor type = Unknown
 byte order = 0
 map info = {Sinusoidal, 1.5000, 1.5000, -10007091.3643, 5559289.2856, 
 4.6331271653e+02, 4.6331271653e+02, , units=Meters}
 projection info = {16, 6371007.2, 0.00, 0.0, 0.0, Sinusoidal, 
 units=Meters}
 coordinate system string = 
 

[jira] [Commented] (TIKA-1274) ENVI header parser

2014-04-21 Thread Ann Burgess (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13976107#comment-13976107
 ] 

Ann Burgess commented on TIKA-1274:
---

Hey Chris,
How is your week looking? Want to set a time to do a chat?

I'm actually home sick today, out with a nasty cold that started yesterday.
 Later in the week might work best, so I'm lucid.
AB


On Mon, Apr 21, 2014 at 1:39 PM, Chris A. Mattmann (JIRA)






 ENVI header parser
 --

 Key: TIKA-1274
 URL: https://issues.apache.org/jira/browse/TIKA-1274
 Project: Tika
  Issue Type: New Feature
  Components: parser
Affects Versions: 1.5
Reporter: Ann Burgess
Assignee: Chris A. Mattmann
  Labels: mime, newbie, parser, patch

 I have written a parser that extracts text and metadata from ENVI header 
 files, currently called at the command line as: 
 abryant:tika abryant$ java -classpath 
 annie-envi-parser.jar:tika-app/target/tika-app-1.6-SNAPSHOT.jar 
 org.apache.tika.cli.TikaCLI --metadata MOD09GA_test_header.hdr
Content-Encoding: ISO-8859-1
Content-Length: 818
Content-Type: application/envi.hdr
resourceName: MOD09GA_test_header.hdr
 abryant:tika abryant$ java -classpath 
 annie-envi-parser.jar:tika-app/target/tika-app-1.6-SNAPSHOT.jar 
 org.apache.tika.cli.TikaCLI --text MOD09GA_test_header.hdr
 ENVI
 description = {
   GEO-TIFF File Imported into ENVI [Fri May 25 14:06:23 2012]}
 samples = 2400
 lines   = 2400
 bands   = 7
 header offset = 0
 file type = ENVI Standard
 data type = 2
 interleave = bip
 sensor type = Unknown
 byte order = 0
 map info = {Sinusoidal, 1.5000, 1.5000, -10007091.3643, 5559289.2856, 
 4.6331271653e+02, 4.6331271653e+02, , units=Meters}
 projection info = {16, 6371007.2, 0.00, 0.0, 0.0, Sinusoidal, 
 units=Meters}
 coordinate system string = 
 {PROJCS[Sinusoidal,GEOGCS[GCS_ELLIPSE_BASED_1,DATUM[D_ELLIPSE_BASED_1,SPHEROID[S_ELLIPSE_BASED_1,6371007.181,0.0]],PRIMEM[Greenwich,0.0],UNIT[Degree,0.0174532925199433]],PROJECTION[Sinusoidal],PARAMETER[False_Easting,0.0],PARAMETER[False_Northing,0.0],PARAMETER[Central_Meridian,0.0],UNIT[Meter,1.0]]}
 wavelength units = Unknown
 __
 As a current non-certified committer, could someone enlighten me as to the steps 
 needed to submit this new parser for review?  
 The parser is located in my directory structure as: 
 /users/annbryant/tika/tika/anniedev/src/main/java/edu/usc/sunset/abburgess/tika/EnviFileReader.class
 My custom mimetypes.xml file is located at: 
 /Users/annbryant/TIKA/tika/anniedev/src/main/resources/org/apache/tika/mime/custom-mimetypes.xml



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (TIKA-1274) ENVI header parser

2014-04-16 Thread Ann Burgess (JIRA)
Ann Burgess created TIKA-1274:
-

 Summary: ENVI header parser
 Key: TIKA-1274
 URL: https://issues.apache.org/jira/browse/TIKA-1274
 Project: Tika
  Issue Type: New Feature
  Components: parser
Affects Versions: 1.5
Reporter: Ann Burgess


I have written a parser that extracts text and metadata from ENVI header files, 
currently called at the command line as: 

abryant:tika abryant$ java -classpath 
annie-envi-parser.jar:tika-app/target/tika-app-1.6-SNAPSHOT.jar 
org.apache.tika.cli.TikaCLI --metadata MOD09GA_test_header.hdr

   Content-Encoding: ISO-8859-1
   Content-Length: 818
   Content-Type: application/envi.hdr
   resourceName: MOD09GA_test_header.hdr

abryant:tika abryant$ java -classpath 
annie-envi-parser.jar:tika-app/target/tika-app-1.6-SNAPSHOT.jar 
org.apache.tika.cli.TikaCLI --text MOD09GA_test_header.hdr

ENVI
description = {
  GEO-TIFF File Imported into ENVI [Fri May 25 14:06:23 2012]}
samples = 2400
lines   = 2400
bands   = 7
header offset = 0
file type = ENVI Standard
data type = 2
interleave = bip
sensor type = Unknown
byte order = 0
map info = {Sinusoidal, 1.5000, 1.5000, -10007091.3643, 5559289.2856, 
4.6331271653e+02, 4.6331271653e+02, , units=Meters}
projection info = {16, 6371007.2, 0.00, 0.0, 0.0, Sinusoidal, units=Meters}
coordinate system string = 
{PROJCS[Sinusoidal,GEOGCS[GCS_ELLIPSE_BASED_1,DATUM[D_ELLIPSE_BASED_1,SPHEROID[S_ELLIPSE_BASED_1,6371007.181,0.0]],PRIMEM[Greenwich,0.0],UNIT[Degree,0.0174532925199433]],PROJECTION[Sinusoidal],PARAMETER[False_Easting,0.0],PARAMETER[False_Northing,0.0],PARAMETER[Central_Meridian,0.0],UNIT[Meter,1.0]]}
wavelength units = Unknown

__

As a current non-certified committer, could someone enlighten me as to the steps 
needed to submit this new parser for review?  

The parser is located in my directory structure as: 
/users/annbryant/tika/tika/anniedev/src/main/java/edu/usc/sunset/abburgess/tika/EnviFileReader.class

My custom mimetypes.xml file is located at: 
/Users/annbryant/TIKA/tika/anniedev/src/main/resources/org/apache/tika/mime/custom-mimetypes.xml






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (TIKA-1265) Text parsing support for NetCDF

2014-03-25 Thread Ann Burgess (JIRA)
Ann Burgess created TIKA-1265:
-

 Summary: Text parsing support for NetCDF
 Key: TIKA-1265
 URL: https://issues.apache.org/jira/browse/TIKA-1265
 Project: Tika
  Issue Type: Improvement
  Components: parser
Reporter: Ann Burgess


Currently Tika extracts -metadata information from NetCDF files. We are working 
on a patch that will enable -text extraction, thus providing the 'Dimension' 
and 'Variable' information.  



--
This message was sent by Atlassian JIRA
(v6.2#6252)