[jira] [Commented] (TIKA-1274) ENVI header parser

2014-06-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14043655#comment-14043655
 ] 

Hudson commented on TIKA-1274:
--

SUCCESS: Integrated in tika-trunk-jdk1.6 #66 (See 
[https://builds.apache.org/job/tika-trunk-jdk1.6/66/])
- apply patch for TIKA-1274 ENVI Header parser contributed by Ann Burgess 
(mattmann: http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1605434)
* /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/envi
* 
/tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/envi/EnviHeaderParser.java
* /tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/envi
* 
/tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/envi/EnviHeaderParserTest.java
* 
/tika/trunk/tika-parsers/src/test/resources/test-documents/envi_test_header.hdr


> ENVI header parser
> --
>
> Key: TIKA-1274
> URL: https://issues.apache.org/jira/browse/TIKA-1274
> Project: Tika
> Issue Type: New Feature
> Components: parser
> Affects Versions: 1.5
> Reporter: Ann Burgess
> Assignee: Chris A. Mattmann
> Labels: mime, newbie, parser, patch
> Fix For: 1.6
>
>
> I have written a parser that extracts text and metadata from ENVI header 
> files, currently called at the command line as: 
> abryant:tika abryant$ java -classpath 
> annie-envi-parser.jar:tika-app/target/tika-app-1.6-SNAPSHOT.jar 
> org.apache.tika.cli.TikaCLI --metadata MOD09GA_test_header.hdr
> Content-Encoding: ISO-8859-1
> Content-Length: 818
> Content-Type: application/envi.hdr
> resourceName: MOD09GA_test_header.hdr
> abryant:tika abryant$ java -classpath 
> annie-envi-parser.jar:tika-app/target/tika-app-1.6-SNAPSHOT.jar 
> org.apache.tika.cli.TikaCLI --text MOD09GA_test_header.hdr
> ENVI
> description = {
>   GEO-TIFF File Imported into ENVI [Fri May 25 14:06:23 2012]}
> samples = 2400
> lines   = 2400
> bands   = 7
> header offset = 0
> file type = ENVI Standard
> data type = 2
> interleave = bip
> sensor type = Unknown
> byte order = 0
> map info = {Sinusoidal, 1.5000, 1.5000, -10007091.3643, 5559289.2856, 
> 4.6331271653e+02, 4.6331271653e+02, , units=Meters}
> projection info = {16, 6371007.2, 0.00, 0.0, 0.0, Sinusoidal, 
> units=Meters}
> coordinate system string = 
> {PROJCS["Sinusoidal",GEOGCS["GCS_ELLIPSE_BASED_1",DATUM["D_ELLIPSE_BASED_1",SPHEROID["S_ELLIPSE_BASED_1",6371007.181,0.0]],PRIMEM["Greenwich",0.0],UNIT["Degree",0.0174532925199433]],PROJECTION["Sinusoidal"],PARAMETER["False_Easting",0.0],PARAMETER["False_Northing",0.0],PARAMETER["Central_Meridian",0.0],UNIT["Meter",1.0]]}
> wavelength units = Unknown
> __
> As I am currently a non-certified committer, could someone enlighten me as to 
> the steps needed to submit this new parser for review?
> The parser is located in my directory structure as: 
> /users/annbryant/tika/tika/anniedev/src/main/java/edu/usc/sunset/abburgess/tika/EnviFileReader.class
> My custom mimetypes.xml file is located at: 
> /Users/annbryant/TIKA/tika/anniedev/src/main/resources/org/apache/tika/mime/custom-mimetypes.xml



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (TIKA-1274) ENVI header parser

2014-06-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14043639#comment-14043639
 ] 

Hudson commented on TIKA-1274:
--

SUCCESS: Integrated in tika-trunk-jdk1.7 #67 (See 
[https://builds.apache.org/job/tika-trunk-jdk1.7/67/])
- apply patch for TIKA-1274 ENVI Header parser contributed by Ann Burgess 
(mattmann: http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1605434)
* /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/envi
* 
/tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/envi/EnviHeaderParser.java
* /tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/envi
* 
/tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/envi/EnviHeaderParserTest.java
* 
/tika/trunk/tika-parsers/src/test/resources/test-documents/envi_test_header.hdr




[jira] [Commented] (TIKA-1274) ENVI header parser

2014-06-25 Thread Chris A. Mattmann (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14043598#comment-14043598
 ] 

Chris A. Mattmann commented on TIKA-1274:
-

[~gagravarr] had a good comment from the review:

https://reviews.apache.org/r/22892/#comment81964

[~annieburgess] if you can work on the above and open a new issue I think we're 
good. Just ref this issue as the basis for it.



[jira] [Commented] (TIKA-1274) ENVI header parser

2014-06-25 Thread Chris A. Mattmann (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14043579#comment-14043579
 ] 

Chris A. Mattmann commented on TIKA-1274:
-

Annie your latest RB patch works great! https://reviews.apache.org/r/22892/

Tests pass:

{noformat}
Tests run: 519, Failures: 0, Errors: 0, Skipped: 2

[INFO] 
[INFO] --- maven-bundle-plugin:2.3.4:bundle (default-bundle) @ tika-parsers ---
[INFO] 
[INFO] --- maven-site-plugin:3.0:attach-descriptor (attach-descriptor) @ 
tika-parsers ---
[INFO] 
[INFO] --- maven-jar-plugin:2.3.1:test-jar (default) @ tika-parsers ---
[INFO] Building jar: 
/Users/mattmann/src/tika-parsers/target/tika-parsers-1.6-SNAPSHOT-tests.jar
[INFO] 
[INFO] --- maven-install-plugin:2.3.1:install (default-install) @ tika-parsers 
---
[INFO] Installing 
/Users/mattmann/src/tika-parsers/target/tika-parsers-1.6-SNAPSHOT.jar to 
/Users/mattmann/.m2/repository/org/apache/tika/tika-parsers/1.6-SNAPSHOT/tika-parsers-1.6-SNAPSHOT.jar
[INFO] Installing /Users/mattmann/src/tika-parsers/pom.xml to 
/Users/mattmann/.m2/repository/org/apache/tika/tika-parsers/1.6-SNAPSHOT/tika-parsers-1.6-SNAPSHOT.pom
[INFO] Installing 
/Users/mattmann/src/tika-parsers/target/tika-parsers-1.6-SNAPSHOT-tests.jar to 
/Users/mattmann/.m2/repository/org/apache/tika/tika-parsers/1.6-SNAPSHOT/tika-parsers-1.6-SNAPSHOT-tests.jar
[INFO] 
[INFO] --- maven-bundle-plugin:2.3.4:install (default-install) @ tika-parsers 
---
[INFO] Installing 
org/apache/tika/tika-parsers/1.6-SNAPSHOT/tika-parsers-1.6-SNAPSHOT.jar
[INFO] Writing OBR metadata
[INFO] 
[INFO] BUILD SUCCESS
[INFO] 
[INFO] Total time: 1:57.841s
[INFO] Finished at: Wed Jun 25 10:59:04 EDT 2014
[INFO] Final Memory: 31M/135M
[INFO] 
[chipotle:~/src/tika-parsers] mattmann% 
{noformat}

Going to submit this now.



[jira] [Commented] (TIKA-1274) ENVI header parser

2014-06-12 Thread Chris A. Mattmann (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14030037#comment-14030037
 ] 

Chris A. Mattmann commented on TIKA-1274:
-

Annie this makes sense to me. Can you whip up a patch for Apache Tika for this 
based on your Github account? Then send it up through Review Board and let's 
get this into the sources.



[jira] [Commented] (TIKA-1274) ENVI header parser

2014-04-28 Thread Ann Burgess (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13983597#comment-13983597
 ] 

Ann Burgess commented on TIKA-1274:
---

Hi Nick,

Thank you for the git repo tips.  I added the 'target' directory and I was
mimicking the directory structure of the tika build - consider it removed.
On that note, I'd appreciate any documentation on the dos and don'ts of
building a git repo for Tika or other Apache projects... if such
documentation exists.

As for the file contents, ENVI header files are plain text documents.
The contents of the ENVI header files are, in
fact, metadata for a corresponding data file, i.e. to read a file named
some_file.img, it requires the corresponding file some_file.img.hdr.  In
other words, because the entire contents of a some_file.img.hdr file
is metadata for some_file.img, the actual contents of the some_file.img.hdr
file do NOT describe the .hdr file itself, rather they describe the .img
file.  That is why I didn't think it appropriate to move parts of the 'raw
content' into metadata.  Does that make sense?  I'm also very open to hearing
how this sort of thing is normally treated, or to opening a conversation about
how to treat one file type that describes another.
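
To make the header/data pairing described above concrete, here is a minimal sketch in Java. It assumes the naming convention mentioned in this comment (a header named some_file.img.hdr sits next to some_file.img); the class name EnviSidecar and the method dataFileFor are hypothetical and are not part of Tika or of the submitted parser.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;

public class EnviSidecar {
    /**
     * Returns the data file a header describes, assuming the header
     * is named "<data file>.hdr" and lives in the same directory.
     */
    public static Path dataFileFor(Path header) {
        String name = header.getFileName().toString();
        if (!name.toLowerCase(Locale.ROOT).endsWith(".hdr")) {
            throw new IllegalArgumentException("Not an ENVI header name: " + name);
        }
        // Drop the trailing ".hdr" (4 characters) and look next to the header.
        return header.resolveSibling(name.substring(0, name.length() - 4));
    }

    public static void main(String[] args) {
        System.out.println(dataFileFor(Paths.get("some_file.img.hdr")));
    }
}
```

Headers that do not follow this convention (for example a header that replaces the data file's extension instead of appending to it) would need a different lookup.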

Thanks for the input and any further suggestions.








-- 
--
Ann Bryant Burgess, PhD

Postdoctoral Fellow
Computer Science Department
University of Southern California
Viterbi School of Engineering
Los Angeles, CA

Alaska Science Center/USGS
Anchorage, AK

Cell:  (585) 738-7549
Office:  (907) 786-7059
Fax:  (907) 786-7150
E-mail: anniebryant.burg...@gmail.com
Office Address: 4210 University Dr., Anchorage, AK 99508-4626
---




[jira] [Commented] (TIKA-1274) ENVI header parser

2014-04-28 Thread Nick Burch (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13983523#comment-13983523
 ] 

Nick Burch commented on TIKA-1274:
--

Few quick bits:
 * There's a few files in that git repo that wouldn't normally be there - eg 
.class files and a /target/ directory
 * You seem to have some inconsistent indenting going on - IIRC Tika uses 4 
spaces no tabs

Secondly, you seem to be outputting the raw contents of the file as the textual 
part, but not doing any parsing of any parts into the metadata. At first glance 
(and I'm not an ENVI file format expert here!), I would've expected things like 
"samples = 2400" to get mapped onto some sort of suitable metadata key/value 
pair

Are you able to dig out any documentation on the format of the ENVI header 
file? If so, we may be able to help suggest which bits of it may be best placed 
into the metadata object, and also what of that can use standard metadata keys 
+ which ones will need new metadata keys defining to be used
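
As a rough illustration of the key/value mapping suggested above, the sketch below splits header lines such as "samples = 2400" into a map. It is not the code from the submitted patch: the class name EnviHeaderFields and the method parse are hypothetical, the brace handling is deliberately simple (a value that opens with '{' is joined with following lines until a '}' appears), and choosing which entries go into a Tika Metadata object under which keys is left open.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class EnviHeaderFields {
    /** Parses "key = value" lines; a value opened with '{' may span several lines. */
    public static Map<String, String> parse(String header) {
        Map<String, String> fields = new LinkedHashMap<>();
        String[] lines = header.split("\\r?\\n");
        for (int i = 0; i < lines.length; i++) {
            String line = lines[i];
            int eq = line.indexOf('=');
            if (eq < 0) {
                continue; // e.g. the leading "ENVI" line, which has no key
            }
            String key = line.substring(0, eq).trim();
            StringBuilder value = new StringBuilder(line.substring(eq + 1).trim());
            // Keep appending lines until the closing brace has been seen.
            while (value.indexOf("{") == 0 && value.indexOf("}") < 0
                    && i + 1 < lines.length) {
                value.append(' ').append(lines[++i].trim());
            }
            fields.put(key, value.toString());
        }
        return fields;
    }

    public static void main(String[] args) {
        String sample = "ENVI\n"
                + "description = {\n"
                + "  GEO-TIFF File Imported into ENVI [Fri May 25 14:06:23 2012]}\n"
                + "samples = 2400\n"
                + "lines   = 2400\n"
                + "bands   = 7\n"
                + "interleave = bip\n";
        parse(sample).forEach((k, v) -> System.out.println(k + " -> " + v));
    }
}
```

Splitting on the first '=' only means values that themselves contain '=' (such as "units=Meters" inside "map info") stay intact.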



[jira] [Commented] (TIKA-1274) ENVI header parser

2014-04-28 Thread Ann Burgess (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13983500#comment-13983500
 ] 

Ann Burgess commented on TIKA-1274:
---

I've got the EnviHeaderParser and EnviHeaderParserTest (unit test) files now on 
github: https://github.com/abburgess/ENVIJava

I've run the unit test successfully in maven. If this looks good, I will create 
a patch for review.



[jira] [Commented] (TIKA-1274) ENVI header parser

2014-04-21 Thread Nick Burch (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13976298#comment-13976298
 ] 

Nick Burch commented on TIKA-1274:
--

Give a shout on the dev list if you have git / github queries - Jukka is a 
veritable Git wizard, and for the simpler stuff I've been using Git(hub) for 
the Ogg parser + library stuff

(First tip - add a .gitignore so you can exclude your .class files from the 
repo!)
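
A minimal .gitignore along the lines of the tip above might look like the following. The two patterns are only a starting point, assuming a Maven-style layout where build output lands in target/:

```gitignore
# Compiled Java classes
*.class

# Maven build output
target/
```

Files that are already tracked stay in the repository until they are removed from the index (for example with git rm --cached).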



[jira] [Commented] (TIKA-1274) ENVI header parser

2014-04-21 Thread Ann Burgess (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13976107#comment-13976107
 ] 

Ann Burgess commented on TIKA-1274:
---

Hey Chris,
How is your week looking? Want to set a time to do a chat?

I'm actually home sick today, out with a nasty cold that started yesterday.
 Later in the week might work best, so I'm lucid.
AB




[jira] [Commented] (TIKA-1274) ENVI header parser

2014-04-21 Thread Chris A. Mattmann (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13976080#comment-13976080
 ] 

Chris A. Mattmann commented on TIKA-1274:
-

We will bring it back to the list as a pull request against the Tika repo, but 
right now I am working with Annie to learn GitHub and some other 
bells/whistles. Hopefully in the next week we will have a cleaned-up patch 
upstream here.






[jira] [Commented] (TIKA-1274) ENVI header parser

2014-04-21 Thread Chris A. Mattmann (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13976079#comment-13976079
 ] 

Chris A. Mattmann commented on TIKA-1274:
-

Annie and I are working on a patch here:

https://github.com/abburgess/ENVIJava







[jira] [Commented] (TIKA-1274) ENVI header parser

2014-04-16 Thread Chris A. Mattmann (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13972315#comment-13972315
 ] 

Chris A. Mattmann commented on TIKA-1274:
-

Thanks for attaching the ENVI parser, Annie! Nick, great comments, perfect. 
Annie, if you need any help here, I'd be happy to help commit the work, of 
course crediting you along the way. Feel free to use Review Board too: add a 
patch there (http://reviews.apache.org/) and select the Tika group. Also, if you 
are so inclined, you can use GitHub and just submit a pull request (which in 
turn will send an email message with a link to your patch to the dev list).

Thanks!






[jira] [Commented] (TIKA-1274) ENVI header parser

2014-04-16 Thread Nick Burch (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13971978#comment-13971978
 ] 

Nick Burch commented on TIKA-1274:
--

If this were changes to existing files, we'd need a patch file for the changes 
to review.

As it's all new files, what we'd need attaching to the ticket are:
 * The custom-mimetypes file that defines your new format
 * The parser java file(s)
 * A sample ENVI header file
 * A unit test file that tests the detection and parsing
 * Details of any new dependencies (if any)
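
To give a rough idea of the core logic a "parser java file" for this format 
would hold, here is a minimal stdlib-only sketch that reads the "key = value" 
fields shown in the issue description. It is not the parser attached to this 
ticket: the class name EnviHeaderSketch and the parse method are hypothetical, 
it does not use the Tika Parser API, and it assumes brace-wrapped values are 
not nested.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch; not the parser contributed on this ticket. */
public class EnviHeaderSketch {

    /** Reads "key = value" fields; a value wrapped in { } may span lines. */
    public static Map<String, String> parse(String header) {
        Map<String, String> fields = new LinkedHashMap<>();
        String[] lines = header.split("\\r?\\n");
        for (int i = 0; i < lines.length; i++) {
            String line = lines[i];
            int eq = line.indexOf('=');
            if (eq < 0) {
                continue; // skips the leading "ENVI" marker line and blank lines
            }
            String key = line.substring(0, eq).trim();
            StringBuilder value = new StringBuilder(line.substring(eq + 1).trim());
            // Assumption: a value that opens with '{' runs until the first '}'.
            if (value.indexOf("{") == 0) {
                while (value.indexOf("}") < 0 && i + 1 < lines.length) {
                    value.append(' ').append(lines[++i].trim());
                }
            }
            fields.put(key, value.toString());
        }
        return fields;
    }

    public static void main(String[] args) {
        String sample = String.join("\n",
                "ENVI",
                "description = {",
                "  GEO-TIFF File Imported into ENVI [Fri May 25 14:06:23 2012]}",
                "samples = 2400",
                "lines   = 2400",
                "bands   = 7");
        parse(sample).forEach((k, v) -> System.out.println(k + " -> " + v));
    }
}
```

A real Tika parser would wrap logic like this behind the Parser interface and 
report the fields through Tika's metadata and content handler; the sample 
header file and unit test in the list above would then exercise it.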

For general advice on contributing, patches, tests etc, the Apache Nutch 
project has some good wiki pages describing all of that, most of which will 
apply equally to Apache Tika too:
 * https://wiki.apache.org/nutch/HowToContribute
 * https://wiki.apache.org/nutch/Becoming_A_Nutch_Developer

Another good source is the ComDev (Apache Community Development) site - pick 
"For Contributors" from the menu and look through the pages in that section.

For an example of a simple Tika parser + simple Tika parser unit test, I can 
suggest the VorbisParser from late 2011, when it largely only supported the one 
file format (Ogg Vorbis), before additional Ogg based formats were added in. 
You can see that at something like 
https://github.com/Gagravarr/VorbisJava/tree/f6d20407477011735c16daf947635f1b67e14660/tika



