[ 
https://issues.apache.org/jira/browse/TIKA-1274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13971978#comment-13971978
 ] 

Nick Burch commented on TIKA-1274:
----------------------------------

If these were changes to existing files, we'd need a patch file for the changes
to review.

As these are all new files, the things we'd need attached to the ticket are:
 * The custom-mimetypes file that defines your new format
 * The parser java file(s)
 * A sample ENVI header file
 * A unit test file that tests the detection and parsing
 * Details of any new dependencies
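The detection half of that unit test comes down to recognising an ENVI header from its first bytes. A minimal stdlib-only sketch, assuming (as in the sample quoted in the issue) that an ENVI header's first line is the literal `ENVI`; `EnviHeaderSniffer` and `looksLikeEnviHeader` are hypothetical names, not Tika's Detector API. It shows roughly the check a magic rule in the custom-mimetypes file would need to express, if the file uses one.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not Tika's Detector API): sniffs the first bytes of a
// file for the "ENVI" line that starts an ENVI header.
final class EnviHeaderSniffer {
    private static final byte[] MAGIC = "ENVI".getBytes(StandardCharsets.US_ASCII);

    private EnviHeaderSniffer() {}

    static boolean looksLikeEnviHeader(byte[] prefix) {
        if (prefix.length < MAGIC.length) {
            return false;
        }
        for (int i = 0; i < MAGIC.length; i++) {
            if (prefix[i] != MAGIC[i]) {
                return false;
            }
        }
        // Assumption: "ENVI" is followed by end of data or a line break, so
        // that text such as "ENVIRONMENT" is not mistaken for a header.
        if (prefix.length == MAGIC.length) {
            return true;
        }
        byte next = prefix[MAGIC.length];
        return next == '\n' || next == '\r';
    }
}
```

A real detection test would instead run the sample header through Tika's own detection with the custom-mimetypes file loaded and assert the returned type; this sketch only illustrates what that detection has to decide.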

For general advice on contributing, patches, tests, etc., the Apache Nutch
project has some good wiki pages describing all of that, most of which applies
equally to Apache Tika:
 * https://wiki.apache.org/nutch/HowToContribute
 * https://wiki.apache.org/nutch/Becoming_A_Nutch_Developer

Another good source is the ComDev (Apache Community Development) site: pick
"For Contributors" from the menu and look through the pages in that section.

For an example of a simple Tika parser and a simple Tika parser unit test, I
can suggest the VorbisParser from late 2011, when it largely supported only one
format (Ogg Vorbis), before additional Ogg-based formats were added. You can
see it as of around this commit:
https://github.com/Gagravarr/VorbisJava/tree/f6d20407477011735c16daf947635f1b67e14660/tika

> ENVI header parser
> ------------------
>
>                 Key: TIKA-1274
>                 URL: https://issues.apache.org/jira/browse/TIKA-1274
>             Project: Tika
>          Issue Type: New Feature
>          Components: parser
>    Affects Versions: 1.5
>            Reporter: Ann Burgess
>              Labels: mime, newbie, parser, patch
>
> I have written a parser that extracts text and metadata from ENVI header 
> files, currently called at the command line as: 
> abryant:tika abryant$ java -classpath annie-envi-parser.jar:tika-app/target/tika-app-1.6-SNAPSHOT.jar org.apache.tika.cli.TikaCLI --metadata MOD09GA_test_header.hdr
>    Content-Encoding: ISO-8859-1
>    Content-Length: 818
>    Content-Type: application/envi.hdr
>    resourceName: MOD09GA_test_header.hdr
> abryant:tika abryant$ java -classpath annie-envi-parser.jar:tika-app/target/tika-app-1.6-SNAPSHOT.jar org.apache.tika.cli.TikaCLI --text MOD09GA_test_header.hdr
> ENVI
> description = {
>   GEO-TIFF File Imported into ENVI [Fri May 25 14:06:23 2012]}
> samples = 2400
> lines   = 2400
> bands   = 7
> header offset = 0
> file type = ENVI Standard
> data type = 2
> interleave = bip
> sensor type = Unknown
> byte order = 0
> map info = {Sinusoidal, 1.5000, 1.5000, -10007091.3643, 5559289.2856, 
> 4.6331271653e+02, 4.6331271653e+02, , units=Meters}
> projection info = {16, 6371007.2, 0.000000, 0.0, 0.0, Sinusoidal, 
> units=Meters}
> coordinate system string = 
> {PROJCS["Sinusoidal",GEOGCS["GCS_ELLIPSE_BASED_1",DATUM["D_ELLIPSE_BASED_1",SPHEROID["S_ELLIPSE_BASED_1",6371007.181,0.0]],PRIMEM["Greenwich",0.0],UNIT["Degree",0.0174532925199433]],PROJECTION["Sinusoidal"],PARAMETER["False_Easting",0.0],PARAMETER["False_Northing",0.0],PARAMETER["Central_Meridian",0.0],UNIT["Meter",1.0]]}
> wavelength units = Unknown
> ______________
> As someone who is not currently a committer, could someone explain the steps
> needed to submit this new parser for review?
> The parser is located in my directory structure as: 
> /users/annbryant/tika/tika/anniedev/src/main/java/edu/usc/sunset/abburgess/tika/EnviFileReader.class
> My custom mimetypes.xml file is located at: 
> /Users/annbryant/TIKA/tika/anniedev/src/main/resources/org/apache/tika/mime/custom-mimetypes.xml
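For the parsing half, the sample header quoted above is a leading `ENVI` line followed by `key = value` entries, where a value wrapped in `{...}` may run over several lines (as `description`, `map info` and `coordinate system string` do). A minimal stdlib-only sketch of that parsing step, assuming brace-delimited values are never nested; `EnviHeaderSketch` is a hypothetical class, not the `EnviFileReader` from the issue and not Tika's `Parser` API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stdlib-only sketch of parsing an ENVI header into key/value
// pairs. Not Tika's Parser API and not the EnviFileReader from the issue.
final class EnviHeaderSketch {
    private EnviHeaderSketch() {}

    static Map<String, String> parse(String header) {
        // Normalise line endings so only '\n' needs handling below.
        String text = header.replace("\r\n", "\n").replace('\r', '\n');
        int nl = text.indexOf('\n');
        String first = nl < 0 ? text : text.substring(0, nl);
        if (!first.trim().equals("ENVI")) {
            throw new IllegalArgumentException("Not an ENVI header: first line is not ENVI");
        }
        Map<String, String> fields = new LinkedHashMap<>();
        int pos = nl < 0 ? text.length() : nl + 1;
        while (pos < text.length()) {
            int eq = text.indexOf('=', pos);
            if (eq < 0) {
                break; // no more key = value pairs
            }
            // The key is whatever precedes '=' on the same line.
            String before = text.substring(pos, eq);
            String key = before.substring(before.lastIndexOf('\n') + 1).trim();
            int lineEnd = text.indexOf('\n', eq);
            if (lineEnd < 0) {
                lineEnd = text.length();
            }
            // First non-whitespace character after '=', possibly on a later line.
            int start = eq + 1;
            while (start < text.length() && Character.isWhitespace(text.charAt(start))) {
                start++;
            }
            String value;
            if (start < text.length() && text.charAt(start) == '{') {
                // Brace-delimited value: may span lines; assumed not nested.
                int close = text.indexOf('}', start);
                if (close < 0) {
                    throw new IllegalArgumentException("Unterminated {...} value for " + key);
                }
                value = text.substring(start + 1, close).replaceAll("\\s+", " ").trim();
                pos = close + 1;
            } else {
                // Plain value: rest of the line (empty if nothing follows '=').
                value = text.substring(eq + 1, lineEnd).trim();
                pos = lineEnd + 1;
            }
            if (!key.isEmpty()) {
                fields.put(key, value);
            }
        }
        return fields;
    }
}
```

In an actual Tika parser, each entry of the returned map could then be copied into Tika's `Metadata` object, and the raw header text emitted as the document's content.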



--
This message was sent by Atlassian JIRA
(v6.2#6252)
