[https://issues.apache.org/jira/browse/TIKA-2636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16460469#comment-16460469]

Hudson commented on TIKA-2636:
------------------------------

UNSTABLE: Integrated in Jenkins build Tika-trunk #1478 (See 
[https://builds.apache.org/job/Tika-trunk/1478/])
TIKA-2636 ENVI Header metadata fields can span more than one line 
(lewis.mcgibbney: 
[https://github.com/apache/tika/commit/1e45da928ab699cd3e483d5c1006412c69ff6c09])
* (edit) 
tika-parsers/src/main/java/org/apache/tika/parser/envi/EnviHeaderParser.java
TIKA-2636 ENVI Header metadata fields can span more than one line 
(lewis.mcgibbney: 
[https://github.com/apache/tika/commit/95d967b14c33acd4e82814f6cb470cbab0f4ee08])
* (edit) 
tika-parsers/src/test/java/org/apache/tika/config/TikaEncodingDetectorTest.java


> ENVI Header metadata fields can span more than one line
> -------------------------------------------------------
>
>                 Key: TIKA-2636
>                 URL: https://issues.apache.org/jira/browse/TIKA-2636
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.18
>            Reporter: Lewis John McGibbney
>            Assignee: Lewis John McGibbney
>            Priority: Major
>             Fix For: 1.19
>
>         Attachments: ang20150420t182050_corr_v1e_img.hdr
>
>
> [~tpalsulich] was correct when [he stated|https://issues.apache.org/jira/browse/TIKA-1357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046140#comment-14046140]
>  "...See below for how to read and output line by line (copy & paste between 
> the xml start/end in EnviHeaderParser). I have a hunch this isn't really what 
> we want -- what if a metadata field has a newline in it? What if the line is 
> too long to fit into a string? On the other hand, with nice input, it's much 
> nicer output."
> As it turns out, ENVI header metadata fields can span more than one line. For
> example:
> {code}
> 1.    ENVI
> 2.    description = {
> 3.      Georeferenced Image built from input GLT. [Wed Jun 10 04:37:54 2015] [Wed
> 4.      Jun 10 04:48:52 2015]}
> 5.    samples = 739
> 6.    lines = 14674
> 7.    bands = 432
> 8.    header offset = 0
> 9.    file type = ENVI Standard
> 10.    data type = 4
> 11.    interleave = bil
> 12.    sensor type = Unknown
> 13.    byte order = 0
> 14.    map info = { UTM , 1.000 , 1.000 , 724522.127 , 4074620.759 , 1.1000000000e+00 , 1.1000000000e+00 , 12 , North , WGS-84 , units=Meters , rotation=75.00000000 }
> 15.    wavelength units = Nanometers
> ...
> {code}
> This happens when a metadata field value is enclosed in curly braces. In the
> example above, lines 2-4 show a value spread over three lines, while line 14
> shows a braced value contained on a single line.
> Fixing this requires a patch to the 
> [EnviHeaderParser|https://github.com/apache/tika/blob/9130bbc1fa6d69419b2ad294917260d6b1cced08/tika-parsers/src/main/java/org/apache/tika/parser/envi/EnviHeaderParser.java].
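
The brace rule described in the issue can be sketched in Java as follows. This is a minimal, hypothetical sketch, assuming that a value opened with "{" continues until its matching "}" and that each line break inside the braces can be treated as a single space. EnviHeaderLineJoiner, joinLogicalLines and splitField are illustrative names, not part of Tika's API, and this is not necessarily how the commits above changed EnviHeaderParser.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper, not part of Tika: rebuilds logical "key = value"
 * fields from the physical lines of an ENVI header, assuming a value that
 * opens with '{' continues until the matching '}'.
 */
final class EnviHeaderLineJoiner {

    private EnviHeaderLineJoiner() {
    }

    /** Joins physical lines so that each returned string is one complete field. */
    static List<String> joinLogicalLines(List<String> physicalLines) {
        List<String> logical = new ArrayList<>();
        StringBuilder pending = new StringBuilder();
        int depth = 0; // number of '{' not yet closed in the pending field

        for (String raw : physicalLines) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue; // blank lines carry no content
            }
            if (pending.length() > 0) {
                pending.append(' '); // assumption: a line break inside braces acts as a space
            }
            pending.append(line);
            for (int i = 0; i < line.length(); i++) {
                char c = line.charAt(i);
                if (c == '{') {
                    depth++;
                } else if (c == '}' && depth > 0) {
                    depth--; // a stray '}' outside braces is ignored
                }
            }
            if (depth == 0) {
                logical.add(pending.toString()); // field is complete
                pending.setLength(0);
            }
        }
        if (pending.length() > 0) {
            logical.add(pending.toString()); // unterminated '{': keep what was read
        }
        return logical;
    }

    /** Splits a logical field on its first '=' into {key, value}; null if there is no '='. */
    static String[] splitField(String logicalLine) {
        int eq = logicalLine.indexOf('=');
        if (eq < 0) {
            return null; // e.g. the leading "ENVI" line
        }
        String key = logicalLine.substring(0, eq).trim();
        String value = logicalLine.substring(eq + 1).trim();
        if (value.startsWith("{") && value.endsWith("}")) {
            value = value.substring(1, value.length() - 1).trim(); // drop the outer braces
        }
        return new String[] {key, value};
    }
}
```

Fed lines 1-5 and 14 of the example header, this sketch yields four logical fields. Calling splitField on the joined description field gives the key "description" and the single-line value "Georeferenced Image built from input GLT. [Wed Jun 10 04:37:54 2015] [Wed Jun 10 04:48:52 2015]". Counting brace depth, rather than looking only for a closing "}" on the same line, keeps one-line values like map info and multi-line values like description on the same code path.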



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
