[jira] [Commented] (TIKA-1877) On updating the tika-mimetypes.xml to detect .fts file format, tika detector does not return anything

2016-03-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15182140#comment-15182140
 ] 

Hudson commented on TIKA-1877:
--

SUCCESS: Integrated in tika-trunk-jdk1.7 #920 (See 
[https://builds.apache.org/job/tika-trunk-jdk1.7/920/])
fix for TIKA-1877 contributed by prasadns14 (prasadns14: rev 
602d237feec48bfd97bc2b2b38ea614b1ae2c55d)
* tika-parsers/src/test/java/org/apache/tika/mime/TestMimeTypes.java
* tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml
* tika-parsers/src/test/resources/test-documents/4E8D6B46E2366D7063DE3926AF0F976A0DCCD57A7E3B53B7D54768F16DD23984
Rename the test file for TIKA-1877 to better match our test file naming (nick: 
rev 1299c9e494882e011749fb8a4514c40631df03d1)
* tika-parsers/src/test/resources/test-documents/testFITS_ShorterHeader.fits
* tika-parsers/src/test/java/org/apache/tika/mime/TestMimeTypes.java
* tika-parsers/src/test/resources/test-documents/4E8D6B46E2366D7063DE3926AF0F976A0DCCD57A7E3B53B7D54768F16DD23984


> On updating the tika-mimetypes.xml to detect .fts file format, tika detector 
> does not return anything
> -
>
> Key: TIKA-1877
> URL: https://issues.apache.org/jira/browse/TIKA-1877
> Project: Tika
>  Issue Type: Bug
>  Components: mime
>Reporter: Prasad Nagaraj Subramanya
>Priority: Minor
> Fix For: 1.13
>
> Attachments: 
> 3DEE2CE70CAD248DC8A46C2D0BD0BD6C21AACE54AC958264773390B39C8AF079, 
> 4E8D6B46E2366D7063DE3926AF0F976A0DCCD57A7E3B53B7D54768F16DD23984, 
> tika-mimetypes.xml
>
>
> The match value for the .fts file format in tika-mimetypes.xml is "SIMPLE  =" 
> followed by 20 spaces and "T".
> Tika detected a .fts file as application/octet-stream. On inspecting the 
> header, I found the value to be "SIMPLE  =" followed by only 16 spaces and "T".
> I tried the following changes:
> Change 1) Updated the existing match value, but the build failed.
> Change 2) Added a new match element (type="string" offset="0") with the 
> 16-space value after the existing one.
> But now Tika returns an empty value. It identifies the file neither as .fts 
> nor as application/octet-stream.
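A `type="string"` magic match at offset 0 compares the file's leading bytes against the pattern byte for byte, so a pattern with 20 spaces between `=` and `T` cannot match a header that has only 16. The plain-Java sketch below illustrates that comparison; it is not Tika's own magic code, and the class, the helpers (`simpleCard`, `paddedHeader`, `matchesAt`, `detectsAsFits`) and the two pattern lists are hypothetical. It assumes, as we understand Tika's magic format, that sibling `<match>` elements act as alternatives, so adding the shorter value as a second pattern lets either header layout match.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class FitsMagicSketch {

    // Hypothetical helper: "SIMPLE  =" followed by the given number of spaces and "T".
    static String simpleCard(int spaces) {
        return "SIMPLE  =" + String.join("", Collections.nCopies(spaces, " ")) + "T";
    }

    // A header whose first 80-byte card is simpleCard(spaces), padded with spaces.
    static byte[] paddedHeader(int spaces) {
        StringBuilder card = new StringBuilder(simpleCard(spaces));
        while (card.length() < 80) {
            card.append(' ');
        }
        return card.toString().getBytes(StandardCharsets.US_ASCII);
    }

    // Byte-for-byte comparison of the pattern against the header at the given offset.
    static boolean matchesAt(byte[] header, String pattern, int offset) {
        byte[] p = pattern.getBytes(StandardCharsets.US_ASCII);
        if (header.length < offset + p.length) {
            return false;
        }
        return Arrays.equals(Arrays.copyOfRange(header, offset, offset + p.length), p);
    }

    // Hypothetical pattern lists: the original 20-space value alone, and the
    // original plus an alternative for the 16-space header.
    static final List<String> ORIGINAL = Arrays.asList(simpleCard(20));
    static final List<String> WITH_EXTRA = Arrays.asList(simpleCard(20), simpleCard(16));

    // Patterns are treated as alternatives: any one matching at offset 0 is enough.
    static boolean detectsAsFits(byte[] header, List<String> patterns) {
        for (String pattern : patterns) {
            if (matchesAt(header, pattern, 0)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        byte[] shorter = paddedHeader(16);
        System.out.println("16-space header, original pattern:   " + detectsAsFits(shorter, ORIGINAL));
        System.out.println("16-space header, with extra pattern: " + detectsAsFits(shorter, WITH_EXTRA));
        System.out.println("20-space header, original pattern:   " + detectsAsFits(paddedHeader(20), ORIGINAL));
    }
}
```

Running `main` prints `false`, `true` and `true` for the three cases: the original pattern misses the shorter header, and the extra alternative catches it without breaking the standard layout.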



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TIKA-1877) On updating the tika-mimetypes.xml to detect .fts file format, tika detector does not return anything

2016-03-06 Thread Nick Burch (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15182131#comment-15182131
 ] 

Nick Burch commented on TIKA-1877:
--

With your patch applied, the Tika app correctly detects your new test file for 
me, both with and without the filename hint:
{code}
$ tika --detect tika-parsers/src/test/resources/test-documents/testFITS_ShorterHeader.fits
application/fits
$ tika --detect < tika-parsers/src/test/resources/test-documents/testFITS_ShorterHeader.fits
application/fits
{code}
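The second command feeds the file through standard input, so no file name reaches the detector and only the leading bytes can identify it; the first command also passes the `.fits` name. The Java sketch below is a deliberately simplified model of that difference, not Tika's actual `MimeTypes` detection logic: it assumes magic bytes are tried first and the file-name extension is consulted only when no magic pattern matches. The class, `detect`, the magic list and the extension table (`fits`, `fit`, `fts`) are all hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class DetectHintSketch {

    // Hypothetical magic patterns covering both header layouts (20 and 16 spaces).
    static final List<String> FITS_MAGIC = Arrays.asList(
            String.format("SIMPLE  =%20sT", ""),
            String.format("SIMPLE  =%16sT", ""));

    // Hypothetical extension table standing in for glob patterns.
    static final Map<String, String> GLOBS = new HashMap<>();
    static {
        GLOBS.put("fits", "application/fits");
        GLOBS.put("fit", "application/fits");
        GLOBS.put("fts", "application/fits");
    }

    // Simplified model: magic bytes first, then the name hint (if any),
    // otherwise the generic fallback type.
    static String detect(byte[] header, String nameHint) {
        String start = new String(header, StandardCharsets.US_ASCII);
        for (String pattern : FITS_MAGIC) {
            if (start.startsWith(pattern)) {
                return "application/fits";
            }
        }
        if (nameHint != null) {
            int dot = nameHint.lastIndexOf('.');
            if (dot >= 0) {
                String type = GLOBS.get(nameHint.substring(dot + 1).toLowerCase(Locale.ROOT));
                if (type != null) {
                    return type;
                }
            }
        }
        return "application/octet-stream";
    }

    public static void main(String[] args) {
        byte[] shorter = String.format("SIMPLE  =%16sT", "").getBytes(StandardCharsets.US_ASCII);
        byte[] unknown = "not a FITS header".getBytes(StandardCharsets.US_ASCII);
        System.out.println(detect(shorter, "testFITS_ShorterHeader.fits")); // bytes plus name
        System.out.println(detect(shorter, null));                          // bytes only, like stdin
        System.out.println(detect(unknown, "example.fts"));                 // name hint only helps here
        System.out.println(detect(unknown, null));
    }
}
```

Under these assumptions, only the stdin-style call without a name hint proves that the magic pattern itself recognises the shorter header, which is why checking both invocations is useful.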



[jira] [Commented] (TIKA-1877) On updating the tika-mimetypes.xml to detect .fts file format, tika detector does not return anything

2016-03-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15182129#comment-15182129
 ] 

ASF GitHub Bot commented on TIKA-1877:
--

Github user asfgit closed the pull request at:

https://github.com/apache/tika/pull/81




[jira] [Commented] (TIKA-1877) On updating the tika-mimetypes.xml to detect .fts file format, tika detector does not return anything

2016-02-29 Thread Prasad Nagaraj Subramanya (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15172956#comment-15172956
 ] 

Prasad Nagaraj Subramanya commented on TIKA-1877:
-

I have added a new .fts file under test-documents and created a unit test for 
it. There are two other unit tests for .fts files, and all three cases pass 
with my changes. 

Interestingly, Tika returns an empty string when I try to detect the .fts file 
from the command line, but it returns application/fits when I call the detect 
method on a Tika object.



[jira] [Commented] (TIKA-1877) On updating the tika-mimetypes.xml to detect .fts file format, tika detector does not return anything

2016-02-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15172890#comment-15172890
 ] 

ASF GitHub Bot commented on TIKA-1877:
--

GitHub user prasadns14 opened a pull request:

https://github.com/apache/tika/pull/81

fix for TIKA-1877 contributed by prasadns14

Updated tika-mimetypes.xml.
Also added a new .fits file to test-documents and created a unit test for it.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/prasadns14/tika TIKA-1877

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/tika/pull/81.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #81


commit 602d237feec48bfd97bc2b2b38ea614b1ae2c55d
Author: prasadns14 
Date:   2016-02-29T23:03:13Z

fix for TIKA-1877 contributed by prasadns14






[jira] [Commented] (TIKA-1877) On updating the tika-mimetypes.xml to detect .fts file format, tika detector does not return anything

2016-02-28 Thread Namitha Sanjeeva Ganiga (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15170969#comment-15170969
 ] 

Namitha Sanjeeva Ganiga commented on TIKA-1877:
---

I also noticed this issue.
From the descriptions of the .fits file format, it looks like there are 20 
spaces from "=" to "T", and Tika's current magic pattern follows that layout.

If we check the file that has been classified as octet-stream, we see that 
there are only 16 spaces between "=" and "T". That is why it is classified as 
octet-stream and not as application/fits.

The question, then, is whether files like the ones attached, which have 16 
spaces, should be classified as application/fits.

Reference :
http://fits.gsfc.nasa.gov/standard30/fits_standard30.pdf
https://tools.ietf.org/html/rfc4047
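The 20-space count follows from the fixed-format layout of a FITS header card as described in the FITS standard referenced above: each card is 80 ASCII characters, the keyword is left-justified in columns 1 to 8, columns 9 and 10 hold the value indicator "= ", and a fixed-format logical value T or F sits in column 30. The Java sketch below (class and method names are hypothetical) builds such a card and counts the gap; it is only an illustration of that layout, not a FITS parser.

```java
class FitsCardSketch {

    static final int CARD_LENGTH = 80;

    // Fixed-format logical card: keyword padded to columns 1-8, "= " in
    // columns 9-10, and T/F in column 30 (0-based index 29), then space padding.
    static String logicalCard(String keyword, boolean value) {
        StringBuilder card = new StringBuilder(CARD_LENGTH);
        card.append(keyword);
        while (card.length() < 8) {
            card.append(' ');
        }
        card.append("= ");
        while (card.length() < 29) {
            card.append(' ');
        }
        card.append(value ? 'T' : 'F');
        while (card.length() < CARD_LENGTH) {
            card.append(' ');
        }
        return card.toString();
    }

    // Counts the characters strictly between the '=' and the following 'T'.
    static int gapBetweenEqualsAndT(String card) {
        int eq = card.indexOf('=');
        int t = card.indexOf('T', eq);
        return t - eq - 1;
    }

    public static void main(String[] args) {
        String standard = logicalCard("SIMPLE", true);
        System.out.println("card length: " + standard.length());
        System.out.println("column of T: " + (standard.indexOf('T', 8) + 1));
        System.out.println("spaces between = and T: " + gapBetweenEqualsAndT(standard));
        // The attached file's layout: only 16 spaces, so T lands in column 26.
        String shorter = String.format("SIMPLE  =%16sT", "");
        System.out.println("shorter header gap: " + gapBetweenEqualsAndT(shorter));
    }
}
```

For the standard card this prints a length of 80, T in column 30 and a gap of 20; the attached file's header gives a gap of 16, which is exactly the mismatch described above.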




[jira] [Commented] (TIKA-1877) On updating the tika-mimetypes.xml to detect .fts file format, tika detector does not return anything

2016-02-27 Thread Nick Burch (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15170579#comment-15170579
 ] 

Nick Burch commented on TIKA-1877:
--

Posting the whole modified tika-mimetypes file isn't ideal: it's hard for us 
to see what has changed and what hasn't, especially given the file's large 
size. Would you be able to post a patch/diff showing just your changes, to help 
us review and possibly spot the issue?

(I tried diffing it against trunk, but got so many changes that I couldn't 
tell which ones were supposed to be yours.)

Ideally, it would also help if you could write a short JUnit unit test 
showing the detection issue. That's generally much quicker and easier to test 
with, and has the bonus of providing a check that, once fixed, it stays fixed!
