[jira] [Commented] (TIKA-1877) On updating the tika-mimetypes.xml to detect .fts file format, tika detector does not return anything
[ https://issues.apache.org/jira/browse/TIKA-1877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15182140#comment-15182140 ]

Hudson commented on TIKA-1877:
--

SUCCESS: Integrated in tika-trunk-jdk1.7 #920 (See [https://builds.apache.org/job/tika-trunk-jdk1.7/920/])
fix for TIKA-1877 contributed by prasadns14 (prasadns14: rev 602d237feec48bfd97bc2b2b38ea614b1ae2c55d)
* tika-parsers/src/test/java/org/apache/tika/mime/TestMimeTypes.java
* tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml
* tika-parsers/src/test/resources/test-documents/4E8D6B46E2366D7063DE3926AF0F976A0DCCD57A7E3B53B7D54768F16DD23984
Rename the test file for TIKA-1877 to better match our test file naming (nick: rev 1299c9e494882e011749fb8a4514c40631df03d1)
* tika-parsers/src/test/resources/test-documents/testFITS_ShorterHeader.fits
* tika-parsers/src/test/java/org/apache/tika/mime/TestMimeTypes.java
* tika-parsers/src/test/resources/test-documents/4E8D6B46E2366D7063DE3926AF0F976A0DCCD57A7E3B53B7D54768F16DD23984

> On updating the tika-mimetypes.xml to detect .fts file format, tika detector does not return anything
> -
>
> Key: TIKA-1877
> URL: https://issues.apache.org/jira/browse/TIKA-1877
> Project: Tika
> Issue Type: Bug
> Components: mime
> Reporter: Prasad Nagaraj Subramanya
> Priority: Minor
> Fix For: 1.13
>
> Attachments: 3DEE2CE70CAD248DC8A46C2D0BD0BD6C21AACE54AC958264773390B39C8AF079, 4E8D6B46E2366D7063DE3926AF0F976A0DCCD57A7E3B53B7D54768F16DD23984, tika-mimetypes.xml
>
>
> The match value for the .fts file format in tika-mimetypes.xml is "SIMPLE = T".
> Tika detected a .fts file as application/octet-stream. On inspecting the header, I found the value to be "SIMPLE =T" (only 16 spaces between = and T).
> I tried the following changes:
> Change 1) Updated the existing match value. But the build failed.
> Change 2) Added a new match value (type="string" offset="0") after the existing one.
> But now Tika returns an empty value. It identifies the file neither as .fts nor as application/octet-stream.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
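To illustrate why the original entry misses the shorter header, here is a minimal Java sketch (stdlib only; it is NOT Tika's MimeTypes code, and the class and method names are hypothetical). It models a magic rule of type="string" at offset="0" as a literal byte comparison: a pattern with 20 spaces between "=" and "T" cannot match a header that has only 16. It assumes the shorter header keeps the usual two spaces between "SIMPLE" and "=", and it does not attempt to reproduce the empty result seen from the command line.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical sketch, NOT Tika's MimeTypes code: it only models a magic rule
// of type="string" at offset="0", where the leading bytes must equal the
// pattern exactly, spaces included.
class FitsMagicSketch {
    // "SIMPLE", two spaces, "=", 20 spaces, "T": the standard-conformant header.
    static final String STANDARD = "SIMPLE  =" + " ".repeat(20) + "T";
    // Assumed shape of the shorter header: same prefix, only 16 spaces before "T".
    static final String SHORTER = "SIMPLE  =" + " ".repeat(16) + "T";

    // True if 'data' contains the ASCII bytes of 'pattern' starting at 'offset'.
    static boolean matchesAt(byte[] data, int offset, String pattern) {
        byte[] p = pattern.getBytes(StandardCharsets.US_ASCII);
        if (data.length < offset + p.length) {
            return false;
        }
        for (int i = 0; i < p.length; i++) {
            if (data[offset + i] != p[i]) {
                return false;
            }
        }
        return true;
    }

    // Patterns are treated as alternatives: any match is enough.
    // Otherwise fall back to the generic binary type.
    static String detect(byte[] data, List<String> fitsPatterns) {
        for (String pattern : fitsPatterns) {
            if (matchesAt(data, 0, pattern)) {
                return "application/fits";
            }
        }
        return "application/octet-stream";
    }

    public static void main(String[] args) {
        byte[] shortHeader = SHORTER.getBytes(StandardCharsets.US_ASCII);
        byte[] standardHeader = STANDARD.getBytes(StandardCharsets.US_ASCII);
        // Standard pattern only: the shorter header is not recognised.
        System.out.println(detect(shortHeader, List.of(STANDARD)));              // application/octet-stream
        // Both patterns as alternatives: both headers are recognised.
        System.out.println(detect(shortHeader, List.of(STANDARD, SHORTER)));     // application/fits
        System.out.println(detect(standardHeader, List.of(STANDARD, SHORTER)));  // application/fits
    }
}
```

The design point the sketch shows: a literal comparison has no tolerance for spacing, so each spacing variant needs its own pattern, and adding the shorter one does not make the standard pattern match less.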
[jira] [Commented] (TIKA-1877) On updating the tika-mimetypes.xml to detect .fts file format, tika detector does not return anything
[ https://issues.apache.org/jira/browse/TIKA-1877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15182131#comment-15182131 ]

Nick Burch commented on TIKA-1877:
--

With your patch applied, the Tika app correctly detects your new test file for me, both with and without the filename hint:

{code}
$ tika --detect tika-parsers/src/test/resources/test-documents/testFITS_ShorterHeader.fits
application/fits
$ tika --detect < tika-parsers/src/test/resources/test-documents/testFITS_ShorterHeader.fits
application/fits
{code}
[jira] [Commented] (TIKA-1877) On updating the tika-mimetypes.xml to detect .fts file format, tika detector does not return anything
[ https://issues.apache.org/jira/browse/TIKA-1877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15182129#comment-15182129 ]

ASF GitHub Bot commented on TIKA-1877:
--

Github user asfgit closed the pull request at:

https://github.com/apache/tika/pull/81
[jira] [Commented] (TIKA-1877) On updating the tika-mimetypes.xml to detect .fts file format, tika detector does not return anything
[ https://issues.apache.org/jira/browse/TIKA-1877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15172956#comment-15172956 ]

Prasad Nagaraj Subramanya commented on TIKA-1877:
-

I have added a new .fts file under test-documents and created a unit test for it. There are two other unit tests for .fts files, and all three cases pass with my changes.
Interestingly, Tika returns an empty string when I try to detect the .fts file from the command line, but it returns application/fits when I call the detect method on a Tika object.
[jira] [Commented] (TIKA-1877) On updating the tika-mimetypes.xml to detect .fts file format, tika detector does not return anything
[ https://issues.apache.org/jira/browse/TIKA-1877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15172890#comment-15172890 ]

ASF GitHub Bot commented on TIKA-1877:
--

GitHub user prasadns14 opened a pull request:

https://github.com/apache/tika/pull/81

fix for TIKA-1877 contributed by prasadns14

Updated the tika-mimetypes.xml. Also added a new .fits file to test-documents and created a unit test.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/prasadns14/tika TIKA-1877

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/tika/pull/81.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #81

commit 602d237feec48bfd97bc2b2b38ea614b1ae2c55d
Author: prasadns14
Date: 2016-02-29T23:03:13Z

fix for TIKA-1877 contributed by prasadns14
[jira] [Commented] (TIKA-1877) On updating the tika-mimetypes.xml to detect .fts file format, tika detector does not return anything
[ https://issues.apache.org/jira/browse/TIKA-1877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15170969#comment-15170969 ]

Namitha Sanjeeva Ganiga commented on TIKA-1877:
---

I noticed the same issue.
From the descriptions of the .fits file format, it looks like there should be 20 spaces between "=" and "T", and Tika's detector currently expects exactly that.
If we check the file that has been classified as octet-stream, we see that there are only 16 spaces between "=" and "T", which is why it is classified as application/octet-stream rather than application/fits.
The question, then, is whether files with 16 spaces (like the ones attached) should be classified as application/fits.

References:
http://fits.gsfc.nasa.gov/standard30/fits_standard30.pdf
https://tools.ietf.org/html/rfc4047
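The 20-space figure follows from the FITS fixed-format layout: the keyword occupies columns 1 to 8, the value indicator "= " occupies columns 9 and 10, and a fixed-format logical value sits in column 30 of an 80-character header record. The minimal Java sketch below (stdlib only; the class and method names are hypothetical, not from Tika) builds such a record for SIMPLE = T and counts the spaces between "=" and the value.

```java
// Hypothetical sketch of a FITS fixed-format logical header record.
class FitsCardSketch {
    static final int CARD_LENGTH = 80;      // each FITS header record is 80 ASCII characters
    static final int VALUE_END_COLUMN = 30; // fixed-format logical value sits in column 30 (1-based)

    // Builds a fixed-format logical record such as "SIMPLE  =                    T".
    static String logicalCard(String keyword, boolean value) {
        StringBuilder sb = new StringBuilder();
        sb.append(String.format("%-8s", keyword)); // columns 1-8: keyword, space-padded
        sb.append("= ");                            // columns 9-10: value indicator
        // Pad so the value character lands exactly in column 30.
        while (sb.length() < VALUE_END_COLUMN - 1) {
            sb.append(' ');
        }
        sb.append(value ? 'T' : 'F');
        // Pad the rest of the record with spaces up to 80 characters.
        while (sb.length() < CARD_LENGTH) {
            sb.append(' ');
        }
        return sb.toString();
    }

    // Counts consecutive spaces immediately after the first '='.
    static int spacesAfterEquals(String card) {
        int eq = card.indexOf('=');
        int i = eq + 1;
        while (i < card.length() && card.charAt(i) == ' ') {
            i++;
        }
        return i - eq - 1;
    }

    public static void main(String[] args) {
        String card = logicalCard("SIMPLE", true);
        System.out.println(card.length());           // 80
        System.out.println(spacesAfterEquals(card)); // 20
    }
}
```

Assuming the same "SIMPLE  =" prefix, a record with only 16 spaces places the T in column 26 instead of 30, so it does not follow the fixed-format layout; whether such files should still be reported as application/fits is the open question in the comment.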
[jira] [Commented] (TIKA-1877) On updating the tika-mimetypes.xml to detect .fts file format, tika detector does not return anything
[ https://issues.apache.org/jira/browse/TIKA-1877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15170579#comment-15170579 ]

Nick Burch commented on TIKA-1877:
--

Posting the whole modified tika-mimetypes file isn't ideal: it's hard for us to see what has changed and what hasn't, especially given the file's large size. Would you be able to post a patch/diff showing just your changes, to help us review and possibly spot the issue? (I tried diffing it against trunk, but got so many changes that I couldn't tell which one was supposed to be yours.)

Ideally, it would also be easier if you could write a short JUnit unit test showing the detection issue. That's generally much quicker and easier to test with, and it has the bonus of providing a check to ensure that, once fixed, it stays fixed!