[jira] [Commented] (TIKA-1502) Mime magic for database file formats
[ https://issues.apache.org/jira/browse/TIKA-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14256527#comment-14256527 ]

Hudson commented on TIKA-1502:
--

SUCCESS: Integrated in tika-trunk-jdk1.6 #366 (See [https://builds.apache.org/job/tika-trunk-jdk1.6/366/])
Some test database files for TIKA-1502 (nick: http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1647473)
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_2.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_3.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_4.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_5.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testMYSQL.MYD
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testMYSQL.MYI
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testMYSQL.frm
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testSQLITE3.db

Mime magic for database file formats
Key: TIKA-1502
URL: https://issues.apache.org/jira/browse/TIKA-1502
Project: Tika
Issue Type: Improvement
Components: mime
Affects Versions: 1.6
Reporter: Nick Burch

I noticed today that Tika can't detect a lot of common database formats, such as SQLite, Berkeley DB or MISAM. The unix file utility gets most of those, which makes me think that most have a sensible-ish header that we can write some mime magic for. It'd therefore be good to add mime entries, with magic where possible, for many of these common database file formats.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
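The idea behind the request is that most of these formats start with a recognisable header. As a minimal sketch of what such a magic check does, here is a hand-written Java test for the SQLite 3 header, which is the 16-byte string `SQLite format 3` followed by a NUL byte at offset 0. The class and method names are hypothetical, and this is not how Tika itself expresses magic (Tika declares it in tika-mimetypes.xml rather than in code).

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration: SQLite 3 files begin with "SQLite format 3\0".
class SqliteMagic {
    private static final byte[] SQLITE3_HEADER =
            "SQLite format 3\0".getBytes(StandardCharsets.US_ASCII);

    /** True if the first bytes of the file match the 16-byte SQLite 3 header. */
    static boolean isSqlite3(byte[] prefix) {
        if (prefix.length < SQLITE3_HEADER.length) {
            return false;
        }
        byte[] head = Arrays.copyOfRange(prefix, 0, SQLITE3_HEADER.length);
        return Arrays.equals(head, SQLITE3_HEADER);
    }

    public static void main(String[] args) {
        byte[] sample = Arrays.copyOf(SQLITE3_HEADER, 100); // header plus zero padding
        System.out.println(isSqlite3(sample));
        System.out.println(isSqlite3("not a database".getBytes(StandardCharsets.US_ASCII)));
    }
}
```

Only the first 16 bytes are needed, which is why header magic is cheap enough to run on every file.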
[jira] [Commented] (TIKA-1502) Mime magic for database file formats
[ https://issues.apache.org/jira/browse/TIKA-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14256561#comment-14256561 ]

Hudson commented on TIKA-1502:
--

SUCCESS: Integrated in tika-trunk-jdk1.7 #383 (See [https://builds.apache.org/job/tika-trunk-jdk1.7/383/])
TIKA-1502 MySQL and SQLite3 mime types, with magic where possible (nick: http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1647478)
* /tika/trunk/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml
Some test database files for TIKA-1502 (nick: http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1647473)
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_2.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_3.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_4.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_5.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testMYSQL.MYD
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testMYSQL.MYI
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testMYSQL.frm
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testSQLITE3.db
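"Magic where possible" implies a fallback for formats without a usable header. A rough Java sketch of that pattern for the MySQL files: try magic bytes first, then fall back to the filename extension. Everything specific here is an assumption for illustration only: the byte values (modelled loosely on the unix file utility's magic database, and worth verifying against real files), the choice of .MYD as the headerless case, and the mime type strings. None of it is taken from r1647478.

```java
import java.util.Locale;

// Hypothetical sketch: magic first, filename glob as a fallback.
// Byte values and type names are illustrative assumptions, not Tika's definitions.
class MySqlDetector {

    /** True if the prefix starts with the two given (unsigned) byte values. */
    private static boolean startsWith(byte[] prefix, int b0, int b1) {
        return prefix.length >= 2
                && (prefix[0] & 0xFF) == b0
                && (prefix[1] & 0xFF) == b1;
    }

    static String detect(String fileName, byte[] prefix) {
        // Assumed magic for .frm table definition files: 0xFE 0x01
        if (startsWith(prefix, 0xFE, 0x01)) {
            return "application/x-mysql-table-definition";
        }
        // Assumed magic for .MYI index files: 0xFE 0xFE
        if (startsWith(prefix, 0xFE, 0xFE)) {
            return "application/x-mysql-misam-index";
        }
        // Assumed headerless: .MYD data files can only be matched by extension
        if (fileName.toLowerCase(Locale.ROOT).endsWith(".myd")) {
            return "application/x-mysql-misam-data";
        }
        return "application/octet-stream";
    }

    public static void main(String[] args) {
        System.out.println(detect("t1.frm", new byte[] {(byte) 0xFE, 0x01, 0x0A}));
        System.out.println(detect("t1.MYD", new byte[] {0x03, 0x00}));
    }
}
```

Checking magic before the extension means a correctly headed file is still identified when it has been renamed; only the headerless case has to trust the name.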
[jira] [Commented] (TIKA-1502) Mime magic for database file formats
[ https://issues.apache.org/jira/browse/TIKA-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14256571#comment-14256571 ]

Hudson commented on TIKA-1502:
--

SUCCESS: Integrated in tika-trunk-jdk1.6 #367 (See [https://builds.apache.org/job/tika-trunk-jdk1.6/367/])
TIKA-1502 MySQL and SQLite3 mime types, with magic where possible (nick: http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1647478)
* /tika/trunk/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml
[jira] [Commented] (TIKA-1502) Mime magic for database file formats
[ https://issues.apache.org/jira/browse/TIKA-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14256640#comment-14256640 ]

Hudson commented on TIKA-1502:
--

SUCCESS: Integrated in tika-trunk-jdk1.6 #368 (See [https://builds.apache.org/job/tika-trunk-jdk1.6/368/])
More test database files for TIKA-1502 (nick: http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1647484)
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_2.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_3.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_4.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_5.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_btree_2.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_btree_3.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_btree_4.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_btree_5.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_hash_2.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_hash_3.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_hash_4.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_hash_5.db
[jira] [Commented] (TIKA-1502) Mime magic for database file formats
[ https://issues.apache.org/jira/browse/TIKA-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14256655#comment-14256655 ]

Nick Burch commented on TIKA-1502:
--

As of r1647486, we now have mime types for SQLite3, MySQL (most of its file types) and Berkeley DB. We have magic for SQLite3, for most of the MySQL formats (some are headerless), and for the expanded set of Berkeley DB types.

One remaining issue is getting MimeTypesReaderTest.testReadParameterHeirarchy() to pass: for some reason the 3-level hierarchy of the BDB mime types is getting flattened to just two levels.
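To make the expected three-level hierarchy concrete, here is a hypothetical Java sketch in which each parameterised type's parent is obtained by dropping only its last parameter, so a btree subtype leads to a format subtype and then to the base type. The type and parameter names (`format`, `type`) are assumptions for illustration, and this is not Tika's MediaTypeRegistry logic; it simply shows that stripping all parameters in one step would produce exactly the two-level flattening described in the comment.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: parent = same type with its last ";param=value" removed.
// Assumes parameters are written without surrounding whitespace.
class ParamHierarchy {

    /** Returns the parent type, or null if the type has no parameters left. */
    static String supertype(String type) {
        int lastSemi = type.lastIndexOf(';');
        return lastSemi < 0 ? null : type.substring(0, lastSemi);
    }

    /** Walks from the given type up to its base type, most specific first. */
    static List<String> chain(String type) {
        List<String> levels = new ArrayList<>();
        for (String t = type; t != null; t = supertype(t)) {
            levels.add(t);
        }
        return levels;
    }

    public static void main(String[] args) {
        // Three entries: the btree subtype, its format=2 parent, then the base type
        System.out.println(chain("application/x-berkeley-db;format=2;type=btree"));
    }
}
```

Under this scheme a hierarchy test would assert the middle level explicitly; a registry that maps any parameterised type straight to its base type would skip it.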
[jira] [Commented] (TIKA-1502) Mime magic for database file formats
[ https://issues.apache.org/jira/browse/TIKA-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14256663#comment-14256663 ]

Hudson commented on TIKA-1502:
--

SUCCESS: Integrated in tika-trunk-jdk1.7 #384 (See [https://builds.apache.org/job/tika-trunk-jdk1.7/384/])
Split the Berkeley DB mimetypes into three levels, and add a detection test (passes) and a hierarchy test (disabled as it fails) TIKA-1502 (nick: http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1647486)
* /tika/trunk/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml
* /tika/trunk/tika-core/src/test/java/org/apache/tika/mime/MimeTypesReaderTest.java
* /tika/trunk/tika-parsers/src/test/java/org/apache/tika/mime/TestMimeTypes.java
Start on magic for subtypes of Berkeley DB TIKA-1502 (nick: http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1647485)
* /tika/trunk/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml
More test database files for TIKA-1502 (nick: http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1647484)
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_2.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_3.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_4.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_5.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_btree_2.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_btree_3.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_btree_4.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_btree_5.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_hash_2.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_hash_3.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_hash_4.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_hash_5.db
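Magic for the Berkeley DB subtypes has to tell access methods apart (btree, hash and so on, matching the btree and hash test files). A minimal Java sketch, under assumptions taken from the unix file utility's magic database rather than from Tika: in Berkeley DB 2.x and later the first (metadata) page stores a 4-byte magic number at offset 12, written in the native byte order of the machine that created the file, so both byte orders must be tried. The class name and return values are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Sketch only. Assumed layout: 4-byte magic at offset 12 of the metadata page,
// stored in either little- or big-endian order depending on the writing host.
class BerkeleyDbMagic {
    private static final int BTREE_MAGIC = 0x00053162;
    private static final int HASH_MAGIC = 0x00061561;
    private static final int QUEUE_MAGIC = 0x00042253;
    private static final int MAGIC_OFFSET = 12;

    /** Returns "btree", "hash" or "queue", or null if no known magic matches. */
    static String accessMethod(byte[] prefix) {
        if (prefix.length < MAGIC_OFFSET + 4) {
            return null;
        }
        // Byte order is not fixed, so try both readings of the same four bytes.
        for (ByteOrder order : new ByteOrder[] {ByteOrder.LITTLE_ENDIAN, ByteOrder.BIG_ENDIAN}) {
            int magic = ByteBuffer.wrap(prefix).order(order).getInt(MAGIC_OFFSET);
            switch (magic) {
                case BTREE_MAGIC: return "btree";
                case HASH_MAGIC: return "hash";
                case QUEUE_MAGIC: return "queue";
                default: break; // no match in this byte order; try the next one
            }
        }
        return null;
    }

    public static void main(String[] args) {
        byte[] page = new byte[64];
        ByteBuffer.wrap(page).order(ByteOrder.LITTLE_ENDIAN).putInt(MAGIC_OFFSET, HASH_MAGIC);
        System.out.println(accessMethod(page));
    }
}
```

The same approach would extend to a format level by also reading the version field that, in the same assumed layout, follows the magic number.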