[jira] [Commented] (TIKA-1502) Mime magic for database file formats

2014-12-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14256527#comment-14256527
 ] 

Hudson commented on TIKA-1502:
--

SUCCESS: Integrated in tika-trunk-jdk1.6 #366 (See 
[https://builds.apache.org/job/tika-trunk-jdk1.6/366/])
Some test database files for TIKA-1502 (nick: 
http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1647473)
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_2.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_3.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_4.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_5.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testMYSQL.MYD
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testMYSQL.MYI
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testMYSQL.frm
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testSQLITE3.db


 Mime magic for database file formats
 

 Key: TIKA-1502
 URL: https://issues.apache.org/jira/browse/TIKA-1502
 Project: Tika
  Issue Type: Improvement
  Components: mime
Affects Versions: 1.6
Reporter: Nick Burch

 I noticed today that Tika can't detect a lot of common database formats, such 
 as SQLite, Berkeley DB or MyISAM.
 The unix file utility got most of those, which makes me think that there's a 
 sensible-ish header on most that we can write some mime magic for.
 It'd therefore be good to add mime entries, with magic where possible, for 
 many of these common database file formats.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TIKA-1502) Mime magic for database file formats

2014-12-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14256561#comment-14256561
 ] 

Hudson commented on TIKA-1502:
--

SUCCESS: Integrated in tika-trunk-jdk1.7 #383 (See 
[https://builds.apache.org/job/tika-trunk-jdk1.7/383/])
TIKA-1502 MySQL and SQLite3 mime types, with magic where possible (nick: 
http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1647478)
* /tika/trunk/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml
Some test database files for TIKA-1502 (nick: 
http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1647473)
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_2.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_3.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_4.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_5.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testMYSQL.MYD
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testMYSQL.MYI
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testMYSQL.frm
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testSQLITE3.db




[jira] [Commented] (TIKA-1502) Mime magic for database file formats

2014-12-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14256571#comment-14256571
 ] 

Hudson commented on TIKA-1502:
--

SUCCESS: Integrated in tika-trunk-jdk1.6 #367 (See 
[https://builds.apache.org/job/tika-trunk-jdk1.6/367/])
TIKA-1502 MySQL and SQLite3 mime types, with magic where possible (nick: 
http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1647478)
* /tika/trunk/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml




[jira] [Commented] (TIKA-1502) Mime magic for database file formats

2014-12-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14256640#comment-14256640
 ] 

Hudson commented on TIKA-1502:
--

SUCCESS: Integrated in tika-trunk-jdk1.6 #368 (See 
[https://builds.apache.org/job/tika-trunk-jdk1.6/368/])
More test database files for TIKA-1502 (nick: 
http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1647484)
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_2.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_3.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_4.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_5.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_btree_2.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_btree_3.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_btree_4.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_btree_5.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_hash_2.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_hash_3.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_hash_4.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_hash_5.db




[jira] [Commented] (TIKA-1502) Mime magic for database file formats

2014-12-22 Thread Nick Burch (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14256655#comment-14256655
 ] 

Nick Burch commented on TIKA-1502:
--

As of r1647486, we now have mime types for SQLite3, MySQL (most) and Berkeley 
DB. We have magic for SQLite3, most of the MySQL formats (some are headerless), 
and expanded BDB ones.

One remaining issue is getting MimeTypesReaderTest.testReadParameterHeirarchy() 
to pass: for some reason the three-level hierarchy of the BDB mime types is 
getting flattened to just two levels.
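
For readers unfamiliar with magic-based detection, the idea can be sketched in 
plain Java without any Tika classes. This is only an illustration, not the code 
or the tika-mimetypes.xml entry committed for this issue: the class and method 
names are hypothetical. The one fact it relies on is that a SQLite 3 database 
file begins with the 16-byte string "SQLite format 3" followed by a NUL byte.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of Tika: checks a file prefix for the SQLite 3 header.
public class SqliteMagicSketch {
    // The 16-byte header string of a SQLite 3 file, including the trailing NUL byte.
    private static final byte[] SQLITE3_MAGIC =
            "SQLite format 3\0".getBytes(StandardCharsets.US_ASCII);

    /** Returns true if the given leading bytes start with the SQLite 3 header. */
    public static boolean looksLikeSqlite3(byte[] prefix) {
        if (prefix == null || prefix.length < SQLITE3_MAGIC.length) {
            return false;
        }
        // Compare only the first 16 bytes; the rest of the file is irrelevant here.
        return Arrays.equals(
                Arrays.copyOf(prefix, SQLITE3_MAGIC.length), SQLITE3_MAGIC);
    }

    public static void main(String[] args) {
        // A fake 100-byte prefix that starts with the magic, padded with zeros.
        byte[] fake = Arrays.copyOf(SQLITE3_MAGIC, 100);
        System.out.println(looksLikeSqlite3(fake));
        System.out.println(looksLikeSqlite3(
                "not a database".getBytes(StandardCharsets.US_ASCII)));
    }
}
```

A headerless format, such as some of the MySQL files mentioned above, cannot be 
recognised this way and would have to fall back on the file name.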



[jira] [Commented] (TIKA-1502) Mime magic for database file formats

2014-12-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14256663#comment-14256663
 ] 

Hudson commented on TIKA-1502:
--

SUCCESS: Integrated in tika-trunk-jdk1.7 #384 (See 
[https://builds.apache.org/job/tika-trunk-jdk1.7/384/])
Split the Berkeley DB mimetypes into three levels, and add a detection test 
(passes) and a hierarchy test (disabled as it fails) TIKA-1502 (nick: 
http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1647486)
* /tika/trunk/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml
* /tika/trunk/tika-core/src/test/java/org/apache/tika/mime/MimeTypesReaderTest.java
* /tika/trunk/tika-parsers/src/test/java/org/apache/tika/mime/TestMimeTypes.java
Start on magic for subtypes of Berkeley DB TIKA-1502 (nick: 
http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1647485)
* /tika/trunk/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml
More test database files for TIKA-1502 (nick: 
http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1647484)
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_2.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_3.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_4.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_5.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_btree_2.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_btree_3.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_btree_4.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_btree_5.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_hash_2.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_hash_3.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_hash_4.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_hash_5.db

