[jira] [Commented] (TIKA-1502) Mime magic for database file formats
[ https://issues.apache.org/jira/browse/TIKA-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14256710#comment-14256710 ]

Hudson commented on TIKA-1502:

SUCCESS: Integrated in tika-trunk-jdk1.7 #385 (See [https://builds.apache.org/job/tika-trunk-jdk1.7/385/])

Fix test for TIKA-1502 - re-order the MediaTypeRegistry logic for getting the super type, so that if an explicit inheritance has been defined between one parametered type and another, that inheritance is used in preference to "drop all parameters" (nick: http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1647489)
* /tika/trunk/tika-core/src/main/java/org/apache/tika/mime/MediaTypeRegistry.java
* /tika/trunk/tika-core/src/test/java/org/apache/tika/mime/MimeTypesReaderTest.java
* /tika/trunk/tika-parsers/src/test/java/org/apache/tika/mime/TestMimeTypes.java

> Mime magic for database file formats
>
> Key: TIKA-1502
> URL: https://issues.apache.org/jira/browse/TIKA-1502
> Project: Tika
> Issue Type: Improvement
> Components: mime
> Affects Versions: 1.6
> Reporter: Nick Burch
> Fix For: 1.7
>
> I noticed today that Tika can't detect a lot of common database formats, such
> as SQLite, Berkeley DB or MyISAM.
> The unix file utility got most of those, which makes me think that there's a
> sensible-ish header on most that we can write some mime magic for.
> It'd therefore be good to add mime entries, with magic where possible, for
> many of these common database file formats.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TIKA-1502) Mime magic for database file formats
[ https://issues.apache.org/jira/browse/TIKA-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14256675#comment-14256675 ]

Hudson commented on TIKA-1502:

SUCCESS: Integrated in tika-trunk-jdk1.6 #369 (See [https://builds.apache.org/job/tika-trunk-jdk1.6/369/])

Fix test for TIKA-1502 - re-order the MediaTypeRegistry logic for getting the super type, so that if an explicit inheritance has been defined between one parametered type and another, that inheritance is used in preference to "drop all parameters" (nick: http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1647489)
* /tika/trunk/tika-core/src/main/java/org/apache/tika/mime/MediaTypeRegistry.java
* /tika/trunk/tika-core/src/test/java/org/apache/tika/mime/MimeTypesReaderTest.java
* /tika/trunk/tika-parsers/src/test/java/org/apache/tika/mime/TestMimeTypes.java

Split the Berkeley DB mimetypes into three levels, and add a detection test (passes) and a hierarchy test (disabled as fails) TIKA-1502 (nick: http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1647486)
* /tika/trunk/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml
* /tika/trunk/tika-core/src/test/java/org/apache/tika/mime/MimeTypesReaderTest.java
* /tika/trunk/tika-parsers/src/test/java/org/apache/tika/mime/TestMimeTypes.java

Start on magic for subtypes of Berkeley DB TIKA-1502 (nick: http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1647485)
* /tika/trunk/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml
[jira] [Resolved] (TIKA-1502) Mime magic for database file formats
[ https://issues.apache.org/jira/browse/TIKA-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nick Burch resolved TIKA-1502.

Resolution: Fixed
Fix Version/s: 1.7

In r1647489 I've re-ordered the MediaTypeRegistry logic for getting the super type, so that if an explicit inheritance has been defined between one parametered type and another, that inheritance is used in preference to "drop all parameters".

That means that the supertype fetching for something defined in the mimetypes file can go like:
  application/x-berkeley-db;format=hash;version=2
  to application/x-berkeley-db;format=hash
  to application/x-berkeley-db

However, for parameters unknown to the mime types file, the behaviour is unchanged, for example:
  text/plain; charset=UTF-8
  to text/plain
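The lookup order described in this resolution can be sketched in a few lines of plain Java. This is only an illustrative model under stated assumptions: the class `SuperTypeSketch` and its methods `declare` and `superType` are hypothetical names, types are handled as plain strings, and any further fallbacks the real MediaTypeRegistry applies are omitted. It is not the code committed in r1647489.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical, simplified model of the lookup order; not Tika's MediaTypeRegistry. */
final class SuperTypeSketch {
    // Explicit child -> parent relations, as a mime types file might declare them.
    private final Map<String, String> explicitParents = new HashMap<>();

    /** Records an explicit inheritance between two (possibly parametered) types. */
    void declare(String child, String parent) {
        explicitParents.put(normalise(child), normalise(parent));
    }

    /** Returns the supertype, or null when this sketch knows of none. */
    String superType(String type) {
        String key = normalise(type);
        // 1. An explicitly declared inheritance wins, even for parametered types.
        String declared = explicitParents.get(key);
        if (declared != null) {
            return declared;
        }
        // 2. Otherwise fall back to "drop all parameters".
        int semi = key.indexOf(';');
        if (semi >= 0) {
            return key.substring(0, semi);
        }
        return null;
    }

    // Crude normalisation for the sketch only: strip spaces and lower-case everything.
    private static String normalise(String type) {
        return type.replace(" ", "").toLowerCase(Locale.ROOT);
    }
}
```

With the two Berkeley DB relations declared, `superType` walks the three levels one step at a time, while an undeclared type such as `text/plain; charset=UTF-8` still falls straight back to `text/plain`.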
[jira] [Commented] (TIKA-1502) Mime magic for database file formats
[ https://issues.apache.org/jira/browse/TIKA-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14256663#comment-14256663 ]

Hudson commented on TIKA-1502:

SUCCESS: Integrated in tika-trunk-jdk1.7 #384 (See [https://builds.apache.org/job/tika-trunk-jdk1.7/384/])

Split the Berkeley DB mimetypes into three levels, and add a detection test (passes) and a hierarchy test (disabled as fails) TIKA-1502 (nick: http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1647486)
* /tika/trunk/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml
* /tika/trunk/tika-core/src/test/java/org/apache/tika/mime/MimeTypesReaderTest.java
* /tika/trunk/tika-parsers/src/test/java/org/apache/tika/mime/TestMimeTypes.java

Start on magic for subtypes of Berkeley DB TIKA-1502 (nick: http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1647485)
* /tika/trunk/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml

More test database files for TIKA-1502 (nick: http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1647484)
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_2.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_3.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_4.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_5.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_btree_2.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_btree_3.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_btree_4.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_btree_5.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_hash_2.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_hash_3.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_hash_4.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_hash_5.db
[jira] [Commented] (TIKA-1502) Mime magic for database file formats
[ https://issues.apache.org/jira/browse/TIKA-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14256655#comment-14256655 ]

Nick Burch commented on TIKA-1502:

As of r1647486, we now have mime types for SQLite3, MySQL (most) and Berkeley DB. We have magic for SQLite3, most of the MySQL formats (some are headerless), and the expanded BDB ones.

One remaining issue is getting MimeTypesReaderTest.testReadParameterHeirarchy() to pass - for some reason the three-level hierarchy of the BDB mime types is getting flattened to just two levels.
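For readers unfamiliar with mime magic, the idea is to match fixed bytes near the start of a file. A minimal sketch for SQLite3 follows, relying on the documented SQLite 3 header string ("SQLite format 3" followed by a NUL byte at offset 0). The class and method names are hypothetical, and Tika itself declares its magic in tika-mimetypes.xml rather than in Java code like this, so treat it as an illustration of the technique only.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical illustration of a magic-byte check; not part of Tika. */
final class SqliteMagicSketch {
    // The 16-byte SQLite 3 header: ASCII "SQLite format 3" plus a terminating NUL.
    private static final byte[] MAGIC =
            "SQLite format 3\0".getBytes(StandardCharsets.US_ASCII);

    /** Returns true when the buffer starts with the SQLite 3 header string. */
    static boolean looksLikeSqlite3(byte[] header) {
        if (header == null || header.length < MAGIC.length) {
            return false; // too short to contain the magic
        }
        // Compare only the leading bytes; the rest of the file is irrelevant here.
        return Arrays.equals(Arrays.copyOf(header, MAGIC.length), MAGIC);
    }
}
```

Headerless formats, such as some of the MySQL files mentioned above, cannot be recognised this way, which is why magic is only added "where possible".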
[jira] [Commented] (TIKA-1502) Mime magic for database file formats
[ https://issues.apache.org/jira/browse/TIKA-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14256640#comment-14256640 ]

Hudson commented on TIKA-1502:

SUCCESS: Integrated in tika-trunk-jdk1.6 #368 (See [https://builds.apache.org/job/tika-trunk-jdk1.6/368/])

More test database files for TIKA-1502 (nick: http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1647484)
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_2.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_3.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_4.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_5.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_btree_2.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_btree_3.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_btree_4.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_btree_5.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_hash_2.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_hash_3.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_hash_4.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_hash_5.db
[jira] [Commented] (TIKA-1502) Mime magic for database file formats
[ https://issues.apache.org/jira/browse/TIKA-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14256571#comment-14256571 ]

Hudson commented on TIKA-1502:

SUCCESS: Integrated in tika-trunk-jdk1.6 #367 (See [https://builds.apache.org/job/tika-trunk-jdk1.6/367/])

TIKA-1502 MySQL and SQLite3 mime types, with magic where possible (nick: http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1647478)
* /tika/trunk/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml
[jira] [Commented] (TIKA-1502) Mime magic for database file formats
[ https://issues.apache.org/jira/browse/TIKA-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14256561#comment-14256561 ]

Hudson commented on TIKA-1502:

SUCCESS: Integrated in tika-trunk-jdk1.7 #383 (See [https://builds.apache.org/job/tika-trunk-jdk1.7/383/])

TIKA-1502 MySQL and SQLite3 mime types, with magic where possible (nick: http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1647478)
* /tika/trunk/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml

Some test database files for TIKA-1502 (nick: http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1647473)
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_2.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_3.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_4.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_5.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testMYSQL.MYD
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testMYSQL.MYI
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testMYSQL.frm
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testSQLITE3.db
[jira] [Commented] (TIKA-1502) Mime magic for database file formats
[ https://issues.apache.org/jira/browse/TIKA-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14256527#comment-14256527 ]

Hudson commented on TIKA-1502:

SUCCESS: Integrated in tika-trunk-jdk1.6 #366 (See [https://builds.apache.org/job/tika-trunk-jdk1.6/366/])

Some test database files for TIKA-1502 (nick: http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1647473)
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_2.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_3.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_4.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_5.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testMYSQL.MYD
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testMYSQL.MYI
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testMYSQL.frm
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testSQLITE3.db
[jira] [Created] (TIKA-1502) Mime magic for database file formats
Nick Burch created TIKA-1502:

Summary: Mime magic for database file formats
Key: TIKA-1502
URL: https://issues.apache.org/jira/browse/TIKA-1502
Project: Tika
Issue Type: Improvement
Components: mime
Affects Versions: 1.6
Reporter: Nick Burch

I noticed today that Tika can't detect a lot of common database formats, such as SQLite, Berkeley DB or MyISAM.

The unix file utility got most of those, which makes me think that there's a sensible-ish header on most that we can write some mime magic for.

It'd therefore be good to add mime entries, with magic where possible, for many of these common database file formats.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TIKA-1483) Create a general raw string parser
[ https://issues.apache.org/jira/browse/TIKA-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14256377#comment-14256377 ]

Luis Filipe Nassif commented on TIKA-1483:

[~talli...@apache.org],

Do you mean adding language models to do automatic language/charset detection? My original purpose was to extract strings from binary and non-text files, so I think it would be difficult to detect the language and charset used in those files. My idea was to let the user configure the language(s) and charsets of interest, and the parser would make a best effort to decode them.

I think TextParser already does automatic charset detection (I do not know about language).

> Create a general raw string parser
>
> Key: TIKA-1483
> URL: https://issues.apache.org/jira/browse/TIKA-1483
> Project: Tika
> Issue Type: New Feature
> Components: parser
> Affects Versions: 1.6
> Reporter: Luis Filipe Nassif
>
> I think it would be very useful to add a general parser able to extract raw
> strings from files (like the strings command), which can be used as the
> fallback parser for all mimetypes not having a specific parser
> implementation, like application/octet-stream. It can also be used as a
> fallback for corrupt files throwing a TikaException.
> It must be configured with the script/language to be extracted from the files
> (currently I have implemented one specific to Latin1).
> It can use heuristics to extract strings encoded with different charsets
> within the same file, mainly the common ISO-8859-1, UTF-8 and UTF-16.
> What does the community think about that?

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
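To make the proposal concrete, here is a rough sketch of a `strings`-style extraction restricted to single-byte ISO-8859-1 text. It is an assumption-laden illustration rather than the patch discussed in the issue: `RawStringsSketch`, `extract` and the `minLength` parameter are hypothetical names, the "printable" ranges are a simple choice made for this sketch, and the UTF-8/UTF-16 heuristics mentioned in the description are not attempted.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a strings-like extractor for ISO-8859-1; not a Tika parser. */
final class RawStringsSketch {

    /** Collects every run of at least minLength printable Latin1 bytes. */
    static List<String> extract(byte[] data, int minLength) {
        List<String> out = new ArrayList<>();
        int start = -1; // index where the current printable run began, or -1
        // Iterate one step past the end so a run touching EOF is flushed too.
        for (int i = 0; i <= data.length; i++) {
            boolean printable = i < data.length && isPrintableLatin1(data[i] & 0xFF);
            if (printable) {
                if (start < 0) {
                    start = i;
                }
            } else if (start >= 0) {
                if (i - start >= minLength) {
                    out.add(new String(data, start, i - start, StandardCharsets.ISO_8859_1));
                }
                start = -1;
            }
        }
        return out;
    }

    // Printable ASCII plus the upper Latin1 range; control bytes end a run.
    private static boolean isPrintableLatin1(int b) {
        return (b >= 0x20 && b <= 0x7E) || (b >= 0xA0 && b <= 0xFF);
    }
}
```

A minimum run length (4 is the default of the unix `strings` command) keeps most random binary noise out of the result, at the cost of missing very short strings.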
[jira] [Commented] (TIKA-1483) Create a general raw string parser
[ https://issues.apache.org/jira/browse/TIKA-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14256304#comment-14256304 ]

Luis Filipe Nassif commented on TIKA-1483:

Do you think it would be useful to add a first implementation, specific to and optimized for extracting Latin1 script (Western European languages) encoded with ISO-8859-1, UTF-8 and UTF-16? If so, I will try to submit a patch.
Re: 1.7 release?
+1 for going ahead. Many thanks to Tyler, and to Nick for taking on the POI upgrade. So many Christmas gifts, in advance or just after :-)

Merry Christmas to all

2014-12-22 19:59 GMT+01:00 Mattmann, Chris A (3980) <chris.a.mattm...@jpl.nasa.gov>:

> WOOO HOO! Go Tyler go! :0) Merry Christmas bud.
>
> ++
> Chris Mattmann, Ph.D.
> Chief Architect
> Instrument Software and Science Data Systems Section (398)
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 168-519, Mailstop: 168-527
> Email: chris.a.mattm...@nasa.gov
> WWW: http://sunset.usc.edu/~mattmann/
> ++
> Adjunct Associate Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++
Re: 1.7 release?
WOOO HOO! Go Tyler go! :0) Merry Christmas bud.

++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattm...@nasa.gov
WWW: http://sunset.usc.edu/~mattmann/
++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++
Re: 1.7 release?
Hi All,

Nick added the temporary fix for TIKA-1445 and made the POI updates for TIKA-1469 (thanks!). And, I'll volunteer to be the Release Manager for 1.7! :)

I'll start the process this weekend or a couple days into the new year.

Cheers,
Tyler

On Dec 18, 2014 9:45 PM, "Mattmann, Chris A (3980)" <chris.a.mattm...@jpl.nasa.gov> wrote:

> +1
>
> ++
> Chris Mattmann, Ph.D.
> Chief Architect
> Instrument Software and Science Data Systems Section (398)
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 168-519, Mailstop: 168-527
> Email: chris.a.mattm...@nasa.gov
> WWW: http://sunset.usc.edu/~mattmann/
> ++
> Adjunct Associate Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++
>
> -Original Message-
> From: Tyler Palsulich
> Reply-To: "dev@tika.apache.org"
> Date: Thursday, December 18, 2014 at 9:15 PM
> To: "dev@tika.apache.org"
> Subject: Re: 1.7 release?
>
> > I'm OK with trying the fix in 1.8 (or 1.7 if people feel strongly). As Nick
> > just recommended, I'll try adding metadata extraction to Tesseract soon,
> > then adding the extensible solution in 1.8.
> >
> > Tyler
> >
> > On Thu, Dec 18, 2014 at 11:58 PM, Mattmann, Chris A (3980) <chris.a.mattm...@jpl.nasa.gov> wrote:
> >
> > > I haven’t tried my hand at it - been super busy. tyler if you have a
> > > chance go for it, I think that’s the remaining blocker.
> > >
> > > -Original Message-
> > > From: Tyler Palsulich
> > > Reply-To: "dev@tika.apache.org"
> > > Date: Thursday, December 18, 2014 at 12:54 PM
> > > To: "dev@tika.apache.org"
> > > Subject: Re: 1.7 release?
> > >
> > > > Hi All,
> > > >
> > > > It's been a few months, so I just want to follow up on this thread. We've
> > > > resolved/closed 51 issues for v1.7 [0]. There are two on JIRA marked as 1.7
> > > > (TIKA-1465 and TIKA-894). Do we still want to aim for 1.7 with TIKA-1445?
> > > > Has anyone tried their hand at the suggested (significant) fix?
> > > >
> > > > Are there any other issues someone would like to fit in?
> > > >
> > > > Cheers,
> > > > Tyler
> > > >
> > > > [0] - https://issues.apache.org/jira/browse/TIKA/fixforversion/12327096/?selectedTab=com.atlassian.jira.jira-projects-plugin:version-issues-panel
> > > >
> > > > On Tue, Oct 28, 2014 at 1:46 AM, Mattmann, Chris A (3980) <chris.a.mattm...@jpl.nasa.gov> wrote:
> > > >
> > > > > Thanks Tim saw your patch and am looking now.
> > > > >
> > > > > -Original Message-
> > > > > From: , "Timothy B."
> > > > > Reply-To: "dev@tika.apache.org"
> > > > > Date: Monday, October 27, 2014 at 12:30 PM
> > > > > To: "dev@tika.apache.org"
> > > > > Subject: RE: 1.7 release?
> > > > >
> > > > > > Sounds good. As long as the default behavior remains the same, I'm
> > > > > > happy. I'm going to play with a combination of your patch and Tyler's
> > > > > > and see what the ramifications are for embedded docs.
> > > > > >
> > > > > > To confirm, the OCR integration is fantastic. Thank you and Tyler!
> > > > > >
> > > > > > Best,
> > > > > >
> > > > > > Tim
> > > > > >
> > > > > > -Original Message-
> > > > > > From: Mattmann, Chris A (3980) [mailto:chris.a.mattm...@jpl.nasa.gov]
> > > > > > Sent: Friday, October 24, 2014 5:36 PM
> > > > > > To: dev@tika.apache.org
> > > > > > Subject: Re: 1.7 release?
> > > > > >
> > > > > > Hey Tim,
> > > > > >
> > > > > > What do you think about my existi