[jira] [Commented] (TIKA-1502) Mime magic for database file formats

2014-12-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14256710#comment-14256710
 ] 

Hudson commented on TIKA-1502:
--

SUCCESS: Integrated in tika-trunk-jdk1.7 #385 (See 
[https://builds.apache.org/job/tika-trunk-jdk1.7/385/])
Fix test for TIKA-1502 - re-order the MediaTypeRegistry logic for getting the 
super type, so that if an explicit inheritance has been defined between one 
parametered type and another, that inheritance is used in preference to "drop 
all parameters" (nick: 
http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1647489)
* 
/tika/trunk/tika-core/src/main/java/org/apache/tika/mime/MediaTypeRegistry.java
* 
/tika/trunk/tika-core/src/test/java/org/apache/tika/mime/MimeTypesReaderTest.java
* /tika/trunk/tika-parsers/src/test/java/org/apache/tika/mime/TestMimeTypes.java


> Mime magic for database file formats
>
> Key: TIKA-1502
> URL: https://issues.apache.org/jira/browse/TIKA-1502
> Project: Tika
> Issue Type: Improvement
> Components: mime
> Affects Versions: 1.6
> Reporter: Nick Burch
> Fix For: 1.7
>
> I noticed today that Tika can't detect a lot of common database formats, such 
> as SQLite, Berkeley DB or MyISAM.
> The unix file utility got most of those, which makes me think that most have a 
> sensible-ish header that we can write some mime magic for.
> It'd therefore be good to add mime entries, with magic where possible, for 
> many of these common database file formats.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TIKA-1502) Mime magic for database file formats

2014-12-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14256675#comment-14256675
 ] 

Hudson commented on TIKA-1502:
--

SUCCESS: Integrated in tika-trunk-jdk1.6 #369 (See 
[https://builds.apache.org/job/tika-trunk-jdk1.6/369/])
Fix test for TIKA-1502 - re-order the MediaTypeRegistry logic for getting the 
super type, so that if an explicit inheritance has been defined between one 
parametered type and another, that inheritance is used in preference to "drop 
all parameters" (nick: 
http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1647489)
* 
/tika/trunk/tika-core/src/main/java/org/apache/tika/mime/MediaTypeRegistry.java
* 
/tika/trunk/tika-core/src/test/java/org/apache/tika/mime/MimeTypesReaderTest.java
* /tika/trunk/tika-parsers/src/test/java/org/apache/tika/mime/TestMimeTypes.java
Split the Berkeley DB mimetypes into three levels, and add a detection test 
(passes) and a hierarchy test (disabled as fails) TIKA-1502 (nick: 
http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1647486)
* 
/tika/trunk/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml
* 
/tika/trunk/tika-core/src/test/java/org/apache/tika/mime/MimeTypesReaderTest.java
* /tika/trunk/tika-parsers/src/test/java/org/apache/tika/mime/TestMimeTypes.java
Start on magic for subtypes of Berkeley DB TIKA-1502 (nick: 
http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1647485)
* 
/tika/trunk/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml




[jira] [Resolved] (TIKA-1502) Mime magic for database file formats

2014-12-22 Thread Nick Burch (JIRA)

 [ 
https://issues.apache.org/jira/browse/TIKA-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Burch resolved TIKA-1502.
--
   Resolution: Fixed
Fix Version/s: 1.7

In r1647489 I've re-ordered the MediaTypeRegistry logic for getting the super 
type, so that if an explicit inheritance has been defined between one 
parameterised type and another, that inheritance is used in preference to "drop 
all parameters".

That means that the supertype fetching for something defined in the mimetypes 
file can go like:

application/x-berkeley-db;format=hash;version=2
to
application/x-berkeley-db;format=hash
to
application/x-berkeley-db

However, for parameters unknown to the mime types file, the behaviour remains 
as before, for example:

text/plain; charset=UTF-8
to
text/plain
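
The lookup order described above can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not the actual MediaTypeRegistry code: the class SupertypeSketch and its methods declare, supertypeOf and chain are hypothetical names, types are handled as plain strings, and the walk stops at the base type instead of continuing to any more general fallback.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for the lookup order described above; not Tika's MediaTypeRegistry. */
final class SupertypeSketch {
    // Explicit inheritance as it might be declared in a mime types file (child -> parent).
    private final Map<String, String> explicitParents = new HashMap<>();

    void declare(String child, String parent) {
        explicitParents.put(child, parent);
    }

    /** Returns the supertype, or null when the type has no parameters and no declared parent. */
    String supertypeOf(String type) {
        String declared = explicitParents.get(type);
        if (declared != null) {
            return declared; // 1. an explicitly declared parent wins
        }
        int semicolon = type.indexOf(';');
        if (semicolon >= 0) {
            return type.substring(0, semicolon).trim(); // 2. otherwise drop all parameters
        }
        return null;
    }

    /** Walks from the given type up through its supertypes. */
    List<String> chain(String type) {
        List<String> result = new ArrayList<>();
        for (String t = type; t != null; t = supertypeOf(t)) {
            result.add(t);
        }
        return result;
    }

    public static void main(String[] args) {
        SupertypeSketch registry = new SupertypeSketch();
        registry.declare("application/x-berkeley-db;format=hash;version=2",
                "application/x-berkeley-db;format=hash");
        registry.declare("application/x-berkeley-db;format=hash",
                "application/x-berkeley-db");
        System.out.println(registry.chain("application/x-berkeley-db;format=hash;version=2"));
        System.out.println(registry.chain("text/plain; charset=UTF-8"));
    }
}
```

Checking the explicit map before stripping parameters is what keeps the middle level (format=hash) in the chain; with the order reversed, the version=2 type would jump straight to application/x-berkeley-db.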



[jira] [Commented] (TIKA-1502) Mime magic for database file formats

2014-12-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14256663#comment-14256663
 ] 

Hudson commented on TIKA-1502:
--

SUCCESS: Integrated in tika-trunk-jdk1.7 #384 (See 
[https://builds.apache.org/job/tika-trunk-jdk1.7/384/])
Split the Berkeley DB mimetypes into three levels, and add a detection test 
(passes) and a hierarchy test (disabled as fails) TIKA-1502 (nick: 
http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1647486)
* 
/tika/trunk/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml
* 
/tika/trunk/tika-core/src/test/java/org/apache/tika/mime/MimeTypesReaderTest.java
* /tika/trunk/tika-parsers/src/test/java/org/apache/tika/mime/TestMimeTypes.java
Start on magic for subtypes of Berkeley DB TIKA-1502 (nick: 
http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1647485)
* 
/tika/trunk/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml
More test database files for TIKA-1502 (nick: 
http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1647484)
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_2.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_3.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_4.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_5.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_btree_2.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_btree_3.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_btree_4.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_btree_5.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_hash_2.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_hash_3.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_hash_4.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_hash_5.db




[jira] [Commented] (TIKA-1502) Mime magic for database file formats

2014-12-22 Thread Nick Burch (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14256655#comment-14256655
 ] 

Nick Burch commented on TIKA-1502:
--

As of r1647486, we now have mime types for SQLite3, MySQL (most) and Berkeley 
DB. We have magic for SQLite3, most of the MySQL formats (some are headerless), 
and expanded BDB ones.

One remaining issue is getting MimeTypesReaderTest.testReadParameterHeirarchy() 
to pass: for some reason the three-level hierarchy of the BDB mime types is 
getting flattened to just two levels.
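
As an illustration of what "magic" means for the SQLite3 case mentioned above: SQLite 3 database files start with the 16-byte header string "SQLite format 3" followed by a NUL byte, so a detector only needs to compare the first bytes of the file. The sketch below is not the tika-mimetypes.xml entry or Tika's detector; the class SqliteMagicSketch, its method names, and the returned type name application/x-sqlite3 are assumptions made for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical header check; Tika itself expresses this as a magic entry in tika-mimetypes.xml. */
final class SqliteMagicSketch {
    // The documented SQLite 3 header: 15 ASCII characters plus a terminating NUL (16 bytes).
    private static final byte[] MAGIC =
            "SQLite format 3\0".getBytes(StandardCharsets.US_ASCII);

    static boolean looksLikeSqlite3(byte[] header) {
        if (header.length < MAGIC.length) {
            return false; // too short to contain the full header
        }
        return Arrays.equals(Arrays.copyOf(header, MAGIC.length), MAGIC);
    }

    /** Type names here are assumptions for the example. */
    static String detect(byte[] header) {
        return looksLikeSqlite3(header) ? "application/x-sqlite3" : "application/octet-stream";
    }

    public static void main(String[] args) {
        byte[] sample = Arrays.copyOf(MAGIC, 32); // header followed by zero padding
        System.out.println(detect(sample));
        System.out.println(detect(new byte[] {0x00, 0x01, 0x02}));
    }
}
```

Formats without a fixed header, such as the headerless MySQL files mentioned above, cannot be matched this way and would have to fall back on other hints such as the file name.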



[jira] [Commented] (TIKA-1502) Mime magic for database file formats

2014-12-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14256640#comment-14256640
 ] 

Hudson commented on TIKA-1502:
--

SUCCESS: Integrated in tika-trunk-jdk1.6 #368 (See 
[https://builds.apache.org/job/tika-trunk-jdk1.6/368/])
More test database files for TIKA-1502 (nick: 
http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1647484)
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_2.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_3.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_4.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_5.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_btree_2.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_btree_3.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_btree_4.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_btree_5.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_hash_2.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_hash_3.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_hash_4.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_hash_5.db




[jira] [Commented] (TIKA-1502) Mime magic for database file formats

2014-12-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14256571#comment-14256571
 ] 

Hudson commented on TIKA-1502:
--

SUCCESS: Integrated in tika-trunk-jdk1.6 #367 (See 
[https://builds.apache.org/job/tika-trunk-jdk1.6/367/])
TIKA-1502 MySQL and SQLite3 mime types, with magic where possible (nick: 
http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1647478)
* 
/tika/trunk/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml




[jira] [Commented] (TIKA-1502) Mime magic for database file formats

2014-12-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14256561#comment-14256561
 ] 

Hudson commented on TIKA-1502:
--

SUCCESS: Integrated in tika-trunk-jdk1.7 #383 (See 
[https://builds.apache.org/job/tika-trunk-jdk1.7/383/])
TIKA-1502 MySQL and SQLite3 mime types, with magic where possible (nick: 
http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1647478)
* 
/tika/trunk/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml
Some test database files for TIKA-1502 (nick: 
http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1647473)
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_2.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_3.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_4.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_5.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testMYSQL.MYD
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testMYSQL.MYI
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testMYSQL.frm
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testSQLITE3.db




[jira] [Commented] (TIKA-1502) Mime magic for database file formats

2014-12-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14256527#comment-14256527
 ] 

Hudson commented on TIKA-1502:
--

SUCCESS: Integrated in tika-trunk-jdk1.6 #366 (See 
[https://builds.apache.org/job/tika-trunk-jdk1.6/366/])
Some test database files for TIKA-1502 (nick: 
http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1647473)
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_2.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_3.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_4.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testBDB_5.db
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testMYSQL.MYD
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testMYSQL.MYI
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testMYSQL.frm
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testSQLITE3.db




[jira] [Created] (TIKA-1502) Mime magic for database file formats

2014-12-22 Thread Nick Burch (JIRA)
Nick Burch created TIKA-1502:


 Summary: Mime magic for database file formats
 Key: TIKA-1502
 URL: https://issues.apache.org/jira/browse/TIKA-1502
 Project: Tika
  Issue Type: Improvement
  Components: mime
Affects Versions: 1.6
Reporter: Nick Burch


I noticed today that Tika can't detect a lot of common database formats, such 
as SQLite, Berkeley DB or MyISAM.

The unix file utility got most of those, which makes me think that most have a 
sensible-ish header that we can write some mime magic for.

It'd therefore be good to add mime entries, with magic where possible, for many 
of these common database file formats.





[jira] [Commented] (TIKA-1483) Create a general raw string parser

2014-12-22 Thread Luis Filipe Nassif (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14256377#comment-14256377
 ] 

Luis Filipe Nassif commented on TIKA-1483:
--

[~talli...@apache.org],
Do you mean adding language models to do automatic language/charset detection? 
My original purpose was to extract strings from binary and non-text files, so I 
think it would be difficult to detect the language and charset used in those 
files. My idea was to let the user configure the language(s) and charsets of 
interest, and the parser would make a best effort to decode them. I think 
TextParser already does automatic charset detection (I do not know about 
language).
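
To make the idea concrete, here is a minimal sketch of strings-style extraction for a single configured charset. It is not the proposed parser or any existing Tika class: RawStringsSketch, isPrintableLatin1 and extract are hypothetical names, only single-byte ISO-8859-1 is handled, and the heuristics for UTF-8 and UTF-16 are left out.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical strings-like extractor for one single-byte charset (ISO-8859-1). */
final class RawStringsSketch {
    // Treat tab, printable ASCII and the printable upper half of Latin-1 as text.
    static boolean isPrintableLatin1(int b) {
        return b == '\t' || (b >= 0x20 && b <= 0x7E) || (b >= 0xA0 && b <= 0xFF);
    }

    /** Returns every run of at least minLength printable bytes, decoded as ISO-8859-1. */
    static List<String> extract(byte[] data, int minLength) {
        List<String> found = new ArrayList<>();
        int start = -1; // index where the current printable run began, or -1 if none
        for (int i = 0; i <= data.length; i++) {
            boolean printable = i < data.length && isPrintableLatin1(data[i] & 0xFF);
            if (printable) {
                if (start < 0) {
                    start = i;
                }
            } else if (start >= 0) {
                if (i - start >= minLength) {
                    found.add(new String(data, start, i - start, StandardCharsets.ISO_8859_1));
                }
                start = -1;
            }
        }
        return found;
    }

    public static void main(String[] args) {
        byte[] data = {0x00, 'T', 'i', 'k', 'a', 0x01, 'o', 'k', 0x02, 'c', 'a', 'f', (byte) 0xE9};
        System.out.println(extract(data, 4));
    }
}
```

The minimum run length plays the same role as in the strings command: short runs are far more likely to be random binary noise than text, so they are dropped.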

> Create a general raw string parser
> --
>
> Key: TIKA-1483
> URL: https://issues.apache.org/jira/browse/TIKA-1483
> Project: Tika
>  Issue Type: New Feature
>  Components: parser
>Affects Versions: 1.6
>Reporter: Luis Filipe Nassif
>
> I think it would be very useful to add a general parser able to extract raw 
> strings from files (like the strings command), which can be used as the 
> fallback parser for all mimetypes not having a specific parser 
> implementation, like application/octet-stream. It can also be used as a 
> fallback for corrupt files throwing a TikaException.
> It must be configured with the script/language to be extracted from the files 
> (currently I have implemented one specific to Latin1).
> It can use heuristics to extract strings encoded with different charsets 
> within the same file, mainly the common ISO-8859-1, UTF-8 and UTF-16.
> What does the community think about that?





[jira] [Commented] (TIKA-1483) Create a general raw string parser

2014-12-22 Thread Luis Filipe Nassif (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14256304#comment-14256304
 ] 

Luis Filipe Nassif commented on TIKA-1483:
--

Do you think it would be useful to add a first implementation, specific to and 
optimized for extracting Latin1 script (Western European languages) encoded as 
ISO-8859-1, UTF-8 and UTF-16? If yes, I will try to submit a patch.



Re: 1.7 release?

2014-12-22 Thread Thomas Ledoux
+1 for going.
Many thanks to Tyler, and to Nick for taking on the POI upgrade.

So many Christmas gifts, in advance or just after :-)

Merry Christmas to all


Re: 1.7 release?

2014-12-22 Thread Mattmann, Chris A (3980)
WOOO HOO! Go Tyler go! :0) Merry Christmas bud.

++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattm...@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++







Re: 1.7 release?

2014-12-22 Thread Tyler Palsulich
Hi All,

Nick added the temporary fix for TIKA-1445 and made the POI updates for
TIKA-1469 (thanks!). And, I'll volunteer to be the Release Manager for 1.7!
:)

I'll start the process this weekend or a couple days into the new year.

Cheers,
Tyler
On Dec 18, 2014 9:45 PM, "Mattmann, Chris A (3980)" <
chris.a.mattm...@jpl.nasa.gov> wrote:

> +1
>
> -Original Message-
> From: Tyler Palsulich 
> Reply-To: "dev@tika.apache.org" 
> Date: Thursday, December 18, 2014 at 9:15 PM
> To: "dev@tika.apache.org" 
> Subject: Re: 1.7 release?
>
> >I'm OK with trying the fix in 1.8 (or 1.7 if people feel strongly). As Nick
> >just recommended, I'll try adding metadata extraction to Tesseract soon,
> >then adding the extensible solution in 1.8.
> >
> >Tyler
> >
> >On Thu, Dec 18, 2014 at 11:58 PM, Mattmann, Chris A (3980) <
> >chris.a.mattm...@jpl.nasa.gov> wrote:
> >>
> >> I haven’t tried my hand at it - been super busy. Tyler, if you have a
> >> chance, go for it; I think that’s the remaining blocker.
> >>
> >> -Original Message-
> >> From: Tyler Palsulich 
> >> Reply-To: "dev@tika.apache.org" 
> >> Date: Thursday, December 18, 2014 at 12:54 PM
> >> To: "dev@tika.apache.org" 
> >> Subject: Re: 1.7 release?
> >>
> >> >Hi All,
> >> >
> >> >It's been a few months, so I just want to follow up on this thread. We've
> >> >resolved/closed 51 issues for v1.7 [0]. There are two on JIRA marked as 1.7
> >> >(TIKA-1465 and TIKA-894). Do we still want to aim for 1.7 with TIKA-1445?
> >> >Has anyone tried their hand at the suggested (significant) fix?
> >> >
> >> >Are there any other issues someone would like to fit in?
> >> >
> >> >Cheers,
> >> >Tyler
> >> >
> >> >[0] -
> >> >https://issues.apache.org/jira/browse/TIKA/fixforversion/12327096/?selectedTab=com.atlassian.jira.jira-projects-plugin:version-issues-panel
> >> >
> >> >On Tue, Oct 28, 2014 at 1:46 AM, Mattmann, Chris A (3980) <
> >> >chris.a.mattm...@jpl.nasa.gov> wrote:
> >> >>
> >> >> Thanks Tim saw your patch and am looking now.
> >> >>
> >> >> -Original Message-
> >> >> From: , "Timothy B." 
> >> >> Reply-To: "dev@tika.apache.org" 
> >> >> Date: Monday, October 27, 2014 at 12:30 PM
> >> >> To: "dev@tika.apache.org" 
> >> >> Subject: RE: 1.7 release?
> >> >>
> >> >> >Sounds good.  As long as the default behavior remains the same, I'm
> >> >> >happy.  I'm going to play with a combination of your patch and Tyler's
> >> >> >and see what the ramifications are for embedded docs.
> >> >> >
> >> >> >To confirm, the OCR integration is fantastic.  Thank you and Tyler!
> >> >> >
> >> >> >
> >> >> >Best,
> >> >> >
> >> >> >   Tim
> >> >> >
> >> >> >-Original Message-
> >> >> >From: Mattmann, Chris A (3980) [mailto:chris.a.mattm...@jpl.nasa.gov]
> >> >> >Sent: Friday, October 24, 2014 5:36 PM
> >> >> >To: dev@tika.apache.org
> >> >> >Subject: Re: 1.7 release?
> >> >> >
> >> >> >Hey Tim,
> >> >> >
> >> >> >What do you think about my existi