RE: Tika 1.15.1? -> 1.16

2017-06-30 Thread Allison, Timothy B.
Yes, I was thinking that I may have already pushed us over this threshold with
the * items below.  1.16 it is, then?

Chris, let us know when the age detection is good to go, or if 1.17 is a better
target.


  * Allow extraction of scripts as embedded "MACRO". Users
    must turn this on via TikaConfig (TIKA-2391).

  * Allow users to turn off extraction of headers and footers
    from .doc, .docx, .xls, .xlsx, .xlsb (TIKA-2362).

  * Extract text from charts in .docx, .pptx, .xlsx and .xlsb
    (TIKA-2254).

  * Extract text from diagrams in .docx, .pptx, .xlsx and .xlsb
    (TIKA-1945).

  * Enable base32 encoding of digests and enable BouncyCastle implementations
    of digest algorithms (TIKA-2386).
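
For readers unfamiliar with the last item: base32 is an alternative text
encoding for digest bytes, alongside the more common hex. The sketch below only
illustrates RFC 4648 base32 applied to a SHA-256 digest computed with the JDK's
MessageDigest. The class Base32DigestSketch and its methods are hypothetical
names chosen for this illustration; they are not Tika's API, and the actual
TIKA-2386 implementation may well differ.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical illustration only; not part of Tika.
final class Base32DigestSketch {
    // RFC 4648 base32 alphabet.
    private static final char[] ALPHABET =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567".toCharArray();

    // Encode arbitrary bytes as padded RFC 4648 base32.
    static String base32(byte[] data) {
        StringBuilder out = new StringBuilder();
        int buffer = 0; // only the low `bits` bits are meaningful
        int bits = 0;
        for (byte b : data) {
            buffer = (buffer << 8) | (b & 0xFF);
            bits += 8;
            while (bits >= 5) {
                out.append(ALPHABET[(buffer >>> (bits - 5)) & 0x1F]);
                bits -= 5;
            }
        }
        if (bits > 0) {
            // Left-align the leftover bits in a final 5-bit group.
            out.append(ALPHABET[(buffer << (5 - bits)) & 0x1F]);
        }
        while (out.length() % 8 != 0) {
            out.append('=');
        }
        return out.toString();
    }

    // SHA-256 of the UTF-8 bytes of `text`, rendered as base32.
    static String sha256Base32(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            return base32(md.digest(text.getBytes(StandardCharsets.UTF_8)));
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sha256Base32("example document bytes"));
    }
}
```

A 32-byte SHA-256 digest comes out as 56 base32 characters (52 data characters
plus 4 of padding), compared with 64 characters in hex.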

-----Original Message-----
From: Luís Filipe Nassif [mailto:lfcnas...@gmail.com] 
Sent: Thursday, June 29, 2017 4:12 PM
To: dev@tika.apache.org
Subject: Re: Tika 1.15.1?

Agreed.

Luis


2017-06-29 15:45 GMT-03:00 Bob Paulin :

> If we're adding features, does it make sense just to bump to 1.16
> rather than 1.15.1?  Traditionally, point releases would be bug fixes only [1].
>
>
> - Bob
>
> [1] http://semver.org/
> On 6/29/2017 1:18 PM, Allison, Timothy B. wrote:
> > K.
> >
> > -----Original Message-----
> > From: Mattmann, Chris A (3010) [mailto:chris.a.mattm...@jpl.nasa.gov]
> > Sent: Thursday, June 29, 2017 1:59 PM
> > To: dev@tika.apache.org
> > Subject: Re: Tika 1.15.1?
> >
> > Hey Tim, I’d like to try and get in:
> >
> > https://issues.apache.org/jira/browse/TIKA-1988
> >
> > today for 15.1. I am working on integrating it now and adding some
> > docs to the wiki.
> >
> > I’ll keep you posted.
> >
> > Cheers,
> > Chris
> >
> >
> > ++
> > Chris Mattmann, Ph.D.
> > Principal Data Scientist, Engineering Administrative Office (3010)
> > Manager, NSF & Open Source Projects Formulation and Development Offices (8212)
> > NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> > Office: 180-503E, Mailstop: 180-503
> > Email: chris.a.mattm...@nasa.gov
> > WWW:  http://sunset.usc.edu/~mattmann/
> > ++
> > Director, Information Retrieval and Data Science Group (IRDS)
> > Adjunct Associate Professor, Computer Science Department
> > University of Southern California, Los Angeles, CA 90089 USA
> > WWW: http://irds.usc.edu/
> > ++
> >
> >
> > On 6/28/17, 12:24 PM, "Allison, Timothy B."  wrote:
> >
> > POI is available on maven, and I just upgraded.
> >
> > Unless there are objections, I'll change our
> >
> > org.apache.tika.parser.sentiment.analysis.SentimentParser
> >
> > to
> >
> > 
> > org.apache.tika.parser.sentiment.analysis.SentimentAnalysisParser
> >
> > and we should be good to go for 1.15.1?
> >
> > Let me know if you'd like to hold off for a bit, but there's
> > always 1.15.2.   :)
> >
> > Cheers,
> >
> >   Tim
> >
> > -----Original Message-----
> > From: Mattmann, Chris A (3010) [mailto:chris.a.mattm...@jpl.nasa.gov]
> > Sent: Friday, June 23, 2017 3:39 PM
> > To: dev@tika.apache.org
> > Subject: Re: Tika 1.15.1?
> >
> > Let me get back to you; I’d like to see if we can get some
> > progress on the Age Detector Parser.
> >
> >
> >
> > On 6/23/17, 10:01 AM, "Allison, Timothy B." wrote:
> >
> > All,
> >   With the exception of the SentimentParser (which we have a path
> > forward on), I think we're good to go.  It looks like POI is about to
> > kick off the release process for 3.17-beta1, and the batch results look
> > good.  I propose waiting a week or so to incorporate that.
> >   Anything else we need to get in for 1.15.1?
> >
> >  Cheers,
> >
> >   Tim
> >
> > -----Original Message-----
> > From: Chris Mattmann [mailto:mattm...@apache.org]
> > Sent: Friday, June 16, 2017 2:43 PM
> > To: dev@tika.apache.org
> > Subject: Re: Tika 1.15.1?
> >
> 

RE: Tika 1.15.1? -> 1.16

2017-07-03 Thread Allison, Timothy B.
All,
  I think we're now solidly at 1.16.  Anyone still strongly in favor of 1.15.1? 
 

Chris,
  Will age detection be ready soon, or should we push that to 1.17?
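
The rule of thumb applied in this thread (feature additions bump the minor
version, bug-fix-only releases bump the patch version, per semver.org) can be
sketched as follows. This is a hypothetical helper, not part of Tika or its
release tooling: the class and method names are made up for illustration, and
omitting a trailing ".0" simply mirrors the version strings used in this thread.

```java
// Hypothetical illustration only; not part of Tika's release tooling.
final class NextVersionSketch {
    // Returns the next version string for a release cut from `current`.
    // addsFeatures == true  -> bump the minor version, drop the patch level.
    // addsFeatures == false -> bump the patch level (bug fixes only).
    static String next(String current, boolean addsFeatures) {
        String[] parts = current.split("\\.");
        if (parts.length < 2 || parts.length > 3) {
            throw new IllegalArgumentException("Expected MAJOR.MINOR[.PATCH]: " + current);
        }
        int major = Integer.parseInt(parts[0]);
        int minor = Integer.parseInt(parts[1]);
        int patch = parts.length == 3 ? Integer.parseInt(parts[2]) : 0;
        if (addsFeatures) {
            return major + "." + (minor + 1);
        }
        return major + "." + minor + "." + (patch + 1);
    }

    public static void main(String[] args) {
        System.out.println(next("1.15", true));
        System.out.println(next("1.15", false));
    }
}
```

With new features already committed on top of 1.15, the helper yields 1.16
rather than 1.15.1, which is the conclusion the thread reaches.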


Re: Tika 1.15.1? -> 1.16

2017-07-03 Thread Tyler Bui-Palsulich
+1 for 1.16.

Tyler


Re: Tika 1.15.1? -> 1.16

2017-07-03 Thread Mattmann, Chris A (3010)
Hey Tim, if I don’t get it done by today, push 1.16 and we’ll put Age Detection 
in 1.17.

++
Chris Mattmann, Ph.D.
Principal Data Scientist, Engineering Administrative Office (3010)
Manager, NSF & Open Source Projects Formulation and Development Offices (8212)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 180-503E, Mailstop: 180-503
Email: chris.a.mattm...@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++
Director, Information Retrieval and Data Science Group (IRDS)
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
WWW: http://irds.usc.edu/
++
 


RE: Tika 1.15.1? -> 1.16

2017-07-03 Thread Allison, Timothy B.
Sounds good. I'll kick off regression tests now, with a goal of creating 
1.16-rc1 on Wednesday 14:00 UTC?


RE: Tika 1.15.1? -> 1.16

2017-07-05 Thread Allison, Timothy B.
All,
  I'm waiting to get some resolution on TIKA-2399.  The regression tests came
back with nothing surprising.  I fixed the NPE that they uncovered in the new
ppt macro extraction code.
  Will I need to rerun with the updates to mime detection that Nick just made?
Or are we good enough to go once we figure out what we can do with TIKA-2399?

  Onward.

   Cheers,
 Tim


Re: Tika 1.15.1? -> 1.16

2017-07-05 Thread Chris Mattmann
Tim, I really think I can get AgeDetection in. Let me try now.

Then I’m +1. I’m +1 either way.




Re: Tika 1.15.1? -> 1.16

2017-07-05 Thread Luís Filipe Nassif
Hi Tim,

From a quick look, Nick's fix for TIKA-2419 seems conservative to me,
restricted to corrupted XML, so I think there is no need to rerun the
regression tests.

So +1 from me, ++1 with age detection :)


Re: Tika 1.15.1? -> 1.16

2017-07-06 Thread Chris Mattmann
OK Tim / all, TIKA-1988 is done! Age resolution is in.

Enjoy and proceed with the release, please +1.

Cheers,
Chris



