[jira] [Commented] (TIKA-1936) Clean up parsers not cleaning up resources

2016-04-06 Thread Tim Allison (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228852#comment-15228852
 ] 

Tim Allison commented on TIKA-1936:
---

Looks like an unhappy day in Hudson's world, probably not the fault of this 
commit.

{noformat}
Waiting for Jenkins to finish collecting data
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-deploy-plugin:2.8.2:deploy (default-deploy) on 
project tika-parent: Failed to deploy artifacts: Could not transfer artifact 
org.apache.tika:tika-parent:pom:2.0-20160406.182253-115 from/to 
apache.snapshots.https 
(https://repository.apache.org/content/repositories/snapshots): Failed to 
transfer file: 
https://repository.apache.org/content/repositories/snapshots/org/apache/tika/tika-parent/2.0-SNAPSHOT/tika-parent-2.0-20160406.182253-115.pom.
 Return code is: 401, ReasonPhrase:Unauthorized
{noformat}

> Clean up parsers not cleaning up resources
> --
>
> Key: TIKA-1936
> URL: https://issues.apache.org/jira/browse/TIKA-1936
> Project: Tika
> Issue Type: Improvement
> Reporter: Tim Allison
>
> Instead of opening individual issues, prob better to open one big one.  
> Apologies for the clutter.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] tika pull request: Corrected and Improved

2016-04-06 Thread reevapp
GitHub user reevapp reopened a pull request:

https://github.com/apache/tika/pull/99

Corrected and Improved

The original code did not work at all: the WebClient was an instance
variable, so it was not thread-safe and it worked only for the very first
request (all subsequent requests included duplicated query parameters and
failed).
Some error handling was added to allow for troubleshooting; previously there
was no clue as to the source of a failed request.
It is now possible to specify the User Key.
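
A minimal sketch of the failure mode described above, in plain Java. It does not use the real WebClient: `MiniClient` is a hypothetical stand-in, and every name and the URL are made up for illustration. It assumes, as the description implies, that each added query parameter accumulates on the client object.

```java
import java.util.ArrayList;
import java.util.List;

final class QueryDemo {
    /** Hypothetical mutable client: every query() call appends to its own state. */
    static final class MiniClient {
        private final String base;
        private final List<String> params = new ArrayList<>();

        MiniClient(String base) {
            this.base = base;
        }

        MiniClient query(String name, String value) {
            params.add(name + "=" + value);
            return this;
        }

        String uri() {
            return params.isEmpty() ? base : base + "?" + String.join("&", params);
        }
    }

    private static final String BASE = "http://example.invalid/api";

    // Problematic pattern: one client instance shared by every request.
    private final MiniClient shared = new MiniClient(BASE);

    String requestWithSharedClient(String text) {
        // Parameters from earlier requests are still attached to the client.
        return shared.query("text", text).uri();
    }

    // Safer pattern: a fresh client per request, so no state carries over.
    String requestWithFreshClient(String text) {
        return new MiniClient(BASE).query("text", text).uri();
    }
}
```

With the shared client, the second request carries the first request's parameter as well; with a fresh client per request, each URI contains only its own parameter, and no mutable state is shared between threads.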

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/reevapp/tika master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/tika/pull/99.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #99


commit 17241b5cd86300127a343b7b22d6ed294f44c2bb
Author: ReEvApp - Re-Evolution Applications, LLC 
Date:   2016-04-06T18:27:05Z

Corrections and Improvements

The original code did not work at all: the WebClient was an instance
variable, so it was not thread-safe and it worked only for the very first
request (all subsequent requests included duplicated query parameters and
failed).
Some error handling was added to allow for troubleshooting; previously there
was no clue as to the source of a failed request.
It is now possible to specify the User Key.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] tika pull request: Corrected and Improved

2016-04-06 Thread reevapp
GitHub user reevapp opened a pull request:

https://github.com/apache/tika/pull/99

Corrected and Improved

The original code did not work at all: the WebClient was an instance
variable, so it was not thread-safe and it worked only for the very first
request (all subsequent requests included duplicated query parameters and
failed).
Some error handling was added to allow for troubleshooting; previously there
was no clue as to the source of a failed request.
It is now possible to specify the User Key.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/reevapp/tika master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/tika/pull/99.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #99


commit 17241b5cd86300127a343b7b22d6ed294f44c2bb
Author: ReEvApp - Re-Evolution Applications, LLC 
Date:   2016-04-06T18:27:05Z

Corrections and Improvements

The original code did not work at all: the WebClient was an instance
variable, so it was not thread-safe and it worked only for the very first
request (all subsequent requests included duplicated query parameters and
failed).
Some error handling was added to allow for troubleshooting; previously there
was no clue as to the source of a failed request.
It is now possible to specify the User Key.






[jira] [Commented] (TIKA-1936) Clean up parsers not cleaning up resources

2016-04-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228818#comment-15228818
 ] 

Hudson commented on TIKA-1936:
--

FAILURE: Integrated in tika-2.x #78 (See 
[https://builds.apache.org/job/tika-2.x/78/])
TIKA-1936 (tallison: rev bd230cb28c0a25335423f06355dfff092773493d)
* tika-parser-modules/tika-parser-office-module/src/test/java/org/apache/tika/parser/microsoft/ooxml/OOXMLParserTest.java
* tika-core/src/test/java/org/apache/tika/TikaTest.java
* tika-server/src/test/java/org/apache/tika/server/CXFTestBase.java
* tika-example/src/main/java/org/apache/tika/example/ParsingExample.java
* tika-parser-modules/tika-parser-pdf-module/src/test/java/org/apache/tika/parser/pdf/PDFParserTest.java
* tika-parser-modules/tika-parser-scientific-module/src/main/java/org/apache/tika/parser/mat/MatParser.java
* tika-parser-modules/tika-parser-office-module/src/main/java/org/apache/tika/parser/microsoft/ooxml/AbstractOOXMLExtractor.java
* tika-parser-modules/tika-parser-scientific-module/src/main/java/org/apache/tika/parser/netcdf/NetCDFParser.java
TIKA-1936 -- whitespace cleanup (tallison: rev b69ea1a5dc6a065ba81a0bbfbcebe6889d0d4a2e)
* tika-parser-modules/tika-parser-pdf-module/src/test/java/org/apache/tika/parser/pdf/PDFParserTest.java
* tika-parser-modules/tika-parser-pdf-module/src/main/java/org/apache/tika/parser/pdf/PDFParser.java
* tika-parser-modules/tika-parser-pdf-module/src/main/java/org/apache/tika/parser/pdf/PDF2XHTML.java


> Clean up parsers not cleaning up resources
> --
>
> Key: TIKA-1936
> URL: https://issues.apache.org/jira/browse/TIKA-1936
> Project: Tika
> Issue Type: Improvement
> Reporter: Tim Allison
>
> Instead of opening individual issues, prob better to open one big one.  
> Apologies for the clutter.





Re: FW: Apache Tika used to parse the Panama papers!

2016-04-06 Thread Tilman Hausherr

Yes I read about that too :-)

It would be interesting to hear whether they had any problems, and 
whether they made any support requests, and were these answered 
successfully? Were there any files that failed or did poorly? Or was 
everything so good that no help was needed at all?


I'm delighted that a Java product was used, even though native-code
products would likely have been faster.


Tilman (I'm slightly skeptical about the ICIJ because of the funding and
the suspicious lack of US data, but as a huge data archeology project, I
love it!)


Am 06.04.2016 um 19:18 schrieb Allison, Timothy B.:

Looks like quite a few PDFs [0]...

Couldn't have done it without you!

Cheers,

Tim

P.S. Tip of the hat to Andreas for retweeting the link!

[0] https://twitter.com/bigdata/status/717346207312392192

-Original Message-
From: Mattmann, Chris A (3980) [mailto:chris.a.mattm...@jpl.nasa.gov]
Sent: Tuesday, April 05, 2016 6:47 PM
To: dev@tika.apache.org
Cc: pr...@apache.org
Subject: Apache Tika used to parse the Panama papers!

FYI:
http://www.forbes.com/sites/thomasbrewster/2016/04/05/panama-papers-amazon-encryption-epic-leak/?utm_campaign=ForbesTech_source=TWITTER_medium=social_channel=Technology=23087770#709893771df5


BTW I know Thomas and am in touch. He wrote an article about MEMEX last year.

++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398) NASA Jet Propulsion 
Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattm...@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++
Director, Information Retrieval and Data Science Group (IRDS) Adjunct Associate 
Professor, Computer Science Department University of Southern California, Los 
Angeles, CA 90089 USA
WWW: http://irds.usc.edu/
++






-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org





[jira] [Commented] (TIKA-1936) Clean up parsers not cleaning up resources

2016-04-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228745#comment-15228745
 ] 

Hudson commented on TIKA-1936:
--

SUCCESS: Integrated in tika-trunk-jdk1.7 #947 (See 
[https://builds.apache.org/job/tika-trunk-jdk1.7/947/])
TIKA-1936 -- clean up parsers and tests that aren't cleaning up tmp (tallison: rev 71f8423b0403955265a0726ff3173cc5708518c4)
* tika-parsers/src/main/java/org/apache/tika/parser/netcdf/NetCDFParser.java
* tika-parsers/src/test/java/org/apache/tika/parser/pdf/PDFParserTest.java
* tika-example/src/main/java/org/apache/tika/example/ParsingExample.java
* tika-parsers/src/main/java/org/apache/tika/parser/mat/MatParser.java
* tika-parsers/src/test/java/org/apache/tika/parser/microsoft/ooxml/OOXMLParserTest.java
* tika-server/src/test/java/org/apache/tika/server/CXFTestBase.java
* tika-parsers/src/main/java/org/apache/tika/parser/microsoft/ooxml/AbstractOOXMLExtractor.java
* tika-core/src/test/java/org/apache/tika/TikaTest.java
TIKA-1936 -- small cleanup in pdfparser test (tallison: rev 270b8a966c2a0d68d92d1776c4b57f2137adc2b5)
* tika-parsers/src/test/java/org/apache/tika/parser/pdf/PDFParserTest.java


> Clean up parsers not cleaning up resources
> --
>
> Key: TIKA-1936
> URL: https://issues.apache.org/jira/browse/TIKA-1936
> Project: Tika
> Issue Type: Improvement
> Reporter: Tim Allison
>
> Instead of opening individual issues, prob better to open one big one.  
> Apologies for the clutter.





FW: Apache Tika used to parse the Panama papers!

2016-04-06 Thread Allison, Timothy B.
Looks like quite a few MSG files!


-Original Message-
From: Mattmann, Chris A (3980) [mailto:chris.a.mattm...@jpl.nasa.gov] 
Sent: Tuesday, April 05, 2016 6:47 PM
To: dev@tika.apache.org
Cc: pr...@apache.org
Subject: Apache Tika used to parse the Panama papers!

FYI:
http://www.forbes.com/sites/thomasbrewster/2016/04/05/panama-papers-amazon-encryption-epic-leak/?utm_campaign=ForbesTech_source=TWITTER_medium=social_channel=Technology=23087770#709893771df5


BTW I know Thomas and am in touch. He wrote an article about MEMEX last year.

++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398) NASA Jet Propulsion 
Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattm...@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++
Director, Information Retrieval and Data Science Group (IRDS) Adjunct Associate 
Professor, Computer Science Department University of Southern California, Los 
Angeles, CA 90089 USA
WWW: http://irds.usc.edu/
++







FW: Apache Tika used to parse the Panama papers!

2016-04-06 Thread Allison, Timothy B.
Looks like quite a few PDFs [0]...

Couldn't have done it without you! 

Cheers,

   Tim

P.S. Tip of the hat to Andreas for retweeting the link!

[0] https://twitter.com/bigdata/status/717346207312392192 

-Original Message-
From: Mattmann, Chris A (3980) [mailto:chris.a.mattm...@jpl.nasa.gov] 
Sent: Tuesday, April 05, 2016 6:47 PM
To: dev@tika.apache.org
Cc: pr...@apache.org
Subject: Apache Tika used to parse the Panama papers!

FYI:
http://www.forbes.com/sites/thomasbrewster/2016/04/05/panama-papers-amazon-encryption-epic-leak/?utm_campaign=ForbesTech_source=TWITTER_medium=social_channel=Technology=23087770#709893771df5


BTW I know Thomas and am in touch. He wrote an article about MEMEX last year.

++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398) NASA Jet Propulsion 
Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattm...@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++
Director, Information Retrieval and Data Science Group (IRDS) Adjunct Associate 
Professor, Computer Science Department University of Southern California, Los 
Angeles, CA 90089 USA
WWW: http://irds.usc.edu/
++







[jira] [Updated] (TIKA-1934) GeographicInformationParserTest leaving behind temp file in trunk

2016-04-06 Thread Tim Allison (JIRA)

 [ 
https://issues.apache.org/jira/browse/TIKA-1934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Allison updated TIKA-1934:
--
Issue Type: Sub-task  (was: Improvement)
Parent: TIKA-1936

> GeographicInformationParserTest leaving behind temp file in trunk
> -
>
> Key: TIKA-1934
> URL: https://issues.apache.org/jira/browse/TIKA-1934
> Project: Tika
> Issue Type: Sub-task
> Reporter: Tim Allison
> Assignee: Tim Allison
> Priority: Trivial
> Fix For: 2.0, 1.13
>
>
> GeographicInformationParser needs to release TemporaryResources.





[jira] [Updated] (TIKA-1935) ISArchiveParser not releasing resources

2016-04-06 Thread Tim Allison (JIRA)

 [ 
https://issues.apache.org/jira/browse/TIKA-1935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Allison updated TIKA-1935:
--
Issue Type: Sub-task  (was: Improvement)
Parent: TIKA-1936

> ISArchiveParser not releasing resources
> ---
>
> Key: TIKA-1935
> URL: https://issues.apache.org/jira/browse/TIKA-1935
> Project: Tika
> Issue Type: Sub-task
> Reporter: Tim Allison
> Assignee: Tim Allison
> Priority: Trivial
> Fix For: 2.0, 1.13
>
>






[jira] [Updated] (TIKA-1932) Clear resources in ParserDecorator

2016-04-06 Thread Tim Allison (JIRA)

 [ 
https://issues.apache.org/jira/browse/TIKA-1932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Allison updated TIKA-1932:
--
Issue Type: Sub-task  (was: Bug)
Parent: TIKA-1936

> Clear resources in ParserDecorator
> --
>
> Key: TIKA-1932
> URL: https://issues.apache.org/jira/browse/TIKA-1932
> Project: Tika
> Issue Type: Sub-task
> Reporter: Tim Allison
> Assignee: Tim Allison
> Priority: Trivial
> Fix For: 2.0, 1.13
>
>
> In ParserDecorator, we're creating a new TikaInputStream to trigger the 
> creation of a temp file, but we're not closing that stream so a temp file is 
> left behind.
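
The leak described above can be illustrated with a small sketch in plain Java. `SpooledStream` is a hypothetical stand-in for a stream that spools to a temp file; it is not TikaInputStream and does not reproduce the ParserDecorator code. The point is only that closing the stream is what removes the temp file, so a try-with-resources block is one way to make sure that happens.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical stand-in for a stream that spools its input to a temp file. */
final class SpooledStream implements Closeable {
    private final Path file;

    SpooledStream(InputStream in) throws IOException {
        file = Files.createTempFile("spool-sketch-", ".tmp");
        try {
            Files.copy(in, file, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException | RuntimeException e) {
            // Do not leak the temp file if spooling itself fails.
            Files.deleteIfExists(file);
            throw e;
        }
    }

    Path getPath() {
        return file;
    }

    /** Closing the stream is what removes the temp file. */
    @Override
    public void close() throws IOException {
        Files.deleteIfExists(file);
    }

    /** Spools the input, reads the spooled size, and always closes the stream. */
    static long spooledSize(InputStream in) throws IOException {
        try (SpooledStream spooled = new SpooledStream(in)) {
            return Files.size(spooled.getPath());
        }
    }
}
```

If `spooledSize` created the stream without the try-with-resources block, the temp file would stay on disk after the method returned, which is the behavior the issue reports.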





[jira] [Updated] (TIKA-1933) ForkParser leaves tmp jars behind on Windows (at least)

2016-04-06 Thread Tim Allison (JIRA)

 [ 
https://issues.apache.org/jira/browse/TIKA-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Allison updated TIKA-1933:
--
Issue Type: Sub-task  (was: Improvement)
Parent: TIKA-1936

> ForkParser leaves tmp jars behind on Windows (at least)
> ---
>
> Key: TIKA-1933
> URL: https://issues.apache.org/jira/browse/TIKA-1933
> Project: Tika
> Issue Type: Sub-task
> Environment: Windows 7
> Reporter: Tim Allison
> Priority: Trivial
>
> During the build process, the ForkParser is leaving behind its temp jars.  
> I think the process is still holding onto the jar very briefly after we 
> destroy() it.  
> Java thinks the process is done -- exitValue() returns 1 and then the jar 
> fails to be deleted.
> If we add waitFor() or even a sleep(10), after we destroy(), the tmp jar is 
> deleted.
> I'm always hesitant to add an unbounded waitFor() (which we'll be able to 
> bound in Java 8).  Any preferences for a fix?
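
The bounded wait mentioned above could look roughly like this on Java 8 or later, where `Process.waitFor(long, TimeUnit)` and `Process.destroyForcibly()` are available. This is only a sketch under assumptions: the class and method names are made up, the timeout is whatever the caller picks, and it is not the ForkParser code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.TimeUnit;

final class BoundedDestroy {
    /**
     * Destroys the process, waits at most timeoutMillis for it to exit,
     * and then tries to delete the temp jar.
     * Returns true if the jar no longer exists afterwards.
     */
    static boolean destroyAndDelete(Process process, Path tempJar, long timeoutMillis)
            throws InterruptedException {
        process.destroy();
        if (!process.waitFor(timeoutMillis, TimeUnit.MILLISECONDS)) {
            // Still alive after the bound: escalate, then wait once more.
            process.destroyForcibly();
            process.waitFor(timeoutMillis, TimeUnit.MILLISECONDS);
        }
        try {
            Files.deleteIfExists(tempJar);
            return true;
        } catch (IOException e) {
            // For example, the jar may still be locked by the dying process.
            return false;
        }
    }
}
```

The wait is bounded, so a child process that never exits cannot hang the caller; a `false` return leaves it to the caller to decide whether to retry or to fall back to something like `File.deleteOnExit()`.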





[jira] [Updated] (TIKA-1929) Need to close resources on exception in sqlite parser

2016-04-06 Thread Tim Allison (JIRA)

 [ 
https://issues.apache.org/jira/browse/TIKA-1929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Allison updated TIKA-1929:
--
Issue Type: Sub-task  (was: Bug)
Parent: TIKA-1936

> Need to close resources on exception in sqlite parser
> -
>
> Key: TIKA-1929
> URL: https://issues.apache.org/jira/browse/TIKA-1929
> Project: Tika
> Issue Type: Sub-task
> Reporter: Tim Allison
> Priority: Minor
>
> If there's an exception during parsing of a SQLite file, we aren't 
> guaranteeing that the temp file is deleted.
> If a TikaInputStream is used, we assume the calling code will close the 
> stream and thereby delete the temp file.  However, if another type of 
> InputStream is used, we copy that to a temp file, and we need to ensure that 
> we delete that temp file if there's an exception during the parse.
> While we're at it, we should also clean up test code to close streams 
> correctly.
> Unrelated to this issue... I noticed that xerial's SQLite code is still 
> leaving behind a copy of the native dll in the temp folder on Windows the 
> first time the SQLite parser is called.  See 
> https://github.com/xerial/sqlite-jdbc/issues/80.
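
For the non-TikaInputStream case described above, the usual shape is a try/finally around the parse step. The sketch below is plain Java with made-up names (`TempCopy`, `FileAction`); it is not the SQLite parser code, and the callback stands in for whatever work is done on the temp file.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class TempCopy {
    /** Hypothetical callback standing in for the real parse step. */
    interface FileAction<T> {
        T apply(Path file) throws IOException;
    }

    /**
     * Copies the stream to a temp file, runs the action on that file, and
     * deletes the file afterwards, whether or not the action throws.
     */
    static <T> T withTempCopy(InputStream in, FileAction<T> action) throws IOException {
        Path tmp = Files.createTempFile("parse-sketch-", ".db");
        try {
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
            return action.apply(tmp);
        } finally {
            // Runs on normal return and on exception alike.
            Files.deleteIfExists(tmp);
        }
    }
}
```

Because the temp file is created and deleted in the same method, the caller never has to know that a copy was made, and an exception during the action cannot leave the file behind.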





[jira] [Updated] (TIKA-1930) Clean up resources from grib parser

2016-04-06 Thread Tim Allison (JIRA)

 [ 
https://issues.apache.org/jira/browse/TIKA-1930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Allison updated TIKA-1930:
--
Issue Type: Sub-task  (was: Bug)
Parent: TIKA-1936

> Clean up resources from grib parser
> ---
>
> Key: TIKA-1930
> URL: https://issues.apache.org/jira/browse/TIKA-1930
> Project: Tika
> Issue Type: Sub-task
> Reporter: Tim Allison
> Priority: Minor
>
> The grib parser isn't currently set up to delete the temp file correctly.  
> That part is a trivial fix.  
> The underlying library also generates temp files ending in "gbx9" and "ncx2" 
> that we aren't currently cleaning up.  I've tried variations on 
> {{DiskCache.cleanCache(0, new StringBuilder());}} with no luck on Windows.
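
One possible fallback for the leftover index files is to delete them by suffix from the directory where they are written. The sketch below is plain Java with made-up names; it assumes the files sit directly in a known directory, and it does not address the case where a file is still locked on Windows (a locked file would surface as an IOException).

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

final class SidecarCleanup {
    /**
     * Deletes regular files directly under dir whose names end with one of
     * the given suffixes, and returns how many files were deleted.
     */
    static int deleteBySuffix(Path dir, String... suffixes) throws IOException {
        int deleted = 0;
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path entry : entries) {
                if (!Files.isRegularFile(entry)) {
                    continue;
                }
                String name = entry.getFileName().toString();
                for (String suffix : suffixes) {
                    if (name.endsWith(suffix)) {
                        if (Files.deleteIfExists(entry)) {
                            deleted++;
                        }
                        break;
                    }
                }
            }
        }
        return deleted;
    }
}
```

A call such as `SidecarCleanup.deleteBySuffix(tempDir, "gbx9", "ncx2")` would remove only the matching files and leave everything else in the directory alone.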





[jira] [Commented] (TIKA-1936) Clean up parsers not cleaning up resources

2016-04-06 Thread Tim Allison (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228633#comment-15228633
 ] 

Tim Allison commented on TIKA-1936:
---

Made a few modifications to:

* MATParser
* NetCDFParser
* AbstractOOXMLExtractor's handling of embedded OLE
* one unit test in microsoft package
* CXFTestBase

Started large decluttering of tests in PDFParser tests (mea culpa).

> Clean up parsers not cleaning up resources
> --
>
> Key: TIKA-1936
> URL: https://issues.apache.org/jira/browse/TIKA-1936
> Project: Tika
> Issue Type: Improvement
> Reporter: Tim Allison
>
> Instead of opening individual issues, prob better to open one big one.  
> Apologies for the clutter.





[jira] [Resolved] (TIKA-1935) ISArchiveParser not releasing resources

2016-04-06 Thread Tim Allison (JIRA)

 [ 
https://issues.apache.org/jira/browse/TIKA-1935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Allison resolved TIKA-1935.
---
Resolution: Fixed

> ISArchiveParser not releasing resources
> ---
>
> Key: TIKA-1935
> URL: https://issues.apache.org/jira/browse/TIKA-1935
> Project: Tika
> Issue Type: Improvement
> Reporter: Tim Allison
> Assignee: Tim Allison
> Priority: Trivial
> Fix For: 2.0, 1.13
>
>






[jira] [Commented] (TIKA-1935) ISArchiveParser not releasing resources

2016-04-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228405#comment-15228405
 ] 

Hudson commented on TIKA-1935:
--

FAILURE: Integrated in tika-trunk-jdk1.7 #946 (See 
[https://builds.apache.org/job/tika-trunk-jdk1.7/946/])
TIKA-1935 (tallison: rev f89a19ffced973f212a7f28e8eed3b17f2b75d69)
* tika-parsers/src/main/java/org/apache/tika/parser/isatab/ISArchiveParser.java


> ISArchiveParser not releasing resources
> ---
>
> Key: TIKA-1935
> URL: https://issues.apache.org/jira/browse/TIKA-1935
> Project: Tika
> Issue Type: Improvement
> Reporter: Tim Allison
> Assignee: Tim Allison
> Priority: Trivial
> Fix For: 2.0, 1.13
>
>






tika-trunk-jdk1.7 - Build # 946 - Failure

2016-04-06 Thread Apache Jenkins Server
The Apache Jenkins build system has built tika-trunk-jdk1.7 (build #946)

Status: Failure

Check console output at https://builds.apache.org/job/tika-trunk-jdk1.7/946/ to 
view the results.

[jira] [Commented] (TIKA-1934) GeographicInformationParserTest leaving behind temp file in trunk

2016-04-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228377#comment-15228377
 ] 

Hudson commented on TIKA-1934:
--

SUCCESS: Integrated in tika-2.x #77 (See 
[https://builds.apache.org/job/tika-2.x/77/])
TIKA-1934 (tallison: rev 4cd69838e879484642fccf5ff6e51426fb6650ec)
* tika-parser-modules/tika-parser-scientific-module/src/main/java/org/apache/tika/parser/geoinfo/GeographicInformationParser.java


> GeographicInformationParserTest leaving behind temp file in trunk
> -
>
> Key: TIKA-1934
> URL: https://issues.apache.org/jira/browse/TIKA-1934
> Project: Tika
> Issue Type: Improvement
> Reporter: Tim Allison
> Assignee: Tim Allison
> Priority: Trivial
> Fix For: 2.0, 1.13
>
>
> GeographicInformationParser needs to release TemporaryResources.





Re: @ApacheTika , and release related tweets question

2016-04-06 Thread Mattmann, Chris A (3980)
FYI I updated the front page with a news item link to the Panama
papers.

++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattm...@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++
Director, Information Retrieval and Data Science Group (IRDS)
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
WWW: http://irds.usc.edu/
++










On 4/6/16, 10:10 AM, "Mattmann, Chris A (3980)"  
wrote:

>++1 on all the feedback from you two below :)
>
>On 4/6/16, 9:08 AM, "Bob Paulin"  wrote:
>
>>Hi Nick,
>>
>>This is awesome and I think should be great for the community!  I looked 
>>to commons as an example https://twitter.com/ApacheCommons . Looks like 
>>they tweet out the releases with a link to the mailing list comments.  
>>Might be a good precedent to follow to bring attention to the fact that 
>>there is a mailing list.  Other things to consider: CVE we'd need to 
>>report publicly, committer/PMC updates, and perhaps PSAs of changes like 
>>the SVN -> Git change.  Thanks again!
>>
>>- Bob
>>
>>On 4/6/2016 7:41 AM, Nick Burch wrote:
>>> Hi All
>>>
>>> Firstly, in case you haven't heard, we've set up a Twitter account for 
>>> the project! It's @ApacheTika - https://twitter.com/ApacheTika
>>>
>>>
>>> One thing we'll want to use it for is project publicity, linking to 
>>> interesting things going on around the project, such as today's post 
>>> on how the panama papers investigation used Apache Tika and SOLR :)
>>>
>>> Another thing we can use it for is release announcements. That leads 
>>> to a question though - which parts? Should we just tweet when there's 
>>> a new release out, linking to the download and the changelog?
>>>
>>> Or would people prefer it if we tweeted when we start the countdown to 
>>> a release (to give a chance to test / get last patches ready), again 
>>> when the vote starts (to get a wider group testing and voting), and 
>>> finally when the release is out?
>>>
>>> Thoughts?
>>>
>>> Nick
>>>
>>


Re: @ApacheTika , and release related tweets question

2016-04-06 Thread Mattmann, Chris A (3980)
++1 on all the feedback from you two below :)

++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattm...@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++
Director, Information Retrieval and Data Science Group (IRDS)
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
WWW: http://irds.usc.edu/
++










On 4/6/16, 9:08 AM, "Bob Paulin"  wrote:

>Hi Nick,
>
>This is awesome and I think should be great for the community!  I looked 
>to commons as an example https://twitter.com/ApacheCommons . Looks like 
>they tweet out the releases with a link to the mailing list comments.  
>Might be a good precedent to follow to bring attention to the fact that 
>there is a mailing list.  Other things to consider: CVE we'd need to 
>report publicly, committer/PMC updates, and perhaps PSAs of changes like 
>the SVN -> Git change.  Thanks again!
>
>- Bob
>
>On 4/6/2016 7:41 AM, Nick Burch wrote:
>> Hi All
>>
>> Firstly, in case you haven't heard, we've set up a Twitter account for 
>> the project! It's @ApacheTika - https://twitter.com/ApacheTika
>>
>>
>> One thing we'll want to use it for is project publicity, linking to 
>> interesting things going on around the project, such as today's post 
>> on how the panama papers investigation used Apache Tika and SOLR :)
>>
>> Another thing we can use it for is release announcements. That leads 
>> to a question though - which parts? Should we just tweet when there's 
>> a new release out, linking to the download and the changelog?
>>
>> Or would people prefer it if we tweeted when we start the countdown to 
>> a release (to give a chance to test / get last patches ready), again 
>> when the vote starts (to get a wider group testing and voting), and 
>> finally when the release is out?
>>
>> Thoughts?
>>
>> Nick
>>
>


[jira] [Commented] (TIKA-1934) GeographicInformationParserTest leaving behind temp file in trunk

2016-04-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228271#comment-15228271
 ] 

Hudson commented on TIKA-1934:
--

SUCCESS: Integrated in tika-trunk-jdk1.7 #945 (See 
[https://builds.apache.org/job/tika-trunk-jdk1.7/945/])
TIKA-1934 (tallison: rev cffda850a1ff23836195da25416712f815136354)
* tika-parsers/src/test/java/org/apache/tika/parser/geoinfo/GeographicInformationParserTest.java
* tika-core/src/test/java/org/apache/tika/TikaTest.java
TIKA-1934 (tallison: rev d6923907d78cdc4bd05f0d67d88b9fb2bb162002)
* tika-parsers/src/main/java/org/apache/tika/parser/geoinfo/GeographicInformationParser.java


> GeographicInformationParserTest leaving behind temp file in trunk
> -
>
> Key: TIKA-1934
> URL: https://issues.apache.org/jira/browse/TIKA-1934
> Project: Tika
> Issue Type: Improvement
> Reporter: Tim Allison
> Assignee: Tim Allison
> Priority: Trivial
> Fix For: 2.0, 1.13
>
>
> GeographicInformationParser needs to release TemporaryResources.





[jira] [Commented] (TIKA-1932) Clear resources in ParserDecorator

2016-04-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228272#comment-15228272
 ] 

Hudson commented on TIKA-1932:
--

SUCCESS: Integrated in tika-trunk-jdk1.7 #945 (See 
[https://builds.apache.org/job/tika-trunk-jdk1.7/945/])
TIKA-1932 - with correct pattern, third times the charm...argh.  This 
(tallison: rev ca1c265bb0c2084e0a736659c5fe1872acf9008d)
* tika-core/src/main/java/org/apache/tika/parser/ParserDecorator.java


> Clear resources in ParserDecorator
> --
>
> Key: TIKA-1932
> URL: https://issues.apache.org/jira/browse/TIKA-1932
> Project: Tika
> Issue Type: Bug
> Reporter: Tim Allison
> Assignee: Tim Allison
> Priority: Trivial
> Fix For: 2.0, 1.13
>
>
> In ParserDecorator, we're creating a new TikaInputStream to trigger the 
> creation of a temp file, but we're not closing that stream so a temp file is 
> left behind.





[jira] [Created] (TIKA-1936) Clean up parsers not cleaning up resources

2016-04-06 Thread Tim Allison (JIRA)
Tim Allison created TIKA-1936:
-

 Summary: Clean up parsers not cleaning up resources
 Key: TIKA-1936
 URL: https://issues.apache.org/jira/browse/TIKA-1936
 Project: Tika
  Issue Type: Improvement
Reporter: Tim Allison


Instead of opening individual issues, prob better to open one big one.  
Apologies for the clutter.





[jira] [Commented] (TIKA-1932) Clear resources in ParserDecorator

2016-04-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228262#comment-15228262
 ] 

Hudson commented on TIKA-1932:
--

SUCCESS: Integrated in tika-2.x #76 (See 
[https://builds.apache.org/job/tika-2.x/76/])
TIKA-1932 (tallison: rev 7aebf9d99dc37b549a37c6e4061a01914024bdf7)
* tika-core/src/main/java/org/apache/tika/parser/ParserDecorator.java


> Clear resources in ParserDecorator
> --
>
> Key: TIKA-1932
> URL: https://issues.apache.org/jira/browse/TIKA-1932
> Project: Tika
> Issue Type: Bug
> Reporter: Tim Allison
> Assignee: Tim Allison
> Priority: Trivial
> Fix For: 2.0, 1.13
>
>
> In ParserDecorator, we're creating a new TikaInputStream to trigger the 
> creation of a temp file, but we're not closing that stream so a temp file is 
> left behind.





[jira] [Created] (TIKA-1935) ISArchiveParser not releasing resources

2016-04-06 Thread Tim Allison (JIRA)
Tim Allison created TIKA-1935:
-

 Summary: ISArchiveParser not releasing resources
 Key: TIKA-1935
 URL: https://issues.apache.org/jira/browse/TIKA-1935
 Project: Tika
  Issue Type: Improvement
Reporter: Tim Allison
Assignee: Tim Allison
Priority: Trivial
 Fix For: 2.0, 1.13








[jira] [Resolved] (TIKA-1934) GeographicInformationParserTest leaving behind temp file in trunk

2016-04-06 Thread Tim Allison (JIRA)

 [ 
https://issues.apache.org/jira/browse/TIKA-1934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Allison resolved TIKA-1934.
---
Resolution: Fixed

> GeographicInformationParserTest leaving behind temp file in trunk
> -
>
> Key: TIKA-1934
> URL: https://issues.apache.org/jira/browse/TIKA-1934
> Project: Tika
> Issue Type: Improvement
> Reporter: Tim Allison
> Assignee: Tim Allison
> Priority: Trivial
> Fix For: 2.0, 1.13
>
>
> Need to close TikaInputStream.





[jira] [Created] (TIKA-1934) GeographicInformationParserTest leaving behind temp file in trunk

2016-04-06 Thread Tim Allison (JIRA)
Tim Allison created TIKA-1934:
-

Summary: GeographicInformationParserTest leaving behind temp file in trunk
Key: TIKA-1934
URL: https://issues.apache.org/jira/browse/TIKA-1934
Project: Tika
Issue Type: Improvement
Reporter: Tim Allison
Assignee: Tim Allison
Priority: Trivial
Fix For: 2.0, 1.13


Need to close TikaInputStream.





Re: @ApacheTika , and release related tweets question

2016-04-06 Thread Bob Paulin

Hi Nick,

This is awesome, and I think it should be great for the community!  I looked
to Commons as an example: https://twitter.com/ApacheCommons . It looks like
they tweet out the releases with a link to the mailing list comments.
That might be a good precedent to follow to bring attention to the fact that
there is a mailing list.  Other things to consider: CVEs we'd need to
report publicly, committer/PMC updates, and perhaps PSAs of changes like
the SVN -> Git change.  Thanks again!


- Bob

On 4/6/2016 7:41 AM, Nick Burch wrote:

> Hi All
>
> Firstly, in case you haven't heard, we've set up a Twitter account for
> the project! It's @ApacheTika - https://twitter.com/ApacheTika
>
> One thing we'll want to use it for is project publicity, linking to
> interesting things going on around the project, such as today's post
> on how the Panama Papers investigation used Apache Tika and Solr :)
>
> Another thing we can use it for is release announcements. That leads
> to a question though - which parts? Should we just tweet when there's
> a new release out, linking to the download and the changelog?
>
> Or would people prefer it if we tweeted when we start the countdown to
> a release (to give a chance to test / get last patches ready), again
> when the vote starts (to get a wider group testing and voting), and
> finally when the release is out?
>
> Thoughts?
>
> Nick





[jira] [Commented] (TIKA-1932) Clear resources in ParserDecorator

2016-04-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228196#comment-15228196
 ] 

Hudson commented on TIKA-1932:
--

SUCCESS: Integrated in tika-trunk-jdk1.7 #944 (See 
[https://builds.apache.org/job/tika-trunk-jdk1.7/944/])
TIKA-1932 (tallison: rev 27f1b8f726d7757c4aa57a0e3ae07d3e22cdfba7)
* tika-core/src/main/java/org/apache/tika/parser/ParserDecorator.java
TIKA-1932 - with correct pattern (tallison: rev 
e2ef2e9aba5783d205e428a3e607cafdb3ce102e)
* tika-core/src/main/java/org/apache/tika/parser/ParserDecorator.java


> Clear resources in ParserDecorator
> --
>
> Key: TIKA-1932
> URL: https://issues.apache.org/jira/browse/TIKA-1932
> Project: Tika
> Issue Type: Bug
> Reporter: Tim Allison
> Assignee: Tim Allison
> Priority: Trivial
> Fix For: 2.0, 1.13
>
>
> In ParserDecorator, we're creating a new TikaInputStream to trigger the 
> creation of a temp file, but we're not closing that stream so a temp file is 
> left behind.





@ApacheTika , and release related tweets question

2016-04-06 Thread Nick Burch

Hi All

Firstly, in case you haven't heard, we've set up a Twitter account for the 
project! It's @ApacheTika - https://twitter.com/ApacheTika

One thing we'll want to use it for is project publicity, linking to 
interesting things going on around the project, such as today's post on 
how the Panama Papers investigation used Apache Tika and Solr :)


Another thing we can use it for is release announcements. That leads to a 
question though - which parts? Should we just tweet when there's a new 
release out, linking to the download and the changelog?


Or would people prefer it if we tweeted when we start the countdown to a 
release (to give a chance to test / get last patches ready), again when 
the vote starts (to get a wider group testing and voting), and finally 
when the release is out?


Thoughts?

Nick


[jira] [Created] (TIKA-1932) Clear resources in ParserDecorator

2016-04-06 Thread Tim Allison (JIRA)
Tim Allison created TIKA-1932:
-

Summary: Clear resources in ParserDecorator
Key: TIKA-1932
URL: https://issues.apache.org/jira/browse/TIKA-1932
Project: Tika
Issue Type: Bug
Reporter: Tim Allison
Assignee: Tim Allison
Priority: Trivial


In ParserDecorator, we're creating a new TikaInputStream to trigger the 
creation of a temp file, but we're not closing that stream so a temp file is 
left behind.
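
The leak described above can be illustrated with a minimal sketch. It does not use Tika itself: TempBackedStream is a hypothetical stand-in that, like the stream in the report, creates a temp file when constructed and removes it only on close(). TempFileCleanupSketch and tempFileSurvives are likewise made-up names for illustration, and the actual fix in ParserDecorator may look different.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical stand-in for a stream that is backed by a temp file. */
class TempBackedStream extends FilterInputStream {
    private final Path tempFile;

    TempBackedStream(InputStream source) throws IOException {
        super(source);
        // Constructing the stream creates the temp file as a side effect.
        this.tempFile = Files.createTempFile("sketch-", ".tmp");
    }

    Path getPath() {
        return tempFile;
    }

    @Override
    public void close() throws IOException {
        try {
            super.close();
        } finally {
            // The temp file is removed only when the stream is closed.
            Files.deleteIfExists(tempFile);
        }
    }
}

class TempFileCleanupSketch {
    /** Returns true if the temp file is still on disk once the stream is no longer used. */
    static boolean tempFileSurvives(boolean closeStream) {
        byte[] data = {1, 2, 3};
        try {
            Path path;
            if (closeStream) {
                // try-with-resources closes the stream even if the body throws.
                try (TempBackedStream stream =
                         new TempBackedStream(new ByteArrayInputStream(data))) {
                    path = stream.getPath();
                }
            } else {
                // Leaky variant: the stream is created but never closed.
                TempBackedStream stream =
                    new TempBackedStream(new ByteArrayInputStream(data));
                path = stream.getPath();
            }
            boolean exists = Files.exists(path);
            Files.deleteIfExists(path); // tidy up after the leaky variant
            return exists;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("closed:   temp file left behind = " + tempFileSurvives(true));
        System.out.println("unclosed: temp file left behind = " + tempFileSurvives(false));
    }
}
```

The try-with-resources form is used here because it closes the stream on every exit path, including exceptions thrown while the stream is in use.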





[jira] [Commented] (TIKA-1931) Revert mp4 parser version because of new permanent hangs with 1.1.18

2016-04-06 Thread Tim Allison (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228128#comment-15228128
 ] 

Tim Allison commented on TIKA-1931:
---

https://github.com/sannies/mp4parser/issues/187

> Revert mp4 parser version because of new permanent hangs with 1.1.18
> 
>
> Key: TIKA-1931
> URL: https://issues.apache.org/jira/browse/TIKA-1931
> Project: Tika
> Issue Type: Bug
> Components: parser
> Reporter: Tim Allison
> Assignee: Tim Allison
> Attachments: mp4_parser_timeouts.zip
>
>
> In pre-pre-release-1-13 corpus testing of trunk, I found that the upgraded 
> mp4parser is hitting permanent hangs with three files.  In the older version, 
> it was able to parse 2 with no problem and threw an exception on one.





[jira] [Updated] (TIKA-1931) Revert mp4 parser version because of new permanent hangs with 1.1.18

2016-04-06 Thread Tim Allison (JIRA)

 [ 
https://issues.apache.org/jira/browse/TIKA-1931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Allison updated TIKA-1931:
--
Attachment: mp4_parser_timeouts.zip

Triggering files

> Revert mp4 parser version because of new permanent hangs with 1.1.18
> 
>
> Key: TIKA-1931
> URL: https://issues.apache.org/jira/browse/TIKA-1931
> Project: Tika
>  Issue Type: Bug
>  Components: parser
>Reporter: Tim Allison
>Assignee: Tim Allison
> Attachments: mp4_parser_timeouts.zip
>
>
> In pre-pre-release-1-13 corpus testing of trunk, I found that the upgraded 
> mp4parser is hitting permanent hangs with three files.  In the older version, 
> it was able to parse 2 with no problem and threw an exception on one.





[jira] [Created] (TIKA-1931) Revert mp4 parser version because of new permanent hangs with 1.1.18

2016-04-06 Thread Tim Allison (JIRA)
Tim Allison created TIKA-1931:
-

Summary: Revert mp4 parser version because of new permanent hangs with 1.1.18
Key: TIKA-1931
URL: https://issues.apache.org/jira/browse/TIKA-1931
Project: Tika
Issue Type: Bug
Components: parser
Reporter: Tim Allison
Assignee: Tim Allison


In pre-pre-release-1-13 corpus testing of trunk, I found that the upgraded 
mp4parser is hitting permanent hangs with three files.  In the older version, 
it was able to parse 2 with no problem and threw an exception on one.







RE: Apache Tika used to parse the Panama papers!

2016-04-06 Thread Vasu Jain
Looks like someone took USC CS572's assignments to a new level. ;)
> Date: Wed, 6 Apr 2016 09:28:49 +0200
> Subject: Re: Apache Tika used to parse the Panama papers!
> From: bdelacre...@apache.org
> To: chris.a.mattm...@jpl.nasa.gov; pr...@apache.org
> CC: dev@tika.apache.org
> 
> Hi,
> 
> On Wed, Apr 6, 2016 at 12:46 AM, Mattmann, Chris A (3980)
> 
> > http://www.forbes.com/sites/thomasbrewster/2016/04/05/panama-papers-amazon-encryption-epic-leak
> 
> Note that this also mentions Apache Solr.
> 
> -Bertrand
  

Re: Apache Tika used to parse the Panama papers!

2016-04-06 Thread Bertrand Delacretaz
Hi,

On Wed, Apr 6, 2016 at 12:46 AM, Mattmann, Chris A (3980)

> http://www.forbes.com/sites/thomasbrewster/2016/04/05/panama-papers-amazon-encryption-epic-leak

Note that this also mentions Apache Solr.

-Bertrand