Re: [VOTE] Release Apache Tika 2.6.0 Candidate #1

2022-11-06 Thread David Meikle
On Thu, 3 Nov 2022 at 13:47, Tim Allison  wrote:

> A candidate for the Tika 2.6.0 release is available at:
> https://dist.apache.org/repos/dist/dev/tika/2.6.0
>
> The release candidate is a zip archive of the sources in:
> https://github.com/apache/tika/tree/2.6.0-rc1/
>
> The SHA-512 checksum of the archive is
>
> 6b1011304da6a43e17697695fa78f86bfafd6828be52baefadb9d562ea328e43a0ae99fa7e0f020a234173470ee29ae19c917c4562dfdc4cff27945bd7e46e69.
>
> In addition, a staged maven repository is available here:
>
> https://repository.apache.org/content/repositories/orgapachetika-1091/org/apache/tika
>
> Please vote on releasing this package as Apache Tika 2.6.0.
> The vote is open for the next 72 hours and passes if a majority of at
> least three +1 Tika PMC votes are cast.
>
> [ ] +1 Release this package as Apache Tika 2.6.0
> [ ] -1 Do not release this package because...
>

+1 from me too. Thanks Tim.

Cheers,
Dave
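
The SHA-512 check mentioned in the vote mail can be reproduced locally before casting a vote. The following is only a minimal sketch: the helper names `sha512_of` and `matches_published` and the local file name in the usage note are assumptions made for illustration, not part of any Tika release tooling.

```python
import hashlib


def sha512_of(path, chunk_size=1 << 20):
    """Return the hex SHA-512 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large archives are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, published):
    """Compare a local archive against a checksum string copied from the vote mail.

    Surrounding whitespace, letter case and a trailing '.' are ignored.
    """
    expected = published.strip().rstrip(".").lower()
    return sha512_of(path) == expected
```

A call such as `matches_published("tika-2.6.0-src.zip", "6b1011...")` (file name hypothetical) returns True only when the downloaded archive hashes to the published value.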


Re: [VOTE] Release Apache Tika 1.28.2 Candidate #2

2022-04-30 Thread David Meikle
On Thu, 28 Apr 2022 at 15:55, Tim Allison  wrote:

> A candidate for the Tika 1.28.2 release is available at:
>   https://dist.apache.org/repos/dist/dev/tika/1.28.2
>
> The release candidate is a zip archive of the sources in:
>   https://github.com/apache/tika/tree/1.28.2-rc2/
>
> The SHA-512 checksum of the archive is
>
> 035f3643a302e2a88f99ca549c4d5c5c6eecd7736d03e4a686b17028f519f6a7a40229e48f2aac0bdf1653391e0bd7d34d0c7d099a2e5a2cb6141df00a4181bf.
>
> In addition, a staged maven repository is available here:
>
> https://repository.apache.org/content/repositories/orgapachetika-1083/org/apache/tika
>
> Please vote on releasing this package as Apache Tika 1.28.2.
> The vote is open for the next 72 hours and passes if a majority of at
> least three +1 Tika PMC votes are cast.
>
> [ ] +1 Release this package as Apache Tika 1.28.2
> [ ] -1 Do not release this package because...
>
>
Everything looks good (sigs, etc) and it builds and tests fine for me on
Ubuntu 22.04 (Java 11 and 17) but it is failing on Windows 11 (Java 11).

The failing test, TikaServerIntegrationTest.testSameServerIdAfterOOM(),
fails consistently in the Maven build but not in the IDE.

Will drill into it today to see if I can see anything.

Anyone building fine on Windows?

Cheers,
Dave


Re: [VOTE] Release Apache Tika 2.4.0 Candidate #1

2022-04-30 Thread David Meikle
Hi

On Fri, 29 Apr 2022 at 00:23, Tim Allison  wrote:

>
> The SHA-512 checksum of the archive is
>
> aff68637527fa4fa1ec21678ef2771a1dcd5eb3944bc1b1171c59459274295b903e093dc63ade0b6532bf137834d32bcb9cdf0d6a32efca187b9d6b8ac64f690.
>
> In addition, a staged maven repository is available here:
>
> https://repository.apache.org/content/repositories/orgapachetika-1085/org/apache/tika
>
> Please vote on releasing this package as Apache Tika 2.4.0.
> The vote is open for the next 72 hours and passes if a majority of at
> least three +1 Tika PMC votes are cast.
>
> [ ] +1 Release this package as Apache Tika 2.4.0
> [ ] -1 Do not release this package because...
>

+1 - built on both Windows 11 (Java 11) and Ubuntu 22.04 (Java 11 and Java
17).

Cheers,
Dave


Re: [VOTE] Release Apache Tika 1.28.1 Candidate #1

2022-02-10 Thread David Meikle
Hello,

On Tue, 8 Feb 2022 at 18:22, Tim Allison  wrote:

> A candidate for the Tika 1.28.1 release is available at:
>   https://dist.apache.org/repos/dist/dev/tika/1.28.1
>
> The release candidate is a zip archive of the sources in:
>   https://github.com/apache/tika/tree/1.28.1-rc1/
>
> The SHA-512 checksum of the archive is
>
> 17e92425d1cb53932d883b890a98491d5744345a75fa159bab90d47449470705b0c23aa75af845c0c5e4a2c175879eafd368b59e7559168f12722428d4b45fa4.
>
> In addition, a staged maven repository is available here:
>
> https://repository.apache.org/content/repositories/orgapachetika-1081/org/apache/tika
>
> Please vote on releasing this package as Apache Tika 1.28.1.
> The vote is open for the next 72 hours and passes if a majority of at
> least three +1 Tika PMC votes are cast.
>
> [ ] +1 Release this package as Apache Tika 1.28.1
> [ ] -1 Do not release this package because...
>

+1 from me.

Cheers,
Dave
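
The passing rule quoted in these vote mails ("a majority of at least three +1 Tika PMC votes") can be written down as a small check. This is a sketch of one reading of that sentence, namely at least three binding +1 votes and more +1 than -1 votes; the function name `vote_passes` is hypothetical and the ASF release policy remains the authoritative wording.

```python
def vote_passes(binding_votes):
    """Return True if the binding (PMC) votes satisfy the quoted rule.

    `binding_votes` is an iterable of integers: +1, 0 or -1.
    Assumed reading: at least three +1 votes and more +1 than -1 votes.
    """
    votes = list(binding_votes)
    plus = sum(1 for v in votes if v == 1)
    minus = sum(1 for v in votes if v == -1)
    return plus >= 3 and plus > minus
```

Under this reading, three +1 votes with one -1 still pass, while two +1 votes on their own do not.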


[jira] [Commented] (TIKA-3188) Add IDML Parser

2022-01-17 Thread David Meikle (Jira)


[ https://issues.apache.org/jira/browse/TIKA-3188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17477391#comment-17477391 ]

David Meikle commented on TIKA-3188:


[~tilman] Pretty much because I didn't engage my brain :) I'll switch to using
the _pdfbox.version_.

 

> Add IDML Parser
> ---------------
>
> Key: TIKA-3188
> URL: https://issues.apache.org/jira/browse/TIKA-3188
> Project: Tika
> Issue Type: Task
> Components: parser
> Reporter: Dave Meikle
> Assignee: Dave Meikle
> Priority: Major
> Fix For: 1.25
>
>
> Add a basic IDML parser to get content, XMP metadata and spread counts.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


Re: [VOTE] Release Apache Tika 1.28 Candidate #3

2021-12-20 Thread David Meikle
On Mon, 20 Dec 2021 at 16:31, Tim Allison  wrote:

>
> The SHA-512 checksum of the archive is
>
> f8487f58aeec011c993ac46d8e99f8bed64333ccfa57edf8ff9773653204fa2a4e27cb1102e53c181ae7a1e98f892da4c1766f473ce5ee83c1b9229c4f8e5aec.
>
> In addition, a staged maven repository is available here:
>
> https://repository.apache.org/content/repositories/orgapachetika-1079/org/apache/tika
>
> Please vote on releasing this package as Apache Tika 1.28.
> The vote is open for the next 72 hours and passes if a majority of at
> least three +1 Tika PMC votes are cast.
>
> [ ] +1 Release this package as Apache Tika 1.28
> [ ] -1 Do not release this package because...
>

+1, thanks Tim!

Cheers,
Dave


Re: [VOTE] Release Apache Tika 1.27 Candidate #1

2021-07-05 Thread David Meikle
On Wed, 30 Jun 2021 at 21:03, Tim Allison  wrote:

> Please vote on releasing this package as Apache Tika 1.27.
> The vote is open for the next 72 hours and passes if a majority of at
> least three +1 Tika PMC votes are cast.
>
> [ ] +1 Release this package as Apache Tika 1.27
> [ ] -1 Do not release this package because...
>
> Here's my +1.
>
> Cheers,
>
>Tim
>

+1 to this release.

Thanks again, Tim!

Cheers,
Dave


Re: [VOTE] Release Apache Tika 1.26 Candidate #1

2021-03-26 Thread David Meikle
On Wed, 24 Mar 2021 at 15:08, Tim Allison  wrote:

> Please vote on releasing this package as Apache Tika 1.26.
> The vote is open for the next 72 hours and passes if a majority of at
> least three +1 Tika PMC votes are cast.
>
> [ ] +1 Release this package as Apache Tika 1.26
> [ ] -1 Do not release this package because...
>

+1. Thanks Tim!

Cheers,
Dave


Re: [VOTE] Release Apache Tika 1.24 Candidate #3

2020-03-19 Thread David Meikle
Yeah, that is the most important thing. Hope all friends in the Tika
community are doing well and keeping safe.

On the Tika side, just publishing the Docker image now. Will pop a wee Wiki
page up as well so others know how to do it.

Cheers,
Dave

On Tue, 17 Mar 2020 at 13:53, Tim Allison  wrote:

> Um, y, I hear you, Dave.  Health, safety, family, community first...then
> Tika. :/
>
> On Tue, Mar 17, 2020 at 9:21 AM David Meikle  wrote:
>
> > +1 from me.
> >
> > Sorry for the delay, it has been crazy here in Europe.  Thanks for
> > preparing this, Tim.
> >
> > On Wed, 11 Mar 2020 at 19:02, Tim Allison  wrote:
> >
> > > Please vote on releasing this package as Apache Tika 1.24.
> > > The vote is open for the next 72 hours and passes if a majority of at
> > > least three +1 Tika PMC votes are cast.
> > >
> > > [ ] +1 Release this package as Apache Tika 1.24
> > > [ ] -1 Do not release this package because...
> > >
> > > Here's my +1
> > >
> > > Cheers,
> > >
> > >   Tim
> > >
> > > P.S. I had a couple of failed attempts, which is why we're starting at #3
> > >
> >
>


Re: [VOTE] Release Apache Tika 1.24 Candidate #3

2020-03-17 Thread David Meikle
+1 from me.

Sorry for the delay, it has been crazy here in Europe.  Thanks for
preparing this, Tim.

On Wed, 11 Mar 2020 at 19:02, Tim Allison  wrote:

> Please vote on releasing this package as Apache Tika 1.24.
> The vote is open for the next 72 hours and passes if a majority of at
> least three +1 Tika PMC votes are cast.
>
> [ ] +1 Release this package as Apache Tika 1.24
> [ ] -1 Do not release this package because...
>
> Here's my +1
>
> Cheers,
>
>   Tim
>
> P.S. I had a couple of failed attempts, which is why we're starting at #3
>


Re: [EXTERNAL] Do we have a community supported approach for deploying Tika Server in production?

2020-02-05 Thread David Meikle
Hi Eric,

+1 - I think we should drop that and rely on tika-docker instead.

I'm about to push more to it tonight, and then we could include it as a
sub-module in Tika to do regular development snapshots too.

Cheers,
Dave

On Wed, 5 Feb 2020 at 15:34, Eric Pugh wrote:

> Following this thread, should we deprecate/remove the Tika Docker support
> that is in Tika-server project?
>
> The `mvn dockerfile:build` command now relies on a plugin that is no
> longer supported according to https://github.com/spotify/dockerfile-maven,
> and it seems like the Tika-docker project is really the right place for
> this!
>
> I’m thinking that this might help reduce the footprint of things we need
> to support.
>
> > On Jan 9, 2020, at 12:08 AM, Chris Mattmann wrote:
> >
> > +1
> >
> > Note there is also a USC tika dockers repo where I put the data science
> > stuff too:
> >
> > http://github.com/USCDataScience/tika-dockers
> >
> > I’ll continue to push DL and ML Tika stuff there.
> >
> > Cheers,
> > Chris
> >
> > From: Dave Meikle
> > Reply-To: "dev@tika.apache.org"
> > Date: Wednesday, January 8, 2020 at 2:18 PM
> > To: ""
> > Subject: Re: [EXTERNAL] Do we have a community supported approach for
> > deploying Tika Server in production?
> >
> > Hi Eric,
> >
> > Will take a look. On a related note, I've created a new repos:
> > https://github.com/apache/tika-docker
> >
> > Thinking based on looking at the PRs and Issues on LogicalSpark
> > docker-tikaserver, I'll create an updated docker file using what you've
> > added here and look to publish builds to docker hub from that.
> >
> > What do you think?
> >
> > Cheers,
> > Dave
> >
> > On Wed, 8 Jan 2020 at 03:16, Eric Pugh wrote:
> >
> > Hi all, I’ve gone ahead and added the -spawnChild property as a default
> > when running Tika Server as a service.   I’d love some eyes on the PR, and
> > if this looks good, get it committed.
> >
> > Feedback welcome!
> >
> > Eric
> >
> >> On Dec 17, 2019, at 12:53 PM, Eric Pugh <ep...@opensourceconnections.com>
> >> wrote:
> >>
> >> Cool.
> >>
> >> It’s the auto run that I really need, and the other part that I don’t
> >> think I’ve tackled properly is the managing of logs…
> >>
> >> I’m going to check with my project to see if they support Snap packages.
> >>
> >> Eric
> >>
> >>> On Dec 16, 2019, at 5:10 PM, Tom Barber <t...@spicule.co.uk> wrote:
> >>>
> >>> Just saw this fly by and FYI on Linux systems that support Snap
> >>> packages (Ubuntu/Debian/Arch/Fedora etc) you can `snap install
> >>> tika-server` doesn’t yet auto-run I don’t believe but you can just run
> >>> `tika-server.run` and adding an init script wouldn’t take 5 minutes.
> >>>
> >>> Tom
> >>>
> >>> On 16 December 2019 at 18:42:55, Eric Pugh
> >>> (ep...@opensourceconnections.com) wrote:
> >>>
> >>>> Hi folks!
> >>>>
> >>>> I’ve got a mostly completed PR for having install scripts for Tika
> >>>> Server, and I’m hoping a committer will take a look at the PR, and give
> >>>> feedback (and ideally commit in time for 1.24!)
> >>>>
> >>>> A couple of things:
> >>>>
> >>>> 1) This was completely influenced by
> >>>> <https://lucene.apache.org/solr/guide/8_3/taking-solr-to-production.html#service-installation-script>,
> >>>> in fact I started with the Solr scripts.
> >>>>
> >>>> 2) I’ve deleted all the Solr specific aspects (I think), however there
> >>>> may still be more to delete.
> >>>>
> >>>> 3) This requires a change to how we release Tika, previously we ship
> >>>> tika-app.jar and Tika-eval.jar, and Tika-server.jar, and now, I think, we
> >>>> want to add the tika-server-bin.tgz and tika-server-bin.zip binary
> >>>> distributions.
> >>>>
> >>>> I’m happy to start writing accompanying “how to deploy Tika Server”
> >>>> docs if this PR looks good! Or, please give input and I’ll make the
> >>>> updates.
> >>>>
> >>>> Eric
> >>>>
> >>>>> On Dec 12, 2019, at 2:39 PM, Eric Pugh
> >>>>> <ep...@opensourceconnections.com> wrote:
> >>>>>
> >>>>> I’ve created this JIRA to track this work:
> >>>>> https://issues.apache.org/jira/browse/TIKA-3010

Re: [VOTE] Release Apache Tika 1.23 Candidate #2

2019-12-04 Thread David Meikle
On Tue, 3 Dec 2019 at 03:15, Tim Allison  wrote:

> Please vote on releasing this package as Apache Tika 1.23.
> The vote is open for the next 72 hours and passes if a majority of at
> least three +1 Tika PMC votes are cast.
>
> [ ] +1 Release this package as Apache Tika 1.23
> [ ] -1 Do not release this package because...
>

 +1

Cheers,
Dave


Windows Build

2019-07-30 Thread David Meikle
Hello,

I've changed the config of the Windows build in Jenkins to point to the
right path for Maven settings, so hopefully that is it back on track
again.

Cheers,
Dave


Re: [VOTE] Release Apache Tika 1.22 Candidate #3

2019-07-28 Thread David Meikle
On Fri, 26 Jul 2019 at 16:48, Tim Allison  wrote:

> Please vote on releasing this package as Apache Tika 1.22.
>
> The vote is open for the next 72 hours and passes if a majority of at
> least three +1 Tika PMC votes are cast.
>
>
> [ ] +1 Release this package as Apache Tika 1.22
>
> [ ] -1 Do not release this package because...
>

 -1 - I am getting build errors with this RC on both Mac and Ubuntu I am
afraid.

[INFO] Running org.apache.tika.eval.reports.ResultsReporterTest
[WARNING] Tests run: 1, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 0 s - in org.apache.tika.eval.reports.ResultsReporterTest
[INFO]
[INFO] Results:
[INFO]
[ERROR] Failures:
[ERROR]   TikaEvalCLITest.testBasicCompare:81->TikaTest.assertContains:110 testdb.mv.db not found in:
[]
[ERROR]   TikaEvalCLITest.testBasicProfile:90->TikaTest.assertContains:110 testdb.mv.db not found in:
[]
[ERROR]   TikaEvalCLITest.testComparisonReports:117
[ERROR]   TikaEvalCLITest.testProfileReports:104
[ERROR] Errors:
[ERROR]   SimpleComparerTest.staticSetUp:63 » NullPointer in must not be null
[ERROR]   LangIdTest.init:33 » NullPointer in must not be null
[ERROR]   LanguageIdTest.testDefenseAgainstBadRegexInOpenNLP:35 » NullPointer in must no...
[INFO]
[ERROR] Tests run: 24, Failures: 4, Errors: 3, Skipped: 5

Cheers,
Dave
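
When comparing build results across platforms, the totals line at the end of the Maven test output is the quickest thing to pull out of a saved log. The snippet below is a minimal sketch that assumes the log uses the "Tests run: N, Failures: N, Errors: N, Skipped: N" wording seen in the output above; the function name `last_summary` is made up for this example.

```python
import re

# Matches both per-class lines and the final totals line of the test output.
SUMMARY = re.compile(
    r"Tests run: (\d+), Failures: (\d+), Errors: (\d+), Skipped: (\d+)"
)


def last_summary(log_text):
    """Return the counts from the last 'Tests run: ...' line, or None if absent."""
    matches = SUMMARY.findall(log_text)
    if not matches:
        return None
    run, failures, errors, skipped = (int(n) for n in matches[-1])
    return {"run": run, "failures": failures, "errors": errors, "skipped": skipped}
```

Taking the last match rather than the first is deliberate: per-class lines use the same wording, and the module totals come after them.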


Re: [VOTE] Release Apache Tika 1.18 Candidate #3

2018-04-24 Thread David Meikle
Hi Tim

It looks like something to do with my Tesseract setup on my Mac.


It is all working perfectly on my Windows and Linux machines, so +1 from me
on the release too.

Cheers,
Dave


On 24 April 2018 at 13:29, Allison, Timothy B. <talli...@mitre.org> wrote:

> Hi Dave,
>
> Let us know what you find.  I had a successful build w and w/o tesseract
> on both Windows and Linux.
>
> I did get a failed build if I wasn't connected to the internet because the
> URL exception in DL4JInceptionV3NetTest didn't have the expected message.
> I've since fixed this in master and branch_1x.
>
> If that's worth a respin, I can do that.
>
> If you find something else (problems w diff version of tesseract, diff os,
> diff java version, etc), let us know.
>
> Cheers,
>
>        Tim
>
> -----Original Message-----
> From: David Meikle [mailto:loo...@gmail.com]
> Sent: Monday, April 23, 2018 7:07 PM
> To: dev@tika.apache.org
> Subject: Re: [VOTE] Release Apache Tika 1.18 Candidate #3
>
> Hey Tim,
>
> Just started looking at this and got an error with Tesseract enabled. Will
> try to see if it is localised to me or not.
>
> Cheers,
> Dave
>
> On 22 April 2018 at 13:29, Oleg Tikhonov <o...@apache.org> wrote:
>
> > Sorry for the noise.
> >
> > tar'ed
> >
> > On Sun, Apr 22, 2018 at 3:07 PM, Oleg Tikhonov <o...@apache.org> wrote:
> >
> >> My bad. This one, hopefully ...
> >>
> >>
> >> On Sun, Apr 22, 2018 at 3:01 PM, Oleg Tikhonov <o...@apache.org> wrote:
> >>
> >>> Hi,
> >>> thanks a lot.
> >>> [x] +1 Release this package as Apache Tika 1.18
> >>>
> >>> Even did a security scan:
> >>> mvn org.owasp:dependency-check-maven:3.1.2:check
> >>>
> >>> Report is attached.
> >>>
> >>> Best regards,
> >>> Oleg
> >>>
> >>>
> >>> On Sat, Apr 21, 2018 at 12:54 AM, talli...@apache.org <
> >>> talli...@apache.org> wrote:
> >>>
> >>>> All,
> >>>> A candidate for the Tika 1.18 release is available at:
> >>>> https://dist.apache.org/repos/dist/dev/tika/
> >>>> The release candidate is a zip archive of the sources in:
> >>>> https://github.com/apache/tika/tree/1.18-rc3
> >>>> The SHA-512 checksum of the archive is
> >>>> f69ee27b31cf7bcb1eaf114b93c23dd85b974356cc7e6e265b1c9366a11d711a3341e690f5b452a3e8b0c5cc6f5839db01b3ef6ec3a2a29ffcd332ff7a63dcf3.
> >>>> In addition, a staged maven repository is available here:
> >>>> https://repository.apache.org/content/repositories/orgapachetika-1033
> >>>> Please vote on releasing this package as Apache Tika 1.18.
> >>>> The vote is open for the next 72 hours and passes if a majority of at
> >>>> least three +1 Tika PMC votes are cast.
> >>>> [ ] +1 Release this package as Apache Tika 1.18
> >>>> [ ] -1 Do not release this package because...
> >>>> +1 from me; third time's the charm...
> >>>> Cheers,
> >>>> Tim
> >>>
> >>>
> >>>
> >>
> >
>


Re: [VOTE] Release Apache Tika 1.18 Candidate #3

2018-04-23 Thread David Meikle
Hey Tim,

Just started looking at this and got an error with Tesseract enabled. Will
try to see if it is localised to me or not.

Cheers,
Dave

On 22 April 2018 at 13:29, Oleg Tikhonov  wrote:

> Sorry for the noise.
>
> tar'ed
>
> On Sun, Apr 22, 2018 at 3:07 PM, Oleg Tikhonov  wrote:
>
>> My bad. This one, hopefully ...
>>
>>
>> On Sun, Apr 22, 2018 at 3:01 PM, Oleg Tikhonov  wrote:
>>
>>> Hi,
>>> thanks a lot.
>>> [x] +1 Release this package as Apache Tika 1.18
>>>
>>> Even did a security scan:
>>> mvn org.owasp:dependency-check-maven:3.1.2:check
>>>
>>> Report is attached.
>>>
>>> Best regards,
>>> Oleg
>>>
>>>
>>> On Sat, Apr 21, 2018 at 12:54 AM, talli...@apache.org <
>>> talli...@apache.org> wrote:
>>>
>>>> All,
>>>> A candidate for the Tika 1.18 release is available at:
>>>> https://dist.apache.org/repos/dist/dev/tika/
>>>> The release candidate is a zip archive of the sources in:
>>>> https://github.com/apache/tika/tree/1.18-rc3
>>>> The SHA-512 checksum of the archive is
>>>> f69ee27b31cf7bcb1eaf114b93c23dd85b974356cc7e6e265b1c9366a11d711a3341e690f5b452a3e8b0c5cc6f5839db01b3ef6ec3a2a29ffcd332ff7a63dcf3.
>>>> In addition, a staged maven repository is available here:
>>>> https://repository.apache.org/content/repositories/orgapachetika-1033
>>>> Please vote on releasing this package as Apache Tika 1.18.
>>>> The vote is open for the next 72 hours and passes if a majority of at
>>>> least three +1 Tika PMC votes are cast.
>>>> [ ] +1 Release this package as Apache Tika 1.18
>>>> [ ] -1 Do not release this package because...
>>>> +1 from me; third time's the charm...
>>>> Cheers,
>>>> Tim
>>>
>>>
>>>
>>
>


Re: TIKA-1509 (2.x breaking parser change) - ready for first review!

2018-03-18 Thread David Meikle
Nice one Nick!  Will take a look this week.

Cheers,
Dave

On 14 March 2018 at 17:38, Nick Burch  wrote:

> Hi All
>
> As promised, I've finally had a go to try and implement my ideas for
> TIKA-1509 / https://wiki.apache.org/tika/CompositeParserDiscussion /
> breaking 2.x parser change
>
> My work so far is in this github branch, and is ready for review!
> https://github.com/apache/tika/tree/multiple-parsers
>
>
> It seems to work fine for the Fallback case, and for the Supplemental
> case. You can set a policy that controls how clashing metadata is handled,
> currently "first one to set a key wins", "last one to set a key wins",
> "ignore previous parsers", and "keep old and new unique values"
>
> I've also done a proof of concept for "pick best" case, to try running the
> text parser with a specified set of different charsets, capture the text
> from each, "pick the best" (hard coded 1st...) then run for real with that
> one.
>
>
> Key TODOs - Support InputStreamFactory, properly work out what mimetypes
> to claim to support, Tika Config XML friendly helper for the metadata clash
> policy, review ContentHandlerFactory signature and tweak if needed.
>
> Proposed breaking 2.x change - add second parse method that takes
> ContentHandlerFactory instead of ContentHandler, with most parsers getting
> that just grabbing a single one and using that as before
>
>
> Before I do any more though... Thoughts? Comments? Ideas? Changes? Should
> I stop? Carry on? Modify it? Other?
>
> Nick
>
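
The four metadata clash policies Nick lists ("first one to set a key wins", "last one to set a key wins", "ignore previous parsers", "keep old and new unique values") can be illustrated with plain dictionaries. This is only a sketch of the described behaviour under assumed names: `merge_metadata` and the policy strings are hypothetical and are not the API on the multiple-parsers branch.

```python
POLICIES = ("first_wins", "last_wins", "discard_all", "keep_all")


def merge_metadata(old, new, policy):
    """Merge two {key: [values]} dicts from successive parsers.

    first_wins  : keep the earlier parser's values for a clashing key
    last_wins   : replace them with the later parser's values
    discard_all : ignore everything set by previous parsers
    keep_all    : keep old values and append new values not already present
    """
    if policy not in POLICIES:
        raise ValueError(f"unknown policy: {policy}")
    if policy == "discard_all":
        return {key: list(values) for key, values in new.items()}

    # Copy so that neither input dict is modified.
    merged = {key: list(values) for key, values in old.items()}
    for key, values in new.items():
        if key not in merged:
            merged[key] = list(values)
        elif policy == "last_wins":
            merged[key] = list(values)
        elif policy == "keep_all":
            for value in values:
                if value not in merged[key]:
                    merged[key].append(value)
        # first_wins: leave the existing values untouched
    return merged
```

Keys that only one parser sets are carried over under every policy except `discard_all`, which matches the "ignore previous parsers" wording.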


RE: Tika 1.17?

2017-11-29 Thread David Meikle
I am thinking TIKA-2385. I've got a resized image that I can commit tonight
that should close this one off.

Cheers,
Dave


On 29 Nov 2017 14:42, "Allison, Timothy B." <talli...@mitre.org> wrote:

Many thanks to Bob for help on TIKA-2502!

Anything else we want to put into 1.17 before I run the regression tests?

-----Original Message-----
From: Allison, Timothy B. [mailto:talli...@mitre.org]
Sent: Monday, November 13, 2017 1:42 PM
To: dev@tika.apache.org
Subject: RE: Tika 1.17?

Y.  You're right.  Thank you!

 I think I've been avoiding that because there were some regressions in
metadata-extractor last I looked at this.  Let's hope those are gone in
2.10.1.

-----Original Message-----
From: Tyler Bui-Palsulich [mailto:tpalsul...@apache.org]
Sent: Sunday, November 12, 2017 2:54 PM
To: dev@tika.apache.org
Subject: RE: Tika 1.17?

TIKA-2486 might be worth blocking on since there is a CVE.

Tyler

On Nov 6, 2017 5:26 AM, "Allison, Timothy B." <talli...@mitre.org> wrote:

> Y.  I'm happy enough  to wait a few more days.  I wasn't able to kick
> off the regression tests last week.  Should I wait for the new parsers
> to run the regression tests?
>
> -----Original Message-----
> From: David Meikle [mailto:loo...@gmail.com]
> Sent: Friday, November 3, 2017 7:42 PM
> To: dev@tika.apache.org
> Subject: Re: Tika 1.17?
>
> Sounds good. I have a couple of new parsers I would like to slot in
> but not had a chance the last few months. Will go for it over the
> weekend, if that works for you Tim.
>
> Cheers,
> Dave
>
>
>
> On 3 November 2017 at 15:19, Mattmann, Chris A (3010) <
> chris.a.mattm...@jpl.nasa.gov> wrote:
>
> > Let’s make it so (
> >
> > ++
> > Chris Mattmann, Ph.D.
> > Principal Data Scientist, Engineering Administrative Office (3010)
> > Manager, NSF & Open Source Projects Formulation and Development
> > Offices (8212)
> > NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> > Office: 180-503E, Mailstop: 180-503
> > Email: chris.a.mattm...@nasa.gov
> > WWW:  http://sunset.usc.edu/~mattmann/
> > ++
> > Director, Information Retrieval and Data Science Group (IRDS)
> > Adjunct Associate Professor, Computer Science Department University
> > of Southern California, Los Angeles, CA 90089 USA
> > WWW: http://irds.usc.edu/
> > ++
> >
> >
> >
> > On 11/3/17, 7:35 AM, "Allison, Timothy B." <talli...@mitre.org> wrote:
> >
> > All,
> >
> > PDFBox 2.0.8 is now integrated.  I want to fix TIKA-2490 before
> > we release 1.17.  Are there other issues that are blockers or you'd
> > like to fix before 1.17 (TIKA-2471, maybe?)?
> >
> > I plan to run initial large scale regression tests shortly for
> > rfc822 and mbox because of TIKA-2478.  I'll run the full regression
> > tests before cutting the RC, but I want to focus on those for now.
> > Other requests?
> >
> > Cheers,
> >
> > Tim
> >
> >
> >
>


Re: Tika 1.17?

2017-11-03 Thread David Meikle
Sounds good. I have a couple of new parsers I would like to slot in but not
had a chance the last few months. Will go for it over the weekend, if that
works for you Tim.

Cheers,
Dave



On 3 November 2017 at 15:19, Mattmann, Chris A (3010) <
chris.a.mattm...@jpl.nasa.gov> wrote:

> Let’s make it so (
>
> ++
> Chris Mattmann, Ph.D.
> Principal Data Scientist, Engineering Administrative Office (3010)
> Manager, NSF & Open Source Projects Formulation and Development Offices
> (8212)
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 180-503E, Mailstop: 180-503
> Email: chris.a.mattm...@nasa.gov
> WWW:  http://sunset.usc.edu/~mattmann/
> ++
> Director, Information Retrieval and Data Science Group (IRDS)
> Adjunct Associate Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> WWW: http://irds.usc.edu/
> ++
>
>
>
> On 11/3/17, 7:35 AM, "Allison, Timothy B."  wrote:
>
> All,
>
> PDFBox 2.0.8 is now integrated.  I want to fix TIKA-2490 before we
> release 1.17.  Are there other issues that are blockers or you'd like to
> fix before 1.17 (TIKA-2471, maybe?)?
>
> I plan to run initial large scale regression tests shortly for rfc822
> and mbox because of TIKA-2478.  I'll run the full regression tests before
> cutting the RC, but I want to focus on those for now.  Other requests?
>
> Cheers,
>
> Tim
>
>
>


Moved Jenkins View

2017-07-23 Thread David Meikle
Hello All,

Just a note to say I have moved the Tika Jenkins view from the top level to
under the T view as per Infra's note on JIRA.

You can now find it here:
https://builds.apache.org/view/T/view/Tika/

Cheers,
Dave


Re: [VOTE] Release Apache Tika 1.15 Candidate #2

2017-05-27 Thread David Meikle
On 24 May 2017 at 02:22, Tim Allison  wrote:

> Please vote on releasing this package as Apache Tika 1.15.
> The vote is open for the next 72 hours and passes if a majority of at
> least three +1 Tika PMC votes are cast.
>
> [ ] +1 Release this package as Apache Tika 1.15
> [ ] -1 Do not release this package because...
>
>
+1 from me. Thanks for RMing this Tim, great job.

Built and tested on Windows 10, Ubuntu 16.04 and macOS 10.12.5 on JDK 8.
SHA, MD5 and SIGS OK.  Also ran a new instance of
logicalspark/docker-tikaserver:1.15rc2 in production and everything is
running smoothly.

The signature is good but there is no trust on the key. Not a blocker, but I
think you have actually got your key signed by some third parties now, Tim.
Have you managed to submit your signed key?

Cheers,
Dave


Re: Welcome Thejan Wijesinghe GSoC 2017 student!

2017-05-06 Thread David Meikle
Congratulations and welcome, Thejan!

On 4 May 2017 at 18:19, Chris Mattmann  wrote:

> I’d like to welcome Thejan Wijesinghe our Apache Tika GSoC 2017 student,
> working on Supporting Image-to-Text (Image Captioning) in Tika for Image
> MIME Types.
>
> Thamme and I will mentor him and welcome to the community!
>
> Cheers,
> Chris
>
>
>
>
>


Re: 1.15?

2017-04-17 Thread David Meikle
+1 from me too.

Cheers,
Dave

On 13 April 2017 at 13:08, Konstantin Gribov  wrote:

> Preliminary +1 from me, I'll take a closer look this weekend
>
> Thu, 13 Apr 2017, 0:00 Allison, Timothy B. :
>
> > All,
> >   POI is voting on rc1 of the next release.  Once that's released and
> > integrated into Tika, let's start the release process for Tika 1.15, end
> > of next week, middle of following?  Any blockers?
> >
> >  Cheers,
> >
> >  Tim
> >
> >
> > --
>
> Best regards,
> Konstantin Gribov
>


Re: documenting ParseContext

2017-03-14 Thread David Meikle
On 13 March 2017 at 16:18, Allison, Timothy B.  wrote:

>   On SOLR-9178[1], Alexandre Rafalovitch asked about documentation/uses of
> ParseContext.  I started a page on our wiki:
> https://wiki.apache.org/tika/TikaParseContext  Please edit as you see
>

Nice one, Tim!

Cheers,
Dave


tika-2.x-windows now running

2017-03-13 Thread David Meikle
Hello All,

The tika-2.x-windows build is back up and running - whoop whoop!

Turns out the Maven build configuration wasn't pointing to a settings.xml
that had the relevant apache-release profile settings available, thus
failing on any step that needed access to the repository. Not sure if there
is anything more to be done on INFRA-13647 (I would have thought this
profile should be there by default) but for now we should be good.

I was thinking we should:

   - Add a Windows build for trunk
   - Drop the deploy step from the Windows builds (i.e. move to install
   command) as we are just wasting cycles and network deploying 2x (one from
   standard Linux build, one from Windows).

Any thoughts?

Cheers,
Dave


Fwd: tika-2.x-windows - Build # 175 - Still Failing

2017-03-11 Thread David Meikle
Hello,

> The Apache Jenkins build system has built tika-2.x-windows (build #175)
> Status: Still Failing
> Check console output at https://builds.apache.org/job/tika-2.x-windows/175/
> to view the results.


I have been trying to resurrect the Windows build, as I am getting some
encoding detector test failures with 2.x on Windows, and it would appear
the deployment to https://repository.apache.org is the issue.

Changing the build to use mvn install instead of deploy confirms the build
is fine, with the failure then shifting to the post-build deploy artefact
action.

Will send a ticket to INFRA as it looks like the apache.releases.http ID may
be out of sync.

Cheers,
Dave


Re: tika-2.x-windows - Build # 175 - Still Failing

2017-03-11 Thread David Meikle
On 11 March 2017 at 15:35, David Meikle <loo...@gmail.com> wrote:
>
> Will send a ticket to INFRA as looks like the apache.releases.http ID may
> be out of sync.
>
>
Raised as INFRA-13647 <https://issues.apache.org/jira/browse/INFRA-13647>.

Cheers,
Dave


Re: updating Jenkins?

2017-03-10 Thread David Meikle
Hi Tim,

Have updated them.

We have two old jdk7 and jdk8 builds that are disabled.  Any objections to
removing them?

Cheers,
Dave

On 1 March 2017 at 21:16, Allison, Timothy B.  wrote:

> Would a dev with karma and knowledge be able to update Jenkins to poll
> github now that we've moved over?
>
> Thank you!
>
>   Best,
>
>Tim
>


Re: Jenkins?

2017-01-12 Thread David Meikle

> On 6 Jan 2017, at 12:53, Allison, Timothy B. wrote:
>
> Anyone know how we can fix Jenkins for tika-2.x-windows or at least ask it
> to wait for a change in git?
>
> -----Original Message-----
> From: Apache Jenkins Server [mailto:jenk...@builds.apache.org]
> Sent: Friday, January 6, 2017 7:18 AM
> To: dev@tika.apache.org
> Subject: tika-2.x-windows - Build # 132 - Still Failing
>
> The Apache Jenkins build system has built tika-2.x-windows (build #132)
>
> Status: Still Failing
>
> Check console output at https://builds.apache.org/job/tika-2.x-windows/132/
> to view the results.


I have access to configure the job, what do we want to change? Just the build 
trigger?

Cheers,
Dave

[ANNOUNCE] Welcome Luis Filipe Nassif and Thamme Gowda as Apache Tika PMC members and committers

2016-11-05 Thread David Meikle
Hello Everyone,

Please join in (belatedly) welcoming Luis Filipe Nassif and Thamme Gowda as 
both PMC Members and Committers to the project!

Welcome to the team guys. Feel free to say a bit about yourselves and why you 
love Tika.

Cheers,
Dave





Re: [VOTE] Apache Tika 1.14 Release Candidate #1

2016-10-26 Thread David Meikle
Hey.

Spot on Konstantin, thanks. Removed ffmpeg from path and it works.

Also, noticed the same KEY issue too.

Just running some content tests now.

Cheers,
Dave

> On 24 Oct 2016, at 19:30, Konstantin Gribov <gros...@gmail.com> wrote:
> 
> `ForkParser` related tests fail in presence of ffmpeg on my system. Dave,
> check `ffmpeg` presence on the PATH, please. It seems to be TIKA-2056 as
> Tim said above. I've excluded `ffmpeg` from `tika-external-parsers.xml` and
> all tests pass after that.
> 
> Also, tested on ArchLinux w/ Grobid.
> 
> On Mon, 24 Oct 2016 at 19:41, Konstantin Gribov <gros...@gmail.com> wrote:
> 
>> Chris,
>> 
>> you have a new PGP key which is not present in your account in [1]. Could you
>> please update it there? Also, `KEYS` file in `tika-1.14-src.zip` contains
>> only your old PGP key.
>> 
>> SHA-1 and MD5 digests are fine, `tika-server` and `tika-app` work fine on
>> Arch Linux, OpenJDK 8u112 with and without Tesseract.
>> 
>> Build (`mvn clean package verify`) fails the same way as Julien Nioche and
>> Dave mentioned on Arch Linux with or without tesseract. I have no exiftool,
>> so I'll try to investigate what else makes `AutoDetectParser`
>> non-serializable. I hope I'll have a bit of time this evening for this.
>> 
>> Also, one test fails `testParserHandlingOfNonSerializable` because
>> exception message was `Unable to serialize [AutoDetectParser] to pass to
>> the Fork...` instead of `Unable to serialize [ParseContext] to pass to
>> the Fork...`. But it seems the same issue as above.
>> 
>> Both issues aren't strict blockers to me but I'd ask you to increase
>> voting time to dig into issue with non-serializable `AutoDetectParser` if
>> you don't mind.
>> 
>> [1]: https://people.apache.org/keys/committer/mattmann.asc
>> 
>> On Mon, 24 Oct 2016 at 16:15, David Meikle <loo...@gmail.com> wrote:
>> 
>> Hello,
>> 
>> I am getting the same as Julien without exiftool installed on my Mac.
>> Everything passes on Windows 10 and Ubuntu.
>> 
>> Will have a dig and see what I find.
>> 
>> Cheers,
>> Dave
>> 
>>> On 20 Oct 2016, at 13:34, Julien Nioche <lists.digitalpeb...@gmail.com> wrote:
>>> 
>>> Hi
>>> 
>>> Am getting the following when running 'mvn clean package', have I forgotten
>>> something obvious?
>>> 
>>> Julien
>>> 
>>> Failed tests:
>>>   ForkParserIntegrationTest.testParserHandlingOfNonSerializable:210 expected: but was:
>>> Tests in error:
>>>   ForkParserIntegrationTest.testAttachingADebuggerOnTheForkedParserShouldWork:234 » Tika
>>>   ForkParserIntegrationTest.testForkedPDFParsing:257 » Tika Unable to serialize ...
>>>   ForkParserIntegrationTest.testForkedTextParsing:66 » Tika Unable to serialize ...
>>>
>>> Tests run: 755, Failures: 1, Errors: 3, Skipped: 17
>>>
>>> [INFO] Reactor Summary:
>>> [INFO]
>>> [INFO] Apache Tika parent  SUCCESS [4.368s]
>>> [INFO] Apache Tika core .. SUCCESS [16.487s]
>>> [INFO] Apache Tika parsers ... FAILURE [4:54.631s]
>>> 
>>> 
>>> 
>>> On 19 October 2016 at 19:48, Chris Mattmann <mattm...@apache.org> wrote:
>>> 
>>>> Hi Folks,
>>>> 
>>>> A first candidate for the Tika 1.14 release is available at:
>>>> 
>>>> https://dist.apache.org/repos/dist/dev/tika/
>>>> 
>>>> The release candidate is a zip archive of the sources in:
>>>> 
>>>> https://git-wip-us.apache.org/repos/asf?p=tika.git;a=tree;hb=
>>>> 687d7706c9778e4f49f2834a07e5a9d99b23042b
>>>> 
>>>> The SHA1 checksum of the archive is:
>>>> ad9152392ffe6b620c8102ab538df0579b36c520
>>>> 
>>>> In addition, a staged maven repository is available here:
>>>> 
>>>> https://repository.apache.org/content/repositories/orgapachetika-1020/
>>>> 
>>>> Please vote on releasing this package as Apache Tika 1.14.
>>>> The vote is open for the next 72 hours and passes if a majority of at
>>>> least three +1 Tika PMC votes are cast.
>>>> 
>>>> [ ] +1 Release this package as Apache Tika 1.14
>>>> [ ] -1 Do not release this package because..
>>>> 
>>>> Cheers,
>>>> Chris
>>>> 
>>>> P.S. Of course here is my +1.
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>> 
>>> 
>>> --
>>> 
>>> *Open Source Solutions for Text Engineering*
>>> 
>>> http://www.digitalpebble.com
>>> http://digitalpebble.blogspot.com/
>>> #digitalpebble <http://twitter.com/digitalpebble>
>> 
>> --
>> 
>> Best regards,
>> Konstantin Gribov
>> 
> -- 
> 
> Best regards,
> Konstantin Gribov



Re: [VOTE] Apache Tika 1.14 Release Candidate #1

2016-10-24 Thread David Meikle
Hello,

I am getting the same as Julien without exiftool installed on my Mac. 
Everything passes on Windows 10 and Ubuntu.

Will have a dig and see what I find.

Cheers,
Dave

> On 20 Oct 2016, at 13:34, Julien Nioche  wrote:
> 
> Hi
> 
> Am getting the following when running 'mvn clean package', have I forgotten
> something obvious?
> 
> Julien
> 
> Failed tests:
>   ForkParserIntegrationTest.testParserHandlingOfNonSerializable:210 expected: but was:
> Tests in error:
>   ForkParserIntegrationTest.testAttachingADebuggerOnTheForkedParserShouldWork:234 » Tika
>   ForkParserIntegrationTest.testForkedPDFParsing:257 » Tika Unable to serialize ...
>   ForkParserIntegrationTest.testForkedTextParsing:66 » Tika Unable to serialize ...
>
> Tests run: 755, Failures: 1, Errors: 3, Skipped: 17
>
> [INFO] Reactor Summary:
> [INFO]
> [INFO] Apache Tika parent  SUCCESS [4.368s]
> [INFO] Apache Tika core .. SUCCESS [16.487s]
> [INFO] Apache Tika parsers ... FAILURE [4:54.631s]
> 
> 
> 
> On 19 October 2016 at 19:48, Chris Mattmann  wrote:
> 
>> Hi Folks,
>> 
>> A first candidate for the Tika 1.14 release is available at:
>> 
>>  https://dist.apache.org/repos/dist/dev/tika/
>> 
>> The release candidate is a zip archive of the sources in:
>> 
>> https://git-wip-us.apache.org/repos/asf?p=tika.git;a=tree;hb=
>> 687d7706c9778e4f49f2834a07e5a9d99b23042b
>> 
>> The SHA1 checksum of the archive is:
>> ad9152392ffe6b620c8102ab538df0579b36c520
>> 
>> In addition, a staged maven repository is available here:
>> 
>> https://repository.apache.org/content/repositories/orgapachetika-1020/
>> 
>> Please vote on releasing this package as Apache Tika 1.14.
>> The vote is open for the next 72 hours and passes if a majority of at
>> least three +1 Tika PMC votes are cast.
>> 
>> [ ] +1 Release this package as Apache Tika 1.14
>> [ ] -1 Do not release this package because..
>> 
>> Cheers,
>> Chris
>> 
>> P.S. Of course here is my +1.
>> 
>> 
>> 
>> 
>> 
>> 
> 
> 
> -- 
> 
> *Open Source Solutions for Text Engineering*
> 
> http://www.digitalpebble.com
> http://digitalpebble.blogspot.com/
> #digitalpebble 



[ANNOUNCE] Apache Tika 1.13 release

2016-05-16 Thread David Meikle
The Apache Tika project is pleased to announce the release of Apache
Tika 1.13. The release contents have been pushed out to the main
Apache release site and to the Central sync, so the releases should
be available as soon as the mirrors get the syncs.

Apache Tika is a toolkit for detecting and extracting metadata and
structured text content from various documents using existing parser
libraries.

Apache Tika 1.13 contains a number of improvements and bug fixes.
Details can be found in the changes file:
http://www.apache.org/dist/tika/CHANGES-1.13.txt 


Apache Tika is available in source form from the following download
page: http://www.apache.org/dyn/closer.cgi/tika/apache-tika-1.13-src.zip 


Apache Tika is also available in binary form or for use using Maven 2 from
the Central Repository:
http://repo1.maven.org/maven2/org/apache/tika/ 


In the initial 48 hours, the release may not be available on all
mirrors. When downloading from a mirror site, please remember to
verify the downloads using signatures found on the Apache site:
https://people.apache.org/keys/group/tika.asc 


For more information on Apache Tika, visit the project home page:
http://tika.apache.org 

— Dave Meikle, on behalf of the Apache Tika community

Re: [RESULT] [VOTE] Release Apache Tika 1.13 Candidate #1

2016-05-16 Thread David Meikle
A quick update: following the Nexus outage, the repo has been released and I
have the website prepared and ready to go, but the artefacts are not syncing to
www.apache.org/dist/tika.  I have raised a ticket with the Infra team
(INFRA-11869 <https://issues.apache.org/jira/browse/INFRA-11869>). Does anyone
know a workaround?

Cheers,
Dave

> On 15 May 2016, at 11:27, David Meikle <dmei...@apache.org> wrote:
> 
> Hello,
> 
> This VOTE has PASSED with the following tally:
> 
> +1
> Tim Allison*
> Konstantin Gribov*
> Lewis John McGibbney*
> David Meikle*
> 
> * = Tika PMC
> 
> I will finish the remaining tasks now.
> 
> Cheers,
> Dave
> 
>> Begin forwarded message:
>> 
>> From: David Meikle <dmei...@apache.org>
>> Subject: [VOTE] Release Apache Tika 1.13 Candidate #1
>> Date: 9 May 2016 at 20:34:32 BST
>> To: dev@tika.apache.org, u...@tika.apache.org
>> 
>> A candidate for the Tika 1.13 release is available at:
>>  https://dist.apache.org/repos/dist/dev/tika/
>> 
>> The release candidate is a zip archive of the sources in:
>>  https://git-wip-us.apache.org/repos/asf?p=tika.git;a=tag;h=18fa8213438183a249df4f52535031670f0a3eef
>> 
>> The SHA1 checksum of the archive is
>>  8a591e7ea29dca14d5f25b44b3a2a35425676c64.
>> 
>> In addition, a staged maven repository is available here:
>>  https://repository.apache.org/content/repositories/orgapachetika-1019/org/apache/tika
>> 
>> Please vote on releasing this package as Apache Tika 1.13.
>> The vote is open for the next 72 hours and passes if a majority of at
>> least three +1 Tika PMC votes are cast.
>> 
>> [ ] +1 Release this package as Apache Tika 1.13
>> [ ] -1 Do not release this package because…
>> 
>> Here is my +1 for the release.
>> 
>> Cheers,
>> Dave
>> 
>> P.S. For anyone looking to test using the Apache Tika Server I have put up a 
>> branch that pulls down the RC at 
>> https://github.com/LogicalSpark/docker-tikaserver/tree/1.13rc1



[RESULT] [VOTE] Release Apache Tika 1.13 Candidate #1

2016-05-15 Thread David Meikle
Hello,

This VOTE has PASSED with the following tally:

+1
Tim Allison*
Konstantin Gribov*
Lewis John McGibbney*
David Meikle*

* = Tika PMC

I will finish the remaining tasks now.

Cheers,
Dave

> Begin forwarded message:
> 
> From: David Meikle <dmei...@apache.org>
> Subject: [VOTE] Release Apache Tika 1.13 Candidate #1
> Date: 9 May 2016 at 20:34:32 BST
> To: dev@tika.apache.org, u...@tika.apache.org
> 
> A candidate for the Tika 1.13 release is available at:
>  https://dist.apache.org/repos/dist/dev/tika/
> 
> The release candidate is a zip archive of the sources in:
>  
> https://git-wip-us.apache.org/repos/asf?p=tika.git;a=tag;h=18fa8213438183a249df4f52535031670f0a3eef
> 
> The SHA1 checksum of the archive is
>  8a591e7ea29dca14d5f25b44b3a2a35425676c64.
> 
> In addition, a staged maven repository is available here:
>  
> https://repository.apache.org/content/repositories/orgapachetika-1019/org/apache/tika
> 
> Please vote on releasing this package as Apache Tika 1.13.
> The vote is open for the next 72 hours and passes if a majority of at
> least three +1 Tika PMC votes are cast.
> 
> [ ] +1 Release this package as Apache Tika 1.13
> [ ] -1 Do not release this package because…
> 
> Here is my +1 for the release.
> 
> Cheers,
> Dave
> 
> P.S. For anyone looking to test using the Apache Tika Server I have put up a 
> branch that pulls down the RC at 
> https://github.com/LogicalSpark/docker-tikaserver/tree/1.13rc1



[VOTE] Release Apache Tika 1.13 Candidate #1

2016-05-09 Thread David Meikle
A candidate for the Tika 1.13 release is available at:
  https://dist.apache.org/repos/dist/dev/tika/

The release candidate is a zip archive of the sources in:
  
https://git-wip-us.apache.org/repos/asf?p=tika.git;a=tag;h=18fa8213438183a249df4f52535031670f0a3eef

The SHA1 checksum of the archive is
  8a591e7ea29dca14d5f25b44b3a2a35425676c64.

In addition, a staged maven repository is available here:
  
https://repository.apache.org/content/repositories/orgapachetika-1019/org/apache/tika

Please vote on releasing this package as Apache Tika 1.13.
The vote is open for the next 72 hours and passes if a majority of at
least three +1 Tika PMC votes are cast.

[ ] +1 Release this package as Apache Tika 1.13
[ ] -1 Do not release this package because…

Here is my +1 for the release.

Cheers,
Dave

P.S. For anyone looking to test using the Apache Tika Server I have put up a 
branch that pulls down the RC at 
https://github.com/LogicalSpark/docker-tikaserver/tree/1.13rc1
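
For anyone checking the candidate by hand, here is a minimal sketch of the
checksum comparison in Python. The archive file name is not given in the vote
mail, so the path is whatever you saved the download as; the expected digest
is the SHA1 value quoted in the mail. The helper names are made up for this
illustration.

```python
import hashlib


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-1 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large archives do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex):
    """Compare against a digest pasted from a mail.

    Surrounding whitespace, upper/lower case and a trailing full stop
    (as in "...676c64.") are ignored.
    """
    cleaned = expected_hex.strip().rstrip(".").lower()
    return sha1_of_file(path) == cleaned
```

Calling matches_expected() with the local path of the downloaded zip and the
digest string from the vote mail returns True only when the two agree. This
checks integrity only; the signature still needs to be verified separately.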

Re: pre-release 1.13 regression testing

2016-05-03 Thread David Meikle

> On 2 May 2016, at 18:12, Mattmann, Chris A (3980) wrote:
> 
> I’m in Hawaii on vacation so please push forward ;)

I now officially hate you Chris!

Have a great break,
Dave

Re: pre-release 1.13 regression testing

2016-05-02 Thread David Meikle
Hi Tim,

> On 2 May 2016, at 12:29, Allison, Timothy B.  wrote:
> 
> Dave,
> Find any showstoppers?
> 
> All,
> Anyone have time to cut the release?
> 
>   Cheers,
> 
>Tim

Everything is looking good here - been trying it out in production.

I can cut the release today or tomorrow, unless there are any objections.

Cheers,
Dave

Re: [VOTE] Apache Tika 1.12 Release Candidate #1

2016-01-30 Thread David Meikle
Hello,

> On 25 Jan 2016, at 19:58, Mattmann, Chris A (3980) wrote:
> 
> Please vote on releasing this package as Apache Tika 1.12.
> The vote is open for the next 72 hours and passes if a majority of at
> least three +1 Tika PMC votes are cast.
> 
> [ ] +1 Release this package as Apache Tika 1.12
> [ ] -1 Do not release this package because…

+1 from me.

Cheers,
Dave

Re: [DISCUSS] Moving to Git

2015-11-19 Thread David Meikle
Hey,

> On 18 Nov 2015, at 14:46, Mattmann, Chris A (3980) wrote:
> 
> Hey Team,
> 
> I propose we move to writeable git repos for Tika for our repository.
> I mostly interact with Git & Github nowadays even with Tika using the
> mirroring and PR interaction support.
> 
> Thoughts?
> 
> Cheers,
> Chris

+1 from me. 

Cheers,
Dave

Re: [VOTE] Apache Tika 1.11 Release Candidate #1

2015-10-24 Thread David Meikle
Hello,

> On 19 Oct 2015, at 15:23, Mattmann, Chris A (3980) wrote:
> 
> A first candidate for the Tika 1.11 release is available at:
> 
>  https://dist.apache.org/repos/dist/dev/tika/ 
> 
> 
> The release candidate is a zip archive of the sources in:
>  http://svn.apache.org/repos/asf/tika/tags/1.11-rc1/ 
> 
> 
> The SHA1 checksum of the archive is
> d0dde7b3a4f1a2fb6ccd741552ea180dddab630a
> 
> In addition, a staged maven repository is available here:
> 
> https://repository.apache.org/content/repositories/orgapachetika-1014/ 
> 
> 
> 
> Please vote on releasing this package as Apache Tika 1.11.
> The vote is open for the next 72 hours and passes if a majority of at
> least three +1 Tika PMC votes are cast.
> 
> [ ] +1 Release this package as Apache Tika 1.11
> [ ] -1 Do not release this package because…

+1 from me. Build and tests pass on OS X and Windows. Sigs good. I get the same 
non-trusted signature though.

Cheers,
Dave

[CVE-2015-3271] Apache Tika information disclosure vulnerability

2015-08-13 Thread David Meikle
CVE-2015-3271: Apache Tika information disclosure vulnerability 

Severity: Important

Vendor:
The Apache Software Foundation

Versions Affected:
Apache Tika 1.9

Description:

Apache Tika provides optional functionality to run itself as a web service to
allow remote use. When used in this manner, it's possible for a 3rd party to
pass a 'fileUrl' header to the Apache Tika Server (tika-server). This header
lets a remote client request that the server fetches content from the URL
provided, including files from the server's local filesystem. Depending on the
file permissions set on the local filesystem, this could be used to return
sensitive content from the server machine.

Note this vulnerability only exists if you are running the tika-server version
1.9, and you allow un-trusted access to the tika-server URL. Usage of Apache
Tika as a standard library is not affected.

Mitigation:
Apache Tika 1.9 users should upgrade to Apache Tika 1.10

Example:
wget https://repo1.maven.org/maven2/org/apache/tika/tika-server/1.9/tika-server-1.9.jar
java -jar tika-server-1.9.jar
curl -i -H "fileUrl:file:///etc/passwd" -H "Accept: text/plain" -X PUT http://localhost:9998/tika

Credit:
This issue was discovered by Tim Allison from the Apache Tika Community.
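
For readers who run their own service that fetches from a client-supplied URL,
a minimal sketch of the general defence against this class of issue: allow
only an explicit list of URL schemes before fetching. This is an illustration
in Python and not a description of the change shipped in Tika 1.10; the
function name and the allowed scheme list are assumptions.

```python
from urllib.parse import urlparse

# Assumption for this sketch: only remote web fetches are ever wanted.
ALLOWED_SCHEMES = {"http", "https"}


def is_safe_fetch_url(url):
    """Return True only for absolute http(s) URLs that name a host.

    Anything else (file:, bare paths, unknown schemes) is rejected so that a
    client cannot point the server at its own filesystem.
    """
    parsed = urlparse(url.strip())
    return parsed.scheme.lower() in ALLOWED_SCHEMES and bool(parsed.netloc)
```

With this check, the header value from the example above
(file:///etc/passwd) is rejected before any fetch happens. Note that a scheme
check alone does not stop requests to internal hosts over http, so untrusted
access to such an endpoint still needs to be restricted.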

[RESULT][VOTE] Apache Tika 1.10 Release Candidate #1

2015-08-08 Thread David Meikle
Hi Everyone,

Thanks everyone for their votes.  The VOTE to release Tika 1.10 RC #1 has
passed with the following tally:

+1:
Dave Meikle*
Sergey Beryozkin*
Tim Allison*
Konstantin Gribov*
Chris Mattmann*
Oleg Tikhonov*
Ken Krugler*
Tyler Palsulich*
Hong-Thai Nguyen*

±0:
None

-1:
None

* = PMC Member

I'll push out the release now.

Cheers,
Dave
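
As an aside, the pass rule quoted in the vote mails ("a majority of at least
three +1 Tika PMC votes") can be sketched as below. This is one reading of the
rule, namely at least three binding +1 votes and more binding +1 than binding
-1 votes; the function name and the tuple layout are made up for illustration.

```python
def vote_passes(votes):
    """Decide a release vote.

    `votes` is an iterable of (voter, value, is_pmc) tuples where value is
    +1, 0 or -1. Only PMC (binding) votes are counted.
    """
    binding = [value for _, value, is_pmc in votes if is_pmc]
    plus = sum(1 for value in binding if value == 1)
    minus = sum(1 for value in binding if value == -1)
    # At least three binding +1s, and they must outnumber the binding -1s.
    return plus >= 3 and plus > minus
```

For the tally above (nine binding +1 votes, no ±0 and no -1) this returns
True.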


[ANNOUNCE] Apache Tika 1.10 release

2015-08-08 Thread David Meikle
The Apache Tika project is pleased to announce the release of Apache Tika 1.10. 
The release contents have been pushed out to the main Apache release site and 
to the Central sync, so the releases should be available as soon as the mirrors 
get the syncs.

Apache Tika is a toolkit for detecting and extracting metadata and structured 
text content from various documents using existing parser libraries.

Apache Tika 1.10 contains a number of improvements and bug fixes. Details can 
be found in the changes file:
http://www.apache.org/dist/tika/CHANGES-1.10.txt

Apache Tika is available in source form from the following download page:
http://www.apache.org/dyn/closer.cgi/tika/apache-tika-1.10-src.zip

Apache Tika is also available in binary form or for use using Maven 2 from
the Central Repository: http://repo1.maven.org/maven2/org/apache/tika/

In the initial 48 hours, the release may not be available on all mirrors.
When downloading from a mirror site, please remember to verify the downloads 
using signatures found on the Apache site:
https://people.apache.org/keys/group/tika.asc

For more information on Apache Tika, visit the project home page:
http://tika.apache.org/

-- David Meikle, on behalf of the Apache Tika community



Re: 1.10 release missing license headers noted by Daniel Gruno

2015-08-07 Thread David Meikle
Hi Guys,

Apologies for this, I should have picked this up when rolling the release.  
Echoing Chris’s thanks for the catch on this Daniel.

Was about to go in and add in the plugin but see you’ve done this Nick.

Given the successful VOTE on 1.10, I will push this out unless anyone shouts 
differently.

Cheers,
Dave

> On 6 Aug 2015, at 20:32, Mattmann, Chris A (3980) <chris.a.mattm...@jpl.nasa.gov> wrote:
>
> From Twitter:
> https://paste.apache.org/1CPH
>
> Don’t have to fix now, but would be good to fix for 1.11.
>
> Cheers,
> Chris
>
> P.S. Thanks for the catch Daniel!
>
> ++
> Chris Mattmann, Ph.D.
> Chief Architect
> Instrument Software and Science Data Systems Section (398)
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 168-519, Mailstop: 168-527
> Email: chris.a.mattm...@nasa.gov
> WWW:  http://sunset.usc.edu/~mattmann/
> ++
> Adjunct Associate Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++



[VOTE] Apache Tika 1.10 Release Candidate #1

2015-08-02 Thread David Meikle
Hi Everyone,

A candidate for the Apache Tika 1.10 release is available at:

https://dist.apache.org/repos/dist/dev/tika/

The release candidate is a zip archive of the sources in:

http://svn.apache.org/repos/asf/tika/tags/1.10-rc1/

The SHA1 checksum of the archive is

b1573adcb194e2c09b77eccc3b1edd16bd4ac67d.

In addition, a staged maven repository is available here:

https://repository.apache.org/content/repositories/orgapachetika-1013


Please vote on releasing this package as Apache Tika 1.10.
The vote is open for the next 72 hours and passes if a majority of at least
three +1 Tika PMC votes are cast.

[ ] +1 Release this package as Apache Tika 1.10

[ ] -1 Do not release this package because...

Here is my +1!

Cheers,
Dave


Re: release Tika 1.10?

2015-07-30 Thread David Meikle
Hey,

> On 28 Jul 2015, at 19:08, Allison, Timothy B. <talli...@mitre.org> wrote:
>
> With Konstantin's and Bob's fix of TIKA-1524, I think we're in good shape for 
> 1.10...from my perspective

Been running some tests locally on a private set I have and it is looking good 
here too.

Will start rolling this today!

Cheers,
Dave

Re: release Tika 1.10?

2015-07-26 Thread David Meikle

> On 23 Jul 2015, at 14:07, Allison, Timothy B. <talli...@mitre.org> wrote:
>
> With the fix of TIKA-1690, I think it makes sense to roll a new release 
> (1.10) in the next week or so.  I'd like to get TIKA-1667 (upgrade poi) in 
> before the release.  Are there any other blockers on 1.10?

+1 from me too.  As discussed on private, I will roll the release on Tuesday 
night (UK Time) to give people time to shout for other candidates.

Cheers,
Dave

Travel Assistance for ACEU closes tomorrow!

2015-07-16 Thread David Meikle
Hi Folks,

A little reminder from the Travel Assistance Committee on the ApacheCon EU 
Travel Assistance Deadline.

Hope to see some fellow Tika community members in Budapest.

Cheers,
Dave

--

Hi All,

This is a reminder that applications are currently open for Travel Assistance
to go to ApacheCon EU Budapest this coming September/October.

Applications close tomorrow night so if you have not applied yet and intend to
do so, please act now!

For those that have submitted talks for this event and have not heard back as
to whether or not they will be accepted, and who intend to apply for
assistance based on getting their talks accepted: please DO apply for
assistance now anyway. Should your talk not be accepted, your assistance
application can be cancelled.

See http://apache.org/travel for more info.
See https://cwiki.apache.org/confluence/display/TAC/Application+Criteria for
more about the process.

Thanks and hope to see you all in Budapest!

Gav… (On behalf of the Travel Assistance Committee)

Re: [VOTE] Release Apache Tika 1.9 Candidate #2

2015-06-09 Thread David Meikle
Hi Chris,

> On 7 Jun 2015, at 02:46, Mattmann, Chris A (3980) <chris.a.mattm...@jpl.nasa.gov> wrote:
>
> Please vote on releasing this package as Apache Tika 1.9.
> The vote is open for the next 72 hours and passes if a majority of at
> least three +1 Tika PMC votes are cast.
>
> [ ] +1 Release this package as Apache Tika 1.9
> [ ] -1 Do not release this package because…

+1 from me!  Nice work :)

Cheers,
Dave

Re: [VOTE] Release Apache Tika 1.9 Candidate #1

2015-06-06 Thread David Meikle
Hey Chris,

> On 1 Jun 2015, at 06:38, Mattmann, Chris A (3980) <chris.a.mattm...@jpl.nasa.gov> wrote:
>
> Please vote on releasing this package as Apache Tika 1.9.
> The vote is open for the next 72 hours and passes if a majority of at
> least three +1 Tika PMC votes are cast.
>
> [ ] +1 Release this package as Apache Tika 1.9
> [ ] -1 Do not release this package because…

Thanks for preparing this, lots of great stuff in this one.

+1 from me.

Cheers,
Dave

Re: [VOTE] Release Apache Tika 1.8 Candidate #1

2015-04-10 Thread David Meikle

> On 10 Apr 2015, at 11:38, Allison, Timothy B. <talli...@mitre.org> wrote:
>
> I agree that the ODT issue might require a respin.  What do others think?

+1 for re-spin.

 
> Unfortunately, there might be 2 odt docs (mime type: 
> “application/vnd.oasis.opendocument.text”?) in govdocs1…so we wouldn't see 
> that problem.
>
> I did do a comparison of 1.7 vs 1.8-rc1, and the results are here:
>
> https://github.com/tballison/share/blob/master/tika_comparisons/tika_1_7_v_1_8-rc1.zip
>
> I encourage folks (if you haven't, and if you care :) ) to take a look and 
> see if you see something that I don’t.

Thanks for this Tim.  About to get on a flight, so will check through on that.

Cheers,
Dave



[ANNOUNCE] Welcome Giuseppe Totaro As Tika Committer + PMC Member

2015-04-09 Thread David Meikle
Hello All,

Please welcome Giuseppe Totaro as he joins us as the latest Tika committer and 
PMC Member.

He's recently been VOTEd in and now has his account all set up so is ready to 
roll!

Giuseppe, please feel free to say a bit about yourself as an introduction to 
the group.

Welcome aboard,
Dave

Re: [VOTE] Release Apache Tika 1.8 Candidate #1

2015-04-08 Thread David Meikle
Hey Tyler,

> On 7 Apr 2015, at 19:54, Tyler Palsulich <tpalsul...@apache.org> wrote:
>
> [ ] +1 Release this package as Apache Tika 1.8
> [ ] -1 Do not release this package because...

Whilst my testing with the release is good so far on Mac and Linux with Windows 
to go, and I am inclined to +1, it would be good if you were able to get your 
code signing key signed by someone nearby to avoid the warning below?

amadeaus-air:release david$ gpg --verify tika-1.8-src.zip.asc 
gpg: Signature made Tue  7 Apr 19:45:15 2015 EDT using RSA key ID D4F10117
gpg: Good signature from Tyler Palsulich tpalsul...@apache.org
gpg: WARNING: This key is not certified with a trusted signature!
gpg:  There is no indication that the signature belongs to the owner.
Primary key fingerprint: 1D32 9CC2 D69C 821B FBE4  183E 8810 BB19 D4F1 0117

Not sure if Chris, Lewis et al are near you and could do this quickly?

Cheers,
Dave

Re: Any interest in running Apache Tika as part of CommonCrawl?

2015-04-04 Thread David Meikle
Hey Tim,

+1 from me, I think this would be great to do.

Cheers,
Dave


> On 3 Apr 2015, at 08:35, tallison314...@gmail.com wrote:
>
> All,
>   What do we think?



Re: [DISCUSS] Tika 1.8 or 1.7.1

2015-03-29 Thread David Meikle
+1 for 1.8

> On 28 Mar 2015, at 15:01, Tyler Palsulich <tpalsul...@apache.org> wrote:
>
> Should we release this as 1.8 or 1.7.1?



Re: wiki access

2015-03-25 Thread David Meikle
Hello, 

> On 25 Mar 2015, at 21:17, Annie Burgess <anniebry...@gmail.com> wrote:
>
> Oops, user name : AnnieBurgess .

I have changed this now.  Enjoy :-)

Cheers,
Dave

Re: [VOTE] Apache Tika 1.7 Release

2015-01-14 Thread David Meikle
Hi Tyler,

> On 9 Jan 2015, at 22:02, Tyler Palsulich <tpalsul...@apache.org> wrote:
>
> A candidate for the Tika 1.7 release is available at:
> https://dist.apache.org/repos/dist/dev/tika/
>
> The release candidate is a zip archive of the sources in:
> http://svn.apache.org/repos/asf/tika/tags/1.7-rc3/

+1 from me.

Cheers,
Dave

Re: [VOTE] Apache Tika 1.7 Release

2015-01-07 Thread David Meikle
-1 on this for me too as there is a small unit test failure from ODFParser
on Windows from TIKA-1412.

I have added the tweak to fix this on trunk.

(I have also tested the latest changes added by Tim and Tyler in TIKA-1445
on Windows, Mac and Ubuntu with a decent batch of files, and everything is
working nicely at this end.)

On 7 January 2015 at 01:11, Allison, Timothy B. <talli...@mitre.org> wrote:

> -1
>
> I'm sorry that I haven't had a chance to kick the tires on the recent
> changes to the metadata extraction from images until now, but it looks like
> 1.7-rc2 and trunk are not pulling metadata from embedded images.
>
> I've posted a test file from govdocs1 to TIKA-1445.  I may have time
> tomorrow to see what's going on.  I should also have time tomorrow to
> finish the analysis of the comparison between 1.6 and 1.7 on govdocs1.
>
> Sorry for my delay, all!  And even greater apologies if user error is at
> fault and metadata is successfully being extracted from embedded images. :)
>
> Thank you, Tyler, for running this release!
>
>
> -Original Message-
> From: Nick Burch [mailto:apa...@gagravarr.org]
> Sent: Tuesday, January 06, 2015 11:36 AM
> To: dev@tika.apache.org
> Subject: Re: [VOTE] Apache Tika 1.7 Release
>
> On Tue, 6 Jan 2015, Tyler Palsulich wrote:
>> A candidate for the Tika 1.7 release is available at:
>> https://dist.apache.org/repos/dist/dev/tika/
>>
>> The release candidate is a zip archive of the sources in:
>> http://svn.apache.org/repos/asf/tika/tags/1.7-rc2/
>>
>> The SHA1 checksum of the archive is
>> 0307a8367ae6f8b1103824fd11337fd89e24e6a4.
>>
>> In addition, a staged maven repository is available here:
>> https://repository.apache.org/content/repositories/orgapachetika-1006/org/apache/tika/
>
> Looks good to me, I'm +1
>
> Nick



Re: 1.7 release?

2015-01-02 Thread David Meikle

> On 22 Dec 2014, at 18:57, Tyler Palsulich <tpalsul...@gmail.com> wrote:
>
> Hi All,
>
> Nick added the temporary fix for TIKA-1445 and made the POI updates for
> TIKA-1469 (thanks!). And, I'll volunteer to be the Release Manager for 1.7!
> :)
>
> I'll start the process this weekend or a couple days into the new year.

Nice one Tyler!

Cheers,
Dave

Re: Kill Buildbot Builds

2014-12-05 Thread David Meikle
Hi Lewis,

+1 from me.  

Cheers,
Dave

> On 2 Dec 2014, at 17:42, Lewis John Mcgibbney <lewis.mcgibb...@gmail.com> wrote:
>
> Hi Folks,
> Wanted to poll the dev@ list and see if the Buildbot builds at ci.apache.org
> are required?
> We have nightly and also hourly polling builds for Tika trunk against JDK
> 1.6 and 1.7. Failures and unable builds are shadowed to dev@, so AFAIAC we
> are *covered* for CI builds including SNAPSHOT deployments and Javadoc
> updates.
> Any comments before I disable the buildbot builds and save Infra some
> resources.
> Ta
> Lewis
>
> -- 
> *Lewis*



Re: TIKA-1445 and having multiple Parsers (as many as needed) work on the same MediaType

2014-11-19 Thread David Meikle
Hi Guys,

> On 18 Nov 2014, at 16:52, Allison, Timothy B. <talli...@mitre.org> wrote:
>
> Chris,
>   Thank you for moving this to the dev list.  This would be a fairly large 
> change, and the discussion is valuable.

Given the potential implications of the change, I am wondering if it is worth 
scheduling a Google Hangout / Conference Call / IRC session to chat through 
things once we have all had time to flesh our thoughts out?

I am happy to facilitate setting this up and documenting it (meeting notes), so 
we can include outputs on the list for further discussion and subsequent formal 
decision making with everyone involved.

Cheers,
Dave

Re: svn commit: r1640535 - /tika/trunk/tika-server/src/main/java/org/apache/tika/server/TikaResource.java

2014-11-19 Thread David Meikle
Hey Guys,

> On 19 Nov 2014, at 17:09, Tyler Palsulich <tpalsul...@gmail.com> wrote:
>
> Found it! http://markmail.org/message/42nc64tdyhvzaril
>
> Looks like javax, java, then other. I'll update the site today.

Sorry, this is a clean install here and I didn’t update the settings (or
notice).  Will set my config now and tidy up imports in recent commits.

Cheers,
Dave
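
For reference, the grouping Tyler quotes ("javax, java, then other") can be
sketched as a sort key. This is an illustration in Python that works on fully
qualified import names; ordering alphabetically inside each group is an
assumption here, not something stated in the thread, and the function names
are made up.

```python
def import_group(name):
    """Rank an import: javax.* first, then java.*, then everything else."""
    if name.startswith("javax."):
        return 0
    if name.startswith("java."):
        return 1
    return 2


def order_imports(names):
    """Sort by group, then alphabetically within a group (assumed)."""
    return sorted(names, key=lambda name: (import_group(name), name))
```

In practice the IDE's import-order setting does this automatically once it is
configured with the same three groups.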

Re: svn commit: r1640017 - /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/ocr/TesseractOCRConfig.java

2014-11-17 Thread David Meikle

> On 17 Nov 2014, at 16:32, Hong-Thai Nguyen <thaicha...@gmail.com> wrote:
>
> I've pushed a minor fix to pass this test on Windows.

Thanks Hong-Thai, sorry about that!

Cheers,
Dave

Re: svn commit: r1640017 - /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/ocr/TesseractOCRConfig.java

2014-11-16 Thread David Meikle
Hi Chris,

> On 16 Nov 2014, at 19:14, Mattmann, Chris A (3980) <chris.a.mattm...@jpl.nasa.gov> wrote:
>
> Thanks, Dave. I think you forgot the default config file?

Yup, forgot the tests and example config from my change!  Just committed them.

I wasn't initially planning on including a default config, thinking that if you 
dropped a properties file on the class path it would use that, and otherwise it 
would go for the defaults, but I should probably add one to be consistent with the 
PDFParserConfig.
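
The fallback behaviour described above could look roughly like the sketch below. It is not the actual TesseractOCRConfig code: the class name, the property keys ("language", "timeout"), the default values and the resource name are all assumptions made for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical sketch of "use a properties file from the class path if
// one is present, otherwise fall back to built-in defaults".
class OcrConfigSketch {

    // Illustrative defaults; the real keys and values may differ.
    static final String DEFAULT_LANGUAGE = "eng";
    static final int DEFAULT_TIMEOUT_SECONDS = 120;

    final String language;
    final int timeoutSeconds;

    // Any key missing from props falls back to the built-in default.
    OcrConfigSketch(Properties props) {
        language = props.getProperty("language", DEFAULT_LANGUAGE);
        timeoutSeconds = Integer.parseInt(
                props.getProperty("timeout", String.valueOf(DEFAULT_TIMEOUT_SECONDS)));
    }

    // Loads the named class path resource if it exists; an absent file
    // simply leaves the Properties empty, so all defaults apply.
    static OcrConfigSketch load(String resourceName) {
        Properties props = new Properties();
        try (InputStream in = OcrConfigSketch.class.getResourceAsStream(resourceName)) {
            if (in != null) {
                props.load(in);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new OcrConfigSketch(props);
    }

    public static void main(String[] args) {
        OcrConfigSketch config = load("/ocr-sketch.properties");
        System.out.println(config.language + " " + config.timeoutSeconds);
    }
}
```

Shipping a default properties file alongside the class would make the defaults discoverable without changing this lookup logic.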

Cheers,
Dave

Re: Tika at ApacheCon Europe - 2 months time!

2014-09-25 Thread David Meikle
Hey Nick,

On 22 Sep 2014, at 23:21, Nick Burch n...@apache.org wrote:

 It's only 2 months to go until ApacheCon Europe in Budapest. I'm 
 simultaneously excited by all the great Tika stuff going on, and worried by 
 how many talks I need to finish writing...
 
 As usual for an ApacheCon, we've a number of talks about Tika going on, and 
 almost certainly a hackathon and/or meetup one evening. There's also lots of 
 related talks too, covering technologies that Tika builds on, and ones you 
 can use Tika with. For a full schedule, see:
 http://events.linuxfoundation.org/events/apachecon-europe/program/schedule

All signed up for ApacheCon EU this year, so looking forward to the talks and 
up for a Tika hackathon.

See you then,
Dave

Re: [VOTE] Release Apache Tika 1.6 RC #2

2014-09-03 Thread David Meikle
Hi Chris,

On 1 Sep 2014, at 06:16, Mattmann, Chris A (3980) 
chris.a.mattm...@jpl.nasa.gov wrote:

[ ] +1 Release this package as Apache Tika 1.6

+1 from me, working fine in a couple of projects I use it in.  Thanks for 
sticking with this one Chris!

Cheers,
Dave

Re: JAXRS, endpoints and a / welcome page - any ideas why it's broken?

2014-05-15 Thread David Meikle
Hi Nick,

On 7 May 2014, at 12:48, Nick Burch apa...@gagravarr.org wrote:

 Hi All
 
 One for our JAXRS gurus here…

OK, not a guru here but I have a hunch.

 At ApacheCon, we came up with the idea of having a welcome page on the Tika 
 Server, so that we could point people to it to try Tika, and let them 
 discover what it offered. Based on that, and the mailing list discussions, we 
 raised TIKA-1269.
 
 (Related to that is TIKA-1270, which aims to add endpoints similar to the 
 --list- ones the Tika CLI has, which is in progress)
 
 While we work out the best way to allow users to discover + learn about + try 
 the various REST endpoints on TIKA-1269, I've started with something basic. 
 This is done with the simple TikaWelcome class, which has a Path of /
 
 The problem - when the MetadataEP and UnpackerResource are enabled, it 
 doesn't work! With those two there, when you request / you get a 404 and the 
 server logs:
 org.apache.cxf.jaxrs.utils.JAXRSUtils findTargetMethod
 WARNING: No operation matching request path / is found, Relative Path: /, 
 HTTP Method: GET, ContentType: */*, Accept: */*,. Please enable FINE/TRACE 
 log level for more details.
 
 However, if you comment out those two endpoint classes from the 
 sf.setResourceClasses() call in TikaServerCLI, then the request gets 
 correctly routed to the welcome page.
 
 Neither MetadataEP nor UnpackerResource have a path that clashes, so I've no 
 idea why having them active stops / working. Any ideas?

I am having a look quickly whilst traveling but from peeking at the code it 
looks like the following to me:

* MetadataEP - there is no @Produces, which will fail in the introspection code 
on the TikaWelcome class
* UnpackerResource - as there is no class-level @Path, I suspect this is 
clashing with TikaWelcome, as it builds the routes with the method-level ones 
being placed on the root as well.

I don’t have time to test it just now but I wonder what would happen if you 
moved TikaWelcome to the top, above UnpackerResource?  If my hunch is 
correct it should make the / request work using the self-generated 
documentation.


 (Patch below if you want to try disabling them yourself to investigate)
 
 Nick
 

Cheers,
Dave



Re: Tika VM Service

2014-04-09 Thread David Meikle
+1 from me too.

I was actually starting to do a similar thing here in OpenShift:
https://github.com/Categorize/openshift-tika-cartridge

This started as a quick lightning talk at the end of an OpenShift session at my 
local JBoss Users Group, but I was planning to extend it to take a nightly build 
following a little tweak and then keep it hosted online.

Cheers,
Dave

On 9 Apr 2014, at 02:18, Lewis John Mcgibbney lewis.mcgibb...@gmail.com wrote:

 Hi Folks,
 I would like to propose that we get a Tika service up and running on a VM.
 Tika users can do ad hoc parsing, etc. and can do this based on possibly
 stable nightly SNAPSHOTs or alternatively based on the most recent stable
 release.
 Preferably, the service should provide a list of parsers and also
 MediaTypes supported.
 The service however should be documented.
 We have a sample service running Any23 which will provide you with an
 example of what the Tika service will be like.
 http://any23.org
 Does anyone have an objection to me logging a ticket with Infra to get a VM
 set up for this purpose?
 Thanks
 Lewis
 
 -- 
 *Lewis*



Re: Build failure at trunk in org.apache.tika.server.UnpackerResourceTest

2014-02-26 Thread David Meikle
Hi,

On 26 Feb 2014, at 14:57, Nick Burch apa...@gagravarr.org wrote:

 Is buildbot configured to build that module? Or does it perhaps skip the 
 server module?
 
 Nick

The build is configured to build everything apart from the .NET module but does 
not appear to have triggered for the past few weeks using the @hourly SCM poll.

Cheers,
Dave

[RESULT][VOTE] Apache Tika 1.5 RC2

2014-02-16 Thread David Meikle
Hi, 

On 9 Feb 2014, at 22:53, Dave Meikle loo...@gmail.com wrote:

 Please vote on releasing this package as Apache Tika 1.5.
 The vote is open for the next 72 hours and passes if a majority of at
 least three +1 Tika PMC votes are cast.
 
[ ] +1 Release this package as Apache Tika 1.5
[ ] -1 Do not release this package because...
 
 Here is my +1 for the release.

The vote passes as follows:

+1 Dave Meikle
+1 Julien Nioche 
+1 Oleg Tikhonov
+1 Sergey Beryozkin
+1 Hong-Thai Nguyen
+0 Nick Burch
+1 Chris Mattmann
+1 Michael McCandless
+1 Ken Krugler

All voters are Tika PMC members.
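
As a quick sanity check of the tally, here is a small sketch of the pass rule quoted above. It assumes "a majority of at least three +1 votes" means at least three +1 votes and more +1 than -1 votes; the class and method names are invented for illustration.

```java
import java.util.List;

// Hypothetical sketch of the release vote rule, under the stated assumption.
class VoteTallySketch {

    // Each vote is +1, 0 or -1. Passes with at least three +1 votes
    // and strictly more +1 than -1 votes.
    static boolean passes(List<Integer> pmcVotes) {
        long plus = pmcVotes.stream().filter(v -> v == 1).count();
        long minus = pmcVotes.stream().filter(v -> v == -1).count();
        return plus >= 3 && plus > minus;
    }

    public static void main(String[] args) {
        // The nine votes listed above: eight +1 and one +0.
        List<Integer> votes = List.of(1, 1, 1, 1, 1, 0, 1, 1, 1);
        System.out.println(passes(votes));
    }
}
```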

I note Nick’s +0 due to the fact that my new key has not been signed by anyone 
else. Assuming there are no objections to releasing with this, given the 
signature is valid, I will push out the release.  I will also try to meet up 
with some fellow committers in the near future.

Cheers,
Dave

Re: [VOTE] Apache Tika 1.5 RC2

2014-02-13 Thread David Meikle
Hi Guys,

On 13 Feb 2014, at 11:44, Michael McCandless luc...@mikemccandless.com wrote:

 I am also baffled by the presence of the -original JARs; I don't think
 we should release them?  Or are we planning to?

Sorry, this is my bad. The script I was using just copied the target folder 
contents up to people.apache.org.  I was not planning to release the -original 
jars, just the source ZIP and tika-app JAR alongside releasing the staging 
repository.

Cheers,
Dave

Re: [VOTE] Apache Tika 1.5 RC1

2014-02-05 Thread David Meikle
Hi Julien,

On 5 Feb 2014, at 09:49, Julien Nioche lists.digitalpeb...@gmail.com wrote:

 [ERROR] The build could not read 1 project - [Help 1]
 [ERROR]
 [ERROR]   The project org.apache.tika:tika-java7:1.5-SNAPSHOT
 (/data/tika-1.5/tika-java7/pom.xml) has 1 error
 [ERROR] Non-resolvable parent POM: Could not find artifact
 org.apache.tika:tika-parent:pom:1.5-SNAPSHOT and 'parent.relativePath'
 points at wrong local POM @ line 25, column 11 - [Help 2]
 [ERROR]
 
 *mvn -version*
 Apache Maven 3.0.4
 Maven home: /usr/share/maven
 Java version: 1.7.0_21, vendor: Oracle Corporation
 Java home: /usr/lib/jvm/java-7-openjdk-amd64/jre
 Default locale: en_GB, platform encoding: UTF-8
 OS name: linux, version: 3.5.0-17-generic, arch: amd64, family: unix
 
 Am I missing something?

I can see what is wrong.  The release plugin hasn’t changed the versions in the 
tika-java7 and tika-dotnet pom files as they are not included in the modules 
list in the main Tika pom file.  These will need to be updated manually and a 
new RC cut.
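
To catch this before the next RC, something like the sketch below could be run over each pom file's text. It is only an illustration using a plain regex on <version> elements, not a Maven feature; the class and method names are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical pre-release check: report <version> values in a POM's
// text that still end in -SNAPSHOT.
class SnapshotVersionCheckSketch {

    private static final Pattern VERSION =
            Pattern.compile("<version>\\s*([^<]*?)\\s*</version>");

    // Returns every <version> value that still ends in -SNAPSHOT.
    static List<String> snapshotVersions(String pomXml) {
        List<String> found = new ArrayList<>();
        Matcher m = VERSION.matcher(pomXml);
        while (m.find()) {
            if (m.group(1).endsWith("-SNAPSHOT")) {
                found.add(m.group(1));
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Shaped like the parent reference in the error Julien reported.
        String pom = "<project><parent><groupId>org.apache.tika</groupId>"
                + "<artifactId>tika-parent</artifactId>"
                + "<version>1.5-SNAPSHOT</version></parent></project>";
        System.out.println(snapshotVersions(pom));
    }
}
```

Any non-empty result for a module in a release tag would indicate a pom the release plugin skipped.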

Thanks for the spot!

Cheers,
Dave

Key Revocation

2014-02-04 Thread David Meikle
Hello,
(CC dev@tika.apache.org for info)

I have had to revoke my code signing key due to media failure.

Attached is the revocation certificate and I am following the steps here:
http://www.apache.org/dev/release-signing.html#revoke-cert

Cheers,
Dave

-----BEGIN PGP PUBLIC KEY BLOCK-----
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: A revocation certificate should follow

iQIfBCABAgAJBQJLrjSDAh0CAAoJEGBlJ+auqMarJnsP/j6E+hQ9vkdvMncXbQXX
9auQeI0tRQDvgoKmQYk9T16QyhXANZoJDzuLEmE/8kMNwr0U5ay3lEV0KsJZe+z8
fnsEmfNoV6xACNwa/DT4V4dQnVvy++K9z8CndX/3QNimduvuKzCZhEbMltwdQSYh
Lr7JUWBerayMD3XR8Jl+bYnU47FapI4pgDNOKbLsdhEPhlEcZUhqBy/8d/p/NjI6
NLzOpBieSbYhYYh5O8wjR0JJw5gtYf//IO7GQYoAGzbGLa9m/MCsNSNU9M3HYQs0
dnBYxS3yGOk+rIRrfb/MR6ySjfSLiNb6IvSrSQ2oaJfjm3NnBYD4z1QcAq+JmfFm
b0Zq7uAEaxrovzkFX3mwNmxeoMTqJP0nr6napU5y7maJYzsLO/tHf+NeIiZIuoK8
Q/GQE1QJW3nQ2/LOWwAyXTLoMR6/IP+HO9Lk6ad9hXPswqPKw6vGTJrMJJS+eyTR
AFoxY8nQ24rkE6uT37hZUPeoZdZ6sGDIJBDWUXTez/U/TgCFrbpoMhMC2oLOWdA0
Qr/vgz6wdb/8x/CUe49eKzdXhIacAI7aYXtLOQwVprARDTUqSrfB9ijwz80BkrL1
OWqgI1iHcsQD0e8eqzVQCndehvLMhezRQ2NSQyZNF0AsBP39OtyK5ATUNrrq6g8L
raF/l9VwsC/63dtfJyvq0sLE
=Q/66
-----END PGP PUBLIC KEY BLOCK-----



Re: [DISCUSS] Prepare Release 1.5?

2014-01-29 Thread David Meikle
Hi, 

On 27 Jan 2014, at 15:23, Allison, Timothy B. talli...@mitre.org wrote:

 Fix to TIKA-1226 committed. 

Thanks Tim, will prepare the release now.

Cheers,
Dave

Re: [DISCUSS] Prepare Release 1.5?

2014-01-25 Thread David Meikle
Hi,

On 14 Jan 2014, at 22:46, Stefano Fornari stefano.forn...@gmail.com wrote:

 Currently, tika 1.4 has a serious bug that makes it hang with partial
 mp3s, so it can be quite bad in production. tika 1.5 fixes it, but I do
 understand TIKA-1198 is a bad regression, therefore it is a blocker for me
 too. I am not familiar with WS so I do not know how much work it would be to
 fix it. However, I am wondering: if no one commits to fixing it, is rolling
 back an option? We may roll back the CXF fix and then be ready to release.

Sergey has had a look and has setup a unique path for this whilst exploring an 
upstream fix.

This clears the blockers for a release from me, so unless I hear anyone else 
highlight something I will roll a release candidate for voting.

Cheers,
Dave

Re: [DISCUSS] Prepare Release 1.5?

2014-01-09 Thread David Meikle
Hi, 

On 29 Dec 2013, at 11:41, David Meikle loo...@gmail.com wrote:

 Hi Guys,
 
 There have been some questions pop up around when a new 1.5 release will be 
 available.
 
 I have some free cycles over the next couple of weeks to prepare one and I 
 believe Chris has some too, so in preparation for that what do we need to do 
 to make the current trunk releasable as version 1.5?
 
 For me the following issue needs to be fixed before release:
 TIKA-1198 - the change to using multi-parts appears to have broken our 
 current guidance on usage significantly.
 
 Is there anything else others think is a must before rolling a release? 
 
 I was also thinking we could do some quick work to include the following 
 issues:
 TIKA-1059
 TIKA-985, TIKA-980
 
 I don’t want to hold things up, so if we sort people’s mandatories I think we 
 should roll a release. 
 
 @Chris - I know you had free cycles and volunteered so will defer to you on 
 the release management side of things.  That said happy to take it on if that 
 helps.
 
 Cheers,
 Dave

Conscious it was the festive period of late, so wondering if anyone has had 
further thoughts on this?

Cheers,
Dave

Re: Help on 1.4/1.5

2013-12-29 Thread David Meikle
Hi Stefano,

On 29 Dec 2013, at 07:40, Stefano Fornari stefano.forn...@gmail.com wrote:

 Dear dev,
 in issue TIKA-1179 it was suggested that I contribute to tika 1.5 to speed up
 the release of the fix. I plan to use tika and I'll be happy to contribute
 something back. Is there anything simple I can start with? What does the
 contribution process look like?

Thanks for your interest in getting involved with Tika!

Helping out here at Tika can manifest in many different ways depending on your 
skills - joining and interacting on the user and developer mail lists, 
improving the Wiki, providing patches for documentation, collaborating on the 
issue tracker[1] or submitting code patches.  There is also a good guide if you are 
new to the Apache Software Foundation[2].  

In terms of the 1.5 release, it is down to the community in that we need to 
take a wee vote on whether we are ready for one and agree if there is anything else 
that needs to be fixed or included in it.  There are a lot of issues marked as 
resolved but also 22 open[3], so there may be something you think you can 
contribute to in that list by means of a patch.

Chris was talking about spinning one up once he had a few free cycles, but to 
get the ball rolling I will start by putting out an email on what to include.

 Alternatively, I would backport the fix to 1.4 so that we could release a
 1.4.1 quickly. What do you think?

With a release for 1.5 potentially just around the corner, my opinion is 
that it would be better to focus on addressing anything that blocks 
releasing that instead of creating a back-port and then going through the 
release process for 1.4.1.

Cheers,
Dave

[1] https://issues.apache.org/jira/browse/TIKA
[2] https://www.apache.org/foundation/getinvolved.html
[3] 
https://issues.apache.org/jira/issues/?jql=project%20%3D%20TIKA%20AND%20fixVersion%20%3D%20%221.5%22%20AND%20status%20%3D%20Open%20ORDER%20BY%20priority%20DESC
 

[DISCUSS] Prepare Release 1.5?

2013-12-29 Thread David Meikle
Hi Guys,

There have been some questions pop up around when a new 1.5 release will be 
available.

I have some free cycles over the next couple of weeks to prepare one and I 
believe Chris has some too, so in preparation for that what do we need to do to 
make the current trunk releasable as version 1.5?

For me the following issue needs to be fixed before release:
TIKA-1198 - the change to using multi-parts appears to have broken our current 
guidance on usage significantly.

Is there anything else others think is a must before rolling a release? 

I was also thinking we could do some quick work to include the following issues:
TIKA-1059
TIKA-985, TIKA-980

I don’t want to hold things up, so if we sort people’s mandatories I think we 
should roll a release. 

@Chris - I know you had free cycles and volunteered so will defer to you on the 
release management side of things.  That said happy to take it on if that helps.

Cheers,
Dave




Re: Help on 1.4/1.5

2013-12-29 Thread David Meikle
Hi Stefano,

On 29 Dec 2013, at 11:46, Stefano Fornari stefano.forn...@gmail.com wrote:

 Ok, sounds good. I may take 
 TIKA-1078 (https://issues.apache.org/jira/browse/TIKA-1078);
 maybe it is not the most interesting one, but since I am not familiar with
 tika hacking, it could be a good starting point.

Nice one. Feel free to post questions if you have them, we’re a friendly bunch 
:-)

 Alternatively, I would backport the fix to 1.4 so that we could release a
 1.4.1 quickly. What do you think?
 
 With a release for 1.5 potentially just around the corner, my opinion
 would be that I think it would be better to focus on addressing anything
 that blocks releasing that instead of creating a back-port and then going
 through the release process for 1.4.1.
 
 I tend to agree, but IMHO this really depends on when 1.5 is foreseeable.
 If it still takes a while, or it is still undefined, it may make sense to
 release an update to 1.4. In the end it is out with a quite remarkable bug
 and the only fix at the moment is to build a new 1.5 SNAPSHOT. What do the
 others think?

I take your point on this as well - will be interesting to see what others 
think and happy to go with consensus. In terms of the 1.5 release getting out, 
it is up to us - the Tika community - to define if it is ready to ship or not.  
It may be simplest to take a view at the end of the DISCUSS thread I posted 
earlier to see how close we are to a 1.5 RC?

Cheers,
Dave

Re: Build failed in Jenkins: Tika-trunk #1058

2013-12-28 Thread David Meikle
Hi,

On 28 Dec 2013, at 21:53, Apache Jenkins Server jenk...@builds.apache.org 
wrote:

 See https://builds.apache.org/job/Tika-trunk/1058/
 
 --
 [...truncated 217 lines...]
   ... 36 more
 Caused by: java.net.SocketTimeoutException: connect timed out
   at java.net.PlainSocketImpl.socketConnect(Native Method)
   at 
 java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:327)
   at 
 java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:193)
   at 
 java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:180)
   at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:385)
   at java.net.Socket.connect(Socket.java:546)
   at 
 org.tmatesoft.svn.core.internal.util.SVNSocketConnection.run(SVNSocketConnection.java:57)
   at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
   ... 5 more
 java.io.IOException: remote file operation failed: 
 https://builds.apache.org/job/Tika-trunk/ws/ at 
 hudson.remoting.Channel@358fd9d8:ubuntu5
 

Have seen this one and kicked off another build.  Will monitor and see if it 
works now.

Cheers,
Dave

Re: Switch to JUnit 4.x?

2013-12-17 Thread David Meikle
Hi, 
On 14 Dec 2013, at 23:39, Ken Krugler kkrugler_li...@transpac.com wrote:

 See https://issues.apache.org/jira/browse/TIKA-1209
 
 Any objections to switching to JUnit 4.11?

+1 for upgrade.

Cheers,
Dave


Re: permissions to close issue?

2013-08-16 Thread David Meikle
Hi Tim,

On 16 Aug 2013, at 02:23, Allison, Timothy B. talli...@mitre.org wrote:

 I don't appear to have permissions to close out issues that I didn't open 
 (TIKA-1001 and TIKA-1153).  Is this standard jira policy or user error?  
 Thank you.

I have added you to the PMC Group in JIRA, so you should be fine now.

Cheers,
Dave