Hi Tim

Great to hear that you managed to use the dataset from CommonCrawl. Thanks!

Julien

On 14 April 2015 at 14:15, Allison, Timothy B. <talli...@mitre.org> wrote:

> +1
>
> Thank you, Tyler!
>
> Apologies to Hong-Thai and community for not recognizing the severity of
> TIKA-1600 when I voted in favor of rc1!
>
> Details...
>
> I reran against govdocs1, and there aren't any major surprises.
>
> On our Rackspace VM, I _finally_ unzipped the Common Crawl slice that
> Julien Nioche created for us, and I ran against that as well.  That turned
> up TIKA-1605 and another exceedingly rare NPE in the PDFParser.  I don't
> think either of these is a blocker, and both are now fixed in trunk.
>
> There are slightly fewer metadata values for some jpegs.  For the one file
> that I manually reviewed, 1.8-rc was missing these values (that were
> available in 1.7):
>
> JPEG quality
> IPTC-NAA record
> Plug-in 1 Data
>
> Comparison reports are available here (much more work remains to be done
> on tika-eval):
>
> https://github.com/tballison/share/tree/master/tika_comparisons
>
> ________________________________________
> From: Tyler Palsulich <tpalsul...@apache.org>
> Sent: Monday, April 13, 2015 1:56 PM
> To: dev@tika.apache.org; u...@tika.apache.org
> Subject: [VOTE] Apache Tika 1.8 Release Candidate #2
>
> Hi Folks,
>
> A candidate for the Tika 1.8 release is available at:
>   https://dist.apache.org/repos/dist/dev/tika/
>
> The release candidate is a zip archive of the sources in:
>   http://svn.apache.org/repos/asf/tika/tags/1.8-rc2/
>
> The SHA1 checksum of the archive is
>   5e22fee9079370398472e59082d171ae2d7fdd31.
>
> In addition, a staged maven repository is available here:
>   https://repository.apache.org/content/repositories/orgapachetika-1009
>
> Please vote on releasing this package as Apache Tika 1.8. The vote is open
> for the next 72 hours and passes if a majority of at least three +1 Tika
> PMC votes are cast.
>
> [ ] +1 Release this package as Apache Tika 1.8
> [ ] ±0 I don't object to this release, but I haven't checked it
> [ ] -1 Do not release this package because...
>
> Thanks,
> Tyler
>



-- 

Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble
