Apache Tika's public regression corpus

Allison, Timothy B. Wed, 05 Oct 2016 10:57:13 -0700

All,

I recently blogged about some of the work we're doing with a large-scale 
regression corpus to make Tika, POI and PDFBox more robust and to identify 
regressions before release.  If you'd like to chip in with recommendations, 
requests or Hadoop/Spark clusters (why not shoot for the stars), please do!

http://openpreservation.org/blog/2016/10/04/apache-tikas-regression-corpus-tika-1302/

Many thanks, again, to Rackspace for our VM and to Common Crawl and govdocs1 
for most of our files!

        Cheers,

                 Tim
