Let me know if these are of any use...

https://github.com/centic9/CommonCrawlDocumentDownload
http://openpreservation.org/blog/2016/10/04/apache-tikas-regression-corpus-tika-1302/
https://events.static.linuxfound.org/sites/events/files/slides/ApacheConMiami2017_tallison_v2.pdf
https://wiki.apache.org/tika/TikaEval

On Fri, Sep 21, 2018 at 10:11 PM Dave Fisher <dave2w...@comcast.net> wrote:

> Hi Nick,
>
> Sit at BarCamp 2 Monday morning or do a BOF later?
>
> Would someone point me to the Common crawler information.
>
> Regards,
> Dave
>
> Sent from my iPhone
>
> > On Sep 17, 2018, at 8:07 AM, Nick Burch <apa...@gagravarr.org> wrote:
> >
> >> On Sat, 15 Sep 2018, Dave Fisher wrote:
> >> I’ll be at Apachecon Montreal, anyone else?
> >
> > I'll be there! Happy to look at draft slides then, and offer advice :)
> >
> > Nick
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscr...@poi.apache.org
> > For additional commands, e-mail: dev-h...@poi.apache.org