Let me know if these are of any use...

https://github.com/centic9/CommonCrawlDocumentDownload

http://openpreservation.org/blog/2016/10/04/apache-tikas-regression-corpus-tika-1302/

https://events.static.linuxfound.org/sites/events/files/slides/ApacheConMiami2017_tallison_v2.pdf

https://wiki.apache.org/tika/TikaEval


On Fri, Sep 21, 2018 at 10:11 PM Dave Fisher <dave2w...@comcast.net> wrote:

> Hi Nick,
>
> Sit at BarCamp 2 Monday morning or do a BOF later?
>
> Would someone point me to the Common Crawl information?
>
> Regards,
> Dave
>
> Sent from my iPhone
>
> > On Sep 17, 2018, at 8:07 AM, Nick Burch <apa...@gagravarr.org> wrote:
> >
> >> On Sat, 15 Sep 2018, Dave Fisher wrote:
> >> I’ll be at ApacheCon Montreal, anyone else?
> >
> > I'll be there! Happy to look at draft slides then, and offer advice :)
> >
> > Nick
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscr...@poi.apache.org
> > For additional commands, e-mail: dev-h...@poi.apache.org
