Speaking of the docs/examples, TIKA-1329 is still open because I haven't gotten 
around to documenting it.

Yes, if you'd like a report of exceptions, let me know.  IIRC, it would be great 
if we could improve XML detection (we're currently over-detecting), and 
there's plenty of work to do on HTML parsing (TIKA-1599).

I also have probably a full grad student semester's worth of curation project 
ideas for the test corpus.  Not glamorous, but very useful for the community.

Then there's the eval code itself...that still needs to be gotten into shape 
before it can be added.

I agree with Nick, though: start small on documentation/examples.

Cheers,

               Tim

-----Original Message-----
From: Nick Burch [mailto:apa...@gagravarr.org] 
Sent: Wednesday, December 16, 2015 4:23 PM
To: dev@tika.apache.org
Subject: Re: looking to contribute

On Wed, 16 Dec 2015, Joey Hong wrote:
> My name is Joey. I am a college freshman with programming experience 
> looking to get into the world of open-source. I was hoping to 
> contribute to the Tika project, and was wondering if there were any 
> tasks that a beginner like me could tackle. I am willing to do 
> anything, whether it be fixing a minor bug, or adding test suites or 
> documentation.

On the docs / examples side, we have a few examples on the website, but 
probably not enough! One thing might be to look through those, identify gaps 
with your fresh eyes, and work on those. We also have instructions for some 
more complicated integrations on the wiki, maybe try some of those and feed 
back on which ones aren't clear enough?

If you want to try more coding, Tim quite often runs Tika against some large 
filesets, and has a nifty tool to report on what breaks. He can hopefully point 
you at the most recent report! Maybe have a look through that, identify a few 
common failures from unidentified or common exceptions, and try to fix one or 
two of those?

Nick
