Thanks for the advice! I’ll start with some documentation and tests and move to harder tasks from there.
Regarding the JIRA issue TIKA-1329, would the documentation for the RecursiveParserWrapper go with the RecursiveMetadata page on the wiki?

Thanks,
Joey

> On Dec 17, 2015, at 5:32 AM, Allison, Timothy B. <talli...@mitre.org> wrote:
>
> Speaking of the docs/examples, TIKA-1329 is still open because I haven't
> gotten around to documenting it.
>
> Y, if you'd like a report of exceptions, let me know. IIRC, it would be
> great if we could improve on XML detection (we're currently over-detecting),
> and there's plenty of work to do on HTML parsing (TIKA-1599).
>
> I also have probably a full grad student semester's worth of curation project
> ideas on the test corpus. Not glamorous, but very useful for the community.
>
> Then there's the eval code itself...that still needs to get into shape to
> be added.
>
> I agree with Nick though, start small on documentation/examples.
>
> Cheers,
>
> Tim
>
> -----Original Message-----
> From: Nick Burch [mailto:apa...@gagravarr.org]
> Sent: Wednesday, December 16, 2015 4:23 PM
> To: dev@tika.apache.org
> Subject: Re: looking to contribute
>
> On Wed, 16 Dec 2015, Joey Hong wrote:
>> My name is Joey. I am a college freshman with programming experience
>> looking to get into the world of open-source. I was hoping to
>> contribute to the Tika project, and was wondering if there were any
>> tasks that a beginner like me could tackle. I am willing to do
>> anything, whether it be fixing a minor bug, or adding test suites or
>> documentation.
>
> On the docs / examples side, we have a few examples on the website, but
> probably not enough! One thing might be to look through those, identify gaps
> with your fresh eyes, and work on those. We also have instructions for some
> more complicated integrations on the wiki, maybe try some of those and feed
> back on which ones aren't clear enough?
>
> If you want to try more coding, Tim quite often runs Tika against some large
> filesets, and has a nifty tool to report on what breaks. He can hopefully
> point you at the most recent report! Maybe have a look through that, identify
> a few common failures from unidentified or common exceptions, and try to fix
> one or two of those?
>
> Nick