On Fri, 12 Feb 2016, Prasad N S wrote:
I have over 5 years of experience in software development. My favorite
language is Java, though I am comfortable with Python too. I have worked on
a range of databases from relational to NoSQL and distributed systems. I am
a quick learner and open to learn new technologies.

I am here to kick start my contributions to Open Source projects. Please
let me know if there are any small projects or bug fixes that I can get
started with.

First up, I'd suggest you work through the "5 minute" parser guide to get comfortable with adding new mime types to Tika, adding new parsers, that sort of thing:
http://tika.apache.org/1.11/parser_guide.html

You may hit some issues on the way. If so, please try the troubleshooting guide:
http://wiki.apache.org/tika/Troubleshooting%20Tika

Then report back / contribute fixes to the 5 minute guide + troubleshooting guide!


I've seen a few queries on the Tika Python stuff recently, so if you know Python, you could try helping with that. Take a look at the "apache-tika" tag on StackOverflow to get an idea of the problems people are having, areas where we need more examples, areas where the docs need work, that sort of thing.

Once you're up to speed with all that, it really depends on what you're interested in. If there are some formats you use in your personal life / other research that aren't supported, have a go at adding mime magic, then a parser. If there's something with limited support you're interested in, have a go at expanding it. If you're into Big Data, help with Tika Batch and Tika Eval, or maybe with integrations with things like Behemoth or Storm Crawler. If you're just generally interested, take a look at the Tika Batch+Eval reports, find an interesting-looking failure / exception / etc, and dive in!
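
In case the term is new: "mime magic" means recognising a file type from the bytes at the start of the file rather than from its name. Below is a rough Python sketch of the idea only. It is not how Tika implements detection (Tika reads its magic patterns from XML definitions such as tika-mimetypes.xml), and the table and the detect_mime function are made-up names for illustration.

```python
# Illustrative only: a tiny hand-written magic table, not Tika's registry.
# Each entry is (offset, magic bytes, mime type).
MAGIC_TABLE = [
    (0, b"%PDF-", "application/pdf"),
    (0, b"\x89PNG\r\n\x1a\n", "image/png"),
    (0, b"GIF87a", "image/gif"),
    (0, b"GIF89a", "image/gif"),
    (0, b"PK\x03\x04", "application/zip"),
]

# Generic fallback type for unrecognised binary data.
DEFAULT_MIME = "application/octet-stream"


def detect_mime(data: bytes) -> str:
    """Return the mime type of the first matching magic entry (hypothetical helper)."""
    for offset, magic, mime in MAGIC_TABLE:
        # Compare the slice at the given offset against the expected bytes.
        if data[offset:offset + len(magic)] == magic:
            return mime
    return DEFAULT_MIME


if __name__ == "__main__":
    print(detect_mime(b"%PDF-1.4\n..."))
    print(detect_mime(b"just some text"))
```

Adding magic for a new format in Tika amounts to adding an entry like these to the mime type definitions, so the file can be routed to the right parser once one exists.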


Oh, and one other possible thing - rework this email slightly, put it on the wiki as a "how to get started contributing" guide, invite others to help, and expand it as you learn :)

Nick
