On Fri, 12 Feb 2016, Prasad N S wrote:
> I have over 5 years of experience in software development. My favorite
> language is Java, though I am comfortable with Python too. I have worked on
> a range of databases from relational to NoSQL and distributed systems. I am
> a quick learner and open to learn new technologies.
>
> I am here to kick start my contributions to Open Source projects. Please
> let me know if there are any small projects or bug fixes that I can get
> started with.

First up I'd suggest you work through the "5 minute" parser guide, to get
happy with adding new mime types to Tika, adding new parsers, that sort of
thing:
http://tika.apache.org/1.11/parser_guide.html
You may hit some issues on the way. If so, please try the troubleshooting
guide:
http://wiki.apache.org/tika/Troubleshooting%20Tika
Then report back / contribute fixes to the 5 minute guide +
troubleshooting guide!
I've seen a few queries on the Tika Python stuff recently, so if you know
Python, you could start with that. Take a look at the "apache-tika" tag on
StackOverflow to get an idea of the problems people are having, areas
where we need more examples, areas where the docs need work, that sort of
thing.
Once you're up to speed with all that, it really depends on what you're
interested in. If there are some formats you use in your personal life /
other research that aren't supported, have a go at adding mime magic then
a parser. If there's something with limited support you're interested in,
have a go at expanding it. If you're into Big Data, help with Tika Batch
and Tika Eval, or maybe with integrations with things like Behemoth or
Storm Crawler. If you're just generally interested, take a look at the
Tika Batch+Eval reports, find an interesting-looking failure / exception /
etc, and dive in!
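
If "mime magic" is a new term: it means identifying a file type from
well-known bytes at the start of the file rather than from its name. A
rough Python sketch of the idea follows. It is only an illustration, not
how Tika does it (Tika's detection is driven by its mime type definition
files, not code like this), and the names detect_mime, MAGIC_SIGNATURES
and DEFAULT_TYPE are made up for the example.

```python
# Toy magic-byte detector. Illustrative only: this is NOT Tika's
# implementation, and all names here are hypothetical.

# (offset, magic bytes, mime type) for a few well-known formats.
MAGIC_SIGNATURES = [
    (0, b"%PDF-", "application/pdf"),
    (0, b"\x89PNG\r\n\x1a\n", "image/png"),
    (0, b"PK\x03\x04", "application/zip"),
    (0, b"GIF87a", "image/gif"),
    (0, b"GIF89a", "image/gif"),
]

# Fallback type when no signature matches.
DEFAULT_TYPE = "application/octet-stream"


def detect_mime(data: bytes) -> str:
    """Return the mime type of the first signature found in data."""
    for offset, magic, mime in MAGIC_SIGNATURES:
        # Compare the bytes at the expected offset with the signature.
        if data[offset:offset + len(magic)] == magic:
            return mime
    return DEFAULT_TYPE


if __name__ == "__main__":
    print(detect_mime(b"%PDF-1.4\n..."))
    print(detect_mime(b"no known signature here"))
```

Adding support for a new format usually starts with working out what
that signature is, then adding a parser for the detected type.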
Oh, and one other possible thing - rework this email slightly, put it on
the wiki as a "how to get started contributing" guide, invite others to
help, and expand it as you learn :)
Nick