== Overview ==
Search is complex. Nutch makes it easier, so you start off by installing 
Nutch. You are a pretty good developer: you get Nutch up and running without 
problems, and you think it is a pretty neat piece of software. In fact, you 
like it so much that you want to start adding to it. You want to develop new 
features and contribute them back to the community. But then it dawns on you: 
how does one contribute to an open source project like Nutch? You have never 
worked on an open source project before and don't really know how the entire 
process works.

That is where this document comes in. Its purpose is to help you, as a 
developer, take the next step toward becoming a contributing member of the 
Nutch community. We will give a general overview of the Nutch development 
process: what the different pieces are, how they fit together, and how the 
community works and interacts. We will cover using the mailing lists to search 
for information and how to ask questions so that they get answered. We will 
cover how to go about learning the internals of the Nutch codebase. We will 
cover how to start developing for Nutch, including coding standards, using 
Subversion, setting up Nutch in development environments, building Nutch from 
source, debugging, and unit tests. Finally, we will cover contributing back to 
the Nutch community through documentation, code fixes, new features, and 
guidance to other developers. When we are finished, you should have a good 
understanding of how the community works and how you can become a bigger part 
of it.

== The Nutch Community ==

=== Nutch Development Roles ===
There are three main roles that people can play in the Nutch community. 

The first role is that of user.  This is someone who uses the Nutch software 
but is not active in its development.  People in this category range from the 
curious programmer who wants to learn more about search technology to 
corporations setting up search on their local intranet.  If you only want to 
use the Nutch software and don't want to help develop it, you can still be a 
contributing member of the community. By using the software and pushing the 
limits of what it can do, filing bug reports and feature requests (more about 
how to do this later), working with developers to track down issues, or just 
giving your input to discussions that arise, you can help the Nutch project 
become better and better.

The second role is that of developer. This is someone who has used the Nutch 
software and has taken the next step of helping to develop it. Contributions 
to the Nutch codebase can take the form of code fixes, called patches, or of 
completely new features developed from scratch. An important thing to remember 
is that, unlike most software development at big companies, you don't need 
anybody's permission to start developing software for Nutch. If you think you 
have a good idea for a feature, or if you want to track down and fix bugs in 
the software, go do it. If you want a specific piece of functionality, don't 
wait for someone else to develop it. Take the time to learn and do it 
yourself. Then, when you are done, give it back to the community. This is how 
the Nutch project has been developed so far and how it will continue to be 
developed in the future. The community is a do-ocracy, meaning those who do 
the work get to help set the direction and make the decisions. Communication 
is essential but not limiting. Anybody can become a developer simply by 
submitting source code, whether fixes or new functionality, for inclusion in 
the project.

The third role is that of committer. This is usually a developer who has been 
working with the project for some time: someone who has developed new pieces 
of functionality, fixed bugs, and helped others by answering questions and 
providing guidance on the mailing lists. In other words, this is a person who 
has proved their commitment and usefulness to the project and in return is 
given commit access to the source code repository, an Apache email address, 
and the ability to help make short-term decisions for the project by 
determining which submissions and bug fixes make it into the source code 
repository and the released versions of the software.

=== How the Community Works ===
The community works together through shared mailing lists, email, wikis, bug 
tracking systems, and source code repositories. Used together, these tools 
provide a virtual meeting room and workspace for all members of the community.

==== Mailing Lists and Email ====
Most communication is done through email and the mailing lists. Because of 
this, the first thing anyone should do to become part of the Nutch community 
is join the appropriate mailing lists. There are four of them:

 * The users mailing list. Contrary to the name, this list is not just for users. If you have questions about the Nutch software, including installation, configuration, bugs, errors, or general information, this is the list for you.
 * The dev mailing list. This is where most development communication occurs, including notifications of updates to the request tracking system. It is also where developers can pose ideas for new functionality, to see whether someone is already working on such features or just to get general feedback. The dev list is important for tracking functionality that other developers may be working on and for reaching community consensus on the desired direction of new features.
 * The commits mailing list. This list tracks commits to the source code repository and changes to the wiki pages.
 * The agents mailing list. This is where webmasters and other people can post comments or questions about the Nutch crawler.

Users can get by with subscribing to only the users mailing list. Developers 
should subscribe to all four mailing lists. Anybody doing internet crawls 
needs to be subscribed to the agents list. In order to post to any list, for 
example to ask a question, you must first be subscribed to that list. Below 
are links for subscribing to the different mailing lists.

 * [ Nutch Mailing Lists]
 * [mailto:[EMAIL PROTECTED] Subscribe to Users]
 * [mailto:[EMAIL PROTECTED] Subscribe to Dev]
 * [mailto:[EMAIL PROTECTED] Subscribe to Commits]
 * [mailto:[EMAIL PROTECTED] Subscribe to Agents]

==== JIRA and Issue/Request Tracking ====
If the mailing lists provide the ongoing conversation for the community, the 
issue/request tracking system provides a repository for the current state of 
the project. The request tracking system is a JIRA system, and it can be 
accessed at the address below.

 * [ Nutch JIRA]
 * [!default.jspa Signup for JIRA]

The JIRA system is the central repository for all work intended for inclusion 
in the Nutch source code base. The system tracks issues and feature requests 
by component, by version, and by status. You can view which requests are 
assigned to which person, which requests are currently being worked on, and 
which ones haven't been scheduled. You can search all requests by keyword or 
by various categories and filters. We will go into detail later on how to use 
the JIRA system to propose new functionality and submit bug fixes. For now, 
understand this: if you are going to be a developer, you will need to 
understand how to use the JIRA system, as this is where you will propose new 
functionality, submit bug fixes, give your input on features other developers 
may be working on, and coordinate actions with other developers on specific 
pieces of functionality.

The address to sign up for JIRA was given above. Once you have signed up, you 
will have access to all of the Apache JIRA repository, not just the Nutch 
project.

==== Source Code Control through Subversion ====
Source code control is very important to open source projects. Nutch uses the 
Apache Subversion repository for its source control. As a developer, you will 
want to get into the habit of downloading and updating your development 
environment directly from the Subversion repository. We will go into detail 
about how to do this later. There are two types of logins to the repository: 
users and committers. Users can download the repository but cannot make 
changes directly to it. As a user, you can make changes on your local system 
and submit those changes to the JIRA system. Committers hold the committer 
role that we discussed previously. These individuals can make changes 
directly to the Subversion repository and are responsible for taking patches 
from the JIRA system and applying them to Subversion, where they then become 
available to all users.
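
As a rough sketch of the user-side workflow described above (download a working copy, keep it updated, change files locally, and produce a patch that can be submitted to JIRA), the commands might look like the following. The repository URL, the directory name, and the patch file name are placeholders chosen for illustration, not values taken from this document, and the `DRY_RUN` switch and function names are hypothetical helpers. The `checkout`, `update`, `status`, and `diff` subcommands are standard Subversion subcommands.

```shell
#!/bin/sh
# Sketch of a non-committer Subversion workflow.
# REPO_URL, WORK_DIR and PATCH_FILE are placeholders (assumptions), not
# values given in this document; substitute the real ones.
REPO_URL="${REPO_URL:-https://svn.example.org/repos/nutch/trunk}"
WORK_DIR="${WORK_DIR:-nutch-trunk}"
PATCH_FILE="${PATCH_FILE:-my-change.patch}"

# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# First time only: download a working copy of the source tree.
checkout_source() {
    run svn checkout "$REPO_URL" "$WORK_DIR"
}

# Regularly: pull in the changes committers have applied since.
update_source() {
    run svn update "$WORK_DIR"
}

# After editing files locally: list what changed, then write a patch
# (relative to the working copy root) that can be attached to a JIRA issue.
make_patch() {
    run svn status "$WORK_DIR"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "(cd $WORK_DIR && svn diff) > $PATCH_FILE"
    else
        (cd "$WORK_DIR" && svn diff) > "$PATCH_FILE"
    fi
}
```

Setting `DRY_RUN=1` before calling `checkout_source`, `update_source`, or `make_patch` prints the commands instead of running them, which is a safe way to see what each step would do.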

==== Wiki and Documentation ====
The weakest part of most open source projects is their documentation, and 
Nutch is no exception. Wikis are special web pages, like the one you are 
reading, that allow users to directly edit text on the page and to create new 
pages. The wiki provides various tutorials and documentation for Nutch. Links 
to view the Nutch wiki and to register for the wiki are provided below.

 * [ Nutch Wiki]
 * [ Signup for the Wiki]

As a developer, one of the ways you can contribute back to the community is by 
documenting your hard-won experience on the wiki. You can do this in the form 
of tutorials, articles, or simple notes. The wiki is also used as a virtual 
whiteboard to help document general themes and directions for the project.

Together, these four tools (mailing lists and email, JIRA, the wiki, and 
Subversion) give the community ways to coordinate its actions and 
conversations. As a developer, you will need to understand each of these tools 
and how you will use them in the development process. Later parts of this 
document will explain each of them in more detail to give you the base of 
knowledge you will need to become a productive member of the Nutch development 
community.

... more to come later

