Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The following page has been changed by DennisKubes:
http://wiki.apache.org/nutch/Becoming_A_Nutch_Developer

The comment on the change is:
Final draft for this guide

------------------------------------------------------------------------------
  == Overview ==
  Search is complex.  Nutch makes it easier. So you start off by installing 
Nutch.  Now you are a pretty good developer.  You get Nutch up and running 
without problems and you think it is a pretty neat piece of software.  In fact 
you like it so much that you want to start adding to it.  You want to develop 
new features and contribute them back to the community.  But then it dawns on 
you.  How does one contribute to an open source project like Nutch?  You have 
never worked on an open source project before and don't really know how the 
entire process works.
  
- That is where this document comes in.  The purpose of this document is to 
help you as developers take the next step in becoming a contributing members of 
the Nutch community.  We will cover a general overview of the Nutch development 
process.  What are the different pieces and how do they fit together.  How does 
the community work and interact.  We will cover about using the mailing lists 
to search for information and how to ask questions to ensure that they get 
answered.  We will cover how to go about learning the internals of the Nutch 
codebase.  We will cover how to start developing for Nutch including coding 
standards, using subversion, setting up nutch in development environments, 
building nutch from source, debugging, and unit tests.  And finally we will 
cover contributing back to the Nutch community through documentation, code 
fixes, new features, and providing guidance to other developers.  When we are 
finished you should have a good understanding of how the community works and 
how you can go about becoming a bigger part of that community.
+ That is where this document comes in.  The purpose of this document is to 
help you as developers take the next step in becoming contributing members of 
the Nutch community.  We will cover a general overview of the Nutch development 
process including the different pieces and how they fit together.  We will 
cover how the community works and interacts.  We will cover using the mailing 
lists to search for information and how to ask questions to ensure that they 
get answered.  We will cover how to go about learning the internals of the 
Nutch codebase.  We will cover how to use the JIRA for change requests and how 
to start developing for Nutch.  And finally we will cover contributing back to 
the Nutch community.  When we are finished you should have a good understanding 
of how the community works and how you can go about becoming a bigger part of 
that community.
  
  == The Nutch Community ==
  
  === Nutch Development Roles ===
- There are three main roles that people can play in the Nutch community. 
+ There are three main roles that a person can play in the Nutch community. 
  
- The first role is that of user.  This is someone who uses the Nutch software 
but is not active in its development.  People in this category range from the 
curious programmer who wants to learn more about search technology to 
corporations setting up search on their local intranet.  If you only want to 
use the Nutch software and don't want to help develop it, you can still be a 
contributing member of the community.  By using the software and pushing the 
limits of what it can do, by filing bug reports and feature requests (more 
about how to do this later), working with developers to track down issues, or 
just giving your input to discussions that arise, you can help the Nutch 
project become better and better.
+ The first role is that of user.  This is someone who uses the Nutch software 
but is not active in its development.  People in this category range from the 
curious programmer who wants to learn more about search technology to 
corporations setting up search on their local intranet.  If you only want to 
use the Nutch software and don't want to help develop it, you can still be a 
contributing member of the community.  By using the software and pushing the 
limits of what it can do, by filing bug reports and feature requests (more 
about how to do this later), by working with developers to track down issues, 
or just giving your input to discussions that arise, you can help the Nutch 
project become better and better.
   
- The second role is that of developer.  This is someone who has used the Nutch 
software and has taken the next step to help develop or program the software.  
Helping to develop the Nutch codebase can come in the form of code fixes called 
patches, or by developing completely new features from scratch.  An important 
thing to remember is that unlike most software development at big companies, 
you don't need anybody's permission to start developing software for Nutch.  If 
you think you have a good idea for a feature, or if you want to track down and 
fix bugs in the software, go do it.  If you want a specific piece of 
functionality, don't wait for someone else to develop it.  Take the time to 
learn and do-it yourself.  Then when you are done give it back to the 
community. This is how the Nutch project has been developed so far and how it 
will continue to be developed in the future.  The community is a do-ocrcy, 
meaning those who do the work get to help set the directions and make the 
decisions.  Communication is essential but not limiting.  Anybody can 
become a developer simply by submitting source code, whether fixes of 
functionality, for inclusion in the project.
+ The second role is that of developer.  This is someone who has used the Nutch 
software and has taken the next step to help program the underlying software.  
Since you are reading this document, it is assumed that you want to be a 
developer.  Helping to develop the Nutch codebase can come in the form of bug 
fixes or by developing completely new features from scratch.  An important 
thing to remember is that unlike most software development at big companies, 
you don't need anybody's permission to start developing software for Nutch.  If 
you think you have a good idea for a feature, or if you want to track down and 
fix bugs in the software, go do it.  If you want a specific piece of 
functionality, don't wait for someone else to develop it.  Take the time to 
learn and develop it yourself.  Then when you are done give it back to the 
community. This is how the Nutch project has been developed so far and how it 
will continue to be developed in the future.  The community is a do-ocracy, 
meaning those who do the work get to help set the directions and make the 
decisions.  Communication is essential but not limiting.  Anybody can become a 
developer simply by taking the initiative, whether in the form of fixes or 
functionality, for inclusion in the project.
  
- The third role is that of committer.  This is usually a developer who has 
been working with the project for some time.  Someone who has developed new 
pieces of functionality, who has fixed bugs, and who has helped others through 
answering quesions and providing guidance to others through the mailing lists.  
In other words this is a person who has proved their commitment and usefullness 
to the project and in return are given commit access to the source code 
repository, an apache email address, and the ability to help make short term 
decisions fo the project by determining what submissions and bug fixes make it 
into the source code repository and release versions of the software.
+ The third role is that of committer.  This is usually a developer who has 
been working with the project for some time.  Someone who has developed new 
pieces of functionality, who has fixed bugs, and who has helped others through 
answering questions and providing guidance to others through the mailing lists 
and wiki.  In other words this is a person who has proved their commitment and 
usefulness to the project and in return is given commit access to the source 
code repository, an Apache email address, and the ability to help make 
strategic decisions for the project by determining what submissions and bug 
fixes make it into the source code repository and release versions of the 
software.
  
  === How the Community Works ===
  The community works together through shared mailing lists, email, wikis, bug 
tracking systems, and source code repositories.  These tools when used together 
provide a virtual meeting room and workspace for all members of the community.
@@ -54, +54 @@

  These four tools, mailing lists and email, JIRA, Wikis, and Subversion 
together provide the community ways to coordinate their actions and 
conversations.  As a developer you will need to understand each of these tools 
and how you will use them in the development process.  
  
  ==== Next Steps ====
- The rest of this document will cover five steps in becoming a Nutch 
developer.  First is using the mailing lists to communicate, find information, 
and get solutions to problems.  Second is how to go about learning the 
different parts of the Nutch codebase.  Third is how to use the JIRA to search 
for bugs and coordinate development efforts.  Four is how to develop using the 
Nutch codebase including coding standards, using subversion to patch and update 
code, and unit testing with junit.  And finally five is how to use the wiki to 
help grow the community knowledge base.
+ The rest of this document will cover four steps in becoming a Nutch 
developer.  First is using the mailing lists to communicate, find information, 
and get solutions to problems.  Second is how to go about learning the 
different parts of the Nutch codebase.  Third is how to use the JIRA to search 
for bugs and coordinate development efforts.  And four is how to use the wiki 
to help grow the community knowledge base.
  
  
  == Becoming a Nutch Developer ==
@@ -118, +118 @@

  
  In Hadoop it is better to take the packages one at a time, for example mapred 
or dfs, than to take the strategy of running components, as Hadoop is server 
based.  You can still follow the pattern of reviewing junit tests to get an 
understanding of the Hadoop source code.  Once you feel you have a grasp of 
various parts of the source code in Nutch or Hadoop I would recommend creating 
small junit test cases that use your newfound knowledge.  For example you can 
create a small test case that fetches a few urls and verifies that they were 
fetched correctly.  If you get through all of this then you will have a good 
foundation of knowledge in the Nutch and Hadoop source code bases and you 
should be fine in starting to develop software for both Nutch and Hadoop.
  
+ ==== Step Three: Using the JIRA and Developing ====
+ Ok, so you have gone through the source code and have a good understanding of 
the different components.  Now you want to start developing or fixing bugs.  
Where do you start?  First, if you haven't already signed up for the JIRA, do so 
now.  Instructions were provided earlier for this.
+ 
+ Now it is time to start browsing.  JIRA provides a lot of search facilities.  
On the top of the main JIRA page there is a free text search.  On the right 
hand side of the main JIRA page there are preset filters.  You can search by 
status of the issue, by priority, or by assignee.  You will want to try out 
each of the different search options to get familiar with the capabilities of 
JIRA.
+ 
+ When you do a search in JIRA you are presented with a listing of issues that 
match your query.  The results listing will show you the JIRA id for the Nutch 
issue.  This is in the form of NUTCH-XXX.  It is important to remember the JIRA 
id numbers as this is how you will reference issues that you are working on 
both through the JIRA and in communicating with other developers on the list.  
The listing also shows a brief summary of the issue, who it is assigned to, who 
reported it, the priority and status of the issue, and if it is resolved.  
Clicking on the issue number will bring you to the main page for that issue.  
The main issue page is where you will communicate with other developers about 
this issue and where you will attach your code patches for bug fixes and new 
feature requests.  Whenever changes are made to JIRA issues an email is 
automatically generated and sent to the dev mailing list.  You will have to be 
logged in to leave comments or to attach documents to issues.  Again it is 
important to become familiar with the interface.  
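+ 
+ Because issue ids are how work gets referenced on the list, it can help to 
pull them out of a message or a set of notes.  The following is only a minimal 
sketch: the helper name jira_ids is hypothetical, it assumes the XXX part of 
NUTCH-XXX is a plain number, and it relies on the -o option that common grep 
implementations provide.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nutch or JIRA): print each distinct
# issue id of the form NUTCH-<digits> found on standard input.
# Assumption: the XXX in NUTCH-XXX is numeric.
jira_ids() {
  # -o prints only the matching text, -E enables extended regular expressions
  grep -oE 'NUTCH-[0-9]+' | sort -u
}

# Example use:
#   echo "Patch for NUTCH-123, related to NUTCH-45" | jira_ids
```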
+ 
+ Once you have become familiar with the JIRA interface it is time to pick 
something to work on.  If you already have something that you wish to work on, 
either a bug fix or a new piece of functionality then the first step is to send 
a message to the dev mailing list detailing the issue.  By doing this you can 
get feedback from other developers.  You may find that someone is already 
working on the issue or that the functionality is handled elsewhere.  Either 
way first notifying the list, especially if it is a major piece of new 
functionality, is the polite thing to do.  
+ 
+ On a side note, before you start creating issues in the JIRA that you are 
going to work on yourself you need to send an email to the developers list 
asking to be added to the nutch-developers group.  Then when you create issues 
later in the JIRA you can have the issues you create assigned to you.  This 
helps other developers know what is being worked on at any given point in time 
and avoids duplication of effort.
+ 
+ Once you have gotten feedback from other developers and no one has objected 
then you will need to create an issue in the JIRA.  In the JIRA issue please 
give as much detail and description as possible.  Once the issue is created 
assign the issue to yourself or if you don't have permissions to do so then 
send a message to the dev mailing list asking that the issue be assigned to you.
+ 
+ If you are not creating a new issue but instead want to begin working on an 
existing issue then here are the steps.  First find the issue that you want to 
work on.  If it is assigned to someone else then send a message to that person 
to see if they are working on it and how far along they are.  It 
often happens that issues get assigned to developers but the developers are too 
busy to work on them.  Or it may be that the person is in the process of 
working on the issue and would welcome your help.  Either way, you should 
always contact that person and coordinate the efforts. That's only polite and 
sensible.
+ 
+ When picking work to be done, the natural ordering in JIRA should be 
followed.  Issues marked "critical" are more important than those marked 
"major", and the ones with a lot of votes are more important than those 
without any.
+ 
+ Once the JIRA is created and has been assigned to you then it is time to 
start coding.  Remember to follow a few simple guidelines while coding. 
+ 
+  * All public classes and methods should have informative Javadoc comments. 
+  * Code should be formatted according to Sun's conventions, with one 
exception: indent two spaces per level, not four.
+  * Contributions should pass existing unit tests. 
+  * New junit test cases should be provided to demonstrate bug fixes and new 
features.
+   
+ You will also want to perform functional testing of your new code within your 
own environment as well as make sure that the build and javadoc are 
successful with your new code.  Once your code has been completed and tested 
then it is time to create a patch.
+ 
+ Start by checking to see what files you have modified with: 
+ 
+ {{{
+ svn stat
+ }}}
+ 
+ Keep this list for later because you will want to make sure that only code 
that you have changed is included in your patch.
+ 
+ In order to create a patch, just type: 
+ 
+ {{{
+ svn diff > yourPatchName.patch
+ }}}
+ 
+ This will report all modifications done on the Nutch sources on your local 
disk and save them into the yourPatchName.patch file. Read the patch file. Make 
sure it includes ONLY the modifications required to fix a single JIRA issue. 
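+ 
+ One way to make that check easier is to list the files the patch touches and 
compare them with the svn stat output you kept earlier.  The following is only 
a sketch: the helper name patch_files is hypothetical, and it relies on svn 
diff starting each file's section with a line of the form "Index: path".

```shell
#!/bin/sh
# Hypothetical helper (not part of Subversion or Nutch): list the files
# that a patch produced by `svn diff` touches, one path per line.
patch_files() {
  # $1: path to the patch file, for example yourPatchName.patch
  # Keep only the "Index: <path>" header lines and strip the prefix.
  sed -n 's/^Index: //p' "$1" | sort -u
}

# Example use, run from the root of your working copy:
#   svn diff > yourPatchName.patch
#   patch_files yourPatchName.patch
```

Any path in that listing that does not belong to the issue you are fixing is a 
sign that the patch needs trimming before it is attached.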
+ 
+ Please do not: 
+ 
+  * reformat code unrelated to the bug being fixed: formatting changes should 
be separate patches/commits. 
+  * comment out code that is now obsolete: just remove it. 
+  * insert comments around each change, marking the change: folks can use 
subversion to figure out what's changed and by whom. 
+  * make things public which are not required by end users. 
+ 
+ Please do: 
+ 
+  * try to adhere to the coding style of files you edit.
+  * comment code whose function or rationale is not obvious. 
+  * update documentation (e.g., package.html files, this wiki, etc.) 
+ 
+ Finally, patches should be attached to the JIRA issue.  You can do this by 
logging into the JIRA issue and clicking the attach file to this issue link on 
the left hand side of the JIRA issue page.
+ 
+ Then please be patient and in the meantime start working on another issue. 
Committers are busy people too. If no one responds to your patch after a few 
days, please send a friendly reminder to the dev mailing list. Please 
incorporate others' suggestions into your patch if you think they're 
reasonable.
+ 
+ Now here is the hard part.  Even if you have completed your patch it may not 
make it into the final Nutch codebase.  This could be for any number of reasons 
but most often it is because the piece of functionality is not in line with 
the strategic goals of Nutch.  Of course if you had sent an email to the list 
before starting development on the issue then this would have already been 
addressed.  Remember though that all developers have access to your 
functionality through the JIRA and they can and will use your patch even if it 
does not make it into release code.  Every patch is useful to the community. 
+ 
+ ==== Step Four: Contributing ====
+ This is the easy step.  As you gain more and more understanding of the Nutch 
code base, it is useful to take your hard-earned knowledge and start helping 
others in the community.  You can do this by creating tutorials, articles, and 
notes on the wiki or by answering questions on the mailing lists.  Remember 
that the project is a circle.  The more people you help, the better they become 
and the better the functionality they develop, which in turn helps you.  
Together we can all lift each other higher.
+ 
+ == Becoming a Nutch Committer ==
+ So you have developed some very useful functionality and contributed it back 
to the community.  You consistently fix bugs.  You answer questions for other 
users and developers on the mailing lists.  All in all you are an asset to the 
community.  At this point you may be invited to become a committer.  You would 
then get an Apache email address and direct access to the subversion source 
code repository, and you would be responsible for helping set the technical 
direction of the Nutch project.  
+ 
+ == Conclusion ==
+ So I hope this tutorial has helped to guide you in the direction of becoming 
a Nutch developer.  Nutch is an awesome piece of software that has tremendous 
potential for changing search as we know it.  If you desire to work on a piece 
of software that has the potential to affect millions of people around the 
world, then this is the project for you.  Get started today and in a year or 
two you will look back and be amazed at just how much you have accomplished.
+ 
+ I would like to thank Andrzej Bialecki, Chris Mattmann, and Doug Cutting for 
providing assistance in developing this tutorial.  I hope to meet, as we say 
in Texas, y'all in person one day.
+ 

_______________________________________________
Nutch-cvs mailing list
Nutch-cvs@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-cvs