Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The following page has been changed by DennisKubes:
http://wiki.apache.org/nutch/Becoming_A_Nutch_Developer

The comment on the change is:
Added using the mailing lists section

------------------------------------------------------------------------------
   * [http://wiki.apache.org/nutch/ Nutch Wiki]
   * [http://wiki.apache.org/nutch/UserPreferences Signup for the Wiki]
  
- As a developer one of the ways you can contribute back to the community is be 
documenting your hard won experience on the wiki.  You can do this in the form 
of tutorials, articles, or simple notes.  The wiki is also used as a virtual 
white board to help document general themes and directions for the project.
+ As a developer, one of the ways you can contribute back to the community is by
documenting your hard-won experience on the wiki.  You can do this in the form
of tutorials, articles, or simple notes and instructions.  Anything that you
have learned may be of use to other developers.  The wiki is also used as a
virtual whiteboard to help document general themes and directions for the
project.
  
- These four tools, mailing lists and email, JIRA, Wikis, and Subverion 
together provide the community ways to coordinate their actions and 
conversations.  As a developer you will need to understand each of these tools 
and how you will use them in the development process.  Later parts of this 
document will explain each of these tools in more detail to give you the base 
of knowledge you will need to start being a productive member of the Nutch 
development community.
+ These four tools (mailing lists and email, JIRA, wikis, and Subversion)
together give the community ways to coordinate their actions and
conversations.  As a developer you will need to understand each of these tools
and how to use them in the development process.
  
- ... more to come later
+ ==== Next Steps ====
+ The rest of this document will cover five steps to becoming a Nutch
developer.  First is using the mailing lists to communicate, find information,
and get solutions to problems.  Second is how to go about learning the
different parts of the Nutch codebase.  Third is how to use JIRA to search
for bugs and coordinate development efforts.  Fourth is how to develop using the
Nutch codebase, including coding standards, using Subversion to patch and update
code, and unit testing with JUnit.  Fifth and finally is how to use the wiki to
help grow the community knowledge base.
  
+ 
+ == Becoming a Nutch Developer ==
+ 
+ ==== Step One: Using the Mailing Lists ====
+ By far the most used tools in Nutch development are the mailing lists.  We have
already explained the various mailing lists and their uses.  This section will
provide general guidance on using the mailing lists to find information and get
questions answered.
+ 
+ First and foremost, anyone wanting to become a Nutch developer should
start reading the user, dev, and commits lists daily.  To start out,
simply read the questions that other users ask on the user list.  As you
delve deeper into the Nutch codebase, you will be able to answer
some of the questions that other users have.  One of the best ways to learn
Nutch is to take one question asked on the user list each day and see
whether you can find the answer.
+ 
+ As a developer you will want to keep up with the current state of development
on the project, and this is where the dev list comes in.  The dev list is where
JIRA notifications are delivered every time a JIRA issue is updated.  By
following this list you will see discussions between other developers about
various bugs and feature requests.  You may also see informal voting
occurring.  If you feel you can contribute to one of the discussions, by all
means add your input.  You will also want to keep up to date on the commits
list to see what code has been committed to the Subversion repository and what
updates have been made to the wiki.
+ 
+ Don't think that you have to be an expert on Nutch to begin answering
questions.  If you think you can help another user on the mailing list, don't
be afraid to do it.  As Eric Raymond put it, "Given enough eyeballs, all bugs
are shallow."  In other words, you bring a unique perspective, as does everyone
else, and you may find bugs that no one else can find.  The more people we have
looking at the code and improving upon it, the more stable and robust it will
become.  The more people we have in the community constantly communicating,
asking questions, and helping each other, the better we all become.
+ 
+ As you start to have questions about the configuration and operation of Nutch
or about errors that you have received, go ahead and ask these questions on the
user list.  When asking questions it is best to provide a descriptive subject
and detailed information about the problem or question.  Detailed information
would include snippets of log or configuration files and a good description of
the problem.  In general, the more specific the information you provide, the
easier it is for other users and developers to help.  General or abstract
questions tend to be ignored.  For example, I have seen messages like the
following on the list that were completely ignored.
+ 
+ __A Bad Email__
+ {{{
+ Subject: Problem with crawling
+ 
+ I am having a problem with crawling the internet.  It just seems that it is 
taking a long time.  Does anyone know why crawling takes so long.
+ 
+ Me
+ }}}
+ 
+ With this type of question other users have no idea what the problem is
or how to help, so most simply ignore the question and move on.
On the other hand, here is a better example of asking a question.
+ 
+ __A Good Email__
+ {{{
+ Subject: Crawl on 20K pages taking 4 hours
+ 
+ I am using the Nutch 0.8 branch on a cluster of 3 machines, each running Red Hat
Linux and Java 1.5_10, with 500 GB hard drives, 2.8 GHz processors, and 2 GB of
RAM.  I am trying to fetch 20K URLs.  The fetch process completes fine, but when
it gets to the reduce process the CPUs go to 100% and the process seems to spin
indefinitely.  I did a kill -SIGQUIT on the process and it seems to be stuck in
the Regex normalizer class.  Has anyone experienced similar problems, or does
anyone know what might be causing this?
+ 
+ Me
+ }}}
+ 
+ In the second case, much more detailed information is provided (operating
system, Java version, hardware, the component in which the error is occurring,
and a detailed description of the problem), so developers are much better
equipped to provide assistance.
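+ 
+ If you are not sure how to gather the environment details shown in the good
example, the small Java program below is one way to collect them.  It is a
minimal, hypothetical helper (the class name `EnvReport` is made up for this
page and is not part of Nutch) that uses only standard JDK calls to print the
operating system, Java version, processor count, maximum heap, and root disk
size.  It runs in its own JVM, so the heap figure describes that JVM rather than
your Nutch processes, and it cannot know your Nutch version or which component
is failing; add those by hand.

```java
import java.io.File;

/**
 * Hypothetical helper (not part of Nutch): prints basic environment
 * details worth including in a question to the Nutch user list.
 */
public class EnvReport {

    private static final long MB = 1024L * 1024L;

    // Collects OS, Java, CPU, heap, and disk details from standard JDK calls.
    public static String report() {
        Runtime rt = Runtime.getRuntime();
        StringBuilder sb = new StringBuilder();
        sb.append("OS: ").append(System.getProperty("os.name"))
          .append(' ').append(System.getProperty("os.version"))
          .append(" (").append(System.getProperty("os.arch")).append(")\n");
        sb.append("Java: ").append(System.getProperty("java.version"))
          .append(" (").append(System.getProperty("java.vendor")).append(")\n");
        sb.append("CPUs: ").append(rt.availableProcessors()).append('\n');
        // Max heap of THIS JVM only, not of your Nutch or Hadoop task JVMs.
        sb.append("Max heap (this JVM): ").append(rt.maxMemory() / MB).append(" MB\n");
        // Total size of the partition holding "/"; getTotalSpace() returns 0 if unknown.
        sb.append("Disk (/): ").append(new File("/").getTotalSpace() / MB).append(" MB\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(report());
    }
}
```

+ Compile and run it with `javac EnvReport.java && java EnvReport`, then paste
the output into your message along with your Nutch version and the relevant log
snippets.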
+ 
+ Before asking questions on the list you will want to search the list archives
to see if your problem has come up in the past.  There may already be a solution
to your problem out there.  I have seen questions go unanswered on the list
because the same question had been answered only a few days before and the
person asking never bothered to search the archives.  Below are two web-based
locations from which you can search the Nutch mailing lists.
+ 
+  * [http://www.mail-archive.com/index.php?hunt=nutch Nutch Mail Archive]
+  * [http://www.nabble.com/forum/Search.jtp?query=nutch Nabble Nutch]
+ 
+ When searching the list for errors you have received, it is good to search
both by component, for example the fetcher, and by the actual error received.
If you are not finding the answers you are looking for on the list, you may want
to move to JIRA and search there for answers.
+ 
+ Here are some other important things to remember about the mailing lists.
First, do not cross-post questions.  Find the best list for your question and
post it to that list only.  Posting the same question to multiple lists
(e.g. user and dev) tends to annoy the very people you want help from.
Second, remember that developers and committers also have day jobs and
deadlines, and that being rude, offensive, or aggressive is a sure way to get
your posting ignored, if not flamed.
+ 
+ Most questions on the lists are answered within a day.  If you ask a question
and it is not answered for a couple of days, do not repost the same question.
Instead, you may need to reword your question, provide more information, or give
a better description in the subject.
+ 
+ more to come later ...
+ 
