Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.
The following page has been changed by DennisKubes:
http://wiki.apache.org/nutch/Becoming_A_Nutch_Developer

The comment on the change is:
Added using the mailing lists section

------------------------------------------------------------------------------
  * [http://wiki.apache.org/nutch/ Nutch Wiki]
  * [http://wiki.apache.org/nutch/UserPreferences Sign up for the Wiki]

- As a developer one of the ways you can contribute back to the community is be documenting your hard won experience on the wiki. You can do this in the form of tutorials, articles, or simple notes. The wiki is also used as a virtual white board to help document general themes and directions for the project.
+ As a developer, one of the ways you can contribute back to the community is by documenting your hard-won experience on the wiki. You can do this in the form of tutorials, articles, or simple notes and instructions. Anything that you have learned may be of use to other developers. The wiki is also used as a virtual whiteboard to help document general themes and directions for the project.

- These four tools, mailing lists and email, JIRA, Wikis, and Subverion together provide the community ways to coordinate their actions and conversations. As a developer you will need to understand each of these tools and how you will use them in the development process. Later parts of this document will explain each of these tools in more detail to give you the base of knowledge you will need to start being a productive member of the Nutch development community.
+ These four tools (mailing lists and email, JIRA, wikis, and Subversion) together give the community ways to coordinate its actions and conversations. As a developer you will need to understand each of these tools and how you will use them in the development process.

- ... more to come later
+ ==== Next Steps ====
+ The rest of this document covers five steps to becoming a Nutch developer.
First is using the mailing lists to communicate, find information, and get solutions to problems. Second is how to go about learning the different parts of the Nutch codebase. Third is how to use JIRA to search for bugs and coordinate development efforts. Fourth is how to develop with the Nutch codebase, including coding standards, using Subversion to patch and update code, and unit testing with JUnit. Fifth and finally is how to use the wiki to help grow the community knowledge base.
+
+ == Becoming a Nutch Developer ==
+
+ ==== Step One: Using the Mailing Lists ====
+ By far the most used tool in Nutch development is the mailing lists. We have already explained the various mailing lists and their uses. This section provides general guidance on using the mailing lists to find information and get questions answered.
+
+ First and foremost, anyone wanting to become a Nutch developer should start reading the user, dev, and commits lists daily. To start, simply read the questions that other users ask on the user list. As you delve deeper into the Nutch codebase, you will be able to answer some of those questions yourself. One of the best ways to learn Nutch is to take one question asked on the user list each day and see if you can find the answer.
+
+ As a developer you will want to keep up with the current state of development on the project, and this is where the dev list comes in. The dev list receives a JIRA message every time a JIRA issue is updated. By following this list you will see discussions between other developers about various bugs and feature requests. You may also see informal voting take place. If you feel you can contribute to one of the discussions, by all means add your input. You will also want to keep up with the commits list to see what code has been committed to the Subversion repository and what updates have been made to the wiki.
+
+ Don't think that you have to be an expert on Nutch to begin answering questions. If you think you can help another user on the mailing list, don't be afraid to do it. As Eric S. Raymond put it, "Given enough eyeballs, all bugs are shallow." In other words, you bring a unique perspective, as does everyone else, and you may find bugs that no one else can find. The more people we have looking at the code and improving upon it, the more stable and robust it will become. The more people we have in the community constantly communicating, asking questions, and helping each other, the better we all become.
+
+ As you start to have questions about the configuration and operation of Nutch, or about errors that you have received, go ahead and ask them on the user list. When asking a question, it is best to provide a descriptive subject and detailed information about the problem. Detailed information includes snippets of log or configuration files and a good description of the problem. In general, the more specific the information you provide, the easier it is for other users and developers to help. General or abstract questions tend to be ignored. For example, I have seen messages like the following on the list that were completely ignored.
+
+ __A Bad Email__
+ {{{
+ Subject: Problem with crawling
+
+ I am having a problem with crawling the internet. It just seems that it is taking a long time. Does anyone know why crawling takes so long.
+
+ Me
+ }}}
+
+ With this type of question, other users have no idea what the problem is or how to help, so most simply ignore the question and move on. On the other hand, here is a better example of asking a question.
+
+ __A Good Email__
+ {{{
+ Subject: Crawl on 20K pages taking 4 hours
+
+ I am using the Nutch 0.8 branch on a cluster of 3 machines, each running Red Hat Linux and Java 1.5_10 with 500G hard drives, 2.8 GHz processors, and 2G of RAM.
I am trying to fetch 20K URLs. The fetch process completes fine, but when it gets to the reduce phase the CPUs go to 100% and the process seems to spin indefinitely. I did a kill -SIGQUIT on the process and it seems to be stuck in the Regex normalizer class. Has anyone experienced similar problems or know what might be causing this?
+
+ Me
+ }}}
+
+ The second email provides much more detail: the operating system, Java version, the component in which the error occurs, and a specific description of the problem. Developers are much better equipped to help.
+
+ Before asking a question, search the list archives to see whether your problem has come up in the past. There may already be a solution out there. I have seen questions go unanswered because the same question had been answered only a few days before and the person asking never bothered to search the archives. Below are two web-based locations from which you can search the Nutch mailing lists.
+
+ * [http://www.mail-archive.com/index.php?hunt=nutch Nutch Mail Archive]
+ * [http://www.nabble.com/forum/Search.jtp?query=nutch Nabble Nutch]
+
+ When searching the lists for an error you have received, it is good to search both by component (for example, fetcher) and by the actual error message. If you are not finding the answers you are looking for on the lists, you may want to search JIRA as well.
+
+ Here are some other important things to remember about the mailing lists. First, do not cross-post questions. Find the best list for your question and post it to that list only. Posting the same question to multiple lists (e.g. user and dev) tends to annoy the very people you want to help you.
Second, remember that developers and committers have day jobs and deadlines too, and that being rude, offensive, or aggressive is a sure way to get your posting ignored, if not flamed.
+
+ Most questions on the lists are answered within a day. If you ask a question and it goes unanswered for a couple of days, do not repost the same question. Instead, try rewording your question, providing more information, or writing a more descriptive subject.
+
+ more to come later ...
+

_______________________________________________
Nutch-cvs mailing list
Nutch-cvs@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-cvs