Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.
The following page has been changed by DennisKubes:
http://wiki.apache.org/nutch/Becoming_A_Nutch_Developer

The comment on the change is:
Final draft for this guide

------------------------------------------------------------------------------
== Overview ==
Search is complex. Nutch makes it easier. So you start off by installing Nutch. Now you are a pretty good developer. You get Nutch up and running without problems and you think it is a pretty neat piece of software. In fact you like it so much that you want to start adding to it. You want to develop new features and contribute them back to the community. But then it dawns on you: how does one contribute to an open source project like Nutch? You have never worked on an open source project before and don't really know how the entire process works.

- That is where this document comes in. The purpose of this document is to help you as developers take the next step in becoming a contributing members of the Nutch community. We will cover a general overview of the Nutch development process. What are the different pieces and how do they fit together. How does the community work and interact. We will cover about using the mailing lists to search for information and how to ask questions to ensure that they get answered. We will cover how to go about learning the internals of the Nutch codebase. We will cover how to start developing for Nutch including coding standards, using subversion, setting up nutch in development environments, building nutch from source, debugging, and unit tests. And finally we will cover contributing back to the Nutch community through documentation, code fixes, new features, and providing guidance to other developers. When we are finished you should have a good understanding of how the community works and how you can go about becoming a bigger part of that community.
+ That is where this document comes in.
The purpose of this document is to help you as developers take the next step in becoming contributing members of the Nutch community. We will cover a general overview of the Nutch development process, including the different pieces and how they fit together. We will cover how the community works and interacts. We will cover using the mailing lists to search for information and how to ask questions to ensure that they get answered. We will cover how to go about learning the internals of the Nutch codebase. We will cover how to use the JIRA for change requests and how to start developing for Nutch. And finally we will cover contributing back to the Nutch community. When we are finished you should have a good understanding of how the community works and how you can go about becoming a bigger part of that community.

== The Nutch Community ==
=== Nutch Development Roles ===
- There are three main roles that people can play in the Nutch community.
+ There are three main roles that a person can play in the Nutch community.

- The first role is that of user. This is someone who uses the Nutch software but is not active in its development. People in this category range from the curious programmer who wants to learn more about search technology to corporations setting up search on their local intranet. If you only want to use the Nutch software and don't want to help develop it, you can still be a contributing member of the community. By using the software and pushing the limits of what it can do, by filing bug reports and feature requests (more about how to do this later), working with developers to track down issues, or just giving your input to discussions that arise, you can help the Nutch project become better and better.
+ The first role is that of user. This is someone who uses the Nutch software but is not active in its development.
People in this category range from the curious programmer who wants to learn more about search technology to corporations setting up search on their local intranet. If you only want to use the Nutch software and don't want to help develop it, you can still be a contributing member of the community. By using the software and pushing the limits of what it can do, by filing bug reports and feature requests (more about how to do this later), by working with developers to track down issues, or just by giving your input to discussions that arise, you can help the Nutch project become better and better.

- The second role is that of developer. This is someone who has used the Nutch software and has taken the next step to help develop or program the software. Helping to develop the Nutch codebase can come in the form of code fixes called patches, or by developing completely new features from scratch. An important thing to remember is that unlike most software development at big companies, you don't need anybody's permission to start developing software for Nutch. If you think you have a good idea for a feature, or if you want to track down and fix bugs in the software, go do it. If you want a specific piece of functionality, don't wait for someone else to develop it. Take the time to learn and do-it yourself. Then when you are done give it back to the community. This is how the Nutch project has been developed so far and how it will continue to be developed in the future. The community is a do-ocrcy, meaning those who do the work get to help set the directions and make the decisions. Communication is essential but not limiting. Anybody can become a developer simply by submitting source code, whether fixes of functionality, for inclusion in the project.
+ The second role is that of developer. This is someone who has used the Nutch software and has taken the next step to help program the underlying software.
Since you are reading this document, it is assumed that you want to be a developer. Helping to develop the Nutch codebase can come in the form of bug fixes or of completely new features developed from scratch. An important thing to remember is that unlike most software development at big companies, you don't need anybody's permission to start developing software for Nutch. If you think you have a good idea for a feature, or if you want to track down and fix bugs in the software, go do it. If you want a specific piece of functionality, don't wait for someone else to develop it. Take the time to learn and develop it yourself. Then when you are done give it back to the community. This is how the Nutch project has been developed so far and how it will continue to be developed in the future. The community is a do-ocracy, meaning those who do the work get to help set the directions and make the decisions. Communication is essential but not limiting. Anybody can become a developer simply by taking the initiative and contributing work, whether in the form of fixes or functionality, for inclusion in the project.

- The third role is that of committer. This is usually a developer who has been working with the project for some time. Someone who has developed new pieces of functionality, who has fixed bugs, and who has helped others through answering quesions and providing guidance to others through the mailing lists. In other words this is a person who has proved their commitment and usefullness to the project and in return are given commit access to the source code repository, an apache email address, and the ability to help make short term decisions fo the project by determining what submissions and bug fixes make it into the source code repository and release versions of the software.
+ The third role is that of committer. This is usually a developer who has been working with the project for some time.
Someone who has developed new pieces of functionality, who has fixed bugs, and who has helped others by answering questions and providing guidance through the mailing lists and wiki. In other words this is a person who has proved their commitment and usefulness to the project and in return is given commit access to the source code repository, an Apache email address, and the ability to help make strategic decisions for the project by determining what submissions and bug fixes make it into the source code repository and release versions of the software.

=== How the Community Works ===
The community works together through shared mailing lists, email, wikis, bug tracking systems, and source code repositories. These tools when used together provide a virtual meeting room and workspace for all members of the community.

@@ -54, +54 @@

These four tools, mailing lists and email, JIRA, Wikis, and Subversion together provide the community ways to coordinate their actions and conversations. As a developer you will need to understand each of these tools and how you will use them in the development process.

==== Next Steps ====
- The rest of this document will cover five steps in becoming a Nutch developer. First is using the mailing lists to communicate, find information, and get solutions to problems. Second is how to go about learning the different parts of the Nutch codebase. Third is how to use the JIRA to search for bugs and coordinate development efforts. Four is how to develop using the Nutch codebase including coding standards, using subversion to patch and update code, and unit testing with junit. And finally five is how to use the wiki to help grow the community knowledge base.
+ The rest of this document will cover four steps in becoming a Nutch developer. First is using the mailing lists to communicate, find information, and get solutions to problems. Second is how to go about learning the different parts of the Nutch codebase.
Third is how to use the JIRA to search for bugs and coordinate development efforts. And fourth is how to use the wiki to help grow the community knowledge base.

== Becoming a Nutch Developer ==

@@ -118, +118 @@

In Hadoop it is better to take the packages one at a time, for example mapred or dfs, than to take the strategy of running components, as Hadoop is server based. You can still follow the pattern of reviewing junit tests to get an understanding of the Hadoop source code. Once you feel you have a grasp of various parts of the source code in Nutch or Hadoop I would recommend creating small junit test cases that use your newfound knowledge. For example you can create a small test case that fetches a few urls and verifies that they were fetched correctly. If you get through all of this then you will have a good foundation of knowledge in the Nutch and Hadoop source code bases and you should be fine in starting to develop software for both Nutch and Hadoop.

+ ==== Step Three: Using the JIRA and Developing ====
+ Ok, so you have gone through the source code and have a good understanding of the different components. Now you want to start developing or fixing bugs. Where do you start? First, if you haven't already signed up for the JIRA, do so now. Instructions were provided earlier for this.
+
+ Now it is time to start browsing. JIRA provides a lot of search facilities. On the top of the main JIRA page there is a free text search. On the right hand side of the main JIRA page there are preset filters. You can search by status of the issue, by priority, or by assignee. You will want to try out each of the different search options to get familiar with the capabilities of JIRA.
+
+ When you do a search in JIRA you are presented with a listing of issues that match your query. The results listing will show you the JIRA id for the Nutch issue. This is in the form of NUTCH-XXX.
It is important to remember the JIRA id numbers as this is how you will reference issues that you are working on both through the JIRA and in communicating with other developers on the list. The listing also shows a brief summary of the issue, who it is assigned to, who reported it, the priority and status of the issue, and if it is resolved. Clicking on the issue number will bring you to the main page for that issue. The main issue page is where you will communicate with other developers about this issue and where you will attach your code patches for bug fixes and new feature requests. Whenever changes are made to JIRA issues an email is automatically generated and sent to the dev mailing list. You will have to be logged in to leave comments or to attach documents to issues. Again it is important to become familiar with the interface.
+
+ Once you have become familiar with the JIRA interface it is time to pick something to work on. If you already have something that you wish to work on, either a bug fix or a new piece of functionality, then the first step is to send a message to the dev mailing list detailing the issue. By doing this you can get feedback from other developers. You may find that someone is already working on the issue or that the functionality is handled elsewhere. Either way, first notifying the list, especially if it is a major piece of new functionality, is the polite thing to do.
+
+ On a side note, before you start creating issues in the JIRA that you are going to work on yourself you need to send an email to the developers list asking to be added to the nutch-developers group. Then when you create issues later in the JIRA you can have the issues you create assigned to you. This helps other developers know what is being worked on at any given point in time and avoids duplication of effort.
+
+ Once you have gotten feedback from other developers and no one has objected then you will need to create an issue in the JIRA.
In the JIRA issue please give as much detail and description as possible. Once the issue is created assign the issue to yourself or, if you don't have permissions to do so, send a message to the dev mailing list asking that the issue be assigned to you.
+
+ If you are not creating a new issue but instead want to begin working on an existing issue then here are the steps. First find the issue that you want to work on. If it is assigned to someone else then send a message to that person to see if they are working on it and how far along they are. It often happens that issues get assigned to developers but the developers are too busy to work on them. Or it may be that the person is in the process of working on the issue and would welcome your help. Either way, you should always contact that person and coordinate the efforts. That's only polite and sensible.
+
+ Regarding the picking of the work to be done - natural ordering in JIRA should be followed. Issues marked "critical" are more important than "major", and the ones with a lot of votes are more important than those without any.
+
+ Once the JIRA issue is created and has been assigned to you then it is time to start coding. Remember to follow a few simple guidelines while coding.
+
+  * All public classes and methods should have informative Javadoc comments.
+  * Code should be formatted according to Sun's conventions, with one exception: indent two spaces per level, not four.
+  * Contributions should pass existing unit tests.
+  * New junit test cases should be provided to demonstrate bug fixes and new features.
+
+ You will also want to perform functional testing of your new code within your own environment as well as make sure that the build and javadoc are successful with your new code. Once your code has been completed and tested then it is time to create a patch.
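
As a rough illustration of the coding guidelines above, here is a minimal sketch of a class with Javadoc on its public members and two-space indentation. The class `NutchIssueId` and its method `isValid` are hypothetical names invented for this example and are not part of the Nutch codebase; the only assumption taken from the text is that issue ids look like NUTCH-XXX with a numeric suffix.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical helper, not part of the Nutch codebase, that recognizes
 * JIRA issue ids of the form NUTCH-123.
 */
public class NutchIssueId {

  /** Matches the project key, a dash, and a positive issue number. */
  private static final Pattern ID = Pattern.compile("NUTCH-[1-9][0-9]*");

  /**
   * Checks whether the given text is a well-formed Nutch JIRA issue id.
   *
   * @param text the candidate id, for example "NUTCH-42"; may be null
   * @return true if the text is exactly a Nutch issue id, false otherwise
   */
  public static boolean isValid(String text) {
    return text != null && ID.matcher(text).matches();
  }

  /** Small manual check; a real contribution would ship a junit test case. */
  public static void main(String[] args) {
    System.out.println(isValid("NUTCH-42"));
    System.out.println(isValid("nutch-42"));
  }
}
```

In an actual patch the manual check in `main` would be replaced by a junit test case placed alongside the existing tests, as the guidelines ask.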
+
+ Start by checking to see what files you have modified with:
+
+ {{{
+ svn stat
+ }}}
+
+ Keep this list for later because you will want to make sure that only code that you have changed is included in your patch.
+
+ In order to create a patch, just type:
+
+ {{{
+ svn diff > yourPatchName.patch
+ }}}
+
+ This will report all modifications made to the Nutch sources on your local disk and save them into the yourPatchName.patch file. Read the patch file. Make sure it includes ONLY the modifications required to fix a single JIRA issue.
+
+ Please do not:
+
+  * reformat code unrelated to the bug being fixed: formatting changes should be separate patches/commits.
+  * comment out code that is now obsolete: just remove it.
+  * insert comments around each change, marking the change: folks can use subversion to figure out what's changed and by whom.
+  * make things public which are not required by end users.
+
+ Please do:
+
+  * try to adhere to the coding style of files you edit.
+  * comment code whose function or rationale is not obvious.
+  * update documentation (e.g., package.html files, this wiki, etc.)
+
+ Finally, patches should be attached to the JIRA issue. You can do this by logging into the JIRA issue and clicking the "attach file to this issue" link on the left hand side of the JIRA issue page.
+
+ Then please be patient and in the meantime start working on another issue. Committers are busy people too. If no one responds to your patch after a few days, please send a friendly reminder to the dev mailing list. Please incorporate others' suggestions into your patch if you think they're reasonable.
+
+ Now here is the hard part. Even if you have completed your patch it may not make it into the final Nutch codebase. This could be for any number of reasons but most often it is because the piece of functionality is not in line with the strategic goals of Nutch.
Of course if you had sent an email to the list before starting development on the issue then this would have already been addressed. Remember though that all developers have access to your functionality through the JIRA and they can and will use your patch even if it does not make it into release code. Every patch is useful to the community.
+
+ ==== Step Four: Contributing ====
+ This is the easy step. As you gain more and more understanding of the Nutch code base, it is useful to take your hard-earned knowledge and start helping others in the community. You can do this by creating tutorials, articles, and notes on the wiki or by answering questions on the mailing lists. Remember that the project is a circle. The more people you help, the better they become and the better the functionality they develop, which in turn helps you. Together we can all lift each other higher.
+
+ == Becoming a Nutch Committer ==
+ So you have developed some very useful functionality and contributed it back to the community. You consistently fix bugs. You answer questions for other users and developers on the mailing lists. All in all you are an asset to the community. At this point you may be invited to become a committer. You would then get an Apache email address and direct access to the subversion source code repository, and you would be responsible for helping set the technical direction of the Nutch project.
+
+ == Conclusion ==
+ So I hope this tutorial has helped to guide you in the direction of becoming a Nutch developer. Nutch is an awesome piece of software that has tremendous potential for changing search as we know it. If you desire to work on a piece of software that has the potential to affect millions of people around the world, then this is the project for you. Get started today and in a year or two you will look back and be amazed at just how much you have accomplished.
+
+ I would like to thank Andrzej Bialecki, Chris Mattmann, and Doug Cutting for providing assistance in developing this tutorial. I hope to meet, as we say it in Texas, y'all in person one day.
+