Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The "RunNutchInEclipse" page has been changed by TejasPatil:
https://wiki.apache.org/nutch/RunNutchInEclipse?action=diff&rev1=39&rev2=40

Comment:
changed as per NUTCH-1577

  ##Original credits: RenaudRichardet
  = RunNutchInEclipse =
- This page acts as a resource for working with Nutch from within the Eclipse 
IDE. It is intended to provide a comprehensive beginning resource for the 
configuration, building, crawling and debugging of Nutch trunk in the above 
context.
+ This page gives instructions for setting up a development environment for 
Nutch under the Eclipse IDE. It is intended as a comprehensive starting 
resource for configuring, building, crawling with and debugging Nutch trunk 
in Eclipse.
  
  == Tested with ==
+  * Nutch trunk and 2.x (r1488356 and above)
+  * Eclipse Indigo/Juno
-  * Nutch trunk (version 1.5 @date 09112011)
-  * Eclipse Indigo Service Release 1
-   . Build id: 20110916-0149
-  * Java JDK 1.6.0_25
+  * Java JDK 1.6 / 1.7
+  * Ubuntu Release 11.04 and above
-  * Ubuntu Release 11.04 (natty)
-   . Kernel Linux 2.6.38-10-generic GNOME 2.32.1
-  * Windows Vista (Service Edition 2)
  
- The tutorial here works fine for Nutch 1.6 and 2.x series as well with couple 
of changes and fixing dependencies. Check the bottom section for suggestions to 
fixes.
+ These steps work for Nutch trunk and the 2.x series.
  
  == Before you start ==
  Setting up Nutch to run in Eclipse can be tricky, and most of the time it 
is much faster to edit Nutch in Eclipse but run the scripts from the 
command line. However, being able to debug Nutch in Eclipse is very useful, 
particularly when applying and testing patches, as it lets you see them 
working in a larger context. That said, you will still benefit greatly from 
looking at the hadoop.log output.
@@ -30, +27 @@

  <<TableOfContents(3)>>
  
  == Steps ==
- === Install Nutch ===
+ === Checkout Nutch in Eclipse ===
  Use the Subclipse plugin to check out the latest Nutch Trunk development.
  
   * File > New > Project > SVN > Checkout Projects from SVN
+  * Create new repository location
+  * For Nutch 1.x series use: https://svn.apache.org/repos/asf/nutch/trunk  
+  * For Nutch 2.x series use: 
http://svn.apache.org/repos/asf/nutch/branches/2.x
-  * Create new repository location > 
https://svn.apache.org/repos/asf/nutch/trunk  
- {{{
- Use https://svn.apache.org/repos/asf/nutch/branches/2.1/ for 2.1 version. The 
trunk is 1.6 version now.
- }}}
   * Subclipse will ask for some additional configuration options. At this 
stage, check out the trunk source as a project configured using the '''New 
Project Wizard'''. Ensure that you're checking out the HEAD revision, then 
proceed to Finish.
-  * The Wizard will prompt you to choose a project, so navigate to Java > Java 
Project > next
+  * The Wizard will prompt you to choose a project type, so navigate to 
"Java" > "Java Project" > "Next"
-  * Enter your Project name (trunk) and ensure that the '''create separate 
folders for sources and class files''' option is activated.
+  * Enter your Project name (trunk) and ensure that the '''create separate 
folders for sources and class files''' option is activated. Click "Finish".
-  * Set the Default output folder to trunk/bin > Finish. Subclipse will then 
set your build paths and begin checking out the Nutch trunk source from the SVN 
area.
+  * Subclipse will then set your build paths and begin checking out the Nutch 
trunk source from the SVN area.
+  * Close the project in Eclipse. Right click on the project, click on 
"Properties" and note the location of the project.
+  * Go to that location in a terminal (command prompt for Windows users)
+  * Run 'ant eclipse'. (Note that you need to have 
[[http://ant.apache.org/manual/index.html|Apache Ant]] installed and configured)
+  * Now, in Eclipse, open the project and refresh it. Initially it will show 
some errors (red dots), but these go away after Eclipse builds the workspace.
-  * Do not build Nutch now. Make sure you have no .project and .classpath 
files in the Nutch directory and that Nutch has not built the /runtime 
directory '''N.B.''' This is absolutely essential.
- 
- === Establish the Eclipse environment for Nutch ===
-  * Ensure that you're in the Package Explorer > right click on Trunk Project 
folder.
-  * The only Source folder will be trunk/src > '''Remove''' this folder > Add 
Folder > expand trunk/src and check src/bin, src/java, src/test & 
src/testresources.
-  * In addition, we must manually add '''EVERY''' individual plugin src/java 
and src/test folder, although this takes some time it is absolutely essential 
that this is done.
-  * In the Libraries tab, click Add Class Folder and add /conf to the 
classpath.
-  * Still in the Libraries tab add JARs > 
src/plugin/urlfilter-automaton/lib/automaton.jar & 
src/plugin/parse-swf/lib/javaswf.jar
-  * Remaining in the Libraries tab Add Library > IvyDE Managed Dependencies > 
browse to trunk/ivy/ivy.xml > ensure '''ALL''' configuration boxes are included.
-  * Go to "Order and Export" tab, find the entry for added "conf" folder (it 
will most likely be at the bottom of the list) and move it to the top (by 
checking it and clicking the "Top" button). This is required so Eclipse will 
take config (nutch-default.xml, etc.) resources from our "conf" folder and not 
from somewhere else.
-  * DO NOT add "build" to classpath (however you may need to add 
build/${plugin.name}/test and /java)
-  * Click the "Finish" button
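
The terminal steps in the new checkout instructions (go to the project location, then run 'ant eclipse') could be wrapped in a small shell helper along the following lines. This is only a sketch: the function name `generate_eclipse_files` and the example path are hypothetical, and the check for build.xml assumes the argument is the top level of the Nutch checkout.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nutch): run 'ant eclipse' inside a
# Nutch checkout after a couple of sanity checks.
generate_eclipse_files() {
    # Project location, as shown in Eclipse under "Properties".
    project_dir="$1"

    # The Ant build file sits at the top level of the Nutch source tree.
    if [ ! -f "$project_dir/build.xml" ]; then
        echo "No build.xml in $project_dir; is this the Nutch checkout?" >&2
        return 1
    fi

    # Apache Ant must be installed and available on the PATH.
    if ! command -v ant >/dev/null 2>&1; then
        echo "Apache Ant not found on PATH" >&2
        return 2
    fi

    # Use a subshell so the caller's working directory is left unchanged.
    (cd "$project_dir" && ant eclipse)
}
```

A call might look like `generate_eclipse_files "$HOME/workspace/trunk"`, where the path is an assumed example. Once it finishes, reopen and refresh the project in Eclipse as described in the steps.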
  
  === Configure Nutch ===
   * See the [[http://wiki.apache.org/nutch/NutchTutorial|Tutorial]] and follow 
all configuration steps, but '''DO NOT''' undertake any crawling. 
The directory structure for Nutch trunk lets us edit 
nutch-site.xml.template, nutch-default.xml and regex-urlfilter.txt.template in 
our /conf directory. These properties will then be automatically built into our 
/runtime build folder.
