Hi,

I reported some typos and incomplete information in the Nutch 0.8
tutorial some time ago. It seems that all committers and volunteers are
busy with more important issues, so I took this opportunity and am now
proud to present my *first-small-humble-patch-ever*.

Please review the patch and let me know what I should do better next time.
Note that I checked out the release-0.7.2 branch (as I found that the
source file for the 0.8 tutorial is located there) and generated an SVN
patch after my modifications. As a result, the patch header contains an
absolute file path from my computer (I am not an SVN expert; any advice
is welcome).

I also added examples of the dedup and merge commands to the tutorial.
Feel free to remove them if you don't think they fit the intent of the
original tutorial.

Regards,
Lukas
Index: /home/lukas/workspace/nutch-release-0.7.2/src/site/src/documentation/content/xdocs/tutorial8.xml
===================================================================
--- /home/lukas/workspace/nutch-release-0.7.2/src/site/src/documentation/content/xdocs/tutorial8.xml	(revision 405528)
+++ /home/lukas/workspace/nutch-release-0.7.2/src/site/src/documentation/content/xdocs/tutorial8.xml	(working copy)
@@ -243,16 +243,19 @@
 <p>Before indexing we first invert all of the links, so that we may
 index incoming anchor text with the pages.</p>
 
-<source>bin/nutch invertlinks crawl/linkdb crawl/segments</source>
+<source>bin/nutch invertlinks crawl/linkdb -dir crawl/segments</source>
 
 <p>To index the segments we use the <code>index</code> command, as follows:</p>
 
-<source>bin/nutch index indexes crawl/linkdb crawl/segments/*</source>
+<source>bin/nutch index crawl/indexes crawl/crawldb crawl/linkdb crawl/segments/*</source>
+
+<p>Then, we need to delete duplicate pages. This is done with:</p>
 
-<!-- <p>Then, before we can search a set of segments, we need to delete -->
-<!-- duplicate pages.  This is done with:</p> -->
+<source>bin/nutch dedup crawl/indexes</source>
 
-<!-- <source>bin/nutch dedup indexes</source> -->
+<p>In the end we merge all individual indexes into one index:</p>
+
+<source>bin/nutch merge crawl/index crawl/indexes</source>
 
 <p>Now we're ready to search!</p>
 
